Hello, how can I extract all image url(s) from an HTML page? Example: <html> ... <img src=http://www.rebol.com/graphics/reb-logo.gif> ... I want http://www.rebol.com/graphics/reb-logo.gif http://www.rebol.com/graphics/view.jpg ....
On the compkarori rebsite I found imagegrabber, but I get: ** Script Error: imagelist has no value ** Where: grabimages ** Near: clear imagelist if not equal? >>
Is the IMG tag the only one with the SRC option? Could you load the whole web page as a string and set up a loop to "find" the string "src=", and then pick off characters to the ">"? Not that I personally know how to do that...
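The loop suggested above can be sketched in REBOL 2 roughly as follows. This is only an illustration under some assumptions: the sample `page` string stands in for a real `read` of a URL, and the names `stop-chars`, `urls`, `pos` and `stop` are made up for the example. Note that SRC is not unique to IMG (SCRIPT and IFRAME tags use it too), so a plain search for "src=" also picks up those values.

```rebol
rebol []

; Sample page text; a real script might use something like: page: read http://www.rebol.com
page: {<html><img src=http://www.rebol.com/graphics/reb-logo.gif> <img src="http://www.rebol.com/graphics/view.jpg"></html>}

stop-chars: charset {"' >}      ; a value ends at a quote, a space or >
urls: copy []
pos: page

; FIND/TAIL returns the position just after the match, or NONE when nothing is left
while [pos: find/tail pos "src="] [
    ; step over an optional opening quote
    if any [find/match pos {"} find/match pos "'"] [pos: next pos]
    ; pick off characters up to the next stop character (or the end of the page)
    stop: any [find pos stop-chars tail pos]
    append urls copy/part pos stop
    pos: stop
]

probe urls
```

The block collects every SRC value it finds; filtering by extension (for example keeping only .gif or .jpg) would be a separate step.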
OK, the image fetcher works great! But I'm interested in parsing to find strings matching something like: Code: src="http*.jpg"
webimager.r works with well-formed HTML pages. Nowadays a lot of sites use a strange mix of PHP + HTML + JavaScript, so an image link looks something like: Code: <script>big_pipe.onPageletArrive({"phase":1,"id":"pagelet_profile_photo","bootloadable":{"profile-boxes-edit":["nzRk6","+ULAZ","nAGI9","sW6GL","1iIBU","FJ3LF","s5vTT","Xi9ib"],"flyout-menu":["nzRk6","+ULAZ","1iIBU","sW6GL","FJ3LF"]},"css":["nAGI9","s5vTT"],"js":["nzRk6","+ULAZ"],"resource_map":{"s5vTT":{"type":"css","src":"http://static.ak.fbcdn.net/rsrc.php/v1/yE/r/zTXUbxO6Js_.css"},"Xi9ib":{"type":"js","src":"http://static.ak.fbcdn.net/rsrc.php/v1/yF/p/r/2sXANa0-shd.js"}},"onload":["window.__UIControllerRegistry["c4d76202aa9b9b4525988444"] = new UIPagelet("c4d76202aa9b9b4525988444", "/pagelet/profile/profile_photo.php", {"profile_id":99459740077,"sk":"wall","story_fbid":null}, {});; ;"],"content":{"pagelet_profile_photo":"u003cdiv id="c4d76202aa9b9b4525988444">u003cdiv id="profileimage" class="profileimage can_edit" onmouseover="CSS.removeClass($("edit_profilepicture"), "hidden_elem")" onmouseout="CSS.addClass($("edit_profilepicture"), "hidden_elem")">u003cspan>u003cimg class="logo img" src="http://profile.ak.fbcdn.net/hprofile-ak-snc4/41779_99459740077_2290_n.jpg" alt="Rebol" id="profile_pic" />u003c/span>u003ca class="hidden_elem" href="#" onclick="Bootloader.loadComponents(["profile-boxes-edit","flyout-menu"], function() { var flyout = profilePictureEditorCreateFlyout();});" id="edit_profilepicture" title="Cambia immagine">Cambia immagineu003cspan id="edit_profilepicture_icon">u003c/span>u003c/a>u003cdiv class="flyout_menu hidden_elem flyout_menu_18 link_menu" id="profile_picture_flyout">u003cdiv class="flyout_menu_header_shadow">u003cdiv class="flyout_menu_header clearfix">u003cdiv class="flyout_menu_mask">u003c/div>u003cdiv class="flyout_menu_title">Modifica la tua immagine del profilou003c/div>u003c/div>u003c/div>u003cdiv class="flyout_menu_content_shadow">u003cdiv class="menu_content">u003cdiv 
class="wrapper">u003ca rel="dialog" id="profile_picture_upload" href="/ajax/profile/picture/upload.php?id=99459740077" class="icon_link" title="Carica una nuova immagine del profilo">Carica un'immagineu003c/a>u003ca rel="dialog" id="profile_picture_camera" href="/camera/dialog.php?id=99459740077&inline=1" class="icon_link" title="Usa la webcam per scattare una foto per il tuo profilo">Scatta una fotou003c/a>u003c/div>u003c/div>u003c/div>u003c/div>u003ciframe class="fbUploadIframe" name="profile_edit_pic_iframe" src="http://static.ak.facebook.com/common/redirectiframe.html" style="width:1px;height:1px;position:absolute;top:-10000px">u003c/iframe>u003c/div>u003c/div>"}});</script> in this mess there is: Code: src="http://profile.ak.fbcdn.net/hprofile-ak-snc4/41779_99459740077_2290_n.jpg" How to find that string?
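One possible way to pull that string out with PARSE in REBOL 2 is sketched below. It does not depend on the page being valid HTML: it only looks for `src="`, copies everything up to the closing quote, and keeps values that start with http and end in .jpg. The shortened `page` sample and the names `non-quote`, `url` and `jpgs` are assumptions made for the illustration, not part of webimager.r or imagegrabber.

```rebol
rebol []

; Shortened sample in the style of the page above
page: {"type":"css","src":"http://static.ak.fbcdn.net/rsrc.php/v1/yE/r/zTXUbxO6Js_.css" u003cimg class="logo img" src="http://profile.ak.fbcdn.net/hprofile-ak-snc4/41779_99459740077_2290_n.jpg" alt="Rebol" id="profile_pic" />}

non-quote: complement charset {"}   ; any character except a double quote
jpgs: copy []

; /ALL makes PARSE treat spaces as ordinary characters
parse/all page [
    any [
        {src="} copy url some non-quote {"}
        (
            ; keep the value only if it starts with http and ends in .jpg
            if all [find/match url "http"  ".jpg" = skip tail url -4] [append jpgs url]
        )
        | skip                      ; no match here: move on one character
    ]
]

probe jpgs
```

Because each value is copied only up to its own closing quote, a .css or .js entry earlier in the page cannot be glued to a .jpg that appears later.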
I solved it this way: copy the page into OpenOffice (OpenOffice strips all the code down to simple HTML!), have OpenOffice convert it to HTML, upload the resulting HTML page to the internet, then use your image grabber. WOW, I'll soon publish something very interesting!
It's not so easy for me. If the page has many src=*.jpg entries, parse matches from the first src to the last .jpg, so it practically takes the whole source.
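rebol []
code: read to-url request-text/title "URL:"
strings: copy []
; stop each match at the closing quote, then keep only values ending in .jpg
parse code [any [thru {src="} copy a-string to {"} (
    if all [a-string ".jpg" = skip tail a-string -4] [append strings a-string]
)] to end]
probe strings
editor code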