Just try

  wget2 -nd -l2 -r -A "*little-nemo*s.jpeg" 'http://comicstriplibrary.org/search?search=little+nemo'

and you only get

  little-nemo-19051015-s.jpeg little-nemo-19051022-s.jpeg little-nemo-19051029-s.jpeg
  little-nemo-19051105-s.jpeg little-nemo-19051112-s.jpeg little-nemo-19051119-s.jpeg
  little-nemo-19051126-s.jpeg little-nemo-19051203-s.jpeg little-nemo-19051210-s.jpeg
  little-nemo-19051217-s.jpeg little-nemo-19051224-s.jpeg little-nemo-19051231-s.jpeg
  little-nemo-19060107-s.jpeg little-nemo-19060114-s.jpeg little-nemo-19060121-s.jpeg
  little-nemo-19060128-s.jpeg little-nemo-19060204-s.jpeg little-nemo-19060211-s.jpeg
  little-nemo-19060218-s.jpeg little-nemo-19060225-s.jpeg

Regards, Tim

On 06/20/2018 09:59 PM, Tim Rühsen wrote:
> On 20.06.2018 18:20, Nils Gerlach wrote:
>> It does not delete any html-file or anything else. Either it is accepted
>> and kept, or it is saved forever.
>> With the tip about --accept and --accept-regex I can get wget to traverse
>> the links, but it does not go deep enough to get the *l.jpgs. I tried to
>> increase -l, but to no avail. It seems like it is going only 1 link deep.
>> And it does not delete anything.
>
> Yes, my failure. Looking at the code, the regex options are applied
> without taking --recursive or --level into account. They are dumb URL
> filters.
>
> We are back at
>
>   wget -d -olog -r -Dcomicstriplibrary.org -A "*little-nemo*s.jpeg"
>   'http://comicstriplibrary.org/search?search=little+nemo'
>
> which doesn't work as expected. Somehow it doesn't follow certain links,
> so the little-nemo*s.jpeg files aren't found.
>
> Interestingly, the same options with wget2 do find and download those
> files. From a first glimpse: those files are linked from an RSS/Atom
> feed. Those aren't supported by wget, but wget2 does parse them for
> URLs.
>
> Want to give it a try? https://gitlab.com/gnuwget/wget2
>
> Regards, Tim
>
>>
>> 2018-06-20 16:58 GMT+02:00 Tim Rühsen <tim.rueh...@gmx.de>:
>>
>>> Hi Nils,
>>>
>>> please always answer to the mailing list (no problem if you CC me, but
>>> not needed).
>>>
>>> It was just an example for POSIX regexes - it's up to you to work out
>>> the details ;-) Or maybe there is a volunteer reading this.
>>>
>>> The implicitly downloaded HTML pages should be removed after parsing
>>> when you use --accept-regex - except the explicit 'starting' page from
>>> your command line.
>>>
>>> Regards, Tim
>>>
>>> On 06/20/2018 04:28 PM, Nils Gerlach wrote:
>>>> Hi Tim,
>>>>
>>>> I am sorry, but your command does not work. It only downloads the
>>>> thumbnails from the first page and follows none of the links. Open the
>>>> link in a browser and click on the pictures to get a larger picture.
>>>> There is a link "high quality picture"; the pictures behind those links
>>>> are the ones I want to download. The regex being
>>>> ".*little-nemo.*n\l.jpeg" - and not only from the first page, but from
>>>> the other search result pages, too.
>>>> Can you work that one out? Does this work with wget? The best result
>>>> would be if the visited html pages were deleted by wget. But if they
>>>> stay, I can delete them afterwards. Automatism would be better, though -
>>>> that's why I am trying to use wget ;)
>>>>
>>>> Thanks for the information on the filename and path, though.
>>>>
>>>> Greetings
>>>>
>>>> 2018-06-20 16:13 GMT+02:00 Tim Rühsen <tim.rueh...@gmx.de>:
>>>>
>>>>> Hi Nils,
>>>>>
>>>>> On 06/20/2018 06:16 AM, Nils Gerlach wrote:
>>>>>> Hi there,
>>>>>>
>>>>>> in #wget on freenode I was suggested to write this to you:
>>>>>> I tried using wget to get some images:
>>>>>>
>>>>>>   wget -nd -rH -Dcomicstriplibrary.org -A
>>>>>>   "little-nemo*s.jpeg","*html*","*.html.*","*.tmp","*page*","*display*"
>>>>>>   -p -e robots=off 'http://comicstriplibrary.org/search?search=little+nemo'
>>>>>>
>>>>>> I wanted to download the images only, but wget was not following any
>>>>>> of the links, so I put that much more into -A. But it still does not
>>>>>> follow the links.
>>>>>> Page numbers of the search result contain "page" in the link; links
>>>>>> to the big pictures I want wget to download contain "display". Both
>>>>>> are given in -A and appear in the html document wget gets. Neither is
>>>>>> followed by wget.
>>>>>>
>>>>>> Why does this not work at all? The website is public, anybody is free
>>>>>> to test. But this is not my website!
>>>>>
>>>>> -A / -R works only on the filename, not on the path. The docs (man
>>>>> page) are not very explicit about it.
>>>>>
>>>>> Instead, try --accept-regex / --reject-regex, which act on the
>>>>> complete URL - but shell wildcards won't work there.
>>>>>
>>>>> For your example this means replacing '.' by '\.' and '*' by '.*'.
>>>>>
>>>>> To download those nemo jpegs:
>>>>>
>>>>>   wget -d -rH -Dcomicstriplibrary.org --accept-regex
>>>>>   ".*little-nemo.*n\.jpeg" -p -e robots=off
>>>>>   'http://comicstriplibrary.org/search?search=little+nemo'
>>>>>   --regex-type=posix
>>>>>
>>>>> Regards, Tim
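The glob-to-regex rule Tim describes (escape '.', turn '*' into '.*') can be sketched as a small shell helper. `glob_to_regex` is a hypothetical name used here for illustration only - it is not a wget feature:

```shell
# Sketch of the conversion rule: escape literal dots, then turn shell
# '*' wildcards into the POSIX regex '.*'.
# glob_to_regex is a hypothetical helper, not part of wget.
glob_to_regex() {
    printf '%s' "$1" | sed -e 's/\./\\./g' -e 's/\*/.*/g'
}

glob_to_regex '*little-nemo*s.jpeg'   # -> .*little-nemo.*s\.jpeg
```

The result is what you would pass to --accept-regex together with --regex-type=posix, as in Tim's command above.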