spanhost and recursive.
Could you add an option to fetch from other hosts only what is directly referenced from the original page? Is this a TODO?

-- 
Make software - not war.
The box said win95 or better, so I installed Linux.
RE: Help me!!!
Title: RE: Help me!!!

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Say What?

- -----Original Message-----
From: Irina [mailto:[EMAIL PROTECTED]]
Sent: 22 August 2001 23:18
To: [EMAIL PROTECTED]
Subject: Help me!!!

[The original message is in Russian; mis-encoded in the archive. Translation:]

Please, anyone who can, respond! I live in Ukraine. I am 37 years old. My husband left me. I have no relatives. I am alone with two children. I became disabled, after which I was laid off at work. Nobody will hire me anywhere now - nobody needs the disabled. The pension they pay is very small; it won't even buy bread. My children do not get proper food, and I cannot give them an education - schooling must be paid for. They have no future at all. Please respond. Help me and my children; you are my last hope! I am grateful to the people who helped send this letter to the Internet. P.S. The woman is in a very difficult situation; if you can help, respond to this address: [EMAIL PROTECTED]

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use http://www.pgp.com

iQA/AwUBO4ZKkwxuP44/+NFmEQIn+QCeKWz5Cqr9fFqnhDAvqoktESbvrI8AoL0M
d038RQ5CyTglCFeIQCmTCe6R
=ZngW
-----END PGP SIGNATURE-----
Re: wget -k crashes when converting a specific url
On 23 Aug 2001, at 13:33, Edward J. Sabol wrote:

> Nathan J. Yoder wrote:
>> Please fix this soon,
>> ***COMMAND***
>> wget -k http://reality.sgi.com/fxgovers_houst/yama/panels/panelsIntro.html
>> [snip]
>> 02:30:05 (23.54 KB/s) - `panelsIntro.html' saved [3061/3061]
>> Converting panelsIntro.html...
>> zsh: segmentation fault (core dumped)
>
> Ian Abbott replied:
>> I cannot reproduce this failure on my RedHat 7.1 box.
>
> I was able to reproduce this pretty easily on both Irix 6.5.2 and
> Digital Unix 4.0d, using gcc 2.95.2. (I bet Linux's glibc has code to
> protect against fwrite() calls with negative lengths.)
>
> The problem occurs when you have a single tag with multiple attributes
> that specify links that need to be converted. In this case, it's an IMG
> tag with SRC and LOWSRC attributes. The urlpos structure passed to
> convert_links() is a linked list of pointers to where the links that
> need to be converted are. The problem is that the links are not in
> positional order. The second attribute appears in the linked list
> before the first attribute, causing the length of the string to be
> printed out to be a negative number.

Thanks for tracking that down. I've now found the problem, fixed it and created a patch (attached) against the current CVS sources.

> Here's a diff (against the current CVS sources) which will prevent the
> core dump, but please note that it does not fix the problem.
> html-parse.c and html-url.c are some dense code, and I'm still wading
> through it. (Also, it's not clear whether the linked list is supposed
> to be in positional order or whether convert_links() is wrongly
> assuming that.)
> [snipped the diff]

At least that extra code was a convenient place for me to stick a breakpoint in gdb, and it also helped me verify that I've nailed the bug (I checked the converted HTML file too, of course!). It's a shame Hrvoje Niksic's not around at the moment to apply all these patches to the repository.
Index: src/html-url.c
===================================================================
RCS file: /pack/anoncvs/wget/src/html-url.c,v
retrieving revision 1.10
diff -u -r1.10 html-url.c
--- src/html-url.c	2001/05/27 19:35:02	1.10
+++ src/html-url.c	2001/08/24 15:07:49
@@ -383,7 +383,7 @@
     {
     case TC_LINK:
       {
-	int i;
+	int i, id, first;
 	int size = ARRAY_SIZE (url_tag_attr_map);
 	for (i = 0; i < size; i++)
 	  if (url_tag_attr_map[i].tagid == tagid)
@@ -391,25 +391,34 @@
 	/* We've found the index of url_tag_attr_map where the
 	   attributes of our tags begin.  Now, look for every one of
 	   them, and handle it.  */
-	for (; (i < size && url_tag_attr_map[i].tagid == tagid); i++)
+	/* Need to process the attributes in the order they appear in
+	   the tag, as this is required if we convert links.  */
+	first = i;
+	for (id = 0; id < tag->nattrs; id++)
 	  {
-	    char *attr_value;
-	    int id;
-	    if (closure->dash_p_leaf_HTML
-		&& (url_tag_attr_map[i].flags & AF_EXTERNAL))
-	      /* If we're at a -p leaf node, we don't want to retrieve
-		 links to references we know are external to this document,
-		 such as <a href=...>.  */
-	      continue;
+	    /* This nested loop may seem inefficient (O(n^2)), but it's
+	       not, since the number of attributes (n) we loop over is
+	       extremely small.  In the worst case of IMG with all its
+	       possible attributes, n^2 will be only 9.  */
+	    for (i = first; (i < size && url_tag_attr_map[i].tagid == tagid);
+		 i++)
+	      {
+		char *attr_value;
+		if (closure->dash_p_leaf_HTML
+		    && (url_tag_attr_map[i].flags & AF_EXTERNAL))
+		  /* If we're at a -p leaf node, we don't want to retrieve
+		     links to references we know are external to this
+		     document, such as <a href=...>.  */
+		  continue;
 
-	    /* This find_attr() buried in a loop may seem inefficient
-	       (O(n^2)), but it's not, since the number of attributes
-	       (n) we loop over is extremely small.  In the worst case
-	       of IMG with all its possible attributes, n^2 will be
-	       only 9.  */
-	    attr_value = find_attr (tag, url_tag_attr_map[i].attr_name, &id);
-	    if (attr_value)
-	      handle_link (closure, attr_value, tag, id);
+		if (!strcasecmp (tag->attrs[id].name,
+				 url_tag_attr_map[i].attr_name))
+		  {
+		    attr_value = tag->attrs[id].value;
+		    if (attr_value)
+		      handle_link (closure, attr_value, tag, id);
+		  }
+	      }
 	  }
       }
       break;
Re: spanhost and recursive.
Anders Rosendal asked:

> Could you make an option to only fetch from other hosts what is
> directly referenced from the orig page?

Have you tried the --page-requisites (a.k.a. -p) command line option? The info documentation says this:

   Actually, to download a single page and all its requisites (even if they exist on separate websites), and make sure the lot displays properly locally, this author likes to use a few options in addition to `-p':

        wget -E -H -k -K -nh -p http://SITE/DOCUMENT

   In one case you'll need to add a couple more options. If DOCUMENT is a `FRAMESET' page, the one more hop that `-p' gives you won't be enough--you'll get the `FRAME' pages that are referenced, but you won't get _their_ requisites. Therefore, in this case you'll need to add `-r -l1' to the command line. The `-r -l1' will recurse from the `FRAMESET' page to the `FRAME' pages, and the `-p' will get their requisites. If you're already using a recursion level of 1 or more, you'll need to up it by one. In the future, `-p' may be made smarter so that it'll do two more hops in the case of a `FRAMESET' page.

To finish off this topic, it's worth knowing that Wget's idea of an external document link is any URL specified in an `A' tag, an `AREA' tag, or a `LINK' tag other than `<LINK REL="stylesheet">'.
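Putting the two cases from the info documentation side by side (SITE/DOCUMENT is a placeholder URL; the flags are those of the wget of that era):

```shell
# Single page plus all its requisites, even ones on other hosts,
# with links converted so the result displays properly locally:
wget -E -H -k -K -nh -p http://SITE/DOCUMENT

# If DOCUMENT is a FRAMESET page, add one level of recursion so the
# referenced FRAME pages get *their* requisites too:
wget -E -H -k -K -nh -p -r -l1 http://SITE/DOCUMENT
```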
Patch to make SocksV5 work
The attached patch will fix wget so that it works through a SOCKS firewall. The client library calls changed from the form Rconnect to the form SOCKS5connect. In addition, it works with some pretty cute C preprocessor token pasting techniques that must be seen to be believed. If you don't need wget to get past a SOCKS firewall then you don't need this patch. It is possible someone has already done this, but not as of June 2001. I'd like to get feedback on this because I only tested it on Red Hat 7.1 against an installation of socks5-v1.0r11. If you need it and this fix does not work, let me know. I'll try to fix it.

(See attached file: wget-1.7-socksv5.patch.sig)
(See attached file: wget-1.7-socksv5.patch)
Patch to make SocksV5 work
08/24/2001 02:28 PM, Peter Korman, To: [EMAIL PROTECTED], Subject: Patch to make SocksV5 work

[Duplicate of the previous message, with this addition:]

Please note: this patch will break wget on those libraries that expect a call to Rconnect and not SOCKSconnect.

(See attached file: wget-1.7-socksv5.patch.sig)
(See attached file: wget-1.7-socksv5.patch)