Re: File name still not printed out?
Hrvoje Niksic wrote: Hrvoje Niksic [EMAIL PROTECTED] writes: The code in question is: if (!hs-local_file) { if (resp_header_copy (resp, Content-Disposition, hdrval, sizeof (hdrval))) /* Honor Content-Disposition. */ { hs-local_file = xstrdup (hdrval); } else /* Choose filename according to URL name. */ { hs-local_file = url_file_name (u); } } I now see that this code has more serious problems than not parsing Content-Disposition correctly. Specifically: * The local name is copied from the header verbatim without inspecting it for dangerous characters, such as / (on Windows also \). * There seems to be no code to check for uniqueness of file name. So far Wget's philosophy has been not to overwrite file names by default. If this is being changed, some people will be confused... and it leaves too much room for abuse. i was already aware of these problems. anyway, i just commited a patch which implements correct parsing of HTTP Content-Disposition header. * Now that the local name is determined and printed after the headers have been received, shouldn't we remove the local file has sprung to existence kludge? (See http://tinyurl.com/7b69q for details about it.) you're definitely right. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: File name still not printed out?
Mauro Tortonesi [EMAIL PROTECTED] writes: * The local name is copied from the header verbatim without inspecting it for dangerous characters, such as / (on Windows also \). * There seems to be no code to check for uniqueness of file name. So far Wget's philosophy has been not to overwrite file names by default. If this is being changed, some people will be confused... and it leaves too much room for abuse. i was already aware of these problems. If you were aware of so serious security issues, maybe it would have been a better idea to refrain from committing the code before fixing them. (But I'm not saying the code should be backed out now.) Some people are using Wget directly from Subversion, and they might be unpleasantly surprised. Also note that Content-Disposition is parsed by default, and that there's no way to turn it off. I'm not suggesting to change the default, just that, because it's the default, the implementation of this features requires all the more care and thought.
Re: File name still not printed out?
Hrvoje Niksic wrote: Mauro Tortonesi [EMAIL PROTECTED] writes: * The local name is copied from the header verbatim without inspecting it for dangerous characters, such as / (on Windows also \). * There seems to be no code to check for uniqueness of file name. So far Wget's philosophy has been not to overwrite file names by default. If this is being changed, some people will be confused... and it leaves too much room for abuse. i was already aware of these problems. If you were aware of so serious security issues, maybe it would have been a better idea to refrain from committing the code before fixing them. of course, i was not aware of these issues at the moment of commiting the code. only many days later i noticed the bug. (But I'm not saying the code should be backed out now.) Some people are using Wget directly from Subversion, and they might be unpleasantly surprised. you're right. Also note that Content-Disposition is parsed by default, and that there's no way to turn it off. I'm not suggesting to change the default, just that, because it's the default, the implementation of this features requires all the more care and thought. i agree. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: File name still not printed out?
Hrvoje Niksic [EMAIL PROTECTED] writes: The code in question is: if (!hs-local_file) { if (resp_header_copy (resp, Content-Disposition, hdrval, sizeof (hdrval))) /* Honor Content-Disposition. */ { hs-local_file = xstrdup (hdrval); } else /* Choose filename according to URL name. */ { hs-local_file = url_file_name (u); } } I now see that this code has more serious problems than not parsing Content-Disposition correctly. Specifically: * The local name is copied from the header verbatim without inspecting it for dangerous characters, such as / (on Windows also \). * There seems to be no code to check for uniqueness of file name. So far Wget's philosophy has been not to overwrite file names by default. If this is being changed, some people will be confused... and it leaves too much room for abuse. * Now that the local name is determined and printed after the headers have been received, shouldn't we remove the local file has sprung to existence kludge? (See http://tinyurl.com/7b69q for details about it.)
Re: File name still not printed out?
Mauro Tortonesi [EMAIL PROTECTED] writes: You might want to put the URL last in line, so that the (try...) text is not confused with what is being downloaded. do you mean: --13:20:41-- (try:02) http://www.tortonesi.com/ Yes, why not? Is the empty line intentional? yes. i thought that a separation line would make the output more readable. what do you think? I think it makes it less readable when there is more than one URL. Personally I'd consider it a good idea to reduce the quantity of text that Wget produces, it's already quite verbose. But if you have different plans, I'm ok with that.
Re: File name still not printed out?
Hrvoje Niksic wrote: Mauro Tortonesi [EMAIL PROTECTED] writes: You might want to put the URL last in line, so that the (try...) text is not confused with what is being downloaded. do you mean: --13:20:41-- (try:02) http://www.tortonesi.com/ Yes, why not? ok, i followed your suggestion. the code in the trunk implements this output format. Is the empty line intentional? yes. i thought that a separation line would make the output more readable. what do you think? I think it makes it less readable when there is more than one URL. Personally I'd consider it a good idea to reduce the quantity of text that Wget produces, it's already quite verbose. But if you have different plans, I'm ok with that. i don't really mind. right now the output code prints two empty lines: the second one and the one before the Storing resource in file: message. here is an example: [EMAIL PROTECTED] wget]$ ./src/wget http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb --10:38:54-- http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb Resolving download.skype.com... 198.63.211.251, 212.72.49.140, 195.215.8.138 Connecting to download.skype.com|198.63.211.251|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 7906384 (7,5M) [application/x-debian-package] Storing resource in file: `skype_1.2.0.18-1_i386.deb.1' 100%[=] 7.906.384 2,38M/s in 3,2s 10:38:58 (2,38 MB/s) - `skype_1.2.0.18-1_i386.deb.1' saved [7906384/7906384] do you think we should remove those two lines? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: File name still not printed out?
Mauro Tortonesi [EMAIL PROTECTED] writes: Resolving download.skype.com... 198.63.211.251, 212.72.49.140, 195.215.8.138 Connecting to download.skype.com|198.63.211.251|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 7906384 (7,5M) [application/x-debian-package] Storing resource in file: `skype_1.2.0.18-1_i386.deb.1' 100%[=] 7.906.384 2,38M/s in 3,2s 10:38:58 (2,38 MB/s) - `skype_1.2.0.18-1_i386.deb.1' saved [7906384/7906384] do you think we should remove those two lines? IMO yes. Storing resource in file might also be shortened to saving to: or something like that.
Re: File name still not printed out?
Hrvoje Niksic wrote: Mauro Tortonesi [EMAIL PROTECTED] writes: Resolving download.skype.com... 198.63.211.251, 212.72.49.140, 195.215.8.138 Connecting to download.skype.com|198.63.211.251|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 7906384 (7,5M) [application/x-debian-package] Storing resource in file: `skype_1.2.0.18-1_i386.deb.1' 100%[=] 7.906.384 2,38M/s in 3,2s 10:38:58 (2,38 MB/s) - `skype_1.2.0.18-1_i386.deb.1' saved [7906384/7906384] do you think we should remove those two lines? IMO yes. Storing resource in file might also be shortened to saving to: or something like that. good. here is the new output format: [EMAIL PROTECTED] wget]$ ./src/wget http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb --11:09:28-- http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb Resolving download.skype.com... 198.63.211.251, 212.72.49.140, 195.215.8.138 Connecting to download.skype.com|198.63.211.251|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 7906384 (7,5M) [application/x-debian-package] Saving to: `skype_1.2.0.18-1_i386.deb' 100%[==] 7.906.384 5,02M/s in 1,5s 11:09:30 (5,02 MB/s) - `skype_1.2.0.18-1_i386.deb' saved [7906384/7906384] notice that in case -N is used, a HEAD request precedes the GET request, so with the new HTTP code, the output format is something like: [EMAIL PROTECTED]:~/code/svn/wget/tests$ ./src/wget -N http://localhost:8080/dummy.html --11:07:37-- http://localhost:8080/dummy.html Resolving localhost... 127.0.0.1 Connecting to localhost|127.0.0.1|:8080... connected. HTTP request sent, awaiting response... 200 OK Lunghezza: non specificato [text/plain] Il file remoto è più recente, lo scarico. --11:07:37-- http://localhost:8080/dummy.html Connecting to localhost|127.0.0.1|:8080... connected. HTTP request sent, awaiting response... 200 OK Lunghezza: 8 [text/plain] Saving to: `dummy.html' 100%[=] 8 --.-K/s in 0s 11:07:37 (27,7 KB/s) - `dummy.html' saved [8/8] it makes sense to me. any objections? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it
Re: File name still not printed out?
BTW it looks like the Content-Disposition header isn't being parsed correctly. For example: $ ./wget http://www.mininova.org/get/212851 --22:11:52-- http://www.mininova.org/get/212851 Resolving www.mininova.org... 83.149.119.115 Connecting to www.mininova.org|83.149.119.115|:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [application/x-bittorrent] Saving to: `attachment; filename=Snooker - Ronnie O'Sullivan - World Champ 1997.avi +{mininova.org}+.torrent' The code in question is: if (!hs-local_file) { if (resp_header_copy (resp, Content-Disposition, hdrval, sizeof (hdrval))) /* Honor Content-Disposition. */ { hs-local_file = xstrdup (hdrval); } else /* Choose filename according to URL name. */ { hs-local_file = url_file_name (u); } } I don't think you're supposed to just take Content-Disposition literally; we should parse the filename attribute. Also note that you can use resp_header_strdup rather than resp_header_copy+xstrdup. The benefit is that it avoids the arbitrary limit of sizeof(hdrval) (which is meant for numerical quantities such as Content-Length and such).
Re: File name still not printed out?
Mauro Tortonesi [EMAIL PROTECTED] writes: [EMAIL PROTECTED]:~/code/svn/wget/src$ ./wget http://www.tortonesi.com --13:20:41-- http://www.tortonesi.com/ (try:02) Resolving www.tortonesi.com... 62.149.140.31 You might want to put the URL last in line, so that the (try...) text is not confused with what is being downloaded. Is the empty line intentional? Connecting to www.tortonesi.com|62.149.140.31|:80... connected. HTTP richiesta inviata, aspetto la risposta... 200 OK Lunghezza: non specificato [text/html] Storing resource in file: `index.html' This looks good. OTOH, the bug I described in the first message of this thread is still present.
Re: File name still not printed out?
Hrvoje Niksic wrote: Is it intentional that the Wget from the repository is still not printing the file name it uses for download? I (and some other users) tend to use the Wget from CVS, so this is making Wget less useful for me. For example: $ ./wget http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb --13:41:42-- http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb Resolving download.skype.com... 212.72.49.140, 198.63.211.251, 195.215.8.138 Connecting to download.skype.com|212.72.49.140|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 7906384 (7.5M) [application/x-debian-package] 0% [ ] 22,482 52.9K/s As you can see, there is no indication whether the file is written to skype_1.2.0.18-1_i386.deb, skype_1.2.0.18-1_i386.deb.1 or something else entirely. it's a small bug in the current http code. the logprintf call that prints the destination filename used LOG_VERBOSE instead of LOG_NOTQUIET. however, i propose the switch to a new output format, consistent with the new http implementation, that delays the choice of the filename after the parsing of http response headers. this is the new output format that i suggest: [EMAIL PROTECTED]:~/code/svn/wget/src$ ./wget http://www.tortonesi.com --13:20:41-- http://www.tortonesi.com/ (try:02) Resolving www.tortonesi.com... 62.149.140.31 Connecting to www.tortonesi.com|62.149.140.31|:80... connected. HTTP richiesta inviata, aspetto la risposta... 200 OK Lunghezza: non specificato [text/html] Storing resource in file: `index.html' [ = ] 3.359--.-K/s in 0s 13:20:42 (7,21 MB/s) - `index.html' saved [3359] the two modifications are: 1) in case of a further download attempt the (try:##) string is appended to the first line: --13:20:41-- http://www.tortonesi.com/ (try:02) the second line of output is always blank. 2) the destination filename is printed immediately before the progess bar: Storing resource in file: `index.html' i will posted a patch implementing the new output format to wget-patches in a few minutes. i will commit the patch into svn as soon as we reach consensus about the new output format. let me know what you think about it. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi http://www.tortonesi.com University of Ferrara - Dept. of Eng.http://www.ing.unife.it GNU Wget - HTTP/FTP file retrieval tool http://www.gnu.org/software/wget Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net Ferrara Linux User Group http://www.ferrara.linux.it