Re: File name still not printed out?

2006-02-03 Thread Mauro Tortonesi

Hrvoje Niksic wrote:

Hrvoje Niksic [EMAIL PROTECTED] writes:



The code in question is:

 if (!hs-local_file) 
   {

 if (resp_header_copy (resp, Content-Disposition, hdrval, sizeof 
(hdrval)))
   /* Honor Content-Disposition. */
   {
 hs-local_file = xstrdup (hdrval);
   }
 else
   /* Choose filename according to URL name. */
   {
 hs-local_file = url_file_name (u);
   }
   }



I now see that this code has more serious problems than not parsing
Content-Disposition correctly.  Specifically:

* The local name is copied from the header verbatim without inspecting
  it for dangerous characters, such as / (on Windows also \).

* There seems to be no code to check for uniqueness of file name.  So
  far Wget's philosophy has been not to overwrite file names by
  default.  If this is being changed, some people will be confused...
  and it leaves too much room for abuse.


i was already aware of these problems. anyway, i just commited a patch 
which implements correct parsing of HTTP Content-Disposition header.



* Now that the local name is determined and printed after the headers
  have been received, shouldn't we remove the local file has sprung
  to existence kludge?  (See http://tinyurl.com/7b69q for details
  about it.)


you're definitely right.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: File name still not printed out?

2006-02-03 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 * The local name is copied from the header verbatim without
 inspecting
   it for dangerous characters, such as / (on Windows also \).
 * There seems to be no code to check for uniqueness of file name.  So
   far Wget's philosophy has been not to overwrite file names by
   default.  If this is being changed, some people will be confused...
   and it leaves too much room for abuse.

 i was already aware of these problems.

If you were aware of so serious security issues, maybe it would have
been a better idea to refrain from committing the code before fixing
them.  (But I'm not saying the code should be backed out now.)  Some
people are using Wget directly from Subversion, and they might be
unpleasantly surprised.

Also note that Content-Disposition is parsed by default, and that
there's no way to turn it off.  I'm not suggesting to change the
default, just that, because it's the default, the implementation of
this features requires all the more care and thought.


Re: File name still not printed out?

2006-02-03 Thread Mauro Tortonesi

Hrvoje Niksic wrote:

Mauro Tortonesi [EMAIL PROTECTED] writes:


* The local name is copied from the header verbatim without
inspecting
 it for dangerous characters, such as / (on Windows also \).
* There seems to be no code to check for uniqueness of file name.  So
 far Wget's philosophy has been not to overwrite file names by
 default.  If this is being changed, some people will be confused...
 and it leaves too much room for abuse.


i was already aware of these problems.


If you were aware of so serious security issues, maybe it would have
been a better idea to refrain from committing the code before fixing
them.  


of course, i was not aware of these issues at the moment of commiting 
the code. only many days later i noticed the bug.



(But I'm not saying the code should be backed out now.)  Some
people are using Wget directly from Subversion, and they might be
unpleasantly surprised.


you're right.


Also note that Content-Disposition is parsed by default, and that
there's no way to turn it off.  I'm not suggesting to change the
default, just that, because it's the default, the implementation of
this features requires all the more care and thought.


i agree.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: File name still not printed out?

2006-01-31 Thread Hrvoje Niksic
Hrvoje Niksic [EMAIL PROTECTED] writes:

 The code in question is:

   if (!hs-local_file) 
 {
   if (resp_header_copy (resp, Content-Disposition, hdrval, sizeof 
 (hdrval)))
 /* Honor Content-Disposition. */
 {
   hs-local_file = xstrdup (hdrval);
 }
   else
 /* Choose filename according to URL name. */
 {
   hs-local_file = url_file_name (u);
 }
 }

I now see that this code has more serious problems than not parsing
Content-Disposition correctly.  Specifically:

* The local name is copied from the header verbatim without inspecting
  it for dangerous characters, such as / (on Windows also \).

* There seems to be no code to check for uniqueness of file name.  So
  far Wget's philosophy has been not to overwrite file names by
  default.  If this is being changed, some people will be confused...
  and it leaves too much room for abuse.

* Now that the local name is determined and printed after the headers
  have been received, shouldn't we remove the local file has sprung
  to existence kludge?  (See http://tinyurl.com/7b69q for details
  about it.)


Re: File name still not printed out?

2006-01-30 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 You might want to put the URL last in line, so that the (try...)
 text is not confused with what is being downloaded.

 do you mean:

 --13:20:41--  (try:02)  http://www.tortonesi.com/

Yes, why not?

 Is the empty line intentional?

 yes. i thought that a separation line would make the output more
 readable. what do you think?

I think it makes it less readable when there is more than one URL.
Personally I'd consider it a good idea to reduce the quantity of text
that Wget produces, it's already quite verbose.  But if you have
different plans, I'm ok with that.


Re: File name still not printed out?

2006-01-30 Thread Mauro Tortonesi

Hrvoje Niksic wrote:

Mauro Tortonesi [EMAIL PROTECTED] writes:



You might want to put the URL last in line, so that the (try...)
text is not confused with what is being downloaded.


do you mean:

--13:20:41--  (try:02)  http://www.tortonesi.com/


Yes, why not?


ok, i followed your suggestion. the code in the trunk implements this 
output format.



Is the empty line intentional?


yes. i thought that a separation line would make the output more
readable. what do you think?


I think it makes it less readable when there is more than one URL.
Personally I'd consider it a good idea to reduce the quantity of text
that Wget produces, it's already quite verbose.  But if you have
different plans, I'm ok with that.


i don't really mind. right now the output code prints two empty lines: 
the second one and the one before the Storing resource in file: 
message. here is an example:


[EMAIL PROTECTED] wget]$ ./src/wget 
http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb

--10:38:54--  http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb

Resolving download.skype.com... 198.63.211.251, 212.72.49.140, 195.215.8.138
Connecting to download.skype.com|198.63.211.251|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7906384 (7,5M) [application/x-debian-package]

Storing resource in file: `skype_1.2.0.18-1_i386.deb.1'

100%[=] 7.906.384   2,38M/s   in 3,2s

10:38:58 (2,38 MB/s) - `skype_1.2.0.18-1_i386.deb.1' saved [7906384/7906384]

do you think we should remove those two lines?

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: File name still not printed out?

2006-01-30 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 Resolving download.skype.com... 198.63.211.251, 212.72.49.140, 195.215.8.138
 Connecting to download.skype.com|198.63.211.251|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 7906384 (7,5M) [application/x-debian-package]

 Storing resource in file: `skype_1.2.0.18-1_i386.deb.1'

 100%[=] 7.906.384   2,38M/s   in 3,2s

 10:38:58 (2,38 MB/s) - `skype_1.2.0.18-1_i386.deb.1' saved [7906384/7906384]

 do you think we should remove those two lines?

IMO yes.  Storing resource in file might also be shortened to
saving to:  or something like that.


Re: File name still not printed out?

2006-01-30 Thread Mauro Tortonesi

Hrvoje Niksic wrote:

Mauro Tortonesi [EMAIL PROTECTED] writes:



Resolving download.skype.com... 198.63.211.251, 212.72.49.140, 195.215.8.138
Connecting to download.skype.com|198.63.211.251|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7906384 (7,5M) [application/x-debian-package]

Storing resource in file: `skype_1.2.0.18-1_i386.deb.1'

100%[=] 7.906.384   2,38M/s   in 3,2s

10:38:58 (2,38 MB/s) - `skype_1.2.0.18-1_i386.deb.1' saved [7906384/7906384]

do you think we should remove those two lines?



IMO yes.  Storing resource in file might also be shortened to
saving to:  or something like that.


good. here is the new output format:

[EMAIL PROTECTED] wget]$ ./src/wget 
http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb

--11:09:28--  http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb
Resolving download.skype.com... 198.63.211.251, 212.72.49.140, 195.215.8.138
Connecting to download.skype.com|198.63.211.251|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7906384 (7,5M) [application/x-debian-package]
Saving to: `skype_1.2.0.18-1_i386.deb'

100%[==] 7.906.384   5,02M/s   in 1,5s

11:09:30 (5,02 MB/s) - `skype_1.2.0.18-1_i386.deb' saved [7906384/7906384]

notice that in case -N is used, a HEAD request precedes the GET request, 
so with the new HTTP code, the output format is something like:


[EMAIL PROTECTED]:~/code/svn/wget/tests$ ./src/wget -N 
http://localhost:8080/dummy.html

--11:07:37--  http://localhost:8080/dummy.html
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:8080... connected.
HTTP request sent, awaiting response... 200 OK
Lunghezza: non specificato [text/plain]
Il file remoto è più recente, lo scarico.

--11:07:37--  http://localhost:8080/dummy.html
Connecting to localhost|127.0.0.1|:8080... connected.
HTTP request sent, awaiting response... 200 OK
Lunghezza: 8 [text/plain]
Saving to: `dummy.html'

100%[=] 8   --.-K/s   in 0s

11:07:37 (27,7 KB/s) - `dummy.html' saved [8/8]

it makes sense to me. any objections?

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: File name still not printed out?

2006-01-30 Thread Hrvoje Niksic
BTW it looks like the Content-Disposition header isn't being parsed
correctly.  For example:

$ ./wget http://www.mininova.org/get/212851
--22:11:52--  http://www.mininova.org/get/212851
Resolving www.mininova.org... 83.149.119.115
Connecting to www.mininova.org|83.149.119.115|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-bittorrent]
Saving to: `attachment; filename=Snooker - Ronnie O'Sullivan - World Champ 
1997.avi +{mininova.org}+.torrent'

The code in question is:

  if (!hs-local_file) 
{
  if (resp_header_copy (resp, Content-Disposition, hdrval, sizeof 
(hdrval)))
/* Honor Content-Disposition. */
{
  hs-local_file = xstrdup (hdrval);
}
  else
/* Choose filename according to URL name. */
{
  hs-local_file = url_file_name (u);
}
}

I don't think you're supposed to just take Content-Disposition
literally; we should parse the filename attribute.

Also note that you can use resp_header_strdup rather than
resp_header_copy+xstrdup.  The benefit is that it avoids the arbitrary
limit of sizeof(hdrval) (which is meant for numerical quantities such
as Content-Length and such).


Re: File name still not printed out?

2006-01-28 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 [EMAIL PROTECTED]:~/code/svn/wget/src$ ./wget http://www.tortonesi.com
 --13:20:41--  http://www.tortonesi.com/  (try:02)

 Resolving www.tortonesi.com... 62.149.140.31

You might want to put the URL last in line, so that the (try...)
text is not confused with what is being downloaded.

Is the empty line intentional?

 Connecting to www.tortonesi.com|62.149.140.31|:80... connected.
 HTTP richiesta inviata, aspetto la risposta... 200 OK
 Lunghezza: non specificato [text/html]

 Storing resource in file: `index.html'

This looks good.

OTOH, the bug I described in the first message of this thread is still
present.


Re: File name still not printed out?

2006-01-23 Thread Mauro Tortonesi

Hrvoje Niksic wrote:

Is it intentional that the Wget from the repository is still not
printing the file name it uses for download?  I (and some other users)
tend to use the Wget from CVS, so this is making Wget less useful
for me.

For example:

$ ./wget http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb
--13:41:42--  http://download.skype.com/linux/skype_1.2.0.18-1_i386.deb

Resolving download.skype.com... 212.72.49.140, 198.63.211.251, 195.215.8.138
Connecting to download.skype.com|212.72.49.140|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7906384 (7.5M) [application/x-debian-package]

 0% [  ] 22,482  52.9K/s

As you can see, there is no indication whether the file is written to
skype_1.2.0.18-1_i386.deb, skype_1.2.0.18-1_i386.deb.1 or
something else entirely.


it's a small bug in the current http code. the logprintf call that 
prints the destination filename used LOG_VERBOSE instead of LOG_NOTQUIET.


however, i propose the switch to a new output format, consistent with 
the new http implementation, that delays the choice of the filename 
after the parsing of http response headers. this is the new output 
format that i suggest:



[EMAIL PROTECTED]:~/code/svn/wget/src$ ./wget http://www.tortonesi.com
--13:20:41--  http://www.tortonesi.com/  (try:02)

Resolving www.tortonesi.com... 62.149.140.31
Connecting to www.tortonesi.com|62.149.140.31|:80... connected.
HTTP richiesta inviata, aspetto la risposta... 200 OK
Lunghezza: non specificato [text/html]

Storing resource in file: `index.html'

[ = ] 3.359--.-K/s   in 0s

13:20:42 (7,21 MB/s) - `index.html' saved [3359]


the two modifications are:

1) in case of a further download attempt the (try:##) string is appended
   to the first line:

   --13:20:41--  http://www.tortonesi.com/  (try:02)

   the second line of output is always blank.

2) the destination filename is printed immediately before the progess
   bar:

   Storing resource in file: `index.html'


i will posted a patch implementing the new output format to wget-patches 
in a few minutes. i will commit the patch into svn as soon as we reach 
consensus about the new output format. let me know what you think about it.


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it