Re: conditional url encoding
Ryan Underwood wrote:

> It seems that some servers are broken and in order to fetch files with certain filenames, some characters that are normally encoded in HTTP sequences must be sent through unencoded. For example, I had a server the other day that I was fetching files from at the URL: http://server.com/~foobar/files

I'm having a hard time figuring out why wget is encoding the tilde in the first place. The way I read RFC 2396, tilde is one of several marks that are not encoded. The complete set of marks defined in RFC 2396 is -_.!~*'().

Perhaps the encoding rules in wget were written prior to the publication of RFC 2396 and are based on the national character discussion of RFC 1630. If so, tilde is the only character that was defined as national in RFC 1630 and as a mark in RFC 2396. For what it's worth, the national characters in RFC 1630 are {}|[]\^~.

Tony
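The overlap between the two character sets quoted in this message is easy to check mechanically. The following is only a small sketch in C; the helper name common_chars is made up for illustration and is not part of wget.

```c
#include <string.h>

/* Character sets exactly as quoted in the message above. */
static const char rfc2396_marks[] = "-_.!~*'()";
static const char rfc1630_national[] = "{}|[]\\^~";

/*
 * Hypothetical helper: copy every character that occurs in both a and b
 * into out (NUL-terminated, truncated to outlen) and return how many
 * characters were copied.
 */
static size_t common_chars(const char *a, const char *b,
                           char *out, size_t outlen)
{
    size_t n = 0;

    for (; *a != '\0'; a++) {
        if (strchr(b, *a) != NULL && n + 1 < outlen)
            out[n++] = *a;
    }
    if (outlen > 0)
        out[n] = '\0';
    return n;
}
```

Calling common_chars(rfc2396_marks, rfc1630_national, buf, sizeof buf) leaves only the tilde in buf, which matches the observation that it is the one character classified both ways.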
Re: conditional url encoding
On Sat, 22 Feb 2003, Tony Lewis wrote:

> Ryan Underwood wrote:
>
>> It seems that some servers are broken and in order to fetch files with certain filenames, some characters that are normally encoded in HTTP sequences must be sent through unencoded. For example, I had a server the other day that I was fetching files from at the URL: http://server.com/~foobar/files
>
> I'm having a hard time figuring out why wget is encoding the tilde in the first place. The way I read RFC 2396, tilde is one of several marks that are not encoded. The complete set of marks defined in RFC 2396 is -_.!~*'().
>
> Perhaps the encoding rules in wget were written prior to the publication of RFC 2396 and are based on the national character discussion of RFC 1630. If so, tilde is the only character that was defined as national in RFC 1630 and as a mark in RFC 2396. For what it's worth, the national characters in RFC 1630 are {}|[]\^~.

I've had this problem with homepages.go.com, and I noticed that a Windows version of wget for some reason didn't encode the tilde (I noticed some kind of Windows-only conditional define in the source).

> Tony

--
If you aren't going to try something, then we might as well just be friends. We can't have that now, can we?
-SK Dan Mahoney, December 9, 1998

Dan Mahoney
Techie, Sysadmin, WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144 AIM: LarpGM
Site: http://www.gushi.org
---
conditional url encoding
Hi,

I don't know how stupid of a question this is, but it was worth a hack for me, so maybe other users might benefit from it too.

It seems that some servers are broken and in order to fetch files with certain filenames, some characters that are normally encoded in HTTP sequences must be sent through unencoded. For example, I had a server the other day that I was fetching files from at the URL: http://server.com/~foobar/files

Sending the normal request, GET /%7Efoobar/files, caused the server to return a 404. However, I hacked wget and added a noencodetilde option, and changed the following line around url.c:109:

    #define UNSAFE_CHAR(c) (urlchr_test(c, urlchr_unsafe))

to:

    #define UNSAFE_CHAR(c) (urlchr_test(c, urlchr_unsafe) && !(opt.noencodetilde && c=='~'))

This caused the tilde to be sent through unencoded, and the files were fetched properly.

The reason I mention this is that perhaps a general option to give wget a list of characters not to encode (that would normally be encoded) would be useful in fetching files from corner cases such as this.

Thanks!

--
Ryan Underwood, nemesis at icequake.net, icq=10317253
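For illustration, the general "list of characters not to encode" idea might look roughly like the standalone sketch below. It is not wget code: default_unsafe is a simplified stand-in for wget's urlchr_test table that deliberately treats the tilde as unsafe to mirror the behavior described in this message, and the names must_encode, encode_path and the keep parameter are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified stand-in for wget's table lookup (an assumption, not the real
 * table): letters, digits, '/' and the marks -_.!*'() pass through; all
 * other characters, including '~', count as unsafe.
 */
static int default_unsafe(unsigned char c)
{
    if (isalnum(c))
        return 0;
    return strchr("-_.!*'()/", c) == NULL;
}

/* A character is escaped only if it is unsafe and not in the keep list. */
static int must_encode(unsigned char c, const char *keep)
{
    if (!default_unsafe(c))
        return 0;
    if (keep != NULL && strchr(keep, c) != NULL)
        return 0;
    return 1;
}

/*
 * Percent-encode src into dst. keep is the hypothetical user-supplied list
 * of characters to send unencoded (may be NULL). Returns the encoded
 * length, or -1 if dst is too small.
 */
static int encode_path(const char *src, const char *keep,
                       char *dst, size_t dstlen)
{
    size_t n = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (must_encode(c, keep)) {
            if (n + 3 >= dstlen)      /* need room for %XX plus the NUL */
                return -1;
            snprintf(dst + n, 4, "%%%02X", c);
            n += 3;
        } else {
            if (n + 1 >= dstlen)      /* need room for c plus the NUL */
                return -1;
            dst[n++] = (char)c;
        }
    }
    if (n >= dstlen)
        return -1;
    dst[n] = '\0';
    return (int)n;
}
```

With keep set to NULL, the path /~foobar/files is encoded as /%7Efoobar/files; with keep set to "~" it passes through unchanged. Taking the list as a string rather than a single flag is what would let one option cover other corner cases besides the tilde.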