Re: conditional url encoding

2003-02-22 Thread Tony Lewis
Ryan Underwood wrote:

 It seems that some servers are broken and in order to fetch files with
 certain filenames, some characters that are normally encoded in HTTP
 sequences must be sent through unencoded.  For example, I had a server the
 other day that I was fetching files from at the URL:
 http://server.com/~foobar/files

I'm having a hard time figuring out why wget is encoding the tilde in the
first place. The way I read RFC 2396, tilde is one of several marks that
are not encoded. The complete set of marks defined in RFC 2396 is:
- _ . ! ~ * ' ( )

Perhaps the encoding rules in wget were written prior to the publication of
RFC 2396 and are based on the national character discussion of RFC 1630.
If so, tilde is the only character that was defined as national in RFC
1630 and as a mark in RFC 2396.

For what it's worth, the national characters in RFC 1630 are: { } | [ ] \ ^ ~

Tony



Re: conditional url encoding

2003-02-22 Thread Dan Mahoney, System Admin
On Sat, 22 Feb 2003, Tony Lewis wrote:

 Ryan Underwood wrote:
 
  It seems that some servers are broken and in order to fetch files with
  certain filenames, some characters that are normally encoded in HTTP
  sequences must be sent through unencoded.  For example, I had a server the
  other day that I was fetching files from at the URL:
  http://server.com/~foobar/files

 I'm having a hard time figuring out why wget is encoding the tilde in the
 first place. The way I read RFC 2396, tilde is one of several marks that
 are not encoded. The complete set of marks defined in RFC 2396 is:
 - _ . ! ~ * ' ( )

 Perhaps the encoding rules in wget were written prior to the publication of
 RFC 2396 and are based on the national character discussion of RFC 1630.
 If so, tilde is the only character that was defined as national in RFC
 1630 and as a mark in RFC 2396.

 For what it's worth, the national characters in RFC 1630 are: { } | [ ] \ ^ ~

I've had this problem with homepages.go.com, and noticed that a Windows
version of wget for some reason didn't do it (there's some kind of
Windows-specific conditional define in there).







Dan Mahoney
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144   AIM: LarpGM
Site:  http://www.gushi.org
---




conditional url encoding

2003-02-21 Thread Ryan Underwood

Hi,

I don't know how stupid of a question this is, but it was worth a hack for
me, so maybe other users might benefit from it too.

It seems that some servers are broken and in order to fetch files with certain
filenames, some characters that are normally encoded in HTTP sequences must
be sent through unencoded.  For example, I had a server the other day that
I was fetching files from at the URL:
http://server.com/~foobar/files

Sending the normal request for
GET /%7Efoobar/files

caused the server to return a 404.  However, I hacked wget and added a
noencodetilde option, and changed the following line around url.c:109:

#define UNSAFE_CHAR(c) (urlchr_test(c, urlchr_unsafe))

to:
#define UNSAFE_CHAR(c) (urlchr_test(c, urlchr_unsafe) \
                        && !(opt.noencodetilde && (c) == '~'))

This caused the tilde to be sent through unencoded, and the files were fetched
properly.

The reason I mention this is that perhaps a general option to give wget a list
of characters not to encode (that would normally be encoded) would be useful
in fetching files from corner cases such as this.

Thanks!

-- 
Ryan Underwood, nemesis at icequake.net, icq=10317253