-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mishari Almishari wrote:
> hi,
> if a webpage is encoded in some encoding scheme, does wget save
> the webpage with the encoding or after decoding? What about if I use
> the -E option, does it save it after decoding it to its original HTML
> content?

Wget currently does no character set decoding whatsoever. It just saves
pages as-is. One side-effect of this is that it cannot handle certain
shifting or multibyte character sets, notably ISO-2022-JP, as it will
always interpret the byte value for '<' as representing the character
'<', which isn't always the case in such character sets.
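
To make that concrete, here is a small Python sketch (not Wget code; the
function names are made up for illustration). It assumes a page containing
the hiragana character U+305C, picked because the second byte of its
two-byte JIS X 0208 code is 0x3C, the same value as ASCII '<'. A scanner
that looks at raw bytes sees a tag opener that a charset-aware scanner
does not:

```python
# Illustration only: hypothetical helpers, not part of Wget.

# U+305C (hiragana "ze"); its JIS X 0208 code is 0x24 0x3C,
# so the encoded form contains the byte value of ASCII '<'.
text = "\u305c"
raw = text.encode("iso2022_jp")


def count_naive_tag_opens(data: bytes) -> int:
    """Count '<' the way a byte-oriented parser would: by raw byte value."""
    return data.count(b"<")


def count_real_tag_opens(data: bytes, charset: str) -> int:
    """Decode first, then count actual '<' characters."""
    return data.decode(charset).count("<")


if __name__ == "__main__":
    print("raw bytes:       ", raw)
    print("naive '<' count: ", count_naive_tag_opens(raw))
    print("decoded '<' count:", count_real_tag_opens(raw, "iso2022_jp"))
```

The naive count is what a parser working on undecoded bytes would act on,
which is why such pages can confuse link extraction.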

This will be addressed at some future point. In the meantime, though,
the answer is that Wget doesn't do any decoding, nor does it add
or save any information about what the original character encoding was.
A plan is in place (for "Wget 2.0") to save such information in a
"download session database"; it might also be useful at some point to
have Wget add such information directly into the page, using the META
tag, and of course to offer transcoding support.

- --
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHaAyV7M8hyUobTrERAotaAJsGhNObhZ9x3IHlXZYjXLlCP76N0gCgjEkb
0vIyjhouSkR8o/CsOepseWs=
=2XLE
-----END PGP SIGNATURE-----
