Re: string_t breaks compilation under Solaris

Mauro Tortonesi Mon, 21 Feb 2005 12:00:01 -0800

On Monday 21 February 2005 12:35 pm, Leonid wrote:
> Mauro,
>
>     I tend to agree with Hrvoje. If you decide to open the
> Pandora's box and implement iconv support, please, please,
> provide an option, preferably default one, to configure or
> use wget without iconv.


of course.

> FYI, there are languages which actively use more than one coding. For 
> example, I know 14 different codings for Russian language. 

i know, i know. i18n is a great pain in the neck.

> It is rather common that either the charset at the remote host or the 
> charset at the local host are set incorrectly. 

this is not a problem. actually (apart from the case of a document returned as 
an HTTP response) we can never be sure that the charset used by the server is 
the same as our locale charset. the only two reasonable things we can do are:

- assume all data is ASCII
- assume all data is in our locale charset

the second assumption allows us to avoid problems like this one: 

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=271931
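
for reference, a minimal sketch of how the locale charset could be looked up 
on a POSIX system. the helper name locale_charset_name is made up for this 
example and is not part of wget; it only relies on setlocale(3) and 
nl_langinfo(3):

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper (not wget code): return the name of the charset
   used by the user's locale, e.g. "UTF-8" or "ISO-8859-1". */
const char *
locale_charset_name (void)
{
  /* Adopt the LC_CTYPE settings from the environment. If this fails,
     the program simply stays in the default "C" locale. */
  setlocale (LC_CTYPE, "");

  /* CODESET names the character encoding of the current locale. The
     returned string may be overwritten by later setlocale() or
     nl_langinfo() calls, so copy it if it has to be kept. */
  return nl_langinfo (CODESET);
}
```

the exact names returned differ between systems, so the result should be 
treated as a hint rather than compared against a fixed list.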

> iconv will choke in an attempt to recode strings to UTF-8 and back in such a 
> situation. 

actually, the interpretation of data as a sequence of multibyte characters 
encoded in the local charset is done using mbrtowc(3), which allows us to get 
an array of wide chars (see current CVS code in string_t.c). we would need to 
use iconv(3) only to translate the obtained wide char string into a UTF-8 
encoded (normal) char string and eventually for UTF-8 {de,en}coding.
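
to illustrate the mbrtowc(3) step, here is a rough sketch. it is not the code 
from string_t.c: the function name decode_locale_string, its signature and its 
error convention are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not taken from string_t.c): decode the multibyte
   string SRC, interpreted in the current LC_CTYPE locale, into DST,
   which has room for DST_LEN wide chars including the terminator.
   Returns the number of wide chars written, not counting the terminator,
   or (size_t) -1 on an invalid or truncated sequence or if DST is too
   small. */
size_t
decode_locale_string (const char *src, wchar_t *dst, size_t dst_len)
{
  mbstate_t state;
  size_t remaining;
  size_t count = 0;

  assert (src != NULL && dst != NULL && dst_len > 0);

  remaining = strlen (src);
  memset (&state, 0, sizeof state);     /* initial conversion state */

  while (remaining > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, src, remaining, &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        return (size_t) -1;             /* invalid or truncated sequence */
      if (n == 0)
        break;                          /* defensive: decoded a null char */
      if (count + 1 >= dst_len)
        return (size_t) -1;             /* no room for char + terminator */

      dst[count++] = wc;
      src += n;
      remaining -= n;
    }

  dst[count] = L'\0';
  return count;
}
```

the caller is expected to have called setlocale(LC_CTYPE, "") beforehand; 
otherwise the data is decoded according to the default "C" locale. a failure 
return is the point where the charset of the data does not match the locale, 
so the caller could fall back to treating the bytes as opaque.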

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it
