Re: iconv limitations

2004-04-08 Thread Michael B Allen
Markus Kuhn said:
>> > AFAICT iconv(3) requires that the length of the input be known in
>> advance.
> On the other hand, I have great difficulty to envision a real-world
> situation, where the user of iconv
>
> - knows that the input is zero terminated
I have great difficulty in envisioning th

Re: iconv limitations

2004-04-08 Thread Glenn Maynard
On Thu, Apr 08, 2004 at 04:17:41AM -0400, Michael B Allen wrote:
> > - knows that the input is zero terminated
>
> I have great difficulty in envisioning the opposite. Binary file formats
> and network protocols have a lot of zero terminated strings in all sorts
> of encodings.
>
> > - does n

Re: iconv limitations

2004-04-08 Thread Wu Yongwei
IMHO, if the input possibly contains embedded nulls, it is then simply NOT a "null-terminated string". The concept of a null-terminated string should only be used with data without embedded nulls, such as ASCII, EUC-JP, EUC-CN, UTF-8, and so on. If this is not the case (as in UTF-16), using data length
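The distinction drawn above can be checked directly. The following is a minimal Python sketch, not iconv itself; the helper name `has_embedded_zero_byte` is hypothetical and the codec names are Python's (`euc_jp` for EUC-JP, `gb2312` for EUC-CN). It only tests whether encoding a given text produces any 0x00 byte, which is what makes a byte-wise terminator scan unsafe.

```python
def has_embedded_zero_byte(text: str, encoding: str) -> bool:
    """Return True if encoding `text` produces at least one 0x00 byte."""
    return b"\x00" in text.encode(encoding)


# The same ASCII-only sample encoded four ways.
sample = "iconv"
for enc in ("ascii", "utf-8", "euc_jp", "utf-16-le"):
    print(enc, has_embedded_zero_byte(sample, enc))
```

Of the four encodings tried, only UTF-16 yields zero bytes for this sample, because every ASCII character occupies a 16-bit code unit whose high byte is zero.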

Re: iconv limitations

2004-04-08 Thread Michael B Allen
Glenn Maynard said:
> However, the case where 1: data is zero-terminated *and* 2: you don't at
> least know whether you're dealing with an 8-, 16- or 32-bit encoding is,
> in my experience, non-existent. After all, "zero-terminated" is
> meaningless unless you know what "zero" means--an 8-bit, 16-
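To illustrate why the width of "zero" matters, here is a small Python sketch under stated assumptions: `find_terminator` is a hypothetical helper, not part of iconv or any library, and it assumes the buffer is aligned to the code-unit size. It scans for the first code unit that is entirely zero.

```python
def find_terminator(buf: bytes, unit_size: int) -> int:
    """Return the byte offset of the first all-zero code unit, or -1.

    unit_size is the terminator width in bytes: 1, 2 or 4
    (8-, 16- or 32-bit encodings respectively).
    """
    if unit_size not in (1, 2, 4):
        raise ValueError("unit_size must be 1, 2 or 4")
    zero = b"\x00" * unit_size
    # Step through whole code units only; a trailing partial unit is ignored.
    for off in range(0, len(buf) - unit_size + 1, unit_size):
        if buf[off:off + unit_size] == zero:
            return off
    return -1


# "hi" in UTF-16LE, a 16-bit terminator, then unrelated trailing bytes.
data = "hi".encode("utf-16-le") + b"\x00\x00" + b"junk"
print(find_terminator(data, 2))  # scan with the correct 16-bit width
print(find_terminator(data, 1))  # scan with the wrong 8-bit width
```

Scanning the UTF-16 buffer with a 16-bit terminator finds the real end of the string, while an 8-bit scan stops inside the first character.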

Re: iconv limitations

2004-04-08 Thread Michael B Allen
Wu Yongwei said:
> IMHO, if the input contains possibly embedded nulls, it is then simply
> NOT "null-terminated string". The concept of null-terminated string
> should only be used with data without embedded nulls, such as ASCII,
> EUC-JP, EUC-CN, UTF-8, and so on. If this is not the case (as in
>

Re: iconv limitations

2004-04-08 Thread jmaiorana
Iconv is just clumsy. You can't even make (sane) wrappers to do this stuff. It's as if it were designed by people just converting big chunks of raw text. Maybe it's just me but I'm not seeing that in real world apps. On the other hand, the iconv API is more flexible the way it is. It can handle

Re: iconv limitations

2004-04-08 Thread Glenn Maynard
On Thu, Apr 08, 2004 at 05:57:31AM -0400, Michael B Allen wrote:
> And it's still a little slow and it makes for ugly code. It makes me wince
> to have to scan for the terminator when I know it could be done much much
> cleaner in the conversion routine.
Personally, I consider this a flaw in the c

Re: iconv limitations

2004-04-08 Thread Michael B Allen
[EMAIL PROTECTED] said:
>>Iconv is just clumsy. You can't even make (sane) wrappers to do this
>>stuff. It's as if it were designed by people just converting big chunks
>>of raw text. Maybe it's just me but I'm not seeing that in real world apps.
>
> On the other hand, the iconv API is m

Re: iconv limitations

2004-04-08 Thread Glenn Maynard
On Thu, Apr 08, 2004 at 06:17:55PM -0400, Michael B Allen wrote:
> > On the other hand, the iconv API is more flexible the way it is. It
> > can handle strings with embedded zeroes,
>
> Now *that* is rare.
I use std::string, which is 8-bit clean, and I always like to make things remain that way u

Re: iconv limitations

2004-04-08 Thread srintuar
The encdec interface I described can convert non-null terminated strings by limiting the number of bytes inspected in src using the sn parameter. The iconv interface is still cleaner. I do admit though, I would also like lower-level access to the iconv internal structures. Until we've eliminated a
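As a rough illustration of a conversion bounded by a byte count, the Python sketch below stops at the terminator or after `sn` bytes, whichever comes first. It is not the encdec interface from this thread, whose actual signature is not shown here; the function name `decode_bounded` and the `unit_size` parameter are assumptions made for the example.

```python
def decode_bounded(src: bytes, sn: int, encoding: str, unit_size: int = 1) -> str:
    """Decode src up to the first zero code unit, inspecting at most sn bytes.

    Hypothetical helper: if the limit cuts a multibyte character in half,
    bytes.decode raises UnicodeDecodeError rather than guessing.
    """
    limit = min(sn, len(src))
    limit -= limit % unit_size  # inspect whole code units only
    zero = b"\x00" * unit_size
    end = limit
    for off in range(0, limit, unit_size):
        if src[off:off + unit_size] == zero:
            end = off  # terminator found before the limit
            break
    return src[:end].decode(encoding)


print(decode_bounded(b"hello\x00world", 64, "utf-8"))  # terminated input
print(decode_bounded(b"hello world", 5, "utf-8"))      # unterminated, bounded by sn
```

With this shape the same call serves both terminated and unterminated input: the caller passes a generous limit when a terminator is expected, or the exact length when it is not.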

W3C and UTF-16

2004-04-08 Thread Michael B Allen
srintuar said:
>>The W3C claims all apps should use UTF-16 internally
>>
> Ghastly recommendation. I'd sooner see utf-16 deprecated as a
> unicode encoding than advise it be used anywhere where it's not strictly
> mandatory for *backwards* compatibility.
>
> Do you have a link to this malfeasance?
P

Re: W3C and UTF-16

2004-04-08 Thread Wesley J Landaker
On Thursday 08 April 2004 6:35 pm, Michael B Allen wrote:
> srintuar said:
> >>The W3C claims all apps should use UTF-16 internally
> >
> > Ghastly recommendation. I'd sooner see utf-16 deprecated as a
> > unicode encoding than advise it be used anywhere where it's not
> > strictly mandatory for *ba

Re: W3C and UTF-16

2004-04-08 Thread Glenn Maynard
On Thu, Apr 08, 2004 at 08:35:21PM -0400, Michael B Allen wrote:
> This probably states the definitive position for text handling:
>
> http://www.w3.org/TR/1999/WD-charmod-19991129/#Encodings
>
> But even though the encoding is not clearly stated as UTF-16, the Document
> Object Model (DOM) wh

Re: W3C and UTF-16

2004-04-08 Thread Michael B Allen
Glenn Maynard said:
> On Thu, Apr 08, 2004 at 08:35:21PM -0400, Michael B Allen wrote:
>> This probably states the definitive position for text handling:
>>
>> http://www.w3.org/TR/1999/WD-charmod-19991129/#Encodings
>>
>> But even though the encoding is not clearly stated as UTF-16, the
>> Docu