Re: Unicode in C

2012-03-13 Thread Nadav Har'El
On Mon, Mar 12, 2012, Omer Zak wrote about Re: Unicode in C: It depends upon your tradeoffs. ... 2. Otherwise, specify two such APIs - one is UTF-8 based, one is fixed size wide character based. Create two binary variants of the libhspell ... This is why I asked this question in the first

Re: Unicode in C

2012-03-13 Thread Nadav Har'El
On Tue, Mar 13, 2012, kobi zamir wrote about Re: Unicode in C: imho because hspell only use hebrew, it can internally continue to use hebrew only charset without nikud iso-8859-8 (or with nikud win-1255). I agree, and this has been my feeling all along. By using iso-8859-8 internally

Re: Unicode in C

2012-03-13 Thread kobi zamir
So I guess that you're also in the UTF-8 camp. yes, but my opinion about utf-8 is just my opinion. i like python and python defaults to utf-8. gtk likes unicode and utf-8: http://www.gtk.org/api/2.6/glib/glib-Unicode-Manipulation.html qt likes more options:

Re: Unicode in C

2012-03-13 Thread Ely Levy
I don't think that input/output matters so much, In something like hspell I/O should be modular so later on encoding can be added. After all it already has function to translate to/from internal representation. I believe that iso-8859-8 and utf8 should be good enough for starts. Ely 2012/3/13

Re: Unicode in C

2012-03-13 Thread Elazar Leibovich
2012/3/13 kobi zamir kobi.za...@gmail.com So I guess that you're also in the UTF-8 camp. yes, but my opinion about utf-8 is just my opinion. i like python and python defaults to utf-8. Python's internal representation is not UTF-8, but UTF-16, or UTF-32, depends on build parameters. Thus

Re: Unicode in C

2012-03-13 Thread Meir Kriheli
Hi, 2012/3/13 Elazar Leibovich elaz...@gmail.com 2012/3/13 kobi zamir kobi.za...@gmail.com So I guess that you're also in the UTF-8 camp. yes, but my opinion about utf-8 is just my opinion. i like python and python defaults to utf-8. Python's internal representation is not UTF-8, but

Re: Unicode in C

2012-03-13 Thread Elazar Leibovich
On Tue, Mar 13, 2012 at 1:19 PM, Meir Kriheli mkrih...@gmail.com wrote: Nitpick: It's actually ucs2/ucs4 (which preceded the above but are compatible). Double nitpick, UTF-16 and UCS-2 are identical representation, and it's better to always use the name UTF-16 as the FAQ

Re: Unicode in C

2012-03-13 Thread Dan Kenigsberg
these strings to be in any language? Obviously the answer is Unicode, but that doesn't really answer the question... How is Unicode used in C? As far as I can see, there are two major approaches to this problem. One approach, used in the Win32 C APIs on MS-Windows, and also in Java and other

Re: Unicode in C

2012-03-13 Thread Nadav Har'El
On Tue, Mar 13, 2012, Dan Kenigsberg wrote about Re: Unicode in C: In my opinion, it is nice to fit to modern standards of your major target environment (read: utf8), but not necessary to cater to all encodings. It appears that the consensus on this list is that UTF-8 is indeed the right way

Re: Unicode in C

2012-03-13 Thread Elazar Leibovich
On Tue, Mar 13, 2012 at 5:22 PM, Nadav Har'El n...@math.technion.ac.il wrote: Qt appears to use internally UTF-16. What major free software C library actually prefer UTF-8? Are you talking about the internal representation, or the external interface? The internal representation is in many

Re: Unicode in C

2012-03-13 Thread Elazar Leibovich
library), with functions which take strings as arguments. What am I supposed to use if I want these strings to be in any language? Obviously the answer is Unicode, but that doesn't really answer the question... How is Unicode used in C? As far as I can see, there are two major approaches

Re: Unicode in C

2012-03-13 Thread Nadav Har'El
On Tue, Mar 13, 2012, Elazar Leibovich wrote about Re: Unicode in C: Something very important, one need to consider is Unicode normalization. That is, how to strip out the Niqud, and to substitute, say KAF WITH DAGESH (U+FB3B) with just a KAF (U+05DB) etc. Is this really important? Does
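To make the normalization question above concrete, here is a minimal sketch in C of what such a step might look like on a NUL-terminated UTF-8 string. It is an illustration only, not hspell code: the name hebrew_fold_utf8 is hypothetical, the set of code points treated as niqud (U+05B0..U+05BD, U+05BF, U+05C1, U+05C2, U+05C7) is an assumption about what one would want to drop, and only the single presentation form mentioned in the thread (U+FB3B) is folded. It is a hand-rolled fold, not one of the standard Unicode normalization forms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if the two-byte UTF-8 sequence (b0, b1) encodes a Hebrew point.
 * The set below is an assumption made for this sketch. */
static int is_niqud(unsigned char b0, unsigned char b1)
{
    if (b0 == 0xD6)   /* lead byte for U+0580..U+05BF */
        return (b1 >= 0xB0 && b1 <= 0xBD) || b1 == 0xBF; /* U+05B0..U+05BD, U+05BF */
    if (b0 == 0xD7)   /* lead byte for U+05C0..U+05FF */
        return b1 == 0x81 || b1 == 0x82 || b1 == 0x87;   /* U+05C1, U+05C2, U+05C7 */
    return 0;
}

/* Hypothetical helper: rewrites s in place, dropping niqud and replacing
 * KAF WITH DAGESH (U+FB3B, bytes EF AC BB) with KAF (U+05DB, bytes D7 9B).
 * The result is never longer than the input. Returns the new byte length. */
size_t hebrew_fold_utf8(char *s)
{
    unsigned char *in = (unsigned char *)s;
    unsigned char *out = (unsigned char *)s;

    assert(s != NULL);
    while (*in) {
        if (in[0] == 0xEF && in[1] == 0xAC && in[2] == 0xBB) {
            *out++ = 0xD7;          /* write U+05DB */
            *out++ = 0x9B;
            in += 3;
        } else if (in[1] != 0 && is_niqud(in[0], in[1])) {
            in += 2;                /* skip the point */
        } else {
            *out++ = *in++;         /* copy everything else unchanged */
        }
    }
    *out = '\0';
    return (size_t)(out - (unsigned char *)s);
}
```

Called on a pointed word, this leaves only the letters, so the result can be looked up in a dictionary that stores unpointed forms. Whether maqaf, cantillation marks, or the other presentation forms should be handled is left open here.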

Re: Unicode in C

2012-03-13 Thread Elazar Leibovich
On Tue, Mar 13, 2012 at 10:16 PM, Nadav Har'El n...@math.technion.ac.il wrote: On Tue, Mar 13, 2012, Elazar Leibovich wrote about Re: Unicode in C: Something very important, one need to consider is Unicode normalization. That is, how to strip out the Niqud, and to substitute, say KAF

Re: Unicode in C

2012-03-13 Thread Daniel Shahaf
Nadav Har'El wrote on Tue, Mar 13, 2012 at 22:16:23 +0200: On Tue, Mar 13, 2012, Elazar Leibovich wrote about Re: Unicode in C: Something very important, one need to consider is Unicode normalization. That is, how to strip out the Niqud, and to substitute, say KAF WITH DAGESH (U+FB3B

Re: Unicode in C

2012-03-13 Thread kobi zamir
imho: hspell does hebrew spelling well. we have iconv, glib, qt ... for doing encoding conversions well. http://en.wikipedia.org/wiki/Unix_philosophy#McIlroy:_A_Quarter_Century_of_Unix on the other side, it will be very nice to have a utf-8 interface to hspell :-)

Unicode in C

2012-03-12 Thread Nadav Har'El
, but that doesn't really answer the question... How is Unicode used in C? As far as I can see, there are two major approaches to this problem. One approach, used in the Win32 C APIs on MS-Windows, and also in Java and other languages, is to use wide characters - characters of 16 or 32 bit size

Re: Unicode in C

2012-03-12 Thread Omer Zak
if I want these strings to be in any language? Obviously the answer is Unicode, but that doesn't really answer the question... How is Unicode used in C? As far as I can see, there are two major approaches to this problem. One approach, used in the Win32 C APIs on MS-Windows, and also in Java

Re: Unicode in C

2012-03-12 Thread Elazar Leibovich
On Mon, Mar 12, 2012 at 3:20 PM, Omer Zak w...@zak.co.il wrote: If you need to use Far Eastern fonts and/or have random access for your text, use fixed size wide character encoding (16 bit or 32 bit size). Note that UTF-16 doesn't really offer random access, due to surrogate pairs (not all
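The point about surrogate pairs can be illustrated with a short C sketch. Code points above U+FFFF take two 16-bit units in UTF-16, so the n-th code point is not necessarily at index n and has to be found by scanning from the start. The function names below are hypothetical, and returning U+FFFD for an unpaired surrogate is an assumption of this sketch rather than a requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* High (leading) surrogates are U+D800..U+DBFF, low (trailing) U+DC00..U+DFFF. */
int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Decodes the code point starting at s[*i] and advances *i past it.
 * An unpaired surrogate is reported as U+FFFD (an assumption of this sketch). */
uint32_t utf16_next(const uint16_t *s, size_t len, size_t *i)
{
    uint16_t hi;

    assert(s != NULL && i != NULL && *i < len);
    hi = s[(*i)++];
    if (is_high_surrogate(hi) && *i < len && is_low_surrogate(s[*i])) {
        uint16_t lo = s[(*i)++];
        return 0x10000u + (((uint32_t)(hi - 0xD800) << 10) |
                           (uint32_t)(lo - 0xDC00));
    }
    if (is_high_surrogate(hi) || is_low_surrogate(hi))
        return 0xFFFD;
    return hi;
}

/* Returns the n-th code point (0-based), or 0xFFFFFFFF if the string is
 * shorter than that. This is a linear scan: there is no O(1) shortcut. */
uint32_t utf16_code_point_at(const uint16_t *s, size_t len, size_t n)
{
    size_t i = 0;

    while (i < len) {
        uint32_t cp = utf16_next(s, len, &i);
        if (n == 0)
            return cp;
        n--;
    }
    return 0xFFFFFFFFu;
}
```

For the four units 0x05D0, 0xD83D, 0xDE00, 0x0041, code point number 2 is 'A' even though it sits at unit index 3. For text that stays within the Basic Multilingual Plane, which includes Hebrew, the two indices coincide.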

Re: Unicode in C

2012-03-12 Thread Elazar Leibovich
want these strings to be in any language? Obviously the answer is Unicode, but that doesn't really answer the question... How is Unicode used in C? As far as I can see, there are two major approaches to this problem. One approach, used in the Win32 C APIs on MS-Windows, and also in Java

Re: Unicode in C

2012-03-12 Thread Dov Grobgeld
supposed to use if I want these strings to be in any language? Obviously the answer is Unicode, but that doesn't really answer the question... How is Unicode used in C? As far as I can see, there are two major approaches to this problem. One approach, used in the Win32 C APIs on MS-Windows

Re: Unicode in C

2012-03-12 Thread Ely Levy
, but that doesn't really answer the question... How is Unicode used in C? As far as I can see, there are two major approaches to this problem. One approach, used in the Win32 C APIs on MS-Windows, and also in Java and other languages, is to use wide characters - characters of 16 or 32 bit

Re: Unicode in C

2012-03-12 Thread Elazar Leibovich
On Mon, Mar 12, 2012 at 5:39 PM, E L elyl...@cs.huji.ac.il wrote: What's the advantage of using ucs-4 internally? Especially if the program needs to save memory (embedded devices are pretty common these days). UTF-32 or UCS-4, is the only encoding form that allows random access to each

Re: Unicode in C

2012-03-12 Thread Nadav Har'El
On Mon, Mar 12, 2012, Elazar Leibovich wrote about Re: Unicode in C: The simplest option is, to accept StringPiece-like structure (pointer to buffer + size), and encoding, then to convert the data internally to your encoding (say, ISO-8859-8, replacing illegal characters with whitespace
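A rough C sketch of the option described above: the caller passes a pointer-plus-size piece and names its encoding, and the library converts to ISO-8859-8 internally, replacing anything it cannot represent with a space. All names here (str_piece, text_encoding, to_iso_8859_8) are hypothetical and not part of hspell, and only two input encodings are handled. A real implementation might delegate the conversion to iconv instead of doing it by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for this sketch. */
enum text_encoding { ENC_ISO_8859_8, ENC_UTF8 };

struct str_piece {
    const char *data;   /* not necessarily NUL-terminated */
    size_t len;         /* length in bytes */
};

/* Converts `in` to ISO-8859-8 and writes at most out_cap bytes to out.
 * ASCII is copied as-is; the Hebrew letters U+05D0..U+05EA map to 0xE0..0xFA;
 * every other UTF-8 character, valid or not, becomes one space.
 * Returns the number of bytes written (no terminator is added). */
size_t to_iso_8859_8(struct str_piece in, enum text_encoding enc,
                     char *out, size_t out_cap)
{
    const unsigned char *p = (const unsigned char *)in.data;
    size_t i = 0, n = 0;

    assert(in.data != NULL && out != NULL);
    while (i < in.len && n < out_cap) {
        if (enc == ENC_ISO_8859_8 || p[i] < 0x80) {
            out[n++] = (char)p[i++];            /* already in target form */
        } else if (p[i] == 0xD7 && i + 1 < in.len &&
                   p[i + 1] >= 0x90 && p[i + 1] <= 0xAA) {
            /* D7 90..D7 AA is alef..tav in UTF-8 */
            out[n++] = (char)(0xE0 + (p[i + 1] - 0x90));
            i += 2;
        } else {
            out[n++] = ' ';                     /* unrepresentable character */
            i++;
            while (i < in.len && (p[i] & 0xC0) == 0x80)
                i++;                            /* skip its continuation bytes */
        }
    }
    return n;
}
```

Since every input character yields at most one output byte, an output buffer of in.len bytes is always large enough. Note that under this scheme niqud in the input also turns into spaces and would split a pointed word, so points would have to be stripped before the conversion.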

Re: Unicode in C

2012-03-12 Thread Elazar Leibovich
On Mon, Mar 12, 2012 at 7:37 PM, Nadav Har'El n...@math.technion.ac.il wrote: On Mon, Mar 12, 2012, Elazar Leibovich wrote about Re: Unicode in C: The simplest option is, to accept StringPiece-like structure (pointer to buffer + size), and encoding, then to convert the data internally to your

Re: Unicode in C

2012-03-12 Thread kobi zamir
...@math.technion.ac.il wrote: On Mon, Mar 12, 2012, Elazar Leibovich wrote about Re: Unicode in C: The simplest option is, to accept StringPiece-like structure (pointer to buffer + size), and encoding, then to convert the data internally to your encoding (say, ISO-8859-8, replacing illegal characters