On Mon, Mar 12, 2012, Omer Zak wrote about Re: Unicode in C:
It depends upon your tradeoffs.
...
2. Otherwise, specify two such APIs - one is UTF-8 based, one is fixed
size wide character based. Create two binary variants of the libhspell
...
This is why I asked this question in the first
On Tue, Mar 13, 2012, kobi zamir wrote about Re: Unicode in C:
imho, because hspell only uses Hebrew, it can continue to use a Hebrew-only
charset internally: iso-8859-8 without nikud (or win-1255 with nikud).
I agree, and this has been my feeling all along. By using iso-8859-8
internally
So I guess that you're also in the UTF-8 camp.
yes, but my opinion about utf-8 is just my opinion. i like python and
python defaults to utf-8.
gtk likes unicode and utf-8:
http://www.gtk.org/api/2.6/glib/glib-Unicode-Manipulation.html
qt likes more options:
I don't think that input/output matters so much,
In something like hspell, I/O should be modular so that encodings can be
added later on.
After all, it already has a function to translate to/from the internal
representation.
I believe that iso-8859-8 and utf8 should be good enough for a start.
Ely
2012/3/13
2012/3/13 kobi zamir kobi.za...@gmail.com
So I guess that you're also in the UTF-8 camp.
yes, but my opinion about utf-8 is just my opinion. i like python and
python defaults to utf-8.
Python's internal representation is not UTF-8, but UTF-16 or UTF-32,
depending on build parameters. Thus
Hi,
2012/3/13 Elazar Leibovich elaz...@gmail.com
2012/3/13 kobi zamir kobi.za...@gmail.com
So I guess that you're also in the UTF-8 camp.
yes, but my opinion about utf-8 is just my opinion. i like python and
python defaults to utf-8.
Python's internal representation is not UTF-8, but
On Tue, Mar 13, 2012 at 1:19 PM, Meir Kriheli mkrih...@gmail.com wrote:
Nitpick: It's actually ucs2/ucs4 (which preceded the above but are
compatible).
Double nitpick: UTF-16 and UCS-2 are identical representations for code
points in the BMP, and it's better to always use the name UTF-16 as the FAQ
these strings
to be in any language? Obviously the answer is Unicode, but that
doesn't really answer the question... How is Unicode used in C?
As far as I can see, there are two major approaches to this problem.
One approach, used in the Win32 C APIs on MS-Windows, and also in Java and
other
On Tue, Mar 13, 2012, Dan Kenigsberg wrote about Re: Unicode in C:
In my opinion, it is nice to conform to the modern standards of your major
target environment (read: utf8), but not necessary to cater to all encodings.
It appears that the consensus on this list is that UTF-8 is indeed the
right way
On Tue, Mar 13, 2012 at 5:22 PM, Nadav Har'El n...@math.technion.ac.il wrote:
Qt appears to use UTF-16 internally. Which major free software C libraries
actually prefer UTF-8?
Are you talking about the internal representation, or the external
interface?
The internal representation is in many
library), with functions which
take strings as arguments. What am I supposed to use if I want these
strings
to be in any language? Obviously the answer is Unicode, but that
doesn't really answer the question... How is Unicode used in C?
As far as I can see, there are two major approaches
On Tue, Mar 13, 2012, Elazar Leibovich wrote about Re: Unicode in C:
Something very important that one needs to consider is Unicode normalization.
That is, how to strip out the Niqud and substitute, say, KAF WITH DAGESH
(U+FB3B) with just a KAF (U+05DB), etc.
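A minimal sketch of the folding described above, assuming the text has already been decoded into an array of 32-bit code points. The names is_niqud and hebrew_fold are hypothetical and not part of hspell. Only the single KAF WITH DAGESH mapping mentioned in the message is handled; a real implementation would need a full table of presentation forms, and this does not implement the standard Unicode normalization forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if cp is a Hebrew point (niqud) in the Hebrew block.
   Punctuation such as MAQAF (U+05BE) is deliberately not included. */
static int is_niqud(uint32_t cp)
{
    return (cp >= 0x05B0 && cp <= 0x05BD) ||  /* SHEVA .. METEG */
           cp == 0x05BF ||                    /* RAFE */
           cp == 0x05C1 || cp == 0x05C2 ||    /* SHIN DOT, SIN DOT */
           cp == 0x05C4 || cp == 0x05C5 ||    /* UPPER DOT, LOWER DOT */
           cp == 0x05C7;                      /* QAMATS QATAN */
}

/* Hypothetical helper: strips niqud in place and folds KAF WITH DAGESH
   (U+FB3B) to plain KAF (U+05DB). Returns the new length in code points. */
size_t hebrew_fold(uint32_t *buf, size_t len)
{
    size_t out = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = buf[i];

        if (is_niqud(cp))
            continue;            /* drop the vowel point */
        if (cp == 0xFB3B)
            cp = 0x05DB;         /* presentation form -> base letter */
        buf[out++] = cp;
    }
    return out;
}
```

Calling hebrew_fold on a buffer shortens it in place, so the caller should use the returned length rather than the original one.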
Is this really important? Does
On Tue, Mar 13, 2012 at 10:16 PM, Nadav Har'El n...@math.technion.ac.il wrote:
On Tue, Mar 13, 2012, Elazar Leibovich wrote about Re: Unicode in C:
Something very important that one needs to consider is Unicode normalization.
That is, how to strip out the Niqud, and to substitute, say KAF
Nadav Har'El wrote on Tue, Mar 13, 2012 at 22:16:23 +0200:
On Tue, Mar 13, 2012, Elazar Leibovich wrote about Re: Unicode in C:
Something very important that one needs to consider is Unicode normalization.
That is, how to strip out the Niqud, and to substitute, say KAF WITH DAGESH
(U+FB3B
imho: hspell does hebrew spelling well.
we have iconv, glib, qt ... for doing encoding conversions well.
http://en.wikipedia.org/wiki/Unix_philosophy#McIlroy:_A_Quarter_Century_of_Unix
on the other hand, it would be very nice to have a utf-8 interface to hspell
:-)
, but that
doesn't really answer the question... How is Unicode used in C?
As far as I can see, there are two major approaches to this problem.
One approach, used in the Win32 C APIs on MS-Windows, and also in Java and
other languages, is to use wide characters - characters of 16 or 32 bit
size
if I want these strings
to be in any language? Obviously the answer is Unicode, but that
doesn't really answer the question... How is Unicode used in C?
As far as I can see, there are two major approaches to this problem.
One approach, used in the Win32 C APIs on MS-Windows, and also in Java
On Mon, Mar 12, 2012 at 3:20 PM, Omer Zak w...@zak.co.il wrote:
If you need to use Far Eastern fonts and/or have random access for your
text, use fixed size wide character encoding (16 bit or 32 bit size).
Note that UTF-16 doesn't really offer random access, due to surrogate
pairs (not all
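To illustrate the surrogate-pair point, here is a small sketch that counts code points in a UTF-16 buffer. The function name utf16_code_points is an assumption made for this example. Because a character outside the BMP takes two 16-bit units, the number of code points can be smaller than the number of units, and finding the n-th character requires a scan from the start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts code points in a UTF-16 buffer of n code units.
   An unpaired surrogate is counted as one code point. */
size_t utf16_code_points(const uint16_t *s, size_t n)
{
    size_t count = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* A high surrogate followed by a low surrogate encodes
           a single code point in two units: skip the second unit. */
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            i + 1 < n &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++;
        count++;
    }
    return count;
}
```

For a buffer holding ALEF, the pair 0xD83D 0xDE00, and BET, the function sees four code units but reports three code points.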
, but that
doesn't really answer the question... How is Unicode used in C?
As far as I can see, there are two major approaches to this problem.
One approach, used in the Win32 C APIs on MS-Windows, and also in Java
and
other languages, is to use wide characters - characters of 16 or 32 bit
On Mon, Mar 12, 2012 at 5:39 PM, E L elyl...@cs.huji.ac.il wrote:
What's the advantage of using ucs-4 internally?
Especially if the program needs to save memory (embedded devices are
pretty common these days).
UTF-32, or UCS-4, is the only encoding form that allows random access to
each
On Mon, Mar 12, 2012, Elazar Leibovich wrote about Re: Unicode in C:
The simplest option is to accept a StringPiece-like structure (pointer to
buffer + size), and encoding, then to convert the data internally to your
encoding (say, ISO-8859-8, replacing illegal characters with whitespace
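A rough sketch of that idea, under several assumptions: the struct str_piece type and the function utf8_to_iso8859_8 are hypothetical names, the input encoding is fixed to UTF-8 rather than passed as a parameter, and only ASCII and the Hebrew letters U+05D0..U+05EA are treated as legal. Everything else, including niqud and malformed bytes, is replaced with a space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical StringPiece-like view: pointer plus length, no NUL needed. */
struct str_piece {
    const unsigned char *data;
    size_t len;
};

/* Converts UTF-8 input to ISO-8859-8 in out (capacity cap bytes).
   ASCII passes through, Hebrew letters U+05D0..U+05EA map to 0xE0..0xFA,
   and any other character becomes a single space.
   Returns the number of bytes written; stops early if out is full. */
size_t utf8_to_iso8859_8(struct str_piece in, unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;

    assert(out != NULL || cap == 0);
    while (i < in.len && o < cap) {
        unsigned char b = in.data[i];

        if (b < 0x80) {
            out[o++] = b;                      /* plain ASCII */
            i++;
        } else if (b == 0xD7 && i + 1 < in.len &&
                   in.data[i + 1] >= 0x90 && in.data[i + 1] <= 0xAA) {
            /* 0xD7 0x90..0xAA is ALEF..TAV in UTF-8 */
            out[o++] = (unsigned char)(0xE0 + (in.data[i + 1] - 0x90));
            i += 2;
        } else {
            out[o++] = ' ';                    /* not representable */
            i++;
            /* skip the continuation bytes of this character */
            while (i < in.len && (in.data[i] & 0xC0) == 0x80)
                i++;
        }
    }
    return o;
}
```

Taking a pointer and a length lets callers pass a slice of a larger buffer without copying or NUL-terminating it, which is the main appeal of the StringPiece style.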
On Mon, Mar 12, 2012 at 7:37 PM, Nadav Har'El n...@math.technion.ac.il wrote:
On Mon, Mar 12, 2012, Elazar Leibovich wrote about Re: Unicode in C:
The simplest option is to accept a StringPiece-like structure (pointer to
buffer + size), and encoding, then to convert the data internally to your
...@math.technion.ac.il wrote:
On Mon, Mar 12, 2012, Elazar Leibovich wrote about Re: Unicode in C:
The simplest option is to accept a StringPiece-like structure (pointer to
buffer + size), and encoding, then to convert the data internally to
your
encoding (say, ISO-8859-8, replacing illegal characters