On Mon, Mar 12, 2012, Omer Zak wrote about Re: Unicode in C:
It depends upon your tradeoffs.
...
2. Otherwise, specify two such APIs - one is UTF-8 based, one is fixed
size wide character based. Create two binary variants of the libhspell
...
This is why I asked this question in the first
On Tue, Mar 13, 2012, kobi zamir wrote about Re: Unicode in C:
imho, because hspell only uses Hebrew, it can continue to use a Hebrew-only
charset internally: iso-8859-8 without nikud (or win-1255 with nikud).
I agree, and this has been my feeling all along. By using iso-8859-8
internally
So I guess that you're also in the UTF-8 camp.
yes, but my opinion about utf-8 is just my opinion. i like python and
python defaults to utf-8.
gtk likes unicode and utf-8:
http://www.gtk.org/api/2.6/glib/glib-Unicode-Manipulation.html
qt likes more options:
I don't think that input/output matters so much,
In something like hspell, I/O should be modular so that encodings can be
added later on.
After all, it already has a function to translate to/from the internal
representation.
I believe that iso-8859-8 and utf8 should be good enough for a start.
Ely
2012/3/13
2012/3/13 kobi zamir kobi.za...@gmail.com
So I guess that you're also in the UTF-8 camp.
yes, but my opinion about utf-8 is just my opinion. i like python and
python defaults to utf-8.
Python's internal representation is not UTF-8, but UTF-16, or UTF-32,
depends on build parameters. Thus
Hi,
2012/3/13 Elazar Leibovich elaz...@gmail.com
2012/3/13 kobi zamir kobi.za...@gmail.com
So I guess that you're also in the UTF-8 camp.
yes, but my opinion about utf-8 is just my opinion. i like python and
python defaults to utf-8.
Python's internal representation is not UTF-8, but
On Tue, Mar 13, 2012 at 1:19 PM, Meir Kriheli mkrih...@gmail.com wrote:
Nitpick: It's actually ucs2/ucs4 (which preceded the above but are
compatible).
Double nitpick: UTF-16 and UCS-2 are identical representations, and it's
better to always use the name UTF-16 as the FAQ
On Mon, Mar 12, 2012 at 03:05:56PM +0200, Nadav Har'El wrote:
Hi, I have a question that I was sort of sad that I couldn't readily
find the answer to...
Let's say I want to create a C API (a C library), with functions which
take strings as arguments. What am I supposed to use if I want these
On Tue, Mar 13, 2012, Dan Kenigsberg wrote about Re: Unicode in C:
In my opinion, it is nice to fit to modern standards of your major target
environment (read: utf8), but not necessary to cater to all encodings.
It appears that the consensus on this list is that UTF-8 is indeed the
right way
On Tue, Mar 13, 2012 at 5:22 PM, Nadav Har'El n...@math.technion.ac.il wrote:
Qt appears to use UTF-16 internally. What major free software C libraries
actually prefer UTF-8?
Are you talking about the internal representation, or the external
interface?
The internal representation is in many
Something very important that one needs to consider is Unicode normalization.
That is, how to strip out the Niqud, and to substitute, say KAF WITH DAGESH
(U+FB3B) with just a KAF (U+05DB) etc.
I guess that you're doing that already to some degree in hspell, so (in
case you're translating to
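As a rough illustration of the normalization step described above, here is a
minimal Python sketch (Python is used only for brevity; it is not how hspell
does it). It assumes that stripping every combining mark is acceptable, which
would also remove accents from non-Hebrew text. The function name strip_niqqud
is made up for this example.

```python
import unicodedata


def strip_niqqud(text: str) -> str:
    """Return text with presentation forms decomposed and points removed.

    Hypothetical helper for illustration only.
    """
    # NFKD splits precomposed forms such as U+FB3B (KAF WITH DAGESH)
    # into the base letter U+05DB followed by the dagesh U+05BC.
    decomposed = unicodedata.normalize("NFKD", text)
    # Hebrew points and cantillation marks are nonspacing marks (category Mn).
    # Caveat: this drops ALL nonspacing marks, not only Hebrew ones.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    pointed = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"  # shalom with niqqud
    print(strip_niqqud(pointed))
    print(strip_niqqud("\ufb3b") == "\u05db")
```

In C the same idea would need a normalization library; the standard C library
has no equivalent of unicodedata.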
On Tue, Mar 13, 2012, Elazar Leibovich wrote about Re: Unicode in C:
Something very important that one needs to consider is Unicode normalization.
That is, how to strip out the Niqud, and to substitute, say KAF WITH DAGESH
(U+FB3B) with just a KAF (U+05DB) etc.
Is this really important? Does
On Tue, Mar 13, 2012 at 10:16 PM, Nadav Har'El n...@math.technion.ac.il wrote:
On Tue, Mar 13, 2012, Elazar Leibovich wrote about Re: Unicode in C:
Something very important that one needs to consider is Unicode normalization.
That is, how to strip out the Niqud, and to substitute, say KAF
Nadav Har'El wrote on Tue, Mar 13, 2012 at 22:16:23 +0200:
On Tue, Mar 13, 2012, Elazar Leibovich wrote about Re: Unicode in C:
Something very important that one needs to consider is Unicode normalization.
That is, how to strip out the Niqud, and to substitute, say KAF WITH DAGESH
(U+FB3B
imho: hspell does hebrew spelling well.
we have iconv, glib, qt ... for doing encoding conversions well.
http://en.wikipedia.org/wiki/Unix_philosophy#McIlroy:_A_Quarter_Century_of_Unix
on the other side, it will be very nice to have a utf-8 interface to hspell
:-)
It depends upon your tradeoffs.
If you mostly use Western scripts (Latin, Hebrew, etc.) and want to
economize on memory use, use UTF-8. However, for Chinese, it costs more
memory than it saves.
If you need to use Far Eastern scripts and/or have random access for your
text, use fixed size wide
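To make the memory tradeoff concrete, here is a small Python sketch that
counts the bytes each encoding form needs for the same text. The helper name
encoded_sizes and the sample strings are made up for this example; the "-le"
codec variants are used so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of text for three encoding forms."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


if __name__ == "__main__":
    samples = {
        "latin": "hello",
        "hebrew": "\u05e9\u05dc\u05d5\u05dd",  # 4 Hebrew letters
        "chinese": "\u4e2d\u6587",             # 2 CJK characters
    }
    for name, text in samples.items():
        print(name, encoded_sizes(text))
```

For these samples, Latin letters take 1 byte each in UTF-8, Hebrew letters
take 2 (the same as UTF-16), and the CJK characters take 3 (more than the 2
they need in UTF-16).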
On Mon, Mar 12, 2012 at 3:20 PM, Omer Zak w...@zak.co.il wrote:
If you need to use Far Eastern scripts and/or have random access for your
text, use fixed size wide character encoding (16 bit or 32 bit size).
Note that UTF-16 doesn't really offer random access, due to surrogate
pairs (not all
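The surrogate-pair point can be seen with a short Python sketch that lists the
16-bit code units of a string. The helper name utf16_units is made up for this
example.

```python
def utf16_units(text: str) -> list:
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    # U+1F600 lies outside the Basic Multilingual Plane, so it needs two units.
    units = utf16_units("a\U0001F600")
    print([hex(u) for u in units])
```

A two-character string produces three code units here, so the n-th character
is not necessarily at the n-th 16-bit slot.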
The simplest option is to accept a StringPiece-like structure (pointer to
buffer + size) and an encoding, then to convert the data internally to your
encoding (say, ISO-8859-8, replacing illegal characters with whitespace),
and to convert the output back.
Do you mind using an iconv-like library?
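A minimal sketch of that convert-in, convert-out idea, written in Python for
brevity rather than C with iconv. The function names to_internal and
from_internal are hypothetical and are not part of hspell; the sketch assumes
ISO-8859-8 as the internal encoding and a space as the replacement character.

```python
INTERNAL = "iso8859_8"  # assumed internal encoding for this sketch


def to_internal(data: bytes, encoding: str) -> bytes:
    """Convert caller bytes to the internal encoding, using a space for
    anything that cannot be represented (hypothetical helper)."""
    # Undecodable input bytes become U+FFFD, which is then replaced below.
    text = data.decode(encoding, errors="replace")
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(INTERNAL)
        except UnicodeEncodeError:
            out += b" "  # e.g. niqqud, which ISO-8859-8 cannot represent
    return bytes(out)


def from_internal(data: bytes, encoding: str) -> bytes:
    """Convert internal bytes back to the caller's encoding."""
    return data.decode(INTERNAL).encode(encoding)


if __name__ == "__main__":
    word = "\u05e9\u05dc\u05d5\u05dd".encode("utf-8")
    internal = to_internal(word, "utf-8")
    print(internal)
    print(from_internal(internal, "utf-8") == word)
```

Note that the replacement is lossy: a round trip returns the original bytes
only when every input character exists in the internal encoding.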
On
My suggestion is to go with the glib/gtk approach: use utf-8 everywhere and
have the API accept char*, i.e. there is no typedef for Unicode character
strings. If this is not acceptable because of speed (this is its only
tradeoff), then use UCS-4 internally and provide two external interfaces
for
What's the advantage of using ucs-4 internally?
Especially if the program needs to save memory (embedded devices are pretty
common these days).
Ely
2012/3/12 Dov Grobgeld dov.grobg...@gmail.com
My suggestion is go the glib/gtk approach and use utf-8 everywhere and
have the API accept char*,
On Mon, Mar 12, 2012 at 5:39 PM, E L elyl...@cs.huji.ac.il wrote:
What's the advantage of using ucs-4 internally?
Especially if the program needs to save memory (embedded devices are
pretty common these days).
UTF-32, or UCS-4, is the only encoding form that allows random access to
each
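To illustrate the random access property of UTF-32: every code point occupies
exactly 4 bytes, so the n-th one starts at byte offset 4*n. A minimal Python
sketch, with a made-up helper name code_point_at:

```python
def code_point_at(buf: bytes, index: int) -> str:
    """Return the code point at position index in a UTF-32-LE buffer."""
    start = 4 * index  # fixed width: no scanning from the beginning needed
    return buf[start:start + 4].decode("utf-32-le")


if __name__ == "__main__":
    buf = "a\u05db\U0001F600".encode("utf-32-le")
    print(len(buf))                               # 3 code points, 4 bytes each
    print(code_point_at(buf, 1) == "\u05db")
```

This indexes code points, not user-perceived characters: a letter followed by
combining marks such as niqqud still spans several code points.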
On Mon, Mar 12, 2012, Elazar Leibovich wrote about Re: Unicode in C:
The simplest option is, to accept StringPiece-like structure (pointer to
buffer + size), and encoding, then to convert the data internally to your
encoding (say, ISO-8859-8, replacing illegal characters with whitespace
On Mon, Mar 12, 2012 at 7:37 PM, Nadav Har'El n...@math.technion.ac.il wrote:
On Mon, Mar 12, 2012, Elazar Leibovich wrote about Re: Unicode in C:
The simplest option is, to accept StringPiece-like structure (pointer to
buffer + size), and encoding, then to convert the data internally to your
...@math.technion.ac.il wrote:
On Mon, Mar 12, 2012, Elazar Leibovich wrote about Re: Unicode in C:
The simplest option is, to accept StringPiece-like structure (pointer to
buffer + size), and encoding, then to convert the data internally to
your
encoding (say, ISO-8859-8, replacing illegal characters