Excuse me.... since when did this list turn into the linux-kernel mailing
list??? (And I'm gonna add to it.. crap)
Seriously folks,
As I understand it, Unicode is supposedly able to express every character in
a standard 16-bit format. In a few cases, we may need to deal with
surrogate pairs, but that's doable.
It should be possible to convert text from ANY codepage into Unicode without
a whole hell of a lot of overhead. 7-bit ASCII to Unicode is simple because
you just zero-extend every byte to 16 bits. You cannot tell me that, on a
reasonably decent processor, a quick for loop that runs one iteration per
character in the string is going to eat that much processor time.
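For the ASCII case, something like this little sketch is all it would take
(the helper name is made up for illustration, it's not an actual Wine
function):

#include <stddef.h>

typedef unsigned short WCHAR;   /* 16-bit Unicode code unit */

/* Widen a 7-bit ASCII string to 16-bit Unicode: one loop,
 * one zero-extension per character.  Hypothetical helper. */
static void ascii_to_unicode( WCHAR *dst, const char *src, size_t len )
{
    size_t i;
    for (i = 0; i < len; i++)
        dst[i] = (WCHAR)(unsigned char)src[i];   /* high byte is just 0 */
    dst[len] = 0;
}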
Double compilation is just plain nuts (sorry Patrik). Sure, it doesn't have
to run that conversion, but compiling every source file 3 times (or
whatever) does not sound like my idea of fun. And furthermore, you end up
with that many times as much object code taking up memory. Worse yet, it
makes the source file look ugly as hell.
Okay, right now we have some problems though. UNIX only seems to understand
8-bit characters, so we use UTF-8 to handle this on some systems.
Unicode->UTF-8 conversion is again not that big of a deal. If you write your
code correctly you should be able to convert plain ASCII stored in Unicode
format to UTF-8 (basically sending it back to plain ASCII) with one
iteration per character.
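Here is roughly what I have in mind (again, just a hypothetical sketch, not
anything from the tree; surrogate pairs are left out to keep it short):

#include <stddef.h>

typedef unsigned short WCHAR;

/* Convert 16-bit Unicode to UTF-8.  For the ASCII range this is one
 * byte out per character in; other BMP code points just take a couple
 * of extra bytes.  Hypothetical helper, for illustration only. */
static size_t unicode_to_utf8( char *dst, const WCHAR *src, size_t len )
{
    size_t i, out = 0;
    for (i = 0; i < len; i++)
    {
        WCHAR c = src[i];
        if (c < 0x80)                    /* plain ASCII: one byte */
            dst[out++] = (char)c;
        else if (c < 0x800)              /* two-byte sequence */
        {
            dst[out++] = (char)(0xc0 | (c >> 6));
            dst[out++] = (char)(0x80 | (c & 0x3f));
        }
        else                             /* three-byte sequence */
        {
            dst[out++] = (char)(0xe0 | (c >> 12));
            dst[out++] = (char)(0x80 | ((c >> 6) & 0x3f));
            dst[out++] = (char)(0x80 | (c & 0x3f));
        }
    }
    dst[out] = 0;
    return out;
}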
So... let's see... how much processing time did we waste going from
ASCII->Unicode->UTF-8... hmmmm... 2*N (where N is the number of characters).
I think I can live with that. Hell, I'd even be willing to live with more
than that.
Keep in mind that it's 1:30am and I didn't really think that through
thoroughly, but it doesn't seem like a lot of work to convert between the
two.
Now, the other issue is what to do about allocating memory. Alexandre
mentioned that NT has a per-thread buffer that it uses. That is probably
the way to go, since it's allocated ahead of time; if you get an OOM error
on that, you have other issues. You of course also need to deal with growing
and shrinking the buffer to accommodate very large strings. Then again,
don't most of the functions have a 255-character (16 bits per character)
limit anyway? So if you didn't want to deal with growing the buffer, hell,
make it 1k and you have twice as much space as you need. Even if you have
1000 threads you only use a meg of RAM!
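Something along these lines would do it, say with pthread keys (purely a
sketch of the idea on my part, not how NT or Wine actually implements it):

#include <stdlib.h>
#include <pthread.h>

/* One fixed 1K buffer per thread, allocated the first time the thread
 * needs a conversion, so the common case never touches the allocator.
 * All names here are made up for illustration. */
#define CONV_BUFFER_SIZE 1024   /* twice the 255-WCHAR limit above */

static pthread_key_t conv_key;
static pthread_once_t conv_once = PTHREAD_ONCE_INIT;

static void conv_key_init(void)
{
    pthread_key_create( &conv_key, free );  /* buffer freed on thread exit */
}

static void *get_conv_buffer(void)
{
    void *buf;
    pthread_once( &conv_once, conv_key_init );
    buf = pthread_getspecific( conv_key );
    if (!buf)
    {
        buf = malloc( CONV_BUFFER_SIZE );   /* 1000 threads -> about 1 MB */
        pthread_setspecific( conv_key, buf );
    }
    return buf;
}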
Anyway, I am going to bed now. (SARCASM)It was kind of fun to watch
wine-devel degenerate into linux-kernel style for a day.(/SARCASM)
-Dave
Alexandre Julliard wrote:
> Patrik Stridvall <[EMAIL PROTECTED]> writes:
>
> > This mean that we in theory could support with compile options
> > (1) W->A with auto generated conversions (pseudo Unicode support)
> > (2) A->W with auto generated conversions (your proposal)
> > (3) Double compilation (my proposal)
> > (4) A only
> > (5) W only
> >
> > Sure the input files will need to use TCHAR and friends,
> > but they will use just as much memory as they do now
> > if we compile with option (2).
> >
> > What do you think?
>
> No thanks.
>
> --
> Alexandre Julliard
> [EMAIL PROTECTED]