On Wed, Jan 24, 2007 at 11:02:00AM +0100, Karsten Otto wrote:
> On 22.01.2007 at 17:55, Peter Amstutz wrote:
> 
> Sounds good, in a binary encoding you might even want to replace  
> symbols completely by numbers.

Right.  That's actually the main reason for it: to use it as a 
compression scheme for binary encodings as well as to save memory on 
the site itself.
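
As a rough sketch of the idea (the class and the symbol names below are 
made up for illustration, this is not existing VOS code), each symbol 
gets a small integer the first time it is seen, and only the integer 
is stored or sent afterwards:

```python
class SymbolTable:
    """Hypothetical interning table: maps symbol strings to small integers."""

    def __init__(self):
        self._ids = {}    # symbol -> id
        self._names = []  # id -> symbol

    def intern(self, name):
        """Return the id for name, assigning the next free id on first use."""
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def lookup(self, ident):
        """Return the symbol string that was assigned the given id."""
        return self._names[ident]


table = SymbolTable()
message = ["core:update", "property:value", "core:update"]  # made-up symbols
encoded = [table.intern(s) for s in message]   # repeated symbols share an id
decoded = [table.lookup(i) for i in encoded]   # round-trips to the strings
```

Both ends would of course have to agree on the table, which this sketch 
does not address.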

> > Also, I would like to address internationalization issues in the next
> > iteration of VOS, but I don't know much about it, so advice (such as
> > what the best way is to handle Unicode) would be greatly
> > appreciated.
> >
> No idea either, except that full i18n is a real pain... you have to  
> think about text flow direction for one, and what happens if you  
> quote an English phrase in an Arabic text...

Well, I'm less concerned with display issues than with simply preserving 
the encoding (and, as Lalo suggested, the language) so that it can be 
transcoded as required for output, or when crossing API boundaries.
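
Something along these lines is what I have in mind.  This is only a 
sketch: TaggedText and transcode() are hypothetical names, not part of 
any existing API.  The bytes are stored untouched together with their 
encoding and language, and are only converted on request:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical record: raw bytes plus the metadata needed to decode them."""
    data: bytes
    encoding: str   # a Python codec name, e.g. "latin-1" or "utf-8"
    language: str   # a language tag, e.g. "de"

    def transcode(self, target):
        """Return a copy in the target encoding; the language tag is kept."""
        text = self.data.decode(self.encoding)
        return TaggedText(text.encode(target), target, self.language)


# "Gruesse" with u-umlaut and sharp s, stored as Latin-1 (one byte per character).
original = TaggedText("Gr\u00fc\u00dfe".encode("latin-1"), "latin-1", "de")

# Converted only when crossing a boundary that wants UTF-8.
as_utf8 = original.transcode("utf-8")
```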

> I get the vague notion that UTF-8 is what most people are doing right  
> now, at least in the western hemisphere. No idea about Asian  
> languages, which might prefer UTF-16, but which do have a bunch of  
> legacy solutions as well iirc.

We had a discussion about this on IRC, and it seems that for CJK 
(Chinese/Japanese/Korean) languages the most efficient encoding is 
reportedly the fixed-width 4-byte encoding (UTF-32), at least for 
processing; in terms of size, most CJK characters take three bytes in 
UTF-8 and two in UTF-16.  Most of the rest of the world can get by with 
UTF-8 (which has the advantage of mapping directly to 7-bit ASCII and 
encoding most European languages efficiently).  UTF-16, found in Win32 
and Java, rounds out the encoding zoo.  There are dozens of legacy 
encodings out there as well, but I think the least insane path is to 
try and work in some form of Unicode exclusively and only convert to 
other character sets as a last step when absolutely necessary.
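
The size side of the trade-off is easy to check.  A small sketch, where 
the sample strings are arbitrary and the "-le" codec variants are used 
only so that no byte order mark is counted:

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

# Arbitrary sample strings, written with escapes to keep this file ASCII.
samples = {
    "English": "hello",
    "German": "Gr\u00fc\u00dfe",
    "Japanese": "\u65e5\u672c\u8a9e",
}


def encoded_sizes(text):
    """Return the encoded length in bytes of text for each encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


for label, text in samples.items():
    print(label, encoded_sizes(text))
```

For the three-character Japanese sample this gives 9 bytes in UTF-8, 6 
in UTF-16 and 12 in UTF-32, so any argument for the 4-byte form rests 
on fixed-width indexing rather than on size.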

> Well... While I like Java in many respects, I still resent their  
> notion of signed bytes. This is more an artifact of the type system  
> than something anybody would really want. I usually start cursing  
> when I have to re-construct multi-byte values from foreign file  
> formats or protocols.
>
> That said, it most certainly *is* possible to live with that, even  
> for byte arrays in I/O, as long as the arithmetic operations  
> basically treat the high-order-bit like any other.

The basic issue here is that I'd like to avoid compatibility problems 
between languages, but languages that are not designed for 
bit-twiddling don't necessarily have unsigned types.  To some extent 
this mismatch is unavoidable, but the real question is whether it's 
easier/more efficient in the long run to compromise on the design (VOS 
won't have unsigned types) or to handle it on a case-by-case basis 
(users of Java and similar languages will have to suck it up and deal 
with a less-than-optimal mapping).  This isn't academic, since data 
like indexed triangle lists is naturally unsigned (a negative index 
makes no sense).
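
To make the case-by-case option concrete, here is a minimal sketch.  
Python's struct module stands in for a language that only has signed 
16-bit values, and the index values are made up.  The bytes on the wire 
are identical either way; only the interpretation differs, and a mask 
recovers the intended value:

```python
import struct

# Made-up triangle indices, written as unsigned 16-bit little-endian.
indices = [0, 1, 40000, 65535]
wire = struct.pack("<4H", *indices)

# A reader with only signed 16-bit types sees the large values as negative.
as_signed = list(struct.unpack("<4h", wire))

# Widening and masking with 0xFFFF recovers the original unsigned values.
# In Java the equivalent would be (s & 0xFFFF) on a short.
recovered = [v & 0xFFFF for v in as_signed]
```

Here 40000 reads back as -25536 and 65535 as -1 until the mask is 
applied, which is exactly the kind of fix-up every user of such a 
language would have to remember.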

-- 
[   Peter Amstutz  ][ [EMAIL PROTECTED] ][ [EMAIL PROTECTED] ]
[Lead Programmer][Interreality Project][Virtual Reality for the Internet]
[ VOS: Next Generation Internet Communication][ http://interreality.org ]
[ http://interreality.org/~tetron ][ pgpkey:  pgpkeys.mit.edu  18C21DF7 ]


