[PATCH] languages/PIR fix string encoding, hex and binary numbers

2007-02-08 Thread Klaas-Jan Stol
hi, attached is a patch for languages/PIR fixing:
* added optional utf8: encoding specifier (according to docs/imcc/syntax.pod)
* fixed support for binary and hex numbers
* added test for these changes.
regards, klaas-jan
Index: languages/PIR/lib/pir.pg

[perl #34285] [BUG] Is string encoding written to PBC file

2005-02-28 Thread via RT
is that the string encoding is not properly written to the dumped PBC file:
[EMAIL PROTECTED]:~/devel/Parrot/cvs/parrot cat t/op/string_cs_2.pasm
set S0, ascii:ok 1\n
charset I0, S0
charsetname S1, I0
print S1
print \n
end
[EMAIL PROTECTED]:~/devel/Parrot/cvs/parrot ./parrot t/op

Re: [PATCH] string encoding for ISO 8859-1

2001-10-17 Thread Simon Cozens
On Wed, Oct 17, 2001 at 08:21:51AM -0400, Gregor N. Purdy wrote: I may be misunderstanding, but I think 'strnative' needs to go away and we need to determine the precise native encoding, Please don't apply this; I think you are misunderstanding. strnative is equivalent to LANG=C. It *is* the

Re: [PATCH] string encoding for ISO 8859-1

2001-10-17 Thread James Mastros
On Wed, 17 Oct 2001, Gregor N. Purdy wrote: 2. The encoding for the chunk-o-memory the interpreter is about to turn into a STRING, having found said chunk in the packfile's const_table. Perhaps we should always output string constants as UTF8? This would avoid the problems with

Re: [PATCH] string encoding for ISO 8859-1

2001-10-17 Thread Dan Sugalski
At 12:37 PM 10/17/2001 -0400, James Mastros wrote: On Wed, 17 Oct 2001, Gregor N. Purdy wrote: 2. The encoding for the chunk-o-memory the interpreter is about to turn into a STRING, having found said chunk in the packfile's const_table. Perhaps we should always output string

Re: [PATCH] string encoding for ISO 8859-1

2001-10-17 Thread Dan Sugalski
At 01:49 PM 10/17/2001 +0100, Simon Cozens wrote: On Wed, Oct 17, 2001 at 08:21:51AM -0400, Gregor N. Purdy wrote: I may be misunderstanding, but I think 'strnative' needs to go away and we need to determine the precise native encoding, Please don't apply this; I think you are

Re: [PATCH] string encoding for ISO 8859-1

2001-10-17 Thread James Mastros
On Wed, 17 Oct 2001, Gregor N. Purdy wrote: It's still likely that I'm misunderstanding the intent, but I think that a .pbc file created by me with LANG=C is not necessarily going to generate string constants that have the same meaning when you go to run it on your platform of choice, which

Re: string encoding

2001-03-24 Thread nick
Dan Sugalski [EMAIL PROTECTED] writes: substr($foo, 233253, 14) is going to cost significantly more with variable sized characters than fixed sized ones. I don't believe so. Then you would be incorrect. To find the character at position 233253 in a variable-length encoding requires
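The cost being argued about here can be sketched in a few lines. This is not code from the thread; the helper names `utf8_byte_offset` and `utf32_byte_offset` are hypothetical, and well-formed UTF-8 input is assumed. Because a UTF-8 character occupies one to four bytes, the byte offset of character N cannot be computed from N alone: the scan has to pass every earlier character. A fixed-width encoding gets the offset by multiplication.

```python
def utf8_byte_offset(buf: bytes, char_index: int) -> int:
    """Return the byte offset of the char_index-th code point.

    Hypothetical helper for illustration; assumes well-formed UTF-8.
    Runs in O(n) because every earlier byte has to be inspected.
    """
    seen = 0
    for offset, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if (byte & 0xC0) != 0x80:
            if seen == char_index:
                return offset
            seen += 1
    raise IndexError("character index out of range")


def utf32_byte_offset(char_index: int) -> int:
    """Fixed-width encoding: the offset is plain arithmetic, O(1)."""
    return 4 * char_index


text = "na\u00efve caf\u00e9"
encoded = text.encode("utf-8")
offset = utf8_byte_offset(encoded, 9)  # index 9 is the final e-acute
```

Whether this linear scan matters in practice is exactly what the thread disputes: it only hurts when callers need random access rather than sequential iteration.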

Re: string encoding

2001-02-18 Thread Tom Lord
On the subject of Unicode string processing... I'm not a perl internals hacker and more of a passive reader of these lists than an active contributor. With that caveat, may I humbly point out a design document for what I think is a clean C library supporting the use of mixed encoding forms.

Re: string encoding

2001-02-16 Thread Simon Cozens
On Thu, Feb 15, 2001 at 05:09:45PM -0800, Hong Zhang wrote: People in Japan/China/Korea have been using multi-byte encoding for a long time. I personally have used it for more than 10 years. And now you have a chance to not do so. Isn't that *nice*?

Re: string encoding

2001-02-16 Thread Simon Cozens
On Thu, Feb 15, 2001 at 04:55:00PM -0800, Hong Zhang wrote: On Thu, Feb 15, 2001 at 03:59:54PM -0800, Hong Zhang wrote: The concept of characters has nothing to do with codepoints. Many characters are composed of more than one codepoint. This isn't true. What do you mean? Have

Re: string encoding

2001-02-16 Thread Branden
Simon Cozens wrote: On Thu, Feb 15, 2001 at 03:59:54PM -0800, Hong Zhang wrote: The concept of characters has nothing to do with codepoints. Many characters are composed of more than one codepoint. This isn't true. Yes, for UTF-16 it is. For UTF-32 it isn't, but unless you want to read

Re: string encoding

2001-02-16 Thread Simon Cozens
On Fri, Feb 16, 2001 at 12:26:43PM +, Simon Cozens wrote: On Fri, Feb 16, 2001 at 10:24:51AM -0300, Branden wrote: Yes, for UTF-16 it is. For UTF-32 it isn't Yes, it damned well is. I mean, no, it damned well isn't. But you probably guessed that. You're confusing "codepoint" with

Re: string encoding

2001-02-16 Thread Simon Cozens
On Fri, Feb 16, 2001 at 10:24:51AM -0300, Branden wrote: Yes, for UTF-16 it is. For UTF-32 it isn't Yes, it damned well is. You're confusing "codepoint" with "number of bytes in representation".
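The distinction drawn here, between a code point and the number of bytes used to represent it, can be shown with a small sketch. It is an illustration added for clarity, not code from the thread; the sample characters and the helper `byte_lengths` are arbitrary choices. Each sample is exactly one code point, yet its encoded size differs by encoding form.

```python
samples = {
    "ASCII letter": "A",           # U+0041
    "Latin-1 letter": "\u00e9",    # U+00E9
    "Unihan ideograph": "\u4e2d",  # U+4E2D
    "beyond the BMP": "\U00010348",  # U+10348, needs a surrogate pair in UTF-16
}


def byte_lengths(ch: str) -> dict:
    """Return the encoded size of one character in three encoding forms."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


for label, ch in samples.items():
    # len(ch) counts code points, so it is 1 for every sample here.
    print(label, f"U+{ord(ch):04X}", len(ch), byte_lengths(ch))
```

Only UTF-32 uses the same number of bytes for every code point. Note that this says nothing about characters built from several code points, such as a base letter plus a combining accent, which is a separate question from encoding width.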

Re: string encoding

2001-02-16 Thread Branden
Dan Sugalski wrote: At 05:09 PM 2/15/2001 -0800, Hong Zhang wrote: People in Japan/China/Korea have been using multi-byte encoding for a long time. I personally have used it for more than 10 years. I never feel much of the "pain". Do you think I am using my computer with O(n) while you are using

Re: string encoding

2001-02-16 Thread Hong Zhang
People in Japan/China/Korea have been using multi-byte encoding for a long time. I personally have used it for more than 10 years. I never feel much of the "pain". Do you think I am using my computer with O(n) while you are using it with O(1)? There are 100 million people using variable-length

Re: string encoding

2001-02-16 Thread Hong Zhang
What do you mean? Have you seen people using multi-byte encoding in Japan/China/Korea? You're talking to the wrong person. Japanese data handling is my graduate dissertation. :) The Unified Hangul/Kanji/Ha'nzi' Characters in Unicode (so-called "Unihan") occupy one and only one codepoint

Re: string encoding

2001-02-16 Thread Simon Cozens
On Fri, Feb 16, 2001 at 12:32:10PM -0800, Hong Zhang wrote: Did it buy you much? I don't believe so. Can you give some examples why random character access is so important? substr's already been mentioned. Regular expressions. Perl does rather a lot of them. We've already found from Perl 5

Re: string encoding

2001-02-16 Thread Bryan C . Warnock
On Friday 16 February 2001 15:35, Simon Cozens wrote: On Fri, Feb 16, 2001 at 12:32:10PM -0800, Hong Zhang wrote: Did it buy you much? I don't believe so. Can you give some examples why random character access is so important? substr's already been mentioned. Regular expressions. Perl

Re: string encoding

2001-02-16 Thread Dan Sugalski
At 12:32 PM 2/16/2001 -0800, Hong Zhang wrote: What do you mean? Have you seen people using multi-byte encoding in Japan/China/Korea? You're talking to the wrong person. Japanese data handling is my graduate dissertation. :) The Unified Hangul/Kanji/Ha'nzi' Characters in Unicode

Re: string encoding

2001-02-16 Thread Hong Zhang
And address arithmetic and mem(cmp|cpy) is faster than array iteration. Ha Ha Ha. You must be kidding. mem(cmp|cpy) work just fine for UTF-8 string comparison and copy. But memcmp() cannot be used for UTF-32 string comparison, because of the endianness issue. Hong
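The endianness point can be checked with a short sketch. It is an illustration, not code from the thread: Python's `bytes` comparison is lexicographic over unsigned bytes, which is used here as a stand-in for memcmp(), and the helper name `bytewise_less` is hypothetical. The two code points are chosen so that the little-endian byte order disagrees with code point order.

```python
low, high = "\u00ff", "\u0100"  # U+00FF sorts before U+0100 by code point


def bytewise_less(a: str, b: str, encoding: str) -> bool:
    """Compare encoded bytes lexicographically, as memcmp() would."""
    return a.encode(encoding) < b.encode(encoding)


utf8_ok = bytewise_less(low, high, "utf-8")          # C3 BF    vs C4 80
utf32be_ok = bytewise_less(low, high, "utf-32-be")   # 00 00 00 FF vs 00 00 01 00
utf32le_ok = bytewise_less(low, high, "utf-32-le")   # FF 00 00 00 vs 00 01 00 00

print(utf8_ok, utf32be_ok, utf32le_ok)
```

Bytewise ordering matches code point order for UTF-8 and for big-endian UTF-32, but not for little-endian UTF-32. Equality tests are unaffected: two strings in the same byte order compare equal bytewise exactly when they are the same sequence of code points.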

Re: string encoding

2001-02-16 Thread Hong Zhang
Did it buy you much? I don't believe so. Can you give some examples why random character access is so important? Most people are processing text linearly. Most, but not all. And as this is the internals list, we have to deal with all. We can't choose a convenient subset and ignore the rest.

Re: string encoding

2001-02-16 Thread Dan Sugalski
At 06:47 PM 2/16/2001 -0800, Hong Zhang wrote: I'd like to wrap up my argument. I recommend using UTF-8 as the sole string encoding. If we end up with multiple encodings, there is absolutely no point for this argument. Um, I hate to point this out, but perl isn't going to have a single string

Re: string encoding

2001-02-16 Thread Hong Zhang
I'd like to wrap up my argument. I recommend using UTF-8 as the sole string encoding. If we end up with multiple encodings, there is absolutely no point for this argument. The benefits of UTF-8 are that it is more compact, needs less encoding conversion, and is friendlier to the C API. UTF-16 is a variable-length encoding too

string encoding

2001-02-15 Thread Hong Zhang
Hi, All, I want to give some of my thoughts about string encoding. Personally I like the UTF-8 encoding. The solution to the variable length can be handled by a special (virtual) function like class String { virtual UV iterate(/*inout*/ int* index); }; So in typical string iteration
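A rough sketch of what such an `iterate` function might do is given below, written in Python rather than C++ for brevity. It is an assumption about the intent, not the poster's implementation: where the proposed signature updates an in/out index, this version returns the next index alongside the code point. It assumes well-formed UTF-8 and does no validation, and `code_points` is a hypothetical caller.

```python
from typing import List, Tuple


def iterate(buf: bytes, index: int) -> Tuple[int, int]:
    """Decode the code point starting at buf[index].

    Returns (code_point, next_index). Assumes well-formed UTF-8.
    """
    first = buf[index]
    if first < 0x80:           # 0xxxxxxx: one-byte sequence
        return first, index + 1
    if first >= 0xF0:          # 11110xxx: four-byte sequence
        length, cp = 4, first & 0x07
    elif first >= 0xE0:        # 1110xxxx: three-byte sequence
        length, cp = 3, first & 0x0F
    else:                      # 110xxxxx: two-byte sequence
        length, cp = 2, first & 0x1F
    for byte in buf[index + 1:index + length]:
        # Each continuation byte contributes its low six bits.
        cp = (cp << 6) | (byte & 0x3F)
    return cp, index + length


def code_points(buf: bytes) -> List[int]:
    """Walk the whole buffer sequentially, one code point per step."""
    result = []
    index = 0
    while index < len(buf):
        cp, index = iterate(buf, index)
        result.append(cp)
    return result
```

Sequential iteration of this kind costs a constant amount of work per character; it is only indexed access into the middle of a string that becomes linear.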

Re: string encoding

2001-02-15 Thread Simon Cozens
On Thu, Feb 15, 2001 at 02:31:03PM -0800, Hong Zhang wrote: Personally I like the UTF-8 encoding. The solution to the variable length can be handled by a special (virtual) function like I'm expecting that the virtual, internal representation will not be in a UTF but will simply be an array of

Re: string encoding

2001-02-15 Thread Hong Zhang
On Thu, Feb 15, 2001 at 02:31:03PM -0800, Hong Zhang wrote: Personally I like the UTF-8 encoding. The solution to the variable length can be handled by a special (virtual) function like I'm expecting that the virtual, internal representation will not be in a UTF but will simply be an

Re: string encoding

2001-02-15 Thread Jarkko Hietaniemi
On Thu, Feb 15, 2001 at 11:16:29PM +, Simon Cozens wrote: On Thu, Feb 15, 2001 at 02:31:03PM -0800, Hong Zhang wrote: Personally I like the UTF-8 encoding. The solution to the variable length can be handled by a special (virtual) function like I'm expecting that the virtual,

Re: string encoding

2001-02-15 Thread Simon Cozens
On Thu, Feb 15, 2001 at 03:59:54PM -0800, Hong Zhang wrote: The concept of characters has nothing to do with codepoints. Many characters are composed of more than one codepoint. This isn't true.

Re: string encoding

2001-02-15 Thread Hong Zhang
On Thu, Feb 15, 2001 at 03:59:54PM -0800, Hong Zhang wrote: The concept of characters has nothing to do with codepoints. Many characters are composed of more than one codepoint. This isn't true. What do you mean? Have you seen people using multi-byte encoding in Japan/China/Korea?

Re: string encoding

2001-02-15 Thread Hong Zhang
...and because of this you can't randomly access the string, you are reduced to sequential access (*). And here I thought we could have left tape drives to the last millennium. (*) Yes, of course you could cache your sequential access so you only need to do it once, and build balanced

Re: string encoding

2001-02-15 Thread Dan Sugalski
At 05:09 PM 2/15/2001 -0800, Hong Zhang wrote: ...and because of this you can't randomly access the string, you are reduced to sequential access (*). And here I thought we could have left tape drives to the last millennium. (*) Yes, of course you could cache your sequential access so