hi
attached a patch for languages/PIR fixing:
* added optional utf8: encoding specifier (according to
docs/imcc/syntax.pod)
* fixed support for binary and hex. numbers
* added test for these changes.
regards,
klaas-jan
Index: languages/PIR/lib/pir.pg
is that the string
encoding is
not properly written to the dumped PBC file:
[EMAIL PROTECTED]:~/devel/Parrot/cvs/parrot cat t/op/string_cs_2.pasm
set S0, ascii:"ok 1\n"
charset I0, S0
charsetname S1, I0
print S1
print "\n"
end
[EMAIL PROTECTED]:~/devel/Parrot/cvs/parrot ./parrot t/op
On Wed, Oct 17, 2001 at 08:21:51AM -0400, Gregor N. Purdy wrote:
I may be misunderstanding, but I think 'strnative' needs to go away
and we need to determine the precise native encoding,
Please don't apply this; I think you are misunderstanding.
strnative is equivalent to LANG=C. It *is* the
On Wed, 17 Oct 2001, Gregor N. Purdy wrote:
2. The encoding for the chunk-o-memory the interpreter is about
to turn into a STRING, having found said chunk in the packfile's
const_table.
Perhaps we should always output string constants as UTF8? This would
avoid the problems with
At 12:37 PM 10/17/2001 -0400, James Mastros wrote:
On Wed, 17 Oct 2001, Gregor N. Purdy wrote:
2. The encoding for the chunk-o-memory the interpreter is about
to turn into a STRING, having found said chunk in the packfile's
const_table.
Perhaps we should always output string
At 01:49 PM 10/17/2001 +0100, Simon Cozens wrote:
On Wed, Oct 17, 2001 at 08:21:51AM -0400, Gregor N. Purdy wrote:
I may be misunderstanding, but I think 'strnative' needs to go away
and we need to determine the precise native encoding,
Please don't apply this; I think you are
On Wed, 17 Oct 2001, Gregor N. Purdy wrote:
It's still likely that I'm misunderstanding the intent, but I think
that a .pbc file created by me with LANG=C is not necessarily going
to generate string constants that have the same meaning when you go
to run it on your platform of choice, which
Dan Sugalski [EMAIL PROTECTED] writes:
substr($foo, 233253, 14)
is going to cost significantly more with variable sized characters than
fixed sized ones.
I don't believe so.
Then you would be incorrect. To find the character at position 233253 in a
variable-length encoding requires
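
As an illustration of the cost being argued about here, a minimal sketch assuming well-formed UTF-8 input. The helper names `utf8_offset_of` and `utf32_offset_of` are hypothetical and are not part of any Perl or Parrot API. Finding character N in UTF-8 means walking the bytes from the start, while a fixed-width encoding needs only a multiplication:

```cpp
#include <cstddef>
#include <string>

// Byte offset of the code point at character index `char_index` in a UTF-8
// string, or std::string::npos if the string has fewer code points.
// Assumption: `s` is well-formed UTF-8. Hypothetical helper for illustration.
std::size_t utf8_offset_of(const std::string& s, std::size_t char_index) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
        bool starts_code_point = (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80;
        if (starts_code_point) {
            if (seen == char_index) return i;
            ++seen;
        }
    }
    return std::string::npos;
}

// With a fixed-width encoding the same lookup is plain arithmetic.
std::size_t utf32_offset_of(std::size_t char_index) {
    return char_index * 4;  // four bytes per code point
}
```

The scan is linear in the offset, which is the overhead a call like substr($foo, 233253, 14) would pay unless the implementation caches positions.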
On the subject of Unicode string processing...
I'm not a perl internals hacker and more of a passive reader of these
lists than an active contributor.
With that caveat, may I humbly point out a design document for
what I think is a clean C library supporting the use of mixed
encoding forms.
On Thu, Feb 15, 2001 at 05:09:45PM -0800, Hong Zhang wrote:
People in Japan/China/Korea have been using multi-byte encoding for a
long time. I personally have used it for more than 10 years.
And now you have a chance to not do so. Isn't that *nice*?
--
Term, holidays, term, holidays, till we leave
On Thu, Feb 15, 2001 at 04:55:00PM -0800, Hong Zhang wrote:
On Thu, Feb 15, 2001 at 03:59:54PM -0800, Hong Zhang wrote:
The concept of characters has nothing to do with codepoints.
Many characters are composed of more than one codepoint.
This isn't true.
What do you mean? Have
Simon Cozens wrote:
On Thu, Feb 15, 2001 at 03:59:54PM -0800, Hong Zhang wrote:
The concept of characters has nothing to do with codepoints.
Many characters are composed of more than one codepoint.
This isn't true.
Yes, for UTF-16 it is. For UTF-32 it isn't, but unless you want to read
On Fri, Feb 16, 2001 at 12:26:43PM +, Simon Cozens wrote:
On Fri, Feb 16, 2001 at 10:24:51AM -0300, Branden wrote:
Yes, for UTF-16 it is. For UTF-32 it isn't
Yes, it damned well is.
I mean, no, it damned well isn't. But you probably guessed that.
You're confusing "codepoint" with
On Fri, Feb 16, 2001 at 10:24:51AM -0300, Branden wrote:
Yes, for UTF-16 it is. For UTF-32 it isn't
Yes, it damned well is.
You're confusing "codepoint" with "number of bytes in representation".
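
To make the distinction concrete, a small sketch (the function names are hypothetical, and the input is assumed to be a valid scalar value, i.e. at most 0x10FFFF and not a surrogate). A code point is a single number; how many bytes it occupies depends only on the encoding form chosen to store it:

```cpp
#include <cstdint>

// Bytes needed to store ONE code point in each encoding form.
// Assumption: cp <= 0x10FFFF and cp is not a surrogate.
int utf8_bytes(std::uint32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

int utf16_bytes(std::uint32_t cp) {
    return cp < 0x10000 ? 2 : 4;  // 4 bytes means a surrogate pair
}

int utf32_bytes(std::uint32_t) {
    return 4;  // always one 32-bit unit
}
```

For example, U+4E2D is one code point whether it takes three bytes (UTF-8), two (UTF-16) or four (UTF-32). Whether a user-perceived character can consist of several code points, as with combining sequences, is a separate question from byte counts.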
--
I would imagine most of the readers of this group would support abortion
as long as fifty or
Dan Sugalski wrote:
At 05:09 PM 2/15/2001 -0800, Hong Zhang wrote:
People in Japan/China/Korea have been using multi-byte encoding for a
long time. I personally have used it for more than 10 years. I never feel
much of the "pain". Do you think I am using my computer with O(n)
while you are using
People in Japan/China/Korea have been using multi-byte encoding for a
long time. I personally have used it for more than 10 years. I never feel
much of the "pain". Do you think I am using my computer with O(n)
while you are using it with O(1)? There are 100 million people using
variable-length
What do you mean? Have you seen people using multi-byte encoding
in Japan/China/Korea?
You're talking to the wrong person. Japanese data handling is my graduate
dissertation. :)
The Unified Hangul/Kanji/Ha'nzi' Characters in Unicode (so-called "Unihan")
occupy one and only one codepoint
On Fri, Feb 16, 2001 at 12:32:10PM -0800, Hong Zhang wrote:
Did it buy you much? I don't believe so. Can you give some examples why
random character access is so important?
substr's already been mentioned.
Regular expressions. Perl does rather a lot of them. We've already found from
Perl 5
On Friday 16 February 2001 15:35, Simon Cozens wrote:
On Fri, Feb 16, 2001 at 12:32:10PM -0800, Hong Zhang wrote:
Did it buy you much? I don't believe so. Can you give some examples why
random character access is so important?
substr's already been mentioned.
Regular expressions. Perl
At 12:32 PM 2/16/2001 -0800, Hong Zhang wrote:
What do you mean? Have you seen people using multi-byte encoding
in Japan/China/Korea?
You're talking to the wrong person. Japanese data handling is my graduate
dissertation. :)
The Unified Hangul/Kanji/Ha'nzi' Characters in Unicode
And address arithmetic and mem(cmp|cpy) is faster than array iteration.
Ha Ha Ha. You must be kidding.
mem(cmp|cpy) work just fine for UTF-8 string comparison and copy.
But memcmp() cannot be used for UTF-32 string comparison, because
of the endianness issue.
Hong
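
A sketch of the endianness point, using two code points written out by hand as bytes (the array names and `bytes_less` are illustrative only). Byte-wise ordering of well-formed UTF-8 follows code point order, whereas UTF-32 stored little-endian does not:

```cpp
#include <cstddef>
#include <cstring>

// U+00FF and U+0100 serialised by hand in three forms.
const unsigned char utf8_00ff[]    = {0xC3, 0xBF};
const unsigned char utf8_0100[]    = {0xC4, 0x80};
const unsigned char utf32le_00ff[] = {0xFF, 0x00, 0x00, 0x00};
const unsigned char utf32le_0100[] = {0x00, 0x01, 0x00, 0x00};
const unsigned char utf32be_00ff[] = {0x00, 0x00, 0x00, 0xFF};
const unsigned char utf32be_0100[] = {0x00, 0x00, 0x01, 0x00};

// True when memcmp orders the first n bytes of a before those of b.
bool bytes_less(const unsigned char* a, const unsigned char* b, std::size_t n) {
    return std::memcmp(a, b, n) < 0;
}
```

Here bytes_less(utf8_00ff, utf8_0100, 2) and bytes_less(utf32be_00ff, utf32be_0100, 4) are true, but bytes_less(utf32le_00ff, utf32le_0100, 4) is false even though U+00FF < U+0100. Note that this only concerns ordering: memcmp() still answers equality correctly for any of the three, as long as both strings use the same byte order.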
Did it buy you much? I don't believe so. Can you give some examples why
random character access is so important? Most people are processing text
linearly.
Most, but not all. And as this is the internals list, we have to deal with
all. We can't choose a convenient subset and ignore the rest.
At 06:47 PM 2/16/2001 -0800, Hong Zhang wrote:
I'd like to wrap up my argument.
I recommend using UTF-8 as the sole string encoding.
If we end up with multiple encodings, there is absolutely
no point to this argument.
Um, I hate to point this out, but perl isn't going to have a single string
I'd like to wrap up my argument.
I recommend using UTF-8 as the sole string encoding.
If we end up with multiple encodings, there is absolutely
no point to this argument.
The benefits of UTF-8 are that it is more compact, needs less encoding
conversion, and is friendlier to the C API. UTF-16 is a variable-length encoding
too
Hi, All,
I want to give some of my thoughts about string encoding.
Personally I like the UTF-8 encoding. The solution to the
variable length can be handled by a special (virtual)
function like
class String {
virtual UV iterate(/*inout*/ int* index);
};
So in typical string iteration
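
One way such an `iterate` function could look for UTF-8 is sketched below. This is an assumption-laden illustration, not the proposal's actual code: `UV` is typedef'd here as a stand-in for Perl's unsigned integer type, `Utf8String` and `byte_length` are invented names, and the input is assumed to be well-formed UTF-8 with `*index` on a code point boundary and in range.

```cpp
#include <cstdint>
#include <string>
#include <utility>

typedef std::uint32_t UV;  // assumption: stand-in for Perl's UV type

class String {
public:
    virtual ~String() {}
    // Return the code point starting at *index and advance *index past it.
    virtual UV iterate(/*inout*/ int* index) = 0;
    virtual int byte_length() const = 0;  // hypothetical helper for loop bounds
};

// Hypothetical UTF-8 backed implementation; no validation is performed.
class Utf8String : public String {
public:
    explicit Utf8String(std::string bytes) : bytes_(std::move(bytes)) {}

    UV iterate(/*inout*/ int* index) override {
        unsigned char lead = static_cast<unsigned char>(bytes_[*index]);
        // Number of continuation bytes implied by the lead byte.
        int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        // Keep only the payload bits of the lead byte.
        UV cp = extra == 0 ? lead : lead & (0x3F >> extra);
        for (int k = 1; k <= extra; ++k) {
            // Each continuation byte contributes its low six bits.
            cp = (cp << 6) | (static_cast<unsigned char>(bytes_[*index + k]) & 0x3F);
        }
        *index += extra + 1;
        return cp;
    }

    int byte_length() const override { return static_cast<int>(bytes_.size()); }

private:
    std::string bytes_;
};
```

A caller would then walk the string with something like `for (int i = 0; i < s.byte_length(); ) { UV ch = s.iterate(&i); ... }`, never needing to know how many bytes each character occupied.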
On Thu, Feb 15, 2001 at 02:31:03PM -0800, Hong Zhang wrote:
Personally I like the UTF-8 encoding. The solution to the
variable length can be handled by a special (virtual)
function like
I'm expecting that the virtual, internal representation will not
be in a UTF but will simply be an array of
On Thu, Feb 15, 2001 at 02:31:03PM -0800, Hong Zhang wrote:
Personally I like the UTF-8 encoding. The solution to the
variable length can be handled by a special (virtual)
function like
I'm expecting that the virtual, internal representation will not
be in a UTF but will simply be an
On Thu, Feb 15, 2001 at 11:16:29PM +, Simon Cozens wrote:
On Thu, Feb 15, 2001 at 02:31:03PM -0800, Hong Zhang wrote:
Personally I like the UTF-8 encoding. The solution to the
variable length can be handled by a special (virtual)
function like
I'm expecting that the virtual,
On Thu, Feb 15, 2001 at 03:59:54PM -0800, Hong Zhang wrote:
The concept of characters has nothing to do with codepoints.
Many characters are composed of more than one codepoint.
This isn't true.
--
* DrForr digs around for a fresh IV drip bag and proceeds to hook up.
dngor Coffee port.
On Thu, Feb 15, 2001 at 03:59:54PM -0800, Hong Zhang wrote:
The concept of characters has nothing to do with codepoints.
Many characters are composed of more than one codepoint.
This isn't true.
What do you mean? Have you seen people using multi-byte encoding
in Japan/China/Korea?
...and because of this you can't randomly access the string, you are
reduced to sequential access (*). And here I thought we could have
left tape drives to the last millennium.
(*) Yes, of course you could cache your sequential access so you only
need to do it once, and build balanced
At 05:09 PM 2/15/2001 -0800, Hong Zhang wrote:
...and because of this you can't randomly access the string, you are
reduced to sequential access (*). And here I thought we could have
left tape drives to the last millennium.
(*) Yes, of course you could cache your sequential access so