On Thursday 14 June 2001 12:01 pm, Dan Sugalski wrote:
Fancy character classes are probably enough to handle the various casing
issues and their analogs. They're probably not enough to handle things
like the Arabic tatweel, or proper word breaks in most Asian languages.
Heck, unless I'm
On Fri, 15 Jun 2001 06:52:32 -0400, Bryan C. Warnock wrote:
On a side note (and this *will* sound stupid, but there is a reason I'm
asking). Why is there no logical opposite to '.'; that is, a character
which never matches another character? (Besides, of course, that it's
utterly useless
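
A minimal sketch of such a "never matches" construct, written in Python rather than Perl and not taken from the thread: an empty negative lookahead fails at every position, so it behaves as the logical opposite of '.'. The names NEVER and never_matches are made up for illustration.

```python
import re

# (?!) is an empty negative lookahead. The empty pattern inside it always
# succeeds, so the lookahead itself always fails, at every position.
NEVER = re.compile(r"(?!)")


def never_matches(text: str) -> bool:
    """Return True if NEVER finds no match anywhere in text."""
    return NEVER.search(text) is None
```

One use is as a placeholder alternative when a pattern is assembled programmatically and one branch has to be disabled without changing the pattern's shape.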
On Fri, Jun 15, 2001 at 11:50:49AM -0400, Dan Sugalski wrote:
Unless I'm missing something (Simon? Hong?) Japanese (and potentially all
the languages that use the Han characters) can interpret a particular
character as either a number or not a number, depending on context.
Uh, don't think
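
To make the "number or not a number" point concrete, here is a small sketch in Python (not Perl) using the standard unicodedata module. The thread does not name a specific character; the Han character for "three" (U+4E09) is my own pick for illustration, and describe() is a hypothetical helper.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect the Unicode properties relevant to 'is this a number?'."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),          # e.g. 'Nd' or 'Lo'
        "is_decimal": ch.isdecimal(),                  # True only for decimal digits
        "numeric_value": unicodedata.numeric(ch, None),  # None if no numeric value
    }


ascii_three = describe("3")
han_three = describe("\u4e09")
```

The Han character carries a numeric value in the Unicode database but is classified as a letter, not a digit, so a digit class alone will not treat it as a number. Whether it should count as one is exactly the context question raised above.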
On Friday 15 June 2001 06:58 pm, Dan Sugalski wrote:
module Locale::Hawaiian;
use re 'class (\w = [aeiouâêîôûhklmnpw`])';
...
Sure. I expect Damian will write us something that lets you specify
them upside-down in Klingon or something by the time this is done. :)
This is
At 11:28 PM 6/15/2001 +0100, Simon Cozens wrote:
On Fri, Jun 15, 2001 at 11:50:49AM -0400, Dan Sugalski wrote:
Unless I'm missing something (Simon? Hong?) Japanese (and potentially all
the languages that use the Han characters) can interpret a particular
character as either a number or not
At 12:29 AM 6/16/2001 +0100, Simon Cozens wrote:
On Fri, Jun 15, 2001 at 07:12:45PM -0400, Dan Sugalski wrote:
The question, then, is should ya be considered a literal number in either
of those contexts?
The phrase "in those contexts" suggests that it should in some and shouldn't
in others.
On Fri, Jun 15, 2001 at 07:12:45PM -0400, Dan Sugalski wrote:
The question, then, is should ya be considered a literal number in either
of those contexts?
The phrase "in those contexts" suggests that it should in some and shouldn't
in others. This means that the regexp engine would need to
On Wed, 13 Jun 2001 13:39:16 -0400, Dan Sugalski wrote:
Something that should be part of the core? I'll leave
that for you to decide.
Most definitely NOT.
Most definitely sort of.
There is no reason to put functionality for free matching of Japanese
characters into the basic perl
At 01:10 PM 6/14/2001 +0200, Bart Lateur wrote:
On Wed, 13 Jun 2001 13:39:16 -0400, Dan Sugalski wrote:
Something that should be part of the core? I'll leave
that for you to decide.
Most definitely NOT.
Most definitely sort of.
There is no reason to put functionality for free
On Wed, 13 Jun 2001 01:22:32 +0100, Simon Cozens wrote:
Something that should be part of the core? I'll leave
that for you to decide.
Most definitely NOT.
There is no reason to put functionality for free matching of Japanese
characters into the basic perl executable. There were already voices
At 05:15 PM 6/13/2001 +0200, Bart Lateur wrote:
On Wed, 13 Jun 2001 01:22:32 +0100, Simon Cozens wrote:
Something that should be part of the core? I'll leave
that for you to decide.
Most definitely NOT.
Most definitely sort of.
There is no reason to put functionality for free matching of
On Tue, Jun 12, 2001 at 06:44:02PM -0400, Dan Sugalski wrote:
While that's true, KATAKANA LETTER A and HIRAGANA LETTER A are also
referring to distinct things. (Though arguably not as distinct as either
with LATIN CAPITAL A) If we do one, why not the other? I'm perfectly happy
with an
On Tue, Jun 12, 2001 at 05:03:17PM -0700, Damien Neil wrote:
I can say that I feel that providing a mechanism for Hiragana
characters to match Katakana and vice-versa is about as useful for a
person doing Japanese text processing as case-insensitive matching is
for a person working with
We should let an external collator handle all these fancy features.
People can always normalize/canonicalize/do-whatever-you-want
and send the result text/binary to regex. All the features we
argue about here can be easily done by a customized collator.
Do NOT expect the Perl regex to be a
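
A rough sketch of the "normalize first, then hand the text to the regex" approach, in Python rather than Perl. This is only an illustration under my own assumptions: NFKC stands in for the external collator, and normalized_search is a hypothetical helper, not anything proposed in the thread.

```python
import re
import unicodedata


def normalized_search(pattern: str, text: str):
    """Normalize the text with NFKC, then run an ordinary regex over it.

    The pattern is assumed to be written in normalized form already.
    """
    return re.search(pattern, unicodedata.normalize("NFKC", text))


# Halfwidth katakana (U+FF76 U+FF80 U+FF76 U+FF85) normalize to the
# ordinary fullwidth katakana spelling under NFKC.
halfwidth = "\uff76\uff80\uff76\uff85"
fullwidth = "\u30ab\u30bf\u30ab\u30ca"
```

Note that NFKC folds width variants but does not map katakana onto hiragana, so this step alone does not settle the kana question; that would need a custom mapping in the preprocessing stage.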
On Wed, Jun 13, 2001 at 01:22:32AM +0100, Simon Cozens wrote:
I'd say it was about as useful as providing a regexp option to translate
the search term into French and try that instead.[1] Handy, possibly.
Essential? No. Something that should be part of the core? I'll leave
that for you to
On Tue, Jun 12, 2001 at 05:41:40PM -0700, Hong Zhang wrote:
We should let an external collator handle all these fancy features.
People can always normalize/canonicalize/do-whatever-you-want
and send the result text/binary to regex. All the features we
argue about here can be easily done by
On Tue, Jun 12, 2001 at 05:40:32PM -0700, Damien Neil wrote:
The ability to match Hiragana as Katakana and vice-versa is almost
identical conceptually to the ability to perform case insensitive
matches on English text.
I am going to choose not to disagree with you on this, but...
What
On Tue, Jun 12, 2001 at 05:41:40PM -0700, Hong Zhang wrote:
We should let an external collator handle all these fancy features.
Phew, I've been saying this all along. :)
Please note regex is O(n) at best, adding an external collator
will make it O(2n).
While this is very true, I think
On Tue, Jun 12, 2001 at 06:44:02PM -0400, Dan Sugalski wrote:
We probably also ought to answer the question "How accommodating to
non-Latin writing systems are we going to be?"
What if Perl 6 simply reserved tags for extensions? This could assume
processing similar to Perl 5 for compatibility,
On Wed, Jun 13, 2001 at 02:15:16AM +0100, Simon Cozens wrote:
Or we could keep it out of core. It's up to you, really.
No, it isn't. It's up to Larry, or to whoever gets the regex
pumpkin.
I'm withdrawing from this discussion: My intent was to clarify
exactly why someone might want to treat
On Tue, Jun 12, 2001 at 06:45:31PM -0700, Damien Neil wrote:
Hrm, no, not usually; furigana are almost always hiragana, and
learner's textbooks - bah, they're not real Japanese. :)
I believe you are confused;
*cough*. I believe I am not. But who am I? Let's ask Kenkyusha -
admittedly not
We've pretty much run this subthread out of Perl content by now, so it
ought to stop here, and I should start exercising some of that
restraint thing. (Does it grow if you exercise it?)
So Damien, we can take it to private mail or to sci.lang.japan or something,
but if you promise to stop
Dan Sugalski writes:
We probably also ought to answer the question "How accommodating to
non-Latin writing systems are we going to be?" It's an uncomfortable
question, but one that needs asking. Answering by Larry, probably, but
definitely asking. Perl's not really
On Tuesday 12 June 2001 09:16 pm, Simon Cozens wrote:
On Tue, Jun 12, 2001 at 05:41:40PM -0700, Hong Zhang wrote:
We should let an external collator handle all these fancy features.
Phew, I've been saying this all along. :)
I think we've *all* been saying that. We just need to determine
Perl came from ASCII-centric roots, so it's likely that most of our
biases are ASCII-centric. And for a couple of reasons, it's going to
be hard to deal with that:
1. Backwards compatibility with existing Perl practice,
and
2. To do language-neutral right is -really- hard; look at
On Tuesday 12 June 2001 11:06 pm, Jarkko Hietaniemi wrote:
I. Make ranges work on Unicode code-points (if they don't already).
U, yes, they do, if by code-point ranges you mean \x{...}-\x{...}
but in general I would like to discourage the use of ranges. What do
you think [a-\N{KATAKANA
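
A quick Python sketch (not Perl) of why a code-point range across scripts is hard to reason about. The range in the message above is cut off, so the end point used here, KATAKANA LETTER A, is an arbitrary choice of mine for illustration; wide_range and scripts_hit are hypothetical names.

```python
import re
import unicodedata

# Assumption: KATAKANA LETTER A (U+30A2) as the upper bound of the range.
KATAKANA_A = "\N{KATAKANA LETTER A}"

# A range is just "every code point between the two ends", whatever
# script each of those code points happens to belong to.
wide_range = re.compile(f"[a-{KATAKANA_A}]")


def scripts_hit(samples):
    """Return the Unicode names of the sample characters the range matches."""
    return [unicodedata.name(ch) for ch in samples if wide_range.fullmatch(ch)]
```

Such a range picks up Greek, Cyrillic, hiragana and much else on the way from 'a' to the katakana block, while still excluding uppercase ASCII, which is probably not what anyone writing it intends.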
For reference, here's how Perl 5.8 will define \p{IsFoo} character
classes:
# 005F: SPACING UNDERSCORE
['IsWord', '$cat =~ /^[LMN]/ or $code eq "005F"', ''],
['IsAlnum', '$cat =~ /^[LMN]/',''],
['IsAlpha', '$cat =~ /^[LM]/', ''],
# 0009: HORIZONTAL TABULATION
#
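
For readers who want to experiment with the three rules quoted above, here is a rough Python equivalent (not the Perl source itself). It assumes that $cat is the Unicode general category and $code the code point, which is how the excerpt reads; the function names are mine.

```python
import unicodedata


def major_category(ch: str) -> str:
    """First letter of the general category: L, M, N, P, S, Z or C."""
    return unicodedata.category(ch)[0]


def is_alpha(ch: str) -> bool:
    # IsAlpha: letters and marks
    return major_category(ch) in "LM"


def is_alnum(ch: str) -> bool:
    # IsAlnum: letters, marks and numbers
    return major_category(ch) in "LMN"


def is_word(ch: str) -> bool:
    # IsWord: IsAlnum plus U+005F, the underscore
    return is_alnum(ch) or ch == "\u005f"
```

The point of interest is that the classes are defined by category, so a katakana letter or a combining accent counts as alphabetic without any script-specific table.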
Dan Sugalski writes:
Should perl's regexes and other character comparison bits have an option
to consider different characters for the same thing as identical beasts?
I'm thinking in particular of the Katakana/Hiragana bits of Japanese,
but other languages may have the
At 01:05 PM 6/11/2001 -0700, Russ Allbery wrote:
Dan Sugalski writes:
Should perl's regexes and other character comparison bits have an option
to consider different characters for the same thing as identical beasts?
I'm thinking in particular of the Katakana/Hiragana bits
Dan Sugalski writes:
At 01:05 PM 6/11/2001 -0700, Russ Allbery wrote:
Dan Sugalski writes:
Should perl's regexes and other character comparison bits have an
option to consider different characters for the same thing as
identical beasts? I'm thinking in
At 01:14 PM 06-11-2001 -0700, Russ Allbery wrote:
Dan Sugalski writes:
At 01:05 PM 6/11/2001 -0700, Russ Allbery wrote:
Dan Sugalski writes:
Should perl's regexes and other character comparison bits have an
option to consider different characters for
On Mon, Jun 11, 2001 at 01:05:43PM -0700, Russ Allbery wrote:
Dan Sugalski writes:
Should perl's regexes and other character comparison bits have an option
to consider different characters for the same thing as identical beasts?
I'm thinking in particular of the
At 01:52 PM 6/11/2001 -0700, Damien Neil wrote:
In Japanese, ka and KA are two ways of writing the same syllable, in
much the same way that LATIN CAPITAL LETTER A and LATIN SMALL LETTER A
are. (Perhaps this is an argument for the /i modifier to apply to
more than just case?)
I don't think just
On Monday 11 June 2001 04:54 pm, Dan Sugalski wrote:
Would it, or should it, be possible to tell m// to treat Katakana
characters as the same as hiragana characters, in much the same way as
m//i treats UPPERCASE the same as lowercase? Canonicalization won't get
you that.
Yup, that's pretty
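
As a sketch of what a kana-insensitive match could look like, by analogy with m//i, here is a Python version (not Perl, and not a proposal from the thread). It relies on the fact that the katakana letters U+30A1..U+30F6 sit exactly 0x60 above the hiragana letters U+3041..U+3096; fold_kana and kana_insensitive_search are hypothetical names.

```python
import re

# Map each katakana letter onto the hiragana letter 0x60 below it.
# Characters outside U+30A1..U+30F6 are left untouched.
_KATA_TO_HIRA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def fold_kana(text: str) -> str:
    """Fold katakana to hiragana, much as lowercasing folds case."""
    return text.translate(_KATA_TO_HIRA)


def kana_insensitive_search(literal: str, text: str):
    """Search for a literal string, ignoring the katakana/hiragana difference."""
    return re.search(re.escape(fold_kana(literal)), fold_kana(text))
```

This folds both sides before matching, which is the preprocessing route; building the same equivalence into the regex engine itself is the alternative being debated.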