Re: snapshot perl@20845

2003-08-23 Thread Philip Newton
"for my $j (0,0x10)" should maybe "for my $j (0..0x10)" and "$utf8 .= ord($j+$i)" should almost certainly be something with chr() in it rather than ord(), and may also need $j*0x1 rather than plain $j. Copying perl-unicode for this reason.) Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: Unicode Perl Dependencies

2002-11-11 Thread Philip Newton
On Mon, 11 Nov 2002 14:10:43 -0500 (EST), [EMAIL PROTECTED] (Karl Matthias) wrote: > I'm trying to figure out which files we actually need from > that tree. I'd suggest you also post your message to [EMAIL PROTECTED] (they think about distributions, and the list was set up in response to someone

Re: perl pack function

2002-06-25 Thread Philip Newton
On Tue, 25 Jun 2002 16:19:05 +, [EMAIL PROTECTED] (Imran Khan) wrote: > Q: Does pack have to take a decimal integer - or can i somehow pass a hex > value to it? > ie something like: my $tmp_char= pack("U", 263A); pack('U') takes an integer. You can specify that integer in several ways, just
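A minimal sketch of the point that pack('U') only needs an integer, however that integer is written. The variable names are illustrative and not taken from the thread; note that a bare 263A is not a valid numeric literal in Perl, so the hex value has to be written as 0x263A or converted from a string with hex().

```perl
use strict;
use warnings;

# Three ways of handing the same integer to pack('U').
my $from_literal = pack("U", 0x263A);        # hex literal in the source code
my $from_decimal = pack("U", 9786);          # the same value written in decimal
my $from_string  = pack("U", hex("263A"));   # hex digits held in a string at run time

# All three are the single character U+263A.
printf "U+%04X\n", ord($from_string);
print "all equal\n"
    if $from_literal eq $from_decimal && $from_decimal eq $from_string;
```

The hex() form is the one to reach for when the digits arrive as data (from a file or the command line) rather than being typed into the program.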

Re: Change 16302: Provide the \N{U+HHHH} syntax before we forget.

2002-05-02 Thread Philip Newton
On Fri, 03 May 2002 11:36:46 +0900, [EMAIL PROTECTED] (Sadahiro Tomoyuki) wrote: > But Unicode 3.1 extends U+ notation beyond 0x. Ah! Thanks for the reference. So maybe that is no longer necessary... by the time 5.8.0 is out, Unicode 3.2 will have been current for a while. Or should we

Re: Change 16308: Encode tweak from Dan Kogai.

2002-05-01 Thread Philip Newton
On Wed, 1 May 2002 09:45:05 -0700, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote: > if (check & ENCODE_DIE_ON_ERR) { > Perl_croak( > - aTHX_ "\"\\N{U+%" UVxf "}\" does not map to %s", > + aTHX_ "\"\\x{%04" UVxf "}\" does not

Re: Change 16302: Provide the \N{U+HHHH} syntax before we forget.

2002-05-01 Thread Philip Newton
On Wed, 1 May 2002 07:00:05 -0700, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote: > Change 16302 by jhi@alpha on 2002/05/01 12:54:24 > > Provide the \N{U+HHHH} syntax before we forget. Do we also want to support U-HH? I seem to recall from somewhere that U+ went to U+ and that c

Re: [BIG PATCH] Encode docs

2002-04-21 Thread Philip Newton
On Mon, 22 Apr 2002 01:13:01 +0300, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote: > While browsing through the patch I noticed one funny nit: > > > -=item various UP-UX encodings > > +=item Various UP-UX encodings > > Unless it's Uewlett-Packard I think a slight tweak might be in order :-) Um,

Re: Text::Unidecode

2002-04-10 Thread Philip Newton
On Wed, 10 Apr 2002 02:01:34 -0600, [EMAIL PROTECTED] (Sean M. Burke) wrote: > Random question: Has anyone besides me had occasion to use Text::Unidecode? I've played around with it once or twice (and sent you the occasional patch, IIRC), but never used it in anger. Cheers, Philip

Re: [Encode] 1.32 released

2002-04-10 Thread Philip Newton
On Wed, 10 Apr 2002 05:30:29 +0900, [EMAIL PROTECTED] (Dan Kogai) wrote: > ! lib/Encode/Supported.pod > ! lib/Encode/Unicode.pm >POD revise by Philip Newton. This adds Philip to AUTHORS list. >Thank you for the exact quote of Douglas Adams :) >Message-Id: <

Re: [PATCH]s and questions [Encode] 1.30

2002-04-08 Thread Philip Newton
On Mon, 8 Apr 2002 15:24:57 +0400, [EMAIL PROTECTED] (Anton Tagunov) wrote: > 2) [PATCH], thanks to Philip Newton > > --- E:\anth\tmp\perl\b2\ext\Encode-1.30\lib\Encode\Supported.pod.orig Mon Apr 8 >14:06:12 2002 > +++ E:\anth\tmp\perl\b2\ext\Encode-1.30\lib\Encode\Supported

Re: A modest patch [Encode] 1.26

2002-04-07 Thread Philip Newton
On Mon, 8 Apr 2002 03:33:00 +0400, [EMAIL PROTECTED] (Anton Tagunov) wrote: > --- E:\anth\tmp\perl\b2\ext\Encode\lib\Encode\Supported.pod.origSun Apr 7 >20:39:07 2002 > +++ E:\anth\tmp\perl\b2\ext\Encode\lib\Encode\Supported.pod Mon Apr 8 03:22:03 >2002 > @@ -583,14 +583,15 @@ >

Re: Change 15689: What started as a small nit (the charnames test, nit found

2002-04-05 Thread Philip Newton
; > yy and yyFFFE.. > > I think this is the best choice: it makes no sense to control xFFFE and > x separately. That's what I would have thought, too. Thanks. Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: Change 15689: What started as a small nit (the charnames test, nit found

2002-04-05 Thread Philip Newton
On Tue, 2 Apr 2002 13:45:06 -0800, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote: > Change 15689 by jhi@alpha on 2002/04/02 20:35:13 > > What started as a small nit (the charnames test, nit found > be Hugo), ballooned a bit... the goal is Larry's wish that > illegal Unicode (such

Re: Encode::CJKguide

2002-03-27 Thread Philip Newton
On Wed, 27 Mar 2002 00:09:11 +, [EMAIL PROTECTED] (Markus Kuhn) wrote: > Dan Kogai wrote on 2002-03-26 22:35 UTC: > > And not all > > scripts are accepted or approved by Unicode Consortium. If you want to > > spell in Klingon, you have to find your own encoding. > > Klingon is a very bad exa

Re: So many Dans!

2002-03-27 Thread Philip Newton
On Mon, 25 Mar 2002 22:36:00 +0200, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote: > I simply have to retaliate with a piece of Finnish discussion: > > - Kokoo kokoon koko kokko. > - Koko kokkoko? > - Koko kokko. What is the first 'Kokoo'? http://websmart.kielikone.fi/ did not r

Re: So many Dans!

2002-03-27 Thread Philip Newton
On Tue, 26 Mar 2002 11:31:59 +0800, [EMAIL PROTECTED] (Autrijus Tang) wrote: > But the legendary Prof. Zhao Yuan-Ren (the first modern Chinese > linguist, translator extraordinarie) composed "The Tale of Mr. > Shi's Lion-eating endeavor" to demonstrate the impossibility of > reducing Chinese to p

Re: So many Dans!

2002-03-27 Thread Philip Newton
On Tue, 26 Mar 2002 05:27:08 +0900, [EMAIL PROTECTED] (Dan Kogai) wrote: >There is a famous sentence that goes like; > >KiSha no KiSha ga KiSha de KiSha shita. > >all KiSha are in two Kanjis but they are all spelled differently. "Your company's reporter returned to his company by t

Re: Reference Unicode Fonts

2002-02-16 Thread Philip Newton
On Fri, 15 Feb 2002 11:52:41 +0900, [EMAIL PROTECTED] (Dan Kogai) wrote: > * Reference Fixed Width (Misc TT? Courier Unicode? Monaco Unicode?) There was Everson Mono but that project appears to have stalled as well. Cheers, Philip

Re: FYI: yudit to handle Unicode text

2002-02-16 Thread Philip Newton
On Thu, 14 Feb 2002 23:16:10 +0200, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote: > Unifont stalled a bit since Roman Czyborra disappeared from the online > world, but someone picked up (forked) the project: > > http://dvdeug.dhis.org/unifont.html There's also a Yahoo! Group (mailing list) calle

Re: Unicode / Japanese and Transliteration problem

2002-02-02 Thread Philip Newton
On Thu, 31 Jan 2002 12:31:58 +, [EMAIL PROTECTED] (Jean-Michel Hiver) wrote: > Any ideas? I'm quite worried about the fact that I have a webapp that > works perfectly for Punjabi but that kind of screws Japanese up when > creating new documents and performing searches :-( Does it work for T

Re: Latin-1 to closest ascii equivalent characters

2001-12-22 Thread Philip Newton
On Thu, 20 Dec 2001 02:01:14 -0600, [EMAIL PROTECTED] (Michael A. Grady) wrote: > Does anyone already have Perl code to translate any Latin-1 accented > characters to the closet ascii equivalent character(s)? You may be able to press Sean M. Burke's Text::Unidecode into service. Cheers, Philip

Re: Starnge characters when displaying html files saved in UTF-8 format

2001-12-16 Thread Philip Newton
On Tue, 11 Dec 2001 13:24:46 -0800, [EMAIL PROTECTED] (Brian Stell) wrote: > The BOM is valid as the *first* character. I'm not sure what the > spec says about subsequent chars. As I understand it, 0xFEFF leads a double life: it's either "zero width no-break space" or "byte order mark". If it's

Re: Starnge characters when displaying html files saved in UTF-8 format

2001-12-16 Thread Philip Newton
On Tue, 11 Dec 2001 21:40:36 +, [EMAIL PROTECTED] (Jalal Kakavand) wrote:
> my $mydoc = shift ;
> # check BOM
> my $top1 = unpack("C", substr($mydoc, 0, 1));
> my $top2 = unpack("C", substr($mydoc, 1, 1));
> my $top3 = unpack("C", substr($mydoc, 2, 1));
>
> # UT
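The quoted code is cut off, so what follows is only a sketch of one way to do a three-byte BOM check, assuming the check is for the UTF-8 encoding of U+FEFF (the bytes EF BB BF). The helper names has_utf8_bom and strip_utf8_bom are hypothetical and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the byte string starts with EF BB BF,
# the UTF-8 encoding of U+FEFF.
sub has_utf8_bom {
    my ($bytes) = @_;
    return substr($bytes, 0, 3) eq "\xEF\xBB\xBF";
}

# Hypothetical helper: return a copy of the byte string without a leading BOM.
sub strip_utf8_bom {
    my ($bytes) = @_;
    $bytes =~ s/\A\xEF\xBB\xBF//;
    return $bytes;
}

my $doc = "\xEF\xBB\xBF<html></html>";
print has_utf8_bom($doc) ? "BOM found\n" : "no BOM\n";
print strip_utf8_bom($doc), "\n";
```

Comparing the first three bytes as one string avoids the three separate unpack("C", ...) calls, and the \A anchor ensures only a BOM at the very start is removed.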

Re: Unicode / Transliteration

2001-12-13 Thread Philip Newton
On Mon, 10 Dec 2001 16:49:18 +, [EMAIL PROTECTED] (Jean-Michel Hiver) wrote: > The way I got around this was to build a lossy table mapping > ISO-8859-15 to US ASCII, and then applying a few simple regexes so > that a sentence like "Le rêve du café" gets turned into > "le-reve-du-cafe

Re: perlunitut - feedback appreciated

2001-11-22 Thread Philip Newton
On Thu, 22 Nov 2001 22:45:54 -0500, in perl.unicode you wrote: > I've just tried using this in a form like: > > my $i = "263a" > my $smiley = "\x{$i}"; > > and was disappointed that it didn't work. No -- you need a literal. Just like reading in the string '050' from a file and treating it as a
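A short sketch of the distinction drawn above: \x{...} is resolved when the string literal is compiled, so hex digits that only exist in a variable have to be converted at run time instead. Using chr(hex(...)) is one common way to do that; the variable names follow the quoted example.

```perl
use strict;
use warnings;

my $i = "263a";

# Run-time conversion: hex() turns the digit string into an integer,
# chr() turns that integer into the character.
my $smiley = chr(hex($i));

# Compile-time form: the digits must appear literally in the source.
my $literal = "\x{263a}";

printf "U+%04X\n", ord($smiley);
print "same character\n" if $smiley eq $literal;
```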

Re: UTF-16 -> UTF-8

2001-11-22 Thread Philip Newton
On Wed, 21 Nov 2001 22:04:52 -, in perl.unicode you wrote: > When adding the unicode value to the Sql string in > $sql="INSERT INTO Tipo_Referencia ( Descricao ) > VALUES ('$palavra_utf16');"; > there is an implicit conversion from the Unicode::String object > to a common Pe

Re: UTF-16 -> UTF-8

2001-11-21 Thread Philip Newton
On Wed, 21 Nov 2001 16:41:46 + (GMT), in perl.unicode you wrote: > Thanks - MS Mincho looks interesting. [...] > Also - the glyphs looked slightly different : do you know if it's a big- > or little-endian UTF-16 font or a UTF-8 font ? > Ideally I'd like to use a UTF-8 font. That doesn't make

Re: UTF-16 -> UTF-8

2001-11-21 Thread Philip Newton
On Wed, 21 Nov 2001 16:05:06 -, in perl.unicode you wrote: > now I can write to the DB, but the values are not properly recognized. If > you try to open the file I attached to my prior mail in Word, you'll > see exactly what I see in the DB record. In Word, I see ĨĩŨũ, but when I open it in

Re: UTF-16 -> UTF-8

2001-11-21 Thread Philip Newton
On Wed, 21 Nov 2001 16:34:48 -, in perl.unicode you wrote: > Don't lose more time over this. It seems there is some kind of problem with > the recognition of the encoding from other Office apps. > Its rather surprising that Notepad regosnizes the characters properly and > Word and Access don'

Re: UTF-16 -> UTF-8

2001-11-21 Thread Philip Newton
On Wed, 21 Nov 2001 15:14:38 -, in perl.unicode you wrote: > Still can't write to the BD though. The append SQL instruction has no effect. It looks wrong to me, too. > use Unicode::String qw(utf8 latin1); You don't need to import 'latin1' if you're not going to use it. (It's not going to h

Re: UTF-16 -> UTF-8

2001-11-20 Thread Philip Newton
On Wed, 21 Nov 2001 00:22:04 -, in perl.unicode you wrote: > Thank you for your help. Hope it was of some help :) > > But you said you wanted to convert from UTF-8 to UTF-16. So you probably > > want something like > > > > $palavra_objeito = utf8($_); > > $palavra_em_utf16 = $palavr

Re: UTF-16 -> UTF-8

2001-11-20 Thread Philip Newton
On Tue, 20 Nov 2001 16:49:38 + (GMT), in perl.unicode you wrote:
> binmode STDIN;
> while(<>)
> {
> $u = utf16($_);
> $u->byteswap2 if defined $swap; # $swap defined based on command line options
This looks strange. The way I read the manpage, byteswap2 is meant to be called as a functio

Re: UTF-16 -> UTF-8

2001-11-20 Thread Philip Newton
On Tue, 20 Nov 2001 16:35:25 -, in perl.unicode you wrote: > open(FICH1,"fich1.txt")||die"Nao foi possivel abrir o ficheiro fich1.txt"; > open(FICH3,">fich3.txt")||die"Nao foi possivel abrir o ficheiro fich3.txt"; Good that you check for success, but you should also include the reason -- it'
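A minimal sketch of including the reason for a failed open, using Perl's $! variable, which holds the system error message. The wrapper open_or_die and the non-existent path are illustrative assumptions; the message text is the one from the quoted code.

```perl
use strict;
use warnings;

# Hypothetical wrapper: open a file or die with the file name and the
# system's reason for the failure ($!).
sub open_or_die {
    my ($mode, $path) = @_;
    open(my $fh, $mode, $path)
        or die "Nao foi possivel abrir o ficheiro $path: $!\n";
    return $fh;
}

# Deliberately try a path that should not exist, to show the message.
my $fh = eval { open_or_die('<', 'no-such-dir/fich1.txt') };
print "Error: $@" unless $fh;
```

Using the low-precedence "or" rather than "||" also avoids surprises if the parentheses around open's arguments are ever dropped.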

Re: UTF-16 -> UTF-8

2001-11-20 Thread Philip Newton
On Tue, 20 Nov 2001 15:02:53 -, in perl.unicode you wrote: > I saw your reference to the use of Unicode::String. Excuse me for > asking, but does it work? I believe so. It's been a while since I used it, but I think it did what I wanted it to back then. > We've tried to several funtions fro

Re: UTF-16 -> UTF-8

2001-11-20 Thread Philip Newton
On Tue, 20 Nov 2001 15:59:07 + (GMT), in perl.unicode you wrote: > b. One file worked fine, but for another it converted the Chinese > data to different Chinese data. Did you see any correlation between the code points? Like, say, turning 4567 into 6745? Can you give an example of "before"

Re: UTF-16 -> UTF-8

2001-11-16 Thread Philip Newton
On Fri, 16 Nov 2001 17:41:52 + (GMT), in perl.unicode you wrote: > I'm wanting to convert a file from UTF16 into UTF8. I believe I've > identified the tools to do it and all but installed them, apart > from Unicode::Map8 (v0.10). > > Can anyone help me with the build errors (below) or advise

Re: perlunitut - feedback appreciated

2001-11-11 Thread Philip Newton
On Sun, 11 Nov 2001 12:57:27 -0800, in perl.unicode you wrote: > ISO Latin-1 characters encoded as 10-FF in single bytes are not Unicode. Hm? ISO Latin-1 characters from 00 to 7F encoded in single bytes represent the same Unicode characters as those bytes interpreted as UTF-8, simply because ASC
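A small sketch of the claim about the 00 to 7F range, assuming the core Encode module is available (it ships with Perl from 5.8 onwards, so it postdates this message). It encodes one ASCII character and one Latin-1 character above 7F both ways and prints the resulting bytes; hex_bytes is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show a byte string as lowercase hex digits.
sub hex_bytes { return unpack("H*", $_[0]) }

for my $ch ("A", "\x{E9}") {    # U+0041 and U+00E9 (e with acute)
    my $latin1 = hex_bytes(encode('ISO-8859-1', $ch));
    my $utf8   = hex_bytes(encode('UTF-8', $ch));
    printf "U+%04X  latin1=%s  utf8=%s\n", ord($ch), $latin1, $utf8;
}
```

For U+0041 both encodings give the single byte 41; for U+00E9 Latin-1 gives e9 while UTF-8 gives the two bytes c3 a9, which is where the two diverge.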

Re: UTF-8 support for the ancient shell toys

2001-11-04 Thread Philip Newton
On Sat, 3 Nov 2001 18:16:32 + (GMT), in perl.unicode you wrote: > In practice, Perl has long ago replaced grep, sort, tr, awk, for all but > sentimental reasons. I'd like to disagree with 'sort'. In some cases, at least, the system-supplied sort(1) can do disk-based (merge?) sorting, enablin

Re: \x (backslash x) weirdness in perl 5.6.0

2001-01-15 Thread Philip Newton
this case, for example, `perldoc perlop` is a good place to start, specifically the section on "Quote and Quote-like Operators" and later on on "Interpolation". Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: Source data for perl encodings

2001-01-10 Thread Philip Newton
me. > > Yup. /^(?:ISO\W?)?(?:8859|Latin)-?1$/i Or even /^(?:ISO[\W_]?)?(?:8859|Latin)[-_]?1$/i Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>
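A quick sketch comparing the two patterns above against a handful of spellings. The list of names is made up for illustration; only the two regular expressions come from the message.

```perl
use strict;
use warnings;

my $strict  = qr/^(?:ISO\W?)?(?:8859|Latin)-?1$/i;
my $relaxed = qr/^(?:ISO[\W_]?)?(?:8859|Latin)[-_]?1$/i;

# Illustrative spellings, including two that should not match.
my @names = qw(ISO-8859-1 iso_8859_1 Latin1 latin-1 ISO8859-1 ISO-8859-15 Latin2);

for my $name (@names) {
    printf "%-12s first=%-3s second=%s\n",
        $name,
        ($name =~ $strict  ? "yes" : "no"),
        ($name =~ $relaxed ? "yes" : "no");
}
```

The only difference between the two is the underscore: \W does not match "_", so a spelling such as iso_8859_1 is accepted by the second pattern but not the first.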

Re: Source data for perl encodings

2001-01-09 Thread Philip Newton
f Unicode compliance documented in the Unicode book. I don't have it handy here, but I believe you could have a compliance level that doesn't know about BIDI, or doesn't know about compatibility decompositions, or something like that. Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: Cyrillic and Unicode

2000-12-05 Thread Philip Newton
n covering all cyrillic minority languages in Unicode and is constantly > looking for reference material and documentation. Thanks. Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]> I appreciate copies of replies to my messages to Perl5 lists.

Re: perl@7979

2000-12-05 Thread Philip Newton
yrillic-using languages (say, Russian, Belorussian, Ukrainian, Macedonian, and Serbian), then it's not complete. It also needs to do Bashkir, Azerbaidjani, Khanti, &c. Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: .enc docs comments [was Re: Encode's .enc files and a question]

2000-10-27 Thread Philip Newton
t that's reserved here. So if I'm translating a string containing NULs, those characters will be treated as "not-a-character"? Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: Encode's .enc files and a question

2000-10-27 Thread Philip Newton
FF as a character as long as it maps that character to something other than 0x or 0xFFFE when converting to Unicode. Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: Encode's .enc files and a question

2000-10-26 Thread Philip Newton
d strings? For example, if I'm processing UTF-8 text in C, "foo" is equivalent to 0066 006F 006F . In which case, it's very much in use already. Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: Encode's .enc files and a question

2000-10-25 Thread Philip Newton
he letters in the transcoding nroff-to-pod, which is bad here. Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: Encode's .enc files and a question

2000-10-25 Thread Philip Newton
kely, since the last line should then start "0030003100320033" -- that is, F0 .. F9 should map to U+0030 .. U+0039, the digits. I don't remember the code points for letters, but I'm fairly sure the digits fall in the range F0 .. F9 in all flavours of EBCDIC. You have U+0031 at position 90. Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: help: utf8 to multiple encodings

2000-10-18 Thread Philip Newton
". For example, for Russian -- KOI8? Windows codepage? Mac? Unicode? ISO-8859-x? Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: [EXPERIMENTAL] 1st draft of Encode

2000-09-14 Thread Philip Newton
On 14 Sep 2000, at 12:35, Dominic Dunlop wrote: > At 18:00 +0200 2000-09-13, Philip Newton wrote: > >What's Perl's take on characters where ord($c) > 0x, anyway? > > It seems to Just Work, as this one-ish-liner shows: [snip] In that case, if we want to go switch

Re: UCS-2 and UTF-16 [was Re: Encode, take five]

2000-09-14 Thread Philip Newton
On 13 Sep 2000, at 11:57, Mark Leisher wrote: > True, UTF-16 is not known as UCS-2. However, UTF-16 still consists > of 2-byte chunks. It is essentially UCS-2 plus high and low > surrogates (see the Unicode Standard 3.0 page 19). Yes, but if you just have a high surrogate, you can't do much w

Re: Encode, take five

2000-09-13 Thread Philip Newton
ised. > are encoded in Unicode with code points above 65,535 the > distinction between UCS-2 and UTF-16 is mostly academic at this > point in time. At this point in time, yes. I suppose I just wanted to point out that this *may* change, at some unspecified (and maybe even distant) point in the future. Cheers, Philip -- Philip Newton <[EMAIL PROTECTED]>

Re: Encode, take five

2000-09-13 Thread Philip Newton
On 12 Sep 2000, at 18:42, Jarkko Hietaniemi wrote: > UTF-16 is also known as UCS-2, 16 bit or 2-byte chunks, As I understand it, that's not true -- UTF-16 is 2-byte *or* 4-byte chunks, since UTF-16 contains surrogates (high-surrogate + low- surrogate [or the other way around?] = 1 character, re
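A worked sketch of the surrogate arithmetic, as defined for UTF-16 by the Unicode Standard: the high surrogate (D800 to DBFF) comes first and the low surrogate (DC00 to DFFF) second. The function names and the sample code point U+1D11E are chosen for illustration.

```perl
use strict;
use warnings;

# Split a supplementary code point (U+10000 .. U+10FFFF) into a
# high/low surrogate pair.
sub to_surrogates {
    my ($cp) = @_;
    die "not a supplementary code point\n"
        if $cp < 0x10000 || $cp > 0x10FFFF;
    my $v = $cp - 0x10000;                  # 20 significant bits remain
    return (0xD800 + ($v >> 10),            # top 10 bits  -> high surrogate
            0xDC00 + ($v & 0x3FF));         # low 10 bits  -> low surrogate
}

# Recombine a high/low surrogate pair into one code point.
sub from_surrogates {
    my ($hi, $lo) = @_;
    return 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);
}

my ($hi, $lo) = to_surrogates(0x1D11E);
printf "U+1D11E -> %04X %04X\n", $hi, $lo;
printf "back    -> U+%X\n", from_surrogates($hi, $lo);
```

This is why UTF-16 is a variable-width encoding: code points up to U+FFFF take one 16-bit unit, while anything above takes two.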

Re: [EXPERIMENTAL] 1st draft of Encode

2000-09-13 Thread Philip Newton
On 12 Sep 2000, at 11:57, Jarkko Hietaniemi wrote: > I would go for UCS-2 (UTF-16) as soon as possible as the preferred > internal encoding. You know, of course, that UCS-2 ne UTF-16 (specifically, surrogates). What's Perl's take on characters where ord($c) > 0x, anyway? (These two issues