"for my $j (0,0x10)" should maybe be
"for my $j (0..0x10)" and "$utf8 .= ord($j+$i)" should almost certainly
be something with chr() in it rather than ord(), and may also need
$j*0x10000 rather than plain $j. Copying perl-unicode for this reason.)
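A minimal sketch of the corrected loop, assuming the intent was one character per Unicode plane (so $j steps through planes 0 to 16 and is multiplied by 0x10000). Note that (0,0x10) is only the two-element list 0 and 16, while (0..0x10) is the range 0 through 16. The offset $i = 0x263A is a hypothetical value chosen for illustration.

```perl
use strict;
use warnings;

my $i    = 0x263A;   # hypothetical offset within each plane
my $utf8 = '';       # name kept from the quoted code; it holds characters, not bytes

# (0..0x10) is the range 0, 1, ..., 16; (0,0x10) would be just 0 and 16.
for my $j (0 .. 0x10) {
    # chr() maps a code point to a character; ord() goes the other way.
    $utf8 .= chr($j * 0x10000 + $i);
}

printf "%d characters, from U+%04X to U+%X\n",
    length $utf8, ord(substr($utf8, 0, 1)), ord(substr($utf8, -1));
```

This prints "17 characters, from U+263A to U+10263A".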
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
On Mon, 11 Nov 2002 14:10:43 -0500 (EST), [EMAIL PROTECTED]
(Karl Matthias) wrote:
> I'm trying to figure out which files we actually need from
> that tree.
I'd suggest you also post your message to [EMAIL PROTECTED] (they think
about distributions, and the list was set up in response to someone
On Tue, 25 Jun 2002 16:19:05 +, [EMAIL PROTECTED] (Imran Khan)
wrote:
> Q: Does pack have to take a decimal integer - or can i somehow pass a hex
> value to it?
> ie something like: my $tmp_char= pack("U", 263A);
pack('U') takes an integer. You can specify that integer in several
ways, just
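A minimal sketch of a few of those ways. Note that a bare 263A (as in the question) won't parse; it needs the 0x prefix, or hex() if the digits arrive as a string.

```perl
use strict;
use warnings;

# All of these hand pack('U') the same integer, 9786 (0x263A, WHITE SMILING FACE).
my $from_hex_literal = pack("U", 0x263A);       # hex literal, written without quotes
my $from_hex_string  = pack("U", hex("263A"));  # hex() converts a hex string at run time
my $from_decimal     = pack("U", 9786);         # the same number in decimal
my $from_chr         = chr(0x263A);             # chr() gives the same one-character string

print "same character\n"
    if $from_hex_literal eq $from_hex_string
    && $from_hex_string  eq $from_decimal
    && $from_decimal     eq $from_chr;
```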
On Fri, 03 May 2002 11:36:46 +0900, [EMAIL PROTECTED] (Sadahiro
Tomoyuki) wrote:
> But Unicode 3.1 extends U+ notation beyond 0xFFFF.
Ah! Thanks for the reference.
So maybe that is no longer necessary... by the time 5.8.0 is out,
Unicode 3.2 will have been current for a while. Or should we
On Wed, 1 May 2002 09:45:05 -0700, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote:
> if (check & ENCODE_DIE_ON_ERR) {
> Perl_croak(
> - aTHX_ "\"\\N{U+%" UVxf "}\" does not map to %s",
> + aTHX_ "\"\\x{%04" UVxf "}\" does not
On Wed, 1 May 2002 07:00:05 -0700, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote:
> Change 16302 by jhi@alpha on 2002/05/01 12:54:24
>
> Provide the \N{U+} syntax before we forget.
Do we also want to support U-HH? I seem to recall from somewhere
that U+ went to U+ and that c
On Mon, 22 Apr 2002 01:13:01 +0300, [EMAIL PROTECTED] (Jarkko Hietaniemi)
wrote:
> While browsing through the patch I noticed one funny nit:
>
> > -=item various UP-UX encodings
> > +=item Various UP-UX encodings
>
> Unless it's Uewlett-Packard I think a slight tweak might be in order :-)
Um,
On Wed, 10 Apr 2002 02:01:34 -0600, [EMAIL PROTECTED] (Sean M. Burke)
wrote:
> Random question: Has anyone besides me had occasion to use Text::Unidecode?
I've played around with it once or twice (and sent you the occasional
patch, IIRC), but never used it in anger.
Cheers,
Philip
On Wed, 10 Apr 2002 05:30:29 +0900, [EMAIL PROTECTED] (Dan Kogai)
wrote:
> ! lib/Encode/Supported.pod
> ! lib/Encode/Unicode.pm
>POD revise by Philip Newton. This adds Philip to AUTHORS list.
>Thank you for the exact quote of Douglas Adams :)
>Message-Id: &l
On Mon, 8 Apr 2002 15:24:57 +0400, [EMAIL PROTECTED] (Anton Tagunov)
wrote:
> 2) [PATCH], thanks to Philip Newton
>
> --- E:\anth\tmp\perl\b2\ext\Encode-1.30\lib\Encode\Supported.pod.orig Mon Apr 8
>14:06:12 2002
> +++ E:\anth\tmp\perl\b2\ext\Encode-1.30\lib\Encode\Supported
On Mon, 8 Apr 2002 03:33:00 +0400, [EMAIL PROTECTED] (Anton Tagunov)
wrote:
> --- E:\anth\tmp\perl\b2\ext\Encode\lib\Encode\Supported.pod.orig Sun Apr 7
>20:39:07 2002
> +++ E:\anth\tmp\perl\b2\ext\Encode\lib\Encode\Supported.pod Mon Apr 8 03:22:03
>2002
> @@ -583,14 +583,15 @@
>
> > yyFFFF and yyFFFE..
>
> I think this is the best choice: it makes no sense to control xFFFE and
> xFFFF separately.
That's what I would have thought, too. Thanks.
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
On Tue, 2 Apr 2002 13:45:06 -0800, [EMAIL PROTECTED] (Jarkko Hietaniemi) wrote:
> Change 15689 by jhi@alpha on 2002/04/02 20:35:13
>
> What started as a small nit (the charnames test, nit found
> by Hugo), ballooned a bit... the goal is Larry's wish that
> illegal Unicode (such
On Wed, 27 Mar 2002 00:09:11 +, [EMAIL PROTECTED] (Markus
Kuhn) wrote:
> Dan Kogai wrote on 2002-03-26 22:35 UTC:
> > And not all
> > scripts are accepted or approved by Unicode Consortium. If you want to
> > spell in Klingon, you have to find your own encoding.
>
> Klingon is a very bad exa
On Mon, 25 Mar 2002 22:36:00 +0200, [EMAIL PROTECTED] (Jarkko Hietaniemi)
wrote:
> I simply have to retaliate with a piece of Finnish discussion:
>
> - Kokoo kokoon koko kokko.
> - Koko kokkoko?
> - Koko kokko.
What is the first 'Kokoo'? http://websmart.kielikone.fi/ did not
r
On Tue, 26 Mar 2002 11:31:59 +0800, [EMAIL PROTECTED] (Autrijus
Tang) wrote:
> But the legendary Prof. Zhao Yuan-Ren (the first modern Chinese
> linguist, translator extraordinaire) composed "The Tale of Mr.
> Shi's Lion-eating endeavor" to demonstrate the impossibility of
> reducing Chinese to p
On Tue, 26 Mar 2002 05:27:08 +0900, [EMAIL PROTECTED] (Dan Kogai)
wrote:
>There is a famous sentence that goes like:
>
>KiSha no KiSha ga KiSha de KiSha shita.
>
>all KiSha are in two Kanjis but they are all spelled differently.
"Your company's reporter returned to his company by t
On Fri, 15 Feb 2002 11:52:41 +0900, [EMAIL PROTECTED] (Dan Kogai)
wrote:
> * Reference Fixed Width (Misc TT? Courier Unicode? Monaco Unicode?)
There was Everson Mono but that project appears to have stalled as well.
Cheers,
Philip
On Thu, 14 Feb 2002 23:16:10 +0200, [EMAIL PROTECTED] (Jarkko Hietaniemi)
wrote:
> Unifont stalled a bit since Roman Czyborra disappeared from the online
> world, but someone picked up (forked) the project:
>
> http://dvdeug.dhis.org/unifont.html
There's also a Yahoo! Group (mailing list) calle
On Thu, 31 Jan 2002 12:31:58 +, [EMAIL PROTECTED] (Jean-Michel Hiver)
wrote:
> Any ideas? I'm quite worried about the fact that I have a webapp that
> works perfectly for Punjabi but that kind of screws Japanese up when
> creating new documents and performing searches :-(
Does it work for T
On Thu, 20 Dec 2001 02:01:14 -0600, [EMAIL PROTECTED] (Michael A. Grady) wrote:
> Does anyone already have Perl code to translate any Latin-1 accented
> characters to the closest ASCII equivalent character(s)?
You may be able to press Sean M. Burke's Text::Unidecode into service.
Cheers,
Philip
On Tue, 11 Dec 2001 13:24:46 -0800, [EMAIL PROTECTED] (Brian Stell)
wrote:
> The BOM is valid as the *first* character. I'm not sure what the
> spec says about subsequent chars.
As I understand it, 0xFEFF leads a double life: it's either "zero width
no-break space" or "byte order mark". If it's
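To illustrate that double life, a minimal sketch using the core Encode module (the sample bytes are made up): a U+FEFF at the very start of the text is treated as a byte order mark and dropped, while one further in is left alone as ZERO WIDTH NO-BREAK SPACE. decode('UTF-8', ...) does not strip a BOM by itself, hence the explicit substitution.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: BOM, "ab", an embedded U+FEFF, "c" (all UTF-8 encoded).
my $bytes = "\xEF\xBB\xBF" . "ab" . "\xEF\xBB\xBF" . "c";
my $text  = decode('UTF-8', $bytes);   # 5 characters, both U+FEFF still present

# Only a U+FEFF at the very start acts as a byte order mark; strip that one.
$text =~ s/\A\x{FEFF}//;

# Any later U+FEFF is ZERO WIDTH NO-BREAK SPACE and stays in the text.
printf "%d characters left: %s\n", length $text,
    join ' ', map { sprintf('U+%04X', ord($_)) } split //, $text;
```

This prints "4 characters left: U+0061 U+0062 U+FEFF U+0063".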
On Tue, 11 Dec 2001 21:40:36 +, [EMAIL PROTECTED] (Jalal Kakavand)
wrote:
> my $mydoc = shift ;
> # check BOM
> my $top1 = unpack("C", substr($mydoc, 0, 1));
> my $top2 = unpack("C", substr($mydoc, 1, 1));
> my $top3 = unpack("C", substr($mydoc, 2, 1));
>
> # UT
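Rather than unpacking the first bytes one at a time, the leading bytes can be compared against each known BOM. A minimal sketch; the name sniff_bom is hypothetical, and it expects raw bytes as read from a file opened without an :encoding layer. The four-byte UTF-32 marks are tested first because the UTF-32LE mark begins with the UTF-16LE one.

```perl
use strict;
use warnings;

# Known byte order marks, longest first so that UTF-32LE (FF FE 00 00)
# is not mistaken for UTF-16LE (FF FE).
my @BOMS = (
    [ 'UTF-32BE' => "\x00\x00\xFE\xFF" ],
    [ 'UTF-32LE' => "\xFF\xFE\x00\x00" ],
    [ 'UTF-8'    => "\xEF\xBB\xBF"     ],
    [ 'UTF-16BE' => "\xFE\xFF"         ],
    [ 'UTF-16LE' => "\xFF\xFE"         ],
);

# Return the encoding name implied by a leading BOM, or undef if there is none.
sub sniff_bom {
    my ($bytes) = @_;
    for my $entry (@BOMS) {
        my ($name, $bom) = @$entry;
        return $name if substr($bytes, 0, length $bom) eq $bom;
    }
    return;
}

for my $sample ("\xEF\xBB\xBFhello", "plain text") {
    my $found = sniff_bom($sample);
    print defined $found ? "$found\n" : "no BOM\n";
}
```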
On Mon, 10 Dec 2001 16:49:18 +, [EMAIL PROTECTED] (Jean-Michel Hiver)
wrote:
> The way I got around this was to build a lossy table mapping
> ISO-8859-15 to US ASCII, and then applying a few simple regexes so
> that a sentence like "Le rêve du café" gets turned into
> "le-reve-du-cafe
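A different route to the same result, sketched with the core Unicode::Normalize module instead of a hand-built ISO-8859-15 table (the slugify name is hypothetical): decompose accented letters with NFD, drop the combining marks, then collapse everything else into hyphens. Letters with no decomposition, such as œ or ß, are not transliterated and simply turn into hyphens, so a table (or Text::Unidecode) is still needed for those.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

sub slugify {
    my ($s) = @_;
    $s = NFD($s);              # "é" becomes "e" followed by U+0301 COMBINING ACUTE ACCENT
    $s =~ s/\p{Mn}+//g;        # remove the combining (non-spacing) marks
    $s = lc $s;
    $s =~ s/[^a-z0-9]+/-/g;    # any other run of characters becomes a single hyphen
    $s =~ s/\A-+|-+\z//g;      # trim leading and trailing hyphens
    return $s;
}

print slugify("Le r\x{EA}ve du caf\x{E9}"), "\n";   # le-reve-du-cafe
```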
On Thu, 22 Nov 2001 22:45:54 -0500, in perl.unicode you wrote:
> I've just tried using this in a form like:
>
> my $i = "263a"
> my $smiley = "\x{$i}";
>
> and was disappointed that it didn't work.
No -- you need a literal. Just like reading in the string '050' from a
file and treating it as a
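Since \x{...} only takes literal hex digits written in the source, hex digits held in a variable have to be converted explicitly: hex() turns them into a number and chr() turns that into a character. A minimal sketch (the quoted snippet is also missing a semicolon after "263a"):

```perl
use strict;
use warnings;

my $i      = "263a";          # hex digits held in a variable, e.g. read from a file
my $smiley = chr(hex($i));    # hex("263a") is 9786; chr(9786) is U+263A WHITE SMILING FACE

binmode STDOUT, ':encoding(UTF-8)';   # avoid "Wide character" warnings when printing
printf "U+%04X %s\n", ord $smiley, $smiley;
```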
On Wed, 21 Nov 2001 22:04:52 -, in perl.unicode you wrote:
> When adding the unicode value to the Sql string in
> $sql="INSERT INTO Tipo_Referencia ( Descricao )
> VALUES ('$palavra_utf16');";
> there is an implicit conversion from the Unicode::String object
> to a common Pe
On Wed, 21 Nov 2001 16:41:46 + (GMT), in perl.unicode you wrote:
> Thanks - MS Mincho looks interesting.
[...]
> Also - the glyphs looked slightly different : do you know if it's a big-
> or little-endian UTF-16 font or a UTF-8 font ?
> Ideally I'd like to use a UTF-8 font.
That doesn't make
On Wed, 21 Nov 2001 16:05:06 -, in perl.unicode you wrote:
> now I can write to the DB, but the values are not properly recognized. If
> you try to open the file I attached to my prior mail in Word, you'll
> see exactly what I see in the DB record.
In Word, I see ĨĩŨũ, but when I open it in
On Wed, 21 Nov 2001 16:34:48 -, in perl.unicode you wrote:
> Don't lose more time over this. It seems there is some kind of problem with
> the recognition of the encoding from other Office apps.
> It's rather surprising that Notepad recognizes the characters properly and
> Word and Access don'
On Wed, 21 Nov 2001 15:14:38 -, in perl.unicode you wrote:
> Still can't write to the DB though. The append SQL instruction has no effect.
It looks wrong to me, too.
> use Unicode::String qw(utf8 latin1);
You don't need to import 'latin1' if you're not going to use it. (It's
not going to h
On Wed, 21 Nov 2001 00:22:04 -, in perl.unicode you wrote:
> Thank you for your help.
Hope it was of some help :)
> > But you said you wanted to convert from UTF-8 to UTF-16. So you probably
> > want something like
> >
> > $palavra_objeito = utf8($_);
> > $palavra_em_utf16 = $palavr
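For comparison, the same UTF-8 to UTF-16 conversion done with the core Encode module (bundled since Perl 5.8) instead of Unicode::String; a minimal sketch in which the sample bytes are made up. 'UTF-16BE' writes no BOM, whereas plain 'UTF-16' prepends a big-endian one (FE FF).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $utf8_bytes = "caf\xC3\xA9";                  # "café" as UTF-8 bytes (sample data)
my $chars      = decode('UTF-8', $utf8_bytes);   # bytes -> Perl character string
my $utf16      = encode('UTF-16BE', $chars);     # characters -> UTF-16 big-endian bytes

print unpack('H*', $utf16), "\n";                # 00630061006600e9
```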
On Tue, 20 Nov 2001 16:49:38 + (GMT), in perl.unicode you wrote:
> binmode STDIN;
> while(<>)
> {
> $u = utf16($_);
> $u->byteswap2 if defined $swap; # $swap defined based on command line options
This looks strange. The way I read the manpage, byteswap2 is meant to be
called as a functio
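If all that's needed is to flip the byte order of UTF-16 data, core pack/unpack can do it without Unicode::String: read each 16-bit unit little-endian ('v') and write it back big-endian ('n'). A minimal sketch with made-up input; it assumes the data has an even number of bytes. The swap is its own inverse, so the same line converts big-endian to little-endian.

```perl
use strict;
use warnings;

# U+263A U+0041 in UTF-16LE (sample data).
my $le = "\x3A\x26\x41\x00";

# Each 16-bit unit is read as little-endian and rewritten as big-endian.
my $be = pack('n*', unpack('v*', $le));

print unpack('H*', $be), "\n";   # 263a0041
```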
On Tue, 20 Nov 2001 16:35:25 -, in perl.unicode you wrote:
> open(FICH1,"fich1.txt")||die"Nao foi possivel abrir o ficheiro fich1.txt";
> open(FICH3,">fich3.txt")||die"Nao foi possivel abrir o ficheiro fich3.txt";
Good that you check for success, but you should also include the reason
-- it'
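A minimal sketch of what that looks like, using the three-argument open and a lexical filehandle, with the die message in English (the quoted messages say "Could not open the file fich1.txt"). The file name is a hypothetical one chosen so that the error path runs.

```perl
use strict;
use warnings;

my $file = 'no-such-file.txt';   # hypothetical name, chosen so that open fails

my $ok = eval {
    # $! holds the operating system's reason, e.g. "No such file or directory".
    open(my $fh, '<', $file) or die "Could not open $file: $!\n";
    1;
};

print "Error: $@" unless $ok;
```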
On Tue, 20 Nov 2001 15:02:53 -, in perl.unicode you wrote:
> I saw your reference to the use of Unicode::String. Excuse me for
> asking, but does it work?
I believe so. It's been a while since I used it, but I think it did what
I wanted it to back then.
> We've tried to several functions fro
On Tue, 20 Nov 2001 15:59:07 + (GMT), in perl.unicode you wrote:
> b. One file worked fine, but for another it converted the Chinese
> data to different Chinese data.
Did you see any correlation between the code points? Like, say, turning
4567 into 6745?
Can you give an example of "before"
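That kind of correlation is what a byte-order mix-up produces: UTF-16 data written in one byte order and read in the other has each 16-bit unit turned around, and for CJK text the result is often another valid ideograph. A sketch with the core Encode module, using the 4567 example:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $original = "\x{4567}";                       # a CJK ideograph
my $bytes    = encode('UTF-16BE', $original);    # bytes 45 67
my $misread  = decode('UTF-16LE', $bytes);       # the same bytes read in the wrong order

printf "U+%04X became U+%04X\n", ord $original, ord $misread;   # U+4567 became U+6745
```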
On Fri, 16 Nov 2001 17:41:52 + (GMT), in perl.unicode you wrote:
> I'm wanting to convert a file from UTF16 into UTF8. I believe I've
> identified the tools to do it and all but installed them, apart
> from Unicode::Map8 (v0.10).
>
> Can anyone help me with the build errors (below) or advise
On Sun, 11 Nov 2001 12:57:27 -0800, in perl.unicode you wrote:
> ISO Latin-1 characters encoded as 10-FF in single bytes are not Unicode.
Hm? ISO Latin-1 characters from 00 to 7F encoded in single bytes
represent the same Unicode characters as those bytes interpreted as
UTF-8, simply because ASC
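A quick check with the core Encode module: for the characters 00 to 7F, the ISO-8859-1 and UTF-8 encodings produce identical bytes; they only diverge from 80 up (é, U+00E9, is one byte in Latin-1 but two in UTF-8).

```perl
use strict;
use warnings;
use Encode qw(encode);

my $ascii = join '', map { chr($_) } 0x00 .. 0x7F;   # every character from 00 to 7F

my $same = encode('ISO-8859-1', $ascii) eq encode('UTF-8', $ascii);
print $same ? "00-7F: identical bytes\n" : "00-7F: different\n";

my $e_acute = chr(0xE9);
printf "U+00E9: Latin-1 %s, UTF-8 %s\n",
    unpack('H*', encode('ISO-8859-1', $e_acute)),   # e9
    unpack('H*', encode('UTF-8', $e_acute));        # c3a9
```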
On Sat, 3 Nov 2001 18:16:32 + (GMT), in perl.unicode you wrote:
> In practice, Perl has long ago replaced grep, sort, tr, awk, for all but
> sentimental reasons.
I'd like to disagree with 'sort'. In some cases, at least, the
system-supplied sort(1) can do disk-based (merge?) sorting, enablin
this case, for example, `perldoc perlop` is a good place to
start, specifically the section on "Quote and Quote-like Operators" and,
later on, the one on "Interpolation".
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
me.
>
> Yup. /^(?:ISO\W?)?(?:8859|Latin)-?1$/i
Or even /^(?:ISO[\W_]?)?(?:8859|Latin)[-_]?1$/i
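A quick way to compare the two patterns against a few spellings (the list of names is just a sample); the second one also accepts underscores, as in ISO_8859-1 or iso8859_1.

```perl
use strict;
use warnings;

my $old = qr/^(?:ISO\W?)?(?:8859|Latin)-?1$/i;
my $new = qr/^(?:ISO[\W_]?)?(?:8859|Latin)[-_]?1$/i;

for my $name ('ISO-8859-1', 'ISO_8859-1', 'iso8859_1', 'latin1', 'Latin-1', 'ISO-8859-15') {
    printf "%-12s old:%-3s new:%s\n", $name,
        ($name =~ $old ? 'yes' : 'no'), ($name =~ $new ? 'yes' : 'no');
}
```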
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
f Unicode compliance documented
in the Unicode book. I don't have it handy here, but I believe you could
have a compliance level that doesn't know about BIDI, or doesn't know
about compatibility decompositions, or something like that.
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
n covering all cyrillic minority languages in Unicode and is constantly
> looking for reference material and documentation.
Thanks.
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
I appreciate copies of replies to my messages to Perl5 lists.
yrillic-using languages (say,
Russian, Belorussian, Ukrainian, Macedonian, and Serbian), then it's
not complete. It also needs to do Bashkir, Azerbaidjani, Khanti, &c.
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
t that's
reserved here. So if I'm translating a string containing NULs, those
characters will be treated as "not-a-character"?
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
FF as a character as
long as it maps that character to something other than 0xFFFF or 0xFFFE
when converting to Unicode.
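For instance, ISO-8859-1 uses the byte 0xFF as a character and maps it to U+00FF (LATIN SMALL LETTER Y WITH DIAERESIS), not to one of the noncharacters U+FFFE or U+FFFF. A small check with the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $char = decode('ISO-8859-1', "\xFF");   # the single byte 0xFF

printf "byte FF -> U+%04X\n", ord $char;   # U+00FF
print "noncharacter!\n"
    if ord($char) == 0xFFFE || ord($char) == 0xFFFF;   # not printed
```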
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
d strings? For
example, if I'm processing UTF-8 text in C, "foo" is equivalent to 0066
006F 006F 0000. In which case, it's very much in use already.
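A small sketch listing the code points of "foo" in Perl; in C the same string would also be followed by a terminating NUL.

```perl
use strict;
use warnings;

my $s     = "foo";
my $codes = join ' ', map { sprintf('%04X', ord($_)) } split //, $s;
print "$codes\n";   # 0066 006F 006F
```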
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
he letters in the transcoding nroff-to-pod,
which is bad here.
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
kely, since the last line
should then start "0030003100320033" -- that is, F0 .. F9 should map to
U+0030 .. U+0039, the digits.
I don't remember the code points for letters, but I'm fairly sure the
digits fall in the range F0 .. F9 in all flavours of EBCDIC. You have
U+0031 at position 90.
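The core Encode module ships several EBCDIC tables (among them cp37, cp500 and cp1047), so the digit claim can be checked for those directly; a sketch:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = join '', map { chr($_) } 0xF0 .. 0xF9;   # the bytes F0 .. F9

for my $codepage (qw(cp37 cp500 cp1047)) {
    my $text = decode($codepage, $bytes);
    printf "%-6s %s\n", $codepage, $text;             # 0123456789 for each
}
```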
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
". For example, for Russian -- KOI8?
Windows codepage? Mac? Unicode? ISO-8859-x?
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
On 14 Sep 2000, at 12:35, Dominic Dunlop wrote:
> At 18:00 +0200 2000-09-13, Philip Newton wrote:
> >What's Perl's take on characters where ord($c) > 0xFFFF, anyway?
>
> It seems to Just Work, as this one-ish-liner shows:
[snip]
In that case, if we want to go switch
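A sketch of what "Just Works" looks like in current Perl (not the snipped one-liner, just an illustration using the core Encode module): a character above 0xFFFF is still a single character of length 1, and only its encoded form needs more bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $c = chr(0x1D11E);   # MUSICAL SYMBOL G CLEF, outside the 16-bit range

printf "length %d, ord U+%X\n", length $c, ord $c;              # length 1, ord U+1D11E
printf "UTF-8 bytes: %d\n",    length encode('UTF-8', $c);      # 4
printf "UTF-16 bytes: %d\n",   length encode('UTF-16BE', $c);   # 4 (a surrogate pair)
```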
On 13 Sep 2000, at 11:57, Mark Leisher wrote:
> True, UTF-16 is not known as UCS-2. However, UTF-16 still consists
> of 2-byte chunks. It is essentially UCS-2 plus high and low
> surrogates (see the Unicode Standard 3.0 page 19).
Yes, but if you just have a high surrogate, you can't do much w
ised.
> are encoded in Unicode with code points above 65,535 the
> distinction between UCS-2 and UTF-16 is mostly academic at this
> point in time.
At this point in time, yes. I suppose I just wanted to point out that this
*may* change, at some unspecified (and maybe even distant) point in the
future.
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>
On 12 Sep 2000, at 18:42, Jarkko Hietaniemi wrote:
> UTF-16 is also known as UCS-2, 16 bit or 2-byte chunks,
As I understand it, that's not true -- UTF-16 is 2-byte *or* 4-byte
chunks, since UTF-16 contains surrogates (high-surrogate + low-
surrogate [or the other way around?] = 1 character, re
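The arithmetic behind a surrogate pair, as a minimal sketch (to_surrogates is a hypothetical helper name): subtract 0x10000, put the top 10 bits in a high surrogate (D800 to DBFF) and the bottom 10 bits in a low surrogate (DC00 to DFFF). The high surrogate comes first.

```perl
use strict;
use warnings;

# Split a supplementary code point (U+10000 .. U+10FFFF) into UTF-16 surrogates.
sub to_surrogates {
    my ($cp) = @_;
    die sprintf("U+%04X needs no surrogates\n", $cp) if $cp < 0x10000;
    die sprintf("U+%X is beyond U+10FFFF\n", $cp)    if $cp > 0x10FFFF;
    my $v = $cp - 0x10000;                  # a 20-bit value
    return (0xD800 + ($v >> 10),            # high surrogate: top 10 bits
            0xDC00 + ($v & 0x3FF));         # low surrogate: bottom 10 bits
}

my ($high, $low) = to_surrogates(0x1D11E);  # MUSICAL SYMBOL G CLEF
printf "%04X %04X\n", $high, $low;          # D834 DD1E
```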
On 12 Sep 2000, at 11:57, Jarkko Hietaniemi wrote:
> I would go for UCS-2 (UTF-16) as soon as possible as the preferred
> internal encoding.
You know, of course, that UCS-2 ne UTF-16 (specifically, surrogates).
What's Perl's take on characters where ord($c) > 0xFFFF, anyway?
(These two issues