Re: PLAN9

2006-02-02 Thread Nick Ing-Simmons
Mohammad Yaseen [EMAIL PROTECTED] writes: Hi, I'm using perl-5.8.7. What is PLAN9? An operating system. And what is the plan9 directory in the source directory meant for? Building perl for/on a Plan 9 system. Thanks and Regards Yaseen

Re: Encode the subject line in MIME header using Perl 5.6

2005-12-30 Thread Nick Ing-Simmons
John Delacour [EMAIL PROTECTED] writes: use MIME::QuotedPrint; $qp = encode_qp ($_, ''); print "=?UTF-8?Q?$qp?=" . $/; That isn't quite right. MIME::QuotedPrint does NOT encode space or tab. RFC2047 says: The Q encoding is similar to the Quoted-Printable content-transfer-encoding defined
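
The point about spaces and tabs can be illustrated with a small sketch. It is not code from the thread: `q_encode_word` is a hypothetical helper name, and the set of characters left unescaped is a deliberately conservative assumption. It builds a single RFC 2047 encoded-word and does not split long text to respect the 75-character limit on encoded-words.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the thread): Q-encode one short piece of
# header text as a single RFC 2047 encoded-word in UTF-8.
sub q_encode_word {
    my ($text) = @_;
    my $octets = encode('UTF-8', $text);
    # Leave only a conservative set of safe characters (and space) alone;
    # escape everything else, including '=', '?', '_' and tab, as =XX.
    $octets =~ s/([^A-Za-z0-9!*+\/ -])/sprintf('=%02X', ord($1))/ge;
    # Inside a Q-encoded word '_' stands for a space.
    $octets =~ tr/ /_/;
    return "=?UTF-8?Q?$octets?=";
}

print q_encode_word("caf\x{e9} au lait"), "\n";
```

Unlike plain quoted-printable output, no literal space or tab survives inside the encoded-word, which is what the header syntax requires.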

Re: Encode the subject line in MIME header using Perl 5.6 my €0.02

2005-12-29 Thread Nick Ing-Simmons
Wing [EMAIL PROTECTED] writes: John Delacour [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] At 12:42 am +0800 28/12/05, wing wrote: I need to encode the subject line in a MIME header in UTF8 (something like Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=). I know that this

Re: Local installation

2005-12-29 Thread Nick Ing-Simmons
David Olsson [EMAIL PROTECTED] writes: What is the easiest way to install Encode for a single user? Same as any other CPAN module. perl Makefile.PL PREFIX=/home/cedric/perl_modules make make install then: #!/usr/bin/perl use lib '/home/cedric/perl_modules'; # or if script is relative to

Re: iso-2022-jp encoding on EBCDIC

2005-12-15 Thread Nick Ing-Simmons
Rajarshi Das [EMAIL PROTECTED] writes: Hi, The following two line script gives an error on z/OS : Unknown encoding 'iso-2022-jp' at line ... - use Encode; use encoding 'iso-2022-jp'; On an EBCDIC platform like z/OS that is going to be one strange

Re: data written on ebcdic

2005-07-08 Thread Nick Ing-Simmons
Rajarshi Das [EMAIL PROTECTED] writes: I run the following on an ebcdic platform (perl-5.8.6), $BOM = chr(0xFEFF); open(UTF_PL, ">:raw:encoding(utf16le)", "utf.pl") or die "utf.pl($enc,$tag): $!"; print UTF_PL $BOM; print UTF_PL 1; should the data that is written using PerlLIO_write, be \xFF

Re: UTF-8 and matching [^\s]

2005-02-02 Thread Nick Ing-Simmons
Stuart Hughes [EMAIL PROTECTED] writes: Hi everyone, I've run into problems matching the regex [^\s] on RedHat 8/9 and the version of perl shipped with it (5.8.0). It isn't 5.8.0, it is 5.8.0-with-RedHatBugs :-( To be fair to them it is some development track thing - there was an experimental

Re: Undocumented feature of Encode::{en,de}code()

2004-12-23 Thread Nick Ing-Simmons
Radoslaw Zielinski [EMAIL PROTECTED] writes: Hello, What's the point of lines 151 and 167 in Encode.pm? Respectively: # sub encode $_[1] = $string if $check; # sub decode $_[1] = $octets if $check; I really can't see a point in overwriting the input value... Why only if

Re: Make Encode.pm support the real UTF-8

2004-12-06 Thread Nick Ing-Simmons
Bjoern Hoehrmann [EMAIL PROTECTED] writes: Now that we have this problem, introducing more places where one needs to carefully check the documentation what is considered UTF-8 does not seem like the best option, having decode_utf8() and decode(utf8 => ...) mean something different is likely

Re: :encoding() layer modifies read-only scalars

2004-11-29 Thread Nick Ing-Simmons
Bjoern Hoehrmann [EMAIL PROTECTED] writes: * Bjoern Hoehrmann wrote: Encode 2.08, PerlIO::scalar 0.02, ActivePerl 5.8.2, #!perl -w use strict; use warnings; use Encode; my $string = encode(UTF16 = ); for (qw/UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE/) { my $backup =

Re: Website encoding

2004-11-19 Thread Nick Ing-Simmons
Rick Measham [EMAIL PROTECTED] writes: That being the case, I grab the charset and use Encode's decode function to turn it into 'perl's internal format' .. which in 5.8.5 is utf8 right? As it happens the answer is maybe, but it is the _internal_ form it is none of your business ;-) - so

Re: clearing the utf8 flag

2004-11-10 Thread Nick Ing-Simmons
Paul Bijnens [EMAIL PROTECTED] writes: I have a program that reads and writes (among others) strings that should be utf8 encoded. I say should, because somewhere deep inside the dark corners of that program, sometimes, the utf8 flag on a string is lost. (I'm still investigating where, tips to

Re: Question about converting utf8 to ascii and char refs

2004-10-27 Thread Nick Ing-Simmons
Aaron Siladi [EMAIL PROTECTED] writes: I have a UTF-8 string which I want to output as ascii and have the UTF8 characters converted to numeric character references. I tried using Encode with the FB_HTMLCREFS fail back option enabled, but for the 2 byte UTF8 characters, 2 incorrect char refs
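
One plausible cause of the symptom described (an assumption, since the reply is cut off here) is that the UTF-8 octets were passed to `encode` without being decoded first, so each byte was treated as its own character. A minimal sketch of the difference follows; note that the constant in Encode is spelled `FB_HTMLCREF`, and `LEAVE_SRC` is added so the input variable is not overwritten.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $octets = "caf\xC3\xA9";    # UTF-8 octets for "cafe" with an acute accent
my $check  = Encode::FB_HTMLCREF | Encode::LEAVE_SRC;

# Undecoded input: every byte above 0x7F becomes its own character reference.
my $wrong = encode('ascii', $octets, $check);

# Decoded input: one character reference per character.
my $chars = decode('UTF-8', $octets);
my $right = encode('ascii', $chars, $check);

print "$wrong\n$right\n";
```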

Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-25 Thread Nick Ing-Simmons
Dan Kogai [EMAIL PROTECTED] writes: On Oct 25, 2004, at 03:01, Nick Ing-Simmons wrote: But as Dan said at the start \xF6 on its own (say as 1023 octet in a 0..1023 1024-octet buffer is not a fail. Changing that will make :encoding() layer have problems as buffer boundaries can occur

Re: Resolving charset names with Encode

2004-10-24 Thread Nick Ing-Simmons
Bjoern Hoehrmann [EMAIL PROTECTED] writes: Hi, What is currently the best way to resolve charset names to use them with Encode.pm? I would have expected that e.g. Encode::decode('ebcdic-cp-us', '') would just work but it does not appear to know that alias. Then I've tried to use

Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-24 Thread Nick Ing-Simmons
Dan Kogai [EMAIL PROTECTED] writes: On Oct 23, 2004, at 01:04, Bjoern Hoehrmann wrote: C12a in Unicode 4.0.1 notes [...] For example, in UTF-8 every code unit of the form 110xxxxx must be followed by a code unit of the form 10xxxxxx. A sequence such as 110xxxxx 0xxxxxxx is ill-formed and

Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
Dan Kogai [EMAIL PROTECTED] writes: On Oct 24, 2004, at 06:41, Rafael Garcia-Suarez wrote: Dan Kogai wrote: Within less than 24hrs I resorted to release version 2.07. What the heck. 5.8.6 is soon I applied 2.07 to bleadperl, and looks like something is broken in PerlIO::encoding. More

Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
Rafael Garcia-Suarez [EMAIL PROTECTED] writes: Dan Kogai wrote: This makes perl-5.8.6 happy but the problem is that I have made Encode::utf8 so that it accepts fallback values like Encode::XS (upon the request by Bjoern Hoehrmann via RT). Encode::utf8 used to return immediately at partial

Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
Dan Kogai [EMAIL PROTECTED] writes: On Oct 24, 2004, at 18:34, Rafael Garcia-Suarez wrote: Welcome to backward compatibility hell :) Hell it was but seems like I came up with a way out (yay). I just want Encode::utf8->decode() to make sure Encode::RETURN_ON_ERR is on when the caller is

Re: Does LWP know anything (or need to know anything) about Unicode?

2004-10-11 Thread Nick Ing-Simmons
Rick Measham [EMAIL PROTECTED] writes: G'day Unicode Gurus and other assorted members of the perl Unicode community. I have a script that attempts to collect translations from Babelfish. I've posted it below. It uses LWP::UserAgent to turn an English phrase into Japanese (or any other language

Re: Problem with 'Mailformed UTF-8 caracter' warning messageswhen I use Switch.pm package!

2004-09-04 Thread Nick Ing-Simmons
Rafael Garcia-Suarez [EMAIL PROTECTED] writes: I have a problem to avoid Malformed UTF-8 character messages when I use the Switch.pm module on SuSE 9.1 Professional with English or German language settings. Could we see a snippet of code that demonstrates the problem ? The version of perl you

Re: Weird behavior of encoding open pragmas

2004-08-17 Thread Nick Ing-Simmons
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] writes: $ perl -e 'use encoding "ISO-8859-2"; use open ":encoding(ISO-8859-2)"; print ord($ARGV[0]), chr(260), $ARGV[0], "\n"' "\x{00a1}" does not map to iso-8859-2 at -e line 1. 260\x{00a1} I don't understand it: ord($ARGV[0]) is 260, chr(260) can be

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Nick Ing-Simmons
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] writes: Should strings without the UTF8 flag be interpreted in the default encoding of the current locale or in ISO-8859-1? This is a tricky question - and status quo is likely to remain for compatibility reasons. Perl treats them inconsistently. On

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Nick Ing-Simmons
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] writes: In a message of Mon, 16-08-2004, 11:16 +0100, Nick Ing-Simmons wrote: Perl treats them inconsistently. On one hand they are read from files and used as filenames without any recoding, which implies that they are assumed to be in some

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Nick Ing-Simmons
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] writes: But there is a simple workaround for that, as perluniintro would tell you: the encoding pragma. The encoding pragma partially works. It doesn't influence assumed encoding of files opened without specifying the encoding, nor handling of

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Nick Ing-Simmons
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] writes: In a message of Mon, 16-08-2004, 16:54 +0300, Jarkko Hietaniemi wrote: The encoding pragma partially works. It doesn't influence assumed encoding of files opened without specifying the encoding, nor handling of filenames, and it needs to

Re: Unicode filenames on Windows with Perl = 5.8.2

2004-06-25 Thread Nick Ing-Simmons
Nicholas Clark [EMAIL PROTECTED] writes: On Mon, Jun 21, 2004 at 08:46:07AM -0700, Jan Dubois wrote: I think it is possible, but it requires someone to both do the work and to argue for it on P5P. Without this champion, I don't see it happening at all. Nor do I. But P5P isn't big on arguing

Re: utf8, japanese, web-pages: beginning to see the light...

2004-05-18 Thread Nick Ing-Simmons
Marco Baroni [EMAIL PROTECTED] writes: A few days ago, I queried this list about my problems with a script that finds the charset of Japanese web pages and translates their text into utf-8. The following solution, proposed by Nick Ing-Simmons, worked for my case: binmode STDOUT, ":utf8"

Re: BOM and principle of least surprise

2004-05-18 Thread Nick Ing-Simmons
Erland Sommarskog [EMAIL PROTECTED] writes: Jarkko Hietaniemi ([EMAIL PROTECTED]) writes: Nick Ing-Simmons wrote: This thread started as complaint that perl5 can't read a script saved as UCS-2/UTF-16 or whatever Windows uses. Uh, really? Perl 5.8+ should be able to do that, automatically

Re: BOM and principle of least surprise

2004-04-26 Thread Nick Ing-Simmons
Erland Sommarskog [EMAIL PROTECTED] writes: Nick Ing-Simmons ([EMAIL PROTECTED]) writes: Erland Sommarskog [EMAIL PROTECTED] writes: I would really expect someone to have done this already, but I see no reference to such a module. Or layer-directive like :use-bom to open the file. And then some

Re: Decoding more languages

2004-04-13 Thread Nick Ing-Simmons
: Nick Ing-Simmons [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, April 13, 2004 11:13 AM Subject: Re: Decoding more languages Octavian Rasnita [EMAIL PROTECTED] writes: Hello all, I want to transform a text that contains words in more languages (it is a course for learning a foreign

Re: Creating a UTF-8 web page

2004-04-08 Thread Nick Ing-Simmons
Octavian Rasnita [EMAIL PROTECTED] writes: I have tried the following script: #!/perl/bin/perl -wC use Encode; my $text = Encode::decode('latin2', 'mta'); binmode(STDOUT, ":utf8"); print "Content-Type: text/html; Charset=UTF-8\n\n"; print Encode::encode('utf8', $text); You have double-encoded
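
The double encoding can be reproduced without any output layer. In this sketch (illustrative only; the sample byte and the `hex_dump` helper are assumptions, not part of the thread) the second `encode` call stands in for what a `:utf8` layer does to a string that has already been encoded by hand.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show a byte string as space-separated hex octets.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

my $text  = decode('iso-8859-2', "\xE1");   # one character, U+00E1
my $once  = encode('UTF-8', $text);         # correct: two octets
my $twice = encode('UTF-8', $once);         # each octet encoded again

print hex_dump($once),  "\n";
print hex_dump($twice), "\n";
```

Doing one or the other is enough: either print characters through a `:utf8` (or `:encoding(UTF-8)`) layer, or encode by hand and print to a handle without such a layer.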

Re: BOM and principle of least surprise

2004-03-31 Thread Nick Ing-Simmons
Erland Sommarskog [EMAIL PROTECTED] writes: It seems that the only way out, is to first open the file in plain mode, binmode I suspect. look at the first three bytes, and if it is BOM, close the file, open again with the appropriate options and discard the BOM. You don't have to close it just
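
The approach discussed here (peek at the first bytes, then carry on with the same handle) might look like the sketch below. The reply is truncated, so using `seek` followed by `binmode` is an assumption about what was suggested; `open_bom` is a hypothetical helper name, and UTF-32 byte order marks are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: open $path for reading and choose a decoding layer
# from the byte order mark, if any. $default applies when no BOM is found.
sub open_bom {
    my ($path, $default) = @_;
    $default = 'UTF-8' unless defined $default;

    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $head = '';
    read($fh, $head, 3);    # may yield fewer than 3 bytes on short files

    my ($enc, $skip) = ($default, 0);
    if    ($head =~ /^\xEF\xBB\xBF/) { ($enc, $skip) = ('UTF-8',    3) }
    elsif ($head =~ /^\xFF\xFE/)     { ($enc, $skip) = ('UTF-16LE', 2) }
    elsif ($head =~ /^\xFE\xFF/)     { ($enc, $skip) = ('UTF-16BE', 2) }

    # Reposition just past the BOM and decode from there on the same handle.
    seek($fh, $skip, 0) or die "Cannot seek in $path: $!";
    binmode($fh, ":encoding($enc)");
    return $fh;
}
```

A caller would then read lines from the returned handle as ordinary character strings, with the BOM already skipped.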

Re: BOM and principle of least surprise

2004-03-19 Thread Nick Ing-Simmons
Erland Sommarskog [EMAIL PROTECTED] writes: open (F, ':encoding(ucs-2le)', 'rkmacka-ucs2.txt'); And one thing seems just plain wrong to me: The \n is written as 0A 0D to the file, not 000A, 000D. But maybe there is some more manual reading I need to do to find out how to do it. 0A 0D is

Re: Converting string to UTF-16LE

2004-02-26 Thread Nick Ing-Simmons
Sebastian Lehmann [EMAIL PROTECTED] writes: Hello, i use a perl script to search different files. The search values are given from a HTML page, the results are displayed on this page, too. The files are saved in the UTF16LE format, therefore i will open them with the following open command:

Re: Question regarding Unicode handling in perl: auto-sensing

2004-02-22 Thread Nick Ing-Simmons
Andreas Jaekel [EMAIL PROTECTED] writes: Dear Perl Deities! I've been trying to figure this out for myself for a couple of hours now, but I got to the point where I gave up and decided that I'll have to bother you. Hope you don't mind. My task is the following, and I'm running out of ideas: //

Re: How to convert base64 string to utf-8

2004-02-06 Thread Nick Ing-Simmons
Guido Flohr [EMAIL PROTECTED] writes: ALexander N. Treyner wrote: Hello All, I'm using utf-8 Postgres database, where I save strings in many languages. I have to match the database with strings encoded in mime base64 or quoted-printable format. Like next:

Re: How to convert base64 string to utf-8

2004-02-06 Thread Nick Ing-Simmons
ALexander N. Treyner [EMAIL PROTECTED] writes: Hi John, Your code works perfect. But I found one strange thing. For example I have next string: hello hello world that converted by the mail client to hello =?windows-1255?Q?=F9=EC=E5=ED_hello_world?= After

Re: Patch for tests on un*x

2004-01-29 Thread Nick Ing-Simmons
Brad Guillory [EMAIL PROTECTED] writes: Last spring someone committed a patch to fix the tests on windows platforms (see Change 18966 by [EMAIL PROTECTED] on 2003/03/14 04:20:51). This broke the tests on my Redhat box. Here is a compromise patch: --- t/enc_module.t.orig 2004-01-28

Re: \W and [\W]

2004-01-02 Thread Nick Ing-Simmons
Eric Cholet [EMAIL PROTECTED] writes: On 1 Jan 04, 17:50, Rafael Garcia-Suarez wrote: +(However, and as a limitation of the current implementation, using +C<\w> or C<\W> I<inside> a C<[...]> character class will still match +with byte semantics.) I don't think it applies to \w, only \W. \x{df}

Re: perlunicode comment - when Unicode does not happen

2003-12-28 Thread Nick Ing-Simmons
Jarkko Hietaniemi [EMAIL PROTECTED] writes: Let's not 'fix' it (not carve it on a stone), but offer a few well-thought-out options. For instance, Perl may offer (not that these are particularly well-thought-out) 'just treat this as a sequence of octets', 'locale', and 'unicode'. 'locale' on

Re: perlunicode comment - when Unicode does not happen

2003-12-28 Thread Nick Ing-Simmons
Jungshik Shin [EMAIL PROTECTED] writes: Then, he should switch to en_GB.UTF-8. I probably will. Besides, he implied that he still uses ISO-8859-1 for files whose names can be covered by ISO-8859-1, which is why I wrote about mixing up two encodings in a single file system _under_ his

Re: perlunicode comment - when Unicode does not happen

2003-12-28 Thread Nick Ing-Simmons
Jarkko Hietaniemi [EMAIL PROTECTED] writes: What I wish is that the whole current locale system would curl up and die. As you'd agree, it's only 'encoding' part that has to die. Oh no, there are plenty of parts in it that I wish would die :-) (though the coupling of encoding is a major

Re: perlunicode comment - when Unicode does not happen

2003-12-28 Thread Nick Ing-Simmons
Jungshik Shin [EMAIL PROTECTED] writes: That will work if there's en_GB.UTF-8 available for him in his particular Unixes and assuming using UTF-8 locales won't break other things. Just so we get this clear. A year or so back I - as a Unicode advocate - tried to switch to en_GB.utf8. Within

Re: using Encode module

2003-12-11 Thread Nick Ing-Simmons
Dana Sharvit - M [EMAIL PROTECTED] writes: Hi , I am using the Encode module (perl 5.8)to convert a string from utf8 to big 5. There is something that I do not understand that I thought you may help with: The input to the program is a file that contains a utf8 string, The encoding works properly

RE: unicode on windows

2003-11-21 Thread Nick Ing-Simmons
Edward Batutis [EMAIL PROTECTED] writes: Also each character when I view it via character listing of IME pad, it has three hex numbers. Seeing three hex numbers per character is a sure sign you've got utf8. You need to convert the characters to the platform encoding before using 'open'. In

Re: possible patch for Perl 5.8.2's Alias.pm

2003-10-30 Thread Nick Ing-Simmons
Jarkko Hietaniemi [EMAIL PROTECTED] writes: a year ago, there was a discussion on this list about Encode not recognizing TIS-620 as alias for iso-8859-11: http://nntp.x.perl.org/group/perl.unicode/1656 In the latest release of Encode::Alias (1.38 from Encode 1.9801, included in Perl

Re: roundtrip conversion for Mac OS CJK encodings

2003-09-28 Thread Nick Ing-Simmons
SADAHIRO Tomoyuki [EMAIL PROTECTED] writes: Hello. For round-trip fidelity, Mac OS CJK encodings include many characters with mapping a single character in a Mac OS encoding to a sequence of standard Unicode characters. (cf. ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/README.TXT ) In the

Re: UCM file and combining character sequences

2003-09-22 Thread Nick Ing-Simmons
Hank Tt [EMAIL PROTECTED] writes: Hi, I'm trying to make a UCM file to feed to enc2xs. The legacy encoding for Taiwanese romanization *must* have its code points mapped to Unicode character sequences, for the simple reason that the UCS lacks the corresponding precomposed characters (and is

Re: UCM file and combining character sequences

2003-09-22 Thread Nick Ing-Simmons
Hank Tt [EMAIL PROTECTED] writes: Hi, I'm trying to make a UCM file to feed to enc2xs. The legacy encoding for Taiwanese romanization *must* have its code points mapped to Unicode character sequences, for the simple reason that the UCS lacks the corresponding precomposed characters (and is

Re: Invalid Uicode characters

2003-09-17 Thread Nick Ing-Simmons
John Delacour [EMAIL PROTECTED] writes: At 11:31 am +0100 16/9/03, [EMAIL PROTECTED] wrote: Dear PERLists, I am running Perl 5.8. and trying to filter out some invalid Unicode characters from Unicoded texts of some South Asian languages. There are 28 such characters in my data (all control

Re: Inverse of /\p{script}/

2003-08-29 Thread Nick Ing-Simmons
Jarkko Hietaniemi [EMAIL PROTECTED] writes: On Thu, Aug 28, 2003 at 03:16:20PM +0100, [EMAIL PROTECTED] wrote: Does the existing perl5.8.* Unicode support have a way to efficiently determine which script(s) or block (in unicode sense) a code point belongs to? use Unicode::UCD
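
The excerpt stops at `use Unicode::UCD`, so which functions were recommended is not shown. As an assumption, the module's `charscript` and `charblock` functions fit the question; a minimal sketch with a few arbitrarily chosen code points:

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript charblock);

# Report the script and block for a handful of sample code points.
for my $cp (0x0041, 0x3042, 0x30A2, 0x4E00) {
    printf "U+%04X script=%s block=%s\n",
        $cp, charscript($cp), charblock($cp);
}
```

Whether this is efficient enough for per-character use in a tight loop is a separate question; caching results per code point range would be one option.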

Re: Inverse of /\p{script}/

2003-08-29 Thread Nick Ing-Simmons
Dan Kogai [EMAIL PROTECTED] writes: But that is not good enough for cases below because... (Hiragana | Katakana | Han) = 'jisx0208.1990-0' This is very wrong because jisx0208.1990-0 only contains \p{Han} that appears in Japanese (JIS X 0208, to be exact). On the other hand, jisx0208.1990-0

Re: Inverse of /\p{script}/

2003-08-29 Thread Nick Ing-Simmons
Owen Taylor [EMAIL PROTECTED] writes: You might want to look at what we did for Pango - see pango/modules/basic/tables-big.i in ftp://ftp.gtk.org/pub/gtk/v2.2/pango-1.2.5.tar.gz. [There may come a time when I just give up Tcl/Tk and implement perl/Tk OO interface on top of gtk instead. But not

Re: Inverse of /\p{script}/

2003-08-29 Thread Nick Ing-Simmons
Owen Taylor [EMAIL PROTECTED] writes: On Fri, 2003-08-29 at 11:14, Nick Ing-Simmons wrote: We're dropping support for this code and for core X fonts in the next release of Pango, In favour of what? (FreeType on client side?) Yes, using the Xft and fontconfig libraries. (http

Re: Inverse of /\p{script}/

2003-08-29 Thread Nick Ing-Simmons
Jungshik Shin [EMAIL PROTECTED] writes: If you want, you can take a look at nsFontMetricsGTK.cpp file of mozilla. Can you pass on my admiration to the Mozilla team - its handling of these issues in version 1.4 is so much better than ye-olde Netscape. You can view that huge file (over 6,000

Re: Encode from XS

2003-08-11 Thread Nick Ing-Simmons
Simon Cozens [EMAIL PROTECTED] writes: [EMAIL PROTECTED] (Simon Cozens) writes: Can someone give me a few quick examples of creating Encode::XS objects to do simple transcoding, from XS? I think I expressed myself badly. Perhaps I don't mean creating Encode::XS objects, but instantiating them.

Re: IO::Socket::INET and utf-8

2003-07-02 Thread Nick Ing-Simmons
expect it to be in 5.8.1 as well. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: IO::Socket::INET and utf-8

2003-07-01 Thread Nick Ing-Simmons
Nick Ing-Simmons [EMAIL PROTECTED] writes: Martin J. Evans [EMAIL PROTECTED] writes: A socket is a file handle so : binmode($sock,:utf8); should work. I'm obviously missing something rather fundamental here. Not you - us. How can we have got this far without someone discovering

Re: Unicode and XS

2003-03-23 Thread Nick Ing-Simmons
' encoding uses core's SvUTF8 scheme - which is just fine if it _IS_ UTF-8 What we need is for Encode::* to have its _own_ UTF-8 and UTF-EBCDIC encode/decode independent of what core is using... Thanks Brian -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Need some help in understanding Unicode in Perl...

2003-02-21 Thread Nick Ing-Simmons
email using Encode::'s euc-cn and Unicode fonts, but as I can't read many Chinese characters this was mainly just as an exercise. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [PATCH] %open::modes to hold ${^OPEN} values for run-time access

2003-01-29 Thread Nick Ing-Simmons
-- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [not-yet-a-PATCH] compress Encode better

2002-11-04 Thread Nick Ing-Simmons
. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Fixed Encode::utf8

2002-10-20 Thread Nick Ing-Simmons
Attached is a patch that implements ->decode and ->encode of Encode::utf8 as XS code that obeys all the rules that Encode::XS does. This allows :encoding(UTF-8) to handle partial chars at end of buffers correctly. Submitted as //depot/perlio/...18032 -- Nick Ing-Simmons http://www.ni-s.u

Re: Is perl unicode or not?

2002-10-13 Thread Nick Ing-Simmons
you to that conclusion? how can I flatten it to binary? I tried with unpack without success. Nadim. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Re[2]: ISO 8859-11 (Thai) cross-mapping table

2002-10-09 Thread Nick Ing-Simmons
to be more usable (less embedded or at least more systematic-looking punctuation, more familiar from e-mail and HTTP headers etc.) We can revisit that if people think it would help. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Re[4]: ISO 8859-11 (Thai) cross-mapping table

2002-10-09 Thread Nick Ing-Simmons
to their preferred MIME names, all in lowercase. Maybe the unique ID number (MIBenum) could also be taken into account. I have no objection to that - and I doubt Dan will either. Would you care to at least enumerate the cases we fail - or ideally provide patch(es) ? -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Encode::MIME::Header my 2

2002-10-07 Thread Nick Ing-Simmons
remains empty and maybe we can make use of it I probably will - there are a whole slew of Encode-oid issues with body part of MIME. Dan the Encode Maintainer -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Re[2]: converting Japanese chars into their Unicode values using 5.8's Encode

2002-09-20 Thread Nick Ing-Simmons
it on my machine. To be pedantic it is not an Encoding it is a non-encoding ;-) I would recommend using encode and decode rather than from_to in such cases. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: converting Japanese chars into their Unicode values using 5.8's Encode

2002-09-19 Thread Nick Ing-Simmons
($_),split(//,$string))); print $ord; But, this gives a 3-character string 怜 (with the decimal values 230, 164 and 156). Could anyone please point me to the right direction on how to get the decimal number 26908 instead? Thanks in advance. -- Nick Ing-Simmons http://www.ni-s.u-net.com/
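
The three values 230, 164 and 156 are the UTF-8 octets of a single character, so calling `ord` on each byte cannot give 26908. A minimal sketch, assuming the input really is UTF-8, decodes first and then takes `ord` of the resulting character:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $octets = "\xE6\xA4\x9C";              # decimal 230, 164, 156
my $char   = decode('UTF-8', $octets);    # one character

printf "%d octets, %d character, ord = %d\n",
    length($octets), length($char), ord($char);
```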

Re: Unicode::Normalize surprise with dotless i

2002-09-05 Thread Nick Ing-Simmons
); } ' combining with i: \x{00ee} combining with dotless i: \x{0131}\x{0302} What do you think? Makes sense to me. U+00EE is LATIN SMALL LETTER I WITH CIRCUMFLEX not LATIN SMALL LETTER DOTLESS I WITH CIRCUMFLEX -- Nick Ing-Simmons http://www.ni-s.u-net.com/
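
The behaviour described can be checked with `Unicode::Normalize`. This is a sketch rather than the script from the thread: the combining circumflex U+0302 composes with a plain `i` but has nothing to compose with after a dotless i, because no such precomposed character exists.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

my $with_i       = NFC("i\x{0302}");          # i + combining circumflex
my $with_dotless = NFC("\x{0131}\x{0302}");   # dotless i + combining circumflex

# Print each result as a list of code points.
for my $s ($with_i, $with_dotless) {
    print join(' ', map { sprintf 'U+%04X', ord($_) } split //, $s), "\n";
}
```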

Re: 2 Suprises w/5.8.0

2002-08-01 Thread Nick Ing-Simmons
it is UTF-8 encoded ^^^ Why is that step necessary? encode_utf8() should do that itself on the way ... $self->{CONTENT} = Encode::encode_utf8($self->{CONTENT}); # make octets } -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: input methods, combining characters

2002-07-12 Thread Nick Ing-Simmons
xmodmap. Generally this sort of thing needs to be handled below Readline or TK or whatever. I think it is do-able as readline or Tk - or even a PerlIO layer: binmode(STDIN, ":via(combine_accents)"); -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Performance and interface of Encode(3pm) in perl 5.8.0-RC1

2002-07-11 Thread Nick Ing-Simmons
in design. I quite agree - which is why Encode works the same way :-) -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Performance and interface of Encode(3pm) in perl 5.8.0-RC1

2002-07-11 Thread Nick Ing-Simmons
Guido Flohr [EMAIL PROTECTED] writes: Hi, On Thu, Jul 11, 2002 at 12:15:30PM +0100, Nick Ing-Simmons wrote: For my Tk application of encode the in-place form causes unnecessary copies. e.g. I need the original and the form encoded into the encoding required by the font, or I have to copy

Re: libxml-perl?

2002-06-21 Thread Nick Ing-Simmons
had this problem? Does your $ENV{LANG} match /utf-?8/i ? If so then perl5.7.3+ will have assumed utf8 on your behalf... -- Nick Ing-Simmons http://www.ni-s.u-net.com/

RE: Encode should stay undefphobia

2002-05-01 Thread Nick Ing-Simmons
something that is so generic. Paul -- Nick Ing-Simmons http://www.ni-s.u-net.com/

RE: Encode doesn't like undef

2002-04-30 Thread Nick Ing-Simmons
far, but passing a variable that contains undef is more common. Can this be detected silenced? /Paraphrase Yes it could but we don't for very good reasons. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [Encode] 1.65 released

2002-04-30 Thread Nick Ing-Simmons
::JIS2K ! lib/Encode/Guess.pm POD fix by Miyagawa-kun Message-Id: [EMAIL PROTECTED] Dan the Encode Maintainer -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Practical problems with custom .ucm based encoding

2002-04-25 Thread Nick Ing-Simmons
in theory there can be bits for A. Update src string B. Use fallbacks C. Partials as bad chars D. Use perl QQ E. Warn on error F. Croak on error H. ;-) Use HTML entities as fallbacks -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Encode-1.50 +

2002-04-20 Thread Nick Ing-Simmons
rather than a passive bit? -- Nick Ing-Simmons http://www.ni-s.u-net.com/ //depot/perlio/ext/Encode/Encode.pm#64 - /home/p4work/perl/perlio/ext/Encode/Encode.pm Index: perlio/ext/Encode/Encode.pm --- perlio/ext/Encode/Encode.pm.~1~ Sat Apr 20 20:36:47 2002 +++ perlio/ext/Encode

Tk804 + Encode-1.50 :-) again

2002-04-19 Thread Nick Ing-Simmons
as there is no certainty that lib / archlib relative paths work like that. Will tweak Tk's Makefile.PL configure to hunt down encode.h. Will do a spelling patch on the pod(s) when I get a chance. -- Nick Ing-Simmons http://www.ni-s.u-net.com/ --- Encode.xs.ship Fri Apr 19 19:25:26 2002

Re: iso-2022-jp problem

2002-04-15 Thread Nick Ing-Simmons
designed so that you can rely on CRLF to split the stream. Dan the Encode Maintainer. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: iso-2022-jp snags.

2002-04-12 Thread Nick Ing-Simmons
Dan Kogai [EMAIL PROTECTED] writes: On Friday, April 12, 2002, at 02:30 , Nick Ing-Simmons wrote: Having hacked RFC2047 support into tkmail I have now seen some non-latin1 characters in a real perl/Tk app. There seem to be a few snags with mime's iso-2022-jp: - It failed to demand load

Re: Encode API

2002-04-11 Thread Nick Ing-Simmons
Encode() attitude ( which is fine, just not my style ), I guess there isn't much to go on for me. ;) que sera sera --d -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [Encode] Farsi is Okay. The problem is in Indics!

2002-04-05 Thread Nick Ing-Simmons
a single-byte encoding, this is still possible without bloating the UCM. Dan the Encode Maintainer -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Nick Ing-Simmons
have to beware of is UTF-8 encoding codepoints in the surrogate range, rather than de-surrogating and encoding the real code point. The fixed UCS-2BE works for Tk - but is still a little slower than it could be. I suggest we do UTF-16XE properly as XS code. -- Nick Ing-Simmons http://www.ni-s.u

Re: [Encode] UCS/UTF mess and Surrogate Handlings

2002-04-05 Thread Nick Ing-Simmons
Perl_utf16_to_utf8_reversed(pTHX_ U8* p, U8* d, I32 bytelen, I32 *newlen) Should be a good starting point for the XS version ;-) which does first a byteswap and then calls the non-reversed version). I also can see that the Perl_utf16_to_utf8 is non-EBCDIC aware... -- Nick Ing-Simmons http://www.ni-s.u

Re: [Encode] How to support (Apple's) compound Unicode characters?

2002-04-01 Thread Nick Ing-Simmons
convert UTF-8 sequences for sequences of characters - but .ucm would need tweaking to allow multiple U: UU \x We would have to be sure that Unicode was normalized as well. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [Encode] poll; should *.ucm be relocated out of Encode?

2002-04-01 Thread Nick Ing-Simmons
reason .enc's were installed was for Encode::Tcl. Dan the Encode Maintainer -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [Encode] How to support (Apple's) compound Unicode characters?

2002-04-01 Thread Nick Ing-Simmons
Dan Kogai [EMAIL PROTECTED] writes: On Monday, April 1, 2002, at 07:33 , Nick Ing-Simmons wrote: Dan Kogai [EMAIL PROTECTED] writes: I think I have found the reason why some of the encodings were missing from Tcl's *.enc, which later turned into *.ucm. Apple makes use of Unicode

Re: [Encode] Compound Unicode Character Support in UCM

2002-04-01 Thread Nick Ing-Simmons
to come. Dan the Encode Maintainer -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [Encode] enc2txt missing under perl-current/utils/

2002-04-01 Thread Nick Ing-Simmons
Ing-Simmons http://www.ni-s.u-net.com/

Re: [Encode] MacIceland(ic)?, once again.

2002-04-01 Thread Nick Ing-Simmons
MacRumother ... FYI, I do check83.pl before the release since 0.99 or so Dan -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Encode seriously broken

2002-04-01 Thread Nick Ing-Simmons
) that the data file is invalid in some way and that current decode mistakes that for incomplete character... -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [Encode] Charset-0.01 released

2002-03-30 Thread Nick Ing-Simmons
Autrijus Tang [EMAIL PROTECTED] writes: And then you'll have to disambiguate between that and encoding.pm... Why aren't we extending encoding.pm instead? That was my thought as well - that there is overlap with Jarkko's work use encoding. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Encode: CJK-Guide

2002-03-27 Thread Nick Ing-Simmons
. It would be good to have some algorithmic encodings to use as examples. The only ones we have at present are UCS-2 (as perl code) and UTF-8 (C but buried in perl's core). -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: let's cook it!

2002-03-27 Thread Nick Ing-Simmons
-- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [Encode] 8.3 rules sucks! check83.pl is obsolete!

2002-03-25 Thread Nick Ing-Simmons
directory the file name must be unique if truncated to 8.3 - not that all file names must be 8.3 I am fairly sure that is what the check83.pl polices. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [Encode] 8.3 rules sucks! check83.pl is obsolete!

2002-03-25 Thread Nick Ing-Simmons
is still the long one. Well, I didn't really enjoy renaming files myself Dan the Man with Too Many Files to Watch Over -- Nick Ing-Simmons http://www.ni-s.u-net.com/
