FW: Utf8 encoding

2007-10-26 Thread Vajramatti Shashidhar (DS/EES1)
amatti Shashidhar (DS/EES1) > Sent: Friday, October 26, 2007 12:03 PM > To: 'perl-unicode@perl.org' > Subject: Utf8 encoding > > Hello, > I am parsing an XML file using libxml2. The XML file has umlauts (German > keys ü/ö/ä etc.), ° (degree), etc. as characters.

Re: Utf8 encoding

2007-10-26 Thread Juerd Waalboer
Vajramatti Shashidhar (DS/EES1) skribis 2007-10-26 12:02 (+0200): > my $parser = ""; > my $doc = ""; > $parser = XML::LibXML->new();# > $doc = $parser->parse_file( $x_file ); You should combine these for nicer code: my $parser = XML::LibXML->new(); my $doc = $par

Utf8 encoding

2007-10-26 Thread Vajramatti Shashidhar (DS/EES1)
Hello, I am parsing an XML file using libxml2. The XML file has umlauts (German keys ü/ö/ä etc.), ° (degree), etc. as characters. Could someone tell me how to encode such an XML file to UTF-8? I get the error below: "parser error : Input is not proper UTF-8, indicate en
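libxml2 treats a document as UTF-8 unless its XML declaration names another encoding, so a file saved in a legacy 8-bit encoding fails at the first umlaut. A minimal sketch of one fix, assuming the file really is ISO-8859-1 and has no encoding declaration (the thread does not say which encoding it uses): transcode the octets to UTF-8 with Encode's `from_to` before parsing. The helper name `latin1_to_utf8` is hypothetical. The other common fix is to leave the bytes alone and declare the real encoding in the file, e.g. `<?xml version="1.0" encoding="ISO-8859-1"?>`.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical helper. Assumes the input octets are ISO-8859-1 and that the
# document has no encoding declaration, so libxml2 will expect UTF-8.
sub latin1_to_utf8 {
    my ($octets) = @_;                           # work on a copy
    from_to($octets, 'iso-8859-1', 'utf-8');     # converts the copy in place
    return $octets;
}

# Usage with XML::LibXML (shown as a comment, not run here):
#   open my $fh, '<:raw', $x_file or die "$x_file: $!";
#   my $xml = latin1_to_utf8(do { local $/; <$fh> });
#   my $doc = XML::LibXML->new->parse_string($xml);

# "ü°" as ISO-8859-1 bytes becomes C3 BC C2 B0 in UTF-8.
my $utf8 = latin1_to_utf8("<k>\xFC\xB0</k>");
```

Converting once, up front, keeps the parser call unchanged; if the file later gains a correct declaration, the conversion step should be dropped rather than applied twice.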

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
On Wed, Oct 02, 2002 at 10:44:06PM +0900, Dan Kogai wrote: > On Wednesday, Oct 2, 2002, at 22:34 Asia/Tokyo, Jarkko Hietaniemi wrote: > >>Yes, that's where hiragana -> katakana conversion is attempted; > >>English equivalent of tr/A-Z/a-z/. > > > >Okay... What are the {begin,end} codepoints of th

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Dan Kogai
On Wednesday, Oct 2, 2002, at 22:34 Asia/Tokyo, Jarkko Hietaniemi wrote: >> Yes, that's where hiragana -> katakana conversion is attempted; >> English equivalent of tr/A-Z/a-z/. > > Okay... What are the {begin,end} codepoints of those ranges, > both LHS and RHS of tr, both in EUC-JP and in Unicod

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
> I can explain that. "\x{3af}bc\x{3af}de" is a string literal, so > it gets encoded. However, my example in escaped form is: > > $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/ > > which does not get encoded. The intention was: > > $kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/
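The escaped byte form cannot express the intended range, because tr/// ranges work on single characters, not on two-byte EUC-JP sequences. A minimal sketch of the character-based approach without `use encoding`, assuming the input arrives as EUC-JP octets: decode with Encode, apply the intended tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/, and encode back. The sub name `hira2kata` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: hiragana U+3041..U+3093 and katakana U+30A1..U+30F3
# are parallel blocks of equal length, so one tr/// range maps them 1:1.
sub hira2kata {
    my ($text) = @_;
    $text =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/;
    return $text;
}

# EUC-JP octets for the hiragana string A-I (A4A2 A4A4).
my $euc_in  = "\xA4\xA2\xA4\xA4";
my $chars   = decode('EUC-JP', $euc_in);            # octets -> characters
my $euc_out = encode('EUC-JP', hira2kata($chars));  # characters -> octets
printf "%vX\n", $euc_out;   # A5.A2.A5.A4 (katakana A-I)
```

Doing the transliteration on decoded characters keeps the tr/// independent of whichever legacy encoding the input used; only the `decode`/`encode` names change.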

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
> >Are you doing character ranges in the tr/// under 'use encoding'? > >(I'm asking because I see a "-" in the middle of what I assume is > >mangled EUC-JP) > > Yes, that's where hiragana -> katakana conversion is attempted; > English equivalent of tr/A-Z/a-z/. Okay... What are the {begin,end

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Dan Kogai
On Wednesday, Oct 2, 2002, at 21:51 Asia/Tokyo, Jarkko Hietaniemi wrote: > However, I will need to stare at your example some more, since > for simpler cases I think tr/// *is* obeying the 'use encoding': > > use encoding 'greek'; > ($a = "\x{3af}bc\x{3af}de") =~ tr/\xdf/a/; > print $a, "\n"; > >

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Dan Kogai
On Wednesday, Oct 2, 2002, at 22:15 Asia/Tokyo, Jarkko Hietaniemi wrote: > (Hi, it's me again...) > > Are you doing character ranges in the tr/// under 'use encoding'? > (I'm asking because I see a "-" in the middle of what I assume is > mangled EUC-JP) Yes, that's where hiragana -> katakana conv

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
(Hi, it's me again...) Are you doing character ranges in the tr/// under 'use encoding'? (I'm asking because I see a "-" in the middle of what I assume is mangled EUC-JP) -- Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special biologist word we use for 'stable'.

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
(Not that I understand any Japanese but) could you resend your script as an attachment? I'm afraid it might get mangled otherwise. In the headers I see the following: Content-Type: text/plain; charset=ISO-2022-JP; format=flowed ... Content-Transfer-Encoding: 7bit and when I save the mess

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
However, I will need to stare at your example some more, since for simpler cases I think tr/// *is* obeying the 'use encoding': use encoding 'greek'; ($a = "\x{3af}bc\x{3af}de") =~ tr/\xdf/a/; print $a, "\n"; This does print "abcade\n", and it also works when I replace the \xdf with the literal
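The Greek case works because, under `use encoding 'greek'`, the byte \xdf in the tr/// list is read as ISO-8859-7, where 0xDF is U+03AF (GREEK SMALL LETTER IOTA WITH TONOS). A minimal sketch that gets the same result without the pragma (since deprecated) by decoding explicitly, assuming 'greek' resolves to ISO-8859-7:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the 'greek' encoding is ISO-8859-7, where byte 0xDF is U+03AF.
my $iota = decode('iso-8859-7', "\xdf");
die "unexpected mapping" unless $iota eq "\x{3af}";

# tr/// lists are fixed at compile time, so spell the character as \x{3af}
# instead of relying on a pragma to reinterpret the byte \xdf.
(my $s = "\x{3af}bc\x{3af}de") =~ tr/\x{3af}/a/;
print "$s\n";   # abcade
```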

Re: [FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Jarkko Hietaniemi
> As you see, tr/// is not subject to the magic of 'use encoding'. > jhi, have we made it so deliberately? I am beginning to think tr/// Not deliberately, no. I agree that making tr/// understand 'use encoding' would be good. > is happier to embrace the power thereof. > > Still, it can b

[FYI] use encoding 'non-utf8-encoding'; use CGI;

2002-10-02 Thread Dan Kogai
I am currently writing yet another CGI book. It is for the Japanese market and written in Japanese, so it is inevitable that you have to face the labyrinth of character encoding. Before Perl 5.8.0, most books taught how to handle Japanese in CGI as follows: * stick with EUC-JP. it do