Vajramatti Shashidhar (DS/EES1)
> Sent: Friday, October 26, 2007 12:03 PM
> To: 'perl-unicode@perl.org'
> Subject: Utf8 encoding
>
> Hello,
> I am parsing an xml file using libxml2. The xml file has umlauts (German
> keys ü/ö/ä etc.), ° (degree), etc. as the characters.
Vajramatti Shashidhar (DS/EES1) skribis 2007-10-26 12:02 (+0200):
> my $parser = "";
> my $doc = "";
> $parser = XML::LibXML->new();
> $doc = $parser->parse_file( $x_file );
You should combine these for nicer code:
my $parser = XML::LibXML->new();
my $doc = $par
Hello,
I am parsing an xml file using libxml2. The xml file has umlauts (German
keys ü/ö/ä etc.), ° (degree), etc. as the characters.
Could someone tell me how to encode such an xml to utf8?
I get the below error:
"parser error : Input is not proper UTF-8, indicate en
On Wed, Oct 02, 2002 at 10:44:06PM +0900, Dan Kogai wrote:
> On Wednesday, Oct 2, 2002, at 22:34 Asia/Tokyo, Jarkko Hietaniemi wrote:
> >>Yes. that's where hiragana -> katakana conversion is attempted;
> >>English equivalent of tr/A-Z/a-z/.
> >
> >Okay... What are the {begin,end} codepoints of th
On Wednesday, Oct 2, 2002, at 22:34 Asia/Tokyo, Jarkko Hietaniemi wrote:
>> Yes. that's where hiragana -> katakana conversion is attempted;
>> English equivalent of tr/A-Z/a-z/.
>
> Okay... What are the {begin,end} codepoints of those ranges,
> both LHS and RHS of tr, both in EUC-JP and in Unicod
> I can explain that. "\x{3af}bc\x{3af}de" is a string literal so
> it gets encoded. however, my example in escaped form is;
>
> $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/
>
> which does not get encoded. the intention was;
>
> $kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/
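The code-point form of the range quoted above can be tried on its own. This is a minimal sketch, assuming a perl 5.8 or later and no 'use encoding' pragma; the sample string and the variable name $kata are made up for illustration and do not come from the original script.

```perl
use strict;
use warnings;

# Hiragana U+3041..U+3093 sit exactly 0x60 below katakana U+30A1..U+30F3,
# so a single tr/// range pair maps one block onto the other.
# Sample input (an assumption): the word "hiragana" written in hiragana.
my $kana = "\x{3072}\x{3089}\x{304c}\x{306a}";

# Copy first, then transliterate the copy, leaving $kana untouched.
(my $kata = $kana) =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/;

# Encode explicitly on output so the wide characters are written as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print $kata, "\n";
```

Because the ranges are written as \x{...} code points, the transliteration works on characters rather than on the EUC-JP byte pairs of the escaped form.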
> >Are you doing character ranges in the tr/// under 'use encoding'?
> >(I'm asking because I see a "-" in the middle of what I assume is
> >mangled EUC-JP)
>
> Yes. that's where hiragana -> katakana conversion is attempted;
> English equivalent of tr/A-Z/a-z/.
Okay... What are the {begin,end
On Wednesday, Oct 2, 2002, at 21:51 Asia/Tokyo, Jarkko Hietaniemi wrote:
> However, I will need to stare at your example some more, since
> for simpler cases I think tr/// *is* obeying the 'use encoding':
>
> use encoding 'greek';
> ($a = "\x{3af}bc\x{3af}de") =~ tr/\xdf/a/;
> print $a, "\n";
>
>
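For readers without a Greek terminal, the relationship between \xdf and \x{3af} in the example above can be checked with the Encode module instead of the 'use encoding' pragma. This is only a sketch under that assumption; the variable names are illustrative, and 'iso-8859-7' is the charset that Encode's 'greek' alias refers to.

```perl
use strict;
use warnings;
use Encode qw(decode);

# In ISO-8859-7 the byte 0xDF is GREEK SMALL LETTER IOTA WITH TONOS (U+03AF).
my $char = decode('iso-8859-7', "\xdf");
printf "byte 0xDF decodes to U+%04X\n", ord $char;

# With the code point spelled out on both sides, tr/// needs no pragma at all.
(my $s = "\x{3af}bc\x{3af}de") =~ tr/\x{3af}/a/;
print $s, "\n";
```

Spelling the search list as a code point sidesteps the question of whether tr/// honours 'use encoding', at the cost of being less readable than the literal character.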
On Wednesday, Oct 2, 2002, at 22:15 Asia/Tokyo, Jarkko Hietaniemi wrote:
> (Hi, it's me again...)
>
> Are you doing character ranges in the tr/// under 'use encoding'?
> (I'm asking because I see a "-" in the middle of what I assume is
> mangled EUC-JP)
Yes. that's where hiragana -> katakana conv
(Hi, it's me again...)
Are you doing character ranges in the tr/// under 'use encoding'?
(I'm asking because I see a "-" in the middle of what I assume is
mangled EUC-JP)
--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.
(Not that I understand any Japanese but) could you resend your script
as an attachment? I'm afraid it might get mangled otherwise. In the
headers I see the following:
Content-Type: text/plain; charset=ISO-2022-JP; format=flowed
...
Content-Transfer-Encoding: 7bit
and when I save the mess
However, I will need to stare at your example some more, since
for simpler cases I think tr/// *is* obeying the 'use encoding':
use encoding 'greek';
($a = "\x{3af}bc\x{3af}de") =~ tr/\xdf/a/;
print $a, "\n";
This does print "abcade\n", and it also works when I replace the \xdf
with the literal
> As you see, tr/// is not subject to the magic of 'use encoding'.
> jhi, have we made it so deliberately? I am beginning to think tr///
Not deliberately, no. I agree that making tr/// understand
'use encoding' would be good.
> is happier to embrace the power thereof.
>
> Still, it can b
I am currently writing yet another CGI book. That is for the Japanese
market and written in Japanese. So it is inevitable that you have to
face the labyrinth of character encoding.
Before perl 5.8.0, most books that teach how to handle Japanese in CGI
go as follows:
* stick with EUC-JP. it do