On Fri, 2005-06-10 at 10:24 +1000, [EMAIL PROTECTED] wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
...
> I thought that libxml2 was widely accepted, used by gnome, etc.
> I checked the manpage and nowhere does it say "this parser sucks",
> maybe I should submit a documentation bug?

libxml2 can do it just fine: it allows overriding the document encoding.
I was referring to the Perl parser you are using. If that happens to be
bindings to libxml2, then they are incomplete.
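
To illustrate what "overriding the document encoding" means, here is a
minimal sketch. It uses Python's standard xml.etree.ElementTree rather than
the Perl bindings discussed in this thread, so treat it as an analogy only.
The sample document, the helper names and the choice of ISO-8859-1 are all
assumptions made for the example; the real file was only guessed to be
cp-1252.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: Latin-1 bytes with no encoding declaration, so a
# parser left to its defaults will assume UTF-8.
RAW = "<note>café</note>".encode("iso-8859-1")


def parse_with_override(data, encoding):
    """Parse bytes, telling the parser which encoding to use."""
    # The encoding argument overrides whatever the document declares.
    parser = ET.XMLParser(encoding=encoding)
    parser.feed(data)
    return parser.close()


def parses_with_defaults(data):
    """Return True if the bytes parse without any encoding hint."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses_with_defaults(RAW))
    print(parse_with_override(RAW, "iso-8859-1").text)
```

The same bytes are rejected without the hint and accepted with it, which is
the behaviour the bindings need to expose.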

> Hoping for a quick fix, I tried the expat based parser instead
> (which also has perl bindings) with the following program:
> 
...
> Sadly, this parser gives output in a different format so changing
> parser has now broken the rest of my program *SOB*. On the wild
> chance of an undocumented feature I went back to the original libxml2
> based parser and tried inserting options from the expat bindings:

You mean the structured data you get back is different? Eeek!
...
> Frighteningly enough, this actually works... 
> 
> Woo hoo! I got XML to actually work!
>  
> > convert the (probably cp-1252) text into utf-8, then parse it. Or set an
> > encoding in the header. It looks like the perl bindings suck a certain
> > amount.
> 
> By the looks of it, the bindings are better than the manpage
> is willing to admit. I still don't like XML because it is nutty
> that it should screw up so easily. My feeling is that if this
> sort of technology cannot make things EASIER to deal with then
> might as well go with something that does.

I think you are conflating two problems here: the data format and its
character encoding. If you had an ASCII, tab-delimited format and someone
gave you an EBCDIC, tab-delimited file, you'd have to tell your parser to
use EBCDIC.
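
The tab-delimited analogy can be sketched in a few lines of Python. The
sample rows and the parse_tsv helper are made up for illustration, and cp037
is just one of several EBCDIC code pages that Python ships a codec for. The
point is only that the caller has to name the encoding; the format itself
cannot tell you.

```python
# Hypothetical sample data; any tab-delimited text would do.
ROWS = "name\tcity\nrob\tsydney\n"

# The same records stored as EBCDIC (code page 037) bytes.
EBCDIC_BYTES = ROWS.encode("cp037")


def parse_tsv(data, encoding):
    """Decode bytes with the given encoding, then split into fields."""
    text = data.decode(encoding)
    return [line.split("\t") for line in text.splitlines()]


def fails_as_ascii(data):
    """Return True if the bytes cannot be read as ASCII."""
    try:
        parse_tsv(data, "ascii")
        return False
    except UnicodeDecodeError:
        return True


if __name__ == "__main__":
    print(fails_as_ascii(EBCDIC_BYTES))
    print(parse_tsv(EBCDIC_BYTES, "cp037"))
```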

GIGO - oldest rule in the book.

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.


-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
