On Fri, 2005-06-10 at 10:24 +1000, [EMAIL PROTECTED] wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
...
> I thought that libxml2 was widely accepted, used by gnome, etc.
> I checked the manpage and nowhere does it say "this parser sucks",
> maybe I should submit a documentation bug?

libxml2 can do it just fine: it allows overriding of the document encoding.
I was referring to the perl parser you are using. If that happens to be
bindings to libxml2, then they are incomplete.

> Hoping for a quick fix, I tried the expat based parser instead
> (which also has perl bindings) with the following program:
> ...
> Sadly, this parser gives output in a different format so changing
> parser has now broken the rest of my program *SOB*. On the wild
> chance of an undocumented feature I went back to the original libxml2
> based parser and tried inserting options from the expat bindings:

?? You mean the structured data you get back is different? Eeek!

...
> Frighteningly enough, this actually works...
>
> Woo hoo! I got XML to actually work!
>
> > convert the (probably cp-1252) text into utf-8, then parse it. or set an
> > encoding in the header, it looks like the perl bindings suck a certain
> > amount.
>
> By the looks of it, the bindings are better than the manpage
> is willing to admit. I still don't like XML because it is nutty
> that it should screw up so easily. My feeling is that if this
> sort of technology cannot make things EASIER to deal with then
> might as well go with something that does.

I think you are conflating the problem here. If you had an ASCII,
tab-delimited format and someone gave you an EBCDIC, tab-delimited format,
you'd have to tell your parser to use EBCDIC. GIGO - oldest rule in the book.

Rob

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
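
For illustration, the "convert the text, then parse it" suggestion quoted above can be sketched in a few lines. This is Python with its standard library rather than the perl bindings discussed in this thread, and it assumes the input really is cp-1252, which was only a guess. The function name and the sample bytes are made up for the example. Feeding the raw bytes straight to the parser would typically fail, because without an encoding declaration they are assumed to be UTF-8.

```python
import xml.etree.ElementTree as ET


def parse_cp1252_xml(raw: bytes) -> ET.Element:
    """Decode bytes ASSUMED to be cp-1252, then parse the decoded text."""
    # Step 1: tell Python what the bytes really are (the assumption).
    text = raw.decode("cp1252")
    # Step 2: parse the already-decoded text, so the parser does not
    # have to guess the byte encoding itself.
    return ET.fromstring(text)


# Hypothetical sample: no XML declaration, and the bytes 0x93/0x94 are
# cp-1252 curly quotes, which are not valid UTF-8 on their own.
sample = b"<note>\x93hello\x94</note>"
root = parse_cp1252_xml(sample)
print(root.tag, root.text)
```

The point is the same as the EBCDIC example: the parser can only do the right thing once something tells it which encoding the bytes are in.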
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html