On Thu, May 13, 2010 at 7:38 PM, Daniel Veillard <[email protected]> wrote:
>
> On Wed, May 12, 2010 at 03:15:02PM +0300, Alon Bar-Lev wrote:
> > Thank you for your comments.
> >
> > I found the first problem, the correct code page should be used in
> > order to interpret header, patch attached.
> >
> > But libxml still does not work; it cannot work in an EBCDIC environment.
> > After it converts the stream into UTF-8, it tries to parse it using
> > native literals.
> >
> > For example:
> > ---
> > if (c == 'a')
> > ---
> >
> > This will not work, as 'a' is in EBCDIC and it is compared to c, which
> > is UTF-8. Unlike with ANSI code pages, the character values differ
> > between UTF-8 (latin1) and EBCDIC.
>
>  Fujitsu used to have the problem, until they found a compiler
> switch to tell the compiler that the source had to be interpreted as
> ASCII, and then their problem was solved (this is from memory from
> half a decade ago).
>
> > I tried to use #pragma convert("ISO8859-1"), and also tried the
> > -qconvlit=ISO8859-1 compiler option, but both have too wide an effect
> > to solve this.
>
>  Triple check your compiler documentation, that's probably there.
> I don't understand what you meant by "too wide effect", I don't see
> why this would be a problem for compiling libxml2.
>
> > Correct solution is to use:
> > #define UTF8_CHARACTER_A '\x41'
> > #define UTF8_CHARACTER_LT '\x3c'
> >
> > And use these in the parsers.
>
>  Nahh. The real problem is that any form of text where the encoding
> is not made explicit or part of the metadata is broken. C is broken
> in this respect.
>  There are also many places where libxml2 code assumes things like
> a...z being stored in alphabetical order, etc. I'm all for portability,
> but only up to the point where it doesn't completely penalize
> maintainability or code efficiency.

Well...
If you do:
----
if (xml_utf_char == '<') {
   printf("We got <\n");
}
----

You need the '<' compared against xml_utf_char to be ASCII and the
message to be EBCDIC. This is what I call too wide an effect: I can
tell the compiler to treat *ALL* literals as ASCII, but then the
message breaks. OK, you can say that a library does not print
messages... but what about fopen(filename, "r")?

The correct solution is to have constants for characters whenever you
handle data in a specific encoding, and not rely on the C source file
encoding at all.
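
A minimal sketch of that approach follows. The XML_CP_* names and the
helper functions are hypothetical, made up for illustration; they are
not existing libxml2 identifiers. Parsed UTF-8 bytes are compared
against numeric code points, while strings meant for the host (messages,
fopen modes) stay ordinary native literals:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical constants: UTF-8/ASCII code points given by value, so they
 * do not depend on the charset the compiler uses for character literals. */
#define XML_CP_LT      0x3C  /* '<' */
#define XML_CP_GT      0x3E  /* '>' */
#define XML_CP_LOWER_A 0x61  /* 'a' */
#define XML_CP_LOWER_Z 0x7A  /* 'z' */

/* Returns 1 if the UTF-8 byte c is '<'. */
static int xml_is_tag_open(unsigned char c)
{
    return c == XML_CP_LT;
}

/* Returns 1 if the UTF-8 byte c is in a..z. The range test is safe here
 * because a..z are contiguous in ASCII/UTF-8, whatever the host charset. */
static int xml_is_ascii_lower(unsigned char c)
{
    return c >= XML_CP_LOWER_A && c <= XML_CP_LOWER_Z;
}

/* Compares by code point, but prints with a native literal so the message
 * comes out in the host charset. Returns 1 if it printed. */
static int xml_report_tag_open(unsigned char c)
{
    if (xml_is_tag_open(c)) {
        printf("We got <\n");
        return 1;
    }
    return 0;
}

/* Sanity checks that hold on ASCII and EBCDIC hosts alike. */
static void xml_cp_self_check(void)
{
    assert(xml_is_tag_open(0x3C));
    assert(!xml_is_tag_open(XML_CP_GT));
    assert(xml_is_ascii_lower(0x61));
}
```

With this, no global compiler switch is needed: only the comparisons
against parsed data use fixed values, and everything else keeps the
native encoding.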

Alon.
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
