On Thu, May 13, 2010 at 07:46:53PM +0300, Alon Bar-Lev wrote:
> On Thu, May 13, 2010 at 7:38 PM, Daniel Veillard <veill...@redhat.com> wrote:
> >
> > On Wed, May 12, 2010 at 03:15:02PM +0300, Alon Bar-Lev wrote:
> > > Thank you for your comments.
> > >
> > > I found the first problem, the correct code page should be used in
> > > order to interpret header, patch attached.
> > >
> > > But libxml still does not work; it cannot work in an EBCDIC
> > > environment. It converts the stream into UTF-8, then tries to
> > > parse it using native literals.
> > >
> > > For example:
> > > ---
> > > if (c == 'a')
> > > ---
> > >
> > > Will not work, as 'a' is in EBCDIC and it is compared to c which is
> > > UTF-8. Unlike ANSI, the character value is different between UTF-8
> > > (latin1) and EBCDIC.
> >
> >  Fujitsu used to have the problem, until they found a compiler
> > switch to tell the compiler that the source had to be interpreted as
> > ASCII, and then their problem was solved (this is from memory from
> > half a decade ago).
> >
> > > I tried to use #pragma convert("ISO8859-1"), and also tried to use
> > > -qconvlit=ISO8859-1 compiler option, but both have too wide effect in
> > > order to solve this.
> >
> >  Triple check your compiler documentation, that's probably there.
> > I don't understand what you meant by "too wide effect", I don't see
> > why this would be a problem for compiling libxml2.
> >
> > > Correct solution is to use:
> > > #define UTF8_CHARACTER_A '\x41'
> > > #define UTF8_CHARACTER_GT '\x3e'
> > >
> > > And use these in the parsers.
> >
> >  Nahh. Correct solution is that any form of text where the encoding
> > is not made explicit or part of the metadata is broken. C is broken
> > from this respect.
> >  There are also many places where libxml2 code assumes things like
> > a...z being stored in alphabetical order, etc. I'm all for
> > portability, but only up to the point where it doesn't completely
> > penalize maintainability or code efficiency.
> 
> Well...
> If you do:
> ----
> if (xml_utf_char == '<') {
>    printf("We got <\n");
> }
> ----
> 
> You need xml_utf_char to be compared as ASCII and the message to be
> printed as EBCDIC. This is what I call too wide an effect: I can tell
> the compiler to treat *ALL* literals as ASCII, but it won't work.
> OK, you can say that a library does not print messages. But what
> about fopen(filename, "r")?
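
The conflict described in the quoted text can be sketched as below. This
is only an illustration, not libxml2 code: the names XML_CP_LT,
is_tag_open and report_if_tag_open are hypothetical. The parser-side
comparison uses a numeric code point, so it does not depend on how the
compiler encodes character literals, while the message literal is left
in whatever execution character set the platform uses.

```c
#include <stdio.h>

/* Hypothetical constant: the ASCII/UTF-8 code point of '<'. */
#define XML_CP_LT 0x3C

/* Compare a byte of UTF-8 input against a fixed code point.
 * The result is the same whether the compiler's execution
 * character set is ASCII or EBCDIC. */
static int is_tag_open(unsigned char utf8_byte)
{
    return utf8_byte == XML_CP_LT;
}

/* The message is a native literal, so it is emitted in the
 * platform's own encoding. Returns 1 if a message was printed. */
static int report_if_tag_open(unsigned char utf8_byte)
{
    if (is_tag_open(utf8_byte)) {
        printf("We got a tag-open character\n");
        return 1;
    }
    return 0;
}
```

With this split, a byte 0x4C (which is '<' in common EBCDIC code pages)
is not mistaken for a tag opener in UTF-8 input, and no global literal
conversion switch is needed for the comparison itself.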

  I would expect the compiler, when told to use ASCII, to actually
take care of converting the parts where it would clearly cause a
problem. I think the hardware and software stack you're using is
expensive enough that, as a customer, you could ask for it to be a bit
smart when trying to cope with modern code.
  W.r.t. error handling, if you assume the error is being seen/analyzed
on an OS/390 terminal, well, condolences ... even in 1990 the AIX 2.x or
3.x boxes next to it were so much friendlier that most people preferred
to use them rather than the mainframe terminals.

  To quote a colleague on the issue "it's seriously time for OS/390 to
quit stealing time away from real development". 
  If you have a patch, I would carry it around as part of the tarballs
but considering Fujitsu managed to solve the problem half a decade ago
for their usage, I think something is doable without completely
revamping the core code.
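
On the a...z ordering assumption mentioned in the quoted text: in
EBCDIC the lowercase letters are not contiguous (a-i, j-r and s-z sit
in three separate ranges), so a range test written with native 'a' and
'z' literals behaves differently there. A minimal sketch of a range
test pinned to ASCII/UTF-8 code points follows; the names
XML_CP_LOWER_A, XML_CP_LOWER_Z and is_utf8_lower are hypothetical and
not part of libxml2.

```c
/* Hypothetical constants: ASCII/UTF-8 code points of 'a' and 'z'. */
#define XML_CP_LOWER_A 0x61
#define XML_CP_LOWER_Z 0x7A

/* Range test on a byte of UTF-8 input. It relies only on the
 * ASCII layout of the input, not on the compiler's encoding of
 * 'a' and 'z'. */
static int is_utf8_lower(unsigned char b)
{
    return b >= XML_CP_LOWER_A && b <= XML_CP_LOWER_Z;
}
```

Whether such constants are worth the loss in readability compared with
plain literals plus a compiler switch is exactly the trade-off under
discussion here.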

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
