I'm trying to upgrade from the old IBM SAX parser xml4j_1_1_14 to
xerces-1_4_1 and having a little trouble. Maybe someone can help.

-xerces-1_4_1 doesn't support UTF-16. Why not? And when might Xerces
support it?

-I used to get line numbers with Locator. Now I get nothing.

-xerces-1_4_1 seems to be dropping characters. For example, here is a
diff between my test output for xml4j_1_1_14 and xerces-1_4_1:

***************
*** 632,638 ****
                SLMA_OBJ_DTD_ID: 17
                SLMA_DTD_ID: 2
                SLMA_OBJTYP_ID: 10
!               MA_DTD_TOPLEVEL_ELM: DATE
        Tuple: SLMA_OBJ_DTD
                SLMA_OBJ_DTD_ID: 18
                SLMA_DTD_ID: 2
--- 633,639 ----
                SLMA_OBJ_DTD_ID: 17
                SLMA_DTD_ID: 2
                SLMA_OBJTYP_ID: 10
!               SLMA_DTD_TOPLEVEL_ELM: DATE
        Tuple: SLMA_OBJ_DTD
                SLMA_OBJ_DTD_ID: 18
                SLMA_DTD_ID: 2

"SLMA_DTD_TOPLEVEL_ELM" has become "MA_DTD_TOPLEVEL_ELM". There are
scores of such diffs in my test data. I did a little debugging and
came across something interesting:

SLMA_OBJTYP_ID 16295 14
10 16328 2
SL 16382 2
MA_DTD_TOPLEVEL_ELM 0 19
DATE 38 4

This is the output of the following line in my HandlerBase.characters
method implementation:

    System.err.println(new String(ch, start, length) + " " + start + " " + length);

The "SL" chunk ends at offset 16382 + 2 = 16384, which happens to be
2^14. So it looks like the characters array is getting cleaned out at
this point and the string that straddles the boundary is getting cut
in two. Here is the next occurrence of the problem (again, 16375 + 9 =
16384):

GetVersions 16312 11
SLMA_REQ_ 16375 9
DEFAULT_CLASS 0 13

Same effect. The correct value here is "SLMA_REQ_DEFAULT_CLASS".

The same place in the test output with xml4j_1_1_14 looks like this:

SLMA_OBJTYP_ID 0 14
10 0 2
SLMA_DTD_TOPLEVEL_ELM 0 21
DATE 0 4

It looks as if the implementation was changed from using a new
characters array for each stretch of character data to using one
shared array.
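
For what it's worth, the SAX documentation does say a parser may hand
one contiguous run of text to characters() in several chunks, so a
handler probably should not assume one call per value. Below is a
minimal sketch of a handler that buffers the chunks and only uses the
text at endElement. It is not my actual code: the class name
ChunkJoiningHandler and the getValues() method are made up for
illustration, and it extends the SAX2 DefaultHandler rather than
HandlerBase.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: joins text that arrives in several characters() calls.
class ChunkJoiningHandler extends DefaultHandler {
    // Text collected since the most recent start tag.
    private final StringBuilder text = new StringBuilder();
    // Completed, non-blank element values in document order.
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Simplification: discard any text seen before a child element starts.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy the chunk right away; the parser may reuse ch after this call returns.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the value known to be complete.
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            values.add(value);
        }
        text.setLength(0);
    }

    public List<String> getValues() {
        return values;
    }
}
```

With this, the two chunks "SLMA_REQ_" and "DEFAULT_CLASS" would be
joined into one value before the handler does anything with them. The
sketch ignores mixed content, where text and child elements are
interleaved inside one element.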

Thanks!
David
