I'm trying to upgrade from the old IBM SAX parser xml4j_1_1_14 to
xerces-1_4_1 and having a little trouble. Maybe someone can help.

-xerces-1_4_1 doesn't support UTF-16. Why not? And when might Xerces
support it?
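In the meantime, one possible workaround is to decode the bytes in Java and hand the parser a character stream. This assumes the parser accepts a SAX `InputSource` built from a `Reader`. The `InputSource` documentation says that when a character stream is available, the parser reads it directly and disregards the encoding declaration. The sketch below runs against the JDK's bundled SAX parser, so it only illustrates the technique and has not been checked against xerces-1_4_1. `Utf16Workaround` and `rootElement` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: parse UTF-16 input by decoding it in Java first. */
public class Utf16Workaround {

    /** Returns the name of the root element of a UTF-16 encoded document. */
    public static String rootElement(InputStream utf16Bytes) throws Exception {
        // StandardCharsets.UTF_16 consumes a byte-order mark if present
        // (and assumes big-endian if there is none).
        Reader reader = new InputStreamReader(utf16Bytes, StandardCharsets.UTF_16);
        // Given a character stream, the parser reads characters directly and
        // ignores the encoding named in the XML declaration.
        InputSource source = new InputSource(reader);
        final String[] root = new String[1];
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // first start tag is the root element
                }
            }
        });
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><doc>text</doc>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_16); // BOM + big-endian
        System.out.println(rootElement(new ByteArrayInputStream(bytes)));
    }
}
```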

-With xml4j_1_1_14 I used to get line numbers from the Locator. With xerces-1_4_1 I get nothing.
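For reference, this is the usual SAX pattern for line numbers. The parser passes a `Locator` to `setDocumentLocator()` before any other event, and the handler queries it from later callbacks. SAX strongly encourages parsers to supply a locator but does not require it, so one thing worth checking is whether `setDocumentLocator()` is called at all. If it is called, `getLineNumber()` returns -1 when no line information is available. This sketch uses the SAX2 `DefaultHandler` bundled with the JDK instead of the SAX1 `HandlerBase`. `LocatorDemo` and `elementLines` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that records the line on which each start tag is reported. */
public class LocatorDemo extends DefaultHandler {
    private Locator locator; // stays null if the parser never supplies one
    private final List<String> report = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        // Called before any other event; keep the reference for later callbacks.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // -1 means "no line information", either from the locator or because none was given.
        int line = (locator != null) ? locator.getLineNumber() : -1;
        report.add(qName + "@" + line);
    }

    /** Parses the XML text and returns "element@line" entries in document order. */
    public static List<String> elementLines(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        LocatorDemo handler = new LocatorDemo();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.report;
    }

    public static void main(String[] args) throws Exception {
        // Prints [doc@1, item@2, item@3]
        System.out.println(elementLines("<doc>\n  <item/>\n  <item/>\n</doc>"));
    }
}
```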

-xerces-1_4_1 seems to be dropping characters. For example, here is a
diff between my test output for xml4j_1_1_14 and xerces-1_4_1:

***************
*** 632,638 ****
                SLMA_OBJ_DTD_ID: 17
                SLMA_DTD_ID: 2
                SLMA_OBJTYP_ID: 10
!               MA_DTD_TOPLEVEL_ELM: DATE
        Tuple: SLMA_OBJ_DTD
                SLMA_OBJ_DTD_ID: 18
                SLMA_DTD_ID: 2
--- 633,639 ----
                SLMA_OBJ_DTD_ID: 17
                SLMA_DTD_ID: 2
                SLMA_OBJTYP_ID: 10
!               SLMA_DTD_TOPLEVEL_ELM: DATE
        Tuple: SLMA_OBJ_DTD
                SLMA_OBJ_DTD_ID: 18
                SLMA_DTD_ID: 2

"SLMA_DTD_TOPLEVEL_ELM" has become "MA_DTD_TOPLEVEL_ELM". There are
scores of such diffs in my test data. I did a little debugging and
came across something interesting:

SLMA_OBJTYP_ID 16295 14
10 16328 2
SL 16382 2
MA_DTD_TOPLEVEL_ELM 0 19
DATE 38 4

This is the result of this line in my HandlerBase.characters method
implementation:

    System.err.println(new String(ch, start, length) + " " + start + " " + length);

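16384 happens to be 2^14, and "SL" starts at offset 16382 with length
2, so it ends exactly at that boundary. The rest of the name,
"MA_DTD_TOPLEVEL_ELM", then arrives in a separate call starting at
offset 0. So it looks like the character array is being refilled at
this point, and a string that straddles the boundary is being cut in
two. Here is the next occurrence of the problem: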

GetVersions 16312 11
SLMA_REQ_ 16375 9
DEFAULT_CLASS 0 13

Same effect: 16375 + 9 = 16384, and the rest of the name starts again
at offset 0.
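The SAX documentation for `characters()` allows a parser to return contiguous character data in one chunk or split it into several. The debug output above suggests the characters do arrive, just in two calls. If that is the cause, a common pattern is to append every chunk to a buffer and use the text only at the next element boundary. Below is a minimal sketch of that pattern, written against the SAX2 `DefaultHandler` rather than `HandlerBase`. `TextCollector` and its methods are hypothetical names. The first part of `main` calls the handler by hand with the "SL" / "MA_DTD_TOPLEVEL_ELM" split seen above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: joins text split across several characters() calls. */
public class TextCollector extends DefaultHandler {
    private final StringBuilder pending = new StringBuilder(); // text since the last tag
    private final List<String> texts = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start .. start+length) is valid, and the parser may reuse
        // the array after this call, so copy the chunk out immediately.
        pending.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    /** Emits the buffered text (skipping whitespace-only runs) and clears the buffer. */
    private void flush() {
        String text = pending.toString();
        if (!text.trim().isEmpty()) {
            texts.add(text);
        }
        pending.setLength(0);
    }

    public List<String> texts() {
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Simulate the split from the debug output: "SL" then "MA_DTD_TOPLEVEL_ELM".
        TextCollector manual = new TextCollector();
        manual.startElement("", "", "name", null);
        manual.characters("SL".toCharArray(), 0, 2);
        manual.characters("MA_DTD_TOPLEVEL_ELM".toCharArray(), 0, 19);
        manual.endElement("", "", "name");
        System.out.println(manual.texts()); // [SLMA_DTD_TOPLEVEL_ELM]

        // The same handler driven by a real parser.
        TextCollector parsed = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader("<t><n>DATE</n></t>")), parsed);
        System.out.println(parsed.texts()); // [DATE]
    }
}
```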

Thanks!
David
