DO NOT REPLY [Bug 5801] - Automatically insertion of new characters while parsing XML file using SAX

bugzilla Mon, 14 Jan 2002 21:22:38 -0800

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5801>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


------- Additional Comments From [EMAIL PROTECTED]  2002-01-14 21:25 -------
Hi,
 After going through the source code and doing some additional debugging, I have
come to the conclusion that the parser is not inserting a new line. What is
happening is this: Xerces converts the input XML file into a linked list of
chunks of 16K each, and the input file I am using is more than 50K in size.
While processing the file, the data for a single XML tag can end up spread over
multiple chunks. Say the tag PARAM has the value SUBNETWORK, with SUB in CHUNK1
and NETWORK in CHUNK2. When the value of PARAM is handed to me through the
callback function characters, I get NETWORK rather than SUBNETWORK. So when the
value of a tag is distributed across chunks, only the part that sits in the last
chunk is returned. The expected result is that the parser combines the parts
across the chunks and returns the combined value as the value of the XML tag.
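
To make the chunk boundary effect concrete, here is a small stand-alone sketch
of it. It does not use Xerces at all: ChunkedDeliveryDemo, CharSink, deliver and
pieces are hypothetical names made up for this illustration, and the chunk size
is 10 instead of 16K so that the boundary falls inside SUBNETWORK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reader that stores its input in fixed-size
// chunks. None of these names come from Xerces.
class ChunkedDeliveryDemo {

    /** Receives one piece of character data, like a characters callback. */
    interface CharSink {
        void characters(char[] ch, int start, int length);
    }

    /**
     * Delivers text[offset, offset + length) to the sink, splitting the data
     * at every multiple of chunkSize, the way a chunk-based reader might.
     */
    static void deliver(char[] text, int offset, int length, int chunkSize, CharSink sink) {
        int pos = offset;
        int remaining = length;
        while (remaining > 0) {
            int roomInChunk = chunkSize - (pos % chunkSize);
            int n = Math.min(remaining, roomInChunk);
            sink.characters(text, pos, n);
            pos += n;
            remaining -= n;
        }
    }

    /** Returns the separate pieces in which the sink sees the given value. */
    static List<String> pieces(String doc, String value, int chunkSize) {
        List<String> out = new ArrayList<>();
        int offset = doc.indexOf(value);
        deliver(doc.toCharArray(), offset, value.length(), chunkSize,
                (ch, start, len) -> out.add(new String(ch, start, len)));
        return out;
    }

    public static void main(String[] args) {
        // "<PARAM>" is 7 characters long, so with a chunk size of 10 the
        // first chunk ends right after "SUB".
        System.out.println(pieces("<PARAM>SUBNETWORK</PARAM>", "SUBNETWORK", 10));
    }
}
```

A handler that simply stores whatever piece it received last would end up with
NETWORK here, which matches what I am seeing.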

I found that the problem is in the file
org/apache/xerces/readers/AbstractCharReader.java, in the function
callCharDataHandler. The first part of the function handles the case where the
data is in a single chunk, and the second part handles data spread across
chunks. In the second part, instead of calling
fCharDataHandler.processCharacters(dataChunk.toCharArray(), index, nbytes);
for each chunk in the linked list, the function should create a temporary
char[], copy all the data spread across the chunks into it, and then call
fCharDataHandler.processCharacters once on that temporary char[], after the
do {} while (count > 0); loop in callCharDataHandler has finished.


By using a temporary char[] in the Xerces source code I have been able to work
around my problem for now.

I would like to clarify whether combining the characters across the 16K chunks
is the responsibility of the parser or of the application that uses the parser.

If it is the parser's, then this really is a bug. If it is the application's,
then this is expected behaviour, but I still feel that the parser should be the
one to take care of the merging and give me a single value for a given tag.
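
For comparison, the application-side alternative can be sketched as follows. As
far as I can tell from the SAX documentation for ContentHandler.characters, a
parser is allowed to return contiguous character data in a single chunk or to
split it into several chunks, so a handler can protect itself by buffering. This
is only a minimal sketch: ParamCollector and parseParam are names made up for
the example, it uses the JAXP SAXParserFactory rather than the Xerces classes
directly, and it assumes PARAM contains only text (no child elements).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that appends every characters() call to a buffer and
// reads the buffer only when the element ends.
class ParamCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String lastParam;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start a fresh buffer for each element (assumes no mixed content).
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one value, so append, never overwrite.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("PARAM".equals(qName)) {
            lastParam = text.toString();
        }
    }

    /** Parses the XML string and returns the text of the last PARAM element. */
    static String parseParam(String xml) {
        try {
            ParamCollector handler = new ParamCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lastParam;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

With a handler like this the value comes out whole no matter how many pieces the
parser delivers it in, at the cost of one buffer per handler.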

Original Code:

private void callCharDataHandler(int offset, int endOffset, boolean isWhitespace) throws Exception
{
        // ... (first part of the function, for data in a single chunk, not shown) ...

        //
        // The data is spread across chunks.
        //
        int i=0;
        int count = length;
        int nbytes = CharDataChunk.CHUNK_SIZE - index;
        if (isWhitespace)
            fCharDataHandler.processWhitespace(dataChunk.toCharArray(), index, nbytes);
        else
        {
            fCharDataHandler.processCharacters(dataChunk.toCharArray(), index, nbytes);
        }
        count -= nbytes;

        //
        // Use each Chunk in turn until we are done.
        //
        do {
            dataChunk = dataChunk.nextChunk();
            if (dataChunk == null) {
                throw new RuntimeException(new ImplementationMessages().createMessage(
                        null, ImplementationMessages.INT_DCN, 0, null));
            }
            nbytes = count <= CharDataChunk.CHUNK_SIZE ? count : CharDataChunk.CHUNK_SIZE;
            if (isWhitespace)
                fCharDataHandler.processWhitespace(dataChunk.toCharArray(), 0, nbytes);
            else
            {
                fCharDataHandler.processCharacters(dataChunk.toCharArray(), 0, nbytes);
            }
            count -= nbytes;
        } while (count > 0);
}






Modified Code: (temporary fix)

private void callCharDataHandler(int offset, int endOffset, boolean isWhitespace) throws Exception
{
        // ... (first part of the function, for data in a single chunk, not shown) ...

        //
        // The data is spread across chunks.
        //
        char [] myChar1=new char[CharDataChunk.CHUNK_SIZE];
        char [] myChar2=new char[length+1];   // collects the data from all chunks
        int i=0;                              // number of chars copied into myChar2
        int count = length;
        int nbytes = CharDataChunk.CHUNK_SIZE - index;
        if (isWhitespace)
            fCharDataHandler.processWhitespace(dataChunk.toCharArray(), index, nbytes);
        else
        {
            //fCharDataHandler.processCharacters(dataChunk.toCharArray(), index, nbytes);
            myChar1=dataChunk.toCharArray();
            for(i=0;i<nbytes;i++)
                myChar2[i]=myChar1[i+index];
        }
        count -= nbytes;

        //
        // Use each Chunk in turn until we are done.
        //
        do {
            dataChunk = dataChunk.nextChunk();
            if (dataChunk == null) {
                throw new RuntimeException(new ImplementationMessages().createMessage(
                        null, ImplementationMessages.INT_DCN, 0, null));
            }
            nbytes = count <= CharDataChunk.CHUNK_SIZE ? count : CharDataChunk.CHUNK_SIZE;
            if (isWhitespace)
                fCharDataHandler.processWhitespace(dataChunk.toCharArray(), 0, nbytes);
            else
            {
                //fCharDataHandler.processCharacters(dataChunk.toCharArray(), 0, nbytes);
                char[] myChar3=dataChunk.toCharArray();
                for(int j=0;j<nbytes;j++,i++)
                    myChar2[i]=myChar3[j];
            }
            count -= nbytes;
        } while (count > 0);
        // Single call with the merged data, after all chunks have been copied.
        fCharDataHandler.processCharacters(myChar2, 0, i);
}
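
The copying step of the temporary fix can also be looked at in isolation. The
sketch below is not Xerces code: Chunk, merge and chunksOf are hypothetical
names, and the chunk is a bare char[] with a next pointer rather than the real
CharDataChunk. It shows the same idea (gather a span that crosses chunk
boundaries into one array) using System.arraycopy instead of hand-written loops.

```java
// Hypothetical illustration of merging a span of characters that crosses
// chunk boundaries. Not the Xerces CharDataChunk API.
class ChunkMergeSketch {

    /** Minimal singly linked chunk holding part of the input. */
    static final class Chunk {
        final char[] data;
        Chunk next;
        Chunk(char[] data) { this.data = data; }
    }

    /**
     * Copies length chars, starting at position index inside the first chunk
     * and continuing through the following chunks, into one new array.
     */
    static char[] merge(Chunk first, int index, int length) {
        char[] merged = new char[length];
        int copied = 0;
        int from = index;          // start offset applies to the first chunk only
        Chunk chunk = first;
        while (copied < length) {
            if (chunk == null) {
                throw new IllegalStateException("ran out of chunks");
            }
            int n = Math.min(length - copied, chunk.data.length - from);
            System.arraycopy(chunk.data, from, merged, copied, n);
            copied += n;
            chunk = chunk.next;
            from = 0;
        }
        return merged;
    }

    /** Splits text into a linked list of chunks of at most chunkSize chars. */
    static Chunk chunksOf(String text, int chunkSize) {
        Chunk head = null;
        Chunk tail = null;
        for (int i = 0; i < text.length(); i += chunkSize) {
            int end = Math.min(text.length(), i + chunkSize);
            Chunk c = new Chunk(text.substring(i, end).toCharArray());
            if (head == null) {
                head = c;
            } else {
                tail.next = c;
            }
            tail = c;
        }
        return head;
    }
}
```

One thing I noticed while writing this: in the modified code above, the final
processCharacters call is also reached when isWhitespace is true, in which case
i is still 0 and the handler gets an empty call. Wrapping that last call in
if (!isWhitespace) would probably be cleaner.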
