http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5801

Automatically insertion of new characters while parsing XML file using SAX

------- Additional Comments From [EMAIL PROTECTED] 2002-01-14 21:25 -------

Hi,

After going through the source code and doing some additional debugging, I have come to the conclusion that the parser is not inserting a newline. What is actually happening is the following.

Xerces converts the input XML file into a linked list of chunks, each 16K in size, and the input file I am using is more than 50K in size. While processing this file, if the data for an XML tag is located in multiple chunks, the value is reported incorrectly. For example, take the tag PARAM with the value SUBNETWORK: if SUB is in CHUNK1 and NETWORK is in CHUNK2, then the value I get for PARAM through the callback function characters is NETWORK rather than SUBNETWORK.

So when the value of a particular tag is distributed across chunks, only the part that is in the last chunk is returned. The expected result is that the parser combines the pieces from all the chunks and returns the combined value as the value of the XML tag.

I found that the problem is in the file org/apache/xerces/readers/AbstractCharReader.java, in the function callCharDataHandler. The first part of the function handles the case where the data is in a single chunk, and the second part handles data spread across chunks. In the second part, instead of calling

    fCharDataHandler.processCharacters(dataChunk.toCharArray(), index, nbytes);

for each chunk in the linked list, the function should create a temporary char[], copy the data from all the chunks into it, and then call fCharDataHandler.processCharacters once with the temporary array, after the do {} while (count > 0); loop is over.
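The merge described above can be sketched in isolation as follows. This is only a minimal illustration, not Xerces code: the Chunk class, the split and merge helpers, and the chunk size of 10 are all hypothetical stand-ins, chosen so that SUBNETWORK straddles two chunks.

```java
/** Hypothetical stand-in for a chunked reader buffer; NOT the Xerces CharDataChunk class. */
final class Chunk {
    final char[] data;
    Chunk next;

    Chunk(char[] data) {
        this.data = data;
    }
}

final class ChunkMergeSketch {
    /** Splits text into a linked list of chunks holding at most chunkSize characters each. */
    static Chunk split(String text, int chunkSize) {
        Chunk head = null;
        Chunk tail = null;
        for (int pos = 0; pos < text.length(); pos += chunkSize) {
            int end = Math.min(pos + chunkSize, text.length());
            Chunk c = new Chunk(text.substring(pos, end).toCharArray());
            if (head == null) {
                head = c;
            } else {
                tail.next = c;
            }
            tail = c;
        }
        return head;
    }

    /** Copies length characters, starting at index in the first chunk, into one array. */
    static char[] merge(Chunk first, int index, int length) {
        char[] out = new char[length];
        int copied = 0;
        int offset = index;          // only the first chunk is read from a non-zero offset
        Chunk chunk = first;
        while (copied < length) {
            if (chunk == null) {
                throw new IllegalStateException("ran out of chunks");
            }
            int n = Math.min(chunk.data.length - offset, length - copied);
            System.arraycopy(chunk.data, offset, out, copied, n);
            copied += n;
            chunk = chunk.next;
            offset = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        // With a chunk size of 10, "SUB" ends the first chunk and "NETWORK" starts the second.
        Chunk head = split("<PARAM>SUBNETWORK</PARAM>", 10);
        System.out.println(new String(merge(head, 7, 10))); // prints SUBNETWORK
    }
}
```

The handler would then be called once with the merged array instead of once per chunk.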
With a temporary char[] in the Xerces source code I am able to work around my problem for now. I just want to clarify whether combining the characters across the 16K chunks is the parser's responsibility or that of the application using the parser. If it is the former, then this really is a bug; if it is the latter, then this is expected behaviour, but I still feel that the parser should be the one taking care of the merging and giving me a single value for a given tag.

Original Code:

    private void callCharDataHandler(int offset, int endOffset, boolean isWhitespace) throws Exception
    {
        // ... (first part of the function, handling data in a single chunk, not shown)

        //
        // The data is spread across chunks.
        //
        int i = 0;
        int count = length;
        int nbytes = CharDataChunk.CHUNK_SIZE - index;
        if (isWhitespace)
            fCharDataHandler.processWhitespace(dataChunk.toCharArray(), index, nbytes);
        else {
            fCharDataHandler.processCharacters(dataChunk.toCharArray(), index, nbytes);
        }
        count -= nbytes;
        //
        // Use each Chunk in turn until we are done.
        //
        do {
            dataChunk = dataChunk.nextChunk();
            if (dataChunk == null) {
                throw new RuntimeException(new ImplementationMessages().createMessage(null, ImplementationMessages.INT_DCN, 0, null));
            }
            nbytes = count <= CharDataChunk.CHUNK_SIZE ? count : CharDataChunk.CHUNK_SIZE;
            if (isWhitespace)
                fCharDataHandler.processWhitespace(dataChunk.toCharArray(), 0, nbytes);
            else {
                fCharDataHandler.processCharacters(dataChunk.toCharArray(), 0, nbytes);
            }
            count -= nbytes;
        } while (count > 0);
    }

Modified Code: (temporary fix)

    private void callCharDataHandler(int offset, int endOffset, boolean isWhitespace) throws Exception
    {
        // ... (first part of the function, handling data in a single chunk, not shown)

        //
        // The data is spread across chunks.
        //
        char[] myChar1 = new char[CharDataChunk.CHUNK_SIZE];
        char[] myChar2 = new char[length + 1];   // temporary buffer for the merged data
        int i = 0;                               // number of characters copied into myChar2
        int count = length;
        int nbytes = CharDataChunk.CHUNK_SIZE - index;
        if (isWhitespace)
            fCharDataHandler.processWhitespace(dataChunk.toCharArray(), index, nbytes);
        else {
            //fCharDataHandler.processCharacters(dataChunk.toCharArray(), index, nbytes);
            // Copy the tail of the first chunk instead of reporting it on its own.
            myChar1 = dataChunk.toCharArray();
            for (i = 0; i < nbytes; i++)
                myChar2[i] = myChar1[i + index];
        }
        count -= nbytes;
        //
        // Use each Chunk in turn until we are done.
        //
        do {
            dataChunk = dataChunk.nextChunk();
            if (dataChunk == null) {
                throw new RuntimeException(new ImplementationMessages().createMessage(null, ImplementationMessages.INT_DCN, 0, null));
            }
            nbytes = count <= CharDataChunk.CHUNK_SIZE ? count : CharDataChunk.CHUNK_SIZE;
            if (isWhitespace)
                fCharDataHandler.processWhitespace(dataChunk.toCharArray(), 0, nbytes);
            else {
                //fCharDataHandler.processCharacters(dataChunk.toCharArray(), 0, nbytes);
                // Append this chunk's share of the data to the temporary buffer.
                char[] myChar3 = dataChunk.toCharArray();
                for (int j = 0; j < nbytes; j++, i++)
                    myChar2[i] = myChar3[j];
            }
            count -= nbytes;
        } while (count > 0);
        // Report the merged character data once. Whitespace was already
        // reported chunk by chunk above, so skip the call in that case.
        if (!isWhitespace) {
            fCharDataHandler.processCharacters(myChar2, 0, i);
        }
    }
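For comparison, here is a rough sketch of what the merging would look like if it were done on the application side instead. The SAX documentation for ContentHandler.characters allows a parser to deliver contiguous character data in more than one call, so a handler can collect the pieces in a buffer and read the value in endElement. The class names AccumulatingHandler and SaxAccumulateSketch and the sample document are assumptions made up for this illustration; only the javax.xml.parsers and org.xml.sax calls are real API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that joins all characters() calls made for one element. */
final class AccumulatingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0);                 // start collecting afresh for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);    // may be called several times per value
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("PARAM".equals(qName)) {
            values.add(text.toString());   // the value is complete only here
        }
    }
}

final class SaxAccumulateSketch {
    /** Parses the given XML and returns the text of every PARAM element. */
    static List<String> paramValues(String xml) {
        AccumulatingHandler handler = new AccumulatingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        // prints [SUBNETWORK]
        System.out.println(paramValues("<ROOT><PARAM>SUBNETWORK</PARAM></ROOT>"));
    }
}
```

Resetting the buffer in startElement is enough for simple leaf elements like PARAM; elements with mixed content would need a stack of buffers.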
