http://nagoya.apache.org/bugzilla/show_bug.cgi?id=2708
*** shadow/2708 Fri Jul 20 08:26:13 2001
--- shadow/2708.tmp.24138 Fri Jul 20 08:26:13 2001
+============================================================================+
| NullPointerException when doctype declaration included in source |
+----------------------------------------------------------------------------+
| Bug #: 2708 Product: XalanJ2 |
| Status: NEW Version: 2.2.x |
| Resolution: Platform: All |
| Severity: Normal OS/Version: All |
| Priority: Other Component: org.apache.xalan.transf |
+----------------------------------------------------------------------------+
| Assigned To: [EMAIL PROTECTED] |
| Reported By: [EMAIL PROTECTED] |
| CC list: Cc: |
+----------------------------------------------------------------------------+
| URL: |
+============================================================================+
| DESCRIPTION |
When migrating to Xalan Java version 2.2.D6 (from Xalan 1), I had to make some
major code changes.
One problem seems to be unfixable:

My XML document looks as follows:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE header [
<!ENTITY % ISOlat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN//XML"
"file://localhost/Z:\hermes\lib\xml-entities\ISO\ISOlat1.pen" > %ISOlat1;
<!ENTITY % ISOlat2 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 2//EN//XML"
"file://localhost/Z:\hermes\lib\xml-entities\ISO\ISOlat2.pen" > %ISOlat2;
<!ENTITY % ISOnum PUBLIC "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML"
"file://localhost/Z:\hermes\lib\xml-entities\ISO\ISOnum.pen" > %ISOnum;
<!ENTITY % ISOpub PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN//XML"
"file://localhost/Z:\hermes\lib\xml-entities\ISO\ISOpub.pen" > %ISOpub;
<!ENTITY % ISOtech PUBLIC "ISO 8879:1986//ENTITIES General Technical//EN//XML"
"file://localhost/Z:\hermes\lib\xml-entities\ISO\ISOtech.pen" > %ISOtech;
<!ENTITY % ISOgrk1 PUBLIC "ISO 9573-15:1993//ENTITIES Greek Letters//EN//XML"
"file://localhost/Z:\hermes\lib\xml-entities\ISO\ISOgrk1.pen" > %ISOgrk1;
<!ENTITY % ISOgrk2 PUBLIC "ISO 9573-15:1993//ENTITIES Monotoniko Greek//EN//XML"
"file://localhost/Z:\hermes\lib\xml-entities\ISO\ISOgrk2.pen" > %ISOgrk2;
<!ENTITY % ISOgrk3 PUBLIC "ISO 8879:1986//ENTITIES Greek Symbols//EN//XML"
"file://localhost/Z:\hermes\lib\xml-entities\ISO\ISOgrk3.pen" > %ISOgrk3;
] >
<header><issue><pinfo><pnm>U.S. National Science Foundation</pnm><pnm>u.s.
department of energy</pnm><pnm>los alamos national
laboratory</pnm><loc/></pinfo><jinfo><jid></jid><jtl></jtl><jsbt/><jabt/><issn/>
<cdn/></jinfo><pubinfo><vid></vid><iid></iid><cd year="2000" month="7"
day="26"></cd></pubinfo></issue><artcon><genhdr language="en"><artinfo><aid>
cs.IR/000704</aid><artty artty="rp"/><categ>Computer
Science</categ></artinfo><tig><atl language="en" purpose="normal"><e5>Relevance
as Deduction: A Logical View of Information
Retrieval</e5></atl></tig><aug><au>Gianni Amati</au><au>Konstantinos
Georgatos</au></aug><aloc href="http://arXiv.org/abs/cs/0007041"/><abs><p><e6>\
The problem of Information Retrieval is, given a set of documents D and aquery
q, providing an algorithm for retrieving all documents in D relevant toq.
However, retrieval should depend and be updated whenever the user is able
toprovide as an input a preferred set of relevant documents; this process
isknown as em relevance feedback. Recent work in IR has been paying
greatattention to models which employ a logical approach; the advantage being
thatone can have a simple computable characterization of retrieval on the basis
ofa pure logical analysis of retrieval. Most of the logical models make use
ofprobabilities or similar belief functions in order to introduce the
inductivecomponent whereby uncertainty is treated. Their general paradigm is
thefollowing: em find the nature of conditional $d\imp q$ and then define
aprobability on the top of it. We just reverse this point of view; first use
thenumerical information, frequencies or probabilities, then define your
ownlogical consequence. More generally, we claim that retrieval is a form
ofdeduction. We introduce a simple but powerful logical framework of
relevancefeedback, derived from the well founded area of nonmonotonic logic.
Thisdescription can help us evaluate, describe and compare from a theoretical
pointof view previous approaches based on conditionals or
probabilities.\</e6></p></abs><kwdg><kwd>Information Retrieval</kwd><kwd>Logic
in Computer Science</kwd></kwdg></genhdr></artcon></header>

It is held in a String variable.

Running the transformation results in a NullPointerException, which I traced to
StringPool.addSymbol:

Debugger Stack Trace Report:

Thread[main,5,main] (Alive)
Breakpoint #1

StringPool.addSymbol(String)
    this=(org.apache.xerces.utils.StringPool) org.apache.xerces.utils.StringPool@7843
    str=(java.lang.String) null
DefaultEntityHandler.addExternalPEDecl(int, int, int, boolean)
    this=(org.apache.xerces.readers.DefaultEntityHandler) org.apache.xerces.readers.DefaultEntityHandler@4d08
    name=(int) 37
    publicId=(int) 38
    systemId=(int) 39
    isExternal=(boolean) false
XMLDTDScanner.scanEntityDecl()
    this=(org.apache.xerces.framework.XMLDTDScanner) org.apache.xerces.framework.XMLDTDScanner@7c28
    isPEDecl=(boolean) true
    sawPERef=(boolean) false
    entityName=(int) 37
    single=(boolean) false
XMLDTDScanner.scanDecls(boolean)
    this=(org.apache.xerces.framework.XMLDTDScanner) org.apache.xerces.framework.XMLDTDScanner@7c28
    extSubset=(boolean) false
    subsetOffset=(int) 57
    parseTextDecl=(boolean) false
    prevState=(int) 50
    newParseTextDecl=(boolean) false
    olddepth=(int) 1
XMLDTDScanner.scanDoctypeDecl()
    this=(org.apache.xerces.framework.XMLDTDScanner) org.apache.xerces.framework.XMLDTDScanner@7c28
    lbrkt=(boolean) true
    scanExternalSubset=(boolean) false
    publicId=(int) -1
    systemId=(int) -1
XMLDocumentScanner.scanDoctypeDecl(boolean)
    this=(org.apache.xerces.framework.XMLDocumentScanner) org.apache.xerces.framework.XMLDocumentScanner@54bf
    standalone=(boolean) false
PrologDispatcher.dispatch(boolean)
    this=(org.apache.xerces.framework.XMLDocumentScanner$PrologDispatcher) org.apache.xerces.framework.XMLDocumentScanner$PrologDispatcher@7c23
    keepgoing=(boolean) true
XMLDocumentScanner.parseSome(boolean)
    this=(org.apache.xerces.framework.XMLDocumentScanner) org.apache.xerces.framework.XMLDocumentScanner@54bf
    doItAll=(boolean) true
SAXParser(XMLParser).parse(InputSource)
    this=(org.apache.xerces.parsers.SAXParser) org.apache.xerces.parsers.SAXParser@782d
    source=(org.xml.sax.InputSource) org.xml.sax.InputSource@753b
DTMManagerDefault.getDTM(Source, boolean, DTMWSFilter, boolean, boolean)
    this=(org.apache.xml.dtm.ref.DTMManagerDefault) org.apache.xml.dtm.ref.DTMManagerDefault@5250
    source=(javax.xml.transform.Source) javax.xml.transform.stream.StreamSource@6109
    unique=(boolean) false
    whiteSpaceFilter=(org.apache.xml.dtm.DTMWSFilter) org.apache.xalan.transformer.TransformerImpl@2c83
    incremental=(boolean) true
    doIndexing=(boolean) true
    xstringFactory=(org.apache.xml.utils.XMLStringFactory) org.apache.xpath.objects.XMLStringFactoryImpl@199a
    dtmPos=(int) 1
    documentID=(int) 1048576
    isSAXSource=(boolean) false
    isStreamSource=(boolean) true
    reader=(org.xml.sax.XMLReader) org.apache.xerces.parsers.SAXParser@782d
    xmlSource=(org.xml.sax.InputSource) org.xml.sax.InputSource@753b
    dtm=(org.apache.xml.dtm.ref.sax2dtm.SAX2DTM) org.apache.xml.dtm.ref.sax2dtm.SAX2DTM@7544
    haveXercesParser=(boolean) true
TransformerImpl.transform(Source, boolean)
    this=(org.apache.xalan.transformer.TransformerImpl) org.apache.xalan.transformer.TransformerImpl@2c83
    source=(javax.xml.transform.Source) javax.xml.transform.stream.StreamSource@6109
    shouldRelease=(boolean) true
    base=(java.lang.String) file://localhost/Z:/hermes/etc/stylesheets/jour2mail.xsl
    mgr=(org.apache.xml.dtm.DTMManager) org.apache.xml.dtm.ref.DTMManagerDefault@5250
TransformerImpl.transform(Source, Result, boolean)
    this=(org.apache.xalan.transformer.TransformerImpl) org.apache.xalan.transformer.TransformerImpl@2c83
    xmlSource=(javax.xml.transform.Source) javax.xml.transform.stream.StreamSource@6109
    outputTarget=(javax.xml.transform.Result) javax.xml.transform.stream.StreamResult@6111
    shouldRelease=(boolean) true
    handler=(org.xml.sax.ContentHandler) org.apache.xalan.serialize.SerializerToText@61c7
TransformerImpl.transform(Source, Result)
    this=(org.apache.xalan.transformer.TransformerImpl) org.apache.xalan.transformer.TransformerImpl@2c83
    xmlSource=(javax.xml.transform.Source) javax.xml.transform.stream.StreamSource@6109
    outputTarget=(javax.xml.transform.Result) javax.xml.transform.stream.StreamResult@6111
XSLHelper.transform(String, String)
    stylesheetURL=(java.lang.String) file://localhost/Z:\hermes\etc\stylesheets\jour2mail.xsl
    xmlDoc=(java.lang.String) [the XML document shown above]

It seems that the method

    DefaultEntityHandler.addExternalPEDecl(int name, int publicId, int systemId, boolean isExternal)

calls StringPool.addSymbol(fSystemId) with fSystemId == null.

I ran into a similar problem some months ago when I migrated to a newer Xerces
version, and worked around it by explicitly setting the systemId of the input
source. Here, doing so has no effect.
