I have raised an issue and provided a patch.
Please confirm whether it helps:
https://issues.apache.org/jira/browse/SOLR-964

On Fri, Jan 16, 2009 at 3:52 PM, Noble Paul നോബിള്‍  नोब्ळ्
<noble.p...@gmail.com> wrote:
> The StAX parser automatically tries to fetch the DTD. How can we disable
> that at the parser level?
>
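For reference, one way to stop DTD fetching with the standard StAX API is to turn off DTD support on the `XMLInputFactory`. The snippet below is a minimal sketch, not necessarily what the SOLR-964 patch does.

It relies on the standard `XMLInputFactory.SUPPORT_DTD` and `XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES` properties. It assumes the underlying implementation honours them; Woodstox (the parser in the trace) and the JDK's built-in parser are both expected to. The class `DtdFreeStax` and its helpers are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class, not part of Solr.
class DtdFreeStax {

    // Trimmed copy of the failing document. The relative DTD path cannot be
    // resolved here, so any attempt to load it would fail.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        + "<!DOCTYPE j:record SYSTEM \"../../../../config/jml-delivery-norm-2.1.dtd\">\n"
        + "<j:record xmlns:j=\"http://dtd.j.com/2002/Content/\" id=\"frp70450\">\n"
        + "  <j:metadata>\n"
        + "    <dc:date xmlns:dc=\"http://purl.org/dc/elements/1.1/\">20080131</dc:date>\n"
        + "  </j:metadata>\n"
        + "</j:record>\n";

    /** Creates a factory that parses the DOCTYPE line but never loads the external DTD. */
    static XMLInputFactory newDtdFreeFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Standard StAX property: switch off DTD processing.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        // Also refuse to resolve external entities.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        return factory;
    }

    /** Returns the local names of all start elements, in document order. */
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            newDtdFreeFactory().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(elementNames(SAMPLE));
    }
}
```

The trade-off is that declarations living only in the DTD are no longer read. Documents that use entities declared there (anything beyond the five predefined XML entities) may still fail to parse.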
> On Fri, Jan 16, 2009 at 3:34 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>> Hello all, as the subject says:
>>   DIH XPathEntityProcessor fails with docs containing <!DOCTYPE>
>>
>> This is using a Solr nightly build from Monday.
>>
>> INFO: Server startup in 3623 ms
>> Jan 16, 2009 9:54:12 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
>> INFO: Read dataimport.properties
>> Jan 16, 2009 9:54:12 AM org.apache.solr.core.SolrCore execute
>> INFO: [jdocs] webapp=/solr path=/walkj params={command=full-import} status=0 QTime=13
>> Jan 16, 2009 9:54:12 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>> INFO: Starting Full Import
>> Jan 16, 2009 9:54:12 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
>> INFO: [jdocs] REMOVING ALL DOCUMENTS FROM INDEX
>> Jan 16, 2009 9:54:12 AM org.apache.solr.core.SolrDeletionPolicy onInit
>> INFO: SolrDeletionPolicy.onInit: commits:num=2
>>        commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_c,version=1232026423291,generation=12,filenames=[segments_c, _4.fnm, _4.frq, _4.prx, _4.tis, _4.tii, _4.nrm, _4.fdx, _4.fdt]
>>        commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_d,version=1232026423292,generation=13,filenames=[segments_d]
>> Jan 16, 2009 9:54:12 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
>> INFO: last commit = 1232026423292
>> Jan 16, 2009 9:54:13 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
>> SEVERE: Exception while processing: jcurrent document : null
>> org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:/j/dtd/jxml/data/news/2008/frp70450.xmlrows processed :0 Processing Document # 1
>>        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:252)
>>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:177)
>>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
>>        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
>>        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
>>        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
>>        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
>>        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
>> Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: (was java.io.FileNotFoundException) /../config/jml-delivery-norm-2.1.dtd (No such file or directory)
>>  at [row,col {unknown-source}]: [3,81]
>>        at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
>>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:242)
>>        ... 9 more
>> Caused by: com.ctc.wstx.exc.WstxParsingException: (was java.io.FileNotFoundException) /../config/jml-delivery-norm-2.1.dtd (No such file or directory)
>>  at [row,col {unknown-source}]: [3,81]
>>        at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630)
>>        at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461)
>>        at com.ctc.wstx.sr.ValidatingStreamReader.findDtdExtSubset(ValidatingStreamReader.java:475)
>>        at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:358)
>>        at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3351)
>>        at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:1988)
>>        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
>>        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:141)
>>        at org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$000(XPathRecordReader.java:89)
>>        at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:82)
>>        ... 10 more
>> Jan 16, 2009 9:54:13 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>> SEVERE: Full Import failed
>>
>> A fragment from the top of the failing document is
>>
>> <?xml version="1.0" encoding="ISO-8859-1"?>
>> <?xml-stylesheet type="text/xsl" href="../../../../config/support/j-deliver.xsl"?>
>> <!DOCTYPE j:record SYSTEM "../../../../config/jml-delivery-norm-2.1.dtd">
>> <j:record xmlns:j="http://dtd.j.com/2002/Content/" id="frp70450" urname="record">
>>  <j:metadata xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="" urname="metadata" xlink:type="simple">
>>    <dc:date xmlns:dc="http://purl.org/dc/elements/1.1/" qualifier="pdate">20080131</dc:date>
>>
>> The DTD does exist at the specified location. Removing the DOCTYPE declaration
>> fixes everything. I know that DOCTYPE is out of fashion, and our newer documents
>> do not use it; however, there are lots of older XML documents about!
>>
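Why would a DTD that exists not be found? The SYSTEM identifier in the DOCTYPE is relative, and a parser resolves relative identifiers against the document's base URI. If the parser is handed only a bare stream, it has no base to resolve against. The `/../config/...` path in the trace looks consistent with that, though this is an inference, not a confirmed diagnosis.

An alternative to disabling DTDs is to pass the file's URI as the system ID. This uses the standard `XMLInputFactory.createXMLStreamReader(String systemId, InputStream stream)` overload, so `../` paths resolve relative to the XML file itself. The minimal sketch below keeps DTD support on and builds a hypothetical directory layout (`config/sample.dtd`, `data/news/doc.xml`) in a temp directory. The class and file names are illustrative only.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class, not part of Solr.
class RelativeDtdDemo {

    /** Parses a file, passing its URI as system ID so relative DTD paths resolve. */
    static List<String> elementNames(Path xmlFile) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance(); // DTD support left on
        String systemId = xmlFile.toUri().toString();
        List<String> names = new ArrayList<>();
        try (InputStream in = Files.newInputStream(xmlFile)) {
            XMLStreamReader reader = factory.createXMLStreamReader(systemId, in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        names.add(reader.getLocalName());
                    }
                }
            } finally {
                reader.close(); // does not close the underlying stream
            }
        }
        return names;
    }

    /** Builds root/config/sample.dtd and root/data/news/doc.xml in a temp dir. */
    static Path buildSampleTree() throws Exception {
        Path root = Files.createTempDirectory("dtd-demo");
        Path config = Files.createDirectories(root.resolve("config"));
        Path news = Files.createDirectories(root.resolve("data").resolve("news"));
        Files.write(config.resolve("sample.dtd"),
            "<!ELEMENT record (date)>\n<!ELEMENT date (#PCDATA)>\n"
                .getBytes(StandardCharsets.US_ASCII));
        Path doc = news.resolve("doc.xml");
        // The DOCTYPE points two directories up, like the "../../../../" in the original.
        Files.write(doc, ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
            + "<!DOCTYPE record SYSTEM \"../../config/sample.dtd\">\n"
            + "<record><date>20080131</date></record>\n")
                .getBytes(StandardCharsets.ISO_8859_1));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames(buildSampleTree()));
    }
}
```

This keeps whatever the DTD provides, such as entity declarations. The cost is that the DTD files must actually be reachable from wherever the XML files live.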
>> Regards Fergus.
>> --
>>
>> ===============================================================
>> Fergus McMenemie               Email:fer...@twig.me.uk
>> Techmore Ltd                   Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets             Analyst Programmer
>> ===============================================================
>>
>
>
>
> --
> --Noble Paul
>



-- 
--Noble Paul
