I've been away from Tika for a while, so I'm not sure. This might also be a case of Tika using a strict XML parser for HTML instead of a looser, more error-tolerant HTML-specific parser like the ones most browsers use. Such parsers accept these kinds of technical "errors", which in most cases can simply be ignored.
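
To illustrate the strict-versus-lenient distinction, here is a minimal Python sketch. It is not Tika's code and says nothing about which parser Tika actually uses. It assumes the offending character was written in the page as the HTML entity `&rsaquo;`; the sample markup and the helper names (`parse_strict`, `parse_lenient`, `TextCollector`) are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical list markup; &rsaquo; is defined in HTML but not in plain XML.
SNIPPET = "<ul><li>&rsaquo; Home</li><li>&rsaquo; About</li></ul>"


def parse_strict(markup):
    """Parse with a strict XML parser; return the error text, or None on success."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as err:
        return str(err)


class TextCollector(HTMLParser):
    """Lenient HTML parser that only collects the text content."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &rsaquo; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_lenient(markup):
    """Parse with the error-tolerant HTML parser and return the extracted text."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print("strict XML parser error:", parse_strict(SNIPPET))
    print("lenient HTML parser text:", parse_lenient(SNIPPET))
```

The strict parser rejects the input because the entity is undefined in XML, while the HTML parser resolves it and keeps going.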

Yes, by all means ask on the Tika list. Solr is just wrapping the error Tika reports.

-- Jack Krupansky

-----Original Message-----
From: eShard
Sent: Thursday, April 04, 2013 2:14 PM
To: solr-user@lucene.apache.org
Subject: Re: detailed Error reporting in Solr

Yes, that's it exactly.
I crawled a link that has these characters ( ›) in each list item. Solr
couldn't handle it: it threw the XML parse error, and the crawler terminated
the job.

Is this fixable? Or do I have to submit a bug to the Tika folks?

Thanks,

--
View this message in context: http://lucene.472066.n3.nabble.com/detailed-Error-reporting-in-Solr-tp4053821p4053882.html
Sent from the Solr - User mailing list archive at Nabble.com.
