Good morning,
I'm currently running Solr 4.0 final with Tika v1.2 and ManifoldCF v1.2 dev,
and I'm battling Tika XML parse errors again.
Solr reports only this error, which is too vague to act on:

    org.apache.solr.common.SolrException:
    org.apache.tika.exception.TikaException: XML parse error

I had to run the same link through the Tika app manually to get a much more
detailed error:

    Caused by: org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 105;
    The entity "nbsp" was referenced, but not declared.

So the HTML contains old-school non-breaking space entities (&nbsp;) that
Tika can't handle.

For example: <li> Cyber Systems and Technology&nbsp;&rsaquo;
</mission/CST/CST.html>   </li>
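
To show the problem in isolation, here is a minimal sketch in Python using only
the standard library. It is not Tika or Solr code, just an illustration of the
same failure. A strict XML parser rejects &nbsp; because XML predefines only
five entities (amp, lt, gt, quot, apos). One possible workaround, assuming the
markup can be pre-processed before it is parsed, is to rewrite HTML named
entities as numeric character references. The helper name numeric_entities is
hypothetical, and the fragment is the <li> line above with the link text left
out.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities that XML itself predefines; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def numeric_entities(markup: str) -> str:
    """Rewrite HTML named entities (e.g. &nbsp;) as numeric references (&#160;).

    Hypothetical helper for illustration; unknown names are left as they are.
    """
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, markup)


fragment = "<li> Cyber Systems and Technology&nbsp;&rsaquo; </li>"

# A strict XML parse fails on the undeclared entity, as in the report above.
try:
    ET.fromstring(fragment)
except ET.ParseError as err:
    print("strict XML parse failed:", err)

# After the rewrite, the same fragment parses and the characters are preserved.
item = ET.fromstring(numeric_entities(fragment))
print(repr(item.text))
```

Whether something equivalent can be hooked in ahead of Tika in my setup is
exactly what I don't know, so treat this as a description of the symptom
rather than a fix.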

My question is twofold:
1) How do I get Solr to report more detailed errors, and
2) How do I get Tika to accept (or ignore) &nbsp;?

Thanks,




