Good morning,

I'm currently running Solr 4.0 final with Tika v1.2 and ManifoldCF v1.2 dev, and I'm battling Tika XML parse errors again. Solr reports this error:

    org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error

which is too vague. I had to run the link manually against the Tika app to get a much more detailed error:

    Caused by: org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 105; The entity "nbsp" was referenced, but not declared.

So there are old-school non-breaking spaces (&nbsp;) in the HTML that Tika can't handle.
For example:

    <li> Cyber Systems and Technology › </mission/CST/CST.html> </li>

My question is twofold:

1) How do I get Solr to report more detailed errors?
2) How do I get Tika to accept (or ignore) nbsp?

Thanks,
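
P.S. In case it helps anyone reproduce this, the underlying failure does not seem to need Solr or Tika at all. Below is a minimal sketch using only the JDK's built-in SAX parser. It is not Tika's actual code path, and the class name NbspEntityDemo, the helper tryParse, and the sample markup are made up for illustration. It relies on the fact that XML predefines only five entities (amp, lt, gt, apos, quot), so an HTML entity such as nbsp is rejected unless the document declares it, for instance in an internal DOCTYPE subset.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class NbspEntityDemo {

    // Hypothetical helper: parses the given text as XML and returns "ok",
    // or a description of where and why the parser gave up.
    static String tryParse(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return "ok";
        } catch (SAXParseException e) {
            // Same exception type that the Tika app surfaced in the post above.
            return "line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            return "error: " + e;
        }
    }

    public static void main(String[] args) {
        // Illustrative markup, not the real page: nbsp is used but never declared.
        String undeclared = "<li>Cyber Systems&nbsp;and Technology</li>";

        // Same markup, with nbsp declared in an internal DTD subset.
        String declared =
                "<!DOCTYPE li [<!ENTITY nbsp \"&#160;\">]>"
                + "<li>Cyber Systems&nbsp;and Technology</li>";

        System.out.println("undeclared: " + tryParse(undeclared));
        System.out.println("declared:   " + tryParse(declared));
    }
}
```

If that holds, the page is being handed to a strict XML parser rather than a lenient HTML one, which may be a hint about where to look. Whether Tika or Solr can be configured to declare or ignore the entity is still the open question.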