Charlie,

Does that mean you are talking to Tika from a client program? Or are you running Tika in a listen/server mode and building adapters for the standard Solr processes?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Dec 18, 2013 at 3:47 PM, Charlie Hull <char...@flax.co.uk> wrote:

> On 17/12/2013 15:29, Augusto Camarotti wrote:
>
>> Hi guys,
>> I'm having a problem with Solr when trying to index some broken .doc
>> files.
>> I have set up a test case using Solr to index all the files the
>> users save on the shared directories of the company that I work for, and
>> Solr is hanging when trying to index one file in particular (the one I'm
>> attaching to this e-mail). There are some other broken .doc files that
>> Solr indexes by name without a problem, even logging some Tika errors
>> during the process, but when it reaches this file in particular, it
>> hangs and I have to cancel the upload.
>> I cannot guarantee the directories will never hold a broken .doc
>> file, or a broken file with some other extension, so I guess Solr could
>> just return a failure message, or something like that.
>> These are the logging messages Solr is recording:
>> 03:38:23 ERROR SolrCore org.apache.solr.common.SolrException:
>> org.apache.tika.exception.TikaException: Unexpected RuntimeException
>> from org.apache.tika.parser.microsoft.OfficeParser@386f9474
>> 03:38:25 ERROR SolrDispatchFilter
>> null:org.apache.solr.common.SolrException:
>> org.apache.tika.exception.TikaException: Unexpected RuntimeException
>> from org.apache.tika.parser.microsoft.OfficeParser@386f9474
>>
>> So, how do I prevent Solr from hanging when trying to index broken files?
>> Regards,
>> Augusto Camarotti
>>
>
> We don't like to run Tika from within Solr ourselves, as it has been known
> to barf (especially on large PDF files, yes there are such horrors as 3000
> page PDFs!). We usually run it in an external process so it can be watched
> and killed if necessary.
>
> Cheers
>
> Charlie
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile: +44 (0)7767 825828
> web: www.flax.co.uk
>
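
For anyone following the thread, the "external process that can be watched and killed" idea could look roughly like the sketch below. This is not Charlie's actual setup, just one minimal way to do it in Python. It assumes the standalone Tika app jar is available locally; the jar path `tika-app.jar`, the 60-second limit, and the helper names `run_with_timeout` and `extract_text` are all made up for illustration.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd as a child process.

    Returns the child's stdout as text, or None if the child exits with a
    non-zero status or does not finish within timeout_s seconds.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired,
        # so a hung parser cannot block the indexing run.
        print(f"timed out after {timeout_s}s: {cmd}", file=sys.stderr)
        return None
    if result.returncode != 0:
        print(f"failed with status {result.returncode}: {cmd}", file=sys.stderr)
        return None
    return result.stdout


def extract_text(path, tika_jar="tika-app.jar", timeout_s=60):
    """Extract plain text from one file using the Tika app jar.

    tika_jar is a hypothetical location; point it at your own copy.
    """
    return run_with_timeout(["java", "-jar", tika_jar, "--text", path], timeout_s)
```

A caller would then send the returned text to Solr as an ordinary field, and on `None` index just the file name or log the file and move on. That way one broken .doc costs at most `timeout_s` seconds instead of hanging the whole upload.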