Charlie,

Does that mean you are calling it from a client program? Or are you
running Tika in listen/server mode and have built some adapters for the
standard Solr processes?

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Dec 18, 2013 at 3:47 PM, Charlie Hull <char...@flax.co.uk> wrote:

> On 17/12/2013 15:29, Augusto Camarotti wrote:
>
>> Hi guys,
>>     I'm having a problem with Solr when trying to index some broken .doc
>> files.
>>     I have set up a test case using Solr to index all the files the
>> users save on the shared directories of the company I work for, and
>> Solr hangs when trying to index this particular file (the one I'm
>> attaching to this e-mail). There are some other broken .doc files that
>> Solr indexes by name without a problem, even while logging some Tika
>> errors during the process, but when it reaches this particular file, it
>> hangs and I have to cancel the upload.
>>     I cannot guarantee the directories will never hold a broken .doc
>> file, or a broken file with some other extension, so I think Solr should
>> just return a failure message, or something like that.
>>     These are the log messages Solr is recording:
>> 03:38:23        ERROR   SolrCore        org.apache.solr.common.SolrException:
>> org.apache.tika.exception.TikaException: Unexpected RuntimeException
>> from org.apache.tika.parser.microsoft.OfficeParser@386f9474
>> 03:38:25        ERROR   SolrDispatchFilter      null:org.apache.solr.common.SolrException:
>> org.apache.tika.exception.TikaException: Unexpected RuntimeException
>> from org.apache.tika.parser.microsoft.OfficeParser@386f9474
>>
>> So, how do I prevent Solr from hanging when trying to index broken files?
>> Regards,
>> Augusto Camarotti
>>
>
> We don't like to run Tika from within Solr ourselves, as it has been known
> to barf (especially on large PDF files; yes, there are such horrors as
> 3000-page PDFs!). We usually run it in an external process so it can be
> watched and killed if necessary.
>
> Cheers
>
> Charlie
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>
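Charlie's approach of running Tika outside Solr, in a child process that can be watched and killed, might look roughly like the following. This is a minimal sketch under several assumptions, not Flax's actual setup: the tika-app jar is available locally as `tika-app.jar` (a hypothetical path), its `--text` option is used to print plain text to stdout, the 60-second timeout is arbitrary, and the class and method names are made up for illustration. It needs Java 9 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: extract text with Tika in a separate JVM so a hung
// parser can be killed without taking Solr (or the indexer) down with it.
public class ExternalTikaExtractor {

    /**
     * Runs the command in a child process and returns its stdout, or
     * Optional.empty() if it exits non-zero or exceeds the timeout.
     */
    public static Optional<String> runWithTimeout(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Discard stderr so parser warnings cannot fill the pipe and block the child.
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = pb.start();

        // Drain stdout on a separate thread; a large document could otherwise
        // fill the OS pipe buffer and stall the child before it finishes.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Thread reader = new Thread(() -> {
            try (InputStream in = process.getInputStream()) {
                in.transferTo(out);
            } catch (IOException ignored) {
                // The stream is closed when the process is killed.
            }
        });
        reader.start();

        boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
        if (!finished) {
            process.destroyForcibly(); // kill the hung parser
            process.waitFor();
            reader.join();
            return Optional.empty();
        }
        reader.join();
        if (process.exitValue() != 0) {
            return Optional.empty(); // parser failed, e.g. on a broken .doc
        }
        return Optional.of(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical location of the tika-app jar; adjust for your installation.
        String tikaJar = "tika-app.jar";
        for (String path : args) {
            List<String> cmd = Arrays.asList("java", "-jar", tikaJar, "--text", path);
            Optional<String> text = runWithTimeout(cmd, 60);
            if (text.isPresent()) {
                System.out.println(path + ": extracted " + text.get().length() + " chars");
                // ...send text.get() to Solr here (not shown)
            } else {
                System.err.println(path + ": skipped (parser failed or timed out)");
            }
        }
    }
}
```

Run it as, e.g., `java ExternalTikaExtractor /shared/docs/report.doc` (path hypothetical). A file that times out or makes the parser exit with an error is reported and skipped instead of stalling the whole indexing run; the extracted text would then be posted to Solr as ordinary field content rather than going through in-Solr extraction.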
