We have not taken up anything yet. The idea is to create another
contrib which will contain extensions to DIH that have external
dependencies, such as SOLR-934.
TikaEntityProcessor is something we wish to do, but our limited
bandwidth has been the problem.

On Thu, Feb 5, 2009 at 5:15 AM, Chris Harris <rygu...@gmail.com> wrote:
> Back in November, Shalin and Grant were discussing integrating
> DataImportHandler and Tika. Shalin's estimation about the best way to
> do this was as follows:
>
> **
>
> I think the best way would be a TikaEntityProcessor which knows how to
> handle documents. I guess a typical use-case would be
> FileListEntityProcessor->TikaEntityProcessor as parent-child entities.
>
> Also see SOLR-833 which adds a FieldReaderDataSource using which you can
> pass any field's content to an entity for processing. So you can have a
> [SqlEntityProcessor, JdbcDataSource] producing a blob and a
> [FieldReaderDataSource, TikaEntityProcessor] consuming it.
>
> (http://www.nabble.com/DataImportHandler-and-Blobs-td20464891.html)
>
> **
>
> Has there been any work on something like this? Alternatively, has
> anyone else put together an alternative way to get DataImportHandler
> to extract body text from PDFs, Word files, etc.?
>
> Thanks,
> Chris
>



-- 
--Noble Paul
