DataImportHandler, BlobTransformer, FieldReaderDataSource and TikaEntityExtractor

2013-07-30 Thread Raymond Wiker
I have a case where I want to index documents and metadata content from a database. The metadata is not a problem, but it does not appear that I can handle the document content (held as BLOBs in the database) with out-of-the-box SOLR 4.4 functionality. I was hoping to be able to solve this by doin

Re: DataImportHandler, BlobTransformer, FieldReaderDataSource and TikaEntityExtractor

2013-07-30 Thread Shalin Shekhar Mangar
There's no BlobTransformer in DataImportHandler. You'll have to write one. Also, you'd probably need to write a FieldInputStreamDataSource instead of FieldReaderDataSource. On Tue, Jul 30, 2013 at 12:30 PM, Raymond Wiker wrote: > I have a case where I want to index documents and metadata content from

Re: DataImportHandler, BlobTransformer, FieldReaderDataSource and TikaEntityExtractor

2013-08-02 Thread Raymond Wiker
It appears that this is simpler than I thought: in SOLR 4.4, at least, there is a dataSource class named "FieldStreamDataSource" that I can use directly with the TikaEntityProcessor. Given a blob column named DOCIMAGE, I can use the following Tika entity: ...
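
For readers following the thread, here is a rough sketch of how such a data-config.xml might be laid out. This is not the configuration from the message above (that part is elided) and should be read as an illustration only. The FieldStreamDataSource type, the dataField attribute and the TikaEntityProcessor come from the DataImportHandler itself; the JDBC driver and URL, the DOCS table, the ID column, the entity and data source names, and the Solr field names "id" and "content" are all assumptions made up for the example.

```xml
<dataConfig>
  <!-- Hypothetical JDBC source; driver, url and credentials are placeholders -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.example.Driver" url="jdbc:example://localhost/docs"
              user="solr" password="secret"/>

  <!-- Exposes a BLOB field of the parent entity as an InputStream -->
  <dataSource name="fieldStream" type="FieldStreamDataSource"/>

  <document>
    <!-- Outer entity: one row per document; table and ID column are assumed -->
    <entity name="main" dataSource="db"
            query="SELECT ID, DOCIMAGE FROM DOCS">
      <field column="ID" name="id"/>

      <!-- Inner entity: Tika parses the BLOB named by dataField.
           The url value is a dummy here, since the stream comes from dataField. -->
      <entity name="tika" processor="TikaEntityProcessor"
              dataSource="fieldStream" dataField="main.DOCIMAGE"
              url="dummy" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The dataField value has to resolve to the blob column of the enclosing entity, which is why it is written as the parent entity name plus the column name. Whether the url attribute can be left out entirely may depend on the Solr version, so treat the dummy value as a precaution rather than a requirement.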