You can do this with DIH (the DataImportHandler), but that has some
problems. I'd strongly recommend running Tika in independent client
code instead. Here's a program that gets you started:

https://lucidworks.com/2012/02/14/indexing-with-solrj/
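
A minimal sketch of the client-side idea, in case it helps: walk the
archive yourself and produce one document per contained file. This is not
the program from the link above. It assumes Python's standard zipfile and
tarfile modules as a stand-in for Tika's container handling, and the names
iter_archive_members and build_docs, as well as the "id" and "size_l"
fields, are hypothetical.

```python
import io
import tarfile
import zipfile


def iter_archive_members(data, path):
    """Yield (path, bytes) for each regular file, descending into nested archives."""
    buf = io.BytesIO(data)
    # Note: zip-based formats (docx, jar, ...) also pass this check and are descended into.
    if zipfile.is_zipfile(buf):
        buf.seek(0)
        with zipfile.ZipFile(buf) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield from iter_archive_members(zf.read(info), f"{path}/{info.filename}")
        return
    buf.seek(0)
    try:
        tf = tarfile.open(fileobj=buf)  # also handles gzip/bz2/xz-compressed tars
    except tarfile.ReadError:
        yield path, data  # not an archive: treat as a leaf file
        return
    with tf:
        for member in tf:
            if member.isfile():
                inner = tf.extractfile(member).read()
                yield from iter_archive_members(inner, f"{path}/{member.name}")


def build_docs(archive_name, data):
    """Build one dict per contained file; field names are assumptions, not a real schema."""
    return [
        {"id": member_path, "size_l": len(content)}
        for member_path, content in iter_archive_members(data, archive_name)
    ]
```

In a real client, each leaf's bytes would go through Tika for text
extraction, and the resulting documents would be sent to Solr in batches.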

Best,
Erick

On Wed, Dec 13, 2017 at 5:36 AM, Sean Gilhuly <gilhulys...@gmail.com> wrote:
> Hello,
>
> I have been able to index archive files (zip, tar, and the like)
> successfully using Solr Cell, but the archive is returned as a single
> document when I run queries. Is there a way to configure it so that the
> files are extracted recursively and indexed separately?
>
> I know that if I set the extractOnly flag to true, I can parse the returned
> XML and send it back. Is there an easier way?
>
> Regards,
> Sean
