You can do this with DIH (the DataImportHandler), but that has some problems. I'd strongly recommend running Tika in independent client code instead. Here's a program to get you started:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick

On Wed, Dec 13, 2017 at 5:36 AM, Sean Gilhuly <gilhulys...@gmail.com> wrote:
> Hello,
>
> I have been successfully able to index archive files (zip, tar, and the
> like) using solr cell, but the archive is returned as a single document
> when I do queries. Is there a way to configure it so that files are
> extracted recursively, and indexed separately?
>
> I know that if I set the extractOnly flag to true, I can parse the returned
> xml and send it back. Is there an easier way?
>
> Regards,
> Sean
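
To make the "extract recursively, index separately" idea concrete, here is a minimal client-side sketch. It is not the program from the linked article, and it deliberately swaps Tika for the JDK's java.util.zip so that it runs with no extra jars. That means it only understands zip archives (not tar and the like) and only detects nested archives by a ".zip" file name. The names ArchiveWalker, ExtractedFile, and extract are made up for illustration. The sketch stops at producing one record per inner file; in a real client each record would presumably be run through Tika and sent to Solr as its own SolrInputDocument via SolrJ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: flattens a (possibly nested) zip into one record per file.
final class ArchiveWalker {

    /** One extracted file: its path inside the archive tree and its raw bytes. */
    static final class ExtractedFile {
        final String path;
        final byte[] content;

        ExtractedFile(String path, byte[] content) {
            this.path = path;
            this.content = content;
        }
    }

    /** Returns every non-archive file found under the given zip stream. */
    static List<ExtractedFile> extract(String archiveName, InputStream in) throws IOException {
        List<ExtractedFile> out = new ArrayList<>();
        walk(archiveName, in, out);
        return out;
    }

    private static void walk(String prefix, InputStream in, List<ExtractedFile> out)
            throws IOException {
        // Not closed here on purpose: the caller owns the underlying stream.
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String path = prefix + "/" + entry.getName();
            byte[] bytes = readAll(zip); // reads up to the end of the current entry
            // Assumption: a ".zip" suffix marks a nested archive. Tika would
            // detect the type from the content instead of the name.
            if (entry.getName().toLowerCase(Locale.ROOT).endsWith(".zip")) {
                walk(path, new ByteArrayInputStream(bytes), out);
            } else {
                out.add(new ExtractedFile(path, bytes));
            }
        }
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }
}
```

Calling ArchiveWalker.extract("outer.zip", stream) returns a flat list in which each entry carries a path such as outer.zip/nested/inner.zip/b.txt. That path is a reasonable candidate for the unique key of the per-file Solr document, since it stays distinct even when two inner archives contain files with the same name. Note that each entry is buffered fully in memory, which is fine for a sketch but would need rethinking for large archives.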