Here’s a skeletal SolrJ program using Tika as another alternative.

Best,
Erick
> On Jun 7, 2020, at 2:06 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
> You have to write an external application that creates multiple threads,
> parses the PDFs, and indexes them in Solr. Ideally, you parse the PDFs once,
> store the resulting text on some file system, and then index it. The reason is
> that if you upgrade across two major versions of Solr, you might need to
> reindex. You can then save time because you don’t need to parse the PDFs again.
> It can also be useful if you are not yet sure about the final schema and
> need to index several times with different schemas.
>
> You can also use Apache ManifoldCF.
>
>> On 07.06.2020 at 19:19, Fiz N <fiznewy...@gmail.com> wrote:
>>
>> Hello Solr experts,
>>
>> I am working on a POC to index millions of PDF documents stored in
>> multiple folders on a file share.
>>
>> Could you please let me know the best practices and steps to implement it?
>>
>> Thanks,
>> Fiz Nadiyal
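
For what it's worth, the approach described in the quoted message (several worker threads, parse each PDF once, keep the extracted text on disk, then index) could be sketched roughly as below. This is only a minimal outline using the JDK, not the SolrJ/Tika program referred to above. The class name `PdfIndexPipeline`, the `TextExtractor` and `Indexer` interfaces, and the `<relative path>.txt` cache layout are all hypothetical; in a real setup the extractor would presumably wrap a Tika parser and the indexer a SolrJ client.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PdfIndexPipeline {

    /** Hypothetical hook: in practice this could wrap a Tika parser. */
    public interface TextExtractor {
        String extract(Path pdf) throws IOException;
    }

    /** Hypothetical hook: in practice this could wrap a SolrJ client. */
    public interface Indexer {
        void index(String id, String text) throws IOException;
    }

    /** Recursively collects regular files whose name ends in .pdf (case-insensitive). */
    static List<Path> findPdfs(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Returns the cached text if present; otherwise extracts it once and stores it. */
    static String cachedText(Path root, Path pdf, Path cacheDir, TextExtractor extractor)
            throws IOException {
        Path cached = cacheDir.resolve(root.relativize(pdf).toString() + ".txt");
        if (Files.exists(cached)) {
            return new String(Files.readAllBytes(cached), StandardCharsets.UTF_8);
        }
        String text = extractor.extract(pdf);
        Files.createDirectories(cached.getParent());
        Files.write(cached, text.getBytes(StandardCharsets.UTF_8));
        return text;
    }

    /** Processes all PDFs under root on a fixed thread pool; returns the number indexed. */
    static int run(Path root, Path cacheDir, int threads,
                   TextExtractor extractor, Indexer indexer)
            throws IOException, InterruptedException {
        List<Path> pdfs = findPdfs(root);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (Path pdf : pdfs) {
                results.add(pool.submit(() -> {
                    String text = cachedText(root, pdf, cacheDir, extractor);
                    // The relative path serves as the document id in this sketch.
                    indexer.index(root.relativize(pdf).toString(), text);
                    return true;
                }));
            }
            int ok = 0;
            for (Future<Boolean> f : results) {
                try {
                    if (f.get()) ok++;
                } catch (ExecutionException e) {
                    // One bad PDF should not stop the whole run.
                    System.err.println("Failed: " + e.getCause());
                }
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path root = Path.of(args[0]);
        Path cacheDir = Path.of(args[1]);
        // Placeholder extractor and indexer; swap in real parsing and indexing here.
        int n = run(root, cacheDir, 4,
                pdf -> "placeholder text for " + pdf.getFileName(),
                (id, text) -> System.out.println("would index " + id));
        System.out.println("indexed " + n + " documents");
    }
}
```

Because the extracted text is kept under the cache directory, a second run (for example after a schema change or a Solr upgrade) only re-reads the `.txt` files and skips the expensive PDF parsing step.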