Re: offline Solr index creation

2020-02-13 Thread Erick Erickson
Indexing rates scale pretty linearly with the number of shards, so one way to increase throughput is simply to create a collection with more shards. For the initial bulk-indexing operations, you can go with a 1-replica-per-shard scenario, then ADDREPLICA if you need to build things out. However…
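
As a rough sketch of the approach described above (many shards, one replica each, replicas added later), the Collections API requests could be built like this. The base URL, the collection name "bulk_docs", and the shard count of 8 are placeholders, not values from this thread. The shard names assume Solr's default shard1..shardN naming, and the cluster is assumed to have a usable default configset.

```python
from urllib.parse import urlencode

# Assumption: a SolrCloud node reachable at this address.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, num_shards, base=SOLR_BASE):
    """Build a Collections API CREATE request with one replica per shard."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": 1,  # single replica during bulk indexing
    }
    return f"{base}/admin/collections?{urlencode(params)}"


def add_replica_url(collection, shard, base=SOLR_BASE):
    """Build a Collections API ADDREPLICA request for one shard."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    return f"{base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical collection name and shard count, for illustration only.
    print(create_collection_url("bulk_docs", 8))
    # After the bulk load finishes, add a replica to each shard.
    for i in range(1, 9):
        print(add_replica_url("bulk_docs", f"shard{i}"))
```

The functions only build URLs; issuing them (for example with curl or urllib.request) is left out so the sketch can be checked without a running cluster.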

offline Solr index creation

2020-02-13 Thread vivek chaurasiya
Hi there, We are using AWS EMR as our big data processing cluster. We have about 3 TB of text files in which each line is a JSON record that I want to index into Solr. I have tried batching the records and pushing them to the Solr index using SolrJClient, but I feel that's really slow. My doubt is
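
For reference, the batching step described in the question might look roughly like the sketch below, written in Python rather than SolrJ. The function names and the batch size are hypothetical. It assumes each non-empty input line is a complete JSON object and that the batches are sent to the collection's /update handler as a JSON array with Content-Type application/json.

```python
import json
from itertools import islice


def batched_json_lines(lines, batch_size):
    """Yield lists of parsed JSON records, at most batch_size per list."""
    # Skip blank lines; every remaining line is assumed to be one JSON object.
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


def update_payload(batch):
    """Serialize one batch as a JSON array, the body for a POST to /update."""
    return json.dumps(batch).encode("utf-8")


if __name__ == "__main__":
    sample = ['{"id": "1"}', "", '{"id": "2"}', '{"id": "3"}']
    for batch in batched_json_lines(sample, batch_size=2):
        print(len(batch), update_payload(batch))
```

Because the input is consumed lazily, the same generator can be fed from an open file handle, so large files do not need to fit in memory.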