Indexing rates scale pretty linearly with the number of shards, so one
way to increase throughput is to simply create a collection with
more shards. For the initial bulk-indexing operations, you can
go with a 1-replica-per-shard scenario then ADDREPLICA if you need
to build things out.
However…
Hi there,
We are using AWS EMR as our big data processing cluster. We have like 3TB
of text files where each line denotes a json record which I want to be
indexed into Solr.
I have tried this by batching them and pushing to Solr index using
SolrJClient. But I feel thats really slow.
My doubt is