Optimal configuration for high throughput indexing

Vinay Pothnis Thu, 30 Apr 2015 12:22:29 -0700

Hello,

I have a use case with the following characteristics:


 - High index update rate (adds/updates)
 - High query rate
 - Small index size (~800MB for 2.4 million docs)
 - The documents that are created at a high rate eventually "expire" and
are deleted at regular half-hour intervals

I currently have a SolrCloud setup with 1 shard and 4 replicas.
 * My index updates are sent to a VIP/load balancer, which round-robins them
across the 4 Solr nodes.
 * I am using an HTTP client to send the updates.
 * I use a batch size of 100, with 8 to 10 threads sending batches of updates
to Solr.
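The client side of this setup (batches of 100 documents, sent concurrently by 8 to 10 threads) can be sketched in Python as below. This is a minimal sketch under stated assumptions: `send_batch` is a hypothetical callable standing in for whatever the HTTP client does to POST one batch to the VIP, and the demo uses a fake sender that only records batch sizes, so no Solr server is contacted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Sequence

Doc = Dict[str, str]


def batches(docs: Sequence[Doc], size: int = 100) -> Iterator[List[Doc]]:
    """Yield consecutive slices of `docs`, each holding at most `size` documents."""
    for start in range(0, len(docs), size):
        yield list(docs[start:start + size])


def index_all(docs: Sequence[Doc],
              send_batch: Callable[[List[Doc]], object],
              batch_size: int = 100,
              threads: int = 8) -> int:
    """Send every batch through a fixed-size thread pool; return the batch count.

    `send_batch` is hypothetical: in a real client it would POST one batch
    to the load balancer and raise on an HTTP error.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() blocks until every batch is sent and re-raises any exception
        # raised by send_batch.
        results = list(pool.map(send_batch, batches(docs, batch_size)))
    return len(results)


# Demo with a fake sender that records batch sizes instead of calling Solr.
sent_sizes: List[int] = []
_lock = threading.Lock()


def fake_send(batch: List[Doc]) -> None:
    with _lock:
        sent_sizes.append(len(batch))


docs = [{"id": str(i)} for i in range(1050)]
batches_sent = index_all(docs, fake_send, batch_size=100, threads=8)
print(batches_sent, sorted(sent_sizes)[:2])
```

With `threads` workers, the client keeps at most that many batches in flight at once, so raising the thread count directly raises the concurrent update load on the nodes behind the VIP.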

When I run tests to scale out the indexing rate, I see the following:
 * Solr nodes go into recovery.
 * Updates take a very long time to complete.

As I understand it, when a node receives an update:
 * If it is the leader, it forwards the update to all the replicas and
waits until it has received a response from all of them before replying to
the client that sent the update.
 * If it is not the leader, it forwards the update to the leader, which
then performs the steps above.
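The leader behaviour described above implies that an update's latency is set by the slowest replica. The toy simulation below, which is not Solr code, makes that concrete: it assumes each replica acknowledges after an independent, exponentially distributed delay with a mean of 10 ms (an arbitrary, hypothetical figure) and compares waiting for all M replicas with waiting for only the first N, the "N out of M" option asked about in the questions below.

```python
import random
from typing import List


def replica_delays(m: int, rng: random.Random, mean_ms: float = 10.0) -> List[float]:
    """Hypothetical ack delays (ms) for m replicas: i.i.d. exponential, mean `mean_ms`."""
    return [rng.expovariate(1.0 / mean_ms) for _ in range(m)]


def ack_time(delays: List[float], n: int) -> float:
    """Time until the leader holds n acks, i.e. the n-th smallest delay."""
    return sorted(delays)[n - 1]


def mean_ack_time(m: int, n: int, trials: int = 20000, seed: int = 42) -> float:
    """Monte Carlo estimate of the mean time to collect n of m acks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += ack_time(replica_delays(m, rng), n)
    return total / trials


for m in (1, 2, 4, 8):
    wait_all = mean_ack_time(m, m)  # described behaviour: wait for every replica
    wait_one = mean_ack_time(m, 1)  # hypothetical: reply after the first ack
    print(f"replicas={m}: wait-all {wait_all:.1f} ms, wait-first {wait_one:.1f} ms")
```

Under this assumption the expected wait for all M replicas is the mean delay times the harmonic number H_M = 1 + 1/2 + ... + 1/M, so it keeps growing as replicas are added, though slowly, while waiting for only the fastest replica shrinks to mean/M. Real delays are neither independent nor exponential, so treat this only as an illustration of the trend.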

How do I go about scaling the index updates?
 * Will my updates get slower and slower as I add more replicas?
 * Is there a way to configure the leader to wait for only N out of M
replicas?
 * Should I be sending the updates only to the leader?
 * Is there any other approach I should be considering?

Thanks
Vinay
