Hi everyone,

I have Solr 1.4.1 running on one master and two slaves (load
balanced), using native replication.

If the load is low, both slaves replicate at around 100MB/s from the master.

But when I run load tests with Solrmeter (100-400 queries/min, through
the load balancer), replication slows down to an unacceptable speed,
around 100KB/s (at least that's what the replication page under
/solr/admin says).
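For scale, some rough arithmetic of mine (a worst-case sketch, assuming a hypothetical full transfer of the ~100GB index mentioned below; normal replication only copies changed segments, so this is an upper bound):

```python
# Back-of-envelope: how long a full ~100GB transfer would take at 100KB/s.
index_bytes = 100 * 1024**3          # ~100GB index
rate_bytes_per_s = 100 * 1024        # observed ~100KB/s
seconds = index_bytes / rate_bytes_per_s
days = seconds / 86400
print(f"{seconds:.0f} s = {days:.1f} days")  # prints "1048576 s = 12.1 days"
```

Even if each poll only needs a fraction of that, 100KB/s clearly can't keep a slave current.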

Hitting a slave directly, without the load balancer, yields the same
result for the slave under test:

Slave 1 gets hammered by Solrmeter and its replication slows down to 100KB/s.
At the same time, Slave 2, which sees only 20-50 queries/min and no
load test, has no problems: it replicates at 100MB/s and its index
version is 5-10 versions ahead of Slave 1.

The replication stays in the 100KB/s range even after the load test
is over, until the application server is restarted. The same issue
occurs under both Tomcat and Jetty.

The setup looks like this:

- Same hardware for all servers: Physical machines with quad core
CPUs, 24GB RAM (JVM starts up with -XX:+UseConcMarkSweepGC -Xms10G
-Xmx10G)
- Index size is about 100GB with 40M docs
- Master commits every 10 min/10k docs
- Slaves poll every minute
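For reference, the slave-side polling above corresponds to a solrconfig.xml stanza like this (host and port in masterUrl are placeholders, not my actual values):

```xml
<!-- Slave-side replication config (Solr 1.4.x); masterUrl is a placeholder -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/replication</str>
    <!-- poll every minute, as described above -->
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```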

I checked this:

- Changed network interface; same behavior
- Increased thread pool size from 200 to 500 and queue size from 100
to 500 in Tomcat; same behavior
- Neither disk nor network I/O is a bottleneck. Disk I/O dropped to
almost zero once every query in the load test was cached. The network
isn't doing much and can push almost a GBit/s with iPerf (a network
throughput tester) while Solrmeter is running.
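In case anyone wants to reproduce the measurement: the replication status I'm quoting can also be queried from the slave's ReplicationHandler directly (slave1:8983 is a placeholder for the real host/port):

```shell
# Ask the slave's ReplicationHandler for its current replication status
curl 'http://slave1:8983/solr/replication?command=details'
```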

Any ideas what could be wrong?


Best Regards
Vadim
