Hi everyone,

I'm running Solr 1.4.1 with one master and two load-balanced slaves, using Solr's native (Java-based) replication.
When the load is low, both slaves replicate from the master at around 100 MB/s. But when I run load tests with Solrmeter (100-400 queries/min) through the load balancer, replication slows to an unacceptable speed of around 100 KB/s (at least that's what the replication page at /solr/admin reports). Bypassing the load balancer and querying a slave directly yields the same result for the slave under test: Slave 1 gets hammered by Solrmeter and its replication drops to 100 KB/s. At the same time, Slave 2, serving only 20-50 queries/min outside the load test, has no problems: it replicates at 100 MB/s and its index version is 5-10 versions ahead of Slave 1. Replication stays in the 100 KB/s range even after the load test is over, until the application server is restarted. The same issue comes up under both Tomcat and Jetty.

The setup looks like this:
- Same hardware for all servers: physical machines with quad-core CPUs and 24 GB RAM (the JVM starts with -XX:+UseConcMarkSweepGC -Xms10G -Xmx10G)
- Index size is about 100 GB with 40M docs
- Master commits every 10 min / 10k docs
- Slaves poll every minute

What I have checked so far:
- Changed the network interface; same behavior
- Increased Tomcat's thread pool size from 200 to 500 and its queue size from 100 to 500; same behavior
- Neither disk nor network I/O is a bottleneck. Disk I/O went down to almost zero after every query in the load test got cached. The network isn't doing much and can push through almost a GBit/s with iPerf (a network throughput tester) while Solrmeter is running.

Any ideas what could be wrong?

Best Regards
Vadim
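For reference, the replication setup described above corresponds roughly to this solrconfig.xml sketch. Only the poll interval and the commit-triggered replication come from the numbers in my post; the master hostname and port are placeholders:

```xml
<!-- Master solrconfig.xml: expose the index for replication after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- Slave solrconfig.xml: poll the master every minute (HH:mm:ss format) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- placeholder hostname; replace with the real master -->
    <str name="masterUrl">http://master:8983/solr/replication</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```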
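In case it helps anyone reproduce the measurements: instead of watching the /solr/admin replication page, the replication state can also be read over HTTP from the Solr 1.4 ReplicationHandler. A sketch, with hostnames and ports as placeholders for the machines above (this needs a running master/slave, so output will vary):

```shell
# Current index version on the master
curl -s 'http://master:8983/solr/replication?command=indexversion'

# Replication details on the loaded slave (index version, fetch progress)
curl -s 'http://slave1:8983/solr/replication?command=details'
```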