On 12/5/2019 10:28 AM, Rahul Goswami wrote:
We have a Solr 7.2.1 SolrCloud setup where the client is indexing in 5
parallel threads with 5000 docs per batch. This is a test setup and all
documents are indexed on the same node. We start seeing connection timeout
issues some time into indexing. I have yet to analyze GC pauses
and other possibilities, but as a guideline I wanted to know what
indexing rate might be "too high" for Solr, such that we should consider throttling.
The documents are mostly metadata with about 25 fields, so not very
heavy.
It would be nice to know a baseline performance expectation to inform
application design.

It's not really possible to give you a number here. It depends on a lot of things, and every install is going to be different.

On a setup that I once dealt with, where there was only a single thread doing the indexing, indexing on each core happened at about 1000 docs per second. I've heard people mention rates beyond 50000 docs per second. I've also heard people talk about rates of indexing far lower than what I was seeing.

When you say "connection timeout" issues ... that could mean a couple of different things. It could mean that the connection never gets established because it times out while trying, or it could mean that the connection gets established, and then times out after that. Which are you seeing? Usually dealing with that involves changing timeout settings on the client application. Figuring out what's causing the delays that lead to the timeouts might be harder. GC pauses are a primary candidate.

There are typically two possible bottlenecks when indexing. One is that the source system cannot supply documents fast enough. The other is that the Solr server sits mostly idle because the indexing program spends its time gathering and preparing the next batch instead of sending documents. The first is not something we can help you with. The second is dealt with by making the indexing application multi-threaded or multi-process, or by adding more threads/processes, as in the sketch below.
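
If the sending side is the limit, one common SolrJ pattern is a fixed thread pool sharing one SolrClient instance, which is thread-safe. This is a rough sketch under those assumptions (SolrJ 7.x); the URL, field names, batch size, and thread count are illustrative only:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        // Hypothetical URL and field names -- adjust for your own install.
        SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build();

        // One thread per concurrent batch sender; the client is shared.
        ExecutorService pool = Executors.newFixedThreadPool(5);

        for (int batchNum = 0; batchNum < 100; batchNum++) {
            final int batch = batchNum;
            pool.submit(() -> {
                List<SolrInputDocument> docs = new ArrayList<>();
                for (int i = 0; i < 5000; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", batch + "-" + i);
                    doc.addField("title_s", "document " + i);
                    docs.add(doc);
                }
                try {
                    client.add(docs);        // one update request per 5000-doc batch
                } catch (Exception e) {
                    e.printStackTrace();     // real code should log and retry
                }
                return null;
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        client.commit();                     // single explicit commit at the end
        client.close();
    }
}

ConcurrentUpdateSolrClient is another option that handles the threading internally, but note that it is known for swallowing indexing errors rather than reporting them back to the caller.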

Thanks,
Shawn
