On 6/28/2011 1:38 PM, Pranav Prakash wrote:
    - Will the commit by incremental indexer script also commit the
    previously uncommitted changes made by full indexer script before it broke?

Yes, as long as the Solr instance hasn't crashed. Anything added but not yet committed sticks around and will be committed on the next 'commit'. There are no 'transactions' for adding docs in Solr; even if multiple processes are adding, if any one of them issues a 'commit', everything pending will be committed.
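
(For example, a commit can be issued to the update handler over HTTP; the host, port, and path below are just the defaults from the example distribution, so adjust for your setup:

  curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<commit/>'

Any process can send that, and it commits everything pending regardless of which process added it.)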

Sometimes, during execution, Solr's avg response time (avg resp time
for the last 10 requests, read from the log file) goes as high as 9000ms (which I am
still unclear why, any ideas how to start hunting for the problem?),

It could be a Java garbage collection issue. I have found it useful to start the JVM that Solr runs in with some parameters to tune garbage collection. I use these JVM options: -server -XX:+AggressiveOpts -d64 -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops

You've still got to make sure Solr has enough memory for what you're doing with it, which with your 5 million doc index might be more than you expect. On the other hand, giving a JVM too _much_ heap can cause slowdowns too, although I think -XX:+UseConcMarkSweepGC should ameliorate that to some extent.
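
As a rough sketch, assuming the example Jetty distribution and a heap size you'd have to pick for your own index (the 2g below is just a placeholder), the startup would look something like:

  java -server -XX:+AggressiveOpts -d64 -XX:+UseConcMarkSweepGC \
       -XX:+UseCompressedOops -Xms2g -Xmx2g -jar start.jar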

Possibly more likely, it could instead be Solr readying new indexes. Do you issue commits in the middle of 'execution', and could the slowdown happen right after a commit? When a commit is issued, Solr has to switch in a new index with the newly added documents and 'warm' that index in various ways, which can be a CPU (as well as RAM) intensive thing. (For these purposes a replication from master counts as a commit, because it is one, and an optimize can count too, because it's close enough.)
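
The warming work comes from what's configured in solrconfig.xml -- cache autowarming plus any newSearcher listener queries. Something along these lines (the query itself is just a placeholder) is what runs every time a new searcher is opened:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">a typical query for your app</str>
        <str name="start">0</str>
        <str name="rows">10</str>
      </lst>
    </arr>
  </listener>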

This can be especially a problem if you issue multiple commits very close together -- Solr is still working away at readying the index from the first commit when the second comes in, and now Solr is trying to ready two indexes at once (one of which will never be used because it's already outdated). Or even more than two, if you issue a bunch of commits in rapid succession.
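
solrconfig.xml has a maxWarmingSearchers setting that caps how many searchers may be warming in the background at once; once you hit that cap, further commits fail instead of piling up yet more warming work. For example:

  <maxWarmingSearchers>2</maxWarmingSearchers>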

  I found that the uncommitted changes were
applied and searchable. However, the updates were uncommitted.

There is in general no way that uncommitted adds could be searchable, so that's probably not happening. What is probably happening instead is that a commit _is_ happening. One way a commit can happen even if you aren't manually issuing one is via the auto-commit settings in solrconfig.xml: you can configure Solr to commit any pending adds after X documents, or after T seconds, or both. If those are configured, they could be causing commits to happen when you don't realize it, which could also trigger the slowdown after a commit mentioned above.
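
Those settings live in the update handler section of solrconfig.xml; the numbers here are just illustrative:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs> <!-- commit after this many pending docs -->
      <maxTime>60000</maxTime> <!-- or after this many milliseconds -->
    </autoCommit>
  </updateHandler>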

Jonathan
