By default, the indexer-solr plugin sends a commit command after every N
documents, where N is the value of this property in your nutch-site.xml:
<property>
  <name>solr.commit.size</name>
  <value>10000</value>
  <description>
  Defines the number of documents to send to Solr in a single update batch.
  Decrease when handling very large documents to prevent Nutch from running
  out of memory. NOTE: It does not explicitly trigger a server side commit.
  </description>
</property>
-nocommit turns this off, so no commit is sent after every N documents.
That doesn't mean the documents are never committed, though. Solr, on
its side, has configuration that can trigger a commit after every N
documents, every M milliseconds, or both. Also, IIRC, the indexer still
sends a single commit command at the end of the job regardless, which I
think is also configurable.
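For the Solr side, here is a rough sketch of what such a setting can look
like in solrconfig.xml. The numbers are made up for illustration and are
not recommendations; check the updateHandler section of your own
solrconfig.xml, since your existing values and handler class may differ.

```xml
<!-- Sketch only: values below are illustrative assumptions -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit once this many documents are pending -->
    <maxDocs>10000</maxDocs>
    <!-- ...or once this many milliseconds have passed since the first uncommitted add -->
    <maxTime>15000</maxTime>
    <!-- do not open a new searcher on each automatic commit -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With something like this in place, Solr commits on its own schedule whether
or not Nutch ever sends an explicit commit.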
On 05/26/2016 07:40 AM, Joseph Naegele wrote:
Hi folks, I'm looking for clarification on the index "-nocommit" option:
The description says: "do the commits once and for all the reducers in one
go (optional)", which sounds unintuitive. The relevant code in
IndexerJob.java looks like this:
    // do the commits once and for all the reducers in one go
    if (!noCommit) {
      writers.open(job, "commit");
      writers.commit();
    }
Which tells me that if I specify the option the commits are NOT performed.
Is the "-nocommit" description incorrect?
For reference, the solrindex "-nocommit" option's description used to say:
"Do not send a commit after indexing the segment(s)." Source:
https://wiki.apache.org/nutch/bin/nutch%20solrindex
To further the confusion: this option works with the indexer-solr plugin,
but is useless with indexer-elastic because indexer-elastic "commits" for
every bulk update (e.g. every N "write" calls).
Joe