By default, the indexer-solr plugin sends a commit command after every N documents, N being this property in your nutch-site.xml:

<property>
  <name>solr.commit.size</name>
  <value>10000</value>
  <description>
  Defines the number of documents to send to Solr in a single update batch.
  Decrease when handling very large documents to prevent Nutch from running
  out of memory. NOTE: It does not explicitly trigger a server side commit.
  </description>
</property>

-nocommit turns this off, so there is no commit after every N documents. That doesn't mean the documents are never committed. Solr, on its side, has configuration that can trigger a commit after every N documents, every M milliseconds, or both. Also, IIRC, the indexer still sends a single commit command at the end of the job regardless, which, I think, is also configurable.
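
For reference, the Solr-side setting I mean is the autoCommit block in the updateHandler section of solrconfig.xml. A rough sketch only; the thresholds below are made-up values for illustration, not recommendations, and your solrconfig.xml may already define its own:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- illustrative values (assumption): hard commit after 10000
         pending documents or 15000 ms, whichever comes first -->
    <maxDocs>10000</maxDocs>
    <maxTime>15000</maxTime>
    <!-- false: flush to disk without opening a new searcher, so the
         documents are durable but not necessarily visible yet -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>

With something like that in place, Solr commits on its own schedule whether or not Nutch sends a commit.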

On 05/26/2016 07:40 AM, Joseph Naegele wrote:
Hi folks, I'm looking for clarification on the index "-nocommit" option:

The description says: "do the commits once and for all the reducers in one
go (optional)", which sounds unintuitive. The relevant code in
IndexerJob.java looks like this:

       // do the commits once and for all the reducers in one go
       if (!noCommit) {
           writers.open(job, "commit");
           writers.commit();
       }

Which tells me that if I specify the option the commits are NOT performed.
Is the "-nocommit" description incorrect?

For reference, the solrindex "-nocommit" option's description used to say:
"Do not send a commit after indexing the segment(s).". source:
https://wiki.apache.org/nutch/bin/nutch%20solrindex

To further the confusion: this option works with the indexer-solr plugin,
but is useless with indexer-elastic because indexer-elastic "commits" for
every bulk update (e.g. every N "write" calls).

Joe

