Thanks Kaveh,

Unfortunately no, the "-noCommit" option does not disable the commits after
every N documents. Please see both indexer-solr and indexer-elastic again to
confirm. The only thing this option does is disable the single commit at the
end of the indexing job. In addition, its description says "do the commits
once and for all the reducers in one go (optional)", which is inaccurate, as
the option actually disables the final commit. I'll file an issue.
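
To make the distinction concrete, here is a toy model of the behaviour described above. It is only a sketch, not Nutch code: the class name, method name and event labels are all made up for illustration, and it assumes a writer that commits after every N documents, as I described for the plugins.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names come from Nutch.
class NoCommitSketch {

    // Returns the sequence of commit events for one simulated indexing job.
    static List<String> runJob(int totalDocs, int commitSize, boolean noCommit) {
        List<String> events = new ArrayList<>();
        int pending = 0;
        for (int i = 0; i < totalDocs; i++) {
            pending++;
            if (pending == commitSize) {
                // Per-batch commit: happens whether or not noCommit is set.
                events.add("batch");
                pending = 0;
            }
        }
        // Final commit: the only step that noCommit skips, mirroring the
        // `if (!noCommit)` guard quoted from IndexerJob.java.
        if (!noCommit) {
            events.add("final");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(runJob(25, 10, false)); // prints [batch, batch, final]
        System.out.println(runJob(25, 10, true));  // prints [batch, batch]
    }
}
```

With the flag set, the per-batch commits still happen; only the trailing "final" entry disappears.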

Joe

-----Original Message-----
From: kaveh minooie [mailto:ka...@plutoz.com] 
Sent: Friday, May 27, 2016 5:23 AM
To: user@nutch.apache.org
Subject: Re: indexer -nocommit option

By default, the indexer-solr plugin sends a commit command after every N
documents, N being this property in your nutch-site.xml:

<property>
   <name>solr.commit.size</name>
   <value>10000</value>
   <description>
   Defines the number of documents to send to Solr in a single update batch.
   Decrease when handling very large documents to prevent Nutch from running
   out of memory. NOTE: It does not explicitly trigger a server side commit.
   </description>
</property>

-nocommit would turn this off, so there is no commit after every N documents.
Now, that doesn't mean that the documents are not being committed. Solr, on
its side, has configuration that can trigger a commit after every N documents
or M milliseconds, or both. Also, IIRC, the indexer still sends a single
commit command at the end of the job regardless, which, I think, is also
configurable.
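
For reference, the Solr-side trigger mentioned above is normally set with an autoCommit block in solrconfig.xml. A minimal sketch, assuming the stock update handler; the values are illustrative, not taken from any real setup:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit once this many documents are pending (the "N documents" case) -->
    <maxDocs>10000</maxDocs>
    <!-- or once this many milliseconds have passed (the "M milliseconds" case) -->
    <maxTime>15000</maxTime>
    <!-- illustrative choice: do not open a new searcher on each auto commit -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Either limit can be omitted; when both are present, whichever is reached first triggers the commit.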

On 05/26/2016 07:40 AM, Joseph Naegele wrote:
> Hi folks, I'm looking for clarification on the index "-nocommit" option:
>
> The description says: "do the commits once and for all the reducers in 
> one go (optional)", which sounds unintuitive. The relevant code in 
> IndexerJob.java looks like this:
>
>        // do the commits once and for all the reducers in one go
>        if (!noCommit) {
>            writers.open(job, "commit");
>            writers.commit();
>        }
>
> Which tells me that if I specify the option the commits are NOT performed.
> Is the "-nocommit" description incorrect?
>
> For reference, the solrindex "-nocommit" option's description used to say:
> "Do not send a commit after indexing the segment(s).". source:
> https://wiki.apache.org/nutch/bin/nutch%20solrindex
>
> To further the confusion: this option works with the indexer-solr 
> plugin, but is useless with indexer-elastic because indexer-elastic 
> "commits" for every bulk update (e.g. every N "write" calls).
>
> Joe
>

