Thanks Kaveh,

Unfortunately no, the "-noCommit" option does not disable the commits after every N documents. Please see both indexer-solr and indexer-elastic again to confirm. The only thing this option does is disable the single commit at the end of the indexing job. In addition, its description says "do the commits once and for all the reducers in one go (optional)", which is inaccurate, as the option actually disables the final commit. I'll file an issue.
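To make the distinction concrete, here is a rough toy model of the behavior described above. It is only a sketch and not the actual Nutch code: ToyIndexWriter, NoCommitSketch, and runJob are hypothetical names, and the batch size stands in for whatever "every N documents" means for a given plugin. The point is just that the per-batch work happens whether or not the flag is set, and the flag only guards the single commit at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyIndexWriter is a hypothetical class, not a Nutch class.
class ToyIndexWriter {
    private final int batchSize;                     // stands in for "every N documents"
    private final List<String> buffer = new ArrayList<>();
    final List<String> events = new ArrayList<>();   // record of what was sent

    ToyIndexWriter(int batchSize) {
        this.batchSize = batchSize;
    }

    // Buffer a document and send the batch once N documents have accumulated.
    void write(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered; this never consults the noCommit flag.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        events.add("batch:" + buffer.size());
        buffer.clear();
    }

    void commit() {
        events.add("commit");
    }
}

// Hypothetical driver mirroring the shape of the quoted "if (!noCommit)" check.
class NoCommitSketch {
    static List<String> runJob(int docs, int batchSize, boolean noCommit) {
        ToyIndexWriter writer = new ToyIndexWriter(batchSize);
        for (int i = 0; i < docs; i++) {
            writer.write("doc" + i);
        }
        writer.flush();          // send the final partial batch
        if (!noCommit) {         // the only place the flag is checked
            writer.commit();
        }
        return writer.events;
    }

    public static void main(String[] args) {
        System.out.println(runJob(25, 10, false)); // [batch:10, batch:10, batch:5, commit]
        System.out.println(runJob(25, 10, true));  // [batch:10, batch:10, batch:5]
    }
}
```

In this model the two runs differ only in the trailing "commit" entry, which is the behavior I'm describing for the real option.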
Joe

-----Original Message-----
From: kaveh minooie [mailto:ka...@plutoz.com]
Sent: Friday, May 27, 2016 5:23 AM
To: user@nutch.apache.org
Subject: Re: indexer -nocommit option

By default, the indexer-solr plugin sends a commit command after every N documents, N being this property in your nutch-site.xml:

<property>
  <name>solr.commit.size</name>
  <value>10000</value>
  <description>
  Defines the number of documents to send to Solr in a single update batch.
  Decrease when handling very large documents to prevent Nutch from running out of memory.
  NOTE: It does not explicitly trigger a server side commit.
  </description>
</property>

-nocommit would turn this off, so no commit after every N documents. Now, that doesn't mean that the documents are not being committed. Solr, on its side, has configuration that can trigger a commit after every N documents, every M milliseconds, or both. Also, IIRC, the indexer still sends a single commit command at the end of the job regardless, which, I think, is also configurable.

On 05/26/2016 07:40 AM, Joseph Naegele wrote:
> Hi folks, I'm looking for clarification on the index "-nocommit" option:
>
> The description says: "do the commits once and for all the reducers in
> one go (optional)", which sounds unintuitive. The relevant code in
> IndexerJob.java looks like this:
>
>     // do the commits once and for all the reducers in one go
>     if (!noCommit) {
>       writers.open(job, "commit");
>       writers.commit();
>     }
>
> Which tells me that if I specify the option the commits are NOT performed.
> Is the "-nocommit" description incorrect?
>
> For reference, the solrindex "-nocommit" option's description used to say:
> "Do not send a commit after indexing the segment(s).". source:
> https://wiki.apache.org/nutch/bin/nutch%20solrindex
>
> To further the confusion: this option works with the indexer-solr
> plugin, but is useless with indexer-elastic because indexer-elastic
> "commits" for every bulk update (e.g. every N "write" calls).
>
> Joe