[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566643#action_12566643 ]

Grant Ingersoll commented on SOLR-342:
--------------------------------------

I did some benchmarking of the autocommit functionality in Lucene (as opposed to in Solr, which is different). Currently, in Lucene autocommit is true by default, meaning that every time there is a flush, it is also committed. Solr adds its own layer on top of this with its commit semantics.

There is a noticeable difference in memory used and speed in Lucene performance between autocommit = false and autocommit = true.

Some rough numbers using the autocommit.alg in Lucene's benchmark contrib (from trunk):

Operation        round  ac     ram   runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem  avgTotalMem
MAddDocs_200000  0      true   2.00  1       200000      400.1  499.90      61,322,608  68,780,032
MAddDocs_200000  1      false  2.00  1       200000      499.9  400.08      49,373,632  75,018,240
MAddDocs_200000  2      true   2.00  1       200000      383.7  521.27      70,716,096  75,018,240
MAddDocs_200000  3      false  2.00  1       200000      552.7  361.89      68,069,464  75,018,240

The first row has autocommit = true, second is false, and then alternating. The key value is the rec/s, which is:
1. ac = true  400.1
2. ac = false 499.9
3. ac = true  383.7
4. ac = false 552.7

Notice also the diff in avgUsedMem. Adding this functionality may, perhaps, be more important to Solr's performance than the flush by RAM capability.
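
To put the rec/s figures quoted above in relative terms, here is a minimal sketch that computes the throughput gain of ac = false over ac = true. The class and method names are made up for illustration, and pairing round 0 with round 1 and round 2 with round 3 is an assumption about how the rounds should be compared, not something the benchmark output states.

```java
import java.util.Locale;

public class AutocommitSpeedup {
    // Percentage change in throughput going from `base` to `other` (both in rec/s).
    static double speedupPercent(double base, double other) {
        return (other - base) / base * 100.0;
    }

    public static void main(String[] args) {
        // rec/s values copied from the benchmark rows: {ac = true, ac = false}.
        // Pairing consecutive rounds is an assumption made for this sketch.
        double[][] pairs = { {400.1, 499.9}, {383.7, 552.7} };
        for (double[] p : pairs) {
            System.out.println(String.format(Locale.ROOT,
                "ac=true %.1f rec/s, ac=false %.1f rec/s, gain %.1f%%",
                p[0], p[1], speedupPercent(p[0], p[1])));
        }
    }
}
```

On these rough numbers the gain comes out at roughly 25% for the first pair and 44% for the second, so the spread between runs is large and the figures should be read as indicative only.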

> Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-342
>                 URL: https://issues.apache.org/jira/browse/SOLR-342
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the
> setRAMBufferSizeMB() method that should significantly speed up indexing for
> many applications. To fix this, we will need trunk version of Lucene (or
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also
> be incorporated.
> Also need to think about how we want to incorporate the new merge scheduling
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.