
Grant Ingersoll (JIRA) Thu, 07 Feb 2008 07:48:41 -0800

    [ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566643#action_12566643 ]


Grant Ingersoll commented on SOLR-342:
--------------------------------------

I did some benchmarking of the autocommit functionality in Lucene (as opposed 
to in Solr, which is different).  Currently, autocommit is true by default in 
Lucene, meaning that every flush is also a commit.  Solr adds its own layer of 
commit semantics on top of this.  There is a noticeable difference in Lucene's 
speed and memory use between autocommit = false and autocommit = true.  
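
To make the flush/commit relationship concrete, here is a toy model in Java. 
It is a hypothetical sketch, not Lucene's IndexWriter: the class 
AutoCommitModel, its document-count flush trigger (flushEvery) and the 
10,000-document interval are assumptions for illustration. It only counts how 
many commits happen for the same number of flushes.

```java
/**
 * Hypothetical toy model (NOT Lucene's IndexWriter) of how the autoCommit
 * setting relates flushes to commits. Buffered documents are flushed every
 * flushEvery documents; with autoCommit=true each flush is also a commit,
 * with autoCommit=false the only commit happens in close().
 */
public class AutoCommitModel {
    private final boolean autoCommit;
    private final int flushEvery;   // hypothetical flush trigger, in documents
    private int buffered = 0;
    private int flushes = 0;
    private int commits = 0;

    AutoCommitModel(boolean autoCommit, int flushEvery) {
        this.autoCommit = autoCommit;
        this.flushEvery = flushEvery;
    }

    void addDocument() {
        buffered++;
        if (buffered >= flushEvery) {
            flush();
        }
    }

    private void flush() {
        if (buffered == 0) {
            return;                 // nothing buffered, nothing to write
        }
        flushes++;
        buffered = 0;
        if (autoCommit) {
            commits++;              // autoCommit=true: every flush is also a commit
        }
    }

    void close() {
        flush();                    // write out any documents still buffered
        if (!autoCommit) {
            commits++;              // autoCommit=false: a single commit at the end
        }
    }

    /** Adds numDocs documents, closes, and returns {flushes, commits}. */
    static int[] run(boolean autoCommit, int numDocs, int flushEvery) {
        AutoCommitModel w = new AutoCommitModel(autoCommit, flushEvery);
        for (int i = 0; i < numDocs; i++) {
            w.addDocument();
        }
        w.close();
        return new int[] {w.flushes, w.commits};
    }

    public static void main(String[] args) {
        for (boolean ac : new boolean[] {true, false}) {
            int[] r = run(ac, 200_000, 10_000);
            System.out.println("autoCommit=" + ac + ": flushes=" + r[0] + ", commits=" + r[1]);
        }
    }
}
```

With 200,000 documents and a flush every 10,000, both settings flush 20 times, 
but autoCommit=true also commits 20 times while autoCommit=false commits once. 
The model says nothing about what a commit actually costs inside Lucene.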

Some rough numbers using the autocommit.alg in Lucene's benchmark contrib (from 
trunk):  
    Operation        round  ac     ram   runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem  avgTotalMem
    MAddDocs_200000  0      true   2.00  1       200000      400.1  499.90      61,322,608  68,780,032
    MAddDocs_200000  1      false  2.00  1       200000      499.9  400.08      49,373,632  75,018,240
    MAddDocs_200000  2      true   2.00  1       200000      383.7  521.27      70,716,096  75,018,240
    MAddDocs_200000  3      false  2.00  1       200000      552.7  361.89      68,069,464  75,018,240

The first row has autocommit = true, the second false, and the rows alternate 
from there.  The key value is rec/s (documents added per second), which is:
1. ac = true 400.1
2. ac = false 499.9
3. ac = true 383.7
4. ac = false 552.7
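
The rec/s values above can be averaged per setting in a few lines of Java. 
This sketch only redoes arithmetic on the four numbers from the table (the 
class and method names are illustrative); it also recomputes elapsedSec as 
recsPerRun / rec/s, which matches the table to within rounding.

```java
/** Averages the benchmark's rec/s per autocommit setting (values copied from the table). */
public class AutoCommitThroughput {
    // rec/s for rounds 0..3, and the autocommit setting of each round
    static final double[] REC_PER_SEC = {400.1, 499.9, 383.7, 552.7};
    static final boolean[] AUTO_COMMIT = {true, false, true, false};
    static final int RECS_PER_RUN = 200_000;

    /** Mean rec/s over all rounds run with the given autocommit setting. */
    static double mean(boolean autoCommit) {
        double sum = 0;
        int n = 0;
        for (int i = 0; i < REC_PER_SEC.length; i++) {
            if (AUTO_COMMIT[i] == autoCommit) {
                sum += REC_PER_SEC[i];
                n++;
            }
        }
        return sum / n;
    }

    /** Seconds needed to add RECS_PER_RUN documents at the given rate. */
    static double elapsedSec(double recPerSec) {
        return RECS_PER_RUN / recPerSec;
    }

    public static void main(String[] args) {
        double acTrue = mean(true);
        double acFalse = mean(false);
        System.out.printf("mean rec/s, ac=true:  %.1f%n", acTrue);
        System.out.printf("mean rec/s, ac=false: %.1f%n", acFalse);
        System.out.printf("ac=false speedup: %.0f%%%n", (acFalse / acTrue - 1) * 100);
        for (double r : REC_PER_SEC) {
            System.out.printf("%.1f rec/s -> %.2f s%n", r, elapsedSec(r));
        }
    }
}
```

The means come out to 391.9 rec/s for ac = true and 526.3 rec/s for 
ac = false, roughly 34% more documents per second with autocommit off.  With 
only two rounds per setting, treat that as a rough figure.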

Notice also the difference in avgUsedMem.  Adding this functionality may be 
more important to Solr's performance than the flush-by-RAM capability.
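
The avgUsedMem difference can be quantified the same way by comparing each 
ac = false round with the ac = true round just before it.  Again this is only 
arithmetic on the table's values; the class and method names are 
illustrative.

```java
/** Compares avgUsedMem of each ac=false round with the preceding ac=true round. */
public class AutoCommitMemory {
    // avgUsedMem in bytes for rounds 0..3 (ac = true, false, true, false)
    static final long[] AVG_USED_MEM = {61_322_608L, 49_373_632L, 70_716_096L, 68_069_464L};

    /** Bytes saved in pair p: round 2p (ac=true) minus round 2p+1 (ac=false). */
    static long savedBytes(int pair) {
        return AVG_USED_MEM[2 * pair] - AVG_USED_MEM[2 * pair + 1];
    }

    /** Saving as a fraction of the ac=true round's avgUsedMem. */
    static double savedFraction(int pair) {
        return (double) savedBytes(pair) / AVG_USED_MEM[2 * pair];
    }

    public static void main(String[] args) {
        for (int pair = 0; pair < 2; pair++) {
            System.out.printf("rounds %d vs %d: %,d bytes less (%.1f%%)%n",
                    2 * pair, 2 * pair + 1, savedBytes(pair), 100 * savedFraction(pair));
        }
    }
}
```

The savings are 11,948,976 bytes (about 19.5%) for rounds 0/1 and 2,646,632 
bytes (about 3.7%) for rounds 2/3: lower in both pairs, but the size of the 
gap varies a lot between rounds.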

> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-342
>                 URL: https://issues.apache.org/jira/browse/SOLR-342
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
> SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method, which should significantly speed up indexing for 
> many applications.  To address this, we will need the trunk version of Lucene 
> (or wait for the next official release of Lucene).
> A side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> We also need to think about how we want to incorporate the new merge 
> scheduling functionality (the new default in Lucene is to do merges in a 
> background thread).
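
setRAMBufferSizeMB() makes the writer flush when its buffered documents use a 
given amount of RAM, rather than after a fixed number of documents.  Below is 
a minimal toy model of that trigger in Java.  The class, the per-document byte 
estimates and the 2 MB figure are assumptions chosen for illustration; real 
Lucene tracks the RAM used by its own internal buffers.

```java
/**
 * Hypothetical toy model (NOT Lucene's implementation) of flushing by RAM
 * usage, in the spirit of IndexWriter.setRAMBufferSizeMB(): buffered
 * documents are flushed once their estimated size reaches the limit.
 */
public class RamBufferFlushModel {
    private final long limitBytes;
    private long bufferedBytes = 0;
    private int bufferedDocs = 0;
    private int flushes = 0;

    RamBufferFlushModel(double ramBufferSizeMB) {
        this.limitBytes = (long) (ramBufferSizeMB * 1024 * 1024);
    }

    /** docBytes is a hypothetical estimate of the RAM one buffered document uses. */
    void addDocument(long docBytes) {
        bufferedBytes += docBytes;
        bufferedDocs++;
        if (bufferedBytes >= limitBytes) {
            flush();
        }
    }

    void flush() {
        if (bufferedDocs == 0) {
            return;                 // nothing buffered, nothing to write
        }
        flushes++;
        bufferedBytes = 0;
        bufferedDocs = 0;
    }

    /** Adds numDocs equally sized documents, flushes the remainder, returns the flush count. */
    static int flushesFor(double ramBufferSizeMB, int numDocs, long docBytes) {
        RamBufferFlushModel w = new RamBufferFlushModel(ramBufferSizeMB);
        for (int i = 0; i < numDocs; i++) {
            w.addDocument(docBytes);
        }
        w.flush();                  // final flush, as on close
        return w.flushes;
    }

    public static void main(String[] args) {
        // Same 2 MB buffer and document count, different document sizes.
        System.out.println("1 KB docs:  " + flushesFor(2.0, 10_000, 1_024) + " flushes");
        System.out.println("16 KB docs: " + flushesFor(2.0, 10_000, 16_384) + " flushes");
    }
}
```

With 10,000 documents, 1 KB documents are flushed in 5 batches of up to 2,048 
documents, while 16 KB documents take 79 batches of up to 128: the batch size 
adapts to document size instead of being fixed.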


[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
