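[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)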

Grant Ingersoll (JIRA) Mon, 22 Oct 2007 13:15:12 -0700

     [ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Grant Ingersoll updated SOLR-342:
---------------------------------

    Description: 
LUCENE-843 adds support for new indexing capabilities using the 
setRAMBufferSizeMB() method that should significantly speed up indexing for 
many applications.  To support this, we will need the trunk version of Lucene 
(or to wait for the next official release of Lucene)

Side effect of this is that Lucene's new, faster StandardTokenizer will also be 
incorporated.  

Also need to think about how we want to incorporate the new merge scheduling 
functionality (new default in Lucene is to do merges in a background thread)

  was:
LUCENE-843 adds support for new indexing capabilities using the 
setRAMBufferSizeMB() method that should significantly speed up indexing for 
many applications.  To support this, we will need the trunk version of Lucene 
(or to wait for the next official release of Lucene)

Side effect of this is that Lucene's new, faster StandardTokenizer will also be 
incorporated.

        Summary: Add support for Lucene's new Indexing and merge features 
(excluding Document/Field/Token reuse)  (was: Add support for Lucene's new 
setRAMBufferSizeMB() method in IndexWriter)

Updated to cover the broader scope of changes that affect upgrading to Lucene 
trunk.

Plan to implement:
Add a <ramBufferSizeMB> tag to specify the number of megabytes of RAM to give 
Lucene for buffering.  Setting this value will override the other related 
IndexWriter settings (maxBufferedDocs, etc.).
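
The override rule proposed for <ramBufferSizeMB> could be sketched roughly as 
follows.  This is only an illustration of the precedence described above, not 
Solr or Lucene code: the BufferSettings class, its fields and the returned 
strings are all hypothetical, and a null field is assumed to mean the tag was 
absent from the config.

```java
/**
 * Hypothetical holder for the buffering settings read from the config.
 * A null field means the corresponding tag was not present.
 */
class BufferSettings {
    final Double ramBufferSizeMB;   // proposed <ramBufferSizeMB> value, or null
    final Integer maxBufferedDocs;  // existing maxBufferedDocs value, or null

    BufferSettings(Double ramBufferSizeMB, Integer maxBufferedDocs) {
        this.ramBufferSizeMB = ramBufferSizeMB;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Returns the setting that would drive buffering under the proposed rule. */
    String effectiveFlushTrigger() {
        if (ramBufferSizeMB != null) {
            // Proposed rule: the RAM setting wins over the doc-count setting.
            return "ramBufferSizeMB=" + ramBufferSizeMB;
        }
        if (maxBufferedDocs != null) {
            return "maxBufferedDocs=" + maxBufferedDocs;
        }
        // Neither tag present: leave whatever Lucene does by default.
        return "lucene-default";
    }

    public static void main(String[] args) {
        System.out.println(new BufferSettings(32.0, 1000).effectiveFlushTrigger());
        System.out.println(new BufferSettings(null, 1000).effectiveFlushTrigger());
    }
}
```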

Add a <mergeScheduler> tag that can have two values: concurrent or serial.  Or 
would it be better to take in a classname?  Doing the latter would mean the 
class would have to have a no-arg constructor, right?

Add a <mergePolicy> tag that defines the merge policy and can have two values: 
byteSize or docCount.  Or would it be better to take a classname?
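
One way to support both options for the two tags above is to treat the short 
names as aliases and fall back to a classname.  The sketch below is an 
assumption about how that might look, not the implementation: the 
MergeComponentLoader class and its alias tables are hypothetical, and mapping 
the short names to Lucene's ConcurrentMergeScheduler, SerialMergeScheduler, 
LogByteSizeMergePolicy and LogDocMergePolicy is a guess at the intent.  It 
also shows why a classname implies a no-arg constructor: reflective 
instantiation without arguments fails if the class does not have one.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: resolves a tag value to a class name and instantiates it. */
class MergeComponentLoader {
    // Assumed alias tables; the short names come from the proposal above.
    static final Map<String, String> SCHEDULER_ALIASES = new HashMap<String, String>();
    static final Map<String, String> POLICY_ALIASES = new HashMap<String, String>();
    static {
        SCHEDULER_ALIASES.put("concurrent", "org.apache.lucene.index.ConcurrentMergeScheduler");
        SCHEDULER_ALIASES.put("serial", "org.apache.lucene.index.SerialMergeScheduler");
        POLICY_ALIASES.put("byteSize", "org.apache.lucene.index.LogByteSizeMergePolicy");
        POLICY_ALIASES.put("docCount", "org.apache.lucene.index.LogDocMergePolicy");
    }

    /** A known short name maps to its class; anything else is taken as a classname. */
    static String resolveClassName(Map<String, String> aliases, String value) {
        String mapped = aliases.get(value);
        return mapped != null ? mapped : value;
    }

    /** Instantiates className via its no-arg constructor, checking the expected type. */
    static <T> T newInstance(String className, Class<T> expectedType) {
        try {
            Class<? extends T> clazz = Class.forName(className).asSubclass(expectedType);
            // Throws NoSuchMethodException when the class has no no-arg constructor.
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Lucene is not assumed to be on the classpath here, so only resolve the name.
        System.out.println(resolveClassName(POLICY_ALIASES, "byteSize"));
        // JDK classes stand in for a user-supplied classname.
        System.out.println(newInstance("java.util.ArrayList", java.util.List.class).getClass());
    }
}
```

With this shape a user-supplied scheduler or policy only needs to be on the 
classpath and expose a no-arg constructor; any further configuration would 
need a separate mechanism.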

NOTE: I am not proposing to handle the new reusable Document/Field/Token 
mechanism in Lucene, which should also be considered.




> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-342
>                 URL: https://issues.apache.org/jira/browse/SOLR-342
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To support this, we will need the trunk version of 
> Lucene (or to wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
