Will Johnson (JIRA) Fri, 08 Feb 2008 09:12:44 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567099#action_12567099
 ]


Will Johnson commented on SOLR-342:
-----------------------------------

I think we're running into a very serious issue with trunk + this patch.  
Either the document summaries are not matched or the overall matching is 
'wrong'.  I did find this in the Lucene JIRA: LUCENE-994 

"Note that these changes will break users of ParallelReader because the
parallel indices will no longer have matching docIDs. Such users need
to switch IndexWriter back to flushing by doc count, and switch the
MergePolicy back to LogDocMergePolicy. It's likely also necessary to
switch the MergeScheduler back to SerialMergeScheduler to ensure
deterministic docID assignment."

We're seeing rather consistently bad results, but only after 20-30k documents 
and multiple commits, and are wondering if anyone else is seeing anything 
similar.  I've verified that the results are bad even through Luke, which would 
seem to remove the search side of the Solr equation.  The basic test case is to 
search for title:foo and get back documents that only have title:bar.  We're 
going to start on a unit test, but given the document counts and the corpus 
we're testing against it may be a while, so I thought I'd ask to see if anyone 
had any hints.
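
To make the docID concern in the LUCENE-994 note concrete, here is a toy sketch 
in plain Java. It is not Lucene code and does not use ParallelReader. The class 
and method names (ParallelDocIdSketch, newIndex, delete, merge, 
summaryForTitle) are made up for illustration, and "merge" is modeled only as 
dropping deleted slots. It shows how two structures joined purely by docID can 
return title:bar data for a title:foo lookup once one side renumbers its 
documents. Whether this is what is happening with the patch is not established.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Toy model, not Lucene code: two parallel "indices" joined purely by docID (list position). */
class ParallelDocIdSketch {

    /** Builds an index where each value's docID is its position in the list. */
    static List<String> newIndex(String... values) {
        return new ArrayList<>(Arrays.asList(values));
    }

    /** Marks a document deleted; the slot keeps its docID until the next merge. */
    static void delete(List<String> index, int docId) {
        index.set(docId, null);
    }

    /** Simulated merge: drops deleted slots, so every later document gets a smaller docID. */
    static void merge(List<String> index) {
        index.removeIf(Objects::isNull);
    }

    /** Finds the docID by title, then reads the summary stored at that docID in the other index. */
    static String summaryForTitle(List<String> titles, List<String> summaries, String title) {
        int docId = titles.indexOf(title);
        if (docId < 0 || docId >= summaries.size()) {
            return null;
        }
        return summaries.get(docId);
    }

    public static void main(String[] args) {
        List<String> titles = newIndex("old", "foo", "bar");
        List<String> summaries = newIndex("summary of old", "summary of foo", "summary of bar");

        // Delete document 0 from both indices; docIDs still line up.
        delete(titles, 0);
        delete(summaries, 0);
        System.out.println(summaryForTitle(titles, summaries, "foo")); // summary of foo

        // Only one index merges away its deleted slot; docIDs no longer match.
        merge(summaries);
        System.out.println(summaryForTitle(titles, summaries, "foo")); // summary of bar
    }
}
```

In this model the mismatch only shows up after a delete followed by a merge on 
one side, which is at least consistent with a problem that appears only after 
many documents and several commits.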

Removing this patch seems to remove the issue, so it doesn't appear to be a 
Lucene problem.



> Add support for Lucene's new Indexing and merge features (excluding 
> Document/Field/Token reuse)
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-342
>                 URL: https://issues.apache.org/jira/browse/SOLR-342
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
> SOLR-342.patch, SOLR-342.tar.gz
>
>
> LUCENE-843 adds support for new indexing capabilities using the 
> setRAMBufferSizeMB() method that should significantly speed up indexing for 
> many applications.  To fix this, we will need trunk version of Lucene (or 
> wait for the next official release of Lucene)
> Side effect of this is that Lucene's new, faster StandardTokenizer will also 
> be incorporated.  
> Also need to think about how we want to incorporate the new merge scheduling 
> functionality (new default in Lucene is to do merges in a background thread)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
