I'm surprised by a 30% increase. The approach of adding a
special token for "not present" is one of the standard ones.

So just to check, when you say "stored", are you really
storing the missing value? As in Field.Store.YES? As
opposed to Field.Index.###? Because there's no
need to Store this value; it only has to be indexed
to be searchable.

Erick

On Thu, Jan 21, 2010 at 11:22 PM, Dallan Quass <dal...@quass.org> wrote:

> Hi,
>
> I want to issue queries where queried fields have a specified value or are
> "missing".  I know that I can query missing values using a negated
> full-range query, but it doesn't seem like that's very efficient (the fields
> in question have a lot of possible values).  So I've opted to store a special
> "missing" value for each field that isn't found in a document, and issue
> queries like "+(field1:value field1:missing) +(field2:value
> field2:missing)".
>
> The issue is that storing the missing values increases the size of the index
> by 30%, because a lot of documents don't have values for all fields.  I'd
> like to keep the index as small as possible so it can be cached in memory.
>
> Any ideas on an alternative approach?  Is there a way to convince Lucene to
> store the doc-id list for the "missing" field value as a bitmap?  What if I
> added some boolean fields to my schema; e.g., field1_missing and
> field2_missing and stored a true in those fields for documents that were
> missing the corresponding fields?  Does Lucene store BoolFields as
> bitmaps?
>
> -dallan
>
>
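
For readers following the thread, here is a toy sketch of the query shape
quoted above, "+(field1:value field1:missing) +(field2:value field2:missing)".
It is plain Java, not Lucene: the class MissingTokenSketch, its methods, and
the use of java.util.BitSet as a posting list are all assumptions made for
illustration, and say nothing about how Lucene lays out its index. It shows
only that the sentinel term needs a posting list (it is indexed) and that no
stored copy of the value is required to answer the query.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy inverted index (hypothetical, for illustration only).
// Maps "field:term" to the set of document ids that contain that term.
class MissingTokenSketch {
    static final String MISSING = "missing"; // sentinel term, as in the thread

    private final Map<String, BitSet> postings = new HashMap<>();
    private final String[] fields;
    private int nextDocId = 0;

    MissingTokenSketch(String... fields) {
        this.fields = fields;
    }

    // Index one document. Any schema field absent from the map is indexed
    // under the sentinel term instead. Nothing is "stored" for retrieval.
    int addDocument(Map<String, String> doc) {
        int id = nextDocId++;
        for (String field : fields) {
            String term = doc.getOrDefault(field, MISSING);
            postings.computeIfAbsent(field + ":" + term, k -> new BitSet()).set(id);
        }
        return id;
    }

    // Documents where the field has the given value OR is missing:
    // the inner clause (field:value field:missing).
    BitSet valueOrMissing(String field, String value) {
        BitSet result = new BitSet();
        result.or(postings.getOrDefault(field + ":" + value, new BitSet()));
        result.or(postings.getOrDefault(field + ":" + MISSING, new BitSet()));
        return result;
    }

    // Intersection of the per-field clauses: every clause is required (+).
    BitSet query(Map<String, String> required) {
        BitSet result = new BitSet();
        result.set(0, nextDocId); // start with all documents
        for (Map.Entry<String, String> e : required.entrySet()) {
            result.and(valueOrMissing(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        MissingTokenSketch idx = new MissingTokenSketch("field1", "field2");
        idx.addDocument(Map.of("field1", "a", "field2", "x")); // doc 0
        idx.addDocument(Map.of("field1", "a"));                // doc 1, field2 missing
        idx.addDocument(Map.of("field2", "y"));                // doc 2, field1 missing
        idx.addDocument(Map.of("field1", "b", "field2", "x")); // doc 3

        // +(field1:a field1:missing) +(field2:x field2:missing)
        System.out.println(idx.query(Map.of("field1", "a", "field2", "x"))); // prints {0, 1}
    }
}
```

In this sketch the cost of the sentinel is one extra posting per document per
absent field, which is the kind of overhead the original question is about.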
