Oops, Field.Store and Field.Index are Lucene bits (I got confused about which list I was on). You can still control whether the raw text is stored in Solr, so my question still stands, though the solution may be different. Do you store the fields?
Erick

On Fri, Jan 22, 2010 at 10:27 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> I'm surprised by a 30% increase. The approach of adding a
> special token for "not present" is one of the standard ones....
>
> So just to check, when you say "stored", are you really
> storing the missing value? As in Field.Store.YES? As
> opposed to Field.Index.###? Because there's no
> need to Store this value.
>
> Erick
>
> On Thu, Jan 21, 2010 at 11:22 PM, Dallan Quass <dal...@quass.org> wrote:
>
>> Hi,
>>
>> I want to issue queries where queried fields have a specified value or
>> are "missing". I know that I can query missing values using a negated
>> full-range query, but it doesn't seem like that's very efficient (the
>> fields in question have a lot of possible values). So I've opted to
>> store a special "missing" value for each field that isn't found in a
>> document, and issue queries like "+(field1:value field1:missing)
>> +(field2:value field2:missing)".
>>
>> The issue is that storing the missing values increases the size of the
>> index by 30%, because a lot of documents don't have values for all
>> fields. I'd like to keep the index as small as possible so it can be
>> cached in memory.
>>
>> Any ideas on an alternative approach? Is there a way to convince Lucene
>> to store the doc-id list for the "missing" field value as a bitmap?
>> What if I added some boolean fields to my schema; e.g., field1_missing
>> and field2_missing, and stored a true in those fields for documents
>> that were missing the corresponding fields? Does Lucene store
>> BoolFields as bitmaps?
>>
>> -dallan
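
For anyone reading this thread later, here is a minimal sketch of how the "value or missing" query string from the original message could be assembled. It only builds the string quoted above; it does not call Lucene or Solr. The class name, method names, and the MISSING constant are hypothetical, and the sketch assumes field names and values contain no query-syntax characters that would need escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class MissingValueQuery {
    // Assumed sentinel token indexed for documents that lack the field.
    // The thread only refers to it as "missing".
    static final String MISSING = "missing";

    // One required clause: the field has the wanted value OR the sentinel.
    static String valueOrMissing(String field, String value) {
        return "+(" + field + ":" + value + " " + field + ":" + MISSING + ")";
    }

    // Joins one required clause per field, in the map's iteration order.
    static String build(Map<String, String> wanted) {
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, String> e : wanted.entrySet()) {
            query.add(valueOrMissing(e.getKey(), e.getValue()));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the clauses come out as written.
        Map<String, String> wanted = new LinkedHashMap<>();
        wanted.put("field1", "value");
        wanted.put("field2", "value");
        System.out.println(build(wanted));
    }
}
```

Each clause is required (+) and matches either the real value or the sentinel, so the sentinel only needs to be indexed, not stored, for this query to work.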