From a theoretical IR standpoint, there is no reason to index null values,
or even empty strings for that matter.  However, in practice there are plenty
of cases that I've encountered where it is necessary to obtain a list of
documents where a particular field is null (i.e. hasn't been specified at
index time) or an empty string.

For example, you may need to generate a list of products contained in your
index that do not have a part number.  A dirty, ugly workaround for this
problem that we've used in the past is to replace null or unset values
at index time with a special token value like "__null__" that (hopefully)
won't appear in normal indexed data.  This then allows you to run a query
like part_number:"__null__" to obtain all documents without a part number.
This approach has worked in the past for string fields; I'm not sure how
effective it would be for numeric field types, though.
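
To make the workaround concrete, here is a rough sketch of the index-time
substitution in Python.  It only prepares a plain dict before it is handed
to whatever client you use to post documents; the helper names
(with_null_tokens, null_query) and the sample fields are made up for
illustration and are not part of Lucene or Solr.

```python
# Sentinel that (hopefully) never appears in real indexed data.
NULL_TOKEN = "__null__"


def with_null_tokens(doc, fields):
    """Return a copy of doc where each listed field that is missing,
    None, or an empty string holds NULL_TOKEN instead."""
    out = dict(doc)
    for name in fields:
        value = out.get(name)
        if value is None or value == "":
            out[name] = NULL_TOKEN
    return out


def null_query(field):
    """Build the query string that matches documents lacking the field."""
    return '{}:"{}"'.format(field, NULL_TOKEN)


# Hypothetical product records; only the second has a part number.
products = [
    {"id": "1", "name": "widget"},
    {"id": "2", "name": "gadget", "part_number": "GX-100"},
    {"id": "3", "name": "gizmo", "part_number": ""},
]
prepared = [with_null_tokens(p, ["part_number"]) for p in products]
```

With documents prepared this way, null_query("part_number") gives the
part_number:"__null__" query mentioned above.  Note that the sketch
assumes a string field; substituting a string token into a numeric field
would need a different sentinel.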

Ultimately, this leads to the situation where you are using Lucene (and
Solr) as an RDBMS, which it clearly is not.  While I'd love to have support
for querying null / empty string fields, I don't think it's going to happen
in the near future.

Piete
