I've got an idea for a feature that I think could be very useful.  I'd
like to get some community feedback about it, to see whether it's worth
opening an issue for discussion.

First, some background info:

As I understand it, the fact that stored fields are compressed means
that even if a particular stored field is not requested in the fl
parameter, the data on disk for that field must still be read, in order
to decompress the data and find the fields that ARE desired.  If one of
the stored fields that's NOT requested is really large, that would
pollute the OS disk cache with useless data.

If the data for a field in the results comes from docValues instead of
stored fields, I don't think it is compressed, which hopefully means
that if a field is NOT requested, the corresponding docValues data is
never read.
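
To illustrate the difference I have in mind, here is a toy Python
model.  It is NOT how Lucene actually lays out data on disk; the
names (row_blocks, columns, read_stored, read_columns) are made up for
this sketch, and zlib stands in for whatever compression is really
used.  It only shows why a row-oriented compressed block forces the
large field to be read, while a per-field column does not.

```python
import json
import zlib

# Toy documents: one small field and one large field per document.
docs = [{"id": str(i), "body": "x" * 10_000} for i in range(4)]

# Row-oriented (stand-in for stored fields): all fields of a document
# are compressed together in a single block.
row_blocks = [zlib.compress(json.dumps(d).encode()) for d in docs]


def read_stored(doc_index, wanted):
    """Return the wanted fields plus the number of bytes decompressed."""
    raw = zlib.decompress(row_blocks[doc_index])  # whole block, every field
    doc = json.loads(raw)
    return {k: doc[k] for k in wanted}, len(raw)


# Column-oriented (stand-in for docValues): one separate list per field.
columns = {name: [d[name] for d in docs] for name in ("id", "body")}


def read_columns(doc_index, wanted):
    """Return the wanted fields plus the number of characters touched."""
    values = {k: columns[k][doc_index] for k in wanted}
    touched = sum(len(v) for v in values.values())
    return values, touched
```

In this model, asking read_stored for only "id" still decompresses the
entire block, including the 10,000-character body, whereas
read_columns touches only the "id" column.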

And now for the idea:

What if there were a schema option that would skip docValues retrieval
for a field unless the fl parameter were to *explicitly* ask for that
field?  With a typical wildcard value in fl, fields with this option
enabled would not be retrieved.  If the field is not stored, not
indexed, but has docValues, I *think* its presence on the disk would not
affect performance (OS disk cache efficiency) unless its data is
returned in results.
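
A rough sketch of the field-list resolution I am picturing, in Python
rather than Solr's Java just to keep it short.  The option name
explicit_only, the FieldDef class, and resolve_fl are all hypothetical;
nothing here is an existing Solr API.  It also assumes a simplified fl
that is a comma-separated list of field names or a bare "*", ignoring
globs, functions, and transformers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    name: str
    # Hypothetical schema option: return this field only when fl names it.
    explicit_only: bool = False


def resolve_fl(fl, schema):
    """Return the field names to retrieve for a simplified fl value."""
    requested = [token.strip() for token in fl.split(",") if token.strip()]
    wildcard = "*" in requested
    result = []
    for field in schema:
        explicitly_named = field.name in requested
        # The wildcard never pulls in a field marked explicit_only.
        matched_by_wildcard = wildcard and not field.explicit_only
        if explicitly_named or matched_by_wildcard:
            result.append(field.name)
    return result


schema = [
    FieldDef("id"),
    FieldDef("title"),
    FieldDef("raw_source", explicit_only=True),  # hypothetical large field
]
```

With this schema, resolve_fl("*", schema) leaves out raw_source, while
resolve_fl("*,raw_source", schema) or resolve_fl("raw_source", schema)
includes it.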

One practical application, should my theory about docValues prove to be
accurate:  Implementing a field that contains all the data sent for
indexing, which could then be used for completely internal reindexing. 
A field like this would probably be detrimental to performance unless it
could be automatically excluded without the client asking for the exclusion.

SOLR-3191 is a sort-of related issue.  It links to SOLR-9467, which
made me think of another potential use -- making it so certain fields
are semi-secure because they aren't returned unless they are explicitly
requested.  It wouldn't be TRULY secure, of course.

Thanks,
Shawn
