On 09/11/2013 08:40 AM, Per Steffensen wrote:
The reason I mention sort is that my project, half a year ago, dealt
with the FieldCache->OOM problem when doing sort requests. We
basically just reject sort requests unless they hit fewer than X
documents; when they do, we fetch the documents without sorting and
sort them ourselves afterwards.
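The workaround described above could look roughly like the following
sketch. It is not the project's actual code: the `search` callback,
`guarded_sort`, `TooManyHits` and `max_hits` (the "X" above) are
hypothetical names chosen for illustration, and the callback is
assumed to return the total hit count plus up to `rows` unsorted
documents.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical search callback: (query, rows) -> (num_found, docs).
# In a real setup this would wrap an unsorted Solr query.
SearchFn = Callable[[str, int], Tuple[int, List[Dict]]]


class TooManyHits(Exception):
    """Raised when a sort request matches too many documents."""


def guarded_sort(search: SearchFn, query: str, sort_field: str,
                 max_hits: int) -> List[Dict]:
    # Ask only for the hit count first (rows=0); nothing is sorted
    # on the server side.
    num_found, _ = search(query, 0)
    if num_found >= max_hits:
        raise TooManyHits(
            f"{num_found} hits, limit is below {max_hits}")
    # Fetch all hits unsorted, then sort them on the client.
    _, docs = search(query, num_found)
    return sorted(docs, key=lambda doc: doc[sort_field])
```

The extra count query costs one round trip, but it keeps the server
from ever having to sort on the field.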
Currently our problem is that we have to do a group/distinct (in
SQL terms) query, and we have found that we can do what we want
using either group (http://wiki.apache.org/solr/FieldCollapsing) or
facet. The problem is that both use FieldCache, and we "know" that
using FieldCache will lead to OOM exceptions with the amount of data
each of our Solr nodes manages. This time we really have no option
of just limiting usage as we did with sort. Therefore we need
group/distinct functionality that works even on huge amounts of data
(and an algorithm using FieldCache will not).
I believe setting facet.method=enum will actually make facet not use
the FieldCache. Is that true? Is it a bad idea?
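For reference, a facet request that lists the distinct values of a
field with facet.method=enum could be built roughly as below. This is
only a sketch of the request parameters: the field name `customer_id`
and the helper `enum_facet_params` are hypothetical. As far as I
know, enum walks the terms of the field and relies on the filterCache
instead of the FieldCache, so whether it is a good idea likely
depends on how many distinct terms the field has.

```python
from urllib.parse import urlencode


def enum_facet_params(field: str, query: str = "*:*") -> str:
    """Build a query string for a 'distinct values' facet request."""
    params = [
        ("q", query),
        ("rows", 0),               # only facet counts, no documents
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", "enum"),  # enumerate terms of the field
        ("facet.limit", -1),       # return all values, not the top N
        ("facet.mincount", 1),     # skip values with no matching docs
        ("wt", "json"),
    ]
    return urlencode(params)


# Example: append the result to the /select URL of a core.
print(enum_facet_params("customer_id"))
```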
I do not know much about DocValues, but I do not believe that you will
avoid FieldCache by using DocValues. Please elaborate, or point to
documentation where I will be able to read that I am wrong. Thanks!
There is Simon Willnauer's presentation
http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
and this blog post
http://blog.trifork.com/2011/10/27/introducing-lucene-index-doc-values/
and this one that shows some performance comparisons:
http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/