Markus Jelsma wrote:
Here's a very recent thread on the matter:
http://lucene.472066.n3.nabble.com/facet-method-enum-vs-fc-td1681277.html


Thanks, that's helpful, but still leaves me with questions.

Yonik suggests that with only ~25 unique facet values, facet.method=enum is probably the way to go.

What about 100? 200? It probably depends on the number of documents too: I've got about 3 million.
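
For reference, the kind of request I'm experimenting with looks roughly like this (the field name "category" and the host are placeholders, not my real schema):

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=enum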

I know I can just try it and see, but since the penalty for picking wrong is excessive memory use rather than poor performance, it's very hard for me, with my limited JVM knowledge, to tell whether I've picked wrong. The only signal I know of that I got it wrong is an OutOfMemoryError. But maybe I don't get one right away; maybe I get one a couple of weeks later, perhaps under a different usage pattern. Was it caused by facet.method=enum? Or by something else I changed in the interim? Or by something that was always there but that the different usage pattern triggered? It's confusing, you know?
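
One aside on diagnosis, in case it helps anyone in the same spot: I gather (this is an assumption on my part, not something from the thread) that running the JVM with HotSpot's heap-dump flags at least leaves evidence behind if an OOM does show up weeks later:

  java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp ...

Then the dump can be inspected afterwards to see whether the filterCache and its bitsets are what filled the heap, or something else entirely.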

That thread Markus references says:

"The enum method creates a bitset for #each# unique facet value. The bit set is (maxdocs / 8) bytes in size (I'm ignoring
some overhead here)."

Is that maxdocs the number of docs in your index, or the number of docs assigned to a given unique facet value? (And in the current result set, or in the index as a whole?) It makes a pretty big difference in overall memory use if you've got, say, 3 million docs and 100 unique facet values, with the documents relatively evenly distributed among them.

I _think_, from the math that follows, that Erick means "maxdocs" in that simple equation to be the number of documents assigned to a given unique facet value, in the index as a whole. But that would seem to mean that the amount of memory used is solely a function of the number of documents in your index, not of the number of unique facet values. And that doesn't seem to square with the other advice we get on the subject.
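
To make the ambiguity concrete, here's the back-of-envelope math under each reading, as a throwaway Java sketch (the 100-value cardinality and the even distribution are my assumptions, and like Erick I'm ignoring per-bitset overhead):

  public class FacetEnumMath {
      public static void main(String[] args) {
          long totalDocs = 3000000L;                    // roughly my index size
          long uniqueValues = 100;                      // hypothetical cardinality
          long docsPerValue = totalDocs / uniqueValues; // 30,000 with an even spread

          // Reading 1: "maxdocs" = docs in the whole index.
          long perBitset1 = totalDocs / 8;               // 375,000 bytes per value
          System.out.println(perBitset1 * uniqueValues); // 37,500,000 bytes, ~36 MB total

          // Reading 2: "maxdocs" = docs assigned to that one value.
          long perBitset2 = docsPerValue / 8;            // 3,750 bytes per value
          System.out.println(perBitset2 * uniqueValues); // 375,000 bytes, ~0.4 MB total
          // ...which is just totalDocs / 8, no matter how many unique values there are.
      }
  }

Reading 1 makes memory scale with the number of unique values; reading 2 makes it a fixed fraction of index size regardless of cardinality. The two lead to very different conclusions about whether 100 or 200 values is safe.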

So... I am confused.
