On 6/17/2013 4:32 AM, Yoni Amir wrote:
> I was wondering about your recommendation to use facet.method=enum? Can
> you explain what is the trade-off here? I understand that I gain a benefit
> by using less memory, but what will I lose? Is it speed?

The problem with facet.method=fc (the default) and memory is that every field and query that you use for faceting ends up separately cached in the FieldCache, and the memory required grows as your index grows. If you only facet on one or two fields, then the default method is fine, and subsequent facet runs will be faster. It does eat a lot of Java heap memory, though ... and the bigger your Java heap is, the more problems you'll have with garbage collection.

With enum, Solr must gather the term data out of the index on every facet run. If you have plenty of extra memory for the OS disk cache, this is not normally a major issue, because the data will be pulled from RAM, similar to what happens with fc, except that it's not Java heap memory. The OS is a lot more efficient with memory than Java is.
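
If you want to try it, facet.method can be set per-request, or per-field with the f.<fieldname>. prefix. A minimal sketch of such a request, wrapped for readability; the core name "collection1" and the field name "category" are just placeholders:

    http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true
        &facet.field=category&f.category.facet.method=enum

That lets you switch one expensive field to enum while leaving any other facet fields on the default fc method.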

> Also, do you know if there is an answer to my original question in this
> thread? Solr has a queue of incoming requests, which, in my case, kept on
> growing. I looked at the code but couldn't find it; I think maybe it is an
> implicit queue in the form of Java's concurrent thread pool or something
> like that.
>
> Is it possible to limit the size of this queue, or to determine its size
> at runtime? This is the last issue that I am trying to figure out right
> now.

I do not know the answer to this.

> Also, to answer your question about the field all_text: all the fields are
> stored in order to support partial-update of documents. Most of the fields
> are used for highlighting; all_text is used for searching. I'll gladly
> omit all_text from being stored, but then partial-update won't work.

Your copyFields will still work just fine with atomic updates even if the destinations are not stored. Behind the scenes, an atomic update is a delete and an add: Solr reads all the stored fields, applies your changes, and re-indexes the document. As long as all your source fields are stored, the copyField destination will be regenerated correctly from the source fields.
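
As a sketch of what that looks like on the wire, an atomic update only sends the unique key and the changed fields; everything else comes from the stored values. The core URL and the "price" field here are made up for illustration:

    curl 'http://localhost:8983/solr/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '[{"id":"doc1","price":{"set":299}}]'

When Solr re-indexes the document, the copyField into all_text runs again, so the catch-all field stays current even though it is not stored.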

The wiki page on the subject actually says that copyField destinations *MUST* be set to stored=false.

http://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
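
In schema.xml terms, that means something like this; the "text_general" field type is an assumption, taken from the stock example schema:

    <field name="all_text" type="text_general" indexed="true"
           stored="false" multiValued="true"/>
    <copyField source="*" dest="all_text"/>

A wildcard source like that also picks up dynamic fields, which touches on part of your edismax question below.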

> The reason I didn't use edismax to search all the fields is that the list
> of all fields is very long. Can edismax handle several hundred fields in
> the list? What about dynamic fields? Edismax requires the list to be fixed
> in the configuration file, so I can't include dynamic fields there. I can
> pass along the full list in the 'qf' parameter in every search request,
> but this seems like a waste. Also, what about performance? I was told that
> the best practice in this case (you have lots of fields and want to search
> everything) is to copy everything to a catch-all field.

If you can ever come up with some searches that only need a subset of the fields and other searches that need a different subset, then you might consider creating different search handlers with different qf lists. If you always want to search against all the fields, then your current catch-all method is probably the more efficient one.
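
A sketch of what a dedicated handler might look like in solrconfig.xml; the handler name and the qf fields are made up for illustration:

    <requestHandler name="/search-people" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">name^2 title bio</str>
      </lst>
    </requestHandler>

Clients would then send queries to /search-people and inherit that qf without having to pass the field list on every request.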

Thanks,
Shawn
