On 6/17/2013 4:32 AM, Yoni Amir wrote:
> I was wondering about your recommendation to use facet.method=enum? Can
> you explain what is the trade-off here? I understand that I gain a benefit
> by using less memory, but what will I lose? Is it speed?

The problem with facet.method=fc (the default) and memory is that every field and query that you use for faceting ends up separately cached in the FieldCache, and the memory required grows as your index grows. If you only facet on one or two fields, then the default method is fine, and subsequent facet runs will be faster. It does eat a lot of Java heap memory, though ... and the bigger your Java heap is, the more problems you'll have with garbage collection.

With enum, Solr must gather the term data out of the index on every facet run. If you have plenty of extra memory for the OS disk cache, this is not normally a major issue, because the data will be pulled from RAM, similar to what happens with fc, except that it's not Java heap memory. The OS is a lot more efficient with memory than Java is.
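
If you want to try it, facet.method can be set per-request, or per-field with the f.<fieldname>. prefix. A minimal sketch of such a request, wrapped for readability; the core name "collection1" and the field name "category" are just placeholders:

    http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true
        &facet.field=category&f.category.facet.method=enum

That lets you switch one expensive field to enum while leaving any other facet fields on the default fc method.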

> Also, do you know if there is an answer to my original question in this
> thread? Solr has a queue of incoming requests, which, in my case, kept on
> growing. I looked at the code but couldn't find it; I think maybe it is an
> implicit queue in the form of Java's concurrent thread pool or something
> like that.
>
> Is it possible to limit the size of this queue, or to determine its size
> at runtime? This is the last issue that I am trying to figure out right
> now.

I do not know the answer to this.

> Also, to answer your question about the field all_text: all the fields are
> stored in order to support partial-update of documents. Most of the fields
> are used for highlighting; all_text is used for searching. I'll gladly
> omit all_text from being stored, but then partial-update won't work.

Your copyFields will still work just fine with atomic updates even if the destinations are not stored. Behind the scenes, an atomic update is a delete and an add: Solr reads all the stored fields, applies your changes, and re-indexes the document. As long as all your source fields are stored, the copyField destination will be regenerated correctly from the source fields.
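
As a sketch of what that looks like on the wire, an atomic update only sends the unique key and the changed fields; everything else comes from the stored values. The core URL and the "price" field here are made up for illustration:

    curl 'http://localhost:8983/solr/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '[{"id":"doc1","price":{"set":299}}]'

When Solr re-indexes the document, the copyField into all_text runs again, so the catch-all field stays current even though it is not stored.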

The wiki page on the subject actually says that copyField destinations *MUST* be set to stored=false.

http://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
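
In schema.xml terms, that means something like this; the "text_general" field type is an assumption, taken from the stock example schema:

    <field name="all_text" type="text_general" indexed="true"
           stored="false" multiValued="true"/>
    <copyField source="*" dest="all_text"/>

A wildcard source like that also picks up dynamic fields, which touches on part of your edismax question below.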

> The reason I didn't use edismax to search all the fields is that the list
> of all fields is very long. Can edismax handle several hundred fields in
> the list? What about dynamic fields? Edismax requires the list to be fixed
> in the configuration file, so I can't include dynamic fields there. I can
> pass along the full list in the 'qf' parameter in every search request,
> but this seems like a waste. Also, what about performance? I was told that
> the best practice in this case (you have lots of fields and want to search
> everything) is to copy everything to a catch-all field.

If you can ever come up with some searches that only need a subset of the fields and other searches that need a different subset, then you might consider creating different search handlers with different qf lists. If you always want to search against all the fields, then your current catch-all method is probably the more efficient one.
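
A sketch of what a dedicated handler might look like in solrconfig.xml; the handler name and the qf fields are made up for illustration:

    <requestHandler name="/search-people" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">name^2 title bio</str>
      </lst>
    </requestHandler>

Clients would then send queries to /search-people and inherit that qf without having to pass the field list on every request.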

Thanks,
Shawn
