Erick,

Thanks for your reply. So there is no easy way to get around this problem.
We have a way to rework the schema by keeping a single sort field. The
dynamic fields we have are named like relevance_CLASSID. The current schema
has a unique key NODEID and a multi-valued field CLASSID; the relevance
scores are per class ID. If we instead keep one document per classId per
nodeId, i.e. the new schema will have NODEID:CLASSID as the unique key and
store some redundant information across documents with the same NODEID,
then we can sort on a single relevance field and do a filter query on
classId.

On Tue, Nov 27, 2012 at 7:07 PM, Erick Erickson
<erickerick...@gmail.com> wrote:

> I sure don't see how this can work given the constraints. Just to hold
> the values, assuming that each doc holds a value in 150 fields, you have
> 150 * 4 * 14,000,000 or 8.4 GB of memory required, and you just don't
> have that much memory to play around with.
>
> Sharding seems silly for 14M docs, but that might be what's necessary.
> Or get hardware with lots of memory.
>
> Or redefine the problem so you don't have to sort on so many fields. Not
> quite sure how to do that off the top of my head, but...
>
> Best,
> Erick
>
> On Tue, Nov 27, 2012 at 9:25 PM, Arun Rangarajan
> <arunrangara...@gmail.com> wrote:
>
> > We have a Solr 3.6 core that has about 250 TrieIntFields (declared
> > using dynamicField). There are about 14M docs in our Solr index, and
> > many documents have some value in many of these fields. We have a need
> > to sort on all of these 250 fields over a period of time.
> >
> > The issue we are facing is that the underlying Lucene fieldCache gets
> > filled up very quickly. We have a 4 GB box and the index size is 18 GB.
> > After a sort on 40 or 45 of these dynamic fields, the memory
> > consumption is about 90% (Tomcat is set up with a max heap size of
> > 3.6 GB) and we start getting OutOfMemory errors.
> >
> > For now, we have a cron job running every minute that restarts Tomcat
> > if the total memory consumed is more than 80%.
> >
> > We thought that if we did boosting instead of sorting, it wouldn't go
> > to the fieldCache. So instead of issuing a query like
> >
> > select?q=name:alba&sort=relevance_11 desc
> >
> > we tried
> >
> > select?q={!boost b=relevance_11}name:alba
> >
> > but unfortunately boosting also populates the fieldCache.
> >
> > From what I have read, I understand that restricting the number of
> > distinct values in sortable Solr fields will bring down the fieldCache
> > space. The values in these sortable fields can be any integer from 0
> > to 33000 and are quite widely distributed. We have a few scaling
> > solutions in mind, but what is the best way to handle this whole
> > issue?
> >
> > Thanks.
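For what it's worth, the proposed rework (one document per classId per
nodeId, with the per-class relevance collapsed into a single sortable
field) can be sketched in a few lines. This is only an illustration with
plain Python dicts standing in for Solr documents; the field names NODEID,
CLASSID and relevance_<CLASSID> come from the thread, while `denormalize`,
the `id` key format, and the default relevance of 0 are assumptions:

```python
def denormalize(doc):
    """Split one NODEID document into one document per (NODEID, CLASSID)
    pair, collapsing the dynamic relevance_<CLASSID> fields into a single
    sortable 'relevance' field."""
    out = []
    for class_id in doc["CLASSID"]:  # CLASSID is multi-valued
        out.append({
            "id": f"{doc['NODEID']}:{class_id}",  # new unique key
            "NODEID": doc["NODEID"],
            "CLASSID": class_id,
            # missing per-class relevance defaults to 0 (an assumption)
            "relevance": doc.get(f"relevance_{class_id}", 0),
        })
    return out

docs = denormalize({
    "NODEID": 42,
    "CLASSID": [11, 12],
    "relevance_11": 7,
    "relevance_12": 3,
})
```

A query against the reworked schema would then look something like
select?q=name:alba&fq=CLASSID:11&sort=relevance desc, so only the one
relevance field ever enters the fieldCache, regardless of how many class
IDs exist.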
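Erick's memory estimate above is just one 4-byte int per document per
sorted field, which is easy to sanity-check. A quick sketch
(`fieldcache_bytes` is an illustrative helper, not a Solr API, and this
ignores any per-field fieldCache overhead):

```python
def fieldcache_bytes(num_docs, num_sorted_fields, bytes_per_value=4):
    """Rough lower bound on fieldCache memory: one fixed-width value
    per document for every field that has been sorted on."""
    return num_docs * num_sorted_fields * bytes_per_value

# 14M docs x 150 sorted int fields x 4 bytes each
total = fieldcache_bytes(14_000_000, 150)  # 8,400,000,000 bytes, ~8.4 GB
```

That 8.4 GB lower bound against a 3.6 GB heap is why sorting on all 250
fields cannot work without sharding, more memory, or the schema rework.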