Erick,

Thanks for your reply. So there is no easy way to get around this problem.

We can rework the schema to keep a single sort field. Our dynamic fields are
of the form relevance_CLASSID. The current schema has a unique key NODEID and
a multi-valued field CLASSID; the relevance scores are per class ID. If we
instead keep one document per classId per nodeId, i.e. the new schema uses
NODEID:CLASSID as its unique key and stores some redundant information across
documents with the same NODEID, then we can sort on a single relevance field
and do a filter query on classId.
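As a sketch of that rework (field and type names below are illustrative, not
taken from our actual schema.xml):

```xml
<!-- Hypothetical reworked schema: one document per (nodeId, classId) pair.
     Field and type names are illustrative. -->
<fields>
  <!-- Composite unique key, e.g. "12345:11" for NODEID:CLASSID -->
  <field name="id"        type="string" indexed="true" stored="true" required="true"/>
  <field name="nodeid"    type="tint"   indexed="true" stored="true"/>
  <field name="classid"   type="tint"   indexed="true" stored="true"/>
  <!-- Single sort field replacing the 250 relevance_* dynamic fields -->
  <field name="relevance" type="tint"   indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
```

The earlier query select?q=name:alba&sort=relevance_11 desc would then become
select?q=name:alba&fq=classid:11&sort=relevance desc, so only one fieldCache
entry (for relevance) is ever built, no matter how many class IDs get queried.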


On Tue, Nov 27, 2012 at 7:07 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> I sure don't see how this can work given the constraints. Just to hold the
> values, assuming that each doc holds a value in 150 fields, you have 150 *
> 4 * 14,000,000 or 8.4G of memory required, and you just don't have that
> much memory to play around with.
>
> Sharding seems silly for 14M docs, but that might be what's necessary. Or
> get hardware with lots of memory.
>
> Or redefine the problem so you don't have to sort so many fields. Not quite
> sure how to do that off the top of my head, but.....
>
> Best
> Erick
>
>
> On Tue, Nov 27, 2012 at 9:25 PM, Arun Rangarajan
> <arunrangara...@gmail.com> wrote:
>
> > We have a Solr 3.6 core that has about 250 TrieIntFields (declared using
> > dynamicField). There are about 14M docs in our Solr index and many
> > documents have some value in many of these fields. We have a need to sort
> > on all of these 250 fields over a period of time.
> >
> > The issue we are facing is that the underlying lucene fieldCache gets
> > filled up very quickly. We have a 4 GB box and the index size is 18 GB.
> > After a sort on 40 or 45 of these dynamic fields, the memory consumption
> is
> > about 90% (tomcat set up to get max heap size of 3.6 GB) and we start
> > getting OutOfMemory errors.
> >
> > For now, we have a cron job running every minute restarting tomcat if the
> > total memory consumed is more than 80%.
> >
> > We thought that if we used boosting instead of sorting, it wouldn't hit
> > the fieldCache. So instead of issuing a query like
> >
> > select?q=name:alba&sort=relevance_11 desc
> >
> > we tried
> >
> > select?q={!boost relevance_11}name:alba
> >
> > but unfortunately boosting also populates the field cache.
> >
> > From what I have read, I understand that restricting the number of
> distinct
> > values on sortable Solr fields will bring down the fieldCache space. The
> > values in these sortable fields can be any integer from 0 to 33000 and
> > quite widely distributed. We have a few scaling solutions in mind, but
> what
> > is the best way to handle this whole issue?
> >
> > thanks.
> >
>
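Erick's back-of-the-envelope figure above checks out; a quick sanity check
(plain Python, just re-doing the arithmetic from the thread):

```python
# fieldCache cost estimate from the thread: one 4-byte int per document
# per sorted field, for 150 fields over 14M documents.
num_fields = 150
bytes_per_int = 4
num_docs = 14_000_000

total_bytes = num_fields * bytes_per_int * num_docs
total_gb = total_bytes / 1e9  # decimal GB, matching the 8.4G in the thread

print(f"{total_gb:.1f} GB")  # prints "8.4 GB"
```

That is well over the 3.6 GB heap available, which is consistent with the
OutOfMemory errors after sorting on only 40-45 fields (40 * 4 bytes * 14M is
already roughly 2.2 GB of fieldCache alone).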
