Re: Facets and running out of Heap Space

Mike Klaas Tue, 09 Oct 2007 18:31:34 -0700

On 9-Oct-07, at 12:36 PM, David Whalen wrote:

<field name="id" type="string" indexed="true" stored="true" />
<field name="content_date" type="date" indexed="true" stored="true" />
<field name="media_type" type="string" indexed="true" stored="true" />
<field name="location" type="string" indexed="true" stored="true" />
<field name="country_code" type="string" indexed="true" stored="true" />
<field name="text" type="text" indexed="true" stored="true" multiValued="true" />
<field name="content_source" type="string" indexed="true" stored="true" />
<field name="title" type="string" indexed="true" stored="true" />
<field name="site_id" type="string" indexed="true" stored="true" />
<field name="journalist_id" type="string" indexed="true" stored="true" />
<field name="blog_url" type="string" indexed="true" stored="true" />
<field name="created_date" type="date" indexed="true" stored="true" />
I'm sure we could stop storing many of these columns, especially
if someone told me that would make a big difference.

I don't think that it would make a difference in memory consumption, but storage is certainly not necessary for faceting. Extra stored fields can slow down search if they are large (in terms of bytes), but they don't really occupy extra memory unless they are polluting the doc cache. Does 'text' need to be stored?
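Faceting counts come from the indexed terms, so a field can stay `indexed="true"` while dropping `stored="true"`. As a minimal sketch, assuming you wanted to script the change rather than hand-edit `schema.xml`, Python's standard `xml.etree.ElementTree` can flip `stored` to `"false"` on a short excerpt of the fields above while leaving `indexed` alone. The helper name `unstore` is hypothetical, and existing documents would presumably keep their stored values until they are reindexed.

```python
import xml.etree.ElementTree as ET

# An excerpt of the field declarations quoted above, wrapped in a root
# element so ElementTree can parse them as a single document.
SCHEMA_FIELDS = """
<fields>
  <field name="id" type="string" indexed="true" stored="true" />
  <field name="media_type" type="string" indexed="true" stored="true" />
  <field name="text" type="text" indexed="true" stored="true" multiValued="true" />
</fields>
"""


def unstore(fields_xml, names):
    """Hypothetical helper: set stored="false" on each named field.

    The indexed attribute is not touched, so the fields remain
    searchable and usable for faceting.
    """
    root = ET.fromstring(fields_xml)
    for field in root.iter("field"):
        if field.get("name") in names:
            field.set("stored", "false")
    return root


root = unstore(SCHEMA_FIELDS, {"text"})
for field in root.iter("field"):
    print(field.get("name"), field.get("indexed"), field.get("stored"))
```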

What does the LukeRequestHandler tell you about the number of
distinct terms in each field that you facet on?


Where would I find that?  I could probably estimate that myself
on a per-column basis.  It ranges from 4 distinct values for
media_type to 30-ish for location to 200-ish for country_code
to almost 10,000 for site_id to almost 100,000 for journalist_id.
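The LukeRequestHandler can report these counts directly instead of estimating them. A minimal sketch in Python, assuming the handler is registered at `/admin/luke` (the usual default in `solrconfig.xml`), that it accepts the `fl` and `wt` parameters, and that each entry under `fields` in its JSON response carries a `distinct` count; the function names are hypothetical, and the response shape should be checked against your Solr version's actual output.

```python
import json
from urllib.parse import urlencode


def luke_url(solr_base, fields):
    """Build a LukeRequestHandler URL restricted to the given fields.

    Assumes the handler lives at /admin/luke and accepts fl and wt.
    """
    params = {"fl": ",".join(fields), "wt": "json"}
    return f"{solr_base.rstrip('/')}/admin/luke?{urlencode(params)}"


def distinct_counts(luke_response):
    """Map field name -> distinct term count from a parsed Luke response.

    Assumes each entry under "fields" may carry a "distinct" key;
    fields without one map to None.
    """
    return {name: info.get("distinct")
            for name, info in luke_response.get("fields", {}).items()}


print(luke_url("http://localhost:8983/solr/", ["media_type", "location"]))

# Hypothetical response body in the assumed shape; the counts echo the
# estimates in this thread, not real server output.
body = '{"fields": {"media_type": {"distinct": 4}, "location": {"distinct": 30}}}'
print(distinct_counts(json.loads(body)))  # {'media_type': 4, 'location': 30}
```

Against a live server, `distinct_counts(json.load(urllib.request.urlopen(url)))` would fetch and parse the real response.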

I'd use the filter cache method on things like media_type and location; this will occupy ~2.3MB of memory _per unique value_, so it should be a net win for those (although it is quite close in space requirements for a 30-ary field at your index size).
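The ~2.3MB figure can be turned into a rough per-field budget. A minimal sketch, assuming each cached filter is a plain bitset with one bit per document (so 2.3MB implies a maxDoc of about 18.4 million, treating MB as 10^6 bytes; this is the worst case if smaller sets are stored more compactly), and assuming the alternative field-cache method costs roughly one 4-byte ordinal per document plus the term strings, which are ignored here. The cardinalities are the estimates given above. Under those assumptions location comes out at about 69MB as filters versus about 74MB for the field cache, which is the "quite close" case, while site_id and journalist_id would need tens to hundreds of gigabytes as filters.

```python
MB = 1_000_000  # decimal megabytes, to keep the arithmetic round

# Figure quoted above: one cached filter per unique value costs ~2.3MB.
# Assumption: each filter is a bitset of one bit per document, so the
# implied index size is maxDoc = 2.3MB * 8 bits per byte.
FILTER_BYTES_PER_VALUE = 2_300_000
MAX_DOC = FILTER_BYTES_PER_VALUE * 8

# Approximate distinct-value counts from the thread.
CARDINALITY = {
    "media_type": 4,
    "location": 30,
    "country_code": 200,
    "site_id": 10_000,
    "journalist_id": 100_000,
}


def filter_cache_bytes(distinct_values):
    """Filter method: one maxDoc/8-byte bitset per unique value."""
    return distinct_values * FILTER_BYTES_PER_VALUE


def field_cache_bytes(max_doc=MAX_DOC):
    """Assumed field-cache layout: one 4-byte ordinal per document.

    The memory for the term strings themselves is not counted.
    """
    return 4 * max_doc


for field, n in CARDINALITY.items():
    filters_mb = filter_cache_bytes(n) / MB
    ords_mb = field_cache_bytes() / MB
    print(f"{field:>14}: filterCache ~{filters_mb:,.0f} MB "
          f"vs field cache ~{ords_mb:,.0f} MB")
```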


-Mike
