See below:

On Thu, May 19, 2011 at 9:06 AM, Rohit <ro...@in-rev.com> wrote:
> Hi Erick,
>
> My OOM problem starts when I query the core with 13217121 documents. My
> schema and other details are given below,

Hmmmm, how many cores are you running and what are they doing? They all use
the same memory pool, so you may be getting some carry-over. One strategy
would be just to move this core to a dedicated machine.

>
> 1> how is your sort field defined? String? Integer? If it's a string and you
> could change it to a numeric type, you'd use a lot less memory.
>
> We primarily use two different sort criteria: one is a date field and the
> other is a string (id). I cannot change the "id" field as this is also the
> uniqueKey for my schema.

OK, but can you use a separate field just for sorting? Populate it with
a <copyField> and sort on that rather than ID. This is only helpful if
you can make a compact representation, e.g. integer.
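
Something like this in schema.xml, for instance (just a sketch -- the id_sort
field name is made up, it assumes you have a tlong type defined as in the
example schema, and it only works if your IDs actually parse as numbers):

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="id_sort" type="tlong" indexed="true" stored="false"/>
  <copyField source="id" dest="id_sort"/>

Then sort on id_sort instead of id; the FieldCache entry for a trie long is
much smaller than for a string field.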

>
> 2> How many distinct terms? I'm guessing one/document actually; this is
> somewhat of an anti-pattern in Solr, for all that it's sometimes necessary.
>
> Since one of the fields is a timestamp and the other a unique key, all
> values are distinct. (These are tweets coming in for a keyword.)
>

Not one, but two fields where all values are distinct. I don't think the
timestamp is much of a problem, though, assuming you're storing it as one of
the numeric types (I'd especially make sure it's one of the Trie types,
specifically "tdate", if you're going to do range queries). There are tricks
for dealing with this, but your "id" field will get you a bigger bang for the
buck, so concentrate on that first.
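
For reference, a tdate setup looks something like this (the field name here is
just an example; the type definition is the one from the example schema, if I
remember right):

  <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"
             positionIncrementGap="0"/>
  <field name="createdAt" type="tdate" indexed="true" stored="true"/>

Range queries like createdAt:[2011-05-01T00:00:00Z TO 2011-05-19T00:00:00Z]
are then much cheaper than against a plain "date" field.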

> 3> How much memory are you allocating for the JVM?
>
> I am starting solr with the following command java -Xms1024M -Xmx2048M
> start.jar
>

Well, you can bump this higher if you're on a 64-bit OS. The other possibility
is to shard your index. But really, with 13M documents this should fit on one
machine.
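
For instance (just a sketch -- assuming the box has the RAM to spare and
you're using the stock Jetty start.jar):

  java -Xms2048M -Xmx4096M -jar start.jar

Watching the heap with something like jconsole will tell you where it actually
tops out before you settle on a number.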

What does your statistics page tell you, especially about cache usage?
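(On a default Jetty install that's something like
http://localhost:8983/solr/<corename>/admin/stats.jsp -- adjust host, port and
core name to your setup.)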



>
> All our test cases for moving to Solr have passed, so this is proving to be
> a big setback. Help would be greatly appreciated.
>
> Regards,
> Rohit
>
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 19 May 2011 18:21
> To: solr-user@lucene.apache.org
> Subject: Re: Out of memory on sorting
>
> The warming queries warm up the caches used in sorting. So
> just including the &sort=..... will warm the sort caches; the terms
> searched are not important. The same is true with facets...
>
> However, I don't understand how that relates to your OOM problems. I'd
> expect the OOM to start happening on startup, since the warming queries
> would be running the very operation that runs you out of memory...
>
> So, we need more details:
> 1> how is your sort field defined? String? Integer? If it's a string
>     and you could change it to a numeric type, you'd use a lot
>     less memory.
> 2> How many distinct terms? I'm guessing one/document actually;
>     this is somewhat of an anti-pattern in Solr, for all that it's
>     sometimes necessary.
> 3> How much memory are you allocating for the JVM?
> 4> What other fields are you sorting on and how many unique values
>     in each? Solr Admin can help you here....
>
> Best
> Erick
>
>
> On Thu, May 19, 2011 at 6:20 AM, Rohit <ro...@in-rev.com> wrote:
>> Thanks for pointing me in the right direction. Now I see the configuration
>> for firstSearcher or newSearcher, where the <str name="q"> needs to be
>> configured in advance. In my case the q is ever changing; users can search
>> for anything, and the possible queries are unlimited.
>>
>> How can I make this generic?
>>
>> -Rohit
>>
>>
>>
>> -----Original Message-----
>> From: rajini maski [mailto:rajinima...@gmail.com]
>> Sent: 19 May 2011 14:53
>> To: solr-user@lucene.apache.org
>> Subject: Re: Out of memory on sorting
>>
>> Explicit Warming of Sort Fields
>>
>> If you do a lot of field-based sorting, it is advantageous to add explicit
>> warming queries to the "newSearcher" and "firstSearcher" event listeners in
>> your solrconfig which sort on those fields, so the FieldCache is populated
>> prior to any queries being executed by your users.
>> firstSearcher:
>> <lst>
>>   <str name="q">solr rocks</str><str name="start">0</str>
>>   <str name="rows">10</str><str name="sort">empID asc</str>
>> </lst>
>>
>>
>>
>> On Thu, May 19, 2011 at 2:39 PM, Rohit <ro...@in-rev.com> wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>> We are moving to a multi-core Solr installation, with each of the cores
>>> having millions of documents; documents would also be added to the index
>>> on an hourly basis. Everything seems to run fine and I am getting the
>>> expected results and performance, except where sorting is concerned.
>>>
>>>
>>>
>>> I have an index of 13217121 documents. When I want to get documents
>>> between two dates and then sort them by ID, Solr goes out of memory. This
>>> is with just me using the system; we might also have simultaneous users.
>>> How can I improve this performance?
>>>
>>>
>>>
>>> Rohit
>>>
>>>
>>
>>
>
>
