According to previous posters on this topic, sorting requires an array with
an entry per document in the entire index. Each entry has 32 bits for the
'int' type, and 32 bits plus the field representation length for other
types. Not knowing Lucene internals, I have a hard time believing that it
really has to be this wasteful, but oh well.
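
This minimal Python model shows why the cost scales with the whole index rather than with the number of hits. The names are hypothetical and this is not Lucene's actual code. The sort values live in an array indexed by document id, so the array must cover every document even when only a few match.

```python
from typing import List


def build_int_sort_cache(field_values: List[int]) -> List[int]:
    # Hypothetical model: one int slot per document in the WHOLE index,
    # indexed by internal document id, regardless of how many docs match.
    return list(field_values)


def sort_hits(hit_doc_ids: List[int], cache: List[int],
              descending: bool = False) -> List[int]:
    # Only the matching docs are sorted, but their values are looked up
    # in the full-index array.
    return sorted(hit_doc_ids, key=lambda doc: cache[doc], reverse=descending)
```

With 6 documents indexed and 3 hits, `build_int_sort_cache([50, 10, 40, 30, 20, 60])` still holds 6 entries, and `sort_hits([4, 1, 3], cache)` orders the hits by their looked-up values. At 7M documents, the same layout needs 7M entries per sorted field.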

Since 'sint' is needed to do range queries on a field, and 'int' is needed
for efficient sorting, we wound up having one field of each type and a
<copyField> to make sure they both get the same numbers. Yes, it's
annoying.
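
'sint' works for range queries because range queries compare indexed terms as strings. The encoding therefore has to make string order agree with numeric order. Plain decimal text fails this test because "10" sorts before "9". The sketch below uses a hypothetical encoding: shift into the unsigned range, then write fixed-width hex. Solr's real 'sint' encoding is different, but the principle is the same.

```python
from typing import Callable, List


def plain_term(n: int) -> str:
    # Plain decimal text: string order does NOT match numeric order.
    return str(n)


def sortable_term(n: int) -> str:
    # Hypothetical encoding (not Solr's actual 'sint' format): shift a signed
    # 32-bit value into the unsigned range and zero-pad to 8 hex digits, so
    # comparing the strings gives the same answer as comparing the numbers.
    if not -2**31 <= n < 2**31:
        raise ValueError("expected a 32-bit signed int")
    return format(n + 2**31, "08x")


def range_query(values: List[int], lo: int, hi: int,
                encode: Callable[[int], str]) -> List[int]:
    # Like a term range query: compare the encoded strings, not the numbers.
    lo_t, hi_t = encode(lo), encode(hi)
    return [v for v in values if lo_t <= encode(v) <= hi_t]
```

For values 3, 9, 10, 15 and 25 with a range of 5 to 20, the plain encoding matches nothing, while the sortable encoding returns 9, 10 and 15.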

-----Original Message-----
From: Mark Miller [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 20, 2008 6:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Sorting performance

christophe wrote:
> When I start indexing new documents, searches are taking a long time
> again: is the sort cache flushed when new documents are indexed?

When you commit, a new Reader will be opened (or reopened) so that the
freshly added docs can be seen. This would make the first search slow again,
but if you have warming queries configured, the new Reader should be warmed
before being put into use. Be sure the warming query sorts on the right field.
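
Here is a minimal Python sketch of the commit-then-warm sequence. It makes simplifying assumptions: the class and function names are hypothetical, and a real Solr searcher caches far more than sort arrays. The new searcher starts with empty sort caches, the warming queries populate them, and only then is it handed to users.

```python
from typing import Dict, List, Tuple


class Searcher:
    """Hypothetical stand-in for a searcher over one point-in-time view of the index."""

    def __init__(self, docs: List[dict]):
        self.docs = list(docs)                  # one dict of field values per doc id
        self.sort_caches: Dict[str, list] = {}  # field name -> per-document values
        self.cache_builds = 0                   # how many times the slow path ran

    def sorted_search(self, hits: List[int], field: str,
                      descending: bool = False) -> List[int]:
        cache = self.sort_caches.get(field)
        if cache is None:
            # Slow path: walks every document in the index, not just the hits.
            cache = [doc[field] for doc in self.docs]
            self.sort_caches[field] = cache
            self.cache_builds += 1
        return sorted(hits, key=lambda d: cache[d], reverse=descending)


def commit(current: Searcher, new_docs: List[dict],
           warming_queries: List[Tuple[List[int], str, bool]]) -> Searcher:
    """Open a new searcher that sees the new docs, warm it, then hand it over."""
    fresh = Searcher(current.docs + list(new_docs))
    for hits, field, descending in warming_queries:
        fresh.sorted_search(hits, field, descending)
    return fresh
```

Suppose the warming query sorted on a different field, such as a hypothetical date field instead of id. The first user query sorting on id would then still pay the full build cost. This is why the warming query must sort on the right field.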

>
> Are there any metrics on how to compute memory requirements (based on
> average doc size, number of sorted fields, number of indexed documents,
> and number of new documents per day)?

Depends on the field type, but I think it's 32 bits x numDocs for most
datatypes, with the String datatype also requiring an array of all the
unique terms to index into. That's not everything, but it dominates.
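
Here is a back-of-the-envelope version of this estimate as a Python sketch. The per-term overhead and average term length are assumptions, not measured values, and the real footprint depends on the JVM. The numbers from this thread (about 7M documents, sorting on a unique string id) show why the String case is so much heavier: every document contributes its own unique term.

```python
def int_sort_bytes(num_docs: int) -> int:
    # One 32-bit int per document in the index.
    return 4 * num_docs


def string_sort_bytes(num_docs: int, num_unique_terms: int,
                      avg_term_chars: int, per_term_overhead: int = 40) -> int:
    # One 32-bit ordinal per document, pointing into an array of unique terms.
    ords = 4 * num_docs
    # Assumption: each unique term is a Java String at ~2 bytes per char plus a
    # fixed overhead for the object and its array slot. 40 bytes is a guess.
    terms = num_unique_terms * (2 * avg_term_chars + per_term_overhead)
    return ords + terms
```

With 7M documents, an int sort costs 28,000,000 bytes (about 27 MiB). Under these assumptions, a unique string id with an assumed 10-character average comes to 448,000,000 bytes (about 427 MiB).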


> Thanks
> Christophe
> Mark Miller wrote:
>> You need to set up a warming query that sorts, so that the initial long
>> query is done behind the scenes. Users' first query will then be fast.
>> Warming queries are configured in solrconfig.xml.
>>
>> - Mark
>>
>>
>> On Oct 18, 2008, at 1:34 AM, christophe <[EMAIL PROTECTED]> 
>> wrote:
>>
>>> Here are the memory parameters I'm using now (Tomcat): -Xms2024m
>>> -Xmx2024m
>>> With those values, the second query is way faster. Only the first
>>> one is very slow.
>>> Thanks for the tip.
>>> However, I'm wondering whether this will be enough, and whether I will
>>> hit the same issues when many users are searching at the same time: I
>>> will do a stress test to check this.
>>>
>>> Thanks
>>> Christophe
>>>
>>> christophe wrote:
>>>> It is slow each time I run it (I test it from the Solr admin
>>>> console or from a Java program using the HTTP client).
>>>> I do not get the OOM every time.
>>>>
>>>> Thx
>>>> Christophe
>>>>
>>>> Otis Gospodnetic wrote:
>>>>> Is the sorted query slow only the first time or every time you run 
>>>>> it?
>>>>>
>>>>> You got an OOM?  What -Xmx value are you using?  Try increasing it.
>>>>>
>>>>> Otis
>>>>> -- 
>>>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message ----
>>>>>
>>>>>> From: christophe <[EMAIL PROTECTED]>
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Sent: Friday, October 17, 2008 1:28:52 PM
>>>>>> Subject: Sorting performance
>>>>>> Hi,
>>>>>>
>>>>>> I'm doing some tests with Solr 1.3.
>>>>>> I have loaded around 7M documents, each with a few stored and
>>>>>> indexed fields.
>>>>>>
>>>>>> This query: text:sometext returns the results, sorted by score, in
>>>>>> a few milliseconds. (I display 10 out of 8747 matched documents.)
>>>>>> This one: text:sometext;id desc takes something like 60s or
>>>>>> more to return the data (when it doesn't fail with an out-of-memory
>>>>>> error). (id is a string type.)
>>>>>> I have tried to display only id, with the same results.
>>>>>>
>>>>>> Any ideas? I'm sure I'm doing something wrong.
>>>>>>
>>>>>> My schema is based on the sample, with the following fields:
>>>>>>
>>>>>>  />           multiValued="true" />
>>>>>>  default="NOW" multiValued="false"/>
>>>>>>
>>>>>> Thanks
>>>>>> Christophe
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>
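
The sorted query in the original post, text:sometext;id desc, uses the older syntax of appending the sort to q. As a hedged sketch, the same request can be written with separate sort and fl parameters. The base URL below is a hypothetical local install, and the sketch assumes the Solr version in use accepts the sort parameter. Either form still sorts on id, so it does not change the memory cost of sorting on a string field.

```python
from urllib.parse import urlencode


def solr_select_url(base: str, query: str, sort: str, fields: str,
                    rows: int = 10) -> str:
    # Build a select URL with sort and field list as separate parameters
    # instead of appending ";id desc" to the query string.
    params = {"q": query, "sort": sort, "fl": fields, "rows": rows}
    return base.rstrip("/") + "/select?" + urlencode(params)
```

For example, `solr_select_url("http://localhost:8983/solr", "text:sometext", "id desc", "id")` builds a URL whose query string is `q=text%3Asometext&sort=id+desc&fl=id&rows=10`.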