Philippe Kernévez [pkerne...@octo.com] wrote:
> We use Lucene 2.4 (provided by Alfresco).

Lucene 2.4 is 6 years old. The obvious advice is to upgrade, but I guess you 
have your reasons not to.

> We looked at a memory dump with Eclipse Memory Analyser, and we were quite
> surprised to see that most of that memory is kept by enormous String[] that
> are yet mostly empty.

I am guessing you have a lot of documents in your index and that you are 
sorting on at least one String field?

http://www.lhelper.org/dev/lucene-2.4.0/docs/api/org/apache/lucene/search/Sort.html
states that sorting on String in Lucene means that all Strings for that field 
are kept in memory. There has to be one entry in the String array(s) for each 
document, even if the document does not have a value for that field.
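
A minimal sketch of why such an array ends up mostly empty, assuming a cache
with one slot per document ID. The class and method names are hypothetical and
this is not Lucene's actual cache code; it only illustrates the layout the
Javadoc describes.

```java
import java.util.HashMap;
import java.util.Map;

public class SparseSortCache {
    // One slot per document ID; documents without a value keep a null slot.
    public static String[] buildCache(int maxDoc, Map<Integer, String> valuesByDoc) {
        String[] cache = new String[maxDoc];
        for (Map.Entry<Integer, String> e : valuesByDoc.entrySet()) {
            cache[e.getKey()] = e.getValue();
        }
        return cache;
    }

    // Counts the slots that hold no value.
    public static int countEmpty(String[] cache) {
        int empty = 0;
        for (String s : cache) {
            if (s == null) {
                empty++;
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        // Hypothetical index: 1000 documents, only two have the sort field.
        Map<Integer, String> values = new HashMap<>();
        values.put(3, "alpha");
        values.put(42, "beta");
        String[] cache = buildCache(1000, values);
        System.out.println(cache.length + " slots, " + countEmpty(cache) + " empty");
    }
}
```

With many documents and few values for the field, the array is large but
nearly all of its slots are null, which matches what a heap dump would show.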

If my guess is correct, the solution is to reduce the number of String sort 
fields, ideally to 0. Maybe you can use an integer field instead by doing some 
mapping?
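
One possible mapping, sketched under assumptions: give each distinct String
its rank in sorted order and store that rank as the integer sort key. The
class and method names are hypothetical, and natural String order is assumed
to be the order you want.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

public class SortKeyMapper {
    // Maps each distinct value to its rank in natural String order,
    // so comparing the integers gives the same order as comparing the Strings.
    public static Map<String, Integer> buildOrdinals(Collection<String> values) {
        Map<String, Integer> ordinals = new LinkedHashMap<>();
        int next = 0;
        for (String v : new TreeSet<>(values)) {
            ordinals.put(v, next++);
        }
        return ordinals;
    }

    public static void main(String[] args) {
        // Hypothetical field values collected from the documents.
        Map<String, Integer> ordinals =
            buildOrdinals(java.util.Arrays.asList("pear", "apple", "fig", "apple"));
        System.out.println(ordinals);
    }
}
```

The integer would then be indexed in its own field and used for sorting in
place of the String. The catch is that ranks shift when a new value sorts
between existing ones, so the keys have to be recomputed and the affected
documents reindexed.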

> In our case we need to have some very short words indexed, so we deactivate
> 'stop words'. If we want to have the list of terms ordered by their index size,
> what is a good tool to do that (Luke?) and how can we make such a request?

Luke has term statistics built in. I don't remember the details, but I recall 
that it was straightforward.

- Toke Eskildsen
