Hi,

I'm new to Lucene and to a quite old existing project.

We use Lucene 2.4 (provided by Alfresco).
We have a Lucene index of about 5.5 GB on disk.
We see heavy memory consumption and allocation, and it seems that most of that
memory is consumed by the Lucene index.
We looked at a memory dump with Eclipse Memory Analyzer, and we were quite
surprised to see that most of that memory is held by enormous String[] arrays
that are nevertheless mostly empty.
The arrays are mainly around String[2'075'498] and use 16 MB each
(shallow heap, not retained heap).
Their occupancy level is quite low, about 10%.
There are 339 big arrays like that; Lucene uses about 6.5 GB in total, half of it
just in these arrays (not counting the objects they reference).

Is it 'normal' to have:
1) half of the memory used by String[]?
2) an occupancy of only about 10%?
3) Is it possible to change the configuration to increase the number of arrays
and decrease their size? (A sketch of what we mean follows this list.)
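
For question 3, to illustrate what we mean: a minimal sketch using the only
related settings we are aware of, the term-index ones, assuming they are
relevant to those arrays at all (the path and the writer/reader setup are
placeholders, since Alfresco normally manages the index for us):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class TermIndexKnobs {
        public static void main(String[] args) throws Exception {
            // Writer side: a larger term index interval writes fewer entries into the
            // .tii file, so readers keep a smaller in-memory term index (default 128).
            IndexWriter writer = new IndexWriter(FSDirectory.getDirectory("/path/to/index"),
                    new StandardAnalyzer(), false, IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setTermIndexInterval(256);
            writer.close();

            // Reader side: load only every Nth entry of the existing term index.
            // (Assumption: setTermInfosIndexDivisor is present in this 2.4 build; it must
            // be called before the term index is first used.)
            IndexReader reader = IndexReader.open(FSDirectory.getDirectory("/path/to/index"));
            reader.setTermInfosIndexDivisor(4);
            reader.close();
        }
    }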


For example, here is a quite small dump (10 GB; we have bigger JVM heaps, up
to 30 GB, but those dumps are harder to read/parse/analyse):
Class Name                                     | Shallow Heap | Retained Heap | Percentage
--------------------------------------------------------------------------------------------
java.lang.String[2075498] @ 0xfffffffcb7f89c60 |   16'604'008 |    30'268'792 |      0.27%
java.lang.String[2075498] @ 0xfffffffd12d2cf90 |   16'604'008 |    30'268'792 |      0.27%
java.lang.String[2075111] @ 0xfffffffd474f8e40 |   16'600'912 |    30'265'256 |      0.27%
java.lang.String[2075098] @ 0xfffffffd13e1f148 |   16'600'808 |    30'265'064 |      0.27%
java.lang.String[2075006] @ 0xfffffffb4940b180 |   16'600'072 |    30'263'184 |      0.27%
java.lang.String[2075171] @ 0xfffffffd83c6c1b8 |   16'601'392 |    30'262'848 |      0.27%
java.lang.String[2075176] @ 0xfffffffd8f087ce8 |   16'601'432 |    30'262'504 |      0.27%
--------------------------------------------------------------------------------------------
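
As a sanity check on those shallow sizes (assuming a 64-bit JVM with plain
8-byte references, which is our guess), each array is just its reference slots
plus the array header:

    public class ShallowSizeCheck {
        public static void main(String[] args) {
            // 2'075'498 reference slots * 8 bytes + a 24-byte array header.
            long bytes = 2075498L * 8 + 24;
            System.out.println(bytes); // 16604008 -> matches the shallow heap above
        }
    }

So the 16 MB per array is paid for the slots alone, whatever fraction of them
is actually filled.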


Some other statistics from that dump (filtered on 'Term' and 'Field'):
Class Name                                              |   Objects | Shallow Heap |  Retained Heap
----------------------------------------------------------------------------------------------------
org.apache.lucene.index.TermInfosReader                 |       232 |       25'984 | >= 307'965'968
org.apache.lucene.index.Term                            | 1'876'549 |   60'049'568 | >= 240'004'272
org.apache.lucene.index.Term[]                          |     1'001 |   12'057'680 | >= 223'024'664
org.apache.lucene.index.TermInfosReader$ThreadResources |     2'902 |       92'864 | >= 102'801'168
org.apache.lucene.index.TermInfo[]                      |       230 |   12'033'008 |  >= 72'170'448
org.apache.lucene.index.TermInfo                        | 1'769'215 |   70'768'600 |  >= 70'768'600
java.lang.reflect.Field                                 |    14'899 |    1'907'072 |  >= 20'325'464
org.apache.lucene.index.SegmentTermEnum                 |     3'136 |      351'232 |  >= 10'151'464
sun.reflect.generics.repository.FieldRepository         |     5'508 |      220'320 |   >= 9'196'560
org.apache.lucene.search.TermQuery                      |    56'553 |    1'809'696 |   >= 8'300'672
org.apache.lucene.index.TermBuffer                      |     9'408 |      526'848 |   >= 5'811'240
org.apache.lucene.document.Field                        |    14'350 |      803'600 |   >= 3'454'360
----------------------------------------------------------------------------------------------------


In our case we need some very short words to be indexed, so we deactivated
'stop words'. If we want to get the list of terms ordered by their index size,
what is a good tool for that (Luke?) and how can we do such a request?
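
To illustrate the kind of request we have in mind, a rough sketch using only the
plain Lucene 2.4 reader API (the index path is a placeholder, and docFreq is only
an approximation of the "size" a term contributes to the index):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.store.FSDirectory;

    public class TermStatsPerField {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.getDirectory("/path/to/index"));
            Map<String, Integer> termsPerField = new HashMap<String, Integer>();
            Map<String, Long> docFreqPerField = new HashMap<String, Long>();
            TermEnum terms = reader.terms();
            try {
                // Walk every term in the index and aggregate per field.
                while (terms.next()) {
                    Term t = terms.term();
                    String field = t.field();
                    Integer count = termsPerField.get(field);
                    termsPerField.put(field, count == null ? 1 : count + 1);
                    Long df = docFreqPerField.get(field);
                    docFreqPerField.put(field, (df == null ? 0L : df) + terms.docFreq());
                }
            } finally {
                terms.close();
                reader.close();
            }
            // One line per field: number of distinct terms and summed docFreq.
            for (Map.Entry<String, Integer> e : termsPerField.entrySet()) {
                System.out.println(e.getKey() + ": " + e.getValue() + " terms, total docFreq "
                        + docFreqPerField.get(e.getKey()));
            }
        }
    }

We understand Luke can also show the top terms per field interactively, which
might already be enough for this.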


Regards,
Philippe

-- 
Philippe Kernévez



Directeur technique (Suisse),
pkerne...@octo.com
+41 79 888 33 32

Retrouvez OCTO sur OCTO Talk : http://blog.octo.com
OCTO Technology http://www.octo.com
