Hi, I'm new to Lucene and working on a fairly old existing project.
We use Lucene 2.4 (provided by Alfresco). Our Lucene index is about 5.5 GB on disk.

We see heavy memory consumption and allocation, and most of that memory seems to be consumed by the Lucene index. We looked at a heap dump with Eclipse Memory Analyzer and were quite surprised to see that most of the memory is held by enormous String[] arrays that are nevertheless mostly empty.

Most of these arrays are around String[2'075'498] and take about 16 MB each (shallow size, not retained heap). Their occupancy is quite low, about 10%. There are 339 big arrays like that. Lucene uses about 6.5 GB, half of it in these arrays alone (not counting the objects they reference).

Is it 'normal' to have:
1) half of the memory used by String[]?
2) only about 10% occupancy?
3) Is it possible to change the configuration to increase the number of arrays and decrease their size?

For example, here is a fairly small dump (10 GB; we have larger JVMs, up to 30 GB, but those dumps are harder to read, parse, and analyse):

Class Name                                     | Shallow Heap | Retained Heap | Percentage
------------------------------------------------------------------------------------------
java.lang.String[2075498] @ 0xfffffffcb7f89c60 |   16'604'008 |    30'268'792 |      0.27%
java.lang.String[2075498] @ 0xfffffffd12d2cf90 |   16'604'008 |    30'268'792 |      0.27%
java.lang.String[2075111] @ 0xfffffffd474f8e40 |   16'600'912 |    30'265'256 |      0.27%
java.lang.String[2075098] @ 0xfffffffd13e1f148 |   16'600'808 |    30'265'064 |      0.27%
java.lang.String[2075006] @ 0xfffffffb4940b180 |   16'600'072 |    30'263'184 |      0.27%
java.lang.String[2075171] @ 0xfffffffd83c6c1b8 |   16'601'392 |    30'262'848 |      0.27%
java.lang.String[2075176] @ 0xfffffffd8f087ce8 |   16'601'432 |    30'262'504 |      0.27%
------------------------------------------------------------------------------------------

Some other statistics from that dump (filtered on 'Term' and 'Field'):

Class Name                                              |   Objects | Shallow Heap |  Retained Heap
---------------------------------------------------------------------------------------------------
org.apache.lucene.index.TermInfosReader                 |       232 |       25'984 | >= 307'965'968
org.apache.lucene.index.Term                            | 1'876'549 |   60'049'568 | >= 240'004'272
org.apache.lucene.index.Term[]                          |     1'001 |   12'057'680 | >= 223'024'664
org.apache.lucene.index.TermInfosReader$ThreadResources |     2'902 |       92'864 | >= 102'801'168
org.apache.lucene.index.TermInfo[]                      |       230 |   12'033'008 | >=  72'170'448
org.apache.lucene.index.TermInfo                        | 1'769'215 |   70'768'600 | >=  70'768'600
java.lang.reflect.Field                                 |    14'899 |    1'907'072 | >=  20'325'464
org.apache.lucene.index.SegmentTermEnum                 |     3'136 |      351'232 | >=  10'151'464
sun.reflect.generics.repository.FieldRepository         |     5'508 |      220'320 | >=   9'196'560
org.apache.lucene.search.TermQuery                      |    56'553 |    1'809'696 | >=   8'300'672
org.apache.lucene.index.TermBuffer                      |     9'408 |      526'848 | >=   5'811'240
org.apache.lucene.document.Field                        |    14'350 |      803'600 | >=   3'454'360
---------------------------------------------------------------------------------------------------

In our case we need some very short words to be indexed, so we deactivated stop words.

If we want to list the terms ordered by their index size, what is a good tool for that (Luke?), and how can we run such a request?

Regards,
Philippe

--
Philippe Kernévez
Technical Director (Switzerland),
pkerne...@octo.com
+41 79 888 33 32

Find OCTO on OCTO Talk: http://blog.octo.com
OCTO Technology http://www.octo.com
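
P.S. A quick sanity check on the shallow sizes in the first table, in case it helps whoever reads this. The reported figures match 24 bytes of array header plus 8 bytes per slot, which would point to 64-bit references without compressed pointers. The 24 and the 8 are only inferred from the numbers above, not confirmed on our JVM, and the class and method names in this sketch are purely illustrative.

```java
public class ArrayFootprint {
    // Assumed values, inferred from the dump figures (not confirmed on the JVM).
    static final long ASSUMED_HEADER_BYTES = 24;
    static final long ASSUMED_REFERENCE_BYTES = 8;

    // Shallow size of a reference array under the assumptions above.
    static long shallowBytes(long length) {
        return ASSUMED_HEADER_BYTES + ASSUMED_REFERENCE_BYTES * length;
    }

    public static void main(String[] args) {
        // Array lengths and shallow sizes copied from the first table.
        long[] lengths = {2075498L, 2075111L, 2075098L};
        long[] reported = {16604008L, 16600912L, 16600808L};
        for (int i = 0; i < lengths.length; i++) {
            long computed = shallowBytes(lengths[i]);
            System.out.println("String[" + lengths[i] + "]: computed=" + computed
                    + " reported=" + reported[i]
                    + " match=" + (computed == reported[i]));
        }
    }
}
```

If that reading is right, every slot costs 8 bytes whether or not it holds a String, so an array that is only 10% full still pays for all of its roughly 2 million slots.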