> Is it correct that a segment file is ready for merging after a commit has
> been done (e.g. using the autoCommit property), so I will see merges of 100
> and up documents (and the index writer continues writing into a new segment
> file)?

Yes, merging won't happen until after a segment is closed. How big the
segments are depends on the MergePolicy, of which there are several.
Here's a great blog post explaining that:

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Best
Erick

On Thu, Sep 20, 2012 at 5:17 AM, "Trym R. Møller" <t...@sigmat.dk> wrote:
> Hi
>
> Thanks a lot for your answer, Erick!
>
> I changed the value of the autoSoftCommit property and it had the expected
> effect. It can be noted that this is per core, so with four cores in my
> Solr I get four getReader calls per autoSoftCommit interval.
>
> Is it correct that a segment file is ready for merging after a commit has
> been done (e.g. using the autoCommit property), so I will see merges of 100
> and up documents (and the index writer continues writing into a new segment
> file)?
>
> It looks like the segments are being merged into 6 MB files and, when there
> are enough of those, into 60 MB files, and these again into 3,5 GB files.
>
> Best regards Trym
>
> On 19-09-2012 14:49, Erick Erickson wrote:
>
>> I _think_ the getReader calls are being triggered by the autoSoftCommit
>> being at one second. If so, this is probably OK. But bumping that up
>> would nail whether that's the case...
>>
>> About RamBufferSizeMB. This has nothing to do with the size of the
>> segments! It's just how much memory is consumed before the RAMBuffer is
>> flushed to the _currently open_ segment. So until a hard commit happens,
>> the currently open segment will continue to grow as successive RAMBuffers
>> are flushed.
>>
>> bq: I expected that my Lucene index segment files would be a bit
>> bigger than 1KB
>>
>> Is this a typo? The 512 is specifying MB......
>>
>> Best
>> Erick
>>
>> On Wed, Sep 19, 2012 at 6:01 AM, "Trym R. Møller" <t...@sigmat.dk> wrote:
>>>
>>> Hi
>>>
>>> Using SolrCloud I have added the following to solrconfig.xml (actually
>>> the node in zookeeper)
>>> <ramBufferSizeMB>512</ramBufferSizeMB>
>>>
>>> After that I expected that my Lucene index segment files would be a bit
>>> bigger than 1KB as I'm indexing very small documents.
>>> Enabling the infoStream I see a lot of "flush at getReader" (one segment
>>> of the infoStream file pasted below)
>>>
>>> 1. Where can I look for why documents are flushed so frequently?
>>> 2. Does it have anything to do with "getReader" and can I do anything so
>>> Solr doesn't need to get a new reader so often?
>>>
>>> Any comments are most welcome.
>>>
>>> Best regards Trym
>>>
>>> Furthermore I have specified
>>> <autoCommit>
>>>   <maxTime>180000</maxTime>
>>> </autoCommit>
>>> <autoSoftCommit>
>>>   <maxTime>1000</maxTime>
>>> </autoSoftCommit>
>>>
>>>
>>> IW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush at getReader
>>> DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: pool-12-thread-1 startFullFlush
>>> DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: anyChanges? numDocsInRam=7 deletes=false hasTickets:false pendingChangesInFullFlush: false
>>> DWFC 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_kc, aborting=false, numDocsInRAM=7, deleteQueue=DWDQ: [ generation: 1 ]]
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush postings as segment _kc numDocs=7
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment has 0 deleted docs
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment has no vectors; norms; no docValues; prox; freqs
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushedFiles=[_kc_Lucene40_0.frq, _kc.fnm, _kc_Lucene40_0.tim, _kc_nrm.cfs, _kc.fdx, _kc.fdt, _kc_Lucene40_0.prx, _kc_nrm.cfe, _kc_Lucene40_0.tip]
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed codec=Lucene40
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed: segment=_kc ramUsed=0,095 MB newFlushedSize(includes docstores)=0,003 MB docs/MB=2.283,058
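
For reference, a minimal sketch of how the settings discussed in this thread might sit together in solrconfig.xml. It assumes the Solr 4.x layout, where ramBufferSizeMB lives under <indexConfig> and the commit settings under <updateHandler>. The 60000 ms autoSoftCommit value is purely illustrative: the thread only says the value was raised from 1000, not what it was raised to. The other values are the ones quoted above.

```xml
<!-- Sketch only; element placement assumes a Solr 4.x solrconfig.xml. -->
<indexConfig>
  <!-- Memory used for buffered documents before a flush is triggered.
       The unit is megabytes. -->
  <ramBufferSizeMB>512</ramBufferSizeMB>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 180 seconds (value from the thread). -->
  <autoCommit>
    <maxTime>180000</maxTime>
  </autoCommit>

  <!-- Soft commit interval. At 1000 ms each core opens a new reader every
       second, which shows up as "flush at getReader" in the infoStream.
       60000 is an illustrative value, not one taken from the thread. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With a longer soft commit interval, the "flush at getReader" entries should appear correspondingly less often, at the cost of newly indexed documents taking longer to become searchable.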