> Is it correct that a segment file is ready for merging after a commit has
> been done (e.g. using the autoCommit property), so I will see merges of 100
> and up documents (and the index writer continues writing into a new segment
> file)?

Yes, merging won't happen until after a segment is closed. How big the segments
are depends on the MergePolicy, of which there are several. Here's a great
blog explaining that...

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
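
For reference, here is a rough sketch of where the merge policy is chosen in
solrconfig.xml. It assumes the Solr 4.x <indexConfig> syntax, and the two
values are what I believe to be the TieredMergePolicy defaults; they are
illustrative and not taken from your setup.

```xml
<indexConfig>
  <!-- Assumption: illustrative default values, not tuned for this index. -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- Maximum number of segments merged in a single merge. -->
    <int name="maxMergeAtOnce">10</int>
    <!-- Number of similar-sized segments allowed per tier before a merge is triggered. -->
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>
```

With both set to 10, a merged segment tends to come out roughly ten times the
size of its inputs, which would fit the 6 MB and 60 MB steps you are seeing.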

Best
Erick

On Thu, Sep 20, 2012 at 5:17 AM, "Trym R. Møller" <t...@sigmat.dk> wrote:
> Hi
>
> Thanks a lot for your answer, Erick!
>
> I changed the value of the autoSoftCommit property and it had the expected
> effect. It can be noted that this is per Core, so I get four getReader calls
> when my Solr contains four cores per autoSoftCommit interval.
>
> Is it correct that a segment file is ready for merging after a commit has
> been done (e.g. using the autoCommit property), so I will see merges of 100
> and up documents (and the index writer continues writing into a new segment
> file)?
>
> It looks like the segments are first merged into 6 MB files, then, when there
> are enough of those, into 60 MB files, and these in turn into 3.5 GB files.
>
> Best regards Trym
>
> On 19-09-2012 14:49, Erick Erickson wrote:
>
>> I _think_ the getReader calls are being triggered by the autoSoftCommit
>> being at one second. If so, this is probably OK. But bumping that up would
>> nail down whether that's the case...
>>
>> About ramBufferSizeMB. This has nothing to do with the size of the segments!
>> It's just how much memory is consumed before the RAMBuffer is flushed to
>> the _currently open_ segment. So until a hard commit happens, the currently
>> open segment will continue to grow as successive RAMBuffers are flushed.
>>
>> bq:  I expected that my Lucene index segment files would be a bit
>> bigger than 1KB
>>
>> Is this a typo? The 512 is specifying MB......
>>
>> Best
>> Erick
>>
>> On Wed, Sep 19, 2012 at 6:01 AM, "Trym R. Møller" <t...@sigmat.dk> wrote:
>>>
>>> Hi
>>>
>>> Using SolrCloud I have added the following to solrconfig.xml (actually the
>>> node in ZooKeeper)
>>>      <ramBufferSizeMB>512</ramBufferSizeMB>
>>>
>>> After that I expected that my Lucene index segment files would be a bit
>>> bigger than 1KB as I'm indexing very small documents.
>>> Enabling the infoStream I see a lot of "flush at getReader" (an excerpt
>>> of the infoStream file is pasted below)
>>>
>>> 1. Where can I look for why documents are flushed so frequently?
>>> 2. Does it have anything to do with "getReader" and can I do anything so
>>> Solr doesn't need to get a new reader so often?
>>>
>>> Any comments are most welcome.
>>>
>>> Best regards Trym
>>>
>>> Furthermore I have specified
>>>         <autoCommit>
>>>           <maxTime>180000</maxTime>
>>>         </autoCommit>
>>>         <autoSoftCommit>
>>>           <maxTime>1000</maxTime>
>>>         </autoSoftCommit>
>>>
>>>
>>> IW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush at getReader
>>> DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: pool-12-thread-1 startFullFlush
>>> DW 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: anyChanges? numDocsInRam=7 deletes=false hasTickets:false pendingChangesInFullFlush: false
>>> DWFC 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_kc, aborting=false, numDocsInRAM=7, deleteQueue=DWDQ: [ generation: 1 ]]
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flush postings as segment _kc numDocs=7
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment has 0 deleted docs
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: new segment has no vectors; norms; no docValues; prox; freqs
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushedFiles=[_kc_Lucene40_0.frq, _kc.fnm, _kc_Lucene40_0.tim, _kc_nrm.cfs, _kc.fdx, _kc.fdt, _kc_Lucene40_0.prx, _kc_nrm.cfe, _kc_Lucene40_0.tip]
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed codec=Lucene40
>>> DWPT 0 [Wed Sep 19 11:07:45 CEST 2012; pool-12-thread-1]: flushed: segment=_kc ramUsed=0,095 MB newFlushedSize(includes docstores)=0,003 MB docs/MB=2.283,058
>>>
>
