(1) no. The internal Ram buffer will pretty much limit the amount of heap used 
however.

(2) You actually have several segments. “.cfs” stands for “Compound File”, see: 

https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/codecs/lucene70/package-summary.html
"An optional "virtual" file consisting of all the other index files for systems 
that frequently run out of file handles.”

IOW, _0.cfs is a complete segment. _1.cfs is a different, complete segment etc. 
The merge policy (TieredMergePolicy) controls when these are used .vs. the 
segment being kept in separate files.

New segments are created whenever the ram buffer is flushed or whenever you do 
a commit (closing the IW also creates a segment IIUC). However, under control 
of the merge policy, segments are merged. See: 
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

You’re confusing closing a writer with merging segments. Essentially, every 
time a commit happens, the merge policy is called to determine if segments 
should be merged, see Mike’s blog above.

Additionally, you say "I was hoping there would be only _0.cfs file”. This’ll 
pretty much never happen. Segment names always increase, at best you’d have 
something like _ab.cfs, if not 10-15 _ab* files.

Lucene likes file handles, essentially when searching a file handle will be 
open for _every_ file in your index all the time.

All that said, counting the number of files seems like a waste of time. If 
you’re running on a *nix box, the usual (Solr I’ll admit, but I think it 
applies to Lucene as well) is to set the limit to 65K or so.

And if you’re truly concerned, and since you say this is an immutable, you can 
do a forceMerge. Prior to Lucene 7.5, the would by default form exactly one 
segment. For Lucene 7.5 and later, it’ll respect max segment size (a parameter 
in TMP, defaults to 5g) unless you specify a segment count of 1.

Best,
Erick

> On Nov 11, 2019, at 5:47 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> 
> On 11/11/2019 1:40 PM, siddharth teotia wrote:
>> I have a few questions about Lucene indexing and file handling. It would be
>> great if someone can help with these. I had earlier asked these questions
>> on gene...@lucene.apache.org but was asked to seek help here.
> 
> This mailing list (solr-user) is for Solr.  Questions about Lucene do not 
> belong on this list.
> 
> You should ask on the java-user mailing list, which is for questions related 
> to the core (Java) version of Lucene.
> 
> http://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg
> 
> I have put the original sender address in the BCC field just in case you are 
> not subscribed here.
> 
> Thanks,
> Shawn

Reply via email to