The one exception that we should always note: if your batch includes deletion of existing documents, an optimize can be appropriate, since the term frequencies stored by Lucene may be skewed; deleted documents still count as existing terms until their segments are merged away.
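If reclaiming deleted documents is the only concern, a lighter-weight alternative to a full optimize is a commit with expungeDeletes, which merges away only the segments containing deletes. A sketch (host, port, and collection name are placeholders for your own setup):

```shell
# Hard commit that also purges deleted docs from the index,
# without forcing a full merge down to one segment
curl 'http://localhost:8983/solr/collection1/update?commit=true&expungeDeletes=true'
```

Note that expungeDeletes can still rewrite large segments, so it is not free either; it is just less drastic than optimize.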

Is this exception noted in the Solr ref guide?

-- Jack Krupansky

-----Original Message----- From: Erick Erickson
Sent: Tuesday, June 24, 2014 11:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Does one need to perform an optimize soon after doing a batch indexing using SolrJ ?

Your indexing process looks fine, there's no reason to
change it.

Optimizing is _probably_ not necessary at all. In fact, in the 4.x
world it was renamed "forceMerge" to make it seem less
attractive (I mean, who wouldn't want an "optimized" index?).

That said, the batch indexing process has nothing at all to
do with optimization. Nothing in the process of adding docs
to a server will trigger an optimize.

In your case, since your index only changes once a week it
will help your performance a little (but perhaps so little you won't
notice) to optimize after the batch index is done.

In short, your process seems fine. Indexes are never optimized
unless you explicitly do it. After all, how would Solr know that
you are done with your batch indexing?
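For what it's worth, here is a minimal SolrJ sketch of the end-of-batch sequence being discussed: bulk add, explicit hard commit, then the optional optimize (the forceMerge-style call). The ZooKeeper host string, collection name, and document fields are placeholders, and this assumes the Solr 4.5 SolrJ API:

```java
import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble; point this at your own cluster
        CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
        server.setDefaultCollection("collection1");

        Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        docs.add(doc);
        server.add(docs);          // bulk add, as in the batch process

        // Explicit hard commit at the end of the cycle;
        // openSearcher defaults to true on an explicit commit,
        // so the new documents become visible to searches
        server.commit(true, true); // (waitFlush, waitSearcher)

        // Only if you explicitly want a single segment afterwards:
        server.optimize();         // forceMerge down to 1 segment

        server.shutdown();
    }
}
```

As the thread says, nothing here happens implicitly: the optimize only runs because the code asks for it.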

Best,
Erick

On Tue, Jun 24, 2014 at 5:32 AM, RadhaJayalakshmi
<rlakshminaraya...@inautix.co.in> wrote:
I am using Solr 4.5.1. I have two collections:
                Collection 1 - 2 shards, 3 replicas (Size of Shard 1 - 115 MB, Size of Shard 2 - 55 MB)
                Collection 2 - 2 shards, 3 replicas (Size of Shard 1 - 3.5 GB, Size of Shard 2 - 1 GB)

I have a batch process that performs a full-refresh indexing once a week on
the same index.

Here is some information on how I index:
a) I use SolrJ's bulk add API for indexing -
CloudSolrServer.add(Collection<SolrInputDocument> docs).
b) I have an autoCommit (hard commit) setting for both my Collections
(solrconfig.xml):
                                <autoCommit>
                                                <maxDocs>100000</maxDocs>
                                                <openSearcher>false</openSearcher>
                                </autoCommit>
c) I do a programmatic hard commit at the end of the indexing cycle - with
openSearcher set to "true" - so that the documents show up in the search
results.
d) I neither programmatically soft commit (nor have any autoSoftCommit
settings) during the batch indexing process.
e) When I re-index all my data again (the following week) into the same
index - I don't delete existing docs. Rather, I just re-index into the same
Collection.
f) I am using the default mergefactor of 10 in my solrconfig.xml
                <mergeFactor>10</mergeFactor>
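For reference, in the 4.x line the same default merging behavior is governed by TieredMergePolicy, and mergeFactor=10 roughly corresponds to the policy's own defaults. A sketch of the equivalent explicit configuration in solrconfig.xml (not something that needs to change here):

```xml
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <!-- roughly what mergeFactor=10 maps to under TieredMergePolicy -->
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>
```

With segmentsPerTier at 10, steady-state segment counts in the 8-30 range per core are normal, which matches the observation below.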

Here is what I am observing:
1) After a batch indexing cycle - the segment counts for each shard / core
are pretty high. The Solr Dashboard reports segment counts between 8 - 30
segments on the various cores.
2) Sometimes the Solr Dashboard shows the status of my core as NOT
OPTIMIZED. This I find unusual, since I have just finished a batch indexing
cycle and would assume that the index should already be optimized. Is this
happening because I don't delete my docs before re-indexing all my data?
3) After I run an optimize on my Collections - the segment count does reduce
significantly - down to 1 segment.

Am I indexing the right way? Is there a better strategy?

Is it necessary to perform an optimize after every batch indexing cycle?

The outcome I am looking for is that I need an optimized index after every
major Batch Indexing cycle.

Thanks!!



--
View this message in context: http://lucene.472066.n3.nabble.com/Does-one-need-to-perform-an-optimize-soon-after-doing-a-batch-indexing-using-SolrJ-tp4143686.html
Sent from the Solr - User mailing list archive at Nabble.com.
