The main reason we go this route is that after a while (with default settings) we end up with hundreds of segments, and performance drops abysmally as a result. By using a stepped optimize: a) we don't run into the "need 3x+ head room" issue, and b) the performance penalty while the optimize runs is smaller than the penalty of leaving hundreds of segments unoptimized.

BTW, as we use a batched insert/update cycle [once daily], we only optimize down to a single segment after a complete batch has been run. During the batch, though, we reduce segment counts down to a max of 16 every 250K inserts/updates to prevent the large-segment-count performance penalty.
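
A minimal SolrJ sketch of that mid-batch step (the 250K threshold and max of 16 segments are the values mentioned above; the counter and method name are just placeholders, and the SolrClient comes from wherever your indexing loop gets it):

// Called from the indexing loop; once 250K docs have been added/updated
// since the last segment reduction, ask Solr to merge down to at most
// 16 segments.  optimize(waitFlush, waitSearcher, maxSegments)
void maybeReduceSegments(SolrClient client, long docsSinceLastMerge)
        throws SolrServerException, IOException {
    if (docsSinceLastMerge >= 250_000) {
        client.optimize(true, true, 16);
    }
}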


On 04/30/2018 07:10 PM, Erick Erickson wrote:
There's really no good way to purge deleted documents from the index
other than to wait until merging happens.

Optimize/forceMerge and expungeDeletes both suffer from the problem
that they create massive segments that then stick around for a very
long time, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
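
For reference, expungeDeletes is an option on a commit. A minimal SolrJ sketch (using UpdateRequest/AbstractUpdateRequest; the client and collection name are placeholders), with the same "huge resulting segment" caveat described above:

// Send a commit with expungeDeletes=true, which asks Lucene to merge
// away segments containing deleted documents.
void expungeDeletes(SolrClient client, String collection)
        throws SolrServerException, IOException {
    UpdateRequest req = new UpdateRequest();
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    req.setParam("expungeDeletes", "true");
    req.process(client, collection);
}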

Best,
Erick

On Mon, Apr 30, 2018 at 1:56 PM, Michael Joyner <mich...@newsrx.com> wrote:
Based on experience, 2x head room is not always enough, sometimes not even 3x, if you are optimizing from many segments down to 1 segment in a single go.

We have, however, figured out a way that can work with as little as 51% free space via the following iteration cycle:

public void solrOptimize() {
    int initialMaxSegments = 256;
    int finalMaxSegments = 1;
    if (isShowSegmentCounter()) {
        log.info("Optimizing ...");
    }
    try (SolrClient solrServerInstance = getSolrClientInstance()) {
        // Step the maxSegments target down one at a time so each pass
        // only has to merge a little, keeping transient disk usage low.
        for (int segments = initialMaxSegments; segments >= finalMaxSegments; segments--) {
            if (isShowSegmentCounter()) {
                System.out.println("Optimizing to a max of " + segments + " segments.");
            }
            // optimize(waitFlush, waitSearcher, maxSegments)
            solrServerInstance.optimize(true, true, segments);
        }
    } catch (SolrServerException | IOException e) {
        throw new RuntimeException(e);
    }
}


On 04/30/2018 04:23 PM, Walter Underwood wrote:
You need 2X the minimum index size in disk space anyway, so don’t worry
about keeping the indexes as small as possible. Worry about having enough
headroom.

If your indexes are 250 GB, you need 250 GB of free space.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Apr 30, 2018, at 1:13 PM, Antony A <antonyaugus...@gmail.com> wrote:

Thanks Erick/Deepak.

The cloud is running on baremetal (128 GB/24 cpu).

Is there an option to run a compact on the data files to make the size equal on both the clouds? I am trying to find all the options before I add the new fields into the production cloud.

Thanks
AA

On Mon, Apr 30, 2018 at 10:45 AM, Erick Erickson <erickerick...@gmail.com> wrote:

Antony:

You are probably seeing the results of removing deleted documents from
the shards as they're merged. Even on replicas in the same _shard_,
the size of the index on disk won't necessarily be identical. This has
to do with which segments are selected for merging, which are not
necessarily coordinated across replicas.

The test is if the number of docs on each collection is the same. If
it is, then don't worry about index sizes.
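
A minimal SolrJ sketch of that check (the base URLs and collection name are placeholders for your two clouds):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

// Compare numFound for the same collection on both clouds; if the counts
// match, the on-disk size difference is just merge/deleted-doc noise.
public class CompareDocCounts {
    public static void main(String[] args) throws Exception {
        String[] baseUrls = {
            "http://cloud-a:8983/solr/mycollection",   // placeholder URLs
            "http://cloud-b:8983/solr/mycollection"
        };
        for (String url : baseUrls) {
            try (SolrClient client = new HttpSolrClient.Builder(url).build()) {
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0); // only the count is needed, not the documents
                long numFound = client.query(q).getResults().getNumFound();
                System.out.println(url + " -> " + numFound + " docs");
            }
        }
    }
}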

Best,
Erick

On Mon, Apr 30, 2018 at 9:38 AM, Deepak Goel <deic...@gmail.com> wrote:
Could you please also give the machine details of the two clouds you are running?



Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please stop cruelty to Animals, become a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home

On Mon, Apr 30, 2018 at 9:51 PM, Antony A <antonyaugus...@gmail.com> wrote:
Hi Shawn,

The cloud is running version 6.2.1 with ClassicIndexSchemaFactory.

The sum of sizes from the admin UI across all the shards is around 265 GB vs 224 GB between the two clouds.

I created the collection using "numShards", so it uses the compositeId router.

If you need more information, please let me know.

Thanks
AA

On Mon, Apr 30, 2018 at 10:04 AM, Shawn Heisey <apa...@elyograg.org> wrote:

On 4/30/2018 9:51 AM, Antony A wrote:

I am running two separate solr clouds. I have 8 shards in each with a total of 300 million documents. Both the clouds are indexing the documents from the same source/configuration.

I am noticing there is a difference in the size of the collection between them. I am planning to add more shards to see if that helps solve the issue. Has anyone come across a similar issue?

There's no information here about exactly what you are seeing, what you are expecting to see, and why you believe that what you are seeing is wrong. You did say that there is "a difference in size". That is a very vague problem description.

FYI, unless a SolrCloud collection is using the implicit router, you cannot add shards. And if it *IS* using the implicit router, then you are 100% in control of document routing -- Solr cannot influence that at all.

Thanks,
Shawn


