There's really no good way to purge deleted documents from the index
other than to wait until merging happens.
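
If you want to see how much of an index is deleted documents while you wait for merges, the arithmetic is simple. This is only a rough sketch: it assumes you read maxDoc (live plus deleted docs) and numDocs (live docs only) off the core's overview page in the admin UI and pass them in by hand. The class and method names are made up for illustration and are not part of Solr or SolrJ.

```java
public final class DeletedDocs {
    private DeletedDocs() {}

    /**
     * Fraction of index entries that are deleted but not yet merged away.
     * maxDoc counts live plus deleted documents; numDocs counts live ones only.
     */
    public static double deletedFraction(long maxDoc, long numDocs) {
        if (maxDoc < 0 || numDocs < 0 || numDocs > maxDoc) {
            throw new IllegalArgumentException("expected 0 <= numDocs <= maxDoc");
        }
        // An empty index has nothing to purge.
        return maxDoc == 0 ? 0.0 : (double) (maxDoc - numDocs) / maxDoc;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, used only when no arguments are given.
        long maxDoc = args.length > 0 ? Long.parseLong(args[0]) : 1000L;
        long numDocs = args.length > 1 ? Long.parseLong(args[1]) : 750L;
        System.out.printf("deleted: %.1f%%%n", 100.0 * deletedFraction(maxDoc, numDocs));
    }
}
```

Comparing that fraction between two replicas, or between two clouds, gives a quick idea of whether a size difference on disk is just deleted documents that have not been merged away yet.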

Optimize/forceMerge and expungeDeletes both suffer from the problem
that they create massive segments that then stick around for a very
long time, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

On Mon, Apr 30, 2018 at 1:56 PM, Michael Joyner <mich...@newsrx.com> wrote:
> Based on experience, 2x headroom is not always enough, sometimes
> not even 3x, if you are optimizing from many segments down to 1 segment in a
> single go.
>
> We have however figured out a way that can work with as little as 51% free
> space via the following iteration cycle:
>
> public void solrOptimize() {
>     // Lower the segment cap one step at a time rather than
>     // merging straight down to a single segment.
>     int initialMaxSegments = 256;
>     int finalMaxSegments = 1;
>     if (isShowSegmentCounter()) {
>         log.info("Optimizing ...");
>     }
>     try (SolrClient solrServerInstance = getSolrClientInstance()) {
>         for (int segments = initialMaxSegments; segments >= finalMaxSegments; segments--) {
>             if (isShowSegmentCounter()) {
>                 System.out.println("Optimizing to a max of " + segments + " segments.");
>             }
>             // optimize(waitFlush, waitSearcher, maxSegments)
>             solrServerInstance.optimize(true, true, segments);
>         }
>     } catch (SolrServerException | IOException e) {
>         throw new RuntimeException(e);
>     }
> }
>
>
> On 04/30/2018 04:23 PM, Walter Underwood wrote:
>>
>> You need 2X the minimum index size in disk space anyway, so don’t worry
>> about keeping the indexes as small as possible. Worry about having enough
>> headroom.
>>
>> If your indexes are 250 GB, you need 250 GB of free space.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>> On Apr 30, 2018, at 1:13 PM, Antony A <antonyaugus...@gmail.com> wrote:
>>>
>>> Thanks Erick/Deepak.
>>>
>>> The cloud is running on baremetal (128 GB/24 cpu).
>>>
>>> Is there an option to run a compaction on the data files to make the size
>>> equal on both the clouds? I am trying to find all the options before I add
>>> the new fields into the production cloud.
>>>
>>> Thanks
>>> AA
>>>
>>> On Mon, Apr 30, 2018 at 10:45 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>>> Antony:
>>>>
>>>> You are probably seeing the results of removing deleted documents from
>>>> the shards as they're merged. Even on replicas in the same _shard_,
>>>> the size of the index on disk won't necessarily be identical. This has
>>>> to do with which segments are selected for merging, which are not
>>>> necessarily coordinated across replicas.
>>>>
>>>> The test is if the number of docs on each collection is the same. If
>>>> it is, then don't worry about index sizes.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Mon, Apr 30, 2018 at 9:38 AM, Deepak Goel <deic...@gmail.com> wrote:
>>>>>
>>>>> Could you please also give the machine details of the two clouds you
>>>>> are running?
>>>>>
>>>>>
>>>>>
>>>>> Deepak
>>>>>
>>>>> On Mon, Apr 30, 2018 at 9:51 PM, Antony A <antonyaugus...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Shawn,
>>>>>>
>>>>>> The cloud is running version 6.2.1 with ClassicIndexSchemaFactory.
>>>>>>
>>>>>> The sum of the sizes shown in the admin UI across all the shards is
>>>>>> around 265 GB vs 224 GB between the two clouds.
>>>>>>
>>>>>> I created the collection using "numShards", so it uses the compositeId
>>>>>> router.
>>>>>>
>>>>>> If you need more information, please let me know.
>>>>>>
>>>>>> Thanks
>>>>>> AA
>>>>>>
>>>>>> On Mon, Apr 30, 2018 at 10:04 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>>
>>>>>>> On 4/30/2018 9:51 AM, Antony A wrote:
>>>>>>>
>>>>>>>> I am running two separate solr clouds. I have 8 shards in each with a
>>>>>>>> total of 300 million documents. Both the clouds are indexing the
>>>>>>>> documents from the same source/configuration.
>>>>>>>>
>>>>>>>> I am noticing there is a difference in the size of the collection
>>>>>>>> between them. I am planning to add more shards to see if that helps
>>>>>>>> solve the issue. Has anyone come across a similar issue?
>>>>>>>>
>>>>>>> There's no information here about exactly what you are seeing, what you
>>>>>>> are expecting to see, and why you believe that what you are seeing is
>>>>>>> wrong.
>>>>>>>
>>>>>>> You did say that there is "a difference in size".  That is a very vague
>>>>>>> problem description.
>>>>>>>
>>>>>>> FYI, unless a SolrCloud collection is using the implicit router, you
>>>>>>> cannot add shards.  And if it *IS* using the implicit router, then you
>>>>>>> are 100% in control of document routing -- Solr cannot influence that
>>>>>>> at all.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shawn
>>>>>>>
>>>>>>>
>>
>
