Thank you all. I have around 70% free space in production. I will compute the space needed for the additional fields.
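A minimal sketch of that calculation, assuming that all of the roughly 30% of the disk currently in use is index data. It applies the rule of thumb from this thread, disk size >= multiplier * index size, with multiplier 2.0 for Walter's "2X" rule and 3.0 or more for Michael's experience with a single-pass optimize down to 1 segment. `HeadroomCheck`, `maxGrowthFactor` and `fits` are hypothetical names for illustration, not part of Solr.

```java
/**
 * Hypothetical helper, not part of Solr: applies the rule of thumb
 * "disk size >= multiplier * index size" to a volume where a known fraction
 * of the disk is already used. Assumes all used space is index data.
 */
public final class HeadroomCheck {

    private HeadroomCheck() {
    }

    /**
     * Largest factor by which the index may grow (for example after adding new
     * fields and reindexing) while the disk still holds multiplier * index.
     *
     * @param usedFraction share of the disk the index occupies now (0.30 = 70% free)
     * @param multiplier   2.0 for the "2X" rule, 3.0 or more for a one-shot optimize to 1 segment
     */
    static double maxGrowthFactor(double usedFraction, double multiplier) {
        if (usedFraction <= 0.0 || usedFraction > 1.0 || multiplier < 1.0) {
            throw new IllegalArgumentException("need 0 < usedFraction <= 1 and multiplier >= 1");
        }
        // Grown index = g * usedFraction of the disk; require 1 >= multiplier * g * usedFraction.
        return 1.0 / (multiplier * usedFraction);
    }

    /** True if an index grown by growthFactor still leaves the required headroom. */
    static boolean fits(double usedFraction, double growthFactor, double multiplier) {
        return growthFactor <= maxGrowthFactor(usedFraction, multiplier);
    }

    public static void main(String[] args) {
        double used = 0.30; // "around 70% free space", assuming the rest is all index
        System.out.printf("2x rule: index may grow up to %.2fx%n", maxGrowthFactor(used, 2.0));
        System.out.printf("3x rule: index may grow up to %.2fx%n", maxGrowthFactor(used, 3.0));
    }
}
```

By this arithmetic, with 30% of the disk in use the index could grow to about 1.67 times its current size under the 2X rule, but only to about 1.11 times if you want 3X headroom for a one-shot optimize.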
> On Apr 30, 2018, at 5:10 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> There's really no good way to purge deleted documents from the index
> other than to wait until merging happens.
>
> Optimize/forceMerge and expungeDeletes both suffer from the problem
> that they create massive segments that then stick around for a very
> long time, see:
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
>
> Best,
> Erick
>
>> On Mon, Apr 30, 2018 at 1:56 PM, Michael Joyner <mich...@newsrx.com> wrote:
>> Based on experience, 2x headroom is not always enough, sometimes
>> not even 3x, if you are optimizing from many segments down to 1 segment in a
>> single go.
>>
>> We have, however, figured out a way that can work with as little as 51% free
>> space via the following iteration cycle:
>>
>> public void solrOptimize() {
>>     int initialMaxSegments = 256;
>>     int finalMaxSegments = 1;
>>     if (isShowSegmentCounter()) {
>>         log.info("Optimizing ...");
>>     }
>>     try (SolrClient solrServerInstance = getSolrClientInstance()) {
>>         // Lower the segment ceiling one step at a time instead of
>>         // jumping straight to a single segment.
>>         for (int segments = initialMaxSegments; segments >= finalMaxSegments; segments--) {
>>             if (isShowSegmentCounter()) {
>>                 log.info("Optimizing to a max of " + segments + " segments.");
>>             }
>>             // optimize(waitFlush, waitSearcher, maxSegments)
>>             solrServerInstance.optimize(true, true, segments);
>>         }
>>     } catch (SolrServerException | IOException e) {
>>         throw new RuntimeException(e);
>>     }
>> }
>>
>>
>>> On 04/30/2018 04:23 PM, Walter Underwood wrote:
>>>
>>> You need 2X the minimum index size in disk space anyway, so don’t worry
>>> about keeping the indexes as small as possible. Worry about having enough
>>> headroom.
>>>
>>> If your indexes are 250 GB, you need 250 GB of free space.
>>>
>>> wunder
>>> Walter Underwood
>>>
>>>> On Apr 30, 2018, at 1:13 PM, Antony A <antonyaugus...@gmail.com> wrote:
>>>>
>>>> Thanks Erick/Deepak.
>>>>
>>>> The cloud is running on bare metal (128 GB, 24 CPUs).
>>>>
>>>> Is there an option to run a compaction on the data files to make the size
>>>> equal on both clouds? I am trying to find all the options before I add the
>>>> new fields to the production cloud.
>>>>
>>>> Thanks
>>>> AA
>>>>
>>>> On Mon, Apr 30, 2018 at 10:45 AM, Erick Erickson <erickerick...@gmail.com>
>>>> wrote:
>>>>
>>>>> Antony:
>>>>>
>>>>> You are probably seeing the results of removing deleted documents from
>>>>> the shards as they're merged. Even on replicas in the same _shard_,
>>>>> the size of the index on disk won't necessarily be identical. This has
>>>>> to do with which segments are selected for merging, which are not
>>>>> necessarily coordinated across replicas.
>>>>>
>>>>> The test is whether the number of docs in each collection is the same.
>>>>> If it is, then don't worry about index sizes.
>>>>>
>>>>> Best,
>>>>> Erick
>>>>>
>>>>>> On Mon, Apr 30, 2018 at 9:38 AM, Deepak Goel <deic...@gmail.com> wrote:
>>>>>>
>>>>>> Could you please also give the machine details of the two clouds you
>>>>>> are running?
>>>>>>
>>>>>> Deepak
>>>>>>
>>>>>> On Mon, Apr 30, 2018 at 9:51 PM, Antony A <antonyaugus...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Shawn,
>>>>>>>
>>>>>>> The cloud is running version 6.2.1 with ClassicIndexSchemaFactory.
>>>>>>>
>>>>>>> The sum of the shard sizes shown in the admin UI is around 265 G vs.
>>>>>>> 224 G between the two clouds.
>>>>>>>
>>>>>>> I created the collection using "numShards", so it uses the compositeId
>>>>>>> router.
>>>>>>>
>>>>>>> If you need more information, please let me know.
>>>>>>>
>>>>>>> Thanks
>>>>>>> AA
>>>>>>>
>>>>>>> On Mon, Apr 30, 2018 at 10:04 AM, Shawn Heisey <apa...@elyograg.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>> On 4/30/2018 9:51 AM, Antony A wrote:
>>>>>>>>>
>>>>>>>>> I am running two separate Solr clouds. I have 8 shards in each, with
>>>>>>>>> a total of 300 million documents. Both clouds are indexing the
>>>>>>>>> documents from the same source/configuration.
>>>>>>>>>
>>>>>>>>> I am noticing a difference in the size of the collection between
>>>>>>>>> them. I am planning to add more shards to see if that helps solve
>>>>>>>>> the issue. Has anyone come across a similar issue?
>>>>>>>>>
>>>>>>>> There's no information here about exactly what you are seeing, what
>>>>>>>> you are expecting to see, and why you believe that what you are
>>>>>>>> seeing is wrong.
>>>>>>>>
>>>>>>>> You did say that there is "a difference in size". That is a very
>>>>>>>> vague problem description.
>>>>>>>>
>>>>>>>> FYI, unless a SolrCloud collection is using the implicit router, you
>>>>>>>> cannot add shards. And if it *IS* using the implicit router, then you
>>>>>>>> are 100% in control of document routing -- Solr cannot influence that
>>>>>>>> at all.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Shawn
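Erick's test in this thread (compare document counts, not on-disk sizes) can be sketched in plain Java. This is a hypothetical helper, not a Solr or SolrJ API: it assumes you have already collected each shard's `numDocs` (live documents) and `maxDoc` (live plus deleted documents), for example from the core overview in the admin UI, and the counts in `main` are made up for illustration.

```java
import java.util.Map;

/**
 * Hypothetical helper, not a Solr or SolrJ API: compares two collections by
 * live document count rather than size on disk, and reports how much of each
 * index is deleted documents that merging has not yet removed.
 */
public final class ShardDocCheck {

    /** numDocs = live documents; maxDoc = live plus deleted documents. */
    static final class ShardCounts {
        final long numDocs;
        final long maxDoc;

        ShardCounts(long numDocs, long maxDoc) {
            if (numDocs < 0 || maxDoc < numDocs) {
                throw new IllegalArgumentException("need 0 <= numDocs <= maxDoc");
            }
            this.numDocs = numDocs;
            this.maxDoc = maxDoc;
        }
    }

    /** Total live documents across all shards of one collection. */
    static long liveDocs(Map<String, ShardCounts> shards) {
        long total = 0;
        for (ShardCounts c : shards.values()) {
            total += c.numDocs;
        }
        return total;
    }

    /** Fraction of maxDoc (summed over shards) that is deleted documents. */
    static double deletedFraction(Map<String, ShardCounts> shards) {
        long live = 0;
        long max = 0;
        for (ShardCounts c : shards.values()) {
            live += c.numDocs;
            max += c.maxDoc;
        }
        return max == 0 ? 0.0 : (double) (max - live) / max;
    }

    /** Equal live document counts mean the on-disk size difference can be ignored. */
    static boolean sameLiveDocs(Map<String, ShardCounts> a, Map<String, ShardCounts> b) {
        return liveDocs(a) == liveDocs(b);
    }

    public static void main(String[] args) {
        // Made-up counts for illustration only.
        Map<String, ShardCounts> cloudA = Map.of(
                "shard1", new ShardCounts(1_000, 1_250),
                "shard2", new ShardCounts(1_000, 1_000));
        Map<String, ShardCounts> cloudB = Map.of(
                "shard1", new ShardCounts(1_000, 1_000),
                "shard2", new ShardCounts(1_000, 1_050));
        System.out.println("same live docs: " + sameLiveDocs(cloudA, cloudB));
        System.out.printf("deleted share: A=%.3f B=%.3f%n",
                deletedFraction(cloudA), deletedFraction(cloudB));
    }
}
```

If the live counts match but one cluster carries a larger deleted share, that is consistent with Erick's explanation of the 265 G vs. 224 G gap: the space is reclaimed only as segments are merged.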