Hi Danilo,

> we have a solr 7.3.1 instance with around 40 MLN documents in it.


I guess you are hard committing only after a few million docs are indexed,
right? I suggest you do not avoid hard commits entirely. Set *autoCommit*
(not autoSoftCommit) to around half a million documents (that's from my
experience with my core of 250 million documents). Obviously, you need to
find the sweet spot yourself, but you can start with this number.
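
As a rough sketch, this is how such an autoCommit could look in
solrconfig.xml. The maxDocs value is just the starting point mentioned
above; openSearcher=false is an assumption on my side (it keeps the hard
commit from opening a new searcher, so visibility is still left to your
soft commits), so adjust both to your setup.

```xml
<!-- Sketch only: goes inside <config> in solrconfig.xml -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit after roughly half a million pending documents -->
    <maxDocs>500000</maxDocs>
    <!-- assumption: flush to disk without opening a new searcher -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```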

Also, play with the values in *IndexConfig*
<https://lucene.apache.org/solr/guide/6_6/indexconfig-in-solrconfig.html>
(merge factor, segment size, maxBufferedDocs, merge policies). We, at
Auto-Suggest, also do atomic updates daily, and changing the merge factor
specifically gave us a ~4x boost during indexing. With the current
configuration, our core atomically updates ~423 documents per second. I
also run a few core optimizations in between the full indexing.
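
To give an idea of where these knobs live, here is a minimal sketch of an
indexConfig section. The numbers are placeholders and not our production
values; with TieredMergePolicy, as far as I know, the "merge factor" is
expressed by setting maxMergeAtOnce and segmentsPerTier to the same value.

```xml
<!-- Sketch only: goes inside <config> in solrconfig.xml; all values are placeholders -->
<indexConfig>
  <!-- flush the in-memory buffer when either limit is reached -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <!-- these two together play the role of the merge factor -->
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>
</indexConfig>
```

Higher values generally mean fewer merges while indexing but more segments
to search, so measure both indexing and query speed after each change.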

On Thu, 24 Oct 2019 at 13:31, Danilo Tomasoni <tomas...@cosbi.eu> wrote:

> Hello all,
>
> we have a solr 7.3.1 instance with around 40 MLN documents in it.
>
> After the initial one-shot import, we found an issue in the import
> software, we updated it and re-run the import that will atomically
> update (with set)
>
> the existing documents.
>
> The import is divided into processes, each process is responsible of
> updating a portion of the documents.
>
> For every document processed, a soft commit is performed to make the
> update visible to other concurrent update processes.
>
> Every process at the end will perform an hard commit.
>
> The issue I have is that hard commits never terminate (it's ongoing by
> more than 3 days) and the number of segments and the solr index will
> grow a lot.
>
> In the past when the commit finished I was used to incrementally
> optimize the index (from 40 segments to 39, to 38 and so on)
>
> but also here the process is very slow.
>
>
> Any advice on how to speed up things?
>
> I checked the system usage in the solr machine and neither I/O nor CPU
> are heavily used..
>
>
> Thanks
>
> Danilo
>
> --
> Danilo Tomasoni
>
> Fondazione The Microsoft Research - University of Trento Centre for
> Computational and Systems Biology (COSBI)
> Piazza Manifattura 1,  38068 Rovereto (TN), Italy
> tomas...@cosbi.eu
> http://www.cosbi.eu
>
>
>

-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*
