Thank you for the interesting reply. You confirmed our assumptions. Using two or more collections, as Jörn Franke said, is more complicated to develop. So for now we will only try to split the index across more shards and servers, and try to reduce commit times too.
I think that NRT times of about one minute are acceptable.

Thank you

On 2019/08/06 19:59:49, Shawn Heisey <apa...@elyograg.org> wrote:
> On 7/31/2019 6:47 AM, profiuser wrote:
> > We have about 400 000 000 items in a Solr collection.
> > We have set the auto commit property for this collection to 15 minutes.
> > It is a big collection and we are using some caches etc. Therefore we
> > have a big autocommit value.
>
> I would set autoCommit to 60 seconds (a value of 60000) with
> openSearcher set to false. This will not affect change visibility in
> any way, but it will keep your transaction logs from becoming huge.
> Commits that do NOT open a new searcher are very fast.
>
> Then I would use autoSoftCommit as a failsafe on change visibility.
> Start with a value between two and five minutes.
>
> > This has the disadvantage that we don't have NRT searches.
> >
> > We would like to have NRT at least for searching for the newly added items.
> >
> > We read about the new "Category routed aliases" functionality in Solr
> > version 8.1.
> >
> > And we got an idea that we could add a routing field to our collection
> > schema.
> > At indexing time we check whether the item is new and set the routing
> > field to "new"; if the item is older than some time period, we set the
> > value to "old".
> > We will have one category routed alias, routedCollection, with
> > 2 collections, old and new.
> >
> > If we index a new item, the router chooses the new collection and the
> > item is inserted into it. After some period we reindex the item, decide
> > that it is old, and set the routing field to "old". The router then
> > decides to update (insert) the item in the old collection. We expected
> > that Solr would automatically check uniqueness across all routed
> > collections, and that if Solr found the item in another collection, it
> > would automatically be deleted. But it does not!
> >
> > Is this expected behaviour?
>
> I know very little about the new routed collection capability, but in
> general, I would not expect Solr to check more than one collection for
> an existing ID value when it is indexing. I don't think there's
> anything happening at that level that even knows about other
> collections. If you want to split your index into hot and cold pieces,
> you're probably going to need to have your indexing software be aware of
> that and either figure out where to send deletes, or just send deletes
> to all parts of the index.
>
> What kind of lag time do you think about when you imagine near real time
> indexing? Note that extremely short NRT times may not be achievable,
> especially with the large index you're using. A good starting point in
> my opinion is 30000, which is 30 seconds.
>
> What I would do is use the autoCommit and autoSoftCommit settings that I
> mentioned above, and include a "commitWithin" parameter on all indexing
> requests. The commitWithin would be for NRT.
>
> Thanks,
> Shawn
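
For reference, here is a minimal sketch of how the two suggestions above (a "commitWithin" parameter on indexing requests, and sending deletes to all parts of the index) might look on the client side. It only builds the request URLs and JSON bodies and does not send anything. The base URL, the function names, and the 30000 ms default are assumptions made for illustration; "old" and "new" are the collection names from the thread.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the Solr instance; adjust for a real cluster.
SOLR_BASE = "http://localhost:8983/solr"


def build_update_request(collection, docs, commit_within_ms=30000):
    """Return (url, body) for a JSON update of `docs` in `collection`.

    The commitWithin request parameter asks Solr to make the change
    visible within the given number of milliseconds.
    """
    query = urlencode({"commitWithin": commit_within_ms})
    url = f"{SOLR_BASE}/{collection}/update?{query}"
    return url, json.dumps(docs)


def build_delete_requests(collections, doc_id, commit_within_ms=30000):
    """Return one (url, body) delete-by-id request per collection.

    Hypothetical helper: the indexing client, not Solr, is responsible
    for removing a stale copy from the other collections.
    """
    query = urlencode({"commitWithin": commit_within_ms})
    body = json.dumps({"delete": {"id": doc_id}})
    return [
        (f"{SOLR_BASE}/{name}/update?{query}", body)
        for name in collections
    ]


if __name__ == "__main__":
    # Move item "42" from the "new" collection to the "old" one:
    # index it into "old", then delete the stale copy from "new".
    add_url, add_body = build_update_request("old", [{"id": "42"}])
    print("POST", add_url, add_body)
    for del_url, del_body in build_delete_requests(["new"], "42"):
        print("POST", del_url, del_body)
```

Each pair would be sent as an HTTP POST with Content-Type application/json. If the client does not know where the old copy lives, it can pass every collection except the target to build_delete_requests, which corresponds to the "send deletes to all parts of the index" option.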