Thanks for your reply. Below is what you requested about our Solr setup: configuration files, schema, and the results of the debug queries:
Looking forward to your advice and support on our problem.

1. System configuration
OS: Windows 10 Pro 64-bit
System Memory: 32GB
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical Processor(s)
HDD: 3.0 TB (free 2.1 TB) SATA

2. solrconfig.xml of the customers and policies collections, and solr.in.cmd, which can be downloaded from the following link:
https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing

3. The debug queries from both collections

*3.1. Debug query from Policies (which is slow)*

"debug":{
  "rawquerystring":"sherry",
  "querystring":"sherry",
  "parsedquery":"searchFields_tcs:sherry",
  "parsedquery_toString":"searchFields_tcs:sherry",
  "explain":{
    "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in 3097315) [SchemaSimilarity], result of:\n 14.540428 = score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n 812.0 = docFreq\n 6000000.0 = docCount\n 1.6324438 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n 5.0 = termFreq=5.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 19.397041 = avgFieldLength\n 31.0 = fieldLength\n", ...
  "QParser":"LuceneQParser",
  "timing":{
    "time":681.0,
    "prepare":{
      "time":0.0,
      "query":{"time":0.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":0.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":0.0}},
    "process":{
      "time":680.0,
      "query":{"time":19.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":651.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":8.0}},
    "loadFieldValues":{"time":12.0}}}}

*3.2. Debug query from Customers (which is fast because we indexed it after indexing Policies)*

"debug":{
  "rawquerystring":"sherry",
  "querystring":"sherry",
  "parsedquery":"searchFields_tcs:sherry",
  "parsedquery_toString":"searchFields_tcs:sherry",
  "explain":{
    "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in 2453665) [SchemaSimilarity], result of:\n 13.191501 = score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n 9.08604 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n 428.0 = docFreq\n 3784142.0 = docCount\n 1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n 3.0 = termFreq=3.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 20.22558 = avgFieldLength\n 28.0 = fieldLength\n", ...
  "QParser":"LuceneQParser",
  "timing":{
    "time":38.0,
    "prepare":{
      "time":1.0,
      "query":{"time":1.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":0.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":0.0}},
    "process":{
      "time":36.0,
      "query":{"time":1.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":31.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":3.0}},
    "loadFieldValues":{"time":13.0}}}}
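For reference, the debug output above was obtained with a query roughly like the following minimal SolrJ sketch (the collection URL and the highlighting parameters shown here are only illustrative assumptions; our actual request handler defaults are in the solrconfig.xml linked above):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DebugTimingQuery {
    public static void main(String[] args) throws Exception {
        // Base URL is an assumption; adjust host, port, context path and collection.
        try (SolrClient client =
                 new HttpSolrClient.Builder("https://localhost:8983/edm/policies").build()) {
            SolrQuery q = new SolrQuery("sherry");
            q.set("debugQuery", true);               // produces the explain and timing sections shown above
            q.setHighlight(true);                    // highlighting is where most of the time goes
            q.set("hl.method", "unified");           // illustrative; we use the unified highlighter
            q.addHighlightField("searchFields_tcs");
            QueryResponse rsp = client.query(q);
            System.out.println("QTime = " + rsp.getQTime() + " ms");
            System.out.println(rsp.getDebugMap().get("timing"));
        }
    }
}

As the timing sections show, almost all of the difference is in the highlight phase of "process": 651 ms out of 680 ms on policies versus 31 ms out of 36 ms on customers, while the query phase itself stays under 20 ms in both collections.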
Best Regards,
Edwin

On Thu, 24 Jan 2019 at 20:57, Jan Høydahl <jan....@cominvent.com> wrote:

> It would be useful if you can disclose the machine configuration, OS,
> memory, settings etc, as well as solr config including solr.in.sh,
> solrconfig.xml etc, so we can see the whole picture of memory, GC, etc.
> You could also specify debugQuery=true on a slow search and check the
> timings section for clues. What QTime are you seeing on the slow queries
> in solr.log?
> If that does not reveal the reason, I'd connect to your solr instance
> with a tool like jVisualVM or similar, to inspect what takes time. Or
> better, hook up to DataDog, SPM or some other cloud tool to get a full
> view of the system.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 24. jan. 2019 kl. 13:42 skrev Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> >
> > Hi Shawn,
> >
> > Unfortunately your reply about memory may not be valid. Please refer to
> > my explanation below of the strange behavior (it looks much more like a
> > BUG than anything else that is explainable):
> >
> > Note that we still have 18GB of free unused memory on the server.
> >
> > 1. We indexed the first collection, called customers (3.7 million records
> > from CSV data); the index size is 2.09GB. A search in customers for any
> > keyword returns within 50ms (QTime) when using highlighting (unified
> > highlighter, postings, light term vectors).
> >
> > 2. Then we indexed the second collection, called policies (6 million
> > records from CSV data); the index size is 2.55GB. A search in policies
> > for any keyword returns within 50ms (QTime) when using highlighting
> > (unified highlighter, postings, light term vectors).
> >
> > 3. But now any search in customers for any keyword (not from cache) takes
> > as high as 1200ms (QTime), while the policies search remains very fast
> > (50ms).
> >
> > 4. So we decided to run the force optimize command on the customers
> > collection (
> > https://localhost:8983/edm/customers/update?optimize=true&numSegments=1&waitFlush=false).
> > Surprisingly, after optimization the search on the customers collection
> > for any keyword becomes very fast again (less than 50ms). BUT strangely,
> > the search in the policies collection becomes very slow (around 1200ms)
> > without any changes to the policies collection.
> >
> > 5. Based on the above result, we decided to run the force optimize
> > command on the policies collection (
> > https://localhost:8983/edm/policies/update?optimize=true&numSegments=1&waitFlush=false).
> > More surprisingly, after optimization the search on the policies
> > collection for any keyword becomes very fast again (less than 50ms). BUT
> > more strangely, the search in the customers collection again becomes very
> > slow (around 1200ms) without any changes to the customers collection.
> >
> > What a strange and unexpected behavior! If this is not a bug, how could
> > you explain the above very strange behavior in Solr 7.5? Could it be a
> > bug?
> >
> > We would appreciate any support or help with our situation.
> >
> > Thank you.
> >
> > Regards,
> > Edwin
> >
> > On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> > wrote:
> >
> >> Hi Shawn,
> >>
> >>> If the two collections have data on the same server(s), I can see this
> >>> happening. More memory is consumed when there is additional data, and
> >>> when Solr needs more memory, performance might be affected. The
> >>> solution is generally to install more memory in the server.
> >>
> >> I have found that even after we delete the index in collection2, the
> >> query QTime for collection1 still remains slow. It does not go back to
> >> its previous fast speed before we indexed collection2.
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> >> wrote:
> >>
> >>> Hi Shawn,
> >>>
> >>> Thanks for your reply.
> >>>
> >>> The log only shows a list of the following, and I don't see any other
> >>> logs besides these.
> >>>
> >>> 2019-01-24 02:47:57.925 INFO (qtp2131952342-1330) [c:collectioin1
> >>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
> >>> id=13245417
> >>> 2019-01-24 02:47:57.957 INFO (qtp2131952342-1330) [c:collectioin1
> >>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
> >>> id=13245430
> >>> 2019-01-24 02:47:57.957 INFO (qtp2131952342-1330) [c:collectioin1
> >>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
> >>> id=13245435
> >>>
> >>> There is no change to the segments info, but the slowdown in the first
> >>> collection is very drastic.
> >>> Before the indexing of collection2, the collection1 query QTime was in
> >>> the range of 4 to 50 ms. However, after indexing collection2, the
> >>> collection1 query QTime increases to more than 1000 ms. The indexing is
> >>> done from CSV data, and the size of the index is 3GB.
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>>
> >>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey <apa...@elyograg.org> wrote:
> >>>
> >>>> On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
> >>>>> I am using Solr 7.5.0, and currently I am facing an issue where, when
> >>>>> I am indexing in collection2, the indexing affects the records in
> >>>>> collection1. Although the records are still intact, it seems that the
> >>>>> settings of the termVectors get wiped out, and the index size of
> >>>>> collection1 reduced from 3.3GB to 2.1GB after I do the indexing in
> >>>>> collection2.
> >>>>
> >>>> This should not be possible. Indexing in one collection should have
> >>>> absolutely no effect on another collection.
> >>>>
> >>>> If logging has been left at its default settings, the solr.log file
> >>>> should have enough info to show what actually happened.
> >>>>
> >>>>> Also, the search in collection1, which was originally very fast,
> >>>>> becomes very slow after the indexing is done in collection2.
> >>>>
> >>>> If the two collections have data on the same server(s), I can see this
> >>>> happening. More memory is consumed when there is additional data, and
> >>>> when Solr needs more memory, performance might be affected. The
> >>>> solution is generally to install more memory in the server. If the
> >>>> system is working, there should be no need to increase the heap size
> >>>> when the memory size increases ... but there can be situations where
> >>>> the heap is a little bit too small, where you WOULD want to increase
> >>>> the heap size.
> >>>>
> >>>> Thanks,
> >>>> Shawn