How do you start Solr? The solr.in.cmd you sent does not contain the memory settings. What other parameters do you start Solr with?
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

25. jan. 2019 kl. 15:28 skrev Zheng Lin Edwin Yeo <edwinye...@gmail.com>:

Hi Jan,

We are using 64-bit Java, version 1.8.0_191.
We started Solr with a 6 GB heap size.

Besides Solr, we have ZooKeeper, IIS, Google Chrome and Notepad++ running
on the machine. There is still 22 GB of memory left on the server, out of
the 32 GB available on the machine.

Regards,
Edwin

On Fri, 25 Jan 2019 at 21:04, Jan Høydahl <jan....@cominvent.com> wrote:

Which Java version? 32 or 64 bit? Do you start Solr with the default 512 MB heap size?
Is any other software running on the machine?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

25. jan. 2019 kl. 13:05 skrev Zheng Lin Edwin Yeo <edwinye...@gmail.com>:

Hi Jan and Shawn,

For your info, this is another debug query.

"debug":{
  "rawquerystring":"johnny",
  "querystring":"johnny",
  "parsedquery":"searchFields_tcs:johnny",
  "parsedquery_toString":"searchFields_tcs:johnny",
  "explain":{
    "192280":"\n12.8497505 = weight(searchFields_tcs:johnny in 75730) [SchemaSimilarity], result of:\n  12.8497505 = score(doc=75730,freq=4.0 = termFreq=4.0\n), product of:\n    7.5108404 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      473.0 = docFreq\n      865438.0 = docCount\n    1.7108272 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      4.0 = termFreq=4.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      26.66791 = avgFieldLength\n      25.0 = fieldLength\n"},
  "QParser":"LuceneQParser",
  "timing":{
    "time":350.0,
    "prepare":{
      "time":0.0,
      "query":{"time":0.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":0.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":0.0}},
    "process":{
      "time":348.0,
      "query":{"time":287.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":54.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":6.0}},
    "loadFieldValues":{"time":0.0}}}

Regards,
Edwin

On Fri, 25 Jan 2019 at 19:52, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

Hi Jan and Shawn,

Please focus on the strange issue that I have described above in more detail. A summary follows:

1. Index the customers data; queries from the highlight, select, and all other handlers are very fast (less than 50 ms).

2. Now index the policies data; queries on policies are very fast, BUT queries on customers become slow.

3. If I reindex the customers data, then queries on customers are again very fast, BUT queries on policies become slow.

How can you explain this behavior?

We have never experienced such strange behavior before Solr 7.

Regards,
Edwin

On Fri, 25 Jan 2019 at 17:06, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

Hi Jan,

Referring to what you mentioned about the highlighting taking up most of the time in the first query from the policies collection: the highlighting was very fast (less than 50 ms) from the time the collection was indexed until the customers collection was indexed, at which point it slowed down tremendously.
Also, the slowdown does not just affect the highlight requestHandler; it also affects other requestHandlers like select and clustering. All of them get a QTime of more than 500 ms. This is also shown in the latest debug query that we sent earlier, in which we set hl=false and fl=null.

Any idea or explanation for this strange behavior?
Thank you for your support; we look forward to shedding some light on this issue and resolving it.

Regards,
Edwin

On Thu, 24 Jan 2019 at 23:35, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

Hi Jan,

Thanks for your reply.

However, we are still getting a slow QTime of 517 ms even after we set hl=false&fl=null.

Below is the debug query:

"debug":{
  "rawquerystring":"cherry",
  "querystring":"cherry",
  "parsedquery":"searchFields_tcs:cherry",
  "parsedquery_toString":"searchFields_tcs:cherry",
  "explain":{
    "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in 5747763) [SchemaSimilarity], result of:\n  14.227914 = score(doc=5747763,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4798305 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n      25.0 = fieldLength\n",
    "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in 4840794) [SchemaSimilarity], result of:\n  13.937909 = score(doc=4840794,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4496675 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n      27.0 = fieldLength\n"},
  "QParser":"LuceneQParser",
  "timing":{
    "time":517.0,
    "prepare":{
      "time":0.0,
      "query":{"time":0.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":0.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":0.0}},
    "process":{
      "time":516.0,
      "query":{"time":15.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":0.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":500.0}}}}

Regards,
Edwin

On Thu, 24 Jan 2019 at 22:43, Jan Høydahl <jan....@cominvent.com> wrote:

It looks like highlighting takes most of the time on the first query (680 ms). Your config seems to ask for a lot of highlighting here, like 100 snippets of max 100000 characters, etc.
This sounds to me like a highlighting configuration problem. Try disabling highlighting (hl=false) and see if you get your speed back.
Also, I see fl=* in your config, which asks for all fields. Are you sure you want that? That may also be slow. Try asking for just the fields you will be using.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

24. jan. 2019 kl. 14:59 skrev Zheng Lin Edwin Yeo <edwinye...@gmail.com>:

Thanks for your reply.

Below is what you requested about our Solr setup: configuration files, schema, and the results of the debug queries.

Looking forward to your advice and support on our problem.

1. System configuration
OS: Windows 10 Pro 64 bit
System Memory: 32 GB
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical Processor(s)
HDD: 3.0 TB (free 2.1 TB) SATA

2. solrconfig.xml of the customers and policies collections, and solr.in.cmd, which can be downloaded from the following link:
https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing

3. The debug queries from both collections

3.1. Debug query from policies (which is slow):

"debug":{
  "rawquerystring":"sherry",
  "querystring":"sherry",
  "parsedquery":"searchFields_tcs:sherry",
  "parsedquery_toString":"searchFields_tcs:sherry",
  "explain":{
    "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in 3097315) [SchemaSimilarity], result of:\n  14.540428 = score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n    8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      812.0 = docFreq\n      6000000.0 = docCount\n    1.6324438 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      5.0 = termFreq=5.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n      31.0 = fieldLength\n", ...},
  "QParser":"LuceneQParser",
  "timing":{
    "time":681.0,
    "prepare":{
      "time":0.0,
      "query":{"time":0.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":0.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":0.0}},
    "process":{
      "time":680.0,
      "query":{"time":19.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":651.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":8.0}},
    "loadFieldValues":{"time":12.0}}}

3.2. Debug query from customers (which is fast because we indexed it after indexing policies):

"debug":{
  "rawquerystring":"sherry",
  "querystring":"sherry",
  "parsedquery":"searchFields_tcs:sherry",
  "parsedquery_toString":"searchFields_tcs:sherry",
  "explain":{
    "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in 2453665) [SchemaSimilarity], result of:\n  13.191501 = score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n    9.08604 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      428.0 = docFreq\n      3784142.0 = docCount\n    1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      20.22558 = avgFieldLength\n      28.0 = fieldLength\n", ...},
  "QParser":"LuceneQParser",
  "timing":{
    "time":38.0,
    "prepare":{
      "time":1.0,
      "query":{"time":1.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":0.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":0.0}},
    "process":{
      "time":36.0,
      "query":{"time":1.0},
      "facet":{"time":0.0},
      "facet_module":{"time":0.0},
      "mlt":{"time":0.0},
      "highlight":{"time":31.0},
      "stats":{"time":0.0},
      "expand":{"time":0.0},
      "terms":{"time":0.0},
      "debug":{"time":3.0}},
    "loadFieldValues":{"time":13.0}}}

Best Regards,
Edwin

On Thu, 24 Jan 2019 at 20:57, Jan Høydahl <jan....@cominvent.com> wrote:

It would be useful if you could disclose the machine configuration, OS, memory, settings, etc., as well as the Solr config, including solr.in.sh, solrconfig.xml, etc., so we can see the whole picture of memory, GC, etc.
You could also specify debugQuery=true on a slow search and check the timings section for clues.
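To make that suggestion concrete, a diagnostic request combining debugQuery=true (to get the timings section) with hl=false and a narrow fl might be built like this; the /select handler path, host, and query term are assumptions, while the collection name and base URL come from the thread:

```python
from urllib.parse import urlencode

# Sketch of a diagnostic query: debugQuery=true exposes the timings
# section, hl=false rules out highlighting, and a narrow fl avoids
# fetching every stored field (instead of fl=*).
params = {
    "q": "sherry",            # query term taken from the thread
    "debugQuery": "true",
    "hl": "false",
    "fl": "id",               # ask only for the fields you will use
}
url = "https://localhost:8983/edm/policies/select?" + urlencode(params)
print(url)
```

Comparing the resulting timings section against one with hl=true makes it obvious whether the highlight component is responsible for the extra QTime.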
What QTime are you seeing on the slow queries in solr.log?
If that does not reveal the reason, I'd connect to your Solr instance with a tool like jVisualVM or similar to inspect what takes time. Or, better, hook up DataDog, SPM or some other cloud tool to get a full view of the system.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

24. jan. 2019 kl. 13:42 skrev Zheng Lin Edwin Yeo <edwinye...@gmail.com>:

Hi Shawn,

Unfortunately your reply about memory may not be valid. Please refer to my explanation below of the strange behavior (it looks much more like a BUG than anything else that is explainable).

Note that we still have 18 GB of free, unused memory on the server.

1. We indexed the first collection, called customers (3.7 million records from CSV data); the index size is 2.09 GB. A search in customers for any keyword returns within 50 ms (QTime) when using highlighting (unified highlighter, postings, light term vectors).

2. Then we indexed the second collection, called policies (6 million records from CSV data); the index size is 2.55 GB. A search in policies for any keyword returns within 50 ms (QTime) when using highlighting (unified highlighter, postings, light term vectors).

3. But now any search in customers for any keyword (not from cache) takes as much as 1200 ms (QTime), while policies searches remain very fast (50 ms).

4. So we decided to run the force-optimize command on the customers collection (https://localhost:8983/edm/customers/update?optimize=true&numSegments=1&waitFlush=false). Surprisingly, after optimization, searches on the customers collection for any keyword became very fast again (less than 50 ms). BUT, strangely, searches in the policies collection became very slow (around 1200 ms), without any changes to the policies collection.

5. Based on the above result, we decided to run the force-optimize command on the policies collection (https://localhost:8983/edm/policies/update?optimize=true&numSegments=1&waitFlush=false). More surprisingly, after optimization, searches on the policies collection for any keyword became very fast again (less than 50 ms). BUT, even more strangely, searches in the customers collection again became very slow (around 1200 ms), without any changes to the customers collection.

What strange and unexpected behavior! If this is not a bug, how could you explain the above behavior in Solr 7.5? Could it be a bug?

We would appreciate any support or help with the above situation.

Thank you.

Regards,
Edwin

On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

Hi Shawn,

> If the two collections have data on the same server(s), I can see this
> happening. More memory is consumed when there is additional data, and
> when Solr needs more memory, performance might be affected. The
> solution is generally to install more memory in the server.

I have found that even after we delete the index in collection2, the query QTime for collection1 still remains slow. It does not go back to its previous fast speed from before we indexed collection2.

Regards,
Edwin

On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

Hi Shawn,

Thanks for your reply.

The log only shows a list of the following, and I don't see any other logs besides these:

2019-01-24 02:47:57.925 INFO (qtp2131952342-1330) [c:collectioin1 s:shard1 r:core_node4 x:collection1_shard1_replica_n2] o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: id=13245417
2019-01-24 02:47:57.957 INFO (qtp2131952342-1330) [c:collectioin1 s:shard1 r:core_node4 x:collection1_shard1_replica_n2] o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: id=13245430
2019-01-24 02:47:57.957 INFO (qtp2131952342-1330) [c:collectioin1 s:shard1 r:core_node4 x:collection1_shard1_replica_n2] o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: id=13245435

There is no change to the segments info, but the slowdown in the first collection is very drastic.
Before the indexing of collection2, collection1 query QTimes were in the range of 4 to 50 ms. However, after indexing collection2, collection1 query QTimes increased to more than 1000 ms. The indexing is done from CSV, and the size of the index is 3 GB.

Regards,
Edwin

On Thu, 24 Jan 2019 at 01:09, Shawn Heisey <apa...@elyograg.org> wrote:

On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
> I am using Solr 7.5.0, and currently I am facing an issue where, when I am
> indexing in collection2, the indexing affects the records in collection1.
> Although the records are still intact, it seems that the settings of the
> termVectors get wiped out, and the index size of collection1 reduced from
> 3.3GB to 2.1GB after I do the indexing in collection2.

This should not be possible. Indexing in one collection should have absolutely no effect on another collection.

If logging has been left at its default settings, the solr.log file should have enough info to show what actually happened.

> Also, the search in
> collection1, which was originally very fast, becomes very slow after the
> indexing is done in collection2.

If the two collections have data on the same server(s), I can see this happening. More memory is consumed when there is additional data, and when Solr needs more memory, performance might be affected. The solution is generally to install more memory in the server. If the system is working, there should be no need to increase the heap size when the memory size increases ... but there can be situations where the heap is a little bit too small, where you WOULD want to increase the heap size.

Thanks,
Shawn