Re: Indexing in one collection affect index in another collection

Zheng Lin Edwin Yeo Thu, 24 Jan 2019 06:00:34 -0800

Thanks for your reply.

Below are what you have requested about our Solr setup, configurations
files ,schema and results of debug queries:


Looking forward to your advice and support on our problem.

1. System configurations
OS: Windows 10 Pro 64 bit
System Memory: 32GB
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
Processor(s)
HDD: 3.0 TB (free 2.1 TB)  SATA

2. solrconfig.xml of customers and policies collection, and solr.in,cmd
which can be download from the following link:
https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing

3. The debug queries from both collections

*3.1. Debug Query From Policies ( which is Slow)*

  "debug":{

    "rawquerystring":"sherry",

    "querystring":"sherry",

    "parsedquery":"searchFields_tcs:sherry",

    "parsedquery_toString":"searchFields_tcs:sherry",

    "explain":{

      "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
3097315) [SchemaSimilarity], result of:\n  14.540428 =
score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n      812.0 = docFreq\n      6000000.0 =
docCount\n    1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
5.0 = termFreq=5.0\n      1.2 = parameter k1\n      0.75 = parameter
b\n      19.397041 = avgFieldLength\n      31.0 = fieldLength\n”,..

    "QParser":"LuceneQParser",

    "timing":{

      "time":681.0,

      "prepare":{

        "time":0.0,

        "query":{

          "time":0.0},

        "facet":{

          "time":0.0},

        "facet_module":{

          "time":0.0},

        "mlt":{

          "time":0.0},

        "highlight":{

          "time":0.0},

        "stats":{

          "time":0.0},

        "expand":{

          "time":0.0},

        "terms":{

          "time":0.0},

        "debug":{

          "time":0.0}},

      "process":{

        "time":680.0,

        "query":{

          "time":19.0},

        "facet":{

          "time":0.0},

        "facet_module":{

          "time":0.0},

        "mlt":{

          "time":0.0},

        "highlight":{

          "time":651.0},

        "stats":{

          "time":0.0},

        "expand":{

          "time":0.0},

        "terms":{

          "time":0.0},

        "debug":{

          "time":8.0}},

      "loadFieldValues":{

        "time":12.0}}}}



*3.2. Debug Query From Customers (which is fast because we index it after
indexing Policies):*



  "debug":{

    "rawquerystring":"sherry",

    "querystring":"sherry",

    "parsedquery":"searchFields_tcs:sherry",

    "parsedquery_toString":"searchFields_tcs:sherry",

    "explain":{

      "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in
2453665) [SchemaSimilarity], result of:\n  13.191501 =
score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n    9.08604
= idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
0.5)) from:\n      428.0 = docFreq\n      3784142.0 = docCount\n
1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
b + b * fieldLength / avgFieldLength)) from:\n      3.0 =
termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
 20.22558 = avgFieldLength\n      28.0 = fieldLength\n”, ..

    "QParser":"LuceneQParser",

    "timing":{

      "time":38.0,

      "prepare":{

        "time":1.0,

        "query":{

          "time":1.0},

        "facet":{

          "time":0.0},

        "facet_module":{

          "time":0.0},

        "mlt":{

          "time":0.0},

        "highlight":{

          "time":0.0},

        "stats":{

          "time":0.0},

        "expand":{

          "time":0.0},

        "terms":{

          "time":0.0},

        "debug":{

          "time":0.0}},

      "process":{

        "time":36.0,

        "query":{

          "time":1.0},

        "facet":{

          "time":0.0},

        "facet_module":{

          "time":0.0},

        "mlt":{

          "time":0.0},

        "highlight":{

          "time":31.0},

        "stats":{

          "time":0.0},

        "expand":{

          "time":0.0},

        "terms":{

          "time":0.0},

        "debug":{

          "time":3.0}},

      "loadFieldValues":{

        "time":13.0}}}}



Best Regards,
Edwin

On Thu, 24 Jan 2019 at 20:57, Jan Høydahl <jan....@cominvent.com> wrote:

> It would be useful if you can disclose the machine configuration, OS,
> memory, settings etc, as well as solr config including solr.in <
> http://solr.in/>.sh, solrconfig.xml etc, so we can see the whole picture
> of memory, GC, etc.
> You could also specify debugQuery=true on a slow search and check the
> timings section for clues. What QTime are you seeing on the slow queries in
> solr.log?
> If that does not reveal the reason, I'd connect to your solr instance with
> a tool like jVisualVM or similar, to inspect what takes time. Or better,
> hook up to DataDog, SPM or some other cloud tool to get a full view of the
> system.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 24. jan. 2019 kl. 13:42 skrev Zheng Lin Edwin Yeo <edwinye...@gmail.com
> >:
> >
> > Hi Shawn,
> >
> > Unfortunately your reply of memory may not be valid. Please refer to my
> > explanation below of the strange behaviors (is it much more like a BUG
> than
> > anything else that is explainable):
> >
> > Note that we still have 18GB of free unused memory on the server.
> >
> > 1. We indexed the first collection called customers (3.7 millioin records
> > from CSV data), index size is 2.09GB. The search in customers for any
> > keyword is returned within 50ms (QTime) for using highlight (unified
> > highlighter, posting, light term vectors)
> >
> > 2. Then we indexed the second collection called policies (6 million
> records
> > from CSV data), index size is 2.55GB. The search in policies for any
> > keyword is returned within 50ms (QTime) for using highlight (unified
> > highlighter, posting, light term vectors)
> >
> > 3. But now any search in customers for any keywords (not from cache)
> takes
> > as high as 1200ms (QTime). But still policies search remains very fast
> > (50ms).
> >
> > 4. So we decided to run the force optimize command on customers
> collection (
> >
> https://localhost:8983/edm/customers/update?optimize=true&numSegments=1&waitFlush=false
> ),
> > surprisingly after optimization the search on customers collection for
> any
> > keywords become very fast again (less than 50ms). BUT strangely, the
> search
> > in policies collection become very slow (around 1200ms) without any
> changes
> > to the policies collection.
> >
> > 5. Based on above result, we decided to run the force optimize command on
> > policies collection (
> >
> https://localhost:8983/edm/policies/update?optimize=true&numSegments=1&waitFlush=false
> ).
> > More surprisingly, after optimization the search on policies collection
> for
> > any keywords become very fast again (less than 50ms). BUT more strangely,
> > the search in customers collection again become very slow (around 1200ms)
> > without any changes to the customers collection.
> >
> > What a strange and unexpected behavior! If this is not a bug, how could
> you
> > explain the above very strange behavior in Solr 7.5. Could it be a bug?
> >
> > We would appreciate any support or help on our above situation.
> >
> > Thank you.
> >
> > Regards,
> > Edwin
> >
> > On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> > wrote:
> >
> >> Hi Shawn,
> >>
> >>> If the two collections have data on the same server(s), I can see this
> >>> happening.  More memory is consumed when there is additional data, and
> >>> when Solr needs more memory, performance might be affected.  The
> >>> solution is generally to install more memory in the server.
> >>
> >> I have found that even after we delete the index in collection2, the
> query
> >> QTime for collection1 still remains slow. It does not goes back to its
> >> previous fast speed before we index collection2.
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo <edwinye...@gmail.com
> >
> >> wrote:
> >>
> >>> Hi Shawn,
> >>>
> >>> Thanks for your reply.
> >>>
> >>> The log only shows a list  the following and I don't see any other logs
> >>> besides these.
> >>>
> >>> 2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330) [c:collectioin1
> >>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
> update-script#processAdd:
> >>> id=13245417
> >>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
> >>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
> update-script#processAdd:
> >>> id=13245430
> >>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
> >>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
> update-script#processAdd:
> >>> id=13245435
> >>>
> >>> There is no change to the segments info. but the slowdown in the first
> >>> collection is very drastic.
> >>> Before the indexing of collection2, the collection1 query QTime are in
> >>> the range of 4 to 50 ms. However, after indexing collection2, the
> >>> collection1 query QTime increases to more than 1000 ms. The index are
> done
> >>> in CSV format, and the size of the index is 3GB.
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>>
> >>>
> >>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey <apa...@elyograg.org>
> wrote:
> >>>
> >>>> On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
> >>>>> I am using Solr 7.5.0, and currently I am facing an issue of when I
> am
> >>>>> indexing in collection2, the indexing affects the records in
> >>>> collection1.
> >>>>> Although the records are still intact, it seems that the settings of
> >>>> the
> >>>>> termVecotrs get wipe out, and the index size of collection1 reduced
> >>>> from
> >>>>> 3.3GB to 2.1GB after I do the indexing in collection2.
> >>>>
> >>>> This should not be possible.  Indexing in one collection should have
> >>>> absolutely no effect on another collection.
> >>>>
> >>>> If logging has been left at its default settings, the solr.log file
> >>>> should have enough info to show what actually happened.
> >>>>
> >>>>> Also, the search in
> >>>>> collection1, which was originall very fast, becomes very slow after
> the
> >>>>> indexing is done is collection2.
> >>>>
> >>>> If the two collections have data on the same server(s), I can see this
> >>>> happening.  More memory is consumed when there is additional data, and
> >>>> when Solr needs more memory, performance might be affected.  The
> >>>> solution is generally to install more memory in the server.  If the
> >>>> system is working, there should be no need to increase the heap size
> >>>> when the memory size increases ... but there can be situations where
> the
> >>>> heap is a little bit too small, where you WOULD want to increase the
> >>>> heap size.
> >>>>
> >>>> Thanks,
> >>>> Shawn
> >>>>
> >>>>
>
>

Re: Indexing in one collection affect index in another collection

Reply via email to