Interesting! If turning off lazy field loading helps, I think I have a
trivial patch that may fix this (i.e. without requiring the workaround
of disabling lazy field loading -- which, as you say, makes no sense
to have in effect without the documentCache). The only thing that had
been stopping me from suggesting this patch right off the bat was the
"magic" threshold of 50, which I couldn't explain at all. But maybe
it's less a magic threshold than an arbitrary one, specific to the
data you're evaluating over. I'll open an issue/PR more narrowly
scoped to the change. I'd say you could open the issue, except I still
don't fully understand the connection between the change I'm
considering and the behavior you're seeing -- just that they seem very
likely to be connected. If you're set up to try running a patched
version on your data, I'm curious to know if this will help.
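
For a before/after comparison, here is a minimal Python sketch (not part
of the patch) that rebuilds the rows=50 vs rows=51 requests from this
thread and reads QTime from the JSON response. The base URL, core name
"XXX", and field names are taken from the examples quoted below; the
function names are hypothetical, and measure() assumes a reachable Solr
instance.

```python
import json
import statistics
import urllib.parse
import urllib.request

# Assumed from the examples in this thread; adjust for your setup.
SOLR_BASE = "http://localhost:8983/solr/XXX"


def select_url(value, rows, base=SOLR_BASE):
    """Build a /select URL for q=product_type:"<value>" with the given rows."""
    params = {
        "q": f'product_type:"{value}"',
        "fl": "product_id",
        "wt": "json",
        "start": 0,
        "rows": rows,
    }
    return f"{base}/select?{urllib.parse.urlencode(params)}"


def qtime_ms(response_body):
    """Extract QTime (milliseconds) from a Solr JSON response body."""
    return json.loads(response_body)["responseHeader"]["QTime"]


def measure(value, rows, repeats=20):
    """Return the median QTime over `repeats` requests (needs a running Solr)."""
    times = []
    for _ in range(repeats):
        with urllib.request.urlopen(select_url(value, rows)) as resp:
            times.append(qtime_ms(resp.read()))
    return statistics.median(times)
```

Comparing measure("1", 50) against measure("1", 51) on the unpatched and
patched builds should show whether the gap at rows=50 goes away.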

On Thu, Jun 20, 2024 at 6:16 PM Oleksandr Tkachuk <sasha547...@gmail.com> wrote:
>
> FYI: There is a solution in the last paragraph, but I still ran your
> tests, since the solution was found by trial and error and I have no
> deep understanding of why it works.
>
> >I wonder what would happen if you fully bypassed the query cache (i.e.,
> >`q={!cache=false}product_type:"1"`)?
> It does not help; there is not even a millisecond of difference in
> either case.
>
> >I recall that previously you had a very large number of dynamic fields. Is 
> >that the case here as well? And if so, are the dynamic fields mostly stored? 
> >docValues?
> This is another collection, I’ll get to the one with many many fields later 
> :))
> If this is the ~correct way to count the number of fields, then this
> collection has the following number of fields:
> curl -s "http://localhost:8983/solr/XXX/admin/luke?numTerms=0" | grep
> '"type"' | wc -l
> 121
> Of these, 88 have docvalues enabled and 33 are stored.
>
> As for the two fields used in query, here's how they are defined in the 
> schema.
>   <field name="product_id" type="plong" indexed="true" stored="true"/>
>   <field name="product_type" type="pint" indexed="true" stored="false"/>
>   <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
>   <fieldType name="plong" class="solr.LongPointField" docValues="true"/>
>
> Changing fl= to something like a string field with stored=true and no
> docvalues makes no difference.
> I also tried this simple query on string type fields (copying the
> field) and got the same result. I also tried it on fields with
> different cardinality: the spread was not 150x, but it was still
> often noticeable. In addition, I still do not fully understand the
> logic of this behavior
> ("product_type":["3",1069282,"2",710042,"1",13702]) if I do:
> 1) q=product_type:"1" rows=50 - qtime 150ms
> 2) q=product_type:"1" rows=51 - qtime 0ms
> 3) q=product_type:"2" rows=50 - qtime 3ms
> 4) q=product_type:"2" rows=51 - qtime 0ms
> 5) q=product_type:"3" rows=50 - qtime 1ms
> 6) q=product_type:"3" rows=51 - qtime 0ms
> I checked on other fields and get the same behavior - the fewer
> documents contain a given value, the slower the query becomes.
> If I can provide any more information, I will be glad.
>
> The problem was solved by turning off enableLazyFieldLoading. I am
> very surprised that this functionality continues to work when the
> document cache is disabled; I thought this parameter was intended only
> for it. In addition, we saw an improvement in average and
> 95th-percentile latency on many other types of queries, as well as
> some reduction in CPU load. Are there any consequences or
> disadvantages to this change? If not, then perhaps this problem
> deserves attention.
>
> On Thu, Jun 20, 2024 at 10:13 PM Michael Gibney
> <mich...@michaelgibney.net> wrote:
> >
> > I've been unable to reproduce anything like this behavior. If you're
> > really getting queryResultCache hits for these, then the field
> > type/etc of the field you're querying on shouldn't make a difference.
> > The type/etc of the return field (product_id) would be more likely to
> > matter. I wonder what would happen if you fully bypassed the query
> > cache (i.e., `q={!cache=false}product_type:"1"`)?
> >
> > I recall that previously you had a very large number of dynamic
> > fields. Is that the case here as well? And if so, are the dynamic
> > fields mostly stored? docValues?
> >
> >
> >
> > On Fri, Jun 14, 2024 at 7:29 AM Oleksandr Tkachuk <sasha547...@gmail.com> 
> > wrote:
> > >
> > > Initial data:
> > > Doc count: 1793026
> > > Field: "product_type", point int, indexed true, stored false,
> > > docvalues true. Values:
> > >  "facet_fields":{
> > >       "product_type":["3",1069282,"2",710042,"1",13702]
> > >     },
> > > Single shard, single instance.
> > >
> > > # ./hey_linux_amd64 -n 10000 -c 10 -T "application/json"
> > > 'http://localhost:8983/solr/XXX/select?fl=product_id&wt=json&q=product_type:"1"&start=0&rows=51'
> > > Summary:
> > >   Total:        0.6374 secs
> > >   Slowest:      0.0043 secs
> > >   Fastest:      0.0003 secs
> > >   Average:      0.0006 secs
> > >   Requests/sec: 15688.5755
> > >
> > > # ./hey_linux_amd64 -n 10000 -c 10 -T "application/json"
> > > 'http://localhost:8983/solr/XXX/select?fl=product_id&wt=json&q=product_type:"1"&start=0&rows=50'
> > > Summary:
> > >   Total:        101.3246 secs
> > >   Slowest:      0.2048 secs
> > >   Fastest:      0.0564 secs
> > >   Average:      0.1007 secs
> > >   Requests/sec: 98.6927
> > >
> > >
> > > > 1) I've already played with queryResultWindowSize and
> > > > queryResultMaxDocsCached, setting various high and low values, and
> > > > this is probably not what I'm looking for, since it made a difference
> > > > of less than a few milliseconds in query performance
> > > > 2) Checked on different versions of Solr (9.6.1 and 8.7.0) - no
> > > > significant changes
> > > 3) Tried changing the field type to string - zero performance changes
> > > 4) In both cases I see successful lookups in queryResultCache
> > > 5) Enabling documentCache solves the problem in this case (rows<=50),
> > > but introduces many other performance issues so it doesn't seem like a
> > > viable option.
