bq: ...but the collection wasn't emptied first....

This is what I'd suspect is the problem. Here's the issue: Segments
aren't merged identically on all replicas. So at some point you had
this field indexed without docValues, changed that and re-indexed. But
the segment merging could "read" the first segment it's going to merge
and think it knows about docValues for that field, when in fact that
segment had the old (non-DV) definition.

This would not necessarily be the same on all replicas even on the _same_ shard.

This can propagate through all following segment merges IIUC.
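
If you want to check whether the replicas really disagree, something like the sketch below might help. It queries one core at a time with distrib=false and compares the default fc counts against facet.method=enum. The core URLs are placeholders for your own, the helper names are made up for illustration, and it assumes the default wt=json facet layout of a flat [term, count, ...] list.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def facet_url(core_url, field, method, filters=()):
    """Build a facet query that stays on one core (distrib=false)."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", method),  # "fc" or "enum"
        ("distrib", "false"),      # do not fan out to other replicas
        ("wt", "json"),
    ]
    params += [("fq", fq) for fq in filters]
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def fetch(url):
    """Fetch a URL and parse the JSON body (needs a reachable Solr)."""
    with urlopen(url) as resp:
        return json.load(resp)


def facet_counts(response, field):
    """Turn the flat [term, count, term, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[::2], flat[1::2]))


def differing_terms(fc_response, enum_response, field):
    """Return the terms whose counts differ between the two methods."""
    fc = facet_counts(fc_response, field)
    en = facet_counts(enum_response, field)
    return sorted(t for t in set(fc) | set(en) if fc.get(t, 0) != en.get(t, 0))


def check_core(core_url, field):
    """Compare fc and enum on a single core; an empty list means they agree."""
    fc = fetch(facet_url(core_url, field, "fc"))
    en = fetch(facet_url(core_url, field, "enum"))
    return differing_terms(fc, en, field)
```

Running check_core() against each replica's core URL for the bad shard (with "market" as the field) should show whether fc and enum disagree on some replicas and not others.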

So my bet is that if you index into a new collection, everything will
be fine. You can also just delete everything first, but I usually
prefer a new collection so I'm absolutely and positively sure that the
above can't happen.
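
For the new-collection route, here is a rough sketch that only builds the Collections API URLs (CREATE, plus an optional CREATEALIAS so clients can keep using the old name once the reindex is done). The alias step is my own suggestion rather than something you need. The host, collection, and configset names are all placeholders, and the shard and replica counts should match whatever you run today.

```python
from urllib.parse import urlencode


def collections_api_url(solr_base, action, **params):
    """Build a Collections API URL; parameters are sorted for stable output."""
    query = [("action", action)] + sorted(params.items()) + [("wt", "json")]
    return solr_base.rstrip("/") + "/admin/collections?" + urlencode(query)


def create_collection_url(solr_base, name, config_name, num_shards, replication_factor):
    """URL that creates a fresh, empty collection from an existing configset."""
    return collections_api_url(
        solr_base,
        "CREATE",
        **{
            "name": name,
            "collection.configName": config_name,
            "numShards": num_shards,
            "replicationFactor": replication_factor,
        }
    )


def create_alias_url(solr_base, alias, collection):
    """URL that points an alias at the new collection after reindexing."""
    return collections_api_url(solr_base, "CREATEALIAS", name=alias, collections=collection)


if __name__ == "__main__":
    # Placeholder values; substitute your own before requesting these URLs.
    base = "http://localhost:8983/solr"
    print(create_collection_url(base, "market_v2", "market_conf", 2, 2))
    print(create_alias_url(base, "market", "market_v2"))
```

Since the new collection starts with no segments at all, every segment in it is written with the docValues definition, so there are no old non-DV segments for a merge to pick up.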

Best,
Erick

On Wed, Oct 11, 2017 at 12:51 PM, Chris Ulicny <culicny@iq.media> wrote:
> Hi,
>
> We've run into a strange issue with our deployment of solrcloud 6.3.0.
> Essentially, a standard facet query on a string field usually comes back
> empty when it shouldn't. However, every now and again the query actually
> returns the correct values. This is only affecting a single shard in our
> setup.
>
> The behavior pattern generally looks like the query works properly when it
> hasn't been run recently, and then returns nothing after the query seems to
> have been cached (< 50ms QTime). Wait a while and you get the correct
> result followed by blanks. It doesn't matter which replica of the shard is
> queried; the results are the same.
>
> The general query in question looks like
> /select?q=*:*&facet=true&facet.field=market&rows=0&fq=<some filters>
>
> The field is defined in the schema as <field name="market" type="string"
> docValues="true"/>
>
> There are numerous other fields defined similarly, and they do not exhibit
> the same behavior when used as the facet.field value. They consistently
> return the right results on the shard in question.
>
> If we add facet.method=enum to the query, we get the correct results every
> time (though slower). So our assumption is that something is sporadically
> working when the fc method is chosen by default.
>
> A few other notes about the collection. This collection is not freshly
> indexed, but has not had any particularly bad failures beyond follower
> replicas going down due to PKIAuthentication timeouts (has been fixed). It
> has also had a full reindex after a schema change added docValues to some
> fields (including the one above), but the collection wasn't emptied first.
> We are using the composite router to co-locate documents.
>
> Currently, our plan is just to reindex all of the documents on the affected
> shard to see if that fixes the problem. Any ideas on what might be
> happening or ways to troubleshoot this are appreciated.
>
> Thanks,
> Chris
