That begins to sound like it should have a JIRA. A coordinator node should
probably be forwarding the request without any sort of interference.
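
For anyone who wants to reproduce the difference outside the test suite, here is a minimal sketch. It posts the join query from further down the thread to the JSON Request API (`/solr/<collection>/query`) of one node at a time. The host names and the collection name `main` are placeholders I made up; the thread does not give them, so adjust them to the actual cluster.

```python
import json
import urllib.error
import urllib.request

# Join query from the thread, in Solr's JSON Query DSL.
JOIN_QUERY = {
    "filter": [{"join": {
        "query": {"lucene": {"query": "\"test\"", "df": "value_s"}},
        "from": "id",
        "to": "to_s",
        "fromIndex": "fromData",
    }}],
    "offset": 0,
    "query": "*:*",
    "limit": 1,
    "params": {"TZ": "GMT+01:00", "timeAllowed": 1800000},
    "fields": ["id"],
}


def build_request(base_url, collection):
    """Build (but do not send) a POST to the collection's /query handler."""
    url = f"{base_url.rstrip('/')}/solr/{collection}/query"
    body = json.dumps(JOIN_QUERY).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run(base_url, collection):
    """Send the query; return (HTTP status, raw response text)."""
    req = build_request(base_url, collection)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Solr reports the join failure as HTTP 400 with a JSON error body.
        return err.code, err.read().decode("utf-8", "replace")


# Hypothetical usage: compare a coordinator node with a data node.
# for node in ("http://coordinator-1:8983", "http://data-1:8983"):
#     print(node, *run(node, "main"))
```

If the coordinator returns 400 with the "SolrCloud join" message while a data node returns 200 for the same body, that matches what is described below.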

On Wed, Mar 4, 2026 at 7:05 AM Endika Posadas <[email protected]> wrote:

> There seems to be a difference: https://github.com/apache/solr/pull/4186
> I have modified the tests by creating a dedicated coordinator node; they
> fail when I target the coordinator but succeed when I target the data
> nodes. I'll continue on GitHub.
>
> Thanks
>
> On Tue, 3 Mar 2026 at 22:11, Mikhail Khludnev <[email protected]> wrote:
>
> > I tried to reproduce the join on the coordinator node, and the test passed:
> > https://github.com/apache/solr/pull/4184/changes
> > I suggest double-checking the cluster setup and how the coordinator node is used:
> > https://solr.apache.org/guide/solr/latest/deployment-guide/node-roles.html#the-work-flow-in-a-coordinator-node
> > Once again, the exception above might only occur on the data node hosting
> > the "to" side, where the query parser is actually executed.
> >
> > On Tue, Mar 3, 2026 at 8:00 PM Endika Posadas <[email protected]>
> > wrote:
> >
> > > Sorry, I'll add more context. The main collection is a sharded collection
> > > with over ten shards, where each shard has 2 replicas. The from
> > > collection (fromData) has a single shard and one replica on each of the
> > > Solr nodes.
> > > The query I send is a JSON query that looks like this:
> > >
> > > {
> > >   "filter":[{"join":{
> > >         "query":{"lucene":{
> > >             "query":"\"test\"",
> > >             "df":"value_s"}},
> > >         "from":"id",
> > >         "to":"to_s",
> > >         "fromIndex":"fromData"}}
> > >     ],
> > >   "offset":0,
> > >   "query":"*:*",
> > >   "limit":1,
> > >   "params":{
> > >     "TZ":"GMT+01:00",
> > >     "timeAllowed":1800000},
> > >   "fields":["id"]
> > > }
> > >
> > > It works perfectly fine when sent to any random Solr node, but it
> > > fails when it is sent through the coordinator. Every other query that
> > > doesn't have a join works fine, or at least I haven't found any other
> > > problems.
> > >
> > > Thanks
> > >
> > > On Tue, 3 Mar 2026 at 17:38, Mikhail Khludnev <[email protected]> wrote:
> > >
> > > > Hello,
> > > > I'm in doubt. I assume you are using
> > > > https://solr.apache.org/guide/solr/latest/query-guide/join-query-parser.html#joining-multiple-shard-collections
> > > > Please confirm.
> > > > There's no explicit coordinator test for shard joins here:
> > > > https://github.com/apache/solr/blob/main/solr/core/src/test/org/apache/solr/search/join/ShardToShardJoinAbstract.java#L58
> > > > But it creates 5 nodes for 3-shard collections, and I believe it picks a
> > > > coordinator randomly. So we may expect it to work.
> > > > Then, the error you provided might occur on the "to" node when it doesn't
> > > > find the expected co-located shard.
> > > > I'm afraid we need to check shard alignment across the cluster and the
> > > > detailed request log across nodes: what exactly happened on the
> > > > coordinator and subordinate nodes.
> > > > Regarding shard allocation: even if there's a node with shard1 of the "to"
> > > > collection co-located with "from" shard1, nothing will stop the
> > > > coordinator from attempting to search "to" shard1 on another node where
> > > > "from" shard1 is absent, and getting an error like this.
> > > >
> > > > On Tue, Mar 3, 2026 at 6:02 PM Endika Posadas <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We're running dedicated coordinator nodes for query performance, with
> > > > > collections that are properly co-located across data nodes.
> > > > >
> > > > >
> > > > > When sending a join query (fromIndex pointing to a co-located collection)
> > > > > through the coordinator, we get an error:
> > > > >
> > > > > "error":{
> > > > >     "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
> > > > >     "msg":"SolrCloud join: To join with a collection that might not be co-located, use method=crossCollection.",
> > > > >     "code":400
> > > > >   }
> > > > >
> > > > >
> > > > > The same query works fine when sent directly to a data node.
> > > > >
> > > > > It seems like the coordinator is trying to resolve the join instead of
> > > > > delegating it to the data nodes. Is there a workaround for this?
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > > >
> > > > --
> > > > Sincerely yours
> > > > Mikhail Khludnev
> > > >
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
