Hi Erick

Thanks for your quick response and for reminding me about the attachment issue.

Yes, I am running 2 different JVMs, so whether they are on the same
machine or not is irrelevant.

Let me describe my scenario. I started 2 nodes on my laptop in 2 different
JVMs, on ports 8983 and 8984, and I have two collections:

1. movieDirectors: 1 shard, 2 replicas; the leader is on 8984, the other replica is on 8983
2. movies: 2 shards, 1 replica per shard; shard1 is on 8983, shard2 is on 8984

Collection movieDirectors has 2 docs:

  {"id":"1", "title":"Dunkirk", "director_id":"1", "_version_":1642343781358370816},
  {"id":"2", "title":"Get Out", "director_id":"2", "_version_":1642343828930166784}

Collection movies has 2 docs too:

  {"id":"1", "title":"Dunkirk", "director_id":"1", "_version_":1642343781358370816},
  {"id":"2", "title":"Get Out", "director_id":"2", "_version_":1642343828930166784}

Everything is OK when I run a query with the filter "{!join from=id
fromIndex=movieDirectors to=director_id}has_oscar:true" on both 8983 and
8984; I get the expected result:

  {
    "responseHeader":{
      "zkConnected":true,
      "status":0,
      "QTime":79,
      "params":{
        "q":"*:*",
        "fq":"{!join from=id fromIndex=movieDirectors to=director_id}has_oscar:true",
        "_":"1566313944099"}},
    "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {"id":"1", "title":"Dunkirk", "director_id":"1", "_version_":1642343781358370816},
      {"id":"2", "title":"Get Out", "director_id":"2", "_version_":1642343828930166784}]
    }}

But when I run "{!join from=director_id fromIndex=movies
to=id}title:"Dunkirk"" on 8983, I get 1 doc, and if I filter by
"title:Get Out" instead, I get nothing. I understand that part: "Get Out"
does not exist on 8983.

  {
    "responseHeader":{
      "zkConnected":true,
      "status":0,
      "QTime":3,
      "params":{
        "q":"*:*",
        "fq":"{!join from=director_id fromIndex=movies to=id}title:\"Dunkirk\"",
        "_":"1566261450613"}},
    "response":{"numFound":1,"start":0,"docs":[
      {"id":"1", "name":"Christopher Nolan", "has_oscar":true, "_version_":1642343436642156544}]
    }}

The problem is this: when I run the same "{!join from=director_id
fromIndex=movies to=id}title:"Dunkirk"" on 8984, I get "SolrCloud join:
multiple shards not yet supported movies",
no matter what the filter value is.
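
For anyone who wants to reproduce the comparison, here is a minimal sketch that only builds the two request URLs, one per node, with the join filter URL-encoded. It assumes both nodes are reachable as localhost and that the query goes to the movieDirectors collection through the standard /solr/<collection>/select handler; the class and method names are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class JoinQueryUrlDemo {

    // Builds a select URL for one node; host name and handler path are assumptions.
    static String selectUrl(int port, String collection, String fq) {
        return "http://localhost:" + port + "/solr/" + collection + "/select"
            + "?q=" + URLEncoder.encode("*:*", StandardCharsets.UTF_8)
            + "&fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The filter from the message: join from movies into movieDirectors.
        String fq = "{!join from=director_id fromIndex=movies to=id}title:\"Dunkirk\"";
        for (int port : new int[] {8983, 8984}) {
            System.out.println(selectUrl(port, "movieDirectors", fq));
        }
    }
}
```

Sending the first URL corresponds to the request that returned 1 doc, and the second to the request that failed with the "multiple shards" error.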

I found the following code:

private static String findLocalReplicaForFromIndex(ZkController zkController, String fromIndex) {
  String fromReplica = null;

  String nodeName = zkController.getNodeName();
  for (Slice slice : zkController.getClusterState().getCollection(fromIndex).getActiveSlicesArr()) {
    if (fromReplica != null)
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
          "SolrCloud join: multiple shards not yet supported " + fromIndex);
    for (Replica replica : slice.getReplicas()) {
      if (replica.getNodeName().equals(nodeName)) {
        fromReplica = replica.getStr(ZkStateReader.CORE_NAME_PROP);
        // found local replica, but is it Active?
        if (replica.getState() != Replica.State.ACTIVE)
          throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
              "SolrCloud join: "+fromIndex+" has a local replica ("+fromReplica+
                  ") on "+nodeName+", but it is "+replica.getState());

        break;
      }
    }
  }

  if (fromReplica == null)
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "SolrCloud join: No active replicas for "+fromIndex+
            " found in node " + nodeName);

  return fromReplica;
}


When I run the join from movies on 8983, the slice array has length 2
because movies has 2 shards. "fromReplica" is assigned in the second
iteration, because in the first iteration the zkController node name is
8983 while the replica's node name is 8984.

But when I run it on 8984, "fromReplica" is assigned in the first
iteration, because the zkController node name and the replica's node name
are both 8984, so "SolrCloud join: multiple shards not yet supported" is
thrown in the second iteration.

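To make the order dependence concrete, here is a minimal standalone sketch that copies only the control flow of the loop quoted above. It is not Solr code: each slice is modelled as a plain map from node name to core name, the ACTIVE-state check is left out, and the node and core names are hypothetical. The slice hosted on 8984 is listed first, matching the iteration order I observed.

```java
import java.util.List;
import java.util.Map;

final class LocalReplicaLoopDemo {

    // Same control flow as the quoted findLocalReplicaForFromIndex, without the
    // ACTIVE-state check. Each slice maps node name -> core name of the replica
    // hosted on that node (a simplification, not Solr's Slice/Replica types).
    static String findLocalReplica(String nodeName, List<Map<String, String>> slices) {
        String fromReplica = null;
        for (Map<String, String> slice : slices) {
            // Fires only if an EARLIER slice already had a local replica.
            if (fromReplica != null) {
                throw new IllegalStateException("multiple shards not yet supported");
            }
            for (Map.Entry<String, String> replica : slice.entrySet()) {
                if (replica.getKey().equals(nodeName)) {
                    fromReplica = replica.getValue();
                    break;
                }
            }
        }
        if (fromReplica == null) {
            throw new IllegalStateException("no active replicas found in node " + nodeName);
        }
        return fromReplica;
    }

    // Hypothetical layout of "movies": the slice hosted on 8984 is iterated first.
    static List<Map<String, String>> moviesSlices() {
        return List.of(
            Map.of("node8984", "movies_core_on_8984"),
            Map.of("node8983", "movies_core_on_8983"));
    }

    static String describe(String nodeName) {
        try {
            return "returns " + findLocalReplica(nodeName, moviesSlices());
        } catch (IllegalStateException e) {
            return "throws: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("8983 " + describe("node8983"));
        System.out.println("8984 " + describe("node8984"));
    }
}
```

In this sketch the node whose slice comes last passes the check and gets a core name back, while the node whose slice comes first hits the exception, even though the collection has two shards in both cases.
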
Thanks for your patience with this long message. I'm confused about why
"multiple shards" is detected this way, because the result on 8983 is also
wrong even though no exception is thrown. Why not check whether the slice
count is greater than 1 to detect "multiple shards"? Or is there a better way?

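As a rough illustration of the alternative I am asking about, the sketch below rejects the collection as soon as it has more than one slice, before looking at any replica. It is only a standalone model of the idea, not a patch against Solr: slices are plain maps from node name to core name, and all node and core names are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class SliceCountCheckDemo {

    // Each slice maps node name -> core name of the replica hosted on that node.
    static String findLocalReplica(String nodeName, List<Map<String, String>> slices) {
        // Reject multi-shard collections up front, independent of iteration order.
        if (slices.size() > 1) {
            throw new IllegalStateException("multiple shards not yet supported");
        }
        for (Map<String, String> slice : slices) {
            String core = slice.get(nodeName);
            if (core != null) {
                return core;
            }
        }
        throw new IllegalStateException("no active replicas found in node " + nodeName);
    }

    static String describe(String nodeName, List<Map<String, String>> slices) {
        try {
            return "returns " + findLocalReplica(nodeName, slices);
        } catch (IllegalStateException e) {
            return "throws: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Two shards, one replica each (like "movies").
        List<Map<String, String>> movies = List.of(
            Map.of("node8984", "movies_core_on_8984"),
            Map.of("node8983", "movies_core_on_8983"));
        // One shard replicated on both nodes (like "movieDirectors").
        List<Map<String, String>> directors = List.of(
            Map.of("node8983", "directors_core_on_8983",
                   "node8984", "directors_core_on_8984"));

        System.out.println("movies on 8983: " + describe("node8983", movies));
        System.out.println("movies on 8984: " + describe("node8984", movies));
        System.out.println("directors on 8983: " + describe("node8983", directors));
        System.out.println("directors on 8984: " + describe("node8984", directors));
    }
}
```

With this check both nodes give the same answer for the two-shard collection, and the single-shard collection still resolves to the local core on each node.
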
Please advise.

Thanks in advance!

Erick Erickson <erickerick...@gmail.com> wrote on Tue, Aug 20, 2019 at 7:39 PM:

> None of your images came through, the mail server aggressively strips
> attachments. You’ll have to put them somewhere and provide a link.
>
> Given that, I’m guessing without much data so this may be totally
> misguided. You mention ports 8983 and 8984. Assuming those are two
> different Solr JVMs, the fact that they’re running on the same machine is
> irrelevant; As far as SolrCloud is concerned, they are two separate
> machines. Your directors collection must be completely resident on both
> Solr instances for cross-collection join to work.
>
> Best,
> Erick
>
> > On Aug 19, 2019, at 9:39 PM, 王立生 <wanglishen...@gmail.com> wrote:
> >
> > Hello,
> >
> > I have a question about SolrCloud joins. I know a SolrCloud join only
> works when the index is not split into shards, but when I tested it, I
> found a problem that confuses me.
> >
> > i tested it on version 8.2
> >
> > Assume I have 2 collections like the "joining" sample on the official
> Solr website,
> >
> > one collection called "movies", another called "movieDirectors".
> >
> > movies's fields: id, title, director_id
> > movieDirectors's fields: id, name, has_oscar
> >
> > the information of shards and replicas as below, i started two nodes on
> my laptop:
> >
> >  moviesDirectors have 2 docs:
> >
> > movies also have 2 docs:
> >
> > everything is ok when i run query with "{!join from=id
> fromIndex=movieDirectors to=director_id}has_oscar:true" on both 8983 and
> 8984, i can got expacted result:
> >
> > but when i run "{!join from=director_id fromIndex=movies
> to=id}title:"Dunkirk"" on 8983
> > got 1 doc and if i filter by "title:Get Out", i got nothing.  i
> understood "Get Out" is not exist in 8983.
> >
> >
> > but question is coming, when i run "{!join from=director_id
> fromIndex=movies to=id}title:"Dunkirk"" on 8984, i got "SolrCloud join:
> multiple shards not yet supported movies"
> > no matter what filter value is.
> >
> > i found following code:
> >
> >
> > when i run joining from movies on 8983, slice length is 2 as movies have
> 2 shards. "fromReplica " was assigned in second cycle,  because
> zkController name is 8983 and replica name is 8984 in first cycle.
> >
> > but when run on 8984, "fromReplica" was assigned in first cycle, because
> zkController name isand replica name both are 8984 in first cycle, so throw
> "SolrCloud join: multiple shards not yet supported" in second cycle.
> >
> > Thanks for your patience, it's too long. i'm confused about why use this
> way to judge "multiple shards", because the result is also wrong running on
> 8983 even if didnt throw exception. why dont use  slice length>1 to judge
> "multiple shards" ? or maybe have other better way?
> >
> > Please advise.
> >
> > Thanks in advance!
> >
> >
> >
>
>
