Have you looked at using one of the update processors? 

Consider StatelessScriptUpdateProcessorFactory, for instance. You can do
pretty much anything you’d like in a script (JavaScript, Groovy, Python I
think, and others). See ./example/files/conf/update-script.js for one example.

You configure it in solrconfig.xml as part of the update processor chain
that your update handler uses, then put the script in your conf directory
and push it to ZK, and the rest is automagical.

There are a bunch of other update processors you can use that are also set
up pretty much purely by configuration, but the one I referenced is the most
general-purpose.

In your situation, add this _before_ the update is distributed and instead of
coreB, ask for collectionB.

Distributed updates go like this:
1. the doc gets routed to a leader for a shard
2. the doc gets forwarded to each replica.

Now, depending on where you put the update processor, this happens either
_before_ the docs are sent to the rest of the replicas or _after_ the docs
arrive at each replica. (You’ll have to dig a bit: much of this distribution
logic is implicit, but you can define it explicitly in solrconfig.xml.) From
what you’ve described, you want to do this before distribution so all copies
have the new field. You don’t care which replica is the leader. You don’t
care how many other replicas exist or where they are. You don’t even care
whether there’s any replica hosting this particular collection on the node
that does this; it happens before distribution.
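
To make the before/after difference concrete, here is a toy model in plain
Java. It is only a sketch: none of these class or method names are Solr
classes, the "popularity" value is a placeholder, and the replica count is
made up. It just counts how often the enrichment step runs in each placement.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these names are Solr classes. */
final class DistributionOrderSketch {

    /** Counts how many times the (hypothetical) enrichment step runs. */
    static int lookups = 0;

    /** Stand-in for "fetch the popularity score and add it to the doc". */
    static Map<String, Object> enrich(Map<String, Object> doc) {
        lookups++;
        Map<String, Object> copy = new HashMap<>(doc);
        copy.put("popularity", 42); // placeholder value, not real data
        return copy;
    }

    /** Enrich once, then hand a copy of the enriched doc to every replica. */
    static List<Map<String, Object>> enrichBeforeDistribution(Map<String, Object> doc, int replicas) {
        Map<String, Object> enriched = enrich(doc);
        List<Map<String, Object>> received = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            received.add(new HashMap<>(enriched));
        }
        return received;
    }

    /** Forward the raw doc, then let every replica enrich its own copy. */
    static List<Map<String, Object>> enrichAfterDistribution(Map<String, Object> doc, int replicas) {
        List<Map<String, Object>> received = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            received.add(enrich(new HashMap<>(doc)));
        }
        return received;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("id", "doc1");

        lookups = 0;
        enrichBeforeDistribution(doc, 3);
        System.out.println("before distribution: " + lookups + " lookup(s)");

        lookups = 0;
        enrichAfterDistribution(doc, 3);
        System.out.println("after distribution: " + lookups + " lookup(s)");
    }
}
```

With three replicas the first placement does one lookup and the second does
three, which is the main reason to do the enrichment before distribution.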

Next, you want to get the value from “coreB”. Don’t do that; get it from
_collection_ B. Since you have the doc ID (presumably the <uniqueKey>),
using get-by-id instead of a standard query will be very efficient. I can
imagine that under very heavy load this might introduce too much overhead,
but it’s where I’d start.
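
As a rough illustration of get-by-id against a collection rather than a
core, here is a sketch that builds a real-time get URL and fetches it with
the JDK HTTP client. The base URL, the name collectionB and the class and
method names are assumptions for illustration, and it presumes the /get
handler is available on the collection. Inside a plugin you would more
likely go through a SolrJ client; plain HTTP is used here only to keep the
example self-contained.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Sketch only: class and method names here are hypothetical. */
final class RealTimeGetSketch {

    /** Builds a real-time get URL addressed to a collection, not a core. */
    static String realTimeGetUrl(String solrBaseUrl, String collection, String id) {
        // The id is URL-encoded so characters such as '!' or spaces are safe.
        return solrBaseUrl + "/" + collection + "/get?id="
                + URLEncoder.encode(id, StandardCharsets.UTF_8);
    }

    /** Sends the request and returns the raw response body as a string. */
    static String fetchById(String solrBaseUrl, String collection, String id) throws Exception {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(realTimeGetUrl(solrBaseUrl, collection, id)))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the URL; call fetchById against a running Solr to get the doc.
        System.out.println(realTimeGetUrl("http://localhost:8983/solr", "collectionB", "doc1"));
    }
}
```

Because the request names the collection, Solr works out which shard holds
the ID, so the caller never has to know a core name like
collection_shard1_replica_n21.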

Best,
Erick

> On Aug 29, 2019, at 1:45 PM, Arnold Bronley <arnoldbron...@gmail.com> wrote:
> 
> I can't use  CloudSolrClient  because I need to intercept the incoming
> indexing request and then add one more field to it. All this happens on
> Solr side and not client side.
> 
> On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini <a.gazzar...@sease.io>
> wrote:
> 
>> Hi Arnold,
>> why don't you use solrj (in this case a CloudSolrClient) instead of dealing
>> with such low-level details? The actual location of the document you are
>> looking for would be completely abstracted.
>> 
>> Best,
>> Andrea
>> 
>> On Thu, 29 Aug 2019, 18:50 Arnold Bronley, <arnoldbron...@gmail.com>
>> wrote:
>> 
>>> So, here is the problem that I am trying to solve. I am moving from Solr
>>> master-slave architecture to SolrCloud architecture. I have one custom
>> Solr
>>> plugin that does following:
>>> 
>>> 1. When a document (say document with unique id doc1)is getting indexed
>> to
>>> a core say core A then this plugin adds one more field to the indexing
>>> request. It fetches this new field from core B. Core B in our case
>>> maintains popularity score field for each document which gets calculated
>> in
>>> a different project. It fetches the popularity score from score B for
>> doc1
>>> and adds it to indexing request.
>>> 2. In following code, dataInfo.dataSource is the name of the core B.
>>> 
>>> I can use the name of the core B like collection_shard1_replica_n21 and
>> it
>>> works. But it is not a good solution. What if I had a multiple shards for
>>> core B? In that case the the doc1 that I am trying to find might not be
>>> present in collection_shard1_replica_n21.
>>> 
>>> So is there something like,
>>> 
>>> SolrCollecton dataCollection = getCollection(dataInfo.dataSource);
>>> 
>>> @Override
>>> public void processAdd(AddUpdateCommand cmd) throws IOException {
>>>   SolrInputDocument doc = cmd.getSolrInputDocument();
>>>   String uniqueId = getUniqueId(doc);
>>> 
>>>   SolrCore dataCore =
>>> req.getCore().getCoreContainer().getCore(dataInfo.dataSource);
>>> 
>>>   if (dataCore == null){
>>>       LOG.error("Solr core '{}' to use as data source could not be
>>> found!  "
>>>               + "Please check if it is loaded.", dataInfo.dataSource);
>>>   } else{
>>> 
>>>          Document sourceDoc = getSourceDocument(dataCore, uniqueId);
>>> 
>>>          if (sourceDoc != null){
>>> 
>>>              populateDocToBeAddedFromSourceDoc(doc,sourceDoc);
>>>          }
>>>   }
>>> 
>>>   // pass it up the chain
>>>   super.processAdd(cmd);
>>> }
>>> 
>>> 
>>> On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>> 
>>>> No, you cannot just use the collection name. Replicas are just cores.
>>>> You can host many replicas of a single collection on a single Solr node
>>>> in a single CoreContainer (there’s only one per Solr JVM). If you just
>>>> specified a collection name how would the code have any clue which
>>>> of the possibilities to return?
>>>> 
>>>> The name is in the form collection_shard1_replica_n21
>>>> 
>>>> How do you know where the doc you’re working on? Put the ID through
>>>> the hashing mechanism.
>>>> 
>>>> This isn’t the same at all if you’re running stand-alone, then there’s
>>> only
>>>> one name.
>>>> 
>>>> But as I indicated above, your ask for just using the collection name
>>> isn’t
>>>> going to work by definition.
>>>> 
>>>> So perhaps this is an XY problem. You’re asking about getCore, which is
>>>> a very specific, low-level concept. What are you trying to do at a
>> higher
>>>> level? Why do you think you need to get a core? What do you want to
>> _do_
>>>> with the doc that you need the core it resides in?
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>>> On Aug 28, 2019, at 5:28 PM, Arnold Bronley <arnoldbron...@gmail.com
>>> 
>>>> wrote:
>>>>> 
>>>>> Wait, would I need to use core name like
>> collection1_shard1_replica_n4
>>>>> etc/? Can't I use collection name? What if  I have multiple shards,
>> how
>>>>> would I know where does the document that I am working with lives in
>>>>> currently.
>>>>> I would rather prefer to use collection name and expect the core
>>>>> information to be abstracted out that way.
>>>>> 
>>>>> On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson <
>>> erickerick...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hmmm, should work. What is your core_name? There’s strings like
>>>>>> collection1_shard1_replica_n4 and core_node6. Are you sure you’re
>>> using
>>>> the
>>>>>> right one?
>>>>>> 
>>>>>>> On Aug 28, 2019, at 3:56 PM, Arnold Bronley <
>> arnoldbron...@gmail.com
>>>> 
>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> In a custom Solr plugin code,
>>>>>>> req.getCore().getCoreContainer().getCore(core_name) is returning
>> null
>>>>>> even
>>>>>>> if core by name core_name is loaded and up in Solr. req is object
>>>>>>> of SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.
>>>>>>> 
>>>>>>> Any ideas on why this might be the case?
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>> 
