Re: Solr Atomic Updates

Erick Erickson Thu, 04 Jun 2015 10:26:57 -0700

NP. It's something of a step when moving to SolrCloud to "let go" of the
details you've had to (painfully) pay attention to, but worth it. The price is,
of course, learning to do things a new way ;)...


Best,
Erick
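
A rough illustration of the detail that gets "let go" here: in SolrCloud the collection, not the client, decides which shard holds a document, based on a hash of the document id, so an atomic update for an id is routed to the same shard as the original document. The block below is only a toy model of that idea. The `shard_for` helper is hypothetical and CRC32 is a stand-in hash chosen for illustration; Solr's real router uses its own hash function and hash ranges.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Toy router: map a document id to a shard index deterministically."""
    # CRC32 is an assumption made for this sketch, not Solr's internal hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# The initial add and a later atomic update for the same id pick the same
# shard, so the client never has to remember or search for the location.
shard_on_add = shard_for("TestDoc1", 4)
shard_on_update = shard_for("TestDoc1", 4)
print(shard_on_add == shard_on_update)
```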

On Thu, Jun 4, 2015 at 10:04 AM, Ксения Баталова <batalova...@gmail.com> wrote:
> Erick,
>
> Thank you so much. It became a bit clearer.
>
> It was decided to upgrade Solr to 5.2 and use SolrCloud in our next release.
>
> I think I'll write here about it again later :)
>
> _ _
>
> Batalova Kseniya
>
>
> I have to ask then why you're not using SolrCloud with multiple shards? It
> seems to me that that gives you the indexing throughput you need (be sure to
> use CloudSolrServer from your client). At 300M complex documents, you
> pretty much certainly will need to shard anyway so in some sense you're
> re-inventing the wheel here.
>
> You can host multiple shards on the same machine, and these _are_ separate
> Solr cores under the covers so your problem with atomic updates disappears.
>
> Although I would consider upgrading to Solr 4.10.3 or even 5.2 (which is being
> voted on even now and should be out in a week or so barring problems).
>
> Best,
> Erick
>
> On Wed, Jun 3, 2015 at 11:04 AM, Ксения Баталова <batalova...@gmail.com>
> wrote:
>> Jack,
>>
>> The decision to use several cores was made to increase indexing and
>> searching performance (experimentally).
>>
>> In my project the index holds about 300-500 million documents (each document
>> has a rather complex structure) and it may grow larger.
>>
>> So, during indexing, the documents are added to different cores by
>> a number of threads.
>>
>> In other words, each thread collects the necessary information for a list of
>> documents and generates a create-documents query to a specific core.
>>
>> At that point it doesn't matter (and it can't be determined) which
>> document will end up in which core.
>>
>> And now it is necessary to update (atomic update) this index.
>>
>> Something like this..
>>
>> _ _
>>
>> Batalova Kseniya
>>
>>
>> Explain a little about why you have separate cores, and how you decide
>> which core a new document should reside in. Your scenario still seems a bit
>> odd, so help us understand.
>>
>>
>> -- Jack Krupansky
>>
>> On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова <batalova...@gmail.com>
>> wrote:
>>
>>> Hi!
>>>
>>> Thanks for your quick reply.
>>>
>>> The problem is that my whole index consists of several parts (several cores),
>>> and when updating I don't know in advance which part holds the id being
>>> updated (that is, which core contains the document with the specified id).
>>>
>>> For example, I have two cores (*Core1* and *Core2*) and I want to
>>> update the document with id *Id1*, but I don't know which core holds
>>> it.
>>>
>>> So I have to run two select queries against my cores to find out where it is.
>>>
>>> And then send an update query to the right core.
>>>
>>> What am I doing wrong?
>>>
>>> As a reminder, I'm using SOLR 4.4.0.
>>>
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> Best regards,
>>> Batalova Kseniya
>>>
>>>
>>> What exactly is the problem? And why do you care about cores, per se -
>>> other than to send the update to the core/collection you are trying to
>>> update? You should specify the core/collection name in the URL.
>>>
>>> You should also be using the Solr reference guide rather than the (old)
>>> wiki:
>>>
>>> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова <batalova...@gmail.com>
>>> wrote:
>>>
>>> > Hi!
>>> >
>>> > I'm using *SOLR 4.4.0* for searching in my project.
>>> > Now I am facing a problem of atomic updates in multiple cores.
>>> > From wiki:
>>> >
>>> > curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
>>> > [
>>> >  {
>>> >   "id"        : "TestDoc1",
>>> >   "title"     : {"set":"test1"},
>>> >   "revision"  : {"inc":3},
>>> >   "publisher" : {"add":"TestPublisher"}
>>> >  },
>>> >  {
>>> >   "id"        : "TestDoc2",
>>> >   "publisher" : {"add":"TestPublisher"}
>>> >  }
>>> > ]'
>>> >
>>> > As far as I understand, this means that a document, for example the one
>>> > with id *TestDoc1*, will be looked up for updating *only in one core*.
>>> > And if there is no document with id *TestDoc1*, the document will be
>>> > created.
>>> > Can I somehow specify a *list of cores* to search, and then update the
>>> > matching document with a specific id?
>>> >
>>> > It's something like the *shards* parameter in a *select* query.
>>> > From wiki:
>>> >
>>> > # now do a distributed search across both servers with your browser or curl
>>> > curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'
>>> >
>>> > Or is it planned for the future?
>>> >
>>> > Thanks in advance.
>>> >
>>> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> >
>>> > Best regards,
>>> > Batalova Kseniya
>>> >
>>>
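
For a standalone multi-core setup like the one in the original question, the two-step workaround described in the thread (ask each core whether it holds the id, then send the atomic update to that core) can be sketched roughly as follows. This is a minimal sketch and not a recommended design: the base URL, the core names `Core1`/`Core2`, and all helper names are assumptions for illustration, commit handling is left out, and the HTTP call is passed in as a function so the logic can be followed without a running Solr.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE = "http://localhost:8983/solr"  # assumed host/port, as in the thread's curl examples
CORES = ["Core1", "Core2"]           # hypothetical core names from the example


def http_json(url, body=None):
    """Send a GET (or a JSON POST when body is given) and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = Request(url, data=data, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


def select_url(core, doc_id):
    """URL of a select query that matches only the given id in one core."""
    # Quote the id so the query parser treats it as a literal value.
    query = 'id:"%s"' % doc_id.replace('"', '\\"')
    params = {"q": query, "fl": "id", "rows": 0, "wt": "json"}
    return "%s/%s/select?%s" % (BASE, quote(core), urlencode(params))


def find_core(doc_id, fetch=http_json):
    """Return the first core whose index contains doc_id, or None."""
    for core in CORES:
        reply = fetch(select_url(core, doc_id))
        if reply["response"]["numFound"] > 0:
            return core
    return None


def atomic_update(doc_id, changes, fetch=http_json):
    """Send an atomic update to the core holding doc_id; return that core or None."""
    core = find_core(doc_id, fetch)
    if core is None:
        # Nothing to update; per the original question, sending the update
        # anyway would create a new document instead.
        return None
    doc = dict(changes, id=doc_id)  # e.g. {"id": "Id1", "title": {"set": "test1"}}
    fetch("%s/%s/update" % (BASE, quote(core)), [doc])
    return core
```

A call such as `atomic_update("Id1", {"title": {"set": "test1"}})` costs one select per core before the update is sent, which is the overhead that SolrCloud's id-based routing removes.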
