Thank you, Erick.
>Actually, my question is why do it this way at all? Why not index
>directly to your "live" nodes? This is what SolrCloud is built for.
>You can use "implicit" routing to create shards, say, for each week, and
>age out the ones that are "too old" as well.


Any updates to an EXISTING document in the LIVE collection should NOT be
replicated to the previous week(s) snapshot(s). Think of the snapshots
as an archive of sorts, searchable independently of LIVE. We're aiming to
support at most two archives of past data.
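For anyone following the thread, the implicit-routing approach Erick suggests can be sketched against the Solr Collections API. The sketch below only builds the request URLs (the host, collection name, and weekly shard names are placeholders, not anything from this thread); in a real cluster you would issue them with an HTTP GET.

```python
from urllib.parse import urlencode

# Placeholder endpoint; adjust host/port for your cluster.
SOLR = "http://localhost:8983/solr/admin/collections"

# Create a collection with explicitly named shards (implicit router),
# one shard per week; documents are routed via the router.field value.
create_url = SOLR + "?" + urlencode({
    "action": "CREATE",
    "name": "live",
    "router.name": "implicit",
    "shards": "week1,week2,week3",
    "router.field": "_route_",
})

# Age out the oldest shard once it is "too old".
delete_url = SOLR + "?" + urlencode({
    "action": "DELETESHARD",
    "collection": "live",
    "shard": "week1",
})

print(create_url)
print(delete_url)
# In a live cluster: urllib.request.urlopen(create_url), etc.
```

Because each week is its own named shard, dropping old data is a single DELETESHARD call rather than a delete-by-query over the whole index.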


>Another option would be to use "collection aliasing" to keep an
>offline index up to date then switch over when necessary.

Does "offline indexing" refer to this link?
https://github.com/cloudera/search/tree/0d47ff79d6ccc0129ffadcb50f9fe0b271f102aa/search-mr
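Separately from the Cloudera search-mr tool linked above, the collection-aliasing flow Erick mentions might look like the following: build a fresh collection offline, then atomically re-point a read alias at it. This only constructs the API URL; the alias and collection names are assumptions for illustration.

```python
from urllib.parse import urlencode

# Placeholder endpoint; adjust host/port for your cluster.
SOLR = "http://localhost:8983/solr/admin/collections"

# CREATEALIAS both creates an alias and atomically re-points an
# existing one, so searches against "archive" switch over with no
# downtime once the new collection is fully built.
alias_url = SOLR + "?" + urlencode({
    "action": "CREATEALIAS",
    "name": "archive",
    "collections": "snapshot_week29",
})

print(alias_url)
```

Clients always query the alias name, so the swap from one weekly snapshot to the next is invisible to them.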


Thanks
Raja



On 7/13/15, 3:14 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

>Actually, my question is why do it this way at all? Why not index
>directly to your "live" nodes? This is what SolrCloud is built for.
>
>There's the new backup/restore functionality that's still a work in
>progress, see: https://issues.apache.org/jira/browse/SOLR-5750
>
>You can use "implicit" routing to create shards, say, for each week, and
>age out the ones that are "too old" as well.
>
>Another option would be to use "collection aliasing" to keep an
>offline index up to date then switch over when necessary.
>
>I'd really like to know this isn't an XY problem though, what's the
>high-level problem you're trying to solve?
>
>Best,
>Erick
>
>On Mon, Jul 13, 2015 at 12:49 PM, Raja Pothuganti
><rpothuga...@competitrack.com> wrote:
>>
>> Hi,
>> We are setting up a new SolrCloud environment with 5.2.1 on Ubuntu
>>boxes. We currently ingest data into a large collection, call it LIVE.
>>After the full ingest is done, we then trigger a delta ingestion
>>every 15 minutes to get the documents & data that have changed into this
>>LIVE instance.
>>
>> In Solr 4.X using a Master / Slave setup we had slaves that would
>>periodically (weekly, or monthly) refresh their data from the Master
>>rather than every 15 minutes. We're now trying to figure out how to get
>>this same type of setup using SolrCloud.
>>
>> Question(s):
>> - Is there a way to copy data from one SolrCloud collection into
>>another quickly and easily?
>> - Is there a way to programmatically control when a replica receives
>>its data, or possibly move it to another collection (without losing
>>data) that updates on a different interval? Ideally it would be another
>>collection name, call it Week1 ... Week52 ..., to avoid a replica in the
>>same collection serving old data.
>>
>> One option we thought of was to create a backup and then restore that
>>into a new clean cloud. This has a lot of moving parts and isn't nearly
>>as neat as the Master / Slave controlled replication setup. It also has
>>the side effect of potentially taking a very long time to backup and
>>restore instead of just copying the indexes like the old M/S setup.
>>
>> Any ideas or thoughts? Thanks in advance for your help.
>> Raja
