Re: Optimal backup strategy

Hossein Ghiyasi Mehr Tue, 03 Dec 2019 01:32:24 -0800

I am sorry, you are right. I forgot the "*not*"!
1. It's *not* recommended to use the commit log after a node failure.
Cassandra has other options, such as the replication factor, as a
substitute solution.


*VafaTech.com - A Total Solution for Data Gathering & Analysis*


On Tue, Dec 3, 2019 at 10:42 AM Adarsh Kumar <adarsh0...@gmail.com> wrote:

> Thanks Hossein,
>
> Just one more question: is there any special SOP or consideration we have
> to take into account for multi-site backup?
>
> Please share any helpful links, blogs, or documented steps.
>
> Regards,
> Adarsh Kumar
>
> On Sun, Dec 1, 2019 at 10:40 PM Hossein Ghiyasi Mehr <
> ghiyasim...@gmail.com> wrote:
>
>> 1. It's recommended to use the commit log after a node failure. Cassandra
>> has many options, such as the replication factor, as a substitute solution.
>> 2. Yes, right.
>>
>>
>> On Fri, Nov 29, 2019 at 9:33 AM Adarsh Kumar <adarsh0...@gmail.com>
>> wrote:
>>
>>> Thanks Guo and Hossein,
>>>
>>> So my understanding is:
>>>
>>>    1. Commit log backup is not documented for Apache Cassandra, hence it
>>>    is not standard. But it can be used for a restore on the same machine
>>>    (taking the backup from commit_log_dir). If used on other machine(s),
>>>    they have to be in the same topology. Can it be used for a replacement
>>>    node?
>>>    2. For periodic backup, snapshot + incremental backup is the best option.
>>>
>>>
>>> Thanks,
>>> Adarsh Kumar
>>>
>>> On Fri, Nov 29, 2019 at 7:28 AM guo Maxwell <cclive1...@gmail.com>
>>> wrote:
>>>
>>>> Hossein is right. But in our case we restore to the same Cassandra
>>>> topology, so replay is usable. It is also usable when restoring to the
>>>> same machine.
>>>> Using sstableloader costs too much time and more storage (though the
>>>> extra storage is reclaimed after the restore).
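For comparison with commit log replay, a restore through sstableloader streams each table's SSTables into the target cluster. The minimal Python sketch below only builds the command line and does not run it; it assumes the SSTables were first copied to a local directory ending in `<keyspace>/<table>` (sstableloader takes the keyspace and table from the last two directories of the path). The host addresses and the `/restore/ks1/table1` path are hypothetical.

```python
import shlex
from pathlib import Path


def sstableloader_cmd(hosts, table_dir):
    """Build an sstableloader invocation for one table's SSTables.

    sstableloader takes the keyspace and table from the last two directories
    of the path, so table_dir must look like .../<keyspace>/<table>.
    """
    path = Path(table_dir)
    if not (path.name and path.parent.name):
        raise ValueError("table_dir must end in <keyspace>/<table>")
    if not hosts:
        raise ValueError("at least one initial host is required")
    # -d takes a comma-separated list of initial contact points in the target cluster.
    return ["sstableloader", "-d", ",".join(hosts), str(path)]


if __name__ == "__main__":
    cmd = sstableloader_cmd(["10.0.0.1", "10.0.0.2"], "/restore/ks1/table1")
    # prints: sstableloader -d 10.0.0.1,10.0.0.2 /restore/ks1/table1
    print(shlex.join(cmd))  # pass cmd to subprocess.run(...) to actually stream
```

Keeping command construction separate from execution makes it easy to review or log what would be streamed before touching a live cluster.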
>>>>
>>>> Hossein Ghiyasi Mehr <ghiyasim...@gmail.com> wrote on Thu, Nov 28, 2019 at 7:40 PM:
>>>>
>>>>> A commitlog backup isn't usable on another machine.
>>>>> The backup solution depends on what you want to do: periodic backup, or
>>>>> a backup to restore on another machine?
>>>>> Periodic backup is a combination of snapshots and incremental backups.
>>>>> Remove the incremental backups after each new snapshot.
>>>>> Backup to restore on another machine: take a snapshot after flushing the
>>>>> memtables, or use sstableloader.
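The periodic cycle described above (flush, snapshot, then drop the incremental backups that the new snapshot already covers) could be scripted roughly as follows. This is a minimal Python sketch, not an official procedure: it assumes incremental backups are enabled, that their hard links land in `<data_dir>/<keyspace>/<table>/backups/`, and that anything you still need has been copied off the node before pruning. Only files last modified before the cycle started are removed, so SSTables flushed after the snapshot are kept for the next round. `nodetool snapshot` already flushes memtables unless `--skip-flush` is given; the explicit flush just mirrors the advice above.

```python
import subprocess
import time
from pathlib import Path


def snapshot_commands(keyspace, tag):
    """nodetool calls for one cycle: flush memtables, then take a tagged snapshot."""
    return [
        ["nodetool", "flush", keyspace],
        ["nodetool", "snapshot", "-t", tag, keyspace],
    ]


def prune_incremental_backups(data_dir, keyspace, cutoff):
    """Delete incremental-backup files last modified before `cutoff` (epoch seconds).

    Files written before the cycle started are covered by the new snapshot;
    newer ones are kept so they are not lost before the next snapshot.
    Returns the paths that were removed.
    """
    removed = []
    for backups in Path(data_dir, keyspace).glob("*/backups"):
        for f in backups.iterdir():
            if f.is_file() and f.stat().st_mtime < cutoff:
                f.unlink()
                removed.append(f)
    return removed


def run_cycle(data_dir, keyspace, tag, dry_run=True):
    """Flush, snapshot, then prune; with dry_run=True only print the nodetool calls."""
    cutoff = time.time()  # taken before flushing, so later SSTables survive pruning
    for cmd in snapshot_commands(keyspace, tag):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    if dry_run:
        return []
    return prune_incremental_backups(data_dir, keyspace, cutoff)
```

For example, `run_cycle("/var/lib/cassandra/data", "ks1", "weekly-2019-11-28", dry_run=False)` would run one cycle on the local node for a hypothetical keyspace `ks1`.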
>>>>>
>>>>> On Thu, Nov 28, 2019 at 6:05 AM guo Maxwell <cclive1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> In the Cassandra and DataStax documentation, commitlog backup is not
>>>>>> mentioned; only snapshots and incremental backups are described.
>>>>>>
>>>>>> Although commitlog archiving per keyspace/table is not supported,
>>>>>> commitlog replay (you must put the logs into commitlog_dir and restart
>>>>>> the process) does support a keyspace/table replay filter: use
>>>>>> -Dcassandra.replayList with the keyspace1.table1,keyspace1.table2
>>>>>> format to replay only the specified keyspaces/tables.
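As a small illustration of the replay filter format, this Python sketch builds the `-Dcassandra.replayList` value from (keyspace, table) pairs. The identifier check is a conservative assumption added for safety, not Cassandra's exact naming rule, and how the flag reaches the JVM (for example appended to `JVM_OPTS` in `cassandra-env.sh`) depends on your installation.

```python
import re

# Conservative identifier check (letters, digits, underscore); an assumption,
# not Cassandra's exact naming rule.
_IDENT = re.compile(r"[A-Za-z0-9_]+")


def replay_list_option(tables):
    """Return the JVM flag that limits commit log replay to the given tables."""
    entries = []
    for keyspace, table in tables:
        if not (_IDENT.fullmatch(keyspace) and _IDENT.fullmatch(table)):
            raise ValueError(f"unexpected keyspace/table name: {keyspace}.{table}")
        entries.append(f"{keyspace}.{table}")
    if not entries:
        raise ValueError("at least one keyspace.table is required")
    return "-Dcassandra.replayList=" + ",".join(entries)


if __name__ == "__main__":
    flag = replay_list_option([("keyspace1", "table1"), ("keyspace1", "table2")])
    print(flag)  # prints -Dcassandra.replayList=keyspace1.table1,keyspace1.table2
```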
>>>>>>
>>>>>> Snapshots do affect storage. We take a snapshot once a week during
>>>>>> off-peak hours, and snapshot creation is throttled; you may want to look
>>>>>> at this issue (https://issues.apache.org/jira/browse/CASSANDRA-13019).
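Because snapshots keep holding disk space, a weekly schedule like the one described usually comes with a retention rule. A minimal Python sketch, assuming a hypothetical tag convention `weekly-YYYY-MM-DD` and a hypothetical policy of keeping only the newest `keep` weekly snapshots; it builds `nodetool clearsnapshot -t <tag>` commands but does not run them.

```python
from datetime import date


def snapshots_to_clear(tags, keep):
    """Return weekly-YYYY-MM-DD tags older than the newest `keep` ones (newest first).

    Tags that do not follow the (hypothetical) weekly-YYYY-MM-DD convention are ignored.
    """
    dated = []
    for tag in tags:
        prefix, sep, day = tag.partition("-")
        if prefix != "weekly" or not sep:
            continue
        try:
            dated.append((date.fromisoformat(day), tag))
        except ValueError:
            continue  # not a valid date; leave this snapshot alone
    dated.sort(reverse=True)
    return [tag for _, tag in dated[keep:]]


def clear_commands(tags):
    """nodetool calls that would remove the given snapshot tags."""
    return [["nodetool", "clearsnapshot", "-t", tag] for tag in tags]


if __name__ == "__main__":
    tags = ["weekly-2019-11-06", "weekly-2019-11-13", "weekly-2019-11-20", "weekly-2019-11-27", "adhoc"]
    for cmd in clear_commands(snapshots_to_clear(tags, keep=2)):
        print(" ".join(cmd))
```

Ad hoc snapshots with other names are deliberately left untouched, so a manual pre-upgrade snapshot is never cleared by the weekly rule.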
>>>>>>
>>>>>>
>>>>>>
>>>>>> Adarsh Kumar <adarsh0...@gmail.com> wrote on Thu, Nov 28, 2019 at 1:00 AM:
>>>>>>
>>>>>>> Thanks, Guo and Eric, for replying.
>>>>>>>
>>>>>>> I have some doubts about commit log backup:
>>>>>>>
>>>>>>>    1. The commit log archival technique
>>>>>>>    (https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore-)
>>>>>>>    is as good as an incremental backup, as it also captures commit logs
>>>>>>>    after a memtable flush.
>>>>>>>    2. If we go for "Snapshot + Incremental bk + Commit log", we have to
>>>>>>>    take the commit logs from the commit log directory (is there any SOP
>>>>>>>    for this?). As commit logs are not per table or keyspace, we will
>>>>>>>    have a challenge restoring selected tables.
>>>>>>>    3. Snapshot-based backups are easy to manage and operate due to
>>>>>>>    their simplicity, but they are heavy on storage. Any views on this?
>>>>>>>    4. Please share any successful strategy that someone is using in
>>>>>>>    production. We are still in the design phase and want to implement
>>>>>>>    the best solution.
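For reference on items 1 and 2, commit log archiving is configured through `commitlog_archiving.properties`, whose keys include `archive_command` (with `%path` and `%name`), `restore_command` (with `%from` and `%to`), `restore_directories`, and `restore_point_in_time` (in `yyyy:MM:dd HH:mm:ss`, GMT). The Python sketch below just renders such a file; the directories are hypothetical, joining several restore directories with commas is an assumption to check against your version's template, and `/bin/ln` only works when the archive directory is on the same filesystem as the commit log.

```python
from datetime import datetime, timezone


def render_archiving_properties(archive_dir, restore_dirs=(), point_in_time=None):
    """Render a commitlog_archiving.properties body.

    point_in_time, if given, must be timezone-aware; it is written in GMT/UTC
    using the yyyy:MM:dd HH:mm:ss layout the template documents.
    """
    if point_in_time is not None and point_in_time.tzinfo is None:
        raise ValueError("point_in_time must be timezone-aware")
    pit = ""
    if point_in_time is not None:
        pit = point_in_time.astimezone(timezone.utc).strftime("%Y:%m:%d %H:%M:%S")
    lines = [
        # Hard-link each closed segment into the archive (same filesystem only).
        f"archive_command=/bin/ln %path {archive_dir}/%name",
        # On restore, copy archived segments back into the live commit log directory.
        "restore_command=/bin/cp -f %from %to",
        "restore_directories=" + ",".join(restore_dirs),
        # Left empty, no point-in-time cutoff is applied.
        "restore_point_in_time=" + pit,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cutoff = datetime(2019, 11, 28, 12, 30, tzinfo=timezone.utc)
    print(render_archiving_properties("/backup/commitlog", ["/restore/commitlog"], cutoff))
```

Since commit logs are shared by all tables, restoring selected tables from them still relies on the replay filter (`-Dcassandra.replayList`) rather than on the archive itself.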
>>>>>>>
>>>>>>> Thanks, Eric, for sharing the link to Medusa.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Adarsh Kumar
>>>>>>>
>>>>>>> On Wed, Nov 27, 2019 at 5:16 PM guo Maxwell <cclive1...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> For me, I think the last one:
>>>>>>>>  Snapshot + Incremental + commitlog
>>>>>>>> is the most meaningful way to do backup and restore when you back the
>>>>>>>> data up to somewhere else, like AWS S3.
>>>>>>>>
>>>>>>>>    - Snapshot based backup // data written after the snapshot will not
>>>>>>>>    be backed up, so you may lose data when restoring to a point later
>>>>>>>>    than the snapshot time;
>>>>>>>>    - Incremental backups // better than snapshot backup, but with
>>>>>>>>    insufficient data accuracy: data still in the memtable will be
>>>>>>>>    lost;
>>>>>>>>    - Snapshot + incremental
>>>>>>>>    - Snapshot + commitlog archival // better data precision than
>>>>>>>>    incremental backup, but data in a commitlog segment that has not
>>>>>>>>    been archived (because the segment is not yet closed) will not be
>>>>>>>>    restored and will be lost. Also, when there are many logs, log
>>>>>>>>    replay takes a very long time.
>>>>>>>>
>>>>>>>> For our part, we use snapshot + incremental + commitlog archive. We
>>>>>>>> collect the snapshot data and the incremental data, and the logs are
>>>>>>>> backed up too. But we do not back up logs whose data has already been
>>>>>>>> flushed to SSTables, because that data is covered by the incremental
>>>>>>>> backup.
>>>>>>>>
>>>>>>>> This way, the data exists in SSTable format through the snapshot and
>>>>>>>> incremental backups, the number of logs is very small, and log replay
>>>>>>>> does not take much time.
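The commit log side of this scheme hinges on the `archive_command` hook, which Cassandra runs for each closed segment with `%path` (full path) and `%name` (file name) substituted. A minimal Python sketch of a script that could be wired in as that command: it copies the segment under a temporary name and renames it once the sizes match. The script location and `/backup/commitlog` are hypothetical, and it does not implement the skipping of already-flushed segments described above.

```python
import shutil
import sys
from pathlib import Path


def archive_segment(segment_path, name, backup_root):
    """Copy one closed commit log segment into backup_root and return the copy's path."""
    src = Path(segment_path)
    dest_dir = Path(backup_root)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / name
    tmp = dest_dir / (name + ".partial")
    # Copy under a temporary name so a half-written file is never mistaken for a segment.
    shutil.copy2(src, tmp)
    if tmp.stat().st_size != src.stat().st_size:
        tmp.unlink()
        raise OSError(f"size mismatch while archiving {src}")
    # Rename within the same directory; on POSIX this replaces dest atomically.
    tmp.replace(dest)
    return dest


# Hypothetical wiring in commitlog_archiving.properties:
#   archive_command=/usr/bin/python3 /opt/backup/archive_segment.py %path %name
if __name__ == "__main__" and len(sys.argv) == 3:
    archive_segment(sys.argv[1], sys.argv[2], "/backup/commitlog")
```

Copying (rather than hard-linking) lets the archive live on a separate disk, at the cost of extra I/O for every closed segment.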
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Eric LELEU <e...@strapdata.com> wrote on Wed, Nov 27, 2019 at 4:13 PM:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> The Last Pickle and Spotify have released Medusa, a Cassandra backup
>>>>>>>>> tool.
>>>>>>>>>
>>>>>>>>> See :
>>>>>>>>> https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html
>>>>>>>>>
>>>>>>>>> Hope this link will help you.
>>>>>>>>>
>>>>>>>>> Eric
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 27/11/2019 at 08:10, Adarsh Kumar wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I was looking into backup strategies for Cassandra. After some
>>>>>>>>> study, I learned that there are the following options:
>>>>>>>>>
>>>>>>>>>    - Snapshot based backup
>>>>>>>>>    - Incremental backups
>>>>>>>>>    - Snapshot + incremental
>>>>>>>>>    - Snapshot + commitlog archival
>>>>>>>>>    - Snapshot + Incremental + commitlog
>>>>>>>>>
>>>>>>>>> Which is the most suitable and feasible approach? Also, which of
>>>>>>>>> these is used most?
>>>>>>>>> Please let me know if there are any other options or tools available.
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Adarsh Kumar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> you are the apple of my eye !
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
