thank you

On Thu, Jul 16, 2020 at 12:29 PM Alex Ott <alex...@gmail.com> wrote:

> Look into the series of blog posts that I sent; I think it should be
> covered in the 4th post.
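>
> A rough, untested sketch of what that could look like with DSBulk,
> assuming a primary key column "id" and a single non-key column "secret"
> (those names are placeholders, adjust to your schema). The unload exports
> the ttl/writetime per column, and the load re-applies them through
> USING TTL / TIMESTAMP bind markers:
>
>   dsbulk unload -h <source_host> -url /tmp/cf_old_export \
>     -query "SELECT id, secret, ttl(secret) AS secret_ttl, writetime(secret) AS secret_wt FROM ks_old.cf_old"
>
>   dsbulk load -h <target_host> -url /tmp/cf_old_export \
>     -query "INSERT INTO ks_blah.cf_blah (id, secret) VALUES (:id, :secret) USING TTL :secret_ttl AND TIMESTAMP :secret_wt"
>
> Keep in mind that rows originally written without a TTL will unload a
> null secret_ttl and may need to be loaded separately without the USING
> TTL clause, and that every non-key column needs its own ttl()/writetime()
> pair.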
>
> On Thu, Jul 16, 2020 at 8:27 PM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
>> okay, is there a way to export the TTL using CQLsh or DSBulk?
>>
>> On Thu, Jul 16, 2020 at 11:20 AM Alex Ott <alex...@gmail.com> wrote:
>>
>>> If you didn't export the TTL explicitly and load it back, then the
>>> copied data will never expire.
>>>
>>> On Thu, Jul 16, 2020 at 7:48 PM Jai Bheemsen Rao Dhanwada <
>>> jaibheem...@gmail.com> wrote:
>>>
>>>> I tried to verify the metadata. The writetime is set to the insert
>>>> time, but the TTL value shows as null. Is this expected? Does this mean
>>>> these records will never expire after the insert?
>>>> Is there any alternative to preserve the TTL?
>>>>
>>>> In the new table, where the data was inserted with cqlsh and DSBulk:
>>>> cqlsh > SELECT ttl(secret) from ks_blah.cf_blah ;
>>>>
>>>>  ttl(secret)
>>>> --------------
>>>>          null
>>>>          null
>>>>
>>>> (2 rows)
>>>>
>>>> In the old table, where the data was written by the application:
>>>>
>>>> cqlsh > SELECT ttl(secret) from ks_old.cf_old ;
>>>>
>>>>  ttl(secret)
>>>> --------------------
>>>>          4517461
>>>>          4525958
>>>>
>>>> (2 rows)
>>>>
>>>> On Wed, Jul 15, 2020 at 1:17 PM Jai Bheemsen Rao Dhanwada <
>>>> jaibheem...@gmail.com> wrote:
>>>>
>>>>> thank you
>>>>>
>>>>> On Wed, Jul 15, 2020 at 1:11 PM Russell Spitzer <
>>>>> russell.spit...@gmail.com> wrote:
>>>>>
>>>>>> Alex is referring to the "writetime" and "ttl" values for each cell.
>>>>>> Most tools copy via CQL writes and by default don't copy the previous
>>>>>> writetime and ttl values; instead they just give each cell a new
>>>>>> writetime that matches the copy time rather than the initial insert
>>>>>> time.
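>>>>>>
>>>>>> As a hypothetical illustration in plain CQL (table name and values
>>>>>> made up): a plain re-insert stamps the cell with the copy time and no
>>>>>> TTL, while an explicit USING clause carries the original metadata over:
>>>>>>
>>>>>>   -- plain copy: new writetime, no TTL
>>>>>>   INSERT INTO ks.new_table (id, secret) VALUES (1, 'abc');
>>>>>>
>>>>>>   -- explicit copy: original metadata preserved
>>>>>>   INSERT INTO ks.new_table (id, secret) VALUES (1, 'abc')
>>>>>>     USING TIMESTAMP 1594848000000000 AND TTL 4517461;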
>>>>>>
>>>>>> On Wed, Jul 15, 2020 at 3:01 PM Jai Bheemsen Rao Dhanwada <
>>>>>> jaibheem...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello Alex,
>>>>>>>
>>>>>>>
>>>>>>>    - use DSBulk - it's a very effective tool for unloading &
>>>>>>>    loading data from/to Cassandra/DSE. Use zstd compression for
>>>>>>>    offloaded data to save disk space (see blog links below for more
>>>>>>>    details). But *preserving metadata* could be a problem.
>>>>>>>
>>>>>>> Here, what exactly do you mean by "preserving metadata"? Would you
>>>>>>> mind explaining?
>>>>>>>
>>>>>>> On Tue, Jul 14, 2020 at 8:50 AM Jai Bheemsen Rao Dhanwada <
>>>>>>> jaibheem...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thank you for the suggestions
>>>>>>>>
>>>>>>>> On Tue, Jul 14, 2020 at 1:42 AM Alex Ott <alex...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> CQLSH definitely won't work for that amount of data, so you need
>>>>>>>>> to use other tools.
>>>>>>>>>
>>>>>>>>> But before selecting one, you need to define your requirements. For
>>>>>>>>> example:
>>>>>>>>>
>>>>>>>>>    1. Are you copying the data into tables with exactly the same
>>>>>>>>>    structure?
>>>>>>>>>    2. Do you need to preserve metadata, like writetime & TTL?
>>>>>>>>>
>>>>>>>>> Depending on that, you may have the following choices:
>>>>>>>>>
>>>>>>>>>    - use sstableloader - it will preserve all metadata, like ttl
>>>>>>>>>    and writetime. You just need to copy the SSTable files, or stream
>>>>>>>>>    them directly from the source cluster. But this requires copying
>>>>>>>>>    the data into tables with exactly the same structure (and, in the
>>>>>>>>>    case of UDTs, the keyspace names should be the same); an
>>>>>>>>>    illustrative flow is sketched after this list.
>>>>>>>>>    - use DSBulk - it's a very effective tool for unloading &
>>>>>>>>>    loading data from/to Cassandra/DSE. Use zstd compression for
>>>>>>>>>    offloaded data to save disk space (see the blog links below for
>>>>>>>>>    more details). But preserving metadata could be a problem.
>>>>>>>>>    - use Spark + the Spark Cassandra Connector. But here too,
>>>>>>>>>    preserving the metadata is not an easy task and requires
>>>>>>>>>    programming to handle all the edge cases (see
>>>>>>>>>    https://datastax-oss.atlassian.net/browse/SPARKC-596 for details)
>>>>>>>>>
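>>>>>>>>> For the sstableloader route, an illustrative (untested) flow, with
>>>>>>>>> host names and paths as placeholders, would be: snapshot on the
>>>>>>>>> source, stage the files under a <keyspace>/<table> directory named
>>>>>>>>> after the target table, then stream them in:
>>>>>>>>>
>>>>>>>>>   nodetool snapshot -t migrate ks_old
>>>>>>>>>   # copy .../data/ks_old/cf_old-<table_id>/snapshots/migrate/*
>>>>>>>>>   #   into /staging/<target_keyspace>/<target_table>/ on a machine
>>>>>>>>>   #   that can reach the target cluster
>>>>>>>>>   sstableloader -d target_node1,target_node2 /staging/<target_keyspace>/<target_table>
>>>>>>>>>
>>>>>>>>> This has to be repeated for every node of the source cluster (each
>>>>>>>>> node only holds part of the data), and the target table must have
>>>>>>>>> exactly the same schema, as noted above.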
>>>>>>>>>
>>>>>>>>> blog series on DSBulk:
>>>>>>>>>
>>>>>>>>>    - https://www.datastax.com/blog/2019/03/datastax-bulk-loader-introduction-and-loading
>>>>>>>>>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-more-loading
>>>>>>>>>    - https://www.datastax.com/blog/2019/04/datastax-bulk-loader-common-settings
>>>>>>>>>    - https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading
>>>>>>>>>    - https://www.datastax.com/blog/2019/07/datastax-bulk-loader-counting
>>>>>>>>>    - https://www.datastax.com/blog/2019/12/datastax-bulk-loader-examples-loading-other-locations
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 14, 2020 at 1:47 AM Jai Bheemsen Rao Dhanwada <
>>>>>>>>> jaibheem...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I would like to copy some data from one Cassandra cluster to
>>>>>>>>>> another using the CQLSH COPY command. Is this a good approach if
>>>>>>>>>> the dataset size on the source cluster is very large (500 GB - 1 TB)?
>>>>>>>>>> If not, what is a safer approach? And are there any limitations or
>>>>>>>>>> known issues to keep in mind before attempting this?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> With best wishes,                    Alex Ott
>>>>>>>>> http://alexott.net/
>>>>>>>>> Twitter: alexott_en (English), alexott (Russian)
>>>>>>>>>
>>>>>>>>
>>>
>>> --
>>> With best wishes,                    Alex Ott
>>> http://alexott.net/
>>> Twitter: alexott_en (English), alexott (Russian)
>>>
>>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
>
