Hi,
   We have a Cassandra column family containing 320 million rows, and
each row contains about 15 columns. We want to take a monthly dump of
this single column family in text format.

We are planning to take the following approach to implement this:
1. Take a snapshot of the Cassandra database using the nodetool utility,
   passing the -cf flag to specify the column family name so that the
   snapshot contains only the data for that single column family.
2. Back up this snapshot and move the backup to a separate physical
   machine.
3. Use the SSTable-to-JSON conversion utility to convert all the data
   files into JSON format.

We have the following questions about the above approach:
a) The generated JSON records contain a "d" (IS_MARKED_FOR_DELETE) flag.
   Can we safely ignore all such records?
b) If we ignore all records marked with the "d" flag, can the JSON files
   generated in step 3 still contain duplicate records, i.e. multiple
   entries for the same key?
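
To make (a) and (b) concrete, here is a minimal Python sketch of one
possible post-processing step. It is only an illustration under stated
assumptions, not a verified procedure: it assumes each converted file is
a JSON list of {"key": ..., "columns": [[name, value, timestamp, ...]]}
entries with "d" as an optional fourth element (the exact layout depends
on the Cassandra version), and merge_rows is a hypothetical helper name.
It keeps the entry with the newest timestamp for each (key, column) pair
and only then drops entries whose newest version is marked "d". It does
not handle row-level deletion markers or equal timestamps.

```python
def merge_rows(dumps):
    """Merge rows from several converted SSTable files.

    ASSUMED input shape (check against your own converted files):
    each dump is a list of
    {"key": str, "columns": [[name, value, timestamp, optional_flag], ...]}
    """
    # (row_key, column_name) -> (timestamp, value, is_deleted)
    latest = {}
    for rows in dumps:
        for row in rows:
            for col in row["columns"]:
                name, value, ts = col[0], col[1], col[2]
                deleted = len(col) > 3 and col[3] == "d"
                slot = (row["key"], name)
                # Keep only the newest version of each column.
                if slot not in latest or ts > latest[slot][0]:
                    latest[slot] = (ts, value, deleted)

    # Drop columns whose newest version is a deletion marker.
    merged = {}
    for (key, name), (ts, value, deleted) in latest.items():
        if not deleted:
            merged.setdefault(key, {})[name] = value
    return merged
```

With this kind of merge, a "d" record is not skipped up front: it is
compared by timestamp first, so an older live value for the same key in
another file does not reappear in the dump.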

Is there a better approach to generating data dumps in text format?

Regards,
Gaurav
