Because the data format has changed, you’ll need to read it out and write it back in again.
This means using either a driver (Java, Python, C++, etc.) or something like Spark. In either case, split up the token range so you can parallelize the work for a significant speed improvement.

From: "qihuang.zheng"
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, October 21, 2015 at 6:18 PM
To: user
Subject: C* Table Changed and Data Migration with new primary key

Hi All:

We have a table defined with only one partition key and some clustering keys:

    CREATE TABLE test1 (
        attribute text,
        partner text,
        app text,
        "timestamp" bigint,
        event text,
        PRIMARY KEY ((attribute), partner, app, "timestamp")
    )

Now we want to split the original test1 table into 3 tables like this:

    test_global:  PRIMARY KEY ((attribute), "timestamp")
    test_partner: PRIMARY KEY ((attribute, partner), "timestamp")
    test_app:     PRIMARY KEY ((attribute, partner, app), "timestamp")

We are splitting the original table because a query for global data ordered by timestamp descending, like this:

    select * from test1 where attribute=? order by timestamp desc

is not supported in Cassandra, since ORDER BY has to use the clustering keys in order. But a query like this:

    select * from test1 where attribute=? order by partner desc, app desc, timestamp desc

can't return the global data correctly ordered by timestamp descending. After splitting the table, we can query the global data correctly:

    select * from test_global where attribute=? order by timestamp desc

Now we have a data migration problem. As far as I know, sstableloader is the easiest way, but it can't deal with a different table name. (Am I right?) The COPY command in cqlsh doesn't fit our situation because our data is too large (10 nodes, about 400 GB of data per node). I also tried the Java API, querying the original table and then inserting into the 3 split tables, but it seems too slow.

Is there any solution for fast data migration? Thanks!

PS: Cassandra version: 2.0.15

Thanks & Regards,
qihuang.zheng
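The token-range splitting suggested in the reply at the top of this thread can be sketched roughly as follows. This is only an illustration, not a complete migration tool: it assumes the cluster uses the default Murmur3Partitioner (tokens from -2^63 to 2^63 - 1), and the names split_token_range and SCAN_CQL are made up for this example.

```python
# Sketch: split the full Murmur3 token range into contiguous subranges so that
# several workers can each scan one slice of test1 in parallel.
# Assumption: the cluster uses Murmur3Partitioner, whose tokens are signed
# 64-bit integers. Other partitioners need different bounds.

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_range(num_splits, min_token=MIN_TOKEN, max_token=MAX_TOKEN):
    """Return num_splits inclusive (start, end) pairs covering the whole range."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = max_token - min_token + 1
    step, remainder = divmod(total, num_splits)
    ranges = []
    start = min_token
    for i in range(num_splits):
        # Spread the remainder over the first few subranges.
        size = step + (1 if i < remainder else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Hypothetical scan statement for one subrange; both bounds are inclusive.
# The %s placeholders follow the style used by the DataStax Python driver
# for non-prepared statements.
SCAN_CQL = (
    'SELECT attribute, partner, app, "timestamp", event FROM test1 '
    "WHERE token(attribute) >= %s AND token(attribute) <= %s"
)

if __name__ == "__main__":
    for start, end in split_token_range(4):
        print(start, end)
```

Each worker (a thread, a process, or a Spark task) would then run the scan for its own subrange and write every row it reads into the three new tables. More, smaller subranges generally make it easier to retry a failed slice without redoing the whole copy.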