Ask for suggestions to de-duplicate data for Cassandra CDC

2017-06-20 Thread Jay Zhuang
Hi, regarding the Cassandra CDC feature: http://cassandra.apache.org/doc/latest/operating/cdc.html The CDC data is duplicated RF times. If the replication factor is 3 in one DC, the same data will be sent out 3 times. One solution is to add another DC with RF=1, which will only be used for

RE: Best practice to add(bootstrap) multiple nodes to cluster at once

2017-06-20 Thread ZAIDI, ASAD A
Adding multiple nodes at once taxes the system more and has caused me issues on existing nodes. I prefer to add one node at a time … From: techpyaasa . [mailto:techpya...@gmail.com] Sent: Tuesday, June 20, 2017 9:32 AM To: user@cassandra.apache.org Subject: Best practice to add(bootstrap) multiple nodes

RE: Secondary Index

2017-06-20 Thread ZAIDI, ASAD A
Hey there, as others suggested: before adding more indexes, look for an opportunity to de-normalize your data model OR create composite keys for your primary index, if that works for you. Secondary indexes are there so you can leverage them, but they come with a cost. They’re difficult to manage, as you

Best practice to add(bootstrap) multiple nodes to cluster at once

2017-06-20 Thread techpyaasa .
Hi, what is the best practice to add (bootstrap) multiple nodes at once to a C* cluster? Using C*-2.1.17, 2 DCs, 3 groups in each DC. Thanks, TechPyaasa

Re: Question: Behavior of inserting a list multiple times with same timestamp

2017-06-20 Thread Thakrar, Jayesh
Ok, tried the test again without the TIMESTAMP and got the expected behavior. Apparently, the INSERT does replace the entire list if no timestamp is specified (as expected). However, if the TIMESTAMP is specified, then it does (what appears to be) an append. But I found an even weirder issue - see

Re: Large temporary files generated during cleaning up

2017-06-20 Thread Alain RODRIGUEZ
Hi Simon, I know for sure that cleanup (like compaction) needs to copy the entire SSTable (data + index) except for the part being evicted by the cleanup. As SSTables are immutable, to manipulate (remove) data, cleanup, like compaction, needs to copy the data we want to keep before removing the

Re: Secondary Index

2017-06-20 Thread Eduardo Alonso
Hi: If you model your table with 'status' as the partition key, you are limiting your cluster. If status only has 5 possible values, every insert will be assigned only to 5 nodes. So, you will not use your cluster resources correctly. create table ks1.sta1(status int,id1 bigint,id2 bigint,resp

Re: Secondary Index

2017-06-20 Thread @Nandan@
Hi, it would be better to denormalize the data based on status. create table ks1.sta1(status int,id1 bigint,id2 bigint,resp text,primary key(status,id1)); This will allow you to do what you want: select * from ks1.sta1 where status = 0 and id1 = 123; Please make sure that (status and id1)

Re: Secondary Index

2017-06-20 Thread techpyaasa .
Hi ZAIDI, Thanks for the reply. Sorry, I didn't get your line "You can get away the potential situation by leveraging composite key, if that is possible for you?" How can I do that? I have a table as below: CREATE TABLE ks1.cf1 (id1 bigint, id2 bigint, resp text, status int, PRIMARY KEY