RE: Slender Cassandra Cluster Project

2018-01-30 Thread Kenneth Brotman
Hi Yuri, if possible I will do everything with AWS CloudFormation. I'm working on it now. Nothing published yet. Kenneth Brotman

Re: Nodes show different number of tokens than initially

2018-01-30 Thread Dikang Gu
What's the partitioner you use? We have logic to prevent duplicate tokens. private static Collection<Token> adjustForCrossDatacenterClashes(final TokenMetadata tokenMetadata, StrategyAdapter strategy, Collection<Token> tokens) { List<Token> filtered = Lists.newArrayListWithCapacity(tokens.size()); for (Token
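[Editor's note: the quoted method is cut off above. For context, a hedged reconstruction of how the clash-adjustment loop continues, recalled from the 3.x-era org.apache.cassandra.dht.tokenallocator.TokenAllocation source — not a verbatim quote, so check it against your version:]

    // Fragment continuing the method body quoted above (reconstruction).
    for (Token t : tokens)
    {
        // If some node already owns this exact token, bump it until it is free.
        while (tokenMetadata.getEndpoint(t) != null)
        {
            InetAddress other = tokenMetadata.getEndpoint(t);
            if (strategy.inAllocationRing(other))
                throw new ConfigurationException(String.format(
                    "Allocated token %s already assigned to node %s. Is another node also allocating tokens?",
                    t, other));
            t = t.increaseSlightly();
        }
        filtered.add(t);
    }
    return filtered;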

RE: Slender Cassandra Cluster Project

2018-01-30 Thread Yuri Subach
Hi Kenneth, I like this project idea! A couple of questions: - What tools are you going to use for AWS cluster setup? - Do you have anything published already (github)? On 2018-01-22 22:42:11, Kenneth Brotman wrote: > Thanks Anthony! I’ve made a note to include that information in the > docum

Re: CDC usability and future development

2018-01-30 Thread Jeff Jirsa
Here's a deck of some proposed additions, discussed at one of the NGCC sessions last fall: https://github.com/ngcc/ngcc2017/blob/master/CassandraDataIngestion.pdf On Tue, Jan 30, 2018 at 5:10 PM, Andrew Prudhomme wrote: > Hi all, > > We are currently designing a system that allows our Cassand

CDC usability and future development

2018-01-30 Thread Andrew Prudhomme
Hi all, We are currently designing a system that allows our Cassandra clusters to produce a stream of data updates. Naturally, we have been evaluating if CDC can aid in this endeavor. We have found several challenges in using CDC for this purpose. CDC provides only the mutation as opposed to the
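[Editor's note: to make the starting point concrete, a minimal sketch of a raw CDC segment consumer against Cassandra 3.11-era internals. Class and method names are recalled from that source tree and should be verified against your version; it also assumes the node's schema is initialized so mutations can be deserialized:]

    import java.io.File;
    import java.io.IOException;
    import org.apache.cassandra.db.Mutation;
    import org.apache.cassandra.db.commitlog.CommitLogDescriptor;
    import org.apache.cassandra.db.commitlog.CommitLogReadHandler;
    import org.apache.cassandra.db.commitlog.CommitLogReader;

    public class CdcSegmentConsumer implements CommitLogReadHandler
    {
        // CDC hands you the raw Mutation only -- no pre/post image, so the
        // consumer must resolve deletes, TTLs, and unchanged columns itself.
        public void handleMutation(Mutation m, int size, int entryLocation, CommitLogDescriptor desc)
        {
            m.getPartitionUpdates().forEach(u -> System.out.println(u.metadata().cfName));
        }

        public boolean shouldSkipSegmentOnError(CommitLogReadException e) { return false; }

        public void handleUnrecoverableError(CommitLogReadException e) throws IOException { throw e; }

        public static void main(String[] args) throws IOException
        {
            // args[0]: a segment file from the cdc_raw directory
            new CommitLogReader().readCommitLogSegment(new CdcSegmentConsumer(), new File(args[0]), false);
        }
    }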

RE: group by select queries

2018-01-30 Thread Modha, Digant
It was local quorum. There’s no difference with CONSISTENCY ALL. Consistency level set to LOCAL_QUORUM. cassandra@cqlsh> select * from wp.position where account_id = 'user_1'; account_id | security_id | counter | avg_exec_price | pending_quantity | quantity | transaction_id | update_time -
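[Editor's note: to reproduce the same check outside cqlsh, the equivalent read at an explicit consistency level with the DataStax Java driver (3.x-era API; contact point is a placeholder) looks roughly like this:]

    import com.datastax.driver.core.*;

    // Run the same query at LOCAL_QUORUM and at ALL and compare the results.
    try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
         Session session = cluster.connect())
    {
        Statement stmt = new SimpleStatement(
                "SELECT * FROM wp.position WHERE account_id = 'user_1'")
                .setConsistencyLevel(ConsistencyLevel.ALL);
        for (Row row : session.execute(stmt))
            System.out.println(row);
    }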

Not what I've expected Performance

2018-01-30 Thread Jürgen Albersdorfer
Hi, We are using C* 3.11.1 with a 9 Node Cluster built on CentOS Servers each having 2x Quad Core Xeon, 128GB of RAM and two separate 2TB spinning Disks, one for Log one for Data with Spark on Top. Due to bad Schema (Partitions of about 4 to 8 GB) I need to copy a whole Table into another ha

Re: Commitlogs are filling the Full Disk space and nodes are down

2018-01-30 Thread Jeff Jirsa
There's an open bug for users that have offheap memtables and a secondary index - there are at least a few people reporting an error flushing that blocks future flushes. If you're seeing that, and use that combo, you may want to switch to on-heap memtables (or contribute a patch to fix the offheap+2i

Re: Commitlogs are filling the Full Disk space and nodes are down

2018-01-30 Thread Chris Lohfink
The commitlog growing is often a symptom of a problem. If the memtable flush or post flush fails in any way, the commitlogs will not be recycled/deleted and will continue to pool up. You might want to go back earlier in the logs to make sure there's nothing like the post-memtable flusher getting a permiss

Re: Nodes show different number of tokens than initially

2018-01-30 Thread Jeff Jirsa
All DCs in a cluster use the same token space in the DHT, so token conflicts across datacenters are an invalid config -- Jeff Jirsa > On Jan 29, 2018, at 11:50 PM, Oleksandr Shulgin wrote: >> On Tue, Jan 30, 2018 at 5:13 AM, kurt greaves wrote: >> Shouldn't happen. Can you send through

Re: Heavy one-off writes best practices

2018-01-30 Thread Jeff Jirsa
Two other options, both of which will be faster (and less likely to impact read latencies) but require some app-side programming, if you're willing to generate the sstables programmatically with CQLSSTableWriter or similar. Once you do that, you can: 1) stream them in with the sstableloader (wh
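[Editor's note: a minimal CQLSSTableWriter sketch using the 3.x builder API; keyspace, table, and paths are made up for illustration:]

    import java.util.UUID;
    import org.apache.cassandra.io.sstable.CQLSSTableWriter;

    public class OfflineWriter
    {
        public static void main(String[] args) throws Exception
        {
            CQLSSTableWriter writer = CQLSSTableWriter.builder()
                    .inDirectory("/tmp/sstables/ks/events")   // must exist beforehand
                    .forTable("CREATE TABLE ks.events (id uuid PRIMARY KEY, payload text)")
                    .using("INSERT INTO ks.events (id, payload) VALUES (?, ?)")
                    .build();

            writer.addRow(UUID.randomUUID(), "hello");        // one row per addRow call
            writer.close();

            // The resulting files can then be streamed in with sstableloader (option 1).
        }
    }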

Re: TWCS not deleting expired sstables

2018-01-30 Thread Thakrar, Jayesh
Thanks Kurt and Kenneth. Now if only they would work as expected.
node111.ord.ae.tsg.cnvr.net:/ae/disk1/data/ae/raw_logs_by_user-f58b9960980311e79ac26928246f09c1> ls -lt | tail
-rw-r--r--. 1 vchadoop vchadoop 286889260 Sep 18 14:14 mc-1070-big-Index.db
-rw-r--r--. 1 vchadoop vchadoop 12

Re: Heavy one-off writes best practices

2018-01-30 Thread Lucas Benevides
Hello Julien, After reading the excellent post and video by Alain Rodriguez, maybe you should read the paper "Performance Tuning of Big Data Platform: Cassandra Case Study" by Sathvik Katam. In the results he sets new values

RE: TWCS not deleting expired sstables

2018-01-30 Thread Kenneth Brotman
Wow! It’s in the DataStax documentation: https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/toolsSStables/toolsSStabExpiredBlockers.html Other nice tools there as well: https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/toolsSStables/toolsSSTableUt

RE: Commitlogs are filling the Full Disk space and nodes are down

2018-01-30 Thread Amit Singh
Hi, When you run nodetool flush, data from the memtable goes to an on-disk structure (SSTables), and alongside that, the commitlog segments for that data are written off; it's a continuous process. Maybe in your case you can decrease the value of the below uncommented property in

RE: Cassandra nodes are down

2018-01-30 Thread Amit Singh
Hello, Please check the debug logs for a detailed trace; the exact reason can't be figured out from here. Try your luck there.

Commitlogs are filling the Full Disk space and nodes are down

2018-01-30 Thread Mokkapati, Bhargav (Nokia - IN/Chennai)
Hi Team, My Cassandra version: Apache Cassandra 3.0.13. Cassandra nodes are down because commitlogs are filling the disk to its full capacity. [inline image attachment] With "nodetool flush" I didn't see any commitlogs deleted. Can anyone tell me how to flush the commitlogs without

RE: Cleanup blocking snapshots - Options?

2018-01-30 Thread Steinmaurer, Thomas
Hi Kurt, had another try now, and yes, with 2.1.18, this constantly happens. Currently running nodetool cleanup on a single node in production with disabled hourly snapshots. SSTables with > 100G involved here. Triggering nodetool snapshot will result in being blocked. From an operational persp

Re: Heavy one-off writes best practices

2018-01-30 Thread Alain RODRIGUEZ
ll have to hit a lot of files, thus requiring an increasing number of reads. The throughput should be set to a value that is fast enough to keep up with compactions. If you really have to rewrite 100% of the data, every day, I would suggest you create 10 new tables every day instead of rewriting exis

Re: Heavy one-off writes best practices

2018-01-30 Thread Alain RODRIGUEZ
put should be set to > a value that is fast enough to keep up with compactions. > > If you really have to rewrite 100% of the data, every day, I would suggest > you create 10 new tables every day instead of rewriting existing data. > Writing a new table 'MyAwesomeTable-20180130' for
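[Editor's note: a minimal sketch of that daily table-rotation scheme with the DataStax Java driver (3.x-era API). Keyspace, table prefix, and schema are made up for illustration; note that CQL identifiers can't contain hyphens unless quoted, so an underscore stands in for the hyphen in 'MyAwesomeTable-20180130':]

    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    // Rotate to a fresh table each day, bulk load it, then drop yesterday's
    // table -- sidestepping compaction of overwritten data entirely.
    String today = LocalDate.now().format(DateTimeFormatter.BASIC_ISO_DATE);      // e.g. 20180130
    String yesterday = LocalDate.now().minusDays(1).format(DateTimeFormatter.BASIC_ISO_DATE);

    try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
         Session session = cluster.connect("ks"))
    {
        session.execute("CREATE TABLE IF NOT EXISTS my_awesome_table_" + today +
                        " (id uuid PRIMARY KEY, payload text)");
        // ... bulk load the new table here ...
        session.execute("DROP TABLE IF EXISTS my_awesome_table_" + yesterday);
    }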

Heavy one-off writes best practices

2018-01-30 Thread Julien Moumne
Hello, I am looking for best practices for the following use case: Once a day, we insert at the same time 10 full tables (several 100GiB each) using the Spark C* driver, without batching, with CL set to ALL. Whether skinny rows or wide rows, data for a partition key is always completely updated / ov
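[Editor's note: for reference, a daily full-table load like the one described might look as follows with the spark-cassandra-connector's DataFrame API in Java. The keyspace/table names and source path are placeholders, and the per-write consistency option should be verified against your connector version:]

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder().appName("daily-load").getOrCreate();
    Dataset<Row> df = spark.read().parquet("/data/daily/positions");   // hypothetical source

    df.write()
      .format("org.apache.spark.sql.cassandra")
      .option("keyspace", "ks")
      .option("table", "positions")
      .option("spark.cassandra.output.consistency.level", "ALL")       // CL=ALL as in the post
      .mode(SaveMode.Append)
      .save();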