Achilles not picking cassandra ConsistencyLevel

2016-03-11 Thread Raman Gugnani
Hi All, We are using Achilles in our code. We are setting the consistency level as below, but it is not being applied. Has anyone else faced this issue? We have three nodes in our cluster. PoolingOptions poolingOptions = new PoolingOptions();
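
Worth noting (not from the thread, which is truncated): in the plain DataStax Java driver, PoolingOptions controls connection pooling, not consistency; the default consistency level is carried by QueryOptions. A minimal sketch, assuming driver 3.x and QUORUM as the desired level (the contact point is a placeholder):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;

    public class ConsistencyExample {
        public static void main(String[] args) {
            // QueryOptions (not PoolingOptions) carries the default consistency level.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1") // placeholder contact point
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.QUORUM))
                    .build();
            try {
                Session session = cluster.connect();
                // Statements now default to QUORUM unless overridden per statement.
                System.out.println(session.execute(
                        "SELECT release_version FROM system.local").one().getString(0));
            } finally {
                cluster.close();
            }
        }
    }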

Compaction Filter in Cassandra

2016-03-11 Thread Dikang Gu
Hello there, RocksDB has a feature called "Compaction Filter" that allows the application to modify or delete a key-value pair during background compaction. https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226 I'm wondering whether there is a plan for, or value in, adding this to C* as well?
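
To illustrate the concept, a purely hypothetical Java sketch of what such a hook could look like (Cassandra exposes no such API; this only mirrors the shape of RocksDB's CompactionFilter::Filter callback):

    // Hypothetical interface, for illustration only: Cassandra does not
    // expose a compaction-filter hook. RocksDB's callback decides, for each
    // key-value pair rewritten during background compaction, whether to keep
    // it, drop it, or replace its value.
    public interface CompactionFilter {
        enum Decision { KEEP, REMOVE, CHANGE_VALUE }

        // newValue is consulted only when the decision is CHANGE_VALUE.
        Decision filter(int level, byte[] key, byte[] existingValue, byte[][] newValue);
    }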

Re: How can I make Cassandra stable in a 2GB RAM node environment ?

2016-03-11 Thread Hiroyuki Yamada
Thank you all for responding to and discussing my question. I basically agree with you all, but I think that in Cassandra's case it comes down to how much data we use relative to how much memory we have. Following Jack's (and DataStax's) suggestion, I also used a 4GB RAM machine (t2.medium) with 1 billion records (about

Query regarding CassandraJavaRDD while running spark job on cassandra

2016-03-11 Thread Siddharth Verma
In Cassandra I have a table with the following schema. CREATE TABLE my_keyspace.my_table1 ( col_1 text, col_2 text, col_3 text, col_4 text, col_5 text, col_6 text, col_7 text, PRIMARY KEY (col_1, col_2, col_3) ) WITH CLUSTERING ORDER BY (col_2 ASC, col_3 ASC);
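
The question itself is truncated, but for context, a minimal sketch of reading this table into a CassandraJavaRDD, assuming the DataStax spark-cassandra-connector Java API of that era (the Spark master, Cassandra host, and .where() value are placeholders):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import com.datastax.spark.connector.japi.CassandraRow;
    import com.datastax.spark.connector.japi.rdd.CassandraTableScanJavaRDD;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

    public class ReadMyTable1 {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("read-my_table1")
                    .setMaster("local[*]") // placeholder master
                    .set("spark.cassandra.connection.host", "127.0.0.1"); // placeholder host
            JavaSparkContext sc = new JavaSparkContext(conf);
            // Predicates on clustering columns (col_2 here) can be pushed
            // down to Cassandra instead of filtering in Spark.
            CassandraTableScanJavaRDD<CassandraRow> rdd =
                    javaFunctions(sc).cassandraTable("my_keyspace", "my_table1")
                            .where("col_2 = ?", "some-value"); // illustrative predicate
            System.out.println("rows: " + rdd.count());
            sc.stop();
        }
    }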

Re: What is wrong in this token function

2016-03-11 Thread Matt Kennedy
The conversation around the partitioner sidetracks a bit from your original question. You originally asked: >> Business case: Show me all events for a given customer in a given time frame. In an RDBMS it would be (Query1): where customer_id = '289' and event_time >= '2016-03-01 18:45:00+' and
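
For that access pattern, the usual Cassandra model is to partition by customer and cluster by event time, so the time-frame query becomes a slice within a single partition. A minimal sketch (the events_by_customer table, its columns, and the upper time bound are illustrative, not from the thread):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class EventsByCustomer {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            try {
                Session session = cluster.connect("my_keyspace"); // placeholder keyspace
                session.execute(
                        "CREATE TABLE IF NOT EXISTS events_by_customer ("
                        + " customer_id text, event_time timestamp, payload text,"
                        + " PRIMARY KEY (customer_id, event_time)"
                        + ") WITH CLUSTERING ORDER BY (event_time DESC)");
                // Single-partition slice: customer_id is the partition key and
                // event_time the clustering column, so the range is efficient.
                ResultSet rs = session.execute(
                        "SELECT event_time, payload FROM events_by_customer"
                        + " WHERE customer_id = ? AND event_time >= ? AND event_time < ?",
                        "289",
                        java.sql.Timestamp.valueOf("2016-03-01 18:45:00"),
                        java.sql.Timestamp.valueOf("2016-03-02 18:45:00")); // illustrative upper bound
                for (Row row : rs) {
                    System.out.println(row.getTimestamp("event_time") + " " + row.getString("payload"));
                }
            } finally {
                cluster.close();
            }
        }
    }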

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jack Krupansky
Thanks, that level of query detail gives us a better picture to focus on. I'll think through this some more over the weekend. Also, these queries focus on raw, bulk retrieval of sensor data readings, but do you have reading-based queries, such as a range over an actual sensor reading value? -- Jack Krupansky

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Carlos Alonso
Hi Jason, If I understand correctly, you have no problem with the size of your partitions or with transactional queries, but rather with 'identifying' the partitions when you have to run analytical queries. I'd then suggest two options: 1. Keep using Cassandra and store the first 'bucket' of each sensor in a

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jason Kania
The 5000 readings mentioned would be against a single sensor on a single sensor unit. The scope of the queries on this table is intended to be fairly simple. Here are some example queries, without 'sharding', that we would perform on this table: SELECT "time","readings" FROM

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jack Krupansky
Thanks for the additional information, but there is still not enough color on the queries and too much focus on a premature data model. Is this 5000 readings for a single sensor of a single sensor unit, or for all sensors of a specified unit, or... both? I presume you want "next" and "previous"

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jason Kania
Jack, Thanks for the response. We are targeting our database design to 1 sensor units, and each sensor unit has 32 sensors. We are seeing about 700 events per day per sensor, each providing about 2 KB of data. Based on keeping each partition to about 10 MB (based on readings we saw on
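
Running those numbers (a back-of-the-envelope check, not from the thread): 700 events/day x ~2 KB is roughly 1.4 MB per sensor per day, so a ~10 MB partition budget allows about a week of readings per partition. A sketch of the time-bucketing being discussed, with a hypothetical table layout and 7-day bucket width (both illustrative):

    import java.time.Instant;

    public class ReadingBuckets {
        // Hypothetical schema for the bucketing strategy under discussion:
        //   CREATE TABLE sensor_readings (
        //     unit_id text, sensor_id int, bucket bigint,
        //     time timestamp, readings blob,
        //     PRIMARY KEY ((unit_id, sensor_id, bucket), time)
        //   ) WITH CLUSTERING ORDER BY (time ASC);
        // At ~1.4 MB/day per sensor, a 7-day bucket stays near the ~10 MB target.
        static final long BUCKET_SECONDS = 7L * 24 * 60 * 60;

        // Every read and write must derive the same bucket from the timestamp,
        // so a time-range query touches a small, known set of partitions.
        static long bucketFor(Instant eventTime) {
            return eventTime.getEpochSecond() / BUCKET_SECONDS;
        }

        public static void main(String[] args) {
            Instant now = Instant.now();
            System.out.println("bucket(" + now + ") = " + bucketFor(now));
        }
    }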

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-11 Thread Jack Krupansky
I'll stay away from advising on a specific schema per se, but I'll stick to the advice that you need to make sure that your queries depend solely on the columns of the primary key or on relatively short slices/scans, rather than running the risk of very long scans or having to process multiple

Re: JMX liveSSTableCount

2016-03-11 Thread Hazel Bobins
Just using the standard jmeter, the MBean you reference is in org.apache.cassandra.metrics; however, it used to be in org.apache.cassandra.db, so I guess it moved. Thanks H On 11/03/2016 19:25, Robert Coli wrote: On Fri, Mar 11, 2016 at 10:04 AM, Hazel Bobins
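
For reference, a minimal sketch of reading the relocated metric over JMX from plain Java; the keyspace and table names are placeholders, 7199 is Cassandra's default JMX port, and the exact ObjectName pattern is my assumption of the 3.x org.apache.cassandra.metrics layout:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class LiveSSTableCountReader {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi"); // default C* JMX port
            try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // Placeholder keyspace/table; the metric is a Gauge exposing "Value".
                ObjectName name = new ObjectName(
                        "org.apache.cassandra.metrics:type=Table,"
                        + "keyspace=my_ks,scope=my_table,name=LiveSSTableCount");
                System.out.println("LiveSSTableCount = " + mbs.getAttribute(name, "Value"));
            }
        }
    }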

Re: JMX liveSSTableCount

2016-03-11 Thread Robert Coli
On Fri, Mar 11, 2016 at 10:04 AM, Hazel Bobins wrote: > Does anyone know if the removal of the liveSSTableCount JMX attribute > from the 'org.apache.cassandra.db:type=Tables,keyspaces=' MBean was > intentional in 3.x? I cannot see a reference to its removal in any Jira, etc. >

this week in cassandra - 3.0 storage engine

2016-03-11 Thread Jonathan Haddad
All, For the last month or so we've been doing a weekly post & commentary on Planet Cassandra under a "This Week in Cassandra" theme, similar to some other weekly tech blogs & podcasts. This week we had Aaron Morton & Tyler Hobbs talking about 3.4, some upcoming Thread Per Core improvements, and the

Re: Using User Defined Functions in UPDATE queries

2016-03-11 Thread Kim Liu
Thank you for the clarification. —Kim

Re: Using User Defined Functions in UPDATE queries

2016-03-11 Thread Sylvain Lebresne
On Fri, Mar 11, 2016 at 5:09 PM, Kim Liu wrote: > Just for the sake of clarification, then, what is the use-case for having UDFs > in an UPDATE? Honestly, it's merely there for convenience when you use things like cqlsh, for instance. > If they cannot read data from

JMX liveSSTableCount

2016-03-11 Thread Hazel Bobins
Hello- Does anyone know if the removal of the liveSSTableCount JMX attribute from the 'org.apache.cassandra.db:type=Tables,keyspaces=' MBean was intentional in 3.x? I cannot see a reference to its removal in any Jira, etc. Cheers H

Re: Cassandra causing OOM Killer to strike on new cluster running 3.4

2016-03-11 Thread Adam Plumb
Here is the creation syntax for the entire schema. The xyz table has about 2.1 billion keys and the def table has about 230 million keys. Max row size is about 3KB, mean row size is 700B. CREATE KEYSPACE abc WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3}; CREATE TABLE

Re: Cassandra causing OOM Killer to strike on new cluster running 3.4

2016-03-11 Thread Jack Krupansky
What is your schema and data like - in particular, how wide are your partitions (number of rows and typical row size)? Maybe you just need (a lot) more heap for rows during the repair process. -- Jack Krupansky On Fri, Mar 11, 2016 at 11:19 AM, Adam Plumb wrote: > These are

Re: Cassandra causing OOM Killer to strike on new cluster running 3.4

2016-03-11 Thread Adam Plumb
These are brand new boxes only running Cassandra. Yeah the kernel is what is killing the JVM, and this does appear to be a memory leak in Cassandra. And Cassandra is the only thing running, aside from the basic services needed for Amazon Linux to run. On Fri, Mar 11, 2016 at 11:17 AM, Sebastian

Re: Cassandra causing OOM Killer to strike on new cluster running 3.4

2016-03-11 Thread Sebastian Estevez
'Sacrifice child' in dmesg is your OS killing the process with the most RAM. That means you're actually running out of memory at the Linux level, outside of the JVM. Are you running anything other than Cassandra on this box? If so, does it have a memory leak? all the best, Sebastián On Mar 11,

Cassandra causing OOM Killer to strike on new cluster running 3.4

2016-03-11 Thread Adam Plumb
I've got a new cluster of 18 nodes running Cassandra 3.4 that I just launched and loaded data into yesterday (roughly 2TB of total storage) and am seeing runaway memory usage. These nodes are EC2 c3.4xlarges with 30GB RAM and the heap size is set to 8G with a new heap size of 1.6G. Last night I

Re: Using User Defined Functions in UPDATE queries

2016-03-11 Thread Kim Liu
Just for the sake of clarification, then, what is the use-case for having UDFs in an UPDATE? If they cannot read data from the data store, then all of the parameters to the UDF must be supplied by the client, correct? If the client has all the parameters, the client could perform the equivalent

Re: Using User Defined Functions in UPDATE queries

2016-03-11 Thread Sylvain Lebresne
UDFs are usable in UPDATE statements, as actually trying them shows; it's just the documented grammar that needs fixing. But as for doing something like: UPDATE test_table SET data=max_int(data,5) WHERE idx='abc'; this is indeed *not* supported and likely never will be. One big pillar of C* design
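
To make the distinction concrete, a minimal sketch (the body of max_int is my assumption, as the thread only names the function, and UDFs must be enabled via enable_user_defined_functions in cassandra.yaml): a UDF may compute over values the client supplies, but it cannot read the stored value of a column, so the first UPDATE below is accepted while the second is rejected.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class UdfUpdateExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            try {
                Session session = cluster.connect("my_keyspace"); // placeholder keyspace
                // Assumed definition of the max_int UDF named in the thread.
                session.execute(
                        "CREATE FUNCTION IF NOT EXISTS max_int (a int, b int)"
                        + " RETURNS NULL ON NULL INPUT RETURNS int"
                        + " LANGUAGE java AS 'return Math.max(a, b);'");
                // Accepted: every argument is supplied by the client.
                session.execute(
                        "UPDATE test_table SET data = max_int(3, 5) WHERE idx = 'abc'");
                // Rejected by Cassandra: max_int(data, 5) would need to read the
                // current value of 'data', i.e. a read-before-write.
                // session.execute(
                //         "UPDATE test_table SET data = max_int(data, 5) WHERE idx = 'abc'");
            } finally {
                cluster.close();
            }
        }
    }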