slow read for cassandra time series

2013-05-29 Thread michael
I have a slow query that is making me think I don't understand the data model for time series: select asset, returns from marketData where date >= 20130101 and date <= 20130110 allow filtering; create table marketData ( asset varchar, returns double, date timestamp, PRIMARY KEY(asset, date) )
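Because the WHERE clause here restricts only the clustering column, ALLOW FILTERING forces a scan over every partition. A minimal sketch of the efficient form, assuming a hypothetical asset value, pins the partition key first:

    -- With PRIMARY KEY (asset, date), rows are partitioned by asset and
    -- clustered by date, so a date range scan is efficient only within
    -- a single asset partition.
    SELECT asset, returns FROM marketData
    WHERE asset = 'ACME'                    -- hypothetical asset value
      AND date >= '2013-01-01' AND date <= '2013-01-10';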

Re: Cassandra performance decreases drastically with increase in data size.

2013-05-29 Thread Jonathan Ellis
Sounds like you're spending all your time in GC, which you can verify by checking what GCInspector and StatusLogger say in the log. The fix is to increase your heap size or upgrade to 1.2: http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 On Wed, May 29, 2013 at 11:32 PM, srmore

Cassandra performance decreases drastically with increase in data size.

2013-05-29 Thread srmore
Hello, I am observing that performance decreases drastically as my data size grows. I have a 3-node cluster with 64 GB of RAM, and my data size is around 400GB on all the nodes. I also see that when I restart Cassandra the performance goes back to normal and then again starts decreasing af

Re: Cleanup understanding

2013-05-29 Thread Takenori Sato
> But that is still awkward. Does cleanup take so much disk space to complete the compaction operation? In other words, twice the size? Not really, but logically yes. According to the 1.0.7 source, cleanup checks whether there is enough free space to cover the worst-case scenario, as below. If not, the ex

Re: Does replicate_on_write=true imply that CL.QUORUM for reads is unnecessary?

2013-05-29 Thread Andrew Bialecki
To answer my own question, directly from the docs: http://www.datastax.com/docs/1.0/configuration/storage_configuration#replicate-on-write. It appears the answer to this is: "Yes, CL.QUORUM isn't necessary for reads." Essentially, replicate_on_write sets the CL to ALL regardless of what you actuall
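For reference, a minimal sketch of where this setting lives in CQL3; the table and column names are hypothetical, and replicate_on_write defaults to true:

    -- Counter table with the replicate_on_write table property spelled out
    CREATE TABLE page_views (
      page text PRIMARY KEY,
      views counter
    ) WITH replicate_on_write = true;

    -- Each increment is applied and then replicated to the other replicas
    UPDATE page_views SET views = views + 1 WHERE page = '/home';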

Re: Cassandra 1.2.5 RPM availability

2013-05-29 Thread Blair Zajac
On 05/29/2013 02:57 AM, Gabriel Ciuloaica wrote: Hi, When will the 1.2.5 RPM be available in the Datastax repo? Looks like it's there now: http://rpm.datastax.com/community/noarch/ Blair

1.2 tuning

2013-05-29 Thread Darren Smythe
Lots of possible "issues" with high write load, and we're not sure if it means we need more nodes or if the nodes aren't tuned correctly. We're using 4 EC2 xlarge instances to support 4 medium instances. We're getting about 10k inserts/sec, but after about 10 minutes it goes down to about 7k/sec which see

Re: how to handle join properly in this case

2013-05-29 Thread Hiller, Dean
There is Cassandra partitioning, which puts all of one partition on a single node. PlayOrm's partitions are virtual, which we needed far more, since it is likely we want 5000 rows from a partition, and PlayOrm ends up reading from X disks instead of one disk for better performance. Then we leverage t

Re: how to handle join properly in this case

2013-05-29 Thread Jiaan Zeng
Thanks for all the comments and thoughts! I think Hiller points out a promising direction. I wonder if the partition and filter are features shipped with Cassandra or features that come from PlayOrm. Any resources about that would be appreciated. Thanks! On Tue, May 28, 2013 at 11:39 AM, Hiller, Dean

Re: Starting up Cassandra produced errors after upgrading Cassandra to 1.2.5 from 1.0.12

2013-05-29 Thread Robert Coli
On Wed, May 29, 2013 at 12:32 AM, Colin Kuo wrote: > We followed the upgrade > guide (http://www.datastax.com/docs/1.2/install/upgrading) from the Datastax web > site and upgraded Cassandra to 1.2.5, but errors occurred in system.log > when starting up. In general you should not upgrade more than on

Re: Cassandra on a single (under-powered) instance?

2013-05-29 Thread Tyler Hobbs
You can get away with a 1 to 2GB heap if you don't put too much pressure on it. I commonly run stress tests against a 400M heap node while developing and I almost never see OutOfMemory errors, but I'm not keeping a close eye on latency and throughput, which will be impacted when the JVM GC is runn

Re: random thoughts for MUCH faster key lookup in cassandra

2013-05-29 Thread Hiller, Dean
I forgot the random partitioner can be switched out. We don't use ordered partitioner so I had forgotten about that one. I guess it could only be a random partitioner type option :(. I think 80% of projects use random partitioner though, right? In fact, we use PlayOrm queries so the indice a

Re: random thoughts for MUCH faster key lookup in cassandra

2013-05-29 Thread Andy Twigg
How would you implement range queries? On 29 May 2013 17:49, Hiller, Dean wrote: > We recently ran into too much data in one CF because LCS can't really run > in parallel on one CF in a single tier, which got me thinking, why doesn't > the CF directory have 100 or 1000 directories 0-999 and cass

Re: Cassandra read reapair

2013-05-29 Thread Kais Ahmed
Thanks aaron 2013/5/28 aaron morton > Start using QUORUM for reads and writes and then run a nodetool repair. > > That should get you back to the land of the consistent. > > Cheers > > - > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.

random thoughts for MUCH faster key lookup in cassandra

2013-05-29 Thread Hiller, Dean
We recently ran into too much data in one CF because LCS can't really run in parallel on one CF in a single tier, which got me thinking: why doesn't the CF directory have 100 or 1000 directories 0-999, and Cassandra hash the key to which directory it would go in and then put it in one of the sstabl

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Hiller, Dean
Something we just ran into with compaction and timeseries data. We have 60,000 virtual tables (playorm virtual tables) inside ONE CF. This unfortunately hurt our compaction with LCS since it can't be parallelized for a single tier. We should have had 10 CFs called data0, data1, data2 … data9 such

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Jabbar Azam
Hello Cem, You can get a similar effect by specifying a TTL value for the data you save to a table. If the data becomes older than the TTL value, then it will automatically be deleted by C*. Thanks Jabbar Azam On 29 May 2013 17:01, cem wrote: > Thank you very much for the fast answer. > > Does pl
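A minimal sketch of that approach; the table, the values, and the 30-day TTL are hypothetical:

    -- The inserted row expires 2592000 seconds (30 days) after the write
    INSERT INTO events (id, payload) VALUES (1, 'sample') USING TTL 2592000;

Once the TTL elapses the row is treated as deleted, and its space is reclaimed by compaction rather than by an explicit delete.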

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Hiller, Dean
Nope, partitioning is done per CF in PlayOrm. Dean

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread cem
Thank you very much for the fast answer. Does playORM use different column families for each partition in Cassandra? Cem On Wed, May 29, 2013 at 5:30 PM, Jeremy Powell wrote: > Cem, yes, you can do this with C*, though you have to handle the logic > yourself (other libraries might do this for

Re: USING CONSISTENCY WHILE SELECT

2013-05-29 Thread Michael Kjellman
Consistency is no longer query level but now session level in 1.2.0+. Change the consistency first. Then issue your select/update/insert query. Cheers, Michael On May 29, 2013, at 7:06 AM, "Chandana Tummala" wrote: > Hi Team, > > I am using datastax cassandra community edition version(1
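In cqlsh that workflow looks like the sketch below; the table is hypothetical:

    CONSISTENCY QUORUM;                  -- set the session-level consistency
    SELECT * FROM users WHERE id = 1;    -- subsequent queries run at QUORUM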

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Jeremy Powell
Cem, yes, you can do this with C*, though you have to handle the logic yourself (other libraries might do this for you; I've seen the dev of playORM discuss some things which might be similar). We use Astyanax and programmatically create CFs based on a time period of our choosing that makes sense for o
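A minimal sketch of that per-period pattern in CQL; the table names and the monthly granularity are hypothetical:

    -- One table per time period; writes for May 2013 go to events_201305
    CREATE TABLE events_201305 (
      id int PRIMARY KEY,
      payload text
    );

    -- Retiring an old period drops its data wholesale, with no tombstones
    DROP TABLE events_201301;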

Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread cem
Hi All, I used time range partitions 5 years ago with MySQL to clean up data much faster. I had a big FACT table with time range partitions, and it was very easy to drop old partitions (with archiving) and do some saving on disk. Has anyone implemented such a thing in Cassandra? It would be great i

Re: data clean up problem

2013-05-29 Thread cem
Thanks for the answers! Cem On Wed, May 29, 2013 at 1:26 AM, Robert Coli wrote: > On Tue, May 28, 2013 at 2:38 PM, Bryan Talbot > wrote: > > I think what you're asking for (efficient removal of TTL'd write-once > data) > > is already in the works but not until 2.0 it seems. > > If your entir

USING CONSISTENCY WHILE SELECT

2013-05-29 Thread Chandana Tummala
Hi Team, I am using datastax cassandra community edition version (1.2.4) with a three-node cluster using the property file snitch. I have created a keyspace with network topology strategy with replication factor 3 for the datacentres. The topology properties file is something like this: 127.0.0.1=DC1:RAC1 1
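A minimal sketch of that keyspace definition in 1.2 CQL3; the keyspace name is hypothetical, and the datacenter name follows the topology file above:

    CREATE KEYSPACE myks
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};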

Does replicate_on_write=true imply that CL.QUORUM for reads is unnecessary?

2013-05-29 Thread Andrew Bialecki
Quick question about counter columns. In looking at the replicate_on_write setting, assuming you go with the default of "true", my understanding is that any increment is written to all replicas. If that's the case, doesn't that mean there's no point in using CL.QUORUM for reads because

Re: Cleanup understanding

2013-05-29 Thread Víctor Hugo Oliveira Molinar
Thanks for the answers. I got it. I was using cleanup because I thought it would delete the tombstones. But that is still awkward. Does cleanup take so much disk space to complete the compaction operation? In other words, twice the size? Best regards, Víctor Hugo Molinar - @vhmolinar

Cassandra 1.2.5 RPM availability

2013-05-29 Thread Gabriel Ciuloaica
Hi, When will the 1.2.5 RPM be available in the Datastax repo? Thanks, Gabi

Starting up Cassandra produced errors after upgrading Cassandra to 1.2.5 from 1.0.12

2013-05-29 Thread Colin Kuo
Hi All, We followed the upgrade guide (http://www.datastax.com/docs/1.2/install/upgrading) from the Datastax web site and upgraded Cassandra to 1.2.5, but errors occurred in system.log when starting up. After digging into the code, it looks like Cassandra found the file length of the IndexSummary sst