Re: Unit Testing Cassandra

2013-06-19 Thread Shahab Yunus
Thanks Edward, Ben and Dean for the pointers. Yes, I am using Java and these sound promising for unit testing, at least. Regards, Shahab On Wed, Jun 19, 2013 at 9:58 AM, Edward Capriolo wrote: > You really do not need much in java you can use the embedded server. > Hector wrap a simple class a

Re: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread Wei Zhu
Rob, Thanks. I was not aware of that. So we can avoid repair if there is no hardware failure...I found a blog: http://www.datastax.com/dev/blog/modern-hinted-handoff -Wei - Original Message - From: "Robert Coli" To: user@cassandra.apache.org, "Wei Zhu" Sent: Wednesday, June 19

Re: Performance Difference between Cassandra version

2013-06-19 Thread Franc Carter
On Thu, Jun 20, 2013 at 9:18 AM, Raihan Jamal wrote: > I am trying to see whether there will be any performance difference > between Cassandra 1.0.8 vs Cassandra 1.2.2 for reading the data mainly? > > Has anyone seen any major performance difference? > We are part way through a performance compa

Performance Difference between Cassandra version

2013-06-19 Thread Raihan Jamal
I am trying to see whether there will be any performance difference between Cassandra 1.0.8 and Cassandra 1.2.2, mainly for reading data. Has anyone seen any major performance difference?

error on startup: unable to find sufficient sources for streaming range

2013-06-19 Thread Faraaz Sareshwala
Hi, I couldn't find any information on the following error so I apologize if it has already been discussed. On some of my nodes, I'm getting the following exception when cassandra starts up: 2013-06-19 22:17:39.480414500 Exception encountered during startup: unable to find sufficient sources fo

Re: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread Robert Coli
On Wed, Jun 19, 2013 at 11:43 AM, Wei Zhu wrote: > I think hints are only stored when the other node is down, not on the > dropped mutations. (Correct me if I am wrong, actually it's not a bad idea > to store hints for dropped mutations and replay them later?) This used to be the way it worked pr

Re: Date range queries

2013-06-19 Thread David McNelis
So, if you want to grab by created_at and occasionally limit by question id, that is why you'd use created_at. The way primary keys work is that the first part of the primary key is the partition key; that field essentially identifies the single Cassandra row. The second key is the order prese

Re: Date range queries

2013-06-19 Thread Christopher J. Bottaro
Interesting, thank you for the reply. Two questions though... Why should created_at come before question_id in the primary key? In other words, why (user_id, created_at, question_id) instead of (user_id, question_id, created_at)? Given this setup, all a user's answers (all 10k) will be stored i

Re: timeuuid and cql3 query

2013-06-19 Thread Francisco Andrades Grassi
Hi, I believe what he's recommending is: CREATE TABLE count3 ( counter text, ts timeuuid, key1 text, value int, PRIMARY KEY (counter, ts) ) That way counter will be your partitioning key, and all the rows that have the same counter value will be clustered (stored as a single wide row

Re: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread Wei Zhu
You have a lot of Dropped Mutations which means those writes might not go through. Since you have CL.ONE as write consistency, your client doesn't see the exception if write fails only on one node. I think hints are only stored when the other node is down, not on the dropped mutations. (Correct

Re: Joining distinct clusters with the same schema together

2013-06-19 Thread Eric Stevens
> > On its face my answer is "not... really"? What do you view yourself as > getting with this technique versus using built in replication? As an > example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM > consistency level operations? Doing replication manually sounds like a recipe for t

Re: Heap is not released and streaming hangs at 0%

2013-06-19 Thread Wei Zhu
If you want, you can try to force the GC through Jconsole. Memory->Perform GC. It theoretically triggers a full GC and when it will happen depends on the JVM -Wei - Original Message - From: "Robert Coli" To: user@cassandra.apache.org Sent: Tuesday, June 18, 2013 10:43:13 AM Subje

Re: Joining distinct clusters with the same schema together

2013-06-19 Thread Robert Coli
On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala wrote: > Each datacenter will have a cassandra cluster with a separate set of seeds > specific to that datacenter. However, the cluster name will be the same. > > Question 1: is this enough to guarantee that the three datacenters will have > dist

Re: timeuuid and cql3 query

2013-06-19 Thread Sylvain Lebresne
So part of it is a bug, namely https://issues.apache.org/jira/browse/CASSANDRA-5666. In summary, CQL3 should not accept: ts > minTimeuuid('2013-06-17 22:36:16') and ts < minTimeuuid('2013-06-20 22:44:02'), because it does not know how to handle it properly. What it should support is token(ts) > token

Joining distinct clusters with the same schema together

2013-06-19 Thread Faraaz Sareshwala
My company is planning on deploying cassandra to three separate datacenters. Each datacenter will have a cassandra cluster with a separate set of seeds specific to that datacenter. However, the cluster name will be the same. Question 1: is this enough to guarantee that the three datacenters will h

Re: Date range queries

2013-06-19 Thread David McNelis
I think you'd just be better served with just a little different primary key. If your primary key was (user_id, created_at) or (user_id, created_at, question_id), then you'd be able to run the above query without a problem. This will mean that the entire pantheon of a specific user_id will be st

Re: nodetool ring showing different 'Load' size

2013-06-19 Thread Robert Coli
On Wed, Jun 19, 2013 at 5:47 AM, Michal Michalski wrote: > You can also perform a major compaction via nodetool compact (for > SizeTieredCompaction), but - again - you really should not do it unless > you're really sure what you do, as it compacts all the SSTables together, > which is not somethin

Date range queries

2013-06-19 Thread Christopher J. Bottaro
Hello, We are considering using Cassandra and I want to make sure our use case fits Cassandra's strengths. We have a table like: answers --- user_id | question_id | result | created_at Where our most common query will be something like: SELECT * FROM answers WHERE user_id = 123 AND creat

Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
Note that it seems to work when you structure your schema in this example below, BUT this is a problem because all of my data will wind up hitting a single node in my cassandra cluster because the partitioning key is "counter" and that isn't unique enough. I was hoping that I wasn't going to ne

Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
Here's an example of that not working: cqlsh:Test> desc table count4; CREATE TABLE count4 ( ts timeuuid, counter text, key1 text, value int, PRIMARY KEY (ts, counter) ) WITH bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.0

Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
Tyler, You're recommending this schema instead, correct? CREATE TABLE count3 ( counter text, ts timeuuid, key1 text, value int, PRIMARY KEY (ts, counter) ) I believe I tried this as well and ran into similar problems but I'll try it again. I'm using the "ByteOrderedPartitioner" if th

Re: Reduce Cassandra GC

2013-06-19 Thread Mohit Anchlia
How much data do you have per node? How much RAM per node? How much CPU per node? What is the avg CPU and memory usage? On Wed, Jun 19, 2013 at 12:16 AM, Joel Samuelsson wrote: > My Cassandra ps info: > > root 26791 1 0 07:14 ? 00:00:00 /usr/bin/jsvc -user > cassandra -home /opt

Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
I'm using the byte ordered partitioner. Sent from my iPhone On Jun 19, 2013, at 11:26 AM, "Sylvain Lebresne" <sylv...@datastax.com> wrote: You're using the ordered partitioner, right? On Wed, Jun 19, 2013 at 5:06 PM, Davide Anastasia <davide.anasta...@gmail.com> wrote: Hi Tyle

Re: timeuuid and cql3 query

2013-06-19 Thread Sylvain Lebresne
You're using the ordered partitioner, right? On Wed, Jun 19, 2013 at 5:06 PM, Davide Anastasia < davide.anasta...@gmail.com> wrote: > Hi Tyler, > I am interested in this scenario as well: could you please elaborate > further your answer? > > Thanks a lot, > Davide > On 19 Jun 2013 16:01, "Tyler

Re: timeuuid and cql3 query

2013-06-19 Thread Davide Anastasia
Hi Tyler, I am interested in this scenario as well: could you please elaborate further your answer? Thanks a lot, Davide On 19 Jun 2013 16:01, "Tyler Hobbs" wrote: > > On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent wrote: > >> >> CREATE TABLE count3 ( >> counter text, >> ts timeuuid, >> ke

Re: token() function in CQL3 (1.2.5)

2013-06-19 Thread Tyler Hobbs
On Wed, Jun 19, 2013 at 7:47 AM, Ben Boule wrote: > Can anyone explain this to me? I have been looking through the source > code but can't seem to find the answer. > > The documentation mentions using the token() function to change a value > into its token for use in queries. It always menti

Re: timeuuid and cql3 query

2013-06-19 Thread Tyler Hobbs
On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent wrote: > > CREATE TABLE count3 ( > counter text, > ts timeuuid, > key1 text, > value int, > PRIMARY KEY ((counter, ts)) > ) > Instead of doing a composite partition key, remove a set of parens and let ts be your clustering key. That will c

DC dedicated to Hadoop jobs

2013-06-19 Thread cscetbon.ext
Hi, Our Hadoop jobs will only do READs and we want to restrict reads to this dedicated DC even if performance is bad. What can we do to achieve this goal? - Set dynamic_snitch_badness_threshold to 0.98 on this DC's nodes? Can we have different dynamic_snitch_badness_threshold values on

Re: vnodes ready for production ?

2013-06-19 Thread Jim Ancona
On Tue, Jun 18, 2013 at 4:04 AM, aaron morton wrote: >> Even more if we could automate some up-scale thanks to AWS alarms, It >> would be awesome. > > I saw a demo for Priam (https://github.com/Netflix/Priam) doing that at > netflix in March, not sure if it's public yet. > >> Are the vnodes featur

Re: Unit Testing Cassandra

2013-06-19 Thread Edward Capriolo
You really do not need much in Java; you can use the embedded server. Hector wraps a simple class around this, called EmbeddedServerHelper. On Wednesday, June 19, 2013, Ben Boule wrote: > Hi Shabab, > > Cassandra-Unit has been helpful for us for running unit tests without requiring a real cassandra i

timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
I'm experimenting with a data model that will need to ingest a lot of data that will need to be queryable by time. In the example below, I want to be able to run a query like "select * from count3 where counter = 'test' and ts > minTimeuuid('2013-06-18 16:23:00') and ts < minTimeuuid('2013-06-

RE: Unit Testing Cassandra

2013-06-19 Thread Ben Boule
Hi Shahab, Cassandra-Unit has been helpful for us for running unit tests without requiring a real cassandra instance to be running. We only use this to test our "DAO" code which interacts with the Cassandra client. It basically starts up an embedded instance of cassandra and fools your clien

Re: Unit Testing Cassandra

2013-06-19 Thread Hiller, Dean
For unit testing, we actually use PlayOrm which has an in-memory version of nosql so we just write unit tests against our code which uses the in-memory version but that is only if you are in java. Later, Dean From: Shahab Yunus <shahab.yu...@gmail.com> Reply-To: "user@cassandra.apache.org

Re: nodetool ring showing different 'Load' size

2013-06-19 Thread Michal Michalski
You can start compaction via JMX if you need it and you know what you're doing: Find org.apache.cassandra.db:type=CompactionManager MBean and forceUserDefinedCompaction operation in it. First argument is keyspace name, second one is a comma-separated list of SSTables to compact (filename) You

token() function in CQL3 (1.2.5)

2013-06-19 Thread Ben Boule
Can anyone explain this to me? I have been looking through the source code but can't seem to find the answer. The documentation mentions using the token() function to change a value into its token for use in queries. It always mentions it as taking a single parameter: SELECT * FROM posts

Re: Unit Testing Cassandra

2013-06-19 Thread Shahab Yunus
Thanks Stephen for your reply and explanation. My bad that I mixed those up and wasn't clear enough. Yes, I have 2 different requests/questions. 1) One is for unit testing. 2) The second (in which I am more interested) is for performance (stress/load) testing. Let us keep integration aside for

Re: Dropped mutation messages

2013-06-19 Thread Shahab Yunus
Hello Arthur, What do you mean by "The queries need to be lightened"? Thanks, Shahb On Tue, Jun 18, 2013 at 8:47 PM, Arthur Zubarev wrote: > Cem hi, > > as per http://wiki.apache.org/cassandra/FAQ#dropped_messages > > > Internode messages which are received by a node, but do not get not to b

Re: nodetool ring showing different 'Load' size

2013-06-19 Thread Rodrigo Felix
Thanks Eric. Is there a way to start compaction operations manually? I'm thinking about doing this after loading data and before starting the run phase of the benchmark. Thanks. Att. *Rodrigo Felix de Almeida* LSBD - Universidade Federal do Ceará Project Manager MBA, CSM, CSPO, SCJP On Mon, Jun 17, 2013 at

Re: Real Use Cases in Cassandra !!!

2013-06-19 Thread Elliot Thompson
Visit the Planet Cassandra website, hosted by DataStax. On 19 Jun 2013, at 13:21, Romain HARDOUIN wrote: > Hi, > > Have a look at DataStax's customers: http://www.datastax.com/customers > > > varadaraja...@polarisft.com wrote on 19/06/2013 12:48:50: > > > From: varadaraja...@polarisft.com

RE: Real Use Cases in Cassandra !!!

2013-06-19 Thread Romain HARDOUIN
Hi, Have a look at DataStax's customers: http://www.datastax.com/customers varadaraja...@polarisft.com wrote on 19/06/2013 12:48:50: > From: varadaraja...@polarisft.com > To: user@cassandra.apache.org, > Date: 19/06/2013 12:49 > Subject: Real Use Cases in Cassandra !!! > > > Team, > >

Real Use Cases in Cassandra !!!

2013-06-19 Thread varadarajan . v
Team, Can anyone share real use cases in Cassandra? Thanks & Regards, Varada Solution Architect/Business Information Management Services Practice Polaris Financial Technology Limited 6th Floor, West Wing, Nxt lvl, Navalur W:044-33418000*8613 M:9791700984 : VOIP:90-8613 E:varadaraja...@

RE: TTL can't be speciefied at column level using CQL 3 in Cassandra 1.2.x

2013-06-19 Thread Amresh Kumar Singh
Thanks Sylvain, I am working on a high level client (Kundera) which, if users want, should be able to achieve this, even if that's uncommon. Writing Update Batch CQL is an approach that works, as you are saying performance is not impacted. In my opinion, an *optional* "USING TTL" with column v

Re: TTL can't be speciefied at column level using CQL 3 in Cassandra 1.2.x

2013-06-19 Thread Sylvain Lebresne
Hi, > But CQL3 doesn't provide a way for this. That's not true. But the syntax is probably a bit more verbose than what you were hoping for. Your example (where I assume user_name is your partition key) can be achieved with: BEGIN BATCH UPDATE users SET password = 'aa' WHERE user_name='x

TTL can't be speciefied at column level using CQL 3 in Cassandra 1.2.x

2013-06-19 Thread Amresh Kumar Singh
Hi, Using Thrift, we are allowed to specify different TTL values for each column in a row. But CQL3 doesn't provide a way for this. For instance, this is allowed: INSERT INTO users (user_name, password, gender, state) VALUES ('xamry2', 'aa', 'm', 'UP') USING TTL 5; But something like

Re: Reduce Cassandra GC

2013-06-19 Thread Fabrice Facorat
2013/6/19 Takenori Sato: > GC options are not set. You should see the following. > > -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure > -Xloggc:/var/log/cassandra/gc-1371603607.log > >> Is it normal to have two processes like this? > > No. You are running two processes. It's "normal" as this i

RE: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread James Lee
The test tool I am using catches any exceptions on the original writes and resubmits the write request until it's successful (bailing out after 5 failures). So for each key Cassandra has reported a successful write. Nodetool says the following - I'm guessing the pending hinted handoff is the

Re: Reduce Cassandra GC

2013-06-19 Thread Joel Samuelsson
Right, after getting the GC logging information I tested upgrading to 1.2. Didn't help but I forgot to reenable the GC options. > No. You are running two processes. Ok, that's weird. I am using an unmodified version of a startup script in /etc/init.d/cassandra from the Debian package. Here's some

Re: Reduce Cassandra GC

2013-06-19 Thread Takenori Sato
GC options are not set. You should see the following. -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1371603607.log > Is it normal to have two processes like this? No. You are running two processes. On Wed, Jun 19, 2013 at 4:16 PM, Joel Samuelsson wrote: > M

Rolling upgrade from 1.1.12 to 1.2.5 visibility issue

2013-06-19 Thread Polytron Feng
Hi, We are trying to do a rolling upgrade from 1.0.12 to 1.2.5, but we found that the 1.2.5 node cannot see the other, older nodes. Therefore, we tried upgrading to 1.1.12 first, and that works. However, we still saw the same issue when doing a rolling upgrade from 1.1.12 to 1.2.5. This seems to be the fixed issue as htt

Re: Reduce Cassandra GC

2013-06-19 Thread Joel Samuelsson
My Cassandra ps info: root 26791 1 0 07:14 ? 00:00:00 /usr/bin/jsvc -user cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log -cp /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4

Re: Unit Testing Cassandra

2013-06-19 Thread Stephen Connolly
Unit testing means testing in isolation the smallest part. Unit tests should not take more than a few milliseconds to set up and verify their assertions. As such, if your code is not factored well for testing, you would typically use mocking (either by hand, or with mocking libraries) to mock out