Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread John Sanda
session.execute blocks until C* returns the response. Use the async version, but do so with caution. If you don't throttle the requests, you will start seeing timeouts on the client side pretty quickly. For throttling I've used a Semaphore, but I think Guava's RateLimiter is better suited.
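
A minimal sketch of the Semaphore approach mentioned above, assuming a cap on in-flight requests is the goal. The fakeAsyncWrite method is a hypothetical stand-in for an asynchronous driver call and is not part of any driver API; the class name, pool size and limits are also illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledWrites {

    // Hypothetical stand-in for an asynchronous write; it only simulates work.
    static CompletableFuture<Void> fakeAsyncWrite(ExecutorService pool, int id) {
        return CompletableFuture.runAsync(() -> {
            Thread.yield(); // pretend to talk to the cluster
        }, pool);
    }

    // Issues totalRequests writes with at most maxInFlight outstanding at once.
    // Returns the highest number of in-flight requests that was observed.
    static int run(int totalRequests, int maxInFlight) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            for (int i = 0; i < totalRequests; i++) {
                permits.acquire(); // blocks the producer once the cap is reached
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                fakeAsyncWrite(pool, i).whenComplete((result, error) -> {
                    inFlight.decrementAndGet();
                    permits.release(); // free the slot on success or failure
                });
            }
            permits.acquire(maxInFlight); // wait for the remaining requests
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while throttling", e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight requests: " + run(1000, 16));
    }
}
```

The permit is released in the completion callback rather than after submission, so the producer slows down to the rate at which responses come back. A rate limiter would instead cap requests per second regardless of how many are outstanding.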

Re: nodetool repair keeping an empty cluster busy

2013-12-10 Thread Sven Stark
Corollary: what is getting shipped over the wire? The ganglia screenshot shows the network traffic on all the three hosts on which I ran the nodetool repair. [image: Inline image 1] remember UN 10.1.2.11 107.47 KB 256 32.9% 1f800723-10e4-4dcd-841f-73709a81d432 rack1 UN 10.1.2.10 127.

nodetool repair keeping an empty cluster busy

2013-12-10 Thread Sven Stark
Howdy! Not a matter of life or death, just curious. I've just stood up a three node cluster (v1.2.8) on three c3.2xlarge boxes in AWS. Silly me forgot the correct replication factor for one of the needed keyspaces. So I changed it via cli and ran a nodetool repair. Well .. there is no data at all

About Cassandra-Hadoop(Pig) Integration issue

2013-12-10 Thread pradeep kumar
Hello Cassandra users, For one of our new Big Data BI projects, we are using Apache Cassandra 1.2.10 as our primary data store with the support of Hadoop for analytics. For prototyping purposes we have 1 node each for Apache Cassandra/Hadoop. Pig is our choice to process the data from/to C*. B

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread graham sanderson
I can’t speak for Astyanax; their thrift transport I believe is abstracted out, however the object model is very CF wide row vs table-y. I have no idea what the plans are for further Astyanax dev (maybe someone on this list), but I believe the thrift API is not going away, so considering Astyan

2 nodes cassandra cluster raid10 or JBOD

2013-12-10 Thread cem
Hi all, I need to set up a 2-node Cassandra cluster. I know that Datastax recommends using JBOD as a disk configuration and relying on replication for redundancy. I was planning to use RAID 10, but using JBOD can save 50% disk space and increase performance. But I am not sure I should use JBOD wit

Re: list all nodes as seeds (excluding self)

2013-12-10 Thread Anne Sullivan
Comments added, not sure about the usefulness seeing as the issue is already "resolved" :)  -Anne On 12/10/2013 03:12 PM, Robert Coli wrote: On Tue, Dec 10, 2013 at 5:58 AM, Anne Sullivan wro

Re: Murmur Long.MIN_VALUE token allowed?

2013-12-10 Thread Robert Coli
On Tue, Dec 10, 2013 at 12:15 AM, horschi wrote: > And my feeling gets worse when I look at Murmur3Partitioner.normalize(). > This one explicitly excludes Long.MIN_VALUE by changing it to > Long.MAX_VALUE. > > I think I'll just avoid it in the future. Better safe than sorry... > I see, your ques

Re: list all nodes as seeds (excluding self)

2013-12-10 Thread Robert Coli
On Tue, Dec 10, 2013 at 5:58 AM, Anne Sullivan < anne.b.sulli...@alcatel-lucent.com> wrote: > My understanding is that a node won't auto-bootstrap if it thinks it's a > seed node. So when adding a new node to an existing cluster, I want to > make sure it will auto-bootstrap, and I don't want to

Re: Try to configure commitlog_archiving.properties

2013-12-10 Thread Robert Coli
On Tue, Dec 10, 2013 at 1:45 AM, Bonnet Jonathan < jonathan.bon...@externe.bnpparibas.com> wrote: > Taking a look at the code? I'm not a developer but a DBA, where should I > look? Thank you. > In all seriousness, if you plan to operate Cassandra, get used to the idea of reading Java source co

Re: Drop keyspace via CQL hanging on master/trunk.

2013-12-10 Thread Brian O'Neill
Great. Thanks Aaron. FWIW, I am/was porting Virgil over CQL. I should be able to release a new REST API for C* (using CQL) shortly. -brian --- Brian O'Neill Chief Architect Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 •

Re: Drop keyspace via CQL hanging on master/trunk.

2013-12-10 Thread Aaron Morton
Looks like a bug, will try to fix today https://issues.apache.org/jira/browse/CASSANDRA-6472 Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder & Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 6/12/2013, at 10:25 am, Brian O'Neill wrote

Re: Recurring actions with 4 hour interval

2013-12-10 Thread Nate McCall
Michael has a good point about the system tables - particularly hints and batching (though neither should be a real tax unless you have bigger issues). If you have a monitoring system, add in the flush counters for these and other higher traffic tables and see if there is a correlation at the four

Re: Exactly one wide row per node for a given CF?

2013-12-10 Thread Laing, Michael
You could shard your rows like the following. You would need over 100 shards, possibly... so testing is in order :) Michael -- put this in and run using 'cqlsh -f DROP KEYSPACE robert_test; CREATE KEYSPACE robert_test WITH replication = { 'class': 'SimpleStrategy', 'replication_facto

Re: Exactly one wide row per node for a given CF?

2013-12-10 Thread onlinespending
Where you’ll run into trouble is with compaction. It looks as if pord is some sequentially increasing value. Try your experiment again with a clustering key which is a bit more random at the time of insertion. On Dec 10, 2013, at 5:41 AM, Robert Wille wrote: > I have a question about this s

Re: Exactly one wide row per node for a given CF?

2013-12-10 Thread onlinespending
comments below On Dec 9, 2013, at 11:33 PM, Aaron Morton wrote: >> But this becomes troublesome if I add or remove nodes. What effectively I >> want is to partition on the unique id of the record modulus N (id % N; where >> N is the number of nodes). > This is exactly the problem consistent ha
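
To illustrate why placing records with id % N becomes troublesome when nodes are added or removed, here is a small sketch that counts how many keys land on a different node after N changes. The class and method names are hypothetical, and the key range and node counts are made up for the example.

```java
final class ModuloPlacement {

    // Counts the ids in [0, keyCount) whose node index (id % N) changes
    // when the cluster goes from oldNodes to newNodes.
    static int movedKeys(int keyCount, int oldNodes, int newNodes) {
        int moved = 0;
        for (int id = 0; id < keyCount; id++) {
            if (id % oldNodes != id % newNodes) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Growing from 3 to 4 nodes relocates 9 of the 12 keys.
        System.out.println(movedKeys(12, 3, 4) + " of 12 keys moved");
    }
}
```

Only ids 0, 1 and 2 stay put in this example, so most of the data would have to be reshuffled for a single added node.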

Re: Try to configure commitlog_archiving.properties

2013-12-10 Thread Bonnet Jonathan
Thanks a lot, it works; I see commit logs being archived. I'll try the restore command tomorrow. Thanks again. Bonnet Jonathan.

Re: Recurring actions with 4 hour interval

2013-12-10 Thread Laing, Michael
2.0.3: system tables have a 1 hour memtable_flush_period which I have observed to trigger compaction on the 4 hour mark. Going by memory tho... -ml On Tue, Dec 10, 2013 at 10:31 AM, Andre Sprenger wrote: > As far as I know there is nothing hard coded in Cassandra that kicks in > every 4 hours. T

Re: Recurring actions with 4 hour interval

2013-12-10 Thread Andre Sprenger
As far as I know there is nothing hard coded in Cassandra that kicks in every 4 hours. Turn on GC logging, maybe dump the output of jstat to a file and correlate this data with the Cassandra logs. Cassandra logs are pretty good at telling you what is going on. 2013/12/10 Joel Samuelsson > Hell

setting PIG_INPUT_INITIAL_ADDRESS environment variable in Oozie for cassandra ...¿?

2013-12-10 Thread Miguel Angel Martin junquera
Hi, I have an error with the pig action in oozie 4.0.0 using cassandraStorage (cassandra 1.2.10). I can run pig scripts fine with cassandra, but when I try to use cassandraStorage to load data I get this error: *Run pig script using PigRunner.run() for Pig version 0.8+* *Apache Pig version 0.10

Recurring actions with 4 hour interval

2013-12-10 Thread Joel Samuelsson
Hello, We've been having a lot of problems with extremely long GC (and still do) which I've asked about several times on this list (I can find links to those discussions if anyone is interested). We noticed a pattern that the GC pauses may be related to something happening every 4 hours. Is there

Re: list all nodes as seeds (excluding self)

2013-12-10 Thread Anne Sullivan
My understanding is that a node won't auto-bootstrap if it thinks it's a seed node.  So when adding a new node to an existing cluster, I want to make sure it will auto-bootstrap, and I don't want to do 2 edits to the config file (first start without node as seed, then add

Re: Exactly one wide row per node for a given CF?

2013-12-10 Thread Robert Wille
I have a question about this statement: When rows get above a few 10's of MB things can slow down, when they get above 50 MB they can be a pain, when they get above 100MB it's a warning sign. And when they get above 1GB, well you don't want to know what happens then. I tested a data model t

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread David Tinker
Hmm. I have read that the thrift interface to Cassandra is out of favour and the CQL interface is in. Where does that leave Astyanax? On Tue, Dec 10, 2013 at 1:14 PM, graham sanderson wrote: > Perhaps not the way forward, however I can bulk insert data via astyanax at a > rate that maxes out our

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread graham sanderson
I should probably give you a number, which is about 300 meg/s via the thrift API, using 1mb batches. On Dec 10, 2013, at 5:14 AM, graham sanderson wrote: > Perhaps not the way forward, however I can bulk insert data via astyanax at a > rate that maxes out our (fast) networks. That said for our ne

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread graham sanderson
Perhaps not the way forward, however I can bulk insert data via astyanax at a rate that maxes out our (fast) networks. That said for our next release (of this part of our product - our other current is node.js via binary protocol) we will be looking at insert speed via java driver, and also alte

What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread David Tinker
I have tried the DataStax Java driver and it seems the fastest way to insert data is to compose a CQL string with all parameters inline. This loop takes 2500ms or so on my test cluster: PreparedStatement ps = session.prepare("INSERT INTO perf_test.wibble (id, info) VALUES (?, ?)"); for (int i = 0;

Re: Try to configure commitlog_archiving.properties

2013-12-10 Thread Artur Kronenberg
Hi, There are some docs on the internet for this operation. It is basically as presented in the archive-commitlog file (commitlog_archiving.properties). The way the operations work: The operation is called automatically with parameters that give you control over what you want to do with it.

Re: Try to configure commitlog_archiving.properties

2013-12-10 Thread Bonnet Jonathan
Vicky Kak gmail.com> writes: > > > > >>Why, can you give me a good example and the good way to configure archive > >>commit logs ? > Take a look at the cassandra code ;) > > Taking a look at the code? I'm not a developer but a DBA, where should I look? Thank you. Regards, Bonnet Jona

Re: Murmur Long.MIN_VALUE token allowed?

2013-12-10 Thread horschi
Hi Aaron, thanks for your response. But that is exactly what scares me: RandomPartitioner.MIN is -1, which is not a valid token :-) And my feeling gets worse when I look at Murmur3Partitioner.normalize(). This one explicitly excludes Long.MIN_VALUE by changing it to Long.MAX_VALUE. I think I'll
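
The normalization described above can be sketched as follows. This is an illustration of the stated behaviour (Long.MIN_VALUE is replaced by Long.MAX_VALUE), not a copy of the Cassandra source; the class name is hypothetical.

```java
final class TokenNormalize {

    // Maps Long.MIN_VALUE to Long.MAX_VALUE and leaves every other value
    // unchanged, so Long.MIN_VALUE is never produced as a token.
    static long normalize(long hash) {
        return hash == Long.MIN_VALUE ? Long.MAX_VALUE : hash;
    }

    public static void main(String[] args) {
        System.out.println(normalize(Long.MIN_VALUE) == Long.MAX_VALUE); // true
        System.out.println(normalize(42L)); // 42
    }
}
```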