Re: Re: Range search on keys not working?

2010-06-09 Thread Ben Browning
On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning ben...@gmail.com wrote: Martin, On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller martin.grabmuel...@eleven.de wrote: I think you can specify an end key, but it should be a key which does exist in your column family. Logically, it doesn't

Re: Seeds and AutoBootstrap

2010-06-09 Thread Ben Browning
There really aren't seed nodes in a Cassandra cluster. When you specify a seed in a node's configuration it's just a way to let it know how to find the other nodes in the cluster. A node functions the same whether it is another node's seed or not. In other words, all of the nodes in a cluster are
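The point above is that seeds are only a discovery hint in a node's configuration, not a special role. As an illustration only, a seed list in that era's storage-conf.xml looked roughly like this (the addresses are made up; any reachable cluster member works as a seed):

```xml
<!-- Fragment of storage-conf.xml (Cassandra 0.6-era format, sketch only).
     Every node lists one or more known peers here so it can find the ring
     on startup; the listed nodes are otherwise ordinary nodes. -->
<Seeds>
    <Seed>10.0.0.1</Seed>
    <Seed>10.0.0.2</Seed>
</Seeds>
```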

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Ben Browning
How many subcolumns are in each supercolumn, and how large are the values? Your example shows 8 subcolumns, but I didn't know if that was the actual number. I've been able to read columns out of Cassandra at a rate an order of magnitude higher than what you're seeing here, but there are too many variables

Re: Range search on keys not working?

2010-06-02 Thread Ben Browning
Martin, On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller martin.grabmuel...@eleven.de wrote: I think you can specify an end key, but it should be a key which does exist in your column family. Logically, it never makes sense to specify an end key with the random partitioner. If you
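The reason an end key is meaningless under the random partitioner: keys are placed on the ring by hash of the key, so ring order generally has nothing to do with lexical key order. A minimal sketch of the idea (using MD5 the way RandomPartitioner does; the `token` helper is illustrative, not Cassandra's actual code):

```python
import hashlib

def token(key: str) -> int:
    # RandomPartitioner positions each key on the ring by its MD5 hash,
    # so the scan order over keys is hash order, not lexical order.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = ["apple", "banana", "cherry"]
lexical_order = sorted(keys)
ring_order = sorted(keys, key=token)
print(lexical_order)
print(ring_order)
```

A key range like `("apple", "cherry")` therefore doesn't select a meaningful contiguous slice of the ring, which is why range queries by key only behave intuitively with an order-preserving partitioner.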

Re: Giant sets of ordered data

2010-06-02 Thread Ben Browning
I like to model this kind of data as columns, where the timestamps are the column names (longs, TimeUUIDs, or strings, depending on your usage). If you have too much data for a single row, you'd need to split it across multiple rows. For time-series data, it makes sense to use one row per
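A small sketch of that modeling idea, assuming long column names (milliseconds since the epoch) and one row per source per day; the `row_key`/`column_name` helpers and the `sensor1` naming are invented for illustration:

```python
from datetime import datetime, timezone

def column_name(ts: float) -> int:
    # Use a long (milliseconds since epoch) as the column name so
    # columns sort chronologically under a long comparator.
    return int(ts * 1000)

def row_key(source: str, ts: float) -> str:
    # Bucketing one row per source per day keeps any single row
    # from growing without bound.
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
    return f"{source}:{day}"

# Example: a reading taken on 2010-06-02 (UTC) lands in that day's row.
ts = 1275500000.0
print(row_key("sensor1", ts), column_name(ts))
```

Reading a time slice then becomes a column slice on one (or a few) day rows, which is exactly the access pattern column slices are good at.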

Re: Hadoop over Cassandra

2010-05-18 Thread Ben Browning
Maxim, Check out the getLocation() method in this file: http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java Basically, it loops over the list of nodes containing this split of data and, if any of them is the local node, it returns
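The selection logic described above can be sketched in a few lines. This is a rough Python paraphrase, not the actual Java code (which also resolves hostnames before comparing):

```python
def get_location(split_locations, local_addresses):
    # Paraphrase of the getLocation() idea: prefer a replica that is
    # this machine so the Hadoop task reads locally; otherwise fall
    # back to the first replica in the list.
    for host in split_locations:
        if host in local_addresses:
            return host
    return split_locations[0]

# Local replica available -> picked; none local -> first replica.
print(get_location(["10.0.0.2", "10.0.0.5"], {"10.0.0.5"}))
print(get_location(["10.0.0.2", "10.0.0.3"], {"10.0.0.9"}))
```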

Re: What is the optimal size of batch mutate batches?

2010-05-11 Thread Ben Browning
I like to base my batch sizes on the total number of columns instead of the number of rows. This effectively means counting the number of Mutation objects in your mutation map and submitting the batch once it reaches a certain size. For my data, batch sizes of about 25,000 columns work best.
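One way to sketch that counting strategy, assuming the Thrift-style mutation map shape `{row_key: {column_family: [mutation, ...]}}` (the helper name and threshold handling are my own, not from the post):

```python
def batches_by_column_count(mutation_map, max_columns=25000):
    # Split a mutation map into sub-maps, cutting a new batch whenever
    # adding the next row would push the Mutation count past max_columns.
    # Granularity is per row, so a single row larger than max_columns
    # still goes out as one oversized batch.
    batch, count = {}, 0
    for key, cf_map in mutation_map.items():
        n = sum(len(muts) for muts in cf_map.values())
        if count and count + n > max_columns:
            yield batch
            batch, count = {}, 0
        batch[key] = cf_map
        count += n
    if batch:
        yield batch

# Three rows of two columns each, capped at 4 columns per batch.
demo = {k: {"cf": ["m1", "m2"]} for k in ("a", "b", "c")}
batches = list(batches_by_column_count(demo, max_columns=4))
print([sorted(b) for b in batches])
```

Each yielded sub-map can then be handed to a single batch_mutate call; tuning `max_columns` per workload is the point of the post.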

Re: What is the optimal size of batch mutate batches?

2010-05-11 Thread Ben Browning
not the bottleneck. On Tue, May 11, 2010 at 8:31 AM, David Boxenhorn da...@lookin2.com wrote: Thanks a lot! 25,000 is a number I can work with. Any other suggestions? On Tue, May 11, 2010 at 3:21 PM, Ben Browning ben...@gmail.com wrote: I like to base my batch sizes off of the total number of columns