Re: Hackathon?!?

2010-03-11 Thread Jonathan Ellis
Ack, I agreed to speak at http://nosqleu.com/, I never did hear a final date but they put up a schedule online (April 20-22). But, 22 probably is a better date, and Eric and Stu are fully capable of representing Rackspace without me. :) -Jonathan On Wed, Mar 10, 2010 at 10:50 PM, Chris Goffinet

Re: problem with running simple example using cassandra-cli with 0.6.0-beta2

2010-03-11 Thread Bill Au
Yes, I was expecting the column names to come back as strings, the way they do with 0.5.1. Bill On Thu, Mar 11, 2010 at 12:03 AM, Jonathan Ellis jbel...@gmail.com wrote: I think he means how the column names are rendered as bytes but the values are strings. On Wed, Mar 10, 2010 at 5:22

Re: cassandra 0.6.0 beta 2 download contains beta 1?

2010-03-11 Thread Eric Evans
On Wed, 2010-03-10 at 13:39 -0500, Vick Khera wrote: On Wed, Mar 10, 2010 at 11:30 AM, Eric Evans eev...@rackspace.com wrote: apache-cassandra-0.6.0-beta1.jar apache-cassandra-0.6.0-beta2.jar Ugh, my bad. I must have failed to `clean' in between the aborted beta1 and beta2. The

Re: Effective allocation of multiple disks

2010-03-11 Thread Eric Evans
On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote: On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote: I would almost recommend just keeping things simple and removing multiple data directories from the config altogether and just documenting that you

Re: cassandra 0.6.0 beta 2 download contains beta 1?

2010-03-11 Thread Vick Khera
On Thu, Mar 11, 2010 at 12:53 PM, Eric Evans eev...@rackspace.com wrote: Yes, this is a new feature^H^H^H^H^Hcontroversy in that most of the third-party jars are no longer distributed by us, and must be fetched using `ant ivy-retrieve'. This is currently being disputed, see

Re: Effective allocation of multiple disks

2010-03-11 Thread Jonathan Ellis
Except that for a major compaction the whole thing gets put in one directory. That's the problem w/ the JBOD approach. On Thu, Mar 11, 2010 at 12:01 PM, Eric Evans eev...@rackspace.com wrote: On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote: On Wed, Mar 10, 2010 at 9:31 PM, Anthony

Re: SuperColumn.getSubColumns() ordering

2010-03-11 Thread Jonathan Ellis
it's ordered by the column name as determined by the subcolumn comparator you declared in the definition, yes On Thu, Mar 11, 2010 at 12:24 PM, Matteo Caprari matteo.capr...@gmail.com wrote: Hi. If I iterate over SuperColumn.getSubColumn(), do I get columns sorted by the column name?
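A minimal sketch of why the declared comparator matters, written in plain Python rather than against any Cassandra client: the same three 8-byte subcolumn names sort differently when compared as raw bytes than when compared as signed longs. The helper names are hypothetical stand-ins for a bytes-style and a long-style comparator, not Cassandra's implementation.

```python
import struct

def encode_long(n: int) -> bytes:
    """Encode an integer as an 8-byte big-endian signed long (illustrative name format)."""
    return struct.pack(">q", n)

def decode_long(name: bytes) -> int:
    """Decode an 8-byte big-endian signed long back to an integer."""
    return struct.unpack(">q", name)[0]

def sort_as_bytes(names):
    # Python compares bytes objects lexicographically by unsigned byte value.
    return sorted(names)

def sort_as_longs(names):
    # Compare by the decoded signed numeric value instead.
    return sorted(names, key=decode_long)

names = [encode_long(n) for n in (256, -1, 1)]
bytes_order = [decode_long(n) for n in sort_as_bytes(names)]
long_order = [decode_long(n) for n in sort_as_longs(names)]
print(bytes_order)  # [1, 256, -1]
print(long_order)   # [-1, 1, 256]
```

The negative value lands last under byte-wise comparison because its encoding starts with 0xff bytes, which is why the iteration order depends on the comparator chosen in the definition.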

client.get_count query

2010-03-11 Thread Sonny Heer
What does this query return? Is there a way to do a range query and get the row count? (e.g. row start = 'TOW', row end = 'TOWZ') Thanks

Re: cassandra 0.6.0 beta 2 download contains beta 1?

2010-03-11 Thread Eric Evans
On Thu, 2010-03-11 at 13:21 -0500, Vick Khera wrote: As a newcomer, I started by reading the wiki and following examples. The quick-start guide failed, so I just backed out of the beta to the released version. The wiki recommends using the beta release to protect against on-disk format

Re: client.get_count query

2010-03-11 Thread Eric Evans
On Thu, 2010-03-11 at 11:29 -0800, Sonny Heer wrote: What does this query return? It counts the number of columns in a row or super column. Try: http://wiki.apache.org/cassandra/API#get_count Is there a way to do a range query and get the row count? (e.g. row start = 'TOW', row end = 'TOWZ')
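To make the distinction concrete, here is a toy in-memory model using plain Python dictionaries, not the Thrift API: counting the columns in one row is a single lookup, whereas counting the rows in a key range means locating the range among the keys. The names `count_columns` and `count_rows_in_range` are hypothetical, and the sketch assumes keys are compared as plain strings.

```python
from bisect import bisect_left, bisect_right

# Toy data: row key -> {column name: value}. Values are empty on purpose.
rows = {
    "TOP": {"a": b""},
    "TOWA": {"c1": b"", "c2": b""},
    "TOWER": {"c1": b""},
    "TOY": {"x": b"", "y": b"", "z": b""},
}

def count_columns(rows, key):
    """Number of columns stored under one row key (0 if the row is absent)."""
    return len(rows.get(key, {}))

def count_rows_in_range(rows, start, end):
    """Number of row keys k with start <= k <= end, by string comparison."""
    keys = sorted(rows)
    return bisect_right(keys, end) - bisect_left(keys, start)

columns_in_towa = count_columns(rows, "TOWA")
rows_in_range = count_rows_in_range(rows, "TOW", "TOWZ")
print(columns_in_towa)  # 2
print(rows_in_range)    # 2 ("TOWA" and "TOWER")
```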

Re: client.get_count query

2010-03-11 Thread Jesse McConnell
I suspect you're looking for: https://issues.apache.org/jira/browse/CASSANDRA-653 cheers, jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Thu, Mar 11, 2010 at 13:44, Sonny Heer sonnyh...@gmail.com wrote: Thanks. Are there plans to implement a row count feature? I have a model which

Re: client.get_count query

2010-03-11 Thread Eric Evans
On Thu, 2010-03-11 at 11:44 -0800, Sonny Heer wrote: Thanks. Are there plans to implement a row count feature? Not that I'm aware of. I have a model which doesn't store any columns since I could potentially have a large # of columns. So all the valuable information has been moved into the

Re: Effective allocation of multiple disks

2010-03-11 Thread Anthony Molinaro
I'm still wondering what happens when you have something like 2 500GB disks, with 2 sstables which use up 250GB, one on each disk, then a major compaction occurs. Will it still compact and probably fill up a disk (especially with the 2x overhead of compaction mentioned either here or on the
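The scenario can be worked through as simple arithmetic. This sketch assumes, as noted elsewhere in the thread, that a major compaction writes its whole output into one data directory, and it assumes the worst case in which the merged output is as large as the inputs combined. The function name and the worst-case sizing are illustrative assumptions, not Cassandra's actual space check.

```python
def compaction_fits(capacity_gb, used_gb, sstable_sizes_gb, target_disk):
    """Return (fits, free_gb, output_gb) for writing the merged output to target_disk."""
    # Worst case: no overwritten or deleted data is discarded during the merge.
    output_gb = sum(sstable_sizes_gb)
    # Input sstables stay on disk until the compaction finishes.
    free_gb = capacity_gb[target_disk] - used_gb[target_disk]
    return output_gb <= free_gb, free_gb, output_gb

capacity = [500, 500]   # two 500GB disks
used = [250, 250]       # one 250GB sstable on each
fits, free_gb, output_gb = compaction_fits(capacity, used, [250, 250], target_disk=0)
print(fits, free_gb, output_gb)  # False 250 500
```

Under these assumptions the merged output would need up to 500GB on a disk with only 250GB free, so the compaction could not complete on either disk.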

Re: Use Case scenario: Keeping a window of data + online analytics

2010-03-11 Thread Bill Au
Daniel, Can you provide more information (an example would be very nice) on using batch_mutate deletes to build a time-series store in Cassandra? I have been reading up on batch_mutate from the Wiki: http://wiki.apache.org/cassandra/API It seems to me that since the outer map of

Re: client.get_count query

2010-03-11 Thread Sonny Heer
a lot. In the trillions (where each column name stores the valuable information and column values are empty). I read somewhere that column size should be in the single MB digits. Storing it in the key allows true horizontal scalability. Is this true? On Thu, Mar 11, 2010 at 11:59 AM, Eric

Re: Effective allocation of multiple disks

2010-03-11 Thread Ryan King
On Thu, Mar 11, 2010 at 10:45 AM, Jonathan Ellis jbel...@gmail.com wrote: Except that for a major compaction the whole thing gets put in one directory.  That's the problem w/ the JBOD approach. Even without major compaction, you can get significant imbalances in how much data is on each disk

question about deleting from cassandra

2010-03-11 Thread Bill Au
Let's take Twitter as an example. All the tweets are timestamped. I want to keep only a month's worth of tweets for each user. The number of tweets that fit within this one month window varies from user to user. What is the best way to accomplish this? There are millions of users. Do I need to
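One way to picture the pruning step, independent of any client library: given one user's tweets keyed by timestamp, select the keys older than the cutoff and delete only those. This is a sketch over an in-memory dictionary; `expired_timestamps` is a hypothetical helper, and "a month" is assumed to mean 30 days.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # assumption: one month is treated as 30 days

def expired_timestamps(tweets, now):
    """Return, oldest first, the timestamps that fall outside the window."""
    cutoff = now - WINDOW
    return sorted(ts for ts in tweets if ts < cutoff)

now = datetime(2010, 3, 11)
tweets = {
    datetime(2010, 1, 15): "old",
    datetime(2010, 2, 1): "older than 30 days",
    datetime(2010, 2, 20): "recent",
    datetime(2010, 3, 10): "newest",
}

to_delete = expired_timestamps(tweets, now)
for ts in to_delete:
    del tweets[ts]

print(len(to_delete))           # 2
print(sorted(tweets.values()))  # ['newest', 'recent']
```

Because the number of tweets inside the window varies per user, the selection is driven by the timestamp cutoff rather than by a fixed count.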

libcassandra - C++ Cassandra Client

2010-03-11 Thread Padraig O'Sullivan
We have developed a C++ client library based on the hector Java client for Cassandra that we intend on using for Drizzle integration. This library is still very much alpha and more features will be added while we work on drizzle integration. Connection pooling or failover is currently not

Re: libcassandra - C++ Cassandra Client

2010-03-11 Thread Jonathan Ellis
Cool! On Thu, Mar 11, 2010 at 11:12 PM, Padraig O'Sullivan osullivan.padr...@gmail.com wrote: We have developed a C++ client library based on the hector Java client for Cassandra that we intend on using for Drizzle integration. This library is still very much alpha and more features will be

Re: libcassandra - C++ Cassandra Client

2010-03-11 Thread Avinash Lakshman
How is Drizzle being integrated with Cassandra? Are there any resources on the Internet that I could read up? Thanks Avinash On Thu, Mar 11, 2010 at 8:12 PM, Padraig O'Sullivan osullivan.padr...@gmail.com wrote: We have developed a C++ client library based on the hector Java client for

Re: libcassandra - C++ Cassandra Client

2010-03-11 Thread Padraig O'Sullivan
On Thu, Mar 11, 2010 at 11:31 PM, Avinash Lakshman avinash.laksh...@gmail.com wrote: How is Drizzle being integrated with Cassandra? Are there any resources on the Internet that I could read up? The idea is to create a storage engine (along with some INFORMATION_SCHEMA tables probably) in

Re: libcassandra - C++ Cassandra Client

2010-03-11 Thread Alex Durgin
On Mar 11, 2010, at 10:51 PM, Padraig O'Sullivan osullivan.padr...@gmail.com wrote: On Thu, Mar 11, 2010 at 11:31 PM, Avinash Lakshman avinash.laksh...@gmail.com wrote: How is Drizzle being integrated with Cassandra? Are there any resources on the Internet that I could read up? The

Re: question about deleting from cassandra

2010-03-11 Thread Mark Robson
On 12 March 2010 03:34, Bill Au bill.w...@gmail.com wrote: Let's take Twitter as an example. All the tweets are timestamped. I want to keep only a month's worth of tweets for each user. The number of tweets that fit within this one month window varies from user to user. What is the best way