Re: Can I get all the query data back into memory?

2011-06-09 Thread Mark Kerzner
Thanks a bunch. Mark On Thu, Jun 9, 2011 at 10:26 PM, Jonathan Ellis wrote: > On Thu, Jun 9, 2011 at 9:50 PM, Mark Kerzner > wrote: > > Hi, > > when I am issuing some query, that returns a HashMap, does the whole > HashMap > > have to be in memory? > > Yes. > > > If so, it can easily use up all

Re: Secondary indices with multiple conditions?

2011-06-09 Thread Mark Kerzner
So in Hector I can call, for example, addGtExpression any number of times, correct? Internally, how is it implemented? Do we get the subset of data based on an indexed column, and then essentially scan the rest, with the Cassandra API or Hector providing the filtering? So this may be an expensive operatio

Re: Retrieving a column from a fat row vs retrieving a single row

2011-06-09 Thread aaron morton
Don't forget that reading at ONE may not mean that only 1 replica is involved in the request. Any get or multiget (not range scan) read that runs with ReadRepair enabled will be sent to all UP replicas. If the RR is disabled it will only be sent to as many replicas as needed for the CL. For CL
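
The rule described above can be sketched as a small function. This is only an illustration, not Cassandra code: the function name, its parameters, and the ValueError are made up for the example, and the QUORUM arithmetic (RF // 2 + 1) is the usual definition rather than something stated in this message.

```python
def replicas_contacted(replication_factor, consistency_level,
                       read_repair, up_replicas=None):
    """Return how many replicas a single get/multiget read is sent to.

    Hypothetical helper: models only the rule quoted in the message,
    not range scans and not any probabilistic read repair setting.
    """
    if up_replicas is None:
        up_replicas = replication_factor

    # Replicas that must answer for the read to succeed at this CL.
    required = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }[consistency_level]

    if required > up_replicas:
        raise ValueError("not enough live replicas for this consistency level")

    # With read repair the request goes to every UP replica;
    # without it, only to as many as the CL needs.
    return up_replicas if read_repair else required


print(replicas_contacted(3, "ONE", read_repair=True))
print(replicas_contacted(3, "ONE", read_repair=False))
```

So with RF=3, a read at ONE still touches all three live replicas when read repair is on, even though the client waits for only one response.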

Re: thread blocked from jconsole

2011-06-09 Thread Jonathan Ellis
That says "I'm idle waiting for a read to process." 2011/6/9 Donna Li : > > > Hello everyone: > > From the jconsole, I find many thread is blocked, but cassandra server > is normal, the result of nodetool cfstats and nodetool tpstats are as > following, is the cassandra server normal, should

Re: Secondary indices with multiple conditions?

2011-06-09 Thread Jonathan Ellis
Yes, with the restriction that at least one of the conditions must be = on an indexed column. See http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes for an example. On Thu, Jun 9, 2011 at 9:53 PM, Mark Kerzner wrote: > Hi, > if I am using Cassandra's secondary indices, or
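
The restriction above can be written as a quick check. This is a sketch only: the tuple layout and the function name are assumptions made for illustration and are not part of the Cassandra or Hector API.

```python
def is_valid_index_query(expressions, indexed_columns):
    """Check the rule: at least one condition must be '=' on an indexed column.

    expressions: list of (column, operator, value) tuples (hypothetical layout).
    indexed_columns: set of column names that have a secondary index.
    """
    return any(op == "=" and column in indexed_columns
               for column, op, _value in expressions)


indexed = {"column_1", "column_2", "state"}

# Only range conditions: rejected, even though both columns are indexed.
print(is_valid_index_query([("column_1", ">", 5), ("column_2", "<", 4)], indexed))

# One equality on an indexed column plus a range condition: accepted.
print(is_valid_index_query([("state", "=", "UT"), ("column_1", ">", 5)], indexed))
```

The column name "state" and its value are invented for the example; the two range conditions are the ones from the original question.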

Re: Can I get all the query data back into memory?

2011-06-09 Thread Jonathan Ellis
On Thu, Jun 9, 2011 at 9:50 PM, Mark Kerzner wrote: > Hi, > when I am issuing some query, that returns a HashMap, does the whole HashMap > have to be in memory? Yes. > If so, it can easily use up all memory? Is there some > cursor or paging provisions? Yes, that is what all the start_key parame
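
Paging with a start key can be sketched roughly as follows. Everything here is an assumption made for illustration: get_range is a hypothetical in-memory stand-in for a range query, not a Thrift or Hector call, and it returns rows in key order, which a real cluster does not do under the random partitioner.

```python
def get_range(rows, start_key, count):
    """Hypothetical stand-in for a range query.

    Returns up to `count` (key, value) pairs with key >= start_key,
    in key order. The start key is inclusive.
    """
    keys = sorted(k for k in rows if k >= start_key)
    return [(k, rows[k]) for k in keys[:count]]


def iter_all_rows(rows, page_size=100):
    """Yield every row while holding only one page in memory at a time."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2")
    start_key = ""
    first_page = True
    while True:
        page = get_range(rows, start_key, page_size)
        if not first_page:
            # The start key is inclusive, so the first row of each later
            # page was already returned as the last row of the previous one.
            page = page[1:]
        if not page:
            return
        for key, value in page:
            yield key, value
        start_key = page[-1][0]
        first_page = False


data = {"k%02d" % i: i for i in range(7)}
for key, value in iter_all_rows(data, page_size=3):
    print(key, value)
```

The client never holds more than page_size rows at once; the last key of each page becomes the start key of the next request.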

Re: Cassandra HDFS question

2011-06-09 Thread Jake Luciani
Hi JKnight, Yes. The Brisk project adds an HDFS-compatible layer for Cassandra; see http://github.com/riptano/brisk -Jake On Thu, Jun 9, 2011 at 11:05 PM, JKnight JKnight wrote: > Dear all, > > Does Cassandra support HDFS storage? > > Thank a lot for support. > > -- > Best regards, > JKnight >

thread blocked from jconsole

2011-06-09 Thread Donna Li
Hello everyone: In jconsole, I see that many threads are blocked, but the Cassandra server appears normal. The results of nodetool cfstats and nodetool tpstats are as follows. Is the Cassandra server normal, and should I care about the blocked-thread info in jconsole? A blocked thread from jconsole:

Cassandra HDFS question

2011-06-09 Thread JKnight JKnight
Dear all, Does Cassandra support HDFS storage? Thanks a lot for your support. -- Best regards, JKnight

Secondary indices with multiple conditions?

2011-06-09 Thread Mark Kerzner
Hi, if I am using Cassandra's secondary indices, or even if I am doing it myself following Ed Anuff's advice, can I do multiple slices? That is, how do I imitate a SQL query of where column_1 > 5 and column_2 < 4 and so on, up to d

Can I get all the query data back into memory?

2011-06-09 Thread Mark Kerzner
Hi, when I issue a query that returns a HashMap, does the whole HashMap have to be in memory? If so, can't it easily use up all memory? Are there any cursor or paging provisions? Thank you very much. Mark

Re: need some help with counters

2011-06-09 Thread aaron morton
I may be missing something, but could you use a column for each of the last 48 hours, all in the same row for a URL? e.g. { "/url.com/hourly" : { "20110609T01:00:00" : 456, "20110609T02:00:00" : 4567, } } Increment the current hour only. Delete the
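
The row layout suggested above can be modelled in memory like this. It is a sketch under stated assumptions: plain dictionaries stand in for a counter column family, and the function names are invented for the example. The column-name format follows the "20110609T01:00:00" sample in the message.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Column names look like "20110609T01:00:00": one column per hour.
HOUR_FORMAT = "%Y%m%dT%H:00:00"

# row key -> {hour column name -> count}; stand-in for a counter column family.
rows = defaultdict(lambda: defaultdict(int))


def record_view(row_key, when):
    """Increment only the column for the hour that `when` falls in."""
    rows[row_key][when.strftime(HOUR_FORMAT)] += 1


def views_in_last_hours(row_key, now, hours=48):
    """Sum the counters for the most recent `hours` hourly columns."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    names = [(top_of_hour - timedelta(hours=i)).strftime(HOUR_FORMAT)
             for i in range(hours)]
    row = rows[row_key]
    return sum(row.get(name, 0) for name in names)


key = "/url.com/hourly"
record_view(key, datetime(2011, 6, 9, 1, 15))
record_view(key, datetime(2011, 6, 9, 2, 5))
record_view(key, datetime(2011, 6, 9, 2, 40))
print(dict(rows[key]))
print(views_in_last_hours(key, datetime(2011, 6, 9, 2, 59)))
```

Because the column names sort chronologically, the last 48 hours are a contiguous slice of the row, and older columns can be identified by name alone.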

Re: Best way to import data from Cassandra 0.6 to 0.8

2011-06-09 Thread aaron morton
If you are talking about an upgrade, AFAIK 0.8 can read 0.6 SSTables, but you will need to migrate the schema as described in the 0.7 upgrade info here: https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/NEWS.txt#L134 If you want to do a bulk dump / load, take a look at sstable2json. Hope tha

Re: need some help with counters

2011-06-09 Thread Ian Holsman
So would doing something like storing it in reverse (so I know what to delete) work? Or is storing a million columns in a supercolumn impossible? I could always use a logfile and run the archiver off that as a worst case, I guess. Would doing so many deletes screw up the db/cause other problems

Re: need some help with counters

2011-06-09 Thread Ryan King
On Thu, Jun 9, 2011 at 1:06 PM, Ian Holsman wrote: > Hi Ryan. > you wouldn't have your version of cassandra up on github would you?? No, and the patch isn't in our version yet either. We're still working on it. -ryan

Re: caution for restarting CassandraDaemon in junit

2011-06-09 Thread Thijssen, Ron
This is kind of related to the issue I filed: https://issues.apache.org/jira/browse/CASSANDRA-2724 Forkmode works for an ordinary maven build. But when including the sonar plugin, the forkmode is ignored in its analysis. A REAL workaround is to manually start and stop a dedicated cassandra in yo

Re: need some help with counters

2011-06-09 Thread Ian Holsman
Hi Ryan. you wouldn't have your version of cassandra up on github would you?? Colin.. always a pleasure. On Jun 9, 2011, at 3:44 PM, Ryan King wrote: > On Thu, Jun 9, 2011 at 12:41 PM, Ian Holsman wrote: >> Hi. >> >> I had a brief look at CASSANDRA-2103 (expiring counter columns), and I was >

Re: need some help with counters

2011-06-09 Thread Yang
Something like this: https://issues.apache.org/jira/browse/CASSANDRA-2103 but this turns out not to be feasible. On Thu, Jun 9, 2011 at 12:41 PM, Ian Holsman wrote: > Hi. > > I had a brief look at CASSANDRA-2103 (expiring counter columns), and I wa

Re: need some help with counters

2011-06-09 Thread Colin
Hey guy, have you tried amazon turk? -- Colin Clark +1 315 886 3422 cell +1 701 212 4314 office http://cloudeventprocessing.com http://blog.cloudeventprocessing.com @EventCloudPro *Sent from Star Trek like flat panel device, which although larger than my Star Trek like communicator device, may h

Re: need some help with counters

2011-06-09 Thread Ryan King
On Thu, Jun 9, 2011 at 12:41 PM, Ian Holsman wrote: > Hi. > > I had a brief look at CASSANDRA-2103 (expiring counter columns), and I was > wondering if anyone can help me with my problem. > > I want to keep some page-view stats on a URL at different levels of > granularity (page views per hour,

need some help with counters

2011-06-09 Thread Ian Holsman
Hi. I had a brief look at CASSANDRA-2103 (expiring counter columns), and I was wondering if anyone can help me with my problem. I want to keep some page-view stats on a URL at different levels of granularity (page views per hour, page views per day, page views per year etc etc). so my thinkin

caution for restarting CassandraDaemon in junit

2011-06-09 Thread Yang
I'm doing a bunch of tests for my code that uses Cassandra. I have 2 test classes; each of them sets up a thrift.CassandraDaemon in @BeforeClass and activates it in @AfterClass. When I run them separately, they both work fine; if I run them together with "mvn test", the latter one fails. It turns

Re: Ideas for Big Data Support

2011-06-09 Thread Ryan King
On Thu, Jun 9, 2011 at 7:40 AM, Edward Capriolo wrote: > > > On Thu, Jun 9, 2011 at 4:23 AM, AJ wrote: >> >> [Please feel free to correct me on anything or suggest other workarounds >> that could be employed now to help.] >> >> Hello, >> >> This is purely theoretical, as I don't have a big workin

Re: Purge Data

2011-06-09 Thread Jonathan Ellis
Remember that you should have many more rows of size X than you have cluster nodes for any value of X. So if you have a data model where a handful of rows may have > 2B columns but the rest will be much smaller, you should probably rethink that. On Thu, Jun 9, 2011 at 10:51 AM, Bahadur, Kamal wr

Re: Removing zombie node

2011-06-09 Thread Jonathan Ellis
Short version is, it's harmless. A full cluster restart (all at once) would clear it out though. On Thu, Jun 9, 2011 at 10:57 AM, Marcus Bointon wrote: > A while ago I removed a node (on EC2) with decommission and removed its > token. It all seemed happy at the time, but on creating a column fa

Re: Purge Data

2011-06-09 Thread Jeremy Hanna
Have you looked at the TTL column feature in 0.7? http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns Those will automatically expire columns after a certain time period - not when you near the column limit, but might be helpful for you. On Jun 9, 2011, at 10:51 AM, Bahadur

Removing zombie node

2011-06-09 Thread Marcus Bointon
A while ago I removed a node (on EC2) with decommission and removed its token. It all seemed happy at the time, but on creating a column family, I get this: Waiting for schema agreement... Warning: unreachable nodes ... schemas agree across the cluster The IP it lists is not in the ring. How do

Purge Data

2011-06-09 Thread Bahadur, Kamal
From the documentation, I learned that there is a limit on the maximum number (2 billion) of columns that a column family can have. My question is, is there a way to purge the old columns when the number of columns is nearing the 2 billion mark? Thanks, Kamal

Re: Ideas for Big Data Support

2011-06-09 Thread AJ
On 6/9/2011 8:40 AM, Edward Capriolo wrote: Some of these things are challenges, and a few are being worked on in one way or another. 1) Dynamic snitch was implemented to determine slow acting nodes and re-balance load. 2) You can budget bootstrap with rsync, as long as you know what dat

Re: fixing unbalanced cluster !?

2011-06-09 Thread Jonathan Colby
Thanks Ben. That's what I was afraid I had to do. I can see how it's a lot easier if you simply double the cluster when adding capacity. Jon On Jun 9, 2011, at 4:44 PM, Benjamin Coverston wrote: > Because you were able to successfully run repair you can follow up with a > nodetool cleanup

Re: Data directories

2011-06-09 Thread Héctor Izquierdo Seliva
I'm actually using it on a couple of nodes, but it is slower than directly accessing the data on an SSD. On Thu, 09-06-2011 at 11:10 -0400, Chris Burroughs wrote: > On 06/08/2011 05:54 AM, Héctor Izquierdo Seliva wrote: > > Is there a way to control what sstables go to what data directory? I > >

Best way to import data from Cassandra 0.6 to 0.8

2011-06-09 Thread JKnight JKnight
Dear all, Could you tell me the best way to import data from Cassandra 0.6 to 0.8? Thank you very much. -- Best regards, JKnight

Re: Data directories

2011-06-09 Thread Chris Burroughs
On 06/08/2011 05:54 AM, Héctor Izquierdo Seliva wrote: > Is there a way to control what sstables go to what data directory? I > have a fast but space limited ssd, and a way slower raid, and i'd like > to put latency sensitive data into the ssd and leave the other data in > the raid. Is this possibl

Re: fixing unbalanced cluster !?

2011-06-09 Thread Benjamin Coverston
Because you were able to successfully run repair, you can follow up with a nodetool cleanup, which will get rid of some of the extraneous data on that (bigger) node. You're also assured after you run repair that entropy between the nodes is minimal. Assuming you're using the random ordered partit

Re: Ideas for Big Data Support

2011-06-09 Thread Edward Capriolo
On Thu, Jun 9, 2011 at 4:23 AM, AJ wrote: > [Please feel free to correct me on anything or suggest other workarounds > that could be employed now to help.] > > Hello, > > This is purely theoretical, as I don't have a big working cluster yet and > am still in the planning stages, but from what I u

Re: Retrieving a column from a fat row vs retrieving a single row

2011-06-09 Thread Richard Low
2011/6/9 Héctor Izquierdo Seliva : > Yeah, but if I have RF=3 then there are three nodes that can answer the > request right? Yes, if you're happy to read at ConsistencyLevel.ONE.

fixing unbalanced cluster !?

2011-06-09 Thread Jonathan Colby
I got myself into a situation where one node (10.47.108.100) has a lot more data than the other nodes. In fact, the 1 TB disk on this node is almost full. I added 3 new nodes and let cassandra automatically calculate new tokens by taking the highest loaded nodes. Unfortunately there is still

Re: Retrieving a column from a fat row vs retrieving a single row

2011-06-09 Thread Héctor Izquierdo Seliva
On Thu, 09-06-2011 at 13:28 +0200, Richard Low wrote: > Remember also that partitioning is done by rows, not columns. So > large rows are stored on a single host. This means they can't be load > balanced and also all requests to that row will hit one host. Having > separate rows will allow

Re: Retrieving a column from a fat row vs retrieving a single row

2011-06-09 Thread Richard Low
Remember also that partitioning is done by rows, not columns. So large rows are stored on a single host. This means they can't be load balanced and also all requests to that row will hit one host. Having separate rows will allow load balancing of I/Os. -- Richard Low Acunu | http://www.acunu.c
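
A toy illustration of why this is so: placement is decided from the row key alone, so the columns inside a row never influence which host is chosen. The sketch below is a simplification made up for this example. It hashes the row key with MD5 and takes the result modulo the node count, whereas a real cluster assigns token ranges on a ring; the node names and keys are hypothetical.

```python
import hashlib


def node_for_row(row_key, nodes):
    """Pick a node from the row key only (simplified stand-in for a token ring)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest, "big") % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names

# One fat row: every column read or write for it resolves to the same node.
print(node_for_row("all_users", nodes))

# Many small rows: each key is hashed separately, so requests can spread out.
for user_id in range(5):
    key = "user:%d" % user_id
    print(key, node_for_row(key, nodes))
```

However many columns "all_users" accumulates, every request for it maps to one place, while the per-user rows are each placed independently.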

Re: Is there a way from a running Cassandra node to determine whether or not itself is "up"?

2011-06-09 Thread aaron morton
None via thrift that I can recall, but the StorageService MBean exposes getLiveNodes(); this is what nodetool uses to see which nodes are live. From the code... /** * Retrieve the list of live nodes in the cluster, where "liveness" is * determined by the failure detector of the node

Re: after a while nothing happening with repair

2011-06-09 Thread Sasha Dolgy
I recall having this issue when one of the nodes wasn't available ... or there was a problem during the repair process. Cancelling the repair job and rerunning it would complete successfully. I believe there is a bug open for this https://issues.apache.org/jira/browse/CASSANDRA-2290 On Thu, Jun

after a while nothing happening with repair

2011-06-09 Thread Jonathan Colby
When I run repair on a node in my 0.7.6-2 cluster, the repair starts to stream data and activity is seen in the logs. However, after a while (a day or so) it seems like everything freezes up. The repair command is still running (the command prompt has not returned) and netstats shows output s

Ideas for Big Data Support

2011-06-09 Thread AJ
[Please feel free to correct me on anything or suggest other workarounds that could be employed now to help.] Hello, This is purely theoretical, as I don't have a big working cluster yet and am still in the planning stages, but from what I understand, while Cass scales well horizontally, EACH