Re: schema design question

2010-03-09 Thread Matteo Caprari
Thanks Jonathan. Correct me if I'm wrong: you are suggesting that each time we receive a new row (item, [users]) we do 2 operations: 1) insert (or merge) this row 'as it is' (item, [users]) 2) for each user in [users]: insert (user, [item]) Each incoming item is liked by 100 users, so it would be
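
The two-step write described in this message can be sketched with plain dictionaries standing in for two column families. The names item_to_users, user_to_items and record_likes are hypothetical, and this is only an illustration of the write fan-out under that assumption, not Cassandra client code.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families.
item_to_users = defaultdict(set)  # row key: item, columns: users
user_to_items = defaultdict(set)  # row key: user, columns: items


def record_likes(item, users):
    """Apply one incoming (item, [users]) row and return the insert count."""
    # 1) insert (or merge) the row as it is
    item_to_users[item].update(users)
    writes = 1
    # 2) for each user, insert the inverted entry (user, [item])
    for user in users:
        user_to_items[user].add(item)
        writes += 1
    return writes
```

With this layout, one incoming item liked by 100 users costs 1 + 100 = 101 inserts.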

Re: schema design question

2010-03-09 Thread Matteo Caprari
On Tue, Mar 9, 2010 at 1:23 PM, Jonathan Ellis wrote: > One quad-core node can handle ~14000 inserts per second so you are in > good shape. Well, yeah! >> instead of 'all users that liked N items'? > > That's true.  So you'd want to use a custom comparator where first 64 > bits is the Long and t

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Sylvain Lebresne
On Tue, Mar 9, 2010 at 2:52 PM, Jonathan Ellis wrote: > By "reads" do you mean what stress.py counts (rows) or rows * columns? >  If it is rows, then you are still actually reading more columns/s in > case 2. Well, unless I'm mistaken, that's the same in my example as I give in both cases to stre

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 7:15 AM, Sylvain Lebresne wrote: >  1) stress.py -t 10 -o read -n 5000 -c 1 -r >  2) stress.py -t 10 -o read -n 50 -c 1 -r > > In the case 1) I get around 200 reads/seconds and that's pretty stable. The > disk is spinning like crazy (~25% io_wait), very few cpu or me

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 8:31 AM, Sylvain Lebresne wrote: > Well, unless I'm mistaken, that's the same in my example as I give in > both cases > to stress.py the option '-c 1' which tells it to retrieve only one > column each time > even in the case where I have 100 columns per row. Oh. Why would y

Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Sylvain Lebresne
Hello, I've done some tests and it seems that somehow to have more rows with few columns is better than to have fewer rows with more columns, at least as far as read performance is concerned. Using stress.py, on a quad core 2.27 GHz with 4 GB RAM and the out of the box cassandra configuration, I in

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Jesse McConnell
in my experience #2 will work well up to a point where it will trigger a limitation of cassandra (slated to be resolved in .7 \o/) where all of the columns under a given key must be able to fit into memory. For things like indexes of data I have opted to shard the keys for really large data sets t
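
The message is cut off before it says how the keys are sharded, so the following is only one possible reading: spread a logical row over a fixed number of physical row keys so that no single key holds more columns than fit in memory. NUM_SHARDS, shard_key and all_shard_keys are hypothetical names, and the CRC-based bucketing is an assumption, not the poster's method.

```python
import zlib

NUM_SHARDS = 16  # hypothetical; pick it so one shard's columns fit in memory


def shard_key(base_key, column_name, num_shards=NUM_SHARDS):
    """Pick the physical row key that holds a given column."""
    bucket = zlib.crc32(column_name.encode("utf-8")) % num_shards
    return "%s:%02d" % (base_key, bucket)


def all_shard_keys(base_key, num_shards=NUM_SHARDS):
    """List every physical row key to query for the whole logical row."""
    return ["%s:%02d" % (base_key, n) for n in range(num_shards)]
```

Writes go to shard_key(...) and a full read fans out over all_shard_keys(...), which trades extra row lookups for a bounded row size.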

Re: schema design question

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 3:53 AM, Matteo Caprari wrote: > Thanks Jonathan. > > Correct me if I'm wrong: you are suggesting that each time we receive a new > row (item, [users]) we do 2 operations: > > 1) insert (or merge) this row 'as it is' (item, [users]) > 2) for each user in [users]: insert (user,

another ConcurrentModificationException

2010-03-09 Thread B. Todd Burruss
using cassandra-0.6.0-beta2/ 2010-03-09 09:17:26,827 ERROR [pool-1-thread-675] [Cassandra.java:1166] Internal error processing get java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.n

Re: Cassandra hardware - balancing CPU/memory/iops/disk space

2010-03-09 Thread B. Todd Burruss
our dataset is too big to fit into cache, so we are hitting disk. not a problem for normal operation, but when a node is restored, hinted handoff, load balanced, or if reads/writes simply build up we see a problem. the nodes can't seem to catch up. this seems to be centered around drive seek

Re: another ConcurrentModificationException

2010-03-09 Thread Jonathan Ellis
Cool, you're doing a great job finding these. :) Can you create a ticket? On Tue, Mar 9, 2010 at 11:57 AM, B. Todd Burruss wrote: > using cassandra-0.6.0-beta2/ > > > 2010-03-09 09:17:26,827 ERROR [pool-1-thread-675] [Cassandra.java:1166] > Internal error processing get > java.util.ConcurrentMod

Re: another ConcurrentModificationException

2010-03-09 Thread B. Todd Burruss
np, you give me free software, i give you free testing ;) i have some more so i'll just create tix and send them along i just switched to using thunderbird and any new messages i send to the list are being flagged as spam. i have no problems with evolution. anyone have an idea? (i can repl

new bug tix

2010-03-09 Thread B. Todd Burruss
these are both ConcurrentModificationExceptions https://issues.apache.org/jira/browse/CASSANDRA-864 https://issues.apache.org/jira/browse/CASSANDRA-865 this one is an AssertionError https://issues.apache.org/jira/browse/CASSANDRA-866

Re: Cassandra hardware - balancing CPU/memory/iops/disk space

2010-03-09 Thread Jesse McConnell
let us know how the SSD's pan out, I am curious about that as well cheers, jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Tue, Mar 9, 2010 at 12:08, B. Todd Burruss wrote: > our dataset is too big to fit into cache, so we are hitting disk.  not a > problem for normal operation, but whe

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Sylvain Lebresne
Alright, What I'm observing shows better with bigger columns, so I've slightly modified the stress.py test so that it inserts columns of 50K bytes (I attach the modified stress.py for info but it really just reads 5 bytes from /dev/null and uses that as data. I also added a sleep to the insert ot

no longer in storage-conf.xml in 0.6

2010-03-09 Thread Bill Au
I am checking out the 0.6 release since I need the batch_mutate command. I noticed that is no longer in storage-conf.xml for 0.6. Is that not used anymore? Or is that not configurable anymore? If it is still used but not configurable, how do I run multiple instances of Cassandra on a single ma

Re: no longer in storage-conf.xml in 0.6

2010-03-09 Thread Jonathan Ellis
It's no longer used. And it was always assumed that ControlPort and StoragePort are the same across all instances; you run multiple instances on a single machine by varying the IP address, not the ports. On Tue, Mar 9, 2010 at 1:21 PM, Bill Au wrote: > I am checking out the 0.6 release since I n

Re: schema design question

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 7:30 AM, Matteo Caprari wrote: > On Tue, Mar 9, 2010 at 1:23 PM, Jonathan Ellis wrote: >> That's true.  So you'd want to use a custom comparator where first 64 >> bits is the Long and the rest is the userid, for instance. >> >> (Long + something else is common enough that w
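
The ordering suggested here (first 64 bits hold the Long, the rest is the userid) can be sketched in Python. A real Cassandra comparator would be a Java class on the server; pack_name, sort_key, compare_names and sort_names are hypothetical helpers that only illustrate the byte layout and the resulting sort order, assuming the Long is a non-negative count.

```python
import functools
import struct


def pack_name(count, user_id):
    """Build a column name: 8-byte big-endian unsigned long, then the user id."""
    return struct.pack(">Q", count) + user_id.encode("utf-8")


def sort_key(name):
    """Split a column name into (long, remaining user id bytes)."""
    (count,) = struct.unpack(">Q", name[:8])
    return count, name[8:]


def compare_names(a, b):
    """Comparator: order by the long first, then by the user id bytes."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)


def sort_names(names):
    """Sort column names the way the comparator above would order them."""
    return sorted(names, key=functools.cmp_to_key(compare_names))
```

Putting the count first means a slice over column names returns users grouped by count, and the userid suffix keeps names unique when two users share the same count.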

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Brandon Williams
On Tue, Mar 9, 2010 at 1:14 PM, Sylvain Lebresne wrote: > I've inserted 1000 rows of 100 columns each (python stress.py -t 2 -n > 1000 -c 100 -i 5) > If I read, I get roughly the same number of rows whether I read the > whole row > (python stress.py -t 10 -n 1000 -o read -r -c 100) or only the f

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Sylvain Lebresne
> A row causes a disk seek while columns are contiguous.  So if the row isn't > in the cache, you're being impaired by the seeks.  In general, fatter rows > should be more performant than skinny ones. Sure, I understand that. Still, I get 400 columns per second (i.e., 400 seeks per second) when the

cassandra 0.6.0 beta 2 download contains beta 1?

2010-03-09 Thread Omer van der Horst Jansen
The apache-cassandra-0.6.0-beta2-bin.tar.gz download contains both these files in the apache-cassandra-0.6.0-beta2/lib directory: apache-cassandra-0.6.0-beta1.jar apache-cassandra-0.6.0-beta2.jar Given the way the classpath is constructed, it's possible that anyone using this download is actual
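
A quick way to spot this kind of problem is to scan a lib directory listing for the same artifact at more than one version. The sketch below assumes jar names follow an artifact-version.jar pattern; JAR_PATTERN and find_duplicate_jars are hypothetical names and not part of any Cassandra tooling.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <artifact>-<version>.jar, version starts with a digit,
# e.g. apache-cassandra-0.6.0-beta2.jar
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_duplicate_jars(filenames):
    """Return {artifact: [versions]} for artifacts present in several versions."""
    versions = defaultdict(set)
    for name in filenames:
        match = JAR_PATTERN.match(name)
        if match:
            versions[match.group("artifact")].add(match.group("version"))
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}
```

For example, find_duplicate_jars(os.listdir("apache-cassandra-0.6.0-beta2/lib")) (after import os) would report apache-cassandra with both 0.6.0-beta1 and 0.6.0-beta2 for the download described in this message.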

Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-09 Thread Brandon Williams
On Tue, Mar 9, 2010 at 2:28 PM, Sylvain Lebresne wrote: > > A row causes a disk seek while columns are contiguous. So if the row > isn't > > in the cache, you're being impaired by the seeks. In general, fatter > rows > > should be more performant than skinny ones. > > Sure, I understand that. S

IllegalStateException: Queue full

2010-03-09 Thread Todd Burruss
using tip of 0.6 branch with 864.txt patch. i have 4 nodes, one node is overcome with compaction right now. i started with no load then added a tiny bit of load and almost immediately got these errors on the other 3 nodes. 2010-03-09 16:05:43,004 ERROR [RESPONSE-STAGE:982] [CassandraDaemon.jav

Re: IllegalStateException: Queue full

2010-03-09 Thread Jonathan Ellis
v2 of patch attached to #864 (replaces old one) On Tue, Mar 9, 2010 at 6:08 PM, Todd Burruss wrote: > using tip of 0.6 branch with 864.txt patch.  i have 4 nodes, one node is > overcome with compaction right now.  i started with no load then added a tiny > bit of load and almost immediately got

Re: Hackathon?!?

2010-03-09 Thread Dan Di Spaltro
Alright guys, we have settled on a date for the Cassandra meetup on... April 15th, better known as, Tax day! We can host it here at Cloudkick, unless a cooler startup wants to host it. http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=100290781618196563860.000478354937656785449&z=19

Re: Hackathon?!?

2010-03-09 Thread Jonathan Ellis
I can make it. \o/ On Tue, Mar 9, 2010 at 8:05 PM, Dan Di Spaltro wrote: > Alright guys, we have settled on a date for the Cassandra meetup on... > April 15th, better known as, Tax day! > We can host it here at Cloudkick, unless a cooler startup wants to host it. > http://maps.google.com/maps/ms?

Re: Hackathon?!?

2010-03-09 Thread Jeff Hodges
I'm down. -- Jeff On Tue, Mar 9, 2010 at 6:18 PM, Jonathan Ellis wrote: > I can make it. \o/ > > On Tue, Mar 9, 2010 at 8:05 PM, Dan Di Spaltro > wrote: >> Alright guys, we have settled on a date for the Cassandra meetup on... >> April 15th, better known as, Tax day! >> We can host it here at C

Re: Hackathon?!?

2010-03-09 Thread Stu Hood
Definitely on board! -Original Message- From: "Dan Di Spaltro" Sent: Tuesday, March 9, 2010 8:05pm To: cassandra-user@incubator.apache.org Subject: Re: Hackathon?!? Alright guys, we have settled on a date for the Cassandra meetup on... April 15th, better known as, Tax day! We can host

Re: atomicity across keys and secondary index support

2010-03-09 Thread Patricio Echagüe
Hey Jonathan, has there been any update on this feature? Thanks a lot Patricio On Thu, Dec 3, 2009 at 2:35 PM, Jonathan Ellis wrote: > that is still very firmly in the category of "future work." > > 2009/12/3 Patricio Echagüe : > > Hi all, I was reading the original paper[1] looking for answers

Re: atomicity across keys and secondary index support

2010-03-09 Thread Jonathan Ellis
Atomicity: no. 2ary indexes: CASSANDRA-749 is targeting the 0.8 release 2010/3/9 Patricio Echagüe : > Hey Jonathan, has there been any update on this feature? > > Thanks a lot > Patricio > > On Thu, Dec 3, 2009 at 2:35 PM, Jonathan Ellis wrote: >> >> that is still very firmly in the category of

Re: Hackathon?!?

2010-03-09 Thread Dan Di Spaltro
Great, that would probably get us a lot more room. Sweet, so it's settled, we'll do it at Digg WHQ! On Tue, Mar 9, 2010 at 9:13 PM, Chris Goffinet wrote: > +1 from Digg if you wanna have it at our place as well, got the OK from the > boss. > > -Chris > > On Mar 9, 2010, at 6:05 PM, Dan Di Spalt

Re: Hackathon?!?

2010-03-09 Thread Ryan King
I'm already committed to talking about cassandra that day at our company's developer conference (chirp.twitter.com). -ryan On Tue, Mar 9, 2010 at 6:26 PM, Jeff Hodges wrote: > I'm down. > -- > Jeff > > On Tue, Mar 9, 2010 at 6:18 PM, Jonathan Ellis wrote: >> I can make it. \o/ >> >> On Tue, Mar

Re: Hackathon?!?

2010-03-09 Thread Jeff Hodges
Ah, hell. Thought this was the first day. Can't make it. -- Jeff On Mar 9, 2010 9:32 PM, "Ryan King" wrote: I'm already committed to talking about cassandra that day at our company's developer conference (chirp.twitter.com). -ryan On Tue, Mar 9, 2010 at 6:26 PM, Jeff Hodges wrote: > I'm down