Re: Hot, large row

2014-07-24 Thread DuyHai Doan
What are your JVM settings? Your read pattern implies that you may fetch lots of data into memory (reading all skus for a given user), which may have put too much stress on the JVM. Did you use the native paging of the Java Driver to avoid loading all columns at once? And the loading all skus for one user, is i

Re: here's a good example of poor cassandra documentation.

2014-07-24 Thread Jack Krupansky
Thanks. I’ll pass it along to the doc team. -- Jack Krupansky From: Kevin Burton Sent: Thursday, July 24, 2014 6:34 PM To: user@cassandra.apache.org Subject: here's a good example of poor cassandra documentation. so searching google for "cassandra leveled compaction" there are no hits for ex

Re: What is C*?

2014-07-24 Thread Sumod Pawgi
It is called C* or C8 as well. There are 8 letters after C in the name and * is above 8 on the QWERTY keyboard. Sent from my iPhone > On 24-Jul-2014, at 1:34 pm, Mark Reddy wrote: > > Yes you are correct, Cassandra is often abbreviated as C*. With most > languages and applications being re

Re: Hot, large row

2014-07-24 Thread Keith Wright
One last item to add to this thread: we have consistently experienced this behavior where over time performance degrades (previously we were unable to bootstrap nodes due to long GC pauses from existing nodes). I believe it's due to tombstone build up (as I mentioned previously one of the table

Re: 2x disk space required for full compaction? Don't vnodes help this problem?

2014-07-24 Thread Robert Coli
On Thu, Jul 24, 2014 at 3:30 PM, Colin Clark wrote: > Triggering a major compaction is usually not a good idea. > I think this is a little over-stated, there is an assortment of cases in which even periodic major compaction is really a-ok. :D =Rob

here's a good example of poor cassandra documentation.

2014-07-24 Thread Kevin Burton
so searching google for "cassandra leveled compaction" there are no hits for explaining 2.x leveled compaction. The first hit is: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra which is a blog post from 2011. The second hit is documentation on 1.1 which has just 2 sent

Re: 2x disk space required for full compaction? Don't vnodes help this problem?

2014-07-24 Thread Colin Clark
Triggering a major compaction is usually not a good idea. If you've got ssd's, go leveled as DuyHai says. The results will be tasty. -- Colin 320-221-9531 On Jul 24, 2014, at 5:28 PM, Kevin Burton wrote: This was after a bootstrap… so I triggered a major compaction. Should I just turn on le

Re: 2x disk space required for full compaction? Don't vnodes help this problem?

2014-07-24 Thread Kevin Burton
This was after a bootstrap… so I triggered a major compaction. Should I just turn on leveled compaction and then never do a major compaction? On Thu, Jul 24, 2014 at 3:09 PM, DuyHai Doan wrote: > If you're using SizeTieredCompactionStrategy the disk space may double > temporarily during compac

Re: 2x disk space required for full compaction? Don't vnodes help this problem?

2014-07-24 Thread DuyHai Doan
If you're using SizeTieredCompactionStrategy the disk space may double temporarily during compaction. That's one of the big drawbacks of SizeTiered. Since you're on SSD, why not test switching to LeveledCompaction? Put a node in write survey mode to see if this change has any impact on your I/O,

Re: Why is the cassandra documentation such poor quality?

2014-07-24 Thread Peter Lin
There are quite a few blog entries on the DataStax blog that really should be included in the docs On Thu, Jul 24, 2014 at 5:32 PM, Hao Cheng wrote: > I second this, especially since the version association for blog posts is > often vague. This makes looking at historical blog posts quite annoying >

2x disk space required for full compaction? Don't vnodes help this problem?

2014-07-24 Thread Kevin Burton
I just bootstrapped a new node. The box had about 220GB of data on it on a 400GB SSD drive. I triggered a full compaction after it bootstrapped, and it ran out of disk space about 15 minutes later. so now that node is dead :-( I would have assumed that vnodes meant that I could keep my drive ne
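
The failure described above follows from simple arithmetic. As a rough sketch, assuming the worst case mentioned elsewhere in this thread (under SizeTieredCompactionStrategy the disk space may temporarily double during compaction), a major compaction needs about as much free space as the data it rewrites. The function name and the worst-case assumption are illustrative, not Cassandra's actual accounting:

```python
def major_compaction_headroom_gb(data_gb, disk_gb):
    """Free space left (GB) at the peak of a worst-case major compaction.

    Assumption: the compacted output is as large as the input, so the
    node briefly holds two full copies of the data. A negative result
    means the disk fills up before the compaction can finish.
    """
    peak_usage_gb = 2 * data_gb
    return disk_gb - peak_usage_gb


# Figures from the post: about 220GB of data on a 400GB SSD.
headroom = major_compaction_headroom_gb(220, 400)
print(headroom)  # negative: the worst case does not fit on the drive
```

In practice overwrites and purged tombstones can make the output smaller than the input, so this is an upper bound on the space needed rather than a prediction.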

Re: Why is the cassandra documentation such poor quality?

2014-07-24 Thread Hao Cheng
I second this, especially since the version association for blog posts is often vague. This makes looking at historical blog posts quite annoying because it's difficult to tell if some of the specific advice has changed since. On Jul 24, 2014 2:23 PM, "Jack Krupansky" wrote: > Blog posts are gr

Re: Why is the cassandra documentation such poor quality?

2014-07-24 Thread Jack Krupansky
Blog posts are great for highlighting and focusing the community on new features, changes, and techniques, but any knowledge content in them definitely needs to be in the docs as well. -- Jack Krupansky From: Tyler Hobbs Sent: Thursday, July 24, 2014 12:07 PM To: user@cassandra.apache.org Sub

Re: Hot, large row

2014-07-24 Thread Keith Wright
When a node is showing the high CMS issue, io is actually low, likely due to the fact that none is going on during CMS GC. On a node not showing the issue, iostat shows disk usage around 50% (these are SSD) and load hovers around 10, which is fine for a dual octo core machine. In addition, nodeto

Re: Hot, large row

2014-07-24 Thread DuyHai Doan
For global_user_event_skus_v2 1. number of SSTables per read is quite high. Considering you're using LCS, it means that LCS cannot keep up with the write rate and is left behind. AFAIK LCS is using SizeTieredCompaction at L0 to cope with extreme write bursts. Your high number of SSTables per read is qu

Re: Hot, large row

2014-07-24 Thread Keith Wright
Cfhistograms for the tables I believe are most likely the issue are below on the node that most recently presented the issue. Any ideas? Note that these tables are LCS and have droppable tombstone ratios of 27% for global_user_event_skus_v2 and 2.7% for the other. Table definitions also belo

Re: Hot, large row

2014-07-24 Thread Jack Krupansky
Could it be some “fat columns” (cells with large blob or text values) rather than the cell-count per se? IOW, a “big row” rather than a “wide row”? And, could it be a large partition rather than a large row (many rows in a single partition)? Are clustering columns being used in the primary key?

Re: Hot, large row

2014-07-24 Thread DuyHai Doan
"If I run nodetool tpstats, I see a high number of items in the Pending phase for ReadStage. Other items mostly appear near empty. In addition, I see dropped reads" --> Have a look at system I/O & CPU stats to check for possible bottlenecks. This symptom is not necessarily caused by wide rows. So

Re: Hot, large row

2014-07-24 Thread Keith Wright
I appreciate the feedback, doing it on the client side is interesting and I will start looking into that. To be clear, here are the symptoms I am seeing: * A node will start showing high load and the CMS collection time jumps to 100+ ms/sec (ParNew is also up) * If I run nodetool tpstat

Re: Hot, large row

2014-07-24 Thread DuyHai Doan
Your extract of cfhistograms shows that there are no particular "wide rows". The widest has 61214 cells which is big but not so huge as to be a real concern. Turning on trace probability only gives you some "hints" about what kind of queries are done, it does not give the exact partition key n

Re: 103% in nodetool netstats...

2014-07-24 Thread Robert Coli
On Thu, Jul 24, 2014 at 12:07 PM, Kevin Burton wrote: > Well this is a bug: > > > /d0/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-23-Data.db > 148995106/144432120 bytes(103%) sent to /10.24.74.148 > > … 103% … that's impressive!! :) > Various cases of >100% in Cassandra which look

Re: Hot, large row

2014-07-24 Thread Keith Wright
I can see from cfhistograms that I do have some wide rows (see below). I set trace probability as you suggested but the output doesn’t appear to tell me what row was actually read unless I missed something. I just see executing prepared statement. Any ideas how I can find the row in question

overall bootstrap % and ETA in nodetool netstats ?

2014-07-24 Thread Kevin Burton
Seems like nodetool netstats could be updated to give more metadata about a node bootstrap. Including overall % of bootstrap complete, throughput, bytes remaining, files remaining, etc. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check o

103% in nodetool netstats...

2014-07-24 Thread Kevin Burton
Well this is a bug: /d0/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-23-Data.db 148995106/144432120 bytes(103%) sent to /10.24.74.148 … 103% … that's impressive!! :) -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my

Re: Hot, large row

2014-07-24 Thread DuyHai Doan
"How can I detect wide rows?" --> nodetool cfhistograms Look at column "Column count" (last column) and identify a line in this column with a very high value of "Offset". In a well designed application you should have a Gaussian distribution where 80% of your rows have a similar number of columns.
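
A minimal sketch of the check described above, assuming the "Offset" and "Column count" columns of the cfhistograms output have already been extracted into (offset, count) pairs. The parsing step is left out because the output layout differs between versions, and the function name, threshold and sample numbers are hypothetical:

```python
def widest_rows(column_count_histogram, threshold):
    """Return (offset, count) buckets whose offset exceeds threshold.

    column_count_histogram: iterable of (offset, count) pairs, where
    offset is the number of columns in a row and count is how many rows
    fall into that bucket (assumed input shape).
    """
    return [
        (offset, count)
        for offset, count in column_count_histogram
        if count > 0 and offset > threshold
    ]


# Hypothetical sample data, not taken from a real cluster.
sample = [(1, 5000), (10, 12000), (100, 300), (50000, 2), (60000, 0)]
print(widest_rows(sample, threshold=10000))
```

Buckets with a zero count are skipped, so only offsets that actually contain rows are reported. What counts as "very high" depends on the table, so the threshold is left to the caller.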

Hot, large row

2014-07-24 Thread Keith Wright
Hi all, We are seeing an issue where basically daily one of our nodes spikes in load and is churning in CMS heap pressure. It appears that reads are backing up and my guess is that our application is reading a large row repeatedly. Our write structure can lend itself to wide rows very infr

Re: Cassandra on AWS suggestions for data safety

2014-07-24 Thread Jeremy Jongsma
We also run a nightly "nodetool snapshot" on all nodes, and use duplicity to sync the snapshot to S3, keeping 7 days' worth of backups. Since duplicity tracks incremental changes this gives you the benefit of point-in-time snapshots without duplicating sstables that are common across multiple back

Re: Cassandra on AWS suggestions for data safety

2014-07-24 Thread Robert Coli
On Wed, Jul 23, 2014 at 4:12 PM, Hao Cheng wrote: > 3. Using a backup system, either manually via rsync or through something > like Priam, to directly push backups of the data on ephemeral storage to S3. > https://github.com/JeremyGrosser/tablesnap =Rob

Re: Why is the cassandra documentation such poor quality?

2014-07-24 Thread Redmumba
A lot of the information about the compaction strategies would be incredibly useful in the docs as well: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra On Thu, Jul 24, 2014 at 9:45 AM, Peter Lin wrote: > for example, this old blog entry from way back in 2012 > > http://

Re: Why is the cassandra documentation such poor quality?

2014-07-24 Thread Peter Lin
for example, this old blog entry from way back in 2012 http://www.datastax.com/dev/blog/cql3-for-cassandra-experts On Thu, Jul 24, 2014 at 12:07 PM, Tyler Hobbs wrote: > > On Thu, Jul 24, 2014 at 3:55 AM, Nicholas Okunew > wrote: > >> most of the important stuff being in blog format > > > Whi

Re: What is C*?

2014-07-24 Thread Redmumba
Obvious troll is obvious. On Wed, Jul 23, 2014 at 3:50 PM, jcllings wrote: > Keep seeing refs to C*. > > I assume that C* == Cassandra? IMHO not a good ref to use what with C, > C++, C#. A language called C* can't be far behind assuming it doesn't > already exist. > ;-) > > Jim C. > >

Re: Why is the cassandra documentation such poor quality?

2014-07-24 Thread Peter Lin
for starters all of the blog entries related to CQL3, like the change in terminology and compact storage. the last time I looked at the datastax documentation on CQL3, it wasn't nearly as detailed as the blog entries by jonathan ellis and sylvain. On Thu, Jul 24, 2014 at 12:07 PM, Tyler Hobbs w

Re: Why is the cassandra documentation such poor quality?

2014-07-24 Thread Tyler Hobbs
On Thu, Jul 24, 2014 at 3:55 AM, Nicholas Okunew wrote: > most of the important stuff being in blog format Which blog posts are you referring to? We could definitely have the docs team integrate some of the blog post topics into the normal docs and keep them updated for new C* versions. --

Re: Cassandra trigger following the CQL for Cassandra 2.0 tutorial does not work

2014-07-24 Thread Martin Marinov
Hi, I posted the question on stackoverflow: http://stackoverflow.com/questions/24937425/cassandra-trigger-following-the-cql-for-cassandra-2-0-tutorial-does-not-work On 07/24/2014 06:25 PM, Martin Marinov wrote: Hi, I'm following the tutorial at: http://www.datastax.com/documentation/cql/3.1/

Cassandra trigger following the CQL for Cassandra 2.0 tutorial does not work

2014-07-24 Thread Martin Marinov
Hi, I'm following the tutorial at: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/trigger_r.html I'm using Cassandra 2.0.9 with Oracle Java (java version "1.7.0_60"). I have downloaded the example Java Class from https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob

Re: Cassandra on AWS suggestions for data safety

2014-07-24 Thread Alex Major
On Thu, Jul 24, 2014 at 12:47 PM, Hao Cheng wrote: > Thanks for your response! > > We're planning using the r3.large instances, they seem to offer the best > price/performance for our application (the cheapest way to get both 15GB of > RAM and SSD storage). Unfortunately cost wise we can't justif

Re: What is C*?

2014-07-24 Thread Jack Krupansky
Jim, by the way, what exactly does “C.” mean in your name? I mean, it’s so ambiguous that it could mean anything! Or is this an example of using a regular expression (you have a two-letter last name?) as opposed to C* using a traditional wildcard “glob”?! Maybe “C*” should be “C.*”. Or maybe s

Re: Cassandra on AWS suggestions for data safety

2014-07-24 Thread Hao Cheng
Thanks for your response! We're planning using the r3.large instances, they seem to offer the best price/performance for our application (the cheapest way to get both 15GB of RAM and SSD storage). Unfortunately cost wise we can't justify having beefier instances with satisfactory cluster sizes at

Re: Cassandra on AWS suggestions for data safety

2014-07-24 Thread Alex Major
On Thu, Jul 24, 2014 at 12:12 AM, Hao Cheng wrote: > Hello, > > Based on what I've read in the archives here and on the documentation on > Datastax and the Cassandra Community, EBS volumes, even provisioned IOPS > with EBS optimized instances, are not recommended due to inconsistent > performance

Re: CSV Import is taking huge time

2014-07-24 Thread Akshay Ballarpure
Tyler, Thanks for the reply. I didn't understand you fully. Can you please elaborate? Thanks & Regards Akshay Ghanshyam Ballarpure Tata Consultancy Services Cell:- 9985084075 Mailto: akshay.ballarp...@tcs.com Website: http://www.tcs.com Experience certaint

Re: Why is the cassandra documentation such poor quality?

2014-07-24 Thread Nicholas Okunew
No - it took me a little while to see what is going on with datastax, I wasn't interested in datastax/DSE. I was interested in Cassandra. I see the value of datastax supporting Cassandra, I think it's a lot further along than it would have been without their support. But they've also vacuumed up a

Re: What is C*?

2014-07-24 Thread Mark Reddy
Yes you are correct, Cassandra is often abbreviated as C*. With most languages and applications being referenced by their acronym / abbreviation, I guess you just have to pick one that is available. I assume if someone creates a new language and wants to name it C*, they will see that it is taken a