I don’t know but my guess is it would be without tombstones.  I did more 
research this weekend (note that my Sunday was largely interrupted by again 
seeing a node go to high load/high CMS for ~3 hours) and came across this 
presentation:  
http://www.slideshare.net/mobile/planetcassandra/8-axel-liljencrantz-23204252

I definitely suggest you give this a look; it is very informative.  The important 
takeaway is that they ran into the same issue I have, because they use the same 
model: I keep updating the same row over time with a TTL, which causes that row 
to fragment across SSTables, and once it spans 4+ SSTables, compaction can never 
actually remove the tombstones.  As I see it, I have the following options and 
was hoping to get some advice:

1.  Modify my write structure to include time within the key.  Currently we 
want to get all of a row, but I can likely add the month to the key, and it 
would be ok for the application to do two reads to get the most recent data (to 
deal with month boundaries).  This will contain the fragmentation to one month.

2.  Following on from item #1, it appears from CASSANDRA-5514 that if I include 
time within my query, it will not bother going through older SSTables, thus 
reducing the impact of the row fragmentation.  The problem here is that my data 
space will likely still continue to grow over time, as tombstones will never be 
removed.

3.  Move from LCS to STCS and run full compactions periodically to clean up 
tombstones.
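
To make option 1 concrete, here is a minimal sketch of the month-bucketing idea. 
It assumes a hypothetical composite partition key of (user_id, month); the 
function names and the 'YYYY-MM' bucket format are illustrative only and are not 
taken from the actual schema.

```python
from datetime import date


def month_bucket(d: date) -> str:
    """Return a 'YYYY-MM' bucket label for the given date (assumed format)."""
    return f"{d.year:04d}-{d.month:02d}"


def previous_month_bucket(d: date) -> str:
    """Return the bucket label for the month before the given date."""
    if d.month == 1:
        # January rolls back to December of the previous year.
        return f"{d.year - 1:04d}-12"
    return f"{d.year:04d}-{d.month - 1:02d}"


def write_key(user_id: str, d: date) -> tuple:
    """Hypothetical partition key for a write: (user_id, month bucket)."""
    return (user_id, month_bucket(d))


def read_keys(user_id: str, today: date) -> list:
    """The two partitions the application would read to cover a month boundary:
    the current month first, then the previous month."""
    return [
        (user_id, month_bucket(today)),
        (user_id, previous_month_bucket(today)),
    ]
```

With this layout, writes for a given user only ever touch the current month's 
partition, so any fragmentation across SSTables is confined to that month, at 
the cost of a second read for recent data.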

I appreciate the help!

From: Jack Krupansky <j...@basetechnology.com<mailto:j...@basetechnology.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Friday, July 25, 2014 at 11:15 AM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Hot, large row

Is it the accumulated tombstones on a row that make it act as if “wide”? Does 
cfhistograms count the tombstones or subtract them when reporting on cell-count 
for rows? (I don’t know.)

-- Jack Krupansky

From: Keith Wright<mailto:kwri...@nanigans.com>
Sent: Friday, July 25, 2014 10:24 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Cc: Don Jackson<mailto:djack...@nanigans.com>
Subject: Re: Hot, large row

Ha, check out who filed that ticket!   Yes I’m aware of it.  My hope is that it 
was mostly addressed in CASSANDRA-6563 so I may upgrade from 2.0.6 to 2.0.9.  
I’m really just surprised that others are not doing similar things and thus 
experiencing similar issues.

To answer DuyHai’s questions:

How many nodes do you have? And roughly how many distinct user_ids are there?
- 14 nodes with approximately 250 million distinct user_ids

For GC activity, in general we see low GC pressure in both ParNew and CMS (we 
see the occasional CMS spike, but it's usually under 100 ms).  When we see a node 
locked up in CMS GC, it's not that any one GC takes a long time; it's just that 
the constant stream of them causes the read latency to spike from the usual 
3-5 ms up to 35 ms, which causes issues for our application.

Also, Jack Krupansky's question is interesting. Even though you limit a request 
to 5000, if each cell is a big blob or block of text, it may add up to a lot in 
the JVM heap …
- The column values are actually timestamps and thus not variable in length, 
and we cap the length of other columns used in the primary key, so I find it 
VERY unlikely that this is a cause.

I will look into the paging option with the native client, but from the docs it 
appears that it's enabled by default, right?

I greatly appreciate all the help!

From: Ken Hancock <ken.hanc...@schange.com<mailto:ken.hanc...@schange.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Friday, July 25, 2014 at 10:06 AM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Cc: Don Jackson <djack...@nanigans.com<mailto:djack...@nanigans.com>>
Subject: Re: Hot, large row

https://issues.apache.org/jira/browse/CASSANDRA-6654
