Why do you think the number of partitions is different in these tables? The 
partition key is the same (system_name and event_name); it's the number of rows 
per partition that differs.
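
(As a side note, in CQL the parentheses in the PRIMARY KEY clause determine which 
columns form the partition key, so it's worth double-checking both definitions 
quoted below; a minimal sketch with illustrative table names:)

CREATE TABLE t1 (a text, b text, v blob, PRIMARY KEY (a, b));    -- partition key: a, clustering key: b
CREATE TABLE t2 (a text, b text, v blob, PRIMARY KEY ((a, b)));  -- composite partition key: (a, b)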



From: Kurt Greaves [mailto:k...@instaclustr.com]
Sent: Thursday, June 09, 2016 7:52 AM
To: user@cassandra.apache.org
Subject: Re: Interesting use case

I would say it's probably due to a significantly larger number of partitions 
when using the overwrite method - but really you should be seeing similar 
performance unless one of the schemas ends up generating a lot more disk IO.
If you're planning to read the last N values for an event at the same time, the 
widerow schema would be better; otherwise, reading N events using the overwrite 
schema will result in you hitting N partitions. You really need to take into 
account how you're going to read the data when you design a schema, not only 
how many writes you can push through.
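
For example, with the widerow schema the most recent N values for one event come 
from a single partition with a slice query roughly like the following (the LIMIT 
and bind markers are illustrative), whereas the overwrite schema keeps no 
per-event history to slice:

SELECT event_time, event_value
FROM eventvalue_widerow
WHERE system_name = ? AND event_name = ?
LIMIT 10;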

On 8 June 2016 at 19:02, John Thomas <jthom...@gmail.com> wrote:
We have a use case where we are storing event data for a given system and only 
want to retain the last N values.  Storing extra values for some time, as long 
as it isn’t too long, is fine, but we can never keep fewer than N.  We can't use 
TTLs to delete the data because we can't be sure how frequently events will 
arrive and could end up losing everything.  Is there any built-in mechanism to 
accomplish this, or a known pattern that we can follow?  The events will be read 
and written at a pretty high frequency, so the solution has to be performant and 
not fragile under stress.

We’ve played with a schema that just has N distinct columns with one value in 
each, but have found that overwrites seem to perform much worse than wide rows.  
The use case we tested only required that we store the most recent value:

CREATE TABLE eventvalue_overwrite (
    system_name text,
    event_name text,
    event_time timestamp,
    event_value blob,
    PRIMARY KEY (system_name, event_name));

CREATE TABLE eventvalue_widerow (
    system_name text,
    event_name text,
    event_time timestamp,
    event_value blob,
    PRIMARY KEY ((system_name, event_name), event_time))
    WITH CLUSTERING ORDER BY (event_time DESC);
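
For reference, the write paths we compared look roughly like this (bind values 
are illustrative, and toTimestamp(now()) assumes a Cassandra version that 
supports it; a client-supplied timestamp works too).  The overwrite insert keeps 
rewriting the same row, while the widerow insert adds a new clustering row per 
event:

-- overwrite: one row per (system_name, event_name), upserted on every event
INSERT INTO eventvalue_overwrite (system_name, event_name, event_time, event_value)
VALUES ('sys1', 'temp', toTimestamp(now()), 0x01);

-- widerow: a new row per event_time within the (system_name, event_name) partition
INSERT INTO eventvalue_widerow (system_name, event_name, event_time, event_value)
VALUES ('sys1', 'temp', toTimestamp(now()), 0x01);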

We tested it against the DataStax AMI on EC2 with 6 nodes, replication factor 3, 
write consistency 2, and default settings, using a write-only workload, and got 
190K/s for the wide row schema and 150K/s for the overwrite schema.  Thinking 
through the write path, it seems the performance should be pretty similar, with 
probably smaller sstables for the overwrite schema.  Can anyone explain the big 
difference?

The wide row solution is more complex in that it requires a separate cleanup 
thread to handle deleting the extra values.  If that’s the path we have to 
follow, we’re thinking we’d add a bucket of some sort so that we can delete an 
entire partition at a time after copying some values forward, on the assumption 
that deleting the whole partition is much better than deleting some slice of the 
partition.  Is that true?  Also, is there any difference between setting a 
really short TTL and doing a delete?
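
Concretely, something like the following sketch is what we have in mind; the 
bucketed table name, the bucket column, and its granularity are just 
illustrative placeholders:

CREATE TABLE eventvalue_widerow_bucketed (
    system_name text,
    event_name text,
    bucket text,              -- placeholder: e.g. a day or hour bucket
    event_time timestamp,
    event_value blob,
    PRIMARY KEY ((system_name, event_name, bucket), event_time))
    WITH CLUSTERING ORDER BY (event_time DESC);

-- after copying the last N values forward into the current bucket,
-- the old partition can be dropped with a single partition-level delete
DELETE FROM eventvalue_widerow_bucketed
WHERE system_name = 'sys1' AND event_name = 'temp' AND bucket = '2016-06-07';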

I know there are a lot of questions in there but we’ve been going back and 
forth on this for a while and I’d really appreciate any help you could give.

Thanks,
John



--
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com
