Our analytics currently pulls in all the data for a single sensor reading, as we 
use it in its entirety during signal processing. We may add secondary indices 
to the table in the future to pull in broadly classified data, but right now, 
our only goal is this bulk retrieval.
      From: Jack Krupansky <jack.krupan...@gmail.com>
 To: user@cassandra.apache.org 
 Sent: Friday, March 11, 2016 7:25 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
  
Thanks, that level of query detail gives us a better picture to focus on. I'll 
think through this some more over the weekend.
Also, these queries focus on raw, bulk retrieval of sensor data readings, but 
do you have reading-based queries, such as range of an actual sensor reading?
-- Jack Krupansky
On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania <jason.ka...@ymail.com> wrote:

The 5000 readings mentioned would be against a single sensor on a single sensor 
unit.

The scope of the queries on this table is intended to be fairly simple. Here 
are some example queries, without 'sharding', that we would perform on this 
table:

SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time<=?ORDER BY time DESC LIMIT 5000
SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time>=?ORDER BY time LIMIT 5000
SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time<=? AND classification=?
ORDER BY time DESC LIMIT 5000
where 'classification' is a secondary index that we expect to add.

In some cases, we also have to revisit all values, so a complete table scan is 
needed:
SELECT "time","readings" FROM "sensorReadings"
Getting the "next" and "previous" 5000 readings is also something we do, but is 
manageable from our standpoint as we can look at the range-end timestamps that 
are returned and use those in the subsequent queries.

SELECT "time","readings" FROM "sensorReadings"WHERE "sensorUnitId"=5123 AND 
"sensorId"=17 AND time>=? AND time<=?ORDER BY time LIMIT 5000
Splitting the bulk content out of the main table is something we considered too, 
but we didn't find any detail on whether that would solve our timeout problem. 
If there is a reference for using this approach, it would be of interest to us, 
so that we don't have to make assumptions about how we would approach it.

A question: Is the probability of a timeout directly linked to a longer seek 
time in reading through a partition's contents? If that is the case, splitting 
the partition keys into a separate table would be straightforward.
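For reference, the separate table we have in mind would be roughly the following 
(names are only illustrative), holding one row per existing timeShard for each 
sensorUnitId/sensorId pair:

create table "sensorReadingShards" (
    "sensorUnitId" int,
    "sensorId" int,
    "timeShard" int,
    primary key (("sensorUnitId", "sensorId"), "timeShard")
);

We could then read that single small partition to get the valid timeShard values 
to use against "sensorReadings".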

Regards,
Jason

      From: Jack Krupansky <jack.krupan...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com> 
 Sent: Friday, March 11, 2016 6:22 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
   
Thanks for the additional information, but there is still not enough color on 
the queries and too much focus on a premature data model.
Is this 5000 readings for a single sensor of a single sensor unit, or for all 
sensors of a specified unit, or... both?
I presume you want "next" and "previous" 5000 readings as well as first and 
last, but... you will have to confirm that.
One technique is to store the bulk of your raw sensor data in a separate table 
and then simply store the PK of that data in your time series. That way you can 
have a much wider row of time series (number of rows) without hitting a bulk 
size issue for the partition. But... I don't want to jump to solutions until we 
have a firmer handle on the query side of the fence.
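Just to illustrate the shape of that technique, not as a recommendation yet 
(table and column names are purely illustrative):

create table "sensorReadingIndex" (
    "sensorUnitId" int,
    "sensorId" int,
    "timeShard" int,
    "time" timestamp,
    "readingId" timeuuid,
    primary key (("sensorUnitId", "sensorId", "timeShard"), "time")
);

create table "sensorReadingData" (
    "readingId" timeuuid,
    "readings" blob,
    primary key ("readingId")
);

The time series rows stay small, and the ~2K blobs get fetched separately by 
"readingId" only when they are actually needed.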
-- Jack Krupansky
On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <jason.ka...@ymail.com> wrote:

Jack,
Thanks for the response.
We are targeting our database design at 10,000 sensor units, and each sensor 
unit has 32 sensors. We are seeing about 700 events per day per sensor, each 
providing about 2 KB of data. To keep each partition to about 10 MB (a figure 
based on the performance readings we took), we chose to break our partitions on 
a weekly basis. This is possibly finer than we need, as we were seeing timeouts 
only once a single partition reached about 150 MB.
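For reference, the rough arithmetic behind the weekly split:

  700 events/day x ~2 KB ≈ 1.4 MB per sensor per day
  1.4 MB/day x 7 days    ≈ 10 MB per (sensorUnitId, sensorId, timeShard) partition per week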

When pulling in data, we will typically need to pull 1 to 4 months of data for 
our analysis, and we will use only the sensorUnitId and sensorId to uniquely 
identify the data source, with the timeShard value serving only to break up our 
partitions. We have handling to scan sequentially based on our "timeShard" 
value, but we don't have a good way to determine the "timeShard" portion of the 
partition key at read time. The data starts coming in when a subscriber starts 
using our system and stops when they discontinue service or put the service on 
hold temporarily.

When I talk about hotspots, it isn't the time series data itself that is the 
concern; the concern is with storing the maximum and minimum timeShard values in 
another table for subsequent lookup, or with the cost of running the current 
implementation of SELECT DISTINCT. We need to run queries such as getting the 
first or last 5000 sensor readings when we don't know the time frame in which 
they occurred, so we cannot directly supply the timeShard portion of our 
partition key.

I appreciate your input,
Thanks,
Jason

      From: Jack Krupansky <jack.krupan...@gmail.com>
 To: "user@cassandra.apache.org" <user@cassandra.apache.org> 
 Sent: Friday, March 11, 2016 4:45 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
   
I'll stay away from advising on a specific schema per se, but I'll stick to the 
advice that you need to make sure that your queries are depending solely on the 
columns of the primary key or relatively short slices/scans, rather than run 
the risk of very long scans or having to process multiple partitions for a 
single query. That's canned to some extent, but still essential.
Of course we generally wish to avoid hotspots, but with time series they are 
unavoidable. I mean, sure you could place successive events at separate 
partitions, but then you can't do any kind of scanning/slicing.
But, events for separate sensors are not true hotspots in the traditional sense 
- unless you have only a single sensor/unit.
After considering your queries, the next step is to consider the cardinality of 
your data - how many sensors, how many units, rate of events, etc. That will 
feedback into queries as well, such as how big a slice or scan might be, as 
well as sizing of partitions.
So, how many sensor units do you expect, how many sensors per unit, and 
expected rate of events per sensor?
Try not to jump too quickly to specific solutions - there really is a method to 
understanding all of this other stuff upfront.
-- Jack Krupansky
On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <jason.ka...@ymail.com> wrote:

Jack,
Thanks for the response. I don't think I provided enough information, and I used 
the wrong terminology, as your response is more the canned advice given in 
response to Cassandra antipatterns.
To make this clearer, this is what we are doing:
create table "sensorReadings" (
    "sensorUnitId" int,
    "sensorId" int,
    "time" timestamp,
    "timeShard" int,
    "readings" blob,
    primary key (("sensorUnitId", "sensorId", "timeShard"), "time")
);

where timeShard is a combination of the year and the week of the year.
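A shard-specific query then looks something like this (the ? values are just 
placeholders):

SELECT "time","readings" FROM "sensorReadings"
  WHERE "sensorUnitId"=? AND "sensorId"=? AND "timeShard"=?
    AND "time">=? AND "time"<=?
  ORDER BY "time" LIMIT 5000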
For known time-range-based queries, this works great. However, the specific 
problem is in knowing the maximum and minimum timeShard values when we want to 
select the entire range of data. Our understanding is that if we update another 
related table with the maximum and minimum timeShard value for a given 
sensorUnitId and sensorId combination, we will create a hotspot and lots of 
tombstones. If we SELECT DISTINCT, we get a huge list of partition keys for the 
table because we cannot reduce the scope with a WHERE clause.
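Concretely, the kind of tracking table we have in mind (and are wary of) would 
be something like this, with the column names only illustrative:

create table "sensorShardBounds" (
    "sensorUnitId" int,
    "sensorId" int,
    "minTimeShard" int,
    "maxTimeShard" int,
    primary key (("sensorUnitId", "sensorId"))
);

where "maxTimeShard" would be rewritten for every sensor every week.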

If there is a recommended pattern that solves this, we haven't come across it.

I hope this makes the problem clearer.
Thanks,
Jason

      From: Jack Krupansky <jack.krupan...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com> 
 Sent: Thursday, March 10, 2016 10:42 AM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the 
partition key
   
There is an effort underway to support wider rows: 
https://issues.apache.org/jira/browse/CASSANDRA-9754

This won't help you now, though. Even with that improvement, you may still need 
a more optimal data model, since large-scale scanning/filtering is always a very 
bad idea with Cassandra.
The data modeling methodology for Cassandra dictates that queries drive the 
data model and that each form of query requires a separate table (a "query 
table"). Materialized views can automate that process for a lot of cases, but in 
any case it does sound as if some of your queries do require additional tables.
As a general proposition, Cassandra should not be used for heavy filtering - 
query tables with the filtering criteria baked into the PK are the way to go.
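For example, purely as a sketch and not tied to your schema, a query table for a 
single filtering criterion might look like:

create table readings_by_criterion (
    sensor_key int,
    criterion int,
    event_time timestamp,
    payload blob,
    primary key ((sensor_key, criterion), event_time)
);

so that the filter is answered by a direct partition read rather than a scan.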

-- Jack Krupansky
On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <jason.ka...@ymail.com> wrote:

Hi,
We have sensor input that creates very wide rows, and operations on these rows 
have started to time out regularly. We have been trying to find a solution for 
dividing wide rows but keep hitting limitations that move the problem around 
instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a 
time field to access each column in the row. We tried adding a time-based 
entry, timeShardId, to the partition key that consists of the year and week of 
the year during which the reading was taken. This works for a number of queries, 
but for scanning all the readings against a particular sensorUnitId and sensorId 
combination, we seem to be stuck.
We won't know the range of valid timeShardId values for a given sensorUnitId and 
sensorId combination, so we would have to write to an additional table to track 
the valid timeShardId values. We suspect this would create tombstone 
accumulation problems, given the number of updates required to the same row, so 
we haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT when 
trying to access the partition keys directly. Since SELECT DISTINCT does not 
allow a WHERE clause to filter on the partition key values, we have to filter 
several hundred thousand partition keys just to find those related to the 
relevant sensorUnitId and sensorId. This problem will only grow worse for us.
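That is, the only form available to us is roughly the following, with no way to 
add a WHERE clause on "sensorUnitId" and "sensorId" to narrow it down:

SELECT DISTINCT "sensorUnitId", "sensorId", "timeShardId" FROM "sensorReadings"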

Are there any other approaches that can be suggested? We have been looking 
around, but haven't found any references beyond the initial suggestion to add 
some sort of shard id to the partition key to handle wide rows.
Thanks,
Jason