best practices?

Hiller, Dean Tue, 11 Dec 2012 15:26:46 -0800

Is there any column that would qualify as a good partition key?

Some people partition by time, such as every month or every day, and then you 
can either maintain your own secondary indexes that you query into (high 
entropy is NOT a big deal here), let PlayOrm do some of that for you, or use 
CQL.


Other partitioning schemes are to partition by client.
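
As a rough illustration of the two schemes above, here is a minimal Python 
sketch that derives a partition label from a row's timestamp and, optionally, 
a client name. The function name, the label format, and the "acme" client are 
made up for this example; they are not part of Cassandra, CQL, or PlayOrm.

```python
from datetime import datetime, timezone

# Hypothetical label formats for the two time granularities mentioned above.
_FORMATS = {"month": "%Y-%m", "day": "%Y-%m-%d"}


def partition_key(ts, granularity="month", client=None):
    """Return a partition label such as '2012-12' or 'acme:2012-12-11'."""
    bucket = ts.strftime(_FORMATS[granularity])
    # Prefixing with the client gives one partition per client per time bucket.
    return f"{client}:{bucket}" if client else bucket


ts = datetime(2012, 12, 11, 15, 26, 46, tzinfo=timezone.utc)
monthly = partition_key(ts)
daily = partition_key(ts, "day")
per_client = partition_key(ts, "day", client="acme")
```

Every row written in the same bucket lands in the same partition, so a query 
only has to look at the partitions that cover the time range (or client) it 
cares about.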

The goal is to keep each partition under roughly 5 million rows so that your 
wide-row index does not get too large.
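
To make the idea of a per-partition index concrete, here is a small in-memory 
Python model. It is only a sketch of the data layout, not Cassandra or PlayOrm 
code: the class, its methods, and the sample rows are all hypothetical, and 
the 5 million figure is just the rough ceiling mentioned above.

```python
from collections import defaultdict

MAX_ROWS_PER_PARTITION = 5_000_000  # rough ceiling from the advice above


class PartitionedIndex:
    """Toy model: one index entry per (partition, column, value)."""

    def __init__(self):
        self._index = defaultdict(set)   # (partition, column, value) -> row keys
        self._row_counts = defaultdict(int)

    def add(self, partition, row_key, columns):
        # Count the row once, then index it under every searchable column.
        self._row_counts[partition] += 1
        for column, value in columns.items():
            self._index[(partition, column, value)].add(row_key)

    def find(self, partition, column, value):
        # A lookup touches a single partition, so the index stays bounded.
        return sorted(self._index.get((partition, column, value), ()))

    def is_oversized(self, partition):
        return self._row_counts.get(partition, 0) >= MAX_ROWS_PER_PARTITION


idx = PartitionedIndex()
idx.add("2012-12", "k1", {"user": "steve", "ip": "10.0.0.1"})
idx.add("2012-12", "k2", {"user": "steve", "ip": "10.0.0.2"})
idx.add("2013-01", "k3", {"user": "steve"})
```

Because each lookup is scoped to one partition, a high-entropy column such as 
an IP address only produces many small index entries inside that partition 
rather than one index spanning the whole data set.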


Dean

From: stephen.m.thomp...@wellsfargo.com
Reply-To: user@cassandra.apache.org
Date: Tuesday, December 11, 2012 3:45 PM
To: user@cassandra.apache.org
Subject: RE: Primary/secondary index question / best practices?


Dean, thank you for your response.  To the second half of the query, I’m a 
little concerned about the secondary index approach since the indexes that I 
want to create are columns with high entropy.



For example, I would like to query by User name and IP address, values which 
are decidedly NOT like the pattern recommended in the Secondary Index field.   
The 8-10 columns I need to search by all have a similarly high scatter rate.  
Since the documentation seems to suggest that this is a bad idea, what would 
the correct pattern look like?



In an RDBMS I would just slap an alternate key index on the table and let it 
roll.   It seems like maybe that is not the right approach for Cassandra?



Thanks again,

Steve



-----Original Message-----
From: Hiller, Dean [mailto:dean.hil...@nrel.gov]
Sent: Tuesday, December 11, 2012 4:57 PM
To: user@cassandra.apache.org
Subject: Re: Primary/secondary index question / best practices?



Hard to help out on a design without specifics, but here is some advice based 
on the limited information.



Primary key: yes, it must be unique across the cluster.  Use a TimeUUID or 
UUID.  PlayOrm has compact, TimeUUID-like unique keys such as 7AL2S8Y.b1 (b1 
is the hostname and the prefix is a "unique" timestamp encoded as a shorter 
string, which gives nice, readable primary keys).
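
For reference, a time-based (version 1) UUID can be generated with Python's 
standard uuid module, and its embedded timestamp can be read back out. This is 
only a sketch of the general TimeUUID idea; it does not reproduce PlayOrm's 
shorter key format, and the helper name uuid1_time is made up here.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100-nanosecond intervals from this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_time(u):
    """Return the creation time embedded in a version 1 UUID."""
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


key = uuid.uuid1()          # time-based, includes a node identifier
created = uuid1_time(key)   # timezone-aware datetime in UTC
```

Because the timestamp is part of the key, a key like this is both unique and 
roughly ordered by creation time.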



There are some patterns you can look into here that may help 
https://github.com/deanhiller/playorm/wiki/Patterns-Page



If you can partition your data virtually, it may help a lot so you can query 
into the partitions.



Later,

Dean



From: stephen.m.thomp...@wellsfargo.com

Reply-To: user@cassandra.apache.org

Date: Tuesday, December 11, 2012 2:49 PM

To: user@cassandra.apache.org

Subject: Primary/secondary index question / best practices?



From my reading, it seems like I need a UUID column that will be my primary index, 
and then I should set up secondary indexes on the 8-10 primary search columns.  
Am I understanding this correctly?  Any advice you can offer on this would be 
tremendously helpful.  I’m quite limited in how specific I can be about the 
data, of course.
