Hi folks - I'm doing an informal proof-of-concept with Cassandra and I've been getting some conflicting information about how my data layout should go. Perhaps somebody could point me in the right direction.
I have a column family that will have billions of rows of data. The data do not have any unique identifier intrinsically. A given row will have, say, 50 columns, and I'll need to be able to efficiently query on 8-10 of them. I've been told that I should just pick the most common search item and make that my primary key, even though it will not be unique. That seems contrary to the documentation I am seeing online.
From my reading, it seems like I need a UUID column that will be my primary index, and then I should set up secondary indexes on the 8-10 primary search columns. Am I understanding this correctly?
Any advice you can offer on this would be tremendously helpful. I'm quite limited in how specific I can be about the data, of course.
Steve