Ryan,
Thanks for the response. It offers a bit more clarity.
I think a series of blog posts with good real-world examples would go a long 
way toward improving the usability of Cassandra. Right now the process feels 
like walking through a minefield, because I only discover what is not possible 
after trying something that seems logical to me and failing.

For my specific questions, the problem is that since searching is only possible 
on columns in the primary key and the primary key cannot be updated, I am not 
sure what the appropriate solution is when data exists that needs to be 
searched and then updated. What is the preferable approach to this? Is the 
expectation to maintain a series of tables, one for each stage of data 
manipulation with its own primary key?
Thanks,
Jason
      From: Ryan Svihla <rsvi...@datastax.com>
 To: user@cassandra.apache.org 
 Sent: Tuesday, December 16, 2014 12:36 PM
 Subject: Re: Comprehensive documentation on Cassandra Data modelling
   
Data modeling a distributed application could be a book unto itself. However, I 
will add that modeling by restriction is basically the entire thought process in 
Cassandra data modeling. Cassandra is a distributed hash table, and a core 
aspect of that sort of application is that you need to be able to quickly 
locate which server in the cluster owns the data you want (the partition key 
provides this).
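
As a rough illustration of why the partition key matters, here is a minimal 
Python sketch of a toy hash table spread over three nodes. Everything in it is 
an assumption made for illustration: the node names, the Cluster class, and the 
hash-then-modulo placement, which is only a stand-in for Cassandra's actual 
token ring. It shows that a lookup by partition key touches one node, while a 
search on any other column has to visit every node.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def owner(partition_key: str) -> str:
    """Pick the node that owns a key by hashing it (toy stand-in for a token ring)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


class Cluster:
    """In-memory model: each node stores rows keyed by partition key."""

    def __init__(self):
        self.storage = {node: {} for node in NODES}
        self.nodes_contacted = 0  # counts how many nodes our reads touched

    def write(self, partition_key, row):
        self.storage[owner(partition_key)][partition_key] = row

    def read_by_key(self, partition_key):
        # The hash tells us exactly which node to ask.
        self.nodes_contacted += 1
        return self.storage[owner(partition_key)].get(partition_key)

    def scan_by_value(self, column, value):
        # Without the partition key, every node has to be checked.
        matches = []
        for node in NODES:
            self.nodes_contacted += 1
            matches.extend(
                row for row in self.storage[node].values() if row.get(column) == value
            )
        return matches
```

Calling read_by_key increases nodes_contacted by one, while scan_by_value 
increases it by the size of the cluster, which is the cost the partition key 
lets you avoid.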

In specific response to your questions:
1) As long as you know the primary key and the column name, this just works. 
I'm not sure what the problem is.
2) Yes, the partition key tells you which server owns the data, otherwise you'd 
have to scan all servers to find what you're asking for.
3) I'm not sure I understand this.

To summarize, all modeling can be understood when you embrace these ideas:

   - Querying a single server will be faster than querying many servers.
   - Multiple tables with the same data but with different partition keys are 
much easier to scale than a single table that forces you to scan the whole 
cluster for your answer.
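
To make the second point concrete, here is a small Python sketch of two 
in-memory "tables" that hold the same rows under different partition keys. The 
ReadingStore class, its column names, and the delete-then-insert step used when 
a searchable value changes are all assumptions for illustration, not a 
description of any Cassandra API. It is one possible way to handle data that 
must be found by a value that later changes.

```python
class ReadingStore:
    """Two toy 'tables' with the same rows, keyed for two different queries."""

    def __init__(self):
        self.by_id = {}      # partition key: reading_id
        self.by_status = {}  # partition key: status, then reading_id inside it

    def insert(self, reading_id, status, payload):
        row = {"reading_id": reading_id, "status": status, "payload": payload}
        self.by_id[reading_id] = row
        # Write a second copy under the key we want to search on.
        self.by_status.setdefault(status, {})[reading_id] = dict(row)

    def find_by_status(self, status):
        # Single-partition lookup; no scan over every row.
        return list(self.by_status.get(status, {}).values())

    def change_status(self, reading_id, new_status):
        row = self.by_id[reading_id]
        # The key itself is never edited: the old copy is deleted and a new
        # copy is inserted under the new key.
        del self.by_status[row["status"]][reading_id]
        row["status"] = new_status
        self.by_status.setdefault(new_status, {})[reading_id] = dict(row)
```

The table keyed by id stays stable for updates, while the table keyed by status 
answers the search. The price is an extra write on every insert and status 
change.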

If you accept this, you've basically got the key principle down. Most other 
ideas are extensions of it; some nuances include dealing with tombstones, 
partition size, and ordering. I can answer any more specific questions.

I've been meaning to write a series of blog posts on this, but as I stated, 
it's almost a book unto itself. Data modeling a distributed application 
requires a fundamental rethink of all the assumptions we've been taught for 
master/slave style databases.




On Tue, Dec 16, 2014 at 10:46 AM, Jason Kania <jason.ka...@ymail.com> wrote:
Hi,
I have been having a few exchanges with contributors to the project around what 
is possible with Cassandra and a common response that comes up when I describe 
functionality as broken or missing is that I am not modelling my data 
correctly. Unfortunately, I cannot seem to find comprehensive documentation on 
modelling with Cassandra. In particular, I am finding myself modelling by 
restriction rather than what I would like to do.

Does such documentation exist? If not, is there any effort to create such 
documentation? The DataStax documentation on data modelling is far too weak to 
be meaningful.

In particular, I am caught because:
1) I want to search on a specific column to make updates to it after further 
processing; i.e., I don't know its value on first insert
2) If I want to search on a column, it has to be part of the primary key
3) If a column is part of the primary key, it cannot be edited, so I have a 
circular dependency
Thanks,
Jason



-- 
Ryan Svihla
Solution Architect
 

DataStax is the fastest, most scalable distributed database technology, 
delivering Apache Cassandra to the world’s most innovative enterprises. 
DataStax is built to be agile, always-on, and predictably scalable to any size. 
With more than 500 customers in 45 countries, DataStax is the database 
technology and transactional backbone of choice for the world’s most innovative 
companies such as Netflix, Adobe, Intuit, and eBay. 


  
