On the scalability and performance side, I found Yahoo's paper about the
YCSB project interesting (it benchmarks some NoSQL solutions against MySQL). See
research.yahoo.com/files/*ycsb*.*pdf.

My concern with the denormalization approach is that it shouldn't be
managed on the client side, because that has a big impact on your throughput.
Is map-reduce any better in that respect?
Wouldn't it be nice to support a kind of PL-(No)SQL server-side scripting
that allows you to create and maintain materialized views? You could still
offer the option of maintaining the view synchronously (an extension of the
current row-level atomicity) or asynchronously.

Not sure how complicated this support would be...
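
To make the idea a bit more concrete, here is a rough sketch in Python of a
store that maintains one view on the server side, either synchronously or
asynchronously. Everything in it (ToyStore, view_key, flush, lookup) is a
made-up name for illustration, not an existing Cassandra API, and it ignores
distribution, failures and concurrency entirely.

```python
from collections import defaultdict, deque


class ToyStore:
    """In-memory stand-in for a row store with one server-side view.

    Hypothetical illustration only: none of these names exist in Cassandra.
    """

    def __init__(self, view_key, synchronous=True):
        self.rows = {}                    # row key -> dict of columns
        self.view = defaultdict(set)      # column value -> set of row keys
        self.view_key = view_key          # column the view is keyed on
        self.synchronous = synchronous
        self.pending = deque()            # queued view updates (async mode)

    def put(self, key, columns):
        """Write a row, then update or queue the view change."""
        old = self.rows.get(key)
        new = dict(columns)
        self.rows[key] = new
        change = (key, old, new)
        if self.synchronous:
            self._apply(change)           # view is current when put() returns
        else:
            self.pending.append(change)   # view catches up on flush()

    def flush(self):
        """Apply queued view updates in write order (async mode)."""
        while self.pending:
            self._apply(self.pending.popleft())

    def _apply(self, change):
        key, old, new = change
        # Remove the entry for the previous column value, if there was one.
        if old is not None and self.view_key in old:
            self.view[old[self.view_key]].discard(key)
        if self.view_key in new:
            self.view[new[self.view_key]].add(key)

    def lookup(self, value):
        """Read from the view instead of scanning all rows."""
        return sorted(self.view.get(value, ()))
```

With synchronous=True the client pays the view update on every write but
always reads a current view. With synchronous=False writes return sooner and
lookup() can be stale until flush() runs, which is the trade-off I had in mind.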

- David

On Mon, May 10, 2010 at 10:38 PM, Paul Prescod <p...@prescod.net> wrote:

> On Mon, May 10, 2010 at 1:23 PM, Peter Hsu <pe...@motivecast.com> wrote:
> > Thanks for the response, Paul.
> > ...
> >
> > * Cassandra and its siblings are weak at ad hoc queries on tables
> > that you did not think to index in advance
> >
> > What is the normal way of dealing with this in Cassandra?  Would you just
> > create a new "index" and bring a big honking machine to the table to
> > process all the existing data in the database and store the new "index"?
>
> The latest version of Cassandra introduces a "map/reduce" paradigm
> which is the main tool you'd use for batch processing of data. You
> could either use that to DO your ad hoc query or to process the data
> into an index for more efficient ad hoc queries in the future.
>
>  * http://en.wikipedia.org/wiki/MapReduce
>
>  * http://en.wikipedia.org/wiki/Hadoop
>
>  * http://architects.dzone.com/news/cassandra-adds-hadoop
>
> You can read criticisms of MapReduce in the first link there.
>
> > On May 10, 2010, at 11:22 AM, Paul Prescod wrote:
> >
> > This is a very, very big topic. For the most part, the issues are
> > covered in the various SQL versus NoSQL debates all over the Internet.
> > For example:
> >
> > * Cassandra and its NoSQL siblings have no concept of an in-database
> > "join"
> >
> > * Cassandra and its NoSQL siblings do not allow you to update
> > multiple "tables" in a single transaction
> >
> > * Cassandra's API is specific to it, and not portable to any other data
> > store
> >
> > * Cassandra currently has simplistic facilities to deal with various
> > kinds of conflicting write.
> >
> > * Cassandra is strongly optimized for multiple machine distributions,
> > whereas relational databases tend to be optimized for a single
> > powerful machine.
> >
> > * Cassandra and its siblings are weak at ad hoc queries on tables
> > that you did not think to index in advance
> >
> > On Mon, May 10, 2010 at 11:06 AM, Peter Hsu <pe...@motivecast.com> wrote:
> >
> > I've seen a lot of threads and posts about why Cassandra is great.  I'm
> > fairly sold on the features, and the few big deployments on Cassandra
> > give it a lot of credibility.
> >
> > However, I don't believe in magic bullets, so I really want to understand
> > the potential downsides of Cassandra.  Right now, I don't really have a
> > clue as to what Cassandra is bad at.  I took a look at
> > http://wiki.apache.org/cassandra/CassandraLimitations which is helpful,
> > but doesn't characterize its weaknesses in ways that I can really
> > comprehend until I've actually used Cassandra and understand some of the
> > internals.  It seems that the community would benefit from being able to
> > answer some of these questions in terms of real world use cases.
> >
> > My main questions:
> >
> >  * Are there designs in which a SQL database out-performs or out-scales
> > Cassandra?
> >
> >  * Is there a pros vs cons page of Cassandra against an open source SQL
> > database (MySQL or Postgres)?
> >
> > I do plan on attending the training session next Friday in Palo Alto, but
> > it'd be great if I had some more food for thought before I attend.
