I'm not aware of anyone classifying what Twitter is doing today as
'working.' In fact, I believe that Twitter's problems are much larger
than just technology, but that's a whole different subject.
What Twitter may have realized is that they don't have the resources of
Facebook, that Facebook's use case is fairly limited (albeit a large
deployment), and that they may have been trudging off into the great
unknown.
Although I'm a fan of Cassandra, there's no way I'd use it today for my
tier 1 deployments. I don't have the resources of Facebook, and even
though Cassandra is open source, that doesn't mean I can fix it when it
goes down. And because it's open source, there's no one to call to have
it fixed reliably and within production constraints. Cassandra's
strength is its greatest weakness right now.
The bloom is starting to come off NoSQL, which is normal - it means that
people & firms are trying to do more with it and are most probably
realizing that all of the tools, support, infrastructure, etc.
surrounding the alternative solutions aren't such a bad thing. It also
means the world of NoSQL has to start coming up with a better mantra than
"joins are bad, dude" and "you're just protecting the status quo."
There's a *lot more* big data wrapped up inside of SQL databases than in
NoSQL, which holds only a fraction of it - and there are a lot of reasons
for that.
For example, do I *really* need Cassandra if MySQL will work for me and
I just want to get up and running quickly without writing a bunch of
code? My team was pushing more than 20k updates per second into, GASP,
Oracle five years ago. Sure, it was expensive. But it worked. And it was
worth it - or we wouldn't have spent the $$. What's your data worth if
you don't have your data? Zero.
And then there's support - internal support. Picking a database du jour
is organizationally expensive, especially when there are probably one or
two databases that Twitter could have bought off the shelf that would
have solved their problems. But instead of bolstering the reliability
and robustness of their internal architecture, they've gone and used
very expensive equity for acquisitions. Running multiple databases in
a fault-tolerant, geographically dispersed deployment isn't easy (yes,
I've done it), and having multiple databases in the mix really
complicates things. At this stage in Twitter's growth, I frankly don't
understand why they're looking to complicate their technological
landscape any more than absolutely required.
So, this entire rant can be summarized quite succinctly:
"If data is your business (as it is for Facebook & Twitter) and you don't
have the resources to cost-effectively handle all of your data management
needs internally (Facebook does, Twitter doesn't), then basing your
solution on unproven storage solutions (commercial or open source, SQL
or NoSQL) is a risky and short-sighted strategy."
Please send death threats via the channels listed below:
Colin
+1 315 886 3422 cell
+1 701 212 4314 office
http://blog.cloudeventprocessing.com
http://twitter.com/EventCloudPro
On 7/10/2010 2:02 PM, Ryan King wrote:
On Sat, Jul 10, 2010 at 10:33 AM, Marty Greenia <martygree...@gmail.com> wrote:
It almost seems counterintuitive. For analytics, you'd think they'd want a
database that supports more sophisticated query functionality (SQL), whereas
for everyday tweet storage, something fast and high-throughput (Cassandra)
makes sense.
I'd be curious to hear the details as well.
These decisions aren't made in a vacuum. One of these use cases has an
existing system that works, one doesn't.
-ryan