On Sat, Aug 28, 2010 at 02:44:41PM -0700, Benjamin Black wrote:
> On Sat, Aug 28, 2010 at 2:34 PM, Anthony Molinaro
> <antho...@alumni.caltech.edu> wrote:
> > I think maybe he thought you meant put a layer between cassandra internal
> > communication.
> 
> No, I took the question to be about client connections.

Sorry, I didn't mean to put words in your mouth.

> > There's no problem balancing client connections with
> > haproxy, we've been pushing several billion requests per month through
> > haproxy to cassandra.
> >
> 
> Can it be done: yes.  Is it best practice: no.  Even 10 billion
> requests/month is an average of less than 4000 reqs/sec.   Just not
> that many for a distributed database like Cassandra.

I don't know, it seems to tax our setup of 39 extra-large EC2 nodes.  It's
also closer to 24000 reqs/sec at peak, since there are different tables
involved (2 tables for each read and 2 for each write).

> Cassandra can, and does, fail in ways that do not stop it from
> answering TCP connection requests.  Are you saying it works fine
> because you have seen numerous types of node failures and this was
> sufficient? I would be quite surprised if that were so.  Using an LB
> for service discovery is a fine thing (connect to a VIP, call
> describe_ring, open direct connections to cluster nodes).  Relying on
> an LB to do the right thing when it is totally ignorant of what is
> going across those client connections (as is implied by simply
> checking for connectivity) is asking for trouble.  Doubly so when you
> use a leastconn policy (a failing node can spit out an error and close
> a connection with impressive speed, sucking all the traffic to itself;
> common problem with HTTP servers giving back errors).


The haproxy setup does seem sufficient for us.  We've been running Cassandra
in production since 0.3.0 and have seen just about every possible failure.
For the most part it has worked.  I'm not saying it's the most efficient
approach, just that it will work for most people's usage.  All the writes to
this cluster are via PHP, which creates a connection for each request, so a
simple connection check works fine in this case.  We attempt to pool
connections via Java for the reads, but the clients reconnect whenever they
receive an error.
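
For the curious, the haproxy side looks roughly like the following (a
sketch, not our actual config; the addresses, ports, and check interval
are illustrative):

  listen cassandra
      bind 127.0.0.1:9160
      mode tcp
      # roundrobin rather than leastconn, to avoid the failure mode
      # Benjamin describes where a fast-failing node vacuums up traffic
      balance roundrobin
      # plain TCP connect check; enough when clients open a fresh
      # connection per request and reconnect on error
      server cass1 10.0.0.1:9160 check inter 2000
      server cass2 10.0.0.2:9160 check inter 2000
      server cass3 10.0.0.3:9160 check inter 2000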

If one machine is misbehaving, it tends to fail pretty quickly, at which
point all the haproxies drop it (we have an haproxy on every client node,
so it acts like a connection pooling mechanism for that client).
describe_ring is a newish call; it didn't exist when we wrote our systems,
and we have not had a chance to revisit it (a sketch of that approach is
below).  So while, yes, there are problems with using haproxy, they are not
insurmountable, and it would probably work for many use cases.  But like
everything, YMMV.
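
If we did revisit, the describe_ring flow Benjamin describes would look
something like this in Java (a minimal sketch against the Thrift API; the
VIP hostname and keyspace name are placeholders, and depending on your
server configuration you may need a framed transport):

  import java.util.List;
  import org.apache.cassandra.thrift.Cassandra;
  import org.apache.cassandra.thrift.TokenRange;
  import org.apache.thrift.protocol.TBinaryProtocol;
  import org.apache.thrift.transport.TSocket;

  public class RingDiscovery {
      public static void main(String[] args) throws Exception {
          // Bootstrap through the VIP (or any single known node) once.
          // "cassandra-vip" is a placeholder; wrap the socket in a
          // TFramedTransport if your server expects framing.
          TSocket transport = new TSocket("cassandra-vip", 9160);
          transport.open();
          Cassandra.Client client =
              new Cassandra.Client(new TBinaryProtocol(transport));

          // Ask the cluster for its own topology.  "Keyspace1" is a
          // placeholder keyspace name.
          List<TokenRange> ring = client.describe_ring("Keyspace1");
          for (TokenRange range : ring) {
              for (String endpoint : range.endpoints) {
                  System.out.println(range.start_token + " -> "
                      + range.end_token + " served by " + endpoint);
                  // open direct client connections to endpoint here
              }
          }
          transport.close();
      }
  }

From there you'd open direct connections to the returned endpoints instead
of sending every request through the LB.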

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <antho...@alumni.caltech.edu>
