On Sun, Nov 14, 2010 at 5:01 PM, Mike Fedyk <mfe...@mikefedyk.com> wrote:
> node.js + CouchDB == Crazy Delicious by Mikeal Rogers
> http://jsconf.eu/2010/speaker/nodejs_couchdb_crazy_delicious.html
>
> I was watching this a couple days ago and I've been thinking about how
> to deal with instance and service (think of sending emails as a
> "service") failures.  Because it's easy to make sure that only one
> email is sent if you only have one server sending emails, but if that
> machine fails, then no emails get sent out.
>
> You compose an email while offline and save it to your local couch
> instance.  Then later it gets replicated to one of the couchdb
> instances in your cloud.  And then:
>
> 1. You have the date when it was saved on the phone, etc.  If you had
> a timestamp when that replication happened, you'd be able to have a
> chain of couchdb instances try to send the email, but only if it is
> older than X time after it was replicated to your cloud of couchdb
> instances.  instance_a would try immediately, instance_b tries if it
> hasn't been taken in X minutes, and so on for instance_c.  See [A].
>
> 2. When instance_a wants to send the email, it updates the state to
> "taking" and then waits for instance_b and instance_c to ack the
> taking by adding fields to the current document.  Oops: instance_b and
> instance_c will race more often than not and you'll get a conflict, so
> the acks need to go into separate temporary state-tracking documents.
> You still need [A], or if there are no other instances you'll wait
> forever for acks that won't happen.
>
> 3. You have one instance that sends emails and you deal with the
> downtime if that instance fails or some other failure happens that
> prevents email from being sent.
>
> 4. You send periodic test emails to make sure they are being sent, and
> if they are not, take over the function on instance_$self.  See
> [B].
>

Or... (I just thought of this idea)

5. When you write the update that changes the state machine status
from NEW to TAKING (along with a field holding your instance id), you
write it to any couchdb instance other than $self.  Then, when the
write replicates back to you and the instance id matches $self, you
send the email.

C) This way you naturally test the instance you write to, and no other
instance will race with you to send the email.  You can either keep a
list of the other instances and use them round-robin, or possibly use
DNS RR to do it for you; in that case you depend on the quality of
the DNS resolver.  With this you should be able to do away with [A]
and [B].
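
To make 5 and C a bit more concrete, here is a rough sketch in Python.
It is only an illustration of the idea, not tested against a real
couchdb: the field name "taken_by" and the helper names are made up,
and the actual write to the peer and the replication back are left
out.

```python
from itertools import cycle


def peer_picker(all_instances, self_id):
    """Round-robin over every instance except ourselves (hypothetical helper)."""
    peers = [i for i in all_instances if i != self_id]
    if not peers:
        raise ValueError("need at least one other instance to write to")
    return cycle(peers)


def make_claim(doc, self_id):
    """Return a copy of the doc with the NEW -> TAKING transition applied."""
    if doc.get("state") != "NEW":
        raise ValueError("only NEW documents can be claimed")
    claimed = dict(doc)
    claimed["state"] = "TAKING"
    claimed["taken_by"] = self_id  # "taken_by" is an assumed field name
    return claimed


def should_send(doc, self_id):
    """Run on each change that replicates in: send only if the claim is ours."""
    return doc.get("state") == "TAKING" and doc.get("taken_by") == self_id


# instance_a claims an email by writing to a peer, never to itself.
peers = peer_picker(["instance_a", "instance_b", "instance_c"], "instance_a")
target = next(peers)
claim = make_claim({"_id": "email-1", "state": "NEW"}, "instance_a")
# ... write `claim` to `target`, then wait for it to replicate back ...
if should_send(claim, "instance_a"):
    print("instance_a sends", claim["_id"], "after a round trip via", target)
```

The point of the round trip is that should_send() only ever runs on a
document that some other instance accepted and replicated back, so a
cut-off instance never sends anything.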

What do you think?

> A) And this only works assuming that all of your cloud couchdb
> instances are replicating to each other correctly at the moment.  If
> they are not, you have N > 1 emails sent out.  (And imagine if the
> action is something more important than receiving an email, or
> receiving more than one.)  To keep this from happening you need a
> couchdb instance heartbeat (maybe have an app update a document that
> describes that instance's "registration" in the system with the
> current timestamp every 60 seconds) and a STONITH system to kill any
> instances of couchdb that stop updating their document.
>
> B) Do you still need [A]?  Maybe it's good enough that the email
> didn't get back to you, but maybe it is sending emails to other
> places.  So it seems [A] is still needed.  Now you also need a service
> registration system (make sure this and other services like it are
> only running on one instance).
>
> So these are some of the ideas that I'm coming up with on this issue.
> I'm looking for more input.  What would you do?
>
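
For reference, the heartbeat part of [A] could look roughly like the
Python sketch below.  It only covers building the "registration"
document and spotting instances that stopped updating it; the STONITH
step is not shown.  The document layout, the "last_seen" field, and
the three-missed-beats threshold are all assumptions for illustration.

```python
import time

HEARTBEAT_INTERVAL = 60   # seconds, per the "every 60 seconds" idea in [A]
MISSED_BEATS_ALLOWED = 3  # assumption: tolerate a few late updates


def heartbeat_doc(instance_id, now=None):
    """Build the registration document an instance rewrites on every beat."""
    now = time.time() if now is None else now
    return {
        "_id": "registration:" + instance_id,  # assumed id scheme
        "type": "registration",
        "instance": instance_id,
        "last_seen": now,
    }


def stale_instances(registrations, now=None):
    """Return the instances whose heartbeat is older than the allowed window."""
    now = time.time() if now is None else now
    limit = HEARTBEAT_INTERVAL * MISSED_BEATS_ALLOWED
    return sorted(
        r["instance"] for r in registrations if now - r["last_seen"] > limit
    )


# instance_b last updated its document 300 seconds ago and is flagged.
regs = [
    heartbeat_doc("instance_a", now=990),
    heartbeat_doc("instance_b", now=700),
]
print(stale_instances(regs, now=1000))
```

Whatever consumes stale_instances() would be the piece that decides to
kill an instance, and it has the same single-point-of-failure question
as the email sender itself.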
