[service-orientated-architecture] Steve on SOA Resilience & Disaster Recovery

Gervas Douglas Fri, 02 Jan 2009 14:31:31 -0800

<<One of the things that continues to amaze me when I look at
companies' Disaster Recovery policies is how much they concentrate on
the backup and how little they concentrate on the restore. Chatting to
a CIO in the manufacturing industry a while back, he gave me a great
stat on his business:


    For every minute that we are down it costs 200k euros. My
DR plan takes two days; that is why I have three redundant data
centres and a full set of passive backups.


Simply put, this chap couldn't let his systems go down so he invested
very heavily in making sure that they didn't.
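
To put a rough number on that stat, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that the whole two-day DR plan counts as downtime at the quoted rate; the function and constant names are made up for this example.

```python
# Figures taken from the CIO's quote; treating the full two-day
# recovery as billable downtime is an assumption for illustration.
COST_PER_MINUTE_EUR = 200_000
RECOVERY_DAYS = 2


def downtime_cost(cost_per_minute: int, days: int) -> int:
    """Return the total cost of an outage lasting `days` full days."""
    minutes = days * 24 * 60
    return cost_per_minute * minutes


total = downtime_cost(COST_PER_MINUTE_EUR, RECOVERY_DAYS)
print(f"{total:,} euros")  # 576,000,000 euros
```

At that scale the cost of redundant data centres is small next to a single full recovery.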

More often, however, people just tick the "backup" box, send it off to
tape and then don't worry about bringing a service back up and what
that really takes.

If you've lost a disk then it's a physical job followed by a data
restore (you did of course test that). If the server is trashed then
you have procurement, install and then recovery. If the Data Centre is
trashed then you have a much bigger challenge.

This is where companies come in and charge quite a bit of money to
have "warm" servers on standby. You pay for them at a certain rate
(and more when you actually use them) and then have to do all the
rebuild job if disaster strikes.

In a networked world such as SOA this is a big issue, as the
failure of one service can have significant knock-on effects. You
need to design around this, but there is still the question of the
quickest way to get a degraded instance of the service back.

Why degraded? Well, let's say it's a high-demand service: you've gone
stateless and you've got 20 Linux boxes running it horizontally when a
muppet manages to dig up the network cable into your data centre. It's
going to be 3 days to get it fixed to 100% but you need something
degraded that works now.

This is where Virtual Machines really kick in. Alongside your normal
data backup strategy I'd recommend taking a Virtual Machine backup of
the server. In future the VM approach will probably be the normal
one, but it has a great job right now as part of your DR solution.
Take the VM backup at the same time as the data backup and then, if
you need to, just fire it up on some commodity hardware that you have
lying around. It's going to perform badly, so think about throttling,
or fire up a virtual grid with the hardware in the office. This then
gives you the space to do the full recovery and get everything up and
performing at 100%. Having the VM backup means you can put your patch
in place as quickly as possible.
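
One way to do the throttling mentioned above is a token bucket in front of the degraded instance. The sketch below is only an illustration under assumed numbers: the class name, the rate and the capacity are hypothetical, and nothing in the post prescribes this particular technique.

```python
import time


class TokenBucket:
    """Admit at most `rate_per_sec` requests per second on average,
    allowing short bursts of up to `capacity` requests."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so it can be tested
        self.last = clock()

    def allow(self) -> bool:
        """Return True if a request may proceed, False if it should be shed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical limit for a VM running on spare office hardware.
bucket = TokenBucket(rate_per_sec=5, capacity=10)
if bucket.allow():
    pass  # hand the request to the degraded service
else:
    pass  # reject or queue it, e.g. with a "try again later" response
```

Shedding load at the door keeps the slow instance responsive for the requests it does accept, rather than letting every caller time out.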

When you do this, however, I'd also recommend that you run, at least
once a month, a full unit/system test on the VM backup to make sure
that it does actually work properly.
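
A minimal sketch of how that monthly check could be tracked, assuming the actual tests are supplied as simple callables. The check names, the 31-day window and both helper functions are hypothetical; real checks would exercise the service running inside the restored VM.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

# "At least once a month", taken here as 31 days (an assumption).
MAX_TEST_AGE = timedelta(days=31)


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each named check and return the names of those that failed.
    A check that raises an exception counts as a failure."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


def is_test_overdue(last_passed: date, today: date) -> bool:
    """True if the last successful restore test is older than the window."""
    return today - last_passed > MAX_TEST_AGE


# Placeholder checks standing in for real unit/system tests.
failures = run_checks({
    "vm_boots": lambda: True,
    "service_answers": lambda: True,
})
print("failed checks:", failures)
```

The point is less the tooling than the record: a VM backup that has not passed a restore test recently should be treated as unproven.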

Disaster Recovery is about planning for the recovery, not planning for
backup.>>

You can read Steve's blog at:

http://service-architecture.blogspot.com/

Gervas
