Hello,

I'm about to start tackling our upgrade path from 0.94 to 1.0+. We have 6
production HBase clusters, 2 Hadoop clusters, and hundreds of
APIs/daemons/crons/etc. hitting all of them.  Many of these clients
hit multiple clusters in the same process.  Daunting to say the least.

We can't take full downtime on any of these, though we can take read-only.
And ideally we could take read-only on each cluster in a staggered fashion.

From a client perspective, all of our code currently assumes an
HTableInterface, which I think gives me some wiggle room.  With that in
mind, here's my current plan:

- Shade CDH5 to something like org.apache.hadoop.cdh5.hbase.
- Create a shim implementation of HTableInterface.  This shim would
delegate to either the old cdh4 APIs or the new shaded CDH5 classes,
depending on the cluster being talked to.
- Once the shim is in place across all clients, I will put each cluster
into read-only (a client side config of ours), migrate data to a new CDH5
cluster, then bounce affected services so they look there instead. I will
do this for each cluster in sequence.
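
To make the delegation idea concrete, here's a rough sketch. The real shim would have to implement the full HTableInterface; to keep this self-contained I'm using a tiny stand-in interface, and the cluster-to-version config (`clusterIsCdh5`) is a hypothetical stand-in for our client-side cluster config:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for HTableInterface -- the real shim would implement
// the full interface; this just illustrates the delegation pattern.
interface SimpleTable {
    byte[] get(byte[] row);
}

// Stub for the existing (unshaded) CDH4 client.
class Cdh4Table implements SimpleTable {
    public byte[] get(byte[] row) {
        return ("cdh4:" + new String(row)).getBytes();
    }
}

// Stub for the relocated org.apache.hadoop.cdh5.hbase client.
class Cdh5Table implements SimpleTable {
    public byte[] get(byte[] row) {
        return ("cdh5:" + new String(row)).getBytes();
    }
}

// The shim: picks a delegate per cluster based on client-side config.
class ShimTable implements SimpleTable {
    private final SimpleTable delegate;

    // clusterIsCdh5 is a hypothetical stand-in for the client-side config
    // that records which clusters have been migrated to CDH5.
    ShimTable(String clusterName, Map<String, Boolean> clusterIsCdh5) {
        this.delegate = clusterIsCdh5.getOrDefault(clusterName, false)
                ? new Cdh5Table()
                : new Cdh4Table();
    }

    public byte[] get(byte[] row) {
        return delegate.get(row); // every call passes straight through
    }
}
```

Bouncing a service after migration then just means flipping the config entry for that cluster; no client code changes.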

This provides a great rollback strategy, and with our existing in-house
cluster cloning tools we can minimize the read-only window to a few minutes
if all goes well.

There are a couple of gotchas I can think of with the shim, which I'm
hoping some of you might have ideas/opinions on:

1) Since protobufs are used for communication, we will have to avoid
shading those particular classes as they need to match the
package/classnames on the server side.  I think this should be fine, as
these are net-new, not conflicting with CDH4 artifacts.  Any
additions/concerns here?
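
For what it's worth, the shading step might look something like this maven-shade-plugin relocation sketch. I'm assuming the generated protobuf classes live under `org.apache.hadoop.hbase.protobuf.generated` (worth verifying against the actual CDH5 jars before relying on it):

```xml
<!-- maven-shade-plugin relocation sketch; package names are assumptions -->
<relocation>
  <pattern>org.apache.hadoop.hbase</pattern>
  <shadedPattern>org.apache.hadoop.cdh5.hbase</shadedPattern>
  <excludes>
    <!-- RPC protobufs must keep their original names to match the server -->
    <exclude>org.apache.hadoop.hbase.protobuf.generated.*</exclude>
  </excludes>
</relocation>
```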

2) I'd really like to be able to tackle HBase separately from (and before)
Hadoop.  With that in mind, *on the client side only*, it'd be great if I
could pull in our shaded CDH5 HBase but keep the CDH4 Hadoop libraries.
All interactions with our HBase clusters happen through the HBase RPC.
Should this be fine?
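
If this works, the dependency wiring might look like the following sketch: pull in the shaded CDH5 HBase artifact while excluding its Hadoop transitives, so the CDH4 Hadoop jars already on the classpath win. The artifact coordinates here are hypothetical:

```xml
<dependency>
  <!-- hypothetical coordinates for our shaded CDH5 HBase artifact -->
  <groupId>com.ourcompany</groupId>
  <artifactId>hbase-cdh5-shaded</artifactId>
  <version>1.0.0</version>
  <exclusions>
    <!-- keep CDH4 Hadoop on the classpath; drop CDH5's Hadoop transitives -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```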

3) If #2 is not possible, I'll need to further shade parts of Hadoop.  Any
idea of the minimum parts of Hadoop that would need to be pulled in/shaded
for CDH5 HBase to work on the client side?

Thanks!  I look forward to posting all lessons learned at the end of
this upgrade path for the community, and I appreciate any input you may
have on the above before I get started.

Bryan