Good morning Sandon:

           I can empathize with your challenges. In a previous lifetime, I
worked with WebSphere 2.0 through 8.x: portals, Process Server, and the
application server. One installation had over 500 WAS instances. Pain
times ten.
           Distributed computing is a double-edged sword. Scaling is
terrific when all the moving parts are in sync, and almost everyone loves
to tout performance. The downside is maintenance, debugging, and, as you
put it, deployment.
           Keeping multiple topologies, bolts, etc. consistent across
different environments is a logistical nightmare. Even with Puppet and
Chef handling the mundane tasks of keeping the Linux binaries and
directories in sync, the Storm layer becomes ever more complex. You did
not mention ZooKeeper in your discussion, and each environment's ensemble
is one more thing to keep aligned.
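
           One way to tame that drift is to keep every environment-specific
setting outside the assembly jar. A minimal sketch in Scala; the file
layout and helper name are my own invention, not from your setup:

    import java.io.FileReader
    import java.util.Properties

    // Hypothetical layout: conf/dev.properties, conf/qa.properties,
    // conf/prod.properties, so the same jar runs unchanged everywhere.
    object EnvConfig {
      def load(env: String): Properties = {
        val props  = new Properties()
        val reader = new FileReader(s"conf/$env.properties")
        try props.load(reader) finally reader.close()
        props
      }
    }

Each topology main then reads EnvConfig.load(args(0)) instead of baking
dev/QA/prod values into the build.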
           Have you thought of pushing this out to AWS or Google Cloud?
Assuming you are doing real-time micro-batching, you might have several
test topologies (works in progress), plus a QA topology and a production
topology that mirror each other, splitting the real-time traffic between
QA and prod. If you have a dozen or so topologies, how much Bash
scripting do you need? Too much sys admin work!
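           Most of that scripting can collapse into one parameterized
launcher. Below is a minimal sketch against the Storm 0.9.x API from
Scala; the registry, topology names, and builder functions are
hypothetical, not from your codebase.

    import backtype.storm.{Config, LocalCluster, StormSubmitter}
    import backtype.storm.generated.StormTopology

    object Launcher {
      // Hypothetical registry: one entry per topology, each replacing
      // a dedicated deployment shell script.
      val topologies: Map[String, () => StormTopology] = Map(
        // "clickstream" -> (() => ClickstreamTopology.build())
      )

      def main(args: Array[String]): Unit = args match {
        case Array(name, "local") =>
          // In-process run for development and functional tests.
          new LocalCluster().submitTopology(name, new Config, topologies(name)())
        case Array(name, env) =>
          // Suffix the environment so QA and prod instances of the same
          // topology can run side by side without name collisions.
          val conf = new Config
          conf.setNumWorkers(4)
          StormSubmitter.submitTopology(s"$name-$env", conf, topologies(name)())
        case _ =>
          System.err.println("usage: Launcher <topology-name> <local|qa|prod>")
      }
    }

One "storm jar assembly.jar Launcher <name> qa" invocation then becomes
the entire deployment step for any topology.
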
           Again, best wishes on your endeavors.

           Mason Yu Jr.
           Principal Enterprise Architect
           Big Data Architects, LLC.




The famous Sun Tzu

On Tue, Apr 28, 2015 at 8:49 AM, Sandon Jacobs <sjac...@appia.com> wrote:

> My company is using Storm for various stream-processing solutions, mostly
> ingesting data from Kafka topics. We have chosen to implement our
> topologies in Scala, with APIs like Tormenta and Summingbird in the mix as
> well. We have about 9-10 topologies running in production as we speak.
>
> I find tons of useful information about Storm in general, but VERY little
> about how folks are managing deployment, Git repos, etc.
>
> Currently we have all of these topologies in the same Git repo, with a
> main class for each topology, allowing us to run them locally or remotely.
> Some of this code shares common components - we try to reuse bolts we
> have written, and other dependencies are shared across topologies as well.
>
> So in our CI environment, we build an assembly jar with SBT containing
> all topologies and use the storm jar command to deploy that jar N times
> (N = the number of topologies). We have functional tests that Jenkins runs
> after each topology deployment to exercise the functionality of said
> topology. Given the number of topologies in our catalog, this is becoming
> cumbersome, with the feedback loop from git push through deployment and
> test getting longer and more unwieldy. The whole thing is starting to
> remind me too much of my Java EE container days, with multiple EAR or WAR
> files deployed across a cluster of WebSphere boxes (UGH!!!).
>
> I say all of that to frame the question of how folks are managing
> similar situations/deployments. There has been some thought around
> breaking up the Git repo into multiple repos, or maybe one Git repo with
> a parent SBT project, with a subproject for common components and one
> subproject per topology.
>
> I am interested to hear any thoughts, or to be pointed to any resources
> that have been helpful to others.
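
           On the parent-SBT-project idea above: a multi-project build
keeps the shared bolts in one place while letting CI rebuild one topology
at a time. A minimal build.sbt sketch (sbt 0.13 style; the organization
and project names are illustrative, and sbt-assembly is assumed to be on
the plugin classpath):

    // build.sbt
    lazy val commonSettings = Seq(
      organization := "com.example",
      scalaVersion := "2.10.5"
    )

    // Shared bolts and other cross-topology code live here.
    lazy val common = (project in file("common"))
      .settings(commonSettings: _*)

    // One subproject per topology, each depending on common.
    lazy val clickstream = (project in file("topologies/clickstream"))
      .dependsOn(common)
      .settings(commonSettings: _*)

    lazy val enrichment = (project in file("topologies/enrichment"))
      .dependsOn(common)
      .settings(commonSettings: _*)

    lazy val root = (project in file("."))
      .aggregate(common, clickstream, enrichment)

With assembly enabled per subproject, "sbt clickstream/assembly" builds
only the jar whose code changed, so Jenkins can redeploy and test a
single topology instead of looping over all N.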
