Document ID: ZAPFLASH-2009227 | Document Type: ZapFlash
/By: Ronald Schmelzer/
Posted: Feb. 27, 2009
In our conversations about the value of Service-Oriented Architecture
(SOA), we frequently discuss the need for agility. The constant problem
plaguing IT is its inability to deal with continuous and often
unpredictable change. Therefore, it makes sense that any Enterprise
Architecture (EA) initiative should focus on resolving that problem by
designing for change – agility. However, we also discussed in a prior
ZapFlash that it’s difficult to design for agility by focusing on
individual Services. Rather, agility is an emergent property of the
complex system that is IT.
<http://t.ymlp178.com/bjbalauwsacauhbarambmuu/click.php>
So, if developers, integration architects, and infrastructure
implementers can’t guarantee agility at their individual, atomic level
of operation, what can they guarantee? One concept that contributes to
the emergence of agility in complex systems, but is often missing from
our SOA conversations, is the notion of resilience.
What is resilience? Resilience is the property of an entity to absorb
energy when it is hit by some change, and then rebound from that
change back to its original condition. Resilience is a sort of
"self-righting tendency" that allows the system to retain its
overall structure, without lasting damage, even when subjected to
significant change. And if we primarily want to enable the sort of
loosely coupled change that SOA promises, then certainly the Services we
build, infrastructure we implement, processes we model, and systems we
enable should have some measure of resilience.
*How Does Resilience Relate to Agility?*
In many ways, the concept of resilience is similar to that of agility.
Both agility and resilience deal with change in its various forms, but
there are distinct differences that inform the way in which we
architect, engineer, and design our complex systems. One way to
understand the difference is to compare the concept of resilience with
that of flexibility. Flexibility is another word frequently used to
describe one of the desired benefits of agility. If systems can stretch
and bend to meet new needs, then we don’t need to continuously
re-engineer them as things change.
However, resilience is not the same concept as flexibility. The best way
to understand the difference is to look at the antonyms of each of the
words. Rigidity, often couched in terms of “robustness”, is the antonym
of flexibility, and it implies the inability or resistance of an object
to change. However, fragility is the antonym of resilience, and it
implies that the given entity will break when a sufficient force is
applied. There’s clearly a relationship between flexibility and
resilience because things that are flexible have a higher tolerance for
force, but flexible systems can still be fragile. Things can be flexible
and not resilient, in that many systems can be changed but never regain
their original shape. If that happens often enough, you are
left with a system contorted beyond its original intention. Indeed, you
want resiliency and agility, not just flexibility and robustness. What’s
more, it is much easier to build systems for robustness than it is to
build them for flexibility. The general thinking goes that if you
build systems big, strong, and thick, you can “withstand” change.
But who wants to, or even can, withstand the inevitable force of change?
Wouldn’t you rather avoid colossal failure when the inevitable force of
change does occur? Wouldn’t you rather capitalize on change?
One insight is that systems are fragile when you change them beyond
their “elastic limit”. From this perspective, things that are rigid have
a very low elastic limit, and are very fragile. Things that are flexible
have a high elastic limit and are resilient up to a point. Elasticity is
measured by variability, and we can plan ahead with regard to this
variability by thinking about how much we expect things to change and how
much force there is when they change. As you might guess, in a system
that’s continuously undergoing rapid and often unpredictable change,
resilience provided through robustness provides neither flexibility nor
emergent properties of agility. The only form of resilience that works
is that which is based on flexibility. In this way, we can think of
resilience that we plan into our systems as variability, and resilience
that emerges unplanned in our systems as agility.
The idea of measuring flexibility by planning for variability should
sound familiar. We discussed this idea when we introduced the concept of
the Agility Model
<http://t.ymlp178.com/bjhadauwsafauhbagambmuu/click.php>. The Agility
Model provides architects with three key capabilities: a method for
planning Services and processes with regard to their expected
variability, a means for business users to express their desires with
respect to variability, and a means to measure developed systems and
Services for their actual variability. Having variability provides
flexibility, which in turn provides a measure of resilience, and
contributes to agility as an emergent property. Specifically, planning
for variability requires you to think beyond how a particular aspect of
that Service is designed for today. What could change in the future?
What is the cost/benefit trade-off for designing that variability in
now, rather than just accepting inflexibility in that aspect?
But there’s more to the resilience picture. In reality, architects can
provide for resilience in one of two ways: by building systems rigid
enough to resist change, or by building them flexible enough to
absorb change without being permanently altered. We often handle
these resilience issues through a few key mechanisms: redundancy,
distribution, fail-over, load-balancing, clustering, and an enforced no
single point of failure rule. With this in mind, it doesn’t matter how
flexible a particular Service might be if it can unexpectedly become
unavailable at a moment’s notice. And we shouldn’t come to depend on
systems to provide this sort of resilience either. Systems management
software, ESBs, and other infrastructure can introduce more brittleness
through a single point-of-failure. What if the SOA management system
stops functioning, even if the Services themselves are operating fine?
No, we can’t depend on infrastructure to solve architectural resiliency
issues. We have to design resilience into the architecture, regardless
of the current technology in use.
*The Role of Resilience in SOA*
Just as we can plan for flexibility at a variety of levels using
measures of variability in the Agility Model, so too can we plan for
resilience at those levels. Services that are resilient can not only
handle a wide range of request types, but also significant numbers of
Service requests without tipping over into failure. While it is possible
for Service infrastructure (including the now ubiquitous ESB products)
to handle such Service availability resilience, the best practice is for
architects to consider Service availability as part of resilient Service
design. For example, architects should consider fail-over Services,
clusters of Service implementations, or load-balancing by having
multiple Service interfaces and Service end-points defined in Service
contracts <http://www.zapthink.com/report.html?id=ZAPFLASH-2007719>. In
this way, the architect doesn’t have to depend on specific
infrastructure to handle variable Service loads.
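As a rough illustration of what multiple end-points in a Service contract buy the consumer, here is a minimal Python sketch of client-side fail-over. Everything in it is an assumption made for illustration: the end-point URLs, the `invoke_with_failover` helper, and the `fake_transport` stand-in for a real network call are hypothetical and do not come from any particular SOA product.

```python
from typing import Callable, Sequence


class AllEndpointsFailed(Exception):
    """Raised when every end-point listed in the contract has failed."""


def invoke_with_failover(endpoints: Sequence[str],
                         call: Callable[[str], str]) -> str:
    """Try each end-point in order and return the first successful reply."""
    errors = []
    for endpoint in endpoints:
        try:
            return call(endpoint)
        except ConnectionError as exc:
            # Remember the failure and move on to the next end-point.
            errors.append(f"{endpoint}: {exc}")
    raise AllEndpointsFailed("; ".join(errors))


# Hypothetical contract: one logical Service, three end-points.
ORDER_SERVICE_ENDPOINTS = [
    "http://orders-a.example.com/service",
    "http://orders-b.example.com/service",
    "http://orders-c.example.com/service",
]


def fake_transport(endpoint: str) -> str:
    """Stand-in for a real network call; the first end-point is down."""
    if "orders-a" in endpoint:
        raise ConnectionError("connection refused")
    return f"handled by {endpoint}"


result = invoke_with_failover(ORDER_SERVICE_ENDPOINTS, fake_transport)
print(result)  # handled by http://orders-b.example.com/service
```

The point of the sketch is that the list of alternatives lives in the contract the consumer already holds, so the fail-over decision does not depend on any one intermediary staying up.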
Yet, resilience at the Service level is not enough to guarantee overall
resilience of the enterprise architecture. Just as we need fail-over,
redundancy, load-balancing, and just-in-time provisioning for Services,
so too we need them for the business processes implemented as
compositions of those Services. Consider fail-over processes that
provide an alternate execution path for business logic, redundant
processes that channel interactions across alternate invocation
mechanisms, and methods to create ad-hoc processes when other processes
are on the verge of tipping over.
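A fail-over process with an alternate execution path might look something like the following Python sketch. It assumes a process can be modeled as an ordered list of steps over a simple dictionary; the step names (`check_credit_online`, `queue_for_manual_review`, `reserve_stock`) and the `run_process` helper are hypothetical and chosen only to make the idea concrete.

```python
from typing import Callable, Dict, List

Step = Callable[[Dict], Dict]


def run_process(paths: List[List[Step]], order: Dict) -> Dict:
    """Run the first execution path that completes.

    Each path starts from a fresh copy of the input, so partial results
    from a failed path never leak into the alternate path.
    """
    last_error = None
    for steps in paths:
        state = dict(order)
        try:
            for step in steps:
                state = step(state)
            return state
        except RuntimeError as exc:
            last_error = exc
    raise RuntimeError(f"all execution paths failed: {last_error}")


def check_credit_online(state: Dict) -> Dict:
    # Simulated outage of the Service behind the primary path.
    raise RuntimeError("credit Service unavailable")


def queue_for_manual_review(state: Dict) -> Dict:
    return {**state, "credit": "pending-review"}


def reserve_stock(state: Dict) -> Dict:
    return {**state, "stock": "reserved"}


primary = [check_credit_online, reserve_stock]
alternate = [queue_for_manual_review, reserve_stock]

outcome = run_process([primary, alternate], {"id": 42})
print(outcome)  # {'id': 42, 'credit': 'pending-review', 'stock': 'reserved'}
```

Note that the alternate path reuses the `reserve_stock` step unchanged; only the part of the composition that failed is swapped out.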
Perhaps the easiest form of resilience can be achieved at the
infrastructure level. For sure, SOA infrastructure should be able to
handle a wide range of usage loads and invocation methods, but to depend
on a single vendor or single implementation to provide that guarantee is
foolhardy. Rather, good enterprise architects count on resilience of
infrastructure by having redundant, load-balanced, and alternate runtime
engines, and by using distributed, heterogeneous network intermediaries
instead of single-vendor, proprietary, single point of failure ESBs.
Organizations should also implement distributed caching, offloaded XML
parsing, federated registries with late binding, and network gateways
that handle security and policy enforcement away from the Service
end-points. Resilience at the infrastructure level is much more doable
when you count on high levels of reliability and throughput without
counting on one vendor’s implementation to pull all the weight.
But why stop there? Organizations seeking SOA resilience need to also
make sure to have resilient Service policies. This requires not just
redundant policy enforcement mechanisms, but also fail-over policy
definition points and even redundant, fail-over, and load-balanced
Service policies. When you’re using policies at runtime to determine
binding to Services, an unexpected outage of a Service policy
definition point can cause just as much havoc as if the Service
itself were unavailable.
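To make the idea of fail-over policy definition points concrete, here is a small Python sketch. It rests on assumptions of our own: policies are plain dictionaries, each definition point is a callable that may raise `ConnectionError`, and the resolver keeps a last-known-good copy as a final fallback. That cache is an addition for illustration rather than something prescribed above, and the `PolicyResolver` and `PolicyStore` names are hypothetical.

```python
from typing import Callable, Dict, Sequence

Policy = Dict[str, str]
PolicySource = Callable[[str], Policy]


class PolicyResolver:
    """Resolve a Service policy from redundant definition points."""

    def __init__(self, sources: Sequence[PolicySource]) -> None:
        self._sources = list(sources)
        self._last_known_good: Dict[str, Policy] = {}

    def resolve(self, service: str) -> Policy:
        for source in self._sources:
            try:
                policy = source(service)
            except ConnectionError:
                continue  # this definition point is down; try the next
            self._last_known_good[service] = policy
            return policy
        # Every definition point is down: fall back to the cached copy.
        if service in self._last_known_good:
            return self._last_known_good[service]
        raise LookupError(f"no policy available for {service}")


class PolicyStore:
    """Stand-in for a policy definition point that can be switched off."""

    def __init__(self, policies: Dict[str, Policy]) -> None:
        self.policies = policies
        self.up = True

    def __call__(self, service: str) -> Policy:
        if not self.up:
            raise ConnectionError("policy store unreachable")
        return self.policies[service]


orders_policy = {"binding": "https", "timeout": "5s"}
primary = PolicyStore({"orders": orders_policy})
secondary = PolicyStore({"orders": orders_policy})
resolver = PolicyResolver([primary, secondary])

primary.up = False
first = resolver.resolve("orders")   # served by the secondary store
secondary.up = False
second = resolver.resolve("orders")  # served from the last-known-good copy
```

Whether a stale policy is acceptable during an outage is itself a policy decision; for security policies in particular, failing closed may be the better choice.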
Similarly, companies need to have resilience at the Service contract and
schema level. Having redundant Service implementations makes no sense if
they are all sharing a single Service contract file that is in danger of
disappearing, especially if it is sitting on an unprotected file server.
Protect your metadata by locking it behind a policy-enforced registry,
but also make sure the registry itself has redundancy, fail-over, and
load-balancing, so that you don’t simply shift the single point of
failure from the file server to the registry. This also applies to all
Service metadata, process metadata, data schema, and semantic mappings
that might be necessary to allow for proper functioning of the system.
*The ZapThink Take*
Yet, all this doesn’t matter if the most important part of enterprise
architecture, namely the architect, is not resilient. Are
you the only EA in your organization that gets SOA? Even worse, are you
the only EA in your organization? What happens if your job changes, or
you get laid off, or the organization otherwise changes its feelings on
EA and/or SOA? Will that kill the whole SOA project? What about budgets
and funding? Are you operating your SOA projects on the edge, just
awaiting a single nudge to push them into project oblivion? If so, you
need architectural and organizational resilience. Make sure you have a
broad base of support (redundancy). Distribute the workload and
responsibility for architectural activities and make sure that there is
a team of architects, not a lone crusader (failover and clustering).
Give the rest of the organization visibility into the benefits of
your activities, and make sure you provide closed-loop feedback on how
specific EA tasks result in specific business benefits, preferably
iteratively, frequently, and on a short time schedule.
Agility and flexibility are not enough to guarantee SOA success. In
fact, the real thrust of what ZapThink has been discussing on SOA for
the past eight plus years has been on agile, resilient enterprise
architecture. If some of the so-called benefits of SOA (namely,
standards-based integration) were to disappear, but we were left with
agile, resilient EA, we would still have achieved the main objective of
SOA. Enabling the business to operate in a continuously changing,
heterogeneous environment without breaking, incurring significant cost,
or suffering high latency requires enterprise architects to think, act,
and plan for resilience as well as agility.
Copyright © 2008 ZapThink, LLC.