[service-orientated-architecture] [ZapFlash] Resilience: The Missing Word in the SOA Conversation

Gervas Douglas Tue, 03 Mar 2009 05:32:01 -0800


Document ID: ZAPFLASH-2009227 | Document Type: ZapFlash
/By: Ronald Schmelzer/
Posted: Feb. 27, 2009

In our conversations about the value of Service-Oriented Architecture (SOA), we frequently discuss the need for agility. The constant problem plaguing IT is its inability to deal with continuous and often unpredictable change. Therefore, it makes sense that any Enterprise Architecture (EA) initiative should focus on resolving that problem by designing for change: agility. However, we also discussed in a prior ZapFlash that it’s difficult to design for agility by focusing on individual Services. Rather, agility is an emergent property of the complex system that is IT. <http://t.ymlp178.com/bjbalauwsacauhbarambmuu/click.php>

So, if developers, integration architects, and infrastructure implementers can’t guarantee agility at their individual, atomic level of operation, what can they guarantee? One of the concepts that contributes to the emergence of agility in complex systems, but is often missing from our SOA conversations, is the notion of resilience.

What is resilience? Resilience is the property of an entity to absorb energy when it is impacted by some change, but then rebound from that change back to its original condition. The concept of resiliency is sort of "a self-righting tendency" that allows the system to retain its overall structure without lasting impact even when impacted by significant change. And if we primarily want to enable the sort of loosely coupled change that SOA purports, then certainly the Services we build, infrastructure we implement, processes we model, and systems we enable should have some measure of resilience.


*How Does Resilience Relate to Agility?*

In many ways, the concept of resilience is similar to that of agility. Both agility and resilience deal with change in its various forms, but there are distinct differences that inform the way in which we architect, engineer, and design our complex systems. One way to understand the difference is to compare the concept of resilience with that of flexibility. Flexibility is another word frequently used to describe one of the desired benefits of agility. If systems can stretch and bend to meet new needs, then we don’t need to continuously re-engineer them as things change.

However, resilience is not the same concept as flexibility. The best way to understand the difference is to look at the antonyms of each of the words. Rigidity, often couched in terms of “robustness”, is the antonym of flexibility, and it implies the inability or resistance of an object to change. However, fragility is the antonym of resilience, and it implies that the given entity will break when a sufficient force is applied. There’s clearly a relationship between flexibility and resilience because things that are flexible have a higher tolerance for force, but flexible systems can still be fragile. Things can be flexible and not resilient, in that many systems can be changed but never regain their original shape. However, if that happens often enough, you are left with a system contorted beyond its original intention. Indeed, you want resiliency and agility, not just flexibility and robustness. Even more so, it is much easier to build systems for robustness than it is to build them for flexibility. The general thinking goes that you should build systems big, strong, and thick, and you can “withstand” change. But who wants to, or even can, withstand the inevitable force of change? Wouldn’t you rather avoid colossal failure when the inevitable force of change does occur? Wouldn’t you rather capitalize on change?

One insight is that systems are fragile when you change them beyond their “elastic limit”. From this perspective, things that are rigid have a very low elastic limit, and are very fragile. Things that are flexible have a high elastic limit and are resilient up to a point. Elasticity is measured by variability, and we can plan ahead with regard to this variability by thinking about how much we expect things to change and how much force there is when they change. As you might guess, in a system that’s continuously undergoing rapid and often unpredictable change, resilience provided through robustness provides neither flexibility nor emergent properties of agility. The only form of resilience that works is that which is based on flexibility. In this way, we can think of the resilience that we plan into our systems as variability, and the resilience that emerges unplanned in our systems as agility.

The idea of measuring flexibility by planning for variability should sound familiar. We discussed this idea when we introduced the concept of the Agility Model <http://t.ymlp178.com/bjhadauwsafauhbagambmuu/click.php>. The Agility Model provides architects with three key capabilities: a method for planning Services and processes with regards to their expected variability, a means for business users to express their desires with respect to variability, and a means to measure developed systems and Services for their actual variability. Having variability provides flexibility, which in turn provides a measure of resilience, and contributes to agility as an emergent property. Specifically, planning for variability requires you to think beyond how a particular aspect of that Service is designed for today. What could change in the future? What is the cost/benefit trade-off for designing that variability in now, rather than just acknowledging its inflexibility at that aspect?

But there’s more to the resilience picture. In reality, architects can provide for resilience in one of two ways: either by building systems rigid enough to resist change, or by building them flexible enough to absorb change without permanently changing the system. We often handle these resilience issues through a few key mechanisms: redundancy, distribution, fail-over, load-balancing, clustering, and an enforced no-single-point-of-failure rule. With this in mind, it doesn’t matter how flexible a particular Service might be if it can unexpectedly become unavailable at a moment’s notice. And we shouldn’t come to depend on systems to provide this sort of resilience either. Systems management software, ESBs, and other infrastructure can introduce more brittleness through a single point of failure. What if the SOA management system stops functioning, even if the Services themselves are operating fine? No, we can’t depend on infrastructure to solve architectural resiliency issues. We have to design resilience into the architecture, regardless of the current technology in use.


*The Role of Resilience in SOA*

Just as we can plan for flexibility at a variety of levels using measures of variability in the Agility Model, so too can we plan for resilience at those levels. Services that are resilient can not only handle a wide range of request types, but also significant numbers of Service requests without tipping over into failure. While it is possible for Service infrastructure (including the now ubiquitous ESB products) to handle such Service availability resilience, the best practice is for architects to consider Service availability as part of resilient Service design. For example, architects should consider fail-over Services, clusters of Service implementations, or load-balancing by having multiple Service interfaces and Service end-points defined in Service contracts <http://www.zapthink.com/report.html?id=ZAPFLASH-2007719>. In this way, the architect doesn’t have to depend on specific infrastructure to handle variable Service loads.
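
As a rough illustration of fail-over across several end-points defined in a Service contract, consider the minimal Python sketch below. Everything named in it is hypothetical and not part of the original ZapFlash: the end-point URLs, the invoke_with_failover helper, and the fake transport that stands in for a real network call. It only shows the consumer trying each listed end-point in order, rather than relying on one piece of infrastructure to do so.

```python
from typing import Callable, List, Sequence


class AllEndpointsFailed(Exception):
    """Raised when every end-point listed in the contract has failed."""


def invoke_with_failover(endpoints: Sequence[str],
                         call: Callable[[str], str]) -> str:
    """Try each end-point in contract order; return the first success."""
    errors: List[str] = []
    for endpoint in endpoints:
        try:
            return call(endpoint)
        except ConnectionError as exc:
            # Remember the failure and fall through to the next end-point.
            errors.append(f"{endpoint}: {exc}")
    raise AllEndpointsFailed("; ".join(errors))


# Hypothetical end-points, as they might be listed in a Service contract.
ORDER_SERVICE_ENDPOINTS = [
    "https://primary.example.invalid/orders",
    "https://backup.example.invalid/orders",
]


def fake_transport(endpoint: str) -> str:
    """Stand-in for a real Service call; the primary is simulated as down."""
    if "primary" in endpoint:
        raise ConnectionError("unreachable")
    return f"ok from {endpoint}"


def always_down(endpoint: str) -> str:
    """Stand-in transport in which every end-point is unavailable."""
    raise ConnectionError("unreachable")


result = invoke_with_failover(ORDER_SERVICE_ENDPOINTS, fake_transport)
```

Here the call against the primary end-point fails and the backup answers, so the consumer never sees the outage. A real implementation would also need timeouts and some care about retrying requests that are not idempotent.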

Yet, resilience at the Service level is not enough to guarantee overall resilience of the enterprise architecture. Just as we need fail-over, redundancy, load-balancing, and just-in-time provisioning for Services, so too we need them for the business processes implemented as compositions of those Services. Consider fail-over processes that provide an alternate execution path for business logic, redundant processes that channel interactions across alternate invocation mechanisms, and methods to create ad-hoc processes when other processes are on the verge of tipping over.

Perhaps the easiest form of resilience can be achieved at the infrastructure level. For sure, SOA infrastructure should be able to handle a wide range of usage loads and invocation methods, but to depend on a single vendor or single implementation to provide that guarantee is foolhardy. Rather, good enterprise architects count on resilience of infrastructure by having redundant, load-balanced, and alternate runtime engines, and by using distributed, heterogeneous network intermediaries instead of single-vendor, proprietary, single point of failure ESBs. Organizations should also implement distributed caching, offloaded XML parsing, federated registries with late binding, and network gateways that handle security and policy enforcement away from the Service end-points. Resilience at the infrastructure level is much more doable when you count on high levels of reliability and throughput without counting on one vendor’s implementation to pull all the weight.
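
One way to picture redundant, load-balanced runtime engines is a simple round-robin selector that skips engines currently reported as unhealthy. The Python sketch below is an assumption-laden illustration, not anything prescribed by the article: the RoundRobinBalancer class, the engine names, and the health-check callback are all invented for the example.

```python
import itertools
from typing import Callable, Iterable, Set


class RoundRobinBalancer:
    """Rotate across redundant runtime engines, skipping unhealthy ones."""

    def __init__(self, engines: Iterable[str],
                 is_healthy: Callable[[str], bool]) -> None:
        self._engines = list(engines)
        if not self._engines:
            raise ValueError("at least one runtime engine is required")
        self._cycle = itertools.cycle(self._engines)
        self._is_healthy = is_healthy

    def next_engine(self) -> str:
        # Inspect each engine at most once per call so we never loop forever.
        for _ in range(len(self._engines)):
            candidate = next(self._cycle)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy runtime engine available")


# Hypothetical engines from different vendors; "engine-b" is simulated as down.
down: Set[str] = {"engine-b"}
balancer = RoundRobinBalancer(
    ["engine-a", "engine-b", "engine-c"],
    is_healthy=lambda name: name not in down,
)
picks = [balancer.next_engine() for _ in range(4)]
```

With one engine out, requests alternate between the remaining two. If the selector itself ran as a single instance it would become the very single point of failure the article warns about, so in practice this logic would itself need to be distributed or replicated.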

But why stop there? Organizations seeking SOA resilience need to also make sure to have resilient Service policies. This requires not just redundant policy enforcement mechanisms, but also fail-over policy definition points and even redundant, fail-over, and load-balanced Service policies. When you’re using policies at runtime to determine binding to Services, having unexpected outages of Service policy definition availability can cause just as much havoc as if the Service itself was not available.

Similarly, companies need to have resilience at the Service contract and schema level. Having redundant Service implementations makes no sense if they are all sharing a single Service contract file that is in danger of disappearing, especially if it is sitting on an unprotected file server. Protect your metadata by locking it behind a policy-enforced registry, but also make sure to have redundancy, fail-over, and load-balancing to avoid shifting a single point of failure. This also applies to all Service metadata, process metadata, data schema, and semantic mappings that might be necessary to allow for proper functioning of the system.
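
To make the metadata point concrete, here is a small Python sketch of a consumer that looks for a Service contract on several registry replicas. The replica locations, the load_contract helper, and the simulated fetch function are hypothetical. The fall-back to a last-known-good local copy is an extra assumption added for illustration; the article itself asks only for redundancy, fail-over, and load-balancing.

```python
from typing import Callable, Optional, Sequence


def load_contract(replicas: Sequence[str],
                  fetch: Callable[[str], str],
                  last_known_good: Optional[str] = None) -> str:
    """Return the contract from the first replica that responds."""
    for location in replicas:
        try:
            return fetch(location)
        except OSError:
            # Covers connection errors and missing files; try the next copy.
            continue
    if last_known_good is not None:
        # Assumption: a cached copy is acceptable when every replica is down.
        return last_known_good
    raise LookupError("Service contract unavailable from every replica")


# Hypothetical registry replicas holding the same contract.
REPLICAS = [
    "registry-1.example.invalid/contracts/orders.wsdl",
    "registry-2.example.invalid/contracts/orders.wsdl",
]


def simulated_fetch(location: str) -> str:
    """Stand-in for a registry lookup; the first replica is unavailable."""
    if location.startswith("registry-1"):
        raise FileNotFoundError(location)
    return "<definitions name='OrderService'/>"


def unavailable(location: str) -> str:
    """Stand-in lookup in which every replica is unavailable."""
    raise ConnectionError(location)


contract = load_contract(REPLICAS, simulated_fetch)
cached = load_contract(REPLICAS, unavailable, last_known_good="<cached/>")
```

The first call succeeds from the second replica, and the second call returns the cached copy because no replica responds. Whether a stale contract is safe to use depends on how often the contract changes, which is a design decision this sketch does not settle.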


*The ZapThink Take*

Yet, all this doesn’t matter if the most important part of enterprise architecture, namely the architect, is not resilient him/herself. Are you the only EA in your organization who gets SOA? Even worse, are you the only EA in your organization? What happens if your job changes, or you get laid off, or the organization otherwise changes its feelings on EA and/or SOA? Will that kill the whole SOA project? What about budgets and funding? Are you operating your SOA projects on the edge, just awaiting a single nudge to push them into project oblivion? If so, you need architectural and organizational resilience. Make sure you have a broad base of support (redundancy). Distribute the workload and responsibility for architectural activities and make sure that there is a team of architects, not a lone crusader (failover and clustering). Give the rest of the organization visibility into the benefits of your activities, and make sure you provide closed-loop interaction on how specific EA tasks result in specific business benefits, preferably iteratively, on a short time schedule, and frequently.

Agility and flexibility are not enough to guarantee SOA success. In fact, the real thrust of what ZapThink has been discussing on SOA for the past eight plus years has been on agile, resilient enterprise architecture. If some of the so-called benefits of SOA were to disappear (namely, standards-based integration), but we remain with agile, resilient EA, we have achieved the main objective of SOA. Enabling the business to operate in a continuously changing, heterogeneous environment without breaking, incurring significant cost, or introducing high latency requires enterprise architects to think, act, and plan for resilience as well as agility.






  Copyright © 2008 ZapThink, LLC.
