That's a nice turnabout on Ranadive's "The Power of Now" phrase.
While I take issue with the gratuitous use of "BPM" in this posting, async vs. sync is good to keep in mind when designing the technical details of a system. I agree with Vinoski that synchronous interaction is too often the default approach (e.g. web services). Store-and-forward might be a better default. But let's not swing too far the other way by labelling it "the worst of all the options available." A robust, fault-tolerant synchronous interaction is sometimes the right approach.

-Rob

--- In [email protected], Gervas Douglas <gervas.doug...@...> wrote:
>
> <<Steve Vinoski <http://steve.vinoski.net/blog/> nominates RPC as an
> historically bad idea
> <http://qconlondon.com/london-2009/presentation/RPC+and+its+Offspring%3A+Convenient%2C+Yet+Fundamentally+Flawed>,
> yet the synchronous request-reply message pattern is undeniably the most
> common pattern out there in our client-server world. Steve puts this
> down to "convenience", but I actually think it goes deeper than that,
> because RPC is actually not convenient - it causes an awful lot of
> problems.
>
> In addition to the usual problems cited by Steve, RPC makes heavy
> demands on scalability. Consider the SLA requirements for an RPC service
> provider. An RPC request must be answered within a reasonable time
> interval - typically a few tens of seconds at most. But what happens
> when the service is under heavy load? What strategies can we use to
> ensure a reasonable level of service availability?
>
> 1. Keep adding capacity to the service so as to maintain the required
>    responsiveness... damn the torpedoes and the budget!
> 2. The client times out, leaving the request in an indeterminate
>    state. In the worst case, the service may continue working only to
>    return and find the client has given up. Under continued assault,
>    the availability of both the client and the server continues to
>    degrade.
> 3. The service stops accepting requests beyond a given threshold.
>    Clients which have submitted a request are responded to within the
>    SLA. Later clients are out of luck until the traffic drops back to
>    manageable levels. They will need to resubmit later (if it is
>    still relevant).
> 4. The client submits the request to a proxy (such as a message queue
>    <http://www.soabloke.com/2008/09/20/the-architectural-role-of-messaging/>)
>    and then carries on with other work. The service responds when it
>    can, and hopefully the response is still relevant at that point.
>
> Of all these coping strategies, Option 2 seems to be the most
> common, even when in many cases one of the other strategies would be
> more efficient or cost-effective. Option 1 might be the preferred
> option given unlimited funds (and ignoring the fact that it is often
> technically infeasible). In the real world, Option 2 more often
> becomes the default.
>
> The best choice depends on what the service consumer represents and on
> the cost of the side-effects when the service fails to meet its SLA.
>
> When the client is a human - say, ordering something at our web site:
>
> * Option 2 means that we get a pissed-off user. That may represent a
>   high, medium or low cost to the organization
>   <http://www.soabloke.com/2008/01/13/cost-vs-benefit-occams-razor-for-the-enterprise-architect/>,
>   depending on the value of that user. In addition there is the cost
>   of indeterminate requests. What if a request was executed after
>   the client timed out? There may be a cost of cleaning up or
>   reversing those requests.
> * Option 3 means that we also get a pissed-off user - with the
>   associated costs. We may lose a lot of potential customers who
>   visit us during the "outage". On the positive side, we minimise
>   the risk/cost of indeterminate outcomes.
> * Option 4 is often acceptable to users - they know we have received
>   their request and are happy to wait for a notification in the
>   future.
>   But there are some situations where immediate gratification is
>   paramount.
>
> On the other hand, if the client is a system participating in a
> long-running BPM flow, then we have a different cost/benefit equation.
>
> * For Option 2, we don't have a "pissed-off" user. But the
>   transaction times out into an "error bucket" and is left in an
>   indeterminate state. We must spend time and effort (usually costly
>   human effort) to determine where that request got to and to
>   remediate that particular process. This can be very costly.
> * Option 3 once again has no user impact, and we minimise the risk
>   of indeterminate requests. But what happens to the halted
>   processes? Either they error out and must be restarted - which is
>   expensive - or they must be queued up in some way, in which case
>   Option 3 becomes equivalent to Option 4.
> * In the BPM scenario, Option 4 represents the smoothest path.
>   Requests are queued up and acted upon when the service can get to
>   them. All we need is patience, and the process will eventually
>   complete without the need for unusual process rollbacks or error
>   handling. If the queue is persistent then we can even handle a
>   complete outage and restoration of the service.
>
> So if I am a service designer planning to handle service capacity
> constraints, for human clients I would probably choose (in order)
> Options 3 and 4, and consider the costs
> <http://www.soabloke.com/2008/01/13/cost-vs-benefit-occams-razor-for-the-enterprise-architect/>
> of Option 2. For BPM processes, where clients are "machines", I would
> prefer Option 4 every time. Why make work for myself handling timeouts?
>
> One problem I see so often is that solution designers go for Option 2
> by default - the worst of all the options available to them.>>
>
> You can read this blog at:
> http://www.soabloke.com/2009/05/05/the-power-of-later/
>
> Gervas
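The contrast between Option 2 (client timeout, indeterminate state) and Option 4 (queue proxy, carry on with other work) can be sketched in a few lines of code. This is a minimal, hypothetical illustration, not anything from the blog - `slow_service` and all the other names are invented, and `queue.Queue` stands in for a real (persistent) message queue:

```python
# Hypothetical sketch of Options 2 and 4 from the post. `slow_service`
# stands in for any service under load; `queue.Queue` stands in for a
# real message-queue proxy (a production queue would be persistent).
import queue
import threading
import time

def slow_service(request):
    """A service that takes longer than the client is willing to wait."""
    time.sleep(0.2)
    return f"done:{request}"

# --- Option 2: synchronous call with a client-side timeout ------------
def call_with_timeout(request, timeout):
    result = {}
    worker = threading.Thread(
        target=lambda: result.setdefault("value", slow_service(request)))
    worker.start()
    worker.join(timeout)
    if "value" not in result:
        # The client gives up, but the service may still complete the
        # work later -- the request is now in an indeterminate state.
        return None
    return result["value"]

# --- Option 4: submit to a queue proxy and move on --------------------
requests = queue.Queue()          # stand-in for a persistent message queue
responses = {}

def service_worker():
    while True:
        request = requests.get()
        if request is None:       # shutdown sentinel
            break
        responses[request] = slow_service(request)
        requests.task_done()

if __name__ == "__main__":
    # Option 2: the client times out and learns nothing.
    print(call_with_timeout("order-1", timeout=0.05))   # None

    # Option 4: the client enqueues and carries on; the service
    # responds when it can get to it.
    threading.Thread(target=service_worker, daemon=True).start()
    requests.put("order-2")
    requests.join()               # only so we can show the result here
    print(responses["order-2"])   # done:order-2
```

In the Option 2 branch the work may still finish after the client has gone, which is exactly the "indeterminate state" cost the post describes; in the Option 4 branch nothing is lost under load, requests just wait their turn.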
