I think that monitoring running services is like monitoring web applications. You need a tool that parses the log files to detect potential errors that might cause incidents, a tool acting as a probe that constantly checks service availability by invoking one of the service's operations, or both.
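The "probe" approach can be sketched in a few lines. This is a minimal illustration, assuming the service exposes some cheap operation that can be called repeatedly. `probe`, `ProbeResult` and the 2-second threshold are hypothetical names and values, not part of any particular monitoring product. The probe reports a slow answer separately from no answer, because a service that works but is too slow is one of the hard cases discussed below.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    status: str                  # "UP", "SLOW" or "DOWN"
    latency_s: float             # wall-clock duration of the call, in seconds
    error: Optional[str] = None  # repr of the exception when the call failed


def probe(invoke: Callable[[], object], slow_threshold_s: float = 2.0) -> ProbeResult:
    """Invoke one lightweight service operation and classify the outcome.

    `invoke` is a hypothetical zero-argument callable wrapping the real call,
    e.g. a "ping" operation on the service provider.
    """
    start = time.monotonic()
    try:
        invoke()
    except Exception as exc:  # any failure to answer is treated as DOWN
        return ProbeResult("DOWN", time.monotonic() - start, repr(exc))
    latency = time.monotonic() - start
    # A service that answers, but too slowly, is reported separately from one that is down.
    return ProbeResult("SLOW" if latency > slow_threshold_s else "UP", latency)


if __name__ == "__main__":
    def healthy():
        return "pong"

    def broken():
        raise ConnectionError("connection refused")

    for name, call in [("healthy", healthy), ("broken", broken)]:
        print(name, probe(call).status)
```

A scheduler would call `probe` periodically for each service and raise an alert on DOWN or on repeated SLOW results. For an HTTP endpoint, `invoke` could be something like `lambda: urllib.request.urlopen(url, timeout=5).read()`, where `url` points at whatever lightweight operation the service actually offers.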
But in SOA, the complexity often lies not in monitoring the services themselves but in everything between the service consumer and the service provider: mediation systems (ESB, asynchronous middleware), registries, and hardware devices (load balancers, fail-over). When an application invokes an external service, we might know exactly which logical service is used. However, the consumer application is largely ignorant of the physical path its message or request follows. That is a consequence of loose coupling in large environments. In these circumstances, correlating an incident on a user-facing application with an undetected problem in the plumbing is a challenge. The complexity grows with the number of intermediaries in the chain.

In one of my previous jobs, we had to manage 600+ services from a variety of back-ends and technologies. There was one 24x7 on-call operational team dealing exclusively with middleware and services. This team used, and kept up to date, a registry/repository documenting the dependencies between service consumers, service providers, installed service versions and deployment environments. We also built an end-to-end logging system to track the physical path each message or request followed.

You can also think of SOA operational management as something proactive. For example: we are about to stop one physical machine for planned maintenance; what is the impact?

I should say that discovering the root cause of an incident is difficult, but the most difficult situations are:
- when only a fraction of the requests or messages are lost or unhandled
- when the service is working but too slow
- when the service dynamically invokes other services

In these situations, you really have to analyse the full chain and correlate information from the different systems involved to see exactly where the problem is located. A painful job!
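I can't describe the internals of the end-to-end logging system here. The basic idea behind this kind of correlation can still be sketched, assuming every intermediary logs the same correlation ID (set once by the consumer and passed along unchanged) together with a timestamp. `HopRecord`, `reconstruct_paths`, `diagnose` and the component names are hypothetical. Grouping the records by ID rebuilds the physical path of each message. The last hop reached shows where a lost message stopped, and the largest gap between consecutive hops points at the slow segment.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class HopRecord(NamedTuple):
    correlation_id: str  # ID set by the consumer and passed along unchanged by every hop
    component: str       # hypothetical component name: consumer, ESB node, queue, provider...
    timestamp: float     # seconds; assumes the hosts' clocks are reasonably synchronized


def reconstruct_paths(records: List[HopRecord]) -> Dict[str, List[HopRecord]]:
    """Group log records by correlation ID and order each group by time."""
    paths: Dict[str, List[HopRecord]] = defaultdict(list)
    for rec in records:
        paths[rec.correlation_id].append(rec)
    for hops in paths.values():
        hops.sort(key=lambda r: r.timestamp)
    return dict(paths)


def diagnose(path: List[HopRecord], final_component: str = "provider") -> str:
    """Say where a message stopped, or which hop took the longest if it arrived."""
    if path[-1].component != final_component:
        return f"lost after {path[-1].component}"
    if len(path) < 2:
        return "no intermediate hops recorded"
    # Time spent between consecutive hops, labelled "from -> to".
    gaps = [(b.timestamp - a.timestamp, f"{a.component} -> {b.component}")
            for a, b in zip(path, path[1:])]
    delay, hop = max(gaps)
    return f"slowest hop {hop} ({delay:.2f}s)"


if __name__ == "__main__":
    log = [
        HopRecord("m1", "consumer", 0.0), HopRecord("m1", "esb", 0.1),
        HopRecord("m1", "provider", 1.6),
        HopRecord("m2", "consumer", 0.0), HopRecord("m2", "esb", 0.2),
    ]
    for cid, path in sorted(reconstruct_paths(log).items()):
        print(cid, diagnose(path))
```

In a real system, a message that is still in flight would also look "lost", so a diagnosis like this should only be applied to records older than some timeout.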
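The planned-maintenance question ("we stop one physical machine, what is the impact?") is essentially a graph walk over the kind of registry described above. Below is a minimal sketch, assuming the registry can export two mappings: which hosts run each service, and which providers each consumer calls. The data and the `impact_of_stopping` function are hypothetical. A service counts as down only when all of its instances are on the stopped hosts, which reflects load-balanced or fail-over deployments. Every consumer that depends on a down service, directly or indirectly, is reported as affected.

```python
from collections import deque

# Hypothetical registry content; names are illustrative only.
DEPLOYMENTS = {
    "CustomerService": {"host-a", "host-b"},  # load-balanced pair
    "PricingService": {"host-a"},
    "OrderService": {"host-c"},
}
DEPENDENCIES = {  # consumer -> providers it calls
    "OrderService": {"CustomerService", "PricingService"},
    "WebShop": {"OrderService"},
    "CallCenterApp": {"CustomerService"},
}


def impact_of_stopping(hosts, deployments, dependencies):
    """Return (services left with no running instance, consumers affected transitively)."""
    stopped = set(hosts)
    down = {svc for svc, where in deployments.items() if where and where <= stopped}
    # Reverse the edges so we can walk from a provider to everyone who calls it.
    callers = {}
    for consumer, providers in dependencies.items():
        for provider in providers:
            callers.setdefault(provider, set()).add(consumer)
    affected = set()
    queue = deque(down)
    while queue:  # breadth-first walk up the consumer chain
        svc = queue.popleft()
        for consumer in callers.get(svc, ()):
            if consumer not in affected and consumer not in down:
                affected.add(consumer)
                queue.append(consumer)
    return down, affected


if __name__ == "__main__":
    down, affected = impact_of_stopping({"host-a"}, DEPLOYMENTS, DEPENDENCIES)
    print(sorted(down), sorted(affected))
```

With this data, stopping `host-a` takes down only `PricingService`, because `CustomerService` still runs on `host-b`. `OrderService` and `WebShop` are affected through it. Keeping such mappings current is the hard part, which is why a dedicated team maintaining the registry matters.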
<http://blogs.ittoolbox.com/eai/applications/archives/troubleshooting-composite-applications-8759>

Robin

--- In [email protected], "Erik van Gilder" <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Have you had much success including your existing 7x24 operations
> staff in managing your "SOA" environment, and, if so, to what do you
> attribute your success? I'm working in a large centralized IT
> operations where the 7x24 monitoring and management of the computing
> environment is the responsibility of an enterprise command center. The
> command center has deep roots in a mainframe operations and continues
> to struggle with the e-commerce infrastructure. I'd like to see the
> operations staff help monitor and manage the environment otherwise the
> developers will bear a heavy burden. Any thoughts?
>
> In our case, the toolset includes WebMethods, WebSphere and Tivoli,
> but I believe the problem to be tool-independent.
>
> Thanks,
> Erik
