WSJ Management
Managing The Bottom Line
Service-level management is essential to deploying business-critical Web services systems
Apr. 5, 2004 12:00 AM
Web services are like your local auto repair shop. You don't want to do business with them until you have a clear idea of the level of service you can expect. Despite the many advantages of these new, standards-based systems, they will not become core business assets without capabilities for gauging and controlling their quality of service (QoS) attributes.
Consider the example of a manufacturer that uses internal Web services for order fulfillment as well as extranet Web services to automatically replenish its inventory. To make informed decisions about order fulfillment, the manufacturer needs current and comprehensive information on the services and systems consumed by its inventory Web service. Is the warehouse management service running when inventory updates are needed? How long do suppliers' services take to respond with a promise date? How should service problems be addressed to minimize costs? If the manufacturer has Web services that are consumed by customers, it might want to prioritize orders by the customers' value to the business. Without the ability to see and control quality-of-service attributes across internal and external services, the manufacturer would be putting its business at significant risk.
This is where service-level management comes in. When implemented properly, service-level management, or SLM, provides a means of understanding the business impact service levels have on revenue and productivity, while also facilitating the diagnosis and mitigation of service problems within business processes. It's a continuous and closed-loop process of measuring, reporting, and improving the quality of service of systems and applications. Effective SLM includes a methodology for establishing and maintaining acceptable service levels to meet business objectives, streamline processes, and, of course, minimize costs.
Traditionally, SLM has focused on physical or system-level behavior alone. Web services offer the opportunity to expand that role.
Traditional service-level management solutions focus on managing devices such as network routers, server equipment, and system software. Because these solutions are based on physical data and messages, they cannot be fully "aware" of Web services, to say nothing of logical- or application-level behavior. Moreover, they provide a limited perspective because they cannot see the business implications (such as the dollar value of orders) of the messages flying across the system. As a result, they are unable to proactively resolve quality-of-service problems or alter system behavior to meet underlying business objectives. (More on this point later.)
Abstracting the Management Layer
Whether you build your own SLM system or purchase one, you'll want to give some thought to the underlying mechanisms used to provide management capabilities. If you hard-code the logic into the Web services themselves, you'll be challenged to maintain your management capabilities as your Web services evolve. Having your Web services and clients embed proprietary headers into SOAP requests and responses would limit your management capabilities to those Web services over which you have direct control. The best approach, therefore, is to abstract the management solution by using Web service intermediaries. With this approach, all the management policies reside in the intermediaries, enabling your Web services environment to grow naturally and without added development-time constraints.
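To make the intermediary idea concrete, here is a minimal sketch in Python. It assumes a service can be modeled as a plain function; the names `ManagementIntermediary` and `warehouse_service` are hypothetical and do not come from any product. The point is only that the measurements are collected outside the service itself.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ManagementIntermediary:
    """Hypothetical intermediary placed between a client and a service.

    The service carries no management logic; the intermediary records
    the response time and outcome of every call it forwards.
    """
    service: Callable[[str], str]
    response_times: List[float] = field(default_factory=list)
    failures: int = 0

    def invoke(self, request: str) -> str:
        start = time.perf_counter()
        try:
            return self.service(request)
        except Exception:
            self.failures += 1
            raise  # the caller still sees the original error
        finally:
            # Runs for successes and failures alike.
            self.response_times.append(time.perf_counter() - start)

    @property
    def calls(self) -> int:
        return len(self.response_times)

    def availability(self) -> float:
        """Fraction of forwarded calls that succeeded (1.0 before any call)."""
        if not self.response_times:
            return 1.0
        return 1.0 - self.failures / len(self.response_times)


def warehouse_service(request: str) -> str:
    # Stand-in for a real Web service; rejects an empty request.
    if not request:
        raise ValueError("empty request")
    return f"ack:{request}"
```

A client would call `ManagementIntermediary(service=warehouse_service).invoke("order-1")` instead of calling the service directly. The service can then change without touching the measurement code.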
Start Managing Service Levels Early
It's important to incorporate some service-level management into your Web services systems from the outset. Pilots and proofs of concept, which help to form your overall Web services strategy, call for careful evaluation and constant fine-tuning. With SLM, you and your team will benefit from a deeper understanding of system performance. What's more, you'll need SLM-generated data to quantify your successes in order to justify additional investments in Web services.
SLM can help you understand specific performance and availability issues of your Web service pilots. By recording and analyzing the service levels of your Web services experiments you'll have the knowledge and experience you need to ready your service-oriented architecture for prime time. For example, before exposing a new order-processing application via Web services interfaces, you'll first need to understand the impact the Web services will have on internal systems and the service levels you can expect to provide to users.
Rules of Thumb for Pilots
Keep the following in mind when planning and implementing SLM for Web service pilots:
- Be a discerning consumer of Web services: Service-level management will help you to distinguish third-party Web services by revealing QoS information such as performance and availability. You'll want to understand the QoS attributes of the Web services you consume over a period of time before incorporating them into business-critical applications.
- Differentiate the Web services you provide: You can actively manage your service levels to provide premium service to discrete user populations. For example, you might want to allocate your best-performing system resources to your highest-value customers. You'll want to practice such things with pilots before taking your Web services into production.
- Be aware of service quality: Pilots and proofs of concept are excellent opportunities to set informal quality-of-service goals for applications, and then track against those goals. For example, you can see if your application can provide 99% availability from Monday to Friday, 8 a.m. to 5 p.m. As you compile historical service-level data from your early experiences, you'll be better equipped to plan for growth and to respond to your user community.
- Create formal and informal service-level agreements: Service-level agreements, or SLAs, are the instruments for defining and enforcing SLM goals. An SLA sets the expectations of service providers and recipients alike. Formal SLAs could be contractual commitments that include noncompliance clauses, or they could be formal arrangements between internal departments that impact funds-exchange and charge-back policies. Informal SLAs represent a collection of internal service-level objectives based on internal goals designed to provide operations and IT departments with finer-grained control of their environments, beyond formal or legal commitments. An effective SLM solution will help you to ensure that your expectations are set on achievable goals, prevent service levels from becoming noncompliant with defined objectives, manage compliance failures according to the magnitude of the penalties, and minimize the cost of any noncompliance that occurs.
- Set objectives that match operational conditions: Each SLA has a set of service-level objectives, or SLOs, such as average response time or availability over a period of time. Take time into account when defining objectives. Factoring in hours of operation, peak hours of operation, scheduled downtimes, time zone adjustments, and so on, is key to ensuring the accuracy of your QoS data. This knowledge will also help you define trends and predict problems.
- Set objectives that match business conditions: As a service provider, you're likely to use the same service or application to support multiple clients. Because you'll probably commit to different service levels for those clients, you'll need to manage multiple SLOs for the same service. The more accurately your evaluation and mitigation strategies are matched to your business objectives, the more easily you'll be able to gauge and affect your service's impact on your bottom line. For example, any plan of action to mitigate problems should first address the most stringent SLAs. Even for informal SLAs that are tied to more generalized QoS goals, and not to specific customers, mitigation strategies should focus on business issues such as prioritizing orders of higher dollar value.
- Continually evaluate service-level objectives: For greater understanding of your system's impact on your business, give consideration to the criteria for evaluating your objectives. For example, you might want to evaluate performance after the first 1,000 messages (to build a critical mass of data) and then every 30 seconds thereafter. Or you might want to track each order over $1,000,000. The more fine-grained the evaluation criteria, the more you'll learn from your experiments. Be sure to set warning thresholds in order to trigger alerts in advance of compliance failures.
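The evaluation criteria described above (wait for a critical mass of data, then check against a warning threshold ahead of the compliance limit) can be sketched as follows. The class name, field names, and status strings are illustrative assumptions. Scheduling the check every 30 seconds is left to whatever timer or scheduler you already use.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ResponseTimeObjective:
    """Hypothetical SLO: average response time must stay under a limit."""
    limit_seconds: float      # compliance failure at or above this average
    warning_seconds: float    # raise an alert once the average reaches this
    min_samples: int = 1000   # wait for a critical mass of data first

    def evaluate(self, response_times: Sequence[float]) -> str:
        if len(response_times) < self.min_samples:
            return "insufficient-data"
        average = sum(response_times) / len(response_times)
        if average >= self.limit_seconds:
            return "violation"
        if average >= self.warning_seconds:
            return "warning"
        return "ok"
```

Setting `warning_seconds` below `limit_seconds` is what gives operations time to react before the objective is actually missed.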
To Unlock Business Insight, Content Is Key
XML provides standard syntax for representing information. This syntax makes it possible for an SLM solution to utilize the message content to play an active role in improving the system's quality of service. Tags, attributes, and element structure, as well as data from external sources such as LDAP, provide valuable contextual information that can augment use of the message content. By leveraging the content (e.g., order value) and context (e.g., customer priority) of the messages that are exchanged between Web services, the management solution can provide valuable business insight into service-based processes. Combining that business knowledge with system-level information, you can define service-level objectives and thresholds for any relevant set of business criteria.
Here are a few examples. By leveraging message content and context you can:
- Monitor the performance of an order-fulfillment process for all Platinum-level distributors
- Notify the account managers for all customers that are affected by a service failure
- Monitor response times by region
- Define service-level objectives by user roles, such as internal product managers, partner staff, high-value customers, etc.
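As a rough illustration of monitoring by message content, the sketch below parses a hypothetical order message and groups response times by a business attribute such as region or customer tier. The XML element and attribute names (`order`, `customerTier`, `region`, `total`) are assumptions made for the example, not part of any standard schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from statistics import mean

# Hypothetical order message; element and attribute names are assumptions.
SAMPLE = """<order customerTier="Platinum" region="EMEA">
  <total currency="USD">1250000.00</total>
</order>"""


def classify(message_xml: str) -> dict:
    """Pull business attributes out of the message content."""
    root = ET.fromstring(message_xml)
    return {
        "tier": root.get("customerTier", "Standard"),
        "region": root.get("region", "unknown"),
        "value": float(root.findtext("total", default="0")),
    }


class ContentAwareMonitor:
    """Groups observed response times by one business attribute."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.samples = defaultdict(list)

    def record(self, message_xml: str, response_seconds: float) -> None:
        attrs = classify(message_xml)
        self.samples[attrs[self.key]].append(response_seconds)

    def average(self, group: str) -> float:
        # Raises statistics.StatisticsError if the group has no samples.
        return mean(self.samples[group])
```

A monitor created with `ContentAwareMonitor("region")` tracks response times by region, and one created with `ContentAwareMonitor("tier")` tracks them by customer level. Either can feed a service-level objective defined for that group.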
Enforcing Service Quality According to Business Priorities
To optimize Web service systems and minimize risk, it's important to continually and proactively prioritize system usage according to business goals. An effective management solution not only measures, but actively enforces, end-to-end quality objectives that align with organizational goals. You can use the rich insight provided by service-based systems to optimize performance and availability in a manner that achieves the greatest business value. This ability becomes even more powerful with the opportunity to mitigate problems before they can impact business. For instance, if the performance of an order-fulfillment service starts degrading, you might prioritize customers with whom you have the most stringent SLAs. Or when a service is restored you might first process the orders of Platinum customers.
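One way to picture this kind of prioritization is a backlog that releases queued orders by customer tier first and order value second. The sketch below is only an illustration: the tier names other than Platinum, the ranking table, and the class name are assumptions.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical ranking: a lower number means a more stringent SLA.
TIER_RANK = {"Platinum": 0, "Gold": 1, "Standard": 2}


class PriorityBacklog:
    """Orders held while a service is degraded or unavailable."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, float, int, str]] = []
        self._arrival = itertools.count()  # tie-breaker preserves arrival order

    def add(self, order_id: str, tier: str, value: float) -> None:
        # Unknown tiers rank after every known tier.
        rank = TIER_RANK.get(tier, len(TIER_RANK))
        # Negate the value so higher-value orders come first within a tier.
        heapq.heappush(self._heap, (rank, -value, next(self._arrival), order_id))

    def drain(self) -> List[str]:
        """Return order IDs in the sequence they should be processed."""
        ordered = []
        while self._heap:
            ordered.append(heapq.heappop(self._heap)[3])
        return ordered
```

When the service is restored, `drain()` hands back Platinum orders before any others, and within each tier the higher-value orders come first.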
Managing Service Levels Across Complex, Distributed Processes
As you gain experience with Web services, you'll migrate to more mature environments, eventually integrating Web services that are distributed and federated across heterogeneous platforms. This brings forth business processes that span multiple divisions within the enterprise or cross corporate boundaries to include partners and customers. From end to end, the process will involve multiple, independently controlled Web services that might rely on other services and processes to complete a task. Typically, each Web service will interact with numerous other Web services, playing a provider role in some interactions and a consumer role in others.
Such complex architectures will also facilitate loosely coupled exchanges between multiple services. This will enable you to implement processes that can adjust to business conditions in real time by dynamically selecting the appropriate Web services. While this flexibility increases the effectiveness of the process, it also introduces a level of unpredictability into the system.
Consider the example of a company that would like to streamline its manufacturing and fulfillment processes across multiple factories and warehouses, as well as its partners' facilities. Depending on the available-to-promise inventory levels, the company will either fulfill the order from one of its warehouses or outsource the manufacturing to one of its established partners. This fulfillment process involves multiple Web services, both inside and outside the corporate enterprise, and must be very flexible. Managing such a process calls for:
- Gathering service-level statistics from distributed services, including inventory, manufacturing and warehouse services
- Monitoring services over which the manufacturer has no control
- Making real-time QoS decisions by monitoring each instance of the process (for instance, immediately notifying IT if the step that explodes the bill of materials goes down)
- Rapidly locating problems within a process and determining the impact on business
- Preemptively addressing issues before compliance failures occur
Combined, these capabilities provide the ability to successfully achieve business goals in a complex yet flexible system.
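As a simple sketch of gathering statistics across a distributed process, the functions below combine per-step timings reported by separate services into end-to-end times and point to the slowest step. The tuple layout and step names are hypothetical, and the end-to-end figure assumes the steps of a process instance run one after another.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (process_instance_id, step_name, elapsed_seconds), as each service's
# monitoring point might report it. The layout is an assumption.
Observation = Tuple[str, str, float]


def end_to_end_times(observations: List[Observation]) -> Dict[str, float]:
    """Total elapsed time per process instance (sequential steps assumed)."""
    totals: Dict[str, float] = defaultdict(float)
    for instance_id, _step, elapsed in observations:
        totals[instance_id] += elapsed
    return dict(totals)


def slowest_step(observations: List[Observation]) -> str:
    """Name of the step with the highest average elapsed time."""
    per_step: Dict[str, List[float]] = defaultdict(list)
    for _instance, step, elapsed in observations:
        per_step[step].append(elapsed)
    return max(per_step, key=lambda s: sum(per_step[s]) / len(per_step[s]))
```

Comparing the end-to-end totals against a process-level objective shows whether an instance is at risk, and `slowest_step` shows where to look first.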
Rules of Thumb for Complex Systems
Some things to keep in mind about managing service levels across distributed, federated environments:
- Apply SLM to business processes: You'll need to manage service levels end-to-end across dynamic, multistep processes. Ideally, your service-level manager will be flexible enough to observe these processes without having to control them. This becomes especially important if an enterprise has multiple processes to track or there are numerous process engines within the organization.
- Determine the impact on business: By bridging system-level information (e.g., server availability) and application-level data (e.g., average orders per week), it's possible to get a clear understanding of the business impact service levels have on revenue and productivity. For example, if a service that calculates sales tax goes down, the IT group would want to know which business processes are affected, which customers are trying to access the service, and if there are any potential penalties for SLA noncompliance.
- Identify service problems early: IT and operations spend an inordinate amount of time trying to pinpoint bottlenecks in distributed, multistep processes. To ease this burden, they need a management solution that enables them to easily identify which operations or services are problematic. Better yet, given the opportunity to understand trends and predict failures with levels of confidence, they can avoid service problems altogether.
- Span multiple technical architectures: Your SLM solution must address the key architectural issues within a complex scenario of distributed and federated processes. It must aggregate measurements from distributed system components, including Web services hosted by other entities. It must scale as the logical network grows. It must be cross-platform to accommodate the heterogeneous nature of Web services and must collect data from a variety of sources (including WSDL-less Web services, non-SOAP XML, HTML, servlets, JSP, HTTP, and JMS). And it should leverage the system-level information that is collected by traditional management products such as IBM Tivoli or HP OpenView.
- Close the loop with historical information: Analyzing historical data provides added insight, such as an improved ability to identify bottlenecks, fine-tune service-level objectives, and anticipate changes in system performance. For example, by knowing the average order value during peak business hours you can better understand the revenue loss of a system failure. For more mature systems, this can also help identify new revenue opportunities - such as providing value-added services targeted at the most active users of a particular service.
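To show how system-level and application-level data can be bridged, the sketch below answers two of the questions raised above: which business processes depend on a failed service, and roughly how much revenue is at risk during the outage. The process names and figures are invented for illustration. The estimate makes the simplifying assumption that every order expected during the outage is delayed or lost.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ProcessProfile:
    """Hypothetical historical figures for one business process."""
    services: Set[str]           # services the process depends on
    orders_per_hour: float       # historical average for the period
    average_order_value: float   # historical average, in dollars


def affected_processes(failed_service: str,
                       processes: Dict[str, ProcessProfile]) -> List[str]:
    """Names of the processes that depend on the failed service."""
    return sorted(name for name, profile in processes.items()
                  if failed_service in profile.services)


def revenue_at_risk(failed_service: str, outage_hours: float,
                    processes: Dict[str, ProcessProfile]) -> float:
    """Rough estimate: expected orders during the outage times their value."""
    return sum(p.orders_per_hour * p.average_order_value * outage_hours
               for p in processes.values()
               if failed_service in p.services)
```

For example, with one process averaging 40 orders per hour at $250 and another averaging 10 orders per hour at $300, a half-hour outage of a sales-tax service they both use puts 40 × 250 × 0.5 + 10 × 300 × 0.5 = $6,500 at risk.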
Service-Level Management Is Taking Care of Business
In recent months, we've seen a great deal of momentum with Web services, as a growing number of companies are taking their new applications into production. What's more, we've seen Web service deployments from a wide variety of organizations: financial institutions, manufacturers, retailers, government agencies, telecommunications providers. The list goes on. One constant in all this change is that the companies deploying business-driving Web services have all implemented solutions that enable them to set service-level expectations, manage to those expectations, and resolve any issues that arise. To ensure the ongoing success and continual expansion of their Web service initiatives, the companies that are serious about Web services are using SLM to manage their bottom lines.
About Fred Carter
Fred Carter is the chief run-time architect for AmberPoint, a provider of Web services management software. Prior to AmberPoint, Fred was the architect and technical lead for EAI products at Forte, a role he continued at Sun Microsystems. Prior to Forte, he held several technical leadership positions at Oracle, where he designed distributed object services for interactive TV, online services, and content management.