Distributed Parallel Computing with Web Services
A pivotal role on the back end
By: Labro Dimitriou
Feb. 22, 2005 12:00 AM
Web services technology has become the ubiquitous connectivity fabric among diverse business domains and technical camps. At the same time, distributed parallel computing is becoming the de facto architecture for managing the performance of computationally intensive, long-running programs.
So, is it counterintuitive to consider Web services when pursuing performance improvement of compute-intense, long-running applications? It may seem that way but, most amazingly, Web services play a critical role in not one but two areas of High Performance Computing (HPC) and distributed parallel computing: the infrastructure that connects the grid, and the applications that run on it.
This article looks at how the Web services scenario is unfolding in the distributed parallel computing space. First it discusses the developing infrastructure standards, followed by some definitions, and then delves into the real grid opportunity: to improve application response time while workload balancing. I'll conclude with a recent project implementation.
Grid Provides the Infrastructure for Parallel Distributed Computing
The most common approach to enabling an application for a grid environment is to decompose the application into smaller, independent subtasks or job "chunks," submit the jobs to the grid (schedule and deploy), and hope for the best. This paradigm provides a good solution for batch cycle applications and is well suited for coarse-grain parallelizable applications.
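The "chunk, submit, and hope for the best" paradigm can be sketched in a few lines. This is a toy model only: a queue stands in for the grid scheduler and a pool of threads stands in for grid nodes; all names (`run_on_grid`, `worker`) are illustrative, not any vendor's API.

```python
# Toy sketch of the chunk-and-submit paradigm: independent job chunks
# go onto a queue, and a pool of workers (standing in for grid nodes)
# drains it. Real grid schedulers add deployment, retries, and QoS.
import queue
import threading

def run_on_grid(jobs, num_workers=4):
    """Submit independent job chunks and collect their results."""
    work = queue.Queue()
    for job in jobs:
        work.put(job)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = work.get_nowait()
            except queue.Empty:
                return
            out = job()               # each chunk is self-contained
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Example: four independent chunks of one larger summation.
chunks = [lambda lo=lo: sum(range(lo, lo + 250)) for lo in range(0, 1000, 250)]
print(sum(run_on_grid(chunks)))  # → 499500, same as the undivided job
```

Because the chunks share no state, the order in which workers finish does not matter; the master simply aggregates whatever comes back.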
But what about standards that provide best practices, support, and tools for distributing more complex applications, sometimes referred to as "fine-grain parallelizable"?
Furthermore, many have argued that such services are orthogonal to what OGSA addresses. Yes, I agree that the specification is about the infrastructure, but only for now. Remember the debate about middleware, EAI, application servers, and the J2EE stack? I foresee the same line of argument regarding distributed parallel computing coming soon to a grid near you. In fact the two technologies, J2EE and grid, have very similar characteristics: services, specifications, best-practice patterns, and supporting technology such as middleware, runtime containers, and desirable qualities of service. And in both cases Web services play a key role. Perhaps the overriding similarity is n-tiered distributed computing.
Parallel Distributed Computing and the Grid
Symmetric multiprocessing (SMP) configurations scale by adding CPUs (typically up to 64). You add more processors until you can't add any more. But what happens when your application still isn't completing fast enough and you need a 65th CPU? The only choices that don't require reprogramming are to buy a system with more powerful CPUs or a system that can support more CPUs. If that doesn't help, the outlook is either chunking and distributing, or bleak.
So the real distinction between distributed parallel processing and parallel processing is access to the data that would have been in shared memory in the parallel configuration. This could require only a minor change, with some data-selection and movement logic added, or it could be a major consideration: moving the data among the processors executing the instances of the application could take more time than the additional CPUs save. Typically, distributed parallel computing involves dozens or hundreds of computers - the grid or compute farm - concurrently running components of an application.
I saved the notion of running an application in parallel for last. In a nutshell, parallel computing exploits concurrency of execution, so no argument here: it is parallel, concurrent computing as opposed to serial computing. There is only one thing missing from the alphabet soup, the main character: the application.
Applications: Making Fast Faster
So what are the options for moving programs from sequential machines to SMP machines?
So the question remains: how do you improve the response time of an application by distributing it to a compute farm? If the application has a high-volume, sub-second-response, transactional characteristic, we are in luck. BEA's WebLogic, with its robust clustering technology, can effectively distribute a highly transactional application across a cluster. But there is a large class of applications - often legacy code of some sort, such as finance quant apps, engineering numerical analysis solvers, life science genome applications, or computational statistics - that must process large amounts of data and are compute-intense. These still need a solution.
The Compute-Intensive Application
Stage I distributed application candidates can be described as "same instructions, small different data" design patterns. Consider the SETI@home search for extraterrestrial intelligence or the search for large Mersenne prime numbers - both arguably at the far right of the compute-intensive spectrum. In both cases the same application is executed again and again on small amounts of different data; hence the name "same instructions, different data." The only requirement for distributing the calculation is a job scheduler that spreads the calculation segments across computers. It sounds like a classic mainframe batch scheduler could do the job. No wonder IBM calls grid computing the next big thing - they have been providing the underlying services for years. From an application programming point of view, the grid is a vast number of transactions and jobs distributed by a scheduler.
These applications have one entry point and one exit point, and their work can be computed in parallel without sequential or data dependencies among the results. An application with these characteristics is referred to as "embarrassingly parallel."
A variant of the embarrassingly parallel application is a job or application that for historical reasons is being run as a sequential string of steps. These can easily be decomposed into smaller jobs and fall under the Stage I distribution umbrella. So all you need to distribute Stage I applications is to identify them, chunk them into parallel executable steps, and get a scheduler that deploys the "chunks" or steps. Several vendors, including Platform and Sun, provide such scheduling services.
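A Stage I decomposition can be sketched concisely. In this hedged example the "same instructions" are a prime-counting kernel, the chunker plays the role of the scheduler, and a thread pool stands in for the compute farm (a process pool or real grid nodes would mirror the farm more closely); all function names are illustrative.

```python
# Sketch of Stage I distribution: chunk an embarrassingly parallel job
# ("same instructions, different data") and fan the chunks out.
from concurrent.futures import ThreadPoolExecutor

def count_primes(lo, hi):
    """The 'same instructions' applied to one chunk of data."""
    def is_prime(n):
        if n < 2:
            return False
        return all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(1 for n in range(lo, hi) if is_prime(n))

def chunked(lo, hi, step):
    """Split [lo, hi) into scheduler-sized chunks."""
    return [(a, min(a + step, hi)) for a in range(lo, hi, step)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = pool.map(lambda c: count_primes(*c), chunked(2, 10_000, 2_500))
    total = sum(partials)
print(total)  # → 1229 primes below 10,000
```

The final aggregation is trivial because no chunk depends on another's result - exactly the property that makes Stage I applications easy wins for a batch-style scheduler.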
While many applications fit the Stage I model, many others have input data and, usually, intermediate-result interdependencies. These are classified as Stage II distribution applications. You will recognize them by the sources of their resource consumption being embedded in loops with complex data-dependency requirements. In other words, you can't just chunk the application and expect it to run faster with the same results.
A few different algorithmic patterns immediately come to mind as Stage II distribution application candidates: serial nondependent, master-slave, binary non-recombining tree, simplex optimization algorithms, and so on.
Consider a simple binary-tree (tournament) algorithm for finding the maximum of a sequence of numbers, as used in a sorting problem. Although it's a simple, commonly used algorithm, it exhibits the structural characteristics of Stage II distribution applications. You can devise a simple Web service that receives two numbers and returns the maximum. A simple control master program could spawn slave/worker services executing the comparisons on a compute farm and return the final result to the parent/root node of the tree. The commercial tools for building such solutions include low-level parallel middleware (e.g., MPI and PVM) and a few higher-level paradigms that provide virtual shared memory or shared-process abstractions, such as JavaSpaces (a technology transfer of Linda spaces), GemFire from GemStone, and GigaSpaces. These offer generic supporting middleware services, with which the programmer must be conversant in order to enable the distribution communications and data transfer.
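The master/slave tree pattern described above can be sketched as follows. Here `max_service` is a hypothetical stand-in for the two-number Web service; each round's comparisons are independent, so the master fans them out to a worker pool until a single value remains at the root.

```python
# Sketch of the binary-tree master/slave pattern: each round pairs up
# values and a "slave" call returns the larger; odd elements get a bye.
from concurrent.futures import ThreadPoolExecutor

def max_service(a, b):
    """Stand-in for a slave Web service returning the max of two numbers."""
    return a if a >= b else b

def tree_max(values):
    pool = ThreadPoolExecutor(max_workers=4)
    while len(values) > 1:
        pairs = [(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
        # one tree level: every comparison in this round runs concurrently
        reduced = list(pool.map(lambda p: max_service(*p), pairs))
        if len(values) % 2:          # odd element advances unchallenged
            reduced.append(values[-1])
        values = reduced
    pool.shutdown()
    return values[0]

print(tree_max([7, 42, 3, 99, 18, 6]))  # → 99
```

For n inputs the tree finishes in about log2(n) rounds instead of n-1 sequential comparisons, which is the whole point of recasting the problem this way.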
A recent entry that approaches such challenges differently, focusing on the application rather than on the infrastructure, is ACCELLERANT from ASPEED Software. It offers the developer a high-level, algorithmic and computational pattern-aware interface that can be inserted into existing or new applications. This approach shields the application developer from having to deal with middleware and distribution expertise, while giving the resultant application the runtime services required to optimally manage execution across all of its instantiations.
An even more challenging variant of Stage II applications is one characterized as "impossible" to parallelize. Examples of this variant are step-wise iterative algorithms, where each step requires the result of the previous one. A simple example is the common summation technique for adding a sequence of numbers: you iterate through the sequence, and at each loop you add the next number to a tally. It turns out that even these Stage II variant algorithms can be recast to be handled like the easier Stage II algorithms; e.g., a binary-tree master-slave algorithm can be applied to the above summation problem. This greatly simplifies the programming effort but obviously requires reverification, since the algorithm has been altered. Other advanced techniques such as genetic algorithms are also available, but their discussion does not belong in Web Services Journal - not until there is a Web services solution for them!
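The recasting of the "impossible" running-sum loop into a binary-tree reduction can be shown directly. This is a hedged sketch: the pairwise adds in each round are independent and could be farmed out exactly like the max comparisons, and the reverification the text calls for amounts to checking that the recast algorithm matches the sequential tally (addition being associative guarantees it for exact arithmetic).

```python
# Sequential summation (each step depends on the previous tally) versus
# the same work recast as a binary-tree reduction, where every pairwise
# add in a round is independent and could run on a separate worker.
def sequential_sum(numbers):
    tally = 0
    for n in numbers:            # step i needs the tally from step i-1
        tally += n
    return tally

def tree_sum(numbers):
    values = list(numbers)
    while len(values) > 1:
        # this whole round is parallelizable: no pair depends on another
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])
        values = paired
    return values[0] if values else 0

data = list(range(1, 101))
# the "reverification" step: the altered algorithm must agree
assert tree_sum(data) == sequential_sum(data) == 5050
```

One caveat worth noting for real numerical codes: floating-point addition is not exactly associative, so the recast algorithm can differ in the low-order bits - another reason the reverification step is not optional.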
Now let's move on to how Web services facilitate grid-enabling a complex, compute-intense application and harnessing the computing power of an HPC center for fast pricing of a bond-option portfolio. Callable-bond portfolio pricing was selected because it represents a class of particularly complex Monte Carlo simulations that yield greater accuracy.
Now is as good a time as ever to shed light on the celebrated Monte Carlo techniques. First, it's not just jargon to confuse the unwary; it's a real scientific tool. Monte Carlo techniques are counterintuitive in the sense that they use probabilities and random numbers to solve some very concrete real-life problems. Buffon's Needle is one of the oldest problems in geometrical probability tackled with Monte Carlo. A needle is thrown on a lined sheet, with the distance between the lines the same as the length of the needle; the probability of the needle crossing a line works out to 2/pi. Repeating the experiment many times therefore yields the number pi, with great accuracy I must say. You can design a Web service that executes ranges of millions of throws on a compute farm. The master control program (server-side Web service) aggregates the experiments and serves you back the value of pi!
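The distributable shape of Buffon's Needle can be sketched as a worker kernel plus an aggregating master. This is an illustrative sketch, not the project's code: `throw_batch` plays the slave Web service executing one range of throws, and `estimate_pi` plays the master that aggregates crossing counts.

```python
# Buffon's Needle as a distributable Monte Carlo kernel. With needle
# length equal to line spacing, P(crossing) = 2/pi, so the master can
# recover pi as 2 * total_throws / total_crossings.
import math
import random

def throw_batch(n_throws, rng):
    """One slave unit: count line crossings for a batch of throws."""
    crossings = 0
    for _ in range(n_throws):
        x = rng.uniform(0.0, 0.5)              # center's distance to nearest line
        theta = rng.uniform(0.0, math.pi / 2)  # needle's angle to the lines
        if x <= 0.5 * math.sin(theta):         # needle length = line spacing = 1
            crossings += 1
    return crossings

def estimate_pi(total_throws, batches=4, seed=2005):
    """Master: fan batches out (sequentially here), aggregate, estimate."""
    rng = random.Random(seed)
    per_batch = total_throws // batches
    hits = sum(throw_batch(per_batch, rng) for _ in range(batches))
    return 2.0 * per_batch * batches / hits

print(estimate_pi(200_000))  # close to 3.14159
```

Because each batch is independent, the accuracy scales with the total throw count no matter how the batches are spread across the farm - which is precisely why Monte Carlo problems distribute so well.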
The Use Case Requirements
The company had existing C++ legacy code implementing the pricing model. Some of the bond portfolio calculations could take over an hour on the existing hardware. The new business requirement was a response time of less than 30 seconds. Given the needs and the current implementation something had to give, and adding expensive cycles while reengineering the application was very risky - pardon the pun. The front-end GUI presented yet another challenge. The trading desk uses a new Java-based front-end trading system, but the sales desk primarily uses Excel spreadsheets for pricing, for simplicity and ease of use.
Web Services for Robust Shared Business Services
A Web service interface provided the single API to the pricing engine and facilitated the shared business service for the two LOBs. Furthermore, it provided an elegant solution to the technology interoperability gap. An unforeseen benefit was the ability for a salesman to execute the Web service remotely from his laptop at a third-party office or at the coffee shop nearby (see Figure 2).
The server side of the Web service is the master/control program that starts the computational/slave units on the grid configuration, using ASPEED's ACCELLERANT on-demand application servers and the HPC center's middleware fabric. Each slave component encapsulates the computational aspect of the portfolio pricing, the C++ legacy code (see Figure 3).
Every time the Web service is called, a number of slave calculations are fired on the grid. ACCELLERANT's on-demand server provides quality-of-service features such as dynamic load balancing, failover, and managed optimal response time.
Building the Client Web Service
1. The first step was to create a Web service control. This was achieved simply by pointing at the published Web service URL followed by ?WSDL.
2. The WSDL file received was saved in a local project directory. Figure 4 shows the input definition, a segment of the WSDL file.
3. Browse to the project and directory where the saved WSDL is located.
4. Right-click the WSDL file and select Generate JCX from WSDL. The resulting JCX file is a Web service control, which can be used from the Java client trading system.
Figure 4 shows a simple XML input file that is sent to the Web service.
The computational grid and distributed parallel computing deliver substantial performance improvements today. While the standards are still evolving, practitioners are designing and implementing mission-critical applications, doing more at a faster rate in diverse commercial areas, and enjoying a great competitive advantage. Financial services professionals can execute complex financial models and provide exotic products to their clients for higher profits, while meeting more stringent regulatory risk requirements and improving the bottom line through more efficient capital allocation. Pharmaceutical companies speed up preclinical and early clinical trials by a factor of five or more and gain FDA approvals faster. Manufacturers use fluid dynamics, executing on powerful compute farms and connecting designers via Web services, to deliver faster simulation and to shorten the new-product life cycle while delivering better, cheaper, stronger products.
Web services play a pivotal role not only in the infrastructure back-end space, but also closer to the "final mile." I predict in the next 18 to 24 months, as the product stack matures and bandwidth increases, Web services, dynamic business process choreography, and informal on-demand networks will be able to tap the idle power of powerful compute farms, or even commuters' sleepy laptops, and deliver content on pervasive devices like never before. But remember, the grid is only as profitable as the applications you run on it.
Until then, get the grids crunching.