|Version 1 (modified by scottmc, 7 years ago)|
Service Aggregation for Document Generation
This document describes the structure and use of an aggregation mechanism that applies service oriented architectures and open standards to document generation. It also describes the creation of such a system using Mozilla's Rhino ECMAScript engine and Java based technologies, and its use for creating user interfaces to service oriented applications.
In modern business, “software reuse” has taken yet another shape. For many common tasks where functionality would once have been added to applications, application appliances are put in their place, and Service Oriented Architectures are used to integrate these appliances with office support software and other internal applications that need to perform these functions. The interface of preference for these appliance applications has (thankfully) been an open, W3C recommended XML based RPC protocol called SOAP, or Simple Object Access Protocol.
As you may know, SOAP is an XML over HTTP protocol. Documents conforming to the eXtensible Markup Language are transferred to and from a Hyper Text Transfer Protocol service in order to carry the arguments and result sets of each call. The envelopes of SOAP requests hold XML document fragments, and by use of further open XML based standards (notably XSLT) these document fragments can be transformed into various document formats suitable for user interfaces (such as XHTML, XUL, or SVG).
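To make this concrete, a minimal SOAP 1.1 request envelope might look like the following. The service namespace, operation name, and parameter are invented for illustration:

```xml
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <!-- hypothetical operation: fetch one article from a CMS service -->
    <m:getArticle xmlns:m="http://example.com/cms">
      <m:articleId>42</m:articleId>
    </m:getArticle>
  </soap:Body>
</soap:Envelope>
```

The fragment inside the Body element is the XML document fragment referred to above; it is this fragment that later becomes raw material for transformation.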
XSLT to replace common document templates
The problem with most web applications as they exist today is that they do not treat a web page as simply a document, nor do they treat a web application the same as any other application. They attempt to isolate creation of the document by removing the actual document generation from the application itself, but still treat the resulting document as an integral part of the retrieval of the associated information. For this abstraction of document generation, most web application frameworks employ a process called templating: a template document containing special tokens unique to the application is used, and when a request is made those tokens are replaced with the results of executing a specific operation of the application.
Employing XSLT in this process would mean that, rather than using a templating mechanism local to the specific web application framework in use, components of applications would simply use an interface such as the Document Object Model to generate an XML document unique to the component. This document contains the result of the specific request made to the component, and can then be transformed using eXtensible Stylesheet Language Transformation rules into the target document format of choice. The result is a component whose results need never become obsolete, as they can be transformed to any current or future document markup or format by applying the appropriate transformation rules.
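As a sketch, assuming a component that emits a simple article fragment in a hypothetical cms namespace, a transformation rule targeting XHTML might look like:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:cms="http://example.com/cms">
  <!-- Render the component's article fragment as an XHTML block -->
  <xsl:template match="cms:article">
    <div class="article">
      <h1><xsl:value-of select="cms:title"/></h1>
      <p><xsl:value-of select="cms:body"/></p>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

Retargeting the component at, say, XUL or WML is then a matter of swapping this style sheet, without touching the component itself.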
An additional benefit of this mechanism is the ability to use a single document component to generate multiple document types for varying display mechanisms.
AJAX in the picture
A number of libraries currently exist in the public domain that create uniform interfaces to the XMLHttpRequest APIs of the various user agents. This fits into the picture by allowing user interfaces for internal office support software to request information of various forms without requesting a complete document: rather than fetching an entire page, an application can request only the information necessary to update a particular segment of the current document. Keep this in mind later in this paper if you start to suspect this approach might be slow. Additionally, the majority of modern user agents couple this ability with the ability to apply XSL transformations to the result of one of these requests, directly within the user agent itself.
This brings us to the question this document addresses: if a common practice for building enterprise applications is to provide service interfaces for integration, why must the application provide multiple interfaces? If a service interface is sufficient to accomplish the task of the application, then a unified mechanism for building interfaces should be sufficient for the interface of all services, and all application tasks therein can be implemented by these service interfaces alone.
Service Oriented Architecture
Service oriented architectures, as you probably know, open the functionality of varying aspects of an application to external applications by allowing the arbitrary invocation of specific internal functions of the application. Common practice today is to use a client library for accessing the specific service of which another application would like to make a request, perform the request, and handle the results of the service request internally to the client application. This, unfortunately, requires programming: specific functionality must be added to the client application for the handling of each external request made. This approach therefore costs developer hours and places a burden on the application.
The resolution I am proposing is, rather than programming the client interaction for each request made to an external service oriented application, to create entire applications using this mechanism and tie them together by aggregating the results of multiple service requests. Transformation of the resulting document fragments would then produce the intended document, providing the interface (be it a web or desktop interface) for the application. Why add services to applications when we can build applications out of services?
Aggregation of varying XML sources
Luckily, “Supports XML” is a bullet point on just about every application and appliance these days. Also luckily, “Supports XML” commonly means supports XML over HTTP. Aggregation simply means to pull from various sources, which is already commonplace among a number of user end applications such as HTTP User Agents (“web browsers”), RSS readers, and various other XML/HTTP supporting clients. Pulling XML from various sources means that we can combine a number of XML resources into what is called a Compound Document. A Compound Document is simply an XML document composed of several varying formats. Currently some user agents (such as the Mozilla framework) support compound documents as a mechanism for rendering document fragments of one format (such as Scalable Vector Graphics) within the rendered result of another (such as XHTML or the XML User interface Language).
Transformation of documents of this sort is trivial. Thanks to XML Namespaces and XSLT's handling of them, it is very simple to match only elements of a particular namespace in a transformation rule. In this approach, each service component would return document fragments of a unique type, so the transformation rules for the varying document fragments in a compound document can be isolated from one another while the transformation style sheet remains a single document. (Note that while it says “single document”, this also includes a document comprised of multiple documents through XSL inclusion.)
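For illustration, here is a compound document aggregated from two hypothetical services, followed by the isolated transformation rules that match each fragment by its namespace (all names and URIs are invented):

```xml
<!-- Compound document: one fragment per service, one namespace each -->
<page xmlns:cal="http://example.com/calendar"
      xmlns:news="http://example.com/news">
  <cal:event date="2005-11-02">Planning meeting</cal:event>
  <news:headline>Q3 numbers posted</news:headline>
</page>

<!-- Style sheet: each rule touches only its own service's namespace -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:cal="http://example.com/calendar"
    xmlns:news="http://example.com/news">
  <xsl:template match="cal:event">
    <li class="event"><xsl:value-of select="."/></li>
  </xsl:template>
  <xsl:template match="news:headline">
    <h2><xsl:value-of select="."/></h2>
  </xsl:template>
</xsl:stylesheet>
```

Because the match patterns are namespace-qualified, the rules for one service's fragments can evolve without any risk of interfering with another's.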
Request Description Markup Language
Most hypertext documents (and as such, every web application) begin with a single, simple operation: an HTTP request. The Request Description Language, or RDL, is an XML document format that describes how to fulfill an HTTP request by aggregation of various XML sources (local, service oriented, or straightforward HTTP resources), along with a reference describing how to transform the resulting compound document into the target document format.
Figure 2 – Request Description Example
<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns="http://housevalues.com/2005/RDL">
  <request method="POST" uri="/login">
    <session-variable name="ams-login-token">
      <try>
        <soap-call href="http://ams/soap/login">
          <soap-parameter name="username">
            <http-value-of name="login"/>
          </soap-parameter>
          <soap-parameter name="password">
            <http-value-of name="passwd"/>
          </soap-parameter>
        </soap-call>
      </try>
      <catch><http-redirect src="/login"/></catch>
    </session-variable>
    <soap-call href="http://ams/soap/listvendors">
      <soap-parameter name="lid">
        <session-value-of name="ams-login-token"/>
      </soap-parameter>
    </soap-call>
    <soap-call href="http://ams/soap/getnotes">
      <soap-parameter name="lid">
        <session-value-of name="ams-login-token"/>
      </soap-parameter>
    </soap-call>
    <soap-call href="http://ams/soap/topagents"/>
    <xslt-template src="loginSplash.xsl"/>
    <catch>
      <log><value-of name="INTERNAL:error.message"/></log>
      <http-send src="/error.xml"/>
    </catch>
  </request>
</configuration>
(Note: I've since decided that there are much better ways to represent requests such as these in XML. This example is conceptual only.)
The advantage of this approach is that one can easily alter the behavior of a URI simply by altering its description; in effect, all behavior of the URI is implemented via its request description. The dependencies of each request are loosely described using the request description language, so they can be changed dynamically as the application evolves and new dependencies are added or removed.
Being Fast Enough
Obviously this is not the most runtime efficient approach to designing applications. Building applications from the aggregation of services does, however, provide enterprise quality scalability and increased interoperability over more traditional techniques. The reason is that the typical approach to attaching services to an application is to publish some routines that the developer, prior to attempting to integrate the application with anything else, believes may be desired in the future. The problem is that no one knows in advance what people are going to want to do, so building the application from nothing but services means all important tasks have published, accessible service interfaces. A service request for every bit of data could add up in overhead quite quickly, but a little creativity can keep this from being an issue.
The resolution is to provide caching on the part of the aggregation server. For instance, say the document is an HTML document for a content management system containing a portion of the latest article submitted. The article may be updated at any time during business hours, but will not be updated before 08:00 or after 18:00 Pacific time, Monday through Friday. Thus the behavior of the service, or of the aggregation server, or a combination of the two (with the aggregation server's caching rules overriding the behavior of the response) could be to not update the cache of this request if the request happens outside business hours.
This would mean that during nonbusiness hours the aggregation server would never update its cache, unless it was restarted and its cache was cleared for some reason. During business hours, let's say this article may or may not be updated on any given day (a generally static item); the behavior of the aggregation server could then be to update the cache at most once every 10 minutes. For each request made more than 10 minutes after the request the cache was last updated with, the cache would again be updated. The author of the article, an internal employee, may however want to see their changes immediately.
Under these caching rules they would have to wait up to 10 minutes to see their changes on the page, which may be unsatisfactory. For this reason, the aggregation server could be instructed not to cache internal requests. This would mean that if a request was made from the intranet subnet, the cache would be ignored; however, since this page has caching behavior for the outside world, the response would still be cached, so the outside world would see the changes either 10 minutes after the last change was made or immediately after the first internal request. Outside requests would still only update the cache once every 10 minutes during business hours, and not at all otherwise.
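These rules can be sketched in a few lines. The names and the injected clock below are my own invention, chosen so the behavior is easy to follow: internal requests always bypass and refresh the cache, while external requests refresh it at most once per interval, and only during business hours.

```javascript
// Hypothetical sketch of the aggregation server's per-resource cache rules.
// fetchFresh() stands in for actually performing the service request;
// the cache is assumed to have been seeded during business hours.
function makeCachingProxy(fetchFresh, { ttlMs, isBusinessHours, now }) {
  let cached = null;          // last response served from this cache
  let fetchedAt = -Infinity;  // timestamp of the last refresh
  return function get({ internal = false } = {}) {
    const t = now();
    const stale = t - fetchedAt >= ttlMs;
    // Internal requests skip the cache entirely; external requests refresh
    // only during business hours, and only once per ttlMs window.
    if (internal || (isBusinessHours(t) && (cached === null || stale))) {
      cached = fetchFresh();
      fetchedAt = t;
    }
    return cached;
  };
}
```

An internal author therefore sees every change immediately, and the first internal request also refreshes what the outside world sees, exactly as described above.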
You can see how this bit of caching for generally static items would shave vast amounts of overhead from the aggregation of this particular service. If this behavior is replicated for most requests, the network overhead of the aggregation server would in fact be less than that of even a traditional application server model.
In addition to this approach, our implementation choices will actually make up for more of the overhead than anything else. If you were to benchmark the runtime environments of a few high level programming language implementations commonly used to build these types of applications, such as Perl, PHP, Python, Ruby, or other very high level runtime compiled languages, you would find that most of these languages carry quite a bit of overhead themselves, yet are considered sufficient for creating these types of applications. A number of other approaches can also be taken toward reducing the request time overhead involved with the service aggregation model.
Asynchronous Resource Resolution
Once a request has been parsed and the document fragment describing its behavior is located, each branch of the document fragment at each level is split into threads of the document. Each of these threads can be handled asynchronously, and in some instances multiple asynchronous resolutions can occur within each thread as well. Parsing our request description top to bottom is obviously not the best way to do things; there is nothing less efficient than sitting around waiting on I/O. Since many requests are likely to be comprised of only a few top level service requests, dispatching all of these requests simultaneously will be the fastest possible way to handle the request. A callback invoked upon the completion of each dispatched service request populates the blackboard for the request. Once all of the services have been successfully invoked, the request is fulfilled: the DOM structure representing the result is handed to the XSLT processor along with the DOM of any XSL rules for the request, and the result is streamed to the client connection.
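The dispatch step can be sketched as follows, with the service calls mocked as async functions (all names here are invented for illustration):

```javascript
// Dispatch every top-level service call at once, then assemble the
// completed fragments into a single compound document.
async function aggregate(services) {
  const fragments = await Promise.all(
    services.map(async ({ name, call }) => {
      const xml = await call(); // every call is in flight concurrently
      return `<${name}>${xml}</${name}>`;
    })
  );
  return `<compound>${fragments.join("")}</compound>`;
}
```

Because the calls overlap, the total wait is roughly that of the slowest service rather than the sum of all of them.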
Preemptive initialization of requests
Recognition of connection patterns can allow us to initialize connections to service resources before requests are actually made, so that an already initialized connection can be pulled from a pool of connections ready for that service resource. This reduces the runtime cost of TCP handshakes.
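A sketch of such a warm pool follows; the names are hypothetical, and openConnection() stands in for establishing a TCP connection to the service:

```javascript
// Keep `size` connections open ahead of demand so a request never pays
// the handshake cost on its own critical path.
function makePool(openConnection, size) {
  const idle = [];
  function warm() {
    while (idle.length < size) idle.push(openConnection());
  }
  warm(); // pre-open connections before any request arrives
  return {
    acquire() {
      const conn = idle.pop() ?? openConnection(); // cold open only if drained
      warm(); // top the pool back up for the next request
      return conn;
    },
  };
}
```

A fuller implementation would also need to retire stale connections and return released ones to the pool, but the principle is the same.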
Preparing requests early
As soon as we get a request, often before the needed data is available, we know most of what to ask for and most of how to ask for it. For example, suppose we are making a SOAP request that depends on a session variable that is not yet set, while an earlier SOAP request that will populate that session variable is still being processed. We can already send the HTTP headers, and possibly even a segment of the content body, and then idle on the rest pending the completion of the prior SOAP request. This way we aren't sitting around doing nothing until we have the data necessary to complete the request; rather, the majority of our response is already made, and we are only waiting on its finalization.
This technique can greatly reduce the time taken for request handling. We are not in any way lowering the processing load of each request, but rather dispatching that load at the earliest possible time, allowing each request to complete as quickly as possible.
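The idea can be sketched as follows, with write() standing in for the client connection and pendingToken for the prior SOAP call still in flight (both names invented):

```javascript
// Stream the parts of the response that depend on nothing, then append
// the dependent part as soon as the pending service call resolves.
async function respondEarly(write, pendingToken) {
  write("<html><head><title>Vendors</title></head><body>"); // sent immediately
  const token = await pendingToken; // e.g. the login call still processing
  write(`<p>session ${token}</p></body></html>`);
}
```

The static prefix reaches the client while the dependency is still unresolved, so the client's wait overlaps with our own.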
Because a service aggregation server is not itself the place where web applications are built, the server has no need to be implemented in the abstract, runtime compiled type of language typical of web applications development. The service aggregation system can be implemented in any language, regardless of how the services themselves are implemented.
This proposal was written while I worked for a company called House Values. They dismissed the idea, and under my agreement with them I own this content and was able to publish it here, just in case anyone is interested. To this day, I think it's a damn good idea.