Searched for : pipelines

In my comment view for the last post (comment #1), Piyush Pant writes about the confusion around different pipeline models and frameworks that are popping up all over the place and mentions Proseware, so I need to clarify some things:

I'll address the "too many frameworks" concern first: Proseware's explicit design goal and my job was to use the technologies ASP.NET Web Services, WSE 2.0, IIS, MSMQ, and Enterprise Services as pure as possible and I did intentionally not introduce yet another framework for the runtime bits beyond a few utility classes used by the services as a common infrastructure (like a config-driven web service proxy factory, the queue listener, or the just-in-time activation proxy pooling). What my job was and what I reasonably succeeded at was to show that:

Writing Service Oriented Applications on today's Windows Server 2003 platform does not require yet another framework.

The framework'ish pieces that I had to add are simply addressing some deployment issues like creating accounts, setting ACLs or setting up databases, that need to be done in a "real" app hat isn't a toy. Such things are sometimes difficult to abstract on the level of what the .NET Framework can offer as a general-purpose platform or are simply not there yet. All of these extra classes reside in an isolated assembly that's only used by the installers.

The total number of utility classes that play a role of any importance at runtime is 5 (in words five) and none of them has more than three screen pages worth of actual code. Let me repeat:

Writing Service Oriented Applications on today's Windows Server 2003 platform does not require yet another framework.

I do have a dormant (newtelligence-owned) code branch sitting here that'd make a lot of things in Proseware easier and more elegant to develop and makes reconfiguring services more convenient, but it's a developer convenience and productivity framework. No pipelines, no other architecture, just a prettier shell around the exact Proseware architecture and technologies I chose.

To illustrate my point about the fact that we don't need another entirely new framework, I have here (MessageQueueWebRequest.cs.txt, MessageQueueWebResponse.cs.txt) an early 0.1 prototype copy of our MessageQueueWebRequest/-WebResponse class pair that supports sending WS messages through MSMQ. (That prototype only does very simple one-way messages; you can do a lot more with MSMQ).  

Take the code, put it in yours, create a private queue, take an arbitrary ASMX WebService proxy, call MessageQueueWebRequest.RegisterMSMQProtocol() when your app starts, instantiate the proxy, set the Url property of the proxy to msmq://mymachine/private$/myqueue, invoke the proxy and watch how a SOAP message materializes in the queue.

Next step: use a WSE proxy. Works too. I'll leave the receiver logic to your imagination, but that's not really much more than listening to the queue and throwing the message into a WSE 2.0 SoapMethod or throwing it as a raw HTTP request at an ASMX WebMethod or by using a SimpleWorkerRequest on a self-hosted ASP.NET AppDomain (just like WebMatrix's Cassini hosts that stuff).

 

On to "pipelines" in the same context: Pipelines are a very common design pattern and you can find hundreds of variations of them in many projects (likely dozens from MS) which all have some sort of a notion of a pipeline. It's just "pipeline", not Pipeline(tm) 2003 SP1.

User-extensible pipeline models are a nice idea, but I don't think they are very useful to have or consider for most services of the type that Proseware has (and that covers a lot of types).

Frankly, most things that are done with pipelines in generalized architectures that wrap around endpoints (in/out crosscutting pipelines) and that are not about "logging" (which is, IMHO, more useful if done explicitly and in-context) are already in the existing technology stack (Enterprise Services, WSE) or are really jobs for other services.

There is no need to invent another pipeline to process custom headers in ASMX, if you have SoapExtensions. There is no need to invent a new pipeline model to do WS-Security, if you can plug the WSE 2.0 pipeline into the ASMX SoapExtension pipeline already. There is no need to invent a new pipeline model to push a new transaction context on the stack, if you can hook the COM+ context pipeline into your call chain by using ES. There is no need to invent another pipeline for authorization, if you can hook arbitrary custom stuff into the ASP.NET Http Pipeline or the WSE 2.0 pipeline already has or simply use what the ES context pipeline gives you.

I just enumerated four (!) different pipeline models and all of them are in the bits you already have on a shipping platform today and as it happens, all of them compose really well with each other. The fact that I am writing this might show that most of us just use and configure their services without even thinking of them as a composite pipeline model.

"We don't need another Pipeline" (I want Tina Turner to sing that for me).

Of course there's other pipeline jobs, right? Mapping!

Well, mapping between schemas is something that goes against the notion of a well-defined contract of a service. Either you have a well-defined contract or two or three or you don't. If you have a well-defined contract and there's a sender that doesn't adhere to it, it's the job of another service to provide that sort of data negotiation, because that's a business-logic task in and by itself.

Umm ... ah! Validation!

That might be true if schema validation is enough, but validation of data is a business logic level task if things get more complex (like if you need to check a PO against your catalog and need to check whether that customer is actually entitled to get a certain discount bracket). That's not a cross-cutting concern. That's a core job of the app.

Pipelines are for plumbers

 

Now, before I confuse everyone (and because Piyush mentioned it explicitly):

FABRIQ is a wholly different ballgame, because it is precisely a specialized architecture for dynamically distributable, queued (pull-model), one-way pipeline message processing and that does require a bit of a framework, because the platform doesn't readily support it.

We don't really have a notion of an endpoint in FABRIQ that is the default terminal for any message arriving at a node. We just let stuff asynchronously flow in one direction and across machines and handlers can choose to look at, modify, absorb or yield resultant messages into the pipeline as a result of what they do. In that model, the pipeline is the application. Very different story, very different sets of requirements, very different optimization potential and not really about services in the first place (although we stick to the tenets), but rather about distributing work dynamically and about doing so as fast as we can make it go.

Sorry, Piyush! All of that totally wasn't going against your valued comments, but you threw a lit match into a very dry haystack.

 

Wednesday, July 07, 2004 7:52:10 AM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [4]  | 
After providing some background on what nodes and networks are and how they work, I’ll get to how they are configured. Warning: This post is pretty dense in terms of content ;-)
Friday, June 25, 2004 4:49:33 AM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [0]  | 
The most fundamental element in FABRIQ is a message handler and handlers are organized in pipelines to process messages. I explain the relationship here.
Wednesday, June 23, 2004 2:06:08 PM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [0]  | 

Before I can get into explaining how the FABRIQ works and how to configure it, I need to explain a bit of the terminology we use:

  • A network is the FABRIQ term that's rougly equivalent to an "application". A network consists of an arbitrary number of network-distributed nodes that are running inside the scope of the network. The network creates a common namespace for all of these nodes. Networks are configured using a single XML configuration document that is submitted (or routed via another network) to all hosts that shall host the network's nodes.
  • A node is the FABRIQ term that is roughly equivalent to a "service" or "component". A node is the smallest addressable unit. Every node has a "relative node URI" that is composed of the network name and the node's own name into {network/node}. This relative node URI can be composed with absolute, transport dependent URIs such as http://server/vdir/network/node or msmq://machine/queuename/network/node. Within a network, the runtime is also capable of resolving logical addresses of the form fabriq://network/node and automatically map them to physical addresses. At runtime, a node accepts messages and dispatches them into one of one or more action pipelines. Each node may be guarded by a set of WS-Policy assertions, including Kerberos and X.509 cert authentication and authorization claims. A node may be hosted on a dedicated machine, one a well defined set of machines or on "any" machine within a cluster.
  • An action pipeline is a pipeline that is associated with an action identifier and is roughly equivalent to a "method". An action identifier is a URI as per WS-Addressing's definition of wsa:Action and is mapped to SOAPAction: whenever we go through HTTP. A node must host at least one action pipeline with no limit on the number of action pipelines it can support. An action may declare a set of message schema-types that it understands and those message definitions may be used for validation inbound messages. An action has one or more outbound message routes that are matched against the result message action or destination. Multiple routes may match a message, which causes the message flow to fork. For each route exist one or multiple prioritized routing destinations. If multiple destinations have the same priority, the engine will balance calls across those, otherwise the engine will use the ones with lower priority as backup routes. At the end of each action pipeline is a sender port that sends resulting messages out to their destinations, which may be other FABRIQ nodes or any other external endpoint that understands the respective one-way message being sent.
  • A pipeline is a composition of a sequence of handlers or nested pipelines. Pipelines can be nested in arbitrary depth. Pipelines are strictly unidirectional message processors that have no concept of a "response" on the same thread analogous to a return value (hence all actions are one-way only). A pipeline may or may not be based on a predefinable pipeline-type. Pipeline-types allow the definition of reusable pipelines that can be reused within the same network or (via import) in multiple networks.
  • A handler refers to a software component (a CLR class) implementing a set of interfaces that allow it to be composed into and hosted in a pipeline. Handlers should be designed to perform only very primitive operations that can then be composed into pipelines to implement specific functionality. Built-in handlers include a content-based routing handler and an XSLT transformation handler. Custom handlers may contain any type of logic. A handler receives messages and may consume them, evaluate and annotate them and yield any number of resulting messages. The definition of a handler embeds an XML fragment that allows the handler to configure itself. The actual reference to the CLR class implementing the handler is defined in a handler-type.
  • A handler-type associates a CLR class with a name that can be used to define handlers within a configuration file. It also allows the declaration of a code-base URL for the CLR class. This feature allows the installation of "virgin" FABRIQ runtimes in a cluster and have the runtimes auto-download all the required code for hosting a node from a central code store and therefore dramatically eases deployment and dynamic reconfiguration of a FABRIQ cluster.

In the next couple of postings I will map these terms to concrete config files.

The interesting bit about config is that FABRIQ's configuration mechanism uses the FABRIQ itself. FABRIQ has a predefined (extensible, configurable) network "fabriq" with a node "configuration" that currently defines a single action "configure". The pipeline for that action consists of a single handler (the FabriqConfigurationHandler) and that expects and accepts the configuration files I'll describe over the next days as the body of a message. With that, the configuration mechanism can be secured with policy, or can be embedded into a larger network that does preprocessing or even performs automatic assembly of configuration, or that automatically distributes configuration from a single point across a large cluster of machines.

To be continued ...

Tuesday, June 22, 2004 2:53:29 AM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [1]  | 

Slowly, slowly I am seeing some light at the end of the tunnel designing the FABRIQ. It’s a very challenging project and although I am having a lot of fun, it’s really much harder work than I initially thought.

Today I’d like to share some details with you on how I am employing the lightweight transaction coordinator “WorkSet” that Steve Swartz and I wrote during this year’s Scalable Applications tour inside the FABRIQ.

The obvious problem with one-way pipeline processing (and a problem with the composition of independent cross-cutting concerns in general) is that failure management is pretty difficult. Once one of the pipeline components fails, other components may already have done work that might not be valid if the processing fails further down through the pipeline. The simplest example of that is, of course, logging. If you log a message as the first stage of a pipeline and a subsequent stage fails, do you want the log entry to remain where it is? The problem is: it depends. So although you might need to see the message before it is being processed by stages further down the pipeline, you can only find out whether it is flagged as success or failure once processing is complete or you may want to discard the log entry altogether on failure.

Before I go into details, I’ll clarify some of the terminology I am using:

·         A message handler is an object that typically implements a ProcessMessage() method and a property Next pointing to the handler that immediately follows it in a chain of handlers.

·         A pipeline hosts a chain of message handlers and has a “head” and a “tail” message handler which link the pipeline with that chain of handlers. The pipeline itself is a message handler itself, so that pipelines can be nested inside pipelines. The FabriqPipeline is a concrete implementation of such a pipeline that has, amongst other things, support for the mechanism described here.

·         A message is an object representing a SOAP message and has a collection of headers, a body (as an XmlReader) and a transient collection of message properties that are only valid as long as the message is in memory.

·         A work set is a lightweight, in-memory 2PC transaction that provides really only the “atomicity” and “consistency” properties out of the well-known “ACID” transaction property set. “Durability” is not a goal here and “isolation” sort of guaranteed, because messages are not shared resources. If external resources are touched, isolation needs to be guaranteed by the enlisted workers. A worker is a lightweight resource manager that can enlist into a work set and provides Prepare/Commit/Abort entry points.

Whenever a new message arrives at a FabriqPipeline, a new work set is created that governs the fault management for processing the respective message. The work set is associated with the message by creating a “@WorkSet” property on the message that references the WorkSet object. The pipeline itself maintains no immediate reference to the work set – it is message-bound.

public class FabriqWorker : IWorker
{
   private Message msg;
   private FabriqMessageHandler handler;
       
    public FabriqWorker(Message msg, FabriqMessageHandler handler)
    {
         msg = msg;
         handler = handler;
    }
       
     bool IWorker.Prepare(bool vote)
    {
      return handler.Prepare(vote, msg );
    }

    void IWorker.Abort()
    {
       handler.Abort( msg );
    }

    void IWorker.Commit()
    {
       handler.Commit( msg );
    }
}

 

The FabriqPipeline does not enlist any workers into the work set directly. Instead, message handlers enlist their workers into the work set as the message flows through the pipeline. A “worker” is an implementation of the IWorker interface that can be enlisted into a work set as a participant. Because the pipeline instance along with all message handler instances shall be reusable and shall be capable of processing several messages concurrently, the worker is not implemented on the handler itself. Instead, workers are implemented as a separate helper class (FabriqWorker). Instances of these worker classes are enlisted into the message’s work set. The worker instance gets a reference to the message it deals with and to the handler which enlisted it into the work set; once the worker is called during the 2 phase commit protocol phases, it calls the message handler’s implementation of Prepare/Abort/Commit.

This way, we can have one “all in one place” implementation of all message-handling on the message handler, but are keeping the transaction dependent state in a very lightweight object; therefore we can share the entire (likely complex) pipeline and handlers for many concurrent transactions, because none of the pipeline is made dependent on the message or transaction state.

public abstract class FabriqMessageHandler :
   IMessageHandler
{
   IMessageHandler next = null;
   public FabriqMessageHandler()
   {
   }
  
   public virtual IMessageHandler Next  
   { get { return next; } set { next = value; }}
  
   bool IMessageHandler.Process(Message msg)
   {
      bool result = this.Preprocess(msg);
      WorkSet workSet = msg.Properties["@WorkSet"] as WorkSet;
      if ( workSet != null )
      {
          workSet.Register(new FabriqWorker( msg, this ) );
      }
      return result;           
   }

   protected bool Forward( Message msg )
   {
      if ( next != null )
      {
         return next.Process( msg );
      }
      else
      {
         return false;
      }
   }

   public virtual bool Preprocess( Message msg )
   {
       return false;
   }

   public abstract bool Prepare( bool vote, Message msg );

   public virtual void Commit( Message msg )
   {
   }

   public virtual void Abort( Message msg )
   {
   }
}

 

 

 

When a message flows into the pipeline, all a transactional message handler does when it gets called in ProcessMessage() is to enlist its worker and return. If the handler is not transactional, it must never fail (such things exist), can ignore the whole work set story and simply forward the message to the Next handler. So, in fact, a transactional message handler will never forward the message in the (non-transactional) ProcessMessage() method.

One problem that the dependencies between message handlers create is that it may be impossible to forward a message to the next message handler in the chain before the message is processed; at least you can’t make a Prepare==true promise for the transaction outcome until you’ve done most work on the message and have verified that all resultant work will very likely succeed. Messages may even be transformed into new messages or split into multiple messages inside the pipeline, so that you can’t do anything meaningful until you are at least preparing.

The resulting contradiction is that a transaction participant cannot perform all work resulting from on a message before it is asked to commit work, but that message handlers following in the sequence may not have received the resulting message until then and may not even be enlisted into the transaction.

To resolve this problem, the FABRIQ pipeline’s transaction management is governed by some special transaction handling rules that are more liberal than those of traditional transaction coordinators.

·         During the first (prepare) phase of the 2-phase commit protocol, workers may still enlist into the transaction. This allows a message handler to forward messages to a not-yet-enlisted message handler during the prepare phase. The worker(s) that is/are enlisted by a subsequent handler because the currently preparing message handler is forwarding one (or multiple) messages to it, is/are appended to the list of workers in the work set and asked to prepare their work once the current message handler is done preparing. We call this method a “rolling enlistment” during prepare.

·         Inside the pipeline, messages are considered to be transient data. Therefore, they may be manipulated and passed on during the Prepare phase, independent of the overall transaction outcome. The tail of the transaction controller pipeline (which is the outermost pipeline object) always enlists a worker into the transaction that will only forward messages to outside parties on Commit() and therefore takes care of hiding the transaction work to guarantee isolation.

·         Changes to any resources external to the message (so, anything that is not contained in message properties or message headers) must be guarded by the transaction workers. This means that all usual rules about guarding intermediate transaction state and transaction resources apply: The ability to make changes must be verified by tentative actions during Prepare() and changes may only be finally performed in Commit(). In case the external resources do not permit tentative actions, the Abort() method must take the necessary steps to undo actions performed during Prepare().

Whenever new messages get created during processing, the message properties (which hold the reference to the work set and, hence, to the current transaction) may be propagated into the newly created message, which causes the processing of these messages to be enlisted in the transaction, or a new or no work set can be created so that further processing of these messages is separate from the ongoing transaction. That’s what we do for failure messages.

During prepare, participants can log failure information to a message property called “@FaultInfo” that contains a collection of FaultInfo objects. If message processing fails, this information is logged and is, if possible, relayed to the message sender’s WS Addressing wsa:FaultTo, wsa:ReplyTo or wsa:From destination (in that order of preference) in a SOAP fault message.

For integration with “real” transactions, the entire work set may act as a DTC resource manager. If that’s so, the 2PC management is done by DTC and the work set acts as an aggregating proxy for the workers toward DTC. It collects its vote from its own enlistments and forwards the Commit/Abort to its enlistments.

Thursday, December 18, 2003 3:31:51 AM (Pacific Standard Time, UTC-08:00)  #    Disclaimer  |  Comments [4]  | 

I see quite a few models for Service Oriented Architectures that employ pipelines with validating "gatekeeper" stages that verify whether inbound messages are valid according to an agreed contract. Validation on inbound messages is a reactive action resulting from distrust of the communication partner's ability to adhere to the contract. Validation on inbound messages shields a service from invalid input data, but seen from the perspective of the entire system, the action occurs too late.

What I see less often is a gatekeeper on outbound channels that verifies whether the currently executing local service adheres to the agreed communication contract. Validation on outbound messages is a proactive action taken in order to create trust with partners about the local service's ability to adhere to a contract. Furthermore, validation on outbound messages is quite often the last chance action before a well-known point of no return: the transaction boundary. If a service is faulty, for whatever reason, it needs to consistently fail and abort transactions instead of emitting incorrect messages that are in violation of the contract. If the service is faulty, it must consequently be assumed that compensating recovery strategies will not function properly and with the desired result.

Exception information that is generated on an inbound channel, especially in asynchronous one-way scenarios, vanishes into a log file at a location/organization that may not even own the sending service that's in violation of the contract. The only logical place to detect contract violations in order to isolate and efficiently eliminate problems is on the outbound, not on the inbound channel. Eliminating problems may mean to fix problems in the software, allow manual correction by an operator/clerk or an automatic rejection/rollback/retry of the operation yielding the incorrect result. None of these corrective actions can be done in a meaningful way by the message recipient. The recipient can shield itself, and that is and remains very important. However, it's just a desperate act of digging oneself in when the last line of defense did already fall.

Tuesday, October 07, 2003 4:08:26 AM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [4]  | 

Craig Andera on why AOP is broken & Why I surprisingly agree and what I am doing about it

Craig Andera has some interesting thoughts around AOP and specifically mentions the stuff that I have been doing in that area. And he say that it doesn't work and never has, because services are never truly orthogonal and have various interdependencies. In essence he's saying (I guess) that because the interdependencies just create a whole new level of complexity, the AOP approach is broken and it's better to generate explicit code instead of using interception techniques. I partially agree and always put a warning at the end of all of my talks around this issue: There is a limited set of use-cases for which an aspect'ish approach is useful. Security, logging, monitoring, billing, transaction enlistment, and a few others.

One of the biggest problems is service-order. You need to run the decryption and signature verification services before you can even evaulate a header that any other service can use. And even then, when you have something like a transaction-enlistment filter, do you open the transaction before or after a logging service wants to write something to a database? Does the logged data need to stay in the logging store when a transaction aborts? Yes? What if the log is used for billing? No? What if the log is used for diagnostics?

However, being explicit when chaining services together doesn't make things any better than using interception:

try
{
   handleServiceA(msg);
   handleServiceB(msg);
   handleServiceC(msg);
}
catch( Exception e )
{
   // do proper handling
}

is just as broken. I don't think it fundamentally matters much how code gets woven into the call chain. Setting up contexts is just one issue. What's even more difficult is to find a way to deal with errors in the presence of cooperating aspects (or, in more general terms, interception services). What's clear is that there's no way around interception-driven services in a web services world. It's all pipeline-based and, even worse, the pipelines are distributed pipelines of pipelines. It's too simple to say "it's broken, get over it". That doesn't help solving what is an actual problem.

A promising approach is to make aspects/interceptors act like resource managers and coordinate their work using a very lightweight 2PC protocol ("AC" guarantee only; no "ID"). Using 2PC for this approach allows interceptors/aspects to coordinate their work and know about each other before any work actually gets done. I have discussed these issues with a couple of people in depth we put some code together that essentially implements a little, in-memory "DTC" for that purpose. We call it a "WorkSet" instead of a transaction.  There's still some work to be done there, but I think I'll be able to post an example in a little while. Maybe around TechEd Europe time.

Sunday, June 08, 2003 8:40:43 AM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [0]  | 
Content Pipelines, discussed. Check the comments.
Sunday, April 13, 2003 7:11:44 AM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [0]  | 

Content Pipelines?

On the flight from Athens to Madrid this last week I had an idea that I'd like to float in order to see what other people think.

The weblog infrastructure that I am (still, due to little free time) building, has its own aggregation system that flows aggregated content though a pipeline until it's pushed into the storage system. So, what the system does is to pull content from RSS feeds, from Exchange public folders, websites and others sources (the "feed readers" are pluggable), maps everything into a common representation and flows articles through the pipeline. The stages in the pipeline can look at the content and make adjustments (fix up HTML), do filtering (assign categories) and, most importantly, can enrich the content with metadata. So, when the system is pulling information from an RSS source, it may invoke a stage that runs all the words in the article against a dictionary and add links to a site like dictionary.com, it may invoke a stage that find relevant books on amazon.com or a stage to get Google links or even a stage that translates the original Russian text into German for me, and add all that additional information to the "extended metadata" of the article, etc.  Everything is pluggable.

Here's the idea: I really don't want to write the Amazon, Google, Dictionary and Babelfish stages, myself. What I rather want to do is to enlist those sites as web services into my pipeline. Using one-way messaging and <keyword>WS-Routing</keyword> I could say "here's an article, add your metadata to it and send it back <to> me or <via> the next pipeline stage here at my system or <via> elsewhere when you're done". Or I could just walk up to an RSS provider and say, "don't reply to be directly, please route back to me <via> these stages".

So, if such a distributed infrastructure existed, and you'd aggregate this entry "backrouted" through a pipeline of filters provided by Weather.com, Google.com, Dictionary.com and Amazon.com, you'd have the weather for Athens and Madrid, all relevant Google links and books on "content" and/or "pipelines" and WS-Routing, and links to explanations of all non-trivial words in this text. How's that?

Friday, April 11, 2003 11:37:51 AM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [1]  |