April 11, 2003
@ 06:37 PM

Content Pipelines?

On the flight from Athens to Madrid this last week I had an idea that I'd like to float in order to see what other people think.

The weblog infrastructure that I am (still, due to little free time) building, has its own aggregation system that flows aggregated content though a pipeline until it's pushed into the storage system. So, what the system does is to pull content from RSS feeds, from Exchange public folders, websites and others sources (the "feed readers" are pluggable), maps everything into a common representation and flows articles through the pipeline. The stages in the pipeline can look at the content and make adjustments (fix up HTML), do filtering (assign categories) and, most importantly, can enrich the content with metadata. So, when the system is pulling information from an RSS source, it may invoke a stage that runs all the words in the article against a dictionary and add links to a site like dictionary.com, it may invoke a stage that find relevant books on amazon.com or a stage to get Google links or even a stage that translates the original Russian text into German for me, and add all that additional information to the "extended metadata" of the article, etc.  Everything is pluggable.

Here's the idea: I really don't want to write the Amazon, Google, Dictionary and Babelfish stages, myself. What I rather want to do is to enlist those sites as web services into my pipeline. Using one-way messaging and <keyword>WS-Routing</keyword> I could say "here's an article, add your metadata to it and send it back <to> me or <via> the next pipeline stage here at my system or <via> elsewhere when you're done". Or I could just walk up to an RSS provider and say, "don't reply to be directly, please route back to me <via> these stages".

So, if such a distributed infrastructure existed, and you'd aggregate this entry "backrouted" through a pipeline of filters provided by Weather.com, Google.com, Dictionary.com and Amazon.com, you'd have the weather for Athens and Madrid, all relevant Google links and books on "content" and/or "pipelines" and WS-Routing, and links to explanations of all non-trivial words in this text. How's that?