Hi, I had similar thoughts to Olivier's. We have to find a balance between a complex configuration and the needs of a "standard" Stanbol user. For simple linear chains I don't see why we would want a user to learn the Camel style. This should be configurable in a straightforward way.
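To make "straightforward" a bit more concrete, here is a minimal sketch in Python (not Stanbol code) of a simple linear chain. The engine ids, the `ENGINES` and `CHAINS` tables and the `run_chain` function are hypothetical names made up for illustration. The only point is that a chain can be described as a name plus an ordered list of engine ids.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a content item is a plain dict of metadata,
# and an engine is a function that returns the item with its results added.
ContentItem = Dict[str, object]
Engine = Callable[[ContentItem], ContentItem]


def langid(item: ContentItem) -> ContentItem:
    # Placeholder engine: pretends to detect the language.
    return {**item, "language": "en"}


def ner(item: ContentItem) -> ContentItem:
    # Placeholder engine: pretends to find named entities.
    return {**item, "entities": ["Paris"]}


# Engines registered by id (assumed registry, for illustration only).
ENGINES: Dict[str, Engine] = {"langid": langid, "ner": ner}

# The whole chain configuration: chain name -> engine ids in execution order.
CHAINS: Dict[str, List[str]] = {
    "default": ["langid", "ner"],
    "language-only": ["langid"],
}


def run_chain(name: str, item: ContentItem) -> ContentItem:
    """Apply the engines of the named chain one after another."""
    for engine_id in CHAINS[name]:
        item = ENGINES[engine_id](item)
    return item
```

Calling `run_chain("default", {"text": "I love Paris"})` passes the item through `langid` and then `ner`. Changing the order, or adding a chain, only means editing the `CHAINS` table.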
IMO the main use case is:
- Have different URIs for different chains
- At each chain URI you can configure N engines in a user-defined order (optional: allow chains to be nested within other chains)

That's what I would start with, and then wait for user feedback on the mailing list to see whether more complex scenarios come up.

Best,
- Fabian

On 17 January 2012 12:08, Olivier Grisel <[email protected]> wrote:
> 2012/1/17 florent andré <[email protected]>:
>> Hi Rupert, *
>>
>> First, thanks a lot for this first draft definition.
>>
>> I really like the idea of an RDF graph description of "enhancement chain"
>> and "engine".
>>
>> Here are my points:
>>
>> °°°°°° Enterprise integration patterns (EIP) and Apache Camel °°°°°°
>>
>> My major remark is that this does not use a well-known and well-defined
>> pattern: the enterprise integration pattern [1].
>> Behind this "big name", it is all about transferring messages between
>> "processing units".
>> Camel is a very generic framework that implements most of the EIPs [2], where
>> messages and processing units can be almost anything.
>> Applied to Stanbol, we can consider a ContentItem as a message and Engines as
>> processing units.
>> Cherry on the cake, Camel takes care of messages and processing units but
>> also of the machinery to set this to "music" (polling, ordering, grouping,
>> error management,...), and provides pretty simple ways to manage this.
>>
>> Let's stop my Camel "commercial speech" :), and just say that I will really
>> try to commit the first version of a Camel enhancer this week.
>>
>> By the way, as far as I know, Camel doesn't provide a graph-to-route (Camel's
>> term for chain) or route-to-graph utility... but there are well-defined DSLs
>> - spring[1], scala,... - so this can be a clue.
>>
>> °°°°°° Forward building of chain °°°°°°
>>
>> In your proposal, the chain is built in a "forward" manner:
>> you know that A is before B, because B depends on A (property ep:dependsOn).
>>
>> I don't really like this way of defining a chain (but it may be mostly my
>> personal taste), for mainly two reasons:
>> - As a human, building, but even more reading, understanding and forming a
>> mental representation of a chain built in that way is pretty difficult, and
>> the difficulty increases with chain complexity. Forward processing is not a
>> natural way of thinking about chains.
>> - A chain is about processing data, information, messages, and usually
>> information comes from one point and goes to another point... and IMO
>> describing a chain is more about describing the path of the message than the
>> inner structure of the chain.
>>
>> °°°°°° Missing features °°°°°°
>>
>> There are IMO two main missing features in this definition:
>> 1) No way to link chains to each other ("chain linking")
>> 2) No way to select engines (or a subchain) depending on a condition
>> ("selector")
>>
>> Let's illustrate these features with an example:
>>
>> Imagine we have these 4 chains already defined:
>> - MusicChain : defines a chain with music-specific engines (thesaurus, ws,
>> etc)
>> - FoodChain : defines a chain with food-specific engines
>> - PizzaChain : the best chain for pizza
>> - otherStuffChain : chain for the rest
>>
>> So far so good, but now I have content and no idea what that content is about...
>> I can submit it to all chains (not optimal), or to one random chain (with
>> the risk of putting a restaurant story in the musicChain)...
>>
>> So let's define a CategorisationChain.
>> This chain has, for example, the topic engine and a generic dbpedia enhancer.
>> At the end of the chain we have a graph that gives a pretty good
>> idea of the content's nature.
>>
>> Now, with the "linking chain" and "selector" features we can define an
>> "UltimateBigChain" like this:
>>
>> from(input_file) --> categorisationChain
>> --> if (graph has "music") --> musicChain.
>> --> elseif (graph has "food") --> foodChain --> if (graph has "pizza")-->
>> pizzaChain.
>> --> otherwise() --> otherStuffChain.
>
> I am not entirely sure this use case is worth the configuration
> complexity that will be induced, and I am also not sure the Enhancer
> jobmanager should handle this kind of semantic reasoning at its level.
> Why would the engines themselves not be able to handle that directly?
> The first engine could be a topic extractor, and then the following engines
> in the chain only process the content items if they find that the
> previously extracted metadata suits their own configuration and
> behaviors.
>
> Debugging chain routing issues as a REST client developer who has no
> idea how to debug Java code will be hard. I prefer to have explicit
> linear chain configurations with the explicit ordered list of engine ids
> in a direct OSGi configuration.
>
> However, that does not prevent us from exposing Stanbol engines and chains
> as Camel Endpoints [1] for people who would like to benefit from Camel's
> wide support for various messaging systems (i.e. as an ETL).
>
> [1]
> https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/main/java/org/apache/camel/Endpoint.java
>
> However, I don't think the default deployment of Stanbol Enhancer
> should force the administrator to understand the generic concept model
> and configuration format of Apache Camel just to chain three Stanbol
> engines by ids.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel

--
Fabian
http://twitter.com/fctwitt
