Re: Feedback about stanbol-414 specification

Fabian Christ Tue, 17 Jan 2012 04:30:12 -0800

Hi,

I had similar thoughts to Olivier's. We have to find a balance between
a complex configuration and the needs of a "standard" Stanbol user.
For simple linear chains I don't see why we would want a user to learn
the Camel style. This should be straightforward to configure.


IMO the main use case is:

- Have different URIs for different chains
- At each chain URI you can configure N engines in a user-defined order
(optional: allow chains to be nested within other chains)

That's what I would start with, and then wait to see whether user
feedback on the mailing list brings up more complex scenarios.
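
To make the use case above concrete, here is a minimal sketch in Python
(illustration only, not Stanbol code). The chain URIs, the engine ids and
the "chain:" prefix used for nesting are all assumptions made up for this
example; none of them come from the STANBOL-414 draft.

```python
# Hypothetical sketch: URIs, engine ids and the "chain:" prefix are
# illustrative assumptions, not Stanbol configuration syntax.
CHAINS = {
    # chain URI -> engines in the order the user configured them
    "/enhancer/chain/default": ["langid", "ner", "dbpedia-linking"],
    # optional nesting: an entry may point at another chain
    "/enhancer/chain/music": ["chain:/enhancer/chain/default", "music-thesaurus"],
}


def resolve(uri, chains=CHAINS, _seen=()):
    """Return the flat, ordered list of engine ids for the chain at `uri`."""
    if uri in _seen:
        raise ValueError("cyclic chain nesting: " + " -> ".join(_seen + (uri,)))
    engines = []
    for entry in chains[uri]:
        if entry.startswith("chain:"):
            # Nested chain: splice in its engines at this position.
            nested = entry[len("chain:"):]
            engines.extend(resolve(nested, chains, _seen + (uri,)))
        else:
            engines.append(entry)
    return engines
```

Calling resolve("/enhancer/chain/music") flattens the nested chain into one
ordered list, so the user still only ever reasons about a linear sequence
of engines.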

Best,
 - Fabian

On 17 January 2012 12:08, Olivier Grisel <[email protected]> wrote:
> 2012/1/17 florent andré <[email protected]>:
>> Hi Rupert, *
>>
>> First, thanks a lot for this first draft definition.
>>
>> I really like the idea of an RDF graph description of "enhancement chain"
>> and "engine".
>>
>> Here are my points:
>>
>> °°°°°° Enterprise integration patterns (EIP) and Apache Camel °°°°°°
>>
>> My main remark is that the draft does not use a well-known and
>> well-defined pattern: the enterprise integration patterns [1].
>> Behind this "big name", it is all about transferring messages between
>> "processing units".
>> Camel is a very generic framework that implements most of the EIPs [2],
>> where messages and processing units can be almost anything.
>> Applied to Stanbol, we can consider the ContentItem as the message and
>> the Engines as processing units.
>> Cherry on the cake: Camel takes care not only of messages and processing
>> units but also of the machinery that orchestrates them (polling,
>> ordering, grouping, error management, ...), and provides pretty simple
>> ways to manage this.
>>
>> Let me stop my Camel "sales pitch" :), and just say that I will really
>> try to commit the first version of a Camel enhancer this week.
>>
>> By the way, as far as I know, Camel doesn't provide a graph-to-route
>> (route is Camel's term for chain) or route-to-graph utility... but there
>> are well-defined DSLs - Spring[1], Scala,... - so this can be a clue.
>>
>> °°°°°° Forward building of chain °°°°°°
>>
>> In your proposal, the chain is built in a "forward" manner:
>> you know that A comes before B because B depends on A (property
>> ep:dependsOn).
>>
>> I don't really like this way of defining a chain (but it may be mostly
>> my personal taste), for two main reasons:
>> - As a human, building, and even more so reading, understanding and
>> forming a mental representation of a chain built that way is pretty
>> difficult, and the difficulty increases with chain complexity. Forward
>> processing is not a natural way of thinking about chains.
>> - A chain is about processing data, information, messages, and usually
>> information comes from one point and goes to another... so IMO
>> describing a chain is more about describing the path of the message
>> than the inner structure of the chain.
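
For readers comparing the two styles, here is a rough Python sketch (not
Stanbol code) of what a dependency-based definition implies: the execution
order is not written down anywhere and has to be derived with a
topological sort. The engine ids are invented for illustration, and the
dict merely stands in for ep:dependsOn statements in the RDF graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical engine ids. Each key depends on the engines in its set,
# mirroring "B ep:dependsOn A" from the draft.
depends_on = {
    "ner": {"langid"},
    "linking": {"ner"},
}

# The order is implicit in the dependencies and must be computed.
derived_order = list(TopologicalSorter(depends_on).static_order())

# The "path of the message" style states the same order directly.
explicit_order = ["langid", "ner", "linking"]
```

For this simple linear case both variables hold the same list; the
difference is only in which form a human has to read and maintain.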
>>
>> °°°°°° Missing features °°°°°°
>>
>> IMO there are two main features missing from this definition:
>> 1) No way to link chains to each other ("chain linking")
>> 2) No way to select engines (or subchains) depending on a condition
>> ("selector")
>>
>> Let's illustrate these features with an example:
>>
>> Imagine we have these 4 chains already defined:
>> - MusicChain : defines a chain with music-specific engines (thesaurus, ws,
>> etc)
>> - FoodChain : defines a chain with food-specific engines
>> - PizzaChain : the best chain for pizza
>> - otherStuffChain : chain for the rest
>>
>> So far so good, but now I have content and no idea what it is about...
>> I can submit it to all chains (not optimal), or to one random chain (with
>> the risk of putting a restaurant story in the MusicChain)...
>>
>> So let's define a CategorisationChain.
>> This chain has, for example, the topic engine and a generic dbpedia enhancer.
>> At the end of the chain we have a graph that gives a pretty good idea of
>> the content's nature.
>>
>> Now, with the "chain linking" and "selector" features we can define an
>> "UltimateBigChain" like this:
>>
>> from(input_file) --> categorisationChain
>> --> if (graph has "music") --> musicChain.
>> --> elseif (graph has "food") --> foodChain --> if (graph has "pizza")-->
>> pizzaChain.
>> --> otherwise() --> otherStuffChain.
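
The routing above could be sketched roughly as follows in plain Python.
This is only an illustration of "chain linking" plus "selector", not Camel
or Stanbol code: the dict-based content item, the keyword-based
categorise() stand-in and the make_chain() helper are all assumptions
invented for this example.

```python
def categorise(item):
    """Stand-in for the CategorisationChain: record topics found in the text."""
    text = item["text"].lower()
    item["topics"] = {t for t in ("music", "food", "pizza") if t in text}
    return item


def make_chain(name):
    """Build a dummy chain that only records that it processed the item."""
    def chain(item):
        item.setdefault("visited", []).append(name)
        return item
    return chain


music_chain = make_chain("musicChain")
food_chain = make_chain("foodChain")
pizza_chain = make_chain("pizzaChain")
other_stuff_chain = make_chain("otherStuffChain")


def ultimate_big_chain(item):
    """Route the item to a sub-chain depending on what categorisation found."""
    item = categorise(item)
    if "music" in item["topics"]:
        return music_chain(item)
    if "food" in item["topics"]:
        item = food_chain(item)
        if "pizza" in item["topics"]:
            item = pizza_chain(item)
        return item
    return other_stuff_chain(item)
```

Even in this toy form the result depends on branching logic hidden inside
the router, which is the debugging concern raised in the reply that
follows.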
>
> I am not entirely sure this use case is worth the configuration
> complexity it would induce, and I am also not sure the Enhancer
> jobmanager should handle this kind of semantic reasoning at its level.
> Why would the engines themselves not be able to handle that directly?
> The first engine could be a topic extractor, and the following engines
> in the chain would then only process the content item if they find that
> the previously extracted metadata suits their own configuration and
> behavior.
>
> Debugging chain routing issues will be hard for a REST client developer
> who has no idea how to debug Java code. I prefer explicit linear chain
> configurations, with an explicitly ordered list of engine ids in a
> direct OSGi configuration.
>
> However, that does not prevent us from exposing Stanbol engines and
> chains as Camel Endpoints [1] for people who would like to benefit from
> Camel's wide support for various messaging systems (i.e. as an ETL).
>
>  [1] 
> https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/main/java/org/apache/camel/Endpoint.java
>
> However, I don't think the default deployment of Stanbol Enhancer
> should force the administrator to understand the generic concept model
> and configuration format of Apache Camel just to chain three Stanbol
> engines by id.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel



-- 
Fabian
http://twitter.com/fctwitt
