Re: One Content Item, many representations

Rupert Westenthaler Thu, 20 Oct 2011 05:36:15 -0700

On 20.10.2011, at 14:04, Florent André wrote:
> 
> 
> On 10/20/2011 01:31 PM, Rupert Westenthaler wrote:
>> Hi
>> 
>>> On 10/20/2011 10:35 AM, florent andré wrote:
>>>> With a Camel route you have a built-in splitter [2] and, as its
>>>> counterpart, an aggregator [3].
>>>> 
>>>> For both you can define particular split/aggregate business logic.
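The split/aggregate idea can be shown without Camel in a stdlib-only Java sketch: each part of a mail is enhanced on its own, and the per-part results are merged back into one result. `MailPart`, `SplitAggregate` and the enhancer function are hypothetical stand-ins, not Camel's Splitter/Aggregator API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical model of one part of a multipart mail (not a Camel or Stanbol type).
record MailPart(String mimeType, String content) {}

final class SplitAggregate {

    private SplitAggregate() {}

    // Splitter: the mail is assumed to be already parsed into parts, so each
    // list element is handed to the enhancer on its own.
    // Aggregator: per-part results are collected into one map keyed by MIME
    // type, in the order the parts appeared; results for a repeated MIME type
    // are concatenated.
    static Map<String, List<String>> splitEnhanceAggregate(
            List<MailPart> mail, Function<MailPart, List<String>> enhancer) {
        return mail.stream()
                .collect(Collectors.toMap(
                        MailPart::mimeType,
                        enhancer,
                        (first, second) -> {
                            List<String> merged = new ArrayList<>(first);
                            merged.addAll(second);
                            return merged;
                        },
                        LinkedHashMap::new));
    }
}
```

In Camel itself, splitting and merging would be separate Splitter and Aggregator steps with their own correlation and completion settings. The sketch collapses them into one call only to keep the idea visible.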
>> 
>> So you use this to send the different parts of an email to different
>> Stanbol Instances and after that you merge the enhancement results
>> together?
> 
> The point is that I added the Camel framework *inside* Stanbol - as an 
> implementation of JobManager.
> So all EIP routing capabilities are available inside Stanbol, as a process 
> chain endpoint (e.g. engines/chain1, engines/chain2, ...).
> 
> For example, you can build a chain like this:
> from("direct://chain2").to("org.apache.stanbol.engine.MyEngine1").to("org.apache.stanbol.engine.MyEngine2");
> ==> the classic CI output is produced
> 
> Or have Stanbol poll information from one of Camel's many components [1] and 
> output the result the same way.
> e.g.:
> from("imap://imap.my.mail?username=toto&password=tata").to("org.apache.stanbol.engine.MyEngine1").to("org.apache.stanbol.engine.MyEngine2").to("http://mySite/addContent");
> ==> pick up information from IMAP, process it, and send the result to 
> the CMS via an HTTP request.
>
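A route like `from("direct://chain2").to(MyEngine1).to(MyEngine2)` is a pipes-and-filters chain: each engine works on what the previous one produced. The following stdlib-only sketch shows that idea. It assumes a hypothetical `Engine` interface that adds enhancements to a simplified content item; this is neither the Stanbol `EnhancementEngine` API nor Camel's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified content item: the text plus the enhancements found so far.
final class SimpleContentItem {
    final String text;
    final List<String> enhancements = new ArrayList<>();

    SimpleContentItem(String text) {
        this.text = text;
    }
}

// Hypothetical engine contract: each engine adds its own enhancements to the item.
interface Engine {
    void enhance(SimpleContentItem ci);
}

// A chain runs its engines in order, like a Camel route with several .to(...) steps.
final class Chain {
    private final List<Engine> engines;

    Chain(List<Engine> engines) {
        this.engines = List.copyOf(engines);
    }

    SimpleContentItem process(SimpleContentItem ci) {
        for (Engine engine : engines) {
            engine.enhance(ci);
        }
        return ci;
    }
}
```

Because the second engine runs after the first, it already sees the first engine's enhancements. That ordering is what makes the position of an engine inside a chain meaningful.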


That looks really great, especially for users who want to use Stanbol to 
enhance content from different enterprise information sources (e.g. mail, CMS, 
RSS feeds, ...).

Is there a UI or scripting language to configure such workflows, or do you need to 
write them in Java?


> [1] http://camel.apache.org/components.html
> 
>> 
>> On Thu, Oct 20, 2011 at 11:08 AM, florent andré
>> <[email protected]>  wrote:
>>> Maybe this one: http://www.semanticdesktop.org/ontologies/nmo/
>>> What do you think about that? Are there others more suitable?
>> 
>> In the case of E-Mails the semanticdesktop NMO ontology looks ok.
>> 
>> I think that the decision on how to model relations between
>> ContentItems should be up to the Stanbol user. Stanbol returns an RDF
>> graph that connects all enhancements to the ContentItems they are
>> extracted from. Users can then use any ontology they like to link
>> such ContentItems together (e.g. in the business logic of the
>> aggregator).
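Following the NMO suggestion, here is a sketch of the extra triples an aggregator could emit to link a mail ContentItem to its attachment and to its two body versions. It writes N-Triples strings with no RDF library. The NMO namespace and the `hasAttachment`, `plainTextMessageContent` and `htmlMessageContent` property names are taken from the NMO specification as best recalled and should be double-checked there. The `urn:ci:...` URIs are made up for the example.

```java
import java.util.List;

final class MailLinks {

    private MailLinks() {}

    // Assumed NMO namespace (verify against the NMO specification).
    static final String NMO = "http://www.semanticdesktop.org/ontologies/2007/03/22/nmo#";

    static String iri(String value) {
        return "<" + value + ">";
    }

    // Minimal N-Triples literal escaping: backslash, double quote and line breaks only.
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r") + "\"";
    }

    static String triple(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }

    // Link the mail item to its attachment item and record both body versions.
    static List<String> linkMail(String mailUri, String attachmentUri, String plain, String html) {
        return List.of(
                triple(iri(mailUri), iri(NMO + "hasAttachment"), iri(attachmentUri)),
                triple(iri(mailUri), iri(NMO + "plainTextMessageContent"), literal(plain)),
                triple(iri(mailUri), iri(NMO + "htmlMessageContent"), literal(html)));
    }
}
```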
> 
> The thing is that in this case, the merging can happen inside Stanbol...
> And IMO, since Stanbol offers a way to store ContentItems, it would be cool if
> Stanbol also offered a way to link these CIs when suitable.
> 
>> 
>> 
>> Also note that this is related to the following two topics:
>> 
>> 1. Content Adapter Pattern (the user sends a PDF; an Enhancement Engine
>> asks the ContentAdapter for the text version of the PDF). The
>> ContentAdapter could not only support the conversion of Format A to
>> Format B but also - as in the case of e-mails - know that there is
>> already a text AND an HTML version.
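The ContentAdapter idea can be sketched as a hypothetical class. It first returns a representation that already exists (as with the text and HTML bodies of a mail) and only falls back to a registered converter (such as PDF to text) when no such representation exists. Class and method names are invented for illustration and are not an existing Stanbol API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical ContentAdapter as described above (not an existing Stanbol API).
final class ContentAdapter {

    // Representations that already exist, e.g. the text/plain and text/html
    // bodies of a mail. LinkedHashMap keeps the lookup order predictable.
    private final Map<String, String> representations = new LinkedHashMap<>();

    // Converters keyed by "source->target" MIME type, e.g. "application/pdf->text/plain".
    private final Map<String, Function<String, String>> converters = new LinkedHashMap<>();

    void addRepresentation(String mimeType, String content) {
        representations.put(mimeType, content);
    }

    void addConverter(String from, String to, Function<String, String> converter) {
        converters.put(from + "->" + to, converter);
    }

    // Return an existing representation if there is one; otherwise convert the
    // first stored representation for which a suitable converter is registered.
    Optional<String> get(String mimeType) {
        String existing = representations.get(mimeType);
        if (existing != null) {
            return Optional.of(existing);
        }
        for (Map.Entry<String, String> entry : representations.entrySet()) {
            Function<String, String> converter = converters.get(entry.getKey() + "->" + mimeType);
            if (converter != null) {
                return Optional.ofNullable(converter.apply(entry.getValue()));
            }
        }
        return Optional.empty();
    }
}
```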
> 
> Yep, that's a point I have to look into more in Camel...
> They have the type converter mechanism [2], which could also answer the
> question: how do we convert a CI to send it via CMIS?
> 
> [2] http://camel.apache.org/type-converter.html
> 
>> 
>> 2. Definition of the Stanbol Enhancement Structure (see STANBOL-351)
> 
> that's a hot topic!
> 
> ++
> 
>> [1]. Here one could argue that Stanbol should support parent/child
>> relations between ContentItems.
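If Stanbol supported parent/child relations between ContentItems, a mail could be one parent item with its bodies and attachments as children. The following is a hypothetical composite structure illustrating that idea. It is not the Stanbol `ContentItem` interface, and all names are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parent/child ContentItem (not the Stanbol ContentItem interface):
// a mail is the parent; its bodies and attachments are children.
final class CompositeItem {
    final String uri;
    final String mimeType;
    final CompositeItem parent; // null for a top-level item
    private final List<CompositeItem> children = new ArrayList<>();

    private CompositeItem(String uri, String mimeType, CompositeItem parent) {
        this.uri = uri;
        this.mimeType = mimeType;
        this.parent = parent;
    }

    // Create a top-level item, e.g. the mail itself.
    static CompositeItem of(String uri, String mimeType) {
        return new CompositeItem(uri, mimeType, null);
    }

    // Create a child item and attach it to this one.
    CompositeItem addChild(String uri, String mimeType) {
        CompositeItem child = new CompositeItem(uri, mimeType, this);
        children.add(child);
        return child;
    }

    List<CompositeItem> children() {
        return List.copyOf(children);
    }

    // Walk up the parent links, e.g. from an attachment to the mail it belongs to.
    CompositeItem top() {
        CompositeItem current = this;
        while (current.parent != null) {
            current = current.parent;
        }
        return current;
    }
}
```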
>> 
>> best
>> Rupert
> 
> 
> 
>> 
>>> ++
>>> 
>>>> 
>>>> 
>>>> This idea will then not be so hard to implement:
>>>>  >>  One could also add some additional triples that link the attachment with
>>>>  >>  the Mail and that the content of the Mail is available as a text and
>>>>  >>  html version.
>>>> 
>>>> Are there particular / recommended / standard types of triples to
>>>> describe that:
>>>> - the attachment graph is linked to the Mail graph, and
>>>> - the content is available as text and HTML?
>>>> 
>>>> Thanks.
>>>> 
>>>> [1] : http://camel.apache.org/enterprise-integration-patterns.html
>>>> [2] : http://camel.apache.org/splitter.html
>>>> [3] : http://camel.apache.org/aggregator2.html
>>>> 
>>>> On 10/20/2011 09:07 AM, Fabian Christ wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> if I remember correctly, we had the idea to allow different chains of
>>>>> enhancement engines to be configured under different URLs. Maybe
>>>>> Florent's use case is interesting for this. Florent could create an
>>>>> engine that is able to split the different content types and then
>>>>> start enhancement with different chains for each content type. If
>>>>> chains can call other chains, it would be possible to define such
>>>>> complex workflows for content enhancement.
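Fabian's idea of an engine that splits by content type and hands each part to a different chain, with chains able to call other chains, can be sketched as a content-based router over a registry of named chains. `Part`, `ChainRegistry` and the chain names are hypothetical. Nothing here is existing Stanbol or Camel API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical part of a composed content item (e.g. one MIME part of a mail).
record Part(String mimeType, String content) {}

// Hypothetical registry of named enhancement chains. A chain is modelled as a
// function from content to enhancement labels, so one chain can call another
// through the registry.
final class ChainRegistry {
    private final Map<String, Function<String, List<String>>> chains = new HashMap<>();
    private final Map<String, String> chainByMimeType = new HashMap<>();

    void register(String name, Function<String, List<String>> chain) {
        chains.put(name, chain);
    }

    void route(String mimeType, String chainName) {
        chainByMimeType.put(mimeType, chainName);
    }

    List<String> call(String chainName, String content) {
        Function<String, List<String>> chain = chains.get(chainName);
        if (chain == null) {
            throw new IllegalArgumentException("unknown chain: " + chainName);
        }
        return chain.apply(content);
    }

    // Content-based router: every part is sent to the chain configured for its
    // MIME type; parts with no configured chain are skipped.
    List<String> dispatch(List<Part> parts) {
        List<String> results = new ArrayList<>();
        for (Part part : parts) {
            String chainName = chainByMimeType.get(part.mimeType());
            if (chainName != null) {
                results.addAll(call(chainName, part.content()));
            }
        }
        return results;
    }
}
```

For example, an HTML chain could strip the markup and delegate to the text chain with `reg.call("chain-text", ...)`. This is the "chains calling chains" case.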
>>>>> 
>>>>> Best,
>>>>> - Fabian
>>>>> 
>>>>> 2011/10/19 Rupert Westenthaler<[email protected]>:
>>>>>> 
>>>>>> Hi florent
>>>>>> 
>>>>>> I would create two enhancement requests:
>>>>>> 
>>>>>> 1. for the Text and
>>>>>> 2. for the Attachment.
>>>>>> 
>>>>>> and then merge the returned RDF graphs with the enhancements. One
>>>>>> could also add some additional triples that link the attachment with
>>>>>> the Mail and that the content of the Mail is available as a text and
>>>>>> html version.
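The two-request approach amounts to a set union of the two returned graphs plus one linking triple. Here is a minimal sketch that uses a plain record instead of an RDF library such as Apache Jena. The `ex:` predicates and `urn:` URIs are placeholders, and `nmo:hasAttachment` is only one possible choice of linking predicate.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Minimal triple type for the sketch; a real client would use an RDF library.
record Triple(String s, String p, String o) {}

final class GraphMerge {

    private GraphMerge() {}

    // Merge the graphs returned by the two enhancement requests (set union, so a
    // triple present in both is kept once) and add one triple linking the mail
    // item to its attachment item.
    static Set<Triple> merge(Set<Triple> mailGraph, Set<Triple> attachmentGraph,
                             String mailUri, String attachmentUri, String linkPredicate) {
        Set<Triple> merged = new LinkedHashSet<>(mailGraph);
        merged.addAll(attachmentGraph);
        merged.add(new Triple(mailUri, linkPredicate, attachmentUri));
        return merged;
    }
}
```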
>>>>>> 
>>>>>> best
>>>>>> Rupert
>>>>>> 
>>>>>> On Wed, Oct 19, 2011 at 6:22 PM, florent andré
>>>>>> <[email protected]>  wrote:
>>>>>>> 
>>>>>>> Hi Stanbolers!
>>>>>>> 
>>>>>>> Imagine a classic HTML mail with an attachment.
>>>>>>> This mail is in fact composed of (at least) 3 parts:
>>>>>>> * text/plain mail body
>>>>>>> * HTML mail body
>>>>>>> * attachment.
>>>>>>> 
>>>>>>> One HTML mail + attachment can be considered as one CI - one piece of
>>>>>>> information/knowledge sent by a guy.
>>>>>>> 
>>>>>>> In fact, the plain text and HTML bodies will have (pretty much*) the
>>>>>>> same metadata, and keeping both is interesting:
>>>>>>> - plain text for processing and annotation positions
>>>>>>> - HTML to keep the source and be able to enhance the HTML with RDFa,
>>>>>>> links, ...
>>>>>>> 
>>>>>>> The attachment will mostly have different metadata, but this metadata
>>>>>>> is in a way related to the mail body's...
>>>>>>> 
>>>>>>> It could be damaging - IMO - to manage attachment and mail body
>>>>>>> metadata in a totally disconnected way (i.e. as two different Content
>>>>>>> Items).
>>>>>>> 
>>>>>>> Note that this use case also applies to CMS articles with files (PDF,
>>>>>>> ODT, ...) to download for further reading.
>>>>>>> 
>>>>>>> And now the real question:
>>>>>>> How can we nicely manage this kind of "composed things"?
>>>>>>> 
>>>>>>> Insights are very welcome! :)
>>>>>>> Have a good day
>>>>>>> ++
>>>>>>> 
>>>>>>> 
>>>>>>> * pretty much, because we can imagine being able to extract some more
>>>>>>> metadata from HTML (color, font size, RDFa, ...)
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> | Rupert Westenthaler [email protected]
>>>>>> | Bodenlehenstraße 11 ++43-699-11108907
>>>>>> | A-5500 Bischofshofen
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
>> 
>> 
>>
