On Fri, Sep 9, 2011 at 11:45 AM, Rupert Westenthaler
<[email protected]> wrote:
> On 09.09.2011 at 10:18, Reto Bachmann-Gmür <[email protected]> wrote:
>
>> On Thu, Sep 8, 2011 at 11:33 AM, Rupert Westenthaler
>> <[email protected]> wrote:
>>>
>>> On 07.09.2011 at 10:10, Reto Bachmann-Gmür <[email protected]> wrote:
>>>
>>>> Hi Aldo,
>>>>
>>>> The functionality the enhancer engines currently offer is without
>>>> doubt useful, especially when dealing with content from different
>>>> domains and from different sources.
>>>>
>>>> Integrating the enhancer engines in CQ5 I noticed that one use case for
>>>> them is simply to associate content items with a category out of a small
>>>> set of predefined ones. For this use case it seems quite
>>>> inefficient to get all concepts just to see if one happens to match a
>>>> local category/concept. It is also frustrating if the system keeps
>>>> repeating the same error ("no, our press releases have nothing to do
>>>> with physics even if our spokesperson's name is Einstein").
>>>>
>>> Using NamedEntities for the categorization of Documents is not very
>>> efficient.
>>
>> Yet, even for enrichment we have the Einstein-problem.
>>
>>> For doing this the Topic Classification Engine Olivier is currently
>>> working
>>> on should be much better suited.
>>
>> I can't find any information about this. Is the Enricher engine for
>> adding annotations to parts of the document and the topic
>> classification engine for tagging a text as a whole?
>>
>
> Yes. Olivier showed it in Paris. I think he is still working on it.
> AFAIK it is currently not in the SVN.
>
>>
>>>
>>>
>>>> My proposal is to extend the Engine API so that clients may give
>>>> feedback on which Enhancements were of actual use and the engines may
>>>> use this information. This would allow an engine delegating to other
>>>> engines to weigh those engines based on their success rate. It would
>>>> also allow engines based on other trainable text classification
>>>> algorithms such as naive Bayes.
>>>>
>>> This would be an interesting feature; however, I cannot clearly see how
>>> it could work out (especially in a stateless fashion).
>>
>> Well, the learning outcome should be instance-wide and not just limited
>> to a user "session". Of course, if the engine doesn't store the
>> content and the content is not associated with a URI, it has to be
>> resubmitted with the feedback, i.e. in the simplest case one submits a
>> text and a set of categories this text isn't related to.
>>>
>>>
>>>> My comment related to Wernher's remark on having useful, limited
>>>> content understanding for customers in niche businesses. If I understand
>>>> things correctly, the proposal to understand verbs is a proposal for a
>>>> new enhancer engine but wouldn't require a change of the API (the
>>>> interface org.apache.stanbol.enhancer.servicesapi.EnhancementEngine),
>>>
>>> This is true. The API will not need to be extended. However this Engine
>>> will use some new Annotation types that are currently not part of the
>>> Enhancement Structure.
>>>
>>>> My impression is that, to provide useful content classification for
>>>> typical CMS customers, our API should support trainable engines, and for
>>>> this we should extend the API.
>>>>
>>> Maybe add a Trainable interface that can be optionally implemented by
>>> an EnhancementEngine. It would then be possible to send feedback to those
>>> engines.
>>
>> Yes, I considered this. However, it would require a client to check
>> the type and distinguish between instances of EnhancementEngine and
>> TrainableEnhancementEngine. On the other hand, if all engines are
>> trainable, an implementation that doesn't support training can just
>> provide an empty implementation of the new method. I think having only
>> one interface is preferable, to keep the API and client implementations
>> simpler.
>
> My intent was to add a RESTful service that allows clients to send
> feedback to the Stanbol Enhancer. The Enhancer would then forward feedback
> only to Engines that also implement the TrainableEngine interface.
> Clients do not interact with single EnhancementEngines anyway. The
> JobManager takes care of that.
True, so it doesn't make a big difference. I don't see why
EnhancementJobManager doesn't extend EnhancementEngine; it seems that
for many clients it makes no difference whether they interact with a
delegating JobManager or a single EnhancementEngine.

>
> Having a separate interface would also allow registering components that
> are only interested in feedback (e.g. a component that manages a
> controlled vocabulary of all Entities used by Users and may even adapt
> the ranks by the number of usages).
> Such a component might not be specific to a particular EnhancementEngine,
> but to a particular type of Feedback provided by Users (e.g. a confirmed
> Suggestion).
I wasn't suggesting to implement something for a specific
EnhancementEngine but to enhance the API to allow the kind of feedback
needed for training an Engine or for training the JobManager (causing
the delegates to be weighted). I don't mind having this in an
interface extending EnhancementEngine, but having a completely separate
interface seems to make it less clear what the interface is about. I
don't see the use case for training something that doesn't also, at
least potentially, deliver results, so I'd recommend implementing
something easy to understand rather than choosing the most generic
approach.
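
To make the idea concrete, here is a rough sketch of what I have in mind.
It is only an illustration under assumptions: Engine is a simplified
stand-in for EnhancementEngine that works on plain strings rather than
ContentItems, and TrainableEngine, feedback(), KeywordEngine and
WeightedJobManager are hypothetical names, not part of the current API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for EnhancementEngine (assumption: strings, not ContentItems).
interface Engine {
    List<String> enhance(String text);
}

// Hypothetical extension: clients report categories that were of no use.
interface TrainableEngine extends Engine {
    void feedback(String text, List<String> rejectedCategories);
}

// Toy engine: suggests one category whenever its keyword occurs in the text.
class KeywordEngine implements Engine {
    private final String keyword;
    private final String category;

    KeywordEngine(String keyword, String category) {
        this.keyword = keyword;
        this.category = category;
    }

    @Override
    public List<String> enhance(String text) {
        List<String> result = new ArrayList<>();
        if (text.contains(keyword)) {
            result.add(category);
        }
        return result;
    }
}

// Delegating manager that is itself an engine and can be trained:
// delegates whose suggestions were rejected more often are consulted later.
class WeightedJobManager implements TrainableEngine {
    private final List<Engine> delegates;
    private final Map<Engine, Integer> rejections = new HashMap<>();

    WeightedJobManager(List<Engine> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public List<String> enhance(String text) {
        List<Engine> ordered = new ArrayList<>(delegates);
        // Stable sort: delegates with equal rejection counts keep their order.
        ordered.sort(Comparator.comparingInt((Engine e) -> rejections.getOrDefault(e, 0)));
        List<String> result = new ArrayList<>();
        for (Engine engine : ordered) {
            for (String category : engine.enhance(text)) {
                if (!result.contains(category)) {
                    result.add(category);
                }
            }
        }
        return result;
    }

    @Override
    public void feedback(String text, List<String> rejectedCategories) {
        for (Engine engine : delegates) {
            // The text is resubmitted because nothing is stored between calls.
            for (String category : engine.enhance(text)) {
                if (rejectedCategories.contains(category)) {
                    rejections.merge(engine, 1, Integer::sum);
                }
            }
            // Pass the feedback on to delegates that can learn from it.
            if (engine instanceof TrainableEngine) {
                ((TrainableEngine) engine).feedback(text, rejectedCategories);
            }
        }
    }
}
```

A client would call enhance() on the manager exactly as on a single engine,
and after a rejection call feedback() with the same text and the unwanted
categories; a non-trainable engine needs no change at all.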

Cheers,
Reto
