2011/11/23 valentina presutti <[email protected]>:
> Hi Olivier,
> as I said we have some work done related to automatic categorization.
> In the meantime, I have collected some documentation that you may want to
> have a look at.
> We are willing to bring this method into Stanbol either by reusing the
> software directly or by re-implementing the methods.
> The only thing we ask is that anything that comes out of it be open :)
> I am pretty sure that some services can be reused and integrated, hence
> you're welcome to review it and ask us for answers or support. We are
> happy to discuss solutions that can be carried out collaboratively. If there
> is space for this in the hackathon Alberto can join you.
> At [1] you can find a diagram that describes the workflow implemented.
> Please note that the software addresses NER, terminology extraction and
> identity resolution, and relies on some of these elaborations for performing
> automatic categorization.
> We use the Alchemy API, which is commercial, but this is not a mandatory piece
> of the component; it can be replaced with Stanbol Enhancers.
> Of course, the performance of this step impacts the overall performance.
> The exploitation of identity resolution makes this approach slightly
> different from yours, but still I think we can find a good hybrid for
> improving performance.
> [2] contains a description of the main functionalities and the methods
> implemented. You will notice that the index is obtained through customizable
> SPARQL queries; we see here a possible integration with the EntityHub.
> [3] is the javadoc.
> Val
> [1]
> http://wit.istc.cnr.it/API/WikiFierAPI/WikiFierFlowChart/WikiFierFlowChart_Page-1.html
> [2] http://stlab.istc.cnr.it/stlab/STLabWikifier
> [3] http://wit.istc.cnr.it/API/WikiFierAPI/javadoc/index.html
> On Nov 18, 2011, at 4:53 PM, Olivier Grisel wrote:

Ok thanks for the links. I think I will go on with my version first:
it might be a bit more complicated to build the initial Solr index but
it's probably much faster at classification time (1 single full-text
query, albeit a large one) vs. many steps involving sub-queries in the
system you describe.
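
To make the "1 single full-text query" idea concrete, here is a minimal
sketch of how such a query could be assembled. It is an illustration
only, not the actual implementation: the field name `text`, the term
selection by raw frequency, and the core URL are all assumptions. Only
the standard Solr select parameters (q, rows, fl, wt) are relied upon,
and nothing is sent over the network.

```python
import re
from collections import Counter
from urllib.parse import urlencode


def build_classification_query(text, field="text", max_terms=30, rows=5):
    """Build the parameters of one large OR query for a Solr select handler.

    The field name and the frequency-based term selection are
    hypothetical choices made for this sketch.
    """
    # Keep lowercase alphabetic tokens only, so no query escaping is needed.
    terms = re.findall(r"[a-z]{3,}", text.lower())
    # Use the most frequent terms of the document as the query terms.
    top_terms = [term for term, _ in Counter(terms).most_common(max_terms)]
    query = " OR ".join(f"{field}:{term}" for term in top_terms)
    # Each hit would be one candidate topic; the score ranks the candidates.
    return {"q": query, "rows": rows, "fl": "id,score", "wt": "json"}


def build_classification_url(base_url, text, **kwargs):
    """Return the full select URL (base_url is a hypothetical Solr core URL)."""
    return base_url + "/select?" + urlencode(build_classification_query(text, **kwargs))
```

The point of the design is that the cost at classification time is one
round trip to the index, whatever the number of topics.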

Once implemented it would be worth comparing the results on the same
dataset to make some qualitative evaluation of the output.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
