Hi Olivier, as I said, we have done some work related to automatic categorization.
In the meantime, I have collected some documentation that you may want to have a look at. We are willing to bring this method into Stanbol, either by reusing the software directly or by re-implementing the methods. The only thing we ask is that anything that comes out of it be open :) I am pretty sure that some services can be reused and integrated, so you're welcome to review it and to ask us any questions or for support. We are happy to discuss solutions that can be carried out collaboratively. If there is room for this in the hackathon, Alberto can join you.

At [1] you can find a diagram that describes the implemented workflow. Please note that the software addresses NER, terminology extraction and identity resolution, and relies on some of these processing steps to perform automatic categorization. We use the Alchemy API, which is commercial, but it is not a mandatory piece of the component: it can be replaced with Stanbol Enhancers. Of course, the performance of that step affects the overall performance. The use of identity resolution makes this approach slightly different from yours, but I still think we can find a good hybrid to improve performance.

[2] contains a description of the main functionalities and the methods implemented. You will notice that the index is obtained through customizable SPARQL queries; we see here a possible integration with the EntityHub. [3] is the javadoc.

Val

[1] http://wit.istc.cnr.it/API/WikiFierAPI/WikiFierFlowChart/WikiFierFlowChart_Page-1.html
[2] http://stlab.istc.cnr.it/stlab/STLabWikifier
[3] http://wit.istc.cnr.it/API/WikiFierAPI/javadoc/index.html

On Nov 18, 2011, at 4:53 PM, Olivier Grisel wrote:

> 2011/11/18 valentina presutti <[email protected]>:
>>>> On Tue, Nov 15, 2011 at 12:45 PM, Stefane Fermigier <[email protected]> wrote:
>>>>> Is online here:
>>>>>
>>>>>
>>>> http://www.slideshare.net/nuxeo/apache-stanbol-and-the-web-of-data-apachecon-2011
>>
>> Very nice presentation Olivier :)
>> please have a look at [1], it's a demo we showed some time ago.
>> It also makes some general classification based on Wikipedia categories.
>> I think there's some procedure we use for creating the index that can be
>> integrated in this component, it could possibly further improve the results.
>
> Interesting. Is there a technical description or the source code
> available online?
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
