[ 
https://issues.apache.org/jira/browse/STANBOL-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Grisel reopened STANBOL-201:
------------------------------------


I am reopening this issue because the quality of the generated topic index is 
not good enough for accurate text classification (according to tests performed 
on a direct Solr instance).

Work is also under way in the pignlproc project to improve this by following a 
hierarchy of "interesting topics" so as to get rid of most of the noisy output. 
However, the ntriples serialization needs to be extended so that it can export 
the materialized category paths (from the root topics) for each topic in the 
index, which would make the classifier more efficient. This part is not 
implemented yet.
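
To make "materialized category paths" concrete, here is a minimal Python 
sketch. It is not the pignlproc implementation: the function name, the 
`broader` mapping (topic -> parent topics) and the sample category names are 
all hypothetical and only illustrate the idea of walking up from a topic until 
a root topic is reached.

```python
def materialize_paths(topic, broader, roots, _seen=frozenset()):
    """Return every path (root first) leading from a root topic to `topic`.

    `broader` maps a topic to its parent topics, `roots` is the set of
    "interesting" root topics. Both structures are assumptions made for
    this sketch.
    """
    if topic in _seen:
        return []          # category graphs can contain cycles: stop here
    if topic in roots:
        return [[topic]]   # reached a root: the path starts here
    paths = []
    for parent in sorted(broader.get(topic, ())):
        for path in materialize_paths(parent, broader, roots, _seen | {topic}):
            paths.append(path + [topic])
    return paths


# Hypothetical category graph, including a cycle (American_music -> Jazz).
broader = {
    "Jazz": ["Music_genres", "American_music"],
    "Music_genres": ["Music"],
    "American_music": ["Music", "Jazz"],
    "Music": ["Arts"],
}
roots = {"Arts"}

jazz_paths = ["/".join(p) for p in materialize_paths("Jazz", broader, roots)]
for path in jazz_paths:
    print(path)
```

Each resulting path string could then be exported as one extra value per 
topic, so that the classifier can match on any ancestor without having to 
walk the hierarchy at query time.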

> Integrate pignlproc output (TSV or other format) with the Stanbol indexing 
> tools
> ---------------------------------------------------------------------------------
>
>                 Key: STANBOL-201
>                 URL: https://issues.apache.org/jira/browse/STANBOL-201
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Enhancer, Entity Hub
>            Reporter: Olivier Grisel
>            Assignee: Olivier Grisel
>
> Either make pignlproc able to output ntriples or extend the Stanbol indexing 
> tools to be able to index data expressed in a TSV format (e.g. using the Solr 
> UpdateCSV handler, which is probably well optimized and does not require 
> loading the data into a temporary TDB store).
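
As an illustration of the first option quoted above (emitting ntriples from 
TSV), here is a small Python sketch. The column layout (topic URI, label, then 
zero or more parent topic URIs) is an assumption and not the actual pignlproc 
output format, and the choice of rdfs:label and skos:broader as predicates is 
likewise only a guess at a reasonable mapping. It also assumes a consumer that 
accepts UTF-8 encoded N-Triples.

```python
import csv
import io

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def escape_literal(text):
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t"))


def tsv_to_ntriples(tsv_file):
    """Yield one N-Triples line per label or parent found in the TSV input.

    Assumed (hypothetical) columns: topic URI, label, parent URI(s).
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        uri, label = row[0], row[1]
        yield '<%s> <%s> "%s" .' % (uri, RDFS_LABEL, escape_literal(label))
        for parent in row[2:]:
            if parent:
                yield '<%s> <%s> <%s> .' % (uri, SKOS_BROADER, parent)


# Hypothetical sample row: URI, label containing quotes, one parent URI.
sample = ('http://example.org/topic/Jazz\tJazz "music"\t'
          'http://example.org/topic/Music\n')
triples = list(tsv_to_ntriples(io.StringIO(sample)))
for line in triples:
    print(line)
```

The second option (posting the TSV directly to Solr's CSV update handler) 
would avoid this conversion step altogether, at the cost of bypassing the 
ntriples-based indexing tools.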

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
