Enhancement Engine for Wikipedia/DBpedia-based topic classification of text
content
-----------------------------------------------------------------------------------
Key: STANBOL-197
URL: https://issues.apache.org/jira/browse/STANBOL-197
Project: Stanbol
Issue Type: New Feature
Components: Enhancer, Entity Hub
Reporter: Olivier Grisel
Assignee: Olivier Grisel
Implementation plan:
Use MoreLikeThis queries on a SolrYard instance whose index holds one document
per topic, built by aggregating the abstract text of all DBpedia entities
categorized under a given SKOS topic.
Such an index can be built with the Pig scripts available at:
https://github.com/ogrisel/pignlproc/tree/master/examples/topic-corpus
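To illustrate the aggregation step only, here is a minimal in-memory sketch in
Java. It is not the Pig pipeline linked above; the EntityAbstract class, its
field names and the newline separator are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TopicCorpusSketch {

    // Hypothetical input record: one entity abstract and the topics it is categorized under.
    static final class EntityAbstract {
        final String entityUri;
        final String abstractText;
        final List<String> topics;

        EntityAbstract(String entityUri, String abstractText, List<String> topics) {
            this.entityUri = entityUri;
            this.abstractText = abstractText;
            this.topics = topics;
        }
    }

    // Returns one aggregated text per topic, i.e. the body of one index document per topic.
    static Map<String, String> aggregateByTopic(List<EntityAbstract> entities) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (EntityAbstract entity : entities) {
            for (String topic : entity.topics) {
                grouped.computeIfAbsent(topic, k -> new ArrayList<>()).add(entity.abstractText);
            }
        }
        Map<String, String> documents = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            // Abstracts are joined in input order, separated by newlines (an arbitrary choice).
            documents.put(entry.getKey(), String.join("\n", entry.getValue()));
        }
        return documents;
    }

    public static void main(String[] args) {
        List<EntityAbstract> entities = List.of(
            new EntityAbstract("e1", "Abstract one.", List.of("Category:A", "Category:B")),
            new EntityAbstract("e2", "Abstract two.", List.of("Category:A")));
        aggregateByTopic(entities).forEach((topic, text) ->
            System.out.println(topic + " -> " + text.replace("\n", " | ")));
    }
}
```

Each resulting topic/text pair would then be indexed as a single document, so
that a MoreLikeThis query on some input text ranks topics by textual similarity.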
To perform MoreLikeThis queries through the SolrJ API, do the following:
#1 - Define the MLT handler in solrconfig.xml (it was not defined in the example
solrconfig.xml I was using):
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
#2 - With SolrJ, access the MLT handler with something similar to the following:
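SolrQuery query = new SolrQuery();
// Route the request to the /mlt handler defined in step #1.
query.setQueryType("/" + MoreLikeThisParams.MLT);
// Do not return the matched source document among the results.
query.set(MoreLikeThisParams.MATCH_INCLUDE, false);
query.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
query.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
// Fields whose terms are used to compute similarity.
query.set(MoreLikeThisParams.SIMILARITY_FIELDS, "subject,body");
query.setQuery("Your query here or in my case the unique key field:value");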
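For reference, the SolrJ settings above map to plain request parameters on the
/mlt handler (mlt.fl, mlt.mindf, mlt.mintf, mlt.match.include). The sketch below
only builds the equivalent request URL using the JDK, without SolrJ; the base
URL and the example query value are hypothetical placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class MltRequestSketch {

    // Builds a MoreLikeThis request URL; baseUrl is a hypothetical Solr core URL.
    static String buildMltUrl(String baseUrl, String query, String similarityFields) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);                    // document selector, e.g. uniqueKeyField:value
        params.put("mlt.fl", similarityFields);    // MoreLikeThisParams.SIMILARITY_FIELDS
        params.put("mlt.mindf", "1");              // MoreLikeThisParams.MIN_DOC_FREQ
        params.put("mlt.mintf", "1");              // MoreLikeThisParams.MIN_TERM_FREQ
        params.put("mlt.match.include", "false");  // MoreLikeThisParams.MATCH_INCLUDE
        String queryString = params.entrySet().stream()
            .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return baseUrl + "/mlt?" + queryString;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        System.out.println(
            buildMltUrl("http://localhost:8983/solr", "id:example-topic", "subject,body"));
    }
}
```

Pasting such a URL into a browser is a quick way to check that the handler from
step #1 is registered before wiring the query into the engine.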