Enhancement Engine for Wikipedia/DBpedia-based topic classification of text 
content
-----------------------------------------------------------------------------------

                 Key: STANBOL-197
                 URL: https://issues.apache.org/jira/browse/STANBOL-197
             Project: Stanbol
          Issue Type: New Feature
          Components: Enhancer, Entity Hub
            Reporter: Olivier Grisel
            Assignee: Olivier Grisel


Implementation plan:

Use MoreLikeThis queries on a SolrYard instance in which each topic is indexed
by aggregating the abstract text of all entities categorized under a given
SKOS topic from DBpedia.

Such an index can be constructed using the Pig scripts available at:
https://github.com/ogrisel/pignlproc/tree/master/examples/topic-corpus
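
To make the aggregation step concrete, here is a minimal in-memory sketch in
Java. It is not the Pig pipeline linked above: the class name, the
EntityAbstract type and the sample topic identifiers are hypothetical, and a
real run would stream the DBpedia dumps rather than hold them in a list. It
only shows the idea of building one text document per topic from the
abstracts of the entities categorized under it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopicAggregationSketch {

    /** One (topic, abstract) pair for an entity; a hypothetical helper type. */
    static final class EntityAbstract {
        final String topic;
        final String abstractText;

        EntityAbstract(String topic, String abstractText) {
            this.topic = topic;
            this.abstractText = abstractText;
        }
    }

    /**
     * Concatenates the abstracts of all entities that share a topic,
     * producing one text per topic (newline-separated, in input order).
     */
    static Map<String, String> aggregateByTopic(List<EntityAbstract> rows) {
        Map<String, StringBuilder> buffers = new LinkedHashMap<>();
        for (EntityAbstract row : rows) {
            StringBuilder sb = buffers.get(row.topic);
            if (sb == null) {
                sb = new StringBuilder();
                buffers.put(row.topic, sb);
            } else {
                sb.append('\n');
            }
            sb.append(row.abstractText);
        }
        Map<String, String> docs = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : buffers.entrySet()) {
            docs.put(e.getKey(), e.getValue().toString());
        }
        return docs;
    }

    public static void main(String[] args) {
        // Made-up sample rows; real input would come from DBpedia.
        List<EntityAbstract> rows = Arrays.asList(
                new EntityAbstract("Category:Physics", "Abstract of entity A."),
                new EntityAbstract("Category:Biology", "Abstract of entity B."),
                new EntityAbstract("Category:Physics", "Abstract of entity C."));
        for (Map.Entry<String, String> e : aggregateByTopic(rows).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().replace('\n', ' '));
        }
    }
}
```

Each resulting text would then be indexed as one Solr document whose unique
key is the topic identifier.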

To perform MoreLikeThis queries through the SolrJ API, do the following:

#1 - Define the MLT handler in solrconfig.xml (it is not defined in the example
solrconfig.xml I was using):

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />

#2 - With SolrJ, access the MLT handler via something similar to the following:

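import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.params.MoreLikeThisParams;

SolrQuery query = new SolrQuery();
// Route the request to the "/mlt" handler registered in step #1.
query.setQueryType("/" + MoreLikeThisParams.MLT);
// Leave the matched source document itself out of the results.
query.set(MoreLikeThisParams.MATCH_INCLUDE, false);
query.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
query.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
query.set(MoreLikeThisParams.SIMILARITY_FIELDS, "subject,body");
query.setQuery("Your query here or in my case the unique key field:value");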
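
For debugging it can help to see which HTTP request these settings amount to.
The sketch below uses only the JDK to build the equivalent URL from the
standard Solr parameter names (mlt.fl, mlt.mindf, mlt.mintf,
mlt.match.include). The class name, the base URL and the id:42 query are
assumptions chosen for illustration, not values from this issue.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class MltRequestSketch {

    /** Builds a GET URL for the "/mlt" handler with the same settings as step #2. */
    static String buildMltUrl(String solrBaseUrl, String query, String similarityFields) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);                    // selects the source document
        params.put("mlt.fl", similarityFields);    // SIMILARITY_FIELDS
        params.put("mlt.mindf", "1");              // MIN_DOC_FREQ
        params.put("mlt.mintf", "1");              // MIN_TERM_FREQ
        params.put("mlt.match.include", "false");  // MATCH_INCLUDE

        StringBuilder url = new StringBuilder(solrBaseUrl).append("/mlt?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical local Solr instance and unique-key query.
        System.out.println(buildMltUrl("http://localhost:8983/solr", "id:42", "subject,body"));
    }
}
```

Pasting the printed URL into a browser or curl is a quick way to check that
the handler from step #1 is registered before wiring up the SolrJ client.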
