RE: Keyword extraction

Scurtu Vitalie Wed, 26 Nov 2008 03:39:18 -0800

Dear Patrick, I had the same problem with the MoreLikeThis function.

After briefly reading and analyzing the source code of the MoreLikeThis
function in Solr, I concluded the following:


MoreLikeThis uses term vectors to rank all the terms in a document
by frequency. Based on that ranking, it generates queries
artificially and searches for documents.

So, MoreLikeThis retrieves related documents by artificially generating
queries based on the most frequent terms.
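
To make the idea concrete, here is a minimal Python sketch of "rank terms by
frequency, then build a query from the top ones". It only illustrates the
behaviour described above and is not Solr's actual code; the helper names
top_terms and build_query and the field name "text" are assumptions made for
the example.

```python
import re
from collections import Counter


def top_terms(text, n=5):
    """Return the n most frequent lowercase word tokens in text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [term for term, _ in Counter(tokens).most_common(n)]


def build_query(terms, field="text"):
    """Join the terms into a boolean OR query on a single field."""
    return " OR ".join(f"{field}:{term}" for term in terms)


# Toy document: "solr" occurs three times, "search" twice.
doc = "solr search solr index solr query search"
query = build_query(top_terms(doc, n=2))
print(query)  # text:solr OR text:search
```

The generated query depends entirely on raw counts, so whatever words dominate
the document will dominate the query as well.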

There is a big problem with taking the "most frequent terms" of a document.
The most frequent words are usually meaningless: they are the so-called
function words, or stopwords, as people in Information Retrieval like to call
them. Moreover, even setting aside the technical problems in the
implementation of MoreLikeThis, this approach is very dangerous, since the
queries are generated artificially from a given document. Writing queries to
retrieve a document is a human task, and it assumes some knowledge (the user
knows what document he wants).
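
The stopword effect is easy to see on a toy example. The Python sketch below
counts words in a short text with and without a stopword filter; the tiny
STOPWORDS set and the helper most_frequent are assumptions for illustration
only, not the stopword list shipped with Solr.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not Solr's stopwords.txt).
STOPWORDS = frozenset({"the", "of", "a", "is", "in", "and", "to"})


def most_frequent(text, n=3, stopwords=frozenset()):
    """Return the n most frequent words, skipping any word in stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return [term for term, _ in counts.most_common(n)]


doc = ("The index of the search server is the core of the system. "
       "The search server reads the index.")

raw = most_frequent(doc)                            # "the" comes first
filtered = most_frequent(doc, stopwords=STOPWORDS)  # content words only
print(raw)
print(filtered)
```

Without the filter, the top of the ranking is led by a function word that says
nothing about the topic of the document.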

I advise using other approaches, depending on your expectations. For example,
you can find similar documents just by searching for documents with a similar
title (MoreLikeThis does not work in this case).
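
A title search of this kind could be sketched as follows in Python. This is
only a sketch under assumptions: the field name "title" and the helper
title_query_url are hypothetical, and the base URL is simply the one used
elsewhere in this thread.

```python
import re
from urllib.parse import urlencode

# Select URL as it appears in this thread; adjust for your own setup.
SOLR_SELECT = "http://localhost:8080/solr/select/"


def title_query_url(title, field="title", rows=10):
    """Build a select URL that ORs together the words of a title."""
    words = re.findall(r"\w+", title.lower())
    q = " OR ".join(f"{field}:{w}" for w in words)
    # urlencode takes care of escaping ":" and spaces in the query string.
    return SOLR_SELECT + "?" + urlencode({"q": q, "rows": rows})


url = title_query_url("Keyword extraction")
print(url)
```

Here the query is built from the title of the source document, which a human
wrote, rather than from term counts.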

I hope it helps,
Best Regards,
Vitalie Scurtu
--- On Wed, 11/26/08, Plaatje, Patrick <[EMAIL PROTECTED]> wrote:
From: Plaatje, Patrick <[EMAIL PROTECTED]>
Subject: RE:  Keyword extraction
To: solr-user@lucene.apache.org
Date: Wednesday, November 26, 2008, 10:52 AM

Hi All,
 
as an addition to my previous post, no interestingTerms are returned
when I execute the following URL:
 
http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true
 
I get a moreLikeThis list though, any thoughts?
 
Best,
 
Patrick
