Hi Aleksander,

This was a typo on my end; the original query included a colon instead of an equals sign. I think the real problem is that my field is not stored and is not defined with termVectors="true". I'm recreating the index now and will see if this fixes the problem.

Best,
Patrick

-----Original Message-----
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 26 November 2008 14:37
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

Hi there!
Well, first of all, I think you have an error in your query, if I'm not mistaken.
You say http://localhost:8080/solr/select/?q=id=18477975...
but since you are referring to the field called "id", you must say:
http://localhost:8080/solr/select/?q=id:18477975...
(use a colon instead of the equals sign).
I think that will do the trick. If not, try adding &debugQuery=on at the end of your request URL to see debug output on how the query is parsed and if/how any documents are matched against your query.
Hope this helps.

Cheers,
Aleksander

On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick <[EMAIL PROTECTED]> wrote:

> Hi Aleksander,
>
> Thanks for clearing this up. I am confident that this is an avenue
> worth exploring for me, as I'm just starting to grasp the matter. Do you
> know why I'm not getting any results with the query posted earlier, then?
> It gives me only the following:
>
> <lst name="moreLikeThis">
>   <result name="18477975" numFound="0" start="0"/>
> </lst>
>
> instead of delivering details of the interestingTerms.
>
> Thanks in advance
>
> Patrick
>
>
> -----Original Message-----
> From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 26 November 2008 13:03
> To: solr-user@lucene.apache.org
> Subject: Re: Keyword extraction
>
> I do not agree with you at all. The concept of MoreLikeThis is based
> on the fundamental idea of TF-IDF weighting, and not term frequency alone.
> Please take a look at:
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html
> As you can see, it is possible to use cut-off thresholds to
> significantly reduce the number of unimportant terms and generate
> highly suitable queries based on the tf-idf weight of each term. As you
> point out, high-frequency terms alone tend to be useless for querying,
> but taking the document frequency into account drastically increases
> the importance of the term!
>
> In Solr, use parameters to manipulate your desired results:
> http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c
> For instance:
> mlt.mintf - Minimum Term Frequency - the frequency below which terms
> will be ignored in the source doc.
> mlt.mindf - Minimum Document Frequency - the frequency at which words
> will be ignored which do not occur in at least this many docs.
> You can also set thresholds for term length etc.
>
> Hope this gives you a better idea of things.
> - Aleks
>
> On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]>
> wrote:
>
>> Dear Patrick, I had the same problem with the MoreLikeThis function.
>>
>> After briefly reading and analyzing the source code of the moreLikeThis
>> function in Solr, I concluded:
>>
>> MoreLikeThis uses term vectors to rank all the terms of a document
>> by their frequency. According to this ranking, it will start to generate
>> queries, artificially, and search for documents.
>>
>> So, moreLikeThis will retrieve related documents by artificially
>> generating queries based on the most frequent terms.
>>
>> There's a big problem with the "most frequent terms" of documents.
>> The most frequent words are usually meaningless: so-called function
>> words or, as people in Information Retrieval like to call them, stopwords.
>> However, even ignoring the technical problems with the implementation
>> of the moreLikeThis function, this approach is very dangerous, since
>> queries are generated artificially based on a given document.
>> Writing queries for retrieving a document is a human task, and it
>> assumes some knowledge (the user knows what document he wants).
>>
>> I advise using other approaches, depending on your expectations. For
>> example, you can extract similar documents just by searching for
>> documents with a similar title (more like this doesn't work in this case).
>>
>> I hope it helps,
>> Best Regards,
>> Vitalie Scurtu
>> --- On Wed, 11/26/08, Plaatje, Patrick <[EMAIL PROTECTED]> wrote:
>> From: Plaatje, Patrick <[EMAIL PROTECTED]>
>> Subject: RE: Keyword extraction
>> To: solr-user@lucene.apache.org
>> Date: Wednesday, November 26, 2008, 10:52 AM
>>
>> Hi All,
>> as an addition to my previous post, no interestingTerms are returned
>> when I execute the following URL:
>> http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true
>> I get a moreLikeThis list though, any thoughts?
>> Best,
>> Patrick
>>
>
> --
> Aleksander M. Stensby
> Senior software developer
> Integrasco A/S
> www.integrasco.no

--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no
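
For anyone finding this thread later: the request discussed above can be assembled programmatically, which avoids the id=/id: slip. The following is only a minimal sketch, not something posted by the participants. The helper name build_mlt_url and its arguments are hypothetical; the base URL, the field name "text" and the document id come from the messages above, and the Solr parameters are the ones named in the thread.

```python
from urllib.parse import urlencode

# Base URL as used in the thread; host, port and path are assumptions
# about the poster's setup and will differ elsewhere.
SOLR_SELECT = "http://localhost:8080/solr/select/"


def build_mlt_url(doc_id, field="text", min_tf=None, min_df=None, debug=False):
    """Hypothetical helper: build a MoreLikeThis request URL for one document."""
    params = [
        ("q", f"id:{doc_id}"),           # field:value takes a colon, not '='
        ("mlt", "true"),
        ("mlt.fl", field),               # field the similarity terms are drawn from
        ("mlt.interestingTerms", "list"),
        ("mlt.match.include", "true"),
    ]
    if min_tf is not None:
        params.append(("mlt.mintf", str(min_tf)))  # minimum term frequency
    if min_df is not None:
        params.append(("mlt.mindf", str(min_df)))  # minimum document frequency
    if debug:
        params.append(("debugQuery", "on"))        # show how the query is parsed
    # safe=":" keeps the colon in q=id:... readable instead of encoding it as %3A
    return SOLR_SELECT + "?" + urlencode(params, safe=":")


print(build_mlt_url("18477975", debug=True))
```

Passing min_tf or min_df adds the cut-off thresholds Aleksander describes. Whether interestingTerms actually comes back still depends on the schema (for example termVectors on the field) and on the request handler in use, which this sketch does not check.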