Unfortunately, as it stands, neither interestingTerms nor debugQuery explains why Solr chose the matches it did for moreLikeThis. There is currently a task in JIRA to try to add that information to debugQuery.
The ticket can be found here: https://issues.apache.org/jira/browse/SOLR-860

-Jeff

On 11/26/08 5:41 AM, "Plaatje, Patrick" <[EMAIL PROTECTED]> wrote:

> Hi Aleksander,
>
> This was a typo on my end; the original query included a colon instead of
> an equals sign. But I think it has to do with my field not being stored and
> not being identified as termVectors="true". I'm recreating the index now, and
> will see if this fixes the problem.
>
> Best,
>
> patrick
>
> -----Original Message-----
> From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 26 November 2008 14:37
> To: solr-user@lucene.apache.org
> Subject: Re: Keyword extraction
>
> Hi there!
> Well, first of all I think you have an error in your query, if I'm not
> mistaken.
> You say http://localhost:8080/solr/select/?q=id=18477975...
> but since you are referring to the field called "id", you must say:
> http://localhost:8080/solr/select/?q=id:18477975...
> (use a colon instead of the equals sign).
> I think that will do the trick.
> If not, try adding &debugQuery=on at the end of your request URL to see
> debug output on how the query is parsed and if/how any documents are matched
> against your query.
> Hope this helps.
>
> Cheers,
> Aleksander
>
> On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick
> <[EMAIL PROTECTED]> wrote:
>
>> Hi Aleksander,
>>
>> Thanks for clearing this up. I am confident that this is a way to
>> explore for me as I'm just starting to grasp the matter. Do you know
>> why I'm not getting any results with the query posted earlier then? It
>> gives me only the following:
>>
>> <lst name="moreLikeThis">
>>   <result name="18477975" numFound="0" start="0"/>
>> </lst>
>>
>> instead of delivering details of the interestingTerms.
>>
>> Thanks in advance
>>
>> Patrick
>>
>> -----Original Message-----
>> From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, 26 November 2008 13:03
>> To: solr-user@lucene.apache.org
>> Subject: Re: Keyword extraction
>>
>> I do not agree with you at all. The concept of MoreLikeThis is based
>> on the fundamental idea of TF-IDF weighting, and not term frequency alone.
>> Please take a look at:
>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html
>> As you can see, it is possible to use cut-off thresholds to
>> significantly reduce the number of unimportant terms, and generate
>> highly suitable queries based on the tf-idf frequency of the term,
>> since, as you point out, high frequency terms alone tend to be useless
>> for querying, but taking the document frequency into account
>> drastically increases the importance of the term!
>>
>> In Solr, use parameters to manipulate your desired results:
>> http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c
>> For instance:
>> mlt.mintf - Minimum Term Frequency - the frequency below which terms
>> will be ignored in the source doc.
>> mlt.mindf - Minimum Document Frequency - the frequency at which words
>> will be ignored which do not occur in at least this many docs.
>> You can also set thresholds for term length etc.
>>
>> Hope this gives you a better idea of things.
>> - Aleks
>>
>> On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Dear Patrick, I had the same problem with the MoreLikeThis function.
>>>
>>> After briefly reading and analyzing the source code of the moreLikeThis
>>> function in Solr, I concluded:
>>>
>>> MoreLikeThis uses term vectors to rank all the terms from a document
>>> by their frequency. According to this ranking, it will start to generate
>>> queries, artificially, and search for documents.
>>>
>>> So, moreLikeThis will retrieve related documents by artificially
>>> generating queries based on the most frequent terms.
>>>
>>> There's a big problem with the "most frequent terms" from documents.
>>> The most frequent words are usually meaningless, so-called function
>>> words, or, as people from Information Retrieval like to call them,
>>> stopwords.
>>> However, ignoring technical problems of implementation of the
>>> moreLikeThis function, this approach is very dangerous, since queries
>>> are generated artificially based on a given document.
>>> Writing queries for retrieving a document is a human task, and it
>>> assumes some knowledge (the user knows what document he wants).
>>>
>>> I advise using other approaches, depending on your expectations. For
>>> example, you can extract similar documents just by searching for
>>> documents with a similar title (more like this doesn't work in this case).
>>>
>>> I hope it helps,
>>> Best Regards,
>>> Vitalie Scurtu
>>>
>>> --- On Wed, 11/26/08, Plaatje, Patrick <[EMAIL PROTECTED]> wrote:
>>> From: Plaatje, Patrick <[EMAIL PROTECTED]>
>>> Subject: RE: Keyword extraction
>>> To: solr-user@lucene.apache.org
>>> Date: Wednesday, November 26, 2008, 10:52 AM
>>>
>>> Hi All,
>>> as an addition to my previous post, no interestingTerms are returned
>>> when I execute the following URL:
>>> http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true
>>> I get a moreLikeThis list though, any thoughts?
>>> Best,
>>> Patrick
>>>
>>
>> --
>> Aleksander M. Stensby
>> Senior software developer
>> Integrasco A/S
>> www.integrasco.no
>
> --
> Aleksander M. Stensby
> Senior software developer
> Integrasco A/S
> www.integrasco.no
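
For reference, the corrected request discussed in this thread (colon after the field name, plus the MoreLikeThis thresholds and debugQuery) could be assembled roughly as follows. This is only a sketch: the helper name `build_mlt_url` and the default values for `mintf` and `mindf` are illustrative assumptions, not recommendations from the thread, and whether interestingTerms actually come back still depends on the handler and on how the field is indexed.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, doc_id, field="text", mintf=2, mindf=5):
    """Build a MoreLikeThis request URL (hypothetical helper).

    The parameter names are the ones quoted in the thread; the
    threshold defaults are placeholder values chosen for illustration.
    """
    params = [
        ("q", f"id:{doc_id}"),             # field:value, colon rather than "="
        ("mlt", "true"),
        ("mlt.fl", field),                 # field used to find similar documents
        ("mlt.mintf", mintf),              # minimum term frequency in the source doc
        ("mlt.mindf", mindf),              # minimum document frequency
        ("mlt.interestingTerms", "list"),
        ("mlt.match.include", "true"),
        ("debugQuery", "on"),              # show how the query was parsed
    ]
    # urlencode percent-encodes the colon in the q value as %3A.
    return f"{base_url}?{urlencode(params)}"


url = build_mlt_url("http://localhost:8080/solr/select/", 18477975)
print(url)
```

Building the query string from a parameter list rather than by hand avoids the kind of `=` versus `:` slip and wrapped-URL breakage seen earlier in the thread.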