RE: Keyword extraction

Plaatje, Patrick Wed, 26 Nov 2008 05:43:10 -0800

Hi Aleksander,

This was a typo on my end; the original query used a colon rather than an 
equals sign. But I think the problem is that my field is not stored and is not 
declared with termVectors="true". I'm recreating the index now to see 
whether this fixes the problem.


Best,

patrick

-----Original Message-----
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] 
Sent: woensdag 26 november 2008 14:37
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

Hi there!
Well, first of all, I think you have an error in your query, if I'm not mistaken.
You say http://localhost:8080/solr/select/?q=id=18477975...
but since you are referring to the field called "id", you must say:
http://localhost:8080/solr/select/?q=id:18477975...
(use colon instead of the equals sign).
I think that will do the trick.
If not, try adding the &debugQuery=on at the end of your request url, to see 
debug output on how the query is parsed and if/how any documents are matched 
against your query.
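
As a minimal sketch of the corrected request, assuming the host, port and path from the URLs above, the following Python builds the select URL with the field:value syntax and optionally appends debugQuery=on. The helper name build_select_url is hypothetical and only for illustration.

```python
from urllib.parse import urlencode

# Base URL taken from the thread; adjust host, port and path for your setup.
SOLR_SELECT = "http://localhost:8080/solr/select/"


def build_select_url(doc_id, debug=False):
    """Build a select URL that queries the "id" field (hypothetical helper)."""
    # A colon, not an equals sign, separates the field name from the value.
    params = [("q", f"id:{doc_id}")]
    if debug:
        # Ask Solr to include debug output on how the query was parsed.
        params.append(("debugQuery", "on"))
    # safe=":" keeps the colon readable instead of percent-encoding it.
    return SOLR_SELECT + "?" + urlencode(params, safe=":")


url = build_select_url("18477975", debug=True)
print(url)
```
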
Hope this helps.

Cheers,
  Aleksander



On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick <[EMAIL PROTECTED]> wrote:

> Hi Aleksander,
>
> Thanks for clearing this up. I am confident that this is an avenue 
> worth exploring for me, as I'm just starting to grasp the matter. Do you 
> know why I'm not getting any results with the query posted earlier, then? 
> It gives me only the following:
>
> <lst name="moreLikeThis">
>       <result name="18477975" numFound="0" start="0"/> </lst>
>
> Instead of delivering details of the interestingTerms.
>
> Thanks in advance
>
> Patrick
>
>
> -----Original Message-----
> From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
> Sent: woensdag 26 november 2008 13:03
> To: solr-user@lucene.apache.org
> Subject: Re: Keyword extraction
>
> I do not agree with you at all. The concept of MoreLikeThis is based 
> on the fundamental idea of TF-IDF weighting, and not term frequency alone.
> Please take a look at:
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html
> As you can see, it is possible to use cut-off thresholds to 
> significantly reduce the number of unimportant terms, and generate 
> highly suitable queries based on the tf-idf weight of each term. As you 
> point out, high-frequency terms alone tend to be useless for querying, 
> but taking the document frequency into account gives a much better 
> measure of a term's importance!
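
To illustrate the point about document frequency, here is a toy sketch in Python using the textbook tf * log(N/df) weighting. This is an assumption for illustration only: it is not Lucene's actual scoring formula, and the function name and the three-document corpus are made up.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens, corpus):
    """Score each term in doc_tokens by tf * idf over a small corpus.

    Uses the textbook idf = log(N / df); Lucene's own formula differs.
    """
    n_docs = len(corpus)
    term_freq = Counter(doc_tokens)
    scores = {}
    for term, freq in term_freq.items():
        # Document frequency: number of documents containing the term.
        doc_freq = sum(1 for doc in corpus if term in doc)
        idf = math.log(n_docs / doc_freq) if doc_freq else 0.0
        scores[term] = freq * idf
    return scores


# Made-up corpus: "the" occurs in every document, "solr" in only one.
corpus = [
    ["the", "solr", "index", "the"],
    ["the", "lucene", "query"],
    ["the", "cat", "sat"],
]
scores = tfidf_scores(corpus[0], corpus)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 3))
```

Although "the" is the most frequent term in the first document, it appears in every document, so its idf is zero and it drops to the bottom of the ranking.
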
>
> In solr, use parameters to manipulate your desired results:
> http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c
> For instance:
> mlt.mintf - Minimum Term Frequency - the frequency below which terms 
> will be ignored in the source doc.
> mlt.mindf - Minimum Document Frequency - words that do not occur in 
> at least this many docs will be ignored.
> You can also set thresholds for term length etc.
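
A rough sketch of how these parameters could be attached to a request, in Python. The base URL, the field name "text" and the mlt.interestingTerms setting come from earlier in this thread; the helper name build_mlt_url and the threshold values 2 and 5 are assumptions chosen only for illustration, not recommended settings.

```python
from urllib.parse import urlencode

# Base URL as used elsewhere in this thread; adjust for your installation.
SOLR_SELECT = "http://localhost:8080/solr/select/"


def build_mlt_url(doc_id, field="text", min_tf=2, min_df=5):
    """Build a MoreLikeThis request URL (hypothetical helper).

    min_tf and min_df are illustrative values, not tuned recommendations.
    """
    params = {
        "q": f"id:{doc_id}",            # field:value syntax with a colon
        "mlt": "true",                  # enable MoreLikeThis
        "mlt.fl": field,                # field(s) to take terms from
        "mlt.mintf": min_tf,            # ignore terms rarer than this in the source doc
        "mlt.mindf": min_df,            # ignore terms found in fewer docs than this
        "mlt.interestingTerms": "list",
    }
    return SOLR_SELECT + "?" + urlencode(params, safe=":")


print(build_mlt_url("18477975"))
```
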
>
> Hope this gives you a better idea of things.
> - Aleks
>
> On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie <[EMAIL PROTECTED]>
> wrote:
>
>> Dear Patrick, I had the same problem with the MoreLikeThis function.
>>
>> After briefly reading and analyzing the source code of the moreLikeThis 
>> function in Solr, I concluded:
>>
>> MoreLikeThis uses term vectors to rank all the terms from a document 
>> by their frequency. According to this ranking, it will start to generate 
>> queries, artificially, and search for documents.
>>
>> So, moreLikeThis will retrieve related documents by artificially 
>> generating queries based on most frequent terms.
>>
>> There's a big problem with the "most frequent terms" of a document. 
>> The most frequent words are usually meaningless: so-called function 
>> words, or, as people in Information Retrieval like to call them, stopwords.
>> However, even ignoring the technical problems of implementing the 
>> moreLikeThis function, this approach is very dangerous, since queries 
>> are generated artificially based on a given document.
>> Writing queries for retrieving a document is a human task, and it 
>> assumes some knowledge (the user knows what document he wants).
>>
>> I advise using other approaches, depending on your expectations. For 
>> example, you can extract similar documents just by searching for 
>> documents with a similar title (MoreLikeThis doesn't work in this case).
>>
>> I hope it helps,
>> Best Regards,
>> Vitalie Scurtu
>> --- On Wed, 11/26/08, Plaatje, Patrick 
>> <[EMAIL PROTECTED]>
>> wrote:
>> From: Plaatje, Patrick <[EMAIL PROTECTED]>
>> Subject: RE:  Keyword extraction
>> To: solr-user@lucene.apache.org
>> Date: Wednesday, November 26, 2008, 10:52 AM
>>
>> Hi All,
>> as an addition to my previous post, no interestingTerms are returned 
>> when I execute the following URL:
>> http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true
>> I get a moreLikeThis list though, any thoughts?
>> Best,
>> Patrick
>>
>>
>>
>>
>
>
>
> --
> Aleksander M. Stensby
> Senior software developer
> Integrasco A/S
> www.integrasco.no
>



--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no
