Getting cosine similarity of any given two Lucene 5.1 Documents using latest APIs

2015-07-11 Thread Nitish Nitish
Hi All, Greetings. I started with Lucene 5.1 a month ago for my research. I have a set of documents indexed with the term frequencies option enabled during indexing. For any two given documents, I would like to calculate their TF-IDF cosine similarity. Could you please point me to the right

Re: raw cosine similarity

2013-07-21 Thread lukai
It's not hard to implement one. Store the term values of your document as payloads. Then create your own Query and override the score function with your cosine similarity logic. The catch is that you need to watch the performance, especially for terms that have a very high DF. It may dec

raw cosine similarity

2013-07-21 Thread Malgorzata Urbanska
Hi, I would like to calculate raw cosine similarity between a query and a document. I read the documentation about Lucene scoring but I'm still confused. Is there any implementation in Lucene 4.3.0 that does that? If not, what is the easiest way to do this? So far I'm retrieving a TermVector fo

Cosine Similarity Using Two or More Terms

2013-03-07 Thread Peter Lavin
Dear Users, I'm calculating cosine similarity between two documents using code based on the code at this link... http://sujitpal.blogspot.ch/2011/10/computing-document-similarity-using.html It is working fine, but I want to use terms from two different fields in my indexed docu

Re: Better Way of calculating Cosine Similarity between documents

2012-05-18 Thread nemeskey . david
and their term frequencies by reading the index and calculate TF-IDF scores vector for each document. Then using TF-IDF vectors, I calculate pairwise cosine similarity between documents using the equation here http://en.wikipedia.org/wiki/Cosine_similarity. This is my problem Say I have two identi

Re: Better Way of calculating Cosine Similarity between documents

2012-05-18 Thread Akos Tajti
vector for each document. > Then using TF-IDF vectors, I calculate pairwise cosine similarity between > documents using the equation here > http://en.wikipedia.org/wiki/Cosine_similarity. > > This is my problem > > Say I have two identical documents “A” and “B” in this collection (A

Better Way of calculating Cosine Similarity between documents

2012-05-18 Thread Kasun Perera
Hi all, I’m indexing a collection of documents using Lucene, specifying TermVector at indexing time. Then I retrieve terms and their term frequencies by reading the index and calculate a TF-IDF score vector for each document. Then, using the TF-IDF vectors, I calculate pairwise cosine similarity
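
The pipeline described in this message (term frequencies, then TF-IDF vectors, then pairwise cosine) can be sketched in plain Java without any Lucene dependency. This is only an illustration under stated assumptions: the class name `TfIdfCosine`, the pre-tokenized term lists, and the smoothed IDF `log(1 + N/df)` are choices made for this sketch and are not taken from the thread or from Lucene's own scoring.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TfIdfCosine {
    // Build a TF-IDF vector for one document against a corpus of token lists.
    // Assumption of this sketch: IDF is smoothed as log(1 + N/df).
    static Map<String, Double> tfIdf(List<String> doc, List<List<String>> corpus) {
        Map<String, Double> vec = new HashMap<>();
        // Raw term frequency: count each occurrence of a term.
        for (String term : doc) vec.merge(term, 1.0, Double::sum);
        for (Map.Entry<String, Double> e : vec.entrySet()) {
            int df = 0;
            for (List<String> other : corpus) {
                if (other.contains(e.getKey())) df++;
            }
            // Math.max guards against df == 0 when doc is not part of corpus.
            double idf = Math.log(1.0 + (double) corpus.size() / Math.max(df, 1));
            e.setValue(e.getValue() * idf);
        }
        return vec;
    }

    // Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
        }
        for (double w : b.values()) normB += w * w;
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("cosine", "similarity", "lucene");
        List<String> b = Arrays.asList("cosine", "similarity", "lucene");
        List<String> c = Arrays.asList("term", "vector", "index");
        List<List<String>> corpus = Arrays.asList(a, b, c);
        System.out.println(cosine(tfIdf(a, corpus), tfIdf(b, corpus)));
        System.out.println(cosine(tfIdf(a, corpus), tfIdf(c, corpus)));
    }
}
```

With this weighting, two documents with identical terms get identical vectors, so their cosine is 1 up to floating-point rounding, and documents sharing no terms get 0. In a real index the term lists would come from stored term vectors rather than hard-coded lists.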

Re: Weighted cosine similarity calculation using Lucene

2012-04-20 Thread Kasun Perera
calculate the TFIDF values for documents then calculate the cosine similarity using TFIDF. The field.setboost() function will have NO effect on term frequencies. Is there any other way to do the boosting that will affect term frequencies? Thanks > > Best > Erick > > On F

Re: Weighted cosine similarity calculation using Lucene

2012-04-20 Thread Erick Erickson
ew Field(docNames[curDocNo], strRdElt, > Field.TermVector.YES);* > > > > I’m using Lucene index .TermFreqVector functions to calculate TFIDF values > and, then calculate cosine similarity between two documents using TFIDF > values. > > > For give weights to Ontology an

Weighted cosine similarity calculation using Lucene

2012-04-20 Thread Kasun Perera
Field.Index.ANALYZED, Field.TermVector.YES);* *Field document = new Field(docNames[curDocNo], strRdElt, Field.TermVector.YES);* I’m using Lucene index .TermFreqVector functions to calculate TFIDF values and, then calculate cosine similarity between two documents using TFIDF values. For give weights to O

Re: get the cosine similarity measure as output results ?

2011-03-26 Thread Patrick Diviacco
Update: I actually don't understand why, if the scores are essentially the cosine similarity between the query and the docs, such scores are not comparable between queries. Isn't cosine similarity describing the divergence between vectors? If I have vector A and B (my queries) and vecto

get the cosine similarity measure as output results ?

2011-03-26 Thread Patrick Diviacco
need to find a comparable score across queries, and more specifically the cosine similarity... as a similarity measure between my query document and the documents in the collection. Could you give me some tips about it? Thanks

Re: applying cosine similarity directly

2009-09-12 Thread Anthony Urso
There is a MoreLikeThis similarity search class in Lucene, it should do what you're looking for. http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/similar/MoreLikeThis.html Cheers, Anthony On Fri, Sep 11, 2009 at 11:25 PM, Alexy Khrabrov wrote: > Given that I have a field for whi

applying cosine similarity directly

2009-09-11 Thread Alexy Khrabrov
Given that I have a field for which term vector was computed and stored, and that field is the text of a document, I'd like to rank a subset of such documents by similarity to a given held-out document, or query, directly using the cosine measure. How can that be done without going through creatin

Simple tf cosine similarity

2009-08-13 Thread Claudio Gennaro
I would like to know if there is a simple way to force Lucene to adopt the simple cosine similarity of the term frequency vectors of the documents and the query for ranking the result. In practice the score sc_i of the document i should be given by: sc_i = (D_i*Q)/(|D_i|*|Q|) where D_i = vector
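
The score in this message, sc_i = (D_i*Q)/(|D_i|*|Q|), can be computed directly from raw term-frequency vectors. The following is a minimal sketch in plain Java, not Lucene's scoring and not a way to make Lucene rank this way. The class name `TfCosine`, the whitespace tokenizer, and the sample strings are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class TfCosine {
    // Count how often each lower-cased, whitespace-separated token occurs.
    // Assumption: a real setup would use the same analyzer as the index.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double length(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) sum += (double) x * x;
        return Math.sqrt(sum);
    }

    // sc = (D . Q) / (|D| * |Q|); returns 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> d, Map<String, Integer> q) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : q.entrySet()) {
            Integer w = d.get(e.getKey());
            if (w != null) dot += (double) w * e.getValue();
        }
        double denom = length(d) * length(q);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = termFrequencies("nodular lesions over years nodular lesions");
        Map<String, Integer> query = termFrequencies("nodular lesions");
        System.out.println(cosine(doc, query));
    }
}
```

Because both vectors are length-normalized, the score depends only on the direction of the term-frequency vectors, so repeating a document's text does not change its score against a query.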

Simple Cosine Similarity

2009-08-13 Thread Claudio Gennaro
I would like to know if there is a simple way to force Lucene to adopt the simple cosine similarity of the term frequency vectors of the documents and the query for ranking the result. Thank you Claudio

Re: Cosine similarity

2009-07-25 Thread starz10de
p, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: starz10de >> To: java-user@lucene.apache.org >> Sent: Friday, July 24, 2009 4:50:22 PM >> Subject: Cosine similarity >> >> >> Does lucene use cosine similarity measu

Re: Cosine similarity

2009-07-24 Thread Otis Gospodnetic
0de > To: java-user@lucene.apache.org > Sent: Friday, July 24, 2009 4:50:22 PM > Subject: Cosine similarity > > > Does lucene use cosine similarity measure to measure the similarity between > the query and the indexed documents? > > Thanks > -- > View this message

Cosine similarity

2009-07-24 Thread starz10de
Does Lucene use the cosine similarity measure to measure the similarity between the query and the indexed documents? Thanks

Re: Re: I got the score "0.3044460713863373" for the cosine similarity of two document with the same text content !!

2009-05-08 Thread Kamal Najib
KENIZED)); > then I indexed it and i ran the followed Similarity query to get the > cosine similarity : > query=SimilarityQueries.formSimilarQuery("this expression of > galectin-1 in blood vessel walls was correlated with > vascular",analyzer,"term",null);

Re: I got the score "0.3044460713863373" for the cosine similarity of two document with the same text content !!

2009-05-07 Thread Grant Ingersoll
the folow: I created a doc: doc.add(new Field("term","this expression of galectin-1 in blood vessel walls was correlated with vascular", Field.Store.YES,Field.Index.TOKENIZED)); then I indexed it and i ran the followed Similarity query to get the cosin

Re: Re: I got the score "0.3044460713863373" for the cosine similarity of two document with the same text content !!

2009-05-06 Thread Kamal Najib
od vessel walls was correlated with vascular", Field.Store.YES,Field.Index.TOKENIZED)); then I indexed it and i ran the followed Similarity query to get the cosine similarity : query=SimilarityQueries.formSimilarQuery("this expression of galectin-1 in blood vessel walls was correlate

Re: I got the score "0.3044460713863373" for the cosine similarity of two document with the same text content !!

2009-05-05 Thread Grant Ingersoll
What is SimilarityQueries? I'd try the explain capabilities to see more. On May 5, 2009, at 2:23 PM, Kamal Najib wrote: Hi all, I got the similarity score 0.3044460713863373 between two docs which have the same text content. Is it correct? I expected 1.0. Here is my result line: doc:"

I got the score "0.3044460713863373" for the cosine similarity of two document with the same text content !!

2009-05-05 Thread Kamal Najib
Hi all, I got the similarity score 0.3044460713863373 between two docs which have the same text content. Is it correct? I expected 1.0. Here is my result line: doc:"this expression of galectin-1 in blood vessel walls was correlated with vascular" doc2 :"this expression of galectin-1 in blood v

get the cosine similarity between two docs

2009-05-04 Thread Kamal Najib
Hi all, I try to get the cosine similarity between two docs: I have tried first to create a document for a String like this: Document doc1=new Document(); doc1.add(new Field("term","nodular lesions over years responding kamal najib nodular lesions over years responding"

Re: Cosine Similarity between two documents, using different zone weights

2008-07-15 Thread Karl Wettin
ms of performance) way to get the Cosine Similarity between two Lucene Documents. I have seen that this can be done with: 1. Converting the document into a query and submitting the query, getting the results and their score. --TOO SLOW if you want this for all documents in a corpus

Cosine Similarity between two documents, using different zone weights

2008-07-14 Thread Asterios Katsifodimos
Hello *, I have been trying to find an *efficient* (in terms of performance) way to get the Cosine Similarity between two Lucene Documents. I have seen that this can be done with: 1. Converting the document into a query and submitting the query, getting the results and their score. --TOO

Re: Straight TF-IDF cosine similarity?

2006-08-29 Thread Jason Polites
Have you looked at the MoreLikeThis class in the similarity package? On 8/30/06, Winton Davies <[EMAIL PROTECTED]> wrote: Hi All, I'm scratching my head - can someone tell me which class implements an efficient multiple term TF.IDF Cosine similarity scoring mechanism? There is

Straight TF-IDF cosine similarity?

2006-08-29 Thread Winton Davies
Hi All, I'm scratching my head - can someone tell me which class implements an efficient multiple term TF.IDF Cosine similarity scoring mechanism? There is clearly the single TermScorer - but I can't find the class that would do a bucketed TF.IDF cosine - i.e. fill an accumulator