Erik,
Could you expand on this just a wee bit, perhaps with an example of how to
compute this vector angle?
TIA,
Terry
- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, June 01, 2004 9:39 AM
Subject: Re: similarity of two
/Vector_Space_Search_Engine_Theory.pdf
TIA,
Terry
- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, June 01, 2004 9:39 AM
Subject: Re: similarity of two texts
On Jun 1, 2004, at 9:24 AM, Grant Ingersoll wrote:
Hey Eric,
Eri*K* :)
What did you
On Jun 2, 2004, at 1:39 PM, David Spencer wrote:
Erik,
Could you expand on this just a wee bit, perhaps with an example of how to
compute this vector angle?
I'm tempted to write the code to see how it works, but FYI this doc
seems to nicely explain the concepts:
Hmm, the term vector does not have to consist of only term frequencies,
does it? To give weight to rare terms, could you create a term vector of
(TF*IDF) values for each term? Then, a distance function would measure
how many terms two vectors have in common, giving weight to how many
rare terms
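A minimal sketch of the TF*IDF idea raised here, in plain Java with no Lucene dependency. The class `TfIdfSketch`, its methods, and the toy document frequencies are hypothetical, and the formula idf = ln(numDocs / docFreq) is the common textbook variant, assumed here for illustration; it is not necessarily what Lucene's own scoring uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Lucene.
class TfIdfSketch {

    // Turn raw term frequencies into TF*IDF weights.
    // Assumption: idf = ln(numDocs / docFreq), a textbook variant.
    static Map<String, Double> weigh(Map<String, Integer> tf,
                                     Map<String, Integer> df,
                                     int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            // Terms with no recorded document frequency are treated as
            // occurring in one document (an assumption of this sketch).
            int docFreq = df.getOrDefault(e.getKey(), 1);
            double idf = Math.log((double) numDocs / docFreq);
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }

    // Cosine of the angle between two weighted vectors (1 = same direction).
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a zero vector has no direction
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        int numDocs = 100; // toy collection size
        Map<String, Integer> df = new HashMap<>();
        df.put("the", 100);   // appears in every document
        df.put("lucene", 2);  // rare
        df.put("search", 10);
        df.put("index", 10);

        Map<String, Integer> a = new HashMap<>();
        a.put("the", 5); a.put("lucene", 1);
        Map<String, Integer> b = new HashMap<>();
        b.put("the", 5); b.put("search", 1);
        Map<String, Integer> c = new HashMap<>();
        c.put("lucene", 1); c.put("index", 1);

        // a and b share only the very common term; a and c share the rare one.
        System.out.println(cosine(weigh(a, df, numDocs), weigh(b, df, numDocs)));
        System.out.println(cosine(weigh(a, df, numDocs), weigh(c, df, numDocs)));
    }
}
```

With these toy numbers, the pair that shares only "the" scores 0 because that term's weight is ln(100/100) = 0, while the pair that shares the rare term "lucene" scores roughly 0.86.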
Gerard Sychay wrote:
Hmm, the term vector does not have to consist of only term frequencies,
does it? To give weight to rare terms, could you create a term vector of
(TF*IDF) values for each term? Then, a distance function would measure
how many terms two vectors have in common, giving weight to
On May 31, 2004, at 2:17 PM, Stefan Groschupf wrote:
Lucene can't help you.
What about using term vectors though? I've been able to do rudimentary
document similarity calculations using the new support in Lucene 1.4.
Search the 'net for more info on term vectors and the formulas needed
Quoting Erik Hatcher [EMAIL PROTECTED]:
On May 31, 2004, at 2:17 PM, Stefan Groschupf wrote:
Lucene can't help you.
What about using term vectors though? I've been able to do rudimentary
document similarity calculations using the new support in Lucene 1.4.
Oops?! Is it built into Lucene
On Jun 1, 2004, at 6:06 AM, [EMAIL PROTECTED] wrote:
Quoting Erik Hatcher [EMAIL PROTECTED]:
On May 31, 2004, at 2:17 PM, Stefan Groschupf wrote:
Lucene can't help you.
What about using term vectors though? I've been able to do rudimentary
document similarity calculations using the new support
Thanks, guys, for your invaluable help and ideas. I'll take a look at Lucene 1.4 and
let you know whether it can deal with my problem.
Hey Eric,
What did you do to calc similarity? I haven't had time, but was thinking of ways to
add the ability to get the similarity score (as calculated when doing a search) given
a term vector (or just a document id). Any ideas on how to approach this would be
appreciated. The scoring in
On Jun 1, 2004, at 9:24 AM, Grant Ingersoll wrote:
Hey Eric,
Eri*K* :)
What did you do to calc similarity?
I computed the angle between two vectors. The vectors are obtained
from IndexReader.getTermFreqVector(docId, field).
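Erik's own code is not shown in the thread. As a rough sketch of one way to compute such an angle, the plain-Java example below takes each vector as parallel arrays of terms and counts, which is the shape that `getTerms()` and `getTermFrequencies()` on a Lucene 1.4 `TermFreqVector` return. The class `TermVectorAngle` and its method names are hypothetical, and the arrays in `main` are made-up stand-ins for real index data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene.
class TermVectorAngle {

    // Cosine of the angle between two term-frequency vectors, each passed as
    // parallel arrays (terms[i] occurs freqs[i] times in the document).
    static double cosine(String[] termsA, int[] freqsA,
                         String[] termsB, int[] freqsB) {
        Map<String, Integer> a = new HashMap<>();
        double normA = 0.0;
        for (int i = 0; i < termsA.length; i++) {
            a.put(termsA[i], freqsA[i]);
            normA += (double) freqsA[i] * freqsA[i];
        }
        double dot = 0.0;
        double normB = 0.0;
        for (int i = 0; i < termsB.length; i++) {
            normB += (double) freqsB[i] * freqsB[i];
            Integer fa = a.get(termsB[i]);
            if (fa != null) {
                // Only terms present in both documents contribute.
                dot += fa.doubleValue() * freqsB[i];
            }
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty vector has no direction
        }
        return dot / Math.sqrt(normA * normB);
    }

    // Angle in degrees: 0 means same direction, 90 means no shared terms.
    static double angleDegrees(String[] termsA, int[] freqsA,
                               String[] termsB, int[] freqsB) {
        // Clamp to 1.0 to guard against rounding slightly above 1.
        double c = Math.min(1.0, cosine(termsA, freqsA, termsB, freqsB));
        return Math.toDegrees(Math.acos(c));
    }

    public static void main(String[] args) {
        // Made-up vectors standing in for two documents' term vectors.
        String[] termsA = {"lucene", "search"};
        int[] freqsA = {1, 1};
        String[] termsB = {"lucene"};
        int[] freqsB = {1};
        System.out.println(angleDegrees(termsA, freqsA, termsB, freqsB));
    }
}
```

For the toy vectors in `main` the angle comes out at about 45 degrees: one of the two terms in the first document is shared with the second. Smaller angles mean more similar documents.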
I haven't had time, but was thinking of ways to add the ability to
Sorry about the misspelling, Erik!
Thanks for the insight.
Explain is my friend as an end user, but it, too, is confusing at the code level! At
some point I will have time to dig deeper and step through the scoring code.
[EMAIL PROTECTED] 06/01/04 09:39AM
On Jun 1, 2004, at 9:24 AM, Grant
Erik Hatcher wrote:
On Jun 1, 2004, at 4:41 PM, uddam chukmol wrote:
Well, a question again, how does Lucene compute the score between a
document and a query?
And I might add, thus, this approach to similarity gives more weight to
rare terms that match, which one might want for this kind of
Hi,
I'm a newbie to Lucene and have heard that it helps with information retrieval.
However, my problem is not really related to information retrieval but to the
comparison of two texts. I think Lucene may help resolve it.
I would like to have a clue on how to compare two given
Lucene can't help you.
Search for text classification or text clustering.
Browse the tools section at www.text-mining.org; there you may find
tools that can help you with this task.
In general, some keywords for your further search:
Feature extraction from text.
Data mining algorithms