Re: Better Way of calculating Cosine Similarity between documents

2012-05-18 Thread Akos Tajti
Thanks! On Fri, May 18, 2012 at 11:19 AM, Kasun Perera wrote: > Hi all > > I'm indexing a collection of documents using Lucene, specifying TermVector at > indexing time. Then I retrieve terms and their term frequencies by > reading the index and calculate a TF-IDF score vector for each docum
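Once the per-document term frequencies have been read from the term vectors, the cosine-similarity step itself is plain vector arithmetic. Below is a minimal sketch in Java, independent of Lucene. It assumes the frequencies are already available as one `Map<String, Integer>` per document and uses idf = ln(N/df), which is only one common TF-IDF variant (Lucene's `DefaultSimilarity` uses a different idf formula). The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: TF-IDF vectors and cosine similarity, computed outside Lucene. */
class TfIdfCosine {

    /** Document frequency: the number of documents each term occurs in. */
    static Map<String, Integer> documentFrequencies(List<Map<String, Integer>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (String term : doc.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Weights each raw term frequency by idf = ln(N / df); terms found in every document get weight 0. */
    static Map<String, Double> tfIdf(Map<String, Integer> tf, Map<String, Integer> df, int numDocs) {
        Map<String, Double> vec = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) numDocs / df.get(e.getKey()));
            vec.put(e.getKey(), e.getValue() * idf);
        }
        return vec;
    }

    /** Cosine of the angle between two sparse vectors; 0 if either vector is all zeros. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        double normA = norm(a), normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Double> v) {
        double sum = 0;
        for (double x : v.values()) sum += x * x;
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Made-up term frequencies standing in for values read from term vectors.
        List<Map<String, Integer>> docs = List.of(
            Map.of("lucene", 2, "index", 1),
            Map.of("lucene", 1, "search", 3),
            Map.of("cooking", 4));
        Map<String, Integer> df = documentFrequencies(docs);
        Map<String, Double> v0 = tfIdf(docs.get(0), df, docs.size());
        Map<String, Double> v1 = tfIdf(docs.get(1), df, docs.size());
        System.out.println(cosine(v0, v1));
    }
}
```

Sparse maps keep the cost proportional to the number of distinct terms per document rather than to the whole vocabulary.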

Re: two fields, the first important than the second

2012-04-27 Thread Akos Tajti
n come up with is using a complicated query: > >+(title:hello desc:hello) +(title:world desc:world) > >(+title:hello +title:world)^10 (+desc:hello +desc:world)^5 > > The must occurrence condition is the same as before, but if hello > world > > are all in title,
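The query structure quoted above (every term required in at least one of the two fields, plus optional boosted clauses that reward all terms landing in the same field) can be generated for any number of terms. A hedged sketch in Java that only builds the string for the classic QueryParser syntax: the field names `title` and `desc` and the boosts 10 and 5 come from the quoted message, while the helper itself is hypothetical and assumes terms are already analyzed and contain no characters that need escaping.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper: builds the two-field boosted query described in the thread. */
class TwoFieldQueryBuilder {

    static String build(List<String> terms, String primary, String secondary,
                        int primaryBoost, int secondaryBoost) {
        // Required part: every term must occur in at least one of the two fields.
        StringJoiner required = new StringJoiner(" ");
        for (String t : terms) {
            required.add("+(" + primary + ":" + t + " " + secondary + ":" + t + ")");
        }
        // Optional parts: extra score when all terms occur in the same field.
        return required + " " + allIn(terms, primary) + "^" + primaryBoost
                + " " + allIn(terms, secondary) + "^" + secondaryBoost;
    }

    /** A parenthesized group requiring every term in one field. */
    private static String allIn(List<String> terms, String field) {
        StringJoiner clause = new StringJoiner(" ", "(", ")");
        for (String t : terms) clause.add("+" + field + ":" + t);
        return clause.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("hello", "world"), "title", "desc", 10, 5));
    }
}
```

For the terms `hello world` this prints `+(title:hello desc:hello) +(title:world desc:world) (+title:hello +title:world)^10 (+desc:hello +desc:world)^5`.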

Re: two fields, the first important than the second

2012-04-26 Thread Akos Tajti
a look at > http://lucene.apache.org/core/3_6_0/scoring.html#Score > Boosting > > - Jake > > On Thu, Apr 26, 2012 at 3:12 PM, Akos Tajti wrote: > > > Dear List, > > > > we've been struggling with the following problem for a while: > > we have two fields:

Re: stored field norm

2012-04-23 Thread Akos Tajti
for Similarity. Note use of the > word "encapsulates". Also note the stuff on loss of precision. > > > -- > Ian. > > > On Mon, Apr 23, 2012 at 12:11 PM, Akos Tajti wrote: > > Dear All, > > > > when indexing an object I create a document that contains a f

stored field norm

2012-04-23 Thread Akos Tajti
Dear All, when indexing an object I create a document that contains a field called title. I set the boost of that field to 60. After indexing had completed I checked the document using Luke. Its norm field contained 40. Shouldn't this column (the field norm) contain the boost that was se
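One plausible explanation for seeing 40 instead of 60, assuming Lucene 3.x with `DefaultSimilarity`: the stored norm is not the boost alone but boost × lengthNorm, with lengthNorm = 1/√(number of terms in the field), and that product is squeezed into a single byte that keeps only about three significant bits. The Java sketch below re-implements that one-byte encoding, modelled on Lucene 3.x's `SmallFloat` 3-bit-mantissa scheme (treat it as an approximation of the library code, not a copy), so the arithmetic can be checked without an index. The term counts are assumptions for illustration.

```java
/** Sketch of Lucene 3.x-style norm encoding: one byte, about 3 significant bits. */
class NormEncoding {

    // Offset between the float exponent and the byte's zero point, pre-shifted
    // (3 mantissa bits, zero exponent 15, as in the SmallFloat scheme).
    private static final int FZERO = (63 - 15) << 3;

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= FZERO) {
            // Zero and negative values map to 0; tiny positive values to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= FZERO + 0x100) {
            return (byte) -1; // overflow maps to the largest code
        }
        return (byte) (small - FZERO);
    }

    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** boost * 1/sqrt(numTerms): the value computed before it is encoded. */
    static float rawNorm(float boost, int numTerms) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        for (int numTerms = 1; numTerms <= 4; numTerms++) {
            float raw = rawNorm(60f, numTerms);
            System.out.printf("terms=%d raw=%.3f stored=%.1f%n", numTerms, raw, decode(encode(raw)));
        }
    }
}
```

With a boost of 60 this gives a stored norm of 56 for a one-term title and 40 for a two-term title, so both the length normalization and the loss of precision can make the displayed value differ from the boost that was set.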

Re: a higher-level layer above lucene

2012-04-17 Thread Akos Tajti
> done through a RESTful API. What I need is a Java API that I can use > > programmatically. > > > > Ákos Tajti > > > > > > > > On 2012.04.16., at 19:58, Erick Erickson > wrote: > > > >> What kind of hiding are you interested in? Solr do

a higher-level layer above lucene

2012-04-16 Thread Akos Tajti
Hi All, I'm looking for a solution that hides the complexity and the low-level structure of Lucene (to make it much simpler to use). I came across the Compass Project, which looks pretty good. I just want to know if there are any comparable solutions (I didn't find any). Do you know about such solu

Re: Higher rank for closer matches

2011-09-21 Thread Akos Tajti
e them with the classical queries. > > Regards, > Em > > Am 21.09.2011 13:46, schrieb Akos Tajti: > > Dear List, > > > > for multi term expressions I'd like to add higher rank if the matches are > > closer to each other. For example for the search term "like

Higher rank for closer matches

2011-09-21 Thread Akos Tajti
Dear List, for multi-term expressions I'd like to give a higher rank to matches whose terms are closer to each other. For example, for the search term "like eating" the string "i like eating" should come before "I like some eating". Is this possible? Thanks in advance, Ákos Tajti
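Lucene's query syntax can express proximity with a sloppy phrase query such as `"like eating"~5`, and in Lucene 3.x `DefaultSimilarity.sloppyFreq(distance)` returns 1/(distance + 1), so tighter matches contribute more. One common approach is to add such a phrase clause as an optional clause next to the plain term clauses. The self-contained Java sketch below only illustrates that scoring intuition for two terms in order; it is a simplification (Lucene's sloppy-phrase distance also allows reordered terms at an extra cost), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified two-term proximity factor in the spirit of a sloppy phrase match. */
class ProximitySketch {

    /** Fewest tokens between `first` and a later `second`, or -1 if no in-order pair exists. */
    static int minGap(List<String> tokens, String first, String second) {
        int best = -1;
        int lastFirst = -1; // position of the most recent `first`, the closest one for any later `second`
        for (int i = 0; i < tokens.size(); i++) {
            String tok = tokens.get(i);
            if (tok.equals(first)) {
                lastFirst = i;
            } else if (tok.equals(second) && lastFirst >= 0) {
                int gap = i - lastFirst - 1;
                if (best < 0 || gap < best) best = gap;
            }
        }
        return best;
    }

    /** 1/(gap + 1), mirroring the sloppyFreq shape; 0 when the terms never occur in order. */
    static float proximityFactor(String text, String first, String second) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        int gap = minGap(tokens, first, second);
        return gap < 0 ? 0f : 1.0f / (gap + 1);
    }

    public static void main(String[] args) {
        System.out.println(proximityFactor("i like eating", "like", "eating"));      // 1.0
        System.out.println(proximityFactor("I like some eating", "like", "eating")); // 0.5
    }
}
```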

Re: Question about prefix query

2011-09-06 Thread Akos Tajti
tax.html#Boosting%20a%20Term > > http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_difference_between_field_.28or_document.29_boosting_and_query_boosting.3F > > -- > Ian. > > > On Tue, Sep 6, 2011 at 8:31 AM, Akos Tajti wrote: > > Dear List, > > > > I'

Question about prefix query

2011-09-06 Thread Akos Tajti
Dear List, I'm running a prefix query, something like this: text:dummy*. The problem: in the results some non-exact matches get higher scores than the exact ones. For example, the document containing dummythales comes before the document containing exactly dummy. How can this behaviour be changed?
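In the classic query syntax, one common workaround (the kind of term boosting the query-syntax page linked in the reply describes) is to keep the prefix clause for recall and add the exact term as an optional, boosted clause, e.g. `text:dummy^10 text:dummy*`, so documents containing the exact word collect extra score. This assumes the parser's default OR operator; with AND as the default, the exact clause would become mandatory and drop the prefix-only matches. The boost of 10 is an arbitrary assumption to tune, and the helper below is hypothetical; its escaping covers the special characters of the classic syntax.

```java
/** Hypothetical helper: a prefix query plus a boosted exact-term clause, in classic query syntax. */
class ExactFirstPrefixQuery {

    // Characters with a special meaning in the classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Backslash-escapes special characters so the term is taken literally. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    /** Optional boosted exact clause followed by the optional prefix clause. */
    static String build(String field, String term, int exactBoost) {
        String t = escape(term);
        return field + ":" + t + "^" + exactBoost + " " + field + ":" + t + "*";
    }

    public static void main(String[] args) {
        System.out.println(build("text", "dummy", 10)); // text:dummy^10 text:dummy*
    }
}
```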

Re: Changing index-time boosts without reindexing

2011-09-05 Thread Akos Tajti
e to compute the right byte value. > > Mike McCandless > > http://blog.mikemccandless.com > > On Mon, Sep 5, 2011 at 5:12 AM, Akos Tajti wrote: > > Dear List, > > > > I'd like to test and fine-tune the boosts in the search module of our > > application

Index-time boosts are not taken into account

2011-09-05 Thread Akos Tajti
Hi All, I'm setting the boost of our documents at indexing time based on some properties. When searching, however, it seems that these index-time boosts are not taken into account. I'm parsing the query with Lucene's QueryParser and sending the result directly to the searcher. What might be wron

Changing index-time boosts without reindexing

2011-09-05 Thread Akos Tajti
Dear List, I'd like to test and fine-tune the boosts in the search module of our application. The problem is that we have many documents and it takes a lot of time to reindex them. Is there a way to change the index-time boosts (afaik they're stored in the fieldNorm) without actually executing the reinde

distance of matches

2011-08-24 Thread Akos Tajti
Dear List, does the distance of the matches for a multi-term query matter? For example, if I search for "dog cat", which one of the following matches will get a higher rank? "dog, cat, snake, apple" or "dog, apple, snake, cat" I expect the second. Am I right? Thanks in advance, Ákos Tajti

shared IndexSearcher (lucene 3.0.3)

2011-02-25 Thread Akos Tajti
Hi all, in our project we're using Lucene in Tomcat. To avoid some overhead we have a shared IndexSearcher instance. In the past we often got "too many open files" errors. To prevent this, the IndexSearcher is closed and reopened after indexing. The shared instance is not closed anywhere else in
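A frequent way to reopen a shared searcher safely is reference counting: each search acquires the current instance and releases it when done, and the old instance is closed only after the last in-flight search has released it. Lucene 3.x exposes `IndexReader.incRef()`/`decRef()` for this, and later 3.x releases add a `SearcherManager` class built on the same idea. The sketch below shows the pattern in plain Java, with any `Closeable` standing in for the searcher so it runs without Lucene; `SharedRef` and `Lease` are hypothetical names, not the library's implementation.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical ref-counted holder: swap in a new instance without closing one still in use. */
class SharedRef<T extends Closeable> {

    /** A value plus its reference count; the holder itself owns one reference. */
    static final class Lease<V extends Closeable> {
        final V value;
        private final AtomicInteger refs = new AtomicInteger(1);

        Lease(V value) { this.value = value; }

        /** Increments the count unless the value has already been closed. */
        boolean tryIncRef() {
            while (true) {
                int n = refs.get();
                if (n <= 0) return false;
                if (refs.compareAndSet(n, n + 1)) return true;
            }
        }

        /** Decrements the count and closes the value when it reaches zero. */
        void release() {
            if (refs.decrementAndGet() == 0) {
                try {
                    value.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
    }

    private volatile Lease<T> current;

    SharedRef(T initial) { current = new Lease<>(initial); }

    /** Callers must call release() on the returned lease, ideally in a finally block. */
    Lease<T> acquire() {
        while (true) {
            Lease<T> lease = current;
            if (lease.tryIncRef()) return lease;
            // Lost a race with swap(): the old value was just closed, so retry on the new one.
        }
    }

    /** Publishes a new value (e.g. a reopened searcher) and drops the holder's reference to the old one. */
    synchronized void swap(T fresh) {
        Lease<T> old = current;
        current = new Lease<>(fresh);
        old.release(); // closes immediately only if no search still holds the old value
    }
}
```

Usage would be `Lease<...> l = holder.acquire(); try { /* search with l.value */ } finally { l.release(); }` in request threads, and `holder.swap(reopened)` after indexing.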

ClosedChannelException

2011-02-23 Thread Akos Tajti
Hi, I'm using Lucene 3.0.3 on Ubuntu and keep getting a ClosedChannelException:
java.nio.channels.ClosedChannelException
    at sun.nio.ch.FileChannelImpl.ensureOpen(Unknown Source)
    at sun.nio.ch.FileChannelImpl.read(Unknown Source)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readIntern
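One known way to get exactly this trace, worth ruling out: `FileChannel` is an interruptible channel, so if a thread reading from it has its interrupt flag set (for example a request thread interrupted by a timeout), the JDK closes the channel itself. The interrupted thread gets a `ClosedByInterruptException`, and every later reader sharing that channel gets a plain `ClosedChannelException` from `ensureOpen`. Closing a shared searcher or reader while another thread still uses it is another common cause. The standalone Java sketch below reproduces the interrupt effect on a temporary file without Lucene; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Shows that an interrupted reader closes a FileChannel for everyone sharing it. */
class InterruptClosesChannel {

    /** Performs two positional reads on one channel and reports what each read did. */
    static List<String> run() {
        List<String> outcomes = new ArrayList<>();
        try {
            Path tmp = Files.createTempFile("interrupt-demo", ".bin");
            try {
                Files.write(tmp, new byte[] {1, 2, 3, 4});
                try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    // Read 1: the calling thread is already interrupted, so the JDK closes the channel.
                    Thread.currentThread().interrupt();
                    outcomes.add(readOnce(channel));
                    Thread.interrupted(); // clear the flag so read 2 is an ordinary call
                    // Read 2: a healthy call, but the shared channel is already closed.
                    outcomes.add(readOnce(channel));
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return outcomes;
    }

    /** A positional read, the kind of call seen in the NIOFSIndexInput stack trace. */
    private static String readOnce(FileChannel channel) {
        try {
            channel.read(ByteBuffer.allocate(4), 0);
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```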

Boost value is always 1

2011-02-16 Thread Akos Tajti
I'm trying to set different boost values for different fields. Before the document is added to the index every value is correct. But when I run a search, every boost in the explanation is 1 and the final score of the matches is not affected by the boost values I set. I set omitNorms to false and index to A
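In Lucene 3.x, index-time boosts are not stored as a separate value: with `DefaultSimilarity` they are multiplied into the field norm (document boost × field boost × 1/√numTerms), so a search explanation shows them only inside `fieldNorm`, while its `boost` lines refer to query-time boosts. For the same reason, a `Document` loaded back from the index reports a boost of 1, and a field indexed with norms omitted loses its index-time boost entirely. The sketch below reproduces the single-term `fieldWeight` part of that formula (tf = √freq, idf = 1 + ln(numDocs/(docFreq + 1))) in plain Java to show where the index-time boost enters. It ignores the lossy one-byte norm encoding, and the numbers in it are made up.

```java
/** Sketch of the Lucene 3.x DefaultSimilarity fieldWeight for one term, before norm encoding. */
class IndexTimeBoostSketch {

    static double tf(int freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Index-time boosts are folded into the norm; there is no separate boost factor. */
    static double fieldNorm(double docBoost, double fieldBoost, int numTerms) {
        return docBoost * fieldBoost * (1.0 / Math.sqrt(numTerms));
    }

    static double fieldWeight(int freq, int docFreq, int numDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, numDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: the term occurs once in a 4-term field and in 9 of 1000 documents.
        double plain = fieldWeight(1, 9, 1000, fieldNorm(1.0, 1.0, 4));
        double boosted = fieldWeight(1, 9, 1000, fieldNorm(1.0, 3.0, 4));
        System.out.printf("plain=%.4f boosted=%.4f ratio=%.2f%n", plain, boosted, boosted / plain);
    }
}
```

If the stored norms reflect the boost, the ratio between the boosted and unboosted `fieldWeight` follows the boost (up to the precision lost in encoding); if it does not, the norms themselves are the place to look.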

boost value is always 1

2011-02-16 Thread Akos Tajti
ex to ANALYZED. The only solution I found is setting store to YES. Do you have any ideas? Akos Tajti