Re: Scores between words. Boosting?

2009-03-19 Thread liat oren
I am looking for a quick solution to expand queries so they will look for synonyms as well. The same way WordNet does - it will look for other words that mean the same as what is written in the query. So Synonyms and WordNet are better categories to describe what I need. Any idea? Currently what I d

Re: Pagination with MultiSearcher

2009-03-19 Thread Amin Mohammed-Coleman
Hi I've implemented the solution using the PageHitCounter from the link and I have noticed that in certain instances I get a 0 score for queries like "document OR aspectj". has anyone else experienced this? Cheers Amin On Mon, Mar 16, 2009 at 8:07 PM, Amin Mohammed-Coleman wrote: > Hi > I've c

Boolean query ...

2009-03-19 Thread Dragon Fly
Let's say I have 3 fields in a document (Type, FirstName, and LastName). For example: Document 0 -- Type: Public FirstName: John LastName: Deere If I execute the following boolean query, document 0 is returned. Type:Public OR FirstName:Candy OR LastName:Deere
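
The match described above follows from OR semantics: a document is returned as soon as any one clause matches, so LastName:Deere alone is enough. Below is a minimal plain-Java sketch of that logic, not Lucene code. The class BooleanOrSketch and its helpers are hypothetical names introduced for illustration, and a "document" is modelled as a simple field-to-value map.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of boolean clause matching; this is not the Lucene API.
final class BooleanOrSketch {

    // One "field:value" clause of the query.
    static final class Clause {
        final String field;
        final String value;

        Clause(String field, String value) {
            this.field = field;
            this.value = value;
        }

        // A clause matches when the document holds exactly this value in the field.
        boolean matches(Map<String, String> doc) {
            return value.equals(doc.get(field));
        }
    }

    // OR semantics: the document matches if at least one clause matches.
    static boolean matchesAny(Map<String, String> doc, List<Clause> clauses) {
        for (Clause c : clauses) {
            if (c.matches(doc)) {
                return true;
            }
        }
        return false;
    }

    // AND semantics, for contrast: every clause has to match.
    static boolean matchesAll(Map<String, String> doc, List<Clause> clauses) {
        for (Clause c : clauses) {
            if (!c.matches(doc)) {
                return false;
            }
        }
        return true;
    }

    // Document 0 from the message above.
    static Map<String, String> document0() {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("Type", "Public");
        doc.put("FirstName", "John");
        doc.put("LastName", "Deere");
        return doc;
    }

    // Type:Public OR FirstName:Candy OR LastName:Deere
    static List<Clause> exampleQuery() {
        return Arrays.asList(
                new Clause("Type", "Public"),
                new Clause("FirstName", "Candy"),
                new Clause("LastName", "Deere"));
    }

    public static void main(String[] args) {
        System.out.println("OR match:  " + matchesAny(document0(), exampleQuery()));
        System.out.println("AND match: " + matchesAll(document0(), exampleQuery()));
    }
}
```

In this model the FirstName:Candy clause fails, which is why the same three clauses would not match document 0 if all of them were required.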

boosting query

2009-03-19 Thread m.harig
Hello all, I have a search application which uses lucene-2.3.0, and my application runs for a banking domain. I am indexing some banking URLs as input and searching for some keywords. My question is: when I search for "cards", the URL with the lower keyword count comes up. I mean, for exa

Re: Scores between words. Boosting?

2009-03-19 Thread Grant Ingersoll
On Mar 19, 2009, at 5:13 AM, liat oren wrote: I am looking for a quick solution to expand queries so they will look for synonyms as well. The same way WordNet does - it will look for other words that mean the same as what is written in the query. So Synonyms and WordNet are better categories

Re: boosting query

2009-03-19 Thread Erick Erickson
This might help you understand Lucene scoring better... http://lucene.apache.org/java/2_4_1/scoring.html The number of occurrences is not the sole determinant of a document's score and boosting won't change that. But I have to ask why counting words is important to you. What problem are you try
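
To illustrate why the occurrence count alone does not decide the ranking, here is a rough plain-Java sketch of the per-term factors described on the scoring page. It assumes the formulas of Lucene's DefaultSimilarity (tf as the square root of the frequency, idf as 1 + ln(numDocs / (docFreq + 1)), length norm as 1 / sqrt(number of terms in the field)) and leaves out coord, queryNorm, boosts, and the lossy one-byte norm encoding. The class ScoringSketch and the page sizes are hypothetical.

```java
// Simplified single-term score; an approximation, not Lucene's actual code path.
final class ScoringSketch {

    // Term-frequency factor: grows with the square root of the occurrence count.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: rarer terms weigh more.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Length normalization: longer fields are penalized.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Score of one term in one field, ignoring coord, queryNorm and boosts.
    static double score(int freq, int fieldLength, int docFreq, int numDocs) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(fieldLength);
    }

    public static void main(String[] args) {
        // Hypothetical pages: "cards" occurs 5 times in 1000 terms vs. 2 times in 50 terms.
        double longPage = score(5, 1000, 2, 10);
        double shortPage = score(2, 50, 2, 10);
        System.out.println("long page:  " + longPage);
        System.out.println("short page: " + shortPage);
    }
}
```

Under these assumptions the short page with fewer occurrences scores higher, because length normalization outweighs the extra occurrences on the long page.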

Re: boosting query

2009-03-19 Thread Grant Ingersoll
First off, I would start by using Lucene's explain functionality to see why one result appears before the other. The explain method will tell you all the factors that go into scoring each of your results, as it goes beyond just term frequency. Finally, you might find http://www.lucidimagina

help for indexer.java update

2009-03-19 Thread Meral Ozkaya
Hi, I am trying to update indexer.java. I want to add new code. I added it and built Nutch with the Eclipse IDE, but the result does not change. Could you help me? Meral

Re: boosting query

2009-03-19 Thread Andrzej Bialecki
Grant Ingersoll wrote: First off, I would start by using Lucene's explain functionality to see why one result appears before the other. The explain method will tell you all the factors that go into scoring each of your results, as it goes beyond just term frequency. Finally, you might find h

Re: boosting query

2009-03-19 Thread Meral Ozkaya
Hi On Thu, Mar 19, 2009 at 4:44 PM, Andrzej Bialecki wrote: > Grant Ingersoll wrote: > >> First off, I would start by using Lucene's explain functionality to see >> why one result appears before the other. The explain method will tell you >> all the factors that go into scoring each of your resu

All the values of a particular field

2009-03-19 Thread Paul J. Lucas
The Lucene FAQ has a Q, "How do I retrieve all the values of a particular field that exists within an index, across all documents?" and gives some code. However, it looks like that code returns only unique values. How does one get all values including duplicates? - Paul -

Research Question

2009-03-19 Thread bruce
Hi... This may/may not have anything to do with Lucene/Nutch, but I figured I'd ask/post anyway. I'm working on a project, dealing with courses/classes on college sites. I'm trying to figure out how to create an automated process to link a given faculty member to a gi

Re: All the values of a particular field

2009-03-19 Thread Erick Erickson
See TermEnum/TermDocs. On Thu, Mar 19, 2009 at 12:41 PM, Paul J. Lucas wrote: > The Lucene FAQ has a Q, "How do I retrieve all the values of a particular > field that exists within an index, across all documents?" and gives some > code. However, it looks like that code returns only unique v

Re: Research Question

2009-03-19 Thread Erick Erickson
Well, it seems that your problem, as stated, is not soluble. You've stated that the instructors have the same name but given us no clue as to what other information you have access to. So there's nothing to distinguish them. So what other information do you have about the particular instructors? I

LUCENE-1453 not fixed?

2009-03-19 Thread Chris Salem
I'm using Lucene 2.4.1 and I'm still getting an AlreadyClosedException when trying to reopen an IndexReader. Here's the code I'm using, in case I'm doing something wrong. There isn't an error if I don't close the old reader: String indexPath = "C:\\Lucene\\test"; IndexReader reader = IndexReader

Re: Pagination with MultiSearcher

2009-03-19 Thread Amin Mohammed-Coleman
Hi Please ignore the problem I raised. User error ! Sorry Amin On 19 Mar 2009, at 09:41, Amin Mohammed-Coleman wrote: Hi I've implemented the solution using the PageHitCounter from the link and I have noticed that in certain instances I get a 0 score for queries like "document OR as

Re: LUCENE-1453 not fixed?

2009-03-19 Thread Michael McCandless
Hmm... the code looks OK. Though: can multiple threads call that method at the same time? And: could in-flight searches be using the reader, when you close it? If instead of opening with String indexPath, you pass in an FSDirectory that you opened, do you still hit the AlreadyClosedExcepti

Re: Research Question

2009-03-19 Thread Grant Ingersoll
What you are trying to do is called record linkage. There is a fair amount of info in the Lucene archives on this, see http://www.lucidimagination.com/search/?q=record+linkage As Erick says, you will need more info than just the name to do this. I doubt you will be able to get completely

Re: LUCENE-1453 not fixed?

2009-03-19 Thread Chris Salem
Changing it to use the FSDirectory instead of the indexPath string seems to work. Thanks a lot! Sincerely, Chris Salem - Original Message - To: java-user@lucene.apache.org From: Michael McCandless Sent: 3/19/2009 2:17:33 PM Subject: Re: LUCENE-1453 not fixed? Hmm... the code looks

Re: LUCENE-1453 not fixed?

2009-03-19 Thread Michael McCandless
Hmm, it's good that it resolves your issue, but not good in that it means the bug may in fact still be there. Can you answer the other questions below? Mike Chris Salem wrote: Changing it to use the FSDirectory instead of the indexPath string seems to work. Thanks a lot! Sincerely, Chr

Re: All the values of a particular field

2009-03-19 Thread Paul J. Lucas
Uhm, a code snippet, perhaps?? Thanks. - Paul On Mar 19, 2009, at 10:26 AM, Erick Erickson wrote: See TermEnum/TermDocs. On Thu, Mar 19, 2009 at 12:41 PM, Paul J. Lucas wrote: The Lucene FAQ has a Q, "How do I retrieve all the values of a particular field that exists within an in

Re: LUCENE-1453 not fixed?

2009-03-19 Thread Chris Salem
Sure. The method that does the reopening of the index is synchronized. It would be possible for in-flight searches to be using the reader, but that wasn't the problem since I was the only one testing it. Here's the full exception that was thrown: org.apache.lucene.store.AlreadyClosedException

Re: All the values of a particular field

2009-03-19 Thread Paul J. Lucas
Actually, the code I had previously was: TermEnum e = reader.terms( new Term( fieldName, "" ) ); Collection values = new LinkedList(); while ( fieldName.equals( e.term().field() ) ) { String text = e.term().text(); if ( text.length() > 0 ) values.add( text
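
The difference between unique values and all values can be sketched without Lucene. A term enumeration visits each distinct term once, so duplicates only reappear when each term is repeated once per document that contains it (in Lucene 2.4 that count would come from TermEnum.docFreq() or from walking TermDocs for the term). The plain-Java model below assumes an untokenized field with one value per document; the class FieldValuesSketch, its postings map, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of an inverted index for one field: term -> ids of documents containing it.
final class FieldValuesSketch {

    // Sample postings: "Deere" occurs in two documents, "Smith" in one.
    static Map<String, List<Integer>> samplePostings() {
        Map<String, List<Integer>> postings = new TreeMap<String, List<Integer>>();
        postings.put("Deere", Arrays.asList(0, 3));
        postings.put("Smith", Arrays.asList(1));
        return postings;
    }

    // What a plain term enumeration yields: each distinct term exactly once.
    static List<String> uniqueValues(Map<String, List<Integer>> postings) {
        return new ArrayList<String>(postings.keySet());
    }

    // All values including duplicates: repeat each term once per matching document.
    static List<String> allValues(Map<String, List<Integer>> postings) {
        List<String> values = new ArrayList<String>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int i = 0; i < entry.getValue().size(); i++) {
                values.add(entry.getKey());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = samplePostings();
        System.out.println("unique: " + uniqueValues(postings));
        System.out.println("all:    " + allValues(postings));
    }
}
```

One caveat when carrying this over to a real index: a document-frequency count may still include deleted documents, whereas iterating the postings skips them, so the two approaches can disagree until the index is optimized.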

[ANN] Luke 0.9.2 release

2009-03-19 Thread Andrzej Bialecki
(sorry for cross-posting) Hi all, I'm happy to announce a new release of Luke, the Lucene Index Toolbox. As usual, you can obtain it from here: http://www.getopt.org/luke This release upgrades Luke to the Lucene 2.4.1 jars. * New features and improvements: o Added term counts p

Re: LUCENE-1453 not fixed?

2009-03-19 Thread Michael McCandless
That exception looks like it's from 2.4.0, not 2.4.1. Can you double check your CLASSPATH? Mike Chris Salem wrote: sure. the method that does the reopening of the index is synchronized. it would be possible for in-flight searches to be using the reader, but that wasn't the problem since

RE: Research Question

2009-03-19 Thread bruce
hi erick/grant.. and others.. the basic issue, as best i can state it, is that i have an initial query that returns some data, along with a faculty name. the name can be first,last. i can also search via the web staff search function, such that i return one or more possible faculty, who might be t