Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
Hey guys, thanks for all the responses. I finally got it working with some query modification. The idea was to pick an itemID from the database and, for that itemID in the index, get the scores across 4 fields, add them up, and ta-da! I still have to verify my scores. Thanks a ton, I'll be activ

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Doron Cohen
"Askar Zaidi" wrote:
> ... Here's what I am trying to accomplish:
>
> 1. Iterate over itemID (unique) in the database using one SQL query.
> 2. For every itemID found, run 4 searches on Lucene Index.
> 3. doTagSearch(itemID) ; collect score
> 4. doTitleSearch(itemID...) ; collect score
> 5. doS
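The steps quoted above can be sketched roughly as follows. This is only an illustration of the control flow, not the poster's actual code: `FieldSearch`, `ScoreAggregator` and the constant dummy scorers are hypothetical names standing in for the real per-field Lucene searches (doTagSearch, doTitleSearch, and so on), and the item IDs would normally come from the SQL result set.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one per-field search such as doTagSearch(itemID).
interface FieldSearch {
    float score(int itemId);
}

class ScoreAggregator {
    // For every itemID, run each field search once and add the scores up.
    static Map<Integer, Float> totalScores(List<Integer> itemIds, List<FieldSearch> searches) {
        Map<Integer, Float> totals = new LinkedHashMap<>();
        for (int itemId : itemIds) {
            float total = 0f;
            for (FieldSearch search : searches) {
                total += search.score(itemId);
            }
            totals.put(itemId, total);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Four dummy scorers in place of the four real index searches (assumption).
        FieldSearch first = id -> 0.5f;
        FieldSearch second = id -> 0.25f;
        FieldSearch third = id -> 1.0f;
        FieldSearch fourth = id -> 2.0f;
        List<Integer> itemIds = List.of(1, 2); // would come from the database
        System.out.println(totalScores(itemIds, List.of(first, second, third, fourth)));
    }
}
```

Keeping the per-field searches behind one small interface makes it easy to time or swap each of them separately while the loop itself stays unchanged.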

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Grant Ingersoll
On Jul 25, 2007, at 1:26 PM, Askar Zaidi wrote: Hey guys, One last question and I think I'll have an optimized algorithm. How can I build a query in my program ? This is what I am doing: QueryParser queryParser = new QueryParser("contents", new StandardAnalyzer()); queryParser.setDefaultO

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
> I am not sure you need access to the hits. It seems like you just > >>>>> need to make better queries. > >>>>> > >>>>> Is your itemID a unique identifier? If yes, then you shouldn't > >>>>> need > >>>>>

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Grant Ingersoll
Askar Zaidi" <[EMAIL PROTECTED] > To: ; <[EMAIL PROTECTED]> Sent: Wednesday, July 25, 2007 12:39 AM Subject: Re: Fine Tuning Lucene implementation Hey Hira , Thanks so much for the reply. Much appreciate it. Quote: Would it be possible to just include a query clause? - i.e.,

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
> >> On Jul 25, 2007, at 10:10 AM, Askar Zaidi wrote: > > >> > > >>> Hey Guys, > > >>> > > >>> I need to know how I can use the HitCollector class ? I am using > > >>> Hits and > > >>> looping over all th

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
use the HitCollector class ? I am using > >>> Hits and > >>> looping over all the possible document hits (turns out its 92 times > >>> I am > >>> looping; for 300 searches, its 300*92 !!). Can I avoid this using > >>> HitCollector ? I can't se

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Grant Ingersoll
ine news forms - Original Message - From: "Askar Zaidi" <[EMAIL PROTECTED]> To: ; <[EMAIL PROTECTED]> Sent: Wednesday, July 25, 2007 12:39 AM Subject: Re: Fine Tuning Lucene implementation Hey Hira , Thanks so much for the reply. Much appreciate it. Quote: W

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
rstand how its used. > > > > thanks a lot, > > > > Askar > > > > On 7/25/07, Dmitry <[EMAIL PROTECTED]> wrote: > >> > >> Askar, > >> why do you need to add +id:? > >> thanks, > >> dt, > >> www.ejinz.c

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Grant Ingersoll
<[EMAIL PROTECTED]> To: ; <[EMAIL PROTECTED]> Sent: Wednesday, July 25, 2007 12:39 AM Subject: Re: Fine Tuning Lucene implementation Hey Hira , Thanks so much for the reply. Much appreciate it. Quote: Would it be possible to just include a query clause? - i.e., instead of just contents:, a

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
[EMAIL PROTECTED]> > Sent: Wednesday, July 25, 2007 12:39 AM > Subject: Re: Fine Tuning Lucene implementation > > > > Hey Hira , > > > > Thanks so much for the reply. Much appreciate it. > > > > Quote: > > > > Would it be possible to just include a

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Dmitry
Askar, why do you need to add +id:? thanks, dt, www.ejinz.com search engine news forms - Original Message - From: "Askar Zaidi" <[EMAIL PROTECTED]> To: ; <[EMAIL PROTECTED]> Sent: Wednesday, July 25, 2007 12:39 AM Subject: Re: Fine Tuning Lucene implementation

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Hey Hira , Thanks so much for the reply. Much appreciate it. Quote: Would it be possible to just include a query clause? - i.e., instead of just contents:, also add +id: How can I do that ? I see my query as : +contents:harvard +contents:business +contents:review where the search phrase w
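One possible way to read the "+id:" suggestion is to append a required id clause to the query string before it goes to the query parser. The sketch below is plain string building and an assumption about how the pieces fit together: `QueryStrings` and `requireAllTermsAndId` are made-up names, the field names `contents` and `id` are taken from the messages in this thread, and it does no escaping of query-parser special characters.

```java
class QueryStrings {
    // Builds a query string where every term of the phrase is required in
    // "contents" and the document must also match the given id.
    static String requireAllTermsAndId(String phrase, int itemId) {
        StringBuilder sb = new StringBuilder();
        for (String term : phrase.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty phrase contributes no contents clauses
            }
            sb.append("+contents:").append(term).append(' ');
        }
        sb.append("+id:").append(itemId);
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "+contents:harvard +contents:business +contents:review +id:42"
        System.out.println(requireAllTermsAndId("harvard business review", 42));
    }
}
```

With the id clause in the query, the search can only match the one document, so there is no need to loop over all hits looking for it.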

Re: Fine Tuning Lucene implementation

2007-07-24 Thread N. Hira
I'm no expert on this (so please accept the comments in that context) but 2 things seem weird to me: 1. Iterating over each hit is an expensive proposition. I've often seen people recommending a HitCollector. 2. It seems that doBodySearch() is essentially saying, do this search and return the

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Grant Ingersoll
Inline below
On Jul 24, 2007, at 8:14 PM, Askar Zaidi wrote:
Sure.
public float doBodySearch(Searcher searcher, String query, int id) {
    try {
        score = search(searcher, query, id);
    }
    catch (IOException io) {}

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Mark Miller
Are you sure you are using the same Searcher for every search? Don't open a new one unless you have modified the index. You are iterating over every hit with the Hits class. You don't ever want to do this. Use a HitCollector if you want to iterate over more than a hundred or so hits. You will f
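To illustrate why a collector is cheaper than walking every hit, here is a small self-contained sketch. It does not use Lucene: `ScoreCallback` is a stand-in that mirrors the shape of `HitCollector.collect(int doc, float score)`, and `CollectorDemo.search` is a hypothetical searcher that just pushes (doc, score) pairs. The point is only that the callback sees raw scores and never loads a document.

```java
// Stand-in with the same shape as Lucene's HitCollector.collect(int doc, float score).
// This is NOT the Lucene class, only an illustration of the callback style.
interface ScoreCallback {
    void collect(int doc, float score);
}

class SummingCollector implements ScoreCallback {
    private float total = 0f;
    private int count = 0;

    @Override
    public void collect(int doc, float score) {
        total += score; // only the raw score is touched; no document is loaded
        count++;
    }

    float total() { return total; }

    int count() { return count; }
}

class CollectorDemo {
    // Hypothetical searcher: reports every doc with a positive score to the callback.
    static void search(float[] scoresByDoc, ScoreCallback callback) {
        for (int doc = 0; doc < scoresByDoc.length; doc++) {
            if (scoresByDoc[doc] > 0f) {
                callback.collect(doc, scoresByDoc[doc]);
            }
        }
    }

    public static void main(String[] args) {
        SummingCollector collector = new SummingCollector();
        search(new float[] {0.5f, 0f, 0.25f}, collector);
        System.out.println(collector.count() + " hits, total score " + collector.total());
    }
}
```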

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Sure.
public float doBodySearch(Searcher searcher, String query, int id) {
    try {
        score = search(searcher, query, id);
    }
    catch (IOException io) {}
    catch (ParseException pe) {}

Re: Fine Tuning Lucene implementation

2007-07-24 Thread N. Hira
Could you show us the relevant source from doBodySearch()? -h On Tue, 2007-07-24 at 19:58 -0400, Askar Zaidi wrote: > I ran some tests and it seems that the slowness is from Lucene calls when I > do "doBodySearch", if I remove that call, Lucene gives me results in 5 > seconds. otherwise it takes

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Shall I setMergeFactor = 2 ? Slow indexing is not a bother. On 7/24/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > I ran some tests and it seems that the slowness is from Lucene calls when > I do "doBodySearch", if I remove that call, Lucene gives me results in 5 > seconds. otherwise it takes ab

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
I ran some tests and it seems that the slowness is from Lucene calls when I do "doBodySearch", if I remove that call, Lucene gives me results in 5 seconds. otherwise it takes about 50 seconds. But I need to do Body search and that field contains lots of text. The field is . How can I optimize that

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Grant Ingersoll
Sorry, I mistyped. I don't mean the get methods, I mean the doTagSearch, doTitleSearch, etc. As for the stop watch, not really sure what to make of that... Try System.currentTimeMillis()... You can get just the fields you want when loading a Document by using the FieldSelector API on
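A minimal sketch of the System.currentTimeMillis() suggestion, assuming each search call can be wrapped in a Runnable. `Stopwatch` and `timeMillis` are made-up names, and the sleep in main is only a placeholder for a real call such as doBodySearch.

```java
class Stopwatch {
    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(50); // placeholder for the real search call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("search took " + elapsed + " ms");
    }
}
```

Timing each of the four searches separately this way should show which one dominates, rather than timing the whole loop with a stop watch.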

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Can someone please tell me how to cache results in Lucene ? I know the classes, but I don't know how to go about it. thanks, Askar On 7/24/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > Thanks for the reply. > > I am timing the entire search process with a stop watch, a bit ghetto > style. My get

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Thanks for the reply. I am timing the entire search process with a stop watch, a bit ghetto style. My getXXX methods are: Document doc = hits.doc(i); String str = doc.get("item"); So you can see that I am retrieving the entire document in a search query. Ideally , I'd like to just retrieve the F

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Grant Ingersoll
Where are you getting your numbers from? That is, where are your timers? Are you timing the rs.next() loop, or the individual calls to Lucene? What do the getX methods look like? How big are your queries? How big is your index? Essentially, we need more info to really help you. Fr

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
I have 512MB RAM allocated to JVM Heap. If I double my system RAM from 768MB to say 2GB or so, and give JVM 1.5GB Heap space, will I get quicker results ? Can I expect results which take 1 minute to be returned in 30 seconds with more RAM ? Should I also get a more powerful CPU ? A real server cla

Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Hey Guys, I just finished up using Lucene in my application. I have data in a database , so while indexing I extract this data from the database and pump it into the index. Specifically , I have the following data in the index: where itemID is just a number (primary key in the DB) tags : te