Re: recall/precision with lucene

2008-02-10 Thread Doron Cohen
Take a look at the quality package under contrib/benchmark. Regards, Doron On Sat, Feb 9, 2008 at 2:59 AM, Panos Konstantinidis [EMAIL PROTECTED] wrote: Hello I am a new lucene user. I am trying to calculate the recall/precision of a query and I was wondering if lucene provides an easy way

Re: problem with Whitespace analyzer

2008-02-10 Thread Doron Cohen
It should be the parentheses, which are part of the query syntax. Try escaping them: \( \) Also see http://lucene.apache.org/java/2_3_0/queryparsersyntax.html#Escaping%20Special%20Characters Doron On Sun, Feb 10, 2008 at 9:03 AM, saikrishna venkata pendyala [EMAIL PROTECTED] wrote: Hi, I am facing

Offsets-highlight newbie question

2008-02-10 Thread Katya
Dear All, I'm relatively new to Java and especially new to Lucene. I study Computational Linguistics and this term we are required to make projects using Lucene. My choice was to write a user-friendly application, which could allow the user to create collections of text files and then search through

RE: IndexWriter: setRAMBufferSizeMB

2008-02-10 Thread spring
Thank you. So I will call flush in 2.3 (and may lose data when machine dies) and commit() in 2.4+ (here a sync() will save the data). -Original Message- From: Michael McCandless [mailto:[EMAIL PROTECTED] Sent: Friday, 8 February 2008 21:01 To: java-user@lucene.apache.org Subject:

Re: IndexWriter: setRAMBufferSizeMB

2008-02-10 Thread Michael McCandless
Exactly! Mike [EMAIL PROTECTED] wrote: Thank you. So I will call flush in 2.3 (and may lose data when machine dies) and commit() in 2.4+ (here a sync() will save the data). -Original Message- From: Michael McCandless [mailto:[EMAIL PROTECTED] Sent: Friday, 8 February 2008 21:01

Re: problem with Whitespace analyzer

2008-02-10 Thread Erik Hatcher
QueryParser uses special syntax, which can get in the way, for operators and grouping, etc. Parentheses are part of that special syntax, and need to be backslash escaped for QueryParser to skip treating them as grouping operators, for example: Ajit_\(Agarkar\) Erik On Feb 10,
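A minimal sketch of the backslash escaping described in this thread, written in plain Java so it runs without Lucene on the classpath. The class name QueryEscapeSketch, the method name, and the exact character set are assumptions made for illustration; the query parser syntax documentation is the authoritative list, and Lucene's own QueryParser.escape(String) is the thing to prefer in real code.

```java
// Hypothetical helper, not part of Lucene: backslash-escapes single characters
// that the query syntax treats as operators, so they are matched literally.
final class QueryEscapeSketch {
    // Assumed set of special characters; two-character operators such as
    // && and || are deliberately not handled in this sketch.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\";

    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\'); // prefix the special character with a backslash
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the escaped form of the term discussed in the thread.
        System.out.println(escape("Ajit_(Agarkar)"));
    }
}
```

The escaped string is what gets handed to the query parser; the analyzer never sees the backslashes, which is why the WhitespaceAnalyzer itself was not the cause.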

Re: Extracting terms from a query splitting a phrase.

2008-02-10 Thread Doron Cohen
PhraseQuery.extractTerms() returns the terms making up the phrase, and so it is not adequate for 'finding' a single term that represents the phrase query, one that represents the searched entire text. It seems you are trying to obtain a string that can be matched against the displayed text for

Re: Distributed Indexes

2008-02-10 Thread Ruslan Sivak
So nobody's run into anything like this before? The need to share the index between many copies of the app possibly running on multiple servers? Russ Ruslan Sivak wrote: The app does other things than search the index. I'm basically using ColdFusion for the website and have four instances

Re: Faceting with payloads

2008-02-10 Thread Karl Wettin
On 9 Feb 2008 at 00:53, Matt Ronge wrote: On Feb 8, 2008, at 11:17 AM, Karl Wettin wrote: On 6 Feb 2008 at 23:10, Matt Ronge wrote: I may index the token house, which may be found in different places with different types. If the user query contains house, I want to report the number of instances of

Re: Distributed Indexes

2008-02-10 Thread Cedric Ho
On Feb 9, 2008 12:07 AM, Ruslan Sivak [EMAIL PROTECTED] wrote: The app does other things than search the index. I'm basically using ColdFusion for the website and have four instances running on two servers for load balancing. Each app does the searches, and the search times are small, the

Re: large term vectors

2008-02-10 Thread Cedric Ho
Is it a single index? My index is also in the 200G range, but I never managed to get a single index of size 20G and still get acceptable performance (in both searching and updating). So I split my indexes into chunks of 10G. I am curious as to how you manage such a single large index. Cedric

Re: large term vectors

2008-02-10 Thread Briggs
So, I have a question about 'splitting indexes'. I see people say this all over, but how have people been handling this? I'm going to start a new thread, and there probably was one back in the day, but I am going to fire it up again. But, how did you do it? On Feb 10, 2008 9:18 PM, Cedric Ho

Re: problem with Whitespace analyzer

2008-02-10 Thread saikrishna venkata pendyala
Hi, Thanks a lot Cohen and Erik. Yes \) works, I tried it even before. But I was wondering why the Whitespace analyzer is breaking the string at (. Now it's clear, thanks once again. --Saikrishna. On Feb 10, 2008 9:17 PM, Erik Hatcher [EMAIL PROTECTED] wrote: QueryParser uses special syntax,

Re: large term vectors

2008-02-10 Thread Cedric Ho
I guess it would be quite different for different apps. For me, I do index updates on a single machine: index each incoming document into one chunk according to some rule to ensure even distribution. Then copy all the updated indexes to some other machines for searching. Each machine will then
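One way such a chunk-assignment rule could look, as a rough sketch only. The message does not say which rule is actually used; hashing a document identifier modulo the number of chunks is an assumption here, and ChunkRouterSketch, chunkFor, and the sample ids are hypothetical names.

```java
// Hypothetical sketch: route each incoming document to one of N index chunks.
// The hash-modulo rule is an assumption, not the rule described in the thread.
final class ChunkRouterSketch {
    // Returns a chunk number in [0, numChunks) for the given document id.
    static int chunkFor(String docId, int numChunks) {
        if (numChunks <= 0) {
            throw new IllegalArgumentException("numChunks must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(docId.hashCode(), numChunks);
    }

    public static void main(String[] args) {
        String[] ids = {"doc-1", "doc-2", "doc-3", "doc-4"};
        for (String id : ids) {
            System.out.println(id + " -> chunk " + chunkFor(id, 4));
        }
    }
}
```

A rule like this is deterministic, so an update to an existing document lands in the same chunk as the original, and only the chunks that changed need to be copied to the search machines.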