Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-08 Thread Otis Gospodnetic
Hi Oliver, I think Yonik simply misunderstood you in that earlier email. Have you tried modifying that FieldSortedHitQueue class and making the appropriate method(s) synchronized? It sounds like that would fix the issue. If it does, please let us know. Otis - Original Message From: [EM

RE: Poor performance "race condition" in FieldSortedHitQueue

2006-08-08 Thread Oliver Hutchison
> The nature of the field cache itself means that the first sort on a particular field can take a long, long time. Synchronization won't really help that much.
I think you may be misunderstanding my description (probably because it was not clear enough :). The issue is not that the first se

Re: About the use of HitCollector

2006-08-08 Thread hu andy
Hey, Ryan, Thanks for your reply. The scenario is I use a custom Filter which gets some information from a database table which consists of hundreds of thousands of rows. I use the IndexSearcher.search(query, filter, hitcollector). I found it consumed more time with the filter than that without no

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-08 Thread Paul Smith
On 09/08/2006, at 12:47 PM, Yonik Seeley wrote: The nature of the field cache itself means that the first sort on a particular field can take a long, long time. Synchronization won't really help that much. I'm not so sure I agree with that. If you have, say, 4 threads concurrently starti
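
The situation Paul describes, several threads starting the same first-time sort at once, can be illustrated without Lucene. The sketch below is not FieldSortedHitQueue code; OnceOnlyCache, its loader, and the field name "title" are hypothetical stand-ins, and it relies on ConcurrentHashMap.computeIfAbsent from Java 8 and later, which was not available in 2006. It only shows the general idea of letting one thread do the expensive load while the others wait for its result.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical per-field cache; an illustration only, not Lucene code. */
class OnceOnlyCache<V> {
    private final ConcurrentHashMap<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> loader;

    OnceOnlyCache(Function<String, V> loader) {
        this.loader = loader;
    }

    V get(String field) {
        // The loader runs at most once per key; concurrent callers for the
        // same key block until the first load has finished.
        return cache.computeIfAbsent(field, loader);
    }

    /** Starts the given number of threads on one field and returns how many loads ran. */
    static int demo(int threads) {
        AtomicInteger loads = new AtomicInteger();
        OnceOnlyCache<int[]> c = new OnceOnlyCache<>(field -> {
            loads.incrementAndGet();
            return new int[1000]; // stand-in for an expensive sort cache
        });
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all threads at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                c.get("title");
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return loads.get();
    }
}
```

With an unsynchronized check-then-put, each of the four threads could run the loader itself; here OnceOnlyCache.demo(4) performs a single load.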

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-08 Thread Yonik Seeley
The nature of the field cache itself means that the first sort on a particular field can take a long, long time. Synchronization won't really help that much. There are two ways around this... 1) incrementally generate the field cache (hard... not currently supported by Lucene) 2) warm searchers

Re: "Field Grouping" query restrained to same field on a 'multi'-field'

2006-08-08 Thread Chris Hostetter
: That would also score documents higher the closer together the words : appeared (which may or may not be desirable). if it's not desirable, it could be "fixed" by overriding the sloppyFreq method of your Similarity. -Hoss -
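
As a rough illustration of Hoss's suggestion: in Lucene the method is sloppyFreq(int distance) on Similarity, and the override would live in a Similarity subclass. The classes below are self-contained stand-ins, not Lucene's, so that the example compiles on its own. The 1 / (distance + 1) formula in DefaultLikeSloppyFreq is believed to match Lucene's default, but treat that as an assumption and check the Javadoc for your version.

```java
/** Hypothetical stand-in for the sloppyFreq part of a Similarity. */
abstract class SloppyFreqModel {
    /** Weight given to a sloppy phrase match whose terms are `distance` edits apart. */
    abstract float sloppyFreq(int distance);
}

/** Assumed default behaviour: closer matches contribute more. */
class DefaultLikeSloppyFreq extends SloppyFreqModel {
    @Override
    float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }
}

/** The "fix": every match within the slop counts the same, regardless of distance. */
class FlatSloppyFreq extends SloppyFreqModel {
    @Override
    float sloppyFreq(int distance) {
        return 1.0f;
    }
}
```

With the flat version, a document where the words are five positions apart contributes the same phrase frequency as one where they are adjacent.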

Re: Linear search using reader vs. scorer implementation

2006-08-08 Thread Paul Borgermans
Hi Mathias I delved a bit further in the lucene docs and the book "Lucene in action" (Ch 6.1, pp194-201): an alternative approach may be the use of a custom sort with a dedicated implementation of the SortComparatorSource, taking as arguments the array of vectors for which a "distance" needs to b

Re: Multiple lock files

2006-08-08 Thread Michael McCandless
Simon Willnauer wrote: The index writer creates the lock in its constructor via the public FSDirectory makeLock method. regards simon On 8/8/06, Leandro Saad <[EMAIL PROTECTED]> wrote: I'm trying to use them, and I may be wrong, but I can't unlock the dir before I create the Directory right?

Re: "Field Grouping" query restrained to same field on a 'multi'-field'

2006-08-08 Thread Yonik Seeley
On 8/8/06, Laurent Hoss <[EMAIL PROTECTED]> wrote: Suppose having an Index containing Lucene documents, having multiple fields (equally) named 'paragraph'. Now I want to make a "Field Grouping" query (described in: http://lucene.apache.org/java/docs/queryparsersyntax.html ) "paragraph:( word1 AN

Re: Multiple lock files

2006-08-08 Thread Simon Willnauer
The index writer creates the lock in its constructor via the public FSDirectory makeLock method. regards simon On 8/8/06, Leandro Saad <[EMAIL PROTECTED]> wrote: I'm trying to use them, and I may be wrong, but I can't unlock the dir before I create the Directory right? Do you know if the lock

Re: Multiple lock files

2006-08-08 Thread Leandro Saad
I want to use the same lock dir, but remove only the associated lock file when I start the application. :: Leandro On 8/8/06, Simon Willnauer <[EMAIL PROTECTED]> wrote: You can start your applications with a system property set: "org.apache.lucene.lockDir" to specify your lock directory Hope

Re: Multiple lock files

2006-08-08 Thread Simon Willnauer
You can start your applications with a system property set: "org.apache.lucene.lockDir" to specify your lock directory Hope that helps... regards Simon On 8/8/06, Leandro Saad <[EMAIL PROTECTED]> wrote: Yeah. But how do I know if a lock file is related to an index or app? I don't want to remov
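
A minimal sketch of Simon's suggestion, giving each application its own lock directory through the "org.apache.lucene.lockDir" system property. The property name comes from the message above; the LockDirConfig class and the directory path in the usage note are hypothetical. It is an assumption here that Lucene reads the property when its directory classes are first loaded, so the property should be set before any Lucene class is touched, or passed on the command line as -Dorg.apache.lucene.lockDir=... instead.

```java
import java.io.File;

/** Hypothetical helper; sets the lock directory property before Lucene is used. */
class LockDirConfig {
    static final String LOCK_DIR_PROPERTY = "org.apache.lucene.lockDir";

    /** Points the property at the given directory and returns the value now in effect. */
    static String useLockDir(File dir) {
        System.setProperty(LOCK_DIR_PROPERTY, dir.getAbsolutePath());
        return System.getProperty(LOCK_DIR_PROPERTY);
    }
}
```

Calling LockDirConfig.useLockDir(new File("/var/locks/app-a")) at the very start of main would keep that application's lock files apart from those of other applications on the same box.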

Re: Multiple lock files

2006-08-08 Thread Leandro Saad
I'm trying to use them, and I may be wrong, but I can't unlock the dir before I create the Directory right? Do you know if the lock is created when I create the Directory? :: Leandro On 8/8/06, Michael Busch <[EMAIL PROTECTED]> wrote: > Yeah. But how do I know if a lock file is related to a

Re: Multiple lock files

2006-08-08 Thread Michael Busch
Yeah. But how do I know if a lock file is related to an index or app? I don't want to remove a lock file that another app is using Leandro, check out the static method of IndexReader: unlock(Directory). Link: http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#unl

Re: Multiple lock files

2006-08-08 Thread Leandro Saad
Yeah. But how do I know if a lock file is related to an index or app? I don't want to remove a lock file that another app is using :: Leandro On 8/8/06, Michael McCandless <[EMAIL PROTECTED]> wrote: > How do I remove lucene locks (startup) if there are multiple applications > using lucene on

Re: Lucene hits.length()

2006-08-08 Thread Erick Erickson
I'll take a stab at it. When are you opening/closing your searcher? When you open a searcher, you get a snapshot of the index at that instant, and subsequent modifications aren't visible until you open a new searcher (at least I think I've got this right). And I'm sure this also interacts with

Re: More like this returning similarities that are too generic

2006-08-08 Thread Chad Hardin
You're soo right! I'm totally new to Lucene (and text analysis, searching etc), but now that you showed me I "get it". Thank you so much for your reply. Chad On Aug 8, 2006, at 12:45 AM, Chris Hostetter wrote: I've never used MoreLikeThis myself, but based on how i know it works, your

"Field Grouping" query restrained to same field on a 'multi'-field'

2006-08-08 Thread Laurent Hoss
Hi Suppose having an Index containing Lucene documents, having multiple fields (equally) named 'paragraph'. Now I want to make a "Field Grouping" query (described in: http://lucene.apache.org/java/docs/queryparsersyntax.html ) "paragraph:( word1 AND word2 )" retrieving only documents where the

Re: Multiple lock files

2006-08-08 Thread Michael McCandless
How do I remove lucene locks (startup) if there are multiple applications using lucene on the same box and all use the same lock dir? The lock files are just files, so you can up and remove them. However: this is in general dangerous and should not be necessary. Lucene uses the lock files to

Re: About the use of HitCollector

2006-08-08 Thread Ryan O'Hara
Hey Andy, If you have enough RAM, try using FieldCache: String[] fieldYouWant = FieldCache.DEFAULT.getStrings (searcher.getIndexReader(), "fieldYouWant"); searcher.search(query, new HitCollector(){ public void collect(int doc, float score){ doWhatYouWant(fieldYouWant[do

Multiple lock files

2006-08-08 Thread Leandro Saad
Hi all. How do I remove lucene locks (startup) if there are multiple applications using lucene on the same box and all use the same lock dir? -- Leandro Rodrigo Saad Cruz CTO - InterBusiness Technologies db.apache.org/ojb guara-framework.sf.net xingu.sf.net

Lucene hits.length()

2006-08-08 Thread Marcus Falck
I have noticed some strange behavior when searching my lucene index. I'm adding 500.000 docs to an index. MergeFactor = 10 MinMerge = 5000 When 4 have been added ( just before the first 10 * 5000 merge ) the hits.length() is reporting around 1000 hits for a keyword (which by the wa

Re: About the use of HitCollector

2006-08-08 Thread Simon Willnauer
One thing is a bit confusing... On 8/8/06, Simon Willnauer <[EMAIL PROTECTED]> wrote: Hey Andy, It would be interesting how many ids you include into your query. I do have just a couple of usergroups for that. I create a BooleanQuery BooleanQuery q = new BooleanQuery(); q.add(new BooleanClaus

Re: About the use of HitCollector

2006-08-08 Thread Simon Willnauer
Hey Andy, It would be interesting how many ids you include into your query. I do have just a couple of usergroups for that. I create a BooleanQuery BooleanQuery q = new BooleanQuery(); q.add(new BooleanClause(new TermQuery(new Term("idfield","id")),BooleanClause.Occur.MUST)); QueryFilter filt

Re: Stemmer Implementation Strategy - feedback?

2006-08-08 Thread eks dev
I would suggest you have a look at the Egothor stemmer (http://www.egothor.org/book/bk01ch01s06.html); it can be trained rather easily (if your only use of "roots" is for searching). I have only heard of it as a good thing, never tried it. On Aug 4, 2006, at 1:29 PM, Marios Skounakis wrote: > > > >

Re: More like this returning similarities that are too generic

2006-08-08 Thread Chris Hostetter
I've never used MoreLikeThis myself, but based on how I know it works, your problem probably has more to do with the size of your test corpus and the frequency of the words in your docs than with the size of the docs themselves. : There's still the issue of the queries from MoreLikeThis not : return