[jira] Created: (LUCENE-770) CfsExtractor tool

2007-01-11 Thread Otis Gospodnetic (JIRA)
CfsExtractor tool - Key: LUCENE-770 URL: https://issues.apache.org/jira/browse/LUCENE-770 Project: Lucene - Java Issue Type: New Feature Components: Index Affects Versions: 2.1 Reporter: Otis Gospodn

[jira] Updated: (LUCENE-770) CfsExtractor tool

2007-01-11 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated LUCENE-770: Attachment: LUCENE-770.patch > CfsExtractor tool > - > > Ke

[jira] Updated: (LUCENE-741) Field norm modifier (CLI tool)

2007-01-11 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated LUCENE-741: Attachment: LUCENE-741.patch > Field norm modifier (CLI tool) > ---

[jira] Commented: (LUCENE-140) docs out of order

2007-01-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463872 ] Michael McCandless commented on LUCENE-140: --- Phew! I'm glad we finally got to the bottom of this one. Tha

[jira] Commented: (LUCENE-140) docs out of order

2007-01-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463875 ] Michael McCandless commented on LUCENE-140: --- Actually, this reminds me that, as of lockless commits, there

[jira] Resolved: (LUCENE-140) docs out of order

2007-01-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-140. --- Resolution: Fixed Fix Version/s: 2.1 Resolving this now, finally (I'll move th

Re: Beyond Lucene 2.0 Index Design

2007-01-11 Thread Grant Ingersoll
Hi Jeff, Wondering if you (and/or others) would be interested in taking a look at https://issues.apache.org/jira/browse/LUCENE-662 and vetting the new interfaces, etc. to see if you could come up w/ a prototype implementation. This would help move along 662 as it would sort out some of t

IndexWriter forceOptimize() ?

2007-01-11 Thread Otis Gospodnetic
Hi, What do people here think about adding forceOptimize() to IndexWriter? public synchronized void forceOptimize() throws IOException { flushRamSegments(); int minSegment = segmentInfos.size() - mergeFactor; mergeSegments(minSegment < 0 ? 0 : minSegment); } I need it

Re: Lockless commits -- great stuff!

2007-01-11 Thread Michael McCandless
Marvin Humphrey wrote: I've finished integrating the lockless commits concept into KinoSearch, and I wanted to pop in and say that it's a very nice piece of work. Real outside-the-box thinking -- or at least outside my box. :) Nothing better than an innovation which solves long-standing pro

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread Doron Cohen
Otis Gospodnetic <[EMAIL PROTECTED]> wrote on 11/01/2007 06:25:59: > Hi, > > What do people here think about adding forceOptimize() to IndexWriter? > > public synchronized void forceOptimize() throws IOException { > flushRamSegments(); > int minSegment = segmentInfos.size() - mergeFa

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread Otis Gospodnetic
Hi Doron, Yeah, you are right, adding that (empty) Doc would force the optimize to actually optimize. I was trying to avoid doing that and forceOptimize() looked cleaner but I'm not sure if others would agree. Are there other situations where one would want to force index optimization eve

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread Artem Vasiliev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463969 ] Artem Vasiliev commented on LUCENE-769: --- Hi guys! Thanks for the valuable comments. What feedback! :) I'd like to

Re: Lockless commits -- great stuff!

2007-01-11 Thread Marvin Humphrey
On Jan 11, 2007, at 6:48 AM, Michael McCandless wrote: I too am happy that we have no more commit lock :) Not just that. :) No more lock directory, since we can put write.lock in the index directory itself. No more lock file name munging, since lock files from different indexes no lon

[jira] Updated: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread Artem Vasiliev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Vasiliev updated LUCENE-769: -- Attachment: StoredFieldSorting.patch > [PATCH] Performance improvement for some cases of sorted

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread Artem Vasiliev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463996 ] Artem Vasiliev commented on LUCENE-769: --- Renamed classes as Hoss proposed. Tried to hide DocFieldCachingIndexRe

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread Chris Hostetter
: What do people here think about adding forceOptimize() to IndexWriter? I like the idea, but i don't have any value add to offer to the discussion of whether the implementation you suggest is "safe" ... in particular i notice that the current optimize method is an iterative loop, presumably to ma

Re: [jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2007-01-11 Thread Chris Hostetter
: Oops... I had the impression that compiling with compliance level 1.4 is : sufficient to prevent this, but guess I need to read again what that : compliance level setting guarantees exactly. NOTE: see LUCENE-718 for an explanation of your problem, and a possible solution i've been toying with.

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread robert engels
I agree with the boolean addition. optimize(false) is a request to maybe optimize; optimize(true) should always optimize to a single segment. optimize(false) might check some parameter as to the maximum number of segments allowed before an actual optimize is performed. On Jan 11, 2007, at

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread Chuck Williams (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464012 ] Chuck Williams commented on LUCENE-769: --- I have this same issue with a constantly changing large index where us

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread Chris Hostetter
: optimize(false) is a request to maybe optimize, optimize(true) always : should optimize to a single segment : : optimize(false) might check some parameter as to the maximum number : of segments allowed before an actual optimize is performed. maybe it should be optimize(int minSegmentCountToSkip

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread Artem Vasiliev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464015 ] Artem Vasiliev commented on LUCENE-769: --- Ok guys, I think I'm finished on this. Feel free to include it in Luce

Re: [jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread robert engels
I would assume the "incremental" field cache would be very similar to my "incremental" query filter. I have found this to be the #1 performance improvement I've been able to make with Lucene - especially for highly dynamic indexes. I have attached again to this email the code: On Jan 11

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread Doron Cohen
Otis Gospodnetic <[EMAIL PROTECTED]> wrote on 11/01/2007 09:30:08: > > I'd actually appreciate it if you could look at https://issues. > apache.org/jira/browse/LUCENE-741 . The code can completely remove > norms for a given field, but this assumes a pre-.nrm index structure > (.fN field norms file

[jira] Updated: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] robert engels updated LUCENE-769: - Attachment: QueryFilter.java > [PATCH] Performance improvement for some cases of sorted search >

Re: Beyond Lucene 2.0 Index Design

2007-01-11 Thread Marvin Humphrey
On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote: e. f. ],...[docN, freq ,]) How do you build an efficient PhraseScorer to work with an impact- sorted posting list? The way PhraseScorer currently works is: find a doc that contains all terms, then see if the terms occur consecutively in

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread Yonik Seeley
On 1/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: maybe it should be optimize(int minSegmentCountToSkip), with optimize(0) forcing an optimize even if there is only 1 segment, and optimize() remaining undeprecated and using a "sensible default" (whatever that may be ... 1 perhaps?) If we a
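
The optimize(int minSegmentCountToSkip) idea quoted above can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyWriter, addSegment(), segmentCount() and mergeCount() are hypothetical names invented here, segments are modelled as plain document counts, and nothing below is Lucene's IndexWriter or its merge logic. The default of 1 for the no-argument form follows the "1 perhaps?" guess in the message, not any committed behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for an index writer (hypothetical; NOT Lucene's IndexWriter). */
class ToyWriter {
    // Each entry is the document count of one hypothetical segment.
    private final List<Integer> segments = new ArrayList<>();
    // Number of times a merge-to-one-segment pass actually ran.
    private int mergeCount = 0;

    void addSegment(int docCount) { segments.add(docCount); }
    int segmentCount() { return segments.size(); }
    int mergeCount() { return mergeCount; }

    /** No-arg form delegates to the assumed default of 1: skip a single-segment index. */
    void optimize() { optimize(1); }

    /**
     * Skips the work when the index has at most minSegmentCountToSkip segments;
     * otherwise collapses everything into one segment. optimize(0) therefore
     * forces a pass even when only one segment exists.
     */
    void optimize(int minSegmentCountToSkip) {
        if (segments.size() <= minSegmentCountToSkip) {
            return; // nothing to do under this threshold
        }
        int total = 0;
        for (int docs : segments) {
            total += docs;
        }
        segments.clear();
        segments.add(total);
        mergeCount++;
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.addSegment(10);
        writer.optimize();   // single segment, default threshold: skipped
        System.out.println("merges after optimize():  " + writer.mergeCount());
        writer.optimize(0);  // forced pass over the single segment
        System.out.println("merges after optimize(0): " + writer.mergeCount());
    }
}
```

In this model a boolean variant would be a thin wrapper, with optimize(true) mapping to optimize(0) and optimize(false) to optimize(1), which is one way to read the trade-off the thread is weighing between exposing a segment count and exposing a flag.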

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464038 ] Hoss Man commented on LUCENE-769: - Artem: while i agree with Yonik/Chuck's comments about your performance tests pro

Re: Beyond Lucene 2.0 Index Design

2007-01-11 Thread jian chen
I also have the same question. It seems very hard to do phrase-based queries efficiently. I think most search engines do phrase-based queries, or at least appear to. So, as in Google, the query result must contain all the words the user searched on. It seems to me that the impact-sorted list

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread Chuck Williams (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464055 ] Chuck Williams commented on LUCENE-769: --- Robert, Could you attach your current implementation of reopen() as w

[jira] Updated: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] robert engels updated LUCENE-769: - Attachment: IndexReaderUtils.java > [PATCH] Performance improvement for some cases of sorted sear

[jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464056 ] robert engels commented on LUCENE-769: -- The IndexReaderUtils I posted is not compilable - there are a few more c

Re: Beyond Lucene 2.0 Index Design

2007-01-11 Thread Marvin Humphrey
On Jan 11, 2007, at 2:30 PM, jian chen wrote: It seems to me that the impact-sorted list makes sense if you are trying to do pure vector space based ranking. This is from what I have read from the research papers. They all talk about how to optimize the vector space model using this imp

Re: [jira] Commented: (LUCENE-140) docs out of order

2007-01-11 Thread Chris Hostetter
: I think we should deprecate the "create" argument to : FSDirectory.getDirectory(*) and leave only the create argument in : IndexWriter's constructors. Am I missing something? Is there a : reason not to do this? i actually wonder about the problem from the opposite direction: to me it makes s

Re: [jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

2007-01-11 Thread Chris Hostetter
: Chuck Williams commented on LUCENE-769: : --- : : Robert, : : Could you attach your current implementation of reopen() as well? The : attachment did not come through in your java-dev message today, or the : one from 12/11. I'd like to look at an incremental

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread Otis Gospodnetic
Yeah, I actually had: public int segments() { return segmentInfos.size(); } in my IndexReader, but then erased it precisely because I thought this was exposing too much about the impl. I think optimize(int) that Chris mentioned exposes too much. I thought about having optimize(boolean force) i

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread Otis Gospodnetic
Doron, Maybe my browser is misbehaving, but I don't see your comments in http://issues.apache.org/jira/browse/LUCENE-741 . Didn't see the JIRA email with them either... Otis - Original Message From: Doron Cohen <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Thursday, Januar

Re: Beyond Lucene 2.0 Index Design

2007-01-11 Thread Ming Lei
Marvin, Several posts back on this thread, I talked about an impact-sorted posting list algorithm for conjunctive boolean queries. Your concerns about impact-sorting in the boolean retrieval model are valid. But practically, the approximation (as in my original post) should work well enough for large corp

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread Chris Hostetter
: I think optimize(int) that Chris mentioned exposes too much. I thought : about having optimize(boolean force) in place of optimize(), but then : we'd have to deprecate, so I opted for forceOptimize() that, I feel : exposes a little less. i have no strong feelings about exposing the number of s

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread Otis Gospodnetic
One day I read email in a different order, I miss replies like this. If optimize(boolean force) looks more attractive than forceOptimize(), that's fine by me. I just want to be able to force the cfs index, even if it's already optimized, to expand. Getting it to have a single segment is just a

[jira] Commented: (LUCENE-741) Field norm modifier (CLI tool)

2007-01-11 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464105 ] Doron Cohen commented on LUCENE-741: I was looking at what it would take to make this work with .nrm file as well

Re: IndexWriter forceOptimize() ?

2007-01-11 Thread Doron Cohen
Otis Gospodnetic <[EMAIL PROTECTED]> wrote on 11/01/2007 20:17:31: > Doron, > > Maybe my browser is misbehaving, but I don't see your comments in > http://issues.apache.org/jira/browse/LUCENE-741 . Didn't see the > JIRA email with them either... > > Otis Otis, your browser is perfect, just that

[jira] Updated: (LUCENE-741) Field norm modifier (CLI tool)

2007-01-11 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-741: --- Attachment: for.nrm.patch > Field norm modifier (CLI tool) > -- > >

[jira] Updated: (LUCENE-741) Field norm modifier (CLI tool)

2007-01-11 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-741: --- Attachment: (was: for.nrm.patch) > Field norm modifier (CLI tool) > -

[jira] Updated: (LUCENE-741) Field norm modifier (CLI tool)

2007-01-11 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-741: --- Attachment: for.nrm.patch > Field norm modifier (CLI tool) > -- > >

[jira] Commented: (LUCENE-741) Field norm modifier (CLI tool)

2007-01-11 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464109 ] Doron Cohen commented on LUCENE-741: Attached for.nrm.patch was very noisy - so I replaced it with one created wi

Re: Beyond Lucene 2.0 Index Design

2007-01-11 Thread Marvin Humphrey
On Jan 11, 2007, at 8:37 PM, Ming Lei wrote: But practically, the approximation (as in my original post) should work well enough for large corpus and relevancy-driven retrieval. The saving on disk access for large corpus (implies very long posting list) will be huge by impact-sorted posting