[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: index.presharedstores.nocfs.zip index.presharedstores.cfs.zi

[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: LUCENE-843.take9.patch OK, I attached a new version (take9) of the patch th

[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-15 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: LUCENE-843.take8.patch Attached latest patch. I think this patch is ready

[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-06-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: LUCENE-843.take7.patch Latest working patch attached. I've cutover to usin

[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-05-21 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: LUCENE-843.take6.patch Attached latest patch. I'm now working towards simp

[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-30 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: LUCENE-843.take5.patch I attached a new iteration of the patch. It's quite

[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-02 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: LUCENE-843.take4.patch Another rev of the patch. All tests pass except dis

[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-28 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: LUCENE-843.take3.patch Another rev of the patch: * Got thread concurren

[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-25 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: LUCENE-843.take2.patch New rev of the patch: * Fixed at least one data c

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Michael McCandless
"Grant Ingersoll" <[EMAIL PROTECTED]> wrote: > I've only been loosely following this... > > Do you think it is possible to separate the stored/term vector > handling into a separate patch against the current trunk? This seems > like a quick win and I know it has been speculated about before.

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Grant Ingersoll
I've only been loosely following this... Do you think it is possible to separate the stored/term vector handling into a separate patch against the current trunk? This seems like a quick win and I know it has been speculated about before. On Mar 23, 2007, at 12:00 PM, Michael McCandless wro

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Ning Li
On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote: Yes the code re-computes the level of a given segment from the current values of maxBufferedDocs & mergeFactor. But when these values have changed (or, segments were flushed by RAM not by maxBufferedDocs) then the way it computes level no

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Ning Li
On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote: Right I'm calling a newly created segment (ie flushed from RAM) level 0 and then a level 1 segment is created when you merge 10 level 0 segments, level 2 is created when merge 10 level 1 segments, etc. That is not how the current merge p
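
The level scheme in the quoted text (a freshly flushed segment is level 0; merging 10 segments of one level yields a segment of the next level) can be sketched as a small simulation. This is only an illustration, not Lucene's merge code: the function name `flush_and_merge` and the explicit per-segment level tag are assumptions made for the example, and, as the reply itself says, the merge policy of the time did not work this way.

```python
def flush_and_merge(num_flushes, merge_factor=10):
    """Return the segment levels (oldest first) after num_flushes flushes.

    Hypothetical model: every flush adds a level-0 segment, and whenever
    the newest merge_factor segments share a level they are merged into
    a single segment of the next level.
    """
    levels = []
    for _ in range(num_flushes):
        levels.append(0)  # a newly flushed segment is level 0
        # Cascade: keep merging while the newest merge_factor segments share a level.
        while (len(levels) >= merge_factor
               and len(set(levels[-merge_factor:])) == 1):
            merged_level = levels[-1] + 1
            del levels[-merge_factor:]
            levels.append(merged_level)
    return levels


print(flush_and_merge(9))    # nine level-0 segments, nothing merged yet
print(flush_and_merge(10))   # ten level-0 segments collapse into one level-1 segment
print(flush_and_merge(100))  # ten level-1 segments collapse into one level-2 segment
```

Because the level is stored with the segment rather than inferred from its size, this model does not depend on how many documents each flush happened to contain.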

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Michael McCandless
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Merging is costly because you read all data in then write all data > > out, so you want to minimize, for each byte of data in the index, > > how many times it will be "serviced" (read i

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Yonik Seeley
On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote: Merging is costly because you read all data in then write all data out, so you want to minimize, for each byte of data in the index, how many times it will be "serviced" (read in, written out) as part of a merge. Avoiding the re-w

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Michael McCandless
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > > We say that > > developers should not rely on docIDs but people still seem to rely on > > their monotonic ordering (even though they change). > > Yes. If the benefits of removing that guarantee are large enough, we > could consider dumping it... but

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Yonik Seeley
On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote: We say that developers should not rely on docIDs but people still seem to rely on their monotonic ordering (even though they change). Yes. If the benefits of removing that guarantee are large enough, we could consider dumping it... but

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Steven Parkes
e.org Subject: Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > : > Actually is #2 a hard requirement? > : > : A lot of Lucene users depend on having document number correspond to &

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Erik Hatcher
On Mar 22, 2007, at 8:13 PM, Marvin Humphrey wrote: On Mar 22, 2007, at 3:18 PM, Michael McCandless wrote: Actually is #2 a hard requirement? A lot of Lucene users depend on having document number correspond to age, I think. ISTR Hatcher at least recommending techniques that require it.

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Michael McCandless
"Chris Hostetter" <[EMAIL PROTECTED]> wrote: > : > Actually is #2 a hard requirement? > : > : A lot of Lucene users depend on having document number correspond to > : age, I think. ISTR Hatcher at least recommending techniques that > : require it. > > "Correspond to age" may be misleading as it

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Chris Hostetter
: > Actually is #2 a hard requirement? : : A lot of Lucene users depend on having document number correspond to : age, I think. ISTR Hatcher at least recommending techniques that : require it. "Correspond to age" may be misleading as it implies that the actual docid has meaning ... it's more tha

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Steven Parkes
> But when these values have > changed (or, segments were flushed by RAM not by maxBufferedDocs) then > the way it computes level no longer results in the logarithmic policy > that it's trying to implement, I think. That's right. Parts of the implementation assume that the segments are logarithmic

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Michael McCandless
Steven Parkes wrote: >> Right I'm calling a newly created segment (ie flushed from RAM) >> level 0 and then a level 1 segment is created when you merge 10 >> level 0 segments, level 2 is created when merge 10 level 1 segments, >> etc. > > This isn't the way the current code treats things. I'm not

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Marvin Humphrey
On Mar 22, 2007, at 3:18 PM, Michael McCandless wrote: Actually is #2 a hard requirement? A lot of Lucene users depend on having document number correspond to age, I think. ISTR Hatcher at least recommending techniques that require it. Do the loose ports of Lucene (KinoSearch, Ferret,

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Steven Parkes
> Right I'm calling a newly created segment (ie flushed from RAM) level > 0 and then a level 1 segment is created when you merge 10 level 0 > segments, level 2 is created when merge 10 level 1 segments, etc. This isn't the way the current code treats things. I'm not saying it's the only way to loo

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Michael McCandless
On Thu, 22 Mar 2007 13:34:39 -0700, "Steven Parkes" <[EMAIL PROTECTED]> said: > > EG if you set maxBufferedDocs to say 1 but then it turns out based > > on RAM usage you actually flush every 300 docs then the merge policy > > will incorrectly merge a level 1 segment (with 3000 docs) in with th

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Steven Parkes
> EG if you set maxBufferedDocs to say 1 but then it turns out based > on RAM usage you actually flush every 300 docs then the merge policy > will incorrectly merge a level 1 segment (with 3000 docs) in with the > level 0 segments (with 300 docs). This is because the merge policy > looks at th
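
To make the failure mode in the quoted paragraph concrete, here is a minimal sketch of a policy that infers a segment's level from its document count alone. It is not Lucene's implementation: `level_from_doc_count` is a hypothetical helper, the rule that level 0 holds up to maxBufferedDocs documents and each further level holds mergeFactor times more is an assumed model, and the settings 10000 and 10 are chosen purely for illustration (the value in the message is cut off and is not reproduced here).

```python
def level_from_doc_count(doc_count, max_buffered_docs, merge_factor):
    """Assumed model: level 0 holds up to max_buffered_docs documents,
    and each further level holds merge_factor times more."""
    level = 0
    upper_bound = max_buffered_docs
    while doc_count > upper_bound:
        upper_bound *= merge_factor
        level += 1
    return level


MAX_BUFFERED_DOCS = 10000  # hypothetical setting, for illustration only
MERGE_FACTOR = 10          # hypothetical setting, for illustration only

flushed = 300              # segment flushed early because of RAM usage
merged = 10 * flushed      # ten such segments merged together: 3000 docs

print(level_from_doc_count(flushed, MAX_BUFFERED_DOCS, MERGE_FACTOR))
print(level_from_doc_count(merged, MAX_BUFFERED_DOCS, MERGE_FACTOR))
```

Under these assumed settings both calls report level 0, so the 3000-document segment is treated as a peer of the 300-document segments, which is the mis-merge the message describes.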

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Michael McCandless
"Steven Parkes" <[EMAIL PROTECTED]> wrote: > * Merge policy has problems when you "flush by RAM" (this is true > even before my patch). Not sure how to fix yet. > > Do you mean where one would be trying to use RAM usage to determine when > to do a flush? Right, if you have your indexer m

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Steven Parkes
PROTECTED] Sent: Thursday, March 22, 2007 10:09 AM To: java-dev@lucene.apache.org Subject: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents [ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:al

[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: LUCENE-843.patch I'm attaching a patch with my current state. NOTE: this i