[jira] Created: (LUCENE-842) ParallelMultiSearcher memory leak

2007-03-22 Thread Thomas Connolly (JIRA)
ParallelMultiSearcher memory leak - Key: LUCENE-842 URL: https://issues.apache.org/jira/browse/LUCENE-842 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.1

Positions vs. Term Vectors

2007-03-22 Thread Matt Chaput
Hi, another abstract implementation question: Per Term Position (prox) data vs. Per Doc Term Vectors. Belt and Suspenders? Can't Term Vectors effectively (performantly) replace position data for doing phrase matches? Is there another use of position data that term vectors don't satisfy? D

[jira] Updated: (LUCENE-842) ParallelMultiSearcher memory leak

2007-03-22 Thread Thomas Connolly (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Connolly updated LUCENE-842: --- Environment: Windows XP SP2 and Red Hat EL 4 (was: Windows XP SP1 and Red Hat EL 4) > Paral

[jira] Commented: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.

2007-03-22 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483086 ] Karl Wettin commented on LUCENE-841: Thanks Paul! I'll update the patch to contain the old UTF8 as comments b

Re: Positions vs. Term Vectors

2007-03-22 Thread karl wettin
On 22 Mar 2007 at 10:42, Matt Chaput wrote: Per Term Position (prox) data vs. Per Doc Term Vectors. Belt and Suspenders? Can't Term Vectors effectively (performantly) replace position data for doing phrase matches? Is there another use of position data that term vectors don't satisfy? Does

[jira] Created: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Michael McCandless (JIRA)
improve how IndexWriter uses RAM to buffer added documents -- Key: LUCENE-843 URL: https://issues.apache.org/jira/browse/LUCENE-843 Project: Lucene - Java Issue Type: Improvement

[jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-843: -- Attachment: LUCENE-843.patch I'm attaching a patch with my current state. NOTE: this i

Re: improving RAM usage by IndexWriter

2007-03-22 Thread Michael McCandless
"Chris Hostetter" <[EMAIL PROTECTED]> wrote: > A dirty broken patch is still better than no patch at all -- worst > case scenario: nothing happens; typical scenario: you get some eyeballs > reading your patch even if they can't apply it; best case scenario: > someone else is really excited b

[jira] Commented: (LUCENE-842) ParallelMultiSearcher memory leak

2007-03-22 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483264 ] Doron Cohen commented on LUCENE-842: It is not clear to me how this code demonstrates a mem leak. The program its

[jira] Commented: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.

2007-03-22 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483272 ] Hoss Man commented on LUCENE-841: - Karl: better still would be static constants using the unicode character name...

[jira] Commented: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.

2007-03-22 Thread Doug Cutting (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483288 ] Doug Cutting commented on LUCENE-841: - > For anyone else who ever needs to do this, it's a 10-second job in the f

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Steven Parkes
* Merge policy has problems when you "flush by RAM" (this is true even before my patch). Not sure how to fix yet. Do you mean where one would be trying to use RAM usage to determine when to do a flush? -Original Message- From: Michael McCandless (JIRA) [mailto:[EMAIL PROTECTED] S

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Michael McCandless
"Steven Parkes" <[EMAIL PROTECTED]> wrote: > * Merge policy has problems when you "flush by RAM" (this is true > even before my patch). Not sure how to fix yet. > > Do you mean where one would be trying to use RAM usage to determine when > to do a flush? Right, if you have your indexer m

[jira] Created: (LUCENE-844) org.apache.lucene.index.SegmentInfos.FindSegmentsFile.run() throwing NullPointerException

2007-03-22 Thread Jean-Philippe Robichaud (JIRA)
org.apache.lucene.index.SegmentInfos.FindSegmentsFile.run() throwing NullPointerException - Key: LUCENE-844 URL: https://issues.apache.org/jira/browse/LUCENE-844

[jira] Created: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge

2007-03-22 Thread Michael McCandless (JIRA)
If you "flush by RAM usage" then IndexWriter may over-merge --- Key: LUCENE-845 URL: https://issues.apache.org/jira/browse/LUCENE-845 Project: Lucene - Java Issue Type: Bug Co

[jira] Updated: (LUCENE-842) ParallelMultiSearcher memory leak

2007-03-22 Thread Thomas Connolly (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Connolly updated LUCENE-842: --- Description: When using the org.apache.lucene.search.ParallelMultiSearcher to search on a si

[jira] Resolved: (LUCENE-844) org.apache.lucene.index.SegmentInfos.FindSegmentsFile.run() throwing NullPointerException

2007-03-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-844. --- Resolution: Duplicate Thanks for opening this issue! This is actually fixed already

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Steven Parkes
> EG if you set maxBufferedDocs to say 1 but then it turns out based > on RAM usage you actually flush every 300 docs then the merge policy > will incorrectly merge a level 1 segment (with 3000 docs) in with the > level 0 segments (with 300 docs). This is because the merge policy > looks at th

[jira] Commented: (LUCENE-844) org.apache.lucene.index.SegmentInfos.FindSegmentsFile.run() throwing NullPointerException

2007-03-22 Thread Jean-Philippe Robichaud (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483305 ] Jean-Philippe Robichaud commented on LUCENE-844: Oh, that's great. Will it be released soon against

[jira] Commented: (LUCENE-842) ParallelMultiSearcher memory leak

2007-03-22 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483314 ] Otis Gospodnetic commented on LUCENE-842: - Thomas: Can you write a JUnit test that builds a sample index and

[jira] Updated: (LUCENE-842) ParallelMultiSearcher memory leak

2007-03-22 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-842: --- Attachment: TestParallelMultiSearcherMemLeak.java Best always is a unit test that fails due to the bu

[jira] Commented: (LUCENE-844) org.apache.lucene.index.SegmentInfos.FindSegmentsFile.run() throwing NullPointerException

2007-03-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483327 ] Michael McCandless commented on LUCENE-844: --- Good question, I'm not sure. I think normally a point release

[jira] Commented: (LUCENE-806) Synchronization bottleneck in FieldSortedHitQueue with many concurrent readers

2007-03-22 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483322 ] Otis Gospodnetic commented on LUCENE-806: - Paul: Haven't looked at the patch, but liked that 2x-4x performanc

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Michael McCandless
On Thu, 22 Mar 2007 13:34:39 -0700, "Steven Parkes" <[EMAIL PROTECTED]> said: > > EG if you set maxBufferedDocs to say 1 but then it turns out based > > on RAM usage you actually flush every 300 docs then the merge policy > > will incorrectly merge a level 1 segment (with 3000 docs) in with th

[jira] Resolved: (LUCENE-840) contrib/benchmark unit tests

2007-03-22 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-840. Resolution: Fixed Lucene Fields: (was: [New]) Just committed this. (Note: tests added are

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Steven Parkes
> Right I'm calling a newly created segment (ie flushed from RAM) level > 0 and then a level 1 segment is created when you merge 10 level 0 > segments, level 2 is created when merge 10 level 1 segments, etc. This isn't the way the current code treats things. I'm not saying it's the only way to loo
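
The level scheme quoted above (a freshly flushed segment is level 0, merging 10 level 0 segments gives a level 1 segment, and so on) can be sketched in a few lines. This is only an illustration, not Lucene's merge policy code: the class name, the `inferLevel` helper, and the values 10000 and 10 are assumptions made for the example. It assumes a policy that infers a segment's level purely from its document count relative to a configured buffer size, and shows how that inference could blur two levels when segments are actually flushed by RAM usage.

```java
final class SegmentLevelSketch {

    /**
     * Hypothetical helper: infer a level from document count alone.
     * Level 0 holds up to maxBufferedDocs docs, and each further level
     * holds up to mergeFactor times more than the one below it.
     */
    public static int inferLevel(int docCount, int maxBufferedDocs, int mergeFactor) {
        if (maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need maxBufferedDocs >= 1 and mergeFactor >= 2");
        }
        int level = 0;
        long upperBound = maxBufferedDocs;
        while (docCount > upperBound) {
            upperBound *= mergeFactor;
            level++;
        }
        return level;
    }

    public static void main(String[] args) {
        int mergeFactor = 10;           // assumed value
        int configuredBuffer = 10000;   // assumed configured maxBufferedDocs
        int actualFlushSize = 300;      // RAM-based flushing produced 300-doc segments

        // Judged against the configured buffer size, a flushed segment (300 docs)
        // and a merged segment (3000 docs) land on the same inferred level.
        System.out.println(inferLevel(300, configuredBuffer, mergeFactor));
        System.out.println(inferLevel(3000, configuredBuffer, mergeFactor));

        // Judged against the real flush size, the two segments are one level apart.
        System.out.println(inferLevel(300, actualFlushSize, mergeFactor));
        System.out.println(inferLevel(3000, actualFlushSize, mergeFactor));
    }
}
```

Under these assumed numbers the first pair of calls returns the same level for both segments, while the second pair separates them, which is the kind of mismatch the thread describes.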

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Marvin Humphrey
On Mar 22, 2007, at 3:18 PM, Michael McCandless wrote: Actually is #2 a hard requirement? A lot of Lucene users depend on having document number correspond to age, I think. ISTR Hatcher at least recommending techniques that require it. Do the loose ports of Lucene (KinoSearch, Ferret,

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Michael McCandless
Steven Parkes wrote: >> Right I'm calling a newly created segment (ie flushed from RAM) >> level 0 and then a level 1 segment is created when you merge 10 >> level 0 segments, level 2 is created when merge 10 level 1 segments, >> etc. > > This isn't the way the current code treats things. I'm not

[jira] Updated: (LUCENE-842) ParallelMultiSearcher memory leak

2007-03-22 Thread Thomas Connolly (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Connolly updated LUCENE-842: --- Attachment: search_test_heap.PNG NetBeans 5.5 profile of heap > ParallelMultiSearcher memory

[jira] Updated: (LUCENE-842) ParallelMultiSearcher memory leak

2007-03-22 Thread Thomas Connolly (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Connolly updated LUCENE-842: --- Attachment: search_test_gc.PNG NetBeans 5.5 garbage collection stats > ParallelMultiSearcher

[jira] Closed: (LUCENE-842) ParallelMultiSearcher memory leak

2007-03-22 Thread Thomas Connolly (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Connolly closed LUCENE-842. -- Resolution: Invalid Fix Version/s: 2.1 It was not a bug in Lucene; rather, the NetBeans

RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Steven Parkes
> But when these values have > changed (or, segments were flushed by RAM not by maxBufferedDocs) then > the way it computes level no longer results in the logarithmic policy > that it's trying to implement, I think. That's right. Parts of the implementation assume that the segments are logarithmic

ScoreDocComparator extends Comparator?

2007-03-22 Thread Antony Bowesman
Should ScoreDocComparator extend java.util.Comparator? The existing compare() method has the Javadoc comment @see java.util.Comparator. It would then be useful with Java 1.5's PriorityQueue and that would be good because PriorityQueue has a remove() method which makes it useful for manipulati
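
Even without a change to the interface, a type that only offers a compare() method can be used with java.util.PriorityQueue through a thin adapter. The sketch below is an illustration under stated assumptions: `Hit` and `HitComparator` are hypothetical stand-ins defined here so the example is self-contained, and they are not Lucene's ScoreDoc or ScoreDocComparator.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class ScoreDocAdapterSketch {

    /** Hypothetical stand-in for a scored hit (not Lucene's ScoreDoc). */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hypothetical stand-in: has compare() but does not extend java.util.Comparator. */
    interface HitComparator {
        int compare(Hit a, Hit b);
    }

    /** Wraps a HitComparator so it can be passed wherever a Comparator is expected. */
    public static Comparator<Hit> asComparator(final HitComparator c) {
        return new Comparator<Hit>() {
            @Override
            public int compare(Hit a, Hit b) {
                return c.compare(a, b);
            }
        };
    }

    /** Adds three sample hits to a PriorityQueue and returns doc ids in poll order. */
    public static int[] demo() {
        HitComparator byScoreDescending = new HitComparator() {
            @Override
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score);
            }
        };
        PriorityQueue<Hit> queue = new PriorityQueue<Hit>(11, asComparator(byScoreDescending));
        queue.add(new Hit(1, 0.5f));
        queue.add(new Hit(2, 0.9f));
        queue.add(new Hit(3, 0.1f));

        int[] order = new int[queue.size()];
        int i = 0;
        while (!queue.isEmpty()) {
            order[i++] = queue.poll().doc;
        }
        return order;
    }

    public static void main(String[] args) {
        for (int doc : demo()) {
            System.out.println(doc);
        }
    }
}
```

The adapter keeps the comparison logic in one place; having the interface extend Comparator directly, as the message proposes, would make the wrapper unnecessary.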

Re: ScoreDocComparator extends Comparator?

2007-03-22 Thread Antony Bowesman
Oops. Java 1.5 PriorityQueue.remove(o) would not be useful for ScoreDoc as it would delete the first object where compare(o1, o2) == 0. Antony Should ScoreDocComparator extend java.util.Comparator? The existing compare() method has the Javadoc comment @see java.util.Comparator. It would th

[jira] Reopened: (LUCENE-837) contrib/benchmark QueryMaker and Task Refactorings

2007-03-22 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reopened LUCENE-837: Lucene Fields: [Patch Available] More updates coming shortly. I will attach a patch, but am

[jira] Updated: (LUCENE-837) contrib/benchmark QueryMaker and Task Refactorings

2007-03-22 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-837: --- Attachment: field-selector-bench.patch Here are my changes. I am going to commit shortly > cont

[jira] Commented: (LUCENE-837) contrib/benchmark QueryMaker and Task Refactorings

2007-03-22 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483388 ] Grant Ingersoll commented on LUCENE-837: Committed field-selector-bench.patch on revision 521569 > contrib/b

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-22 Thread Chris Hostetter
: > Actually is #2 a hard requirement? : : A lot of Lucene users depend on having document number correspond to : age, I think. ISTR Hatcher at least recommending techniques that : require it. "Correspond to age" may be misleading as it implies that the actual docid has meaning ... it's more tha

[jira] Commented: (LUCENE-806) Synchronization bottleneck in FieldSortedHitQueue with many concurrent readers

2007-03-22 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483409 ] Hoss Man commented on LUCENE-806: - Paul: I have not had a chance to look at your patch (or most patches i've wanted

[jira] Commented: (LUCENE-837) contrib/benchmark QueryMaker and Task Refactorings

2007-03-22 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483413 ] Doron Cohen commented on LUCENE-837: Hi, I like the new field selector stuff. A few comments: - copyright notice m

[jira] Commented: (LUCENE-842) ParallelMultiSearcher memory leak

2007-03-22 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483418 ] Doron Cohen commented on LUCENE-842: Thomas, thanks for following up closely on this (and great that this is not