Re: [jira] Updated: (LUCENE-1381) Hanging while indexing/digesting on multiple threads

2008-09-11 Thread robert engels
By the stacktraces, I think there may be a bug in MethodUtils. By its name it would appear to be static, with a "weak hash map" of names to methods, but it appears that multiple threads are accessing the same map without synchronization. This may be wreaking havoc with the WeakReference

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread Michael McCandless
What if we wrap the value in a WeakReference, but secondarily hold a hard reference to it in a "normal" list? Then, when TermInfosReader is closed we clear that list of all its hard references, at which point GC will be free to reclaim the object out from under the ThreadLocal even before

Re: docid set compression and boolean docid set operations

2008-09-11 Thread Michael McCandless
Hi John, I would love to see this added to Lucene! Do you have actual performance numbers? (Looks like the table in that link below isn't "real"?). Mike John Wang wrote: Sorry, I meant lucene 2.4 -John On Wed, Sep 10, 2008 at 2:08 PM, John Wang <[EMAIL PROTECTED]> wrote: Hi guys:

Re: docid set compression and boolean docid set operations

2008-09-11 Thread Paul Elschot
John, I've taken a first look at the code, and I have a few questions. Did I understand correctly that it is basically a two way conversion between an integer array and an (Open)BitSet representing a p4delta data structure? In that case it would still be necessary to extend the lucene index stru

[jira] Created: (LUCENE-1382) Allow storing user data when IndexWriter.commit() is called

2008-09-11 Thread Michael McCandless (JIRA)
Allow storing user data when IndexWriter.commit() is called --- Key: LUCENE-1382 URL: https://issues.apache.org/jira/browse/LUCENE-1382 Project: Lucene - Java Issue Type: Improvement

Re: Realtime Search for Social Networks Collaboration

2008-09-11 Thread Michael McCandless
Right, there would need to be a snapshot taken of all terms when IndexWriter.getReader() is called. This snapshot would 1) hold a frozen int docFreq per term, and 2) sort the terms so TermEnum can just step through them. (We might be able to delay this sorting until the first time someth
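The snapshot idea in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: the class TermSnapshotSketch, its live map, and the methods addOccurrence() and snapshot() are hypothetical names and are not part of Lucene or of any patch mentioned in the thread. It shows the two properties the message lists: counts are frozen at snapshot time, and terms come back sorted so an enumerator can step through them in order.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not Lucene code.
final class TermSnapshotSketch {
    // Assumed live structure: term text -> running document frequency.
    private final Map<String, AtomicInteger> live = new ConcurrentHashMap<>();

    // Called by the (hypothetical) indexing side whenever a term is seen in a new document.
    void addOccurrence(String term) {
        live.computeIfAbsent(term, t -> new AtomicInteger()).incrementAndGet();
    }

    // Copies every term with its current count into a sorted, read-only map.
    // Later calls to addOccurrence() do not change a snapshot already taken.
    SortedMap<String, Integer> snapshot() {
        TreeMap<String, Integer> frozen = new TreeMap<>();
        for (Map.Entry<String, AtomicInteger> e : live.entrySet()) {
            frozen.put(e.getKey(), e.getValue().get());
        }
        return Collections.unmodifiableSortedMap(frozen);
    }
}
```

This version sorts eagerly inside snapshot(); the message suggests the sort might instead be delayed, which this sketch does not attempt.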

[jira] Updated: (LUCENE-1380) Patch for ShingleFilter.coterminalPositionIncrement

2008-09-11 Thread Michael Semb Wever (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated LUCENE-1380: --- Attachment: (was: LUCENE-1380.patch) > Patch for ShingleFilter.coterminalPositio

[jira] Updated: (LUCENE-1380) Patch for ShingleFilter.coterminalPositionIncrement

2008-09-11 Thread Michael Semb Wever (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated LUCENE-1380: --- Attachment: LUCENE-1380.patch New version with option named enablePositions > Patch

[jira] Updated: (LUCENE-1380) Patch for ShingleFilter.enablePositions

2008-09-11 Thread Michael Semb Wever (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated LUCENE-1380: --- Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Summar

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels
But you need it by thread, so it can't be a list. You could have a HashMap of in FieldsReader, and when SegmentReader is closed, FieldsReader is closed, which clears the map, and not use thread locals at all. The difference being you would need a sync'd map. On Sep 11, 2008, at 4:56 AM,

[jira] Updated: (LUCENE-995) Add open ended query syntax to QueryParser

2008-09-11 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-995: --- Issue Type: Improvement (was: Bug) Summary: Add open ended query syntax to QueryParser (was:

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread Michael McCandless
I don't need it by thread, because I would still use ThreadLocal to retrieve the SegmentTermEnum. This avoids any sync during get. The list is just a "fallback" to hold a hard reference to the SegmentTermEnum to keep it alive. That's its only purpose. Then, when SegmentReader is close

[jira] Commented: (LUCENE-112) [PATCH] Add an IndexReader implementation that frees resources when idle and refreshes itself when stale

2008-09-11 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630203#action_12630203 ] Mark Miller commented on LUCENE-112: This is pretty old (2003) and seems unlikely to go

[jira] Updated: (LUCENE-995) Add open ended range query syntax to QueryParser

2008-09-11 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-995: --- Summary: Add open ended range query syntax to QueryParser (was: Add open ended query syntax to Query

[jira] Resolved: (LUCENE-1300) Negative wildcard searches on MultiSearcher not eliminating correctly.

2008-09-11 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-1300. - Resolution: Duplicate > Negative wildcard searches on MultiSearcher not eliminating correctly. >

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels
You still need to sync access to the list, and how would it be removed from the list prior to close? That is you need one per thread, but you can have the reader shared across all threads. So if threads were created and destroyed without ever closing the reader, the list would grow unbounde

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread Michael McCandless
OK so we compact the list (removing dead threads) every time we add a new entry to the list. This way for a long lived SegmentReader but short lived threads, the list keeps only live threads. We do need sync access to the list, but that's only on binding a new thread. Retrieving an exis

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels
I think that would work, but I think you would be better off encapsulating that in an extended ThreadLocal, e.g. WeakThreadLocal, and use that everywhere. Add a method clear(), that clears the ThreadLocals list (which will allow the values to be GC'd). On Sep 11, 2008, at 9:43 AM, Michael

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels
Technically, you need to sync on the set as well, since you need to remove the old value, and add the new to the list. Although Lucene doesn't use the set, just the initial value set, so the overhead is minimal. On Sep 11, 2008, at 9:43 AM, Michael McCandless wrote: OK so we compact the

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread Michael McCandless
Yeah I think that's the right approach. I'll code it up. Mike robert engels wrote: I think that would work, but I think you would be better off encapsulating that in an extended ThreadLocal, e.g. WeakThreadLocal, and use that everywhere. Add a method clear(), that clears the ThreadLoca
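The approach this thread converges on might look something like the sketch below. It is a minimal illustration, not the patch that was later attached to the issue: the class name WeakThreadLocal and the clear() method come from the suggestion in the thread, while the hardRefs map and the size() helper are assumptions added for the example. The ThreadLocal holds only a WeakReference, a synchronized map holds the hard reference per thread, entries for dead threads are dropped whenever a value is bound, and clear() releases all hard references so the values can be collected.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the idea discussed in the thread; not Lucene code.
class WeakThreadLocal<T> {
    // The per-thread slot holds only a weak reference, so it cannot keep the value alive by itself.
    private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();
    // Hard references keep values alive until clear(); guarded by synchronized (this).
    private final Map<Thread, T> hardRefs = new HashMap<>();

    // Read path: no locking, only a ThreadLocal lookup and a dereference.
    public T get() {
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
    }

    // Bind path: takes the lock, replaces this thread's entry, and compacts dead threads.
    public void set(T value) {
        local.set(new WeakReference<>(value));
        synchronized (this) {
            hardRefs.put(Thread.currentThread(), value);
            hardRefs.keySet().removeIf(t -> !t.isAlive());
        }
    }

    // Drops every hard reference so the GC may reclaim the values,
    // and removes the calling thread's own slot immediately.
    public void clear() {
        synchronized (this) {
            hardRefs.clear();
        }
        local.remove();
    }

    // Number of hard references currently held (helper for illustration only).
    public synchronized int size() {
        return hardRefs.size();
    }
}
```

After clear(), other threads may still see their old value from get() until the garbage collector actually clears the weak reference; only the calling thread's slot is removed eagerly.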

update NOTICE/LICENSE when new jars are added

2008-09-11 Thread Yonik Seeley
I'm finished my audit of the jars we include. There were missing elements for junit, stax (now removed), and stax-utils. In the future we should take care of updating LICENSE/NOTICE immediately when a new jar is added. -Yonik - T

Re: update NOTICE/LICENSE when new jars are added

2008-09-11 Thread Yonik Seeley
Oops, meant for this to go to solr-dev... sorry. On Thu, Sep 11, 2008 at 11:52 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > I'm finished my audit of the jars we include. There were missing > elements for junit, stax (now removed), and stax-utils. > In the future we should take care of updating LI

[jira] Commented: (LUCENE-112) [PATCH] Add an IndexReader implementation that frees resources when idle and refreshes itself when stale

2008-09-11 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630265#action_12630265 ] Otis Gospodnetic commented on LUCENE-112: - +1 for closing it. Half a decade ago...

[jira] Issue Comment Edited: (LUCENE-1381) Hanging while indexing/digesting on multiple threads

2008-09-11 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630270#action_12630270 ] otis edited comment on LUCENE-1381 at 9/11/08 10:47 AM:

[jira] Commented: (LUCENE-1381) Hanging while indexing/digesting on multiple threads

2008-09-11 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630270#action_12630270 ] Otis Gospodnetic commented on LUCENE-1381: -- David, why not bring this up on java-

[jira] Commented: (LUCENE-1381) Hanging while indexing/digesting on multiple threads

2008-09-11 Thread David Fertig (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630280#action_12630280 ] David Fertig commented on LUCENE-1381: -- Otis, You are correct, I should have started

[jira] Commented: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter

2008-09-11 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630286#action_12630286 ] Karl Wettin commented on LUCENE-1320: - Cool, thanks! The only thing I could see is th

[jira] Closed: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter

2008-09-11 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wettin closed LUCENE-1320. --- Resolution: Fixed JDK downgrade committed. Thanks for the time spent Grant! > ShingleMatrixFilter,

[jira] Resolved: (LUCENE-1381) Hanging while indexing/digesting on multiple threads

2008-09-11 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic resolved LUCENE-1381. -- Resolution: Invalid This is a new piece of code and the stack trace doesn't show Lucen

[jira] Commented: (LUCENE-1354) Provide Programmatic Access to CheckIndex

2008-09-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630341#action_12630341 ] Michael McCandless commented on LUCENE-1354: bq. Mike, I think you forgot to a

[jira] Created: (LUCENE-1383) Workaround ThreadLocal's "leak"

2008-09-11 Thread Michael McCandless (JIRA)
Workaround ThreadLocal's "leak" --- Key: LUCENE-1383 URL: https://issues.apache.org/jira/browse/LUCENE-1383 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3.2, 2.3.1, 2.

[jira] Updated: (LUCENE-1383) Workaround ThreadLocal's "leak"

2008-09-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1383: --- Attachment: LUCENE-1383.patch Attached patch. All tests pass. The patch adds o.a.l

RE: [jira] Commented: (LUCENE-1354) Provide Programmatic Access to CheckIndex

2008-09-11 Thread Steven A Rowe
On 09/11/2008 at 4:13 PM, Michael McCandless (JIRA) wrote: > We really need the "svn patch" command, so that it would > have locally added that file on me applying the > patch. Sigh. But you worked around this just fine :) Looks like this feature will be available in Subversion 1.6, which is sla

Re: [jira] Commented: (LUCENE-1354) Provide Programmatic Access to CheckIndex

2008-09-11 Thread Michael McCandless
Steven A Rowe wrote: On 09/11/2008 at 4:13 PM, Michael McCandless (JIRA) wrote: We really need the "svn patch" command, so that it would have locally added that file on me applying the patch. Sigh. But you worked around this just fine :) Looks like this feature will be available in Subvers

[jira] Updated: (LUCENE-1383) Work around ThreadLocal's "leak"

2008-09-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1383: --- Summary: Work around ThreadLocal's "leak" (was: Workaround ThreadLocal's "leak") >

Re: docid set compression and boolean docid set operations

2008-09-11 Thread John Wang
Hi guys: I will let the author, Anmol Bhasin, respond with details. In our use case, we are not making changes to the index because we do not want to diverge from the lucene code base. (though it'd be great if we can enhance indexing structure with this) We load the docIdSets into memo

[jira] Commented: (LUCENE-1039) Bayesian classifiers using Lucene as data store

2008-09-11 Thread Toby Segaran (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630434#action_12630434 ] Toby Segaran commented on LUCENE-1039: -- I'm the author of "Programming Collective Int

Re: Is the COMPANY rule in StandardTokenizer valid?

2008-09-11 Thread Shai Erera
So I've been thinking about this more, and I can't seem to reach any reasonable conclusion other than removing that rule. I'll explain: COMPANY identifies AT&T, [EMAIL PROTECTED] but it also identifies R&D, AD&D, Q&A, none of which is really a COMPANY. So there's a semantic error in the name of the rule

Change to MultiReader

2008-09-11 Thread Antony Bowesman
There was a message from Kirk Roberts, 18/4/2007 - MultiSearcher vs MultiReader Grant mentioned the visibility of the readerIndex() method in MultiReader, but nothing ever seems to have come of it. Is there any reason why the following could not be put into MultiReader? Something like this seems nece