Re: ThreadLocal in SegmentReader

2008-07-11 Thread robert engels
This is only an issue for static ThreadLocals ... On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote: The problem here is not that ThreadLocal instances are not GC'd (they are GC'd, and your test shows this clearly). But even one instance that is not removed from its Thread is enou

Re: Token implementation

2008-07-11 Thread Hiroaki Kawai
Another suggestion from me: How about making the token object a singleton? > Maybe we should un-deprecate the termText() method but add javadocs > explaining that for better performance you should use the char[] reuse > methods instead? > > Mike > > DM Smith wrote: > > > Michael McCandless

Re: ThreadLocal in SegmentReader

2008-07-11 Thread Roman Puchkovskiy
The problem here is not that ThreadLocal instances are not GC'd (they are GC'd, and your test shows this clearly). But even one instance that is not removed from its Thread is enough to prevent the classloader from being unloaded, and that's the problem. Michael McCandless-2 wrote: > > OK,
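The point about an instance "not removed from its Thread" can be illustrated with a minimal sketch. It assumes the value is only needed for one unit of work, so it can be detached with `ThreadLocal.remove()` in a `finally` block; the class and method names here are hypothetical and do not come from Lucene or from the thread.

```java
// Hypothetical example; none of these names come from Lucene.
class ScopedThreadLocal {
    private static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<>();

    // Runs one unit of work with a per-thread buffer, then detaches the
    // buffer from the current thread so the thread no longer references it.
    static int process(int size) {
        BUFFER.set(new byte[size]);
        try {
            return BUFFER.get().length;
        } finally {
            // Without this call the value stays reachable from the thread,
            // which matters for long-lived (for example, pooled) threads.
            BUFFER.remove();
        }
    }

    // Reports whether the current thread still holds a value.
    static boolean hasValueOnCurrentThread() {
        return BUFFER.get() != null;
    }
}
```

Whether this pattern fits SegmentReader depends on how long the per-thread state has to live, which the messages above do not settle.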

Re: Token implementation

2008-07-11 Thread DM Smith
On Jul 11, 2008, at 9:42 PM, Hiroaki Kawai wrote: Another suggestion from me: How about making the token object a singleton? Would that work for a multi-threaded application? Maybe we should un-deprecate the termText() method but add javadocs explaining that for better performance you s

Re: Token implementation

2008-07-11 Thread DM Smith
Michael McCandless wrote: Maybe we should un-deprecate the termText() method but add javadocs explaining that for better performance you should use the char[] reuse methods instead? I think so, too. Should we leave it as deprecated until 3.0? With the performance note and the encouragement to

[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force)

2008-07-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613025#action_12613025 ] Michael McCandless commented on LUCENE-1314: bq. Because Ocean does not flush

Re: Token implementation

2008-07-11 Thread Michael McCandless
Maybe we should un-deprecate the termText() method but add javadocs explaining that for better performance you should use the char[] reuse methods instead? Mike DM Smith wrote: Michael McCandless wrote: DM Smith wrote: Shouldn't Term have constructors that take a Token? I think tha
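To make the trade-off between a String accessor and the char[] reuse methods concrete, here is a minimal sketch. `ReusableToken` is a hypothetical class written for this illustration, not Lucene's `Token`; only the method name `termText()` is taken from the message above, and the other names are assumptions.

```java
// Hypothetical class for illustration; not Lucene's Token.
class ReusableToken {
    private char[] buffer = new char[16];
    private int length;

    // Copies the term into the existing buffer, growing it only when needed,
    // so repeated calls usually allocate nothing.
    void setTerm(char[] src, int offset, int len) {
        if (len > buffer.length) {
            buffer = new char[Math.max(len, buffer.length * 2)];
        }
        System.arraycopy(src, offset, buffer, 0, len);
        length = len;
    }

    // Reuse-style accessors: callers read buffer[0 .. termLength()).
    char[] termBuffer() { return buffer; }
    int termLength() { return length; }

    // Convenience accessor: allocates a new String on every call.
    String termText() { return new String(buffer, 0, length); }
}
```

The convenience accessor is easier to use, while the buffer accessors avoid a per-token allocation, which is the performance note the javadocs would carry.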

Re: CorruptIndexException docs out of order

2008-07-11 Thread Michael McCandless
I think if your SegmentReader is not returning the right result when numDocs() is called it can lead to this. E.g., if your maxDoc() is 1000 and you think you have 100 deleted docs (so numDocs() returns 900) but upon iterating through the docs you saw only, say, 50 that were deleted, then, whe
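The mismatch described here can be checked mechanically. The sketch below uses a hypothetical `ToyReader` (not Lucene's SegmentReader) that keeps a deletion bit set and a separately tracked delete count, and compares the live docs found by iteration against `numDocs()`. The field names and the consistency check are assumptions made for illustration.

```java
import java.util.BitSet;

// Hypothetical stand-in for a reader; not Lucene's SegmentReader.
class ToyReader {
    private final int maxDoc;
    private final BitSet deleted;      // bit i set means doc i is deleted
    private final int claimedDeletes;  // separately tracked count that can drift

    ToyReader(int maxDoc, BitSet deleted, int claimedDeletes) {
        this.maxDoc = maxDoc;
        this.deleted = deleted;
        this.claimedDeletes = claimedDeletes;
    }

    int maxDoc() { return maxDoc; }
    int numDocs() { return maxDoc - claimedDeletes; }
    boolean isDeleted(int doc) { return deleted.get(doc); }

    // Counts live docs by iteration and compares the result to numDocs().
    boolean isConsistent() {
        int live = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!isDeleted(doc)) {
                live++;
            }
        }
        return live == numDocs();
    }
}
```

With maxDoc() of 1000, a claimed 100 deletions and only 50 bits set, `numDocs()` returns 900 while iteration finds 950 live docs, so the check fails.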

[jira] Commented: (LUCENE-1282) Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene

2008-07-11 Thread Jed Wesley-Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613018#action_12613018 ] Jed Wesley-Smith commented on LUCENE-1282: -- Sun has posted their evaluation on th

[jira] Created: (LUCENE-1335) Correctly handle concurrent calls to addIndexes, optimize, commit

2008-07-11 Thread Michael McCandless (JIRA)
Correctly handle concurrent calls to addIndexes, optimize, commit - Key: LUCENE-1335 URL: https://issues.apache.org/jira/browse/LUCENE-1335 Project: Lucene - Java Issue Type: Bu

Re: Commit while addIndexes is in progress

2008-07-11 Thread Michael McCandless
Ning Li wrote: I think there're similar problems with calling optimize() while addIndexes is in progress... I think we should disallow that? Optimize waits for addIndexes to finish? I think it's useful to allow addIndexes during maybeMerge and optimize, no? OK I agree it would be nice

Re: Commit while addIndexes is in progress

2008-07-11 Thread Michael McCandless
Yonik Seeley wrote: On Fri, Jul 11, 2008 at 2:38 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: Hmm, I think we should. What should it "mean" when you call commit(), while another thread is in the middle of addIndexes? Seems like either all or none of the segments in addIndexes shoul

Re: Commit while addIndexes is in progress

2008-07-11 Thread Michael McCandless
Yonik Seeley wrote: On Fri, Jul 11, 2008 at 3:27 PM, Ning Li <[EMAIL PROTECTED]> wrote: We should also disallow concurrent addIndexes, right? Hmmm, the current implementation looks like it currently won't work correctly (docWriter.resumeAllThreads() being called while another thread is

CorruptIndexException docs out of order

2008-07-11 Thread Jason Rutherglen
Periodically seeing this exception when testing out Ocean. What would be a possible cause for this? I assume it is a problem in the index. The code is merging custom segmentreaders that are created using http://issues.apache.org/jira/browse/LUCENE-1314 IndexReader.clone. I need to isolate if it

[jira] Updated: (LUCENE-1315) Add setIndexReader in IndexSearcher

2008-07-11 Thread Anthony Urso (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Urso updated LUCENE-1315: - Attachment: LUCENE-1315.patch Renamed the patch file to standard name as per http://wiki.apache

[jira] Updated: (LUCENE-1315) Add setIndexReader in IndexSearcher

2008-07-11 Thread Anthony Urso (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Urso updated LUCENE-1315: - Attachment: (was: setIndexReader.diff) > Add setIndexReader in IndexSearcher >

[jira] Updated: (LUCENE-1334) Term improvement

2008-07-11 Thread DM Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DM Smith updated LUCENE-1334: - Attachment: LUCENE-1334.txt patch for the issue > Term improvement > > >

Re: Commit while addIndexes is in progress

2008-07-11 Thread Yonik Seeley
On Fri, Jul 11, 2008 at 3:27 PM, Ning Li <[EMAIL PROTECTED]> wrote: > We should also disallow concurrent addIndexes, right? Hmmm, the current implementation looks like it currently won't work correctly (docWriter.resumeAllThreads() being called while another thread is calling addIndexes, etc

Re: Commit while addIndexes is in progress

2008-07-11 Thread Yonik Seeley
On Fri, Jul 11, 2008 at 2:38 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Hmm, I think we should. > > What should it "mean" when you call commit(), while another thread is in the > middle of addIndexes? Seems like either all or none of the segments in addIndexes should be committed. > We

Re: Token implementation

2008-07-11 Thread DM Smith
Michael McCandless wrote: DM Smith wrote: Shouldn't Term have constructors that take a Token? I think that makes sense, though normally Token appears during analysis and Term during searching (I think?) -- how often would you need to make a Term from a Token? The problem I'm addressing

Re: Commit while addIndexes is in progress

2008-07-11 Thread Ning Li
> What should it "mean" when you call commit(), while another thread is in the > middle of addIndexes? > > We could 1) disallow it (throw an exception if you try), 2) allow it but > block until addIndexes is done, 3) allow it but commit all changes up until > when addIndexes was first called ... an

[jira] Resolved: (LUCENE-1328) FileNotFoundException in

2008-07-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1328. Resolution: Invalid I think this came down to not closing IndexSearchers... > Fil

Re: TokenStream#reset():boolean?

2008-07-11 Thread Michael McCandless
Karl Wettin wrote: On 7 Jul 2008, at 13:04, Michael McCandless wrote: If we make this change (migrate to "boolean TokenStream.reset()"), what would IndexWriter do if it calls reset and false is returned? I don't think the writer should ever call reset(); it is the consumer who is passing

Re: Commit while addIndexes is in progress

2008-07-11 Thread Michael McCandless
Hmm, I think we should. What should it "mean" when you call commit(), while another thread is in the middle of addIndexes? We could 1) disallow it (throw an exception if you try), 2) allow it but block until addIndexes is done, 3) allow it but commit all changes up until when addIndexes
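The first two options listed (disallow, or block until addIndexes is done) can be sketched with a single lock. `GuardedWriter` below is a hypothetical stand-in, not Lucene's IndexWriter, and it also serializes concurrent addIndexes calls, which is an assumption of this sketch rather than a decision from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a writer; not Lucene's IndexWriter.
class GuardedWriter {
    private final ReentrantLock addIndexesLock = new ReentrantLock();
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    // Adds all segments as one unit while holding the lock, so a commit
    // sees either all of them or none of them.
    void addIndexes(List<String> segments) {
        addIndexesLock.lock();
        try {
            pending.addAll(segments);
        } finally {
            addIndexesLock.unlock();
        }
    }

    // Option 2: block until any in-progress addIndexes has finished.
    void commitBlocking() {
        addIndexesLock.lock();
        try {
            flushPending();
        } finally {
            addIndexesLock.unlock();
        }
    }

    // Option 1: fail instead of waiting when another thread holds the lock.
    // The lock is reentrant, so the owning thread itself is never refused.
    void commitOrFail() {
        if (!addIndexesLock.tryLock()) {
            throw new IllegalStateException("addIndexes in progress");
        }
        try {
            flushPending();
        } finally {
            addIndexesLock.unlock();
        }
    }

    int committedCount() {
        addIndexesLock.lock();
        try {
            return committed.size();
        } finally {
            addIndexesLock.unlock();
        }
    }

    // Must be called with the lock held.
    private void flushPending() {
        committed.addAll(pending);
        pending.clear();
    }
}
```

Blocking keeps callers simple at the cost of commit latency, while failing fast pushes the retry decision to the caller.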

Re: Token implementation

2008-07-11 Thread Michael McCandless
DM Smith wrote: Shouldn't Term have constructors that take a Token? I think that makes sense, though normally Token appears during analysis and Term during searching (I think?) -- how often would you need to make a Term from a Token? Mike --

[jira] Updated: (LUCENE-1301) Refactor DocumentsWriter

2008-07-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1301: --- Attachment: LUCENE-1301.patch New rev of the patch attached. I've fixed all nocommi

[jira] Created: (LUCENE-1334) Term improvement

2008-07-11 Thread DM Smith (JIRA)
Term improvement Key: LUCENE-1334 URL: https://issues.apache.org/jira/browse/LUCENE-1334 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.3.1 Environment: all

[jira] Created: (LUCENE-1333) Token implementation needs improvements

2008-07-11 Thread DM Smith (JIRA)
Token implementation needs improvements --- Key: LUCENE-1333 URL: https://issues.apache.org/jira/browse/LUCENE-1333 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects

Commit while addIndexes is in progress

2008-07-11 Thread Ning Li
Hi, Should we guard against the case when commit() is called during addIndexes? Otherwise, errors such as a missing file could happen during commit. Cheers, Ning Li

[jira] Updated: (LUCENE-1322) Remove synchronization in CompoundFileReader

2008-07-11 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1322: - Comment: was deleted > Remove synchronization in CompoundFileReader > --

[jira] Commented: (LUCENE-1322) Remove synchronization in CompoundFileReader

2008-07-11 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612911#action_12612911 ] Jason Rutherglen commented on LUCENE-1322: -- Seeing a possible bug in this patch:

Re: Hadoop RPC for distributed Lucene

2008-07-11 Thread Ken Krugler
I believe Hadoop RPC was originally built for distributed search for Nutch. Here's some core code I think Nutch still uses http://svn.apache.org/viewvc/lucene/nu

Re: Hadoop RPC for distributed Lucene

2008-07-11 Thread Jason Rutherglen
I believe Hadoop RPC was originally built for distributed search for Nutch. Here's some core code I think Nutch still uses http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/searcher/DistributedSearch.java?revision=619648&view=markup One thing I wanted to add to the original

Re: ThreadLocal in SegmentReader

2008-07-11 Thread robert engels
As always, you still have the issue that if the object in the ThreadLocal has a reference to a native resource (e.g. file handle), you might run out of file handles before any OOM which triggers the GC (to close the file handle if relying on finalization). On Jul 11, 2008, at 4:54 AM, Micha

Re: Hadoop RPC for distributed Lucene

2008-07-11 Thread Grant Ingersoll
I believe there is a subproject over at Hadoop for doing distributed stuff w/ Lucene, but I am not sure if they are doing search side, only indexing. I was always under the impression that it was too slow for search side, as I don't think Nutch even uses it for the search side of the equat

RE: Ocean Documentation

2008-07-11 Thread Ard Schrijvers
Hello Jason et al, Indeed there are plenty of use cases that need searches to reflect updates instantly, for example the JSR-170 (JCR) compliant Jackrabbit implementation: it relies heavily on Lucene for searching and hierarchy resolving, and according to the JSR-170 spec, after a save(), changes need to be visible in

Re: Token implementation

2008-07-11 Thread DM Smith
I am now looking at this in depth. Here is what I am finding: Token and Term are tightly paired, but their implementation is not. Term holds a field and a word. Both of these are Strings. So to get the term out of a Token and put it into a Term, one of two constructs is used: t = new Term

Re: Ocean Documentation

2008-07-11 Thread Jason Rutherglen
I started a wiki name at http://wiki.apache.org/lucene-java/OceanRealtimeSearch linked from http://wiki.apache.org/lucene-java/LuceneResources. Perhaps I should add some background on the wiki. I can add a little bit here. I was an early Solr developer/user at a social networking company when Go

Re: Wordnet Synonym index

2008-07-11 Thread matt connolly
I got a message back from Dave at Tropo, and thought I'd share here: > This was developed ages before Solr appeared. > So I suppose next thing to do is to make a Solr Filter + Factory that does the same thing -- View this message in context: http://www.nabble.com/Wordnet-Synonym-index-t

JVM index corruption bug

2008-07-11 Thread Michael McCandless
FYI, Sun has upgraded the priority to high, and added an Evaluation comment: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044 This is the JVM bug behind https://issues.apache.org/jira/browse/LUCENE-1282 . Thanks for all the votes! Mike -

Re: Ocean Documentation

2008-07-11 Thread Karl Wettin
On 10 Jul 2008, at 22:08, Jason Rutherglen wrote: Is there a good place to put Ocean https://issues.apache.org/jira/browse/LUCENE-1313 documentation? Is there a place on the wiki that is good? Hi Jason, the wiki is just fine. I've been reading the docs and looked at your patch. There is a lo

Re: ThreadLocal in SegmentReader

2008-07-11 Thread Michael McCandless
OK, I created a simple test to test this (attached). The test just runs 10 threads, each one creating a 100 KB byte array which is stored into a ThreadLocal, and then periodically the ThreadLocal is replaced with a new one. This is to test whether GC of a ThreadLocal, even though the thre
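The attached test is not reproduced in this listing. What follows is only a rough sketch of what the description suggests: 10 threads, a 100 KB array per store, and periodic replacement of the ThreadLocal. The class name, the shared `current` field, and the iteration parameters are assumptions, not details from the attachment.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reconstruction from the description; not the attached test.
class ThreadLocalChurn {
    static final int THREADS = 10;
    static final int PAYLOAD_BYTES = 100 * 1024;

    // Swapped out periodically so older ThreadLocal instances become unreachable.
    static volatile ThreadLocal<byte[]> current = new ThreadLocal<>();

    // Returns the total number of stores performed across all threads.
    static int run(int iterationsPerThread, int replaceEvery) {
        AtomicInteger stores = new AtomicInteger();
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 1; i <= iterationsPerThread; i++) {
                    current.set(new byte[PAYLOAD_BYTES]);
                    stores.incrementAndGet();
                    if (i % replaceEvery == 0) {
                        current = new ThreadLocal<>();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stores.get();
    }
}
```

Running it with a small heap and watching memory use is one way to observe whether values held by discarded ThreadLocal instances get reclaimed.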