[jira] Updated: (LUCENE-1068) Invalid behavior of StandardTokenizerImpl

2007-11-29 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1068: --- Attachment: StandardTokenizerImpl-3.patch The previous patch I put was incorrect since it would stil

[jira] Commented: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly

2007-11-29 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546969 ] Hoss Man commented on LUCENE-588: - you're referring to the documentation for the query syntax, used by the QueryParser

Re: Payload Loading and Reloading

2007-11-29 Thread Grant Ingersoll
Good points, like I said, I will look more into caching in the Near Spans. I need to profile them some anyway, as I am hoping there is some speedup to be had there. -Grant On Nov 29, 2007, at 6:23 PM, Michael Busch wrote: Grant Ingersoll wrote: As for the cost of the seeks, why can't w

Re: Payload Loading and Reloading

2007-11-29 Thread Michael Busch
Michael Busch wrote: > once. For convenience, users could also create a very simple > TermPositions decorator that caches the most recently loaded payload and > allows calling getPayload() more than once. Something like this should do the trick (I stole resizeBuffer() from Token). It's untested cod
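
The decorator idea described in this message can be illustrated with a small, self-contained sketch. This is not the code from the original mail (which is cut off in this archive) and it does not use the Lucene API: `PayloadSource` is a hypothetical stand-in for a positions enumerator whose payload may be read only once per position, `OneShotSource` is a toy in-memory implementation that enforces that rule, and `CachingPayloadSource` is an assumed name for the caching wrapper.

```java
/** Hypothetical stand-in for a positions enumerator with read-once payloads. */
interface PayloadSource {
    /** Advances to the next position; returns false when exhausted. */
    boolean nextPosition();

    /** Returns the payload at the current position. */
    byte[] readPayload();
}

/** Toy in-memory source that rejects a second read at the same position. */
final class OneShotSource implements PayloadSource {
    private final byte[][] payloads;
    private int pos = -1;
    private boolean consumed;

    OneShotSource(byte[][] payloads) {
        this.payloads = payloads;
    }

    public boolean nextPosition() {
        consumed = false;
        return ++pos < payloads.length;
    }

    public byte[] readPayload() {
        if (consumed) {
            throw new IllegalStateException("payload already read at this position");
        }
        consumed = true;
        return payloads[pos].clone();
    }
}

/** Decorator that remembers the most recently loaded payload. */
final class CachingPayloadSource implements PayloadSource {
    private final PayloadSource in;
    private byte[] cached; // null until the payload at this position is loaded

    CachingPayloadSource(PayloadSource in) {
        this.in = in;
    }

    public boolean nextPosition() {
        cached = null; // the cache is only valid for the current position
        return in.nextPosition();
    }

    public byte[] readPayload() {
        if (cached == null) {
            cached = in.readPayload(); // the only read that reaches the wrapped source
        }
        return cached.clone(); // defensive copy so callers cannot corrupt the cache
    }
}
```

With this shape the wrapped source keeps its cheap read-once contract, and only callers that opt into the wrapper pay for the extra buffer and copy.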

[jira] Assigned: (LUCENE-1072) NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition

2007-11-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1072: -- Assignee: Michael McCandless > NullPointerException during indexing in > Docu

Re: Payload Loading and Reloading

2007-11-29 Thread Michael Busch
Grant Ingersoll wrote: > > As for the cost of the seeks, why can't we just document that this is > what is going on and discourage people from doing it? I'm just trying to keep SegmentTermPositions#getPayload() as efficient as possible because it's often used in the most inner loops of scorers

[jira] Commented: (LUCENE-1072) NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition

2007-11-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546936 ] Grant Ingersoll commented on LUCENE-1072: - I think this is related: https://issues.apache.org/jira/browse/LU

Re: Payload Loading and Reloading

2007-11-29 Thread Grant Ingersoll
The use case I have is for Lucene-1001, so the caching is going to happen somewhere in Lucene, not necessarily the application. I think caching it in SegTermPos. is the simplest, but I will have to look at the alternatives. It is particularly problematic in the Near Spans case (ordered an

[jira] Created: (LUCENE-1072) NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition

2007-11-29 Thread Alexei Dets (JIRA)
NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition - Key: LUCENE-1072 URL: https://issues.apache.org/jira/browse/LUCENE-1072

Re: Payload Loading and Reloading

2007-11-29 Thread Michael Busch
I designed the API with this limitation intentionally to prevent users from thinking that they can call TermPositions.getPayload() more than once at no cost. If we allow it to be called more than once, then we have to seek back in the posting stream. Even if this is just a seek in the underlyin

[jira] Commented: (LUCENE-1058) New Analyzer for buffering tokens

2007-11-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546891 ] Grant Ingersoll commented on LUCENE-1058: - {quote} Why not? Seems more flexible, and this is an expert level

[jira] Commented: (LUCENE-1058) New Analyzer for buffering tokens

2007-11-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546875 ] Grant Ingersoll commented on LUCENE-1058: - And, of course, if we are calling add() outside of the Tee proces

[jira] Commented: (LUCENE-1058) New Analyzer for buffering tokens

2007-11-29 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546874 ] Yonik Seeley commented on LUCENE-1058: -- > I see. Then we should set iter=null in add() in case after reset() mo

[jira] Commented: (LUCENE-1058) New Analyzer for buffering tokens

2007-11-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546872 ] Grant Ingersoll commented on LUCENE-1058: - In looking again, I also wonder whether the getTokens() shouldn't

[jira] Commented: (LUCENE-1058) New Analyzer for buffering tokens

2007-11-29 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546871 ] Michael Busch commented on LUCENE-1058: --- I see. Then we should set iter=null in add() in case after reset() mo

[jira] Commented: (LUCENE-1058) New Analyzer for buffering tokens

2007-11-29 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546866 ] Yonik Seeley commented on LUCENE-1058: -- > In SinkTokenizer you could initialize the iterator in the constructor

[jira] Commented: (LUCENE-1058) New Analyzer for buffering tokens

2007-11-29 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546861 ] Michael Busch commented on LUCENE-1058: --- Sorry, this review is a bit late. Only a simple remark: In SinkToken

Re: [jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread robert engels
As long as the value displayed is allowed to be wrong or inconsistent you're fine :) The 'fits in int' might hold true but doesn't have to - at a low level in the memory controller it might only write 8 bits, (or 1 bit) at a time. Cache consistency doesn't work as most people think it sho

Re: [jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread Mark Miller
I've been thinking about this and I think the situation we are in is okay. The variable is an int and so should be read in one memory access (I would think?). So at worst, the result might be stale? Since this read is just for informational purposes (e.g. it's OK if this particular method re

[jira] Updated: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1026: Attachment: IndexAccessor.zip > Provide a simple way to concurrently access a Lucene index from mu

Re: [jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread Mark Miller
Thanks Robert. I'll keep the sync then. I only considered it possible as the read is for reporting purposes and so is not relied on for functionality. Sounds like we had better retain the sync anyway though. Shai: I have incorporated your code into mine. Looks great so far. There is a rather

Re: [jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread robert engels
FYI, any time one thread reads a value while another thread updates it, the access needs to be synchronized or, with current JVMs, the variable declared volatile. The Java Memory Model requires this. Otherwise you can get a partial value (when the underlying value requires more than one memory access to retrie
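
As a rough illustration of the point about shared reads and writes, the sketch below shows two standard ways to publish a value between threads without a `synchronized` block: a `volatile` field for a simple flag and an `AtomicInteger` for a counter that several threads update. `UsageStats` and its methods are hypothetical names, not code from the IndexAccessor patch. Note that the Java Language Specification allows non-atomic treatment only for non-volatile `long` and `double`; for an `int` the practical risk is a stale read rather than a torn one.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical usage counter that other threads read for reporting. */
final class UsageStats {
    // AtomicInteger makes each increment/decrement atomic and visible to readers.
    private final AtomicInteger active = new AtomicInteger();

    // volatile guarantees that a write by one thread is seen by later reads elsewhere.
    private volatile boolean closed;

    void acquire() { active.incrementAndGet(); }

    void release() { active.decrementAndGet(); }

    int activeCount() { return active.get(); }

    void close() { closed = true; }

    boolean isClosed() { return closed; }

    /** Runs four writer threads and returns the final count (assumed demo helper). */
    static int demo() {
        UsageStats stats = new UsageStats();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    stats.acquire();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stats.activeCount();
    }
}
```

A plain `int` field with `active++` in place of the `AtomicInteger` could lose updates under the same load, because the increment is a separate read and write.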

[jira] Resolved: (LUCENE-1058) New Analyzer for buffering tokens

2007-11-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved LUCENE-1058. - Resolution: Fixed Lucene Fields: (was: [New]) Committed revision 599478. > New

[jira] Commented: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly

2007-11-29 Thread Sunil Kamath (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546743 ] Sunil Kamath commented on LUCENE-588: - The documentation does state that escaping of the "?" character by prepend

Re: [jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread Shai Erera
Part of the changes I've made is to have close() not throw an exception when it is called twice (I don't think it's bad to call it twice). Maybe you didn't copy that part of the code? Because the tests run on my machine ... I agree on the last statement. I just didn't want to introduce too many
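
A close() that tolerates being called twice can be sketched as below. This is only an illustration of the idea under stated assumptions: `ClosableAccessor`, `isClosed()` and `releaseCount()` are hypothetical names and do not come from the attached IndexAccessor code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical accessor whose close() is safe to call more than once. */
final class ClosableAccessor {
    private final AtomicBoolean closed = new AtomicBoolean();
    private int releases; // counts how often resources were actually released

    void close() {
        // Only the first caller flips the flag; later calls return quietly.
        if (!closed.compareAndSet(false, true)) {
            return;
        }
        releases++; // stands in for releasing searchers, readers and writers
    }

    boolean isClosed() {
        return closed.get();
    }

    int releaseCount() {
        return releases;
    }
}
```

Using `compareAndSet` means the release step runs exactly once even if two threads call close() at the same time.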

[jira] Updated: (LUCENE-1068) Invalid behavior of StandardTokenizerImpl

2007-11-29 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1068: --- Attachment: StandardTokenizerImpl-2.patch I've found a way to do it (I think): I've added a new type

[jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546719 ] Mark Miller commented on LUCENE-1026: - Quick comment: You have the MultiSearcher closing its own IndexAccessor'

[jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546702 ] Mark Miller commented on LUCENE-1026: - Hey Mark, few more questions: 1. Why is StopWatch needed? StopWatch cou

[jira] Updated: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1026: --- Attachment: shai-IndexAccessor3.zip Based on my previous comments. > Provide a simple way to concur

[jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546694 ] Shai Erera commented on LUCENE-1026: Hey Mark, I've cleaned up the code, added javadoc and organized the tests.

Re: Potential bug in StandardTokenizerImpl

2007-11-29 Thread Grant Ingersoll
Yeah, one of the things that I am not thrilled about in our model is that it essentially means we can only make these kinds of changes on 3.0-dev (i.e. before releasing 3.0), not a big deal in theory, but as evidenced by Hoss's history on this particular item, it has been around for a long tim

Re: First cut at web-based Luke for contrib

2007-11-29 Thread Grant Ingersoll
I wouldn't worry about it too much. The binary distribution probably should just contain the built WAR (and Jetty?) and the source dist can have everything. -Grant On Nov 29, 2007, at 2:33 AM, markharw00d wrote: The 17 MB bundle I provided is essentially the source plus dependencies, the

[jira] Commented: (LUCENE-809) No source or classes for the similarity contrib

2007-11-29 Thread Vincent van Beveren (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546604 ] Vincent van Beveren commented on LUCENE-809: Should the similarity.jar not be removed from the build, for

[jira] Updated: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1026: --- Attachment: shai-IndexAccessor-2.zip Includes the changes to the files based on the comments I've re

[jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2007-11-29 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546592 ] Shai Erera commented on LUCENE-1026: Hey Mark, few more questions: 1. Why is StopWatch needed? it seems like the