[jira] Commented: (LUCENE-826) Language detector

2007-11-08 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541202 ] Karl Wettin commented on LUCENE-826: Peter Taylor - 08/Nov/07 10:15 AM > Just out of curiosity which version of W

Re: Term pollution from binary data

2007-11-08 Thread Michael McCandless
Doug Cutting wrote: > Aren't indexes loaded lazily? That's an important optimization for > merging, no? For performance reasons, opening an IndexReader shouldn't > do much more than open files. However, if we build a more generic > mechanism, we should not rely on that

Re: Term pollution from binary data

2007-11-08 Thread robert engels
I was thinking more along the lines of the Java ImageIO ImageRead/WriteParam stuff. class IndexReaderParam { get/set UseLargeBuffers() get/set UseReadAhead(); .. etc. other "standard" options, a particular index reader is free to ignore them ... } a custom IndexReader would create a

Re: Term pollution from binary data

2007-11-08 Thread Doug Cutting
robert engels wrote: I think it would be better to have IndexReaderProperties, and IndexWriterProperties. What methods would these have? The notion of a termIndexDivisor is specific to a particular IndexReader implementation, so probably shouldn't be handled by a generic IndexReaderPropertie

[jira] Commented: (LUCENE-826) Language detector

2007-11-08 Thread Peter Taylor (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541115 ] Peter Taylor commented on LUCENE-826: Uh, never mind ;) I have poked around and I am guessing you are using versi

[jira] Commented: (LUCENE-826) Language detector

2007-11-08 Thread Peter Taylor (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541094 ] Peter Taylor commented on LUCENE-826: Just out of curiosity, which version of Weka are you using... I ask becaus

Re: Term pollution from binary data

2007-11-08 Thread robert engels
I think it would be better to have IndexReaderProperties, and IndexWriterProperties. Just seems an easier API for maintenance. It is more logical, as it keeps related items together. On Nov 8, 2007, at 12:04 PM, Doug Cutting wrote: Michael McCandless wrote: One thing is: I'd prefer to no

Re: Term pollution from binary data

2007-11-08 Thread Doug Cutting
Michael McCandless wrote: One thing is: I'd prefer to not use system property for this, since it's so global, but I'm not sure how to better do it. I agree. That was the quick-and-dirty hack. Ideally it should be a method on IndexReader. I can think of two ways to do that: 1. Add a generi

setSimilarity on Query

2007-11-08 Thread John Wang
Hi: We are running into a situation that we want to have a similarity depending on a field. We also want to leverage QueryParser. The easiest thing we can find is to override the QueryParser class with the method getFieldQuery. It would be a lot simpler if we can just set the similarity on

Re: setSimilarity on Query

2007-11-08 Thread John Wang
It would be cleaner if we could add some sort of factory pattern for similarities to Query. It is essentially using the Searcher as the source for Similarity. Thoughts? -john On Nov 8, 2007 9:37 AM, John Wang wrote: > Hi: > > We are running into a situation that we want to h

Re: BufferingAnalyzer (or something like that)

2007-11-08 Thread Mark Miller
I think it is certainly useful as I use something similar myself. My implementation is not as generic as I would like (requires a specific special analyzer written for the task), but works great for my case. I use a CachingTokenFilter as well as a couple ThreadLocals so that I can have a stemme

[jira] Updated: (LUCENE-1048) Lock.obtain(timeout) behaves incorrectly for large timeouts

2007-11-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1048: Attachment: LUCENE-1048.patch Simple patch that fixes the bug. I also added static

[jira] Created: (LUCENE-1048) Lock.obtain(timeout) behaves incorrectly for large timeouts

2007-11-08 Thread Michael McCandless (JIRA)
Lock.obtain(timeout) behaves incorrectly for large timeouts Key: LUCENE-1048 URL: https://issues.apache.org/jira/browse/LUCENE-1048 Project: Lucene - Java Issue Type: Bug

[jira] Resolved: (LUCENE-1043) Speedup merging of stored fields when field mapping "matches"

2007-11-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1043. Resolution: Fixed. I just committed this. Thanks Robert! > Speedup merging of sto

Re: Term pollution from binary data

2007-11-08 Thread Michael McCandless
I like this approach: it means, at search time, you can choose to further subsample the already subsampled (during indexing) set of terms for the TermInfosReader index. So you can easily turn the knob to trade off memory usage vs IO cost/latency during searching. I'll open an issue and work thro

[jira] Resolved: (LUCENE-1047) Change MergePolicy & MergeScheduler to be abstract base classes instead of an interfaces

2007-11-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1047. Resolution: Fixed > Change MergePolicy & MergeScheduler to be abstract base classe