Build failed in Hudson: Lucene-trunk #848

2009-06-03 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/848/changes Changes: [mikemccand] LUCENE-1660: make enablePositionIncrement required up-front arg [mikemccand] LUCENE-1451: deprecate methods that use FSDirectory.getDirectory under-the-hood [uschindler] Also remove the removed similar

[jira] Commented: (LUCENE-1609) Eliminate synchronization contention on initial index reading in TermInfosReader ensureIndexIsRead

2009-06-03 Thread Jed Wesley-Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716118#action_12716118 ] Jed Wesley-Smith commented on LUCENE-1609: -- We get hit by this too. We'd love to

Re: EnwikiDocMaker

2009-06-03 Thread Grant Ingersoll
I think I see what might be my problem. I'm pulling in the dependencies via Maven, and the benchmarker POM is not publishing the XERCES dependency, etc. -Grant On Jun 3, 2009, at 11:53 AM, Jason Rutherglen wrote: I saw a weird error related to Xerces, I think it was a class version p

[jira] Commented: (LUCENE-1491) EdgeNGramTokenFilter stops on tokens smaller then minimum gram size.

2009-06-03 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716053#action_12716053 ] Otis Gospodnetic commented on LUCENE-1491: -- I'm getting convinced to just drop ng

[jira] Created: (LUCENE-1677) Remove GCJ IndexReader specializations

2009-06-03 Thread Earwin Burrfoot (JIRA)
Remove GCJ IndexReader specializations -- Key: LUCENE-1677 URL: https://issues.apache.org/jira/browse/LUCENE-1677 Project: Lucene - Java Issue Type: Task Reporter: Earwin Burrfoot

Re: Enhance StandardTokenizer to support words which will not be tokenized

2009-06-03 Thread ami dudu
This can be a good solution, but it will have to be maintained with every update of the StandardAnalyzer rules. Is there a way to work around it? Grant Ingersoll-6 wrote: > > You'd have to modify the JFlex grammar. I'd suggest adding in a > generic "protected words" approach whereby you can pass in a

Re: EnwikiDocMaker

2009-06-03 Thread Jason Rutherglen
I saw a weird error related to Xerces, I think it was a class version problem. I'll try it again though to make sure. On Wed, Jun 3, 2009 at 5:58 AM, Shai Erera wrote: > Then perhaps as part of 1595 I can change it to use Java's XML parser, and > test the Enwiki file. If all goes well, we m

[jira] Resolved: (LUCENE-1660) Make StopFilter.enablePositionIncrements explicit

2009-06-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1660. Resolution: Fixed Fix Version/s: 2.9 > Make StopFilter.enablePositionIncrem

Re: EnwikiDocMaker

2009-06-03 Thread Grant Ingersoll
Doh! Not sure how I missed that! Sure enough, I see it now. I'll try my stuff using those libs and make sure they are at the front of the classpath. On Jun 3, 2009, at 11:13 AM, Shai Erera wrote: The current benchmark contains xerces-2.9.1-patched-XERCESJ-1257.jar, and its build.xml sets

[jira] Updated: (LUCENE-1272) Support for boost factor in MoreLikeThis

2009-06-03 Thread Jonathan Leibiusky (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Leibiusky updated LUCENE-1272: --- Attachment: morelikethis_boostfactor.patch Updated to work with trunk > Support for

[jira] Updated: (LUCENE-1272) Support for boost factor in MoreLikeThis

2009-06-03 Thread Jonathan Leibiusky (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Leibiusky updated LUCENE-1272: --- Attachment: (was: morelikethis_boostfactor.patch) > Support for boost factor in

Re: EnwikiDocMaker

2009-06-03 Thread Michael McCandless
Shai, make sure you're able to process the full Wikipedia export, ie you don't hit that weird issue (with Xerces) from LUCENE-1591, that caused us to switch to the patched version of Xerces. Mike On Wed, Jun 3, 2009 at 2:13 PM, Shai Erera wrote: > The current benchmark contains xerces-2.9.1-patc

Re: EnwikiDocMaker

2009-06-03 Thread Shai Erera
The current benchmark contains xerces-2.9.1-patched-XERCESJ-1257.jar, and its build.xml sets the classpath to include all .jar under the lib folder. So it looks like it is part of Be

[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

2009-06-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716014#action_12716014 ] Michael McCandless commented on LUCENE-1614: Latest patch looks good. I plan

[jira] Commented: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716004#action_12716004 ] Michael McCandless commented on LUCENE-1651: OK patch looks good. I plan to co

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715973#action_12715973 ] Earwin Burrfoot commented on LUCENE-1630: - Searcher is supposed to be a little che

Re: Enhance StandardTokenizer to support words which will not be tokenized

2009-06-03 Thread Earwin Burrfoot
Not sure you can easily marry generated JFlex grammar and runtime-provided list of protected words. I took the approach of creating tokens for punctuation inside my tokenizer and later gluing them with nearby text tokens or dropping from the stream with a tokenfilter. On Wed, Jun 3, 2009 at 20:10,
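
The gluing approach described in this message can be sketched roughly as follows. This is only an illustration in plain Java, not Earwin's code and not Lucene's TokenFilter API: the class name `ProtectedWordGluer`, the `Token` helper, and the rule of dropping unglued punctuation are all assumptions made for the example. The tokenizer emits each punctuation character as its own token, and a second pass glues runs of directly adjacent tokens whose concatenation is in a runtime-provided list of protected words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: not part of Lucene, names are illustrative only.
class ProtectedWordGluer {

    /** A token plus its start offset, so adjacency in the input can be checked. */
    static final class Token {
        final String text;
        final int start;
        Token(String text, int start) { this.text = text; this.start = start; }
        int end() { return start + text.length(); }
    }

    /** Emits runs of letters/digits as word tokens and every other non-space char as its own token. */
    static List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                int j = i;
                while (j < input.length() && Character.isLetterOrDigit(input.charAt(j))) j++;
                out.add(new Token(input.substring(i, j), i));
                i = j;
            } else {
                out.add(new Token(String.valueOf(c), i));
                i++;
            }
        }
        return out;
    }

    /** Glues the longest run of adjacent tokens that forms a protected word; drops other punctuation. */
    static List<String> glue(List<Token> tokens, Set<String> protectedWords) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            StringBuilder run = new StringBuilder();
            String best = null;
            int bestEnd = -1;
            for (int j = i; j < tokens.size(); j++) {
                // Stop as soon as there is a gap (whitespace) between consecutive tokens.
                if (j > i && tokens.get(j).start != tokens.get(j - 1).end()) break;
                run.append(tokens.get(j).text);
                String candidate = run.toString().toLowerCase(Locale.ROOT);
                if (protectedWords.contains(candidate)) {
                    best = candidate;
                    bestEnd = j;
                }
            }
            if (best != null) {
                out.add(best);
                i = bestEnd + 1;
            } else {
                String text = tokens.get(i).text;
                // Keep ordinary words (lowercased); drop punctuation that was not glued.
                if (Character.isLetterOrDigit(text.charAt(0))) out.add(text.toLowerCase(Locale.ROOT));
                i++;
            }
        }
        return out;
    }

    static List<String> analyze(String input, Set<String> protectedWords) {
        return glue(tokenize(input), protectedWords);
    }
}
```

With the protected words "c++", "c#" and ".net", calling `ProtectedWordGluer.analyze("I use C++, C# and .NET daily.", words)` yields the tokens i, use, c++, c#, and, .net, daily. Because the list is consulted at run time, the generated grammar itself would not need to change when the list does, which is the trade-off the message is pointing at.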

[jira] Updated: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1651: Attachment: LUCENE-1651.patch One more version, applies against current trunk without fuzz

Re: Enhance StandardTokenizer to support words which will not be tokenized

2009-06-03 Thread Grant Ingersoll
You'd have to modify the JFlex grammar. I'd suggest adding in a generic "protected words" approach whereby you can pass in a list of protected words. This would be a nice patch/improvement. -Grant On Jun 3, 2009, at 4:07 AM, ami dudu wrote: Hi, I'm using a StandardTokenizer which do gre

Re: EnwikiDocMaker

2009-06-03 Thread Grant Ingersoll
+1 Note, Xerces Jar is not in benchmark, AFAICT. It relies on the fact that Java uses it under the hood. I'm having this really weird situation where I'm using EnwikiDocMaker outside the context of the benchmarker and I'm grasping at straws as to why it is not working. It seems to be a

[jira] Updated: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-1651: Attachment: LUCENE-1651-tag.patch LUCENE-1651.patch Argh! The rename broke

[jira] Commented: (LUCENE-1672) Deprecate all String/File ctors/opens in IndexReader/IndexWriter/IndexSearcher

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715962#action_12715962 ] Earwin Burrfoot commented on LUCENE-1672: - bq. And DirectoryIR/MSR still have this

[jira] Commented: (LUCENE-1672) Deprecate all String/File ctors/opens in IndexReader/IndexWriter/IndexSearcher

2009-06-03 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715949#action_12715949 ] Uwe Schindler commented on LUCENE-1672: --- Nice. And DirectoryIR/MSR still have this F

[jira] Commented: (LUCENE-1672) Deprecate all String/File ctors/opens in IndexReader/IndexWriter/IndexSearcher

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715944#action_12715944 ] Earwin Burrfoot commented on LUCENE-1672: - bq. I will later try to solve this prob

[jira] Issue Comment Edited: (LUCENE-1672) Deprecate all String/File ctors/opens in IndexReader/IndexWriter/IndexSearcher

2009-06-03 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715936#action_12715936 ] Uwe Schindler edited comment on LUCENE-1672 at 6/3/09 7:26 AM: -

[jira] Commented: (LUCENE-1672) Deprecate all String/File ctors/opens in IndexReader/IndexWriter/IndexSearcher

2009-06-03 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715936#action_12715936 ] Uwe Schindler commented on LUCENE-1672: --- With IndexModifier, you are right. I was ju

[jira] Commented: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715935#action_12715935 ] Michael McCandless commented on LUCENE-1651: Let's just go w/ DirectoryReader?

Re: EnwikiDocMaker

2009-06-03 Thread Shai Erera
Then perhaps as part of 1595 I can change it to use Java's XML parser, and test the Enwiki file. If all goes well, we may not need the XERCES jar in benchmark? Anyway, I'll check that too On Wed, Jun 3, 2009 at 1:59 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > I also don't know wh

[jira] Commented: (LUCENE-1672) Deprecate all String/File ctors/opens in IndexReader/IndexWriter/IndexSearcher

2009-06-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715913#action_12715913 ] Michael McCandless commented on LUCENE-1672: Patch looks good Uwe! You don't

IR static methods

2009-06-03 Thread Earwin Burrfoot
I have a strong desire to remove all these static methods from IR - lastModified, getCurrentVersion, getCommitUserData, indexExists. But haven't found a good place for them yet. Directory - is a bad place, it shouldn't concern itself with details of what exactly is stored inside, it should think o

Re: svn commit: r781333 - in /lucene/java/trunk/src/java/org/apache/lucene: index/IndexReader.java index/IndexWriter.java store/RAMDirectory.java

2009-06-03 Thread Michael McCandless
Woops, sorry, I had forgotten you had this issue open. Thanks for merging... Mike On Wed, Jun 3, 2009 at 7:45 AM, Uwe Schindler wrote: > I wonder about this commit, I started an issue for that: > https://issues.apache.org/jira/browse/LUCENE-1672 > > I will merge the changes and attach a new pat

[jira] Updated: (LUCENE-1672) Deprecate all String/File ctors/opens in IndexReader/IndexWriter/IndexSearcher

2009-06-03 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1672: -- Attachment: LUCENE-1672.patch Updated patch merged with Mike's last commit. > Deprecate all S

RE: svn commit: r781333 - in /lucene/java/trunk/src/java/org/apache/lucene: index/IndexReader.java index/IndexWriter.java store/RAMDirectory.java

2009-06-03 Thread Uwe Schindler
I wonder about this commit, I started an issue for that: https://issues.apache.org/jira/browse/LUCENE-1672 I will merge the changes and attach a new patch there. There are more deprecations needed (methods that use these now deprecated methods under the hood, e.g. in IndexSearcher). Uwe - Uw

[jira] Commented: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-03 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715908#action_12715908 ] Earwin Burrfoot commented on LUCENE-1651: - bq. Patch looks good Earwin, thanks! I

Enhance StandardTokenizer to support words which will not be tokenized

2009-06-03 Thread ami dudu
Hi, I'm using a StandardTokenizer, which does a great job for me, but I need to enhance it somehow to treat words like "c++", "c#", and ".net" as-is rather than tokenizing them into "c" or "net". I know that there are other tokenizers such as KeywordTokenizer and WhitespaceTokenizer but they do not include the S

[jira] Commented: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715900#action_12715900 ] Michael McCandless commented on LUCENE-1651: OK, I had one hunk fail in Segmen

Re: EnwikiDocMaker

2009-06-03 Thread Michael McCandless
I also don't know why it's specifically using Xerces... Mike On Wed, Jun 3, 2009 at 4:26 AM, Shai Erera wrote: > Grant, note that I'm changing the DocMakers in LUCENE-1595 including this > one. So whatever the decision is following your question, I can do it as > part of this issue, since that c

[jira] Commented: (LUCENE-1651) Make IndexReader.open() always return MSR to simplify (re-)opens.

2009-06-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715888#action_12715888 ] Michael McCandless commented on LUCENE-1651: Hmm -- let me figure out what hap

Re: EnwikiDocMaker

2009-06-03 Thread Shai Erera
Grant, note that I'm changing the DocMakers in LUCENE-1595 including this one. So whatever the decision is following your question, I can do it as part of this issue, since that code will no longer be in EnwikiDocMaker. Regarding your question, I don't know why it should depend on Xerces (rathe

Re: Question on CachingWrapperFilter

2009-06-03 Thread Shai Erera
Thanks Paul! I'll work on such a utility (which takes a Filter and reads it into an OpenBitSet or a SortedVIntList) and then post back in case you'll be interested in adopting it, and change CWF to use it, or something else. Shai On Tue, Jun 2, 2009 at 9:35 PM, Paul Elschot wrote: > On Tuesday 02 Ju
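
To make the bit-set versus sorted-list trade-off concrete, here is a minimal plain-Java sketch. It does not use Lucene's OpenBitSet, SortedVIntList, or Filter classes; the class `CompactDocIds`, its method names, and the size-comparison rule in `preferBitSet` are assumptions for illustration, and `java.util.BitSet` stands in for a bit set. Sorted doc ids are stored as variable-length deltas, and the utility picks whichever representation would take fewer bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;

// Hypothetical sketch: names and the selection heuristic are illustrative assumptions.
class CompactDocIds {

    /** Encodes non-negative, increasing doc ids as deltas, 7 bits per byte, high bit meaning "more bytes follow". */
    static byte[] encodeDeltas(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocs) {
            int delta = doc - prev;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            prev = doc;
        }
        return out.toByteArray();
    }

    /** Decodes `count` doc ids previously written by encodeDeltas. */
    static int[] decodeDeltas(byte[] bytes, int count) {
        int[] docs = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            docs[i] = prev;
        }
        return docs;
    }

    /** Assumed heuristic: prefer a bit set once the delta list is at least as large as maxDoc/8 bytes. */
    static boolean preferBitSet(int[] sortedDocs, int maxDoc) {
        int bitSetBytes = (maxDoc + 7) / 8;
        return encodeDeltas(sortedDocs).length >= bitSetBytes;
    }

    /** Materializes the doc ids as a bit set with one bit per document. */
    static BitSet toBitSet(int[] sortedDocs, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : sortedDocs) bits.set(doc);
        return bits;
    }
}
```

For a sparse filter such as doc ids 3, 200 and 100000 in a million-document index, the delta list needs only a few bytes while a bit set needs about 125,000, so the list wins; for a filter matching most documents the bit set is smaller and supports random access.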

[jira] Commented: (LUCENE-1491) EdgeNGramTokenFilter stops on tokens smaller then minimum gram size.

2009-06-03 Thread viobade (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715849#action_12715849 ] viobade commented on LUCENE-1491: - I think it is better to keep the main goal of ngram: group