Re: [jira] Commented: (LUCENE-781) NPE in MultiReader.isCurrent() and getVersion()

2007-07-24 Thread Chris Hostetter
: I think the cleanest solution here is to separate MultiReader into two classes: MultiSegmentReader (package-protected) and MultiReader (public) that extends MultiSegmentReader. i'm currently on vacation and don't have as much time to review patches as i'd like (not because i'm on

[jira] Commented: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.

2007-07-24 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514883 ] Karl Wettin commented on LUCENE-841: I just hit this again while compiling. Is there one of the strategies in

[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2007-07-24 Thread Jason van Zyl
To whom it may engage... This is an automated request, but not an unsolicited one. For more information please visit http://gump.apache.org/nagged.html, and/or contact the folk at [EMAIL PROTECTED] Project lucene-java has an issue affecting its community integration. This issue affects

[jira] Commented: (LUCENE-868) Making Term Vectors more accessible

2007-07-24 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514963 ] Paul Elschot commented on LUCENE-868: I just got this warning from ant javadocs-internal:

[jira] Commented: (LUCENE-868) Making Term Vectors more accessible

2007-07-24 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515004 ] Grant Ingersoll commented on LUCENE-868: Thanks, Paul, I have added a comment. Making Term Vectors more

[jira] Commented: (LUCENE-868) Making Term Vectors more accessible

2007-07-24 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515024 ] Paul Elschot commented on LUCENE-868: - [

[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-07-24 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515053 ] Grant Ingersoll commented on LUCENE-871: Right, but StringBuilder is 1.5. Sigh... ISOLatin1AccentFilter a

Re: [jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-07-24 Thread eks dev
I do not use it, but it looks rather simple to implement with some kind of simple CharArray wrapping a char[]; with a careful growth policy it will work as fast as it gets - Original Message From: Grant Ingersoll (JIRA) [EMAIL PROTECTED] To: java-dev@lucene.apache.org Sent: Tuesday,
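
The growable buffer suggested in this message could look roughly like the sketch below. It is only an illustration: the class name CharArray is taken from the message, but the methods and the doubling growth policy are assumptions, not code from any LUCENE-871 patch. It avoids StringBuilder and generics so that it would still compile under JDK 1.4.

```java
/** Hypothetical growable char buffer; illustrative only, not a Lucene class. */
final class CharArray {
    private char[] buf;
    private int len;

    CharArray(int initialCapacity) {
        // Never start with a zero-length array, otherwise doubling would not grow it.
        buf = new char[Math.max(1, initialCapacity)];
    }

    /** Appends one char, growing the backing array when it is full. */
    void append(char c) {
        if (len == buf.length) {
            // Assumed growth policy: double the capacity, which keeps
            // the amortized cost of an append constant.
            char[] bigger = new char[buf.length * 2];
            System.arraycopy(buf, 0, bigger, 0, len);
            buf = bigger;
        }
        buf[len++] = c;
    }

    /** Resets the length but keeps the allocated array for reuse. */
    void clear() {
        len = 0;
    }

    int length() {
        return len;
    }

    int capacity() {
        return buf.length;
    }

    public String toString() {
        return new String(buf, 0, len);
    }
}
```

Calling clear() between tokens lets one instance be reused for a whole stream, so the array is reallocated only when a token is longer than any seen before.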

[jira] Commented: (LUCENE-781) NPE in MultiReader.isCurrent() and getVersion()

2007-07-24 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515057 ] Michael Busch commented on LUCENE-781: except that it makes the javadoc a bit odd, since the non-public class

[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-07-24 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515060 ] Mark Miller commented on LUCENE-871: Yes, I feel that sigh. So perhaps the point is moot. I was just thinking

Re: Token termBuffer issues

2007-07-24 Thread Michael McCandless
OK, I ran some benchmarks here. The performance gains are sizable: 12.8% speedup using Sun's JDK 5 and 17.2% speedup using Sun's JDK 6, on Linux. This is indexing all Wikipedia content using LowerCaseTokenizer + StopFilter + PorterStemFilter. I think it's worth pursuing! Here are the

Re: Token termBuffer issues

2007-07-24 Thread Michael McCandless
Doron Cohen [EMAIL PROTECTED] wrote: Agreed, so we can't change the API. So the next/nextDirect proposal should work well: it doesn't change the API yet would allow consumers that don't require full private copy of every Token, like DocumentsWriter, to have good performance. If we

Re: Best Practices for getting Strings from a position range

2007-07-24 Thread Grant Ingersoll
Sorry, Peter, I haven't had a chance to work on it. I don't see it happening this week, but maybe next. I do think the Mapper approach via TermVectors will work. It will require implementing a new mapper that orders by position, but I don't think that is too hard. I started on one on

[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-07-24 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515077 ] Michael McCandless commented on LUCENE-871: I think we can likely get a sizable speedup here by using the

Re: Token termBuffer issues

2007-07-24 Thread Yonik Seeley
On 7/24/07, Michael McCandless [EMAIL PROTECTED] wrote: OK, I ran some benchmarks here. The performance gains are sizable: 12.8% speedup using Sun's JDK 5 and 17.2% speedup using Sun's JDK 6, on Linux. This is indexing all Wikipedia content using LowerCaseTokenizer + StopFilter +

[jira] Commented: (LUCENE-947) Some improvements to contrib/benchmark

2007-07-24 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515087 ] Doron Cohen commented on LUCENE-947: It looks good Michael! (I like the time printing.) Mmm one test issue in

The JDK 1.5 Can o' Worms

2007-07-24 Thread Grant Ingersoll
Well, it has been over a year since we have had the 1.5 debate (see http://www.gossamer-threads.com/lists/lucene/java-dev/35972?search_string=Java%201.5;#35972) and I think it is time we start accepting 1.5 code. Nutch, Solr, Hadoop all use JDK 1.5 and I imagine Tika will as well (and no,

Re: Token termBuffer issues

2007-07-24 Thread Michael McCandless
Yonik Seeley [EMAIL PROTECTED] wrote: On 7/24/07, Michael McCandless [EMAIL PROTECTED] wrote: OK, I ran some benchmarks here. The performance gains are sizable: 12.8% speedup using Sun's JDK 5 and 17.2% speedup using Sun's JDK 6, on Linux. This is indexing all Wikipedia content using

[jira] Commented: (LUCENE-947) Some improvements to contrib/benchmark

2007-07-24 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515099 ] Michael McCandless commented on LUCENE-947: Mmm one test issue in Windows - the reading of linFile in

Re: Token termBuffer issues

2007-07-24 Thread Doron Cohen
Michael McCandless wrote: boolean next(Token resToken) which returns true if it has updated resToken with another token, else false if end-of-stream was hit. I would actually prefer Token next(Token resToken) because: - this way the API with reuse is very much like the one without
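
The "Token next(Token resToken)" shape preferred in this message can be sketched as below. The classes SimpleToken and WhitespaceSource are hypothetical stand-ins invented for illustration; they are not Lucene's Token or TokenStream, and the null-at-end-of-stream convention is an assumption modeled on the existing no-argument next().

```java
/** Simplified stand-in for a token; NOT Lucene's Token class. */
final class SimpleToken {
    char[] buffer = new char[16];
    int length;

    /** Copies input[start, end) into the reusable buffer, growing it if needed. */
    void set(String input, int start, int end) {
        int n = end - start;
        if (n > buffer.length) {
            buffer = new char[Math.max(n, buffer.length * 2)];
        }
        input.getChars(start, end, buffer, 0);
        length = n;
    }

    String text() {
        return new String(buffer, 0, length);
    }
}

/** Hypothetical whitespace splitter showing the reuse-style next(). */
final class WhitespaceSource {
    private final String input;
    private int pos;

    WhitespaceSource(String input) {
        this.input = input;
    }

    /** Fills and returns the caller's token, or returns null at end of stream. */
    SimpleToken next(SimpleToken result) {
        // Skip separators before the next token.
        while (pos < input.length() && input.charAt(pos) == ' ') {
            pos++;
        }
        if (pos == input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != ' ') {
            pos++;
        }
        result.set(input, start, pos);
        return result;
    }
}
```

A consumer loop then reads much like one written against the no-argument form, for example: for (SimpleToken t = src.next(reusable); t != null; t = src.next(reusable)) { ... }. The boolean-returning variant would differ only in testing the return value instead of comparing against null.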

[jira] Updated: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-07-24 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-871: Attachment: fasterisoremove1.patch Here is a 1.4 solution using a char array. Doesn't yet use
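
As a rough idea of what a JDK 1.4-compatible, char-array-based accent remover might look like, here is a minimal sketch. It is not the contents of fasterisoremove1.patch: the class AccentFolder is hypothetical and it maps only a handful of Latin-1 characters, whereas a real filter would cover the whole accented range.

```java
/** Hypothetical accent folder; covers only a few Latin-1 characters for illustration. */
final class AccentFolder {
    // Reused output buffer, so a long run of inputs allocates only when an input is larger.
    private char[] out = new char[32];

    String fold(String in) {
        // Worst case: every input char expands to two output chars (e.g. AE ligature).
        int max = in.length() * 2;
        if (out.length < max) {
            out = new char[max];
        }
        int n = 0;
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            switch (c) {
                case '\u00E0': // a with grave
                case '\u00E1': // a with acute
                case '\u00E2': // a with circumflex
                    out[n++] = 'a';
                    break;
                case '\u00E8': // e with grave
                case '\u00E9': // e with acute
                case '\u00EA': // e with circumflex
                    out[n++] = 'e';
                    break;
                case '\u00C6': // capital AE ligature expands to two chars
                    out[n++] = 'A';
                    out[n++] = 'E';
                    break;
                default:
                    out[n++] = c;
            }
        }
        return new String(out, 0, n);
    }
}
```

For example, new AccentFolder().fold("caf\u00E9") yields "cafe". The only per-call allocation is the final String; the intermediate buffer is shared across calls, so one instance should not be used from several threads at once.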

Re: Best Practices for getting Strings from a position range

2007-07-24 Thread Peter Keegan
Hi Grant, No problem - I know you are very busy. I just wanted to get a sense for the timing because I'd like to use this for a release this Fall. If I can get a prototype working in the coming weeks AND the performance is great :) , this would be terrific. If not, I'll have to fall back on a

[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-07-24 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515103 ] Mark Miller commented on LUCENE-871: Woah! I thought I was seeing this patch on the trunk already. Should have

Re: Token termBuffer issues

2007-07-24 Thread Michael McCandless
Doron Cohen [EMAIL PROTECTED] wrote: Michael McCandless wrote: boolean next(Token resToken) which returns true if it has updated resToken with another token, else false if end-of-stream was hit. I would actually prefer Token next(Token resToken) because: - this way the API

Re: Token termBuffer issues

2007-07-24 Thread Yonik Seeley
On 7/24/07, Michael McCandless [EMAIL PROTECTED] wrote: Yonik Seeley [EMAIL PROTECTED] wrote: On 7/24/07, Michael McCandless [EMAIL PROTECTED] wrote: OK, I ran some benchmarks here. The performance gains are sizable: 12.8% speedup using Sun's JDK 5 and 17.2% speedup using Sun's JDK 6,

Re: The JDK 1.5 Can o' Worms

2007-07-24 Thread Andi Vajda
On Tue, 24 Jul 2007, Grant Ingersoll wrote: Well, it has been over a year since we have had the 1.5 debate (see http://www.gossamer-threads.com/lists/lucene/java-dev/35972?search_string=Java%201.5;#35972) and I think it is time we start accepting 1.5 code. Nutch, Solr, Hadoop all use JDK 1.5

Re: The JDK 1.5 Can o' Worms

2007-07-24 Thread Bill Janssen
Grant Ingersoll writes: I also believe all committers and all contributors are using 1.5 already for their environment. I would also _guess_ the large majority of our users are on 1.5. Now, I know, it isn't a big deal to run 1.4 code in 1.5, but it is annoying for development and

[jira] Updated: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-07-24 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-871: Attachment: fasterisoremove2.patch Since I prefer my char array handling to the current, I have

[jira] Issue Comment Edited: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-07-24 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515111 ] Mark Miller edited comment on LUCENE-871 at 7/24/07 3:51 PM: Since I prefer my char

Re: The JDK 1.5 Can o' Worms

2007-07-24 Thread Grant Ingersoll
On Jul 24, 2007, at 6:26 PM, Bill Janssen wrote: Grant Ingersoll writes: I also believe all committers and all contributors are using 1.5 already for their environment. I would also _guess_ the large majority of our users are on 1.5. Now, I know, it isn't a big deal to run 1.4 code in 1.5,

Re: The JDK 1.5 Can o' Worms

2007-07-24 Thread DM Smith
On Jul 24, 2007, at 7:00 PM, Grant Ingersoll wrote: I am going to guess that GCJ will always be significantly behind Sun's Java, There is an effort to release OpenJDK. That will be Java 1.7 (my cynicism is perhaps later). I can't find the web page now, but it appears that it will stall

Re: The JDK 1.5 Can o' Worms

2007-07-24 Thread Doug Cutting
Bill Janssen wrote: The big issue wasn't whether developers and application users were using Sun's Java 1.5, it was gcj and where it was. Several of the downstream packages of Lucene involve gcj instead of Sun Java, because gcj provides different functionality. GCJ is also available on a