Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Doron Cohen
On Dec 31, 2007 7:54 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > I actually think indexing should try to be as robust as possible. You > could test like crazy and never hit a massive term, go into production > (say, ship your app to lots of your customer's computers) only to > suddenly se

Build failed in Hudson: Lucene-Nightly #321

2007-12-31 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/321/changes -- [...truncated 866 lines...] A contrib/db/bdb-je/src/java A contrib/db/bdb-je/src/java/org A contrib/db/bdb-je/src/java/org/apache A contrib/db/bd

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Michael McCandless
Grant Ingersoll wrote: On Dec 31, 2007, at 12:54 PM, Michael McCandless wrote: I actually think indexing should try to be as robust as possible. You could test like crazy and never hit a massive term, go into production (say, ship your app to lots of your customer's computers) only to su

[jira] Commented: (LUCENE-1114) contrib/Highlighter javadoc example needs to be updated

2007-12-31 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555145 ] Grant Ingersoll commented on LUCENE-1114: - It also only demonstrates using the Analyzer to get the tokenStre

[jira] Created: (LUCENE-1114) contrib/Highlighter javadoc example needs to be updated

2007-12-31 Thread Grant Ingersoll (JIRA)
contrib/Highlighter javadoc example needs to be updated --- Key: LUCENE-1114 URL: https://issues.apache.org/jira/browse/LUCENE-1114 Project: Lucene - Java Issue Type: Bug Componen

Re: Let's release Lucene 2.3 soon?

2007-12-31 Thread Michael Busch
Michael McCandless wrote: > I just opened a new issue, which I think should be fixed for 2.3, to > fix IndexWriter.add/updateDocument to not "partially add" a document > when an exception is hit: > > https://issues.apache.org/jira/browse/LUCENE-1112 > > I'll try to work out a patch by Thu but i

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Grant Ingersoll
On Dec 31, 2007, at 12:54 PM, Michael McCandless wrote: I actually think indexing should try to be as robust as possible. You could test like crazy and never hit a massive term, go into production (say, ship your app to lots of your customer's computers) only to suddenly see this exception. I

[jira] Commented: (LUCENE-1112) Document is partially indexed on an unhandled exception

2007-12-31 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555119 ] Michael McCandless commented on LUCENE-1112: Thanks Doron; I'll fold this in (though, I'll move it to th

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Yonik Seeley
On Dec 31, 2007 12:54 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > I actually think indexing should try to be as robust as possible. You > could test like crazy and never hit a massive term, go into production > (say, ship your app to lots of your customer's computers) only to > suddenly se

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Michael McCandless
I actually think indexing should try to be as robust as possible. You could test like crazy and never hit a massive term, go into production (say, ship your app to lots of your customer's computers) only to suddenly see this exception. In general it could be a long time before you "accidentally"

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Yonik Seeley
On Dec 31, 2007 12:25 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Sure, but I mean in the >16K (in other words, in the case where > DocsWriter fails, which presumably only DocsWriter knows about) case. > I want the option to ignore tokens larger than that instead of failing/ > throwing an exce
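A rough sketch of the "ignore instead of throw" option discussed in this thread, written as plain Java rather than against Lucene's analysis API. The class name, method name, and the exact limit constant are hypothetical; the thread only says ">16K", so the value used here is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: LongTokenPolicy and dropOversized are hypothetical names,
// not part of Lucene.
final class LongTokenPolicy {
    // Assumed limit; the thread refers to it only as ">16K".
    static final int ASSUMED_MAX_TERM_LENGTH = 16 * 1024;

    // Returns a new list containing only the tokens whose length is within maxLength,
    // silently skipping the rest instead of raising an exception.
    static List<String> dropOversized(List<String> tokens, int maxLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() <= maxLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>();
        tokens.add("lucene");
        tokens.add(new String(new char[ASSUMED_MAX_TERM_LENGTH + 1]).replace('\0', 'x'));
        System.out.println(dropOversized(tokens, ASSUMED_MAX_TERM_LENGTH).size()); // prints 1
    }
}
```

Doing this as a filtering step before indexing matches the position taken in the thread that such logic belongs in analysis, leaving the core indexing code to handle only the hard limit.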

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Grant Ingersoll
On Dec 31, 2007, at 12:11 PM, Yonik Seeley wrote: On Dec 31, 2007 11:59 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: On Dec 31, 2007, at 11:44 AM, Yonik Seeley wrote: I meant (1)... it leaves the core smaller. I don't see any reason to have logic to truncate or discard tokens in the cor

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Yonik Seeley
On Dec 31, 2007 11:59 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > On Dec 31, 2007, at 11:44 AM, Yonik Seeley wrote: > > I meant (1)... it leaves the core smaller. > > I don't see any reason to have logic to truncate or discard tokens in > > the core indexing code (except to handle tokens >16

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Grant Ingersoll
On Dec 31, 2007, at 11:44 AM, Yonik Seeley wrote: On Dec 31, 2007 11:37 AM, Doron Cohen <[EMAIL PROTECTED]> wrote: On Dec 31, 2007 6:10 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: I think I like the 3'rd option - is this what you meant? I meant (1)... it leaves the core smaller. I don't se

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Yonik Seeley
On Dec 31, 2007 11:37 AM, Doron Cohen <[EMAIL PROTECTED]> wrote: > > On Dec 31, 2007 6:10 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > > On Dec 31, 2007 5:53 AM, Michael McCandless <[EMAIL PROTECTED]> > > wrote: > > > Doron Cohen <[EMAIL PROTECTED]> wrote: > > > > I like the approach of configur

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Doron Cohen
On Dec 31, 2007 6:10 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Dec 31, 2007 5:53 AM, Michael McCandless <[EMAIL PROTECTED]> > wrote: > > Doron Cohen <[EMAIL PROTECTED]> wrote: > > > I like the approach of configuration of this behavior in Analysis > > > (and so IndexWriter can throw an exce

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Yonik Seeley
On Dec 31, 2007 5:53 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Doron Cohen <[EMAIL PROTECTED]> wrote: > > I like the approach of configuration of this behavior in Analysis > > (and so IndexWriter can throw an exception on such errors). > > > > It seems that this should be a property of An

[jira] Commented: (LUCENE-1113) fix for Document.getBoost() documentation

2007-12-31 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555110 ] Doron Cohen commented on LUCENE-1113: - How about: {noformat} Returns, at indexing time, the boost factor as s

Fuzzy makes no sense for short tokens

2007-12-31 Thread Timo Nentwig
Hi! It generally makes no sense to search fuzzy for short tokens, because changing even a single character already results in a high edit distance. So it only makes sense in this case: if( token.length() > 1f / (1f - minSimilarity) ) E.g. changing one characte
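The length condition quoted in this message can be tried out in isolation. The following is a minimal sketch in plain Java; the class and method names are hypothetical and nothing here calls into Lucene. It only evaluates the inequality `token.length() > 1f / (1f - minSimilarity)` as written in the post.

```java
// Illustrative only: FuzzyLengthCheck and worthFuzzy are hypothetical names,
// not part of Lucene.
final class FuzzyLengthCheck {
    // True when the token is long enough that a single-character change
    // could still satisfy the given minimum similarity, per the quoted condition.
    static boolean worthFuzzy(String token, float minSimilarity) {
        if (minSimilarity < 0f || minSimilarity >= 1f) {
            throw new IllegalArgumentException("minSimilarity must be in [0, 1)");
        }
        return token.length() > 1f / (1f - minSimilarity);
    }

    public static void main(String[] args) {
        // With minSimilarity = 0.5 the threshold is 1 / 0.5 = 2 characters.
        System.out.println(worthFuzzy("ab", 0.5f));  // prints false
        System.out.println(worthFuzzy("abc", 0.5f)); // prints true
    }
}
```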

[jira] Updated: (LUCENE-1113) fix for Document.getBoost() documentation

2007-12-31 Thread Daniel Naber (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-1113: - Attachment: document-getboost.diff > fix for Document.getBoost() documentation > ---

[jira] Created: (LUCENE-1113) fix for Document.getBoost() documentation

2007-12-31 Thread Daniel Naber (JIRA)
fix for Document.getBoost() documentation - Key: LUCENE-1113 URL: https://issues.apache.org/jira/browse/LUCENE-1113 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Ver

Re: Let's release Lucene 2.3 soon?

2007-12-31 Thread Grant Ingersoll
On Dec 30, 2007, at 1:02 PM, Michael Busch wrote: Grant Ingersoll wrote: On Dec 30, 2007, at 6:29 AM, Michael Busch wrote: In this time period only critical/blocking issues and documentation patches can be committed to the branch. I'd add that we should make some effort to clean up old J

[jira] Resolved: (LUCENE-1102) EnwikiDocMaker id field

2007-12-31 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved LUCENE-1102. - Resolution: Fixed Lucene Fields: (was: [New]) Committed > EnwikiDocMaker id fi

[jira] Resolved: (LUCENE-458) Merging may create duplicates if the JVM crashes half way through

2007-12-31 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch resolved LUCENE-458. -- Resolution: Duplicate The problem here apparently is that when the JVM crashed not all files ar

[jira] Updated: (LUCENE-1112) Document is partially indexed on an unhandled exception

2007-12-31 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-1112: Attachment: lucene-1112-test.patch Patch demonstrating the problem: testWickedLongTerm() modified

[jira] Resolved: (LUCENE-488) adding docs with large (binary) fields of 5mb causes OOM regardless of heap size

2007-12-31 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-488. Resolution: Fixed This problem was resolved by LUCENE-843, after which stored fields are written d

Re: DocumentsWriter.checkMaxTermLength issues

2007-12-31 Thread Michael McCandless
Doron Cohen <[EMAIL PROTECTED]> wrote: > I like the approach of configuration of this behavior in Analysis > (and so IndexWriter can throw an exception on such errors). > > It seems that this should be a property of Analyzer vs. > just StandardAnalyzer, right? > > It can probably be a "policy" prop

Re: Let's release Lucene 2.3 soon?

2007-12-31 Thread Michael McCandless
I just opened a new issue, which I think should be fixed for 2.3, to fix IndexWriter.add/updateDocument to not "partially add" a document when an exception is hit: https://issues.apache.org/jira/browse/LUCENE-1112 I'll try to work out a patch by Thu but it may be tight... Mike Michael Busch <

[jira] Created: (LUCENE-1112) Document is partially indexed on an unhandled exception

2007-12-31 Thread Michael McCandless (JIRA)
Document is partially indexed on an unhandled exception --- Key: LUCENE-1112 URL: https://issues.apache.org/jira/browse/LUCENE-1112 Project: Lucene - Java Issue Type: Bug Componen

[jira] Resolved: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word

2007-12-31 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-1095. - Resolution: Fixed Lucene Fields: [Patch Available] (was: [New]) Committed . (already yes