Fwd: Re: possible bug with indexing with term vectors

2007-09-29 Thread Michael McCandless
I forgot to CC java-dev in my response: Mike "Michael McCandless" <[EMAIL PROTECTED]> wrote: > "Andi Vajda" <[EMAIL PROTECTED]> wrote: > > > > On Fri, 28 Sep 2007, Michael McCandless wrote: > > > > >> I tried all morning to isolate the problem but I seem to be unable > > >> to reproduce it in a

[jira] Commented: (LUCENE-766) Two same new field with and without Term vector make an IllegalStateException

2007-09-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531237 ] Grant Ingersoll commented on LUCENE-766: Hi Nicolas, Can you still reproduce this against the trunk now with t

Re: possible bug with indexing with term vectors

2007-09-29 Thread Grant Ingersoll
Hmmm, not sure, but in looking at DocumentsWriter, it seems like lines around 553 might be at issue: if (tvx != null) { tvx.writeLong(tvd.getFilePointer()); if (numVectorFields > 0) { tvd.writeVInt(numVectorFields); for(int i=0;i … Specifically, the exception bei
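Grant's snippet writes one tvx pointer per document into tvd, including documents where numVectorFields is 0. That way a reader can seek directly to document N. The sketch below is a hypothetical, heavily simplified model of that invariant in plain Java with no Lucene classes. The class name, method names, fixed-width int encoding and the `skipEmptyDocs` flag are illustrative assumptions. They are not Lucene's actual file format, and this is not the confirmed root cause of the bug discussed in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/**
 * Hypothetical, simplified model of a two-stream term-vector layout (NOT Lucene's format):
 * "tvx" holds one 8-byte pointer per document into "tvd"; "tvd" holds, per document,
 * an int count of vector fields followed by that many int field numbers.
 */
public class TermVectorLayoutSketch {

    /**
     * Writes both streams. If skipEmptyDocs is true, documents with zero vector fields
     * get no tvx entry, which breaks the one-pointer-per-document invariant.
     */
    static byte[][] write(int[][] vectorFieldsPerDoc, boolean skipEmptyDocs) {
        ByteArrayOutputStream tvxBytes = new ByteArrayOutputStream();
        ByteArrayOutputStream tvdBytes = new ByteArrayOutputStream();
        try (DataOutputStream tvx = new DataOutputStream(tvxBytes);
             DataOutputStream tvd = new DataOutputStream(tvdBytes)) {
            for (int[] fields : vectorFieldsPerDoc) {
                if (skipEmptyDocs && fields.length == 0) {
                    continue; // illustrative bug: this document gets no index entry
                }
                tvx.writeLong(tvd.size()); // pointer to this document's entry in tvd
                tvd.writeInt(fields.length); // a count of 0 is still written
                for (int field : fields) {
                    tvd.writeInt(field);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with in-memory streams
        }
        return new byte[][] {tvxBytes.toByteArray(), tvdBytes.toByteArray()};
    }

    /** Seeks tvx to docID * 8 and returns that document's vector-field count, or -1 past the end of tvx. */
    static int vectorFieldCount(byte[][] files, int docID) {
        ByteBuffer tvx = ByteBuffer.wrap(files[0]);
        ByteBuffer tvd = ByteBuffer.wrap(files[1]);
        int offset = docID * 8;
        if (offset + 8 > tvx.capacity()) {
            return -1; // a real reader would hit end-of-file here
        }
        long pointer = tvx.getLong(offset);
        return tvd.getInt((int) pointer);
    }

    public static void main(String[] args) {
        int[][] docs = { {0, 1}, {}, {3} }; // doc 1 has no term-vector fields
        byte[][] aligned = write(docs, false);
        byte[][] skipped = write(docs, true);
        for (int doc = 0; doc < docs.length; doc++) {
            System.out.println("doc " + doc + ": aligned=" + vectorFieldCount(aligned, doc)
                    + " skipped=" + vectorFieldCount(skipped, doc));
        }
    }
}
```

Consider three documents where doc 1 has no vector fields. The aligned writer reports counts 2, 0 and 1. The skipping writer returns doc 2's count when asked for doc 1, and it runs off the end of tvx when asked for doc 2. That is the kind of misread that can surface as index corruption or an EOFException at read or merge time.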

[jira] Commented: (LUCENE-766) Two same new field with and without Term vector make an IllegalStateException

2007-09-29 Thread JIRA
[ https://issues.apache.org/jira/browse/LUCENE-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531239 ] Nicolas Lalevée commented on LUCENE-766: well, I have attached a patch with a test. Add the test and you'll s

Re: possible bug with indexing with term vectors

2007-09-29 Thread Michael McCandless
You are right Grant -- good catch!!! I have a unit test showing it now. Thank you :) So, this case is tickled if you have a doc (or docs) that have some fields with term vectors enabled, but then later as part of the same buffered set of docs you have 1 or more docs that have no fields with ter

[jira] Created: (LUCENE-1008) document with no term vector fields after documents with term vector fields corrupts the index

2007-09-29 Thread Michael McCandless (JIRA)
document with no term vector fields after documents with term vector fields corrupts the index -- Key: LUCENE-1008 URL: https://issues.apache.org/jira/browse/LUCENE-1008

[jira] Resolved: (LUCENE-1008) document with no term vector fields after documents with term vector fields corrupts the index

2007-09-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1008. Resolution: Fixed Andi could you test with this fix? But, if this fixes your prob

Re: possible bug with indexing with term vectors

2007-09-29 Thread Grant Ingersoll
There are a couple of JIRA issues related to TVs as well, mostly edge cases, but Andi might want to take a look at them to see if they describe his situation. -Grant On Sep 29, 2007, at 8:35 AM, Michael McCandless wrote: You are right Grant -- good catch!!! I have a unit test showing it

[jira] Resolved: (LUCENE-1006) QueryParser doesn't accept empty string

2007-09-29 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved LUCENE-1006. -- Resolution: Fixed committed. > QueryParser doesn't accept empty string >

Re: possible bug with indexing with term vectors

2007-09-29 Thread Andi Vajda
On Sat, 29 Sep 2007, Michael McCandless wrote: The new PyLucene is built with a code generator and all public APIs and classes are made available to Python. SerialMergeScheduler is available. Wild! Does this mean PyLucene will track tightly to Lucene releases going forward? Yes, even more

[jira] Commented: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531259 ] Mark Miller commented on LUCENE-994: Sorry for the delay. Here is the debug output. As I said, I am actually writ

[jira] Commented: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531260 ] Mark Miller commented on LUCENE-994: Sorry for the delay. Here is the debug output. As I said, I am actually writ

[jira] Commented: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531261 ] Mark Miller commented on LUCENE-994: Sorry for the delay. Here is the debug output. As I said, I am actually writ

[jira] Commented: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531262 ] Mark Miller commented on LUCENE-994: Sorry for the delay. Here is the debug output. As I said, I am actually writ

[jira] Updated: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-994: --- Attachment: writerinfo.zip > Change defaults in IndexWriter to maximize "out of the box" performance

[jira] Updated: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-994: --- Comment: was deleted > Change defaults in IndexWriter to maximize "out of the box" performance >

[jira] Updated: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-994: --- Comment: was deleted > Change defaults in IndexWriter to maximize "out of the box" performance >

[jira] Updated: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-994: --- Comment: was deleted > Change defaults in IndexWriter to maximize "out of the box" performance >

Re: possible bug with indexing with term vectors

2007-09-29 Thread Andi Vajda
On Sat, 29 Sep 2007, Andi Vajda wrote: Ok, this could explain why the test is passing. In the test I only do one batch of indexing, not several like here. I missed that difference. My apologies. I'm going to change my test now and report back... I finally isolated the bug into a simple Java

Re: possible bug with indexing with term vectors

2007-09-29 Thread Andi Vajda
On Sat, 29 Sep 2007, Andi Vajda wrote: I finally isolated the bug into a simple Java unit test. It indeed had to do with doing multiple batches of document additions. I forgot to say that I ran this with the most recent fixes for bug 1008. To be precise, lucene svn rev 580605. Andi.. --

[jira] Commented: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531269 ] Michael McCandless commented on LUCENE-994: --- Thanks Mark! OK, I noticed a few things from the logs: * I

[jira] Commented: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531270 ] Mark Miller commented on LUCENE-994: Anytime Michael. Thanks for pointing out the mergefactor issue to me. I rec

[jira] Created: (LUCENE-1009) LogByteSizeMergePolicy over-merges with autoCommit=false and documents with term vectors and/or stored fields

2007-09-29 Thread Michael McCandless (JIRA)
LogByteSizeMergePolicy over-merges with autoCommit=false and documents with term vectors and/or stored fields - Key: LUCENE-1009 URL: https://issues.apache

[jira] Resolved: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-994. --- Resolution: Fixed Marking this as fixed again; I opened LUCENE-1009 for the slowdown

[jira] Commented: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

2007-09-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531273 ] Michael McCandless commented on LUCENE-994: --- > Anytime Michael. Thanks for pointing out the mergefactor iss

[jira] Created: (LUCENE-1010) Document with no term vectors mixed with ones that have term vectors cause EOFException during merge

2007-09-29 Thread Michael McCandless (JIRA)
Document with no term vectors mixed with ones that have term vectors cause EOFException during merge Key: LUCENE-1010 URL: https://issues.apache.org/jira/browse/L

[jira] Updated: (LUCENE-1010) Document with no term vectors mixed with ones that have term vectors cause EOFException during merge

2007-09-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1010: --- Description: Another spinoff from here: http://www.gossamer-threads.com/lists/luc

[jira] Resolved: (LUCENE-1010) Document with no term vectors mixed with ones that have term vectors cause EOFException during merge

2007-09-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1010. Resolution: Fixed > Document with no term vectors mixed with ones that have term v

[jira] Resolved: (LUCENE-1009) LogByteSizeMergePolicy over-merges with autoCommit=false and documents with term vectors and/or stored fields

2007-09-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1009. Resolution: Fixed > LogByteSizeMergePolicy over-merges with autoCommit=false and d

[jira] Created: (LUCENE-1011) Two or more writers over NFS can cause index corruption

2007-09-29 Thread Michael McCandless (JIRA)
Two or more writers over NFS can cause index corruption --- Key: LUCENE-1011 URL: https://issues.apache.org/jira/browse/LUCENE-1011 Project: Lucene - Java Issue Type: Bug Componen