[jira] Commented: (LUCENE-1040) Can't quickly create StopFilter

2007-11-01, Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539469 ] Hoss Man commented on LUCENE-1040: -- > But it does if the Set is not a CharArraySet. > Docs should be clearer though
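Reading the quoted remark as saying that the Set-based StopFilter constructor still copies its argument unless it is already a CharArraySet, the pattern looks roughly like this sketch. `CharKeySet` and `StopSetSketch` are hypothetical names. The sketch is a minimal open-addressing set keyed on char[] slices, not Lucene's CharArraySet or StopFilter code.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical char[]-keyed, read-only set; loosely modeled on the idea behind
// Lucene's CharArraySet, but not its actual code or API.
final class CharKeySet extends AbstractSet<String> {
    private final char[][] slots;
    private final int size;

    CharKeySet(Set<String> words) {
        int cap = 16;
        while (cap < words.size() * 2) cap <<= 1; // keep the table at most half full
        slots = new char[cap][];
        for (String w : words) {
            char[] key = w.toCharArray();
            slots[slot(key, 0, key.length)] = key;
        }
        size = words.size();
    }

    // Looks up a token straight from a char buffer, without building a String.
    boolean contains(char[] text, int off, int len) {
        return slots[slot(text, off, len)] != null;
    }

    @Override public boolean contains(Object o) {
        if (!(o instanceof String)) return false;
        char[] c = ((String) o).toCharArray();
        return contains(c, 0, c.length);
    }

    @Override public int size() { return size; }

    @Override public Iterator<String> iterator() {
        List<String> out = new ArrayList<>();
        for (char[] s : slots) if (s != null) out.add(new String(s));
        return out.iterator();
    }

    // Linear probing: returns the slot holding an equal key, or the first empty slot.
    private int slot(char[] text, int off, int len) {
        int h = 0;
        for (int k = off; k < off + len; k++) h = 31 * h + text[k];
        int mask = slots.length - 1;
        int i = h & mask;
        while (slots[i] != null && !sameChars(slots[i], text, off, len)) i = (i + 1) & mask;
        return i;
    }

    private static boolean sameChars(char[] key, char[] text, int off, int len) {
        if (key.length != len) return false;
        for (int k = 0; k < len; k++) if (key[k] != text[off + k]) return false;
        return true;
    }
}

final class StopSetSketch {
    // Reuse a set that is already char[]-keyed; copy any other Set exactly once.
    static CharKeySet prepare(Set<String> stopWords) {
        return (stopWords instanceof CharKeySet) ? (CharKeySet) stopWords : new CharKeySet(stopWords);
    }

    public static void main(String[] args) {
        CharKeySet stops = prepare(new HashSet<>(Arrays.asList("the", "a", "of")));
        char[] buf = "xthex".toCharArray();
        System.out.println(stops.contains(buf, 1, 3)); // true
        System.out.println(prepare(stops) == stops);   // true: no second copy
    }
}
```

The copy happens once, at construction; every later lookup works on the token's char buffer directly, which is the point of keying the set on char[] in the first place.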

[jira] Created: (LUCENE-1042) discrepancy in getTermFreqVector-methods

2007-11-01, Karl Wettin (JIRA)
discrepancy in getTermFreqVector-methods - Key: LUCENE-1042 URL: https://issues.apache.org/jira/browse/LUCENE-1042 Project: Lucene - Java Issue Type: Bug Components: Term Vectors Affects

[jira] Commented: (LUCENE-935) Improve maven artifacts

2007-11-01, Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539446 ] Grant Ingersoll commented on LUCENE-935: well, I don't know that I have a lot of control over it unless I wan

Re: another merge improvement?

2007-11-01, robert engels
Sorry, ignore this... I work off the 1.9 codebase. I see that 2.2 already has this... On Nov 1, 2007, at 1:20 PM, robert engels wrote: Is there a way to get an IndexInput with a larger buffer "sometimes"? Since the fields/documents and terms processing are basically sequential, it would seem

another merge improvement?

2007-11-01, robert engels
Is there a way to get an IndexInput with a larger buffer "sometimes"? Since the fields/documents and terms processing are basically sequential, it would seem that merging would benefit from a much larger buffer size. The recent changes alleviate this a bit for document merging, since the d
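The buffer-size question can be illustrated with plain java.io as a stand-in for Lucene's IndexInput (no Lucene API is used or implied). The sketch counts how often a BufferedInputStream must refill from the underlying stream while a sequential, byte-at-a-time reader consumes the same data; `BufferSizeDemo` and `refills` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Stand-in illustration with java.io, not Lucene's IndexInput.
final class BufferSizeDemo {
    // Counts how many bulk reads are issued to the wrapped stream,
    // i.e. how often the buffer in front of it had to be refilled.
    static final class CountingStream extends FilterInputStream {
        int reads;
        CountingStream(InputStream in) { super(in); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            reads++;
            return super.read(b, off, len);
        }
    }

    // Reads every byte sequentially through a buffer of the given size and
    // returns the number of refills from the underlying stream.
    static int refills(byte[] data, int bufferSize) throws IOException {
        CountingStream counting = new CountingStream(new ByteArrayInputStream(data));
        try (InputStream in = new BufferedInputStream(counting, bufferSize)) {
            while (in.read() != -1) {
                // consume one byte at a time, like a sequential decoder
            }
        }
        return counting.reads;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of zeros
        System.out.println("1 KiB buffer:  " + refills(data, 1024) + " refills");
        System.out.println("64 KiB buffer: " + refills(data, 64 * 1024) + " refills");
    }
}
```

For a purely sequential pass such as a merge, a larger buffer trades memory for fewer refills (and, on a real file, fewer system calls); for random access the extra buffered bytes may simply be thrown away.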

[jira] Commented: (LUCENE-935) Improve maven artifacts

2007-11-01, Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539425 ] Michael Busch commented on LUCENE-935: -- Grant, two comments: - How often are you planning to publish a snapshot

[jira] Commented: (LUCENE-1040) Can't quickly create StopFilter

2007-11-01, Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539423 ] Yonik Seeley commented on LUCENE-1040: -- > If the StopFilter constructor that takes in a Set no longer needs the

[jira] Commented: (LUCENE-1040) Can't quickly create StopFilter

2007-11-01, Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539422 ] Yonik Seeley commented on LUCENE-1040: -- > I noticed that in your patch you now assume that String.hashCode() is
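The concern about assuming String.hashCode() matters because a char[]-keyed set has to hash a token-buffer slice to the same value a String key would get. The Java platform specifies String.hashCode() as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, so a compatible slice hash can be sketched as follows (`CharSliceHash` is a hypothetical helper, not code from the patch):

```java
// Hypothetical helper, not the code from the LUCENE-1040 patch.
final class CharSliceHash {
    private CharSliceHash() {}

    // Computes the value String.hashCode() is specified to return for the
    // same characters: s[0]*31^(n-1) + ... + s[n-1], with int overflow.
    // Horner's form (h = 31*h + c) evaluates that polynomial left to right.
    static int hash(char[] text, int offset, int length) {
        int h = 0;
        for (int i = offset; i < offset + length; i++) {
            h = 31 * h + text[i];
        }
        return h;
    }

    public static void main(String[] args) {
        char[] buf = "xxstopxx".toCharArray();
        // The slice "stop" hashes exactly like the String "stop".
        System.out.println(hash(buf, 2, 4) == "stop".hashCode()); // true
    }
}
```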

Re: possible segment merge improvement?

2007-11-01, robert engels
I have looked into modifying FieldInfos to keep the fields sorted by field name, so the user would not be forced to add the fields in the same order. Sparse documents are really not a problem. Since after the first merge of that document it will pickup the other fields from the other segm
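One way to read the sorted-FieldInfos idea: assign field numbers in field-name order, so that two segments containing the same set of field names end up with identical numbering regardless of the order in which documents added them. A minimal sketch, with the hypothetical `SortedFieldNumbers` class standing in for FieldInfos:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not Lucene's FieldInfos class.
final class SortedFieldNumbers {
    // Assigns field numbers in sorted-name order, so insertion order does not matter.
    static Map<String, Integer> number(Collection<String> fieldNames) {
        Map<String, Integer> numbers = new TreeMap<>();
        int next = 0;
        for (String name : new TreeSet<>(fieldNames)) {
            numbers.put(name, next++);
        }
        return numbers;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = number(Arrays.asList("title", "body", "id"));
        Map<String, Integer> b = number(Arrays.asList("id", "title", "body"));
        System.out.println(a.equals(b)); // true
        System.out.println(a);           // {body=0, id=1, title=2}
    }
}
```

One consequence worth noting: adding a field whose name sorts into the middle shifts the numbers of every later field, so segments with different field sets would still need renumbering at merge time.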

Re: possible segment merge improvement?

2007-11-01, Marvin Humphrey
On Nov 1, 2007, at 7:10 AM, Yonik Seeley wrote: Does "all docs have matching fields" mean that the fields must be present (as well as identically typed) on each doc, or could they still be sparse? If they can be sparse, how do you avoid renumbering??? The fields still get renumbered, but if

Re: possible segment merge improvement?

2007-11-01, Marvin Humphrey
On Nov 1, 2007, at 3:04 AM, Michael McCandless wrote: In KinoSearch, merging of stored fields & term vectors is always a fast concatenation of the entry for that document, whereas Lucene must re-interpret/re-number all fields on the doc, in general. In fact I think that KinoSearch stores field
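The "fast concatenation" described for KinoSearch can be sketched under a simplified, hypothetical layout: one blob of stored-field bytes per segment plus an array of per-document start offsets. Merging segments without deletions then appends the blobs and shifts each segment's offsets by the number of bytes already written. `StoredSegment` is a made-up name, and this is not the actual Lucene or KinoSearch file format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical segment layout: one data blob plus per-document start offsets.
// Not the actual Lucene or KinoSearch file format.
final class StoredSegment {
    final byte[] data;
    final long[] docStarts; // docStarts[i] = offset of document i within data

    StoredSegment(byte[] data, long[] docStarts) {
        this.data = data;
        this.docStarts = docStarts;
    }

    // Bytes for document i: from its start to the next document's start (or end of data).
    byte[] doc(int i) {
        int from = (int) docStarts[i];
        int to = (i + 1 < docStarts.length) ? (int) docStarts[i + 1] : data.length;
        return Arrays.copyOfRange(data, from, to);
    }

    // Merges segments with no deletions by concatenating raw bytes; each
    // segment's offsets are shifted by the bytes written before it.
    static StoredSegment concat(StoredSegment... segments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int totalDocs = 0;
        for (StoredSegment s : segments) totalDocs += s.docStarts.length;
        long[] starts = new long[totalDocs];
        int d = 0;
        for (StoredSegment s : segments) {
            long base = out.size();
            for (long start : s.docStarts) starts[d++] = base + start;
            out.write(s.data, 0, s.data.length); // no per-field decoding at all
        }
        return new StoredSegment(out.toByteArray(), starts);
    }

    public static void main(String[] args) {
        StoredSegment a = new StoredSegment("AAbbb".getBytes(StandardCharsets.UTF_8), new long[] {0, 2});
        StoredSegment b = new StoredSegment("cccc".getBytes(StandardCharsets.UTF_8), new long[] {0});
        StoredSegment merged = concat(a, b);
        System.out.println(new String(merged.doc(2), StandardCharsets.UTF_8)); // cccc
    }
}
```

Deleted documents would be handled by skipping their byte ranges and compacting the offset array; the sketch omits that.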

[jira] Commented: (LUCENE-935) Improve maven artifacts

2007-11-01, Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539405 ] Grant Ingersoll commented on LUCENE-935: I have the necessary pieces in place for this, just need to figure o

Re: possible segment merge improvement?

2007-11-01, Yonik Seeley
On 11/1/07, Doron Cohen <[EMAIL PROTECTED]> wrote: > My reading of Robert's suggestion is that when we know that > FieldInfos of the resulted segment is identical to the > FieldInfos of a certain (sub) segment being merged then > there is no need to parse+rewrite the field data for all > docs of th

Re: possible segment merge improvement?

2007-11-01, Doron Cohen
[EMAIL PROTECTED] wrote on 01/11/2007 16:10:27: > > If we make this change to Lucene then for those apps that effectively > > have a static field schema (because all docs always have matching > > fields), we can get the same performance that KinoSearch always gets > > during its merging of stored

Re: possible segment merge improvement?

2007-11-01, Yonik Seeley
On 11/1/07, Michael McCandless <[EMAIL PROTECTED]> wrote: > "robert engels" <[EMAIL PROTECTED]> wrote: > > > Why not check the fields dictionary for the segments being merged, > > and if the same, just copy the binary data directly? > > +1 > > While Lucene does not have a global field schema/semant
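The proposal quoted here (check whether the fields dictionaries match and, if so, copy the binary data directly) can be sketched with a hypothetical, simplified encoding in which each stored field is written as one byte of field number, one byte of value length, then the value bytes. When a source segment's field list equals the merged segment's, the bytes are copied verbatim; otherwise every field number is decoded and rewritten. `MergeCopySketch` and the encoding are assumptions, not Lucene's stored-fields format.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified encoding: each stored field is
// [field number (1 byte)][value length (1 byte)][value bytes].
// Not Lucene's stored-fields format.
final class MergeCopySketch {
    // Maps each field number in the source segment to its number in the merged segment.
    static int[] remap(List<String> sourceFields, List<String> mergedFields) {
        int[] map = new int[sourceFields.size()];
        for (int i = 0; i < map.length; i++) {
            map[i] = mergedFields.indexOf(sourceFields.get(i));
        }
        return map;
    }

    static byte[] mergeDoc(byte[] doc, List<String> sourceFields, List<String> mergedFields) {
        if (sourceFields.equals(mergedFields)) {
            return doc.clone(); // fast path: field numbers already agree, copy raw bytes
        }
        int[] map = remap(sourceFields, mergedFields);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int p = 0; p < doc.length; ) {
            int field = doc[p] & 0xFF;
            int len = doc[p + 1] & 0xFF;
            out.write(map[field]);      // rewrite the field number
            out.write(len);
            out.write(doc, p + 2, len); // value bytes are unchanged
            p += 2 + len;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<String> src = Arrays.asList("id", "body");
        List<String> merged = Arrays.asList("body", "id");
        byte[] doc = {0, 1, 'x', 1, 2, 'h', 'i'}; // id="x", body="hi"
        System.out.println(Arrays.toString(mergeDoc(doc, src, merged))); // [1, 1, 120, 0, 2, 104, 105]
    }
}
```

A real merger would take the fast path over a segment's whole byte range at once rather than document by document; the per-document version just keeps the two branches side by side.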

[jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-11-01, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539359 ] Michael McCandless commented on LUCENE-743: --- {quote} Hmm I was thinking about this before (that's actually

Re: possible segment merge improvement?

2007-11-01, Michael McCandless
"robert engels" <[EMAIL PROTECTED]> wrote: > Why not check the fields dictionary for the segments being merged, > and if the same, just copy the binary data directly? +1 While Lucene does not have a global field schema/semantics, unlike eg KinoSearch, I think for many apps the fields are in fact

[jira] Updated: (LUCENE-1040) Can't quickly create StopFilter

2007-11-01, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1040: --- Attachment: CharArraySet.take2.patch Woops, you're right, sorry about that, and tha