Re: possible TermInfosReader speedup

2009-04-08 Thread Michael Busch
On 4/8/09 2:08 PM, Earwin Burrfoot wrote: On Thu, Apr 9, 2009 at 00:14, Michael McCandless wrote: On Wed, Apr 8, 2009 at 3:46 PM, Earwin Burrfoot wrote: Currently, when we're seeking a given Term, it does a binary search across all term space, including terms belonging to other fi

ArrayUtils.getNextSize

2009-04-08 Thread Shai Erera
Hi, I used ArrayUtils.getNextSize recently to expand an array to a new size. When I read the documentation (the inline comment in the method), I saw this: "The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ..." I'm not sure if I'm misunderstanding the comment, or if there is a bug in the imple
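
For readers following this thread, here is a minimal sketch of one growth rule that reproduces the quoted sequence. The formula is the over-allocation rule used by CPython's list type. That the Lucene method uses the same formula is an assumption, and the class and method names below are hypothetical, not Lucene's.

```java
import java.util.ArrayList;
import java.util.List;

final class GrowthPatternSketch {

    // Assumed rule (not confirmed by the message above): over-allocate by
    // about 1/8 of the requested size plus a small constant (3 or 6).
    static int nextSize(int targetSize) {
        return (targetSize >> 3) + (targetSize < 9 ? 3 : 6) + targetSize;
    }

    // Capacities observed when an empty array grows one element at a time:
    // whenever the capacity is exhausted, request capacity + 1.
    static List<Integer> capacities(int count) {
        List<Integer> result = new ArrayList<>();
        int capacity = 0;
        for (int i = 0; i < count; i++) {
            result.add(capacity);
            capacity = nextSize(capacity + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
        System.out.println(capacities(10));
    }
}
```

Under this rule the listed values appear only when each request is one larger than the previous capacity. A direct call such as nextSize(4) returns 7, not 8.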

Re: possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot
On Thu, Apr 9, 2009 at 02:01, Uwe Schindler wrote: >> >> Also, on the other topic - how hard is it to boost >> >> TermEnum.skipTo(term) speed to IndexReader.terms(term) level? Would be >> >> nice for TrieRangeFilter and probably some other filters. >> > I think all that's needed is to implement Se

RE: possible TermInfosReader speedup

2009-04-08 Thread Uwe Schindler
> >> Also, on the other topic - how hard is it to boost > >> TermEnum.skipTo(term) speed to IndexReader.terms(term) level? Would be > >> nice for TrieRangeFilter and probably some other filters. > > I think all that's needed is to implement SegmentTermEnum.skipTo, > > calling something like tis.ter

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697233#action_12697233 ] Michael McCandless commented on LUCENE-1591: After some iterations on XERCESJ-

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697238#action_12697238 ] Uwe Schindler commented on LUCENE-1590: --- bq. Since FieldInfos is per-segment, one ch

[jira] Commented: (LUCENE-1313) Realtime Search

2009-04-08 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697226#action_12697226 ] Jason Rutherglen commented on LUCENE-1313: -- {quote} Still, it's synthetic. If you

[jira] Updated: (LUCENE-1539) Improve Benchmark

2009-04-08 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1539: - Attachment: LUCENE-1539.patch Above mentioned issues fixed. It seems a bit awkward that

Re: possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot
On Thu, Apr 9, 2009 at 00:14, Michael McCandless wrote: > On Wed, Apr 8, 2009 at 3:46 PM, Earwin Burrfoot wrote: > >> Currently, when we're seeking a given Term, it does a binary search >> across all term space, including terms belonging to other fields. >> I propose augmenting fields file with t

[jira] Commented: (LUCENE-1284) Set of Java classes that allow the Lucene search engine to use morphological information developed for the Apertium open-source machine translation platform (http://www

2009-04-08 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697208#action_12697208 ] Otis Gospodnetic commented on LUCENE-1284: -- One more for Felipe. Is there a page

[jira] Commented: (LUCENE-1284) Set of Java classes that allow the Lucene search engine to use morphological information developed for the Apertium open-source machine translation platform (http://www

2009-04-08 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697185#action_12697185 ] Otis Gospodnetic commented on LUCENE-1284: -- Felipe: I took another look at this.

Re: possible TermInfosReader speedup

2009-04-08 Thread Michael McCandless
On Wed, Apr 8, 2009 at 3:46 PM, Earwin Burrfoot wrote: > Currently, when we're seeking a given Term, it does a binary search > across all term space, including terms belonging to other fields. > I propose augmenting fields file with two pointers (firstTerm, > lastTerm) for each field. That reduce

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697164#action_12697164 ] Michael McCandless commented on LUCENE-1591: So, after upgrading to xerces 2.9

possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot
Currently, when we're seeking a given Term, it does a binary search across all term space, including terms belonging to other fields. I propose augmenting fields file with two pointers (firstTerm, lastTerm) for each field. That reduces range we need to search, and instead of comparing Terms we only
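
The proposal can be illustrated with a small sketch, assuming a term dictionary sorted by (field, text) and a per-field (firstTerm, lastTerm) range. Every name and the in-memory layout here are hypothetical. This is not TermInfosReader code, and the range map stands in for the pointers the message suggests storing in the fields file.

```java
import java.util.HashMap;
import java.util.Map;

final class FieldRangeSeekSketch {

    // Toy term dictionary, sorted by field and then by term text.
    static final String[] FIELDS = {"author", "author", "body", "body", "body", "title"};
    static final String[] TEXTS  = {"earwin", "mike", "index", "lucene", "term", "lucene"};

    // field -> {firstTerm, lastTerm}, both inclusive positions in the arrays above.
    static final Map<String, int[]> RANGES = buildRanges();

    static Map<String, int[]> buildRanges() {
        Map<String, int[]> ranges = new HashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            int[] range = ranges.get(FIELDS[i]);
            if (range == null) {
                ranges.put(FIELDS[i], new int[] {i, i});
            } else {
                range[1] = i; // extend lastTerm
            }
        }
        return ranges;
    }

    // Binary search restricted to the field's own range; only term text is
    // compared, because every entry in the range belongs to the same field.
    static int seek(String field, String text) {
        int[] range = RANGES.get(field);
        if (range == null) {
            return -1; // field has no terms
        }
        int lo = range[0];
        int hi = range[1];
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = TEXTS[mid].compareTo(text);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1; // term not present in this field
    }

    public static void main(String[] args) {
        System.out.println(seek("body", "lucene"));
        System.out.println(seek("title", "lucene"));
    }
}
```

The search never visits terms of other fields, and each comparison is a plain string comparison rather than a (field, text) comparison.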

[jira] Resolved: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1561. Resolution: Fixed > Maybe rename Field.omitTf, and strengthen the javadocs > -

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697152#action_12697152 ] Michael McCandless commented on LUCENE-1575: I came across another simple sear

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697143#action_12697143 ] Michael McCandless commented on LUCENE-1591: I'm hitting this, when trying to

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697130#action_12697130 ] Shai Erera commented on LUCENE-1575: of course ! > Refactoring Lucene collectors (Hit

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697124#action_12697124 ] Michael McCandless commented on LUCENE-1575: Sounds right! Wanna update the p

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697122#action_12697122 ] Shai Erera commented on LUCENE-1575: Right ... so basically we're talking about change

[jira] Created: (LUCENE-1591) Enable bzip compression in benchmark

2009-04-08 Thread Shai Erera (JIRA)
Enable bzip compression in benchmark Key: LUCENE-1591 URL: https://issues.apache.org/jira/browse/LUCENE-1591 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark

[jira] Commented: (LUCENE-1539) Improve Benchmark

2009-04-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697117#action_12697117 ] Shai Erera commented on LUCENE-1539: bq. Can you open a new issue? Will do. > Improv

[jira] Commented: (LUCENE-1539) Improve Benchmark

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697088#action_12697088 ] Michael McCandless commented on LUCENE-1539: Enabling bzip compression sounds

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697083#action_12697083 ] Michael McCandless commented on LUCENE-1575: bq. maxScore is only tracked in

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-08 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697076#action_12697076 ] Marvin Humphrey commented on LUCENE-1231: - FWIW, I think priority for document fet

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697061#action_12697061 ] Shai Erera commented on LUCENE-1575: That's actually what's done in TopScoreDocCollect

[jira] Commented: (LUCENE-1539) Improve Benchmark

2009-04-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697057#action_12697057 ] Shai Erera commented on LUCENE-1539: Is it also interesting to add extensions to Enwik

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697052#action_12697052 ] Michael McCandless commented on LUCENE-1575: I wonder if we should break out t

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697034#action_12697034 ] Michael McCandless commented on LUCENE-1590: bq. In principle the Field insta

Re: omitTF comment

2009-04-08 Thread Michael McCandless
OK I'll make this change in the pending patch on LUCENE-1561. Mike On Wed, Apr 8, 2009 at 8:52 AM, Mark Miller wrote: > Yeah, you got my vote. I think this one actually felt a bit more dangerous > when it was just called omitTf(). > > Michael McCandless wrote: >> >> How about simply: >> >> /** E

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697024#action_12697024 ] Uwe Schindler commented on LUCENE-1590: --- {quote} Patch looks good! All tests pass. T

Re: omitTF comment

2009-04-08 Thread Mark Miller
Yeah, you got my vote. I think this one actually felt a bit more dangerous when it was just called omitTf(). Michael McCandless wrote: How about simply: /** Expert: * * If set, omit term freq, positions and payloads from postings for this field. * * NOTE: While this option reduces storage

Re: omitTF comment

2009-04-08 Thread Michael McCandless
How about simply: /** Expert: * * If set, omit term freq, positions and payloads from postings for this field. * * NOTE: While this option reduces storage space required in the index, * it also means any query requiring positional * information, such as {@link PhraseQuery} or {@link *

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697001#action_12697001 ] Michael McCandless commented on LUCENE-1589: {quote} The deletes are coming in

omitTF comment

2009-04-08 Thread Mark Miller
The omitTf comment is: /** Expert: * * If set, omit term freq, positions and payloads from postings for this field. * NOTE: this is a dangerous option to enable. * While it reduces storage space required in the index, * it also means any query requiring positional * information, su

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696994#action_12696994 ] Michael McCandless commented on LUCENE-1590: Patch looks good! All tests pas

[jira] Assigned: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1590: -- Assignee: Michael McCandless > Stored-only fields automatically enable norms a

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-08 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696977#action_12696977 ] Earwin Burrfoot commented on LUCENE-1231: - I can share my design for doc loading,

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696972#action_12696972 ] Michael McCandless commented on LUCENE-1231: bq. If you e.g. want to show 5 fi

[jira] Updated: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-08 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1590: -- Attachment: LUCENE-1590.patch Here is the final patch. I added two tests (one for the bug itss

[jira] Commented: (LUCENE-1539) Improve Benchmark

2009-04-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696971#action_12696971 ] Michael McCandless commented on LUCENE-1539: I think DeleteByPercentTask.java

Re: Future projects

2009-04-08 Thread Michael McCandless
On Tue, Apr 7, 2009 at 7:05 PM, Jason Rutherglen wrote: >  >  I think we should keep it simple, unless we discover real perf problems > with the current approach. > > Simple is good, however the indexing performance will lag because we're back > to the indexing speed of pre ram buffer? (i.e. mergi

Re: Probelm sort on TermEnum

2009-04-08 Thread Federica Falini Data Management S.p.A
Hi Steve, in fact the list of terms returned is for user consumption. Each term has a link that runs a search on the term itself and gives access to the documents. Annales cafe Cafè zucche Thanks Federica Steven A Rowe wrote: On 4/7/2009 at 1:19 PM, Michael McCandless wr