[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-07 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696873#action_12696873 ] Michael Busch commented on LUCENE-1231: --- {quote} is for column-stride fields to be a

[jira] Updated: (LUCENE-1539) Improve Benchmark

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1539: - Attachment: LUCENE-1539.patch Fixed the above mentioned problems. When LUCENE-1516 is i

[jira] Updated: (LUCENE-1313) Realtime Search

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1313: - Attachment: LUCENE-1313.jar Latest realtime code, transactions are removed. * Needs to

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696838#action_12696838 ] Jason Rutherglen commented on LUCENE-1231: -- +1 Making it automatic makes sense.

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696837#action_12696837 ] Jason Rutherglen commented on LUCENE-1589: -- I took a walk and thought about this,

[jira] Updated: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1590: -- Attachment: LUCENE-1590.patch Here is the patch that also fixes the missing omitTf settings in Fi

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696821#action_12696821 ] Uwe Schindler commented on LUCENE-1590: --- bq. The problem is: Luke does not show the

[jira] Updated: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1590: -- Attachment: LUCENE-1590.patch Here it is; not fully tested, but it seems to work at least for nor

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696811#action_12696811 ] Michael McCandless commented on LUCENE-1231: One interesting idea, from Earwin

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696812#action_12696812 ] Jason Rutherglen commented on LUCENE-1589: -- The deletes are coming into the exist

[jira] Commented: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696810#action_12696810 ] Michael McCandless commented on LUCENE-1590: Uwe are you working out a patch f

Re: Future projects

2009-04-07 Thread Jason Rutherglen
> I think we should keep it simple, unless we discover real perf problems with the current approach. Simple is good; however, won't indexing performance lag, because we're back to the pre-RAM-buffer indexing speed (i.e. merging segments using a RAMDirectory)? > need to do a merge sort (acr

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696809#action_12696809 ] Michael McCandless commented on LUCENE-1589: Hmm yes. This is also tricky: ho

Re: Size of IndexReaders(potentially leading into an OOM)?

2009-04-07 Thread Goddard, Michael J.
- Original Message - From: java-dev-return-31898-michael.j.goddard=saic@lucene.apache.org To: java-dev@lucene.apache.org Sent: Tue Apr 07 17:13:45 2009 Subject: Size of IndexReaders(potentially leading into an OOM)? I have a map of indexpaths against readers which I cache. For ev

Re: Size of IndexReaders(potentially leading into an OOM)?

2009-04-07 Thread Michael McCandless
Could you re-ask this on java-user instead? Thanks. Mike On Tue, Apr 7, 2009 at 5:13 PM, MakMak wrote: > > I have a map of indexpaths against readers which I cache. For every new > search, I know the indexpath, get the reader, reopen it and perform the > search. Problem is, after the system run

[jira] Commented: (LUCENE-1546) Add IndexReader.flush(commitUserData)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696803#action_12696803 ] Michael McCandless commented on LUCENE-1546: OK I just committed that, thanks

[jira] Updated: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1561: --- Attachment: LUCENE-1561.patch Attached patch, also deprecating omitTf in AbstractFie

[jira] Updated: (LUCENE-1546) Add IndexReader.flush(commitUserData)

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1546: -- Attachment: LUCENE-1546-deprecation.patch This patch fixes deprecation errors: I wrote a class

Size of IndexReaders(potentially leading into an OOM)?

2009-04-07 Thread MakMak
I have a map of index paths to readers, which I cache. For every new search, I know the index path, get the reader, reopen it and perform the search. The problem is that after the system runs for a while, the size of the readers grows alarmingly. Does anyone know how much a typical reader holds on to? Do

[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696758#action_12696758 ] Michael McCandless commented on LUCENE-1561: bq. Wasn't it the plan to remove

[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696661#action_12696661 ] Uwe Schindler commented on LUCENE-1561: --- Wasn't it the plan to remove these interfac

RE: Probelm sort on TermEnum

2009-04-07 Thread Steven A Rowe
On 4/7/2009 at 1:19 PM, Michael McCandless wrote: > I think the new contrib/collation package may address this use case? > It converts each term to its CollationKey, outside of Lucene. Since AFAIK CollationKey creation is a one-way process, CollationKeyFilter may not be useful for Federica. Fede

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696650#action_12696650 ] Jason Rutherglen commented on LUCENE-1589: -- I started, but because MergePolicy.On

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696643#action_12696643 ] Jason Rutherglen commented on LUCENE-1589: -- Yes, because this will block the RAMD

[jira] Updated: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1575: --- Attachment: LUCENE-1575.patch New patch which just fixes contrib/spatial's cutover t

Re: Probelm sort on TermEnum

2009-04-07 Thread Michael McCandless
Though, this is not yet released: it's on trunk (will be included in 2.9). Mike On Tue, Apr 7, 2009 at 1:19 PM, Michael McCandless wrote: > I think the new contrib/collation package may address this use case? > It converts each term to its CollationKey, outside of Lucene. > > Mike > > On Tue, Ap

Re: Probelm sort on TermEnum

2009-04-07 Thread Michael McCandless
I think the new contrib/collation package may address this use case? It converts each term to its CollationKey, outside of Lucene. Mike On Tue, Apr 7, 2009 at 7:36 AM, Federica Falini Data Management S.p.A wrote: > Good morning, > In Lucene 2.2 i have made modification to Term.java, TermBuffer.j

[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696628#action_12696628 ] Michael McCandless commented on LUCENE-1561: bq. setOmitTf() and other are onl

[jira] Updated: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1590: --- Fix Version/s: 2.9 > Stored-only fields automatically enable norms and tf when added

[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696603#action_12696603 ] Uwe Schindler commented on LUCENE-1561: --- I found a deprecation bug: setOmitTf() and

[jira] Reopened: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened LUCENE-1561: --- > Maybe rename Field.omitTf, and strengthen the javadocs > -

Re: MoreLikeThisQuery term frequency caching

2009-04-07 Thread Richard Marr
Thanks Mike, I'll leave it a few days to give people time to respond then start looking into creating a Jira ticket and a patch. 2009/4/7 Michael McCandless : > I don't have direct experience with MLT, but this sounds like a great > improvement, so in answer to (3) I would say "definitely!". > >

[jira] Created: (LUCENE-1590) Stored-only fields automatically enable norms and tf when added to document

2009-04-07 Thread Uwe Schindler (JIRA)
Stored-only fields automatically enable norms and tf when added to document --- Key: LUCENE-1590 URL: https://issues.apache.org/jira/browse/LUCENE-1590 Project: Lucene - Java

[jira] Commented: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-04-07 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696583#action_12696583 ] Earwin Burrfoot commented on LUCENE-1584: - bq. The problem is you need more inform

Re: omitNorms, omitTermFreqAndPositions in combination with stored-only fields

2009-04-07 Thread Michael McCandless
That sounds like a real bug to me. If the field is not indexed, then the norm/omitTFAP should be ignored. Can you open a Jira/patch? Thanks, and good catch! Mike On Tue, Apr 7, 2009 at 10:46 AM, Uwe Schindler wrote: > Hi, > > during updating my internal components to the new TrieAPI, I have s

omitNorms, omitTermFreqAndPositions in combination with stored-only fields

2009-04-07 Thread Uwe Schindler
Hi, while updating my internal components to the new TrieAPI, I noticed the following: I index a lot of numeric fields with trie encoding, omitting norms and term frequency. This works great. Luke shows that both are omitted. As I sometimes also want to have the components of the field stored a

[jira] Commented: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696550#action_12696550 ] Michael McCandless commented on LUCENE-1584: bq. This is required in one form

[jira] Commented: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-04-07 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696511#action_12696511 ] Earwin Burrfoot commented on LUCENE-1584: - bq. I'd like to step back and understan

[jira] Commented: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696497#action_12696497 ] Michael McCandless commented on LUCENE-1582: bq. Finally: Let's go on with 831

[jira] Resolved: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-1582. --- Resolution: Fixed Committed Revision: 762710 I only added term number statistics in the filt

[jira] Commented: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696485#action_12696485 ] Uwe Schindler commented on LUCENE-1582: --- Thanks, I will then go forward with this. F

Probelm sort on TermEnum

2009-04-07 Thread Federica Falini Data Management S.p.A
Good morning, In Lucene 2.2 I have made modifications to Term.java and TermBuffer.java (see below) in order to have Term enumerations sorted case-insensitively (when a field is not tokenized): TermEnum terms = reader.terms(new Term("myFieldNotTokenized", "")); while ("myFieldNotT

[jira] Commented: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696475#action_12696475 ] Michael McCandless commented on LUCENE-1582: OK I committed the FieldCache par

[jira] Commented: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696471#action_12696471 ] Michael McCandless commented on LUCENE-1582: OK the changes to FieldCache look

[jira] Updated: (LUCENE-1582) Make TrieRange completely independent from Document/Field with TokenStream of prefix encoded values

2009-04-07 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1582: -- Attachment: LUCENE-1582.patch New patch. In my opinion, it is now stable. New features/change

Re: HitCollector#collect(int,float,Collection)

2009-04-07 Thread Michael McCandless
On Tue, Apr 7, 2009 at 6:13 AM, Karl Wettin wrote: > > On 7 Apr 2009 at 10:23, Michael McCandless wrote: > >> Do you mean tracking the "atomic queries" that caused a given hit to >> match (where "atomic query" is a query that actually uses >> TermDocs/Positions to check matching, vs other queries lik

Re: HitCollector#collect(int,float,Collection)

2009-04-07 Thread Karl Wettin
On 7 Apr 2009 at 10:23, Michael McCandless wrote: Do you mean tracking the "atomic queries" that caused a given hit to match (where "atomic query" is a query that actually uses TermDocs/Positions to check matching, vs other queries like BooleanQuery that "glomm together" sub-query matches)? EG fo

[jira] Updated: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1575: --- Attachment: LUCENE-1575.patch Attached new patch: * Changed members & methods in

[jira] Commented: (LUCENE-1584) Callback for intercepting merging segments in IndexWriter

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696445#action_12696445 ] Michael McCandless commented on LUCENE-1584: Jason once LUCENE-1516 is in, can

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696444#action_12696444 ] Michael McCandless commented on LUCENE-1516: I think NRT search is finally rea

[jira] Commented: (LUCENE-1313) Realtime Search

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696438#action_12696438 ] Michael McCandless commented on LUCENE-1313: {quote} > I'd be very interested

[jira] Commented: (LUCENE-1589) IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696434#action_12696434 ] Michael McCandless commented on LUCENE-1589: Jason are you working on a patch

Re: Future projects

2009-04-07 Thread Michael McCandless
On Mon, Apr 6, 2009 at 6:43 PM, Jason Rutherglen wrote: >> The realtime reader would have to have sub-readers per thread, > and an aggregate reader that "joins" them by interleaving the > docIDs > > Nice (i.e. nice and complex)! Right, this is why I like the current [simple] near real-time approa

[jira] Resolved: (LUCENE-1586) add IndexReader.getUniqueTermCount

2009-04-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1586. Resolution: Fixed Thanks Derek! > add IndexReader.getUniqueTermCount > --

Re: HitCollector#collect(int,float,Collection)

2009-04-07 Thread Michael McCandless
Do you mean tracking the "atomic queries" that caused a given hit to match (where "atomic query" is a query that actually uses TermDocs/Positions to check matching, vs other queries like BooleanQuery that "glomm together" sub-query matches)? EG for a boolean query w/ N clauses, which of those N cl

Re: MoreLikeThisQuery term frequency caching

2009-04-07 Thread Michael McCandless
I don't have direct experience with MLT, but this sounds like a great improvement, so in answer to (3) I would say "definitely!". Mike On Tue, Apr 7, 2009 at 2:28 AM, Richard Marr wrote: > Hi all, > > I've been exploring MoreLikeThisQuery as part of a recent project and > something that came out