Re: Modularization

2009-03-31 Thread Babak Farhang
maturity, and their back compat commitments. The demo and getting started guides could also be expanded to reference the contrib jars that contain code many people may want to reuse... Here's an idea. Each contrib is really a project unto its own. And any project, I suggest, ought to have its

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-31 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694068#action_12694068 ] Michael McCandless commented on LUCENE-1575: bq. If we're touching

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-31 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694074#action_12694074 ] Shai Erera commented on LUCENE-1575: bq. If we do this approach, on committing this

Re: Modularization

2009-03-31 Thread Michael McCandless
On Mon, Mar 30, 2009 at 7:31 PM, Chris Hostetter hossman_luc...@fucit.org wrote: code isolation (by directory hierarchy) is the best way i've seen to ensure modularization, and protect against inadvertent dependency bleeding. OK I agree this (divorced top-level directories) is a great way to

Empty Sink Tokenizer

2009-03-31 Thread Grant Ingersoll
Has the way fields get added changed recently? http://www.lucidimagination.com/search/document/954555c478002a3/empty_sinktokenizer See also: http://www.lucidimagination.com/search/document/274ec8c1c56fdd54/order_of_field_objects_within_document#5ffce4509ed32511

Re: Lucene analyzer and dots

2009-03-31 Thread Matthew Hall
Sure, you could simply use a different analyzer, like the KeywordAnalyzer, or if that doesn't suit your needs, roll your own. The Analyzer/Tokenizers are set up in such a way that they are pretty easy to extend, and you can chain their functionality together pretty easily. Matt mitu2009

Re: Empty Sink Tokenizer

2009-03-31 Thread Michael McCandless
Uh-oh: I think this happened as part of LUCENE-843, which landed in 2.3. IndexWriter now first collates each Field instance, by name, and then visits those fields in sorted order. Multiple instances of the same field name are written in the order that they appeared in the document.
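
The ordering described above (fields grouped by name and visited in sorted order, with repeated names keeping their document order) is what a stable sort on the field name produces. The following is only a minimal sketch of that behaviour in plain Java, not IndexWriter's actual code: `FieldOrderSketch`, `NamedField`, and `collate` are hypothetical names introduced for illustration, and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FieldOrderSketch {
    // Hypothetical stand-in for a Lucene Field: just a name and a value.
    static final class NamedField {
        final String name;
        final String value;

        NamedField(String name, String value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return name + "=" + value;
        }
    }

    // Returns a copy of the fields ordered by name. List.sort is a stable
    // sort, so fields that share a name keep the order they had in the input.
    static List<NamedField> collate(List<NamedField> fieldsInDocumentOrder) {
        List<NamedField> sorted = new ArrayList<>(fieldsInDocumentOrder);
        sorted.sort(Comparator.comparing((NamedField f) -> f.name));
        return sorted;
    }

    public static void main(String[] args) {
        List<NamedField> doc = new ArrayList<>();
        doc.add(new NamedField("title", "t1"));
        doc.add(new NamedField("city", "Boston"));
        doc.add(new NamedField("body", "b1"));
        doc.add(new NamedField("city", "Austin"));

        // prints [body=b1, city=Boston, city=Austin, title=t1]
        System.out.println(collate(doc));
    }
}
```

Code that depends on fields being visited in the order they were added (for example, a sink consumed before its tee has run) would see a different order under this kind of collation whenever the field names do not already sort that way.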

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-31 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694151#action_12694151 ] Shai Erera commented on LUCENE-1575: When I was about to make the changes to

[jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)

2009-03-31 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694155#action_12694155 ] Michael McCandless commented on LUCENE-1575: I like 2 as well. I find

[jira] Resolved: (LUCENE-1579) Cloned SegmentReaders fail to share FieldCache entries

2009-03-31 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1579. Resolution: Fixed Cloned SegmentReaders fail to share FieldCache entries

[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-03-31 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1516: --- Attachment: LUCENE-1516.patch New patch: sync'd to trunk, cleaned up the nocommits

Re: Empty Sink Tokenizer

2009-03-31 Thread Grant Ingersoll
Well, we don't make any guarantees about it in docs, AFAICT, but we have in the past advertised it (via the mailing lists) as such. The Tee/Sink stuff does rely on what has been the de facto way of doing things up until 2.3, it sounds. The snippet of code I included can easily be

Re: Empty Sink Tokenizer

2009-03-31 Thread Yonik Seeley
On Tue, Mar 31, 2009 at 12:26 PM, Grant Ingersoll gsing...@apache.org wrote: What's the benefit of collation? AFAIK, the main reason is to handle multi-valued fields. The need to sort partially stems from the fact that the Document class does not explicitly handle multi-valued fields. Solr must

Re: Empty Sink Tokenizer

2009-03-31 Thread Michael McCandless
There are two separate things, here. First is that indexed fields are now processed in alpha order (stable/partial sort for multivalued fields), as of 2.3. That I think is something internal to Lucene and I'm not sure we should make promises one way or another in what order Lucene visits the

[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-03-31 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1516: --- Attachment: LUCENE-1516.patch Attached patch: added more test cases, and fixed the

RE: Reading document in Lucene

2009-03-31 Thread Steven A Rowe
Hi Ed, On 3/30/2009 at 8:17 PM, mitu2009 wrote: My indexed document in Lucene has got multiple cities assigned to it...ie. doc.Add(new Field("city", city1.Trim(), Field.Store.YES, Field.Index.TOKENIZED)); doc.Add(new Field("city", city2.Trim(), Field.Store.YES, Field.Index.TOKENIZED)); etc how