Re: Global field semantics

2006-07-10 Thread Doug Cutting
Chuck Williams wrote: Lucene today allows many field properties to vary at the Field level. E.g., the same field name might be tokenized in one Field on a Document while it is untokenized in another Field on the same or different Document. The rationale for this design was to keep the API

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/10/06, Doug Cutting [EMAIL PROTECTED] wrote: Chuck Williams wrote: Lucene today allows many field properties to vary at the Field level. E.g., the same field name might be tokenized in one Field on a Document while it is untokenized in another Field on the same or different Document.

Re: Global field semantics

2006-07-10 Thread Chris Hostetter
(f, z, Store.NO, Index.UN_TOKENIZED)): ...both docs have two Fields for field name f, both have a stored value for f, both have some indexed terms for f, both have some tokenized terms and one untokenized term for f ... but do these two docs both conform to the same Global field semantics

Re: Global field semantics

2006-07-10 Thread Chuck Williams
David Balmain wrote on 07/10/2006 01:04 AM: The only problem I could find with this solution is that fields are no longer in alphabetical order in the term dictionary but I couldn't think of a use-case where this is necessary although I'm sure there probably is one. So presumably fields are

Re: Global field semantics

2006-07-10 Thread Chuck Williams
a large issue, as David says he has achieved a 5x performance gain. My interest in global field semantics originally sprang from functionality considerations, not performance considerations. I've got many features that require reasoning about field semantics. I previously mentioned a very simple

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/11/06, Chuck Williams [EMAIL PROTECTED] wrote: David Balmain wrote on 07/10/2006 01:04 AM: The only problem I could find with this solution is that fields are no longer in alphabetical order in the term dictionary but I couldn't think of a use-case where this is necessary although I'm

Re: Global field semantics

2006-07-10 Thread Yonik Seeley
On 7/10/06, David Balmain [EMAIL PROTECTED] wrote: I don't think declaring all fields up front is necessary for substantial optimizations. I've found that the key to some really good optimizations is having constant field numbers. That is, once a field is added to the index it is assigned a
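The idea of constant field numbers can be illustrated with a small sketch. This is not Lucene or Ferret code: `FieldNumberRegistry`, `numberFor`, and `nameFor` are hypothetical names, and the sketch only assumes that a field receives a number the first time it is seen and keeps that number afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of any real library: assigns each field name
// a number on first use and never changes that number afterwards.
final class FieldNumberRegistry {
    private final Map<String, Integer> numberByName = new HashMap<>();
    private final List<String> nameByNumber = new ArrayList<>();

    // Returns the existing number for a known field, or assigns the next free one.
    int numberFor(String name) {
        Integer existing = numberByName.get(name);
        if (existing != null) {
            return existing;
        }
        int assigned = nameByNumber.size();
        numberByName.put(name, assigned);
        nameByNumber.add(name);
        return assigned;
    }

    // Reverse lookup; throws IndexOutOfBoundsException for an unassigned number.
    String nameFor(int number) {
        return nameByNumber.get(number);
    }

    public static void main(String[] args) {
        FieldNumberRegistry registry = new FieldNumberRegistry();
        System.out.println(registry.numberFor("title"));
        System.out.println(registry.numberFor("body"));
        System.out.println(registry.numberFor("title")); // same number as the first call
    }
}
```

In a registry like this, numbers follow the order in which fields first appear, not the alphabetical order of their names.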

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/11/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 7/10/06, David Balmain [EMAIL PROTECTED] wrote: I don't think declaring all fields up front is necessary for substantial optimizations. I've found that the key to some really good optimizations is having constant field numbers. That is,

Re: Global field semantics

2006-07-10 Thread Chris Hostetter
: previously mentioned a very simple one: validating fields in the query : parser. More interesting examples are: This strikes me as something that can be done with an abstraction layer above and separate from the physical index (this is in fact what Solr does) without needing to add any hard

Re: Global field semantics

2006-07-09 Thread Marvin Humphrey
. were a function of just the field name and the index. This is the direction I would like to go. This approach would naturally admit a class, say IndexFieldSet, that would hold global field semantics for an index. Lucene today allows many field properties to vary at the Field level. E.g

Re: Global field semantics

2006-07-09 Thread Chuck Williams
Marvin Humphrey wrote on 07/08/2006 11:13 PM: On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote: Many things would be cleaner in Lucene if fields had a global semantics, i.e., if properties like text vs. binary, Index, Store, TermVector, the appropriate Analyzer, the assignment of Directory

Re: Global field semantics

2006-07-09 Thread Marvin Humphrey
On Jul 9, 2006, at 11:31 AM, Chuck Williams wrote: Marvin Humphrey wrote on 07/08/2006 11:13 PM: On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote: Many things would be cleaner in Lucene if fields had a global semantics, i.e., if properties like text vs. binary, Index, Store, TermVector,

Re: Global field semantics

2006-07-09 Thread Chuck Williams
that several of us independently came to desire this. However, deciding what can be global and what cannot is more subtle. I agree. I can't see global field semantics making it into Lucene in the short term. It's a rather large change, particularly if you want to make full use

Re: Global field semantics

2006-07-09 Thread David Balmain
that several of us independently came to desire this. However, deciding what can be global and what cannot is more subtle. I agree. I can't see global field semantics making it into Lucene in the short term. It's a rather large change, particularly if you want to make full use of the performance benefits

Global field semantics

2006-07-08 Thread Chuck Williams
. This approach would naturally admit a class, say IndexFieldSet, that would hold global field semantics for an index. Lucene today allows many field properties to vary at the Field level. E.g., the same field name might be tokenized in one Field on a Document while it is untokenized in another Field
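A class along the lines of the proposed IndexFieldSet might look roughly like the following. This is only a sketch under assumptions: the post names the class but the snippet shows no API for it, so `FieldSemantics`, `declare`, `lookup`, `isKnown`, and the three boolean properties are hypothetical and are not Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: per-field properties fixed once for the whole index.
final class FieldSemantics {
    final boolean stored;
    final boolean indexed;
    final boolean tokenized;

    FieldSemantics(boolean stored, boolean indexed, boolean tokenized) {
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FieldSemantics)) {
            return false;
        }
        FieldSemantics other = (FieldSemantics) o;
        return stored == other.stored
                && indexed == other.indexed
                && tokenized == other.tokenized;
    }

    @Override
    public int hashCode() {
        return (stored ? 4 : 0) | (indexed ? 2 : 0) | (tokenized ? 1 : 0);
    }
}

// Hypothetical sketch of an index-wide registry keyed by field name.
final class IndexFieldSet {
    private final Map<String, FieldSemantics> byName = new LinkedHashMap<>();

    // Registers a field on first use; a later conflicting declaration is rejected.
    FieldSemantics declare(String name, FieldSemantics semantics) {
        FieldSemantics existing = byName.putIfAbsent(name, semantics);
        if (existing != null && !existing.equals(semantics)) {
            throw new IllegalArgumentException(
                    "Field '" + name + "' is already declared with different properties");
        }
        return existing != null ? existing : semantics;
    }

    // Returns the declared semantics, or null if the field is unknown.
    FieldSemantics lookup(String name) {
        return byName.get(name);
    }

    boolean isKnown(String name) {
        return byName.containsKey(name);
    }
}
```

With a registry like this, the case described in the post (the same field name tokenized in one Field and untokenized in another) would be rejected at declaration time instead of being silently accepted.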

Re: Global field semantics

2006-07-08 Thread Chuck Williams
karl wettin wrote on 07/08/2006 10:27 AM: On Sat, 2006-07-08 at 09:46 -0700, Chuck Williams wrote: Many things would be cleaner in Lucene if fields had a global semantics, Has this been considered before? Are there good reasons this path has not been followed? I've been

Re: Global field semantics

2006-07-08 Thread karl wettin
On Sat, 2006-07-08 at 11:08 -0700, Chuck Williams wrote: Karl, do you have specific reasons or use cases to normalize fields at Document rather than at Index? Nothing more than that the way the API looks it implies features that do not exist. Boost, store, index and vectors. I've learned,

Re: Global field semantics

2006-07-08 Thread Chuck Williams
karl wettin wrote on 07/08/2006 12:27 PM: On Sat, 2006-07-08 at 11:08 -0700, Chuck Williams wrote: Karl, do you have specific reasons or use cases to normalize fields at Document rather than at Index? Nothing more than that the way the API looks it implies features that do not