Chuck Williams wrote:
Lucene today allows many field properties to vary at the Field level.
E.g., the same field name might be tokenized in one Field on a Document
while it is untokenized in another Field on the same or different
Document.
The rationale for this design was to keep the API
On 7/10/06, Doug Cutting wrote:
Chuck Williams wrote:
Lucene today allows many field properties to vary at the Field level.
E.g., the same field name might be tokenized in one Field on a Document
while it is untokenized in another Field on the same or different
Document.
(f, z, Store.NO, Index.UN_TOKENIZED)):
...both docs have two Fields for field name f, both have a stored
value for f, both have some indexed terms for f, both have
some tokenized terms and one untokenized term for f ... but do these two
docs both conform to the same Global field semantics
David Balmain wrote on 07/10/2006 01:04 AM:
The only problem I could find with this solution is that
fields are no longer in alphabetical order in the term dictionary but
I couldn't think of a use-case where this is necessary although I'm
sure there probably is one.
So presumably fields are
a large issue, as David says he has achieved a 5x
performance gain.
My interest in global field semantics originally sprang from
functionality considerations, not performance considerations. I've got
many features that require reasoning about field semantics. I
previously mentioned a very simple
On 7/11/06, Chuck Williams wrote:
David Balmain wrote on 07/10/2006 01:04 AM:
The only problem I could find with this solution is that
fields are no longer in alphabetical order in the term dictionary but
I couldn't think of a use-case where this is necessary although I'm
On 7/10/06, David Balmain wrote:
I don't think declaring all fields up front is necessary for
substantial optimizations. I've found that the key to some really good
optimizations is having constant field numbers. That is, once a field
is added to the index it is assigned a
On 7/11/06, Yonik Seeley wrote:
On 7/10/06, David Balmain wrote:
I don't think declaring all fields up front is necessary for
substantial optimizations. I've found that the key to some really good
optimizations is having constant field numbers. That is,
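A minimal sketch of what "constant field numbers" might look like. The quoted message is cut off, so this assumes the idea is that a field name receives a number the first time it is seen and keeps that number afterwards. FieldNumbers and its methods are hypothetical names for illustration, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry (not a Lucene class): each field name gets one number, once. */
final class FieldNumbers {
    private final Map<String, Integer> numberByName = new HashMap<>();
    private final List<String> nameByNumber = new ArrayList<>();

    /** Returns the number already assigned to the field, or assigns the next free one. */
    int numberFor(String fieldName) {
        Integer existing = numberByName.get(fieldName);
        if (existing != null) {
            return existing;
        }
        int assigned = nameByNumber.size();
        numberByName.put(fieldName, assigned);
        nameByNumber.add(fieldName);
        return assigned;
    }

    /** Reverse lookup; throws IndexOutOfBoundsException for an unassigned number. */
    String nameOf(int number) {
        return nameByNumber.get(number);
    }

    /** Number of distinct field names seen so far. */
    int size() {
        return nameByNumber.size();
    }
}
```

In this sketch, numbers follow first-seen order, so walking fields by number does not visit them alphabetically. That is the trade-off raised elsewhere in the thread.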
: previously mentioned a very simple one: validating fields in the query
: parser. More interesting examples are:
This strikes me as something that can be done with an abstraction layer
above and separate from the physical index (this is in fact what Solr
does) without needing to add any hard
. were a function of just the field name and the
index.
This is the direction I would like to go.
This approach would naturally admit a class, say IndexFieldSet,
that would hold global field semantics for an index.
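A rough sketch of how such a class could behave, assuming it simply maps each field name to one fixed set of properties and rejects conflicting declarations. Only the name IndexFieldSet comes from the thread. The nested enums are stand-ins defined here, and every method is hypothetical rather than a Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder of per-index field semantics; not part of Lucene. */
final class IndexFieldSet {
    /** Stand-in options, defined locally for this sketch. */
    enum Store { YES, NO }
    enum Index { NO, TOKENIZED, UN_TOKENIZED }

    /** Properties that depend only on the field name. */
    static final class FieldSpec {
        final Store store;
        final Index index;

        FieldSpec(Store store, Index index) {
            this.store = store;
            this.index = index;
        }

        boolean sameAs(FieldSpec other) {
            return store == other.store && index == other.index;
        }
    }

    private final Map<String, FieldSpec> specs = new HashMap<>();

    /** Declares a field; throws if it was already declared with different properties. */
    void declare(String name, Store store, Index index) {
        FieldSpec incoming = new FieldSpec(store, index);
        FieldSpec existing = specs.putIfAbsent(name, incoming);
        if (existing != null && !existing.sameAs(incoming)) {
            throw new IllegalArgumentException("Conflicting semantics for field: " + name);
        }
    }

    boolean isDeclared(String name) {
        return specs.containsKey(name);
    }

    /** The kind of check a query parser could run before building a query. */
    void requireSearchable(String name) {
        FieldSpec spec = specs.get(name);
        if (spec == null) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
        if (spec.index == Index.NO) {
            throw new IllegalArgumentException("Field is not indexed: " + name);
        }
    }
}
```

With something like this, declaring "title" as tokenized in one place and untokenized in another would fail at declaration time, and a query parser could call requireSearchable to reject a query on an unknown or unindexed field.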
Lucene today allows many field properties to vary at the Field level.
E.g
Marvin Humphrey wrote on 07/08/2006 11:13 PM:
On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote:
Many things would be cleaner in Lucene if fields had a global semantics,
i.e., if properties like text vs. binary, Index, Store, TermVector, the
appropriate Analyzer, the assignment of Directory
On Jul 9, 2006, at 11:31 AM, Chuck Williams wrote:
Marvin Humphrey wrote on 07/08/2006 11:13 PM:
On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote:
Many things would be cleaner in Lucene if fields had a global
semantics,
i.e., if properties like text vs. binary, Index, Store,
TermVector,
that several of us independently came to desire this. However,
deciding what can be global and what cannot is more subtle.
I agree. I can't see global field semantics making it into Lucene in
the short term. It's a rather large change, particularly if you want
to make full use of the performance benefits
. This approach would naturally admit a class, say IndexFieldSet,
that would hold global field semantics for an index.
Lucene today allows many field properties to vary at the Field level.
E.g., the same field name might be tokenized in one Field on a Document
while it is untokenized in another Field
karl wettin wrote on 07/08/2006 10:27 AM:
On Sat, 2006-07-08 at 09:46 -0700, Chuck Williams wrote:
Many things would be cleaner in Lucene if fields had a global semantics,
Has this been considered before? Are there good reasons this path has
not been followed?
I've been
On Sat, 2006-07-08 at 11:08 -0700, Chuck Williams wrote:
Karl, do you have specific reasons or use cases to normalize fields at
Document rather than at Index?
Nothing more than that the way the API looks it implies features that
do not exist. Boost, store, index and vectors. I've learned,
karl wettin wrote on 07/08/2006 12:27 PM:
On Sat, 2006-07-08 at 11:08 -0700, Chuck Williams wrote:
Karl, do you have specific reasons or use cases to normalize fields at
Document rather than at Index?
Nothing more than that the way the API looks it implies features that
do not
18 matches