: Unfortunately, unless I've missed something obvious, the "tokenized"
: property is not available to classes that extend FieldType: the setArgs()
: method of FieldType strips "tokenized" and other standard properties away
: before calling the init() method. Yes, of course one could override
: setArgs(), but that's not a robust solution.

In an ideal world Solr would not strip that property from the Map, since
it doesn't care about it, but since it does, can't your init method just
call "isTokenized()" to determine its value (like any of the other
properties handled automatically)? The built-in field types ignore it,
but you could write a custom FieldType that inspects it.
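
A rough sketch of what I mean, using plain-Java stand-ins rather than the
real Solr classes. SketchFieldType, InspectingFieldType and
TokenizedFlagDemo are hypothetical names, the setArgs()/init() signatures
are simplified (the real ones differ between versions), and defaulting
"tokenized" to false when it is absent is an assumption of the sketch. The
only point is that the key is gone from the Map by the time init() runs,
while the value is still reachable through isTokenized():

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Solr's FieldType, NOT the real class.
abstract class SketchFieldType {
    private boolean tokenized;

    // Mimics the behaviour discussed above: the standard "tokenized"
    // property is consumed and removed before init() sees the args.
    final void setArgs(Map<String, String> args) {
        Map<String, String> remaining = new HashMap<>(args);
        // Assumption of this sketch: a missing value means "not tokenized".
        tokenized = Boolean.parseBoolean(remaining.remove("tokenized"));
        init(remaining);
    }

    boolean isTokenized() {
        return tokenized;
    }

    protected abstract void init(Map<String, String> args);
}

// A custom type that inspects the flag through the accessor instead of the Map.
class InspectingFieldType extends SketchFieldType {
    boolean sawKeyInMap;
    boolean tokenizedAtInit;

    @Override
    protected void init(Map<String, String> args) {
        sawKeyInMap = args.containsKey("tokenized"); // stripped, so not present
        tokenizedAtInit = isTokenized();             // still available here
    }
}

class TokenizedFlagDemo {
    static InspectingFieldType configure(String tokenized) {
        Map<String, String> args = new HashMap<>();
        args.put("tokenized", tokenized);
        args.put("customOption", "x"); // hypothetical non-standard property
        InspectingFieldType fieldType = new InspectingFieldType();
        fieldType.setArgs(args);
        return fieldType;
    }

    public static void main(String[] unused) {
        InspectingFieldType fieldType = configure("true");
        System.out.println("key in map: " + fieldType.sawKeyInMap
                + ", isTokenized(): " + fieldType.tokenizedAtInit);
    }
}
```

No setArgs() override is needed for this; init() only relies on the
accessor.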

: The terminology confusion stems (sorry, pun sort of not intended) from the
: frequent overlap of the terms "tokenize" and "analyze". As I mentioned in
: an earlier message on this thread, it is quite possible to create an
: Analyzer that does all sorts of things without tokenizing, or, more
: precisely, creates a single Token from the field value. I would posit that
: tokenization and analysis are two separate things, albeit most frequently
: done together.

The semi-equivalence of the word "tokenize" when referring to fields and the
broader concept of "Analysis" originates with Lucene: in Lucene you
declare a field TOKENIZED if you want the Analyzer used at all --
regardless of what the Analyzer does.  While I agree "ANALYZED" would
have been a better name for that constant, in practice the distinction is
so subtle it almost doesn't matter: what you describe as "an Analyzer that
does all sorts of things without tokenizing" I would call "an Analyzer
that tokenizes its input into a single token, and then does all
sorts of things".  KeywordTokenizer works exactly like this.
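
To make the "single token, then everything else" view concrete, here is a
small plain-Java sketch. It does not use the Lucene API: SingleTokenSketch,
keywordTokenize(), whitespaceTokenize() and lowercaseFilter() are
hypothetical helpers that only model the idea of a tokenizer stage followed
by a filter stage. The one thing borrowed from Lucene is the behaviour of
KeywordTokenizer, which emits the entire input as one token:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class SingleTokenSketch {
    // Models a keyword-style tokenizer: the whole value becomes one token.
    static List<String> keywordTokenize(String input) {
        return Collections.singletonList(input);
    }

    // Models a whitespace tokenizer, for contrast.
    static List<String> whitespaceTokenize(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Models a filter stage that runs after tokenization, whatever the tokenizer was.
    static List<String> lowercaseFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        String value = "Hello Solr World";
        // Same filter chain, different tokenizer stage.
        System.out.println(lowercaseFilter(keywordTokenize(value)));
        System.out.println(lowercaseFilter(whitespaceTokenize(value)));
    }
}
```

Either way the value goes through a tokenizer stage first; the keyword
variant just happens to produce exactly one token for the filters to work
on.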



-Hoss
