Thanks, but I think I'm going to have to work out a different solution. I have written my own analyzer that does everything I need: it's not a different analyzer I need, but a way to specify that certain fields should be tokenized and others not, while still leaving all other options open.
As for the generic options parsing resulting in unused properties in a SchemaField object: no, it is not specifically documented anywhere, but the Solr Wiki lists, for both fields and field types, "Common options that fields can have are...". I could not find a definitive list anywhere of what is allowed/used or excluded, so I went to the code and found that "tokenized" would indeed be respected in SchemaField.

-- Robert

[EMAIL PROTECTED] wrote on 05/31/2007 11:30:04 AM:

> On 5/31/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > You say the "tokenized" attribute is not settable from the schema, but the
> > output from IndexSchema.readConfig shows that the properties are indeed
> > read, and the resulting SchemaField object retains these properties: are
> > they then ignored?
>
> Not sure off the top of my head, but don't use it... it shouldn't be
> documented anywhere.
> It probably slipped through as part of generic options parsing.
>
> > > "untokenized" means don't use the analyzer. If you don't want an
> > > analyzer, then use the "string" type.
> > >
> > This is true only in the simplest of cases. An analyzer can do far more
> > than tokenize: it can stem, change to lower case, etc. What if you want
> > one or more of these things to happen, but you don't want tokenization?
>
> From a Lucene perspective, if you create an untokenized field, the
> analyzer will not be used at all. It should probably have been named
> unanalyzed, as that's more accurate.
>
> KeywordTokenizer (via KeywordTokenizerFactory) is probably what you
> are looking for.
> Create a new text field type with that as the tokenizer, followed by
> whatever filters you want (like lowercasing).
>
> -Yonik
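For reference, a schema.xml field type along the lines Yonik suggests might look like the sketch below. The type name "lowercase" and the field name "productCode" are illustrative, not from the thread; KeywordTokenizerFactory and LowerCaseFilterFactory are standard Solr factories.

```xml
<!-- Keeps the whole field value as a single token (no word splitting),
     but still runs analysis filters such as lowercasing. -->
<fieldType name="lowercase" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above -->
<field name="productCode" type="lowercase" indexed="true" stored="true"/>
```

With a type like this, a value such as "ABC-123" would be indexed as the single token "abc-123": analyzed (lowercased) but not tokenized into multiple terms.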