Hi, folks.

Earlier I used solr.TextField with preprocessing (ASCII folding, lowercasing, etc.) on some fields for search and faceting. But on a larger index it takes several minutes to uninvert those fields for faceting (I use fieldValueCache and warmup queries with facets). This becomes too expensive with frequent soft commits (every 5-10 minutes), so I want to migrate to docValues to avoid the uninvert phase.
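The ASCII folding and lowercasing mentioned above can be illustrated with a minimal, stdlib-only Java sketch. It assumes that NFD decomposition followed by stripping combining marks is a close enough approximation for illustration. Lucene's ASCIIFoldingFilter covers many more characters (for example 'ø' or 'ß') that this shortcut leaves unchanged. The class FacetValueNormalizer and its normalize method are hypothetical and not part of Solr.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper approximating ASCII folding + lowercasing for one facet value. */
public class FacetValueNormalizer {

    // Decompose accented characters (NFD), drop combining marks, then lowercase.
    // Only an approximation of Lucene's ASCIIFoldingFilter: characters without a
    // canonical decomposition (e.g. o-slash, sharp s) are left as they are.
    public static String normalize(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Input is "Crème Brûlée", written with escapes to avoid source-encoding issues.
        System.out.println(normalize("Cr\u00e8me Br\u00fbl\u00e9e")); // prints: creme brulee
        // Input is "ÉCOLE".
        System.out.println(normalize("\u00c9COLE"));                  // prints: ecole
    }
}
```

In a StrField-based variant, a function like this would have to be applied both to the indexed term and to the docValues bytes so that search and facet values agree.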
Documentation[1] says that only Trie*Field, StrField and UUIDField (which is itself a subtype of StrField) support docValues="true". I have tried two ways to work around this:

1. Make a subtype of TextField which overrides `checkSchemaField`, effectively turning docValues on for this "TextField". All preprocessing is specified in a TokenizerChain analyzer with KeywordTokenizerFactory (so it produces exactly one token for each value in this multivalued field), defined via schema.xml. It seems to work, but I haven't tested it under load. What are the potential caveats of such a scheme? Why isn't it used in trunk Solr?

2. Make a subtype of StrField which performs hardcoded preprocessing (such as ASCII folding and lowercasing), but I can't find an appropriate point to insert this behavior. The only working method was to override both toInternal and createFields (since the BytesRef for docValues isn't created via toInternal there) and do the value preprocessing there. What are the potential caveats? Search becomes case-insensitive (since toInternal is used by createField and the default tokenizer), and facets become lowercase because the createFields override creates the docValues already lowercased.

The StrField-based variant should be faster than the TextField-based one, since the TokenStream is reused internally in the first case and recreated for each document with TokenizerChain in the second. But the StrField-based approach hardcodes the preprocessing.

The next issue is that I want to use prefix and suffix wildcard search on some fields. As I understand from the code, this only works with TextField, because it requires the analyzer to be an instance of TokenizerChain with ReversedWildcardFilterFactory in its token filter chain. Can I use it in the StrField-based variant by overriding getIndexAnalyzer/getQueryAnalyzer, or would that break something?

[1]: https://cwiki.apache.org/confluence/display/solr/DocValues

-- 
Best regards,
Konstantin Gribov
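The reversed-wildcard idea behind ReversedWildcardFilterFactory can be illustrated without Solr. Each term is indexed twice: once as-is and once reversed behind a marker character, so a leading-wildcard query such as `*ville` turns into a cheap prefix scan over the reversed forms. This is a minimal sketch in which a TreeSet stands in for the term dictionary. The class name, the marker choice and the method names are assumptions for illustration, not Solr's internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical sketch: answering leading-wildcard queries via reversed terms. */
public class ReversedWildcardSketch {

    // Marker that distinguishes reversed terms from ordinary ones (illustrative choice).
    static final char MARKER = '\u0001';

    // Stands in for a sorted term dictionary.
    private final NavigableSet<String> terms = new TreeSet<>();

    // Index both the original term and its marked, reversed form.
    void index(String term) {
        terms.add(term);
        terms.add(MARKER + new StringBuilder(term).reverse().toString());
    }

    // Return all terms starting with the given prefix (a range scan on the sorted set).
    private List<String> prefixScan(String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            out.add(t);
        }
        return out;
    }

    // "*suffix" becomes a prefix scan over reversed terms; results are reversed back.
    List<String> leadingWildcard(String suffix) {
        String reversedPrefix = MARKER + new StringBuilder(suffix).reverse().toString();
        List<String> out = new ArrayList<>();
        for (String t : prefixScan(reversedPrefix)) {
            out.add(new StringBuilder(t.substring(1)).reverse().toString());
        }
        return out;
    }

    public static void main(String[] args) {
        ReversedWildcardSketch idx = new ReversedWildcardSketch();
        for (String s : new String[] {"nashville", "paris", "louisville", "villeneuve"}) {
            idx.index(s);
        }
        System.out.println(idx.leadingWildcard("ville")); // prints: [nashville, louisville]
    }
}
```

The trade-off is roughly twice as many terms per value. The real filter factory also exposes further tuning options that this sketch ignores.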