: Another question I have is where the processing of this "first letter" is
: more adequate.
: I am considering updating my data import handler to execute a script to
: extract the first letter from the author field.
:
: I saw other thread when someone mentioned using a field analyser to extract
: the letter using a regex.
: Which one is the best option?
"best" is subjective.

Conceptually, "inherent" rules/concepts of your data (ie: what fields it has, what types those fields have, etc...) should live in your schema.xml, while things specific to where your data comes from should live in other configs (ie: your DIH config, update processors, etc...)

So for something like a "first_letter_author_name" field that should (by definition) always be the same as the first letter of the "author_name" field, it should be specified in your schema.xml (two ways i can think of: copyField w/maxChars, or an EdgeNGramTokenizer). That way, no matter how a document gets into your index (DIH, XML Push, CSV Push, etc...), you can be certain the fields will be internally consistent.

Practically speaking: there are a lot of "inherent" rules that can't be expressed in the schema.xml, or that may be confusing to people if they are expressed there while other, more complex rules are expressed elsewhere. So go with whatever makes the most sense to you and is the easiest for you to maintain.

-Hoss
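
For illustration, the two schema.xml approaches mentioned above (copyField w/maxChars, or an EdgeNGramTokenizer) might look roughly like the fragments below. This is only a sketch: the "string" type, the indexed/stored settings, and the "first_letter" type name are assumptions, and the fragments would need to go in the appropriate sections of your own schema.xml. The two options are alternatives, not meant to be combined.

```xml
<!-- Sketch only: type names and attribute choices are assumptions. -->

<!-- Option 1: copyField with maxChars.
     Only the first character of author_name is copied into the
     destination field, however the document was submitted. -->
<field name="author_name" type="string" indexed="true" stored="true"/>
<field name="first_letter_author_name" type="string" indexed="true" stored="true"/>
<copyField source="author_name" dest="first_letter_author_name" maxChars="1"/>

<!-- Option 2: an EdgeNGramTokenizer limited to 1-character grams.
     A field of this (hypothetical) type indexes only the leading
     character of whatever text it receives. -->
<fieldType name="first_letter" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.EdgeNGramTokenizerFactory" minGramSize="1" maxGramSize="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One difference worth checking against your own needs: with option 1 the value itself is truncated before it reaches the destination field, while with option 2 only the indexed term is reduced to one letter, so a stored value would still hold the full text it was given.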