On 12/8/2017 9:56 AM, Bradley Belyeu wrote:
> I’m wanting to do a result grouping by the first three characters, period, &
> digit(s). For example, docs with the unique keys JHN.3.16 & JHN.3.17 I would
> want grouped together.
> So my thought was to define another field and then copy the USFM into it and
> use the regex tokenizer defined as so:
>
> <fieldType name="chapter" class="solr.TextField" positionIncrementGap="0">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory"
>                pattern="^(\w+\.\d+)\.\d+-*\d*$" group="1" />
>   </analyzer>
> </fieldType>
> <field name="chapter" type="chapter" indexed="true" required="true"
>        stored="true" />
> <copyField source="usfm" dest="chapter" />
>
> BUT, when I import my data the entire USFM is being stored inside the chapter
> field. And I get query results that look like:

Analysis only affects indexed terms. The field contents in query results are
*ALWAYS* the original text that was sent for indexing -- analysis *CANNOT*
affect the fields returned for a document.

The copyField feature does not copy the results of analysis; it always copies
the original input.

Since this is a "solr.TextField" type, you cannot define docValues on it, which
means that the Result Grouping feature in Solr will use the indexed terms.

Note that if your index is distributed, you probably won't be able to use the
grouping feature -- that seems to require docValues. But if your index has a
single shard, you should be OK.

Thanks,
Shawn
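
P.S. To make the stored-versus-indexed distinction concrete, here is a minimal Python sketch. It is not Solr code: the function name simulate_field is hypothetical, and Python's re module only stands in for the pattern tokenizer. The pattern and group number are taken from the fieldType quoted above.

```python
import re

# Pattern and group number copied from the quoted fieldType definition.
CHAPTER_PATTERN = re.compile(r"^(\w+\.\d+)\.\d+-*\d*$")
GROUP = 1


def simulate_field(value):
    """Hypothetical model of a stored + indexed field; not Solr code."""
    # Stored value: always the original input, untouched by analysis.
    stored = value
    # Indexed terms: what the analysis step emits (capture group 1 here).
    indexed_terms = [m.group(GROUP) for m in CHAPTER_PATTERN.finditer(value)]
    return {"stored": stored, "indexed_terms": indexed_terms}


for key in ("JHN.3.16", "JHN.3.17"):
    print(key, simulate_field(key))
```

In this model both keys keep their full original text as the stored value, while sharing the single indexed term JHN.3. The stored value is what comes back in query results; the indexed term is what grouping would operate on when it uses indexed terms.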