On 12/8/2017 9:56 AM, Bradley Belyeu wrote:
> I’m wanting to do a result grouping by the first three characters, period, & 
> digit(s). For example, docs with the unique keys JHN.3.16 & JHN.3.17 I would 
> want grouped together.
> So my thought was to define another field and then copy the USFM into it and 
> use the regex tokenizer defined as so:
>
>     <fieldType name="chapter" class="solr.TextField" positionIncrementGap="0">
>         <analyzer>
>             <tokenizer class="solr.PatternTokenizerFactory" pattern="^(\w+\.\d+)\.\d+-*\d*$" group="1" />
>         </analyzer>
>     </fieldType>
>     <field name="chapter" type="chapter" indexed="true" required="true" stored="true" />
>     <copyField source="usfm" dest="chapter" />
>
> BUT, when I import my data the entire USFM is being stored inside the chapter 
> field. And I get query results that look like:

Analysis only affects indexed terms.  The field contents in query
results are *ALWAYS* the original input text -- analysis *CANNOT*
affect the fields returned for a document.  The copyField feature does
not copy the results of analysis; it always copies the original input.
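
To make the indexed/stored distinction concrete, here is a rough
sketch in plain Python (not Solr code) that applies the same regex to
the two example keys.  The helper names indexed_term and stored_value
are made up for illustration, and returning None for a non-matching
value is only an assumption standing in for "no term produced".

```python
import re

# Same pattern as the PatternTokenizerFactory in the quoted fieldType.
CHAPTER_PATTERN = re.compile(r"^(\w+\.\d+)\.\d+-*\d*$")


def indexed_term(usfm):
    """Approximate the term the tokenizer would emit (capture group 1).

    Hypothetical helper: None stands in for "no term produced".
    """
    match = CHAPTER_PATTERN.match(usfm)
    return match.group(1) if match else None


def stored_value(usfm):
    """Stored fields and copyField keep the original input unchanged."""
    return usfm


for key in ("JHN.3.16", "JHN.3.17"):
    print(key, "-> indexed:", indexed_term(key), "| stored:", stored_value(key))
```

Both keys end up with the same indexed term, which is what grouping on
indexed terms would use, while the value returned in results stays the
full USFM string.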

Since this is a "solr.TextField" type, you cannot define docValues on
it, which means that the Result Grouping feature in Solr will use the
indexed terms.  Note that if your index is distributed, you probably
won't be able to use the grouping feature -- that seems to require
docValues.  But if your index has a single shard, you should be OK.

Thanks,
Shawn
