Ok, I know shingling will join with "_".

But that is the behaviour we want. Imagine we have these entries (contained in
the species file):

abarema idiopoda
abutilon bakerianum

Those become:
abarema
idiopoda
abutilon
bakerianum
abarema_idiopoda
abutilon_bakerianum
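
A minimal sketch of how the join character can be set explicitly instead of relying on the default. It assumes the entries in species.txt are written with underscores (an assumption, the file is not shown here). ShingleFilterFactory accepts a tokenSeparator attribute; as far as I know its default is a single space, while "_" is the default filler for removed positions:

```xml
<!-- sketch only: tokenSeparator="_" assumes species.txt uses underscores -->
<filter class="solr.ShingleFilterFactory" maxShingleSize="3"
        outputUnigrams="true" tokenSeparator="_"/>
```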

But my genus file may contain only the word abarema, so we end up with
a field holding only that word.

So the requirements here are to find all the species listed in the species
file (step one) and then facet on the species that are in the genus file
(step two).

It seemed reasonable to just chain the fields. I just forgot that Solr doesn't
change the field itself, as Shawn pointed out (thanks for that).
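
To make that concrete, here is a sketch of a non-chained layout, where both analyzed fields are fed from the same source. copyField passes along the original input, not the analyzed tokens of the source field, so one copy cannot feed the next. The field names content, species and genus, and the text_general type, are hypothetical; only species_type and genus_type come from this message:

```xml
<!-- hypothetical field names; species_type and genus_type are defined in this message -->
<field name="content" type="text_general" indexed="true" stored="true"/>
<field name="species" type="species_type" indexed="true" stored="false"/>
<field name="genus"   type="genus_type"   indexed="true" stored="false"/>

<!-- each destination receives the raw text of content and runs its own analyzer -->
<copyField source="content" dest="species"/>
<copyField source="content" dest="genus"/>
```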

So what we came up with is two fields, the first one for species:

<fieldType name="species_type" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <!-- char filters run on the raw text, before the tokenizer -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping/mapping-ISOLatin1Accent.txt"/>
    <!-- strip digits, and hyphens together with any whitespace after them -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[0-9]+|(\-)(\s*)" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit single words plus shingles of up to three words -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
            outputUnigrams="true"/>
    <!-- keep only tokens listed in species.txt -->
    <filter class="solr.KeepWordFilterFactory" words="species.txt"
            ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- shingles only, no single words, at query time -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
            outputUnigrams="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And the second one (genus), which holds the genus values used for faceting,
like this:

<fieldType name="genus_type" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <!-- char filters run on the raw text, before the tokenizer -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping/mapping-ISOLatin1Accent.txt"/>
    <!-- strip digits, and hyphens together with any whitespace after them -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[0-9]+|(\-)(\s*)" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit single words plus shingles of up to three words -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
            outputUnigrams="true"/>
    <!-- first pass: keep only tokens listed in species.txt -->
    <filter class="solr.KeepWordFilterFactory" words="species.txt"
            ignoreCase="true"/>
    <!-- second pass: sees only the tokens that survived the species.txt pass -->
    <filter class="solr.KeepWordFilterFactory" words="genus.txt"
            ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Nevertheless, the second keep-word filter pass does not seem to be applied the
way I expect. Am I missing something?






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Copy-field-a-source-of-copy-field-tp4346425p4346593.html
Sent from the Solr - User mailing list archive at Nabble.com.
