OK, you'll need two fields pretty much for certain. The trick is
getting _only_ genus names in the genus field.

The simplest thing to do would be a straight copyField into a genus field
whose analysis chain has a single keep-word filter (KeepWordFilterFactory)
loaded with a list of all the genera. That presupposes that the genus names
are disjoint from all other words in the text.
You search on your species field and facet on the genus field.
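
A minimal sketch of that setup, assuming a keep-words file called genera.txt
(one genus name per line) and a field type called genus_only. Both names are
made up for illustration, and the tokenizer choice is an assumption as well:

```xml
<!-- Hypothetical field type: drops every token that is not listed in genera.txt -->
<fieldType name="genus_only" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- genera.txt is an assumed file name; it lives next to the schema -->
    <filter class="solr.KeepWordFilterFactory" words="genera.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- Field names and the copyField source are taken from the original message -->
<field name="genus" type="genus_only" indexed="true" stored="false" multiValued="true"/>
<copyField source="attr_conten*" dest="genus"/>
```

With something like this in place, a request along the lines of
q=specie:"abies concolor"&facet=true&facet.field=genus should return one
facet bucket per genus. Note that facet counts are per document, so a genus
is counted once per matching document no matter how many of its species that
document mentions.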

But assuming your genera are not disjoint from all other words, hmmmm.
Do you have a way of unambiguously identifying genus/species pairs in
the text you're processing? If you do we can work with that, but
without that you're talking entity recognition of some sort.

BTW, there's no real need to shingle the species field, just search
for "genus species" as a phrase. A phrase query only matches when those
two terms appear next to each other, in that order.
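
Roughly, the species side could then be an ordinary tokenized text field with
no shingle filter at all. This is only a sketch; the type name species_text
is hypothetical and the analysis chain is an assumption:

```xml
<!-- Hypothetical field type: plain tokenizing and lowercasing, no ShingleFilterFactory -->
<fieldType name="species_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- "specie" is the destination field name used in the original message -->
<field name="specie" type="species_text" indexed="true" stored="false" multiValued="true"/>
<copyField source="attr_conten*" dest="specie"/>
```

A query such as specie:"abies concolor" then relies on token positions
rather than on pre-built two-word shingles.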

Best,
Erick

On Wed, Jul 19, 2017 at 11:07 AM, tstusr <ulfrhe...@gmail.com> wrote:
> Well, our documents consist of PDF files (between 20 and 200 pages).
>
> We capture the words of the whole file using the extract handler,
> which is why we have these fields:
>
> <copyField source="attr_conten*" dest="genus"/>
> <copyField source="attr_conten*" dest="specie"/>
>
> We capture species across all of the PDF content (in the attr_content field).
>
> The captured species are used for ranking purposes, so we need the
> whole name; that's why we use shingles. As an example, we capture from the
> PDF:
>
> abelmoschus achanioides
> abies colimensis
> abies concolor
>
> Because that information is important, we provide a facet of those species,
> grouped by genus (just the first word of the species name). So the facet
> should contain:
>
> abelmoschus (1)
> abies (2)
>
> Nevertheless, we need a sort of subquery: first we need to match the
> complete species name, and then facet those results by genus. For example:
>
> the abies something else (this phrase should not be captured)
> the abies concolor something else (this phrase should be captured) ->
> reduced to just "abies concolor" and, as a consequence, counted under its genus
>
> I realized that every genus is contained in the species names.
>
> So, is there a way to facet on just the first word of a field? For example,
> given these values for the field:
>
> abelmoschus achanioides
> abies colimensis
> abies concolor
>
> can I use just the first word of each?
