Finally, I was able to implement the desired behavior using your suggestions,
as follows:

- Added StatelessScriptUpdateProcessorFactory before
SignatureUpdateProcessorFactory in order to analyze "field1" and store the
analyzed value in "field1_tmp_ss"
- Passed "field1_tmp_ss" to SignatureUpdateProcessorFactory
- Used IgnoreFieldUpdateProcessorFactory to remove "field1_tmp_ss" from
the stored document
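
To illustrate the order of the three steps, here is a rough sketch in plain
Python. It is not Solr code and not my actual configuration: toy_stem is a
hypothetical stand-in for the real analysis of "field1", MD5 is used in place
of Lookup3Signature, and the function names are made up for this example.

```python
import hashlib


def toy_stem(value):
    # Hypothetical stand-in for the real analyzer: drops a one-character
    # suffix after the last underscore, so "value1_a" becomes "value1".
    head, sep, tail = value.rpartition("_")
    return head if sep and len(tail) == 1 else value


def analyze_field1(doc):
    # Step 1: write the analyzed form of field1 into a temporary field.
    doc["field1_tmp_ss"] = toy_stem(doc["field1"])
    return doc


def add_signature(doc, fields=("field1_tmp_ss", "field2", "field3")):
    # Step 2: hash the temporary field together with the other two fields.
    # MD5 is an assumption made for this sketch only.
    digest = hashlib.md5()
    for name in fields:
        digest.update(str(doc[name]).encode("utf-8"))
        digest.update(b"\x00")  # separator so adjacent values cannot merge
    doc["signature"] = digest.hexdigest()
    return doc


def drop_temp_field(doc):
    # Step 3: remove the temporary field before the document is stored.
    doc.pop("field1_tmp_ss", None)
    return doc


def run_chain(doc):
    # Apply the steps in the same order as the processors in the chain.
    result = dict(doc)
    for step in (analyze_field1, add_signature, drop_temp_field):
        result = step(result)
    return result


doc_a = run_chain({"field1": "value1_a", "field2": "value2", "field3": "value3"})
doc_b = run_chain({"field1": "value1_b", "field2": "value2", "field3": "value3"})
print(doc_a["signature"] == doc_b["signature"])
```

With this ordering the two example documents from my original question end up
with the same signature, while "field1" itself keeps its original value.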

Everything seems to work fine and as expected.

Thank you very much,
Have a nice day,

Leonidas

2017-01-25 19:19 GMT+02:00 Alexandre Rafalovitch <arafa...@gmail.com>:

> It might be possible by sticking additional update request processors
> before the signature one. For example clone field, regex instead of
> tokenizing on the clone, then signature. If a clone is too much of a
> burden, it may even be possible to then add IgnoreField URP to remove
> it or map it in the schema to index/store/docValues=false field.
>
> Regards,
>    Alex.
> P.s. The full all-in-one list of URPs is available at:
> http://www.solr-start.com/info/update-request-processors/
>
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 25 January 2017 at 12:00, Markus Jelsma <markus.jel...@openindex.io>
> wrote:
> > Hello,
> >
> > This is not possible out of the box; you would need to manually pass the
> > input through an analyzer with a tokenizer and your stemming token filter,
> > and put the output together again.
> >
> > Markus
> >
> >
> >
> > -----Original message-----
> >> From:Leonidas Zagkaretos <leonz...@gmail.com>
> >> Sent: Wednesday 25th January 2017 17:51
> >> To: solr-user@lucene.apache.org
> >> Subject: Pass Analyzed Field to SignatureUpdateProcessorFactory
> >>
> >> Hi all,
> >>
> >> We have successfully integrated Solr in our application, and now we are
> >> facing a requirement where the application should be able to search for
> >> duplicate records in Solr core based on equality in 3 distinct fields.
> >>
> >> Tried using SignatureUpdateProcessorFactory as described in
> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication and
> >> Lookup3Signature and everything seems to work fine, signature field is
> >> being filled with unique hash values.
> >>
> >> One issue we have is that we need to pass to
> >> SignatureUpdateProcessorFactory the stemmed value of 1 of the 3 fields.
> >> Currently, the following documents produce different hash values, and we
> >> need them to produce the same one.
> >> Analysis of field1 for the values "value1_a" and "value1_b" produces the
> >> stemmed value "value1".
> >>
> >> documentA: {
> >>     field1: value1_a,
> >>     field2: value2,
> >>     field3: value3,
> >>     signature: hash_value1
> >> }
> >>
> >> documentB: {
> >>     field1: value1_b,
> >>     field2: value2,
> >>     field3: value3,
> >>     signature: hash_value2
> >> }
> >>
> >> I would like to ask whether it is possible to get the required behavior,
> >> and for some tips on how to accomplish this task.
> >>
> >> Thank you in advance,
> >>
> >> Leonidas
> >>
>
