I tried to follow the de-duplication guide, but after I configured it in
solrconfig.xml and schema.xml, nothing is indexed into Solr, and there is
no error message. I'm using SimplePostTool to index rich-text documents.

Below are my configurations:

In solrconfig.xml

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">dedupe</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">id</str>
      <bool name="overwriteDupes">false</bool>
      <str name="fields">content</str>
      <str name="signatureClass">solr.processor.Lookup3Signature</str>
    </processor>
  </updateRequestProcessorChain>


In schema.xml

  <field name="signature" type="string" stored="true" indexed="true" multiValued="false" />


Is there anything I might have missed or done wrong?
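
Comparing the chain above with the example in the De-Duplication guide, two
things stand out, though neither is confirmed as the cause here. First, the
guide's chain ends with solr.LogUpdateProcessorFactory and
solr.RunUpdateProcessorFactory, and a custom chain without
RunUpdateProcessorFactory does not pass documents on to the index. Second,
signatureField is set to id while schema.xml declares a separate signature
field. A minimal sketch of the chain with both changes, assuming the hash is
meant to go into the signature field rather than replace id:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Assumption: write the hash to the separate "signature" field from schema.xml -->
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <!-- As in the guide's example: log the update, then actually run it against the index -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With overwriteDupes left at false, duplicate documents should still be
indexed, but they would share the same signature value, which can then be
used for grouping or collapsing at query time.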

Regards,
Edwin


On 1 September 2015 at 10:46, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Thank you for your advice Alexandre.
>
> Will try out the de-duplication from the link you gave.
>
> Regards,
> Edwin
>
>
> On 1 September 2015 at 10:34, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> Re-read the question. You want to de-dupe on the full text-content.
>>
>> I would actually try to use the dedupe chain as per the link I gave
>> but put the results into a separate string field. Then, you group on that
>> field. You cannot actually group on the long text field; that would
>> kill performance. So a signature is your proxy.
>>
>> Regards,
>>    Alex
>> ----
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>> wrote:
>> > Hi Alexandre,
>> >
>> > Will treating it as String affect the search or other functions like
>> > highlighting?
>> >
>> > Yes, the content must be in my index, unless I use a copyField and do
>> > de-duplication on that field. Will that help?
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> > On 1 September 2015 at 10:04, Alexandre Rafalovitch <arafa...@gmail.com>
>> > wrote:
>> >
>> >> Can't you just treat it as String?
>> >>
>> >> Also, do you actually want those documents in your index in the first
>> >> place? If not, have you looked at De-duplication:
>> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>> >>
>> >> Regards,
>> >>    Alex.
>> >> ----
>> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> >> http://www.solr-start.com/
>> >>
>> >>
>> >> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>> >> wrote:
>> >> > Thanks Jan.
>> >> >
>> >> > But I read that the field that is being collapsed on must be a
>> >> > single-valued String, Int or Float. As I'm required to get the distinct
>> >> > results from the "content" field that was indexed from a rich text
>> >> > document, I got the following error:
>> >> >
>> >> >   "error":{
>> >> >     "msg":"java.io.IOException: 64 bit numeric collapse fields are not supported",
>> >> >     "trace":"java.lang.RuntimeException: java.io.IOException: 64 bit numeric collapse fields are not supported\r\n\tat
>> >> >
>> >> >
>> >> > Is it possible to collapse on fields that hold a large amount of data,
>> >> > like content from a rich text document?
>> >> >
>> >> > Regards,
>> >> > Edwin
>> >> >
>> >> >
>> >> > On 31 August 2015 at 18:59, Jan Høydahl <jan....@cominvent.com> wrote:
>> >> >
>> >> >> Hi
>> >> >>
>> >> >> Check out the CollapsingQParser (
>> >> >> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
>> >> >> ).
>> >> >> As long as you have a field that will be the same for all duplicates,
>> >> >> you can “collapse” on that field. If you do not have a “group id”, you
>> >> >> can create one using e.g. an MD5 signature of the identical body text (
>> >> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication).
>> >> >>
>> >> >> --
>> >> >> Jan Høydahl, search solution architect
>> >> >> Cominvent AS - www.cominvent.com
>> >> >>
>> >> >> > On 31 Aug 2015, at 12:03, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>> >> >> > wrote:
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > I'm using Solr 5.2.1, and I would like to find out the best way to
>> >> >> > get Solr to return only distinct results.
>> >> >> >
>> >> >> > Currently, I've indexed several identical documents into Solr, with
>> >> >> > only the id and title differing; the content is exactly the same.
>> >> >> > When I do a search, Solr returns all of these documents in the
>> >> >> > result list.
>> >> >> >
>> >> >> > What is the most suitable way to get Solr to return only one of
>> >> >> > these documents during the search?
>> >> >> > I understand that there is result grouping and faceting, but I'm
>> >> >> > not sure if that is the best way.
>> >> >> >
>> >> >> > Regards,
>> >> >> > Edwin
>> >> >>
>> >> >>
>> >>
>>
>
>
