Hi Upayavira,

Yes, I tried with a completely new index. I found that once I added the line below to my /update handler in solrconfig.xml, the indexing doesn't work anymore.

<str name="update.chain">dedupe</str>

Besides that, it is also not able to do any deletion to the index when this line is added.

Regards,
Edwin

On 1 September 2015 at 21:15, Upayavira <u...@odoko.co.uk> wrote:

> Have you tried with a completely clean index? Are you deduping, or just
> calculating the signature? Is it possible dedup is preventing your
> documents from indexing (because it thinks they are dups)?
>
> On Tue, Sep 1, 2015, at 09:46 AM, Zheng Lin Edwin Yeo wrote:
> > Hi Upayavira,
> >
> > I've tried to change <str name="signatureField">id</str> to be
> > <str name="signatureField">signature</str>, but nothing is indexed into Solr
> > as well. Is that what you mean?
> >
> > Besides that, I've also included a copyField to copy the content field into
> > the signature field. Both versions (with and without copyField) have
> > nothing indexed into Solr.
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 September 2015 at 15:48, Upayavira <u...@odoko.co.uk> wrote:
> >
> > > you are attempting to write your signature to your ID field. That's not
> > > a good idea. You are generating your signature from the content field,
> > > which seems okay. Change your <str name="signatureField">id</str> to be
> > > your 'signature' field instead of id, and something different will
> > > happen :-)
> > >
> > > Upayavira
> > >
> > > On Tue, Sep 1, 2015, at 04:34 AM, Zheng Lin Edwin Yeo wrote:
> > > > I tried to follow the de-duplication guide, but after I configured it in
> > > > solrconfig.xml and schema.xml, nothing is indexed into Solr, and there is
> > > > no error message. I'm using SimplePostTool to index rich-text documents.
> > > >
> > > > Below are my configurations:
> > > >
> > > > In solrconfig.xml
> > > >
> > > > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> > > >   <lst name="defaults">
> > > >     <str name="update.chain">dedupe</str>
> > > >   </lst>
> > > > </requestHandler>
> > > >
> > > > <updateRequestProcessorChain name="dedupe">
> > > >   <processor class="solr.processor.SignatureUpdateProcessorFactory">
> > > >     <bool name="enabled">true</bool>
> > > >     <str name="signatureField">id</str>
> > > >     <bool name="overwriteDupes">false</bool>
> > > >     <str name="fields">content</str>
> > > >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> > > >   </processor>
> > > > </updateRequestProcessorChain>
> > > >
> > > >
> > > > In schema.xml
> > > >
> > > > <field name="signature" type="string" stored="true" indexed="true" multiValued="false" />
> > > >
> > > >
> > > > Is there anything which I might have missed out or done wrongly?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 1 September 2015 at 10:46, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> > > >
> > > > > Thank you for your advice Alexandre.
> > > > >
> > > > > Will try out the de-duplication from the link you gave.
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > > >
> > > > >
> > > > > On 1 September 2015 at 10:34, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> > > > >
> > > > >> Re-read the question. You want to de-dupe on the full text-content.
> > > > >>
> > > > >> I would actually try to use the dedupe chain as per the link I gave
> > > > >> but put results into a separate string field. Then, you group on that
> > > > >> field. You cannot actually group on the long text field, that would
> > > > >> kill any performance. So a signature is your proxy.
> > > > >>
> > > > >> Regards,
> > > > >> Alex
> > > > >> ----
> > > > >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > > > >> http://www.solr-start.com/
> > > > >>
> > > > >>
> > > > >> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> > > > >> > Hi Alexandre,
> > > > >> >
> > > > >> > Will treating it as String affect the search or other functions like
> > > > >> > highlighting?
> > > > >> >
> > > > >> > Yes, the content must be in my index, unless I do a copyField to do
> > > > >> > de-duplication on that field.. Will that help?
> > > > >> >
> > > > >> > Regards,
> > > > >> > Edwin
> > > > >> >
> > > > >> >
> > > > >> > On 1 September 2015 at 10:04, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> > > > >> >
> > > > >> >> Can't you just treat it as String?
> > > > >> >>
> > > > >> >> Also, do you actually want those documents in your index in the first
> > > > >> >> place? If not, have you looked at De-duplication:
> > > > >> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
> > > > >> >>
> > > > >> >> Regards,
> > > > >> >> Alex.
> > > > >> >> ----
> > > > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > > > >> >> http://www.solr-start.com/
> > > > >> >>
> > > > >> >>
> > > > >> >> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> > > > >> >> > Thanks Jan.
> > > > >> >> >
> > > > >> >> > But I read that the field that is being collapsed on must be a single
> > > > >> >> > valued String, Int or Float.
> > > > >> >> > As I'm required to get the distinct results
> > > > >> >> > from "content" field that was indexed from a rich text document, I got the
> > > > >> >> > following error:
> > > > >> >> >
> > > > >> >> > "error":{
> > > > >> >> >   "msg":"java.io.IOException: 64 bit numeric collapse fields are not supported",
> > > > >> >> >   "trace":"java.lang.RuntimeException: java.io.IOException: 64 bit
> > > > >> >> > numeric collapse fields are not supported\r\n\tat
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > Is it possible to collapsed on fields which has a long integer of data,
> > > > >> >> > like content from a rich text document?
> > > > >> >> >
> > > > >> >> > Regards,
> > > > >> >> > Edwin
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > On 31 August 2015 at 18:59, Jan Høydahl <jan....@cominvent.com> wrote:
> > > > >> >> >
> > > > >> >> >> Hi
> > > > >> >> >>
> > > > >> >> >> Check out the CollapsingQParser
> > > > >> >> >> (https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results).
> > > > >> >> >> As long as you have a field that will be the same for all duplicates, you
> > > > >> >> >> can “collapse” on that field. If you not have a “group id”, you can create
> > > > >> >> >> one using e.g. an MD5 signature of the identical body text
> > > > >> >> >> (https://cwiki.apache.org/confluence/display/solr/De-Duplication).
> > > > >> >> >>
> > > > >> >> >> --
> > > > >> >> >> Jan Høydahl, search solution architect
> > > > >> >> >> Cominvent AS - www.cominvent.com
> > > > >> >> >>
> > > > >> >> >> > 31. aug. 2015 kl.
12.03 skrev Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> > > > >> >> >> >
> > > > >> >> >> > Hi,
> > > > >> >> >> >
> > > > >> >> >> > I'm using Solr 5.2.1, and I would like to find out, what is the best way to
> > > > >> >> >> > get Solr to return only distinct results?
> > > > >> >> >> >
> > > > >> >> >> > Currently, I've indexed several exact similar documents into Solr, with
> > > > >> >> >> > just different id and title, but the content is exactly the same. When I do
> > > > >> >> >> > a search, Solr will return all these documents several time in the list.
> > > > >> >> >> >
> > > > >> >> >> > What is the most suitable way to get Solr to return only one of the
> > > > >> >> >> > document during the search?
> > > > >> >> >> > I understand that there is result grouping and faceting, but I'm not sure
> > > > >> >> >> > if that is the best way.
> > > > >> >> >> >
> > > > >> >> >> > Regards,
> > > > >> >> >> > Edwin
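
Editorial note on the configuration quoted in this thread: the "dedupe" chain as posted contains only the signature processor. In Solr, a custom update chain normally has to end with RunUpdateProcessorFactory, otherwise the add and delete commands are never actually executed. That is a plausible explanation for the symptoms described (nothing indexed, no deletions, no error), but it is an assumption here and is not confirmed anywhere in the thread. A minimal sketch of the chain with that addition, and with the signature written to the separate "signature" field as Upayavira suggested:

```xml
<!-- Sketch only: field names "content" and "signature" are taken from the thread;
     the two trailing processors are an assumed fix, not something the posters confirmed. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Write the hash to its own field instead of overwriting the unique key "id" -->
    <str name="signatureField">signature</str>
    <!-- false: keep all documents and only compute the signature -->
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <!-- Logs each update command as it passes through the chain -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- Actually executes the update; without it the chain silently does nothing -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With the signature stored in a single-valued string field, the collapse that Jan suggested could then be requested at query time with a filter query such as fq={!collapse field=signature}, so that only one document per distinct content signature is returned.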