Hi Martijn,

----- Original Message ----
> From: Martijn v Groningen <martijn.is.h...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, November 26, 2009 3:19:40 AM
> Subject: Re: Deduplication in 1.4
>
> Field collapsing has been used by many in their production
> environment.

Got any pointers to public sites you know use it? I know of a high-traffic
site that used an early version, and it caused performance problems. Is
double-tripping still required?

> The last few months the stability of the patch grew as
> quite some bugs were fixed. The only big feature missing currently is
> caching of the collapsing algorithm. I'm currently working on that and

Is it also fully distributed-search-ready?

> I will put it in a new patch in the coming next days. So yes the
> patch is very near being production ready.

Thanks,
Otis

> Martijn
>
> 2009/11/26 KaktuChakarabati:
> >
> > Hey Otis,
> > Yep, I realized this myself after playing some with the dedupe feature
> > yesterday.
> > So it does look like Field collapsing is what I need pretty much.
> > Any idea on how close it is to being production-ready?
> >
> > Thanks,
> > -Chak
> >
> > Otis Gospodnetic wrote:
> >>
> >> Hi,
> >>
> >> As far as I know, the point of deduplication in Solr (
> >> http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate
> >> document before indexing it in order to avoid duplicates in the index in
> >> the first place.
> >>
> >> What you are describing is closer to the field collapsing patch in SOLR-236.
> >>
> >> Otis
> >> --
> >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >>
> >>
> >> ----- Original Message ----
> >>> From: KaktuChakarabati
> >>> To: solr-user@lucene.apache.org
> >>> Sent: Tue, November 24, 2009 5:29:00 PM
> >>> Subject: Deduplication in 1.4
> >>>
> >>> Hey,
> >>> I've been trying to find some documentation on using this feature in 1.4 but
> >>> Wiki page is a little sparse..
> >>> In specific, here's what I'm trying to do:
> >>>
> >>> I have a field, say 'duplicate_group_id' that I'll populate based on some
> >>> offline documents deduplication process I have.
> >>>
> >>> All I want is for Solr to compute a 'duplicate_signature' field based on
> >>> this one at update time, so that when I search for documents later, all
> >>> documents with same original 'duplicate_group_id' value will be rolled up
> >>> (e.g. I'll just get the first one that came back according to relevancy).
> >>>
> >>> I enabled the deduplication processor and put it into updater, but I'm not
> >>> seeing any difference in returned results (i.e. results with same
> >>> duplicate_id are returned separately..)
> >>>
> >>> Is there anything I need to supply in query-time for this to take effect?
> >>> What should be the behaviour? Is there any working example of this?
> >>>
> >>> Anything will be helpful..
> >>>
> >>> Thanks,
> >>> Chak
> >>> --
> >>> View this message in context:
> >>> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
> > --
> > View this message in context:
> > http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
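
For reference, the roll-up behaviour described in the original question (one
result per 'duplicate_group_id', keeping the first hit by relevancy) can be
sketched in a few lines of plain Python. This is only an illustration of the
idea of collapsing at query time. It is not Solr code and not how the SOLR-236
patch is implemented; the function name collapse_by_field and the sample
documents are hypothetical.

```python
from typing import Dict, Iterable, List


def collapse_by_field(ranked_docs: Iterable[Dict], field: str) -> List[Dict]:
    """Keep only the first (highest-ranked) document for each value of `field`.

    `ranked_docs` is assumed to be already sorted by relevancy, best first.
    """
    seen = set()
    collapsed = []
    for doc in ranked_docs:
        key = doc.get(field)
        if key is None:
            # Documents without the field are never rolled up together.
            collapsed.append(doc)
            continue
        if key in seen:
            # A better-ranked document from the same group was already kept.
            continue
        seen.add(key)
        collapsed.append(doc)
    return collapsed


# Hypothetical search results, already ordered by score.
results = [
    {"id": "a", "duplicate_group_id": "g1", "score": 2.4},
    {"id": "b", "duplicate_group_id": "g2", "score": 2.1},
    {"id": "c", "duplicate_group_id": "g1", "score": 1.9},
    {"id": "d", "score": 1.2},
    {"id": "e", "duplicate_group_id": "g2", "score": 0.7},
]

collapsed_ids = [d["id"] for d in collapse_by_field(results, "duplicate_group_id")]
print(collapsed_ids)
```

The contrast with the deduplication update processor is that nothing is
dropped or overwritten at index time here: all documents stay in the index,
and the grouping is applied only to the ranked result list.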