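FWIW, this code won't catch uncommitted duplicates: the searcher it queries only sees documents that were visible when that searcher was opened, so a second add of the same id before the next commit still passes the check.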
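A minimal sketch of that failure mode, as a toy model in plain Java rather than Solr code. The class and method names below are hypothetical. "committed" stands in for what the newest registered searcher can see, and "pending" for adds made since the last commit. It only models the existence check in the quoted handler, not what Lucene does with the documents afterwards.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, NOT Solr code: illustrates a duplicate check that only
// consults a committed snapshot.
class SnapshotDedupeSketch {
    // Ids visible to the "searcher" (state as of the last commit).
    private final Set<String> committed = new HashSet<>();
    // Ids accepted since the last commit; invisible to the "searcher".
    private final List<String> pending = new ArrayList<>();

    // Same shape as the quoted addDoc: reject only if the id is already
    // visible in the committed snapshot. Returns 1 if accepted, 0 if skipped.
    int addCheckingSnapshotOnly(String id) {
        if (committed.contains(id)) {
            return 0;
        }
        pending.add(id);
        return 1;
    }

    // Hypothetical variant that also consults ids added since the last commit.
    int addCheckingPendingToo(String id) {
        if (committed.contains(id) || pending.contains(id)) {
            return 0;
        }
        pending.add(id);
        return 1;
    }

    // Make pending ids visible, as a commit that opens a new searcher would.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        SnapshotDedupeSketch index = new SnapshotDedupeSketch();
        index.addCheckingSnapshotOnly("doc-1");
        index.addCheckingSnapshotOnly("doc-1"); // same id, no commit in between
        // prints "pending after two adds: 2"
        System.out.println("pending after two adds: " + index.pendingCount());

        index.commit();
        // prints "add after commit returns: 0"
        System.out.println("add after commit returns: "
                + index.addCheckingSnapshotOnly("doc-1"));
    }
}
```

The check only starts rejecting the id once a commit has made it visible. Whether tracking pending ids yourself (as in the second method) is worth it, versus using the stock dedupe processor Jack mentioned, is a separate question.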

On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen <dotanco...@gmail.com> wrote:

> On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
> <j...@basetechnology.com> wrote:
> > The Solr SignatureUpdateProcessorFactory is designed to facilitate
> > dedupe...
> > any particular reason you did not use it?
> >
> > See:
> > http://wiki.apache.org/solr/Deduplication
> >
> > and
> >
> > https://cwiki.apache.org/confluence/display/solr/De-Duplication
> >
>
> Actually, the guy who made the changes (a coworker) did in fact write
> an alternative UpdateHandler. I've just noticed that there are a bunch
> of dupes right now, though.
>
> public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {
>
>     public DiscoAPIUpdateHandler(SolrCore core) {
>         super(core);
>     }
>
>     @Override
>     public int addDoc(AddUpdateCommand cmd) throws IOException{
>
>         // if overwrite is set to false we'll use the DefaultUpdateHandler2 ,
>         // this is done for debugging to insert duplicates to solr
>         if (!cmd.overwrite) return super.addDoc(cmd);
>
>         // when using ref counted objects you have!! to decrement the
>         // ref count when your done
>         RefCounted<SolrIndexSearcher> indexSearcher =
>             this.core.getNewestSearcher(false);
>
>         // the idea is like this we'll make an internal lucene query
>         // and check if that id already exists
>
>         Term updateTerm = null;
>
>         if (cmd.updateTerm != null){
>             updateTerm = cmd.updateTerm;
>         } else {
>             updateTerm = new Term("id",cmd.getIndexedId());
>         }
>
>         Query query = new TermQuery(updateTerm);
>         TopDocs docs = indexSearcher.get().search(query,2);
>
>         if (docs.totalHits>0){
>             // index searcher is no longer needed
>             indexSearcher.decref();
>             // don't add the new document
>             return 0;
>         }
>
>         // index searcher is no longer needed
>         indexSearcher.decref();
>
>         // if i'm here then it's a new document
>         return super.addDoc(cmd);
>
>     }
>
> }
>
> > And I give a bunch of examples in my book.
> >
>
> I anticipate the book with esteem!
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>