What are you using as the unique id in your Solr index?  It sounds like the
value you use as your Solr unique id may bear no resemblance to a
unique[1] id derived from your data...

Or - another way to put it - what is it that makes these two records in your
Solr index 'the same', and what are the unique ids for those two entries in
the Solr index?  How are those ids related to your original data?

[1] not only unique, but immutable, i.e. if you update a row in your database,
the unique id derived from that row has to be the same as it would have been
before the update.  Otherwise, Solr has nothing by which to recognize the
incoming document as a duplicate of an existing entry, so it does just an
'insert' instead of a 'delete' followed by an 'insert'.
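
To illustrate the footnote, here is a minimal Python sketch of an index keyed
on a unique id. It is a toy model, not Solr's API: ToyIndex, stable_id,
unstable_id, index_rows and the row fields (pk, name, last_modified) are all
hypothetical names made up for this example. It only shows why an id that
changes when the row changes leaves two entries behind.

```python
# Toy model of an index keyed on a unique id field (NOT Solr's API).
class ToyIndex:
    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = []

    def add(self, doc):
        key = doc[self.unique_key]
        # 'delete' any existing document with the same unique id ...
        self.docs = [d for d in self.docs if d[self.unique_key] != key]
        # ... then 'insert' the new one.
        self.docs.append(dict(doc))


def stable_id(row):
    # Immutable id: derived only from the primary key.
    return str(row["pk"])


def unstable_id(row):
    # Mutable id: changes whenever the row is updated.
    return f'{row["pk"]}-{row["last_modified"]}'


def index_rows(rows, make_id):
    index = ToyIndex()
    for row in rows:
        index.add({"id": make_id(row), "name": row["name"]})
    return index


# The same database row, before and after an update (hypothetical data).
before = {"pk": 42, "name": "old value", "last_modified": "2011-07-01"}
after = {"pk": 42, "name": "new value", "last_modified": "2011-07-07"}

print(len(index_rows([before, after], stable_id).docs))    # 1 (replaced)
print(len(index_rows([before, after], unstable_id).docs))  # 2 (duplicated)
```

With the stable id, the second add replaces the first entry. With the id that
embeds the timestamp, nothing matches, so both versions stay in the index.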

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


> -----Original Message-----
> From: Mark juszczec [mailto:mark.juszc...@gmail.com]
> Sent: Thursday, July 07, 2011 9:15 AM
> To: solr-user@lucene.apache.org
> Subject: updating existing data in index vs inserting new data in index
> 
> Hello all
> 
> I'm using Solr 3.2 and am confused about updating existing data in an
> index.
> 
> According to the DataImportHandler Wiki:
> 
> *"delta-import* : For incremental imports and change detection run the
> command http://<host>:<port>/solr/dataimport?command=delta-import . It
> supports the same clean, commit, optimize and debug parameters as
> full-import command."
> 
> I know delta-import will find new data in the database and insert it into
> the index.  My problem is how it handles updates where I've got a record
> that exists in the index and the database, the database record is changed
> and I want to incorporate those changes in the existing record in the
> index.  IOW I don't want to insert it again.
> 
> I've tried this and wound up with 2 records with the same key in the
> index.  The first contains the original db values found when the index was
> created, the 2nd contains the db values after the record was changed.
> 
> I've also found this:
> http://search.lucidimagination.com/search/out?u=http%3A%2F%2Flucene.472066.n3.nabble.com%2FDelta-import-with-solrj-client-tp1085763p1086173.html
> The subject is 'Delta-import with solrj client'.
> 
> "Greetings. I have a *solrj* client for fetching data from database. I am
> using *delta*-*import* for fetching data. If a column is changed in database
> using timestamp with *delta*-*import* i get the latest column indexed but
> there are *duplicate* values in the index similar to the column but the data
> is older. This works with cleaning the index but i want to update the index
> without cleaning it. Is there a way to just update the index with the
> updated column without having *duplicate* values. Appreciate for any
> feedback.
> 
> Hando"
> 
> There are 2 responses:
> 
> "Short answer is no, there isn't a way. *Solr* doesn't have the concept of
> 'Update' to an indexed document. You need to add the full document (all
> 'columns') each time any one field changes. If doing that in your
> DataImportHandler logic is difficult you may need to write a separate Update
> Service that does:
> 
> 1) Read UniqueID, UpdatedColumn(s)  from database
> 2) Using UniqueID Retrieve document from *Solr*
> 3) Add/Update field(s) with updated column(s)
> 4) Add document back to *Solr*
> 
> Although, if you use DIH to do a full *import*, using the same query in
> your *Delta*-*Import* to get the whole document shouldn't be that
> difficult."
> 
> and
> 
> "Hi,
> 
> Make sure you use a proper "ID" field, which does *not* change even if the
> content in the database changes. In this way, when your *delta*-*import*
> fetches changed rows to index, they will update the existing rows in your
> index."
> 
> I have an ID field that doesn't change.  It is the primary key field from
> the database table I am trying to index and I have verified it is unique.
> 
> So, does Solr allow updates (not inserts) of existing records?  Is anyone
> able to do this?
> 
> Mark
