Bob

No, I don't. Let me look into that and post my results.
Mark

On Thu, Jul 7, 2011 at 10:14 AM, Bob Sandiford <bob.sandif...@sirsidynix.com> wrote:
> Hi, Mark.
>
> I haven't used DIH myself, so I'll need to leave comments on your setup
> to others who have done so.
>
> Another question: after your initial index create (and after each delta),
> do you run a 'commit'? Do you run an 'optimize'? (Without the optimize,
> 'deleted' records still show up in query results...)
>
> Bob Sandiford | Lead Software Engineer | SirsiDynix
> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> www.sirsidynix.com
>
> > -----Original Message-----
> > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
> > Sent: Thursday, July 07, 2011 10:04 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: updating existing data in index vs inserting new data in index
> >
> > Bob
> >
> > Thanks very much for the reply!
> >
> > I am using a unique integer called order_id as the Solr index key.
> >
> > My query, deltaQuery and deltaImportQuery are below:
> >
> > <entity name="item1"
> >     pk="ORDER_ID"
> >     query="select 1 as TABLE_ID, orders.order_id, orders.order_booked_ind,
> >       orders.order_dt, orders.cancel_dt, orders.account_manager_id,
> >       orders.of_header_id, orders.order_status_lov_id, orders.order_type_id,
> >       orders.approved_discount_pct, orders.campaign_nm,
> >       orders.approved_by_cd, orders.advertiser_id, orders.agency_id
> >       from orders"
> >     deltaImportQuery="select 1 as TABLE_ID, orders.order_id,
> >       orders.order_booked_ind, orders.order_dt, orders.cancel_dt,
> >       orders.account_manager_id, orders.of_header_id,
> >       orders.order_status_lov_id, orders.order_type_id,
> >       orders.approved_discount_pct, orders.campaign_nm,
> >       orders.approved_by_cd, orders.advertiser_id, orders.agency_id
> >       from orders where orders.order_id = '${dataimporter.delta.ORDER_ID}'"
> >     deltaQuery="select orders.order_id from orders where orders.change_dt >
> >       to_date('${dataimporter.last_index_time}','YYYY-MM-DD HH24:MI:SS')">
> > </entity>
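The delta-import cycle this entity sets up can be modeled outside Solr. The sketch below is a simplified Python simulation, not DIH itself: it assumes an in-memory sqlite3 table standing in for the Oracle `orders` table (columns trimmed) and a plain dict standing in for a Solr index whose uniqueKey is order_id. The first query plays the role of deltaQuery (ids whose change_dt is newer than last_index_time), the second plays deltaImportQuery (re-read the full row), and each row is written under its key, so a changed order replaces its old document instead of adding a second one. The function names are hypothetical.

```python
import sqlite3


def make_db():
    """In-memory stand-in for the Oracle `orders` table (columns trimmed)."""
    db = sqlite3.connect(":memory:")
    db.execute("create table orders (order_id integer primary key, "
               "campaign_nm text, change_dt text)")
    db.executemany("insert into orders values (?, ?, ?)",
                   [(1, "spring", "2011-07-01 09:00:00"),
                    (2, "summer", "2011-07-01 09:00:00")])
    return db


def full_import(db, index):
    # Role of `query`: index every row, keyed by order_id.
    for order_id, campaign_nm in db.execute(
            "select order_id, campaign_nm from orders"):
        index[order_id] = {"order_id": order_id, "campaign_nm": campaign_nm}


def delta_import(db, index, last_index_time):
    # Role of `deltaQuery`: ids of rows changed since the last run.
    changed = [row[0] for row in db.execute(
        "select order_id from orders where change_dt > ?", (last_index_time,))]
    for order_id in changed:
        # Role of `deltaImportQuery`: re-read the whole row for that id.
        (campaign_nm,) = db.execute(
            "select campaign_nm from orders where order_id = ?",
            (order_id,)).fetchone()
        # Same key: the old document is replaced, not duplicated.
        index[order_id] = {"order_id": order_id, "campaign_nm": campaign_nm}


db, index = make_db(), {}
full_import(db, index)
last_index_time = "2011-07-07 09:00:00"
db.execute("update orders set campaign_nm = 'autumn', "
           "change_dt = '2011-07-07 10:00:00' where order_id = 1")
db.execute("insert into orders values (3, 'winter', '2011-07-07 10:00:00')")
delta_import(db, index, last_index_time)
print(len(index), index[1]["campaign_nm"])  # prints: 3 autumn
```

Timestamps are compared here as 'YYYY-MM-DD HH:MM:SS' strings, which sort correctly in that format; the Oracle deltaQuery does the equivalent with to_date(). The dict only avoids duplicates because every write uses the same key; in Solr that role belongs to the field named by <uniqueKey> in schema.xml.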
> >
> > The test I am running has two parts:
> >
> > 1. After I do a full import of the index, I insert a brand-new record
> >    (with a never-before-used order_id) in the database. The delta import
> >    picks this up just fine.
> >
> > 2. After the full import, I modify a record with an order_id that already
> >    shows up in the index. I have verified there is only one record with
> >    this order_id in both the index and the db before I do the delta update.
> >
> > I guess the question is, am I screwing myself up by defining my own Solr
> > index key? Ultimately, I want to be able to search on ORDER_ID in the Solr
> > index. However, the docs say (I think) that a field does not have to be
> > the Solr primary key in order to be searchable. Would I be better off
> > letting Solr manage the keys?
> >
> > Mark
> >
> > On Thu, Jul 7, 2011 at 9:24 AM, Bob Sandiford
> > <bob.sandif...@sirsidynix.com> wrote:
> >
> > > What are you using as the unique id in your Solr index? It sounds like
> > > you may have one value as your Solr index unique id, which bears no
> > > resemblance to a unique[1] id derived from your data...
> > >
> > > Or, to put it another way: what is it that makes these two records in
> > > your Solr index 'the same', and what are the unique ids for those two
> > > entries in the Solr index? How are those ids related to your original
> > > data?
> > >
> > > [1] Not only unique, but immutable. I.e., if you update a row in your
> > > database, the unique id derived from that row has to be the same as it
> > > would have been before the update. Otherwise, Solr has nothing to
> > > recognize as a duplicate entry, so it cannot do a 'delete' and 'insert'
> > > instead of just an 'insert'.
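Bob's footnote [1] can be made concrete with a toy model. This Python sketch assumes, as a simplification of Solr's behavior when a uniqueKey is defined, that adding a document first deletes any existing document with the same key value; `ToyIndex` and the content-derived key are hypothetical names used only for illustration.

```python
class ToyIndex:
    """Toy model of an index with a unique key: adding a document whose key
    already exists removes the old one first ('delete' then 'insert')."""

    def __init__(self, key_fn):
        self.key_fn = key_fn  # how the unique id is derived from a row
        self.docs = []

    def add(self, row):
        key = self.key_fn(row)
        # 'delete' any document that has the same unique id...
        self.docs = [d for d in self.docs if self.key_fn(d) != key]
        # ...then 'insert' the new version of the row.
        self.docs.append(dict(row))


before = {"order_id": 42, "campaign_nm": "spring"}
after = {"order_id": 42, "campaign_nm": "autumn"}  # same row, updated

# Immutable key: the database primary key.
stable = ToyIndex(key_fn=lambda r: r["order_id"])
# Hypothetical bad key: derived from a column that can change.
unstable = ToyIndex(key_fn=lambda r: f'{r["order_id"]}-{r["campaign_nm"]}')

for row in (before, after):
    stable.add(row)
    unstable.add(row)

print(len(stable.docs), len(unstable.docs))  # prints: 1 2
```

With the stable key the update replaces the old document; with the key that changes along with the row, the index ends up holding both versions of order 42, which matches the symptom described in this thread.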
> > >
> > > > -----Original Message-----
> > > > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
> > > > Sent: Thursday, July 07, 2011 9:15 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: updating existing data in index vs inserting new data in index
> > > >
> > > > Hello all
> > > >
> > > > I'm using Solr 3.2 and am confused about updating existing data in an
> > > > index.
> > > >
> > > > According to the DataImportHandler Wiki:
> > > >
> > > > "delta-import: For incremental imports and change detection run the
> > > > command http://<host>:<port>/solr/dataimport?command=delta-import . It
> > > > supports the same clean, commit, optimize and debug parameters as the
> > > > full-import command."
> > > >
> > > > I know delta-import will find new data in the database and insert it
> > > > into the index. My problem is how it handles updates: I've got a record
> > > > that exists in both the index and the database, the database record is
> > > > changed, and I want to incorporate those changes into the existing
> > > > record in the index. In other words, I don't want to insert it again.
> > > >
> > > > I've tried this and wound up with 2 records with the same key in the
> > > > index. The first contains the original db values found when the index
> > > > was created; the second contains the db values after the record was
> > > > changed.
> > > >
> > > > I've also found this thread, with the subject 'Delta-import with solrj
> > > > client':
> > > > http://search.lucidimagination.com/search/out?u=http%3A%2F%2Flucene.472066.n3.nabble.com%2FDelta-import-with-solrj-client-tp1085763p1086173.html
> > > >
> > > > "Greetings. I have a solrj client for fetching data from the database.
> > > > I am using delta-import for fetching data.
> > > > If a column is changed in the database, using a timestamp with
> > > > delta-import I get the latest column indexed, but there are duplicate
> > > > values in the index similar to the column, with older data. This works
> > > > if I clean the index, but I want to update the index without cleaning
> > > > it. Is there a way to just update the index with the updated column
> > > > without having duplicate values? I'd appreciate any feedback.
> > > >
> > > > Hando"
> > > >
> > > > There are 2 responses:
> > > >
> > > > "Short answer is no, there isn't a way. Solr doesn't have the concept
> > > > of 'Update' to an indexed document. You need to add the full document
> > > > (all 'columns') each time any one field changes. If doing that in your
> > > > DataImportHandler logic is difficult, you may need to write a separate
> > > > Update Service that does:
> > > >
> > > > 1) Read UniqueID, UpdatedColumn(s) from database
> > > > 2) Using UniqueID, retrieve document from Solr
> > > > 3) Add/Update field(s) with updated column(s)
> > > > 4) Add document back to Solr
> > > >
> > > > Although, if you use DIH to do a full import, using the same query in
> > > > your Delta-Import to get the whole document shouldn't be that
> > > > difficult."
> > > >
> > > > and
> > > >
> > > > "Hi,
> > > >
> > > > Make sure you use a proper "ID" field, which does *not* change even if
> > > > the content in the database changes. In this way, when your
> > > > delta-import fetches changed rows to index, they will update the
> > > > existing rows in your index."
> > > >
> > > > I have an ID field that doesn't change. It is the primary key field
> > > > from the database table I am trying to index, and I have verified it
> > > > is unique.
> > > >
> > > > So, does Solr allow updates (not inserts) of existing records? Is
> > > > anyone able to do this?
> > > >
> > > > Mark
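The four-step "Update Service" from the first quoted reply amounts to a read-modify-write loop. A minimal sketch, assuming both the changed database rows and the Solr index are modeled as in-memory Python dicts keyed by the unique id; `update_service`, `solr_index` and `changed_rows` are hypothetical names, and a real service would retrieve and re-add documents through Solr's HTTP interface rather than a dict.

```python
import copy


def update_service(changed_rows, solr_index):
    """Read-modify-write: since (per the thread) Solr cannot patch one field
    of an indexed document, the whole document is rebuilt and re-added."""
    for row in changed_rows:
        # 1) UniqueID plus the updated column(s), as read from the database.
        unique_id = row["order_id"]
        # 2) Retrieve the current document (an empty one if not indexed yet).
        doc = copy.deepcopy(solr_index.get(unique_id, {}))
        # 3) Add/update the changed field(s); untouched fields are kept.
        doc.update(row)
        # 4) Add the full document back under the same id, replacing the old.
        solr_index[unique_id] = doc


solr_index = {7: {"order_id": 7, "campaign_nm": "spring", "agency_id": 11}}
changed_rows = [{"order_id": 7, "campaign_nm": "autumn"}]  # changed column only
update_service(changed_rows, solr_index)
print(solr_index[7])
# prints: {'order_id': 7, 'campaign_nm': 'autumn', 'agency_id': 11}
```

Step 2 is what makes this safe: the document sent back in step 4 still carries every field (agency_id here), so replacing the old document by its unique id does not lose the columns that did not change.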