Actually, I requested .../dataimport?command=delta-import&commit=true, and DIH in delta-import mode still does not commit. Do you have any guess why?
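For anyone who wants to reproduce the request, here is a minimal sketch of how that URL can be assembled so the commit flag is always explicit. The base URL (http://localhost:8983/solr) and the helper name dih_url are placeholders of mine, not something from my setup; the parameter names (command, commit, optimize, clean) are the ones the DataImportHandler wiki text quoted further down lists.

```python
from urllib.parse import urlencode


def dih_url(base_url, command="delta-import", commit=True, optimize=False, clean=False):
    """Build a DataImportHandler request URL with every boolean flag spelled out.

    base_url is a hypothetical Solr root such as http://localhost:8983/solr.
    """
    params = {
        "command": command,
        # DIH expects lowercase "true"/"false" strings in the query string.
        "commit": str(commit).lower(),
        "optimize": str(optimize).lower(),
        "clean": str(clean).lower(),
    }
    return f"{base_url}/dataimport?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and port; substitute your own.
    print(dih_url("http://localhost:8983/solr"))
```

This only builds the URL; it does not send the request or check whether a commit actually happened.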

INFO: Starting Delta Import
Aug 14, 2011 1:42:02 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/apache-solr-3.3.0 path=/dataimport params={commit=true&command=delta-import} status=0 QTime=0
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Starting delta collection.
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: event
Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity event with URL: jdbc:mysql://85.168.123.207:3306/AGENDA
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 865
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: event rows obtained : 3
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: event rows obtained : 0
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: event
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
Aug 14, 2011 1:42:03 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:1.282

On Sun, Aug 14, 2011 at 1:39 AM, Alexandre Sompheng <asomph...@gmail.com> wrote:

> Hi Mark,
>
> I guess the "commit=true" when doing a "delta-import" is the solution for the JIRA issue I just submitted, SOLR-2711.
> Can you explain to me where you configured this commit=true?
>
> thanks,
> Alex
>
>
> On Thu, Jul 7, 2011 at 6:44 PM, Mark juszczec <mark.juszc...@gmail.com> wrote:
>
>> First thanks for all the help.
>>
>> I think the problem was a combination of not having a unique key defined AND not including the commit=true parameter in the delta update.
>>
>> Once I did those things, the delta import left me with a single (updated) copy of the record including the changes in the source database.
>>
>> Do I have write access to the Wiki so I can explicitly state commit=true NEEDS to be specified?
>>
>> Mark
>>
>>
>> On Thu, Jul 7, 2011 at 12:39 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> > I'd restart Solr after changing the schema.xml. The delta import does NOT require restart or anything else like that.....
>> >
>> > The fact that two records are displayed is not what I'd expect. But Solr absolutely handles the replace via <uniqueKey>. So I suspect that you're not actually doing what you expect. A little-known aid for debugging DIH is solr/admin/dataimport.jsp, that might give you some joy.
>> >
>> > But, to summarize. This should work fine for DIH as far as Solr is concerned assuming that <uniqueKey> is properly defined. In your query above that returns two documents, can you paste the entire response with &fl=* attached?
>> > I'm guessing that the data in your index isn't what you're expecting...
>> >
>> > Also, you might want to get a copy of Luke and examine your index, there's a wealth of information there.
>> >
>> >
>> > Best
>> > Erick
>> >
>> >
>> > On Thu, Jul 7, 2011 at 11:12 AM, Mark juszczec <mark.juszc...@gmail.com> wrote:
>> > > Erick
>> > >
>> > > I used to, but now I find I must have commented it out in a fit of rage ;-)
>> > >
>> > > This could be the whole problem.
>> > >
>> > > I have verified via admin schema browser that the field is ORDER_ID and will double check I refer to it in upper case in the appropriate places in the Solr config scheme.
>> > >
>> > > Curiously, the admin schema browser display for ORDER_ID says "hasDeletions: false" - which seems the opposite of what I want. I want to be able to delete duplicates. Or am I interpreting this field wrong?
>> > >
>> > > In order to check for duplicates, I am going to use the admin browser to enter the following in the Make A Query box:
>> > >
>> > > TABLE_ID:1 AND ORDER_ID:674659
>> > >
>> > > When I click search and view the results, 2 records are displayed. One has the original values, one has the changed values. I haven't examined the xml (via view source) too closely and the next time I run I will look for something indicating one of the records is inactive.
>> > >
>> > > When you say "change your schema" do you mean via a delta import or by modifying the config files or both? FWIW, I am deleting the index on the file system, doing a full import, modifying the data in the database and then doing a delta import.
>> > >
>> > > I am not restarting Solr at all in this process.
>> > >
>> > > I understand Solr does not perform key management. You described exactly what I meant. Sorry for any confusion.
>> > >
>> > > Mark
>> > >
>> > > On Thu, Jul 7, 2011 at 10:52 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>> > >
>> > >> Let me re-state a few things to see if I've got it right:
>> > >>
>> > >> > your schema.xml file has an entry like <uniqueKey>order_id</uniqueKey>, right?
>> > >>
>> > >> > given this definition, any document added with an order_id that already exists in the Solr index will be replaced. i.e. you should have one and only one document with a given order_id.
>> > >>
>> > >> > case matters. Check via the admin page ("schema browser") to see if you have two fields, order_id and ORDER_ID.
>> > >>
>> > >> > How are you checking that your docs are duplicates? If you do a search on order_id, you should get back one and only one document (assuming the definition above). A document that's deleted will just be marked as deleted, the data won't be purged from the index. It won't show in search results, but it will show if you use lower-level ways to access the data.
>> > >>
>> > >> > Whenever you change your schema, it's best to clean the index, restart the server and re-index from scratch. Solr won't retroactively remove duplicate <uniqueKey> entries.
>> > >>
>> > >> > On the stats admin/stats page you should see maxDocs and numDocs. The difference between these should be the number of deleted documents.
>> > >>
>> > >> > Solr doesn't "manage" unique keys. All that happens is Solr will replace any pre-existing documents where *you've* defined the <uniqueKey> when a new doc is added...
>> > >>
>> > >> Hope this helps
>> > >> Erick
>> > >>
>> > >> On Thu, Jul 7, 2011 at 10:16 AM, Mark juszczec <mark.juszc...@gmail.com> wrote:
>> > >> > Bob
>> > >> >
>> > >> > No, I don't. Let me look into that and post my results.
>> > >> >
>> > >> > Mark
>> > >> >
>> > >> >
>> > >> > On Thu, Jul 7, 2011 at 10:14 AM, Bob Sandiford <bob.sandif...@sirsidynix.com> wrote:
>> > >> >
>> > >> >> Hi, Mark.
>> > >> >>
>> > >> >> I haven't used DIH myself - so I'll need to leave comments on your set up to others who have done so.
>> > >> >>
>> > >> >> Another question - after your initial index create (and after each delta), do you run a 'commit'? Do you run an 'optimize'? (Without the optimize, 'deleted' records still show up in query results...)
>> > >> >>
>> > >> >> Bob Sandiford | Lead Software Engineer | SirsiDynix
>> > >> >> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
>> > >> >> www.sirsidynix.com
>> > >> >>
>> > >> >>
>> > >> >> > -----Original Message-----
>> > >> >> > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
>> > >> >> > Sent: Thursday, July 07, 2011 10:04 AM
>> > >> >> > To: solr-user@lucene.apache.org
>> > >> >> > Subject: Re: updating existing data in index vs inserting new data in index
>> > >> >> >
>> > >> >> > Bob
>> > >> >> >
>> > >> >> > Thanks very much for the reply!
>> > >> >> >
>> > >> >> > I am using a unique integer called order_id as the Solr index key.
>> > >> >> >
>> > >> >> > My query, deltaQuery and deltaImportQuery are below:
>> > >> >> >
>> > >> >> > <entity name="item1"
>> > >> >> >   pk="ORDER_ID"
>> > >> >> >   query="select 1 as TABLE_ID, orders.order_id, orders.order_booked_ind,
>> > >> >> >     orders.order_dt, orders.cancel_dt, orders.account_manager_id,
>> > >> >> >     orders.of_header_id, orders.order_status_lov_id, orders.order_type_id,
>> > >> >> >     orders.approved_discount_pct, orders.campaign_nm,
>> > >> >> >     orders.approved_by_cd, orders.advertiser_id, orders.agency_id
>> > >> >> >     from orders"
>> > >> >> >
>> > >> >> >   deltaImportQuery="select 1 as TABLE_ID, orders.order_id, orders.order_booked_ind,
>> > >> >> >     orders.order_dt, orders.cancel_dt, orders.account_manager_id,
>> > >> >> >     orders.of_header_id, orders.order_status_lov_id, orders.order_type_id,
>> > >> >> >     orders.approved_discount_pct, orders.campaign_nm,
>> > >> >> >     orders.approved_by_cd, orders.advertiser_id, orders.agency_id
>> > >> >> >     from orders where orders.order_id = '${dataimporter.delta.ORDER_ID}'"
>> > >> >> >
>> > >> >> >   deltaQuery="select orders.order_id from orders where orders.change_dt >
>> > >> >> >     to_date('${dataimporter.last_index_time}','YYYY-MM-DD HH24:MI:SS')"
>> > >> >> > >
>> > >> >> > </entity>
>> > >> >> >
>> > >> >> > The test I am running is two part:
>> > >> >> >
>> > >> >> > 1. After I do a full import of the index, I insert a brand new record (with a never existed before order_id) in the database. The delta import picks this up just fine.
>> > >> >> >
>> > >> >> > 2. After the full import, I modify a record with an order_id that already shows up in the index. I have verified there is only one record with this order_id in both the index and the db before I do the delta update.
>> > >> >> >
>> > >> >> > I guess the question is, am I screwing myself up by defining my own Solr index key? I want to, ultimately, be able to search on ORDER_ID in the Solr index. However, the docs say (I think) a field does not have to be the Solr primary key in order to be searchable. Would I be better off letting Solr manage the keys?
>> > >> >> >
>> > >> >> > Mark
>> > >> >> >
>> > >> >> > On Thu, Jul 7, 2011 at 9:24 AM, Bob Sandiford <bob.sandif...@sirsidynix.com> wrote:
>> > >> >> >
>> > >> >> > > What are you using as the unique id in your Solr index? It sounds like you may have one value as your Solr index unique id, which bears no resemblance to a unique[1] id derived from your data...
>> > >> >> > >
>> > >> >> > > Or - another way to put it - what is it that makes these two records in your Solr index 'the same', and what are the unique id's for those two entries in the Solr index? How are those id's related to your original data?
>> > >> >> > >
>> > >> >> > > [1] not only unique, but immutable. I.E. if you update a row in your database, the unique id derived from that row has to be the same as it would have been before the update. Otherwise, there's nothing for Solr to recognize as a duplicate entry, and do a 'delete' and 'insert' instead of just an 'insert'.
>> > >> >> > >
>> > >> >> > > Bob Sandiford | Lead Software Engineer | SirsiDynix
>> > >> >> > > P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
>> > >> >> > > www.sirsidynix.com
>> > >> >> > >
>> > >> >> > >
>> > >> >> > > > -----Original Message-----
>> > >> >> > > > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
>> > >> >> > > > Sent: Thursday, July 07, 2011 9:15 AM
>> > >> >> > > > To: solr-user@lucene.apache.org
>> > >> >> > > > Subject: updating existing data in index vs inserting new data in index
>> > >> >> > > >
>> > >> >> > > > Hello all
>> > >> >> > > >
>> > >> >> > > > I'm using Solr 3.2 and am confused about updating existing data in an index.
>> > >> >> > > >
>> > >> >> > > > According to the DataImportHandler Wiki:
>> > >> >> > > >
>> > >> >> > > > "*delta-import* : For incremental imports and change detection run the command `http://<host>:<port>/solr/dataimport?command=delta-import`. It supports the same clean, commit, optimize and debug parameters as full-import command."
>> > >> >> > > >
>> > >> >> > > > I know delta-import will find new data in the database and insert it into the index. My problem is how it handles updates where I've got a record that exists in the index and the database, the database record is changed and I want to incorporate those changes in the existing record in the index. IOW I don't want to insert it again.
>> > >> >> > > >
>> > >> >> > > > I've tried this and wound up with 2 records with the same key in the index. The first contains the original db values found when the index was created, the 2nd contains the db values after the record was changed.
>> > >> >> > > >
>> > >> >> > > > I've also found this
>> > >> >> > > >
>> > >> >> > > > http://search.lucidimagination.com/search/out?u=http%3A%2F%2Flucene.472066.n3.nabble.com%2FDelta-import-with-solrj-client-tp1085763p1086173.html
>> > >> >> > > >
>> > >> >> > > > the subject is 'Delta-import with solrj client'
>> > >> >> > > >
>> > >> >> > > > "Greetings. I have a *solrj* client for fetching data from database. I am using *delta*-*import* for fetching data. If a column is changed in database using timestamp with *delta*-*import* i get the latest column indexed but there are *duplicate* values in the index similar to the column but the data is older. This works with cleaning the index but i want to update the index without cleaning it. Is there a way to just update the index with the updated column without having *duplicate* values. Appreciate for any feedback.
>> > >> >> > > >
>> > >> >> > > > Hando"
>> > >> >> > > >
>> > >> >> > > > There are 2 responses:
>> > >> >> > > >
>> > >> >> > > > "Short answer is no, there isn't a way. *Solr* doesn't have the concept of 'Update' to an indexed document. You need to add the full document (all 'columns') each time any one field changes. If doing that in your DataImportHandler logic is difficult you may need to write a separate Update Service that does:
>> > >> >> > > >
>> > >> >> > > > 1) Read UniqueID, UpdatedColumn(s) from database
>> > >> >> > > > 2) Using UniqueID Retrieve document from *Solr*
>> > >> >> > > > 3) Add/Update field(s) with updated column(s)
>> > >> >> > > > 4) Add document back to *Solr*
>> > >> >> > > >
>> > >> >> > > > Although, if you use DIH to do a full *import*, using the same query in your *Delta*-*Import* to get the whole document shouldn't be that difficult."
>> > >> >> > > >
>> > >> >> > > > and
>> > >> >> > > >
>> > >> >> > > > "Hi,
>> > >> >> > > >
>> > >> >> > > > Make sure you use a proper "ID" field, which does *not* change even if the content in the database changes. In this way, when your *delta*-*import* fetches changed rows to index, they will update the existing rows in your index."
>> > >> >> > > >
>> > >> >> > > > I have an ID field that doesn't change. It is the primary key field from the database table I am trying to index and I have verified it is unique.
>> > >> >> > > >
>> > >> >> > > > So, does Solr allow updates (not inserts) of existing records? Is anyone able to do this?
>> > >> >> > > >
>> > >> >> > > > Mark
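
To make the replace-by-<uniqueKey> behaviour discussed in this thread concrete, here is a toy in-memory model. It is only a sketch of the semantics Erick describes (an add with an existing key marks the old document deleted and appends the new one), not Solr or Lucene code. The class name TinyIndex, the helper run_delta_scenario and the campaign_nm values are made up for illustration; the field names and the order id come from Mark's messages.

```python
class TinyIndex:
    """Toy model of add-with-uniqueKey semantics; not how Solr is implemented."""

    def __init__(self, unique_key=None):
        self.unique_key = unique_key  # None models a schema with no <uniqueKey>
        self.slots = []               # every document ever added, live or deleted

    def add(self, doc):
        if self.unique_key is not None:
            key = doc[self.unique_key]
            # Mark any live document with the same key as deleted (not purged).
            for slot in self.slots:
                if slot["live"] and slot["doc"][self.unique_key] == key:
                    slot["live"] = False
        self.slots.append({"doc": dict(doc), "live": True})

    def search(self, **criteria):
        # Only live documents are visible to searches.
        return [s["doc"] for s in self.slots
                if s["live"] and all(s["doc"].get(f) == v for f, v in criteria.items())]

    @property
    def num_docs(self):
        return sum(1 for s in self.slots if s["live"])

    @property
    def max_doc(self):
        return len(self.slots)


def run_delta_scenario(unique_key):
    """Index a record, then re-add it after a change, as a delta import would."""
    index = TinyIndex(unique_key=unique_key)
    index.add({"TABLE_ID": 1, "ORDER_ID": 674659, "campaign_nm": "original"})
    index.add({"TABLE_ID": 1, "ORDER_ID": 674659, "campaign_nm": "changed"})
    hits = index.search(TABLE_ID=1, ORDER_ID=674659)
    return len(hits), index.num_docs, index.max_doc


if __name__ == "__main__":
    print(run_delta_scenario(unique_key=None))        # (hits, num_docs, max_doc)
    print(run_delta_scenario(unique_key="ORDER_ID"))
```

With no unique key the second add simply appends, so the search returns both copies, which matches what Mark saw. With the key defined, only the changed copy is live, and the gap between max_doc and num_docs counts the replaced document.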