Actually, I requested .../dataimport?command=delta-import&commit=true
and DIH in delta-import mode still does not commit. Do you have any idea why?
The log from that request follows.


INFO: Starting Delta Import

Aug 14, 2011 1:42:02 AM org.apache.solr.core.SolrCore execute

INFO: [] webapp=/apache-solr-3.3.0 path=/dataimport
params={commit=true&command=delta-import} status=0 QTime=0

Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties

INFO: Read dataimport.properties

Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder
doDelta

INFO: Starting delta collection.

Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta

INFO: Running ModifiedRowKey() for Entity: event

Aug 14, 2011 1:42:02 AM org.apache.solr.handler.dataimport.JdbcDataSource$1
call

INFO: Creating a connection for entity event with URL: jdbc:mysql://
85.168.123.207:3306/AGENDA

Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.JdbcDataSource$1
call

INFO: Time taken for getConnection(): 865

Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta

INFO: Completed ModifiedRowKey for Entity: event rows obtained : 3

Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta

INFO: Completed DeletedRowKey for Entity: event rows obtained : 0

Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta

INFO: Completed parentDeltaQuery for Entity: event

Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder
doDelta

INFO: Delta Import completed successfully

Aug 14, 2011 1:42:03 AM org.apache.solr.update.processor.LogUpdateProcessor
finish

INFO: {} 0 0

Aug 14, 2011 1:42:03 AM org.apache.solr.handler.dataimport.DocBuilder
execute

INFO: Time taken = 0:0:1.282
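
For anyone trying to reproduce this, here is a minimal sketch of the requests involved, assuming a hypothetical host and port (only the webapp name /apache-solr-3.3.0 and the /dataimport path come from the log above). It only builds the URLs: the delta-import with commit=true, the DIH status command, and an explicit commit sent to the standard /update handler as a possible workaround. The helper names dih_url and commit_url are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host and port are assumptions, the webapp name is from the log.
BASE = "http://localhost:8080/apache-solr-3.3.0"


def dih_url(base, command, **params):
    """Build a DataImportHandler URL, e.g. .../dataimport?command=delta-import&commit=true."""
    query = {"command": command}
    query.update(params)
    return base + "/dataimport?" + urlencode(query)


def commit_url(base):
    """Build an explicit commit request against the update handler."""
    return base + "/update?" + urlencode({"commit": "true"})


# The request from this message, a status check, and a follow-up commit.
delta = dih_url(BASE, "delta-import", commit="true")
status = dih_url(BASE, "status")
commit = commit_url(BASE)

print(delta)
print(status)
print(commit)
```

Requesting the status URL after the import finishes, and then the commit URL if the changes are still not visible, is one way to tell whether the documents were never added or were added but not committed.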


On Sun, Aug 14, 2011 at 1:39 AM, Alexandre Sompheng <asomph...@gmail.com>wrote:

> Hi Mark,
>
> I guess adding "commit=true" when doing a "delta-import" is the solution for
> the JIRA issue I just submitted, SOLR-2711.
> Can you explain where you configured commit=true?
>
> thanks,
> Alex
>
>
> On Thu, Jul 7, 2011 at 6:44 PM, Mark juszczec <mark.juszc...@gmail.com>wrote:
>
>> First thanks for all the help.
>>
>> I think the problem was a combination of not having a unique key defined
>> AND
>> not including the commit=true parameter in the delta update.
>>
>> Once I did those things, the delta import left me with a single (updated)
>> copy of the record including the changes in the source database.
>>
>> Do I have write access to the Wiki so I can explicitly state commit=true
>> NEEDS to be specified?
>>
>> Mark
>>
>>
>> On Thu, Jul 7, 2011 at 12:39 PM, Erick Erickson <erickerick...@gmail.com
>> >wrote:
>>
>> > I'd restart Solr after changing the schema.xml. The delta import does NOT
>> > require restart or anything else like that...
>> >
>> > The fact that two records are displayed is not what I'd expect. But Solr
>> > absolutely handles the replace via <uniqueKey>. So I suspect that you're
>> > not actually doing what you expect. A little-known aid for debugging DIH
>> > is solr/admin/dataimport.jsp, that might give you some joy.
>> >
>> > But, to summarize. This should work fine for DIH as far as Solr is
>> > concerned
>> > assuming that <uniqueKey> is properly defined. In your query above that
>> > returns two documents, can you paste the entire response with &fl=*
>> > attached?
>> > I'm guessing that the data in your index isn't what you're expecting...
>> >
>> > Also, you might want to get a copy of Luke and examine your index; there's
>> > a wealth of information.
>> >
>> >
>> > Best
>> > Erick
>> >
>> >
>> > On Thu, Jul 7, 2011 at 11:12 AM, Mark juszczec <mark.juszc...@gmail.com
>> >
>> > wrote:
>> > > Erick
>> > >
>> > > I used to, but now I find I must have commented it out in a fit of
>> rage
>> > ;-)
>> > >
>> > > This could be the whole problem.
>> > >
>> > > I have verified via admin schema browser that the field is ORDER_ID
>> and
>> > will
>> > > double check I refer to it in upper case in the appropriate places in
>> the
>> > > Solr config scheme.
>> > >
>> > > Curiously, the admin schema browser display for ORDER_ID says
>> > "hasDeletions:
>> > > false"  - which seems the opposite of what I want.  I want to be able
>> to
>> > > delete duplicates.  Or am I interpreting this field wrong?
>> > >
>> > > In order to check for duplicates, I am using the admin browser to
>> > > enter the following in the Make A Query box:
>> > >
>> > > TABLE_ID:1 AND ORDER_ID:674659
>> > >
>> > > When I click search and view the results, 2 records are displayed.
>>  One
>> > has
>> > > the original values, one has the changed values.  I haven't examined
>> the
>> > xml
>> > > (via view source) too closely and the next time I run I will look for
>> > > something indicating one of the records is inactive.
>> > >
>> > > When you say "change your schema" do you mean via a delta import or by
>> > > modifying the config files or both?  FWIW, I am deleting the index on
>> the
>> > > file system, doing a full import, modifying the data in the database
>> and
>> > > then doing a delta import.
>> > >
>> > > I am not restarting Solr at all in this process.
>> > >
>> > > I understand Solr does not perform key management.  You described
>> exactly
>> > > what I meant.  Sorry for any confusion.
>> > >
>> > > Mark
>> > >
>> > > On Thu, Jul 7, 2011 at 10:52 AM, Erick Erickson <
>> erickerick...@gmail.com
>> > >wrote:
>> > >
>> > >> Let me re-state a few things to see if I've got it right:
>> > >>
>> > >> > your schema.xml file has an entry like
>> > <uniqueKey>order_id</uniqueKey>,
>> > >> right?
>> > >>
>> > >> > given this definition, any document added with an order_id that
>> > already
>> > >> exists in the
>> > >>   Solr index will be replaced. i.e. you should have one and only one
>> > >> document with a
>> > >>   given order_id.
>> > >>
>> > >> > case matters. Check via the admin page ("schema browser") to see if you
>> > >>   have two fields, order_id and ORDER_ID.
>> > >>
>> > >> > How are you checking that your docs are duplicates? If you do a
>> search
>> > on
>> > >>   order_id, you should get back one and only one document (assuming
>> the
>> > >>   definition above). A document that's deleted will just be marked as
>> > >> deleted,
>> > >>   the data won't be purged from the index. It won't show in search
>> > results,
>> > >> but
>> > >>   it will show if you use lower-level ways to access the data.
>> > >>
>> > >> > Whenever you change your schema, it's best to clean the index,
>> restart
>> > >> the server and
>> > >>    re-index from scratch. Solr won't retroactively remove duplicate
>> > >> <uniqueKey> entries.
>> > >>
>> > >> > On the stats admin/stats page you should see maxDocs and numDocs.
>> The
>> > >> difference
>> > >>   between these should be the number of deleted documents.
>> > >>
>> > >> > Solr doesn't "manage" unique keys. All that happens is Solr will
>> > replace
>> > >> any
>> > >>   pre-existing documents where *you've* defined the <uniqueKey> when
>> a
>> > >>   new doc is added...
>> > >>
>> > >> Hope this helps
>> > >> Erick
>> > >>
>> > >> On Thu, Jul 7, 2011 at 10:16 AM, Mark juszczec <
>> mark.juszc...@gmail.com
>> > >
>> > >> wrote:
>> > >> > Bob
>> > >> >
>> > >> > No, I don't.  Let me look into that and post my results.
>> > >> >
>> > >> > Mark
>> > >> >
>> > >> >
>> > >> > On Thu, Jul 7, 2011 at 10:14 AM, Bob Sandiford <
>> > >> bob.sandif...@sirsidynix.com
>> > >> >> wrote:
>> > >> >
>> > >> >> Hi, Mark.
>> > >> >>
>> > >> >> I haven't used DIH myself - so I'll need to leave comments on your
>> > set
>> > >> up
>> > >> >> to others who have done so.
>> > >> >>
>> > >> >> Another question - after your initial index create (and after each
>> > >> delta),
>> > >> >> do you run a 'commit'?  Do you run an 'optimize'?  (Without the
>> > >> optimize,
>> > >> >> 'deleted' records still show up in query results...)
>> > >> >>
>> > >> >> Bob Sandiford | Lead Software Engineer | SirsiDynix
>> > >> >> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
>> > >> >> www.sirsidynix.com
>> > >> >>
>> > >> >>
>> > >> >> > -----Original Message-----
>> > >> >> > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
>> > >> >> > Sent: Thursday, July 07, 2011 10:04 AM
>> > >> >> > To: solr-user@lucene.apache.org
>> > >> >> > Subject: Re: updating existing data in index vs inserting new
>> data
>> > in
>> > >> >> > index
>> > >> >> >
>> > >> >> > Bob
>> > >> >> >
>> > >> >> > Thanks very much for the reply!
>> > >> >> >
>> > >> >> > I am using a unique integer called order_id as the Solr index
>> key.
>> > >> >> >
>> > >> >> > My query, deltaQuery and deltaImportQuery are below:
>> > >> >> >
>> > >> >> > <entity name="item1"
>> > >> >> >   pk="ORDER_ID"
>> > >> >> >   query="select 1 as TABLE_ID, orders.order_id,
>> > >> >> >     orders.order_booked_ind, orders.order_dt, orders.cancel_dt,
>> > >> >> >     orders.account_manager_id, orders.of_header_id,
>> > >> >> >     orders.order_status_lov_id, orders.order_type_id,
>> > >> >> >     orders.approved_discount_pct, orders.campaign_nm,
>> > >> >> >     orders.approved_by_cd, orders.advertiser_id, orders.agency_id
>> > >> >> >     from orders"
>> > >> >> >
>> > >> >> >   deltaImportQuery="select 1 as TABLE_ID, orders.order_id,
>> > >> >> >     orders.order_booked_ind, orders.order_dt, orders.cancel_dt,
>> > >> >> >     orders.account_manager_id, orders.of_header_id,
>> > >> >> >     orders.order_status_lov_id, orders.order_type_id,
>> > >> >> >     orders.approved_discount_pct, orders.campaign_nm,
>> > >> >> >     orders.approved_by_cd, orders.advertiser_id, orders.agency_id
>> > >> >> >     from orders
>> > >> >> >     where orders.order_id = '${dataimporter.delta.ORDER_ID}'"
>> > >> >> >
>> > >> >> >   deltaQuery="select orders.order_id from orders
>> > >> >> >     where orders.change_dt >
>> > >> >> >     to_date('${dataimporter.last_index_time}','YYYY-MM-DD HH24:MI:SS')">
>> > >> >> > </entity>
>> > >> >> >
>> > >> >> > The test I am running is two part:
>> > >> >> >
>> > >> >> > 1.  After I do a full import of the index, I insert a brand new
>> > record
>> > >> >> > (with
>> > >> >> > a never existed before order_id) in the database.  The delta
>> import
>> > >> >> > picks
>> > >> >> > this up just fine.
>> > >> >> >
>> > >> >> > 2.  After the full import, I modify a record with an order_id
>> that
>> > >> >> > already
>> > >> >> > shows up in the index.  I have verified there is only one record
>> > with
>> > >> >> > this
>> > >> >> > order_id in both the index and the db before I do the delta
>> update.
>> > >> >> >
>> > >> >> > I guess the question is, am I screwing myself up by defining my
>> own
>> > >> Solr
>> > >> >> > index key?  I want to, ultimately, be able to search on ORDER_ID
>> in
>> > >> the
>> > >> >> > Solr
>> > >> >> > index.  However, the docs say (I think) a field does not have to
>> be
>> > >> the
>> > >> >> > Solr
>> > >> >> > primary key in order to be searchable.  Would I be better off
>> > letting
>> > >> >> > Solr
>> > >> >> > manage the keys?
>> > >> >> >
>> > >> >> > Mark
>> > >> >> >
>> > >> >> > On Thu, Jul 7, 2011 at 9:24 AM, Bob Sandiford
>> > >> >> > <bob.sandif...@sirsidynix.com>wrote:
>> > >> >> >
>> > >> >> > > What are you using as the unique id in your Solr index?  It
>> > sounds
>> > >> >> > like you
>> > >> >> > > may have one value as your Solr index unique id, which bears
>> no
>> > >> >> > resemblance
>> > >> >> > > to a unique[1] id derived from your data...
>> > >> >> > >
>> > >> >> > > Or - another way to put it - what is it that makes these two
>> > records
>> > >> >> > in
>> > >> >> > > your Solr index 'the same', and what are the unique id's for
>> > those
>> > >> two
>> > >> >> > > entries in the Solr index?  How are those id's related to your
>> > >> >> > original
>> > >> >> > > data?
>> > >> >> > >
>> > >> >> > > [1] not only unique, but immutable.  I.E. if you update a row
>> in
>> > >> your
>> > >> >> > > database, the unique id derived from that row has to be the
>> same
>> > as
>> > >> it
>> > >> >> > would
>> > >> >> > > have been before the update.  Otherwise, there's nothing for
>> Solr
>> > to
>> > >> >> > > recognize as a duplicate entry, and do a 'delete' and 'insert'
>> > >> instead
>> > >> >> > of
>> > >> >> > > just an 'insert'.
>> > >> >> > >
>> > >> >> > > Bob Sandiford | Lead Software Engineer | SirsiDynix
>> > >> >> > > P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
>> > >> >> > > www.sirsidynix.com
>> > >> >> > >
>> > >> >> > >
>> > >> >> > > > -----Original Message-----
>> > >> >> > > > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
>> > >> >> > > > Sent: Thursday, July 07, 2011 9:15 AM
>> > >> >> > > > To: solr-user@lucene.apache.org
>> > >> >> > > > Subject: updating existing data in index vs inserting new
>> data
>> > in
>> > >> >> > index
>> > >> >> > > >
>> > >> >> > > > Hello all
>> > >> >> > > >
>> > >> >> > > > I'm using Solr 3.2 and am confused about updating existing
>> data
>> > in
>> > >> >> > an
>> > >> >> > > > index.
>> > >> >> > > >
>> > >> >> > > > According to the DataImportHandler Wiki:
>> > >> >> > > >
>> > >> >> > > > *"delta-import* : For incremental imports and change detection run the
>> > >> >> > > > command `http://<host>:<port>/solr/dataimport?command=delta-import`. It
>> > >> >> > > > supports the same clean, commit, optimize and debug parameters as
>> > >> >> > > > full-import command."
>> > >> >> > > >
>> > >> >> > > > I know delta-import will find new data in the database and
>> > insert
>> > >> it
>> > >> >> > > > into
>> > >> >> > > > the index.  My problem is how it handles updates where I've
>> got
>> > a
>> > >> >> > record
>> > >> >> > > > that exists in the index and the database, the database
>> record
>> > is
>> > >> >> > > > changed
>> > >> >> > > > and I want to incorporate those changes in the existing
>> record
>> > in
>> > >> >> > the
>> > >> >> > > > index.
>> > >> >> > > >  IOW I don't want to insert it again.
>> > >> >> > > >
>> > >> >> > > > I've tried this and wound up with 2 records with the same
>> key
>> > in
>> > >> the
>> > >> >> > > > index.
>> > >> >> > > >  The first contains the original db values found when the
>> index
>> > >> was
>> > >> >> > > > created,
>> > >> >> > > > the 2nd contains the db values after the record was changed.
>> > >> >> > > >
>> > >> >> > > > I've also found this:
>> > >> >> > > >
>> > >> >> > > > http://search.lucidimagination.com/search/out?u=http%3A%2F%2Flucene.472066.n3.nabble.com%2FDelta-import-with-solrj-client-tp1085763p1086173.html
>> > >> >> > > >
>> > >> >> > > > The subject is 'Delta-import with solrj client':
>> > >> >> > > >
>> > >> >> > > > "Greetings. I have a *solrj* client for fetching data from
>> > >> database.
>> > >> >> > I
>> > >> >> > > > am
>> > >> >> > > > using *delta*-*import* for fetching data. If a column is
>> > changed
>> > >> in
>> > >> >> > > > database
>> > >> >> > > > using timestamp with *delta*-*import* i get the latest
>> column
>> > >> >> > indexed
>> > >> >> > > > but
>> > >> >> > > > there are *duplicate* values in the index similar to the
>> column
>> > >> but
>> > >> >> > the
>> > >> >> > > > data
>> > >> >> > > > is older. This works with cleaning the index but i want to
>> > update
>> > >> >> > the
>> > >> >> > > > index
>> > >> >> > > > without cleaning it. Is there a way to just update the index
>> > with
>> > >> >> > the
>> > >> >> > > > updated column without having *duplicate* values. Appreciate
>> > for
>> > >> any
>> > >> >> > > > feedback.
>> > >> >> > > >
>> > >> >> > > > Hando"
>> > >> >> > > >
>> > >> >> > > > There are 2 responses:
>> > >> >> > > >
>> > >> >> > > > "Short answer is no, there isn't a way. *Solr* doesn't have
>> the
>> > >> >> > concept
>> > >> >> > > > of
>> > >> >> > > > 'Update' to an indexed document. You need to add the full
>> > document
>> > >> >> > (all
>> > >> >> > > > 'columns') each time any one field changes. If doing that in
>> > your
>> > >> >> > > > DataImportHandler logic is difficult you may need to write a
>> > >> >> > separate
>> > >> >> > > > Update
>> > >> >> > > > Service that does:
>> > >> >> > > >
>> > >> >> > > > 1) Read UniqueID, UpdatedColumn(s)  from database
>> > >> >> > > > 2) Using UniqueID Retrieve document from *Solr*
>> > >> >> > > > 3) Add/Update field(s) with updated column(s)
>> > >> >> > > > 4) Add document back to *Solr*
>> > >> >> > > >
>> > >> >> > > > Although, if you use DIH to do a full *import*, using the
>> same
>> > >> query
>> > >> >> > in
>> > >> >> > > > your *Delta*-*Import* to get the whole document shouldn't be
>> > that
>> > >> >> > > > difficult."
>> > >> >> > > >
>> > >> >> > > > and
>> > >> >> > > >
>> > >> >> > > > "Hi,
>> > >> >> > > >
>> > >> >> > > > Make sure you use a proper "ID" field, which does *not*
>> change
>> > >> even
>> > >> >> > if
>> > >> >> > > > the
>> > >> >> > > > content in the database changes. In this way, when your
>> > >> >> > > > *delta*-*import* fetches
>> > >> >> > > > changed rows to index, they will update the existing rows in
>> > your
>> > >> >> > index.
>> > >> >> > > > "
>> > >> >> > > >
>> > >> >> > > > I have an ID field that doesn't change.  It is the primary
>> key
>> > >> field
>> > >> >> > > > from
>> > >> >> > > > the database table I am trying to index and I have verified
>> it
>> > is
>> > >> >> > > > unique.
>> > >> >> > > >
>> > >> >> > > > So, does Solr allow updates (not inserts) of existing
>> records?
>> >  Is
>> > >> >> > > > anyone
>> > >> >> > > > able to do this?
>> > >> >> > > >
>> > >> >> > > > Mark
>> > >> >> > >
>> > >> >> > >
>> > >> >>
>> > >> >>
>> > >> >
>> > >>
>> > >
>> >
>>
>
>
