The "failure to commit" bug with $deleteDocById can be fixed by applying patch SOLR-2492. The patch also partially fixes the "no updated stats" bug, in that it increments the count by 1 for every call to $deleteDocById and $deleteDocByQuery. Note that this can give inaccurate counts if the id given with $deleteDocById doesn't exist or is duplicated. It is also obviously not a complete fix for stats with $deleteDocByQuery, since that command would normally be used to delete more than one doc at a time.
The patch is for Trunk but it might work with 3.1 also. If not, it likely only needs minor tweaking. The jira ticket is here:
https://issues.apache.org/jira/browse/SOLR-2492

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Alexandre Rocco [mailto:alel...@gmail.com]
Sent: Wednesday, May 25, 2011 12:54 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH import and postImportDeleteQuery

Hi Ephraim,

Thank you so much for the input.
I was able to find your thread on the archives and got your solution to work.

In fact, when using $deleteDocById and $skipDoc it worked like a charm. This feature is very useful, it's a shame it's not properly documented.
The only downside is the one you mentioned, that the stats are not updated, so if I update 13 documents and delete 2, DIH would tell me that only 13 documents were processed. This is bad in my case because I check the end result to generate an error e-mail if needed.

You also mentioned that if the query contains only deletion records, a commit would not be automatically executed and it would be necessary to commit manually.

How can I commit manually via DIH? I was not able to find any references in the documentation.

Thanks!
Alexandre

On Wed, May 25, 2011 at 5:14 AM, Ephraim Ofir <ephra...@icq.com> wrote:

> Search the list for my post "DIH - deleting documents, high performance
> (delta) imports, and passing parameters" which shows my solution to a
> similar problem.
>
> Ephraim Ofir
>
> -----Original Message-----
> From: Alexandre Rocco [mailto:alel...@gmail.com]
> Sent: Tuesday, May 24, 2011 11:24 PM
> To: solr-user@lucene.apache.org
> Subject: DIH import and postImportDeleteQuery
>
> Guys,
>
> I am facing a situation in one of our projects where I need to perform a
> cleanup to remove some documents after we perform an update via DIH.
> The big issue right now comes from the fact that when we call the DIH
> with clean=false, the postImportDeleteQuery is not executed.
>
> My setup is currently arranged like this:
> - A SQL Server stored procedure that receives a parameter (specified in
>   the URL) and returns the records to be indexed
> - The procedure is able to return all the records (for a full-import) or
>   only the updated records (for a delta-import)
> - This procedure returns valid and deleted records; from this point
>   comes the need to run a postImportDeleteQuery to remove the deleted ones.
>
> Everything works fine when I run a full-import. I am always running with
> clean=true, and then the whole index is rebuilt.
> When I need to do an incremental update, the records are updated
> correctly, but the command to delete the other records is not executed.
>
> I've tried several combinations, with different results:
> - Running full-import with clean=false: the records are updated but the
>   ones that need to be deleted stay in the index
> - Running delta-import with clean=false: the records are updated but the
>   ones that need to be deleted stay in the index
> - Running delta-import with clean=true: all records are deleted from the
>   index and then only the records returned by the procedure are in the
>   index, except the deleted ones.
>
> I don't see any way to achieve my goal without changing the process
> that I use to obtain the data.
> Since this is a very complex stored procedure, with tons of joins and
> custom processing, I am trying everything to avoid messing with it.
>
> See below a copy of my data-config.xml file.
> I made it simpler omitting all the fields, since it's out of scope of
> the issue:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <dataConfig>
>   <dataSource type="JdbcDataSource"
>     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>     url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=password;responseBuffering=adaptive;"
>   />
>   <document>
>     <entity name="entity_one"
>       pk="entityid"
>       transformer="RegexTransformer"
>       query="EXEC some_stored_procedure ${dataimporter.request.someid}"
>       preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
>     >
>       <field column="field1" name="field1" splitBy=";" />
>       <field column="field2" name="field2" splitBy=";" />
>       <field column="field3" name="field3" splitBy=";" />
>     </entity>
>
>     <entity name="entity_two"
>       pk="entityid"
>       transformer="RegexTransformer"
>       query="EXEC someother_stored_procedure ${dataimporter.request.someotherid}"
>       preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
>     >
>       <field column="field1" name="field1" />
>       <field column="field2" name="field2" />
>       <field column="field3" name="field2" />
>     </entity>
>   </document>
> </dataConfig>
>
> Any ideas or pointers that might help on this one?
>
> Many thanks,
> Alexandre
>
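
For readers following the thread, the $deleteDocById / $skipDoc approach discussed in the replies might look roughly like the sketch below for the first entity. This is not the configuration anyone in the thread posted. It assumes the stored procedure returns a "status" column in which 1 marks a deleted record (inferred from the status:1 delete queries) and that "entityid" is the Solr uniqueKey. The function name markDeleted is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource"
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=password;responseBuffering=adaptive;" />

  <!-- Hypothetical ScriptTransformer function. Assumes each row has a
       "status" column (1 means deleted) and an "entityid" column that
       matches the Solr uniqueKey. -->
  <script><![CDATA[
    function markDeleted(row) {
      var status = row.get('status');
      if (status != null && String(status) == '1') {
        // Ask DIH to delete the indexed document with this id...
        row.put('$deleteDocById', row.get('entityid'));
        // ...and do not index the current row.
        row.put('$skipDoc', 'true');
      }
      return row;
    }
  ]]></script>

  <document>
    <!-- The script runs first, then RegexTransformer handles splitBy. -->
    <entity name="entity_one"
      pk="entityid"
      transformer="script:markDeleted,RegexTransformer"
      query="EXEC some_stored_procedure ${dataimporter.request.someid}">
      <field column="field1" name="field1" splitBy=";" />
      <field column="field2" name="field2" splitBy=";" />
      <field column="field3" name="field3" splitBy=";" />
    </entity>
  </document>
</dataConfig>
```

With a setup like this the import can be run with clean=false, and the pre/postImportDeleteQuery attributes are no longer needed for the cleanup. If a run contains only deletions and no commit is issued (the behaviour SOLR-2492 addresses), one way to commit manually is to send an explicit commit to the core's update handler, for example a request to /update?commit=true.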