Search the list for my post "DIH - deleting documents, high performance (delta) imports, and passing parameters", which shows my solution to a similar problem.
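Below is a rough sketch of one way to handle the deletes without touching the stored procedure. It is not necessarily the exact approach from the post above. The sketch makes four assumptions:
- your DIH version supports ScriptTransformer, which needs Java 6 scripting;
- it also supports the $deleteDocById and $skipDoc special commands;
- the procedure returns a status column set to 1 for deleted records;
- the entityid column holds the same value as the index's uniqueKey.

The function name markDeleted is made up for illustration.

```xml
<dataConfig>
  <!-- Assumes Java 6+ so DIH's ScriptTransformer can run JavaScript.
       markDeleted is a hypothetical function name. -->
  <script><![CDATA[
    function markDeleted(row) {
      // Assumption: the procedure returns status = 1 for deleted records
      var status = row.get('status');
      if (status != null && String(status) == '1') {
        // Ask DIH to delete the indexed document whose uniqueKey equals this row's entityid
        row.put('$deleteDocById', row.get('entityid'));
        // Skip this row so the deleted record is not added back
        row.put('$skipDoc', 'true');
      }
      return row;
    }
  ]]></script>
  <!-- dataSource element unchanged from the original config -->
  <document>
    <!-- RegexTransformer still runs first, then the script -->
    <entity name="entity_one"
            pk="entityid"
            transformer="RegexTransformer,script:markDeleted"
            query="EXEC some_stored_procedure ${dataimporter.request.someid}">
      <!-- field mappings unchanged -->
    </entity>
  </document>
</dataConfig>
```

With this setup, the delete is issued row by row while the import runs, so it should not depend on clean=true or on postImportDeleteQuery. The same transformer can be added to entity_two.

If a script transformer isn't an option, there is a cruder fallback. Run the import with clean=false, then post <delete><query>status:1</query></delete> followed by <commit/> to Solr's XML update handler. This only works if deleted records get indexed with status:1, as the current postImportDeleteQuery assumes.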
Ephraim Ofir

-----Original Message-----
From: Alexandre Rocco [mailto:alel...@gmail.com]
Sent: Tuesday, May 24, 2011 11:24 PM
To: solr-user@lucene.apache.org
Subject: DIH import and postImportDeleteQuery

Guys,

I am facing a situation in one of our projects where I need to perform a cleanup that removes some documents after we run an update via DIH.

The big issue right now is that when we call DIH with clean=false, the postImportDeleteQuery is not executed.

My setup is currently arranged like this:
- A SQL Server stored procedure receives a parameter (specified in the URL) and returns the records to be indexed.
- The procedure can return all the records (for a full-import) or only the updated records (for a delta-import).
- The procedure returns both valid and deleted records. That is why I need to run a postImportDeleteQuery to remove the deleted ones.

Everything works fine when I run a full-import. I always run it with clean=true, and the whole index is rebuilt. When I need to do an incremental update, the records are updated correctly, but the command to delete the other records is not executed.

I've tried several combinations, with different results:
- Running full-import with clean=false: the records are updated, but the ones that need to be deleted stay in the index.
- Running delta-import with clean=false: the records are updated, but the ones that need to be deleted stay in the index.
- Running delta-import with clean=true: all records are deleted from the index. Afterwards, only the records returned by the procedure are in the index, excluding the deleted ones.

I don't see any way to achieve my goal without changing the process I use to obtain the data. This is a very complex stored procedure, with tons of joins and custom processing, so I am trying everything to avoid messing with it.

See below a copy of my data-config.xml file.
I made it simpler by omitting all the real fields, since they're out of scope for this issue:

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=password;responseBuffering=adaptive;" />
  <document>
    <entity name="entity_one"
            pk="entityid"
            transformer="RegexTransformer"
            query="EXEC some_stored_procedure ${dataimporter.request.someid}"
            preImportDeleteQuery="status:1"
            postImportDeleteQuery="status:1">
      <field column="field1" name="field1" splitBy=";" />
      <field column="field2" name="field2" splitBy=";" />
      <field column="field3" name="field3" splitBy=";" />
    </entity>
    <entity name="entity_two"
            pk="entityid"
            transformer="RegexTransformer"
            query="EXEC someother_stored_procedure ${dataimporter.request.someotherid}"
            preImportDeleteQuery="status:1"
            postImportDeleteQuery="status:1">
      <field column="field1" name="field1" />
      <field column="field2" name="field2" />
      <field column="field3" name="field3" />
    </entity>
  </document>
</dataConfig>

Any ideas or pointers that might help with this one?

Many thanks,
Alexandre