Search the list for my post "DIH - deleting documents, high performance
(delta) imports, and passing parameters", which shows my solution to a
similar problem.
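
For readers without that post at hand, here is a rough sketch of one way
to delete documents during an import that runs with clean=false. It is
not copied from the post. It relies on DIH's special commands
$deleteDocById and $skipDoc, set from a ScriptTransformer so the stored
procedure itself stays untouched. The function name markDeleted is made
up for illustration, and the sketch assumes that the procedure returns a
"status" column where 1 marks a deleted record (as the delete queries in
the config below suggest) and that "entityid" is the schema's uniqueKey.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=password;responseBuffering=adaptive;" />

  <!-- Hypothetical helper. Assumptions: the row has a "status" column
       (1 = deleted) and "entityid" holds the uniqueKey value. -->
  <script><![CDATA[
    function markDeleted(row) {
      var status = row.get('status');
      if (status != null && String(status) == '1') {
        // Ask DIH to delete the already indexed document with this id...
        row.put('$deleteDocById', row.get('entityid'));
        // ...and do not add the current row to the index.
        row.put('$skipDoc', 'true');
      }
      return row;
    }
  ]]></script>

  <document>
    <entity name="entity_one"
            pk="entityid"
            transformer="script:markDeleted,RegexTransformer"
            query="EXEC some_stored_procedure ${dataimporter.request.someid}">
      <field column="field1" name="field1" splitBy=";" />
      <field column="field2" name="field2" splitBy=";" />
      <field column="field3" name="field3" splitBy=";" />
    </entity>
  </document>
</dataConfig>
```

With something like this in place, a full-import or delta-import run
with clean=false should remove the flagged documents as the rows stream
through, without depending on postImportDeleteQuery. The column names
may need adjusting to match the case the JDBC driver reports.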

Ephraim Ofir

-----Original Message-----
From: Alexandre Rocco [mailto:alel...@gmail.com] 
Sent: Tuesday, May 24, 2011 11:24 PM
To: solr-user@lucene.apache.org
Subject: DIH import and postImportDeleteQuery

Guys,

I am facing a situation in one of our projects where I need to perform a
cleanup to remove some documents after we perform an update via DIH.
The big issue right now is that when we call the DIH with clean=false,
the postImportDeleteQuery is not executed.

My setup is currently arranged like this:
- A SQL Server stored procedure that receives a parameter (specified in
the URL) and returns the records to be indexed
- The procedure is able to return all the records (for a full-import) or
only the updated records (for a delta-import)
- This procedure returns both valid and deleted records, hence the need
to run a postImportDeleteQuery to remove the deleted ones.

Everything works fine when I run a full-import: I always run it with
clean=true, and the whole index is rebuilt.
When I need to do an incremental update, the records are updated
correctly, but the command to delete the other records is not executed.

I've tried several combinations, with different results:
- Running full-import with clean=false: the records are updated but the
ones that need to be deleted stay in the index
- Running delta-import with clean=false: the records are updated but the
ones that need to be deleted stay in the index
- Running delta-import with clean=true: all records are deleted from the
index, and then only the records returned by the procedure are in the
index, except the deleted ones.

I don't see any way to achieve my goal without changing the process that
I use to obtain the data.
Since this is a very complex stored procedure, with tons of joins and
custom processing, I am trying everything to avoid messing with it.

See below a copy of my data-config.xml file. I made it simpler by
omitting the remaining fields, since they are out of scope for this
issue:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=password;responseBuffering=adaptive;"
/>
<document>
<entity name="entity_one"
pk="entityid"
transformer="RegexTransformer"
query="EXEC some_stored_procedure ${dataimporter.request.someid}"
preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
>
<field column="field1" name="field1" splitBy=";" />
<field column="field2" name="field2" splitBy=";" />
<field column="field3" name="field3" splitBy=";" />
</entity>

<entity name="entity_two"
pk="entityid"
transformer="RegexTransformer"
query="EXEC someother_stored_procedure ${dataimporter.request.someotherid}"
preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
>
<field column="field1" name="field1" />
<field column="field2" name="field2" />
<field column="field3" name="field3" />
</entity>
</document>
</dataConfig>

Any ideas or pointers that might help on this one?

Many thanks,
Alexandre
