Hi,

I use the DIH with an RDBMS datasource to index a large MySQL database with about 7 million entries. The full import works fine; in schema.xml I defined a uniqueKey field (which is of type 'text').
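Simplified, the relevant part of schema.xml looks roughly like this (the field name "id" just stands in for my real one):

  <field name="id" type="text" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>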

I run queries through the dismax query handler and get my results back as a PHP array.
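The handler is basically the stock dismax setup in solrconfig.xml, roughly like this (the qf fields are only placeholders here):

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">title description</str>
    </lst>
  </requestHandler>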

Now, since the database entries change every second, I use the delta-import features to a) delete documents from the index that have been deleted in the database (there's a table for deleted items) and b) update documents in the index that have changed since the last import (there's a last_modified column in a table for that).
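My data-config.xml is set up roughly like this (table, column and connection details are simplified placeholders; the real queries are longer):

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb"
                user="solr" password="secret"/>
    <document>
      <entity name="item" pk="id"
              query="SELECT id, title, description FROM items"
              deltaQuery="SELECT id FROM items
                          WHERE last_modified > '${dataimporter.last_index_time}'"
              deltaImportQuery="SELECT id, title, description FROM items
                                WHERE id = '${dataimporter.delta.id}'"
              deletedPkQuery="SELECT id FROM deleted_items
                              WHERE deleted_at > '${dataimporter.last_index_time}'"/>
    </document>
  </dataConfig>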

From my understanding, when I start a delta-import, the DIH runs the deletedPkQuery first and deletes the documents that should be deleted (identified by the uniqueKey field?). That seems to work: catalina.out logs, for example, "INFO: deleted from document to Solr: 1851010". Next comes the deltaQuery. That seems to work too: when the import finishes, a query returns the new database entries.
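I trigger the delta-import with the usual command (host and core path shortened here):

  http://localhost:8983/solr/dataimport?command=delta-import&commit=true

and check progress with command=status.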
But (and here comes the problem):
The dataimport status always says something like "Added / Changed x-hundred documents, deleted 0 documents" -> so no deletes? Every time I change an item in the database and run a delta-import afterwards, my next query returns that item *twice*. After the next change and the next delta-import, Solr returns *three* result documents, and so on. As I mentioned before, I get my search results as an array consisting of many arrays (= Solr documents) with the fields I set in schema.xml. After changing some documents and delta-indexing them, I get lots of identical arrays (even the uniqueKey field is absolutely identical).

I have read somewhere in the wiki that an update is a delete of the old document plus an add of the new one. I guess the problem is that something fails in the delete step, but I don't have a clue why.

Any ideas?

Thanks in advance
Chris
