Solr does not delete documents from the index, although the delta-import reports that it has deleted n documents. I'm using Solr 1.4.1.
The schema looks like this:

    <fields>
      <field name="uuid" type="string" indexed="true" stored="true" required="true" />
      <field name="type" type="int" indexed="true" stored="true" required="true" />
      <field name="blog_id" type="int" indexed="true" stored="true" />
      <field name="entry_id" type="int" indexed="false" stored="true" />
      <field name="content" type="textgen" indexed="true" stored="true" />
    </fields>
    <uniqueKey>uuid</uniqueKey>

Relevant fields from the database tables. The tables blogs and entries both have:

    Field     Type                 Null  Key  Default  Extra
    id        int(11)              NO    PRI  NULL     auto_increment
    modified  datetime             YES        NULL
    status    tinyint(1) unsigned  YES        NULL

The data import configuration:

    <?xml version="1.0" encoding="UTF-8" ?>
    <dataConfig>
      <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver".../>
      <document>
        <entity name="blog"
                pk="id"
                query="SELECT id,description,1 as type FROM blogs WHERE status=2"
                deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE status=2 AND id='${dataimporter.delta.id}'"
                deltaQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt; modified AND status=2"
                deletedPkQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt;= modified AND status=3"
                transformer="TemplateTransformer">
          <field column="uuid" name="uuid" template="blog-${blog.id}" />
          <field column="id" name="blog_id" />
          <field column="description" name="content" />
          <field column="type" name="type" />
        </entity>
        <entity name="entry"
                pk="id"
                query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
                deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND f.id='${dataimporter.delta.id}'"
                deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE '${dataimporter.last_index_time}' &lt; b.modified AND b.status=2"
                deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}' &lt; b.modified"
                transformer="HTMLStripTransformer,TemplateTransformer">
          <field column="uuid" name="uuid" template="entry-${entry.id}" />
          <field column="id" name="entry_id" />
          <field column="blog_id" name="blog_id" />
          <field column="content" name="content" stripHTML="true" />
          <field column="type" name="type" />
        </entity>
      </document>
    </dataConfig>

Full import and delta import work without problems when it comes to adding new documents to the index. But when a blog is deleted (its status is set to 3 in the database), Solr's report after the delta import is something like "Indexing completed. Added/Updated: 0 documents. Deleted 81 documents." The problem is that the documents are still found in the Solr index.

1. Mark the blog as deleted:

    UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;

2. Run delta-import, which reports:

    <str name="">
      Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
    </str>
    <str name="Committed">2010-11-17 13:00:50</str>
    <str name="Optimized">2010-11-17 13:00:50</str>

   So Solr says it has deleted the documents, and that the index was also committed and optimized after the operation.

3. Search: blog_id:26 still returns 1 document with type 1 (blog) and 80 documents with type 2 (entry).
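
For anyone who wants to reproduce steps 2 and 3, the two HTTP requests can be sketched roughly as below. This is only an illustration: the base URL `http://localhost:8983/solr` and the `/dataimport` handler path are assumed defaults and may differ in a given install, and the helper names `delta_import_url` and `search_url` are made up for this example. The search URL facets on `type` so the remaining blog and entry documents are counted separately.

```python
from urllib.parse import urlencode

# Assumed base URL; the actual host, port and core path may differ.
SOLR_BASE = "http://localhost:8983/solr"


def delta_import_url(base=SOLR_BASE, handler="/dataimport"):
    """Build the URL that triggers a delta-import with commit and optimize.

    The handler path is an assumption; use whatever name the
    DataImportHandler is registered under in solrconfig.xml.
    """
    params = {"command": "delta-import", "commit": "true", "optimize": "true"}
    return f"{base}{handler}?{urlencode(params)}"


def search_url(blog_id, base=SOLR_BASE):
    """Build the URL for the check in step 3.

    rows=0 returns only the hit count; faceting on `type` splits the
    count into blog (type 1) and entry (type 2) documents.
    """
    params = {
        "q": f"blog_id:{blog_id}",
        "rows": 0,
        "facet": "true",
        "facet.field": "type",
    }
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(delta_import_url())
    print(search_url(26))
```

Opening the first URL and then the second, after the import status shows as finished, should show whether the hit count for the deleted blog actually drops.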