(10/11/17 20:18), Matti Oinas wrote:
Solr does not delete documents from the index, although delta-import says
it has deleted n documents from the index. I'm using version 1.4.1.
The schema looks like this:
<fields>
<field name="uuid" type="string" indexed="true" stored="true"
required="true" />
<field name="type" type="int" indexed="true" stored="true"
required="true" />
<field name="blog_id" type="int" indexed="true" stored="true" />
<field name="entry_id" type="int" indexed="false" stored="true" />
<field name="content" type="textgen" indexed="true" stored="true" />
</fields>
<uniqueKey>uuid</uniqueKey>
Relevant fields from the database tables.
The blogs and entries tables both have:
Field: id
Type: int(11)
Null: NO
Key: PRI
Default: NULL
Extra: auto_increment
------------------------------------
Field: modified
Type: datetime
Null: YES
Key:
Default: NULL
Extra:
------------------------------------
Field: status
Type: tinyint(1) unsigned
Null: YES
Key:
Default: NULL
Extra:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver".../>
  <document>
    <entity name="blog"
            pk="id"
            query="SELECT id,description,1 as type FROM blogs
                   WHERE status=2"
            deltaImportQuery="SELECT id,description,1 as type FROM blogs
                   WHERE status=2 AND id='${dataimporter.delta.id}'"
            deltaQuery="SELECT id FROM blogs
                   WHERE '${dataimporter.last_index_time}' &lt; modified AND status=2"
            deletedPkQuery="SELECT id FROM blogs
                   WHERE '${dataimporter.last_index_time}' &lt;= modified AND status=3"
            transformer="TemplateTransformer">
      <field column="uuid" name="uuid" template="blog-${blog.id}" />
      <field column="id" name="blog_id" />
      <field column="description" name="content" />
      <field column="type" name="type" />
    </entity>
    <entity name="entry"
            pk="id"
            query="SELECT f.id as id,f.content,f.blog_id,2 as type
                   FROM entries f,blogs b
                   WHERE f.blog_id=b.id AND b.status=2"
            deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type
                   FROM entries f,blogs b
                   WHERE f.blog_id=b.id AND f.id='${dataimporter.delta.id}'"
            deltaQuery="SELECT f.id as id FROM entries f
                   JOIN blogs b ON b.id=f.blog_id
                   WHERE '${dataimporter.last_index_time}' &lt; b.modified AND b.status=2"
            deletedPkQuery="SELECT f.id as id FROM entries f
                   JOIN blogs b ON b.id=f.blog_id
                   WHERE b.status!=2 AND '${dataimporter.last_index_time}' &lt; b.modified"
            transformer="HTMLStripTransformer,TemplateTransformer">
      <field column="uuid" name="uuid" template="entry-${entry.id}" />
      <field column="id" name="entry_id" />
      <field column="blog_id" name="blog_id" />
      <field column="content" name="content" stripHTML="true" />
      <field column="type" name="type" />
    </entity>
  </document>
</dataConfig>
Full import and delta import work without problems when it comes to
adding new documents to the index, but when a blog is deleted (status is
set to 3 in the database), the Solr report after delta import is something
like "Indexing completed. Added/Updated: 0 documents. Deleted 81
documents.". The problem is that the documents are still found in the Solr
index.
1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;
2. delta-import =>
<str name="">
Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
</str>
<str name="Committed">2010-11-17 13:00:50</str>
<str name="Optimized">2010-11-17 13:00:50</str>
So Solr says it has deleted the documents and that the index was also
optimized and committed after the operation.
3. Search: blog_id:26 still returns 1 document with type 1 (blog) and
80 documents with type 2 (entry).
Hi Matti,
Can you see messages like "Completed DeletedRowKey for Entity"
followed by "Deleting document: ID-1" in your Solr log?
(sample messages from my Solr log)
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: product rows obtained : 2
:
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: OVEN-2
:
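If the messages do appear, it may be worth comparing the ID printed in the
"Deleting document" line with the uuid values in the index. This is only a
guess, not something confirmed in this thread: assuming DIH deletes by the raw
key value returned from deletedPkQuery (here the numeric id, e.g. 26) rather
than by the templated uuid (blog-26), the delete would match nothing. Under
that assumption, a sketch of the blog entity that returns the key in the same
form as the uniqueKey could look like this (CONCAT is MySQL's string function;
the other attributes are copied from the config above):

```xml
<!-- Sketch only: assumes the value returned by deletedPkQuery under the pk
     column name ("id") is passed unchanged to the delete-by-uniqueKey call. -->
<entity name="blog"
        pk="id"
        query="SELECT id,description,1 as type FROM blogs WHERE status=2"
        deltaImportQuery="SELECT id,description,1 as type FROM blogs
                          WHERE status=2 AND id='${dataimporter.delta.id}'"
        deltaQuery="SELECT id FROM blogs
                    WHERE '${dataimporter.last_index_time}' &lt; modified AND status=2"
        deletedPkQuery="SELECT CONCAT('blog-', id) AS id FROM blogs
                        WHERE '${dataimporter.last_index_time}' &lt;= modified AND status=3"
        transformer="TemplateTransformer">
  <!-- Only deletedPkQuery differs: it now yields e.g. 'blog-26' instead of 26. -->
  <field column="uuid" name="uuid" template="blog-${blog.id}" />
  <field column="id" name="blog_id" />
  <field column="description" name="content" />
  <field column="type" name="type" />
</entity>
```

The entry entity would need the same change with an 'entry-' prefix. If the
log already shows IDs such as blog-26, this assumption is wrong and the cause
lies elsewhere.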
If you cannot find these messages, I think some setting is incorrect
(though I couldn't find anything wrong in your data-config.xml...).
Koji
--
http://www.rondhuit.com/en/