(10/11/17 20:18), Matti Oinas wrote:
Solr does not delete documents from the index, although delta-import reports
that it has deleted n documents. I'm using version 1.4.1.

The schema looks like this:

  <fields>
     <field name="uuid" type="string" indexed="true" stored="true" required="true" />
     <field name="type" type="int" indexed="true" stored="true" required="true" />
     <field name="blog_id" type="int" indexed="true" stored="true" />
     <field name="entry_id" type="int" indexed="false" stored="true" />
     <field name="content" type="textgen" indexed="true" stored="true" />
  </fields>
  <uniqueKey>uuid</uniqueKey>


Relevant fields from database tables:

The tables blogs and entries both have these columns:

   Field: id
    Type: int(11)
    Null: NO
     Key: PRI
Default: NULL
   Extra: auto_increment
------------------------------------
   Field: modified
    Type: datetime
    Null: YES
     Key:
Default: NULL
   Extra:
------------------------------------
   Field: status
    Type: tinyint(1) unsigned
    Null: YES
     Key:
Default: NULL
   Extra:


<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
        <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver".../>
        <document>
                <entity name="blog"
                                pk="id"
                                query="SELECT id,description,1 as type FROM blogs WHERE status=2"
                                deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE status=2 AND id='${dataimporter.delta.id}'"
                                deltaQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt; modified AND status=2"
                                deletedPkQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt;= modified AND status=3"
                                transformer="TemplateTransformer">
                        <field column="uuid" name="uuid" template="blog-${blog.id}" />
                        <field column="id" name="blog_id" />
                        <field column="description" name="content" />
                        <field column="type" name="type" />
                </entity>
                <entity name="entry"
                                pk="id"
                                query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
                                deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND f.id='${dataimporter.delta.id}'"
                                deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE '${dataimporter.last_index_time}' &lt; b.modified AND b.status=2"
                                deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}' &lt; b.modified"
                                transformer="HTMLStripTransformer,TemplateTransformer">
                        <field column="uuid" name="uuid" template="entry-${entry.id}" />
                        <field column="id" name="entry_id" />
                        <field column="blog_id" name="blog_id" />
                        <field column="content" name="content" stripHTML="true" />
                        <field column="type" name="type" />
                </entity>
        </document>
</dataConfig>

Full import and delta import both work without problems when it comes to
adding new documents to the index. But when a blog is deleted (status is
set to 3 in the database), the Solr report after delta import is something
like "Indexing completed. Added/Updated: 0 documents. Deleted 81
documents.". The problem is that the documents are still found in the Solr
index.

1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;

2. delta-import =>

<str name="">
Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
</str>
<str name="Committed">2010-11-17 13:00:50</str>
<str name="Optimized">2010-11-17 13:00:50</str>

So Solr says it has deleted the documents and that the index was also
optimized and committed after the operation.

3. Search: the query blog_id:26 still returns 1 document with type 1 (blog)
and 80 documents with type 2 (entry).
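
The check in step 3 can be repeated before and after each delta-import with a
small script. This is only a sketch: the Solr URL below is a hypothetical
default (adjust host, port and path to your installation), and the function
names are made up for illustration. It asks Solr for the hit count only
(rows=0) and reads numFound from the standard XML response.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical Solr location; adjust host, port and path to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_count_url(query, base_url=SOLR_SELECT_URL):
    """Build a select URL that requests the hit count only (rows=0)."""
    params = urllib.parse.urlencode({"q": query, "rows": 0})
    return base_url + "?" + params


def parse_num_found(response_xml):
    """Read numFound from the <result> element of a Solr XML response."""
    root = ET.fromstring(response_xml)
    result = root.find("result")
    if result is None:
        raise ValueError("no <result> element in response")
    return int(result.get("numFound"))


def count_documents(query, base_url=SOLR_SELECT_URL):
    """Query a running Solr instance and return the number of matches."""
    with urllib.request.urlopen(build_count_url(query, base_url)) as resp:
        return parse_num_found(resp.read())
```

Calling count_documents("blog_id:26") before and after the delta-import shows
directly whether the reported deletes changed the number of matching documents.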


Hi Matti,

Do you see messages like "Completed DeletedRowKey for Entity" followed by
"Deleting document: ID-1" in your Solr log?

(sample messages from my Solr log)
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: product rows obtained : 2
  :
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: OVEN-2
  :

If you cannot find these messages, I think some setting is incorrect
(though I couldn't find anything wrong in your data-config.xml...).
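
To look for these messages without reading the whole log by hand, something
like the following could help. It is a rough sketch that assumes the log lines
have the same wording as the sample messages above; the function names and the
"blog-" / "entry-" prefixes (taken from the uuid templates in the posted
data-config.xml) are only illustrative. It collects the IDs that were logged
as deleted so they can be compared with the uuid values in the index, for
example blog-26. If a logged ID is a bare number such as 26, that delete would
not match any uuid value.

```python
import re

# Patterns based on the sample log messages in this thread (assumed wording).
DELETED_ROWS = re.compile(
    r"Completed DeletedRowKey for Entity: (\S+) rows obtained : (\d+)")
DELETED_DOC = re.compile(r"Deleting document: (\S+)")


def summarize_deletes(log_lines):
    """Return (rows obtained per entity, list of IDs logged as deleted)."""
    rows_per_entity = {}
    deleted_ids = []
    for line in log_lines:
        match = DELETED_ROWS.search(line)
        if match:
            rows_per_entity[match.group(1)] = int(match.group(2))
            continue
        match = DELETED_DOC.search(line)
        if match:
            deleted_ids.append(match.group(1))
    return rows_per_entity, deleted_ids


def ids_missing_prefix(deleted_ids, prefixes=("blog-", "entry-")):
    """Return the logged IDs that do not start with any expected prefix."""
    return [doc_id for doc_id in deleted_ids
            if not doc_id.startswith(tuple(prefixes))]
```

Passing an open log file to summarize_deletes works, since it only needs an
iterable of lines.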

Koji
--
http://www.rondhuit.com/en/
