Moser:
You may not need to resort to workarounds. There are two solutions: one
using delta-import and one using full-import.
Solution 1: using delta-import
If you want DIH to manage your deletes, there is also a deletedPkQuery
attribute. The config may look like this:
<entity name="posts"
        query="SELECT p.forumid, p.messageid, p.message
               FROM posts p, forums f WHERE f.forumid = p.forumid"
        deletedPkQuery="SELECT p.messageid FROM posts p, forums f
               WHERE f.forumid = p.forumid AND (p.deleted = true OR f.deleted = true)"/>
* I am assuming that p.messageid is the pk.
The deletedPkQuery is run at the beginning of the import, and the pks it
returns are used to delete the corresponding documents.
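To see why the OR in the deletedPkQuery needs parentheses, here is a small sketch that runs both variants against an in-memory SQLite database. The table layout follows the JIRA comment quoted below; the sample rows and the helper names (make_db, deleted_pks) are hypothetical, and SQLite stands in for MySQL, so booleans are written as 0/1.

```python
import sqlite3


def make_db():
    """Build hypothetical posts/forums rows in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE forums (forumid INTEGER, name TEXT, deleted BOOLEAN);
        CREATE TABLE posts (forumid INTEGER, messageid INTEGER,
                            deleted BOOLEAN, message TEXT);
        INSERT INTO forums VALUES (1, 'live forum', 0), (2, 'deleted forum', 1);
        INSERT INTO posts VALUES
            (1, 100, 0, 'live post'),
            (1, 101, 1, 'deleted post'),
            (2, 200, 0, 'post in a deleted forum');
    """)
    return conn


# OR grouped explicitly: a post is deleted if it or its own forum is deleted.
GROUPED = """
    SELECT p.messageid FROM posts p, forums f
    WHERE f.forumid = p.forumid AND (p.deleted = 1 OR f.deleted = 1)
"""

# No parentheses: parsed as (join AND p.deleted = 1) OR f.deleted = 1,
# so every post is paired with every deleted forum.
UNGROUPED = """
    SELECT p.messageid FROM posts p, forums f
    WHERE f.forumid = p.forumid AND p.deleted = 1 OR f.deleted = 1
"""


def deleted_pks(conn, sql):
    """Return the sorted pks that a deletedPkQuery-style query yields."""
    return sorted(row[0] for row in conn.execute(sql))


if __name__ == "__main__":
    conn = make_db()
    print(deleted_pks(conn, GROUPED))    # [101, 200]
    print(deleted_pks(conn, UNGROUPED))  # [100, 101, 101, 200]
```

Without the parentheses, AND binds tighter than OR, so the live post 100 comes back as "deleted" (it is paired with the deleted forum 2) and would be removed from the index.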
Solution 2: using full-import
The config may look like this. It will do a clean full import every time:
<entity name="posts"
        query="SELECT p.forumid, p.messageid,
               IF(p.deleted OR f.deleted, true, false) AS deleted, p.message
               FROM posts p, forums f WHERE f.forumid = p.forumid"/>
This adds a 'deleted' flag to each document.
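A minimal sketch of what the full-import query produces, again using SQLite instead of MySQL. SQLite has no IF() function, so the sketch swaps in the standard CASE expression; the sample rows and the make_db/build_docs helpers are hypothetical, and each dict only roughly approximates one Solr document.

```python
import sqlite3


def make_db():
    """Build hypothetical posts/forums rows in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE forums (forumid INTEGER, name TEXT, deleted BOOLEAN);
        CREATE TABLE posts (forumid INTEGER, messageid INTEGER,
                            deleted BOOLEAN, message TEXT);
        INSERT INTO forums VALUES (1, 'live forum', 0), (2, 'deleted forum', 1);
        INSERT INTO posts VALUES
            (1, 100, 0, 'live post'),
            (1, 101, 1, 'deleted post'),
            (2, 200, 0, 'post in a deleted forum');
    """)
    return conn


# SQLite equivalent of the MySQL IF(p.deleted OR f.deleted, true, false).
FULL_IMPORT_QUERY = """
    SELECT p.forumid, p.messageid,
           CASE WHEN p.deleted OR f.deleted THEN 1 ELSE 0 END AS deleted,
           p.message
    FROM posts p, forums f
    WHERE f.forumid = p.forumid
    ORDER BY p.messageid
"""


def build_docs(conn):
    """Turn each result row into a dict keyed by column name."""
    cur = conn.execute(FULL_IMPORT_QUERY)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur]


if __name__ == "__main__":
    for doc in build_docs(make_db()):
        print(doc["messageid"], doc["deleted"])
```

Every post is still indexed; posts that are deleted, or whose forum is deleted, simply carry deleted=1.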
If you wish to do incremental indexing, run the full-import command
with clean=false. This ensures that the index is not cleaned before
indexing.
<entity name="posts"
        query="SELECT p.forumid, p.messageid, p.message
               FROM posts p, forums f WHERE f.forumid = p.forumid
               AND p.last_modified > '${dataimporter.last_index_time}'"
        deletedPkQuery="SELECT p.messageid FROM posts p, forums f
               WHERE f.forumid = p.forumid AND (p.deleted = true OR f.deleted = true)"/>
I am assuming that you maintain a last_modified timestamp in the posts
table.
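As far as I know, DIH resolves ${dataimporter.last_index_time} by substituting it into the query text before the query reaches the database, which is why the value should be wrapped in quotes. The sketch below approximates that with a plain string replace (an assumption, not DIH's actual resolver) and runs the result against SQLite with hypothetical rows; the 'yyyy-MM-dd HH:mm:ss' timestamp format and all helper names are also assumptions.

```python
import sqlite3

PLACEHOLDER = "${dataimporter.last_index_time}"

# Incremental query with the variable wrapped in quotes so the
# substituted timestamp becomes a SQL string literal.
DELTA_QUERY = """
    SELECT p.forumid, p.messageid, p.message
    FROM posts p, forums f
    WHERE f.forumid = p.forumid
      AND p.last_modified > '${dataimporter.last_index_time}'
    ORDER BY p.messageid
"""


def make_db():
    """Hypothetical rows; last_modified stored as 'yyyy-MM-dd HH:mm:ss' text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE forums (forumid INTEGER, name TEXT, deleted BOOLEAN);
        CREATE TABLE posts (forumid INTEGER, messageid INTEGER, deleted BOOLEAN,
                            message TEXT, last_modified TEXT);
        INSERT INTO forums VALUES (1, 'general', 0);
        INSERT INTO posts VALUES
            (1, 100, 0, 'old post', '2008-05-01 10:00:00'),
            (1, 101, 0, 'new post', '2008-05-13 09:00:00');
    """)
    return conn


def resolve(query, last_index_time):
    """Naive textual substitution, standing in for DIH's variable resolution."""
    return query.replace(PLACEHOLDER, last_index_time)


def changed_ids(conn, last_index_time):
    """messageids of posts modified after the given last index time."""
    return [row[1] for row in conn.execute(resolve(DELTA_QUERY, last_index_time))]


def unquoted_fails(conn, last_index_time):
    """True if the same query without the quotes is rejected by the database."""
    unquoted = DELTA_QUERY.replace(f"'{PLACEHOLDER}'", PLACEHOLDER)
    try:
        conn.execute(resolve(unquoted, last_index_time))
    except sqlite3.Error:
        return True
    return False


if __name__ == "__main__":
    conn = make_db()
    print(changed_ids(conn, "2008-05-13 05:36:00"))     # [101]
    print(unquoted_fails(conn, "2008-05-13 05:36:00"))  # True
```

Unquoted, the substituted text becomes `p.last_modified > 2008-05-13 05:36:00`, which the database cannot parse as a timestamp.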
Note: a full-import may not be as expensive as you think. We do a
full import of 3 million docs in 20 minutes.
--Noble
On Tue, May 13, 2008 at 5:36 AM, Chris Moser (JIRA) <[EMAIL PROTECTED]> wrote:
>
>
> [
> https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596237#action_12596237
> ]
>
> Chris Moser commented on SOLR-469:
> ----------------------------------
>
> Hi Shalin,
>
> I'm indexing forums with Solr and have tables with a structure similar to
> this:
>
> {code}
> posts
> ------
> forumid int
> messageid int
> deleted boolean
> message text
>
> forums
> ------
> forumid int
> name text
> deleted boolean
>
> {code}
>
> The simplified data query I'm running goes like this:
>
> {code}
> SELECT
> p.forumid,
> p.messageid,
> IF (p.deleted OR f.deleted,true,false) as deleted,
> p.message
>
> FROM
> posts p, forums f
>
> WHERE
> f.forumid = p.forumid
> {code}
>
> The query checks to see if the post or the forum is deleted, and marks it in
> the index as deleted in either case (which is why I'm doing the join). The
> problem I'm running into is that the importer is running the WHERE clause
> like this:
>
> {code}
> WHERE
> f.forumid = p.forumid and forumid=123 and messageid=123456789
> {code}
>
> In this case, the _forumid=123_ part is ambiguous (forumid being in the
> posts and the forums table) so this causes a SQL error. So I added an
> additional attribute to the entity definition (pkTable) which prepends the
> _forumid=123_ with the pkTable value so it generates _pkTable.forumid=123_.
>
> Not sure if this is the best way to do it but it fixed the problem :)
>
> > Data Import RequestHandler
> > --------------------------
> >
> > Key: SOLR-469
> > URL: https://issues.apache.org/jira/browse/SOLR-469
> > Project: Solr
> > Issue Type: New Feature
> > Components: update
> > Affects Versions: 1.3
> > Reporter: Noble Paul
> > Assignee: Grant Ingersoll
> > Fix For: 1.3
> >
> > Attachments: SOLR-469-contrib.patch, SOLR-469.patch,
> SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch,
> SOLR-469.patch, SOLR-469.patch, SOLR-469.patch
> >
> >
> > We need a RequestHandler which can import data from a DB or other
> dataSources into the Solr index. Think of it as an advanced form of SqlUpload
> Plugin (SOLR-103).
> > The way it works is as follows.
> > * Provide a configuration file (xml) to the Handler which takes in the
> necessary SQL queries and mappings to a solr schema
> >     - It also takes in a properties file for the data source
> configuration
> > * Given the configuration it can also generate the solr schema.xml
> > * It is registered as a RequestHandler which can take two commands
> do-full-import, do-delta-import
> > - do-full-import - dumps all the data from the Database into
> the index (based on the SQL query in configuration)
> > - do-delta-import - dumps all the data that has changed since
> last import. (We assume a modified-timestamp column in tables)
> > * It provides an admin page
> > - where we can schedule it to be run automatically at regular
> intervals
> > - It shows the status of the Handler (idle, full-import,
> delta-import)
>
--
--Noble Paul