Great.  I wasn't aware of the other issue.  I linked the two issues to each
other in JIRA so people can find both in the future.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Alexandre Rocco [mailto:alel...@gmail.com] 
Sent: Wednesday, May 25, 2011 2:34 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH import and postImportDeleteQuery

Hi James,

Thanks for the heads up!
I am currently on version 1.4.1, so I can apply this patch and see if it
works.
I just need to assess whether it's best to apply the patch, or to check on
the backend system whether only delete requests were generated and, if so,
skip the call to DIH.

Previously, I found another open issue, created by Ephraim:
https://issues.apache.org/jira/browse/SOLR-2104

It's the same issue, but it hasn't had any updates yet.

Regards,
Alexandre

On Wed, May 25, 2011 at 3:17 PM, Dyer, James <james.d...@ingrambook.com> wrote:

> The "failure to commit" bug with $deleteDocById can be fixed by applying
> patch SOLR-2492.  This patch also partially fixes the "no updated stats" bug
> in that it increments the count by 1 for every call to $deleteDocById and
> $deleteDocByQuery.  Note that this might result in inaccurate counts if the
> id given with $deleteDocById doesn't exist or is duplicated.  Obviously this
> is not a complete fix for stats using $deleteDocByQuery, as this command
> would normally be used to delete more than one doc at a time.
>
> The patch is for Trunk but it might work with 3.1 also.  If not, it likely
> only needs minor tweaking.
>
> The jira ticket is here:  https://issues.apache.org/jira/browse/SOLR-2492
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Alexandre Rocco [mailto:alel...@gmail.com]
> Sent: Wednesday, May 25, 2011 12:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: DIH import and postImportDeleteQuery
>
> Hi Ephraim,
>
> Thank you so much for the input.
> I was able to find your thread on the archives and got your solution to
> work.
>
> In fact, when using $deleteDocById and $skipDoc it worked like a charm.
> This feature is very useful; it's a shame it's not properly documented.
> The only downside is the one you mentioned that the stats are not updated,
> so if I update 13 documents and delete 2, DIH would tell me that only 13
> documents were processed. This is bad in my case because I check the end
> result to generate an error e-mail if needed.
>
> You also mentioned that if the query contains only deletion records, a
> commit would not be automatically executed and it would be necessary to
> commit manually.
>
> How can I commit manually via DIH? I was not able to find any reference to
> this in the documentation.
>
> Thanks!
> Alexandre
>
> On Wed, May 25, 2011 at 5:14 AM, Ephraim Ofir <ephra...@icq.com> wrote:
>
> > Search the list for my post "DIH - deleting documents, high performance
> > (delta) imports, and passing parameters", which shows my solution to a
> > similar problem.
> >
> > Ephraim Ofir
> >
> > -----Original Message-----
> > From: Alexandre Rocco [mailto:alel...@gmail.com]
> > Sent: Tuesday, May 24, 2011 11:24 PM
> > To: solr-user@lucene.apache.org
> > Subject: DIH import and postImportDeleteQuery
> >
> > Guys,
> >
> > I am facing a situation in one of our projects where I need to perform a
> > cleanup to remove some documents after we perform an update via DIH.
> > The big issue right now is that when we call the DIH with clean=false,
> > the postImportDeleteQuery is not executed.
> >
> > My setup is currently arranged like this:
> > - A SQL Server stored procedure that receives a parameter (specified in
> >   the URL) and returns the records to be indexed
> > - The procedure is able to return all the records (for a full-import) or
> >   only the updated records (for a delta-import)
> > - This procedure returns both valid and deleted records, hence the need
> >   to run a postImportDeleteQuery to remove the deleted ones.
> >
> > Everything works fine when I run a full-import: I always run it with
> > clean=true, so the whole index is rebuilt.
> > When I need to do an incremental update, the records are updated
> > correctly, but the command to delete the other records is not executed.
> >
> > I've tried several combinations, with different results:
> > - Running full-import with clean=false: the records are updated but the
> >   ones that need to be deleted stay in the index
> > - Running delta-import with clean=false: the records are updated but the
> >   ones that need to be deleted stay in the index
> > - Running delta-import with clean=true: all records are deleted from the
> >   index, and then only the records returned by the procedure are in the
> >   index, except the deleted ones.
> >
> > I don't see any way to achieve my goal without changing the process that
> > I use to obtain the data.
> > Since this is a very complex stored procedure, with tons of joins and
> > custom processing, I am trying everything to avoid messing with it.
> >
> > See below a copy of my data-config.xml file. I simplified it by omitting
> > the full list of fields, since it is outside the scope of this issue:
> > <?xml version="1.0" encoding="UTF-8" ?>
> > <dataConfig>
> >   <dataSource type="JdbcDataSource"
> >               driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
> >               url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=password;responseBuffering=adaptive;" />
> >   <document>
> >     <entity name="entity_one"
> >             pk="entityid"
> >             transformer="RegexTransformer"
> >             query="EXEC some_stored_procedure ${dataimporter.request.someid}"
> >             preImportDeleteQuery="status:1"
> >             postImportDeleteQuery="status:1">
> >       <field column="field1" name="field1" splitBy=";" />
> >       <field column="field2" name="field2" splitBy=";" />
> >       <field column="field3" name="field3" splitBy=";" />
> >     </entity>
> >
> >     <entity name="entity_two"
> >             pk="entityid"
> >             transformer="RegexTransformer"
> >             query="EXEC someother_stored_procedure ${dataimporter.request.someotherid}"
> >             preImportDeleteQuery="status:1"
> >             postImportDeleteQuery="status:1">
> >       <field column="field1" name="field1" />
> >       <field column="field2" name="field2" />
> >       <field column="field3" name="field3" />
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > Any ideas or pointers that might help on this one?
> >
> > Many thanks,
> > Alexandre
> >
>
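
The $deleteDocById and $skipDoc approach discussed in this thread could look
roughly like the following in data-config.xml. This is only a minimal sketch,
not the configuration anyone in the thread actually used. It assumes that the
stored procedure returns a "status" column where 1 marks a deleted record
(guessed from the status:1 delete query in the original config), that
"entityid" is also the Solr uniqueKey, and that a ScriptTransformer is
available (it needs Java 6). The function name markDeleted is hypothetical.

```xml
<dataConfig>
  <!-- dataSource omitted here; it would be the same as in the original config -->
  <script><![CDATA[
    // Hypothetical helper: turns rows flagged as deleted into delete commands.
    function markDeleted(row) {
      // Assumption: a "status" column is returned and the value 1 means deleted.
      if (String(row.get('status')) == '1') {
        // Assumption: "entityid" holds the Solr uniqueKey value.
        row.put('$deleteDocById', row.get('entityid'));
        // Do not index the deleted row itself.
        row.put('$skipDoc', 'true');
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="entity_one"
            pk="entityid"
            transformer="script:markDeleted,RegexTransformer"
            query="EXEC some_stored_procedure ${dataimporter.request.someid}">
      <field column="field1" name="field1" splitBy=";" />
    </entity>
  </document>
</dataConfig>
```

With something like this the import can be run with clean=false and no
postImportDeleteQuery. As noted in the thread, a run that produces only
deletes may not commit automatically unless SOLR-2492 is applied; sending a
<commit/> message to the update handler after the import is one way to commit
manually. The reported stats may still not reflect the deletes.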
