I am guessing your Enterprise system deletes/updates tables in an RDBMS,
and your SOLR instance indexes that data. In addition, you have a
front-end interacting with both SOLR and the RDBMS. At the front-end level,
when a search sent to SOLR returns primary keys for data, you may
check your
How long does it take to build the entire index? Can you just rebuild it
from scratch every night? That would be the simplest.
Best
Erick
On Thu, Sep 25, 2008 at 12:48 PM, sundar shankar [EMAIL PROTECTED] wrote:
Hi,
We have an index of courses (about 4 million docs in prod) and we have a
This will cause the result counts to be wrong and the deleted docs
will stay in the search index forever.
Some approaches for incremental update:
* full sweep garbage collection: fetch every ID in the Solr DB and
check whether that exists in the source DB, then delete the ones
that don't exist.
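A minimal sketch of that sweep in Python. The two fetch functions are placeholder stand-ins: a real version would page through Solr results (e.g. requesting only the id field) and run a SQL SELECT against the source database, then issue a delete-by-id for each stale document.

```python
# Full-sweep garbage collection: delete Solr docs whose ID is no longer
# present in the source database. fetch_solr_ids/fetch_db_ids are
# placeholders for a real Solr query and a real SQL SELECT.

def fetch_solr_ids():
    return {"c1", "c2", "c3", "c4"}   # IDs currently in the index

def fetch_db_ids():
    return {"c1", "c3"}               # IDs still present in the RDBMS

def stale_ids():
    # Set difference: in Solr but gone from the database.
    return fetch_solr_ids() - fetch_db_ids()

if __name__ == "__main__":
    for doc_id in sorted(stale_ids()):
        # A real version would send Solr a <delete><id>...</id></delete>
        print("delete", doc_id)
```

The sweep is O(total ids) per run, so for a 4-million-doc index the ID fetch should be paged rather than loaded in one request.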
That should be: flag it in a boolean column. --wunder
On 9/25/08 11:51 AM, Walter Underwood [EMAIL PROTECTED] wrote:
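The deletion-flag suggestion above (never physically delete a row; flag it instead) can be sketched with an in-memory SQLite table. The table and column names here are made up for illustration: the indexer would push the flagged rows to Solr as deletes and the application would filter them out of queries.

```python
import sqlite3

# Soft delete: instead of removing rows, flag them in a boolean column
# so the incremental indexer can see (and propagate) the deletion.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses (id TEXT PRIMARY KEY, title TEXT, "
    "deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO courses (id, title) VALUES (?, ?)",
    [("c1", "Algebra"), ("c2", "Biology")],
)

# "Delete" a course by flagging it rather than removing the row.
conn.execute("UPDATE courses SET deleted = 1 WHERE id = ?", ("c2",))

def live_ids(conn):
    # What the front-end and the indexer's add/update pass should see.
    return [r[0] for r in conn.execute(
        "SELECT id FROM courses WHERE deleted = 0")]

def tombstone_ids(conn):
    # What the indexer should turn into Solr delete-by-id requests.
    return [r[0] for r in conn.execute(
        "SELECT id FROM courses WHERE deleted = 1")]
```

The trade-off versus a full sweep: deletions become visible to the indexer immediately, at the cost of keeping tombstone rows around (they can be purged once indexed).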
Great, thanks.
Date: Thu, 25 Sep 2008 11:54:32 -0700
Subject: Re: Best practice advice needed!
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Hi Fuad,
Since I don't have too much data (4 million docs) I don't have a master-slave setup
yet. How big a change would that be?
Date: Thu, 25 Sep 2008 10:08:51 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: Re: Best practice advice needed!
About web spiders: I simply use a last-modified timestamp field in
SOLR, and I expire items after 30 days. If an item was updated (timestamp
changed), it won't be deleted. If I delete it from the database, it will
be deleted from SOLR within 30 days. Spiders don't need
'transactional' updates.
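The expiry rule above is just a date comparison, and on the Solr side it maps to a single delete-by-query using date math. A small sketch, assuming a field named `last_modified` (the field name is hypothetical, not from the thread):

```python
from datetime import datetime, timedelta, timezone

EXPIRY_DAYS = 30

def expired(last_modified, now):
    # An item expires once its timestamp is more than 30 days old;
    # re-crawling (which refreshes the timestamp) keeps it alive.
    return now - last_modified > timedelta(days=EXPIRY_DAYS)

# The same rule expressed as a Solr delete-by-query with date math
# (assumes a date field called last_modified):
DELETE_QUERY = f"last_modified:[* TO NOW-{EXPIRY_DAYS}DAYS]"

now = datetime(2008, 9, 25, tzinfo=timezone.utc)
fresh = datetime(2008, 9, 20, tzinfo=timezone.utc)  # re-crawled recently
stale = datetime(2008, 8, 1, tzinfo=timezone.utc)   # removed, never re-crawled

print(expired(fresh, now), expired(stale, now))  # prints: False True
```

Running the delete-by-query once a day gives exactly the "gone within 30 days" behaviour described, with no per-document bookkeeping.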