About web spiders: I simply use a "last modified timestamp" field in
SOLR, and I expire items after 30 days. If an item was updated (its
timestamp changed), it won't be deleted. If I delete it from the
database, it will be deleted from SOLR within 30 days. Spiders don't
need 'transactional' updates.
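
A minimal sketch of the expiry step described above, assuming the timestamp is stored in a date field named `last_modified` (a hypothetical name, not given in this thread). It only builds the XML delete-by-query message; the message would still have to be POSTed to Solr's update handler and followed by a commit.

```python
from xml.sax.saxutils import escape


def build_expiry_delete(field: str = "last_modified", max_age_days: int = 30) -> str:
    """Return a Solr XML delete-by-query message for stale documents.

    The field name is an assumption; use whatever date field the schema defines.
    """
    # NOW-<n>DAYS is Solr date math; the open-ended range matches every
    # document whose timestamp is older than max_age_days.
    query = f"{field}:[* TO NOW-{max_age_days}DAYS]"
    return f"<delete><query>{escape(query)}</query></delete>"


if __name__ == "__main__":
    print(build_expiry_delete())
```

Because re-indexing a document refreshes its timestamp, anything the spider still sees falls outside the range and survives the delete.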
Hi Faud,
Since I don't have much data (4 million docs), I don't have a
master/slave setup yet. How big a change would that be?
> Date: Thu, 25 Sep 2008 10:08:51 -0700
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Best practice advice needed!
>
>
Great, thanks.
> Date: Thu, 25 Sep 2008 11:54:32 -0700
> Subject: Re: Best practice advice needed!
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
>
That should be "flag it in a boolean column". --wunder
On 9/25/08 11:51 AM, "Walter Underwood" <[EMAIL PROTECTED]> wrote:
This will cause the result counts to be wrong and the "deleted" docs
will stay in the search index forever.
Some approaches for incremental update:
* full sweep garbage collection: fetch every ID in the Solr DB and
check whether it exists in the source DB, then delete the ones
that don't exist.
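
The full sweep above could be sketched roughly as follows. This is an illustration only: it assumes both ID sets fit in memory and that the IDs have already been fetched from Solr and from the source DB (the fetching is not shown). The function names are hypothetical.

```python
from typing import Iterable, List
from xml.sax.saxutils import escape


def find_orphaned_ids(solr_ids: Iterable[str], source_ids: Iterable[str]) -> List[str]:
    """Return IDs that are in the index but no longer in the source DB."""
    # Set difference: everything Solr knows about minus everything that
    # still exists in the source table. Sorted for a stable result.
    return sorted(set(solr_ids) - set(source_ids))


def build_delete_by_id(ids: Iterable[str]) -> str:
    """Return a Solr XML delete message with one <id> element per document."""
    return "<delete>" + "".join(f"<id>{escape(i)}</id>" for i in ids) + "</delete>"


if __name__ == "__main__":
    orphans = find_orphaned_ids(["1", "2", "3"], ["2"])
    print(build_delete_by_id(orphans))
```

With 4 million documents the two ID sets are small enough to compare in memory; a much larger index would call for sorted, streamed comparison instead.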
How long does it take to build the entire index? Can you just rebuild it
from scratch every night? That would be the simplest.
Best
Erick
On Thu, Sep 25, 2008 at 12:48 PM, sundar shankar
<[EMAIL PROTECTED]> wrote:
> Hi,
> We have an index of courses (about 4 million docs in prod) and we have
I am guessing your Enterprise system deletes/updates tables in the
RDBMS, and your SOLR indexes that data. In addition, you have a
front end interacting with both SOLR and the RDBMS. At the front-end
level, when a search sent to SOLR returns primary keys for data, you
may check your data
Hi,
We have an index of courses (about 4 million docs in prod) and we have a
nightly job that picks up newly added courses and updates the index
accordingly. Another Enterprise system shares the same table and
could delete data from the table too.
I just want to know w