Re: Best practice advice needed!

2008-09-25 Thread Fuad Efendi
About web spiders: I simply use a "last modified timestamp" field in SOLR and expire items after 30 days. If an item was updated (timestamp changed), it won't be deleted. If I delete it from the database, it will be deleted from SOLR within 30 days. Spiders don't need 'transactional' updates.
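
One way the 30-day expiry could be expressed is as a Solr delete-by-query using date math. This is only a sketch: the field name `last_modified` and the helper `build_expiry_delete` are assumptions made for illustration, not something stated in the message.

```python
import xml.etree.ElementTree as ET


def build_expiry_delete(field: str = "last_modified", max_age_days: int = 30) -> str:
    """Build a Solr XML delete-by-query message for documents whose
    timestamp field is older than max_age_days.

    The field name is hypothetical; use whatever date field the schema defines.
    """
    if max_age_days <= 0:
        raise ValueError("max_age_days must be positive")
    delete = ET.Element("delete")
    query = ET.SubElement(delete, "query")
    # Solr date math: everything up to "now minus N days".
    query.text = f"{field}:[* TO NOW-{max_age_days}DAYS]"
    return ET.tostring(delete, encoding="unicode")


if __name__ == "__main__":
    print(build_expiry_delete())
```

The resulting message would be posted to the Solr update handler and followed by a commit. Documents that the spider re-indexed within the window carry a fresh timestamp and are not matched by the query.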

RE: Best practice advice needed!

2008-09-25 Thread sundar shankar
Hi Fuad, since I don't have too much data (4 million), I don't have a master/slave setup yet. How big a change would that be?

RE: Best practice advice needed!

2008-09-25 Thread sundar shankar
Great, thanks.

> That should be "flag it in a boolean column". --wunder

Re: Best practice advice needed!

2008-09-25 Thread Walter Underwood
That should be "flag it in a boolean column". --wunder

On 9/25/08 11:51 AM, "Walter Underwood" <[EMAIL PROTECTED]> wrote:
> This will cause the result counts to be wrong and the "deleted" docs
> will stay in the search index forever.
>
> Some approaches for incremental update:
>
> * full sweep

Re: Best practice advice needed!

2008-09-25 Thread Walter Underwood
This will cause the result counts to be wrong and the "deleted" docs will stay in the search index forever.

Some approaches for incremental update:

* full sweep garbage collection: fetch every ID in the Solr DB and check whether it exists in the source DB, then delete the ones that don't exist.
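
The full sweep approach can be sketched roughly as below. How the IDs are fetched from Solr and from the source DB, and how deletes are sent, is left to the caller: `delete_from_index` is a hypothetical callback, and the function names and batch size are assumptions for illustration.

```python
from typing import Callable, Iterable, List, Set


def find_stale_ids(index_ids: Iterable[str], source_ids: Iterable[str]) -> List[str]:
    """Return the IDs that are in the search index but no longer in the source DB."""
    source: Set[str] = set(source_ids)
    return sorted({doc_id for doc_id in index_ids if doc_id not in source})


def sweep(
    index_ids: Iterable[str],
    source_ids: Iterable[str],
    delete_from_index: Callable[[List[str]], None],
    batch_size: int = 1000,
) -> int:
    """Delete stale IDs from the index in batches; return how many were removed.

    delete_from_index is a hypothetical callback that issues the actual
    delete requests against the search index.
    """
    stale = find_stale_ids(index_ids, source_ids)
    for start in range(0, len(stale), batch_size):
        delete_from_index(stale[start:start + batch_size])
    return len(stale)


if __name__ == "__main__":
    removed: List[str] = []
    count = sweep(["1", "2", "3", "4"], ["2", "4", "5"], removed.extend)
    print(count, removed)
```

With around 4 million documents both ID sets should fit in memory as a plain set. For much larger collections, streaming two sorted ID lists and comparing them pairwise would avoid holding everything at once.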

Re: Best practice advice needed!

2008-09-25 Thread Erick Erickson
How long does it take to build the entire index? Can you just rebuild it from scratch every night? That would be the simplest.

Best
Erick

On Thu, Sep 25, 2008 at 12:48 PM, sundar shankar <[EMAIL PROTECTED]> wrote:
> Hi,
> We have an index of courses (about 4 million docs in prod) and we have

Re: Best practice advice needed!

2008-09-25 Thread Fuad Efendi
I am guessing your Enterprise system deletes/updates tables in the RDBMS, and your SOLR indexes that data. In addition, you have a front-end interacting with both SOLR and the RDBMS. At the front-end level, when a search sent to SOLR returns primary keys for data, you may check your data
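
The message is cut off here, so the following is only one reading of the suggestion: filter the primary keys returned by a search against the RDBMS before displaying them. The table name `courses`, the key column `id`, and the helper `filter_live_ids` are assumptions for illustration, and SQLite stands in for whatever database is actually used.

```python
import sqlite3
from typing import List, Sequence


def filter_live_ids(
    conn: sqlite3.Connection,
    hit_ids: Sequence[int],
    table: str = "courses",
    key: str = "id",
) -> List[int]:
    """Keep only the search hits whose primary key still exists in the database.

    table and key are hypothetical, trusted constants (never user input),
    which is why they are interpolated into the SQL text.
    """
    if not hit_ids:
        return []
    placeholders = ",".join("?" * len(hit_ids))
    sql = f"SELECT {key} FROM {table} WHERE {key} IN ({placeholders})"
    live = {row[0] for row in conn.execute(sql, list(hit_ids))}
    # Preserve the ranking order returned by the search engine.
    return [doc_id for doc_id in hit_ids if doc_id in live]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO courses (id) VALUES (?)", [(1,), (3,)])
    print(filter_live_ids(conn, [3, 2, 1]))
```

A filter like this hides deleted rows from the user but does not remove them from the index, so hit counts reported by the search engine would still include them.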

Best practice advice needed!

2008-09-25 Thread sundar shankar
Hi, We have an index of courses (about 4 million docs in prod) and we have a nightly that would pick up newly added courses and update the index accordingly. There is another Enterprise system that shares the same table and that could delete data from the table too. I just want to know w