I would be curious, then, how the Google architecture works, given that it 
seems to combine search and database concepts and that the Adam Bosworth talk 
seems to imply a replicated, redundant architecture like Solr's.  Does a faster 
method of loading or updating the IndexSearcher make sense for Lucene?  Or 
should we just assume the Google architecture is a lot more complex?

----- Original Message ----
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-dev@lucene.apache.org; jason rutherglen <[EMAIL PROTECTED]>
Sent: Tuesday, April 25, 2006 3:21:07 PM
Subject: Re: GData

On 4/25/06, jason rutherglen <[EMAIL PROTECTED]> wrote:
> Ok, assuming Google is using the GData architecture to store the GCalendar 
> data, how long do you think a write takes to show up on the GCalendar web 
> site?  I think in this case something other than rsync may be a better 
> option.

rsync is just used as a replication transport, and I don't think it's
the limiting factor.

Opening a new IndexSearcher in Lucene is a relatively expensive
operation, especially when you factor in populating the fieldCache and
field norms.  You shouldn't be doing it too often (once a minute, maybe).
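
As a rough illustration of that throttling idea, here is a minimal sketch,
not Solr's actual implementation.  The class name ThrottledSearcherHolder,
its methods, and the injected clock are all hypothetical; the "opener"
stands in for whatever code opens and warms a new IndexSearcher.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Lucene or Solr): hands out one shared
 * searcher and reopens it at most once per minimum interval.
 */
final class ThrottledSearcherHolder<S> {
    private final Supplier<S> opener;     // assumed expensive: open searcher, warm caches
    private final long minIntervalMillis; // e.g. 60_000 for "once a minute"
    private final LongSupplier clock;     // injected so the timing can be tested
    private S current;
    private long lastOpened;
    private boolean dirty;                // true once the index changed since the last open

    ThrottledSearcherHolder(Supplier<S> opener, long minIntervalMillis, LongSupplier clock) {
        this.opener = opener;
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
        this.current = opener.get();
        this.lastOpened = clock.getAsLong();
    }

    /** Call after the index changes, e.g. when new segments have been pulled. */
    synchronized void markUpdated() {
        dirty = true;
    }

    /** Returns the searcher to use; reopens only if dirty and the interval has elapsed. */
    synchronized S get() {
        long now = clock.getAsLong();
        if (dirty && now - lastOpened >= minIntervalMillis) {
            // A real version would also release the old searcher once
            // in-flight searches against it have finished.
            current = opener.get();
            lastOpened = now;
            dirty = false;
        }
        return current;
    }
}
```

With an interval of 60000 ms, updates marked in between are picked up by
the first get() after the minute has passed, so a write can take up to
that long to become visible.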

If updates need to be immediately visible in conjunction with a high
update rate, a database is a better solution.

For Solr, I'd solve GData for the single-server case first, then go
about figuring out replication requirements.



> ----- Original Message ----
> From: Yonik Seeley <[EMAIL PROTECTED]>
> To: solr-dev@lucene.apache.org; jason rutherglen <[EMAIL PROTECTED]>
> Sent: Tuesday, April 25, 2006 12:42:58 PM
> Subject: Re: GData
>
> On 4/25/06, jason rutherglen <[EMAIL PROTECTED]> wrote:
> > Here is a good blog entry with a talk on GData from someone who worked on 
> > it.  The only thing I think Solr needs is faster replication, which could 
> > perhaps be achieved with a direct replication model that transfers the 
> > segment files over HTTP instead of rsync?
>
> rsync should be very fast if you configure it to not checksum the
> files, and just go by timestamp and size.  It will only transfer the
> changed segments.  We get very good performance with this model.
>
> > Reserving rsync for the optimized index sync.  The only other thing GData 
> > does is versioning of the documents.
>
> Hmmm, that might require some thought...  I guess it depends on what
> GData allows you to do with the different versions.
>
> -Yonik


--
-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


