I would be curious, then, how the Google architecture works, given that it seems to combine search and database concepts, and the Adam Bosworth talk seems to imply a replicated, redundant architecture like Solr's. Does a faster method of loading or updating the IndexSearcher make sense for Lucene? Or should we just assume the Google architecture is a lot more complex?

----- Original Message ----
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-dev@lucene.apache.org; jason rutherglen <[EMAIL PROTECTED]>
Sent: Tuesday, April 25, 2006 3:21:07 PM
Subject: Re: GData

On 4/25/06, jason rutherglen <[EMAIL PROTECTED]> wrote:
> Ok, if Google is using the GData architecture to store the GCalendar data,
> assuming they are, how long do you think a write takes to show up on the
> GCalendar web site? I think in this case something other than rsync may be a
> better option.

rsync is just used as a replication transport, and I don't think it's
the limiting factor.

Opening a new IndexSearcher in Lucene is a relatively expensive
operation, esp when you factor in populating the fieldCache and field
norms. You shouldn't be doing it too often (once a minute maybe). If
updates need to be immediately visible in conjunction with a high
update rate, a database is a better solution.

For Solr, I'd solve GData for the single-server case first, then go
about figuring out replication requirements.

> ----- Original Message ----
> From: Yonik Seeley <[EMAIL PROTECTED]>
> To: solr-dev@lucene.apache.org; jason rutherglen <[EMAIL PROTECTED]>
> Sent: Tuesday, April 25, 2006 12:42:58 PM
> Subject: Re: GData
>
> On 4/25/06, jason rutherglen <[EMAIL PROTECTED]> wrote:
> > Here is a good blog entry with a talk on GData from someone who worked on
> > it. The only thing I think Solr needs is faster replication, which perhaps
> > can be done faster using a direct replication model, preferably over HTTP
> > of the segments files instead of rsync?
>
> rsync should be very fast if you configure it to not checksum the
> files, and just go by timestamp and size. It will only transfer the
> changed segments. We get very good performance with this model.
>
> > Reserving rsync for the optimized index sync. The only other thing GData
> > does is versioning of the documents.
>
> Hmmm, that might require some thought... I guess it depends on what
> GData allows you to do with the different versions.
>
> -Yonik

--
-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
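
To illustrate the point in the thread that opening a new IndexSearcher is expensive and should happen only occasionally (once a minute maybe), here is a minimal sketch of throttled reopening. It deliberately does not call Lucene or Solr: `ThrottledReopener`, `markChanged`, and `acquire` are hypothetical names, and the `opener` supplier only stands in for whatever actually opens a searcher. Treat it as one possible shape under those assumptions, not as how Solr does it.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Lucene or Solr): hands out one shared
 * "searcher" and replaces it at most once per minimum interval.
 */
final class ThrottledReopener<S> {
    private final Supplier<S> opener;       // assumption: the expensive "open a new searcher" step
    private final LongSupplier clockMillis; // injectable clock so the behavior can be tested
    private final long minIntervalMillis;
    private S current;
    private long lastOpenedAt;
    private boolean dirty;

    ThrottledReopener(Supplier<S> opener, long minIntervalMillis, LongSupplier clockMillis) {
        this.opener = opener;
        this.minIntervalMillis = minIntervalMillis;
        this.clockMillis = clockMillis;
        this.current = opener.get();
        this.lastOpenedAt = clockMillis.getAsLong();
    }

    /** Records that the index changed; does not reopen anything by itself. */
    synchronized void markChanged() {
        dirty = true;
    }

    /** Returns the shared searcher, reopening only if the index changed and the interval has elapsed. */
    synchronized S acquire() {
        long now = clockMillis.getAsLong();
        if (dirty && now - lastOpenedAt >= minIntervalMillis) {
            current = opener.get();
            lastOpenedAt = now;
            dirty = false;
        }
        return current;
    }
}
```

With a 60,000 ms interval, a write that arrives right after a reopen stays invisible to searches for up to a minute. That is the trade-off described above: if updates must be visible immediately at a high update rate, this pattern does not help.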