Re: GData

Doug Cutting Tue, 25 Apr 2006 21:05:17 -0700

jason rutherglen wrote:

Ah ok, think I found it: org.apache.nutch.indexer.FsDirectory no?


Couldn't this be used in Solr and distribute all the data rather than 
master/slave it?

It's possible to search a Lucene index that lives in Hadoop's DFS, but not recommended. It's very slow. It's much faster to copy the index to a local drive.

The rsync approach, of only transmitting index diffs, is a very efficient way to distribute an index. In particular, it supports scaling the number of *readers* well.

For read/write stuff (e.g. a calendar) such scaling might not be paramount. Rather, you might be happy to route all requests for a particular calendar to a particular server. The index/database could still be somehow replicated/synced, in case that server dies, but a single server can probably handle all requests for a particular index/database. And keeping things coherent is much simpler in this case.
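
As a rough sketch of the routing idea above (not anything Solr or Nutch provides): hash the calendar id to pick one server, and fall through to the next server in the list if that one is down. The server names, the `route` function, and the assumption that the next server in the list holds a synced replica are all hypothetical, for illustration only.

```python
import hashlib

# Hypothetical server names, for illustration only.
SERVERS = ["server-a", "server-b", "server-c"]


def _bucket(calendar_id, n):
    """Map an id to a stable bucket in [0, n) using a process-independent hash."""
    digest = hashlib.sha1(calendar_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def route(calendar_id, servers=SERVERS, down=frozenset()):
    """Return the server that should handle all requests for calendar_id.

    The primary is chosen by hash. If it is listed in `down`, the next
    server in the list is tried; that server is assumed (hypothetically)
    to hold a replicated/synced copy of the index.
    """
    start = _bucket(calendar_id, len(servers))
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if candidate not in down:
            return candidate
    raise RuntimeError("no server available for " + calendar_id)
```

Every request for the same calendar lands on the same server while it is up, so reads and writes for that calendar see one copy and coherence stays simple. Passing `down={primary}` shows the fallback when that server dies.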


Doug
