jason rutherglen wrote:
Ah ok, think I found it: org.apache.nutch.indexer.FsDirectory no?

Couldn't this be used in Solr and distribute all the data rather than 
master/slave it?

It's possible to search a Lucene index that lives in Hadoop's DFS, but it's not recommended because it's very slow. It's much faster to copy the index to a local drive and search it there.

The rsync approach, of only transmitting index diffs, is a very efficient way to distribute an index. In particular, it supports scaling the number of *readers* well.
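To illustrate why transmitting only diffs is cheap, here is a rough sketch of the idea, not of rsync itself (rsync also uses timestamps and rolling checksums). It assumes each index snapshot can be described as a map from file name to size; the file names and sizes below are made up for illustration. Because Lucene writes new segments as new files rather than modifying old ones, a replica that is one update behind typically needs only the newest files.

```python
def files_to_transfer(master, replica):
    """Return names of master files that are missing on the replica
    or whose size differs from the replica's copy."""
    return sorted(
        name for name, size in master.items()
        if replica.get(name) != size
    )


# Hypothetical snapshots: file name -> size in bytes.
replica = {"_1.cfs": 4_000_000, "_2.cfs": 1_200_000, "segments_3": 120}
master = {
    "_1.cfs": 4_000_000,
    "_2.cfs": 1_200_000,
    "_3.cfs": 90_000,      # new segment added since the last sync
    "segments_4": 130,     # new segments file pointing at it
}

changed = files_to_transfer(master, replica)
print(changed)
```

In this made-up case only the new segment and the new segments file would be sent; the two large unchanged files stay where they are, no matter how many readers pull the update.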

For read/write applications (e.g. a calendar), such scaling might not be paramount. Rather, you might be happy to route all requests for a particular calendar to a particular server. The index/database could still be replicated/synced somehow, in case that server dies, but a single server can probably handle all requests for a particular index/database. And keeping things coherent is much simpler in this case.
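One simple way to do that kind of routing is to hash the calendar's identifier to pick a server. The sketch below is only an assumption about how it might look; `server_for`, the calendar ID, and the host names are hypothetical, and a lookup table or consistent hashing would work as well.

```python
import hashlib


def server_for(calendar_id, servers):
    """Pick the one server that handles all reads and writes for this calendar.

    A stable hash (not Python's per-process hash()) is used so that every
    front end maps the same calendar to the same server.
    """
    digest = hashlib.sha1(calendar_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical server list and calendar ID.
servers = ["cal1.example.com", "cal2.example.com", "cal3.example.com"]
chosen = server_for("calendar-42", servers)
print(chosen)
```

Note that with plain modulo hashing, changing the number of servers remaps most calendars, so a real deployment would need some plan for moving or re-syncing their data.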

Doug
