jason rutherglen wrote:
> Ah ok, think I found it: org.apache.nutch.indexer.FsDirectory no?
> Couldn't this be used in Solr and distribute all the data rather than
> master/slave it?
It's possible to search a Lucene index that lives in Hadoop's DFS, but
it's not recommended: searching over DFS is very slow. It's much faster
to copy the index to a local drive and search it there.
The rsync approach, of only transmitting index diffs, is a very
efficient way to distribute an index. In particular, it supports
scaling the number of *readers* well.
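To make the idea of transmitting only index diffs concrete, here is a
rough sketch in plain Java. It is not rsync and not part of Lucene, Nutch
or Solr: the class IndexSync and its sync method are hypothetical names,
and both directories are assumed to be reachable as local paths. It
mimics rsync's default quick check (same size and modification time
means "unchanged") and copies only the files that fail that check, so a
replica that already holds the old segment files fetches just the new ones.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
class IndexSync {

    // Lists the regular files directly inside dir, sorted by path.
    private static List<Path> regularFiles(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    // Quick check in the spirit of rsync's default: a file counts as
    // unchanged if size and modification time (to the millisecond) match.
    private static boolean unchanged(Path src, Path dst) throws IOException {
        return Files.exists(dst)
                && Files.size(src) == Files.size(dst)
                && Files.getLastModifiedTime(src).toMillis()
                        == Files.getLastModifiedTime(dst).toMillis();
    }

    // Brings replica in line with master and returns the names of the
    // files that actually had to be copied.
    static List<String> sync(Path master, Path replica) throws IOException {
        Files.createDirectories(replica);
        List<String> copied = new ArrayList<>();
        for (Path src : regularFiles(master)) {
            String name = src.getFileName().toString();
            Path dst = replica.resolve(name);
            if (!unchanged(src, dst)) {
                // COPY_ATTRIBUTES keeps the modification time, so the next
                // run sees this file as unchanged.
                Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.COPY_ATTRIBUTES);
                copied.add(name);
            }
        }
        // Remove files the master no longer has (e.g. merged-away segments).
        for (Path old : regularFiles(replica)) {
            if (!Files.exists(master.resolve(old.getFileName().toString()))) {
                Files.delete(old);
            }
        }
        return copied;
    }
}
```

Running sync twice in a row against an unchanged master should copy
nothing the second time, which is the property that makes this cheap
when many readers poll for updates.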
For read/write applications (e.g. a calendar), scaling the number of
readers might not be paramount. Rather, you might be happy to route all
requests for a
particular calendar to a particular server. The index/database could
still be somehow replicated/synced, in case that server dies, but a
single server can probably handle all requests for a particular
index/database. And keeping things coherent is much simpler in this case.
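Routing all requests for one calendar to one server could look roughly
like the following. This is only a sketch: CalendarRouter, serverFor and
the server names are made up for illustration, and a plain hash of the
calendar id stands in for whatever assignment scheme you actually use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router for illustration only.
class CalendarRouter {
    private final List<String> servers;

    CalendarRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(servers); // defensive copy
    }

    // The same calendar id always maps to the same server, so reads and
    // writes for that calendar never have to be reconciled across machines.
    String serverFor(String calendarId) {
        // floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(calendarId.hashCode(), servers.size());
        return servers.get(bucket);
    }
}
```

Note that with this simple modulo scheme, changing the number of servers
reassigns most calendars, so a lookup table or consistent hashing may be
a better fit if the server list changes often.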
Doug