Hi Paul!

Noble Paul നോബിള്‍ नोब्ळ् wrote:
The current replication strategy in Solr involves shell scripts. The
following are the drawbacks:
*  It does not work with windows

Solr's replication code uses hard links to avoid copying the potentially big index files. Windows (or rather, the NTFS filesystem) supports hard links as well, but Microsoft in its wisdom has hidden this feature deep inside the API. Another problem on Windows is that many nice programs like rsync are not part of the distribution.
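To illustrate the hard-link idea, here is a minimal sketch in Python rather than in the actual shell scripts. The function name and the directory layout are made up for the example, and it assumes a flat index directory that lives on the same filesystem as the snapshot. It relies on Python's `os.link`, so it is not tied to a Unix shell.

```python
import os


def snapshot_with_hard_links(index_dir, snapshot_dir):
    """Create snapshot_dir with a hard link to every file in index_dir.

    Hypothetical helper for illustration; not part of Solr.
    """
    os.makedirs(snapshot_dir)
    for name in sorted(os.listdir(index_dir)):
        source = os.path.join(index_dir, name)
        if os.path.isfile(source):
            # A hard link is a second directory entry for the same data,
            # so nothing is copied no matter how big the file is.
            os.link(source, os.path.join(snapshot_dir, name))
    return snapshot_dir
```

Because the snapshot shares its data with the index files, taking it costs almost no disk space or time, which is the property the snapshotting step depends on.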

* Replication works as a separate piece not integrated with solr.

One can see this as an advantage as well.

* Cannot control replication from solr admin/JMX
* Each operation requires manual telnet to the host

Right. The distribution.jsp is the only piece where you can see something about replication. But adding code to call the shell scripts is not hard to do.

Doing the replication within java code has the following advantages
* Platform independence

True. But you also have to live with the limitations Java imposes (because it IS multiplatform). For example, Java does not support creating hard links on files, which the current snapshotting code relies on. There is a JSR around to implement that, but even Java 6 does not support it.

* Manual steps can be completely eliminated. Everything can be driven
from solrconfig.xml .

Could be done with the shell scripts as well. You only need code to call the shell scripts from within Solr (like in the "postOptimize"-listener).
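The "code to call the shell scripts" part is small in most languages. As a rough sketch of the idea (in Python, not Solr's actual listener code; the function name and the timeout default are assumptions for the example), a wrapper only has to start the script, wait for it, and hand back the exit code and output so a UI could display them:

```python
import subprocess


def run_replication_script(command, timeout_seconds=600):
    """Run an external script and return (exit code, combined output).

    Hypothetical helper for illustration; `command` is an argument list
    such as the path to one of the replication scripts.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured output
        universal_newlines=True,   # return text instead of bytes
        timeout=timeout_seconds,
    )
    return completed.returncode, completed.stdout
```

A non-zero exit code would be the signal to report a failed snapshot or pull.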

** Just put in the url of the master in the slaves that should be good
enough to enable replication. Other things like frequency of
snapshoot/snappull can also be configured

* Start/stop can be triggered from solr/admin or JMX
* Can get the status/progress while replication is going on
* No need to have a login into the machine

See above. The triggering of the snapshooting/snappulling/snapinstalling scripts could be done through Solr's admin UI as well. We only need the code for that.

Besides that, the current solution uses rsync to avoid transferring unchanged files or unchanged parts of files. You would need to implement that as well if you want a comparable index distribution system.
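To give an idea of the simplest version of that, here is a sketch that only decides which whole files need to be transferred. It is not rsync's algorithm: rsync can also send just the changed parts of a file, which this does not attempt. The function name and the shape of the input (file name mapped to a size and checksum pair) are assumptions made for the example.

```python
def plan_transfer(master_files, slave_files):
    """Return (names to fetch from the master, names to delete on the slave).

    Both arguments map a file name to a (size, checksum) tuple.
    Hypothetical helper for illustration; whole-file comparison only.
    """
    # Fetch anything the slave lacks or holds in a different version.
    to_fetch = sorted(
        name for name, info in master_files.items()
        if slave_files.get(name) != info
    )
    # Files that no longer exist on the master can be dropped.
    to_delete = sorted(name for name in slave_files if name not in master_files)
    return to_fetch, to_delete
```

For example, with made-up file names and checksums:

```python
master = {"segments_2": (20, "aa"), "_1.cfs": (100, "bb"), "_2.cfs": (50, "cc")}
slave = {"segments_1": (20, "dd"), "_1.cfs": (100, "bb")}
fetch, delete = plan_transfer(master, slave)
# fetch is ["_2.cfs", "segments_2"], delete is ["segments_1"]
```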

I would rather go in the direction of providing replication scripts specifically for the Windows platform. Many other projects do something like that as well (Tomcat has .sh and .bat/.cmd files in the bin folder, for example).

I like the current implementation, but I agree that we need better control of replication from within the admin UI.

CU
Thomas
