Hi Ian,

I assume that a sizeable number of people run replication after an optimize,
which causes almost the whole index to be transferred by rsync. We can do a
checksum-based modification check on individual segment files and pull only
the changed ones from the master. That's not a true diff copy, but it
wouldn't put much load on gigabit production networks.
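
A minimal sketch of the per-file check, assuming the master can publish a
map of segment file name to checksum. CRC32 and the names SegmentDiff,
checksums and filesToFetch are illustrative choices, not an existing Solr API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical helper (not a Solr API): decides which index files a slave must pull.
final class SegmentDiff {
    private SegmentDiff() {}

    // CRC32 of one file, read in chunks so large segments are never held in memory.
    static long checksum(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    // File name -> checksum for every regular file in an index directory
    // (what the master would publish and what the slave computes locally).
    static Map<String, Long> checksums(Path indexDir) throws IOException {
        Map<String, Long> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    result.put(f.getFileName().toString(), checksum(f));
                }
            }
        }
        return result;
    }

    // Files listed by the master that are missing locally or whose checksum
    // differs, in sorted order. Only these need to be transferred.
    static List<String> filesToFetch(Map<String, Long> master, Map<String, Long> local) {
        List<String> fetch = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(master).entrySet()) {
            if (!e.getValue().equals(local.get(e.getKey()))) {
                fetch.add(e.getKey());
            }
        }
        return fetch;
    }
}
```

On the slave this would be used as
filesToFetch(masterChecksums, SegmentDiff.checksums(indexDir)), followed by a
download of each returned file. After an optimize nearly every file is new, so
nearly all of them get listed, which is no worse than what rsync transfers in
that case.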

We are not doing away with the current replication strategy; we're just
proposing an alternative.
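
The slave side of the proposal quoted below could be a simple polling loop. A
minimal sketch, assuming a hypothetical MasterClient interface that hides the
HTTP transport and a master that reports a monotonically increasing index
version; MasterClient and SnapPoller are illustrative names, not an existing
Solr API:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical view of the master (not a Solr API), e.g. backed by HTTP requests
// to the master's ReplicationHandler. I/O failures surface as unchecked exceptions.
interface MasterClient {
    long latestIndexVersion();
    void fetchIndex(long version);
}

// Hypothetical slave-side poller: pulls a snapshot only when the master has a newer one.
final class SnapPoller {
    private final MasterClient master;
    private long currentVersion;

    SnapPoller(MasterClient master, long currentVersion) {
        this.master = master;
        this.currentVersion = currentVersion;
    }

    // One poll cycle; returns true if a newer index was fetched.
    synchronized boolean pollOnce() {
        long latest = master.latestIndexVersion();
        if (latest <= currentVersion) {
            return false;
        }
        master.fetchIndex(latest);
        currentVersion = latest;
        return true;
    }

    synchronized long currentVersion() {
        return currentVersion;
    }

    // Polls immediately, then again `interval` after each poll finishes.
    // Exceptions are caught so that one failed poll does not cancel the schedule.
    ScheduledExecutorService start(long interval, TimeUnit unit) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                System.err.println("replication poll failed: " + e);
            }
        }, 0, interval, unit);
        return exec;
    }
}
```

A caller would start it with something like poller.start(60, TimeUnit.SECONDS)
and shut the returned executor down when the core closes. A fixed delay
(rather than a fixed rate) means the next poll waits a full interval after a
long fetch finishes instead of firing immediately; the catch block matters
because an uncaught exception would suppress all further scheduled polls.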

On Tue, Apr 29, 2008 at 3:56 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:

> The current scripts use rsync to minimize the amount of data actually
> being copied.
>
> I've had a brief look and found only one implementation, which is GPL and
> abandoned:
> http://sourceforge.net/projects/jarsync
>
> Personally, I still think the size of the transfer is important (for most
> use cases, not much actually changes every hour), but that's just me. Your
> case may be different from mine.
>
> regards
> Ian
>
>
>
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
> > Hi,
> > The current replication strategy in Solr relies on shell scripts. It has
> > the following drawbacks:
> > * It does not work on Windows
> > * Replication runs as a separate piece, not integrated with Solr
> > * Replication cannot be controlled from the Solr admin interface or JMX
> > * Each operation requires a manual telnet to the host
> >
> > Doing the replication within Java code has the following advantages:
> > * Platform independence
> > * Manual steps can be completely eliminated. Everything can be driven
> > from solrconfig.xml.
> > ** Just putting the URL of the master in the slaves' configuration should
> > be enough to enable replication. Other settings, such as the frequency
> > of snapshoot/snappull, can also be configured
> > * Start/stop can be triggered from solr/admin or JMX
> > * Status/progress can be monitored while replication is going on
> > * No need to log in to the machine
> >
> > The implementation can be split into two components:
> > * A SolrEventListener that does a snapshoot, just as the script does
> > * A ReplicationHandler that acts as a server to dish out the index
> > snapshots (on the master)
> > ** On the slave, the same handler can poll at regular intervals and, if
> > there is a new snapshot, fetch the index over HTTP (it can use
> > SolrJ + BinaryResponseWriter)
> > * The same handler can do the snap install
> > * The handler may expose all the operations over a REST interface or JMX
> > * It may also show the current state of the master index through the
> > console
> >
> > What do you think?
> >
> >
> >
>
>
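For the snapshoot step in the quoted proposal, one cheap approach in Java is
to hard-link every index file into a new snapshot directory, so no segment
data is copied. A minimal sketch; HardLinkSnapshot is a hypothetical name, and
it assumes the index and snapshot directories are on the same filesystem,
which hard links require:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not a Solr API): snapshots an index by hard-linking its files.
final class HardLinkSnapshot {
    private HardLinkSnapshot() {}

    // Creates snapshotDir and links every regular file of indexDir into it.
    // Hard links require both directories to be on the same filesystem.
    static Path snapshot(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    // The link shares the file's data, so no segment bytes are copied.
                    Files.createLink(snapshotDir.resolve(f.getFileName()), f);
                }
            }
        }
        return snapshotDir;
    }
}
```

Because a hard link keeps the file's data alive, segment files that Lucene
later deletes from the live index remain readable in the snapshot until the
snapshot directory itself is removed.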


-- 
Regards,
Shalin Shekhar Mangar.
