What twitter calls murder, unless it has changed since then, is just a BitTornado wrapper. In 2011, We did some comparison on the performance of murder and the TorrentBroadcast we have right now for Spark's own broadcast (Section 7.1 in http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf). Spark's implementation was 4.5X faster than murder.
The only issue with using TorrentBroadcast to deploy code/VM is writing a wrapper around it to read from disk, but it shouldn't be too complicated. If someone picks it up, I can give some pointers on how to proceed (I've thought about doing it myself forever, but never ended up actually taking the time; right now I don't have enough free cycles either) Otherwise, murder/BitTornado would be better than the current strategy we have. A third option would be to use rsync; but instead of rsync-ing to every slave from the master, one can simply rsync from the master first to one slave; then use the two sources (master and the first slave) to rsync to two more; then four and so on. Might be a simpler solution without many changes. -- Mosharaf Chowdhury http://www.mosharaf.com/ On Sun, May 18, 2014 at 11:07 PM, Andrew Ash <and...@andrewash.com> wrote: > My first thought would be to use libtorrent for this setup, and it turns > out that both Twitter and Facebook do code deploys with a bittorrent setup. > Twitter even released their code as open source: > > > https://blog.twitter.com/2010/murder-fast-datacenter-code-deploys-using-bittorrent > > > http://arstechnica.com/business/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/ > > > On Sun, May 18, 2014 at 10:44 PM, Daniel Mahler <dmah...@gmail.com> wrote: > >> I am not an expert in this space either. I thought the initial rsync >> during launch is really just a straight copy that did not need the tree >> diff. So it seemed like having the slaves do the copying among it each >> other would be better than having the master copy to everyone directly. >> That made me think of bittorrent, though there may well be other systems >> that do this. >> From the launches I did today it seems that it is taking around 1 minute >> per slave to launch a cluster, which can be a problem for clusters with 10s >> or 100s of slaves, particularly since on ec2 that time has to be paid for. >> >> >> On Sun, May 18, 2014 at 11:54 PM, Aaron Davidson <ilike...@gmail.com>wrote: >> >>> Out of curiosity, do you have a library in mind that would make it easy >>> to setup a bit torrent network and distribute files in an rsync (i.e., >>> apply a diff to a tree, ideally) fashion? I'm not familiar with this space, >>> but we do want to minimize the complexity of our standard ec2 launch >>> scripts to reduce the chance of something breaking. >>> >>> >>> On Sun, May 18, 2014 at 9:22 PM, Daniel Mahler <dmah...@gmail.com>wrote: >>> >>>> I am launching a rather large cluster on ec2. >>>> It seems like the launch is taking forever on >>>> .... >>>> Setting up spark >>>> RSYNC'ing /root/spark to slaves... >>>> ... >>>> >>>> It seems that bittorrent might be a faster way to replicate >>>> the sizeable spark directory to the slaves >>>> particularly if there is a lot of not very powerful slaves. >>>> >>>> Just a thought ... >>>> >>>> cheers >>>> Daniel >>>> >>>> >>> >> >