If the codebase for Spark's broadcast is pretty self-contained, you could
consider creating a small bootstrap sent out via the doubling rsync
strategy that Mosharaf outlined above (called "Tree D=2" in the paper) that
then pulled the larger

Mosharaf, do you have a sense of whether the gains from using Cornet vs
Tree D=2 with rsync outweighs the overhead of using a 2-phase broadcast
mechanism?

Andrew


On Sun, May 18, 2014 at 11:32 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> One issue with using Spark itself is that this rsync is required to get
> Spark to work...
>
> Also note that a similar strategy is used for *updating* the spark
> cluster on ec2, where the "diff" aspect is much more important, as you
> might only make a small change on the driver node (recompile or
> reconfigure) and can get a fast sync.
>
>
> On Sun, May 18, 2014 at 11:22 PM, Mosharaf Chowdhury <
> mosharafka...@gmail.com> wrote:
>
>> What twitter calls murder, unless it has changed since then, is just a
>> BitTornado wrapper. In 2011, We did some comparison on the performance of
>> murder and the TorrentBroadcast we have right now for Spark's own broadcast
>> (Section 7.1 in
>> http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf).
>> Spark's implementation was 4.5X faster than murder.
>>
>> The only issue with using TorrentBroadcast to deploy code/VM is writing a
>> wrapper around it to read from disk, but it shouldn't be too complicated.
>> If someone picks it up, I can give some pointers on how to proceed (I've
>> thought about doing it myself forever, but never ended up actually taking
>> the time; right now I don't have enough free cycles either)
>>
>> Otherwise, murder/BitTornado would be better than the current strategy we
>> have.
>>
>> A third option would be to use rsync; but instead of rsync-ing to every
>> slave from the master, one can simply rsync from the master first to one
>> slave; then use the two sources (master and the first slave) to rsync to
>> two more; then four and so on. Might be a simpler solution without many
>> changes.
>>
>> --
>> Mosharaf Chowdhury
>> http://www.mosharaf.com/
>>
>>
>> On Sun, May 18, 2014 at 11:07 PM, Andrew Ash <and...@andrewash.com>wrote:
>>
>>> My first thought would be to use libtorrent for this setup, and it turns
>>> out that both Twitter and Facebook do code deploys with a bittorrent setup.
>>>  Twitter even released their code as open source:
>>>
>>>
>>> https://blog.twitter.com/2010/murder-fast-datacenter-code-deploys-using-bittorrent
>>>
>>>
>>> http://arstechnica.com/business/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/
>>>
>>>
>>> On Sun, May 18, 2014 at 10:44 PM, Daniel Mahler <dmah...@gmail.com>wrote:
>>>
>>>> I am not an expert in this space either. I thought the initial rsync
>>>> during launch is really just a straight copy that did not need the tree
>>>> diff. So it seemed like having the slaves do the copying among it each
>>>> other would be better than having the master copy to everyone directly.
>>>> That made me think of bittorrent, though there may well be other systems
>>>> that do this.
>>>> From the launches I did today it seems that it is taking around 1
>>>> minute per slave to launch a cluster, which can be a problem for clusters
>>>> with 10s or 100s of slaves, particularly since on ec2  that time has to be
>>>> paid for.
>>>>
>>>>
>>>> On Sun, May 18, 2014 at 11:54 PM, Aaron Davidson <ilike...@gmail.com>wrote:
>>>>
>>>>> Out of curiosity, do you have a library in mind that would make it
>>>>> easy to setup a bit torrent network and distribute files in an rsync 
>>>>> (i.e.,
>>>>> apply a diff to a tree, ideally) fashion? I'm not familiar with this 
>>>>> space,
>>>>> but we do want to minimize the complexity of our standard ec2 launch
>>>>> scripts to reduce the chance of something breaking.
>>>>>
>>>>>
>>>>> On Sun, May 18, 2014 at 9:22 PM, Daniel Mahler <dmah...@gmail.com>wrote:
>>>>>
>>>>>> I am launching a rather large cluster on ec2.
>>>>>> It seems like the launch is taking forever on
>>>>>> ....
>>>>>> Setting up spark
>>>>>> RSYNC'ing /root/spark to slaves...
>>>>>> ...
>>>>>>
>>>>>> It seems that bittorrent might be a faster way to replicate
>>>>>> the sizeable spark directory to the slaves
>>>>>> particularly if there is a lot of not very powerful slaves.
>>>>>>
>>>>>> Just a thought ...
>>>>>>
>>>>>> cheers
>>>>>> Daniel
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Reply via email to