btw is there a command or script to update the slaves from the master?
thanks
Daniel
On Mon, May 19, 2014 at 1:48 AM, Andrew Ash and...@andrewash.com wrote:
If the codebase for Spark's broadcast is pretty self-contained, you could
consider creating a small bootstrap sent out via the doubling
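One way to read the "doubling" suggestion is as a binomial-tree broadcast. In each round, every host that already has the directory copies it to one host that does not. The set of holders doubles each round, so N machines are covered in about ceil(log2 N) rounds instead of N-1 sequential copies from the master. The following is a minimal Python sketch of the schedule only. It assumes hypothetical host names and performs no actual copying.

```python
def doubling_schedule(master, slaves):
    """Return a list of rounds; each round is a list of (src, dst) copies.

    In every round, each host that already has the data sends it to one
    host that does not have it yet, so the number of holders doubles.
    """
    have = [master]
    pending = list(slaves)
    rounds = []
    while pending:
        this_round = []
        # Pair each current holder with one pending host, while any remain.
        for src in have:
            if not pending:
                break
            this_round.append((src, pending.pop(0)))
        # Receivers only start sending in the next round.
        have.extend(dst for _, dst in this_round)
        rounds.append(this_round)
    return rounds


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    slaves = [f"slave{i}" for i in range(1, 8)]
    for r, copies in enumerate(doubling_schedule("master", slaves), start=1):
        print(f"round {r}: {copies}")
```

With a master and 7 slaves (8 hosts), the schedule finishes in 3 rounds. A sequential copy from the master would take 7 rounds.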
On Mon, May 19, 2014 at 2:04 AM, Daniel Mahler dmah...@gmail.com wrote:
I agree that for updating, rsync is probably preferable, and it seems like
for that purpose it would also parallelize well, since most of the time is
spent computing checksums, so the process is not constrained by the total
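Running updates in parallel could look like this: the master launches one rsync process per slave at the same time. The following is a minimal Python sketch. It assumes passwordless SSH from the master to every slave, rsync installed on all machines, and hypothetical host names. The helpers `rsync_command` and `parallel_rsync` are illustrative and are not part of the spark-ec2 scripts.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def rsync_command(src_dir, host, dest_dir=None):
    """Build one rsync invocation that mirrors src_dir onto host:dest_dir."""
    dest_dir = dest_dir or src_dir
    # -a: archive mode (recursive, keeps permissions and times).
    # --delete: remove files on the slave that no longer exist on the master.
    # The trailing slash on the source copies the directory's contents.
    return ["rsync", "-a", "--delete", src_dir.rstrip("/") + "/", f"{host}:{dest_dir}"]


def parallel_rsync(src_dir, hosts, max_workers=16, run=subprocess.run):
    """Run one rsync per host concurrently; return {host: returncode}."""
    def sync(host):
        return host, run(rsync_command(src_dir, host)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(sync, hosts))


if __name__ == "__main__":
    # Hypothetical slave host names; replace with the cluster's actual slaves.
    print(parallel_rsync("/root/spark", ["slave1", "slave2"]))
```

For a full initial copy, all concurrent transfers still share the master's outbound bandwidth. The doubling or BitTorrent approaches discussed in this thread address that limit.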
On the ec2 machines, you can update the slaves from the master using
something like ~/spark-ec2/copy-dir ~/spark.
Spark's TorrentBroadcast relies on the Block Manager to distribute blocks,
making it relatively hard to extract.
On Mon, May 19, 2014 at 12:36 AM, Daniel Mahler dmah...@gmail.com
Good catch. In that case, using BitTornado/murder would be better.
--
Mosharaf Chowdhury
http://www.mosharaf.com/
On Mon, May 19, 2014 at 11:17 AM, Aaron Davidson ilike...@gmail.com wrote:
On the ec2 machines, you can update the slaves from the master using
something like
I am launching a rather large cluster on ec2.
It seems like the launch is taking forever on
Setting up spark
RSYNC'ing /root/spark to slaves...
...
It seems that BitTorrent might be a faster way to replicate
the sizeable Spark directory to the slaves,
particularly if there is a lot of not
Out of curiosity, do you have a library in mind that would make it easy to
set up a BitTorrent network and distribute files in an rsync-like fashion (i.e., apply a
diff to a tree, ideally)? I'm not familiar with this space, but we
do want to minimize the complexity of our standard ec2 launch scripts to
I am not an expert in this space either. I thought the initial rsync during
launch was really just a straight copy that did not need the tree diff. So
it seemed like having the slaves do the copying among each other would
be better than having the master copy to everyone directly. That made me