Re: Spark EC2 script on Large clusters

Christian Thu, 05 Nov 2015 10:52:27 -0800

Let me rephrase. The EMR fee is about twice the spot price, which makes it
almost 2/3 of the overall cost.
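
The arithmetic behind that fraction can be checked with a small sketch. The spot price below is a placeholder chosen for illustration, not a figure from this thread, and the function name is hypothetical:

```python
def emr_share_of_total(spot_price: float, emr_fee: float) -> float:
    """Return the fraction of the total hourly cost that is the EMR fee."""
    return emr_fee / (spot_price + emr_fee)


# Hypothetical spot price in USD per instance-hour (assumption, not from the thread).
spot_price = 0.10
# The claim above: the EMR fee is about twice the spot price.
emr_fee = 2 * spot_price

share = emr_share_of_total(spot_price, emr_fee)
print(f"EMR share of total cost: {share:.3f}")
```

Any spot price gives the same share, since a fee of twice the spot price is always 2 parts out of 3.
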
On Thu, Nov 5, 2015 at 11:50 AM Christian <engr...@gmail.com> wrote:


> Hi Jonathan,
>
> We are using EMR now and it's costing way too much. We do spot pricing and
> the EMR add-on cost is about 2/3 the price of the actual spot instance.
> On Thu, Nov 5, 2015 at 11:31 AM Jonathan Kelly <jonathaka...@gmail.com>
> wrote:
>
>> Christian,
>>
>> Is there anything preventing you from using EMR, which will manage your
>> cluster for you? Creating large clusters would take minutes on EMR instead of
>> hours. Also, EMR supports growing your cluster easily and recently added
>> support for shrinking your cluster gracefully (even while jobs are running).
>>
>> ~ Jonathan
>>
>> On Thu, Nov 5, 2015 at 9:48 AM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> Yeah, as Shivaram mentioned, this issue is well-known. It's documented
>>> in SPARK-5189 <https://issues.apache.org/jira/browse/SPARK-5189> and a
>>> bunch of related issues. Unfortunately, it's hard to resolve this issue in
>>> spark-ec2 without rewriting large parts of the project. But if you take a
>>> crack at it and succeed I'm sure a lot of people will be happy.
>>>
>>> I've started a separate project <https://github.com/nchammas/flintrock> --
>>> which Shivaram also mentioned -- which aims to solve the problem of
>>> long launch times and other issues
>>> <https://github.com/nchammas/flintrock#motivation> with spark-ec2. It's
>>> still very young and lacks several critical features, but we are making
>>> steady progress.
>>>
>>> Nick
>>>
>>> On Thu, Nov 5, 2015 at 12:30 PM Shivaram Venkataraman <
>>> shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> It is a known limitation that spark-ec2 is very slow for large
>>>> clusters and, as you mention, most of this is due to the use of rsync to
>>>> transfer things from the master to all the slaves.
>>>>
>>>> Nick (cc'd) has been working on an alternative approach at
>>>> https://github.com/nchammas/flintrock that is more scalable.
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Thu, Nov 5, 2015 at 8:12 AM, Christian <engr...@gmail.com> wrote:
>>>> > For starters, thanks for the awesome product!
>>>> >
>>>> > When creating EC2 clusters of 20-40 nodes, things work great. When we
>>>> > create a larger cluster with the provided spark-ec2 script, it takes
>>>> > hours: a 200-node cluster takes 2 1/2 hours, and a 500-node cluster
>>>> > takes over 5 hours. One other problem we are having is that some nodes
>>>> > don't come up when the other ones do. The process seems to just move
>>>> > on, skipping the rsync and any installs on those nodes.
>>>> >
>>>> > My guess is that it takes so long to set up a large cluster because of
>>>> > the use of rsync. What if, instead of using rsync, you synced to S3 and
>>>> > then did a pdsh to pull it down on all of the machines? This is a big
>>>> > deal for us, and if we can come up with a good plan, we might be able
>>>> > to help out with the required changes.
>>>> >
>>>> > Are there any suggestions on how to deal with some of the nodes not
>>>> > being ready when the process starts?
>>>> >
>>>> > Thanks for your time,
>>>> > Christian
>>>> >
>>>>
>>>
>>
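
The two ideas raised in the original message (pulling files from S3 on every node in parallel via pdsh, and not silently skipping nodes that are not up yet) might be sketched roughly as follows. This is not how spark-ec2 or flintrock work. The function names, bucket, and paths are hypothetical, and the sketch assumes the AWS CLI is installed on every node with credentials that can read the bucket.

```python
import shlex
import socket
import time
from typing import Callable, Iterable, List, Tuple


def ssh_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_nodes(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = ssh_port_open,
    deadline_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], List[str]]:
    """Poll hosts until all pass the probe or the deadline expires.

    Returns (ready, missing) so the caller can retry or replace the
    missing nodes explicitly instead of skipping them.
    """
    pending = list(hosts)
    ready: List[str] = []
    start = time.monotonic()
    while pending:
        still_pending = []
        for host in pending:
            if probe(host):
                ready.append(host)
            else:
                still_pending.append(host)
        pending = still_pending
        if not pending or time.monotonic() - start >= deadline_s:
            break
        sleep(poll_s)
    return ready, pending


def pdsh_pull_command(hosts: Iterable[str], s3_uri: str, dest_dir: str) -> List[str]:
    """Build a pdsh invocation that makes each node pull from S3 itself."""
    remote_cmd = f"aws s3 sync {shlex.quote(s3_uri)} {shlex.quote(dest_dir)}"
    # pdsh -w takes a comma-separated list of target hosts.
    return ["pdsh", "-w", ",".join(hosts), remote_cmd]
```

A caller could run `ready, missing = wait_for_nodes(all_hosts)`, pass `pdsh_pull_command(ready, "s3://example-bucket/spark", "/root/spark")` to `subprocess.run`, and then report or relaunch whatever is left in `missing`. The master uploads to S3 once and each node downloads for itself, so the master no longer has to push a copy to every slave.
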
