Let me rephrase. The EMR cost is about twice as much as the spot price, making it almost 2/3 of the overall cost.

On Thu, Nov 5, 2015 at 11:50 AM Christian <engr...@gmail.com> wrote:
> Hi Jonathan,
>
> We are using EMR now and it's costing way too much. We do spot pricing and
> the EMR add-on cost is about 2/3 the price of the actual spot instance.
>
> On Thu, Nov 5, 2015 at 11:31 AM Jonathan Kelly <jonathaka...@gmail.com>
> wrote:
>
>> Christian,
>>
>> Is there anything preventing you from using EMR, which will manage your
>> cluster for you? Creating large clusters would take minutes on EMR instead
>> of hours. Also, EMR supports growing your cluster easily and recently added
>> support for shrinking your cluster gracefully (even while jobs are running).
>>
>> ~ Jonathan
>>
>> On Thu, Nov 5, 2015 at 9:48 AM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> Yeah, as Shivaram mentioned, this issue is well-known. It's documented
>>> in SPARK-5189 <https://issues.apache.org/jira/browse/SPARK-5189> and a
>>> bunch of related issues. Unfortunately, it's hard to resolve this issue in
>>> spark-ec2 without rewriting large parts of the project. But if you take a
>>> crack at it and succeed I'm sure a lot of people will be happy.
>>>
>>> I've started a separate project <https://github.com/nchammas/flintrock> --
>>> which Shivaram also mentioned -- which aims to solve the problem of
>>> long launch times and other issues
>>> <https://github.com/nchammas/flintrock#motivation> with spark-ec2. It's
>>> still very young and lacks several critical features, but we are making
>>> steady progress.
>>>
>>> Nick
>>>
>>> On Thu, Nov 5, 2015 at 12:30 PM Shivaram Venkataraman <
>>> shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> It is a known limitation that spark-ec2 is very slow for large
>>>> clusters and, as you mention, most of this is due to the use of rsync to
>>>> transfer things from the master to all the slaves.
>>>>
>>>> Nick (cc'd) has been working on an alternative approach at
>>>> https://github.com/nchammas/flintrock that is more scalable.
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Thu, Nov 5, 2015 at 8:12 AM, Christian <engr...@gmail.com> wrote:
>>>> > For starters, thanks for the awesome product!
>>>> >
>>>> > When creating ec2-clusters of 20-40 nodes, things work great. When we
>>>> > create a cluster with the provided spark-ec2 script, it takes hours.
>>>> > When creating a 200 node cluster, it takes 2 1/2 hours and for a 500
>>>> > node cluster it takes over 5 hours. One other problem we are having is
>>>> > that some nodes don't come up when the other ones do; the process seems
>>>> > to just move on, skipping the rsync and any installs on those ones.
>>>> >
>>>> > My guess as to why it takes so long to set up a large cluster is
>>>> > because of the use of rsync. What if, instead of using rsync, you
>>>> > synced to S3 and then did a pdsh to pull it down on all of the
>>>> > machines? This is a big deal for us and if we can come up with a good
>>>> > plan, we might be able to help out with the required changes.
>>>> >
>>>> > Are there any suggestions on how to deal with some of the nodes not
>>>> > being ready when the process starts?
>>>> >
>>>> > Thanks for your time,
>>>> > Christian
>>>> >
>>>>
>>>
>>
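
For illustration only, here is a rough sketch of the idea raised in the original message: upload the payload to S3 once, wait for each node to become reachable, then have every node pull its own copy in parallel rather than rsyncing from the master. This is not how spark-ec2 or flintrock is implemented. It swaps in a Python thread pool where the thread suggests pdsh, and the host names, bucket URI, destination path, and all function names are hypothetical. It also assumes the AWS CLI and S3 credentials are already present on each node.

```python
import socket
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def wait_for_ssh(host, port=22, attempts=30, delay=10.0, probe=None, sleep=time.sleep):
    """Return True once `host` accepts TCP connections on `port`, else False."""
    def tcp_probe(h, p):
        try:
            with socket.create_connection((h, p), timeout=5):
                return True
        except OSError:
            return False

    probe = probe or tcp_probe
    for attempt in range(attempts):
        if probe(host, port):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # give a slow node more time instead of skipping it
    return False


def pull_on_node(host, s3_uri, dest, run=subprocess.run):
    """Ask one node to pull the payload from S3 itself (hypothetical paths)."""
    cmd = ["ssh", "-o", "StrictHostKeyChecking=no", host,
           "aws", "s3", "sync", s3_uri, dest]
    return run(cmd, check=False).returncode == 0


def deploy(hosts, s3_uri, dest, max_workers=64, wait=wait_for_ssh, pull=pull_on_node):
    """Pull on every reachable node in parallel and report the nodes that failed."""
    def one(host):
        if not wait(host):
            return host, "unreachable"
        status = "ok" if pull(host, s3_uri, dest) else "pull-failed"
        return host, status

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(one, hosts))
    failed = {h: s for h, s in results.items() if s != "ok"}
    return results, failed
```

A call such as `deploy(["node-1", "node-2"], "s3://example-bucket/spark", "/root/spark")` returns the per-node status plus the subset that failed, so nodes that were not ready can be retried explicitly instead of being silently skipped.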