What's the current way of deploying Spark Streaming apps on EMR? Using YARN?
There are some out-of-date solutions, like https://github.com/ianoc/SparkEMRBootstrap, that set up Mesos on EMR. I wonder whether Spark 0.9 makes this simpler.

Spark-ec2 comes with a considerable amount of configuration and some useful utilities, such as deploying to workers. Porting it to a managed service such as EMR is not as trivial as it might seem.

On Fri, Feb 28, 2014 at 6:19 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> I think what you are looking for is sort of a managed service ala EMR or
> Qubole. Spark-ec2 is just software to boot up machines & integrate them
> together using Whirr.
> I agree a managed service for Streaming would be really useful.
> Regards
> Mayur
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
> On Fri, Feb 28, 2014 at 8:50 AM, Aureliano Buendia <buendia...@gmail.com> wrote:
>
>> Another subject that was not that important in spark, but it could be
>> crucial for 24/7 spark streaming, is reconstruction of lost nodes. By that,
>> I do not mean lost data reconstruction by self healing, but bringing up new
>> ec2 instances once they die for whatever reasons. Is this also supported in
>> spark ec2?
>>
>> On Fri, Feb 28, 2014 at 2:24 AM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>>
>>> Yes, the default spark EC2 cluster runs the standalone deploy mode.
>>> Since Spark 0.9, the standalone deploy mode allows you to launch the driver
>>> app within the cluster itself and automatically restart it if it fails. You
>>> can read about launching your app inside the cluster here
>>> <http://spark.incubator.apache.org/docs/latest/spark-standalone.html#connecting-an-application-to-the-cluster>.
>>> Using this you can launch your streaming app as well.
>>>
>>> TD
>>>
>>> On Thu, Feb 27, 2014 at 5:35 PM, Aureliano Buendia <buendia...@gmail.com> wrote:
>>>
>>>> How about spark stream app itself? Does the ec2 script also provide
>>>> means for daemonizing and monitoring spark streaming apps which are
>>>> supposed to run 24/7? If not, any suggestions for how to do this?
>>>>
>>>> On Thu, Feb 27, 2014 at 8:23 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>>>>
>>>>> Zookeeper is automatically set up in the cluster as Spark uses
>>>>> Zookeeper. However, you have to setup your own input source like Kafka or
>>>>> Flume.
>>>>>
>>>>> TD
>>>>>
>>>>> On Thu, Feb 27, 2014 at 10:32 AM, Aureliano Buendia <buendia...@gmail.com> wrote:
>>>>>
>>>>>> On Thu, Feb 27, 2014 at 6:17 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>>>>>>
>>>>>>> Yes! Spark streaming programs are just like any spark program and so
>>>>>>> any ec2 cluster setup using the spark-ec2 scripts can be used to run
>>>>>>> spark streaming programs as well.
>>>>>>
>>>>>> Great. Does it come with any input source support as well? (Eg kafka
>>>>>> requires setting up zookeeper).
>>>>>>
>>>>>>> On Thu, Feb 27, 2014 at 10:11 AM, Aureliano Buendia <buendia...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Does the ec2 support for spark 0.9 also include spark streaming? If
>>>>>>>> not, is there an equivalent?