I'm a Spark and HDInsight novice, so I could be wrong...

HDInsight is based on HDP2, so my guess is that you have the option of
installing/configuring Spark in cluster mode (on YARN), or running it in
standalone mode and packaging the Spark binaries with your job.
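If Spark is present on the cluster, submission on YARN would look roughly like this (a sketch using the Spark 1.x submission syntax; the jar, class name, and resource settings below are placeholders, not something HDInsight-specific):

```shell
# Sketch: submit a Spark application to YARN in cluster mode (Spark 1.x syntax).
# The assembly jar, main class, and sizing flags are placeholders.
./bin/spark-submit \
  --master yarn-cluster \
  --class com.example.MyIterativeJob \
  --num-executors 4 \
  --executor-memory 2g \
  my-iterative-job.jar arg1 arg2
```

In standalone mode you would point --master at the standalone master URL instead (e.g. spark://host:7077).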

Everything I seem to look at is related to UNIX shell scripts, so one might
need to pull apart some of those scripts to work out how to run this on
Windows.
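For what it's worth, the Spark distribution does ship Windows counterparts of those shell scripts (bin\*.cmd), so a submission from a Windows headnode might look something like this (a sketch only; the install path, class name, and jar are hypothetical):

```shell
REM Sketch: Windows submission via Spark's .cmd wrappers.
REM The install path, main class, and jar name are placeholders.
cd C:\apps\spark
bin\spark-submit.cmd --master yarn-cluster --class com.example.MyIterativeJob my-iterative-job.jar
```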

Interesting project...

Marco



On Mon, Jul 14, 2014 at 8:00 AM, Niek Tax <niek...@gmail.com> wrote:

> Hi everyone,
>
> Currently I am working on parallelizing a machine learning algorithm using
> a Microsoft HDInsight cluster. I tried running my algorithm on Hadoop
> MapReduce, but since my algorithm is iterative, the job scheduling and
> data loading overhead severely limit its performance in terms of training
> time.
>
> As of recently, HDInsight supports Hadoop 2 with YARN, which I thought
> would allow me to run Spark jobs, which seem a better fit for my task. So
> far, however, I have not been able to find out how to run Apache Spark
> jobs on an HDInsight cluster.
>
> It seems that remote job submission (which would have my preference) is
> not possible for Spark on HDInsight, as the REST endpoints for Oozie and
> Templeton do not seem to support submission of Spark jobs. I also tried
> RDP-ing into the headnode to submit jobs from there. On the headnode
> drives I can find other new YARN computation models like Tez, and I
> managed to run Tez jobs through YARN. However, Spark seems to be missing.
> Does this mean that HDInsight currently does not support Spark, even
> though it supports Hadoop versions with YARN? Or do I need to install
> Spark on the HDInsight cluster first, in some way? Or is there maybe
> something else that I'm missing, and can I run Spark jobs on HDInsight in
> some other way?
>
> Many thanks in advance!
>
>
> Kind regards,
>
> Niek Tax
>
