>
> At its core, EMR just launches Spark applications, whereas Databricks is a
> higher-level platform that also includes multi-user support, an interactive
> UI, security, and job scheduling.
>
> Specifically, Databricks runs standard Spark applications inside a user’s
> AWS account, similar to EMR, but it adds a variety of features to create an
> end-to-end environment for working with Spark. These include:
>
>    - Interactive UI (includes a workspace with notebooks, dashboards, a job
>      scheduler, point-and-click cluster management)
>    - Cluster sharing (multiple users can connect to the same cluster,
>      saving cost)
>    - Security features (access controls to the whole workspace)
>    - Collaboration (multi-user access to the same notebook, revision
>      control, and IDE and GitHub integration)
>    - Data management (support for connecting different data sources to
>      Spark, caching service to speed up queries)
>
> The idea is that many Spark deployments soon need to bring in multiple
> users, different types of jobs, etc., and we want to have these built in.
> But if you just want to connect to existing data and run jobs, that also
> works.
>
> The cluster manager in Databricks is based on Standalone mode, not YARN,
> but Databricks adds several features, such as allowing multiple users to
> run commands on the same cluster and running multiple versions of Spark.
> Because Databricks is also the team that initially built Spark, the service
> is very up to date and integrated with the newest Spark features -- e.g.
> you can run previews of the next release, any data in Spark can be
> displayed visually, etc.
>
> *From: *Alex Nastetsky <alex.nastet...@vervemobile.com>
> *Subject: **Databricks Cloud vs AWS EMR*
> *Date: *January 26, 2016 at 11:55:41 AM PST
> *To: *user <user@spark.apache.org>
>
> As a user of AWS EMR (running Spark and MapReduce), I am interested in
> potential benefits that I may gain from Databricks Cloud. I was wondering
> if anyone has used both and done comparison / contrast between the two
> services.
>
> In general, which resource manager(s) does Databricks Cloud use for Spark?
> If it's YARN, can you also run MapReduce jobs in Databricks Cloud?
>
> Thanks.
>
> --
