Hi Zeppelin users,

I presented a demo on "Spark+Zeppelin on AWS EMR" at AWS Summit Seoul
yesterday. Unfortunately the slides are written in Korean, so they're
hard to share, but I'd like to pass along the essentials.

1. Running Zeppelin on EMR is super easy. (The EMR team did a really good
job. You can do it with only a few clicks; my cluster took 8 min to launch.)

2. You can launch EMR with spot instances, which will save you money.

3. You can provide configs when you launch an EMR cluster. For example, to
save your notebooks on S3, the proper config is as follows.

[
  {
    "Classification": "zeppelin-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET": "BUCKET_NAME",
          "ZEPPELIN_NOTEBOOK_S3_USER": "SOME_USER_NAME"
        },
        "Configurations": []
      }
    ]
  }
]
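
If you'd rather generate this file than hand-edit it, here is a minimal
Python sketch that builds the same structure and writes it out as valid
JSON. The function names, the output file name, and the placeholder
bucket/user values are only illustrative assumptions, not part of EMR or
Zeppelin.

```python
import json


def zeppelin_s3_config(bucket, user):
    """Build the EMR configuration list that points Zeppelin notebook storage at S3."""
    return [
        {
            "Classification": "zeppelin-env",
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {
                        "ZEPPELIN_NOTEBOOK_STORAGE":
                            "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
                        "ZEPPELIN_NOTEBOOK_S3_BUCKET": bucket,
                        "ZEPPELIN_NOTEBOOK_S3_USER": user,
                    },
                    "Configurations": [],
                }
            ],
        }
    ]


def write_config(path, bucket, user):
    """Serialize the configuration to a JSON file (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(zeppelin_s3_config(bucket, user), f, indent=2)


if __name__ == "__main__":
    # Placeholder values; replace with your own bucket and user name.
    write_config("zeppelin-s3.json", "BUCKET_NAME", "SOME_USER_NAME")
```

The resulting file can be pasted into the software settings box in the
console, or, if you launch from the AWS CLI, passed with something like
--configurations file://zeppelin-s3.json.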

4. You need to set a proper spark.executor.memory in the Zeppelin
interpreter settings.

5. You can increase or decrease the cluster size on the cluster detail page.

6. Don't forget to terminate the cluster when you're done with your job :)

That's all!


If you have more tips, please add them to this mail thread. Thanks!

- Kevin
