Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-25 Thread RUSHIKESH RAUT
Yes, I know it is inevitable if the data is large. I want to know how I can increase the interpreter memory to handle large data. Thanks, Rushikesh Raut. On Mar 26, 2017 8:56 AM, "Jianfeng (Jeff) Zhang" wrote: > How large is your data? This problem is inevitable if your data is too large; you can try to use a Spark DataFrame if that works for you.
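A minimal sketch of one common way to raise the interpreter memory, assuming a default Zeppelin install (the values are illustrative; restart the interpreter after changing them):

    # conf/zeppelin-env.sh -- heap options for Zeppelin's interpreter JVMs
    export ZEPPELIN_INTP_MEM="-Xms1g -Xmx4g"
    # When the Spark interpreter is launched through spark-submit, driver and
    # executor memory can be raised here as well:
    export SPARK_SUBMIT_OPTIONS="--driver-memory 4g --executor-memory 4g"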

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-25 Thread Jianfeng (Jeff) Zhang
How large is your data? This problem is inevitable if your data is too large; you can try to use a Spark DataFrame if that works for you. Best Regards, Jeff Zhang. From: RUSHIKESH RAUT <rushikeshraut...@gmail.com> Reply-To: users@zeppelin.apache.org
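A minimal sketch of this suggestion in the Scala Spark interpreter; the table and column names are placeholders. The point is to keep the full dataset in a distributed DataFrame and bring only a small result back to the interpreter process:

    // %spark -- "my_hive_table" and "some_column" are hypothetical names
    val df = spark.table("my_hive_table")             // stays distributed across executors
    val summary = df.groupBy("some_column").count()   // aggregation runs on the cluster
    summary.show(20)                                  // only 20 rows reach the driver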

Re: Setting Zeppelin to work with multiple Hadoop clusters when running Spark.

2017-03-25 Thread Jianfeng (Jeff) Zhang
You can try to specify the namenode address in the HDFS path, e.g. spark.read.csv("hdfs://localhost:9009/file"). Best Regards, Jeff Zhang. From: Serega Sheypak <serega.shey...@gmail.com> Reply-To: users@zeppelin.apache.org
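A minimal sketch extending the example above; the hostnames and ports are placeholders for the namenode addresses of two different clusters. Fully qualified hdfs:// URIs let a single job read from one cluster and write to another:

    // %spark -- nn-a/nn-b are hypothetical namenode hosts
    val df = spark.read.csv("hdfs://nn-a.example.com:8020/data/input.csv")  // cluster A
    df.write.parquet("hdfs://nn-b.example.com:8020/data/output")            // cluster B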

Setting Zeppelin to work with multiple Hadoop clusters when running Spark.

2017-03-25 Thread Serega Sheypak
Hi, I have three Hadoop clusters. Each cluster has its own NameNode HA configuration and YARN. I want to allow users to read from any cluster and write to any cluster. Users should also be able to choose where their Spark job runs. What is the right way to configure this in Zeppelin?
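With NameNode HA, a bare host:port URI bypasses the failover logic, so one possible approach (an untested sketch; all names are placeholders) is to expose each cluster's nameservice to the Spark interpreter through spark.hadoop.* properties, after which paths like hdfs://clusterA/... resolve via the client failover proxy:

    # Spark interpreter properties (Interpreter menu); repeat the last four per cluster
    spark.hadoop.dfs.nameservices                              clusterA,clusterB
    spark.hadoop.dfs.ha.namenodes.clusterA                     nn1,nn2
    spark.hadoop.dfs.namenode.rpc-address.clusterA.nn1         nn1.a.example.com:8020
    spark.hadoop.dfs.namenode.rpc-address.clusterA.nn2         nn2.a.example.com:8020
    spark.hadoop.dfs.client.failover.proxy.provider.clusterA   org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider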

Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-25 Thread RUSHIKESH RAUT
Hi everyone, I am trying to load some data from a Hive table into my notebook and then convert this DataFrame into an R data frame using the spark.r interpreter. This works perfectly for small amounts of data, but as the data grows it fails with java.lang.OutOfMemoryError: GC overhead limit exceeded.
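For illustration, in the Scala Spark interpreter (the mechanism is the same for spark.r; the table name is a placeholder): the failure point in this workflow is collecting the whole table into a single JVM heap, so capping the rows before collecting avoids the GC overhead error whenever the full dataset is not needed locally:

    // %spark -- "my_hive_table" is a hypothetical name
    val df = spark.table("my_hive_table")
    val local = df.limit(10000).collect()  // collect() copies rows into the interpreter's
                                           // heap; without limit(), a large table exhausts it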