Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
…memory required also depends on how many fields are used in the results. On Tue, Aug 9, 2016 at 11:09 AM, Zoltan Fedor wrote: >> Does this mean you only have 1.6G of memory for the executor (the rest left for Python)? >> The cached table could take …
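
For context, the executor-vs-Python memory split discussed in this exchange is governed by a handful of settings. Below is a minimal sketch of how they might be set from pyspark; the values are illustrative assumptions, not the configuration from the thread.

    # A sketch of the memory-related settings behind the 1.6G discussion above.
    # The values are illustrative assumptions, not the thread's actual config.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("udf-memory-sketch")
            # JVM heap per executor; cached tables live here
            .set("spark.executor.memory", "4g")
            # memory each Python worker may use before spilling to disk,
            # allocated on top of the JVM heap inside the YARN container
            .set("spark.python.worker.memory", "1g")
            # extra off-heap headroom (in MB) requested from YARN
            .set("spark.yarn.executor.memoryOverhead", "1024"))

    sc = SparkContext(conf=conf)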

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
…attributes of the UDF? On Mon, Aug 8, 2016 at 5:59 PM, Davies Liu wrote: > On Mon, Aug 8, 2016 at 2:24 PM, Zoltan Fedor wrote: >> Hi all, >> I have an interesting issue trying to use UDFs from SparkSQL in Spark 2.0.0 using pyspark. …

java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-08 Thread Zoltan Fedor
Hi all, I have an interesting issue trying to use UDFs from SparkSQL in Spark 2.0.0 using pyspark. There is a big table (5.6 billion rows, 450 GB in memory) loaded into 300 executors' memory in SparkSQL, on which we would do some calculations using UDFs in pyspark. If I run my SQL on only a portion …
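
The setup described here would look roughly like the sketch below. The table name, column names, and UDF body are invented for illustration; the original post does not include code.

    # A rough sketch of the setup described above (Spark 2.0, pyspark).
    # Table name, columns, and the UDF body are invented for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import DoubleType

    spark = (SparkSession.builder
             .appName("udf-oom-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # pull the big table into executor memory
    spark.sql("CACHE TABLE big_table")

    # register a Python UDF and call it from SQL
    def my_calc(x):
        return x * 2.0 if x is not None else None

    spark.udf.register("my_calc", my_calc, DoubleType())

    # per the post, running over only a portion of the table works
    result = spark.sql(
        "SELECT key, my_calc(value) AS v FROM big_table WHERE part = '2016-08'")
    result.show()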

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
…spark.sql.hive.HiveContext.\n', JavaObject id=o24)) >>> On Thu, Oct 29, 2015 at 11:44 AM, Deenar Toraskar wrote: > Zoltan, > you should have these in your existing CDH 5.3; that's the best place to get them. Find where Spark is running from and …

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
…$SPARK_HOME/conf/yarn-conf and it worked. You may be better off with a custom build for the CDH 5.3.3 Hadoop, which you have already done. > Deenar > On 29 October 2015 at 14:35, Zoltan Fedor wrote: >> Sure, I did it with spark-shell, which seems to be showing …

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
…-Phive-thriftserver -DskipTests clean package. On Thu, Oct 29, 2015 at 11:05 AM, Kai Wei wrote: > Failed to see a hadoop-2.5 profile in the pom. Maybe that's the problem. > On 30 Oct 2015 1:51 am, "Zoltan Fedor" wrote: >> The funny thing is that with Spark 1.2.0 on the same machine …

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
The funny thing is that with Spark 1.2.0 on the same machine (Spark 1.2.0 is the default shipped with CDH 5.3.3) the same hive-site.xml is picked up and I have no problem whatsoever. On Thu, Oct 29, 2015 at 10:48 AM, Zoltan Fedor wrote: > Yes, I have the hive-site.xml in $SPARK_HOME/conf …

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
Yes, I have the hive-site.xml in $SPARK_HOME/conf, also in yarn-conf, in /etc/hive/conf, etc. On Thu, Oct 29, 2015 at 10:46 AM, Kai Wei wrote: > Did you try copying it to the spark/conf dir? > On 30 Oct 2015 1:42 am, "Zoltan Fedor" wrote: >> There is /user/biapp in hdfs. The …

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
There is /user/biapp in hdfs. The problem is that the hive-site.xml is being ignored, so it is looking for it locally. On Thu, Oct 29, 2015 at 10:40 AM, Kai Wei wrote: > Create /user/biapp in hdfs manually first. > On 30 Oct 2015 1:36 am, "Zoltan Fedor" wrote: …
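
If that manual step were scripted, it would be roughly the following; this is only a sketch that shells out to the hdfs CLI and assumes the hdfs binary is on the PATH.

    # A sketch of the manual step suggested above: create the user's HDFS home
    # directory. Assumes the `hdfs` binary is on the PATH.
    import subprocess

    subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", "/user/biapp"])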

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
…works. Can you possibly try running spark-shell and doing the same? > sqlContext.sql("show databases").collect > Deenar > On 29 October 2015 at 14:18, Zoltan Fedor wrote: >> Yes, I am. It was compiled with the following: >> export SPA…
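
Since the problem shows up from pyspark rather than spark-shell, the equivalent smoke test there would be the one-liner below (a sketch; in the pyspark shell, sqlContext is created automatically and is a HiveContext when Spark was built with Hive support).

    # pyspark equivalent of the spark-shell test quoted above (a sketch);
    # sqlContext is pre-created in the pyspark shell
    sqlContext.sql("show databases").collect()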

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
…package. On 29 October 2015 at 13:08, Zoltan Fedor wrote: >> Hi Deenar, >> As suggested, I have moved the hive-site.xml from HADOOP_CONF_DIR ($SPARK_HOME/hadoop-conf) to YARN_CONF_DIR ($SPARK_HOME/conf/yarn-conf) and >> use the b…
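
Scripted, the move described in this reply would look roughly like the sketch below; the two directory names come from the thread, but the script itself is illustrative.

    # A sketch of the configuration move described above. Directory names come
    # from the thread ($SPARK_HOME/hadoop-conf and $SPARK_HOME/conf/yarn-conf);
    # everything else is an assumption.
    import os
    import shutil

    spark_home = os.environ["SPARK_HOME"]
    src = os.path.join(spark_home, "hadoop-conf", "hive-site.xml")  # HADOOP_CONF_DIR
    dst_dir = os.path.join(spark_home, "conf", "yarn-conf")         # YARN_CONF_DIR
    shutil.move(src, os.path.join(dst_dir, "hive-site.xml"))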

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
…error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o20)) >>> On Thu, Oct 29, 2015 at 7:20 AM, Deenar Toraskar wrote: > Hi Zoltan, > Add hive-site.xml to your YARN_CONF_DIR, i.e. $SPARK_HOME/conf/yarn-conf. > Deenar …

No way to supply hive-site.xml in yarn client mode?

2015-10-28 Thread Zoltan Fedor
Hi, We have a shared CDH 5.3.3 cluster and are trying to use Spark 1.5.1 on it in yarn client mode with Hive. I have compiled Spark 1.5.1 with SPARK_HIVE=true, but it seems I am not able to make SparkSQL pick up the hive-site.xml when running pyspark. hive-site.xml is located in $SPARK_HOME/hadoop-conf …
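
A minimal reproduction of the failing setup would look roughly like this sketch; the app name is invented, and the HiveContext line is where the Py4JJavaError quoted in the replies above is raised when hive-site.xml cannot be found.

    # A minimal sketch of the failing setup (Spark 1.5, yarn-client mode).
    # The app name is invented; nothing here is verbatim from the post.
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import HiveContext

    conf = SparkConf().setMaster("yarn-client").setAppName("hive-site-smoke-test")
    sc = SparkContext(conf=conf)

    # the Py4JJavaError quoted in the replies is raised here when
    # hive-site.xml is not on the driver's configuration path
    sqlContext = HiveContext(sc)
    print(sqlContext.sql("show databases").collect())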