Reading spark-env.sh from configured directory

2017-08-02 Thread Lior Chaga
Hi, I have multiple spark deployments using mesos. I use spark.executor.uri to fetch the spark distribution to the executor node. Every time I upgrade spark, I download the default distribution and just add a custom spark-env.sh to the spark/conf folder. Furthermore, any change I want to do in spa
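
For context, a minimal sketch of the kind of Mesos deployment configuration described above; the master address, URI, and environment values are placeholders, not taken from the thread.

    # spark-defaults.conf on the submitting host (illustrative values):
    #   spark.master        mesos://zk://mesos-master:2181/mesos
    #   spark.executor.uri  hdfs:///dist/spark-custom-dist.tgz
    #
    # conf/spark-env.sh repacked into the custom distribution tarball:
    export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
    export SPARK_LOCAL_DIRS=/mnt/spark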

spark on mesos cluster - metrics with graphite sink

2016-06-09 Thread Lior Chaga
Hi, I'm launching spark application on mesos cluster. The namespace of the metric includes the framework id for driver metrics, and both framework id and executor id for executor metrics. These ids are obviously assigned by mesos, and they are not permanent - re-registering the application would re
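
For reference, a minimal metrics.properties sketch enabling a Graphite sink; the host, port, period, and prefix are placeholders, and none of these settings change the framework-id naming issue raised above.

    # conf/metrics.properties (illustrative values)
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=graphite.example.com
    *.sink.graphite.port=2003
    *.sink.graphite.period=10
    *.sink.graphite.unit=seconds
    *.sink.graphite.prefix=spark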

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-03-03 Thread Lior Chaga
alue for spark.shuffle.reduceLocality.enabled is true. >>> >>> To reduce surprise to users of 1.5 and earlier releases, should the >>> default value be set to false ? >>> >>> On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga wrote: >>>
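
A minimal sketch of overriding the flag discussed above at submit time; whether false is the better default is exactly what the thread debates.

    spark-submit --conf spark.shuffle.reduceLocality.enabled=false ...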

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-29 Thread Lior Chaga
bution change... fully reproducible > > On Sun, Feb 28, 2016 at 11:24 AM, Lior Chaga wrote: > >> Hi, >> I've experienced a similar problem upgrading from spark 1.4 to spark 1.6. >> The data is not evenly distributed across executors, but in my case it >> also repr

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-28 Thread Lior Chaga
Hi, I've experienced a similar problem upgrading from spark 1.4 to spark 1.6. The data is not evenly distributed across executors, but in my case it also reproduced with legacy mode. Also tried 1.6.1 rc-1, with the same results. Still looking for a resolution. Lior On Fri, Feb 19, 2016 at 2:01 AM, Koe

deleting application files in standalone cluster

2015-08-09 Thread Lior Chaga
Hi, Using spark 1.4.0 in standalone mode, with the following configuration: SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=86400" The cleanup interval is set to the default. Application files are not deleted. Using JavaSparkContext, and when the application ends it
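
For reference, a spark-env.sh sketch that also spells out the cleanup interval explicitly; the values are illustrative, and the thread itself leaves the interval at its default.

    # conf/spark-env.sh on each worker (illustrative values)
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=86400"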

Use rank with distribute by in HiveContext

2015-07-16 Thread Lior Chaga
Does spark HiveContext support the rank() ... distribute by syntax (as in the following article: http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive )? If not, how can it be achieved? Thanks, Lior
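
The linked article relies on a rank UDF combined with DISTRIBUTE BY / SORT BY; a window-function alternative (available in HiveContext from around Spark 1.4) is sketched below with hypothetical table and column names, as one way to express the same ranking.

    SELECT category, item, score
    FROM (
      SELECT category, item, score,
             rank() OVER (PARTITION BY category ORDER BY score DESC) AS r
      FROM items
    ) ranked
    WHERE r <= 3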

Re: spark sql - group by constant column

2015-07-15 Thread Lior Chaga
combinations. On Wed, Jul 15, 2015 at 10:09 AM, Lior Chaga wrote: > Hi, > > Facing a bug with group by in SparkSQL (version 1.4). > Registered a JavaRDD with object containing integer fields as a table. > > Then I'm trying to do a group by, with a constant value in the g

spark sql - group by constant column

2015-07-15 Thread Lior Chaga
Hi, Facing a bug with group by in SparkSQL (version 1.4). Registered a JavaRDD of objects containing integer fields as a table. Then I'm trying to do a group by, with a constant value in the group by fields: SELECT primary_one, primary_two, 10 as num, SUM(measure) as total_measures FROM tbl GRO
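
Illustratively, a query of the shape being described, written with the constant kept only in the SELECT list rather than in the GROUP BY clause; this is a commonly suggested workaround for constant grouping columns, not a fix confirmed by the thread.

    SELECT primary_one, primary_two, 10 AS num, SUM(measure) AS total_measures
    FROM tbl
    GROUP BY primary_one, primary_two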

Re: Help optimising Spark SQL query

2015-06-22 Thread Lior Chaga
Hi James, There are a few configurations that you can try: https://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options From my experience, the codegen really boosts things up. Just run sqlContext.sql("SET spark.sql.codegen=true") before you execute your query. But keep
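
A minimal Java sketch of setting the same flag programmatically rather than through a SQL statement; it assumes an existing SQLContext named sqlContext.

    // Equivalent to running "SET spark.sql.codegen=true" via sqlContext.sql(...)
    sqlContext.setConf("spark.sql.codegen", "true");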

Re: spark sql hive-shims

2015-05-14 Thread Lior Chaga
I see that the pre-built distributions include hive-shims-0.23 shaded in the spark-assembly jar (unlike when I make the distribution myself). Does anyone know what I should do to include the shims in my distribution? On Thu, May 14, 2015 at 9:52 AM, Lior Chaga wrote: > Ultimately it was Perm
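
One possible direction, not confirmed as the resolution of this thread: build the distribution with the Hive profiles enabled, which is what pulls the Hive shims into the assembly. A sketch for Spark 1.3-era sources, with illustrative profile and version flags:

    ./make-distribution.sh --tgz -Phive -Phive-thriftserver -Phadoop-2.4 -Dhadoop.version=2.4.0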

Re: spark sql hive-shims

2015-05-13 Thread Lior Chaga
Ultimately it was PermGen out of memory. I somehow missed it in the log On Thu, May 14, 2015 at 9:24 AM, Lior Chaga wrote: > After profiling with YourKit, I see there's an OutOfMemoryException in > context SQLContext.applySchema. Again, it's a very small RDD. Each executor
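
Since PermGen turned out to be the culprit, a minimal sketch of raising it on both the driver and the executors; the 512m value is illustrative, not from the thread.

    spark-submit \
      --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=512m" \
      --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512m" \
      ...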

Re: spark sql hive-shims

2015-05-13 Thread Lior Chaga
After profiling with YourKit, I see there's an OutOfMemoryException in context SQLContext.applySchema. Again, it's a very small RDD. Each executor has 180GB RAM. On Thu, May 14, 2015 at 8:53 AM, Lior Chaga wrote: > Hi, > > Using spark sql with HiveContext. Spark version is 1

spark sql hive-shims

2015-05-13 Thread Lior Chaga
Hi, Using spark sql with HiveContext. Spark version is 1.3.1 When running spark locally everything works fine. When running on a spark cluster I get ClassNotFoundError org.apache.hadoop.hive.shims.Hadoop23Shims. This class belongs to hive-shims-0.23, and is a runtime dependency for spark-hive: [INFO]

mapping JavaRDD to jdbc DataFrame

2015-05-04 Thread Lior Chaga
Hi, I'd like to use a JavaRDD containing parameters for an SQL query, and use SparkSQL jdbc to load data from MySQL. Consider the following pseudo code: JavaRDD namesRdd = ... ; ... options.put("url", "jdbc:mysql://mysql?user=usr"); options.put("password", "pass"); options.put("dbtable", "(SELEC
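
A minimal Java sketch in the spirit of the pseudo code above, using the Spark 1.3-style JDBC data source. The connection details, table, and query are placeholders, and collecting the parameter RDD to the driver to build the subquery is just one possible approach, not something the thread settles on.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    // Illustrative sketch only: loads rows from MySQL for the names held in namesRdd.
    static DataFrame loadByNames(SQLContext sqlContext, JavaRDD<String> namesRdd) {
        List<String> names = namesRdd.collect();               // bring the query parameters to the driver
        String inList = "'" + String.join("','", names) + "'"; // naive quoting, for illustration only
        Map<String, String> options = new HashMap<>();
        options.put("url", "jdbc:mysql://mysql/db?user=usr&password=pass");
        options.put("dbtable", "(SELECT id, name FROM people WHERE name IN (" + inList + ")) AS t");
        return sqlContext.load("jdbc", options);               // Spark 1.3 API; later versions use read().jdbc(...)
    }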

Re: Date class not supported by SparkSQL

2015-04-19 Thread Lior Chaga
lue; } public void setValue(Long value) { this.value = value; } } } On Sun, Apr 19, 2015 at 4:27 PM, Lior Chaga wrote: > Using Spark 1.2.0. Tried to apply register an RDD and got: > scala.MatchError: class java.util.Date (of class java.lang.

Date class not supported by SparkSQL

2015-04-19 Thread Lior Chaga
Using Spark 1.2.0. Tried to register an RDD and got: scala.MatchError: class java.util.Date (of class java.lang.Class) I see it was resolved in https://issues.apache.org/jira/browse/SPARK-2562 (included in 1.2.0) Has anyone encountered this issue? Thanks, Lior
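
A minimal Java sketch of the workaround most often suggested for this MatchError: use java.sql.Timestamp instead of java.util.Date in the bean that gets registered as a table. This is a common suggestion, not a conclusion drawn from the thread.

    import java.io.Serializable;
    import java.sql.Timestamp;

    // Bean intended for registration as a table: java.sql.Timestamp maps to a SQL
    // timestamp column, whereas java.util.Date is what triggered the MatchError above.
    public class Event implements Serializable {
        private Timestamp createdAt;   // was java.util.Date
        private Long value;
        public Timestamp getCreatedAt() { return createdAt; }
        public void setCreatedAt(Timestamp createdAt) { this.createdAt = createdAt; }
        public Long getValue() { return value; }
        public void setValue(Long value) { this.value = value; }
    }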

using log4j2 with spark

2015-03-04 Thread Lior Chaga
Hi, Trying to run spark 1.2.1 w/ hadoop 1.0.4 on a cluster and configure it to run with log4j2. The problem is that spark-assembly.jar contains log4j and slf4j classes compatible with log4j 1.2, and so it detects it should use log4j 1.2 ( https://github.com/apache/spark/blob/54e7b456dd56c9e52132154
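
One direction sometimes taken for this kind of conflict, shown only as a hedged sketch and not as the outcome of this thread: ship the log4j2 jars and configuration with the job and point both JVMs at them. Whether this actually wins over the log4j 1.2 classes baked into the assembly depends on classloader ordering, which is the crux of the problem described above.

    spark-submit \
      --jars log4j-api-2.x.jar,log4j-core-2.x.jar,log4j-slf4j-impl-2.x.jar \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configurationFile=log4j2.xml" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configurationFile=log4j2.xml" \
      ...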