Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-02 Thread John Zhuge
Hi, I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is spark-env.sh sourced when starting the Spark AM container or the executor container? I saw this paragraph on https://github.com/apache/spark/blob/master/docs/configuration.md: Note: When running Spark on YARN in cluster
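For context, spark-env.sh is itself just a shell script that Spark's launch scripts source to set environment variables; a minimal sketch (the exported names follow the standard conf/spark-env.sh.template, the values are illustrative only):

```shell
# conf/spark-env.sh -- sourced as a shell script by Spark's launch scripts;
# exported variables are visible to the JVMs those scripts start.
export HADOOP_CONF_DIR=/etc/hadoop/conf   # point Spark at the YARN/HDFS client configs
export SPARK_EXECUTOR_MEMORY=2g           # illustrative sizing; adjust per cluster
```

Whether this file is also sourced inside YARN-launched AM/executor containers (as opposed to on the submitting machine) is exactly the question being asked here.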

Re: [Spark SQL] How to run a custom meta query for `ANALYZE TABLE`

2018-01-02 Thread Jörn Franke
Hi, no, this is not possible with the current data source API. However, there is a new data source API v2 on its way; maybe it will support it. Alternatively, you can have a config option to calculate metadata after an insert. However, could you please explain more for which DB your

[Spark SQL] How to run a custom meta query for `ANALYZE TABLE`

2018-01-02 Thread Jason Heo
Hi, I'm working on integrating Spark and a custom data source. Most things go well with the nice Spark Data Source APIs (thanks to the well-designed APIs), but one thing I couldn't resolve is how to execute a custom meta query for `ANALYZE TABLE`. The custom data source I'm currently working on has

Unclosed NingWSClient holds up a Spark application

2018-01-02 Thread Lalwani, Jayesh
I noticed some weird behavior with NingWSClient 2.4.3 when used with Spark. Try this: 1. Spin up spark-shell with play-ws 2.4.3 on the driver class path. 2. Run this code: val config = new AsyncHttpClientConfigBean() config.setAcceptAnyCertificate(true) config.setFollowRedirect(true) val

Re: Converting binary files

2018-01-02 Thread Lalwani, Jayesh
You can repartition your DataFrame into 1 partition and all the data will land in one partition. However, doing this is perilous because you will end up with all your data on one node, and if you have too much data you will run out of memory. In fact, anytime you are thinking about putting

Current way of using functions.window with Java

2018-01-02 Thread Anton Puzanov
I am writing a sliding-window analytics program and use the functions.window function ( https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/functions.html#window(org.apache.spark.sql.Column,%20java.lang.String,%20java.lang.String) ) The code looks like this: Column slidingWindow =

Re: Spark on EMR suddenly stalling

2018-01-02 Thread Gourav Sengupta
Hi Jeroen, in case you are using Hive partitions, how many partitions do you have? Also, is there any chance that you might post the code? Regards, Gourav Sengupta On Tue, Jan 2, 2018 at 7:50 AM, Jeroen Miller wrote: > Hello Gourav, > > On 30 Dec 2017, at 20:20, Gourav