Re: Spark Window Documentation

2020-05-08 Thread Jacek Laskowski
Hi Neeraj, I'd start from "Contributing Documentation Changes" in https://spark.apache.org/contributing.html Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books Follow me on https://twitter.com/jaceklaskowski

Re: Spark Window Documentation

2020-05-08 Thread neeraj bhadani
Thanks Jacek for sharing the details. I can see some examples here https://github.com/apache/spark/blob/master/python/pyspark/sql/window.py#L83, as mentioned in the original email, but I am not sure where this is reflected in the Spark documentation. Also, what would be the process to contribute to the spark

Re: java.lang.OutOfMemoryError Spark Worker

2020-05-08 Thread Russell Spitzer
The error is in the Spark Standalone Worker: it's hitting an OOM while launching/running an executor process. Specifically, it's running out of memory when parsing the Hadoop configuration while trying to figure out the env/command line to run
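If the Worker daemon's own heap is the one overflowing, one possible mitigation (a sketch only, assuming a default Standalone setup; the 2g value is an arbitrary example, not a recommendation) is to raise the daemon heap via SPARK_DAEMON_MEMORY in conf/spark-env.sh:

```shell
# conf/spark-env.sh -- hypothetical value, for illustration only.
# SPARK_DAEMON_MEMORY sizes the heap of the standalone Master/Worker
# daemons themselves (default 1g); it is separate from driver and
# executor memory, which spark-submit flags control.
export SPARK_DAEMON_MEMORY=2g
```

Workers pick this up on restart; it does not affect already-running executors.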

Re: java.lang.OutOfMemoryError Spark Worker

2020-05-08 Thread Hrishikesh Mishra
We submit the Spark job through the spark-submit command, like the one below.

sudo /var/lib/pf-spark/bin/spark-submit \
  --total-executor-cores 30 \
  --driver-cores 2 \
  --class com.hrishikesh.mishra.Main \
  --master spark://XX.XX.XXX.19:6066 \
  --deploy-mode cluster \
  --supervise

Re: java.lang.OutOfMemoryError Spark Worker

2020-05-08 Thread Jacek Laskowski
Hi, It's been a while since I worked with Spark Standalone, but I'd check the logs of the workers. How do you spark-submit the app? Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online

Re: java.lang.OutOfMemoryError Spark Worker

2020-05-08 Thread Hrishikesh Mishra
Thanks Jacek for the quick response. Due to our system constraints, we can't move to Structured Streaming now, but YARN can definitely be tried out. My problem is that I'm unable to figure out where the issue is: Driver, Executor, or Worker. Even the exceptions are clueless. Please see the below exception,

Re: Spark structured streaming - performance tuning

2020-05-08 Thread Srinivas V
Can anyone else answer the below questions on performance tuning for Structured Streaming? @Jacek? On Sun, May 3, 2020 at 12:07 AM Srinivas V wrote: > Hi Alex, I read the book; it is a good one, but I don't see the things which I > strongly want to understand. > You are right on the partitions and tasks. >

Re: java.lang.OutOfMemoryError Spark Worker

2020-05-08 Thread Jacek Laskowski
Hi, Sorry for being perhaps too harsh, but when you asked "Am I missing something?" and I noticed "Kafka Direct Stream" and "Spark Standalone Cluster", I immediately thought "Yeah... please upgrade your Spark env to use Spark Structured Streaming at the very least and/or use YARN as the

Re: Spark Window Documentation

2020-05-08 Thread Jacek Laskowski
Hi Neeraj, I'm not a committer so I might be wrong, but there is no "blessed way" to include examples. There are some examples in the official documentation at http://spark.apache.org/docs/latest/sql-programming-guide.html, but these show how to use the general concepts, not specific operators.

Re: java.lang.OutOfMemoryError Spark Worker

2020-05-08 Thread Hrishikesh Mishra
These errors are completely clueless; there is no clue why the OOM exception is occurring.

20/05/08 15:36:55 INFO Worker: Asked to kill driver driver-20200508153502-1291
20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
20/05/08 15:36:55 INFO CommandUtils: Redirection to

Spark Window Documentation

2020-05-08 Thread neeraj bhadani
Hi Team, I was looking for a Spark window function example in the documentation. For example, I could see that the function definition and params are explained nicely here: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Window.rowsBetween and this is the source which is
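For anyone looking for a runnable illustration in the meantime: the frame that `Window.rowsBetween(Window.unboundedPreceding, Window.currentRow)` defines can be sketched in plain Python, with no Spark cluster needed. The data, column names, and partitioning key below are invented for illustration; the PySpark equivalent is shown only in comments and is a sketch, not a tested snippet from the docs.

```python
# Plain-Python sketch of the frame semantics behind a running total with
#   Window.partitionBy("dept").orderBy("salary")
#         .rowsBetween(Window.unboundedPreceding, Window.currentRow)
# A hypothetical PySpark equivalent (column names invented) would be:
#   from pyspark.sql import Window, functions as F
#   w = (Window.partitionBy("dept").orderBy("salary")
#              .rowsBetween(Window.unboundedPreceding, Window.currentRow))
#   df.withColumn("running_total", F.sum("salary").over(w))

from itertools import groupby
from operator import itemgetter

def running_total(rows):
    """Per-partition running sum over the frame
    (unbounded preceding .. current row), ordered by salary."""
    out = []
    rows = sorted(rows, key=itemgetter("dept", "salary"))
    for dept, grp in groupby(rows, key=itemgetter("dept")):
        total = 0
        for r in grp:  # ordered by salary within the "dept" partition
            total += r["salary"]  # frame: everything up to the current row
            out.append({**r, "running_total": total})
    return out

data = [
    {"dept": "a", "salary": 100},
    {"dept": "a", "salary": 200},
    {"dept": "b", "salary": 50},
]
# running_total(data) yields running totals 100, 300 for dept "a"
# and 50 for dept "b"
```

The point is that the frame bounds select which rows of the ordered partition feed the aggregate for each output row.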

Re: How to populate all possible combination values in columns using Spark SQL

2020-05-08 Thread Edgardo Szrajber
Have you checked the pivot function? Bentzi. On Thu, May 7, 2020 at 22:46, Aakash Basu wrote: Hi, I've updated the SO question with masked data, added the year column and the other requirements. Please take a look. Hope this helps in solving the problem. Thanks and
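For reference, pivot turns the distinct values of one column into output columns. In PySpark the call would be roughly `df.groupBy("year").pivot("product").sum("amount")` (column names invented here, since the original data is masked); the reshaping itself can be sketched in plain Python:

```python
# Plain-Python sketch of what a pivot does: one row per "year",
# one column per distinct "product", values aggregated by sum.
# Hypothetical PySpark equivalent (column names invented):
#   df.groupBy("year").pivot("product").sum("amount")

def pivot_sum(rows, index, columns, values):
    """Reshape long rows into a {index_value: {column_value: sum}} table."""
    table = {}
    for r in rows:
        bucket = table.setdefault(r[index], {})
        bucket[r[columns]] = bucket.get(r[columns], 0) + r[values]
    return table

sales = [
    {"year": 2019, "product": "x", "amount": 10},
    {"year": 2019, "product": "y", "amount": 5},
    {"year": 2020, "product": "x", "amount": 7},
]
# pivot_sum(sales, "year", "product", "amount")
# -> {2019: {"x": 10, "y": 5}, 2020: {"x": 7}}
```

Missing combinations simply come out as absent keys, which is how you can spot combinations that never occur in the data.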

Re: No. of active states?

2020-05-08 Thread Edgardo Szrajber
This should open a new world of real-time metrics for you: "How to get Spark Metrics as JSON using Spark REST API in YARN Cluster mode" by Anbu Cheeralan. Spark provides the metrics in UI.
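Spark's monitoring REST API lives under `/api/v1` on the application UI (or the history server). A minimal sketch, assuming an app UI reachable on port 4040; the host, port, and application id below are placeholders:

```python
# Build endpoints for Spark's monitoring REST API (/api/v1).
# Host, port, and app id are placeholders; the actual fetch is left
# commented out because it needs a running Spark UI to answer.

def metrics_url(host, port, app_id, resource="stages"):
    """URL for e.g. /api/v1/applications/<app-id>/stages."""
    return f"http://{host}:{port}/api/v1/applications/{app_id}/{resource}"

url = metrics_url("localhost", 4040, "app-20200508153502-0001")
# import json, urllib.request
# stages = json.load(urllib.request.urlopen(url))  # requires a live Spark UI
```

In YARN cluster mode the same paths are served through the YARN proxy or the history server rather than port 4040 directly.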