RE: Understanding Executors UI

2021-01-08 Thread Luca Canali
You report 'Storage Memory': 3.3 TB / 598.5 GB -> The first number is the memory used for storage, the second one is the memory available for storage in the unified memory pool. The used memory shown in your web UI snippet is indeed quite high (higher than the available memory!?), you can
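For context on where that second number comes from, here is a minimal sketch of the default sizing of the unified (execution + storage) pool, assuming on-heap memory only, the default `spark.memory.fraction` of 0.6, and Spark's fixed 300 MB reserved memory; the heap size below is just an illustration:

```python
def unified_pool_bytes(executor_heap_bytes,
                       memory_fraction=0.6,                 # spark.memory.fraction default
                       reserved_bytes=300 * 1024 * 1024):   # fixed reserved memory
    """Approximate size of Spark's unified (execution + storage) memory pool."""
    return (executor_heap_bytes - reserved_bytes) * memory_fraction

# e.g. a 10 GiB executor heap leaves roughly 5.8 GiB for the unified pool
pool = unified_pool_bytes(10 * 1024**3)
```

Storage and execution borrow from each other inside this pool, which is why the "available for storage" figure in the UI is not a hard cap.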

Re: Spark standalone - reading kerberos hdfs

2021-01-08 Thread Sudhir Babu Pothineni
In case of Spark on YARN, the Application Master shares the token. I think in case of Spark standalone the token is not shared with the executor; any example of how to get the HDFS token for the executor? On Fri, Jan 8, 2021 at 12:13 PM Gabor Somogyi wrote: > TGT is not enough, you need HDFS token which can be

Re: Spark standalone - reading kerberos hdfs

2021-01-08 Thread Gabor Somogyi
TGT is not enough, you need an HDFS token which can be obtained by Spark. Please check the logs... On Fri, 8 Jan 2021, 18:51 Sudhir Babu Pothineni, wrote: > I spin up a spark standalone cluster (spark.autheticate=false), submitted > a job which reads remote kerberized HDFS, > > val spark =

Spark standalone - reading kerberos hdfs

2021-01-08 Thread Sudhir Babu Pothineni
I spun up a Spark standalone cluster (spark.authenticate=false) and submitted a job which reads remote kerberized HDFS: val spark = SparkSession.builder() .master("spark://spark-standalone:7077") .getOrCreate() UserGroupInformation.loginUserFromKeytab(principal,
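Standalone mode lacks the YARN Application Master that distributes delegation tokens, so one hedged direction (assuming Spark 3.x, where the `spark.kerberos.principal` / `spark.kerberos.keytab` settings exist; whether standalone executors honour them depends on the version, and the principal and keytab path below are placeholders) is to pass the keytab via configuration so each JVM can log in itself:

```python
def kerberos_submit_args(principal, keytab):
    """Build the --conf arguments for a keytab-based login (a sketch;
    in Spark 2.x the equivalent keys were spark.yarn.principal/keytab)."""
    conf = {
        "spark.kerberos.principal": principal,
        "spark.kerberos.keytab": keytab,
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args

# hypothetical principal and keytab location
args = kerberos_submit_args("user@EXAMPLE.COM", "/etc/security/user.keytab")
```

The resulting list would be appended to the spark-submit command line; the keytab must be readable on every worker for this to help.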

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Just to clarify, are you referring to module dependencies in PySpark? With Scala I can create an uber jar file, inclusive of all bits and pieces, built with Maven or sbt, that works in a cluster, and submit it to spark-submit as an uber jar file. What alternatives would you suggest for PySpark, a zip

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Sean Owen
This isn't going to help submitting to a remote cluster though. You need to explicitly include dependencies in your submit. On Fri, Jan 8, 2021 at 11:15 AM Mich Talebzadeh wrote: > Hi Riccardo > > This is the env variables at runtime > > PYTHONUNBUFFERED=1;*PYTHONPATH=* >

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Hi Riccardo, These are the env variables at runtime: PYTHONUNBUFFERED=1;*PYTHONPATH=*

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Hi Sean, sparkstuff.py is under packages/sparutils/sparkstuff.py as shown below [image: image.png] So within PyCharm, it is picked up OK. However, at terminal level, it is not picked up. This is a snapshot of PyCharm. The module I am trying to run is called analyze_house_prices_GCP.py

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Riccardo Ferrari
I think Spark checks the PYTHONPATH env variable. You need to provide that. Of course, that works in local mode only. On Fri, Jan 8, 2021, 5:28 PM Sean Owen wrote: > I don't see anywhere that you provide 'sparkstuff'? how would the Spark > app have this code otherwise? > > On Fri, Jan 8, 2021 at

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Sean Owen
I don't see anywhere that you provide 'sparkstuff'? How would the Spark app have this code otherwise? On Fri, Jan 8, 2021 at 10:20 AM Mich Talebzadeh wrote: > Thanks Riccardo. > > I am well aware of the submission form > > However, my question relates to doing submission within PyCharm itself.

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Thanks Riccardo. I am well aware of the submission form. However, my question relates to doing the submission within PyCharm itself. This is what I do at the PyCharm *terminal* to invoke the python module: spark-submit --jars ..\lib\spark-bigquery-with-dependencies_2.12-0.18.0.jar \ --packages

Re: PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Riccardo Ferrari
You need to provide your python dependencies as well. See http://spark.apache.org/docs/latest/submitting-applications.html, look for --py-files HTH On Fri, Jan 8, 2021 at 3:13 PM Mich Talebzadeh wrote: > Hi, > > I have a module in Pycharm which reads data stored in a Bigquery table and > does
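As an illustration of what --py-files expects, here is a minimal sketch that zips a package directory into an archive Spark can ship to the executors (the directory and module names mirror the thread but are placeholders):

```python
import os
import zipfile

def package_for_py_files(package_dir, zip_path):
    """Zip a Python package so `spark-submit --py-files <zip_path>` can ship it.

    Archive entries are stored relative to the parent of package_dir, so
    that e.g. `import sparkutils.sparkstuff` resolves on the executors.
    """
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w") as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    zf.write(full, os.path.relpath(full, parent))
```

The resulting archive is then passed as `spark-submit --py-files deps.zip your_app.py`, which makes the modules importable on both the driver and the executors.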

Re: Converting spark batch to spark streaming

2021-01-08 Thread Jacek Laskowski
Hi, Start with DataStreamWriter.foreachBatch. Regards, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books Follow me on https://twitter.com/jaceklaskowski On Thu, Jan 7, 2021 at 6:55 PM mhd wrk
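A minimal sketch of the foreachBatch approach (the sink format and output path below are placeholders): the callback receives each micro-batch as an ordinary DataFrame, so existing batch-writing code can be reused almost unchanged.

```python
def write_batch(batch_df, epoch_id):
    """Called once per micro-batch; batch_df supports the normal batch write API."""
    batch_df.write.format("parquet").mode("append").save("/tmp/out")

# In a streaming job this would be wired up roughly as:
# streaming_df.writeStream.foreachBatch(write_batch).start()
```

Because `write_batch` only relies on the batch `DataFrameWriter` API, the existing batch job's save logic can usually be moved into it verbatim.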

PyCharm, Running spark-submit calling jars and a package at run time

2021-01-08 Thread Mich Talebzadeh
Hi, I have a module in PyCharm which reads data stored in a BigQuery table and does plotting. At the command line on the terminal I need to add the jar file and the package to make it work. (venv) C:\Users\admin\PycharmProjects\pythonProject2\DS\src>spark-submit --jars