Re: Spark UI

2020-07-20 Thread ArtemisDev
Thanks Xiao for the info.  I was looking for this, too.  This page wasn't linked from anywhere on the main doc page (Overview) or any of the pull-down menus.  Someone should remind the doc team to update the table of contents on the Overview page. -- ND On 7/19/20 10:30 PM, Xiao Li wrote: htt

Using Spark UI with Running Spark on Hadoop Yarn

2020-07-13 Thread ArtemisDev
Is there any way to make the Spark process visible via the Spark UI when running Spark 3.0 on a Hadoop YARN cluster?  The Spark documentation talks about replacing the Spark UI with the Spark history server, but doesn't give much detail.  Therefore I would assume it is still possible to use Spark UI wh

org.apache.spark.deploy.yarn.ExecutorLauncher not found when running Spark 3.0 on Hadoop

2020-07-13 Thread ArtemisDev
I've been trying to set up the latest stable version of Spark 3.0 on a Hadoop cluster using YARN.  When running spark-submit in client mode, I always got an error of org.apache.spark.deploy.yarn.ExecutorLauncher not found.  This happened when I preloaded the Spark jar files onto HDFS and specifie

Re: File Not Found: /tmp/spark-events in Spark 3.0

2020-07-05 Thread ArtemisDev
Thank you all for the responses.  I believe the user shouldn't have to worry about creating the log dir explicitly.  Event logging should behave like the other logs (e.g. master or slave), in that the directory should be created automatically if it does not exist. -- ND On 7/2/20 9:19 AM, Zero wrote: This

Re: Spark 3.0 almost 1000 times slower to read json than Spark 2.4

2020-06-29 Thread ArtemisDev
Could you share your code?  Are you sure your Spark 2.4 cluster had indeed read anything?  Looks like the Input size field is empty under 2.4. -- ND On 6/27/20 7:58 PM, Sanjeev Mishra wrote: I have a large amount of JSON files that Spark can read in 36 seconds but Spark 3.0 takes almost 33 minu

File Not Found: /tmp/spark-events in Spark 3.0

2020-06-29 Thread ArtemisDev
While launching a Spark job from Zeppelin against a standalone Spark cluster (Spark 3.0 with multiple workers, without Hadoop), we encountered a Spark interpreter exception caused by an I/O File Not Found exception due to the non-existence of the /tmp/spark-events directory.  We had to creat
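
A minimal sketch of the workaround described above, creating the event log directory before a job is launched. The helper name ensure_event_log_dir is hypothetical, and the default path assumes spark.eventLog.dir has been left at /tmp/spark-events on the machine that writes the event log; adjust it if your configuration differs.

```python
import os
import tempfile


def ensure_event_log_dir(path="/tmp/spark-events"):
    """Create the event log directory if it is missing.

    Hypothetical helper: the default path is an assumption and should
    match whatever spark.eventLog.dir points to in your setup.
    Returns the absolute path of the directory.
    """
    # exist_ok=True makes the call safe to repeat before every launch.
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)


if __name__ == "__main__":
    # Demonstrate on a throwaway location instead of touching /tmp/spark-events.
    demo_dir = os.path.join(tempfile.mkdtemp(), "spark-events")
    print(ensure_event_log_dir(demo_dir))
```

Calling the helper more than once is harmless, so it can sit in whatever script starts the interpreter or submits the job.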

Re: Where are all the jars gone ?

2020-06-24 Thread ArtemisDev
If you are using Maven to manage your jar dependencies, the jar files are located in the Maven repository in your home directory. It is usually in the .m2 directory. Hope this helps. -ND On 6/23/20 3:21 PM, Anwar AliKhan wrote: Hi, I prefer to do most of my projects in Python and for that I
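
As a rough illustration of the reply above, the sketch below lists the jar files under the local Maven repository. It assumes Maven's default location of .m2/repository under the home directory (a custom settings.xml can move it), and the function name list_local_maven_jars is hypothetical.

```python
from pathlib import Path


def list_local_maven_jars(repo=None):
    """Return a sorted list of jar files under a local Maven repository.

    Hypothetical helper: when repo is None, Maven's default location
    (~/.m2/repository) is assumed.
    """
    repo = Path(repo) if repo is not None else Path.home() / ".m2" / "repository"
    if not repo.is_dir():
        # No local repository yet, e.g. Maven has never run on this machine.
        return []
    return sorted(repo.rglob("*.jar"))


if __name__ == "__main__":
    for jar in list_local_maven_jars():
        print(jar)
```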

Structured Streaming using File Source - How to handle live files

2020-06-07 Thread ArtemisDev
We were trying to use Structured Streaming with a file source, but had problems getting the files read by Spark properly.  We have another process generating the data files in the Spark data source directory on a continuous basis.  What we observed was that the moment a data file is created