Re: File not found exceptions on S3 while running spark jobs

2020-07-16 Thread Hulio andres
https://examples.javacodegeeks.com/java-io-filenotfoundexception-how-to-solve-file-not-found-exception/ Are you a programmer? Regards, Hulio > Sent: Friday, July 17, 2020 at 2:41 AM > From: "Nagendra Darla" > To: user@spark.apache.org > Subject: File not found exceptions on S3 while

File not found exceptions on S3 while running spark jobs

2020-07-16 Thread Nagendra Darla
Hello All, I am converting an existing Parquet table (size: 50 GB) into Delta format. It took around 1 hr 45 mins to convert, and I see that there are a lot of FileNotFoundExceptions in the logs: Caused by: java.io.FileNotFoundException: No such file or directory:
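
For reference, a minimal sketch of this kind of in-place conversion using the Delta Lake API (the S3 path and the session settings below are assumptions, not taken from the post):

    import io.delta.tables.DeltaTable
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("convert-parquet-to-delta")
      // Required on Delta Lake 0.7+ with Spark 3.0; earlier versions can omit these.
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // In-place conversion: Delta scans the existing Parquet files and writes a
    // _delta_log directory next to them. The bucket and path are placeholders.
    DeltaTable.convertToDelta(spark, "parquet.`s3a://my-bucket/tables/events`")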

Re: “Pyspark.zip does not exist” using Spark in cluster mode with Yarn

2020-07-16 Thread Hulio andres
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-10795 https://stackoverflow.com/questions/34632617/spark-python-submission-error-file-does-not-exist-pyspark-zip > Sent:

Re: Using spark.jars conf to override jars present in spark default classpath

2020-07-16 Thread Russell Spitzer
That's what I'm saying you don't want to do :) If you have two versions of a library with different APIs, the safest approach is shading; ordering probably can't be relied on. In my experience, reflection will behave in ways you may not like, as will which classpath has priority when a class is
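
As an illustration of the shading approach, a build.sbt fragment using the sbt-assembly plugin (the plugin choice and the package names are assumptions, not taken from the thread):

    // Relocate the bundled copy of the conflicting library so the application
    // uses its own classes instead of the ones on Spark's default classpath.
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.example.conflictlib.**" -> "shaded.conflictlib.@1").inAll
    )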

Re: Using spark.jars conf to override jars present in spark default classpath

2020-07-16 Thread Nupur Shukla
Thank you Russell and Jeff. My bad, I wasn't clear before about the conflicting jars. By that, I meant my application needs to use newer versions of certain jars than those present in the default classpath. What would be the best way to use the confs spark.jars and spark.driver.extraClassPath

Re: Using spark.jars conf to override jars present in spark default classpath

2020-07-16 Thread Jeff Evans
If you can't avoid it, you need to make use of the spark.driver.userClassPathFirst and/or spark.executor.userClassPathFirst properties. On Thu, Jul 16, 2020 at 2:03 PM Russell Spitzer wrote: > I believe the main issue here is that spark.jars is a bit "too late" to > actually prepend things to
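
For illustration, a sketch of setting those properties programmatically through Spark's SparkLauncher API; in practice they are usually supplied as --conf options to spark-submit, and the jar paths, class name, and master below are placeholders:

    import org.apache.spark.launcher.SparkLauncher

    val handle = new SparkLauncher()
      .setAppResource("/path/to/my-app-assembly.jar")      // placeholder
      .setMainClass("com.example.MyApp")                   // placeholder
      .setMaster("yarn")
      .addJar("/path/to/updated-lib-2.0.jar")              // the newer jar to prefer
      // Both properties are marked experimental in the Spark documentation.
      .setConf("spark.driver.userClassPathFirst", "true")
      .setConf("spark.executor.userClassPathFirst", "true")
      .startApplication()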

Re: Using spark.jars conf to override jars present in spark default classpath

2020-07-16 Thread Russell Spitzer
I believe the main issue here is that spark.jars is a bit "too late" to actually prepend things to the classpath. For most use cases, this value is not read until after the JVM has already started and the system classloader has already loaded. The jar argument gets added via the dynamic class
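
A small diagnostic that can help when sorting out which classpath entry wins (not from the thread; the class name is only an example):

    object WhichJar {
      def main(args: Array[String]): Unit = {
        // Print the jar a class was actually loaded from, to verify whether an
        // override took effect. CodeSource is null for bootstrap classes.
        val clazz = Class.forName("com.fasterxml.jackson.databind.ObjectMapper")
        val location = Option(clazz.getProtectionDomain.getCodeSource).map(_.getLocation)
        println(s"${clazz.getName} loaded from ${location.getOrElse("the bootstrap classpath")}")
      }
    }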

Using spark.jars conf to override jars present in spark default classpath

2020-07-16 Thread Nupur Shukla
Hello, How can we use *spark.jars* to specify conflicting jars (that is, jars that are already present in Spark's default classpath)? Jars specified in this conf get "appended" to the classpath, and thus get looked at after the default classpath. Is it not intended to be used to specify
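
A minimal sketch of the configuration the question describes (the jar path is a placeholder); as the replies above note, jars listed this way are shipped to the cluster but end up after the default classpath rather than overriding it:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("spark-jars-example")
      // Distributed to the driver and executors, but appended to the classpath.
      .config("spark.jars", "/path/to/updated-lib-2.0.jar")
      .getOrCreate()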

“Pyspark.zip does not exist” using Spark in cluster mode with Yarn

2020-07-16 Thread Davide Curcio
I'm trying to run a Spark script in cluster mode using Yarn, but I always get this error. I read in other similar questions that the cause can be: "local" hard-coded as the master, but I don't have that; a wrong HADOOP_CONF_DIR environment variable inside spark-env.sh, but it