Re: File not found exceptions on S3 while running spark jobs

2020-07-17 Thread Hulio andres
Most likely a directory write-permission problem: the app user doesn't have permission to write files to that directory. > Sent: Friday, July 17, 2020 at 6:03 PM > From: "Nagendra Darla" > To: "Hulio andres" > Cc: user@spark.apache.org > Subject: Re: File not found exceptions on S3 while r

Re: File not found exceptions on S3 while running spark jobs

2020-07-17 Thread Slava Rodionov
Hi, these are only my thoughts, not a solution, but I hope they help. First of all, we need a full stack trace, not just an exception, to draw a conclusion. I see you're using s3a. Where do you run your job? Is that EMR? Normally you need to make S3 more consistent first to make it usable. This mean

Garbage collection issue

2020-07-17 Thread Amit Sharma
Hi all, I am running the same batch job on two separate Spark clusters. On one of the clusters, the Spark UI shows a GC warning under the Executors tab: garbage collection takes around 20% of the time, while on the other cluster it is under 10%. I am using the same configuration in my spar
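
To compare the two clusters on the same footing, the GC share can be computed per executor rather than read off the UI. The sketch below is only an illustration: it assumes executor summaries shaped like the ones returned by Spark's monitoring REST API, with `totalGCTime` and `totalDuration` fields in milliseconds, and the 10% threshold is taken from the figures quoted in the message. The helper names are hypothetical.

```python
def gc_fraction(executor):
    """Return GC time as a fraction of total task time for one executor.

    `executor` is a dict assumed to carry `totalGCTime` and `totalDuration`
    (both in milliseconds), as in an executor summary.
    """
    duration = executor.get("totalDuration", 0)
    if duration <= 0:
        # No task time recorded yet, so there is nothing to compare against.
        return 0.0
    return executor.get("totalGCTime", 0) / duration


def gc_heavy_executors(executors, threshold=0.10):
    """Return the ids of executors whose GC fraction exceeds `threshold`."""
    return [e["id"] for e in executors if gc_fraction(e) > threshold]
```

Running this against the executor lists of both clusters would show whether the extra GC time is spread evenly or concentrated on a few executors, which points toward skew or memory sizing rather than configuration.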

Re: Using pyspark with Spark 2.4.3 a MultiLayerPerceptron model gives inconsistent outputs if a large amount of data is fed into it and at least one of the model outputs is fed to a Python UDF.

2020-07-17 Thread Sean Owen
I can't reproduce it (on Databricks / Spark 2.4), but as you say, sounds really specific to some way of executing it. I can't off the top of my head imagine why that would be. As you say, no matter the model, it should be the same result. I don't recall a bug being fixed around there, but neverthel

Re: File not found exceptions on S3 while running spark jobs

2020-07-17 Thread Nagendra Darla
Hi, thanks. I know about the FileNotFoundException. This error occurs with S3 buckets, which have a delay in showing newly created files; these files eventually show up after some time. The errors come up while running a Parquet table into a Delta table. My question is more around avoiding this er
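
One generic way to tolerate files that appear with a delay is to poll with exponential backoff before reading. The sketch below is not taken from the thread and is not a Spark or Delta API: `wait_until_visible` is a hypothetical helper, and `exists` stands for whatever existence check the job already has for its storage layer.

```python
import time


def wait_until_visible(exists, path, attempts=5, base_delay_s=1.0, sleep=time.sleep):
    """Poll `exists(path)` with exponential backoff.

    Returns True as soon as the path is visible, or False if it is still
    missing after `attempts` checks. `exists` is a caller-supplied function
    (an assumption here); `sleep` is injectable so the helper can be tested
    without actually waiting.
    """
    for attempt in range(attempts):
        if exists(path):
            return True
        if attempt < attempts - 1:
            # Delays grow as base, 2*base, 4*base, ... between checks.
            sleep(base_delay_s * (2 ** attempt))
    return False
```

A job could call this before opening a freshly written file and fail with a clear message when it returns False, instead of surfacing a bare file-not-found error.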

Re: Spark 3.0.0 spark.read.json never completes

2020-07-17 Thread JasonLee
Hi, is there any error?

Future timeout

2020-07-17 Thread Amit Sharma
Hi, sometimes my Spark Streaming job throws this exception: Futures timed out after [300 seconds]. I am not sure where the default timeout is configured. Can I increase it? Please help. Thanks, Amit. Caused by: java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
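
The message does not say which timeout fired, so the following is a guess rather than a diagnosis. One setting whose default is 300 seconds is `spark.sql.broadcastTimeout`; `spark.network.timeout` is another common candidate, although its default differs. The sketch assumes one of these is the culprit and only builds the `--conf` arguments for `spark-submit`; the helper names and the 600-second values are illustrative.

```python
def timeout_overrides(broadcast_timeout_s=600, network_timeout_s=600):
    """Return candidate Spark settings to raise (values are illustrative).

    Assumption: the 300-second limit comes from one of these settings;
    the full stack trace is needed to confirm which one applies.
    """
    return {
        # Plain number of seconds.
        "spark.sql.broadcastTimeout": str(broadcast_timeout_s),
        # Time string with a unit suffix.
        "spark.network.timeout": f"{network_timeout_s}s",
    }


def as_submit_args(conf):
    """Turn a settings dict into spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

The resulting list can be spliced into a `spark-submit` command line; the same keys can also be set on the session builder or in `spark-defaults.conf`.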

Re: Issue in parallelization of CNN model using spark

2020-07-17 Thread Mukhtaj Khan
Dear all, thank you all for your replies. I am trying to parallelize the CNN model using the Keras2DML library; however, I am getting the error message: NO Module Named Systemml.mllearn. Can anybody guide me on how to install SystemML on Ubuntu? Best regards. On Tue, Jul 14, 2020 at 4:34 AM Anwar Al

Re: How To Access Hive 2 Through JDBC Using Kerberos

2020-07-17 Thread Daniel de Oliveira Mantovani
Sorry for the misunderstanding, I found out today that actually my colleagues didn't make Spark work with Kerberos authentication for Hive JDBC. Spark can't give Kerberos parameters to the executors. Sorry again for the misunderstanding. On Thu, Jul 9, 2020 at 9:52 PM Jeff Evans wrote: > There

Using pyspark with Spark 2.4.3 a MultiLayerPerceptron model gives inconsistent outputs if a large amount of data is fed into it and at least one of the model outputs is fed to a Python UDF.

2020-07-17 Thread Ben Smith
Hi, I am having an issue that looks like a potentially serious bug in Spark 2.4.3, as it impacts data accuracy. I have searched the Spark Jira and mailing lists as best I can and cannot find any reference to anyone else having this issue. I am not sure if this would be suitable for raising as a bug i