You can also have a look at:

yarn logs -applicationId <Application ID>

Sometimes you can see directly in those aggregated logs what is going on.
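
For the failure below, the application ID can be read off the failed container ID in the stack trace (container_e331_1621375512548_0021_01_000006 should belong to application_1621375512548_0021), so something like this should pull the aggregated logs for the whole run:

yarn logs -applicationId application_1621375512548_0021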

________________________________
From: Wei-Chiu Chuang <weic...@cloudera.com.INVALID>
Sent: Thursday, May 20, 2021 01:43:07
To: Clay McDonald
Cc: user@hadoop.apache.org
Subject: Re: PySpark Write File Container exited with a non-zero exit code 143

Have you checked the executor log?
In most cases an executor fails like that because of insufficient memory. The 
executor log should give you more detail.
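
Exit code 143 is 128 + 15, i.e. the container received SIGTERM; on YARN that usually means the NodeManager killed the container for running beyond its memory allocation. If the executor log confirms that, one common adjustment (a suggestion, not a guaranteed fix for this particular job) is to grant more off-heap overhead rather than only growing the heap, for example:

pyspark --conf spark.yarn.queue=default --conf spark.executor.memory=20g --conf spark.executor.memoryOverhead=4g

(on older Spark 2.x builds the overhead property is spark.yarn.executor.memoryOverhead)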

On Thu, May 20, 2021 at 3:28 AM Clay McDonald 
<stuart.mcdon...@bateswhite.com> wrote:
Hello all,

I’m hoping someone can give me some direction for troubleshooting this issue. 
I’m trying to write a file from Spark on a Hortonworks (Cloudera) HDP cluster. 
I ssh directly to the first datanode and run PySpark with the command below; 
however, the job always fails no matter what memory sizes I set for the YARN 
containers and queues. Any suggestions?



pyspark --conf spark.yarn.queue=default --conf spark.executor.memory=24G

--

from pyspark.sql.functions import regexp_replace, col

HDFS_RAW = "/HDFS/Data/Test/Original/MyData_data/"
#HDFS_OUT = "/HDFS/Data/Test/Processed/Convert_parquet/Output"
HDFS_OUT = "/tmp"
ENCODING = "utf-16"

fileList1 = [
    'Test _2003.txt'
]

for f in fileList1:
    fname = f
    fname_noext = fname.split('.')[0]
    # multiLine/wholeFile: read the pipe-delimited utf-16 file, keeping
    # embedded newlines inside quoted fields
    df = spark.read.option("delimiter", "|") \
                   .option("encoding", ENCODING) \
                   .option("multiLine", True) \
                   .option("wholeFile", "true") \
                   .csv('{}/{}'.format(HDFS_RAW, fname), header=True)
    lastcol = df.columns[-1]
    print('showing {}'.format(fname))
    # the last header often carries a trailing \r from Windows line endings;
    # rebuild the column under the clean name and drop the old one
    if '\r' in lastcol:
        lastcol = lastcol.replace('\r', '')
        df = df.withColumn(lastcol,
                           regexp_replace(col("{}\r".format(lastcol)), "[\r]", "")) \
               .drop('{}\r'.format(lastcol))
    df.write.format('parquet').mode('overwrite').save("{}/{}".format(HDFS_OUT, fname_noext))
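
One caveat with this script (an assumption about the data, not something the trace proves): with multiLine/wholeFile enabled, Spark cannot split the input file, so a single task may end up holding a very large slice of it, and even a 24G executor can then die with 143. If the files are large, a sketch of a workaround is to fan the data out inside the loop, right after the read and before the write:

    df = df.repartition(200)  # 200 is an arbitrary starting point; spreads the
                              # unsplittable multiLine read across smaller write tasks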



Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
1.0 (TID 4, DataNode01.mydomain.com, executor 5): ExecutorLostFailure (executor 
5 exited caused by one of the running tasks) Reason: Container marked as 
failed: container_e331_1621375512548_0021_01_000006 on host: 
DataNode01.mydomain.com. Exit status: 143. Diagnostics: [2021-05-19 
18:09:06.392]Container killed on request. Exit code is 143
[2021-05-19 18:09:06.413]Container exited with a non-zero exit code 143.
[2021-05-19 18:09:06.414]Killed by external signal
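
To see exactly why that container died, its own log can be pulled directly (both IDs below are taken from the diagnostics above):

yarn logs -applicationId application_1621375512548_0021 -containerId container_e331_1621375512548_0021_01_000006

Near the end of that log one would typically look for a java.lang.OutOfMemoryError or a NodeManager message about the container running beyond physical memory limits.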


THANKS! CLAY
