Hi all. I'm hitting this error after about 5 hours of processing. The job performs 14 left joins of a large table against small tables.

Everything looks fine in the Spark UI and the console log, but when the result of the last join is saved I get:

    Py4JJavaError: An error occurred while calling o115.parquet.
    _metadata is not a Parquet file (too small)

I'm running 4 containers with 26 GB and 8 cores each. I already tried increasing the number of partitions and using broadcast joins, without success. I do have a log file, but at 57 MB it's too large to share here.

Environment: PySpark 1.5.0 on Cloudera 5.5.1 with YARN, using HiveContext to access the data.
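To make the question concrete, here is a minimal sketch of what the pipeline looks like (table names, join key, paths, and partition count are placeholders, not my real schema):

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.functions import broadcast

sc = SparkContext(appName="many-left-joins")
sqlContext = HiveContext(sc)

# One large fact table, 14 small dimension tables (names are illustrative)
big = sqlContext.table("big_table")
small_table_names = ["dim_%d" % i for i in range(14)]

result = big
for name in small_table_names:
    small = sqlContext.table(name)
    # broadcast() hints Spark to ship the small table to every executor
    # instead of shuffling the big one
    result = result.join(broadcast(small), on="id", how="left")

# Also tried raising the partition count before writing
result = result.repartition(400)

# This is the call that fails with the Py4JJavaError above
result.write.parquet("/path/to/output")
```

The joins themselves appear to complete; only the final `write.parquet(...)` step raises the error.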