Missing / Duplicate Data when Spark retries
Hi all, I'm on Spark 2.4.4 using Mesos as the resource scheduler. My job maps over multiple datasets: for each dataset it reads one DataFrame from a Parquet file at one HDFS path and another DataFrame from a second HDFS path, unions them by name, then deduplicates by most recent.