I agree, Sean, although it's strange, since we aren't using any UDFs and are
sticking to Spark-provided functions. If anyone in the community has seen
such an issue before, I would be happy to learn more!
On Thu, Sep 10, 2020 at 6:01 AM Sean Owen wrote:
It's more likely a subtle issue with your code or data, but hard to
say without knowing more. The lineage is fine and deterministic, but
your data or operations might not be.
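One common way an operation can be non-deterministic even with only built-in functions is a "keep the most recent row per key" dedup with timestamp ties. After a shuffle, the order of rows within a partition is not guaranteed, so a recomputed or retried task can see two tied rows in a different order and keep a different one. The sketch below models that in plain Python rather than Spark, assuming rows of the form (key, updated_at, payload). Those column names and the payload tie-breaker are hypothetical, since the message does not show the schema. In Spark, the analogous fix would be adding extra columns to the window's `orderBy` (for example, `Window.partitionBy(...).orderBy(...)` followed by `row_number()`) so that ties are broken the same way on every run.

```python
from typing import Dict, Iterable, List, Tuple

# A row is (key, updated_at, payload). Column names are hypothetical;
# the original message does not show the real schema.
Row = Tuple[str, int, str]


def dedupe_latest(rows: Iterable[Row]) -> Dict[str, Row]:
    """Keep one row per key with the greatest updated_at.

    Ties on updated_at are NOT broken, so the winner is whichever tied
    row happens to arrive first. This mimics ordering a window only by a
    timestamp and keeping row_number() == 1.
    """
    best: Dict[str, Row] = {}
    for row in rows:
        key, ts, _ = row
        if key not in best or ts > best[key][1]:
            best[key] = row
    return best


def dedupe_latest_deterministic(rows: Iterable[Row]) -> Dict[str, Row]:
    """Same as dedupe_latest, but break timestamp ties on payload so the
    result no longer depends on the order rows arrive in."""
    best: Dict[str, Row] = {}
    for row in rows:
        key, ts, payload = row
        if key not in best or (ts, payload) > (best[key][1], best[key][2]):
            best[key] = row
    return best


# Two sources (think: the two HDFS paths) containing the same key with
# the same timestamp but different payloads.
source_a: List[Row] = [("id1", 100, "from_a")]
source_b: List[Row] = [("id1", 100, "from_b")]

# Simulate the same union arriving in a different order on a retry.
run1 = dedupe_latest(source_a + source_b)
run2 = dedupe_latest(source_b + source_a)
print(run1["id1"][2], run2["id1"][2])  # from_a from_b

fixed1 = dedupe_latest_deterministic(source_a + source_b)
fixed2 = dedupe_latest_deterministic(source_b + source_a)
print(fixed1["id1"][2], fixed2["id1"][2])  # from_b from_b
```

If the data can contain such ties, the first version can return different results for the same lineage, while the second returns the same row regardless of input order.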
On Thu, Sep 10, 2020 at 12:03 AM Ruijing Li wrote:
Hi all,
I am on Spark 2.4.4 using Mesos as the task resource scheduler. The context
is that my job maps over multiple datasets. For each dataset, it takes one
dataframe from a parquet file in one HDFS path and another dataframe
from a second HDFS path, unions them by name, then deduplicates by most rece