WeichenXu123 commented on PR #40724:
URL: https://github.com/apache/spark/pull/40724#issuecomment-1524671277
> @mengxr raises another suggestion: use petastorm to load data from DBFS /
HDFS / ... (so that the torch distributor can have a simpler interface). But
there’s a shortcoming tha
WeichenXu123 commented on PR #40724:
URL: https://github.com/apache/spark/pull/40724#issuecomment-1505144497
@mengxr raises another suggestion: use petastorm to load data from DBFS /
HDFS / ... (so that the torch distributor can have a simpler interface). But
there’s a shortcoming that
WeichenXu123 commented on PR #40724:
URL: https://github.com/apache/spark/pull/40724#issuecomment-1505144052
> what if there are two input datasets, one for training and one for
validation?
We can add an "is_validation" boolean column to mark whether each row is for
training or for validation.
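
A minimal sketch of the idea, written in plain Python rather than against the Spark or torch distributor API: the two datasets are merged into one, and a boolean "is_validation" field records which one each row came from, so the consumer can split them again. The helper names `tag_rows` and `split_rows` and the sample rows are hypothetical and only for illustration.

```python
# Hypothetical illustration: plain dicts stand in for DataFrame rows.

def tag_rows(rows, is_validation):
    """Return copies of the rows with an added boolean "is_validation" field."""
    return [{**row, "is_validation": is_validation} for row in rows]


def split_rows(rows):
    """Split a combined dataset back into (training, validation) rows."""
    train = [r for r in rows if not r["is_validation"]]
    val = [r for r in rows if r["is_validation"]]
    return train, val


# Made-up sample data for the two input datasets.
train_rows = [{"x": 1.0, "y": 0}, {"x": 2.0, "y": 1}]
val_rows = [{"x": 3.0, "y": 1}]

# A single combined dataset is passed along instead of two separate ones.
combined = tag_rows(train_rows, False) + tag_rows(val_rows, True)

# On the consuming side, the flag recovers the original split.
train_part, val_part = split_rows(combined)
```

With a real DataFrame, the same pattern would presumably be a literal boolean column added to each input before a union, followed by a filter on that column in the training function.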