Hello Adrian,
Here is the snippet:
import pandas as pd
import tensorflow_datasets as tfds
from pyspark.sql.types import ArrayType, IntegerType, StructField, StructType

# dataset_name and spark (a SparkSession) are assumed to be defined earlier
(ds_train, ds_test), ds_info = tfds.load(
    dataset_name,
    data_dir='<some path to your storage>',
    split=["train", "test"],
    with_info=True,
    as_supervised=True,
)
schema = StructType([
    StructField("image", ArrayType(ArrayType(ArrayType(IntegerType()))), nullable=False),
    StructField("label", IntegerType(), nullable=False),
])
pp4 = spark.createDataFrame(
    pd.DataFrame(tfds.as_dataframe(ds_train.take(4), ds_info)), schema
)
It raised this error:
TypeError: field image: ArrayType(ArrayType(ArrayType(IntegerType(), True),
True), True) can not accept object array([[[14, 14, 14],
[14, 14, 14],
[14, 14, 14],
...,
[19, 17, 20],
[19, 17, 20],
[19, 17, 20]],
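One possible reading of this traceback is that Spark's schema verification accepts Python lists for ArrayType fields but not numpy.ndarray objects, in which case the nesting depth is not the real issue. A minimal sketch of a workaround under that assumption is to convert each image with ndarray.tolist() before handing the rows to Spark. The small arrays, to_rows and to_spark below are made up for illustration, and spark is assumed to be an existing SparkSession.

```python
import numpy as np

# Hypothetical stand-ins for a few decoded images; the real ones are (500, 333, 3).
images = [np.full((4, 3, 3), 14, dtype=np.uint8) for _ in range(2)]
labels = [np.int64(0), np.int64(1)]


def to_rows(images, labels):
    """Turn ndarrays into nested Python lists and NumPy scalars into plain ints."""
    return [(img.tolist(), int(lbl)) for img, lbl in zip(images, labels)]


rows = to_rows(images, labels)

# DDL form of the schema from this thread: three array levels for (H, W, C).
SCHEMA = "image array<array<array<int>>>, label int"


def to_spark(spark, rows, schema=SCHEMA):
    # `spark` is an existing SparkSession (assumption). This function is only
    # defined here, not called, so the sketch runs without a Spark installation.
    return spark.createDataFrame(rows, schema)
```

With a live session, to_spark(spark, rows) would be the call to try. For full-size images the nested lists get large, so this is only meant to show the type conversion, not an efficient pipeline.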
On Thursday, August 3, 2023 at 11:34:08 PM GMT+8, Adrian Pop-Tifrea
<[email protected]> wrote:
Hello,
can you also please show us how you created the pandas DataFrame? I mean, how
you added the actual data to the DataFrame. It would help us reproduce
the error.
Thank you,
Pop-Tifrea Adrian
On Mon, Jul 31, 2023 at 5:03 AM [email protected] <[email protected]>
wrote:
I changed it to
ArrayType(ArrayType(ArrayType(IntegerType()))), but I still get the same error.
Thank you for responding.
On Thursday, July 27, 2023 at 06:58:09 PM GMT+8, Adrian Pop-Tifrea
<[email protected]> wrote:
Hello,
when you said your pandas DataFrame has 10 rows, does that mean it contains 10
images? Because if that's the case, then you'd want to use only 3 layers of
ArrayType when you define the schema.
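As a rough illustration of the point about layers: the number of nested ArrayType levels should match the number of dimensions of a single image, not of the whole batch of rows. The sketch below builds the equivalent type as a DDL string. array_ddl is a hypothetical helper, and using a DDL string instead of a StructType is just a convenience assumed here.

```python
def array_ddl(ndim: int, element: str = "int") -> str:
    """Return a Spark DDL type string with `ndim` nested array levels."""
    if ndim < 1:
        raise ValueError("ndim must be at least 1")
    ddl = element
    for _ in range(ndim):
        ddl = f"array<{ddl}>"
    return ddl


# One image is (height, width, channels); the 10 rows are not an array level.
per_image_shape = (500, 333, 3)
image_type = array_ddl(len(per_image_shape))
schema_ddl = f"image {image_type}"
```

The resulting schema_ddl string could be passed as the schema argument of spark.createDataFrame, which accepts a DDL-formatted string as well as a StructType.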
Best regards,
Adrian
On Thu, Jul 27, 2023, 11:04 [email protected]
<[email protected]> wrote:
I have a pandas DataFrame with an 'image' column holding numpy.ndarray values. The shape is
(500, 333, 3) per image. My DataFrame has 10 rows, so the overall shape is
(10, 500, 333, 3).
When using spark.createDataFrame(panda_dataframe, schema), I need to specify
the schema:
schema = StructType([
StructField("image",
ArrayType(ArrayType(ArrayType(ArrayType(IntegerType())))), nullable=False)
])
I get this error:
raise TypeError(
TypeError: field image:
ArrayType(ArrayType(ArrayType(ArrayType(IntegerType(), True), True), True),
True) can not accept object array([[[14, 14, 14],...
Can you advise how to set the schema for an image column that holds numpy.ndarray values?