Dear Spark Development Community,
According to the PySpark documentation, it should be possible to have a MapType
column with ArrayType keys: MapType accepts any DataType as its key type, and
ArrayType inherits from DataType.
When I try this with PySpark 3.5.3, the show() method of the DataFrame works as
expected, but the collect() method throws an exception:
from pyspark.sql import SparkSession
from pyspark.sql.types import MapType, ArrayType, StringType

spark = SparkSession.builder.getOrCreate()

schema = MapType(ArrayType(StringType()), StringType())
data = [{("A", "B"): "foo", ("X", "Y", "Z"): "bar"}]
df = spark.createDataFrame(data, schema)
df.show()     # works
df.collect()  # throws an exception
Is this behavior correct?
Kind regards,
Eyck