Hyukjin Kwon created SPARK-42021:
------------------------------------

             Summary: createDataFrame with array.array
                 Key: SPARK-42021
                 URL: https://issues.apache.org/jira/browse/SPARK-42021
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon

{code}
pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types)
self = <pyspark.sql.tests.connect.test_parity_types.TypesParityTests testMethod=test_array_types>

    def test_array_types(self):
        # This test need to make sure that the Scala type selected is at least
        # as large as the python's types. This is necessary because python's
        # array types depend on C implementation on the machine. Therefore there
        # is no machine independent correspondence between python's array types
        # and Scala types.
        # See: https://docs.python.org/2/library/array.html

        def assertCollectSuccess(typecode, value):
            row = Row(myarray=array.array(typecode, [value]))
            df = self.spark.createDataFrame([row])
            self.assertEqual(df.first()["myarray"][0], value)

        # supported string types
        #
        # String types in python's array are "u" for Py_UNICODE and "c" for char.
        # "u" will be removed in python 4, and "c" is not supported in python 3.
        supported_string_types = []
        if sys.version_info[0] < 4:
            supported_string_types += ["u"]
            # test unicode
>           assertCollectSuccess("u", "a")

../test_types.py:986:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../test_types.py:975: in assertCollectSuccess
    df = self.spark.createDataFrame([row])
../../connect/session.py:278: in createDataFrame
    _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in _data])
pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
    ???
pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
    ???
pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
    ???
pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
    ???
pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
    ???
pyarrow/array.pxi:320: in pyarrow.lib.array
    ???
pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type array.array: did not recognize Python value type when inferring an Arrow data type

pyarrow/error.pxi:100: ArrowInvalid
{code}
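
The traceback shows Arrow type inference rejecting an `array.array` value inside `pa.Table.from_pylist`. One possible client-side workaround, sketched here with the standard library only, is to turn `array.array` values into plain Python lists before the rows are handed to Arrow. The helper name `arrays_to_lists` is hypothetical, and this is not necessarily how the issue was or should be resolved in Spark Connect.

```python
import array
from typing import Any


def arrays_to_lists(value: Any) -> Any:
    """Recursively replace array.array instances with plain lists.

    Hypothetical helper for illustration only; it is not part of
    PySpark or PyArrow.
    """
    if isinstance(value, array.array):
        # list() yields ints or floats for numeric typecodes and
        # one-character strings for the "u" typecode.
        return list(value)
    if isinstance(value, dict):
        return {k: arrays_to_lists(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Note: tuples are returned as lists in this sketch.
        return [arrays_to_lists(v) for v in value]
    return value


# Rows shaped like the output of row.asDict(recursive=True).
rows = [{"myarray": array.array("i", [1, 2, 3])}]
print(arrays_to_lists(rows))  # [{'myarray': [1, 2, 3]}]
```

Converting to a list discards the typecode, so a fix that needs to pick a sufficiently wide element type (the concern stated in the test's comment) would have to read `value.typecode` before converting.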