Hyukjin Kwon created SPARK-42021:
------------------------------------

             Summary: createDataFrame with array.array
                 Key: SPARK-42021
                 URL: https://issues.apache.org/jira/browse/SPARK-42021
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon


{code}
pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types)
self = <pyspark.sql.tests.connect.test_parity_types.TypesParityTests testMethod=test_array_types>

    def test_array_types(self):
        # This test need to make sure that the Scala type selected is at least
        # as large as the python's types. This is necessary because python's
        # array types depend on C implementation on the machine. Therefore there
        # is no machine independent correspondence between python's array types
        # and Scala types.
        # See: https://docs.python.org/2/library/array.html
    
        def assertCollectSuccess(typecode, value):
            row = Row(myarray=array.array(typecode, [value]))
            df = self.spark.createDataFrame([row])
            self.assertEqual(df.first()["myarray"][0], value)
    
        # supported string types
        #
        # String types in python's array are "u" for Py_UNICODE and "c" for char.
        # "u" will be removed in python 4, and "c" is not supported in python 3.
        supported_string_types = []
        if sys.version_info[0] < 4:
            supported_string_types += ["u"]
            # test unicode
>           assertCollectSuccess("u", "a")

../test_types.py:986: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../test_types.py:975: in assertCollectSuccess
    df = self.spark.createDataFrame([row])
../../connect/session.py:278: in createDataFrame
    _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in _data])
pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
    ???
pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
    ???
pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
    ???
pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
    ???
pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
    ???
pyarrow/array.pxi:320: in pyarrow.lib.array
    ???
pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type array.array: did not recognize Python value type when inferring an Arrow data type

pyarrow/error.pxi:100: ArrowInvalid

{code}
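
The traceback above indicates that pyarrow's type inference does not recognize array.array values when building the table. A minimal sketch of one possible workaround follows, assuming it is acceptable to convert each array.array into a plain list before the rows reach pyarrow. The helper name normalize_arrays is hypothetical and is not part of PySpark or pyarrow, and this is not necessarily how the issue should be fixed in Spark Connect.

```python
import array


def normalize_arrays(value):
    """Recursively replace array.array values with plain Python lists.

    Hypothetical helper for illustration only; not part of PySpark or pyarrow.
    """
    if isinstance(value, array.array):
        # tolist() returns ordinary Python scalars for the array's items.
        return value.tolist()
    if isinstance(value, dict):
        return {key: normalize_arrays(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize_arrays(item) for item in value]
    return value


# A row shaped like the output of row.asDict(recursive=True) in the traceback.
row_dict = {"myarray": array.array("i", [1, 2, 3])}
clean_row = normalize_arrays(row_dict)
```

Dicts cleaned this way contain only lists and scalars, so they could then be handed to pa.Table.from_pylist in place of the raw row dicts. The sketch loses the array's typecode, so the original element width would have to be carried separately if the resulting schema must match it.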



