[ https://issues.apache.org/jira/browse/SPARK-24976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Cutler reassigned SPARK-24976:
------------------------------------

    Assignee: Hyukjin Kwon

> Allow None for Decimal type conversion (specific to PyArrow 0.9.0)
> -------------------------------------------------------------------
>
>                 Key: SPARK-24976
>                 URL: https://issues.apache.org/jira/browse/SPARK-24976
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 2.3.2, 2.4.0
>
>
> See https://jira.apache.org/jira/browse/ARROW-2432
> If we use Arrow 0.9.0, the test case (None as decimal) fails as below:
> {code}
> Traceback (most recent call last):
>   File "/.../spark/python/pyspark/sql/tests.py", line 4672, in test_vectorized_udf_null_decimal
>     self.assertEquals(df.collect(), res.collect())
>   File "/.../spark/python/pyspark/sql/dataframe.py", line 533, in collect
>     sock_info = self._jdf.collectToPython()
>   File "/.../spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
>     answer, self.gateway_client, self.target_id, self.name)
>   File "/.../spark/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/.../spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
>     format(target_id, ".", name), value)
> Py4JJavaError: An error occurred while calling o51.collectToPython.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 1.0 failed 1 times, most recent failure: Lost task 3.0 in stage 1.0 (TID 7, localhost, executor driver):
> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/.../spark/python/pyspark/worker.py", line 320, in main
>     process()
>   File "/.../spark/python/pyspark/worker.py", line 315, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/.../spark/python/pyspark/serializers.py", line 274, in dump_stream
>     batch = _create_batch(series, self._timezone)
>   File "/.../spark/python/pyspark/serializers.py", line 243, in _create_batch
>     arrs = [create_array(s, t) for s, t in series]
>   File "/.../spark/python/pyspark/serializers.py", line 241, in create_array
>     return pa.Array.from_pandas(s, mask=mask, type=t)
>   File "array.pxi", line 383, in pyarrow.lib.Array.from_pandas
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in pyarrow.lib.check_status
>   File "error.pxi", line 77, in pyarrow.lib.check_status
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python object of type NoneType but can only handle these types: decimal.Decimal
> {code}
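For readers without a Spark checkout handy, the failing conversion can be reproduced with plain pandas and PyArrow. The sketch below is hedged: the sample values, the decimal128(38, 18) precision/scale, and the Decimal('NaN') placeholder are illustrative assumptions rather than code taken from the Spark test suite, and the workaround only mirrors the approach discussed in this ticket and ARROW-2432 for PyArrow 0.9.x, not the exact patch.

{code}
import decimal

import pandas as pd
import pyarrow as pa

# A Series mixing decimal.Decimal values and None, similar to what a pandas
# UDF returning DecimalType with nulls would produce. Values are illustrative.
s = pd.Series([decimal.Decimal("1.0"), None, decimal.Decimal("2.5")])
mask = s.isnull()

# Mirrors the call in pyspark/serializers.py shown in the traceback above.
# On PyArrow 0.9.0 this raises ArrowInvalid for the None entry; on later
# releases it produces a null element instead.
arr = pa.Array.from_pandas(s, mask=mask, type=pa.decimal128(38, 18))
print(arr)

# Workaround in the spirit of this ticket for PyArrow 0.9.x (an assumption,
# not the exact patch): make sure the converter only ever sees decimal.Decimal
# objects by substituting Decimal('NaN') for None, and let the null mask mark
# the entry as missing.
patched = s.apply(lambda v: decimal.Decimal("NaN") if v is None else v)
print(pa.Array.from_pandas(patched, mask=mask, type=pa.decimal128(38, 18)))
{code}

Any such version-specific branch can presumably be dropped once the minimum supported PyArrow version includes the ARROW-2432 fix.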