Repository: spark
Updated Branches:
  refs/heads/master dbd492b7e -> 11a849b3a
[SPARK-22370][SQL][PYSPARK][FOLLOW-UP] Fix a test failure when xmlrunner is installed.

## What changes were proposed in this pull request?

This is a follow-up PR of #19587.

If `xmlrunner` is installed, `VectorizedUDFTests.test_vectorized_udf_check_config` fails with the following error, because `self` — an instance of a `unittest.TestCase` subclass — can no longer be pickled once it is referenced from the UDF `check_records_per_batch`:

```
PicklingError: Cannot pickle files that are not opened for reading: w
```

This changes the UDF so that it does not refer to `self`.

## How was this patch tested?

Tested locally.

Author: Takuya UESHIN <ues...@databricks.com>

Closes #20115 from ueshin/issues/SPARK-22370_fup1.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/11a849b3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/11a849b3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/11a849b3

Branch: refs/heads/master
Commit: 11a849b3a7b3d03c48d3e17c8a721acedfd89285
Parents: dbd492b
Author: Takuya UESHIN <ues...@databricks.com>
Authored: Fri Dec 29 23:04:28 2017 +0900
Committer: hyukjinkwon <gurwls...@gmail.com>
Committed: Fri Dec 29 23:04:28 2017 +0900

----------------------------------------------------------------------
 python/pyspark/sql/tests.py | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/11a849b3/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 3ef1522..1c34c89 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -3825,6 +3825,7 @@ class VectorizedUDFTests(ReusedSQLTestCase):
 
     def test_vectorized_udf_check_config(self):
         from pyspark.sql.functions import pandas_udf, col
+        import pandas as pd
         orig_value = self.spark.conf.get("spark.sql.execution.arrow.maxRecordsPerBatch", None)
         self.spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", 3)
         try:
@@ -3832,11 +3833,11 @@ class VectorizedUDFTests(ReusedSQLTestCase):
 
             @pandas_udf(returnType=LongType())
             def check_records_per_batch(x):
-                self.assertTrue(x.size <= 3)
-                return x
+                return pd.Series(x.size).repeat(x.size)
 
-            result = df.select(check_records_per_batch(col("id")))
-            self.assertEqual(df.collect(), result.collect())
+            result = df.select(check_records_per_batch(col("id"))).collect()
+            for (r,) in result:
+                self.assertTrue(r <= 3)
         finally:
             if orig_value is None:
                 self.spark.conf.unset("spark.sql.execution.arrow.maxRecordsPerBatch")


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
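For illustration, here is a minimal standalone sketch (outside Spark, assuming only pandas is installed) of the pattern the fixed UDF uses: rather than asserting inside the UDF — which would force the worker to pickle the closed-over `self` — it returns the batch size repeated once per row, so the driver-side loop can do the `r <= 3` check after `collect()`. The simulated batch below is hypothetical.

```python
import pandas as pd

# Same shape as the fixed UDF in the diff above: report the size of the
# incoming batch once per row, instead of asserting inside the UDF.
# This keeps the function free of references to the TestCase instance,
# so it pickles cleanly even when xmlrunner has replaced sys.stdout.
def check_records_per_batch(x):
    # pd.Series(x.size) is a one-element series holding the batch size;
    # .repeat(x.size) expands it to one value per input row.
    return pd.Series(x.size).repeat(x.size)

batch = pd.Series(range(3))   # simulate one Arrow batch of 3 rows
out = check_records_per_batch(batch)
print(list(out))              # [3, 3, 3] -- every row reports its batch size
```

The driver then only needs plain values to assert on, which is why the patched test collects the result and loops over it instead of comparing against `df.collect()`.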