Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r159108800
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -39,13 +39,16 @@ private[spark] object PythonEvalType {
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r159108771
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r159108782
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
---
@@ -171,6 +171,7 @@ trait CheckAnalysis extends
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r159108798
--- Diff: python/pyspark/sql/tests.py ---
@@ -4052,6 +4066,323 @@ def test_unsupported_types(self):
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r159108773
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -273,7 +274,7 @@ abstract class SparkStrategies extends
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r159108765
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -92,8 +99,14 @@ object
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r159108762
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -215,3 +228,49 @@ object ExtractPythonUDFs
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158952743
--- Diff: python/pyspark/sql/tests.py ---
@@ -477,6 +502,7 @@ def test_udf_with_aggregate_function(self):
sel =
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158952610
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
---
@@ -171,6 +171,7 @@ trait CheckAnalysis extends
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158952133
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -273,7 +274,7 @@ abstract class SparkStrategies extends
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158951752
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158901221
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158912955
--- Diff: python/pyspark/sql/tests.py ---
@@ -4052,6 +4066,323 @@ def test_unsupported_types(self):
df.groupby('id').apply(f).collect()
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158913244
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -215,3 +228,49 @@ object ExtractPythonUDFs extends
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158907883
--- Diff: python/pyspark/sql/tests.py ---
@@ -477,6 +502,7 @@ def test_udf_with_aggregate_function(self):
sel =
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158912156
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -92,8 +99,14 @@ object ExtractPythonUDFFromAggregate
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158909734
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
---
@@ -171,6 +171,7 @@ trait CheckAnalysis extends
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158911077
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -273,7 +274,7 @@ abstract class SparkStrategies extends
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158902463
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -39,13 +39,16 @@ private[spark] object PythonEvalType {
val
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158872851
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158872825
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -48,29 +48,46 @@ object
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158872704
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -48,29 +48,46 @@ object
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158872655
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158423362
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158302645
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r158301996
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -48,29 +48,46 @@ object
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157948426
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -48,29 +48,46 @@ object
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157939292
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -48,29 +48,46 @@ object
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157944969
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157944622
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157938453
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -48,29 +48,46 @@ object
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157931925
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
---
@@ -15,10 +15,9 @@
* limitations under the
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157931824
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157896109
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -113,6 +113,7 @@ object ExtractPythonUDFs
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157896062
--- Diff: python/pyspark/sql/group.py ---
@@ -89,8 +89,15 @@ def agg(self, *exprs):
else:
# Columns
assert
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157895967
--- Diff: python/pyspark/sql/group.py ---
@@ -89,8 +89,15 @@ def agg(self, *exprs):
else:
# Columns
assert
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157895765
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,135 @@
+/*
+ * Licensed to
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157891787
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -48,9 +48,26 @@ object
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157891697
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -48,9 +48,26 @@ object
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157891668
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
---
@@ -32,7 +31,5 @@ case class PythonUDF(
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157891477
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -437,6 +437,37 @@ class RelationalGroupedDataset
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157891450
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -437,6 +437,37 @@ class RelationalGroupedDataset
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157891391
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/pythonLogicalOperators.scala
---
@@ -38,3 +38,13 @@ case class
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157891261
--- Diff: python/pyspark/sql/tests.py ---
@@ -4016,6 +4016,89 @@ def test_unsupported_types(self):
with
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157891322
--- Diff: python/pyspark/sql/udf.py ---
@@ -56,6 +56,10 @@ def _create_udf(f, returnType, evalType):
return udf_obj._wrapped()
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157891206
--- Diff: python/pyspark/sql/functions.py ---
@@ -2070,6 +2070,8 @@ class PandasUDFType(object):
GROUP_MAP =
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157825297
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157823488
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
---
@@ -32,7 +31,5 @@ case class PythonUDF(
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r157823295
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
---
@@ -15,10 +15,9 @@
* limitations under the
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r156037616
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -48,9 +48,26 @@ object ExtractPythonUDFFromAggregate
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r156028776
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
---
@@ -32,7 +31,5 @@ case class PythonUDF(
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r156034167
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r156037157
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -48,9 +48,26 @@ object ExtractPythonUDFFromAggregate
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r156031375
--- Diff: python/pyspark/sql/tests.py ---
@@ -4016,6 +4016,124 @@ def test_unsupported_types(self):
with self.assertRaisesRegexp(Exception,
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r156038036
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala
---
@@ -15,10 +15,9 @@
* limitations under the
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r155159185
--- Diff: python/pyspark/sql/group.py ---
@@ -89,8 +89,15 @@ def agg(self, *exprs):
else:
# Columns
assert
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154809806
--- Diff: python/pyspark/sql/group.py ---
@@ -89,8 +89,15 @@ def agg(self, *exprs):
else:
# Columns
assert
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154782452
--- Diff: python/pyspark/sql/group.py ---
@@ -89,8 +89,15 @@ def agg(self, *exprs):
else:
# Columns
assert
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154644084
--- Diff: python/pyspark/sql/udf.py ---
@@ -56,6 +56,10 @@ def _create_udf(f, returnType, evalType):
return udf_obj._wrapped()
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154642230
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
---
@@ -0,0 +1,135 @@
+/*
+ * Licensed to
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154642902
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/pythonLogicalOperators.scala
---
@@ -38,3 +38,13 @@ case class
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154644620
--- Diff: python/pyspark/sql/tests.py ---
@@ -4016,6 +4016,89 @@ def test_unsupported_types(self):
with
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154644235
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -437,6 +437,37 @@ class RelationalGroupedDataset
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154644340
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -437,6 +437,37 @@ class RelationalGroupedDataset
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154616454
--- Diff: python/pyspark/sql/functions.py ---
@@ -2070,6 +2070,8 @@ class PandasUDFType(object):
GROUP_MAP =
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154615728
--- Diff: python/pyspark/sql/udf.py ---
@@ -56,6 +56,10 @@ def _create_udf(f, returnType, evalType):
return udf_obj._wrapped()
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154569899
--- Diff: python/pyspark/sql/group.py ---
@@ -89,8 +89,15 @@ def agg(self, *exprs):
else:
# Columns
assert
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154569953
--- Diff: python/pyspark/sql/group.py ---
@@ -89,8 +89,15 @@ def agg(self, *exprs):
else:
# Columns
assert
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154569884
--- Diff: python/pyspark/sql/group.py ---
@@ -89,8 +89,15 @@ def agg(self, *exprs):
else:
# Columns
assert
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154569177
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -113,6 +113,7 @@ object ExtractPythonUDFs extends
GitHub user icexelloss opened a pull request:
https://github.com/apache/spark/pull/19872
WIP: [SPARK-22274][PySpark] User-defined aggregation functions with pandas udf
## What changes were proposed in this pull request?
Add support for pandas_udf in groupby().agg()
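For readers skimming the thread, the idea behind a user-defined aggregation in groupby().agg() can be modeled without Spark: values are grouped by key, and a user-supplied function reduces each group to a single scalar. The sketch below is plain Python, not the PySpark API from this pull request; the helper name `group_agg` and the sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from statistics import mean


def group_agg(rows, key, column, agg_func):
    """Hypothetical helper: group rows by `key`, reduce `column` per group."""
    groups = defaultdict(list)
    for row in rows:
        # Collect every value of `column` that shares the same grouping key.
        groups[row[key]].append(row[column])
    # One scalar per group, which is what an aggregation returns.
    return {k: agg_func(values) for k, values in groups.items()}


# Made-up sample data: two groups keyed by "id".
rows = [
    {"id": 1, "v": 1.0},
    {"id": 1, "v": 2.0},
    {"id": 2, "v": 3.0},
    {"id": 2, "v": 5.0},
    {"id": 2, "v": 10.0},
]

# The user-defined aggregation here is simply the mean of each group.
result = group_agg(rows, "id", "v", mean)
print(result)
```

In the pull request itself the per-group function would presumably receive pandas data rather than a Python list; that detail is not shown in this index and is left out of the sketch.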