Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/23222
I'm not sure how this ended up being omitted. `TransposeWindowSuite` will
be fine since it creates a simple optimizer from this rule and a few others.
The new test add
Github user ptkool closed the pull request at:
https://github.com/apache/spark/pull/22445
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/22445
Branch 2.3 udf nullability
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18906
It should throw a proper exception. I will make the necessary code changes.
---
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/18906#discussion_r163828796
--- Diff: python/pyspark/sql/functions.py ---
@@ -2105,6 +2105,14 @@ def udf(f=None, returnType=StringType()):
>>> impo
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18906
@HyukjinKwon @holdenk There is still an issue with the use of
`SparkSession.udf.register` that needs to be resolved.
For instance, the following will not work as expected:
```python
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18424
@a10y Yes, I'm still tracking this.
---
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18906
@HyukjinKwon As requested, here are the related Scala API changes:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18906
@holdenk I believe the changes in this PR match what's provided in the
Scala API. Am I missing something?
---
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18906
Here are the similar changes in the Scala API:
https://github.com/apache/spark/pull/17911
---
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/19672
[SPARK-22456] Add support for dayofweek function
## What changes were proposed in this pull request?
This PR adds support for a new function called `dayofweek` that returns the
day of the week
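The semantics can be sketched in plain Python. This is a hedged illustration only, assuming the Hive-style numbering (1 = Sunday through 7 = Saturday); it is not Spark's implementation.

```python
from datetime import date

def dayofweek(d):
    # Illustrative sketch (assumed numbering: 1 = Sunday .. 7 = Saturday).
    # Python's isoweekday() is 1 = Monday .. 7 = Sunday, so shift it.
    return d.isoweekday() % 7 + 1

print(dayofweek(date(2017, 11, 5)))  # 2017-11-05 was a Sunday -> 1
```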
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/19415
Branch 2.2 udf nullability
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain
Github user ptkool closed the pull request at:
https://github.com/apache/spark/pull/19415
---
Github user ptkool closed the pull request at:
https://github.com/apache/spark/pull/19414
---
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/19414
Udf nullability fixes
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how
Github user ptkool closed the pull request at:
https://github.com/apache/spark/pull/19209
---
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/19209
Branch 2.2 udf nullability
## What changes were proposed in this pull request?
When registering a Python UDF, a user may know whether the function can
return null values or not. PythonUDF
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18906
@rxin
This PR isn't about performance at all. I realize Python UDFs do not
perform well and I also realize annotating Python UDFs with nullability is not
going to make any diffe
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18906
@rxin We have several large systems with 100s of Spark jobs implemented in
Python and PySpark, and use Python UDFs due to lack of equivalent functionality
in Spark. I understand what you're saying re
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18906
@ueshin Thanks for commenting.
It's unfortunate that users find nullability confusing. If you're coming
from a SQL world, you should be quite familiar with nullability and nu
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/18906
[SPARK-21692] Add nullability support to PythonUDF.
## What changes were proposed in this pull request?
When registering a Python UDF, a user may know whether the function can
return null
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18424
@rxin Yes, I have.
https://issues.apache.org/jira/browse/SPARK-21218?focusedCommentId=16064608&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16064608
---
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17899#discussion_r125013340
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -610,6 +611,25 @@ object CollapseWindow extends Rule
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17899#discussion_r125013283
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -610,6 +611,25 @@ object CollapseWindow extends Rule
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/18424
@a10y Yes. Please have a look at my comments in
https://issues.apache.org/jira/browse/SPARK-21218.
---
If your project is set up for it, you can reply to this email and have your
reply appear on
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/17708
@gatorsmile I will run a few more tests to determine if subexpression
elimination solves this issue.
---
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/18424
[SPARK-21218] Add rule to convert IN predicate to equivalent Parquet filter.
## What changes were proposed in this pull request?
Add a new optimizer rule to convert an IN predicate to an
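As a rough illustration of the kind of rewrite such a rule performs, the sketch below turns `col IN (v1, ..., vn)` into a chain of OR'ed equality predicates, the shape a Parquet-style source can push down. The tuple encoding and names are hypothetical, not Spark's or Parquet's API.

```python
def in_to_filter(column, values):
    # Hypothetical predicate encoding: ("eq", col, v) and ("or", lhs, rhs).
    # Rewrites `column IN (values)` as eq(v1) OR eq(v2) OR ...
    expr = ("eq", column, values[0])
    for v in values[1:]:
        expr = ("or", expr, ("eq", column, v))
    return expr

print(in_to_filter("x", [1, 2]))  # ('or', ('eq', 'x', 1), ('eq', 'x', 2))
```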
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/17899
@hvanhovell @gatorsmile Can you have another look at this?
---
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17899#discussion_r117241321
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala
---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17899#discussion_r117240910
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala
---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17899#discussion_r117240492
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala
---
@@ -423,4 +423,25 @@ class DataFrameWindowFunctionsSuite
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17899#discussion_r117238414
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -609,6 +610,19 @@ object CollapseWindow extends Rule
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/17899
[SPARK-20636] Add new optimization rule to flip adjacent Window expressions.
## What changes were proposed in this pull request?
Add new optimization rule to eliminate unnecessary shuffling
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/17764
Add new function isNotDistinctFrom.
## What changes were proposed in this pull request?
Expose the Spark SQL `<=>` operator in PySpark as a column function.
## How was this
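For context, `<=>` is SQL's null-safe equality (IS NOT DISTINCT FROM). A minimal sketch of its semantics, with Python's `None` standing in for SQL NULL:

```python
def is_not_distinct_from(a, b):
    # NULL <=> NULL is true; NULL <=> x is false; otherwise plain equality.
    if a is None or b is None:
        return a is None and b is None
    return a == b

print(is_not_distinct_from(None, None))  # True (unlike NULL = NULL in SQL)
```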
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/17648
@rxin Actually, @hvanhovell proposed the following rewrites which I think
are better:
```
some(cond) => max(cond) = true
every(cond) => min(cond) = true
```
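The quoted rewrite relies on booleans ordering false < true, so `min` acts as a conjunction and `max` as a disjunction. A quick sanity sketch of the equivalence using Python booleans (not Spark code):

```python
def some(conds):
    # some(cond) => max(cond) = true: max is True iff any element is True.
    return max(conds) is True

def every(conds):
    # every(cond) => min(cond) = true: min is True iff all elements are True.
    return min(conds) is True

print(some([False, True]), every([False, True]))  # True False
```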
---
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/17648
@rxin Ok. So you're proposing rewrites for these aggregates that look
something like this?
```
some(cond) => sum(cond) > 0
every(cond) => sum(not(cond)) = 0
```
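The sum-based formulation above also checks out on plain booleans (Python sums `True` as 1); this is purely a sanity sketch of the proposed rewrite, not Spark code:

```python
def some(conds):
    # some(cond) => sum(cond) > 0
    return sum(conds) > 0

def every(conds):
    # every(cond) => sum(not(cond)) = 0
    return sum(not c for c in conds) == 0

print(some([False, True]), every([False, True]))  # True False
```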
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/17648
@rxin I'm not sure where you're going with your proposal. These are
aggregate functions, not scalar functions.
---
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17708#discussion_r112549462
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
---
@@ -387,6 +387,13 @@ case class
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17708#discussion_r112522049
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1007,22 +1006,38 @@ object functions {
def map(cols: Column*): Column
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17708#discussion_r112522069
--- Diff: python/pyspark/sql/functions.py ---
@@ -466,6 +466,14 @@ def nanvl(col1, col2):
return Column(sc._jvm.functions.nanvl(_to_java_column(col1
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17708#discussion_r112521982
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1007,22 +1006,38 @@ object functions {
def map(cols: Column*): Column
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17708#discussion_r112521514
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
---
@@ -537,5 +537,10 @@ class PlanParserSuite extends
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/17708
Add new query hint NO_COLLAPSE.
## What changes were proposed in this pull request?
This PR proposes adding a new query hint called NO_COLLAPSE that can be
used to prevent adjacent
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17650#discussion_r112309456
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
---
@@ -160,4 +166,12 @@ class
Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17650#discussion_r111733014
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
---
@@ -160,4 +166,12 @@ class
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/17650
[SPARK-20350] Add optimization rules to apply Complementation Laws.
## What changes were proposed in this pull request?
Apply Complementation Laws during boolean expression simplification
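The Complementation Laws in question are `a AND NOT a = false` and `a OR NOT a = true`; they can be checked exhaustively over the two boolean values (two-valued logic only; handling SQL's three-valued NULL logic is assumed to be guarded elsewhere in the rule):

```python
# Exhaustive check of the Complementation Laws over two-valued logic.
for a in (False, True):
    assert (a and not a) == False  # a AND NOT a simplifies to false
    assert (a or not a) == True    # a OR NOT a simplifies to true
print("complementation laws hold")
```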
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/17648
Moved this PR to a feature branch and lost comments. The original PR is
here: https://github.com/apache/spark/pull/17194
---
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/17648
Every any aggregates
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how
Github user ptkool closed the pull request at:
https://github.com/apache/spark/pull/17194
---
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/17194
`every` and `any` are also part of the SQL standard, so most SQL users will
be familiar with them.
---
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/17194
I think `every` and `any` are more intuitive, particularly if the operand
is a boolean expression. For example, `every(t1.c1 > t2.c2)` vs `max(t1.c1 >
t2.c2)`. Also, `every` and `any` return
Github user ptkool commented on the issue:
https://github.com/apache/spark/pull/17194
@hvanhovell I'm not following how `min` and `max` could be used.
---
GitHub user ptkool opened a pull request:
https://github.com/apache/spark/pull/17194
Add new aggregates EVERY and ANY (SOME).
## What changes were proposed in this pull request?
This pull request implements the EVERY and ANY aggregates.
## How was this patch tested