Li Jin created SPARK-33057:
------------------------------

             Summary: Cannot use filter with window operations
                 Key: SPARK-33057
                 URL: https://issues.apache.org/jira/browse/SPARK-33057
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1
            Reporter: Li Jin
Currently, trying to use filter with a window operation fails:
{code:python}
import pyspark.sql.functions as F
from pyspark.sql import Window

df = spark.range(100)
win = Window.partitionBy().orderBy('id')
df.filter(F.rank().over(win) > 10).show()
{code}
Error:
{code:java}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/icexelloss/opt/miniconda3/envs/ibis-dev-spark-3/lib/python3.8/site-packages/pyspark/sql/dataframe.py", line 1461, in filter
    jdf = self._jdf.filter(condition._jc)
  File "/Users/icexelloss/opt/miniconda3/envs/ibis-dev-spark-3/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/Users/icexelloss/opt/miniconda3/envs/ibis-dev-spark-3/lib/python3.8/site-packages/pyspark/sql/utils.py", line 134, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: It is not allowed to use window functions inside WHERE clause;
{code}
This fails even though the code is semantically equivalent to the following, which works:
{code:python}
df = spark.range(100)
win = Window.partitionBy().orderBy('id')
df = df.withColumn('rank', F.rank().over(win))
df = df[df['rank'] > 10]
df = df.drop('rank')
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)