[spark] branch master updated: [MINOR][PYSPARK][SQL][DOC] Fix rowsBetween doc in Window

gurwls223 Thu, 13 Jun 2019 17:57:18 -0700

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new c0297de  [MINOR][PYSPARK][SQL][DOC] Fix rowsBetween doc in Window
c0297de is described below

commit c0297dedd829a92cca920ab8983dab399f8f32d5
Author: Liang-Chi Hsieh <vii...@gmail.com>
AuthorDate: Fri Jun 14 09:56:37 2019 +0900

    [MINOR][PYSPARK][SQL][DOC] Fix rowsBetween doc in Window
    
    ## What changes were proposed in this pull request?
    
    I suspect that the doc of the `rowsBetween` methods in Scala and PySpark is wrong, because:
    
    ```scala
    scala> val df = Seq((1, "a"), (2, "a"), (3, "a"), (4, "a"), (5, "a"), (6, "a")).toDF("id", "category")
    df: org.apache.spark.sql.DataFrame = [id: int, category: string]
    
    scala> val byCategoryOrderedById = Window.partitionBy('category).orderBy('id).rowsBetween(-1, 2)
    byCategoryOrderedById: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec7f04de97
    
    scala> df.withColumn("sum", sum('id) over byCategoryOrderedById).show()
    +---+--------+---+
    | id|category|sum|
    +---+--------+---+
    |  1|       a|  6|              # sum from index 0 to (0 + 2): 1 + 2 + 3 = 6
    |  2|       a| 10|              # sum from index (1 - 1) to (1 + 2): 1 + 2 + 3 + 4 = 10
    |  3|       a| 14|
    |  4|       a| 18|
    |  5|       a| 15|
    |  6|       a| 11|
    +---+--------+---+
    ```
    
    So the frame (-1, 2) for the row with index 5, as described in the doc, should range from index 4 to index 7.
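
The frame arithmetic above can be checked without a Spark session. The following is a minimal pure-Python sketch, not Spark code: the helper names `rows_between_frame` and `windowed_sums` are hypothetical, and it assumes a single partition already ordered by `id`, with the frame clipped to the partition boundaries.

```python
def rows_between_frame(index, lower, upper, n_rows):
    """Inclusive (start, end) row indices of a row-based frame.

    Hypothetical helper: emulates the offsets passed to rowsBetween(lower, upper)
    and clips the result to a partition of n_rows rows.
    """
    start = max(0, index + lower)
    end = min(n_rows - 1, index + upper)
    return start, end


def windowed_sums(values, lower, upper):
    """Sum of the values inside each row's frame, for one ordered partition."""
    sums = []
    for i in range(len(values)):
        start, end = rows_between_frame(i, lower, upper, len(values))
        sums.append(sum(values[start:end + 1]))
    return sums


ids = [1, 2, 3, 4, 5, 6]
print(windowed_sums(ids, -1, 2))          # [6, 10, 14, 18, 15, 11]
print(rows_between_frame(5, -1, 2, 10))   # (4, 7)
```

The first line of output matches the `sum` column in the table above. The second shows that, in a partition long enough to avoid clipping, the frame (-1, 2) for index 5 spans index 4 to index 7, which is the value the corrected doc states.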
    
    ## How was this patch tested?
    
    N/A, just doc change.
    
    Closes #24864 from viirya/window-spec-doc.
    
    Authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    Signed-off-by: HyukjinKwon <gurwls...@apache.org>
---
 python/pyspark/sql/window.py                                          | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/window.py b/python/pyspark/sql/window.py
index 65c3ff5..9e02758a 100644
--- a/python/pyspark/sql/window.py
+++ b/python/pyspark/sql/window.py
@@ -101,7 +101,7 @@ class Window(object):
         An offset indicates the number of rows above or below the current row, the frame for the
         current row starts or ends. For instance, given a row based sliding frame with a lower bound
         offset of -1 and a upper bound offset of +2. The frame for row with index 5 would range from
-        index 4 to index 6.
+        index 4 to index 7.
 
         >>> from pyspark.sql import Window
         >>> from pyspark.sql import functions as func
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala b/sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala
index 9a4ad44..cd1c198 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala
@@ -129,7 +129,7 @@ object Window {
    * An offset indicates the number of rows above or below the current row, the frame for the
    * current row starts or ends. For instance, given a row based sliding frame with a lower bound
    * offset of -1 and a upper bound offset of +2. The frame for row with index 5 would range from
-   * index 4 to index 6.
+   * index 4 to index 7.
    *
    * {{{
    *   import org.apache.spark.sql.expressions.Window


