This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new e24859be740 [SPARK-39253][DOCS][PYTHON][3.3] Improve PySpark API reference to be more readable
e24859be740 is described below

commit e24859be7407123018a07a23ec0a78e386bb7398
Author: itholic <haejoon....@databricks.com>
AuthorDate: Thu May 26 19:35:35 2022 +0900

    [SPARK-39253][DOCS][PYTHON][3.3] Improve PySpark API reference to be more readable
    
    ### What changes were proposed in this pull request?
    
    Hotfix https://github.com/apache/spark/pull/36647 for branch-3.3.
    
    ### Why are the changes needed?
    
    Improving the readability of the documentation also improves the usability of PySpark.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. The documentation is now categorized more clearly by class or purpose, as shown below:
    
    <img width="270" alt="Screen Shot 2022-05-24 at 1 50 23 PM" src="https://user-images.githubusercontent.com/44108233/169951517-f8b9cb72-7408-46d6-8cd7-15ae890a7a7f.png">
    
    ### How was this patch tested?
    
    The existing tests should cover this change.
    
    Closes #36685 from itholic/SPARK-39253-3.3.
    
    Authored-by: itholic <haejoon....@databricks.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/docs/source/reference/index.rst             |   2 +-
 python/docs/source/reference/pyspark.sql.rst       | 663 ---------------------
 .../reference/{index.rst => pyspark.sql/avro.rst}  |  25 +-
 .../{index.rst => pyspark.sql/catalog.rst}         |  49 +-
 .../{index.rst => pyspark.sql/column.rst}          |  59 +-
 .../{index.rst => pyspark.sql/configuration.rst}   |  20 +-
 .../{index.rst => pyspark.sql/core_classes.rst}    |  40 +-
 .../{index.rst => pyspark.sql/data_types.rst}      |  47 +-
 .../source/reference/pyspark.sql/dataframe.rst     | 133 +++++
 .../source/reference/pyspark.sql/functions.rst     | 343 +++++++++++
 .../{index.rst => pyspark.sql/grouping.rst}        |  39 +-
 .../source/reference/{ => pyspark.sql}/index.rst   |  36 +-
 python/docs/source/reference/pyspark.sql/io.rst    |  54 ++
 .../{index.rst => pyspark.sql/observation.rst}     |  24 +-
 .../reference/{index.rst => pyspark.sql/row.rst}   |  24 +-
 .../source/reference/pyspark.sql/spark_session.rst |  53 ++
 .../{index.rst => pyspark.sql/window.rst}          |  39 +-
 17 files changed, 790 insertions(+), 860 deletions(-)

diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/index.rst
index f023b5a8c99..1d2db3f4a15 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/index.rst
@@ -27,7 +27,7 @@ Pandas API on Spark follows the API specifications of pandas 1.3.
 .. toctree::
    :maxdepth: 2
 
-   pyspark.sql
+   pyspark.sql/index
    pyspark.pandas/index
    pyspark.ss
    pyspark.ml
diff --git a/python/docs/source/reference/pyspark.sql.rst b/python/docs/source/reference/pyspark.sql.rst
deleted file mode 100644
index adc1958822e..00000000000
--- a/python/docs/source/reference/pyspark.sql.rst
+++ /dev/null
@@ -1,663 +0,0 @@
-..  Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-..    http://www.apache.org/licenses/LICENSE-2.0
-
-..  Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
-
-
-=========
-Spark SQL
-=========
-
-Core Classes
-------------
-
-.. currentmodule:: pyspark.sql
-
-.. autosummary::
-    :toctree: api/
-
-    SparkSession
-    Catalog
-    DataFrame
-    Column
-    Observation
-    Row
-    GroupedData
-    PandasCogroupedOps
-    DataFrameNaFunctions
-    DataFrameStatFunctions
-    Window
-
-
-Spark Session APIs
-------------------
-
-.. currentmodule:: pyspark.sql
-
-The entry point to programming Spark with the Dataset and DataFrame API.
-To create a Spark session, you should use ``SparkSession.builder`` attribute.
-See also :class:`SparkSession`.
-
-.. autosummary::
-    :toctree: api/
-
-    SparkSession.builder.appName
-    SparkSession.builder.config
-    SparkSession.builder.enableHiveSupport
-    SparkSession.builder.getOrCreate
-    SparkSession.builder.master
-    SparkSession.catalog
-    SparkSession.conf
-    SparkSession.createDataFrame
-    SparkSession.getActiveSession
-    SparkSession.newSession
-    SparkSession.range
-    SparkSession.read
-    SparkSession.readStream
-    SparkSession.sparkContext
-    SparkSession.sql
-    SparkSession.stop
-    SparkSession.streams
-    SparkSession.table
-    SparkSession.udf
-    SparkSession.version
-
-
-Configuration
--------------
-
-.. currentmodule:: pyspark.sql.conf
-
-.. autosummary::
-    :toctree: api/
-
-    RuntimeConfig
-
-
-Input and Output
-----------------
-
-.. currentmodule:: pyspark.sql
-
-.. autosummary::
-    :toctree: api/
-
-    DataFrameReader.csv
-    DataFrameReader.format
-    DataFrameReader.jdbc
-    DataFrameReader.json
-    DataFrameReader.load
-    DataFrameReader.option
-    DataFrameReader.options
-    DataFrameReader.orc
-    DataFrameReader.parquet
-    DataFrameReader.schema
-    DataFrameReader.table
-    DataFrameWriter.bucketBy
-    DataFrameWriter.csv
-    DataFrameWriter.format
-    DataFrameWriter.insertInto
-    DataFrameWriter.jdbc
-    DataFrameWriter.json
-    DataFrameWriter.mode
-    DataFrameWriter.option
-    DataFrameWriter.options
-    DataFrameWriter.orc
-    DataFrameWriter.parquet
-    DataFrameWriter.partitionBy
-    DataFrameWriter.save
-    DataFrameWriter.saveAsTable
-    DataFrameWriter.sortBy
-    DataFrameWriter.text
-
-
-DataFrame APIs
---------------
-
-.. currentmodule:: pyspark.sql
-
-.. autosummary::
-    :toctree: api/
-
-    DataFrame.agg
-    DataFrame.alias
-    DataFrame.approxQuantile
-    DataFrame.cache
-    DataFrame.checkpoint
-    DataFrame.coalesce
-    DataFrame.colRegex
-    DataFrame.collect
-    DataFrame.columns
-    DataFrame.corr
-    DataFrame.count
-    DataFrame.cov
-    DataFrame.createGlobalTempView
-    DataFrame.createOrReplaceGlobalTempView
-    DataFrame.createOrReplaceTempView
-    DataFrame.createTempView
-    DataFrame.crossJoin
-    DataFrame.crosstab
-    DataFrame.cube
-    DataFrame.describe
-    DataFrame.distinct
-    DataFrame.drop
-    DataFrame.dropDuplicates
-    DataFrame.drop_duplicates
-    DataFrame.dropna
-    DataFrame.dtypes
-    DataFrame.exceptAll
-    DataFrame.explain
-    DataFrame.fillna
-    DataFrame.filter
-    DataFrame.first
-    DataFrame.foreach
-    DataFrame.foreachPartition
-    DataFrame.freqItems
-    DataFrame.groupBy
-    DataFrame.head
-    DataFrame.hint
-    DataFrame.inputFiles
-    DataFrame.intersect
-    DataFrame.intersectAll
-    DataFrame.isEmpty
-    DataFrame.isLocal
-    DataFrame.isStreaming
-    DataFrame.join
-    DataFrame.limit
-    DataFrame.localCheckpoint
-    DataFrame.mapInPandas
-    DataFrame.mapInArrow
-    DataFrame.na
-    DataFrame.observe
-    DataFrame.orderBy
-    DataFrame.persist
-    DataFrame.printSchema
-    DataFrame.randomSplit
-    DataFrame.rdd
-    DataFrame.registerTempTable
-    DataFrame.repartition
-    DataFrame.repartitionByRange
-    DataFrame.replace
-    DataFrame.rollup
-    DataFrame.sameSemantics
-    DataFrame.sample
-    DataFrame.sampleBy
-    DataFrame.schema
-    DataFrame.select
-    DataFrame.selectExpr
-    DataFrame.semanticHash
-    DataFrame.show
-    DataFrame.sort
-    DataFrame.sortWithinPartitions
-    DataFrame.sparkSession
-    DataFrame.stat
-    DataFrame.storageLevel
-    DataFrame.subtract
-    DataFrame.summary
-    DataFrame.tail
-    DataFrame.take
-    DataFrame.toDF
-    DataFrame.toJSON
-    DataFrame.toLocalIterator
-    DataFrame.toPandas
-    DataFrame.transform
-    DataFrame.union
-    DataFrame.unionAll
-    DataFrame.unionByName
-    DataFrame.unpersist
-    DataFrame.where
-    DataFrame.withColumn
-    DataFrame.withColumnRenamed
-    DataFrame.withWatermark
-    DataFrame.write
-    DataFrame.writeStream
-    DataFrame.writeTo
-    DataFrame.pandas_api
-    DataFrameNaFunctions.drop
-    DataFrameNaFunctions.fill
-    DataFrameNaFunctions.replace
-    DataFrameStatFunctions.approxQuantile
-    DataFrameStatFunctions.corr
-    DataFrameStatFunctions.cov
-    DataFrameStatFunctions.crosstab
-    DataFrameStatFunctions.freqItems
-    DataFrameStatFunctions.sampleBy
-
-Column APIs
------------
-
-.. currentmodule:: pyspark.sql
-
-.. autosummary::
-    :toctree: api/
-
-    Column.alias
-    Column.asc
-    Column.asc_nulls_first
-    Column.asc_nulls_last
-    Column.astype
-    Column.between
-    Column.bitwiseAND
-    Column.bitwiseOR
-    Column.bitwiseXOR
-    Column.cast
-    Column.contains
-    Column.desc
-    Column.desc_nulls_first
-    Column.desc_nulls_last
-    Column.dropFields
-    Column.endswith
-    Column.eqNullSafe
-    Column.getField
-    Column.getItem
-    Column.ilike
-    Column.isNotNull
-    Column.isNull
-    Column.isin
-    Column.like
-    Column.name
-    Column.otherwise
-    Column.over
-    Column.rlike
-    Column.startswith
-    Column.substr
-    Column.when
-    Column.withField
-
-Data Types
-----------
-
-.. currentmodule:: pyspark.sql.types
-
-.. autosummary::
-    :template: autosummary/class_with_docs.rst
-    :toctree: api/
-
-    ArrayType
-    BinaryType
-    BooleanType
-    ByteType
-    DataType
-    DateType
-    DecimalType
-    DoubleType
-    FloatType
-    IntegerType
-    LongType
-    MapType
-    NullType
-    ShortType
-    StringType
-    StructField
-    StructType
-    TimestampType
-    DayTimeIntervalType
-
-
-Observation
------------
-
-.. currentmodule:: pyspark.sql
-
-.. autosummary::
-    :toctree: api/
-
-    Observation.get
-
-
-Row
----
-
-.. currentmodule:: pyspark.sql
-
-.. autosummary::
-    :toctree: api/
-
-    Row.asDict
-
-
-Functions
----------
-
-.. currentmodule:: pyspark.sql.functions
-
-.. autosummary::
-    :toctree: api/
-
-    abs
-    acos
-    acosh
-    add_months
-    aggregate
-    approxCountDistinct
-    approx_count_distinct
-    array
-    array_contains
-    array_distinct
-    array_except
-    array_intersect
-    array_join
-    array_max
-    array_min
-    array_position
-    array_remove
-    array_repeat
-    array_sort
-    array_union
-    arrays_overlap
-    arrays_zip
-    asc
-    asc_nulls_first
-    asc_nulls_last
-    ascii
-    asin
-    asinh
-    assert_true
-    atan
-    atanh
-    atan2
-    avg
-    base64
-    bin
-    bit_length
-    bitwise_not
-    bitwiseNOT
-    broadcast
-    bround
-    bucket
-    cbrt
-    ceil
-    coalesce
-    col
-    collect_list
-    collect_set
-    column
-    concat
-    concat_ws
-    conv
-    corr
-    cos
-    cosh
-    cot
-    count
-    count_distinct
-    countDistinct
-    covar_pop
-    covar_samp
-    crc32
-    create_map
-    csc
-    cume_dist
-    current_date
-    current_timestamp
-    date_add
-    date_format
-    date_sub
-    date_trunc
-    datediff
-    dayofmonth
-    dayofweek
-    dayofyear
-    days
-    decode
-    degrees
-    dense_rank
-    desc
-    desc_nulls_first
-    desc_nulls_last
-    element_at
-    encode
-    exists
-    exp
-    explode
-    explode_outer
-    expm1
-    expr
-    factorial
-    filter
-    first
-    flatten
-    floor
-    forall
-    format_number
-    format_string
-    from_csv
-    from_json
-    from_unixtime
-    from_utc_timestamp
-    get_json_object
-    greatest
-    grouping
-    grouping_id
-    hash
-    hex
-    hour
-    hours
-    hypot
-    initcap
-    input_file_name
-    instr
-    isnan
-    isnull
-    json_tuple
-    kurtosis
-    lag
-    last
-    last_day
-    lead
-    least
-    length
-    levenshtein
-    lit
-    locate
-    log
-    log10
-    log1p
-    log2
-    lower
-    lpad
-    ltrim
-    make_date
-    map_concat
-    map_entries
-    map_filter
-    map_from_arrays
-    map_from_entries
-    map_keys
-    map_values
-    map_zip_with
-    max
-    max_by
-    md5
-    mean
-    min
-    min_by
-    minute
-    monotonically_increasing_id
-    month
-    months
-    months_between
-    nanvl
-    next_day
-    nth_value
-    ntile
-    octet_length
-    overlay
-    pandas_udf
-    percent_rank
-    percentile_approx
-    posexplode
-    posexplode_outer
-    pow
-    product
-    quarter
-    radians
-    raise_error
-    rand
-    randn
-    rank
-    regexp_extract
-    regexp_replace
-    repeat
-    reverse
-    rint
-    round
-    row_number
-    rpad
-    rtrim
-    schema_of_csv
-    schema_of_json
-    sec
-    second
-    sentences
-    sequence
-    session_window
-    sha1
-    sha2
-    shiftleft
-    shiftright
-    shiftrightunsigned
-    shuffle
-    signum
-    sin
-    sinh
-    size
-    skewness
-    slice
-    sort_array
-    soundex
-    spark_partition_id
-    split
-    sqrt
-    stddev
-    stddev_pop
-    stddev_samp
-    struct
-    substring
-    substring_index
-    sum
-    sum_distinct
-    sumDistinct
-    tan
-    tanh
-    timestamp_seconds
-    toDegrees
-    toRadians
-    to_csv
-    to_date
-    to_json
-    to_timestamp
-    to_utc_timestamp
-    transform
-    transform_keys
-    transform_values
-    translate
-    trim
-    trunc
-    udf
-    unbase64
-    unhex
-    unix_timestamp
-    upper
-    var_pop
-    var_samp
-    variance
-    weekofyear
-    when
-    window
-    xxhash64
-    year
-    years
-    zip_with
-
-
-.. currentmodule:: pyspark.sql.avro.functions
-
-.. autosummary::
-    :toctree: api/
-
-    from_avro
-    to_avro
-
-Window
-------
-
-.. currentmodule:: pyspark.sql
-
-.. autosummary::
-    :toctree: api/
-
-    Window.currentRow
-    Window.orderBy
-    Window.partitionBy
-    Window.rangeBetween
-    Window.rowsBetween
-    Window.unboundedFollowing
-    Window.unboundedPreceding
-    WindowSpec.orderBy
-    WindowSpec.partitionBy
-    WindowSpec.rangeBetween
-    WindowSpec.rowsBetween
-
-Grouping
---------
-
-.. currentmodule:: pyspark.sql
-
-.. autosummary::
-    :toctree: api/
-
-    GroupedData.agg
-    GroupedData.apply
-    GroupedData.applyInPandas
-    GroupedData.avg
-    GroupedData.cogroup
-    GroupedData.count
-    GroupedData.max
-    GroupedData.mean
-    GroupedData.min
-    GroupedData.pivot
-    GroupedData.sum
-    PandasCogroupedOps.applyInPandas
-
-Catalog APIs
-------------
-
-.. currentmodule:: pyspark.sql
-
-.. autosummary::
-    :toctree: api/
-
-    Catalog.cacheTable
-    Catalog.clearCache
-    Catalog.createExternalTable
-    Catalog.createTable
-    Catalog.currentDatabase
-    Catalog.databaseExists
-    Catalog.dropGlobalTempView
-    Catalog.dropTempView
-    Catalog.functionExists
-    Catalog.isCached
-    Catalog.listColumns
-    Catalog.listDatabases
-    Catalog.listFunctions
-    Catalog.listTables
-    Catalog.recoverPartitions
-    Catalog.refreshByPath
-    Catalog.refreshTable
-    Catalog.registerFunction
-    Catalog.setCurrentDatabase
-    Catalog.tableExists
-    Catalog.uncacheTable
diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/pyspark.sql/avro.rst
similarity index 69%
copy from python/docs/source/reference/index.rst
copy to python/docs/source/reference/pyspark.sql/avro.rst
index f023b5a8c99..b6de88deef1 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/pyspark.sql/avro.rst
@@ -16,22 +16,13 @@
     under the License.
 
 
-=============
-API Reference
-=============
+====
+Avro
+====
+.. currentmodule:: pyspark.sql.avro.functions
 
-This page lists an overview of all public PySpark modules, classes, functions and methods.
+.. autosummary::
+    :toctree: api/
 
-Pandas API on Spark follows the API specifications of pandas 1.3.
-
-.. toctree::
-   :maxdepth: 2
-
-   pyspark.sql
-   pyspark.pandas/index
-   pyspark.ss
-   pyspark.ml
-   pyspark.streaming
-   pyspark.mllib
-   pyspark
-   pyspark.resource
+    from_avro
+    to_avro
diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/pyspark.sql/catalog.rst
similarity index 56%
copy from python/docs/source/reference/index.rst
copy to python/docs/source/reference/pyspark.sql/catalog.rst
index f023b5a8c99..8267e06410e 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/pyspark.sql/catalog.rst
@@ -16,22 +16,33 @@
     under the License.
 
 
-=============
-API Reference
-=============
-
-This page lists an overview of all public PySpark modules, classes, functions and methods.
-
-Pandas API on Spark follows the API specifications of pandas 1.3.
-
-.. toctree::
-   :maxdepth: 2
-
-   pyspark.sql
-   pyspark.pandas/index
-   pyspark.ss
-   pyspark.ml
-   pyspark.streaming
-   pyspark.mllib
-   pyspark
-   pyspark.resource
+=======
+Catalog
+=======
+
+.. currentmodule:: pyspark.sql
+
+.. autosummary::
+    :toctree: api/
+
+    Catalog.cacheTable
+    Catalog.clearCache
+    Catalog.createExternalTable
+    Catalog.createTable
+    Catalog.currentDatabase
+    Catalog.databaseExists
+    Catalog.dropGlobalTempView
+    Catalog.dropTempView
+    Catalog.functionExists
+    Catalog.isCached
+    Catalog.listColumns
+    Catalog.listDatabases
+    Catalog.listFunctions
+    Catalog.listTables
+    Catalog.recoverPartitions
+    Catalog.refreshByPath
+    Catalog.refreshTable
+    Catalog.registerFunction
+    Catalog.setCurrentDatabase
+    Catalog.tableExists
+    Catalog.uncacheTable
diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/pyspark.sql/column.rst
similarity index 53%
copy from python/docs/source/reference/index.rst
copy to python/docs/source/reference/pyspark.sql/column.rst
index f023b5a8c99..b5f39d299c1 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/pyspark.sql/column.rst
@@ -16,22 +16,43 @@
     under the License.
 
 
-=============
-API Reference
-=============
-
-This page lists an overview of all public PySpark modules, classes, functions and methods.
-
-Pandas API on Spark follows the API specifications of pandas 1.3.
-
-.. toctree::
-   :maxdepth: 2
-
-   pyspark.sql
-   pyspark.pandas/index
-   pyspark.ss
-   pyspark.ml
-   pyspark.streaming
-   pyspark.mllib
-   pyspark
-   pyspark.resource
+======
+Column
+======
+.. currentmodule:: pyspark.sql
+
+.. autosummary::
+    :toctree: api/
+
+    Column.alias
+    Column.asc
+    Column.asc_nulls_first
+    Column.asc_nulls_last
+    Column.astype
+    Column.between
+    Column.bitwiseAND
+    Column.bitwiseOR
+    Column.bitwiseXOR
+    Column.cast
+    Column.contains
+    Column.desc
+    Column.desc_nulls_first
+    Column.desc_nulls_last
+    Column.dropFields
+    Column.endswith
+    Column.eqNullSafe
+    Column.getField
+    Column.getItem
+    Column.ilike
+    Column.isNotNull
+    Column.isNull
+    Column.isin
+    Column.like
+    Column.name
+    Column.otherwise
+    Column.over
+    Column.rlike
+    Column.startswith
+    Column.substr
+    Column.when
+    Column.withField
diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/pyspark.sql/configuration.rst
similarity index 71%
copy from python/docs/source/reference/index.rst
copy to python/docs/source/reference/pyspark.sql/configuration.rst
index f023b5a8c99..7a5c10400de 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/pyspark.sql/configuration.rst
@@ -17,21 +17,11 @@
 
 
 =============
-API Reference
+Configuration
 =============
+.. currentmodule:: pyspark.sql.conf
 
-This page lists an overview of all public PySpark modules, classes, functions and methods.
+.. autosummary::
+    :toctree: api/
 
-Pandas API on Spark follows the API specifications of pandas 1.3.
-
-.. toctree::
-   :maxdepth: 2
-
-   pyspark.sql
-   pyspark.pandas/index
-   pyspark.ss
-   pyspark.ml
-   pyspark.streaming
-   pyspark.mllib
-   pyspark
-   pyspark.resource
+    RuntimeConfig
diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/pyspark.sql/core_classes.rst
similarity index 69%
copy from python/docs/source/reference/index.rst
copy to python/docs/source/reference/pyspark.sql/core_classes.rst
index f023b5a8c99..72f9ca122a9 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/pyspark.sql/core_classes.rst
@@ -16,22 +16,24 @@
     under the License.
 
 
-=============
-API Reference
-=============
-
-This page lists an overview of all public PySpark modules, classes, functions and methods.
-
-Pandas API on Spark follows the API specifications of pandas 1.3.
-
-.. toctree::
-   :maxdepth: 2
-
-   pyspark.sql
-   pyspark.pandas/index
-   pyspark.ss
-   pyspark.ml
-   pyspark.streaming
-   pyspark.mllib
-   pyspark
-   pyspark.resource
+============
+Core Classes
+============
+.. currentmodule:: pyspark.sql
+
+.. autosummary::
+    :toctree: api/
+
+    SparkSession
+    Catalog
+    DataFrame
+    Column
+    Observation
+    Row
+    GroupedData
+    PandasCogroupedOps
+    DataFrameNaFunctions
+    DataFrameStatFunctions
+    Window
+    DataFrameReader
+    DataFrameWriter
diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/pyspark.sql/data_types.rst
similarity index 65%
copy from python/docs/source/reference/index.rst
copy to python/docs/source/reference/pyspark.sql/data_types.rst
index f023b5a8c99..d146c640477 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/pyspark.sql/data_types.rst
@@ -16,22 +16,31 @@
     under the License.
 
 
-=============
-API Reference
-=============
-
-This page lists an overview of all public PySpark modules, classes, functions and methods.
-
-Pandas API on Spark follows the API specifications of pandas 1.3.
-
-.. toctree::
-   :maxdepth: 2
-
-   pyspark.sql
-   pyspark.pandas/index
-   pyspark.ss
-   pyspark.ml
-   pyspark.streaming
-   pyspark.mllib
-   pyspark
-   pyspark.resource
+==========
+Data Types
+==========
+.. currentmodule:: pyspark.sql.types
+
+.. autosummary::
+    :template: autosummary/class_with_docs.rst
+    :toctree: api/
+
+    ArrayType
+    BinaryType
+    BooleanType
+    ByteType
+    DataType
+    DateType
+    DecimalType
+    DoubleType
+    FloatType
+    IntegerType
+    LongType
+    MapType
+    NullType
+    ShortType
+    StringType
+    StructField
+    StructType
+    TimestampType
+    DayTimeIntervalType
diff --git a/python/docs/source/reference/pyspark.sql/dataframe.rst b/python/docs/source/reference/pyspark.sql/dataframe.rst
new file mode 100644
index 00000000000..5b6e704ba48
--- /dev/null
+++ b/python/docs/source/reference/pyspark.sql/dataframe.rst
@@ -0,0 +1,133 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+=========
+DataFrame
+=========
+
+.. currentmodule:: pyspark.sql
+
+.. autosummary::
+    :toctree: api/
+
+    DataFrame.agg
+    DataFrame.alias
+    DataFrame.approxQuantile
+    DataFrame.cache
+    DataFrame.checkpoint
+    DataFrame.coalesce
+    DataFrame.colRegex
+    DataFrame.collect
+    DataFrame.columns
+    DataFrame.corr
+    DataFrame.count
+    DataFrame.cov
+    DataFrame.createGlobalTempView
+    DataFrame.createOrReplaceGlobalTempView
+    DataFrame.createOrReplaceTempView
+    DataFrame.createTempView
+    DataFrame.crossJoin
+    DataFrame.crosstab
+    DataFrame.cube
+    DataFrame.describe
+    DataFrame.distinct
+    DataFrame.drop
+    DataFrame.dropDuplicates
+    DataFrame.drop_duplicates
+    DataFrame.dropna
+    DataFrame.dtypes
+    DataFrame.exceptAll
+    DataFrame.explain
+    DataFrame.fillna
+    DataFrame.filter
+    DataFrame.first
+    DataFrame.foreach
+    DataFrame.foreachPartition
+    DataFrame.freqItems
+    DataFrame.groupBy
+    DataFrame.head
+    DataFrame.hint
+    DataFrame.inputFiles
+    DataFrame.intersect
+    DataFrame.intersectAll
+    DataFrame.isEmpty
+    DataFrame.isLocal
+    DataFrame.isStreaming
+    DataFrame.join
+    DataFrame.limit
+    DataFrame.localCheckpoint
+    DataFrame.mapInPandas
+    DataFrame.mapInArrow
+    DataFrame.na
+    DataFrame.observe
+    DataFrame.orderBy
+    DataFrame.persist
+    DataFrame.printSchema
+    DataFrame.randomSplit
+    DataFrame.rdd
+    DataFrame.registerTempTable
+    DataFrame.repartition
+    DataFrame.repartitionByRange
+    DataFrame.replace
+    DataFrame.rollup
+    DataFrame.sameSemantics
+    DataFrame.sample
+    DataFrame.sampleBy
+    DataFrame.schema
+    DataFrame.select
+    DataFrame.selectExpr
+    DataFrame.semanticHash
+    DataFrame.show
+    DataFrame.sort
+    DataFrame.sortWithinPartitions
+    DataFrame.sparkSession
+    DataFrame.stat
+    DataFrame.storageLevel
+    DataFrame.subtract
+    DataFrame.summary
+    DataFrame.tail
+    DataFrame.take
+    DataFrame.toDF
+    DataFrame.toJSON
+    DataFrame.toLocalIterator
+    DataFrame.toPandas
+    DataFrame.to_pandas_on_spark
+    DataFrame.transform
+    DataFrame.union
+    DataFrame.unionAll
+    DataFrame.unionByName
+    DataFrame.unpersist
+    DataFrame.where
+    DataFrame.withColumn
+    DataFrame.withColumns
+    DataFrame.withColumnRenamed
+    DataFrame.withMetadata
+    DataFrame.withWatermark
+    DataFrame.write
+    DataFrame.writeStream
+    DataFrame.writeTo
+    DataFrame.pandas_api
+    DataFrameNaFunctions.drop
+    DataFrameNaFunctions.fill
+    DataFrameNaFunctions.replace
+    DataFrameStatFunctions.approxQuantile
+    DataFrameStatFunctions.corr
+    DataFrameStatFunctions.cov
+    DataFrameStatFunctions.crosstab
+    DataFrameStatFunctions.freqItems
+    DataFrameStatFunctions.sampleBy
diff --git a/python/docs/source/reference/pyspark.sql/functions.rst b/python/docs/source/reference/pyspark.sql/functions.rst
new file mode 100644
index 00000000000..390d7d768ca
--- /dev/null
+++ b/python/docs/source/reference/pyspark.sql/functions.rst
@@ -0,0 +1,343 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+=========
+Functions
+=========
+.. currentmodule:: pyspark.sql.functions
+
+Normal Functions
+----------------
+.. autosummary::
+    :toctree: api/
+
+    col
+    column
+    create_map
+    lit
+    array
+    map_from_arrays
+    broadcast
+    coalesce
+    input_file_name
+    isnan
+    isnull
+    monotonically_increasing_id
+    nanvl
+    rand
+    randn
+    spark_partition_id
+    struct
+    when
+    bitwise_not
+    bitwiseNOT
+    expr
+    greatest
+    least
+
+
+Math Functions
+--------------
+.. autosummary::
+    :toctree: api/
+
+    sqrt
+    abs
+    acos
+    acosh
+    asin
+    asinh
+    atan
+    atanh
+    atan2
+    bin
+    cbrt
+    ceil
+    conv
+    cos
+    cosh
+    cot
+    csc
+    exp
+    expm1
+    factorial
+    floor
+    hex
+    unhex
+    hypot
+    log
+    log10
+    log1p
+    log2
+    pow
+    rint
+    round
+    bround
+    sec
+    shiftleft
+    shiftright
+    shiftrightunsigned
+    signum
+    sin
+    sinh
+    tan
+    tanh
+    toDegrees
+    degrees
+    toRadians
+    radians
+
+
+Datetime Functions
+------------------
+.. autosummary::
+    :toctree: api/
+
+    add_months
+    current_date
+    current_timestamp
+    date_add
+    date_format
+    date_sub
+    date_trunc
+    datediff
+    dayofmonth
+    dayofweek
+    dayofyear
+    second
+    weekofyear
+    year
+    quarter
+    month
+    last_day
+    minute
+    months_between
+    next_day
+    hour
+    make_date
+    from_unixtime
+    unix_timestamp
+    to_timestamp
+    to_date
+    trunc
+    from_utc_timestamp
+    to_utc_timestamp
+    window
+    session_window
+    timestamp_seconds
+
+
+Collection Functions
+--------------------
+.. autosummary::
+    :toctree: api/
+
+    array_contains
+    arrays_overlap
+    slice
+    array_join
+    concat
+    array_position
+    element_at
+    array_sort
+    array_remove
+    array_distinct
+    array_intersect
+    array_union
+    array_except
+    transform
+    exists
+    forall
+    filter
+    aggregate
+    zip_with
+    transform_keys
+    transform_values
+    map_filter
+    map_zip_with
+    explode
+    explode_outer
+    posexplode
+    posexplode_outer
+    get_json_object
+    json_tuple
+    from_json
+    schema_of_json
+    to_json
+    size
+    sort_array
+    array_max
+    array_min
+    shuffle
+    reverse
+    flatten
+    sequence
+    array_repeat
+    map_keys
+    map_values
+    map_entries
+    map_from_entries
+    arrays_zip
+    map_concat
+    from_csv
+    schema_of_csv
+    to_csv
+
+
+Partition Transformation Functions
+----------------------------------
+.. autosummary::
+    :toctree: api/
+
+    years
+    months
+    days
+    hours
+    bucket
+
+
+Aggregate Functions
+-------------------
+.. autosummary::
+    :toctree: api/
+
+    approxCountDistinct
+    approx_count_distinct
+    avg
+    collect_list
+    collect_set
+    corr
+    count
+    count_distinct
+    countDistinct
+    covar_pop
+    covar_samp
+    first
+    grouping
+    grouping_id
+    kurtosis
+    last
+    max
+    max_by
+    mean
+    min
+    min_by
+    percentile_approx
+    product
+    skewness
+    stddev
+    stddev_pop
+    stddev_samp
+    sum
+    sum_distinct
+    sumDistinct
+    var_pop
+    var_samp
+    variance
+
+
+Window Functions
+----------------
+.. autosummary::
+    :toctree: api/
+
+    cume_dist
+    dense_rank
+    lag
+    lead
+    nth_value
+    ntile
+    percent_rank
+    rank
+    row_number
+
+
+Sort Functions
+--------------
+.. autosummary::
+    :toctree: api/
+
+    asc
+    asc_nulls_first
+    asc_nulls_last
+    desc
+    desc_nulls_first
+    desc_nulls_last
+
+
+String Functions
+----------------
+.. autosummary::
+    :toctree: api/
+
+    ascii
+    base64
+    bit_length
+    concat_ws
+    decode
+    encode
+    format_number
+    format_string
+    initcap
+    instr
+    length
+    lower
+    levenshtein
+    locate
+    lpad
+    ltrim
+    octet_length
+    regexp_extract
+    regexp_replace
+    unbase64
+    rpad
+    repeat
+    rtrim
+    soundex
+    split
+    substring
+    substring_index
+    overlay
+    sentences
+    translate
+    trim
+    upper
+
+
+UDF
+---
+.. autosummary::
+    :toctree: api/
+
+    pandas_udf
+    udf
+
+Misc Functions
+--------------
+.. autosummary::
+    :toctree: api/
+
+    md5
+    sha1
+    sha2
+    crc32
+    hash
+    xxhash64
+    assert_true
+    raise_error
+
diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/pyspark.sql/grouping.rst
similarity index 68%
copy from python/docs/source/reference/index.rst
copy to python/docs/source/reference/pyspark.sql/grouping.rst
index f023b5a8c99..459ef572756 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/pyspark.sql/grouping.rst
@@ -16,22 +16,23 @@
     under the License.
 
 
-=============
-API Reference
-=============
-
-This page lists an overview of all public PySpark modules, classes, functions and methods.
-
-Pandas API on Spark follows the API specifications of pandas 1.3.
-
-.. toctree::
-   :maxdepth: 2
-
-   pyspark.sql
-   pyspark.pandas/index
-   pyspark.ss
-   pyspark.ml
-   pyspark.streaming
-   pyspark.mllib
-   pyspark
-   pyspark.resource
+========
+Grouping
+========
+.. currentmodule:: pyspark.sql
+
+.. autosummary::
+    :toctree: api/
+
+    GroupedData.agg
+    GroupedData.apply
+    GroupedData.applyInPandas
+    GroupedData.avg
+    GroupedData.cogroup
+    GroupedData.count
+    GroupedData.max
+    GroupedData.mean
+    GroupedData.min
+    GroupedData.pivot
+    GroupedData.sum
+    PandasCogroupedOps.applyInPandas
diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/pyspark.sql/index.rst
similarity index 70%
copy from python/docs/source/reference/index.rst
copy to python/docs/source/reference/pyspark.sql/index.rst
index f023b5a8c99..a8b52f4a1b5 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/pyspark.sql/index.rst
@@ -16,22 +16,26 @@
     under the License.
 
 
-=============
-API Reference
-=============
+=========
+Spark SQL
+=========
 
-This page lists an overview of all public PySpark modules, classes, functions and methods.
-
-Pandas API on Spark follows the API specifications of pandas 1.3.
+This page gives an overview of the public Spark SQL API.
 
 .. toctree::
-   :maxdepth: 2
-
-   pyspark.sql
-   pyspark.pandas/index
-   pyspark.ss
-   pyspark.ml
-   pyspark.streaming
-   pyspark.mllib
-   pyspark
-   pyspark.resource
+    :maxdepth: 2
+
+    core_classes
+    spark_session
+    configuration
+    io
+    dataframe
+    column
+    data_types
+    row
+    functions
+    window
+    grouping
+    catalog
+    observation
+    avro
diff --git a/python/docs/source/reference/pyspark.sql/io.rst b/python/docs/source/reference/pyspark.sql/io.rst
new file mode 100644
index 00000000000..52e4593eead
--- /dev/null
+++ b/python/docs/source/reference/pyspark.sql/io.rst
@@ -0,0 +1,54 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+============
+Input/Output
+============
+.. currentmodule:: pyspark.sql
+
+.. autosummary::
+    :toctree: api/
+
+    DataFrameReader.csv
+    DataFrameReader.format
+    DataFrameReader.jdbc
+    DataFrameReader.json
+    DataFrameReader.load
+    DataFrameReader.option
+    DataFrameReader.options
+    DataFrameReader.orc
+    DataFrameReader.parquet
+    DataFrameReader.schema
+    DataFrameReader.table
+    DataFrameReader.text
+    DataFrameWriter.bucketBy
+    DataFrameWriter.csv
+    DataFrameWriter.format
+    DataFrameWriter.insertInto
+    DataFrameWriter.jdbc
+    DataFrameWriter.json
+    DataFrameWriter.mode
+    DataFrameWriter.option
+    DataFrameWriter.options
+    DataFrameWriter.orc
+    DataFrameWriter.parquet
+    DataFrameWriter.partitionBy
+    DataFrameWriter.save
+    DataFrameWriter.saveAsTable
+    DataFrameWriter.sortBy
+    DataFrameWriter.text
diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/pyspark.sql/observation.rst
similarity index 69%
copy from python/docs/source/reference/index.rst
copy to python/docs/source/reference/pyspark.sql/observation.rst
index f023b5a8c99..52867eda109 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/pyspark.sql/observation.rst
@@ -16,22 +16,12 @@
     under the License.
 
 
-=============
-API Reference
-=============
+===========
+Observation
+===========
+.. currentmodule:: pyspark.sql
 
-This page lists an overview of all public PySpark modules, classes, functions and methods.
+.. autosummary::
+    :toctree: api/
 
-Pandas API on Spark follows the API specifications of pandas 1.3.
-
-.. toctree::
-   :maxdepth: 2
-
-   pyspark.sql
-   pyspark.pandas/index
-   pyspark.ss
-   pyspark.ml
-   pyspark.streaming
-   pyspark.mllib
-   pyspark
-   pyspark.resource
+    Observation.get
diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/pyspark.sql/row.rst
similarity index 69%
copy from python/docs/source/reference/index.rst
copy to python/docs/source/reference/pyspark.sql/row.rst
index f023b5a8c99..1234b8d92ae 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/pyspark.sql/row.rst
@@ -16,22 +16,12 @@
     under the License.
 
 
-=============
-API Reference
-=============
+===
+Row
+===
+.. currentmodule:: pyspark.sql
 
-This page lists an overview of all public PySpark modules, classes, functions and methods.
+.. autosummary::
+    :toctree: api/
 
-Pandas API on Spark follows the API specifications of pandas 1.3.
-
-.. toctree::
-   :maxdepth: 2
-
-   pyspark.sql
-   pyspark.pandas/index
-   pyspark.ss
-   pyspark.ml
-   pyspark.streaming
-   pyspark.mllib
-   pyspark
-   pyspark.resource
+    Row.asDict
diff --git a/python/docs/source/reference/pyspark.sql/spark_session.rst b/python/docs/source/reference/pyspark.sql/spark_session.rst
new file mode 100644
index 00000000000..d4fb7270a77
--- /dev/null
+++ b/python/docs/source/reference/pyspark.sql/spark_session.rst
@@ -0,0 +1,53 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+=============
+Spark Session
+=============
+.. currentmodule:: pyspark.sql
+
+.. autosummary::
+    :toctree: api/
+
+The entry point to programming Spark with the Dataset and DataFrame API.
+To create a Spark session, use the ``SparkSession.builder`` attribute.
+See also :class:`SparkSession`.
+
+.. autosummary::
+    :toctree: api/
+
+    SparkSession.builder.appName
+    SparkSession.builder.config
+    SparkSession.builder.enableHiveSupport
+    SparkSession.builder.getOrCreate
+    SparkSession.builder.master
+    SparkSession.catalog
+    SparkSession.conf
+    SparkSession.createDataFrame
+    SparkSession.getActiveSession
+    SparkSession.newSession
+    SparkSession.range
+    SparkSession.read
+    SparkSession.readStream
+    SparkSession.sparkContext
+    SparkSession.sql
+    SparkSession.stop
+    SparkSession.streams
+    SparkSession.table
+    SparkSession.udf
+    SparkSession.version
diff --git a/python/docs/source/reference/index.rst b/python/docs/source/reference/pyspark.sql/window.rst
similarity index 69%
copy from python/docs/source/reference/index.rst
copy to python/docs/source/reference/pyspark.sql/window.rst
index f023b5a8c99..3625164d0a0 100644
--- a/python/docs/source/reference/index.rst
+++ b/python/docs/source/reference/pyspark.sql/window.rst
@@ -16,22 +16,23 @@
     under the License.
 
 
-=============
-API Reference
-=============
-
-This page lists an overview of all public PySpark modules, classes, functions and methods.
-
-Pandas API on Spark follows the API specifications of pandas 1.3.
-
-.. toctree::
-   :maxdepth: 2
-
-   pyspark.sql
-   pyspark.pandas/index
-   pyspark.ss
-   pyspark.ml
-   pyspark.streaming
-   pyspark.mllib
-   pyspark
-   pyspark.resource
+======
+Window
+======
+
+.. currentmodule:: pyspark.sql
+
+.. autosummary::
+    :toctree: api/
+
+    Window.currentRow
+    Window.orderBy
+    Window.partitionBy
+    Window.rangeBetween
+    Window.rowsBetween
+    Window.unboundedFollowing
+    Window.unboundedPreceding
+    WindowSpec.orderBy
+    WindowSpec.partitionBy
+    WindowSpec.rangeBetween
+    WindowSpec.rowsBetween
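
Editorial note (not part of the commit): the new pages above all follow the same pattern, a ``.. currentmodule::`` line followed by an ``.. autosummary::`` block with ``:toctree: api/``. The sketch below is a simplified, stdlib-only illustration of how each short entry maps to a fully qualified name and a generated stub file. It assumes Sphinx's usual stub naming (full object name plus ``.rst`` under the toctree directory); the helper name ``autosummary_targets`` and the ``SAMPLE`` text are hypothetical, and the parser is far less general than Sphinx itself.

```python
import re


def autosummary_targets(rst_text):
    """Return (qualified_name, stub_path) pairs for each autosummary entry.

    Simplified sketch: handles only ``.. currentmodule::`` and
    ``.. autosummary::`` with an optional ``:toctree:`` option.
    """
    module = ""
    toctree = None      # value of :toctree: for the directive being read
    in_block = False    # True while inside an autosummary directive body
    results = []
    for line in rst_text.splitlines():
        stripped = line.strip()
        match = re.match(r"\.\. currentmodule::\s*(\S+)", stripped)
        if match:
            module = match.group(1)
            in_block = False
            continue
        if stripped == ".. autosummary::":
            in_block, toctree = True, None
            continue
        if not in_block or not stripped:
            continue    # outside a directive, or a blank line inside one
        if not line.startswith((" ", "\t")):
            in_block = False    # dedented text ends the directive body
            continue
        option = re.match(r":toctree:\s*(\S*)", stripped)
        if option:
            toctree = option.group(1).rstrip("/")
            continue
        if stripped.startswith(":"):
            continue    # ignore any other directive options
        name = f"{module}.{stripped}" if module else stripped
        path = f"{toctree}/{name}.rst" if toctree else None
        results.append((name, path))
    return results


# Hypothetical excerpt modelled on the window.rst page added in this commit.
SAMPLE = """\
======
Window
======

.. currentmodule:: pyspark.sql

.. autosummary::
    :toctree: api/

    Window.currentRow
    Window.orderBy
    WindowSpec.rowsBetween
"""

for name, path in autosummary_targets(SAMPLE):
    print(name, "->", path)
```

Under these assumptions, an entry such as ``Window.orderBy`` resolves to ``pyspark.sql.Window.orderBy``, with its stub placed under the ``api/`` directory next to the page that lists it.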

