This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6865309604a [SPARK-40598][PS] Fix plotting features work properly with pandas 1.5.0
6865309604a is described below

commit 6865309604a9986d902a6ff145f0855ee3fb7f8f
Author: itholic <haejoon....@databricks.com>
AuthorDate: Thu Oct 6 19:27:14 2022 -0700

[SPARK-40598][PS] Fix plotting features work properly with pandas 1.5.0

### What changes were proposed in this pull request?

This PR proposes to fix the plotting functions so that they work properly with pandas 1.5.0.

This includes two fixes:
- Fix the `PandasOnSpark*Plot` classes so that the plot kind name is available as a plain string.
- Change the default value of the `subplots` parameter of `plot_frame` to match the latest pandas. (`None` -> `False`)

### Why are the changes needed?

#### 1. Getting `_kind` from the pandas class is no longer possible.

We leverage the pandas plotting classes for the `matplotlib` implementation, and read the plot kind name from the pandas class like:

```python
>>> from pandas.plotting._matplotlib.core import AreaPlot
>>> AreaPlot._kind
'area'
```

However, since pandas 1.5.0, the class attribute `_kind` is a `property`, so accessing it on the pandas class no longer returns the kind name:

```python
>>> from pandas.plotting._matplotlib.core import AreaPlot
>>> AreaPlot._kind
<property object at 0x7fe520d749a0>
```

#### 2. The `subplots` parameter no longer allows types other than `Iterable` or `bool`.

We internally set the default value for `subplots` to `None`, but pandas 1.5.0 only allows an `Iterable` or a `bool`, so the plotting function does not work properly, as shown below:

```python
>>> psdf.plot(kind="bar")
Traceback (most recent call last):
...
ValueError: subplots should be a bool or an iterable
```

With these fixes, plotting works properly with pandas 1.5.0, as shown below:

**<For Series and DataFrame plot>**

**Before**:
```python
>>> from pyspark.pandas.config import set_option
>>> set_option("plotting.backend", "matplotlib")
>>> import pyspark.pandas as ps
>>> psdf = ps.range(10)
>>> psdf.plot(kind="bar")
Traceback (most recent call last):
...
KeyError: 'bar'
```

**After**:
```python
>>> from pyspark.pandas.config import set_option
>>> set_option("plotting.backend", "matplotlib")
>>> import pyspark.pandas as ps
>>> psdf = ps.range(10)
>>> psdf.plot(kind="bar")
<AxesSubplot:>
```

**<For DataFrame plot>**

**Before**:
```python
>>> from pyspark.pandas.config import set_option
>>> set_option("plotting.backend", "matplotlib")
>>> import pyspark.pandas as ps
>>> psdf = ps.range(10)
>>> psdf.plot(kind="bar")
Traceback (most recent call last):
...
ValueError: subplots should be a bool or an iterable
```

**After**:
```python
>>> from pyspark.pandas.config import set_option
>>> set_option("plotting.backend", "matplotlib")
>>> import pyspark.pandas as ps
>>> psdf = ps.range(10)
>>> psdf.plot(kind="bar")
<AxesSubplot:>
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually tested with pandas 1.5.0.

Closes #38033 from itholic/fix_plot_test.

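
The `_kind` problem described above can be reproduced without pandas. The following is a minimal sketch, not the actual pandas or PySpark code: `UpstreamPlot`, `FixedPlot` and `build_registry` are hypothetical names, and it assumes the plot classes are looked up through a dict keyed on the class-level `_kind`, which the `KeyError: 'bar'` above suggests.

```python
class UpstreamPlot:
    """Hypothetical stand-in for a plot class whose `_kind` is a property."""

    @property
    def _kind(self) -> str:
        return "area"


class FixedPlot(UpstreamPlot):
    # A plain class attribute shadows the inherited property, so the
    # kind name is readable from the class itself, not only from instances.
    _kind = "area"


def build_registry(*plot_classes):
    """Map each class-level `_kind` value to its class (assumed lookup scheme)."""
    return {cls._kind: cls for cls in plot_classes}


# Accessing a property on the class returns the property object itself,
# so the registry key is not the string "area".
broken = build_registry(UpstreamPlot)
print("area" in broken)  # False

# With the class attribute in place, the string key is registered.
fixed = build_registry(FixedPlot)
print("area" in fixed)  # True
```

Overriding `_kind` in each subclass keeps the fix local to the PySpark classes and does not depend on how pandas defines the attribute internally.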
Authored-by: itholic <haejoon....@databricks.com>
Signed-off-by: Xinrong Meng <xinr...@apache.org>
---
 python/pyspark/pandas/plot/matplotlib.py | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/plot/matplotlib.py b/python/pyspark/pandas/plot/matplotlib.py
index 6f542061297..02938c7f292 100644
--- a/python/pyspark/pandas/plot/matplotlib.py
+++ b/python/pyspark/pandas/plot/matplotlib.py
@@ -50,6 +50,8 @@ _all_kinds = PlotAccessor._all_kinds  # type: ignore[attr-defined]
 
 
 class PandasOnSparkBarPlot(PandasBarPlot, TopNPlotBase):
+    _kind = "bar"
+
     def __init__(self, data, **kwargs):
         super().__init__(self.get_top_n(data), **kwargs)
 
@@ -59,6 +61,8 @@ class PandasOnSparkBarPlot(PandasBarPlot, TopNPlotBase):
 
 
 class PandasOnSparkBoxPlot(PandasBoxPlot, BoxPlotBase):
+    _kind = "box"
+
     def boxplot(
         self,
         ax,
@@ -354,6 +358,8 @@ class PandasOnSparkBoxPlot(PandasBoxPlot, BoxPlotBase):
 
 
 class PandasOnSparkHistPlot(PandasHistPlot, HistogramPlotBase):
+    _kind = "hist"
+
     def _args_adjust(self):
         if is_list_like(self.bottom):
             self.bottom = np.array(self.bottom)
@@ -413,6 +419,8 @@ class PandasOnSparkHistPlot(PandasHistPlot, HistogramPlotBase):
 
 
 class PandasOnSparkPiePlot(PandasPiePlot, TopNPlotBase):
+    _kind = "pie"
+
     def __init__(self, data, **kwargs):
         super().__init__(self.get_top_n(data), **kwargs)
 
@@ -422,6 +430,8 @@ class PandasOnSparkPiePlot(PandasPiePlot, TopNPlotBase):
 
 
 class PandasOnSparkAreaPlot(PandasAreaPlot, SampledPlotBase):
+    _kind = "area"
+
     def __init__(self, data, **kwargs):
         super().__init__(self.get_sampled(data), **kwargs)
 
@@ -431,6 +441,8 @@ class PandasOnSparkAreaPlot(PandasAreaPlot, SampledPlotBase):
 
 
 class PandasOnSparkLinePlot(PandasLinePlot, SampledPlotBase):
+    _kind = "line"
+
     def __init__(self, data, **kwargs):
         super().__init__(self.get_sampled(data), **kwargs)
 
@@ -440,6 +452,8 @@ class PandasOnSparkLinePlot(PandasLinePlot, SampledPlotBase):
 
 
 class PandasOnSparkBarhPlot(PandasBarhPlot, TopNPlotBase):
+    _kind = "barh"
+
     def __init__(self, data, **kwargs):
         super().__init__(self.get_top_n(data), **kwargs)
 
@@ -449,6 +463,8 @@ class PandasOnSparkBarhPlot(PandasBarhPlot, TopNPlotBase):
 
 
 class PandasOnSparkScatterPlot(PandasScatterPlot, TopNPlotBase):
+    _kind = "scatter"
+
     def __init__(self, data, x, y, **kwargs):
         super().__init__(self.get_top_n(data), x, y, **kwargs)
 
@@ -458,6 +474,8 @@ class PandasOnSparkScatterPlot(PandasScatterPlot, TopNPlotBase):
 
 
 class PandasOnSparkKdePlot(PandasKdePlot, KdePlotBase):
+    _kind = "kde"
+
     def _compute_plot_data(self):
         self.data = KdePlotBase.prepare_kde_data(self.data)
 
@@ -707,7 +725,7 @@ def plot_frame(
     y=None,
     kind="line",
     ax=None,
-    subplots=None,
+    subplots=False,
     sharex=None,
     sharey=False,
     layout=None,

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org