[ https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843576#comment-17843576 ]
Saidatt Sinai Amonkar commented on SPARK-48045:
-----------------------------------------------

Opened a pull request to fix this: [GitHub Pull Request #46391|https://github.com/apache/spark/pull/46391]

> Pandas API groupby with multi-agg-relabel ignores as_index=False
> ----------------------------------------------------------------
>
>                 Key: SPARK-48045
>                 URL: https://issues.apache.org/jira/browse/SPARK-48045
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.5.1
>         Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2
>            Reporter: Paul George
>            Priority: Minor
>              Labels: pull-request-available
>
> A Pandas API DataFrame groupby with as_index=False and a multilevel
> relabeling, such as
> {code:java}
> from pyspark import pandas as ps
> ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", as_index=False).agg(b_max=("b", "max")){code}
> fails to include group keys in the resulting DataFrame. This diverges from
> expected behavior as well as from the behavior of native Pandas, e.g.
> *actual*
> {code:java}
>    b_max
> 0      1 {code}
> *expected*
> {code:java}
>    a  b_max
> 0  0      1 {code}
>
> A possible fix is to prepend groupby key columns to {{*order*}} and
> {{*columns*}} before filtering here:
> [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328]
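
For illustration only, the fix suggested in the issue description (prepending the groupby key columns to order and columns before the relabeling step) can be sketched in plain Python. This is not the code in groupby.py or in the pull request: the helper select_and_relabel, its parameters, and the dict-of-lists stand-in for a DataFrame are all hypothetical, chosen so the idea can be run without Spark.

```python
def select_and_relabel(frame, order, columns, group_keys, as_index):
    """Select the `order` columns from `frame` and rename them to `columns`.

    Hypothetical stand-in for the relabeling step: `frame` is a plain dict
    mapping column names to lists of values, not a pandas-on-Spark DataFrame.
    """
    if not as_index:
        # The suggested fix: put the group keys in front of both lists so
        # they survive the selection and keep their original names.
        order = list(group_keys) + list(order)
        columns = list(group_keys) + list(columns)
    return {new: frame[old] for old, new in zip(order, columns)}


# Aggregated data for the example in the report, before relabeling:
# one group (a == 0) whose max of "b" is 1.
aggregated = {"a": [0], "b": [1]}

fixed = select_and_relabel(
    aggregated, order=["b"], columns=["b_max"], group_keys=["a"], as_index=False
)
print(fixed)  # {'a': [0], 'b_max': [1]}
```

With as_index=True the helper leaves both lists untouched, so only the relabeled aggregate column is returned, which matches the case where the keys are expected to live in the index instead.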