Hanan Shteingart created SPARK-26449:
-------------------------------------
             Summary: DataFrame.transform
                 Key: SPARK-26449
                 URL: https://issues.apache.org/jira/browse/SPARK-26449
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Hanan Shteingart


I would like to chain custom transformations, as suggested in this [blog post|https://medium.com/@mrpowers/chaining-custom-pyspark-transformations-4f38a8c7ae55]. This would allow users to write something like the following:

{code:java}
from pyspark.sql.functions import lit

def with_greeting(df):
    return df.withColumn("greeting", lit("hi"))

def with_something(df, something):
    return df.withColumn("something", lit(something))

data = [("jose", 1), ("li", 2), ("liz", 3)]
source_df = spark.createDataFrame(data, ["name", "age"])

actual_df = (source_df
    .transform(with_greeting)
    .transform(lambda df: with_something(df, "crazy")))

actual_df.show()
+----+---+--------+---------+
|name|age|greeting|something|
+----+---+--------+---------+
|jose|  1|      hi|    crazy|
|  li|  2|      hi|    crazy|
| liz|  3|      hi|    crazy|
+----+---+--------+---------+
{code}

The only thing needed to accomplish this is the following simple method on DataFrame:

{code:java}
from pyspark.sql.dataframe import DataFrame

def transform(self, f):
    # Apply f to this DataFrame and return the result,
    # so calls can be chained fluently.
    return f(self)

DataFrame.transform = transform
{code}

I volunteer to do the pull request if approved (at least the Python part).
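As a side note, for transformations that take extra arguments, {{functools.partial}} can stand in for the lambda shown above. A minimal sketch under the proposed monkey-patch (the local SparkSession setup here is only for illustration and is not part of the proposal):

{code:java}
from functools import partial

from pyspark.sql import SparkSession
from pyspark.sql.dataframe import DataFrame
from pyspark.sql.functions import lit

# The proposed method, applied as a monkey-patch since it is
# not yet part of the PySpark API.
DataFrame.transform = lambda self, f: f(self)

def with_something(df, something):
    return df.withColumn("something", lit(something))

spark = SparkSession.builder.master("local[1]").getOrCreate()
source_df = spark.createDataFrame([("jose", 1), ("li", 2)], ["name", "age"])

# partial binds the extra argument, leaving a one-argument
# function of the DataFrame, which is what transform expects.
actual_df = source_df.transform(partial(with_something, something="crazy"))
actual_df.show()
{code}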