Hi, I have searched around but could not find a satisfactory answer to this question: what is the best way to apply a complex transformation to a dataframe column?
For example, I have a dataframe with the following schema and a function with fairly complex logic for formatting addresses. I would like to apply the function to each address and store the output as an additional column in the dataframe. What is the best way to do this? Use DataFrame.map? Define a UDF? A code example would be appreciated.

Input dataframe:

root
 |-- ID: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- PhoneNumber: string (nullable = true)
 |-- Address: string (nullable = true)

Output dataframe:

root
 |-- ID: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- PhoneNumber: string (nullable = true)
 |-- Address: string (nullable = true)
 |-- FormattedAddress: string (nullable = true)

The function to format addresses:

def formatAddress(address: String): String

Best regards,
Hao Wang
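
For reference, here is a minimal sketch of the UDF option mentioned above. It is not a definitive answer. It assumes Spark 2.x or later (SparkSession), and the body of formatAddress, the sample row, and the names FormatAddressExample and formatAddressUdf are placeholders invented for illustration; only the signature of formatAddress and the column names come from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object FormatAddressExample {
  // Placeholder body (assumption): the real function has more complex logic.
  // The null check matters because the Address column is nullable.
  def formatAddress(address: String): String =
    if (address == null) null else address.trim.replaceAll("\\s+", " ")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("FormatAddressExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row matching the input schema in the question.
    val df = Seq(("1", "Alice", "555-0100", "  12   Main St "))
      .toDF("ID", "Name", "PhoneNumber", "Address")

    // Wrap the plain Scala function as a UDF and append its result as a new column.
    val formatAddressUdf = udf(formatAddress _)
    val result = df.withColumn("FormattedAddress", formatAddressUdf(col("Address")))

    result.printSchema()
    result.show(false)

    spark.stop()
  }
}
```

With withColumn, the existing columns are carried over unchanged and only FormattedAddress is added. A map-based version would have to rebuild every output row by hand, which is why the UDF looks like the simpler of the two options here.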