One way is to chain split -> explode -> pivot. These are Column and DataFrame methods. Here are quick examples from the web:

https://www.google.com/amp/s/sparkbyexamples.com/spark/spark-split-dataframe-column-into-multiple-columns/amp/
https://www.google.com/amp/s/sparkbyexamples.com/spark/explode-spark-array-and-map-dataframe-column/amp/

On Wed, 9 Feb 2022, 01:55 frakass, <capitnfrak...@free.fr> wrote:

> Hello
>
> For the RDD I can apply the flatMap method:
>
> >>> sc.parallelize(["a few words", "ba na ba na"]).flatMap(lambda x:
> ... x.split(" ")).collect()
> ['a', 'few', 'words', 'ba', 'na', 'ba', 'na']
>
> But for a dataframe table how can I flatMap that as above?
>
> >>> df.show()
> +----------------+
> |           value|
> +----------------+
> |     a few lines|
> |hello world here|
> |     ba na ba na|
> +----------------+
>
> Thanks
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org