How do I one-hot encode a column of arrays, e.g. ['TG', 'CA']?
FYI, here's my code for one-hot encoding ordinary categorical columns (so far just the StringIndexer step). How do I make it work for a column of arrays?

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer

    # One fitted StringIndexer per categorical column; flight3 is my input DataFrame
    indexers = [StringIndexer(inputCol=column, outputCol=column + "_index").fit(flight3)
                for column in list(set(['ColA', 'ColB', 'ColC']))]

    pipeline = Pipeline(stages=indexers)
    flight4 = pipeline.fit(flight3).transform(flight3)
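To make the target concrete, here is a plain-Python sketch of the output I have in mind, assuming the goal is a "multi-hot" vector with a 1 for every value present in the array. It is not Spark code; the helper name `multi_hot`, the sample rows, and the alphabetical vocabulary order are all made up for illustration.

```python
# Plain-Python illustration of the desired encoding (not Spark code).
# `multi_hot`, the sample rows, and the vocabulary order are hypothetical.
def multi_hot(values, vocabulary):
    """Return a 0/1 list with a 1 at the position of each value that is present."""
    present = set(values)
    return [1 if token in present else 0 for token in vocabulary]


rows = [['TG', 'CA'], ['CA'], []]

# Build the vocabulary from every distinct value seen in the column.
vocabulary = sorted({token for row in rows for token in row})  # ['CA', 'TG']

encoded = [multi_hot(row, vocabulary) for row in rows]
print(encoded)  # [[1, 1], [1, 0], [0, 0]]
```

So a row like ['TG', 'CA'] should end up with both positions set, and an empty array should give all zeros.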