You can try a UDF, as in the following code snippet:

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType
df = spark.read.text("./README.md")
split_func = udf(lambda text: text.split(" "), ArrayType(StringType()))
df.withColumn("split_value", split_func("value")).show()


On Tue, Apr 25, 2017 at 12:27 AM, Selvam Raman wrote:

>     documentDF = spark.createDataFrame([
>     ("Hi I heard about Spark".split(" "), ),
>     ("I wish Java could use case classes".split(" "), ),
>     ("Logistic regression models are neat".split(" "), )
>     ], ["text"])
> How can I achieve the same df while reading from a source?
> doc = spark.read.text("/Users/rs/Desktop/nohup.out")
> How can I create a "sentences" column of type array<string> from
> doc (a DataFrame)?
> The approach below creates more than one column:
> rdd.map(lambda rdd: rdd[0]).map(lambda row:row.split(" "))
