Hi all, I'm not sure whether this is the right venue to ask. If not, please point me to the right group, if there is one.
I'm trying to create a Spark DataFrame from a JSON file using jsonFile(). The call was successful, and I can see the DataFrame created. The JSON file contains a number of tweets obtained from the Twitter API. I am particularly interested in pulling out the hashtags contained in the tweets.

If I use printSchema(), the schema is something like:

root
 |-- id_str: string (nullable = true)
 |-- hashtags: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- indices: array (nullable = true)
 |    |    |    |-- element: long (containsNull = true)
 |    |    |-- text: string (nullable = true)

showDF() would show something like this:

+--------------------+
|            hashtags|
+--------------------+
|              List()|
|List([List(125, 1...|
|              List()|
|List([List(0, 3),...|
|List([List(76, 86...|
|              List()|
|List([List(74, 84...|
|              List()|
|              List()|
|              List()|
|List([List(85, 96...|
|List([List(125, 1...|
|              List()|
|              List()|
|              List()|
|              List()|
|List([List(14, 17...|
|              List()|
|              List()|
|List([List(14, 17...|
+--------------------+

The question now is how to extract the text of the hashtags for each tweet. I'm still new to SparkR. I am thinking that maybe I need to loop through the DataFrame and extract the text for each tweet, but it seems that lapply no longer applies to a Spark DataFrame. Any thoughts on how to extract the text, given that it sits inside a JSON array of structs?

Thanks,
-JS

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-creating-dataframe-from-json-file-tp23849.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
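
To make the desired result concrete, here is a small sketch in plain Python (not SparkR, and not a Spark solution) of the extraction on records shaped like the schema shown above. The sample records and the helper names hashtag_texts and extract are made up for illustration; the only assumption taken from the post is the id_str / hashtags / indices / text layout.

```python
import json

# Made-up sample records (one JSON object per line), shaped like the schema
# in the post: id_str plus an array of {indices, text} structs.
SAMPLE_LINES = [
    '{"id_str": "1", "hashtags": []}',
    '{"id_str": "2", "hashtags": ['
    '{"indices": [0, 6], "text": "spark"}, '
    '{"indices": [14, 17], "text": "df"}]}',
]


def hashtag_texts(record):
    """Return the list of hashtag strings for one tweet record."""
    # "hashtags" is nullable in the schema, so a missing or null value
    # is treated as an empty list.
    return [tag["text"] for tag in (record.get("hashtags") or [])]


def extract(lines):
    """Map each JSON line to an (id_str, [hashtag text, ...]) pair."""
    result = []
    for line in lines:
        record = json.loads(line)
        result.append((record["id_str"], hashtag_texts(record)))
    return result


if __name__ == "__main__":
    for id_str, tags in extract(SAMPLE_LINES):
        print(id_str, tags)
```

In other words, the goal is one row per tweet carrying id_str and a flat list of hashtag strings, with empty lists kept for tweets that have no hashtags.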