Hey, I'm working on a use case that involves converting DStreams to DataFrames after some transformations. I've reduced my code to the following snippet, which reproduces the error. My environment settings are listed below.
*Environment:*

Spark version: 2.2.0
Java: 1.8
Execution mode: local / IntelliJ

*Code:*

    object Tests {
      def main(args: Array[String]): Unit = {
        val spark: SparkSession = ...
        import spark.implicits._

        val df = List(
          ("jim", "usa"),
          ("raj", "india"))
          .toDF("name", "country")

        df.rdd
          .map(x => x.toSeq)
          // df.schema is referenced directly inside the closure
          .map(x => new GenericRowWithSchema(x.toArray, df.schema))
          .foreach(println)
      }
    }

This results in a NullPointerException, because I'm using df.schema directly inside map().

What I don't understand is that the following code, which stores the schema in a value before the transformation, works just fine.

    object Tests {
      def main(args: Array[String]): Unit = {
        val spark: SparkSession = ...
        import spark.implicits._

        val df = List(
          ("jim", "usa"),
          ("raj", "india"))
          .toDF("name", "country")

        // schema stored in a local value before the transformation
        val sc = df.schema

        df.rdd
          .map(x => x.toSeq)
          .map(x => new GenericRowWithSchema(x.toArray, sc))
          .foreach(println)
      }
    }

I wonder why this is happening, as *df.rdd* is not an action and there is no visible change in the state of the DataFrame yet.

What are your thoughts on this?

Regards,
Chitral Verma