
I'm working on a use case that involves converting DStreams to DataFrames
after some transformations. I've reduced my code to the following snippet,
which reproduces the error. My environment settings are listed below.


Spark version: 2.2.0
Java: 1.8
Execution mode: local (IntelliJ)


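import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema

object Tests {

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = ... // SparkSession setup elided in the original
    import spark.implicits._

    val df = List(
        ("jim", "usa"),
        ("raj", "india"))
      .toDF("name", "country")

    df.rdd
      .map(x => x.toSeq)
      // df.schema is evaluated inside the closure, so df itself is captured
      .map(x => new GenericRowWithSchema(x.toArray, df.schema))
      .foreach(println) // an action is needed to run the job; foreach(println) is assumed
  }
}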

This results in a NullPointerException because I'm using df.schema directly
inside map().

What I don't understand is why the following code, which simply stores the
schema in a val before the transformation, works just fine.

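import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema

object Tests {

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = ... // SparkSession setup elided in the original
    import spark.implicits._

    val df = List(
        ("jim", "usa"),
        ("raj", "india"))
      .toDF("name", "country")

    // Evaluated once on the driver; the closure below captures only sc
    val sc = df.schema

    df.rdd
      .map(x => x.toSeq)
      .map(x => new GenericRowWithSchema(x.toArray, sc))
      .foreach(println) // an action is needed to run the job; foreach(println) is assumed
  }
}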

I wonder why this happens, since *df.rdd* is not an action and there is no
visible change in the DataFrame's state yet. What are your thoughts on this?

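A likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: laziness of df.rdd is not the issue. The lambda in the first snippet references df, so the Dataset object itself has to be serialized with the task, and schema is then evaluated on the deserialized copy when the task runs. If the Dataset's driver-side state (in Spark 2.x, fields such as its session and query execution) is marked @transient, that copy no longer has it, and calling schema fails with a NullPointerException. With val sc = df.schema, schema is evaluated on the driver and only the resulting StructType is shipped. The plain-Scala sketch below mimics this with a Java serialization round trip; it does not use Spark, and FakeDataset, roundTrip and the other names are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a Dataset: its driver-only state is @transient,
// so Java serialization skips the field and it comes back as null.
class FakeDataset(@transient val columns: Array[String]) extends Serializable {
  // Analogous to df.schema: reads the transient field.
  def schema: List[String] = columns.toList
}

object ClosureCaptureDemo {
  // Serialize and deserialize a value with Java serialization, roughly what
  // happens to anything a task closure references before the task runs.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Like using df.schema inside map(): the whole object is shipped and schema
  // is computed on the copy, whose transient field is now null.
  def schemaOnShippedDataset(df: FakeDataset): Option[List[String]] =
    try Some(roundTrip(df).schema)
    catch { case _: NullPointerException => None }

  // Like val sc = df.schema: schema is computed first, only the result is shipped.
  def shippedSchema(df: FakeDataset): List[String] = roundTrip(df.schema)

  def main(args: Array[String]): Unit = {
    val df = new FakeDataset(Array("name", "country"))
    println(schemaOnShippedDataset(df)) // None: the NullPointerException was caught
    println(shippedSchema(df))          // List(name, country)
  }
}
```

The original object still works on the "driver" side; only the shipped copy has lost its transient state, which would match the behaviour of the two snippets above.
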
Chitral Verma
