Maybe you could try something like this:

    SparkSession sparkSession = SparkSession
            .builder()
            .appName("Rows2DataSet")
            .master("local")
            .getOrCreate();

    List<Row> results = new LinkedList<Row>();
    JavaRDD<Row> jsonRDD =
            new JavaSparkContext(sparkSession.sparkContext()).parallelize(results);
    Dataset<Row> peopleDF = sparkSession.createDataFrame(jsonRDD, Row.class);

Richard Xin
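For reference, a minimal self-contained sketch along the same lines, but declaring an explicit StructType schema instead of passing a bean class, which keeps the column names and types; the column names ("name", "age") and the sample rows are assumptions for illustration:

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    public class Rows2DataSet {
        public static void main(String[] args) {
            SparkSession sparkSession = SparkSession
                    .builder()
                    .appName("Rows2DataSet")
                    .master("local")
                    .getOrCreate();

            // Rows carry no schema of their own, so declare one explicitly.
            // The column names and types here are placeholders.
            StructType schema = DataTypes.createStructType(new StructField[] {
                    DataTypes.createStructField("name", DataTypes.StringType, true),
                    DataTypes.createStructField("age", DataTypes.IntegerType, true)
            });

            List<Row> results = Arrays.asList(
                    RowFactory.create("a", 1),
                    RowFactory.create("b", 2));

            // createDataFrame(List<Row>, StructType) builds the DataFrame directly;
            // no intermediate JavaRDD is needed.
            Dataset<Row> peopleDF = sparkSession.createDataFrame(results, schema);
            peopleDF.createOrReplaceTempView("people");
            sparkSession.sql("SELECT name, age FROM people ORDER BY age").show();
        }
    }

With the schema in place, the resulting DataFrame keeps the string and integer column types, so grouping and sorting behave as expected.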
On Tuesday, March 28, 2017 7:51 AM, Karin Valisova <ka...@datapine.com> wrote:

Hello!

I am running Spark on Java and have bumped into a problem I can't solve or find anything helpful about among answered questions, so I would really appreciate your help.

I am running some calculations and creating a row for each result:

    List<Row> results = new LinkedList<Row>();
    for (something) {
        results.add(RowFactory.create(someStringVariable, someIntegerVariable));
    }

Now I've ended up with a list of rows that I need to turn into a DataFrame to perform some Spark SQL operations on them, like groupings and sorting. I would like to keep the data types.

I tried:

    Dataset<Row> toShow = spark.createDataFrame(results, Row.class);

but it throws a NullPointerException (spark being a SparkSession).

Is my logic wrong somewhere? Should this operation be possible and result in what I want? Or do I have to create a custom class which implements Serializable and create a list of those objects rather than Rows? Will I be able to perform SQL queries on a Dataset consisting of custom class objects rather than Rows?

I'm sorry if this is a duplicate question. Thank you for your help!
Karin
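In case it helps, here is a minimal sketch of the bean-based route asked about above: a small JavaBean (the Result class and its fields are made up for illustration) passed to createDataFrame(List, Class), after which the Dataset can be registered as a temp view and queried with Spark SQL:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class BeanExample {

        // Hypothetical JavaBean standing in for the custom result class.
        public static class Result implements java.io.Serializable {
            private String label;
            private Integer amount;

            public Result() {}
            public Result(String label, Integer amount) {
                this.label = label;
                this.amount = amount;
            }
            public String getLabel() { return label; }
            public void setLabel(String label) { this.label = label; }
            public Integer getAmount() { return amount; }
            public void setAmount(Integer amount) { this.amount = amount; }
        }

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("BeanExample")
                    .master("local")
                    .getOrCreate();

            List<Result> results = new ArrayList<>();
            results.add(new Result("a", 1));
            results.add(new Result("b", 2));

            // Spark infers the schema (label: string, amount: int) from the bean's getters.
            Dataset<Row> df = spark.createDataFrame(results, Result.class);
            df.createOrReplaceTempView("results");
            spark.sql("SELECT label, SUM(amount) AS total FROM results GROUP BY label ORDER BY total").show();
        }
    }

Because the schema is inferred from the bean's getters, the string and integer types are preserved and groupings and sorting via SQL work on the resulting Dataset<Row>.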