Hi All,

The bean class:

public class EmployeeBean implements Serializable {

    private Long id;

    private String name;

    private Long salary;

    private Integer age;

    // getters and setters

}


Relevant Spark code:

import static org.apache.spark.sql.functions.lit;

import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .master("local[2]")
        .appName("play-with-spark")
        .getOrCreate();

List<EmployeeBean> employees1 = populateEmployees(1, 10);

Dataset<EmployeeBean> ds1 =
        spark.createDataset(employees1, Encoders.bean(EmployeeBean.class));
ds1.show();
ds1.printSchema();

// Split on the null check, tag each half, then union the two back together.
Dataset<Row> ds2 = ds1.where("age is null").withColumn("is_age_null", lit(true));
Dataset<Row> ds3 = ds1.where("age is not null").withColumn("is_age_null", lit(false));

Dataset<Row> ds4 = ds2.union(ds3);
ds4.show();


Relevant output:

ds1:

+----+---+----+------+
| age| id|name|salary|
+----+---+----+------+
|null|  1|dev1| 11000|
|   2|  2|dev2| 12000|
|null|  3|dev3| 13000|
|   4|  4|dev4| 14000|
|null|  5|dev5| 15000|
+----+---+----+------+


ds4:

+----+---+----+------+-----------+
| age| id|name|salary|is_age_null|
+----+---+----+------+-----------+
|null|  1|dev1| 11000|       true|
|null|  3|dev3| 13000|       true|
|null|  5|dev5| 15000|       true|
|   2|  2|dev2| 12000|      false|
|   4|  4|dev4| 14000|      false|
+----+---+----+------+-----------+


Is there a better way to add this column to the dataset than creating two
datasets and performing a union?

<https://stackoverflow.com/questions/53834286/add-column-value-in-spark-dataset-on-the-basis-of-the-condition>
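
For context, the only single-pass alternative I could come up with is to
compute the flag as a conditional expression on the age column instead of
splitting and unioning. A rough sketch, assuming the standard when/otherwise
and isNull helpers from org.apache.spark.sql.functions are the right fit here
(ds5 and ds6 are just illustrative names):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.when;

// Tag each row in a single pass instead of splitting on the null check.
Dataset<Row> ds5 = ds1.withColumn("is_age_null",
        when(col("age").isNull(), lit(true)).otherwise(lit(false)));

// Or, since isNull() already evaluates to a boolean column:
Dataset<Row> ds6 = ds1.withColumn("is_age_null", col("age").isNull());

ds5.show();

This keeps everything in one dataset, but I am not sure whether it is the
recommended pattern or whether there is something better, hence the question.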



Regards,
Devender
