Have you tried using withColumn? You can add a boolean column based on whether age is null or not, and then drop the original age column. You wouldn't need a union of DataFrames then.
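A minimal sketch of that suggestion, assuming Spark's Java API (spark-sql on the classpath). The class name `AgeNullFlag`, the helper `withAgeNullFlag` and the sample rows are hypothetical; the rows only mirror the columns of the `EmployeeBean` in the quoted message. `col("age").isNull()` produces the boolean column in a single projection, so no filter/union pair is needed:

```java
import static org.apache.spark.sql.functions.col;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class AgeNullFlag {

    // Hypothetical helper: adds "is_age_null", true when "age" is null and false otherwise.
    // One withColumn call over the whole input replaces the two filters plus union.
    static Dataset<Row> withAgeNullFlag(Dataset<Row> df) {
        return df.withColumn("is_age_null", col("age").isNull());
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[2]")
                .appName("play-with-spark")
                .getOrCreate();

        // Hypothetical sample data with the same columns as EmployeeBean.
        StructType schema = new StructType(new StructField[] {
                DataTypes.createStructField("age", DataTypes.IntegerType, true),
                DataTypes.createStructField("id", DataTypes.LongType, false),
                DataTypes.createStructField("name", DataTypes.StringType, true),
                DataTypes.createStructField("salary", DataTypes.LongType, true)
        });
        List<Row> rows = Arrays.asList(
                RowFactory.create(null, 1L, "dev1", 11000L),
                RowFactory.create(2, 2L, "dev2", 12000L),
                RowFactory.create(null, 3L, "dev3", 13000L));
        Dataset<Row> ds1 = spark.createDataFrame(rows, schema);

        Dataset<Row> flagged = withAgeNullFlag(ds1);
        flagged.show();

        // Optional: drop the original column if only the flag is needed afterwards.
        flagged.drop("age").show();

        spark.stop();
    }
}
```

On the original `Dataset<EmployeeBean>`, the same call, `ds1.withColumn("is_age_null", col("age").isNull())`, should work directly and return a `Dataset<Row>`. For conditions other than a null check, `functions.when(condition, value).otherwise(other)` builds the column the same way.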
On Tue, Dec 18, 2018 at 8:58 AM Devender Yadav <[email protected]> wrote:
> Hi All,
>
> Useful code:
>
> public class EmployeeBean implements Serializable {
>
>     private Long id;
>     private String name;
>     private Long salary;
>     private Integer age;
>
>     // getters and setters
> }
>
> Relevant Spark code:
>
> SparkSession spark = SparkSession.builder().master("local[2]").appName("play-with-spark").getOrCreate();
> List<EmployeeBean> employees1 = populateEmployees(1, 10);
>
> Dataset<EmployeeBean> ds1 = spark.createDataset(employees1, Encoders.bean(EmployeeBean.class));
> ds1.show();
> ds1.printSchema();
>
> Dataset<Row> ds2 = ds1.where("age is null").withColumn("is_age_null", lit(true));
> Dataset<Row> ds3 = ds1.where("age is not null").withColumn("is_age_null", lit(false));
>
> Dataset<Row> ds4 = ds2.union(ds3);
> ds4.show();
>
> Relevant output:
>
> ds1
>
> +----+---+----+------+
> | age| id|name|salary|
> +----+---+----+------+
> |null|  1|dev1| 11000|
> |   2|  2|dev2| 12000|
> |null|  3|dev3| 13000|
> |   4|  4|dev4| 14000|
> |null|  5|dev5| 15000|
> +----+---+----+------+
>
> ds4
>
> +----+---+----+------+-----------+
> | age| id|name|salary|is_age_null|
> +----+---+----+------+-----------+
> |null|  1|dev1| 11000|       true|
> |null|  3|dev3| 13000|       true|
> |null|  5|dev5| 15000|       true|
> |   2|  2|dev2| 12000|      false|
> |   4|  4|dev4| 14000|      false|
> +----+---+----+------+-----------+
>
> Is there any better solution to add this column to the dataset rather than
> creating two datasets and performing a union?
>
> <https://stackoverflow.com/questions/53834286/add-column-value-in-spark-dataset-on-the-basis-of-the-condition>
>
> Regards,
> Devender
