Hi all,

1. *Columns in DataFrames can be nullable or not nullable. Given a nullable column of Doubles, I can use the following Scala code to keep only the "non-null" rows:*
val df = ..... // some code that creates a DataFrame
df.filter( df("columnname").isNotNull )

Given this DataFrame and schema:

+-+-----+----+
|x|    a|   y|
+-+-----+----+
|1|hello|null|
|2|  bob|   5|
+-+-----+----+

root
 |-- x: integer (nullable = false)
 |-- a: string (nullable = true)
 |-- y: integer (nullable = true)

the filter expression yields:

+-+---+-+
|x|  a|y|
+-+---+-+
|2|bob|5|
+-+---+-+

Unfortunately, while this works for a nullable column (according to df.printSchema), it does not work for a column that is not nullable:

+-+-----+----+
|x|    a|   y|
+-+-----+----+
|1|hello|null|
|2|  bob|   5|
+-+-----+----+

root
 |-- x: integer (nullable = false)
 |-- a: string (nullable = true)
 |-- y: integer (nullable = false)

+-+-----+----+
|x|    a|   y|
+-+-----+----+
|1|hello|null|
|2|  bob|   5|
+-+-----+----+

so the output is not affected by the filter at all. Is this intended?

2. *What is the cheapest way (in terms of performance) to turn a non-nullable column into a nullable one? I came up with this:*

/**
 * Set whether a column is nullable.
 * @param df source DataFrame
 * @param cn the column name to change
 * @param nullable the flag to set, so that the column is either nullable or not
 */
def setNullableStateOfColumn( df: DataFrame, cn: String, nullable: Boolean) : DataFrame = {
  val schema = df.schema
  val newSchema = StructType(schema.map {
    case StructField( c, t, _, m) if c.equals(cn) => StructField( c, t, nullable = nullable, m)
    case y: StructField => y
  })
  df.sqlContext.createDataFrame( df.rdd, newSchema)
}

Is there a cheaper solution?

3. *Any comments?*

Cheers and thx in advance,
Martin

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-DataFrame-Nullable-column-and-filtering-tp24087.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
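For anyone who wants to reproduce question 1 end to end: the sketch below (untested here, assumes a Spark 1.x spark-shell session where sc, sqlContext, and the setNullableStateOfColumn helper from question 2 are in scope; all column and variable names are illustrative) shows how a null can end up in a column whose schema claims nullable = false — createDataFrame(rdd, schema) does not validate the rows against the nullable flags — and how flipping the flag before filtering can serve as a workaround.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Build a DataFrame whose schema claims y is non-nullable,
// even though the data actually contains a null in y.
// createDataFrame does not check the rows against the nullable flags.
val rows = sc.parallelize(Seq(Row(1, "hello", null), Row(2, "bob", 5)))
val schema = StructType(Seq(
  StructField("x", IntegerType, nullable = false),
  StructField("a", StringType,  nullable = true),
  StructField("y", IntegerType, nullable = false)))  // wrong: y does contain a null
val df = sqlContext.createDataFrame(rows, schema)

// As described above, filtering on the column declared non-nullable
// leaves the output unchanged (both rows survive):
df.filter(df("y").isNotNull).show()

// Possible workaround: mark y as nullable first, then filter.
// With nullable = true the filter should behave as in the first example.
val fixed = setNullableStateOfColumn(df, "y", nullable = true)
fixed.filter(fixed("y").isNotNull).show()
```

The presumed cause is that the planner treats IsNotNull on a column whose schema says nullable = false as trivially true and drops the predicate, so correcting the schema metadata is what makes the filter effective.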