Nicholas Chammas created SPARK-15191:
----------------------------------------
Summary: createDataFrame() should mark fields that are known not to be null as not nullable
Key: SPARK-15191
URL: https://issues.apache.org/jira/browse/SPARK-15191
Project: Spark
Issue Type: Improvement
Components: PySpark, SQL
Reporter: Nicholas Chammas
Priority: Minor

Here's a brief reproduction:

{code}
>>> numbers = sqlContext.createDataFrame(
...     data=[(1,), (2,), (3,), (4,), (5,)],
...     samplingRatio=1  # go through all the data please!
... )
>>> numbers.printSchema()
root
 |-- _1: long (nullable = true)
{code}

The field is marked as nullable even though none of the data is null and we had {{createDataFrame()}} go through all the data.

In situations like this, shouldn't {{createDataFrame()}} return a DataFrame with the field marked as not nullable?
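
To make the request concrete, here is a minimal pure-Python sketch of the check being asked for. It is not how Spark infers schemas; {{infer_nullable_flags}} is a hypothetical helper invented for illustration. It assumes every row has been scanned (the {{samplingRatio=1}} case above). With a partial sample, finding no nulls would not prove that a field is never null.

```python
from typing import Any, List, Sequence, Tuple


def infer_nullable_flags(rows: Sequence[Tuple[Any, ...]]) -> List[bool]:
    """Hypothetical helper: one flag per column, True if any row holds None there.

    Only meaningful when `rows` is the complete data set, not a sample.
    """
    if not rows:
        return []
    width = len(rows[0])
    flags = [False] * width  # start by assuming no column contains a null
    for row in rows:
        if len(row) != width:
            raise ValueError("all rows must have the same number of fields")
        for i, value in enumerate(row):
            if value is None:
                flags[i] = True  # a single None makes the column nullable
    return flags


numbers = [(1,), (2,), (3,), (4,), (5,)]
print(infer_nullable_flags(numbers))                # [False]
print(infer_nullable_flags([(1, None), (2, "b")]))  # [False, True]
```

For the data in the reproduction, the single field comes back as not nullable, which is the schema the issue argues {{createDataFrame()}} could report.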