Thank you, Peter. I will try this.

On Tuesday, 4 August 2015 at 15:02, Peter Rudenko <petro.rude...@gmail.com> wrote:

Hi Clark,

The problem is that in this dataset null values are represented by the NA marker. Spark-csv doesn't have a configurable null-value marker (I made a PR for it some time ago: https://github.com/databricks/spark-csv/pull/76). So one option for you is to do post-filtering, something like this:

    val rv = allyears2k.filter("COLUMN != 'NA'")

Thanks,
Peter Rudenko

On 2015-08-04 15:03, clark djilo kuissu wrote:

Hello,

I am trying to manage NA values in this dataset. I import my dataset with the com.databricks.spark.csv package. When I do this:

    allyears2k.na.drop()

I get no result. Can you help me please?

Regards,

------------------- The dataset -------------------------

dataset: https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv

------------------- The code -------------------------

    // Prepare environment
    import sys.process._
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    // Load the CSV with a header row; without schema inference every column is read as a string
    val allyears2k = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/home/clark/allyears2k.csv")
    allyears2k.registerTempTable("allyears2k")

    // Drops nothing here: "NA" is an ordinary string, not a null
    val rv = allyears2k.na.drop()
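
For reference, the post-filtering idea can be extended from a single column to every column by building one SQL predicate string and passing it to filter. This is only a sketch, not something from the spark-csv package: the object NaFilter and the helper notNaPredicate are hypothetical names, and it assumes column names contain no backticks. The comparison uses the single-quoted string literal 'NA', because backticks in Spark SQL quote identifiers, not values.

```scala
// Hypothetical helper (not part of Spark or spark-csv): builds a SQL
// predicate that is true only when none of the given columns equals "NA".
object NaFilter {
  def notNaPredicate(columns: Seq[String]): String = {
    require(columns.nonEmpty, "at least one column name is needed")
    columns
      // Backticks quote the column name; single quotes make 'NA' a string literal.
      .map(c => s"`$c` != 'NA'")
      .mkString(" AND ")
  }
}
```

In the spark-shell session from the original message this might be used as val rv = allyears2k.filter(NaFilter.notNaPredicate(allyears2k.columns)), which keeps only the rows where no column holds the NA marker.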