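Hi Arunkumar,

You can use datasetDF.select(countDistinct(col1, col2, col3, ...)) for an exact count, or approxCountDistinct for an approximate result. Note that countDistinct with several columns counts the distinct combinations of those columns; to get a separate count for each column, pass one countDistinct expression per column.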
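
If what you need is one distinct count per column, the following is a minimal sketch of how that could look. It assumes the Spark 1.x DataFrame API used in your snippet; the helper names distinctCounts and approxDistinctCounts are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{approxCountDistinct, col, countDistinct}

// Hypothetical helper: exact distinct count of every column,
// computed in one pass. The result is a single-row DataFrame
// with one output column per input column.
def distinctCounts(df: DataFrame): DataFrame = {
  val exprs = df.columns.map(c => countDistinct(col(c)).alias(c))
  df.select(exprs: _*)
}

// Hypothetical helper: same shape of result, but each count is an
// estimate, which is usually cheaper on large data.
def approxDistinctCounts(df: DataFrame): DataFrame = {
  val exprs = df.columns.map(c => approxCountDistinct(col(c)).alias(c))
  df.select(exprs: _*)
}
```

With the DataFrame from your mail, this would be called as distinctCounts(datasetDF).show(). Null values are not counted, and column names that contain dots would likely need backtick quoting before being passed to col.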
2016-01-05 17:11 GMT+08:00 Arunkumar Pillai <arunkumar1...@gmail.com>:

> Hi
>
> Is there any functions to find distinct count of all the variables in
> dataframe.
>
> val sc = new SparkContext(conf) // spark context
> val options = Map("header" -> "true", "delimiter" -> delimiter, "inferSchema" -> "true")
> val sqlContext = new org.apache.spark.sql.SQLContext(sc) // sql context
> val datasetDF =
> sqlContext.read.format("com.databricks.spark.csv").options(options).load(inputFile)
>
> we are able to get the schema, variable data type. is there any method to get
> the distinct count ?
>
> --
> Thanks and Regards
> Arun