Re: finding distinct count using dataframe

Yanbo Liang Tue, 05 Jan 2016 01:55:44 -0800

Hi Arunkumar,

You can use datasetDF.select(countDistinct(col1, col2, col3, ...)), or
approxCountDistinct for an approximate result.
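
One caveat: countDistinct called with several columns counts distinct
combinations of those columns, not each column separately. If you want one
count per variable, a rough sketch (assuming the Spark 1.x functions API and
the datasetDF from your snippet; the helper names are made up for
illustration) would be something like:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{approxCountDistinct, col, countDistinct}

// Hypothetical helper: exact distinct count of every column, returned as a
// one-row DataFrame whose columns keep the original names.
// Assumes plain column names (no dots or backticks in the CSV header).
def distinctCountPerColumn(df: DataFrame): DataFrame = {
  val aggs = df.columns.map(c => countDistinct(col(c)).alias(c))
  df.select(aggs: _*)
}

// Hypothetical helper: same shape, but using the approximate aggregate,
// which is usually cheaper on large data.
def approxDistinctCountPerColumn(df: DataFrame): DataFrame = {
  val aggs = df.columns.map(c => approxCountDistinct(col(c)).alias(c))
  df.select(aggs: _*)
}

// Usage, e.g. pasted into spark-shell after datasetDF has been loaded:
distinctCountPerColumn(datasetDF).show()
```

Null values are not counted by countDistinct. Later Spark releases also
provide the approximate aggregate under the name approx_count_distinct.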


2016-01-05 17:11 GMT+08:00 Arunkumar Pillai <arunkumar1...@gmail.com>:

> Hi
>
> Is there a function to find the distinct count of every variable in a
> DataFrame?
>
> val sc = new SparkContext(conf) // spark context
> val options = Map("header" -> "true", "delimiter" -> delimiter, "inferSchema" -> "true")
> val sqlContext = new org.apache.spark.sql.SQLContext(sc) // sql context
> val datasetDF = sqlContext.read.format("com.databricks.spark.csv").options(options).load(inputFile)
>
>
> We are able to get the schema and the variable data types. Is there any
> method to get the distinct count?
>
>
>
> --
> Thanks and Regards
>         Arun
>
