Hi Rohit,

You can use an accumulator and increment it as each record is processed. Once the write action has finished you can read the accumulator's value on the driver, which will give you the count without running a separate count() over the JDBC source.
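For example, something along these lines (a rough, untested sketch against the Spark 2.x Java API; the map/RowEncoder plumbing and the name recordCount are my additions, and jdbcUrl/query are assumed to exist as in your snippet):

    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.catalyst.encoders.RowEncoder;
    import org.apache.spark.util.LongAccumulator;
    import java.util.Properties;

    private long etlFunction(SparkSession spark) {
        spark.sqlContext().setConf("spark.sql.parquet.compression.codec", "SNAPPY");
        Properties properties = new Properties();
        properties.put("fetchSize", "5000");

        // Accumulator is registered on the driver; executors only add to it.
        LongAccumulator recordCount = spark.sparkContext().longAccumulator("recordCount");

        Dataset<Row> dataset = spark.read().jdbc(jdbcUrl, query, properties);

        // Bump the accumulator as each row flows through, keeping the schema unchanged.
        Dataset<Row> counted = dataset.map(
                (MapFunction<Row, Row>) row -> { recordCount.add(1L); return row; },
                RowEncoder.apply(dataset.schema()));

        // The write is the only action, so the rows are pulled from the database once.
        counted.write().format("parquet").save("pdfs-path");

        // Safe to read on the driver after the action has completed.
        return recordCount.value();
    }

One caveat: accumulators updated inside a transformation (rather than an action) can over-count if tasks are retried or a stage is recomputed, so treat the value as indicative rather than exact if you need a guaranteed count.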
HTH
Deepak

On Nov 5, 2016 20:09, "Rohit Verma" <rohit.ve...@rokittech.com> wrote:
> I am using Spark to read from a database and write to HDFS as a parquet file.
> Here is the code snippet.
>
> private long etlFunction(SparkSession spark) {
>     spark.sqlContext().setConf("spark.sql.parquet.compression.codec", "SNAPPY");
>     Properties properties = new Properties();
>     properties.put("driver", "oracle.jdbc.driver");
>     properties.put("fetchSize", "5000");
>     Dataset<Row> dataset = spark.read().jdbc(jdbcUrl, query, properties);
>     dataset.write().format("parquet").save("pdfs-path");
>     return dataset.count();
> }
>
> When I look at the Spark UI during the write, I can see stats on records written in the SQL tab under the query plan.
>
> The count itself is a heavy task.
>
> Can someone suggest the best way to get the count in the most optimized way?
>
> Thanks all..