Hi,

I want to compute the total size in bytes of a DataFrame with the function
below. However, after I insert the DataFrame into Hive, the value returned by
the function differs from the table's spark.sql.statistics.totalSize: the
totalSize is smaller than the result of getRDDBytes.


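    import org.apache.spark.sql.DataFrame

    // Sums the UTF-8 byte length of each row's string form (Row.toString).
    def getRDDBytes(df: DataFrame): Long = {
      df.rdd.getNumPartitions match {
        case 0 =>
          0L
        case _ =>
          // Byte length of each row's text representation.
          val rddOfDataframe = df.rdd.map(_.toString().getBytes("UTF-8").length.toLong)
          // reduce() throws on an empty RDD, so guard against that case.
          val size = if (rddOfDataframe.isEmpty()) {
            0L
          } else {
            rddOfDataframe.reduce(_ + _)
          }
          size
      }
    }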
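For context, getRDDBytes measures the length of each row's text form, which Spark's Row renders as "[v1,v2,...]" (Row.toString is mkString("[", ",", "]")). Hive's totalSize, by contrast, is typically taken from the size of the table's data files, so file format and compression also come into play. The rough sketch below runs without Spark and only contrasts a row's text size with a naive fixed-width binary size; the object TextVsBinarySize, its helpers, and the per-type byte widths are hypothetical and illustrative, not Hive's actual on-disk encoding.

```scala
// Hypothetical illustration, no Spark required.
object TextVsBinarySize {
  // Mimics Spark's Row.toString, i.e. values.mkString("[", ",", "]").
  def rowText(values: Seq[Any]): String = values.mkString("[", ",", "]")

  // What getRDDBytes counts for one row: UTF-8 bytes of the text form.
  def textBytes(values: Seq[Any]): Long =
    rowText(values).getBytes("UTF-8").length.toLong

  // Naive fixed-width size (assumption: 8 bytes per Long/Double,
  // 4 per Int, raw UTF-8 bytes for strings); real formats differ.
  def binaryBytes(values: Seq[Any]): Long = values.map {
    case _: Long   => 8L
    case _: Double => 8L
    case _: Int    => 4L
    case s: String => s.getBytes("UTF-8").length.toLong
    case other     => other.toString.getBytes("UTF-8").length.toLong
  }.sum
}
```

For a row such as (1234567890123L, 3.14, "abc"), the text form "[1234567890123,3.14,abc]" is 24 bytes, while the naive binary estimate is 19 bytes; brackets, commas, and decimal digits all add to the text count.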
I would appreciate any suggestions.


Best Regards
Kelly Zhang
