A question about radd bytes size

2019-12-01 Thread zhangliyun
Hi: I want to get the total bytes of a DataFrame by following function , but when I insert the DataFrame into hive , I found the value of the function is different from spark.sql.statistics.totalSize . The spark.sql.statistics.totalSize is less than the result of following function getRDDB

Re: A question about radd bytes size

2019-12-01 Thread Wenchen Fan
When we talk about bytes size, we need to specify how the data is stored. For example, if we cache the dataframe, then the bytes size is the number of bytes of the binary format of the table cache. If we write to hive tables, then the bytes size is the total size of the data files of the table. On