Hi Cheng, on my machine, executing res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite) produces:
peilunlee@pllee-mini:~/opt/spark-1.3...rc3-bin-hadoop1$ ls -l xxx
total 32
-rwxrwxrwx  1 peilunlee  staff    0 Mar 27 11:29 _SUCCESS*
-rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00001.parquet*
-rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00002.parquet*
-rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00003.parquet*
-rwxrwxrwx  1 peilunlee  staff  488 Mar 27 11:29 part-r-00004.parquet*

while res0.save("xxx") produces:

peilunlee@pllee-mini:~/opt/spark-1.3...rc3-bin-hadoop1$ ls -l xxx
total 40
-rwxrwxrwx  1 peilunlee  staff    0 Mar 27 11:29 _SUCCESS*
-rwxrwxrwx  1 peilunlee  staff  250 Mar 27 11:29 _common_metadata*
-rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00001.parquet*
-rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00002.parquet*
-rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00003.parquet*
-rwxrwxrwx  1 peilunlee  staff  488 Mar 27 11:29 part-r-00004.parquet*

(The full spark-shell sequence behind these listings is sketched below, after the quoted thread.)

On Thu, Mar 26, 2015 at 7:26 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
> I couldn't reproduce this with the following spark-shell snippet:
>
>     scala> import sqlContext.implicits._
>     scala> Seq((1, 2)).toDF("a", "b")
>     scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
>     scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
>
> The _common_metadata file is typically much smaller than _metadata,
> because it doesn't contain row group information, and thus can be
> faster to read than _metadata.
>
> Cheng
>
> On 3/26/15 12:48 PM, Pei-Lun Lee wrote:
>
> Hi,
>
> When I save a parquet file with SaveMode.Overwrite, it never generates
> _common_metadata, whether or not it overwrites an existing directory.
> Is this expected behavior?
> And what is the benefit of _common_metadata? Will reading perform
> better when it is present?
>
> Thanks,
> --
> Pei-Lun
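
For reference, the end-to-end sequence behind my listings above is
essentially Cheng's snippet plus a plain save. A minimal sketch, assuming
the Spark 1.3 spark-shell (where sqlContext is predefined); "yyy" is a
placeholder second path used here only so the two outputs can be compared
side by side:

    import sqlContext.implicits._
    val df = Seq((1, 2)).toDF("a", "b")

    // Explicit Overwrite mode: the output dir ends up without _common_metadata
    df.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)

    // Default save mode (ErrorIfExists) into a fresh dir: _common_metadata is written
    df.save("yyy")

    // Both directories read back fine either way
    sqlContext.parquetFile("xxx").collect()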