Spark 1.4 supports dynamic partitioning: you can first convert your RDD to a DataFrame and then save its contents partitioned by the date column. Say you have a DataFrame df containing three columns a, b, and c; you can do something like this:

df.write.partitionBy("a", "b").mode("overwrite").parquet("path/to/file")
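If you are starting from an RDD, a minimal sketch of the whole flow might look like the following (the Event case class, column names, and output path are only illustrative assumptions, not part of your setup):

import org.apache.spark.sql.SQLContext

case class Event(date: String, value: String)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// A toy RDD of records carrying a date column.
val rdd = sc.parallelize(Seq(
  Event("2015-01-01", "x"),
  Event("2015-01-02", "y")))

// toDF() converts the RDD to a DataFrame; partitionBy("date") makes Spark
// write one subdirectory per date value, e.g. date=2015-01-01/, under the
// output path, each containing Parquet files for that day.
rdd.toDF().write.partitionBy("date").mode("overwrite").parquet("/tmp/events")

Reading the output path back with sqlContext.read.parquet will also recover the date column from the directory names.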

Cheng

On 6/13/15 5:31 AM, Xin Liu wrote:
Hi,

I have a scenario where I'd like to store an RDD in Parquet format as many files that correspond to days, such as 2015/01/01, 2015/02/02, etc.

So far I have used this method

http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job

to store text files (which I then have to read back, convert to Parquet, and store again). Has anyone tried to store many Parquet files from one RDD?

Thanks,
Xin



