Check out this recent post by Cheng Lian regarding dynamic partitioning in Spark 1.4: https://www.mail-archive.com/user@spark.apache.org/msg30204.html
On June 13, 2015, at 5:41 AM, Hao Wang <bill...@gmail.com> wrote:

Hi,

I have a bunch of large log files on Hadoop. Each line contains a log message and its severity. Is there a way I can use Spark to split the entire data set into different files on Hadoop according to the severity field? Thanks.

Below is an example of the input and output.

Input:
[ERROR] log1
[INFO] log2
[ERROR] log3
[INFO] log4

Output:
error_file
[ERROR] log1
[ERROR] log3

info_file
[INFO] log2
[INFO] log4

Best,
Hao Wang
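The grouping step being asked about can be sketched locally in plain Python (this is an illustration of the logic only, not the Spark API; the `split_by_severity` helper and the sample log lines are made up for this example). On a real cluster, the dynamic-partitioning support discussed in the linked post lets the DataFrame writer do the file splitting for you.

```python
import re
from collections import defaultdict

def split_by_severity(lines):
    """Group log lines by the bracketed severity tag at the start of each line.

    Lines that do not start with a [SEVERITY] tag are skipped.
    """
    buckets = defaultdict(list)
    for line in lines:
        m = re.match(r"\[(\w+)\]", line)
        if m:
            buckets[m.group(1)].append(line)
    return dict(buckets)

# Mirror of the example in the question:
logs = ["[ERROR] log1", "[INFO] log2", "[ERROR] log3", "[INFO] log4"]
for severity, entries in split_by_severity(logs).items():
    # On Hadoop, each bucket would be written to its own output path
    # (e.g. error_file, info_file) instead of printed.
    print(severity, entries)

# In Spark 1.4+, roughly the same effect comes from partitioned writes,
# along the lines of (assuming a DataFrame with a "severity" column):
#   df.write.partitionBy("severity").format("text").save("hdfs:///logs/by_severity")
```

With Spark's partitioned write, each severity value lands in its own subdirectory (e.g. `severity=ERROR/`), which serves the same purpose as separate `error_file`/`info_file` outputs.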