Check out this recent post by Cheng Lian regarding dynamic partitioning in
Spark 1.4: https://www.mail-archive.com/user@spark.apache.org/msg30204.html
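
For concreteness, here is a minimal Scala sketch of that approach applied to the example below. The input/output paths, the parsing regex, and Parquet as the output format are all assumptions; also note that dynamic partitioning produces one subdirectory per severity value (e.g. severity=ERROR/) rather than single named files like error_file.

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.{SparkConf, SparkContext}

    object SplitBySeverity {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SplitBySeverity"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Parse lines like "[ERROR] log1" into (severity, message) pairs,
        // silently dropping lines that don't match the pattern.
        val Pattern = """\[(\w+)\] (.*)""".r
        val logs = sc.textFile("hdfs:///logs/input")  // assumed input path
          .collect { case Pattern(severity, message) => (severity, message) }
          .toDF("severity", "message")

        // Dynamic partitioning via the Spark 1.4 DataFrame writer: one
        // directory per distinct severity value under the output path.
        logs.write.partitionBy("severity").parquet("hdfs:///logs/by_severity")
      }
    }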

On June 13, 2015, at 5:41 AM, Hao Wang <bill...@gmail.com> wrote:

Hi,


I have a bunch of large log files on Hadoop. Each line contains a log and its
severity. Is there a way that I can use Spark to split the entire data set into
different files on Hadoop according to the severity field? Thanks. Below is an
example of the input and output.


Input:

[ERROR] log1
[INFO] log2
[ERROR] log3
[INFO] log4

Output:

error_file
[ERROR] log1
[ERROR] log3

info_file
[INFO] log2
[INFO] log4



Best,

Hao Wang
