How to split log data into different files according to severity

2015-06-13 Thread Hao Wang
Hi, I have a bunch of large log files on Hadoop. Each line contains a log message and its severity. Is there a way I can use Spark to split the entire data set into different files on Hadoop according to the severity field? Thanks. Below is an example of the input and output. Input: [ERROR] log1 [INFO]

Re: How to split log data into different files according to severity

2015-06-13 Thread Akhil Das
Are you looking for something like filter? See a similar example here: https://spark.apache.org/examples.html Thanks, Best Regards
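
In code, the filter approach Akhil refers to might look like this (a minimal sketch; the input/output paths are assumed, and the "[ERROR]" prefix follows the example in the original question):

    // Keep only the ERROR lines and write them out as text.
    // Paths here are illustrative.
    val logs = sc.textFile("hdfs:///logs/input")
    val errors = logs.filter(_.startsWith("[ERROR]"))
    errors.saveAsTextFile("hdfs:///logs/errors")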

Re: How to split log data into different files according to severity

2015-06-13 Thread Hao Wang
I am currently using filter inside a loop over all severity levels to do this, which I think is pretty inefficient: it has to read the entire data set once for each severity. I wonder if there is a more efficient way that takes just one pass over the data? Thanks. Best, Hao Wang
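
Roughly, the multi-pass approach described above looks like this (a sketch; the severity levels and paths are assumed for illustration):

    // One Spark job per severity level: each saveAsTextFile is an
    // action, so the input is scanned once per level unless the RDD
    // is cached.
    val logs = sc.textFile("hdfs:///logs/input")
    val severities = Seq("ERROR", "WARN", "INFO", "DEBUG")
    for (sev <- severities) {
      logs.filter(_.startsWith(s"[$sev]"))
          .saveAsTextFile(s"hdfs:///logs/by-severity/$sev")
    }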

Re: How to split log data into different files according to severity

2015-06-13 Thread Will Briggs
Check out this recent post by Cheng Lian regarding dynamic partitioning in Spark 1.4: https://www.mail-archive.com/user@spark.apache.org/msg30204.html
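
A sketch of what that looks like with the Spark 1.4 DataFrame API (the parsing logic, column names, and paths are assumptions for illustration; see the linked post for details):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Parse each line into (severity, message), e.g. "[ERROR] log1"
    // becomes ("ERROR", "log1").
    val df = sc.textFile("hdfs:///logs/input").map { line =>
      val sev = line.drop(1).takeWhile(_ != ']')
      val msg = line.dropWhile(_ != ']').drop(1).trim
      (sev, msg)
    }.toDF("severity", "message")

    // A single pass over the data: the writer creates one
    // severity=<value>/ subdirectory per distinct severity. Note this
    // writes Parquet files rather than plain text.
    df.write.partitionBy("severity").format("parquet").save("hdfs:///logs/by-severity")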

Re: How to split log data into different files according to severity

2015-06-14 Thread Hao Wang
Thanks for the link. I’m still running 1.3.1 but will give it a try :) Hao