Are you looking for something like filter? See a similar example here: https://spark.apache.org/examples.html
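A minimal sketch of the filter-based split, in plain Python so it runs standalone; the commented lines show the equivalent Spark calls (`sc.textFile`, `filter`, `saveAsTextFile` are real RDD API, the HDFS paths are placeholders):

```python
# With Spark you would do one filter pass per severity, e.g.:
#   logs = sc.textFile("hdfs://.../logs")
#   logs.filter(lambda l: l.startswith("[ERROR]")).saveAsTextFile("hdfs://.../error_file")
#   logs.filter(lambda l: l.startswith("[INFO]")).saveAsTextFile("hdfs://.../info_file")
# Each filter re-scans the input unless you logs.cache() it first.
# The predicate logic itself is just prefix matching:

def split_by_severity(lines, severities=("ERROR", "INFO")):
    """Group log lines into buckets keyed by their [SEVERITY] prefix."""
    buckets = {s: [] for s in severities}
    for line in lines:
        for s in severities:
            if line.startswith("[%s]" % s):
                buckets[s].append(line)
                break
    return buckets

logs = ["[ERROR] log1", "[INFO] log2", "[ERROR] log3", "[INFO] log4"]
buckets = split_by_severity(logs)
print(buckets["ERROR"])  # ['[ERROR] log1', '[ERROR] log3']
print(buckets["INFO"])   # ['[INFO] log2', '[INFO] log4']
```

If the set of severities is large or unknown up front, a keyed write (e.g. Hadoop's MultipleOutputs via `saveAsHadoopFile`) avoids one full scan per severity, but for a couple of levels the filter approach above is the simplest.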
Thanks
Best Regards

On Sat, Jun 13, 2015 at 3:11 PM, Hao Wang <bill...@gmail.com> wrote:
> Hi,
>
> I have a bunch of large log files on Hadoop. Each line contains a log and
> its severity. Is there a way that I can use Spark to split the entire data
> set into different files on Hadoop according to the severity field? Thanks.
> Below is an example of the input and output.
>
> Input:
> [ERROR] log1
> [INFO] log2
> [ERROR] log3
> [INFO] log4
>
> Output:
> error_file
> [ERROR] log1
> [ERROR] log3
>
> info_file
> [INFO] log2
> [INFO] log4
>
> Best,
> Hao Wang