Hi everyone,
I want to run an MR job continuously, because I have streaming data and I
want to analyze it all the time with my own algorithm. For example, suppose
you want to solve the wordcount problem. It's the simplest one :) If you
have multiple files and new files keep arriving, how do you handle it?
Hi Burak,
Perhaps you could set up a cron job that executes your MR
program periodically.
Regards
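A minimal sketch of such a crontab entry (paths and schedule are hypothetical; note that `%` must be escaped as `\%` inside a crontab line, and the output directory must not already exist, hence the timestamp suffix):

```shell
# Every 15 minutes, run the stock WordCount example over whatever
# has landed in /data/in (hypothetical paths).
*/15 * * * * hadoop jar /opt/hadoop/hadoop-examples.jar wordcount /data/in /data/out-$(date +\%s)
```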
On 05-Dec-11 10:49 PM, burakkk wrote:
Burak,
If you have a continuous inflow of data, you can use Flume to
aggregate the files into larger sequence files if they are small. When
you have a substantial chunk of data (roughly equal to the HDFS block
size), you can push that data onto HDFS; based on your SLAs you need to
schedule you
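A sketch of what such a Flume agent could look like (agent name, directories, and NameNode address are all assumptions, not taken from the thread): it watches a spooling directory, buffers through a file channel, and rolls SequenceFiles into HDFS once they approach a 64 MB block.

```properties
# Hypothetical Flume agent "agent1": spool local files into HDFS
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = hdfs1

# Source: pick up files dropped into a local directory
agent1.sources.src1.type     = spooldir
agent1.sources.src1.spoolDir = /var/log/incoming
agent1.sources.src1.channels = ch1

# Durable buffer between source and sink
agent1.channels.ch1.type = file

# Sink: write SequenceFiles to HDFS, rolling at ~64 MB (one block)
agent1.sinks.hdfs1.type              = hdfs
agent1.sinks.hdfs1.channel           = ch1
agent1.sinks.hdfs1.hdfs.path         = hdfs://namenode:8020/data/incoming
agent1.sinks.hdfs1.hdfs.fileType     = SequenceFile
agent1.sinks.hdfs1.hdfs.rollSize     = 67108864
agent1.sinks.hdfs1.hdfs.rollCount    = 0
agent1.sinks.hdfs1.hdfs.rollInterval = 0
```

Setting rollCount and rollInterval to 0 disables the event-count and time-based rolls, so only the size threshold triggers a new file.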
Burak,
Before we can really answer your question, you need to give us some more
information on the processing you want to do. Do you want output that is
continuous or batched (if so, how)? How should the output at a given time
be related to the input up to then and the previous outputs?
Regards
Athanasios Papaoikonomou, a cron job isn't useful for me, because I want
to execute the MR job with the same algorithm, but different files arrive
at different velocities.
Both Storm and Facebook's Hadoop are designed for that, but I want to use
the Apache distribution.
Bejoy Ks, I have a continuous inflow of da
This will suffice your requirement.
Regards,
Ravi Teja
From: burakkk [burak.isi...@gmail.com]
Sent: 06 December 2011 04:03:59
To: mapreduce-user@hadoop.apache.org
Cc: common-u...@hadoop.apache.org
Subject: Re: Running a job continuously