Running a job continuously

2011-12-05 Thread burakkk
Hi everyone, I want to run an MR job continuously, because I have streaming data and I want to analyze it all the time with my own algorithm. For example, say you want to solve the word count problem. It's the simplest one :) If you have multiple files and new files keep arriving, how do you handle it

Re: Running a job continuously

2011-12-05 Thread Athanasios Papaoikonomou
Hi burak, Perhaps you could implement a cron job that will execute your MR program periodically. Regards
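A minimal sketch of what such a periodic driver might look like, assuming the job should only see input files that earlier runs have not handled yet. The names `run_once`, `list_inputs`, `submit_job` and the JSON state file are hypothetical and not part of Hadoop; in practice the two callables might wrap `hadoop fs -ls` and `hadoop jar`.

```python
import json
import os


def load_processed(state_path):
    """Return the set of input paths handled by earlier runs."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path) as f:
        return set(json.load(f))


def save_processed(state_path, processed):
    """Persist the set of handled input paths as a JSON list."""
    with open(state_path, "w") as f:
        json.dump(sorted(processed), f)


def run_once(list_inputs, submit_job, state_path):
    """One scheduled tick: submit a job over files not seen before.

    list_inputs and submit_job are hypothetical callables supplied by the
    caller; this function only does the bookkeeping between runs.
    """
    processed = load_processed(state_path)
    new_files = sorted(set(list_inputs()) - processed)
    if new_files:
        submit_job(new_files)
        # Record the files only after the job was submitted without error.
        save_processed(state_path, processed | set(new_files))
    return new_files
```

Cron would then call a small script around `run_once` at whatever interval fits. Each tick is a no-op when nothing new has arrived, so the interval can be short without reprocessing old data.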

Re: Running a job continuously

2011-12-05 Thread Bejoy Ks
Burak, if you have a continuous inflow of data, you can use Flume to aggregate the files into larger sequence files if they are small, and push that data onto HDFS once you have a substantial chunk of data (equal to the HDFS block size). Based on your SLAs you need to schedule you
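The batching idea can be sketched independently of Flume: accumulate small files until their combined size reaches the block size, then hand that batch on. This is only an illustration under assumptions. `batch_by_size` is a hypothetical helper, not a Flume or Hadoop API, and the 64 MB default is an assumed block size that should be replaced with the cluster's configured value.

```python
def batch_by_size(files, block_size=64 * 1024 * 1024):
    """Group (name, size_in_bytes) pairs into batches of at least block_size.

    Returns (ready_batches, leftover). Files in leftover have not yet
    reached block_size together and wait for more data to arrive.
    """
    ready = []
    current = []
    current_size = 0
    for name, size in files:
        current.append(name)
        current_size += size
        if current_size >= block_size:
            # Enough data for one block: close this batch and start a new one.
            ready.append(current)
            current = []
            current_size = 0
    return ready, current
```

Each ready batch could be merged into one larger file and pushed to HDFS, while the leftover files are carried into the next round.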

Re: Running a job continuously

2011-12-05 Thread Mike Spreitzer
Burak, Before we can really answer your question, you need to give us some more information on the processing you want to do. Do you want output that is continuous or batched (if so, how)? How should the output at a given time be related to the input up to then and the previous outputs? Regar

Re: Running a job continuously

2011-12-05 Thread burakkk
Athanasios Papaoikonomou, a cron job isn't useful for me, because I want to run the MR job with the same algorithm but different files arrive at different velocities. Both Storm and Facebook's Hadoop are designed for that, but I want to use the Apache distribution. Bejoy Ks, I have a continuous inflow of da

RE: Running a job continuously

2011-12-05 Thread Ravi teja ch n v
his will suffice your requirement. Regards, Ravi Teja