Running a job continuously

2011-12-05 Thread burakkk
Hi everyone, I want to run an MR job continuously, because I have streaming data and I try to analyze it all the time with my own algorithm. For example, say you want to solve the wordcount problem. It's the simplest one :) If you have multiple files and new files keep arriving, how do you handle it

Re: Running a job continuously

2011-12-05 Thread Bejoy Ks
Burak, if you have a continuous inflow of data, you can use Flume to aggregate the files into larger sequence files or so (if they are small), and when you have a substantial chunk of data (equal to the HDFS block size), you can push that data on to HDFS. Based on your SLAs, you need to schedule you
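The aggregation idea in this reply (buffer small inputs until roughly one block of data has arrived, then hand it off) can be sketched in a few lines. This is only an illustration in plain Python, not Flume's API: `BatchAccumulator`, `block_size`, and the `flush` callback are hypothetical names, and the callback stands in for whatever actually writes the chunk to HDFS.

```python
from typing import Callable, List


class BatchAccumulator:
    """Buffers small records until about one block of data is collected.

    Hypothetical helper for illustration only; it does not use Flume or HDFS.
    """

    def __init__(self, block_size: int, flush: Callable[[List[bytes]], None]):
        self.block_size = block_size      # threshold in bytes (e.g. the HDFS block size)
        self.flush = flush                # assumed callback that stores one batch
        self._buffer: List[bytes] = []
        self._buffered_bytes = 0

    def add(self, record: bytes) -> None:
        """Add one record and flush once the threshold is reached."""
        self._buffer.append(record)
        self._buffered_bytes += len(record)
        if self._buffered_bytes >= self.block_size:
            self.drain()

    def drain(self) -> None:
        """Flush whatever is buffered, e.g. when an SLA deadline expires."""
        if self._buffer:
            self.flush(self._buffer)
            self._buffer = []             # start a fresh list; the flushed one is left untouched
            self._buffered_bytes = 0
```

Calling `drain()` from a timer would cover the case where data arrives too slowly to fill a block before the deadline.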

Re: Running a job continuously

2011-12-05 Thread Mike Spreitzer
Burak, Before we can really answer your question, you need to give us some more information on the processing you want to do. Do you want output that is continuous or batched (if so, how)? How should the output at a given time be related to the input up to then and the previous outputs? Regar

Re: Running a job continuously

2011-12-05 Thread John Conwell
You might also want to take a look at Storm, as that's what it's designed to do: https://github.com/nathanmarz/storm/wiki On Mon, Dec 5, 2011 at 1:34 PM, Mike Spreitzer wrote: > Burak, > Before we can really answer your question, you need to give us some more > information on the processing you want

Re: Running a job continuously

2011-12-05 Thread burakkk
Athanasios Papaoikonomou, a cron job isn't useful for me, because I want to execute the MR job with the same algorithm, but different files arrive at different velocities. Both Storm and Facebook's Hadoop are designed for that, but I want to use the Apache distribution. Bejoy Ks, I have a continuous inflow of da

Re: Running a job continuously

2011-12-05 Thread Abhishek Pratap Singh
Hi Burak, the Hadoop model is very different: it is job-based. Put more simply, it is a kind of batch model where a MapReduce job is executed on a batch of data that is already present. As per your requirement, the word count example doesn't make sense if the file has been written co
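One way to picture the batch model for the word count case is a job that runs repeatedly, counts only files it has not seen before, and merges the result into running totals. The sketch below is plain Python rather than MapReduce. `count_new_files`, `seen`, and `totals` are hypothetical names, and it assumes every file in the directory is already completely written when the run starts.

```python
from collections import Counter
from pathlib import Path
from typing import Set


def count_new_files(input_dir: Path, seen: Set[str], totals: Counter) -> int:
    """Run one batch: count words in files not processed by earlier runs.

    Assumption: files in input_dir are complete (not still being written).
    Returns the number of files handled in this batch.
    """
    new_files = sorted(
        p for p in input_dir.iterdir() if p.is_file() and p.name not in seen
    )
    for path in new_files:
        # Merge this file's word counts into the running totals.
        totals.update(path.read_text(encoding="utf-8").split())
        seen.add(path.name)
    return len(new_files)
```

A scheduler would call this once per batch interval, keeping `seen` and `totals` between runs, so each run touches only the data that arrived since the previous one.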

RE: Running a job continuously

2011-12-05 Thread Ravi teja ch n v
his will suffice your requirement. Regards, Ravi Teja From: burakkk [burak.isi...@gmail.com] Sent: 06 December 2011 04:03:59 To: mapreduce-u...@hadoop.apache.org Cc: common-user@hadoop.apache.org Subject: Re: Running a job continuously Athanasios Papao

Re: Running a job continuously

2011-12-06 Thread Praveen Sripati
59 > To: mapreduce-u...@hadoop.apache.org > Cc: common-user@hadoop.apache.org > Subject: Re: Running a job continuously > > Athanasios Papaoikonomou, cron job isn't useful for me. Because i want to > execute the MR job on the same algorithm but different files have different >

Re: Running a job continuously

2011-12-11 Thread Inder Pall
unt of data is in, you can configure Oozie to run your > > job. > > I think this will suffice your requirement. > > > > Regards, > > Ravi Teja > > > > > > From: burakkk [burak.isi...@gmail.com] > > S