Re: Programming Multiple rounds of mapreduce

2011-06-13 Thread Sean Owen
You could have a look at the MapReduce pipelines in Apache Mahout (http://mahout.apache.org). See for instance org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. This shows how most of Mahout constructs and runs a series of rounds of MapReduce to accomplish a task. Each job feeds into one or mo

Re: Programming Multiple rounds of mapreduce

2011-06-13 Thread Moustafa Gaber
Actually, HaLoop is a new framework on top of Hadoop that targets transitive closure algorithms. This type of algorithm consists of rounds of Hadoop jobs, so I think it may contain some useful examples for you. On Mon, Jun 13, 2011 at 6:39 PM, Arko Provo Mukherjee < arkoprovomukher...@gma
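To illustrate why transitive closure needs several rounds, here is a minimal in-memory sketch in Python. It is not HaLoop or Hadoop code: each call to `closure_round` stands in for one map/reduce job that joins the known paths with the edge list, and the driver repeats the round until nothing new is produced. The names `closure_round` and `transitive_closure` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def closure_round(paths, edges):
    """One simulated map/reduce round: extend every known path by one edge."""
    # Map phase: key both paths and edges by the node they are joined on.
    by_join_node = defaultdict(lambda: ([], []))
    for src, dst in paths:
        by_join_node[dst][0].append(src)   # a path that ends at dst
    for src, dst in edges:
        by_join_node[src][1].append(dst)   # an edge that starts at src

    # Reduce phase: for each join node, connect path sources to edge targets.
    extended = set(paths)
    for sources, targets in by_join_node.values():
        for s in sources:
            for t in targets:
                extended.add((s, t))
    return extended


def transitive_closure(edges):
    """Driver: rerun the round until its output equals its input."""
    paths = set(edges)
    rounds = 0
    while True:
        next_paths = closure_round(paths, edges)
        rounds += 1
        if next_paths == paths:      # fixpoint reached, stop iterating
            return paths, rounds
        paths = next_paths           # output of this round feeds the next


closure, rounds = transitive_closure({(1, 2), (2, 3), (3, 4)})
```

The number of rounds is not known in advance here; the driver decides after each round whether another one is needed, which is the part a plain single-job setup does not cover.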

Delimiter selection for Sequence Files

2011-06-13 Thread Mapred Learn
Hi, I was thinking of using Ctrl-A as the delimiter, but the data that I am loading into Hadoop already has Ctrl-A in it. What other delimiters have people used in this kind of scenario, considering that I also want to query this data using Hive? Thanks in advance -JJ
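One way to approach this is to scan a sample of the data and pick a delimiter character that never occurs in it. The Python sketch below shows that idea under stated assumptions: `pick_delimiter` and the candidate list are hypothetical and not part of any Hadoop or Hive API, and a sample can only suggest, not guarantee, that the full data set is free of the chosen character.

```python
# Candidate single-character delimiters, tried in order (an assumed list).
CANDIDATES = ["\x01", "\x1e", "\x1f", "\t", "|"]


def pick_delimiter(records, candidates=CANDIDATES):
    """Return the first candidate that appears in none of the records."""
    seen = set()
    for record in records:
        seen.update(record)          # collect every character in the sample
    for delim in candidates:
        if delim not in seen:
            return delim
    raise ValueError("every candidate delimiter occurs in the data")


# Sample records that already contain Ctrl-A (\x01) and a pipe.
sample = ["alice\x01smith", "bob|jones"]
delimiter = pick_delimiter(sample)
```

Whatever character is chosen would then have to be declared in the Hive table definition, since Hive's default field delimiter is Ctrl-A.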

Re: Programming Multiple rounds of mapreduce

2011-06-13 Thread Arko Provo Mukherjee
Hello, Thanks everyone for your responses. I am new to Hadoop, so this was a lot of new information for me. I will surely go through all of these. However, I was actually hoping that someone could point me to some example code where multiple rounds of map-reduce have been used. Please let me kno

Re: Programming Multiple rounds of mapreduce

2011-06-13 Thread Moustafa Gaber
I think HaLoop is a framework which can answer your question: http://code.google.com/p/haloop/ On Mon, Jun 13, 2011 at 5:46 PM, Arko Provo Mukherjee < arkoprovomukher...@gmail.com> wrote: > Hello, > > I am trying to write a program where I need to write multiple rounds of map > and reduce. > > Th

Re: Programming Multiple rounds of mapreduce

2011-06-13 Thread Alejandro Abdelnur
Thanks Matt. Arko, if you plan to use Oozie, you can have a simple coordinator job that does, for example, the following (it schedules a WF every 5 mins that consumes the output produced by the previous run; you just have to have the initial data). Thxs. Alejandro 1 ${

RE: Programming Multiple rounds of mapreduce

2011-06-13 Thread GOEKE, MATTHEW (AG/1000)
If you know for certain that it needs to be split into multiple work units, I would suggest looking into Oozie. Easy to install, lightweight, low learning curve... for my purposes it's been very helpful so far. I am also fairly certain you can chain multiple job confs into the same run but I hav

Re: Programming Multiple rounds of mapreduce

2011-06-13 Thread Marcos Ortiz
Well, you can define a job for each round, and then define the workflow that chains your jobs, based on your implementation. On 6/13/2011 5:46 PM, Arko Provo Mukherjee wrote: Hello, I am trying to write a program where I need to write multiple rounds of map and reduce. The
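The "one job per round, then chain them" idea can be sketched in plain Python as follows. This is an in-memory simulation, not Hadoop code: `run_round`, `run_pipeline`, and the two example rounds are hypothetical names invented for illustration. The point it shows is only the driver pattern, where each round's output becomes the next round's input.

```python
from collections import defaultdict


def run_round(records, mapper, reducer):
    """Run one simulated MapReduce round over (key, value) records."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)   # shuffle: group by key
    output = []
    for key in sorted(grouped):
        output.extend(reducer(key, grouped[key]))
    return output


def run_pipeline(records, rounds):
    """Chain rounds: each round's output is the next round's input."""
    for mapper, reducer in rounds:
        records = run_round(records, mapper, reducer)
    return records


# Round 1: count how often each word occurs.
def split_words(_, line):
    return [(word, 1) for word in line.split()]


def sum_counts(word, counts):
    return [(word, sum(counts))]


# Round 2: group the words by their count.
def invert(word, count):
    return [(count, word)]


def collect_words(count, words):
    return [(count, sorted(words))]


lines = [(0, "a b a"), (1, "b c a")]
result = run_pipeline(lines, [(split_words, sum_counts), (invert, collect_words)])
```

In an actual Hadoop driver the same shape would apply, with each round submitted as its own job and the output directory of one job used as the input path of the next.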

Re: Programming Multiple rounds of mapreduce

2011-06-13 Thread Bibek Paudel
Hi, On Mon, Jun 13, 2011 at 11:46 PM, Arko Provo Mukherjee wrote: > Hello, > > I am trying to write a program where I need to write multiple rounds of map > and reduce. > > The output of the last round of map-reduce must be fed into the input of the > next round. > > Can anyone please guide me to

Programming Multiple rounds of mapreduce

2011-06-13 Thread Arko Provo Mukherjee
Hello, I am trying to write a program where I need to write multiple rounds of map and reduce. The output of the last round of map-reduce must be fed into the input of the next round. Can anyone please guide me to any link / material that can teach me as to how I can achieve this. Thanks a lot