Re: Job scheduling

2011-06-07 Thread Ian Halperin
For the record, I figured this out. The default task scheduler just assigns tasks to trackers on a first-come, first-served basis as trackers' heartbeats are received, although this might tend to favour data-local and then rack-local trackers at a large enough scale. I switched over to using FairS
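To make the behaviour described above concrete, here is a toy sketch in Python, not Hadoop's actual scheduler code. It compares handing out tasks strictly in heartbeat order with preferring the closest pending task on each heartbeat. Every name in it (Task, rack_of, locality, assign_fcfs, assign_locality_first) is hypothetical, and the "rackNnodeM" host naming is only assumed from the node name mentioned elsewhere in this thread.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hosts: tuple  # nodes that hold the task's input split


def rack_of(host):
    # Assumption: hypothetical host names of the form "rack1node1".
    return host.split("node")[0]


def locality(task, tracker):
    """Return 0 for data-local, 1 for rack-local, 2 for off-rack."""
    if tracker in task.hosts:
        return 0
    if rack_of(tracker) in {rack_of(h) for h in task.hosts}:
        return 1
    return 2


def assign_fcfs(pending, heartbeats):
    """Give the oldest pending task to each tracker as its heartbeat arrives."""
    pending = list(pending)
    placement = {}
    for tracker in heartbeats:
        if not pending:
            break
        placement[pending.pop(0).name] = tracker
    return placement


def assign_locality_first(pending, heartbeats):
    """On each heartbeat, pick the pending task closest to that tracker."""
    pending = list(pending)
    placement = {}
    for tracker in heartbeats:
        if not pending:
            break
        task = min(pending, key=lambda t: locality(t, tracker))
        pending.remove(task)
        placement[task.name] = tracker
    return placement


# Two splits that both live on rack1node1, with other nodes reporting in first.
tasks = [Task("map0", ("rack1node1",)), Task("map1", ("rack1node1",))]
heartbeats = ["rack2node1", "rack1node2", "rack1node1"]
print(assign_fcfs(tasks, heartbeats))
print(assign_locality_first(tasks, heartbeats))
```

In this toy model both strategies place the two mappers away from rack1node1, because the other trackers' heartbeats arrive while tasks are still pending. A locality preference only changes the outcome when the tracker that reports in actually has a closer task to choose from.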

Re: Input examples

2011-06-07 Thread Marcos Ortiz
You can use HackReduce's datasets for this too: http://hackreduce.org/datasets Regards. On 6/7/2011 1:56 PM, Jonathan Coveney wrote: Have you taken a look at the O'Reilly Hadoop book? It deals consistently with a weather dataset that is, I believe, largely available. 2011/6/7 Francesco

Re: Job scheduling

2011-06-07 Thread Ian Halperin
I found from Googling around that I should probably be seeing messages like "Choosing data-local task" and "Choosing rack-local task" - from JobInProgress::addRunningTaskToTIP(). (e.g. here: http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201012.mbox/%3C2120373776.44711293532894724.Ja

Re: Input examples

2011-06-07 Thread Francesco De Luca
Great! Thank you Jonathan 2011/6/7 Jonathan Coveney > Have you taken a look at the O'Reilly Hadoop book? It deals consistently > with a weather dataset that is, I believe, largely available. > > > 2011/6/7 Francesco De Luca > >> Hello Sean, >> >> not exactly. I mean some applications like word

Re: Input examples

2011-06-07 Thread Jonathan Coveney
Have you taken a look at the O'Reilly Hadoop book? It deals consistently with a weather dataset that is, I believe, largely available. 2011/6/7 Francesco De Luca > Hello Sean, > > not exactly. I mean some applications like word count or inverted index > and the corresponding input data. > > 2011/6/7

Re: Input examples

2011-06-07 Thread Francesco De Luca
Hello Sean, not exactly. I mean some applications like word count or inverted index and the corresponding input data. 2011/6/7 Sean Owen > Not sure if it's quite what you mean, but Apache Mahout is essentially all > applications of Hadoop for machine learning, a bunch of runnable jobs (some > with

Re: Input examples

2011-06-07 Thread Sean Owen
Not sure if it's quite what you mean, but Apache Mahout is essentially all applications of Hadoop for machine learning, a bunch of runnable jobs (some with example data too). mahout.apache.org On Tue, Jun 7, 2011 at 3:54 PM, Francesco De Luca wrote: > Where can I find some Hadoop MapReduce app

Input examples

2011-06-07 Thread Francesco De Luca
Where can I find some Hadoop MapReduce application examples (other than word count) with associated input files? Thanks

Re: Job scheduling

2011-06-07 Thread Ian Halperin
Harsh, thanks for the clarification. But my mappers always seem to run elsewhere. Here's an example with 2 splits, both on rack1node1, but the 2 mappers get started on other nodes. Could the "choosing a non-local task" message be significant? I have actually read through the JobTracker source, bu