Hashing two relations

2010-07-02 Thread abc xyz
Hey folks, I have to mess around with hashing. I want to take two input sources, partition them using a hash function, then build an in-memory hash table for each partition of one source, and compare the hash of each record in the same partition of the other table against it to join these two…
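A minimal standalone Java sketch of the partition-then-probe idea described above, assuming each partition fits in memory; the Record type and field names are illustrative, since the thread gives no actual schema:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative record type: a join key plus an opaque payload.
class Record {
    final String key;
    final String payload;
    Record(String key, String payload) { this.key = key; this.payload = payload; }
}

public class PartitionHashJoin {

    // Phase 1: route each record to a partition by hashing its join key.
    static int partitionOf(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Phase 2: build an in-memory hash table over one partition of relation R,
    // then probe it with every record of the matching partition of relation S.
    static List<String> joinPartition(List<Record> rPartition, List<Record> sPartition) {
        Map<String, List<Record>> buildTable = new HashMap<String, List<Record>>();
        for (Record r : rPartition) {
            List<Record> bucket = buildTable.get(r.key);
            if (bucket == null) {
                bucket = new ArrayList<Record>();
                buildTable.put(r.key, bucket);
            }
            bucket.add(r);
        }
        List<String> joined = new ArrayList<String>();
        for (Record s : sPartition) {
            List<Record> matches = buildTable.get(s.key);
            if (matches == null) continue;
            for (Record r : matches) {
                joined.add(s.key + "\t" + r.payload + "\t" + s.payload);
            }
        }
        return joined;
    }
}
```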

Re: Text files vs. SequenceFiles

2010-07-02 Thread Joe Stein
David, you can also set compression to occur on your data between your map and reduce tasks (this data can be large, and it is often quicker to compress and transfer it than to transfer it uncompressed once the copy gets going). *mapred.compress.map.output* Setting this value to *true* should speed up the reducers' copy…
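A small sketch of the same setting via the old JobConf API, assuming a 0.20-era job; the codec choice here is illustrative only:

```java
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.JobConf;

public class MapOutputCompressionExample {
    public static void configure(JobConf conf) {
        // Equivalent to setting mapred.compress.map.output=true:
        // compress intermediate map output before it is shuffled to the reducers.
        conf.setCompressMapOutput(true);
        // Pick a codec for the intermediate data (GzipCodec used only as an example).
        conf.setMapOutputCompressorClass(GzipCodec.class);
    }
}
```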

Re: Text files vs. SequenceFiles

2010-07-02 Thread Alex Loddengaard
Hi David, On Fri, Jul 2, 2010 at 2:54 PM, David Rosenstrauch wrote: > > * We should use a SequenceFile (binary) format as it's faster for the > machine to read than parsing text, and the files are smaller. > > * We should use a text file format as it's easier for humans to read, > easier to change

Text files vs. SequenceFiles

2010-07-02 Thread David Rosenstrauch
Our team is still new to Hadoop, and a colleague and I are trying to make a decision on file formats. The arguments are: * We should use a SequenceFile (binary) format, as it's faster for the machine to read than parsing text, and the files are smaller. * We should use a text file format, as it's easier for humans to read and easier to change…
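For reference on the binary side of the argument, a minimal hedged sketch of writing a SequenceFile; the path, key/value types, and record contents are illustrative, not from the thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/example.seq"); // illustrative path

        // Keys and values are Writable types, stored in a compact binary layout
        // that avoids text parsing on read.
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, path, IntWritable.class, Text.class);
        try {
            for (int i = 0; i < 100; i++) {
                writer.append(new IntWritable(i), new Text("record-" + i));
            }
        } finally {
            writer.close();
        }
    }
}
```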

Re: Intermediate files generated.

2010-07-02 Thread Ken Goodhope
You could also use MultipleOutputs from the old API. This will allow you to create multiple output collectors. One collector could be used at the beginning of the reduce call to write the key-value pairs unaltered, and another collector to write the results of your processing. On Fri, Jul 2…
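A hedged sketch of that two-collector pattern with the old-API MultipleOutputs; the named output "raw", the Text types, and the toy processing are illustrative, not from the thread:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

// One named output receives the incoming key/value pairs unaltered;
// the job's regular collector receives the processed result.
public class TwoCollectorReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    // Driver-side registration (assumed to happen before job submission).
    public static void registerNamedOutput(JobConf conf) {
        MultipleOutputs.addNamedOutput(conf, "raw",
                TextOutputFormat.class, Text.class, Text.class);
    }

    private MultipleOutputs mos;

    @Override
    public void configure(JobConf conf) {
        mos = new MultipleOutputs(conf);
    }

    @Override
    @SuppressWarnings("unchecked")
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        StringBuilder combined = new StringBuilder();
        while (values.hasNext()) {
            Text value = values.next();
            // Copy of the raw intermediate data, written to the "raw" named output.
            mos.getCollector("raw", reporter).collect(key, value);
            combined.append(value.toString()).append(',');
        }
        // Processed result goes to the job's regular output.
        output.collect(key, new Text(combined.toString()));
    }

    @Override
    public void close() throws IOException {
        mos.close();
    }
}
```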

Job Post: Hadoop opportunity in Stockholm, Sweden

2010-07-02 Thread Per Mellqvist
Hi all, Delta Projects are looking to add more Hadoop skills to our development team in Stockholm, Sweden. If you are interested in applying your knowledge to an interesting set of data and problems in the online advertising domain, send an email to per.mellqv...@deltaprojects.se Cheers, Per Mellqvist

CFP for Surge Scalability Conference 2010

2010-07-02 Thread Jason Dixon
A quick reminder that there's one week left to submit your abstract for this year's Surge Scalability Conference. The event is taking place on Sept 30 and Oct 1, 2010 in Baltimore, MD. Surge focuses on case studies that address production failures and the re-engineering efforts that led to victory…

Re: problem with rack-awareness

2010-07-02 Thread Edward Capriolo
On Fri, Jul 2, 2010 at 2:27 PM, Allen Wittenauer wrote: > > On Jul 1, 2010, at 7:50 PM, elton sky wrote: > >> hello, >> >> I am trying to separate my 6 nodes onto 2 different racks. >> For test purposes, I wrote a bash file which simply returns "rack0" all the >> time, and I added the property "topology.script.file.name"…

Re: problem with rack-awareness

2010-07-02 Thread Allen Wittenauer
On Jul 1, 2010, at 7:50 PM, elton sky wrote: > hello, > > I am trying to separate my 6 nodes onto 2 different racks. > For test purposes, I wrote a bash file which simply returns "rack0" all the > time, and I added the property "topology.script.file.name" in core-site.xml. rack0 or /rack0? I think the…

RE: problem with rack-awareness

2010-07-02 Thread Michael Segel
A couple of things... Does your script have a default rack defined? So if it can't find your machine, does it default to being on rack_default? (You could use rack0, but then you have a problem: will you know what's really in rack0 or what's just getting the default value?) The other issue is th…
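A minimal topology-script sketch along the lines discussed in this thread, with hard-coded, illustrative host names; note the leading slash on rack paths and the default for unknown hosts:

```bash
#!/bin/bash
# Topology script referenced by topology.script.file.name in core-site.xml.
# Hadoop passes one or more host names/IPs as arguments and expects one
# rack path per argument on stdout.
for host in "$@"; do
  case "$host" in
    node1|node2|node3) echo "/rack0" ;;          # illustrative host names
    node4|node5|node6) echo "/rack1" ;;
    *)                 echo "/rack-default" ;;   # default for unknown hosts
  esac
done
```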

how to compile HADOOP

2010-07-02 Thread Ahmad Shahzad
Hi all, Can anyone tell me how I can compile the whole Hadoop directory if I add some files to the Hadoop core directory or change some code in some of the files? Regards, Ahmad Shahzad

Re: Intermediate files generated.

2010-07-02 Thread Pramy Bhats
Hi, Isn't it possible to hook into the intermediate files generated? I am writing a compilation framework, so I don't want to mess with the existing programming framework. The upper layer, or the programmer, should write the program the way they normally would, and I want to leverage the intermediate files generated…

Re: newbie - job failing at reduce

2010-07-02 Thread Siddharth Karandikar
I am running with 10240 now and jobs look to be working fine. I need to confirm this by reverting back to 1024 and seeing the jobs fail. :) Thanks! On Wed, Jun 30, 2010 at 10:59 PM, Siddharth Karandikar wrote: > Yeah. Looks like it's set to 1024 right now. Change that to, say, 10 > times more and run…

Re: Intermediate files generated.

2010-07-02 Thread Jones, Nick
Hi Pramy, I would set up one M/R job to just map (setNumReducers=0) and chain another job that uses a unity mapper to pass the intermediate data to the reduce step. Nick Sent by radiation. - Original Message - From: Pramy Bhats To: common-user@hadoop.apache.org Sent: Fri Jul 02 01:05:2…
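A hedged sketch of Nick's two-job chain with the old API: stage one runs map-only so its output lands on HDFS as the intermediate data set, and stage two forwards that data through an identity mapper into the reduce step. Paths, class names, and the toy map/reduce logic are all illustrative, not from the thread:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class TwoStageDriver {

    // Stage-1 mapper: stands in for the real map logic; here it just keys
    // each line by its first token.
    public static class Stage1Mapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            String[] parts = line.toString().split("\\s+", 2);
            out.collect(new Text(parts[0]), new Text(parts.length > 1 ? parts[1] : ""));
        }
    }

    // Stage-2 reducer: stands in for the real processing of the intermediate data.
    public static class Stage2Reducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            int count = 0;
            while (values.hasNext()) { values.next(); count++; }
            out.collect(key, new Text(Integer.toString(count)));
        }
    }

    public static void main(String[] args) throws Exception {
        Path intermediate = new Path("/tmp/stage1-out"); // illustrative path

        // Stage 1: map-only job; with zero reducers the map output is written
        // straight to HDFS and becomes the reusable intermediate data set.
        JobConf stage1 = new JobConf(TwoStageDriver.class);
        stage1.setJobName("stage1-map-only");
        stage1.setMapperClass(Stage1Mapper.class);
        stage1.setNumReduceTasks(0);
        stage1.setOutputKeyClass(Text.class);
        stage1.setOutputValueClass(Text.class);
        stage1.setOutputFormat(SequenceFileOutputFormat.class);
        FileInputFormat.setInputPaths(stage1, new Path(args[0]));
        FileOutputFormat.setOutputPath(stage1, intermediate);
        JobClient.runJob(stage1);

        // Stage 2: an identity mapper forwards the stage-1 records unchanged,
        // so all the work happens in the reduce step.
        JobConf stage2 = new JobConf(TwoStageDriver.class);
        stage2.setJobName("stage2-reduce");
        stage2.setInputFormat(SequenceFileInputFormat.class);
        stage2.setMapperClass(IdentityMapper.class);
        stage2.setReducerClass(Stage2Reducer.class);
        stage2.setOutputKeyClass(Text.class);
        stage2.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(stage2, intermediate);
        FileOutputFormat.setOutputPath(stage2, new Path(args[1]));
        JobClient.runJob(stage2);
    }
}
```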