Hey Folks,
I have to mess around with hashing. I want to take two input sources, partition
them using a hash function, then build an in-memory hash table for each partition
of one source, and compare each record of the same partition of the other source
against it to join the two sources.
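For anyone following along, here is a minimal sketch of that partitioned hash
join in plain Java. The String[] record type, the key column index, and the
partition count are assumptions for illustration, not part of the question:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionedHashJoin {

    static final int NUM_PARTITIONS = 8; // assumption: pick so each partition fits in memory

    // Step 1: split a source into partitions by hashing the join key.
    static List<List<String[]>> partition(List<String[]> records, int keyIdx) {
        List<List<String[]>> parts = new ArrayList<List<String[]>>();
        for (int i = 0; i < NUM_PARTITIONS; i++) parts.add(new ArrayList<String[]>());
        for (String[] rec : records) {
            int p = (rec[keyIdx].hashCode() & Integer.MAX_VALUE) % NUM_PARTITIONS;
            parts.get(p).add(rec);
        }
        return parts;
    }

    // Step 2: build a hash table on one source's partition, probe with the
    // matching partition of the other source. Only same-numbered partitions
    // can contain matching keys, so each pair is joined independently.
    static void joinPartition(List<String[]> build, List<String[]> probe, int keyIdx) {
        Map<String, List<String[]>> table = new HashMap<String, List<String[]>>();
        for (String[] rec : build) {
            List<String[]> bucket = table.get(rec[keyIdx]);
            if (bucket == null) {
                bucket = new ArrayList<String[]>();
                table.put(rec[keyIdx], bucket);
            }
            bucket.add(rec);
        }
        for (String[] rec : probe) {
            List<String[]> matches = table.get(rec[keyIdx]);
            if (matches == null) continue;
            for (String[] m : matches) {
                System.out.println(java.util.Arrays.toString(m)
                        + " | " + java.util.Arrays.toString(rec)); // emit joined pair
            }
        }
    }
}

Building the table on the smaller side of each partition pair keeps the memory
footprint down, which is the whole point of partitioning first.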
David,
You can also enable compression of your data between the map & reduce
tasks (this data can be large, and it is often quicker to compress and
transfer it than to transfer it uncompressed once the copy gets going).
*mapred.compress.map.output*
Setting this value to *true* should speed up the reducers' copy phase.
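You can also set it from the driver with the old API; a small sketch, where
MyJob is a placeholder for your driver class and DefaultCodec is just one
possible codec choice:

import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf(MyJob.class);              // MyJob: placeholder driver class
conf.setCompressMapOutput(true);                      // mapred.compress.map.output = true
conf.setMapOutputCompressorClass(DefaultCodec.class); // codec for the map output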
Hi David,
On Fri, Jul 2, 2010 at 2:54 PM, David Rosenstrauch wrote:
>
> * We should use a SequenceFile (binary) format as it's faster for the
> machine to read than parsing text, and the files are smaller.
>
> * We should use a text file format as it's easier for humans to read,
> easier to change
Our team is still new to Hadoop, and a colleague and I are trying to
make a decision on file formats. The arguments are:
* We should use a SequenceFile (binary) format as it's faster for the
machine to read than parsing text, and the files are smaller.
* We should use a text file format as it's easier for humans to read and
easier to change.
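For what it's worth, writing a SequenceFile is only a few lines. A minimal
sketch with the classic API (the path and the Text/IntWritable key-value types
are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqFileDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/demo.seq");  // placeholder path
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, path, Text.class, IntWritable.class);
        try {
            writer.append(new Text("key-1"), new IntWritable(42));
        } finally {
            writer.close();
        }
    }
}

Note that "hadoop fs -text" can decode a SequenceFile for inspection, which
takes some of the force out of the human-readability argument.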
You could also use MultipleOutputs from the old API. This will allow you to
create multiple output collectors. One collector could be used at
the beginning of the reduce call for writing the key-value pairs
unaltered, and another collector for writing the results of your processing.
On Fri, Jul 2
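Roughly, with the old API it looks like this. The named output "raw",
PassThroughReducer, and process() are illustrative names, not anything from
the original thread:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class PassThroughReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    private MultipleOutputs mos;

    public void configure(JobConf conf) {
        mos = new MultipleOutputs(conf);
    }

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        while (values.hasNext()) {
            Text value = values.next();
            // Named collector "raw": the key-value pairs, unaltered.
            mos.getCollector("raw", reporter).collect(key, value);
            // Regular output: the processed result.
            output.collect(key, process(value));
        }
    }

    private Text process(Text v) { return v; } // placeholder for real processing

    public void close() throws IOException {
        mos.close(); // flush the named outputs
    }
}

The named output also has to be declared in the driver, e.g.
MultipleOutputs.addNamedOutput(conf, "raw", TextOutputFormat.class, Text.class, Text.class);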
Hi All,
Delta Projects are looking to add more Hadoop skills to our development
team in Stockholm, Sweden. If you are interested in applying your
knowledge to an interesting set of data and problems in the online
advertising domain, send an email to per.mellqv...@deltaprojects.se
Cheers
Per Mellq
A quick reminder that there's one week left to submit your abstract for
this year's Surge Scalability Conference. The event is taking place on
Sept 30 and Oct 1, 2010 in Baltimore, MD. Surge focuses on case studies
that address production failures and the re-engineering efforts that led
to victory.
On Fri, Jul 2, 2010 at 2:27 PM, Allen Wittenauer
wrote:
>
> On Jul 1, 2010, at 7:50 PM, elton sky wrote:
>
>> hello,
>>
>> I am trying to separate my 6 nodes onto 2 different racks.
>> For test purposes, I wrote a bash script which simply returns "rack0" all the
>> time, and I added the property "topology.script.file.name" in core-site.xml.
On Jul 1, 2010, at 7:50 PM, elton sky wrote:
> hello,
>
> I am trying to separate my 6 nodes onto 2 different racks.
> For test purposes, I wrote a bash script which simply returns "rack0" all the
> time, and I added the property "topology.script.file.name" in core-site.xml.
rack0 or /rack0?
I think the
A couple of things...
Does your script have a default rack defined, so that if it can't find your
machine it defaults to being on rack_default?
(You could use rack0 as the default, but then you have a problem: how will you
know what's really in rack0 and what's just getting the default value?)
The other issue is th
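Since the thread is about a bash topology script anyway, here is a sketch with
an explicit default rack. The subnets and rack names are made up, and note the
leading slash Allen is pointing at:

#!/bin/bash
# Topology script: print one rack path per host/IP argument Hadoop passes in.
for host in "$@"; do
  case "$host" in
    10.0.0.*) echo "/rack0" ;;           # assumption: first rack's subnet
    10.0.1.*) echo "/rack1" ;;           # assumption: second rack's subnet
    *)        echo "/rack_default" ;;    # unknown hosts stand out here
  esac
done

Pointing topology.script.file.name at a script like this keeps unidentified
hosts visibly separate from the real racks.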
Hi ALL,
Can anyone tell me how I can recompile the whole Hadoop source tree if I add
some files to the Hadoop core directory or change some code in some of the
files?
Regards,
Ahmad Shahzad
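Not from the original thread, but for reference: the 0.20-era source tree
builds with Ant, so after changing core source something like this should
rebuild the core jar (the path is a placeholder):

cd /path/to/hadoop    # your source checkout, with build.xml at the top
ant clean jar         # recompiles and produces build/hadoop-*-core.jar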
Hi,
Isn't it possible to hack into the intermediate files that are generated?
I am writing a compilation framework, so I don't want to interfere with the
existing programming framework. The upper layer, i.e. the programmer, should
write the program the way he normally would, and I want to leverage the
intermediate file generation
I am running with 10240 now and jobs look to be working fine. I need to
confirm this by reverting back to 1024 and seeing the jobs fail. :)
Thanks!
On Wed, Jun 30, 2010 at 10:59 PM, Siddharth Karandikar
wrote:
> Yeah. Looks like it's set to 1024 right now. Change that to, say, 10
> times more and run
Hi Pramy,
I would set up one M/R job that just maps (setNumReduceTasks(0)) and chain
another job that uses an identity mapper to pass the intermediate data to the
reduce step.
Nick
Sent by radiation.
- Original Message -
From: Pramy Bhats
To: common-user@hadoop.apache.org
Sent: Fri Jul 02 01:05:2
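A rough sketch of the two-job chain Nick describes, in the old API. MyMapper,
MyReducer, and the intermediate path are placeholders for your own classes and
locations:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class ChainedJobs {
    public static void main(String[] args) throws Exception {
        // Job 1: map only -- with zero reducers, map output goes straight to HDFS.
        JobConf job1 = new JobConf(ChainedJobs.class);
        job1.setJobName("map-only");
        job1.setMapperClass(MyMapper.class);   // placeholder: your real mapper
        job1.setNumReduceTasks(0);             // no reduce step
        FileInputFormat.setInputPaths(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, new Path("/tmp/intermediate")); // placeholder
        JobClient.runJob(job1);

        // Job 2: identity mapper forwards the intermediate data to the reducer.
        JobConf job2 = new JobConf(ChainedJobs.class);
        job2.setJobName("reduce-step");
        job2.setMapperClass(IdentityMapper.class);
        job2.setReducerClass(MyReducer.class); // placeholder: your real reducer
        FileInputFormat.setInputPaths(job2, new Path("/tmp/intermediate"));
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));
        JobClient.runJob(job2);
    }
}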