I guess this question was ignored, so I'm posting it again.
From my understanding, HDFS uses a single thread to do reads and writes.
Since a file is composed of many blocks, and each block is stored as a file
in the underlying FS, we could do some parallelism on a per-block basis.
When read across
Thanks Grant, but I want to do it without using any external library. Is there
some way to have more than one reducer in a single job?
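For reference, a single job can already run multiple reducers with no external library; a minimal sketch using the classic mapred API (the class name, paths, and reducer count are hypothetical):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MultiReducerJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MultiReducerJob.class);
        conf.setJobName("multi-reducer-example");
        conf.setNumReduceTasks(4);  // this one job runs four reduce tasks
        FileInputFormat.setInputPaths(conf, new Path("/in"));
        FileOutputFormat.setOutputPath(conf, new Path("/out"));
        JobClient.runJob(conf);     // blocks until the job finishes
    }
}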
From: Grant Mackey gmac...@cs.ucf.edu
To: general@hadoop.apache.org
Sent: Thu, July 1, 2010 4:15:57 PM
Subject: Re:
Hi folks,
I want to display the output of my mapper on the console, or log it to
a file. System.out.println and Log4j's logger are not displaying the output.
How can I see the intermediate results?
Cheers
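For what it's worth, System.out.println inside a map task does not reach the client console; stdout and the log4j output both land in the per-task-attempt logs (typically under logs/userlogs on the node that ran the task, also browsable through the JobTracker web UI). A minimal sketch of logging intermediate records from a mapper (the class name and message format are illustrative):

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DebugMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final Log LOG = LogFactory.getLog(DebugMapper.class);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Written to the task attempt's log, not to the client console.
        LOG.info("map input: " + key + " -> " + value);
        context.write(value, new LongWritable(1L));
    }
}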
One option is to use multiple chained jobs via a JobControl object.
But can anyone tell me whether, in this case, the output of the first job is
written to disk? I want to minimize disk access.
Cheers
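On the disk question: yes, with JobControl each job in the chain writes its complete output to the filesystem and the next job reads it back; JobControl only sequences the jobs, it does not pipe data between them in memory. A minimal sketch of chaining two jobs (paths and job names are hypothetical, and the per-job mapper/reducer setup is omitted):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ChainedJobs {
    public static void main(String[] args) throws Exception {
        JobConf conf1 = new JobConf(ChainedJobs.class);
        conf1.setJobName("step1");
        FileInputFormat.setInputPaths(conf1, new Path("/in"));
        FileOutputFormat.setOutputPath(conf1, new Path("/tmp/step1-out"));

        JobConf conf2 = new JobConf(ChainedJobs.class);
        conf2.setJobName("step2");
        // step2 reads step1's materialized output back off the FS
        FileInputFormat.setInputPaths(conf2, new Path("/tmp/step1-out"));
        FileOutputFormat.setOutputPath(conf2, new Path("/out"));

        Job step1 = new Job(conf1);
        Job step2 = new Job(conf2);
        step2.addDependingJob(step1);        // step2 starts only after step1 succeeds

        JobControl control = new JobControl("chain");
        control.addJob(step1);
        control.addJob(step2);

        Thread runner = new Thread(control); // JobControl implements Runnable
        runner.start();
        while (!control.allFinished()) {
            Thread.sleep(500);
        }
        control.stop();
    }
}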
From: Fariha Atta fariha.atta...@gmail.com
Actually, they also listen here, and this is a basic question...
I'm not an expert, but how does having multiple threads really help this
problem?
I'm assuming you're talking about a map/reduce job, and not some specific client
code being run on a client outside of the cloud/cluster.
Yeah, a good way to think of it is that parallelism is achieved at the
application level.
On the input side, you can process multiple files in parallel, or one
file in parallel by logically splitting it and opening multiple readers
of the same file at multiple points. Each of these readers is single-threaded.
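A minimal sketch of that pattern against the HDFS client API (the path, reader count, and buffer size are illustrative, and record-boundary handling is omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelRead {
    public static void main(String[] args) throws Exception {
        final FileSystem fs = FileSystem.get(new Configuration());
        final Path file = new Path("/data/big.log");   // hypothetical file
        final long len = fs.getFileStatus(file).getLen();
        final int nReaders = 4;
        final long chunk = len / nReaders;

        Thread[] readers = new Thread[nReaders];
        for (int i = 0; i < nReaders; i++) {
            final long start = i * chunk;
            final long end = (i == nReaders - 1) ? len : start + chunk;
            readers[i] = new Thread(() -> {
                // Each thread is an independent, single-threaded sequential
                // reader positioned at its own offset in the same file.
                try (FSDataInputStream in = fs.open(file)) {
                    in.seek(start);
                    byte[] buf = new byte[64 * 1024];
                    long pos = start;
                    while (pos < end) {
                        int n = in.read(buf, 0, (int) Math.min(buf.length, end - pos));
                        if (n < 0) break;
                        pos += n;                      // process buf[0..n) here
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            readers[i].start();
        }
        for (Thread t : readers) t.join();
    }
}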
A quick reminder that there's one week left to submit your abstract for
this year's Surge Scalability Conference. The event is taking place on
Sept 30 and Oct 1, 2010 in Baltimore, MD. Surge focuses on case studies
that address production failures and the re-engineering efforts that led
to