Why single thread for HDFS?

2010-07-02 Thread elton sky
I guess this question was ignored, so I am posting it again. From my understanding, HDFS uses a single thread to do reads and writes. Since a file is composed of many blocks and each block is stored as a file in the underlying FS, we can do some parallelism at the block level. When read across

Re: Chaining Map-Reduce

2010-07-02 Thread abc xyz
Thanks Grant, but I want to do it without using any external library. Is there some way to have more than one reducer in a single job? From: Grant Mackey gmac...@cs.ucf.edu To: general@hadoop.apache.org Sent: Thu, July 1, 2010 4:15:57 PM Subject: Re:

Displaying Map output in MapReduce

2010-07-02 Thread abc xyz
Hi folks, I just want to display the output of my mapper on the console or log it to a file. System.out.println and Log4j's logger are not displaying the output. How can I see the intermediate results? Cheers

Re: Chaining Map-Reduce

2010-07-02 Thread abc xyz
One option is to use multiple chained jobs with a JobControl object. But can anyone tell me whether, in this case, the output of the first job will be written to disk? Actually, I want to minimize disk access. Cheers From: Fariha Atta fariha.atta...@gmail.com

RE: Why single thread for HDFS?

2010-07-02 Thread Segel, Mike
Actually they also listen here and this is a basic question... I'm not an expert, but how does having multiple threads really help this problem? I'm assuming you're talking about a map/reduce job and not some specific client code which is being run on a client outside of the cloud/cluster

Re: Why single thread for HDFS?

2010-07-02 Thread Jay Booth
Yeah, a good way to think of it is that parallelism is achieved at the application level. On the input side, you can process multiple files in parallel or one file in parallel by logically splitting and opening multiple readers of the same file at multiple points. Each of these readers is single
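
A minimal sketch of the "one file in parallel" idea described above, assuming a plain local file stands in for an HDFS file and Python threads stand in for map tasks. The helper names (split_ranges, read_range, parallel_read) are hypothetical and not part of any Hadoop API. Each reader opens its own handle, seeks to its split offset, and reads sequentially in a single thread, so the parallelism comes only from running several readers at once.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, num_splits):
    """Return (offset, length) pairs that cover [0, size) without overlap."""
    base, extra = divmod(size, num_splits)
    ranges, offset = [], 0
    for i in range(num_splits):
        # The first `extra` splits get one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def read_range(path, offset, length):
    """Open an independent reader, seek to the split start, read sequentially."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_read(path, num_splits=4):
    """Read one file through several single-threaded readers run concurrently."""
    ranges = split_ranges(os.path.getsize(path), num_splits)
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        # pool.map yields results in submission order, so chunks stay ordered.
        chunks = pool.map(lambda r: read_range(path, *r), ranges)
        return b"".join(chunks)


if __name__ == "__main__":
    # Hypothetical demo data written to a temporary local file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789" * 1000)
    try:
        print(len(parallel_read(tmp.name, num_splits=4)))
    finally:
        os.remove(tmp.name)
```

Splitting by raw byte offsets is the simplest choice for illustration. A record-oriented reader would additionally have to align each split to a record boundary, which this sketch does not attempt.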

CFP for Surge Scalability Conference 2010

2010-07-02 Thread Jason Dixon
A quick reminder that there's one week left to submit your abstract for this year's Surge Scalability Conference. The event is taking place on Sept 30 and Oct 1, 2010 in Baltimore, MD. Surge focuses on case studies that address production failures and the re-engineering efforts that led to