Hi Nair,
Have you found out which class it is? I tried to find it but failed. I know 
NewDirectOutputCollector is used to write tmp files.


---Original---
From: "R Nair"<ravishankar.n...@gmail.com>
Date: 2017/1/30 13:32:04
To: 
"dev"<d...@spark.apache.org>;"user"<user@hadoop.apache.org>;"user"<u...@spark.apache.org>;
Subject: No Reducer scenarios


Dear all,



1) When we don't set the reducer class in the driver program, IdentityReducer is 
invoked.


2) When we call setNumReduceTasks(0), no reducer, not even IdentityReducer, is 
invoked.


Now, in the second scenario, we observed that the output files are named in the 
part-m-xx format (instead of part-r-xx), which indicates map output. But we know 
that map output is always written to the intermediate local file system. So 
which class is responsible for taking these intermediate map outputs from the 
local file system and writing them to HDFS? Does this class perform the write 
only when setNumReduceTasks is set to zero?
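
To make the naming difference concrete, here is a small illustrative sketch. It is not Hadoop code: the class PartFileNames and the helper partFileName are hypothetical, and it only assumes the convention observed above, namely part-m-NNNNN for a job with zero reduce tasks and part-r-NNNNN otherwise, with a five-digit partition number.

```java
import java.util.Locale;

// Illustrative model only; the class and method names are hypothetical.
class PartFileNames {

    // Builds an output file name following the part-<m|r>-<partition> pattern.
    // Assumption: 'm' is used when the job has zero reduce tasks, 'r' otherwise.
    static String partFileName(int numReduceTasks, int partition) {
        char taskType = (numReduceTasks == 0) ? 'm' : 'r';
        // Zero-pad the partition number to five digits, independent of locale.
        return String.format(Locale.ROOT, "part-%c-%05d", taskType, partition);
    }

    public static void main(String[] args) {
        // Map-only job: setNumReduceTasks(0)
        System.out.println(partFileName(0, 0));
        // Job with at least one reduce task
        System.out.println(partFileName(1, 0));
    }
}
```

With zero reduce tasks the first call yields a part-m name, and with one or more it yields a part-r name, matching scenario 2 and scenario 1 respectively.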


Best, Ravion
