Renato,
Here's how file reading works in Hadoop (pretty detailed, but I actually
simplified a few spots in this description -- one can get pretty creative by
providing own implementations of certain interfaces).
First, a little about HDFS.
A file is physically stored in HDFS as a collection of b
Thanks for answering Daniel!
But there are a couple of things I don't quite get. For example, you say
that each mapper will be reading just configured amount of rows, but
wouldn't the mappers end up reading the whole file? And my other question is
if this is implemented, do you know in which classe
Sampling job in Pig is used in "order by" and "skewed join". It will be
translated to a single map-reduce job. In the map, we sample the data
with a configurable interval; in the reduce, we do a "group all"
followed by a nested foreach. Within foreach, we do a nested sort and
then feed the resu
Hey does anybody know if PIG-841 was developed? And if it was, how is it
being used by Pig?
Thanks in advance.
Renato M.