A block is a piece of a file. It does not (necessarily) have a meaning, or a
"file format", by itself. You would not address HDFS blocks individually
from this level. So I suppose the first answer is, no, they do not have
different formats, though the question is not well-formed.

You can have whatever you like in whatever HDFS file you want. Your
application (be it Mahout, or any MapReduce application) just needs to be
prepared to read it. If your input is a CSV file with a header line, only one
mapper will read the chunk containing that header line, and you don't know in
advance which mapper that will be. So, no, you would not construct a MapReduce
app that depends on all mappers seeing a header line, because they won't.

So yes, you will not see any Mahout job doing this, because it doesn't work.
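
If you do need to handle a CSV header, the usual approach is to drop it in the
mapper rather than expect every mapper to see it. Here is a rough sketch
against the new MapReduce API (the class name and the emitted key/value are
just placeholders, not anything Mahout-specific): with TextInputFormat the map
key is the byte offset of the line within the file, so only the line at offset
0 can be the header.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that ignores the header line of a CSV input.
public class CsvMapper extends Mapper<LongWritable, Text, Text, Text> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // With TextInputFormat the key is the byte offset of this line in the
    // file, so only the very first line of the file has key 0. Whichever
    // mapper happens to own that split simply skips it.
    if (key.get() == 0) {
      return;
    }
    String[] fields = value.toString().split(",");
    // Emit the first column as the key and the whole line as the value;
    // what you actually emit depends on your job.
    context.write(new Text(fields[0]), value);
  }
}

Note that the offset is per file, so with several input files each file's
first line is skipped, which is usually what you want when every file carries
its own header.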

On Wed, Jul 6, 2011 at 11:03 AM, Xiaobo Gu <guxiaobo1...@gmail.com> wrote:

> Hi,
> Does every block of a file in HDFS have to be in the same file format when
> writing map-reduce applications? A more specific question: when
> dealing with CSV files, can we have a header in the file? I have seen
> Mahout applications using the UCI repository file format, which is
> similar to CSV without a header. Is that because all map-reduce tasks
> must run identically, and having a header would make one map task
> different from the others?
>
> Regards,
>
> Xiaobo Gu
>
