Thanks very much for the prompt reply! It makes perfect sense. I'll give
it a try.
Grace
-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, August 18, 2011 10:03 AM
To: common-dev@hadoop.apache.org
Subject: Re: question about file input format
Grace,
In
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Wednesday, August 17, 2011 9:36 PM
> To: common-dev@hadoop.apache.org
> Subject: Re: question about file input format
>
> Zhixuan,
>
> You'll require two things here, as you've deduced correctly:
>
> Under InputFormat
>
o read the file to memory right? How should I implement the next
function accordingly?
Thanks again,
Grace
-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Wednesday, August 17, 2011 9:36 PM
To: common-dev@hadoop.apache.org
Subject: Re: question about file input format
Zhixuan,
You'll require two things here, as you've deduced correctly:
Under InputFormat
- isSplitable -> False
- getRecordReader -> A simple implementation that reads the whole
file's bytes to an array/your-construct and passes it (as part of
next(), etc.).
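
To make the next() contract concrete, here is a rough, Hadoop-free sketch of the "whole file as one record" idea. It is only an illustration under assumptions: WholeFileReader and BytesHolder are hypothetical names, BytesHolder merely stands in for a Writable value type, and nothing here uses the real Hadoop API. In an actual job the same logic would live in the RecordReader returned by getRecordReader, with isSplitable returning false so the file arrives as a single split.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical value holder; stands in for a Hadoop Writable value type. */
final class BytesHolder {
    byte[] bytes = new byte[0];
}

/**
 * Hadoop-free sketch of a reader that treats one file as exactly one record.
 * The class name and method shapes are assumptions made for illustration.
 */
final class WholeFileReader {
    private final Path file;
    private boolean processed = false;

    WholeFileReader(Path file) {
        this.file = file;
    }

    /**
     * First call: loads the whole file into value and returns true.
     * Every later call: returns false, signalling "no more records".
     */
    boolean next(BytesHolder value) throws IOException {
        if (processed) {
            return false;
        }
        value.bytes = Files.readAllBytes(file); // entire file held in memory
        processed = true;
        return true;
    }

    /** There is only one record, so progress is either 0 or 1. */
    float getProgress() {
        return processed ? 1.0f : 0.0f;
    }
}
```

The point of the processed flag is that the caller keeps invoking next() until it returns false, so the reader has to hand over the file once and then stop. Since the whole file is read into memory, this only suits files that fit comfortably in the task's heap.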
For example, here's a simple record re
What file format do you want to use?
If it's Text or SequenceFile, or any other existing derivative of
FileInputFormat, just override isSplitable and rely on the actual RecordReader.
Arun
On Aug 17, 2011, at 3:58 PM, Zhixuan Zhu wrote:
> I'm new to Hadoop and currently using Hadoop 0.20.2 to tr