RE: question about file input format

2011-08-18 Thread Zhixuan Zhu
Thanks very much for the prompt reply! It makes perfect sense. I'll give it a try. Grace -----Original Message----- From: Harsh J [mailto:ha...@cloudera.com] Sent: Thursday, August 18, 2011 10:03 AM To: common-dev@hadoop.apache.org Subject: Re: question about file input format Grace, In

Re: question about file input format

2011-08-18 Thread Harsh J
ailto:ha...@cloudera.com] > Sent: Wednesday, August 17, 2011 9:36 PM > To: common-dev@hadoop.apache.org > Subject: Re: question about file input format > > Zhixuan, > > You'll require two things here, as you've deduced correctly: > > Under InputFormat >

RE: question about file input format

2011-08-18 Thread Zhixuan Zhu
o read the file to memory, right? How should I implement the next function accordingly? Thanks again, Grace -----Original Message----- From: Harsh J [mailto:ha...@cloudera.com] Sent: Wednesday, August 17, 2011 9:36 PM To: common-dev@hadoop.apache.org Subject: Re: question about file input forma

Re: question about file input format

2011-08-17 Thread Harsh J
Zhixuan, You'll require two things here, as you've deduced correctly: Under InputFormat - isSplitable -> False - getRecordReader -> A simple implementation that reads the whole file's bytes to an array/your-construct and passes it (as part of next(), etc.). For example, here's a simple record re
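The "read the whole file and hand it over in next()" idea described in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: WholeFileReader and BytesHolder are hypothetical names, the class uses plain java.nio rather than Hadoop's FileSystem, and it is a stand-in that mirrors the shape of a next() call, not Hadoop's actual RecordReader interface. It also assumes the file fits in memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in, NOT Hadoop's RecordReader: it only mirrors the
// "fill the value and return true, then return false when done" shape of next().
class WholeFileReader {
    // Mutable holder the caller passes in and the reader fills.
    static final class BytesHolder {
        byte[] bytes = new byte[0];
    }

    private final Path file;
    private boolean processed = false; // becomes true once the single record is emitted

    WholeFileReader(Path file) {
        this.file = file;
    }

    // Emits the entire file as one record on the first call, then reports end of input.
    boolean next(BytesHolder value) throws IOException {
        if (processed) {
            return false;
        }
        value.bytes = Files.readAllBytes(file); // assumption: the file fits in memory
        processed = true;
        return true;
    }

    // Progress is all-or-nothing because there is exactly one record.
    float getProgress() {
        return processed ? 1.0f : 0.0f;
    }
}
```

The point of the processed flag is that a caller looping on next() sees exactly one record per file. Combined with an input format that reports the file as not splittable, each file would then reach a single map call as one value.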

Re: question about file input format

2011-08-17 Thread Arun C Murthy
What file format do you want to use? If it's Text or SequenceFile, or any other existing derivative of FileInputFormat, just override isSplitable and rely on the actual RecordReader. Arun On Aug 17, 2011, at 3:58 PM, Zhixuan Zhu wrote: > I'm new to Hadoop and currently using Hadoop 0.20.2 to tr