Hi,
> I tried to use your CombineFileInputFormat implementation. However, I get
> the following exception:
>
> ‘not org.apache.hadoop.mapred.InputFormat’
>
> I am using Hadoop 2.4.1, and it looks like it expects the older interface,
> as it does not accept
> ‘org.apache.hadoop.mapreduce.lib.input.Co
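The exception above usually means a job driver written against the old `org.apache.hadoop.mapred` API is being handed an input format from the new `org.apache.hadoop.mapreduce` API; the two interfaces are not interchangeable. As a rough illustration (the helper below is hypothetical and not part of Hadoop), the two APIs can be told apart by package prefix:

```java
// Hypothetical helper (not part of Hadoop): a JobConf-based driver needs
// org.apache.hadoop.mapred.InputFormat implementations, while a
// mapreduce.Job driver needs org.apache.hadoop.mapreduce ones.
public class ApiCheck {
    static boolean isNewApi(String className) {
        return className.startsWith("org.apache.hadoop.mapreduce.");
    }

    public static void main(String[] args) {
        System.out.println(isNewApi(
            "org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat")); // true
        System.out.println(isNewApi(
            "org.apache.hadoop.mapred.TextInputFormat")); // false
    }
}
```

So an implementation from the `mapreduce.lib.input` package must be used with the new `Job` API, not with `JobConf`.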
Hi,
Is it not a good idea to model the key as the Text type?
I have a large number of sequence files that hold a bunch of key-value
pairs. I will read these seq files inside the map, hence my map needs only
the filenames. I believe that with CombineFileInputFormat the map will run
on the nodes where the data is already available.
…examples how to do that.
Yong
Date: Thu, 21 Aug 2014 22:26:12 +0530
Subject: Re: Hadoop InputFormat - Processing large number of small files
From: rab...@gmail.com
To: user@hadoop.apache.org
Hello,
Does this mean that a file with the names of all the files that need to be
processed is fed to Hadoop with NLineInputFormat?
If that is the case, then how can we ensure that map processes are
scheduled on the nodes where the blocks containing the files are already
stored?
regards
rab
If I were you, I'd first generate a file with those file names:
hadoop fs -ls > term_file
Then run the normal MapReduce job.
Felix
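Felix's suggestion amounts to building a manifest file with one file name per line, which is exactly the shape NLineInputFormat consumes. The sketch below is a local, plain-JDK stand-in (the directory and file names are made up for illustration; on a real cluster the listing would come from `hadoop fs -ls`):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.*;

public class BuildManifest {
    // Collect one file name per line: the manifest format a line-oriented
    // input format such as NLineInputFormat would then split across mappers.
    static List<String> listFileNames(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(p -> p.getFileName().toString())
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("seq_demo");   // made-up demo dir
        Files.createFile(dir.resolve("part-00000"));
        Files.createFile(dir.resolve("part-00001"));
        Path manifest = Files.createTempFile("term_file", "");
        Files.write(manifest, listFileNames(dir));          // one name per line
        System.out.println(listFileNames(dir)); // [part-00000, part-00001]
    }
}
```

Note that real `hadoop fs -ls` output includes permission and size columns, so in practice the listing usually needs an extra step to keep only the path column.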
On Aug 21, 2014, at 1:38 AM, rab ra wrote:
Thanks for the link. If CombineFileInputFormat is not required to provide
the contents of the files to the map process but only the filename, what
changes need to be made in the code?
rab.
On 20 Aug 2014 22:59, "Felix Chern" wrote:
I wrote a post on how to use CombineFileInputFormat:
http://www.idryman.org/blog/2013/09/22/process-small-files-on-hadoop-using-combinefileinputformat-1/
In the RecordReader constructor, you can get the context of which file you
are reading. In my example, I created FileLineWritable to include the
filename.
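For readers who cannot follow the link, the general shape of such a custom key can be sketched in plain Java. This mirrors the idea of a writable carrying the file name to the mapper, not the blog's exact class; in Hadoop it would additionally implement `org.apache.hadoop.io.Writable`, whose `write`/`readFields` signatures use only `java.io` types:

```java
import java.io.*;

// Sketch of a custom key that carries the file name (and an offset) to the
// mapper. In Hadoop this would implement org.apache.hadoop.io.Writable;
// the two methods below match that interface's signatures.
public class FileLineWritable {
    public String fileName;
    public long offset;

    public void write(DataOutput out) throws IOException {
        out.writeUTF(fileName);
        out.writeLong(offset);
    }

    public void readFields(DataInput in) throws IOException {
        fileName = in.readUTF();
        offset = in.readLong();
    }
}
```

A quick round-trip through a byte stream shows the serialization is symmetric, which is the contract Hadoop relies on when shuffling keys.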
Thanks for the response.
Yes, I know about WholeFileInputFormat, but I am not sure whether the
filename comes to the map process as either the key or the value. I think
this input format reads the contents of the file. I wish to have an
InputFormat that just gives the filename or a list of filenames.
Also, the files are very small.
Have you looked at the WholeFileInputFormat implementations? There are
quite a few if you search for them...
http://hadoop-sandy.blogspot.com/2013/02/wholefileinputformat-in-java-hadoop.html
https://github.com/tomwhite/hadoop-book/blob/master/ch07/src/main/java/WholeFileInputFormat.java
Regards,
Shah
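The core of those WholeFileInputFormat implementations is a record reader that emits one record per file: the file name (or path) as the key and the entire content as the value. Stripped of the Hadoop plumbing, that reduces to something like the following plain-JDK sketch (not either linked implementation verbatim):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.AbstractMap;
import java.util.Map;

public class WholeFileRead {
    // One record per file: key = file name, value = the file's full contents.
    // Reading everything into memory only works because, as noted in the
    // thread, the files are very small.
    static Map.Entry<String, byte[]> readWhole(Path file) throws IOException {
        return new AbstractMap.SimpleEntry<>(
            file.getFileName().toString(), Files.readAllBytes(file));
    }
}
```

In a real WholeFileInputFormat, `isSplitable` also returns false so that each file becomes exactly one split, matching the non-splittable requirement below.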
Hello,
I have a use case wherein I need to process a huge set of files stored in
HDFS. Those files are non-splittable and need to be processed as a whole.
Here, I have the following questions, for which I need answers to proceed
further:
1. I wish to schedule the map process on the task tra