An M/R job with a single reducer would do the job. This way you can
utilize the distributed sort and merge/combine/dedupe key/values as you
wish.
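For the archives, here is a minimal sketch of such a merge job under the
new API (the Text key/value types are an assumption; substitute your
actual types):

// Identity map/reduce with one reducer: every record passes through the
// distributed sort and lands in a single output SequenceFile.
// Assumes Text keys and values; adjust to your actual types.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SequenceFileMerge {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "merge-sequencefiles");
    job.setJarByClass(SequenceFileMerge.class);
    // The default Mapper and Reducer are identity in the new API,
    // so no setMapperClass/setReducerClass calls are needed.
    job.setNumReduceTasks(1); // one reducer => one output file
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    SequenceFileInputFormat.addInputPath(job, new Path(args[0]));
    SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}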
On 5/11/11, 丛林 wrote:
> Hi all,
>
> There are lots of SequenceFiles in HDFS; how can I merge them into one
> SequenceFile?
>
> Thanks for your suggestion.
>
> -L
Are you doing this as a MapReduce job or is it a simple linear
program? MapReduce could be much faster (a combine-files input format,
with a few reducers for merging if you need that as well).
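If it helps, with the combined-files route only a couple of driver lines
differ from a plain merge job. Sketch, assuming a Hadoop release that
ships CombineSequenceFileInputFormat (on older releases you would
subclass CombineFileInputFormat yourself):

import org.apache.hadoop.mapreduce.lib.input.CombineSequenceFileInputFormat;

// Relative to a plain merge driver, only these lines change:
job.setInputFormatClass(CombineSequenceFileInputFormat.class);
// Pack many small files into each map split, capped here at ~256 MB.
CombineSequenceFileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);
job.setNumReduceTasks(4); // a few reducers for merging, if needed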
On Thu, May 12, 2011 at 5:18 AM, 丛林 wrote:
> Hi, all.
>
> I want to write lots of little files (32GB) to
Hi all,
There are lots of SequenceFiles in HDFS; how can I merge them into one
SequenceFile?
Thanks for your suggestion.
-Lin
Hi, all.
I want to write lots of little files (32GB) to HDFS as
org.apache.hadoop.io.SequenceFile.
But now it is too slow: it takes us about 8 hours to create this
SequenceFile (6.7GB).
So I wonder how to create this SequenceFile faster?
Thanks for your suggestion.
Best Wishes,
-Lin
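For what it's worth, here is a minimal sketch of streaming many small
local files into one SequenceFile through a single block-compressed
writer; the BytesWritable payload and the key choice are illustrative
assumptions, not a prescription:

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class LittleFilesToSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // One writer for the whole run; BLOCK compression batches records
    // so many small values compress together.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]), Text.class, BytesWritable.class,
        SequenceFile.CompressionType.BLOCK);
    try {
      File[] inputs = new File(args[0]).listFiles();
      if (inputs == null) throw new IOException("not a directory: " + args[0]);
      for (File f : inputs) {
        byte[] bytes = new byte[(int) f.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(f));
        try { in.readFully(bytes); } finally { in.close(); }
        // Key = file name, value = raw contents (an illustrative choice).
        writer.append(new Text(f.getName()), new BytesWritable(bytes));
      }
    } finally {
      writer.close();
    }
  }
}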
All,
Thanks Harsh for your response. I have, however, solved the problem and I
shall now share.
The upshot is that the property in question, when I did the get(), was in fact
not null. It contained the full XML document. I thought it was either null or
a zero-length string because I was logging the
Hello Geoffry,
On Wed, May 11, 2011 at 8:40 PM, Geoffry Roberts
wrote:
> All,
>
> I am attempting to pass a string value from my driver to each one of my
> mappers and it is not working. I can set the value, but when I read it back
> it returns null. The value is not null when I set() it and I
All,
I am attempting to pass a string value from my driver to each one of my
mappers and it is not working. I can set the value, but when I read it back
it returns null. The value is not null when I set() it and I am using the
correct key when I attempt to get() it. This should be a simple, str
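For the archive, the usual pattern here is to stash the string in the
job Configuration; the classic pitfall is calling conf.set() after the
Job has been constructed, since Job copies the Configuration at that
point. A minimal sketch (the "my.app.xmldoc" key and the value are made
up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class PassStringExample {
  public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String xmlDoc;
    @Override
    protected void setup(Context context) {
      // Task side: read the value back from the job Configuration.
      xmlDoc = context.getConfiguration().get("my.app.xmldoc");
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Driver side: set the value BEFORE constructing the Job; a set()
    // made afterwards on this conf never reaches the tasks.
    conf.set("my.app.xmldoc", "<doc>...</doc>"); // made-up key and value
    Job job = new Job(conf, "pass-string-example");
    job.setJarByClass(PassStringExample.class);
    job.setMapperClass(MyMapper.class);
    // ... input/output paths and formats as usual ...
  }
}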
Hi all,
Why do we every now and then see a job remaining in the Running state with no
more Mappers or Reducers running, while the reduce progress tells us it's
99.99% done? Might this be due to a stranded process?
Cheers,
Evert
Thanks
On Tue, May 10, 2011 at 11:48 PM, Amar Kamat wrote:
> The property to set the maximum percentage of map task failures a job can
> tolerate is ‘mapred.max.map.failures.percent’ in the old API and
> ‘mapreduce.map.failures.maxpercent’ in the new API. This determines
> job failure.
> Amar
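A quick sketch of setting this programmatically in the old API (the 10%
here is an arbitrary example value):

import org.apache.hadoop.mapred.JobConf;

public class FailurePercentExample {
  public static void main(String[] args) {
    // Old API: let the job succeed even if up to 10% of its map
    // tasks fail.
    JobConf conf = new JobConf();
    conf.setMaxMapTaskFailuresPercent(10);
    // Equivalent to: conf.set("mapred.max.map.failures.percent", "10");
  }
}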