Hello,
For testing purposes we need to configure a custom S3 endpoint for s3n/s3a.
More precisely, we need to test that Parquet writes its content to S3
correctly.
We've set up s3rver, so the endpoint should be `http://s3rver:8000`. I've
tried different methods but no luck so far.
Things I've
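For reference, a custom endpoint is usually wired in through the client configuration. A sketch for core-site.xml, assuming the standard s3a property names (the `s3rver:8000` endpoint is taken from above; check the exact property names against your Hadoop version, and note that fake-S3 servers typically need path-style access):

```xml
<!-- core-site.xml sketch: point s3a at a local fake-S3 endpoint -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>http://s3rver:8000</value>
</property>
<property>
  <!-- most fake-S3 servers require path-style rather than virtual-host addressing -->
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
</property>
<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>false</value>
</property>
```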
Hi,
For a MapReduce job with lots of intermediate results between mapper and
reducer, I implemented a combiner function with a more compact representation
of the result data, and I verified that the final result is correct when
using the combiner. But when I look at the job counter FILE_BYTES_WRITTEN or
Reduce
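A drop in those counters is expected: a combiner pre-aggregates map output before it is spilled and shuffled, so fewer bytes are written while the final result stays the same (provided the combine function is associative and commutative). A plain-Java sketch, with no Hadoop dependency and hypothetical word-count values, of why a sum combiner is safe:

```java
import java.util.Arrays;

public class CombinerSketch {
    // The reduce function is a plain sum; a sum is associative and
    // commutative, so it can double as a combiner.
    static int reduce(int[] values) {
        return Arrays.stream(values).sum();
    }

    // Without a combiner: every map output record is shuffled to the reducer.
    static int withoutCombiner(int[][] perMapperOutputs) {
        return Arrays.stream(perMapperOutputs)
                     .flatMapToInt(Arrays::stream)
                     .sum();
    }

    // With a combiner: each mapper ships one partial sum instead of all of
    // its records, so far fewer bytes are written and shuffled.
    static int withCombiner(int[][] perMapperOutputs) {
        int[] partials = Arrays.stream(perMapperOutputs)
                               .mapToInt(CombinerSketch::reduce)
                               .toArray();
        return reduce(partials); // the reducer sees only the partials
    }

    public static void main(String[] args) {
        int[][] mapOut = {{1, 1, 1}, {1, 1}, {1, 1, 1, 1}}; // hypothetical counts
        System.out.println(withoutCombiner(mapOut)); // 9 (9 records shuffled)
        System.out.println(withCombiner(mapOut));    // 9 (only 3 partials shuffled)
    }
}
```

Same answer either way; only the volume of intermediate data changes, which is exactly what the counters measure.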
Hi,
I want to change the cluster's capacity of reduce slots on a per-job basis.
Originally I have 8 reduce slots per tasktracker.
I did:
conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
...
Job job = new Job(conf, ...);
And in the web UI I can see that for this job, the max reduce tasks
on tasktrackers, then you will need to
edit the tasktracker conf and restart the tasktracker.
On Apr 30, 2013 3:30 PM, Han JU ju.han.fe...@gmail.com wrote:
Hi,
I want to change the cluster's capacity of reduce slots on a per job
basis. Originally I have 8 reduce slots for a tasktracker.
I did:
conf.set
object you created
job.setNumMapTasks()
Note this is just a hint and again the number will be decided by the input
split size.
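As noted, the hint can be overridden: the actual map count comes from the input splits. A back-of-the-envelope sketch, assuming the Hadoop 1.x default 64 MB split size (substitute your own dfs.block.size):

```java
public class SplitCount {
    // Number of map tasks is roughly ceil(inputSize / splitSize).
    static long numSplits(long inputBytes, long splitBytes) {
        return (inputBytes + splitBytes - 1) / splitBytes; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // e.g. a 720 MB input with 64 MB splits -> 12 map tasks,
        // regardless of what setNumMapTasks() hinted.
        System.out.println(numSplits(720 * mb, 64 * mb)); // 12
    }
}
```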
On Tue, Apr 30, 2013 at 3:39 PM, Han JU ju.han.fe...@gmail.com wrote:
Thanks Nitin.
What I need is to set the slots only for a specific job, not for the whole
cluster.
Basically, if I understand correctly,
you want to limit the number of reducers running in parallel only for this job?
On Tue, Apr 30, 2013 at 4:02 PM, Han JU ju.han.fe...@gmail.com wrote:
Thanks.
In fact I don't want to set the reducer or mapper numbers; they are fine.
I want to set the reduce slot capacity
Ted mentioned. In this case, the
straggler will be seen to be working on data.
b) you have a hung process. This can be more difficult to diagnose, but
indicates that there is a problem with your cluster.
On Fri, Apr 26, 2013 at 2:21 AM, Han JU ju.han.fe...@gmail.com wrote:
Hi,
I've implemented an algorithm with Hadoop; it's a series of 4 jobs. My
question is that in one of the jobs, the map and reduce tasks show 100% finished
in about 1m 30s, but I have to wait another 5m for this job to finish.
This job writes about 720 MB of compressed data to HDFS with replication factor
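The tail after 100% is often the output commit plus HDFS replicating the final files. If the intermediate jobs in the chain don't need durability, one option worth testing is lowering the replication factor for their output. A per-job configuration sketch (verify the property name against your Hadoop version):

```xml
<!-- sketch: replicate intermediate job output only once
     instead of the cluster default of 3 -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```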
the parameters at run time, if
any are fixed, and run it with hadoop like: hadoop jar jarfilename.jar parameters
*Thanks Regards*
∞
Shashwat Shriparv
On Tue, Apr 23, 2013 at 6:51 PM, Han JU ju.han.fe...@gmail.com wrote:
Hi,
I'm getting my hands on Hadoop. One thing I really want to know is how you
launch MR jobs in a development environment.
I'm currently using Eclipse 3.7 with the Hadoop plugin from Hadoop 1.0.2. With
this plugin I can manage HDFS and submit jobs to the cluster. But the strange
thing is, every job
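When jobs launched from an IDE behave strangely, a common cause is that the client-side configuration decides where the job runs: in Hadoop 1.x, if the client cannot see the cluster addresses, the job silently falls back to the LocalJobRunner inside Eclipse. A sketch of the two client properties involved (hostnames and ports here are placeholders for your cluster's):

```xml
<!-- sketch: client-side settings so submitted jobs go to the
     cluster instead of the in-process LocalJobRunner -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker:9001</value>
</property>
```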