Custom s3 endpoint for s3n/s3a

2016-01-05 Thread Han JU
Hello, For test purposes we need to configure a custom S3 endpoint for s3n/s3a. More precisely, we need to test that Parquet correctly writes its content to S3. We've set up an s3rver instance, so the endpoint should be `http://s3rver:8000`. I've tried different methods but no luck so far. Things I've
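For the s3a connector, the endpoint is usually redirected through configuration properties; a sketch of the relevant `core-site.xml` entries for pointing s3a at a local s3rver instance (the values here are placeholders for the test setup, and `fs.s3a.path.style.access` may not exist in older Hadoop versions):

```xml
<!-- Point s3a at the local test endpoint instead of AWS. -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>http://s3rver:8000</value>
</property>
<!-- Local S3 stand-ins usually need path-style addressing
     (bucket in the path, not the hostname). -->
<property>
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
</property>
<!-- The test endpoint above is plain HTTP. -->
<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>false</value>
</property>
```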

question about combiner

2013-05-10 Thread Han JU
Hi, For a MapReduce job with lots of intermediate results between mapper and reducer, I implemented a combiner function that produces a more compact representation of the intermediate data, and I verified that the final result is correct when the combiner is used. But when I look at the job counter FILE_BYTES_WRITTEN or Reduce
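The idea in the question can be sketched outside Hadoop in plain Python: a combiner locally pre-aggregates map output so fewer records (and bytes) are spilled and shuffled. This is only an illustration of the mechanism, not the poster's actual job:

```python
from collections import Counter

def mapper(words):
    # Emit (word, 1) pairs, as a word-count mapper would.
    for w in words:
        yield (w, 1)

def combiner(pairs):
    # Compact representation: one (word, partial_sum) pair per key,
    # instead of one record per occurrence.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())

raw = list(mapper(["a", "b", "a", "a", "b"]))  # 5 intermediate records
combined = combiner(raw)                       # 2 records after combining
```

Whether this shrinkage shows up in a given counter depends on where the framework runs the combiner (it may be applied during spills and merges), which is exactly the kind of subtlety the question is probing.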

Set reducer capacity for a specific M/R job

2013-04-30 Thread Han JU
Hi, I want to change the cluster's capacity of reduce slots on a per-job basis. Originally I have 8 reduce slots per tasktracker. I did: conf.set("mapred.tasktracker.reduce.tasks.maximum", "4"); ... Job job = new Job(conf, ...) And in the web UI I can see that for this job, the max reduce tasks

Re: Set reducer capacity for a specific M/R job

2013-04-30 Thread Han JU
on tasktrackers, then you will need to edit the tasktracker conf and restart the tasktracker. On Apr 30, 2013 3:30 PM, Han JU ju.han.fe...@gmail.com wrote: Hi, I want to change the cluster's capacity of reduce slots on a per-job basis. Originally I have 8 reduce slots per tasktracker. I did: conf.set
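As this reply notes, the slot maximum is a daemon-side setting, not a job setting; a sketch of where it actually lives in Hadoop 1.x (in `mapred-site.xml` on each tasktracker, taking effect after a restart and capping concurrent reduce slots for the whole node, not per job):

```xml
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```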

Re: Set reducer capacity for a specific M/R job

2013-04-30 Thread Han JU
object you created: job.setNumMapTasks(). Note this is just a hint, and again the number will be decided by the input split size. On Tue, Apr 30, 2013 at 3:39 PM, Han JU ju.han.fe...@gmail.com wrote: Thanks Nitin. What I need is to set slots only for a specific job, not for the whole cluster

Re: Set reducer capacity for a specific M/R job

2013-04-30 Thread Han JU
basically, if I understand correctly, you want to limit the number of reducers executing in parallel only for this job? On Tue, Apr 30, 2013 at 4:02 PM, Han JU ju.han.fe...@gmail.com wrote: Thanks. In fact I don't want to set reducer or mapper numbers, they are fine. I want to set the reduce slot capacity

Re: M/R job optimization

2013-04-29 Thread Han JU
Ted mentioned. In this case, the straggler will be seen to be working on data. b) you have a hung process. This can be more difficult to diagnose, but indicates that there is a problem with your cluster. On Fri, Apr 26, 2013 at 2:21 AM, Han JU ju.han.fe...@gmail.com wrote: Hi, I've

M/R job optimization

2013-04-26 Thread Han JU
Hi, I've implemented an algorithm with Hadoop; it's a series of 4 jobs. My question is that in one of the jobs, map and reduce tasks show 100% finished in about 1m 30s, but I have to wait another 5m for the job to finish. This job writes about 720 MB of compressed data to HDFS with replication factor
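Since the post ties the trailing delay to writing ~720 MB with a given replication factor, one knob sometimes tried for intermediate job output is lowering replication for just that job. A hedged sketch (this trades durability for write time, and whether it helps depends on the cluster) — set in the job's `Configuration` before submission, e.g. `conf.set("dfs.replication", "2")` in the driver, or equivalently:

```xml
<!-- Applies to files this job writes, not to the cluster default. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```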

Re: Job launch from eclipse

2013-04-25 Thread Han JU
the parameters at run time, if any are fixed, and run it with Hadoop like: hadoop jar jarfilename.jar parameters *Thanks Regards* ∞ Shashwat Shriparv On Tue, Apr 23, 2013 at 6:51 PM, Han JU ju.han.fe...@gmail.com wrote: Hi, I'm getting my hands on hadoop. One thing I really want to know is how

Job launch from eclipse

2013-04-23 Thread Han JU
Hi, I'm getting my hands on Hadoop. One thing I really want to know is how you launch MR jobs in a development environment. I'm currently using Eclipse 3.7 with the Hadoop plugin from Hadoop 1.0.2. With this plugin I can manage HDFS and submit jobs to the cluster. But the strange thing is, every job
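Independent of the Eclipse plugin, the common path the replies point to is building a job jar and submitting it through the `hadoop` launcher; a minimal sketch (the jar name, main class, and paths are placeholders):

```shell
# Submit a packaged MapReduce job to the cluster configured in
# the local Hadoop conf; input/output are HDFS paths.
hadoop jar myjob.jar com.example.MyJob /user/ju/input /user/ju/output
```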