Re: are we able to decommission multi nodes at one time?

2013-04-02 Thread Yanbo Liang
Does that mean some replicas may stay in an under-replicated state? 2013/4/3 Azuryy Yu > bq. then namenode starts to copy block replicas on DN-2 to another DN, > supposed DN-2. > > Sorry for the typo. > > Correction: > then namenode starts to copy block replicas on DN-1 to another DN, > s

Re: MultipleInputs.addInputPath compile error in eclipse(indigo)

2013-04-02 Thread yypvsxf19870706
Hi, wow, thank you Liang. Sent from my iPhone. On 2013-4-2, 17:25, Yanbo Liang wrote: > You set the wrong parameter: NodeReducer.class should be a subclass of > Mapper rather than Reducer. > > > 2013/4/2 YouPeng Yang >> HI GUYS >> I want to use the org.apache.hadoop.mapreduce.lib.input.MultipleInput

Re: MapReduce on Local files

2013-04-02 Thread Harsh J
Not quite sure if I got your question. These tidbits may help though, from what I can understand: * LocalFileSystem's listing uses Java's APIs for file/dir listing, and has no concept of what a hidden file is on its own. It retrieves the whole list. * MR's FileInputFormat (and normal derivatives)
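
To make the FileInputFormat point concrete, here is a minimal sketch of a custom input path filter; the class name and the ".tmp" suffix are illustrative, not from the thread:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class SkipTmpFilter implements PathFilter {
    @Override
    public boolean accept(Path path) {
        // FileInputFormat's default filter already skips names starting
        // with "_" or "."; this one additionally skips a ".tmp" suffix.
        return !path.getName().endsWith(".tmp");
    }
}

// In the job driver:
// FileInputFormat.setInputPathFilter(job, SkipTmpFilter.class);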

Re: are we able to decommission multi nodes at one time?

2013-04-02 Thread Azuryy Yu
bq. then namenode starts to copy block replicas on DN-2 to another DN, supposed DN-2. Sorry for the typo. Correction: then namenode starts to copy block replicas on DN-1 to another DN, supposed DN-2. On Wed, Apr 3, 2013 at 9:51 AM, Azuryy Yu wrote: > It's different. > If you just want to st

Re: are we able to decommission multi nodes at one time?

2013-04-02 Thread Azuryy Yu
It's different. If you just want to stop DN-1 for a short time, just kill the DataNode process on DN-1, then do what you want. During this time, the namenode cannot receive the heartbeat from DN-1, so the namenode starts to copy block replicas on DN-2 to another DN, supposed DN-2. But when you start DN-1

Re: are we able to decommission multi nodes at one time?

2013-04-02 Thread Henry Junyoung Kim
@Harsh What are the reasons for the big gap between decommissioning nodes and just taking them down? In my understanding, both need to copy under-replicated blocks to other live nodes. If that is the main cost of both, the total elapsed time shouldn't be very different. Could you shar

Fwd: MapReduce on Local files

2013-04-02 Thread Mohammad Tariq
Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com -- Forwarded message -- From: Mohammad Tariq Date: Tue, Apr 2, 2013 at 5:16 PM Subject: MapReduce on Local files To: mapreduce-u...@hadoop.apache.org Hello list, Is a MR job capable of reading even

Basic hadoop MR question

2013-04-02 Thread jamal sasha
Hi, I have a quick question. I am trying to write MR code using Python. In the word count example: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/ The reducer: why can't I declare a dictionary (hashmap) in the reducer whose key is the word and whose value is a list o
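
For comparison, the Java API already groups all values for a key before each reduce() call, which is why no per-word dictionary is needed there; in streaming, the reducer's stdin is likewise sorted by key, so one word's counts arrive contiguously. A minimal word-count reducer sketch in Java:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        // The framework delivers all counts for one word together,
        // so a simple running sum suffices; no hashmap is required.
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        context.write(word, new IntWritable(sum));
    }
}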

hadoop datanode kernel build and HDFS multiplier factor

2013-04-02 Thread Marcel Mitsuto F. S.
Hi hadoopers, I just got my hands on ten servers (hp 2950 iii) that were upgraded by another set of servers, and these are the production grid servers. This is a grid to compute exographic metrics from webserver accesslogs, like geolocation, ISP, and all kinds of metrics related to our portal's aud

Re: hadoop clients

2013-04-02 Thread Marcel Mitsuto F. S.
Thank you for your answer! Sorry for this late response. I just got my hands on ten servers (hp 2950 iii) that were upgraded by another set of servers, and these are the production grid servers. This is a grid to compute exographic metrics from webserver accesslogs like geolocation, ISP, and all

Re: Multiple mappers, different parameter values

2013-04-02 Thread Abhinav M Kulkarni
Hi Mirko, Thanks for the reply. Yeah, that's one solution. -- Abhinav On 04/02/2013 10:04 AM, Mirko Kämpf wrote: Hi, I would add an id to the parameter name: "num.iterations.ID=something" If your mapper knows what ID it has it can just pick up this value from the

Re: Linux io scheduler

2013-04-02 Thread Chris Embree
I assume you're talking about the I/O scheduler. Based on the usual advice, only change this if you have a "smart" device between the OS and the drives; a SATA controller usually qualifies. I have our DataNodes set to NOOP to reduce the number of layers. As always, your mileage may vary and you should
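
For reference, the scheduler can be inspected and switched per block device at runtime on CentOS; a minimal sketch, assuming the device is sda:

# Show available schedulers; the active one appears in brackets.
cat /sys/block/sda/queue/scheduler
# Switch to noop at runtime (not persistent across reboots).
echo noop > /sys/block/sda/queue/scheduler
# To persist, add elevator=noop to the kernel line in the boot loader config.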

Linux io scheduler

2013-04-02 Thread Patai Sangbutsarakum
Hello Hadoopers, Has anyone ever played with the Linux I/O scheduler configuration on worker nodes? The default for CentOS is cfq, and there are three more to choose from. I wonder if tuning those schedulers might get us better performance? Hope this makes sense. Patai

Re: Multiple mappers, different parameter values

2013-04-02 Thread Mirko Kämpf
Hi, I would add an id to the parameter name: "num.iterations.ID=something" If your mapper knows what ID it has, it can just pick up this value from the context. But the question is: how does the mapper know its ID? Is it related to the input? Then it can be calculated, but this is a domain s
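
A minimal sketch of this approach for the MultipleInputs case described in the question below, where each mapper is a distinct class and therefore knows its own ID; the class and parameter names are illustrative:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapperA extends Mapper<LongWritable, Text, Text, Text> {
    private int numIterations;

    @Override
    protected void setup(Context context) {
        // This mapper's ID is "A", so it reads its own suffixed parameter,
        // set in the driver with conf.setInt("num.iterations.A", ...).
        numIterations = context.getConfiguration().getInt("num.iterations.A", 1);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... use numIterations here ...
    }
}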

Multiple mappers, different parameter values

2013-04-02 Thread Abhinav M Kulkarni
Hi, I have a situation wherein I have multiple mappers; I am using the MultipleInputs class to add them. Now I need to pass different parameters to different mappers. For example, let's say I have a parameter 'num.iterations' that is set differently for different mappers. One way to pass parameters

Re: Provide context to map function

2013-04-02 Thread Abhinav M Kulkarni
Thanks to all who replied. I was accidentally using the old API, hence could not find the context argument to the map function. This is solved. On 04/02/2013 01:20 AM, Dino Kečo wrote: You should check the MultipleInputs class, which enables you to have multiple input formats for the same mapper. Regards,

Re: Using Hadoop for codec functionality

2013-04-02 Thread Robert Fynes
Thanks for both your responses. I was indeed talking about developing a codec utility as the hadoop application itself. In particular, thanks to Bertrand for the lengthy response. I'm actually learning Hadoop at the moment, so I've been trying to find a suitable (very modestly sized) application f

Re: What is required to run hadoop daemons?

2013-04-02 Thread Nitin Pawar
I guess when you say source code you mean the Hadoop binary jar; if not, then you never need the Hadoop source code to run Hadoop daemons. You just need the binary jars. Yes, you will need to put the binaries on all the nodes. Ideally the Hadoop cluster is set up across the same unified environment. So i
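
Assuming the standard Hadoop 1.x layout, once the binaries and the same configuration are unpacked on a node, the daemons can be started there directly; a sketch:

# On the new node, from the Hadoop install directory:
bin/hadoop-daemon.sh start tasktracker
# And, if the node should also store HDFS blocks:
bin/hadoop-daemon.sh start datanode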

What is required to run hadoop daemons?

2013-04-02 Thread Agarwal, Nikhil
Hi, suppose I have the Hadoop source code and a JobTracker running on my Linux machine. If I want to start a TaskTracker daemon on another machine, do I need to put the Hadoop source code on that machine, or can I start the TaskTracker without the source code? Also, if my maste

Re: Is FileSystem thread-safe?

2013-04-02 Thread Matthew Farrellee
If you're interested in the semantics of FileSystem operations, have a look at HADOOP-9371 [0]. Depending on what you're trying to do, the thread-safety of a particular FS implementation in a single JVM instance may not be as important as the semantics you get across JVM instances. Best, matt

Job log location and retention

2013-04-02 Thread zheyi rong
Dear all, why have the logs (configuration and status) of my job disappeared from the JobTracker web UI? Specifically, I finished a job four days ago, but I cannot find it in the JobTracker web UI, neither on the homepage nor under "Job Tracker History" in the bottom-left corner. The clu

Re: MultipleInputs.addInputPath compile error in eclipse(indigo)

2013-04-02 Thread Yanbo Liang
You set the wrong parameter: NodeReducer.class should be a subclass of Mapper rather than Reducer. 2013/4/2 YouPeng Yang > HI GUYS > I want to use the > org.apache.hadoop.mapreduce.lib.input.MultipleInputs; > > > However, I get a compile error in my Eclipse (Indigo): > > public sta
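
For reference, the last argument of MultipleInputs.addInputPath must extend Mapper; a corrected sketch, where NodeMapper stands in for whatever Mapper class the code defines:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultipleInputsDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "multiple inputs example");
        // The fourth argument must be a Mapper subclass, not a Reducer.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, NodeMapper.class);
        // The Reducer is set on the Job itself instead.
        job.setReducerClass(NodeReducer.class);
    }
}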

MultipleInputs.addInputPath compile error in eclipse(indigo)

2013-04-02 Thread YouPeng Yang
HI GUYS I want to use the org.apache.hadoop.mapreduce.lib.input.MultipleInputs; However, I get a compile error in my Eclipse (Indigo): public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException { Configuration conf = new Configuration(); Str

Re: Finding mean and median python streaming

2013-04-02 Thread Yanbo Liang
How many reducers did you start for this job? If you start many reducers, the job will produce multiple output files named part-*, and each part holds only the local mean and median of that reducer's partition. Two kinds of solutions: 1. Call the method setNumReduceT
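
For a streaming job, the equivalent of setNumReduceTasks(1) is passed on the command line; a sketch with illustrative paths and script names:

# One reducer means a single part-00000 file holding the global mean/median.
hadoop jar contrib/streaming/hadoop-streaming.jar \
    -D mapred.reduce.tasks=1 \
    -input /data/in -output /data/out \
    -mapper mapper.py -reducer reducer.py \
    -file mapper.py -file reducer.py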

Re: are we able to decommission multi nodes at one time?

2013-04-02 Thread Henry Junyoung Kim
One more question: currently our cluster is decommissioning. Could I do the downtime work forcibly, without any safe-stop steps? On 2013-4-2, 5:37 PM, Harsh J wrote: > Yes, you can do the downtime work in steps of 2 DNs at a time, > especially since you mentioned the total work would be only

Re: are we able to decommission multi nodes at one time?

2013-04-02 Thread Harsh J
Yes, you can do the downtime work in steps of 2 DNs at a time, especially since you mentioned the total work would be only ~30 mins at most. On Tue, Apr 2, 2013 at 1:46 PM, Henry Junyoung Kim wrote: > the rest of the nodes that stay alive have enough space to store them. > > For this one that you've mentioned: >

Re: Provide context to map function

2013-04-02 Thread Dino Kečo
You should check the MultipleInputs class, which enables you to have multiple input formats for the same mapper. Regards, Dino On Apr 2, 2013 9:49 AM, "Yanbo Liang" wrote: > protected void map(KEYIN key, VALUEIN value, > Context context) throws IOException, > InterruptedException

Re: are we able to decommission multi nodes at one time?

2013-04-02 Thread Henry Junyoung Kim
The rest of the nodes that stay alive have enough space to store them. For this one that you've mentioned: > it's easier to do so in a rolling manner without need of a > decommission. To check my understanding: just shut down 2 of them, then 2 more, then 2 more, without decommissioning. Is this correct?

Re: are we able to decommission multi nodes at one time?

2013-04-02 Thread Harsh J
Note though that it's only possible to decommission 7 nodes at the same time and expect it to finish iff the remaining 8 nodes have adequate free space for the excess replicas. If you're just going to take them down for a short while (a few mins each), it's easier to do so in a rolling manner without
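
For anyone following the thread, the usual decommission mechanics of this era: list the hosts in the exclude file named by dfs.hosts.exclude in hdfs-site.xml, then refresh the namenode; a sketch with assumed paths and hostnames:

# hdfs-site.xml must already point dfs.hosts.exclude at this file.
echo "dn1.example.com" >> /etc/hadoop/conf/dfs.exclude
echo "dn2.example.com" >> /etc/hadoop/conf/dfs.exclude
# Tell the namenode to re-read the file and start decommissioning:
hadoop dfsadmin -refreshNodes
# Watch until the nodes report "Decommissioned":
hadoop dfsadmin -report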

Re: Provide context to map function

2013-04-02 Thread Yanbo Liang
protected void map(KEYIN key, VALUEIN value, Context context) throws IOException, InterruptedException { context.write((KEYOUT) key, (VALUEOUT) value); } Context is a parameter that the execution environment passes to the map() function. You can just use it in the map()
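
A minimal self-contained sketch of a new-API (org.apache.hadoop.mapreduce) mapper using that Context parameter:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IdentityLineMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Context carries the job configuration, counters, and the output
        // collector; write() emits a key/value pair to the next stage.
        context.write(key, value);
    }
}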

Re: are we able to decommission multi nodes at one time?

2013-04-02 Thread Yanbo Liang
It's reasonable to decommission 7 nodes at the same time, but it may also take a long time to finish, because all the replicas on these 7 nodes need to be copied to the remaining 8 nodes. The total amount of data transferred from these nodes to the remaining nodes is the same either way. 2013/4/2 Henry Junyoung Kim > :) >