Re: how to control (or understand) the memory usage in hdfs

2013-03-24 Thread Ted
oh, really? ulimit -n is 2048; I'd assumed that would be sufficient for just testing on my machine. I was going to use 4096 in production. My hdfs-site.xml has dfs.datanode.max.xcievers set to 4096. As for my logs, there are a lot of INFO entries; I haven't gotten around to configuring it down
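
For reference, the transceiver limit mentioned here lives in hdfs-site.xml (the property name really is spelled "xcievers" in Hadoop 1.x). A fragment matching the value discussed in this thread, as a sketch:

```xml
<!-- hdfs-site.xml: value from this thread; tune to your cluster -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```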

Best practices to learn Hadoop for new users

2013-03-24 Thread suraj nayak

Re: DistributedCache - why not read directly from HDFS?

2013-03-24 Thread Alberto Cordioli
Thanks for your reply Harsh. So if I want to read a simple text file, choosing between DistributedCache and HDFS becomes just a matter of performance. Alberto On 23 March 2013 16:17, Harsh J ha...@cloudera.com wrote: A DistributedCache is not used just to distribute simple files but

Child JVM memory allocation / Usage

2013-03-24 Thread nagarjuna kanamarlapudi
Hi, I configured my child JVM heap to 2 GB. So, I thought I could really read 1.5 GB of data and store it in memory (mapper/reducer). I wanted to confirm the same and wrote the following piece of code in the configure method of the mapper: @Override public void configure(JobConf job) {

2 Reduce method in one Job

2013-03-24 Thread Fatih Haltas
I want to get the reduce output as key and value, then pass them to a new reduce as the input key and input value. So is there any Map-Reduce-Reduce kind of method? Thanks to all.

Re: 2 Reduce method in one Job

2013-03-24 Thread Azuryy Yu
There isn't such a method; you have to submit another MR job. On Mar 24, 2013 9:03 PM, Fatih Haltas fatih.hal...@nyu.edu wrote: I want to get reduce output as key and value then I want to pass them to a new reduce as input key and input value. So is there any Map-Reduce-Reduce kind of method?

Re: 2 Reduce method in one Job

2013-03-24 Thread Harsh J
You seem to want to re-sort/partition your data without materializing it onto HDFS. Azuryy is right: there isn't a way right now, and a second job (with an identity mapper) is necessary. With YARN, though, this becomes more feasible to implement in the project. The newly inducted incubator project

Re: 2 Reduce method in one Job

2013-03-24 Thread Fatih Haltas
Thank you very much. You are right Harsh, it is exactly what I am trying to do. I want to process my result according to the keys, and I do not want to spend time writing this data to HDFS; I want to pass the data as input to another reduce. One more question then: creating 2 different jobs, the second one has

Re: 2 Reduce method in one Job

2013-03-24 Thread Harsh J
Yes, just use an identity mapper (in new API, the base Mapper class itself identity-maps, in the old API use IdentityMapper class) and set the input path as the output path of the first job. If you'll be ending up doing more such step-wise job chaining, consider using Apache Oozie's workflow
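
Harsh's suggestion (a second job whose map phase is the identity) can be illustrated without a cluster. The sketch below is a plain-JDK simulation, not the Hadoop API: `round` applies a mapper to each pair, groups by key (the "shuffle"), and sums values (the reduce); chaining a second round through an identity mapper re-groups the first round's output by its new keys, which is exactly what the identity-mapper job does on a real cluster.

```java
import java.util.*;
import java.util.function.*;

// Plain-JDK sketch of "Map-Reduce-Reduce" via a second round with an
// identity mapper. Names here are illustrative, not Hadoop API.
public class ChainSketch {
    // One simulated MR round over (key, count) pairs: apply mapper to each
    // pair, group by key (the "shuffle"), then sum values (the reduce).
    static Map<String, Integer> round(Map<String, Integer> input,
            UnaryOperator<Map.Entry<String, Integer>> mapper) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : input.entrySet()) {
            Map.Entry<String, Integer> m = mapper.apply(e);
            out.merge(m.getKey(), m.getValue(), Integer::sum); // reduce step
        }
        return out;
    }

    // Identity mapper: emits each pair unchanged, as Hadoop's base Mapper
    // (new API) or IdentityMapper (old API) does.
    static Map.Entry<String, Integer> identity(Map.Entry<String, Integer> e) {
        return e;
    }

    // "Reduce then Reduce": round 1 re-keys by word length; round 2 uses the
    // identity mapper so the shuffle re-groups round 1's output keys.
    static Map<String, Integer> chain(Map<String, Integer> wordCounts) {
        Map<String, Integer> first = round(wordCounts,
            e -> Map.entry("len:" + e.getKey().length(), e.getValue()));
        return round(first, ChainSketch::identity);
    }
}
```

The second `round` call is the analogue of the second job: the identity mapper adds nothing, it only exists so the framework's shuffle can re-partition by the new keys.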

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread பாலாஜி நாராயணன்
Are you running balancer? If balancer is running and if it is slow, try increasing the balancer bandwidth On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote: Thanks for the follow up. I don't know whether attachment will pass through this mailing list, but I am attaching a

Re: question for committer

2013-03-24 Thread பாலாஜி நாராயணன்
is there a reason why you don't want to run MRv2 under YARN? On 22 March 2013 22:49, Azuryy Yu azury...@gmail.com wrote: is there a way to separate hdfs2 from hadoop2? I want to use hdfs2 and mapreduce 1.0.4, excluding yarn, because I need HDFS-HA. -- http://balajin.net/blog

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi
Yes, we are running the balancer, though a balancer process runs for almost a day or more before exiting and starting over. The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If it is in bits then we have a

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread பாலாஜி நாராயணன்
-setBalancerBandwidth takes bandwidth in bytes per second, so the value is bytes per second. If it is running and exiting, it means it has completed the balancing. On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, we are running balancer, though a balancer process runs for

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi
Yes, thanks for pointing that out, but I already know that it is completing the balancing when exiting; otherwise it shouldn't exit. Your answer doesn't solve the problem I mentioned earlier in my message: 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster, even

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Jamal B
On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same sets of drives, or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably multiplying the amount of available capacity incorrectly, in that it
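
For context, dfs.data.dir takes a comma-separated list of directories, and capacity accounting assumes each entry sits on its own disk. A 1-1 layout (one folder per drive, paths hypothetical) would look like:

```xml
<!-- hdfs-site.xml: one data directory per physical drive (paths illustrative) -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/disk1/dfs,/data/disk2/dfs,/data/disk3/dfs</value>
</property>
```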

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi
Thanks. We have a 1-1 configuration of drives and folders in all the datanodes. -Tapas On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote: On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set's of drives or is it 1-1 between

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Jamal B
Then I think the only way around this would be to decommission the smaller nodes one at a time, and ensure that the blocks are moved to the larger nodes. Once complete, bring the smaller nodes back in, but maybe only after you tweak the rack topology to match your disk layout more than

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Alexey Babutin
You said that threshold=10. Run the command manually: hadoop balancer -threshold 9.5, then 9, and so on in 0.5 steps. On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, thanks for pointing, but I already know that it is completing the balancing when exiting otherwise it
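
The stepping suggestion could be scripted. This is only a sketch with hardcoded steps, and it invokes the balancer only if the hadoop command is actually on the PATH:

```shell
#!/bin/sh
# Step the balancer threshold down from 9.5% in 0.5% decrements (sketch).
THRESHOLDS="9.5 9.0 8.5 8.0"
for t in $THRESHOLDS; do
    if command -v hadoop >/dev/null 2>&1; then
        # Each invocation blocks until balancing at this threshold completes.
        hadoop balancer -threshold "$t"
    else
        echo "would run: hadoop balancer -threshold $t"
    fi
done
```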

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi
Hi, Thanks for the idea, I will give this a try and report back. My worry is: if we decommission a small node (one at a time), will it move the data to larger nodes or choke other smaller nodes? In principle it should distribute the blocks; the point is it is not distributing the way we

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi
On Mar 24, 2013, at 3:40 PM, Alexey Babutin zorlaxpokemon...@gmail.com wrote: You said that threshold=10. Run the command manually: hadoop balancer -threshold 9.5, then 9, and so on in 0.5 steps. We are not setting the threshold anywhere in our configuration and are thus using the default, which I

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Alexey Babutin
I think that may help, but start from 1 node and watch where the data has moved. On Mon, Mar 25, 2013 at 12:44 AM, Tapas Sarangi tapas.sara...@gmail.com wrote: Hi, Thanks for the idea, I will give this a try and report back. My worry is if we decommission a small node (one at a time), will it

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Jamal B
It shouldn't cause further problems, since most of your small nodes are already at their capacity. You could set or increase the dfs reserved property on your smaller nodes to force the flow of blocks onto the larger nodes. On Mar 24, 2013 4:45 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Hi,
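
The "dfs reserved" property referred to here is dfs.datanode.du.reserved (bytes per volume reserved for non-DFS use); raising it on the small nodes shrinks the capacity the namenode will try to fill there. A fragment with an illustrative value:

```xml
<!-- hdfs-site.xml on the smaller datanodes; 50 GB here is illustrative -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>53687091200</value> <!-- 50 * 1024^3 bytes -->
</property>
```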

Re: question for committer

2013-03-24 Thread Azuryy Yu
Good question; I just want HA, and don't want to change more configuration. On Mar 25, 2013 2:32 AM, Balaji Narayanan (பாலாஜி நாராயணன்) li...@balajin.net wrote: is there a reason why you dont want to run MRv2 under yarn? On 22 March 2013 22:49, Azuryy Yu azury...@gmail.com wrote: is there a way

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi
Thanks. Does this need a restart of hadoop on the nodes where this modification is made? - On Mar 24, 2013, at 8:06 PM, Jamal B jm151...@gmail.com wrote: dfs.datanode.du.reserved You could tweak that param on the smaller nodes to force the flow of blocks to other nodes. A short

Re: Child JVM memory allocation / Usage

2013-03-24 Thread Ted
Did you set the min heap size == your max heap size? If you didn't, free memory only shows you the difference between used and committed, not used and max. On 3/24/13, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: Hi, I configured my child jvm heap to 2 GB. So, I thought I
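
Ted's point can be checked in plain Java: Runtime.freeMemory() is measured against the currently committed heap (totalMemory()), which floats between -Xms and -Xmx. Unless -Xms equals -Xmx, the real headroom is maxMemory() minus used memory, not freeMemory(). A small self-contained sketch:

```java
// Demonstrates why Runtime.freeMemory() understates headroom when the
// committed heap (totalMemory) is smaller than the max heap (maxMemory).
public class HeapCheck {
    // Bytes currently in use on the committed heap.
    static long used() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // True headroom: distance from current usage to the -Xmx ceiling.
    static long headroom() {
        return Runtime.getRuntime().maxMemory() - used();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("max=%d committed=%d free=%d headroom=%d%n",
                rt.maxMemory(), rt.totalMemory(), rt.freeMemory(), headroom());
    }
}
```

Run with -Xms2048m -Xmx2048m and free and headroom converge, which is why Ted asks about min == max.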

Re: Child JVM memory allocation / Usage

2013-03-24 Thread nagarjuna kanamarlapudi
Hi Ted, As far as I can recollect, I only configured these parameters property namemapred.child.java.opts/name value-Xmx2048m/value descriptionthis number is the number of megabytes of memory that each mapper and each reducers will have available to use. If jobs start running out
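
This snippet appears to be a flattened mapred-site.xml fragment; reconstructed (description truncated where the original message cuts off), it would read roughly as:

```xml
<!-- mapred-site.xml: reconstructed from the flattened snippet above -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
  <description>this number is the number of megabytes of memory that each
  mapper and each reducer will have available to use. If jobs start running
  out ...</description>
</property>
```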

[no subject]

2013-03-24 Thread Fan Bai
Dear Sir, I have a question about Hadoop: when I use Hadoop and MapReduce to finish a job (only one job here), can I control which node a file is processed on? For example, I have only one job and this job has 10 files (10 mappers need to run). Also, among my servers, I have one head node and four

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Jamal B
Yes On Mar 24, 2013 9:25 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Thanks. Does this need a restart of hadoop in the nodes where this modification is made ? - On Mar 24, 2013, at 8:06 PM, Jamal B jm151...@gmail.com wrote: dfs.datanode.du.reserved You could tweak that param on

Re: is it possible to disable security in MapReduce to avoid having PriviledgedActionException?

2013-03-24 Thread Harsh J
What is the exact error you're getting? Can you please paste with the full stack trace and your version in use? Many times the PriviledgedActionException is just a wrapper around the real cause and gets overlooked. It does not necessarily appear due to security code (whether security is enabled

Re: Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread see1230
If the balancer is not running, or runs with a low bandwidth and slow reaction, I think there may be a significant asymmetry between datanodes. At 2013-03-25 04:37:05, Jamal B jm151...@gmail.com wrote: Then I think the only way around this would be to decommission 1 at a time, the smaller

Hadoop-2.x native libraries

2013-03-24 Thread Azuryy Yu
Hi, How do I get the hadoop-2.0.3-alpha native libraries? In the currently released package they were compiled under a 32-bit OS.

Re: Hadoop-2.x native libraries

2013-03-24 Thread Harsh J
If you're using a tarball, you'll need to build a native-added tarball yourself with mvn package -Pdist,native,docs -DskipTests -Dtar and then use that. Alternatively, if you're interested in packages, use the Apache Bigtop's scripts from http://bigtop.apache.org/ project's repository and

Re: Child JVM memory allocation / Usage

2013-03-24 Thread Harsh J
The MapTask may consume some memory of its own as well. What is your io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to? On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: Hi, I configured my child jvm heap to 2 GB. So, I thought I could

Re: Child JVM memory allocation / Usage

2013-03-24 Thread nagarjuna kanamarlapudi
io.sort.mb = 256 MB On Monday, March 25, 2013, Harsh J wrote: The MapTask may consume some memory of its own as well. What is your io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to? On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com
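
Harsh's question matters because the sort buffer comes out of the same child heap: with -Xmx2048m and io.sort.mb=256, roughly 256 MB of the 2 GB is claimed by the map-side sort buffer before any user data, so holding a full 1.5 GB object graph in memory leaves little slack. A fragment lowering it for memory-hungry mappers (the 128 value is illustrative):

```xml
<!-- mapred-site.xml (MR1; in MR2 the key is mapreduce.task.io.sort.mb) -->
<property>
  <name>io.sort.mb</name>
  <value>128</value> <!-- illustrative; the MR1 default is 100 -->
</property>
```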

shuffling one intermediate pair to more than one reducer

2013-03-24 Thread Vikas Jadhav
Hello, I have a use case where I want to shuffle the same pair to more than one reducer. Has anyone tried this, or can anyone suggest how to implement it? I have created a JIRA for the same: https://issues.apache.org/jira/browse/MAPREDUCE-5063 Thank you. -- Thanx and Regards, Vikas Jadhav

Re: MapReduce Failed and Killed

2013-03-24 Thread Hemanth Yamijala
Any MapReduce task needs to communicate with the tasktracker that launched it periodically in order to let the tasktracker know it is still alive and active. The time for which silence is tolerated is controlled by a configuration property mapred.task.timeout. It looks like in your case, this has
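
The timeout Hemanth describes is configurable: a task that legitimately goes quiet for long stretches can either report progress periodically or have the limit raised. A fragment with an illustrative value (milliseconds; 600000, i.e. 10 minutes, is the usual default):

```xml
<!-- mapred-site.xml: kill tasks that report no progress for longer than this (ms) -->
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value> <!-- 20 minutes; illustrative -->
</property>
```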

Re: Hadoop-2.x native libraries

2013-03-24 Thread Azuryy Yu
Thanks Harsh! I used -Pnative and got it; I am compiling the source code. I made MRv1 work with HDFSv2 successfully. On Mar 25, 2013 12:56 PM, Harsh J ha...@cloudera.com wrote: If you're using a tarball, you'll need to build a native-added tarball yourself with mvn package -Pdist,native,docs -DskipTests

Any answer ? Candidate application for map reduce

2013-03-24 Thread AMARNATH, Balachandar
Any answers from any one of you? :) Regards Bala From: AMARNATH, Balachandar [mailto:balachandar.amarn...@airbus.com] Sent: 22 March 2013 10:25 To: user@hadoop.apache.org Subject: Candidate application for map reduce Hello, I am looking for a sample application (preferably image processing,