Re: How do I find the version of Hadoop inside a running Map or Reduce task

2012-11-05 Thread Lewis John Mcgibbney
I've never done this, but you can try poking around with http://hadoop.apache.org/docs/r1.0.3/api/index.html?org/apache/hadoop/HadoopVersionAnnotation.html. HTH, Lewis. On Mon, Nov 5, 2012 at 8:54 PM, Steve Lewis lordjoe2...@gmail.com wrote: I need to determine what version of Hadoop is running -

Re: How do I find the version of Hadoop inside a running Map or Reduce task

2012-11-05 Thread David Rosenstrauch
On 11/05/2012 03:54 PM, Steve Lewis wrote: I need to determine what version of Hadoop is running - say under AWS - I really want to use an API or properties in the running code but do not know how - any ideas? Probably not the best way, but one possible way: make a call to Runtime.exec() and
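
A minimal sketch of the Runtime.exec() idea hinted at above, presumably shelling out to run "hadoop version" and reading the first line of its output; this assumes the hadoop binary is on the task's PATH and is only an illustration, not the approach the thread eventually settles on:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class HadoopVersionViaExec {
        public static void main(String[] args) throws Exception {
            // Run "hadoop version"; the first output line looks like "Hadoop 1.0.3"
            Process p = Runtime.getRuntime().exec(new String[] {"hadoop", "version"});
            BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
            String firstLine = r.readLine();
            p.waitFor();
            System.out.println("Reported version line: " + firstLine);
        }
    }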

Re: How do I find the version of Hadoop inside a running Map or Reduce task

2012-11-05 Thread David Rosenstrauch
On 11/05/2012 04:02 PM, David Rosenstrauch wrote: On 11/05/2012 03:54 PM, Steve Lewis wrote: I need to determine what version of Hadoop is running - say under AWS - I really want to use an API or properties in the running code but do not know how - any ideas Probably not the best way, but one

Re: How do I find the version of Hadoop inside a running Map or Reduce task

2012-11-05 Thread Steve Lewis
Thanks - that works perfectly. The following code reports the version as a counter under Performance: // sneaky trick to extract the version String version = VersionInfo.getVersion(); context.getCounter("Performance", "Version-" + version).increment(1); On Mon, Nov 5, 2012
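
For context, a minimal self-contained sketch of the same trick inside a new-API Mapper; the class and counter names here are illustrative, not from the original job:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.util.VersionInfo;

    public class VersionReportingMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // VersionInfo is compiled into the Hadoop jars, so this reports the
            // version of the jars the task is actually running against.
            String version = VersionInfo.getVersion();
            context.getCounter("Performance", "Version-" + version).increment(1);
        }
    }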

Re: General Information

2012-11-05 Thread Bertrand Dechoux
Hi Ramesh, It is an Apache project and if you want information you should look at the website, especially: http://hadoop.apache.org/bylaws.html http://hadoop.apache.org/who.html Using the code repository, you might get an idea of who contributed what. As for the objectives, you will have to

Re: Task does not enter reduce function after secondary sort

2012-11-05 Thread Bertrand Dechoux
Just to be clear, the @Override annotation has no impact by itself. However, if you put it there, the compiler can check for you that you are indeed overriding a method. If you don't use the annotation, you might be defining another function (with another signature) which wouldn't be called. And
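
A hedged illustration of the pitfall being described: with the new (org.apache.hadoop.mapreduce) API, reduce takes an Iterable, and a method written with Iterator instead silently becomes an overload that is never called; @Override turns that mistake into a compile error. The class and types below are chosen only for the example:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        // Correct signature: Iterable<IntWritable>. With @Override the compiler
        // verifies this really overrides Reducer.reduce(...).
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }

        // A version written with Iterator instead of Iterable would compile as a
        // separate overload and never be invoked; adding @Override to it would
        // fail compilation, which is exactly the safety net described above.
    }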

Re: Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError Java Heap Space

2012-11-05 Thread Eduard Skaley
We increased mapreduce.reduce.memory.mb to 2GB and mapreduce.reduce.java.opts to 1.5GB. Now we are getting livelocks for our jobs: map jobs don't start. We are using the CapacityScheduler because we had livelocks with the FifoScheduler. Does anybody have a clue? By the way, it happens on YARN, not on

Re: General Information

2012-11-05 Thread Ramesh C
Thank you, Bertrand. Regards, Ramesh On Nov 5, 2012, at 12:30 AM, Bertrand Dechoux decho...@gmail.com wrote: Hi Ramesh, It is an Apache project and if you want information you should look at the website, especially : http://hadoop.apache.org/bylaws.html http://hadoop.apache.org/who.html

RE: Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError Java Heap Space

2012-11-05 Thread Kartashov, Andy
Your error takes place during the reduce task, when temporary files are written to memory/disk. You are clearly running low on resources. Check your memory ($ free -m) and disk space ($ df -H), as well as $ hadoop fs -df. I remember it took me a couple of days to figure out why I was getting heap size

Re: Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError Java Heap Space

2012-11-05 Thread Alejandro Abdelnur
Eduard, Would you try using the following properties in your job invocation? -D mapreduce.map.java.opts=-Xmx768m -D mapreduce.reduce.java.opts=-Xmx768m -D mapreduce.map.memory.mb=2000 -D mapreduce.reduce.memory.mb=3000 Thx On Mon, Nov 5, 2012 at 7:43 AM, Kartashov, Andy andy.kartas...@mpac.ca
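
If the job's driver goes through ToolRunner, the -D flags above are picked up automatically; otherwise the same properties can be set on the Configuration before submission. A minimal sketch under that assumption, with the values copied from the suggestion above (Job.getInstance is the MRv2-era API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MemoryTunedJob {
        public static Job createJob() throws Exception {
            Configuration conf = new Configuration();
            // Heap given to the task JVMs...
            conf.set("mapreduce.map.java.opts", "-Xmx768m");
            conf.set("mapreduce.reduce.java.opts", "-Xmx768m");
            // ...must fit inside the container sizes requested from YARN.
            conf.setInt("mapreduce.map.memory.mb", 2000);
            conf.setInt("mapreduce.reduce.memory.mb", 3000);
            return Job.getInstance(conf, "memory-tuned-job");
        }
    }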

Re: File too large Error when MR

2012-11-05 Thread Andy Isaacson
Moving the thread to user@. The general@ list is not used for technical questions. On Fri, Nov 2, 2012 at 1:59 AM, zjl208399617 zjl208399...@163.com wrote: When I run a Hive query, it often throws an error from the reduce tasks: Error: java.io.IOException: File too large at

Re: Job running on YARN gets automatically killed after 10-12 minutes

2012-11-05 Thread Vinod Kumar Vavilapalli
Is this your custom application and not, say, MapReduce or the distributed shell? If that is the case, the ApplicationMaster needs to constantly ping the ResourceManager so that RM can know that it is alive. This is done by simply doing an allocate(..) call that is part of the scheduler API.
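
A rough sketch of what that heartbeat loop looks like with the AMRMClient helper; the exact class and method names differed in the early 2.0 alphas, so treat this as illustrative of the pattern rather than the API of any particular release:

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class HeartbeatingAppMaster {
        public static void run(String host, int port, String trackingUrl) throws Exception {
            AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
            rmClient.init(new YarnConfiguration());
            rmClient.start();
            rmClient.registerApplicationMaster(host, port, trackingUrl);

            boolean done = false;
            while (!done) {
                // allocate(..) doubles as the liveness heartbeat; without calling it
                // regularly the RM eventually considers the AM dead and kills the app.
                AllocateResponse response = rmClient.allocate(0.5f /* progress */);
                // ... handle response.getAllocatedContainers(), decide when 'done' ...
                Thread.sleep(1000);
            }
            rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        }
    }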

AUTO: Yuan Jin is out of the office. (returning 11/07/2012)

2012-11-05 Thread Yuan Jin
I am out of the office until 11/07/2012. I am out of office. I will reply you when I am back. For HAMSTER related things, you can contact Jason(Deng Peng Zhou/China/IBM) For CFM related things, you can contact Daniel(Liang SH Su/China/Contr/IBM) For TMB related things, you can contact

Re: AUTO: Yuan Jin is out of the office. (returning 11/07/2012)

2012-11-05 Thread Jonas Partner
Might be better to let people know when you are in the office On Monday, 5 November 2012 at 20:07, Yuan Jin wrote: I am out of the office until 11/07/2012. I am out of office. I will reply you when I am back. For HAMSTER related things, you can contact Jason(Deng Peng Zhou/China/IBM)

backup of hdfs data

2012-11-05 Thread uday chopra
What do folks do to back up HDFS data? Does anyone have experience trying to use enterprise solutions such as NetBackup with a Data Domain D2D appliance for doing backups of data in HDFS? If so, what is the average dedup ratio? (I understand mileage can vary based on the type of data) Thanks, Uday

Re: backup of hdfs data

2012-11-05 Thread Ted Dunning
Conventional enterprise backup systems are rarely scaled for Hadoop needs. Both bandwidth and size are typically lacking. My employer, MapR, offers a Hadoop-derived distribution that includes both point-in-time snapshots and remote mirrors. Contact me offline for more info. Sent from my

problem with hadoop-snappy

2012-11-05 Thread alxsss
Hello, I use hadoop-1.0.4. I have followed the instructions to install hadoop-snappy at http://code.google.com/p/hadoop-snappy/. When I run a mapred job I see: FATAL org.apache.hadoop.mapred.TaskTracker: Task: attempt_201211051656_0002_m_00_3 - Killed :

One mapper/reducer runs on a single JVM

2012-11-05 Thread Lin Ma
Hello Hadoop experts, I have had a question in my mind for a long time. Suppose I am developing an M-R program, and it is Java based (a Java UDF that implements the mapper or reducer interface). My question is: in this scenario, is a mapper or a reducer a separate JVM process? E.g. supposing on a

Re: backup of hdfs data

2012-11-05 Thread Jay Vyas
Amazon has a really cheap, large-scale backup solution called Glacier, which is good if you're just backing up for the sake of archival in emergencies. If you need the archival to be performant, then you might want to just consider a higher replication rate.

Re: problem with hadoop-snappy

2012-11-05 Thread Binglin Chang
I think hadoop-1.0.4 already has snappy included; you should not need to use other third-party libraries. On Tue, Nov 6, 2012 at 9:10 AM, alx...@aim.com wrote: Hello, I use hadoop-1.0.4 I have followed instruction to install hadoop-snappy at http://code.google.com/p/hadoop-snappy/ When I
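
If the built-in support is used, enabling Snappy for intermediate map output comes down to job configuration, assuming the native libsnappy/libhadoop libraries are installed on the task nodes. A minimal sketch using the Hadoop 1.x property names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;

    public class SnappyMapOutputJob {
        public static Job createJob() throws Exception {
            Configuration conf = new Configuration();
            // Hadoop 1.x property names for compressing intermediate map output.
            conf.setBoolean("mapred.compress.map.output", true);
            conf.setClass("mapred.map.output.compression.codec",
                    SnappyCodec.class, CompressionCodec.class);
            return new Job(conf, "snappy-map-output");
        }
    }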

Reading SequenceFiles throws Wrong key class r

2012-11-05 Thread Saptarshi Guha
Hello, Sorry for the vague subject. I am writing some code using CDH 0.20.2-cdh3u4 to read RHBytesWritable from a file (F) on HDFS. (1) The keys/values present in F are of class org.godhuli.rhipe.RHBytesWritable. I am restructuring my code, so now RHBytesWritable is in
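
When chasing a "wrong key class" error, it can help to ask the reader which classes the file was actually written with before deserializing anything. A small hedged sketch against the old (0.20-era) SequenceFile API; the path is a made-up placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;

    public class InspectSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path(args[0]); // e.g. /user/me/data/part-00000 (hypothetical)
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
            try {
                // These are the classes recorded in the file header; a mismatch with the
                // classes the reading code expects produces the "wrong key class" error.
                System.out.println("key class:   " + reader.getKeyClassName());
                System.out.println("value class: " + reader.getValueClassName());
            } finally {
                reader.close();
            }
        }
    }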

Re: backup of hdfs data

2012-11-05 Thread Michael Segel
You have other options. You could create a secondary cluster. You could also look into Cleversafe and what they are doing with Hadoop. Here's the sad thing about backing up to tape... you can dump a couple of tens of TB to tape. You lose your system. How long will it take to recover? And

Re: One mapper/reducer runs on a single JVM

2012-11-05 Thread Michael Segel
Mappers and Reducers are separate JVM processes. And yes, you need to take into account the amount of memory on the machine(s) when you configure the number of slots. If you are running just Hadoop, you could have a little swap. Running HBase, fuggit about it. On Nov 5, 2012, at 7:12 PM, Lin
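
For reference, the per-task JVM behaviour in Hadoop 1.x is governed by a couple of job properties; a hedged sketch with 1.x property names and purely illustrative values:

    import org.apache.hadoop.mapred.JobConf;

    public class TaskJvmSettings {
        public static JobConf configure(JobConf conf) {
            // Each map/reduce task normally runs in its own child JVM with this heap.
            conf.set("mapred.child.java.opts", "-Xmx512m");
            // Reuse of the child JVM across tasks of the same job: 1 = no reuse (default),
            // -1 = unlimited reuse. Concurrent tasks still never share a JVM.
            conf.setInt("mapred.job.reuse.jvm.num.tasks", 1);
            return conf;
        }
    }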

Re: One mapper/reducer runs on a single JVM

2012-11-05 Thread Lin Ma
Thanks Michael, "If you are running just Hadoop, you could have a little swap. Running HBase, fuggit about it." -- could you give a bit more information about what you mean by swap and why to forget it for HBase? regards, Lin On Tue, Nov 6, 2012 at 12:46 PM, Michael Segel

Re: backup of hdfs data

2012-11-05 Thread Bharath Mundlapudi
If the data in your cluster is small (say, less than a few GBs) then the answer is yes. But it is an expensive route. For large data sets, traditional means are not feasible and are expensive. If you want an optimal cost-based solution, you could set up another local/remote cluster and try distcp, or simply

Re: backup of hdfs data

2012-11-05 Thread Serge Blazhiyevskyy
I second this proposed solution. Distcp works very well for backing up data to a separate cluster. From: Bharath Mundlapudi bharathw...@yahoo.com Reply-To: user@hadoop.apache.org user@hadoop.apache.org,
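
For completeness, distcp is usually driven from the command line, but it can also be invoked programmatically through ToolRunner. A rough sketch against the Hadoop 1.x DistCp class (its constructor and options changed in later releases); the cluster URIs below are placeholders, not real hosts:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.tools.DistCp;
    import org.apache.hadoop.util.ToolRunner;

    public class BackupWithDistcp {
        public static void main(String[] args) throws Exception {
            // Copy a directory from the production cluster to a backup cluster.
            String src = "hdfs://prod-namenode:8020/data/events";
            String dst = "hdfs://backup-namenode:8020/backups/events";
            Configuration conf = new Configuration();
            int exitCode = ToolRunner.run(new DistCp(conf), new String[] {src, dst});
            System.exit(exitCode);
        }
    }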