Adding new name node location

2013-04-17 Thread Henry Hung
Hi Everyone, I'm using Hadoop 1.0.4 and only define 1 location for name node files, like this: <property><name>dfs.name.dir</name><value>/home/hadoop/hadoop-data/namenode</value></property> Now I want to protect my name node files by changing the configuration to: <property>
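
For reference, dfs.name.dir accepts a comma-separated list of directories and writes the metadata redundantly to each; a minimal hdfs-site.xml sketch (the /backup path is hypothetical, echoing the thread):

    <property>
      <name>dfs.name.dir</name>
      <!-- Metadata is written to every listed directory -->
      <value>/home/hadoop/hadoop-data/namenode,/backup/hadoop-data/namenode</value>
    </property>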

Re: How to balance reduce job

2013-04-17 Thread Ajay Srivastava
Tariq probably meant the distribution of keys from the (key, value) pairs emitted by the mapper. The Partitioner distributes these pairs to different reducers based on the key. If the data is such that the keys are skewed, then most of the records may go to the same reducer. Regards, Ajay Srivastava On 17-Apr-2013, at 11:08
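
For context, Hadoop's default HashPartitioner routes every record with a given key to the same reducer, which is exactly how skewed keys pile up on one reduce task; a minimal sketch of that logic (the class name is illustrative):

    import org.apache.hadoop.mapreduce.Partitioner;

    // Mirrors the default HashPartitioner: all records sharing a key
    // go to the same reducer, so hot keys overload that reducer.
    public class HashLikePartitioner<K, V> extends Partitioner<K, V> {
      @Override
      public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
    }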

Re: Adding new name node location

2013-04-17 Thread varun kumar
Hi Henry, As per your mail, point number 1 is correct. After doing these changes, metadata will be written to the new partition. Regards, Varun Kumar.P On Wed, Apr 17, 2013 at 11:32 AM, Henry Hung ythu...@winbond.com wrote: Hi Everyone, I'm using Hadoop 1.0.4 and only define 1

Re: How to balance reduce job

2013-04-17 Thread bejoy . hadoop
Yes, that is a valid point. The partitioner might do a non-uniform distribution and the reducers can be unevenly loaded. But this doesn't change the number of reducers or their distribution across nodes. The underlying issue, as I understand it, is that his reduce tasks are scheduled on just a few nodes.

RE: Adding new name node location

2013-04-17 Thread Henry Hung
Hi Varun Kumar, Could you elaborate on how the changes are made to the new name node? The scenario in my mind is: suppose the old name node metadata contains 100 hdfs files. Then I restart by using stop-dfs, change the config and run start-dfs. Hadoop will automatically create new name node

Re: Re: How to balance reduce job

2013-04-17 Thread rauljin
<property><name>mapred.tasktracker.map.tasks.maximum</name><value>4</value></property> <property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>4</value></property> I am not clear about the number of reduce slots in each TaskTracker. Is it defined in the
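
For reference, those properties only cap how many map/reduce tasks one TaskTracker runs concurrently (its slots); the number of reduce tasks a job actually gets is set on the job itself. A minimal sketch (the job name is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    Job job = new Job(conf, "my-job");
    // Total reduce tasks for this job, spread across the cluster's reduce slots
    job.setNumReduceTasks(8);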

Re: Adding new name node location

2013-04-17 Thread 李洪忠
Modifying the conf file and restarting the name node is the best way; you needn't restart the whole cluster DFS. The files in /backup and /home are the same. On 2013/4/17 14:38, Henry Hung wrote: Hi Varun Kumar, Could you elaborate on how the changes are made to the new name node? The scenario in my

Re: Submitting mapreduce and nothing happens

2013-04-17 Thread Amit Sela
The input path includes data in it. I also tried job.waitForCompletion(true) but it acts exactly the same. For what it's worth, in the staging dir in HDFS I do see empty (or cleaned-up) folders created with the correct JT ID and with an incremented count: job_201304150711_01,

RE: Adding new name node location

2013-04-17 Thread jiangchaocai
Not authoritative: the “new name node” location will get a full copy of the name node metadata when you restart the DFS service. While HDFS is running, the same changes will be made to both the name node and the backup folder. In fact, it only contains 2 files: FSImage and EditLog. Thanks, John

Re: Basic Doubt in Hadoop

2013-04-17 Thread Ramesh R Nair
Hi Bejoy, Regarding the output of the Map phase, does Hadoop store it in the local fs or in HDFS? I believe it is the former. Correct me if I am wrong. Regards Ramesh On Wed, Apr 17, 2013 at 10:30 AM, bejoy.had...@gmail.com wrote: The data is in HDFS in case of the WordCount MR sample. In

Re: Basic Doubt in Hadoop

2013-04-17 Thread bejoy . hadoop
You are correct, map outputs are stored in the local file system (LFS), not in HDFS. Regards Bejoy KS Sent from remote device, Please excuse typos -Original Message- From: Ramesh R Nair rameshrkco...@gmail.com Date: Wed, 17 Apr 2013 13:06:32 To: user@hadoop.apache.org; bejoy.had...@gmail.com Subject: Re:
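
For context, in Hadoop 1.x the local directories that hold these intermediate map outputs are controlled by mapred.local.dir; a minimal mapred-site.xml sketch (the path is hypothetical):

    <property>
      <name>mapred.local.dir</name>
      <!-- Local (non-HDFS) directories for intermediate map output -->
      <value>/disk1/mapred/local</value>
    </property>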

Re: Job cleanup

2013-04-17 Thread Robert Dyer
I think the problem is that I need to report progress() from my cleanup task. How can I do this? The commitJob() in my custom org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter[1] only provides an org.apache.hadoop.mapreduce.JobContext[2], which has no getProgressible() like the old

How to change secondary namenode location in Hadoop 1.0.4?

2013-04-17 Thread Henry Hung
Hi All, What is the property name in Hadoop 1.0.4 to change the secondary namenode location? Currently the default on my machine is /tmp/hadoop-hadoop/dfs/namesecondary; I would like to change it to /data/namesecondary. Best regards, Henry

Re: How to change secondary namenode location in Hadoop 1.0.4?

2013-04-17 Thread Bejoy Ks
Hi Henry You can change the secondary name node storage location by overriding the property 'fs.checkpoint.dir' in your core-site.xml On Wed, Apr 17, 2013 at 2:35 PM, Henry Hung ythu...@winbond.com wrote: Hi All, ** ** What is the property name of Hadoop 1.0.4 to change secondary
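
Following that pointer, a minimal core-site.xml sketch using the path from Henry's mail:

    <property>
      <name>fs.checkpoint.dir</name>
      <value>/data/namesecondary</value>
    </property>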

Re: Adjusting tasktracker heap size?

2013-04-17 Thread Bejoy Ks
Hi Marcos, You need to consider the slots based on the available memory: Available Memory = Total RAM - (Memory for OS + Memory for Hadoop daemons like DN, TT + Memory for other services, if any, running on that node) Now you need to consider the generic MR jobs planned on your cluster. Say if
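
As a worked example with hypothetical numbers (not from the thread): on a 32 GB node, reserving 4 GB for the OS and 2 GB for the DataNode and TaskTracker daemons leaves 26 GB; at roughly 1 GB of heap per task slot that supports about 26 slots in total:

    Available Memory = 32 GB - (4 GB OS + 2 GB DN+TT) = 26 GB
    Slots ~= 26 GB / 1 GB per task ~= 26 (e.g. 16 map + 10 reduce)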

Re: Adjusting tasktracker heap size?

2013-04-17 Thread MARCOS MEDRADO RUBINELLI
Thank you for the replies. Thankfully, this cluster works with a fairly regular load, so it shouldn't be too hard to fine-tune. Regards, Marcos On 17-04-2013 09:23, Bejoy Ks wrote: Hi Marcos, You need to consider the slots based on the available memory Available Memory = Total RAM - (Memory

Hadoop fs -getmerge

2013-04-17 Thread Fabio Pitzolu
Hi all, is there a way to use the *getmerge* fs command and not generate the .crc files in the output local directory? Thanks, Fabio Pitzolu

Re: Hadoop fs -getmerge

2013-04-17 Thread 姚吉龙
Use this command: hadoop fs -getmerge <file in hdfs> <local> — Sent from Mailbox for iPhone On Wed, Apr 17, 2013 at 10:40 PM, Fabio Pitzolu fabio.pitz...@gr-ci.com wrote: Hi all, is there a way to use the *getmerge* fs command and not generate the .crc files in the output local directory?

Re: Mapreduce jobs to download job input from across the internet

2013-04-17 Thread Peyman Mohajerian
Apache Flume may help you for this use case. I read an article on Cloudera's site about using Flume to pull tweets and same idea may apply here. On Tue, Apr 16, 2013 at 9:26 PM, David Parks davidpark...@yahoo.com wrote: For a set of jobs to run I need to download about 100GB of data from the

Physically moving HDFS cluster to new

2013-04-17 Thread Tom Brown
We have a situation where we want to physically move our small (4 node) cluster from one data center to another. As part of this move, each node will receive both a new FQN and a new IP address. As I understand it, HDFS is somehow tied to the FQN or IP address, and changing them causes data

RE: How to configure mapreduce archive size?

2013-04-17 Thread Xia_Yang
Hi Hemanth and Bejoy KS, I have tried both mapred-site.xml and core-site.xml. They do not work. I set the value to 50K just for testing purposes, however the folder size has already grown to 900M now. As in your email: after they are done, the property will help clean up the files due to the limit
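
For reference, the property that usually caps the distributed-cache size in Hadoop 1.x is local.cache.size, whose value is in bytes; assuming that is the property being set here, a mapred-site.xml sketch would be:

    <property>
      <name>local.cache.size</name>
      <!-- Cache size limit in bytes; 10737418240 = 10 GB -->
      <value>10737418240</value>
    </property>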

Setting up a Hadoop client in OSGI bundle

2013-04-17 Thread Amit Sela
Hi all, I'm trying to set up a Hadoop client for job submissions (and more) as an OSGI bundle. I've run into a lot of hardships but I'm kinda stuck now. When I create a new Job for submission I setClassLoader() for the Job Configuration so that it would use the bundle's ClassLoader (felix), but
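
For reference, Configuration does expose setClassLoader(); a minimal sketch of wiring in the bundle's class loader (the activator class name is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    // Resolve classes through the OSGI bundle's loader instead of the default
    conf.setClassLoader(MyBundleActivator.class.getClassLoader());
    Job job = new Job(conf, "bundle-submitted-job");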

Article: 'How to Deploy Hadoop 2 (Yarn) on EC2'

2013-04-17 Thread Keith Wiley
I've posted an article on my website that details precisely how to deploy Hadoop 2.0 with Yarn on AWS (or at least how I did it, whether or not such an approach will translate to others' circumstances). I had been disappointed that most articles of this type described the process with much older

Re: Mapreduce jobs to download job input from across the internet

2013-04-17 Thread Marcos Luis Ortiz Valmaseda
You can find it here: http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/ 2013/4/17 Peyman Mohajerian mohaj...@gmail.com Apache Flume may help you for this use case. I read an article on Cloudera's site about using Flume to pull tweets and same idea may apply here.

How to fix Under-replication

2013-04-17 Thread Keith Wyss
Hi. I sent this a few minutes ago, but I had not confirmed subscription to the mailing list so I don't think it went through. If it did, I apologize for the re-post - Hello there. I am operating a cluster that is consistently unable to create three replicas for a
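
For anyone diagnosing similar under-replication, two standard commands (paths are hypothetical): fsck reports the count of under-replicated blocks, and setrep forces re-replication:

    # The fsck summary includes an "Under-replicated blocks" count
    hadoop fsck /
    # Re-set the replication factor to 3 and wait until it is satisfied
    hadoop fs -setrep -w 3 /user/data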

Unable to install hadoop

2013-04-17 Thread Juraj Jurco
Hi all, I wanted to try to install Hadoop on my machine with latest Fedora release and I'm getting following error: Test Transaction Errors: file /usr/bin from install of hadoop-1.1.2-1.i386 conflicts with file from package filesystem-3.1-2.fc18.x86_64 file /usr/lib from install of

Reading and Writing Sequencefile using Hadoop 2.0 Apis

2013-04-17 Thread sumit ghosh
I am looking for an example which is using the new Hadoop 2.0 API to read and write Sequence Files. Effectively I need to know how to use these functions: createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts) The Old definition is not working for me:
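
For completeness, a minimal sketch of the Option-based Hadoop 2.0 SequenceFile API the question refers to (the path, key and value types are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SeqFileDemo {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/example.seq");

        // Write using the varargs Option API
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            SequenceFile.Writer.file(path),
            SequenceFile.Writer.keyClass(IntWritable.class),
            SequenceFile.Writer.valueClass(Text.class));
        try {
          writer.append(new IntWritable(1), new Text("one"));
        } finally {
          writer.close();
        }

        // Read it back with the matching Reader options
        SequenceFile.Reader reader = new SequenceFile.Reader(conf,
            SequenceFile.Reader.file(path));
        try {
          IntWritable key = new IntWritable();
          Text value = new Text();
          while (reader.next(key, value)) {
            System.out.println(key + "\t" + value);
          }
        } finally {
          reader.close();
        }
      }
    }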

Re: Article: 'How to Deploy Hadoop 2 (Yarn) on EC2'

2013-04-17 Thread Sandy Ryza
This is great, Keith. On Wed, Apr 17, 2013 at 12:58 PM, Keith Wiley kwi...@keithwiley.com wrote: I've posted an article on my website that details precisely how to deploy Hadoop 2.0 with Yarn on AWS (or least how I did it, whether or not such an approach will translate to others'

Re: Physically moving HDFS cluster to new

2013-04-17 Thread Ted Dunning
It may or may not help you in your current distress, but MapR's distribution could handle this pretty easily. One method is direct distcp between clusters, but you could also use MapR's mirroring capabilities to migrate data. You can also carry a MapR cluster, change the IP addresses and relight

Re: Unable to install hadoop

2013-04-17 Thread Roman Shaposhnik
On Wed, Apr 17, 2013 at 3:07 PM, Juraj Jurco jjurco.fo...@gmail.com wrote: Hi all, I wanted to try to install Hadoop on my machine with latest Fedora release and I'm getting following error: Test Transaction Errors: file /usr/bin from install of hadoop-1.1.2-1.i386 Depending on what

Re: How to configure mapreduce archive size?

2013-04-17 Thread Hemanth Yamijala
The check for cache file cleanup is controlled by the property mapreduce.tasktracker.distributedcache.checkperiod. It defaults to 1 minute (which should be sufficient for your requirement). I am not sure why the JobTracker UI is inaccessible. If you know where JT is running, try hitting

Re: Hadoop fs -getmerge

2013-04-17 Thread Hemanth Yamijala
I don't think that is possible. When we use -getmerge, the destination filesystem happens to be a LocalFileSystem, which extends ChecksumFileSystem. I believe that's why the CRC files get created. Would it not be possible for you to ignore them, since they have a fixed extension? Thanks
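
As a workaround sketch (file names are hypothetical), the checksum files follow the fixed .<name>.crc naming, so they can simply be deleted after the merge:

    hadoop fs -getmerge /user/fabio/output merged.txt
    # ChecksumFileSystem writes the checksum alongside as .merged.txt.crc
    rm -f .merged.txt.crc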

Re: Reading and Writing Sequencefile using Hadoop 2.0 Apis

2013-04-17 Thread Azuryy Yu
You can use it even if it's deprecated. I can find in org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.java: @Override public void initialize(InputSplit split, TaskAttemptContext context) throws IOException,

Re: Physically moving HDFS cluster to new

2013-04-17 Thread Azuryy Yu
Changed data node names or IPs cannot cause data loss. As long as you keep the fsimage (under the name node data dir) and all block data on the data nodes, everything can be recovered when you start the cluster. On Thu, Apr 18, 2013 at 1:20 AM, Tom Brown tombrow...@gmail.com wrote: We have a

Re: Reading and Writing Sequencefile using Hadoop 2.0 Apis

2013-04-17 Thread Harsh J
Sumit, I believe we've answered this one before, so you may find http://search-hadoop.com/m/xp2w02A8bqw1 helpful too. On Thu, Apr 18, 2013 at 4:14 AM, sumit ghosh sumi...@yahoo.com wrote: ** I am looking for an example which is using the new Hadoop 2.0 API to read and write Sequence

How to install the MRunit

2013-04-17 Thread 姚吉龙
Hi everyone How can I install MRUnit to do unit tests? Is there any requirement for the tool? My Hadoop version is 1.0.4. Apart from MRUnit, is any other test tool available? BRs Geelong -- From Good To Great
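
A minimal sketch of pulling MRUnit in via Maven; the version and the hadoop1 classifier (for a Hadoop 1.0.4 cluster) are assumptions worth verifying:

    <dependency>
      <groupId>org.apache.mrunit</groupId>
      <artifactId>mrunit</artifactId>
      <version>1.0.0</version>
      <!-- hadoop1 classifier targets the Hadoop 1.x line -->
      <classifier>hadoop1</classifier>
      <scope>test</scope>
    </dependency>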

change cache path

2013-04-17 Thread Mohit Vadhera
Hi, I want to change the cache path because I don't have much space in my root / fs. I want the hadoop cache files to be created on external drives. It is a standalone hadoop cluster and it has been running for more than a year. I want the changes made so that they do not affect the running machine. I
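
A minimal core-site.xml sketch for relocating the default local storage root, assuming hadoop.tmp.dir is the path in question (the mount point is hypothetical):

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/mnt/external/hadoop-tmp</value>
    </property>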