Re: Hadoop namenode startup problem

2010-08-05 Thread edward choi
I've fixed the problem. The reason the namenode wouldn't start was that I had accidentally started the cluster with the root account. This somehow changed the ownership of some Hadoop-related files (e.g. log files, and hadoop.tmp.dir/dfs/name/current/edits) from hadoop:hadoop to root:root. After I fixed the
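
A minimal sketch of the kind of ownership repair described above, assuming the daemons run as hadoop:hadoop, that hadoop.tmp.dir is /var/lib/hadoop, and that logs live in /var/log/hadoop (all three are assumptions; adjust to your install):

```shell
# Hypothetical paths: substitute your actual hadoop.tmp.dir and log dir.
# Restore ownership of the NameNode metadata and the logs after an
# accidental start as root, then restart HDFS as the hadoop user.
sudo chown -R hadoop:hadoop /var/lib/hadoop/dfs/name
sudo chown -R hadoop:hadoop /var/log/hadoop
sudo -u hadoop bin/start-dfs.sh
```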

hdfs space problem.

2010-08-05 Thread Raj V
I run a 512-node Hadoop cluster. Yesterday I moved 30 GB of compressed data from an NFS-mounted partition by running, on the namenode, hadoop fs -copyFromLocal /mnt/data/data1 /mnt/data/data2 /mnt/data/data3 hdfs:/data When the job completed, the local disk on the namenode was 40% full (Most of

centralized record reader in new API

2010-08-05 Thread Gang Luo
Hi all, to create a RecordReader in the new API, we need a TaskAttemptContext object, which suggests to me that a RecordReader should only be created for a split that has already been assigned a task ID. However, I want to do centralized sampling and create record readers on some splits before the job is

RE: hdfs space problem.

2010-08-05 Thread Dmitry Pushkarev
When you copy files and have a local datanode, the first replica will end up there. Just stop the datanode on the node from which you copy files, and the replicas will end up on random nodes. Also, don't run a datanode on the same machine as the namenode. -Original Message- From: Raj V [mailto:rajv...@yahoo.com]
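
Dmitry's suggestion can be sketched roughly as follows, using the 0.20-era daemon scripts (script locations and the source path are assumptions for illustration):

```shell
# Stop the local datanode so HDFS cannot place the first replica on
# this machine; blocks will then scatter across the other datanodes.
bin/hadoop-daemon.sh stop datanode

# Copy from the NFS mount into HDFS.
bin/hadoop fs -copyFromLocal /mnt/data/data1 hdfs:/data

# Bring the local datanode back once the copy is done.
bin/hadoop-daemon.sh start datanode
```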

Problems running hadoop on Amazon Elastic MapReduce

2010-08-05 Thread grabbler
I am a complete newbie to Hadoop. I'm running a job on 19 Amazon Elastic MapReduce servers and am trying to understand two separate issues. 1) The job is ending with an error: ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 6015: During execution, encountered a Hadoop error. I do not

RE: Problems running hadoop on Amazon Elastic MapReduce

2010-08-05 Thread Ankit Bhatnagar
Hi, EMR has a live debug option in the panel, you will find the logs there as well. Ankit -Original Message- From: grabbler [mailto:twiza...@gmail.com] Sent: Thursday, August 05, 2010 2:37 PM To: core-u...@hadoop.apache.org Subject: Problems running hadoop on Amazon Elastic MapReduce

Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Bobby Dennett
We are looking to enable LZO compression of the map outputs on our Cloudera 0.20.1 cluster. It seems there are various sets of instructions available and I am curious what your thoughts are regarding which one would be best for our Hadoop distribution and OS (Ubuntu 8.04 64-bit). In particular,

Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Arun C Murthy
Please take questions on Cloudera Distro to their internal lists. On Aug 5, 2010, at 3:52 PM, Bobby Dennett wrote: We are looking to enable LZO compression of the map outputs on our Cloudera 0.20.1 cluster. It seems there are various sets of instructions available and I am curious what your

Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Josh Patterson
Bobby, we're working hard to make compression easier; the biggest hurdle currently is the licensing issue around the LZO codec libs (GPL, which is not compatible with the ASF BSD-style license). Outside of making the changes to the mapred-site.xml file, with your setup, what do you view as the
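
The mapred-site.xml change Josh mentions amounts to roughly the following, using the property names of the 0.20-era hadoop-lzo codec (a sketch, assuming the LZO native libraries and jar are already installed and the codec is registered in io.compression.codecs):

```xml
<!-- mapred-site.xml: compress intermediate map output with LZO -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

This only affects intermediate map output; final job output compression is controlled separately (mapred.output.compress).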

Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Bobby Dennett
Hi Josh, no real pain points... just trying to investigate/research the best way to create the necessary libraries and jar files to support LZO compression in Hadoop. In particular, there are two repositories to build from and I am trying to find out if one should be used over the other. For

Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Todd Lipcon
On Thu, Aug 5, 2010 at 4:52 PM, Bobby Dennett bdenn...@gmail.com wrote: Hi Josh, No real pain points... just trying to investigate/research the best way to create the necessary libraries and jar files to support LZO compression in Hadoop. In particular, there are the 2 repositories to build

Best way to reduce an 8-node cluster by half and get hdfs to come out of safe mode

2010-08-05 Thread Steve Kuo
As part of our experimentation, the plan is to pull 4 slave nodes out of an 8-slave/1-master cluster. With the replication factor set to 3, I thought losing half of the cluster might be too much for hdfs to recover from. Thus I copied all relevant data out of hdfs to local disk and reconfigured the
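
One hedged way to do what Steve describes without copying data out first is to decommission the four slaves rather than stopping them cold; the hostnames and exclude-file path below are assumptions, and this presumes dfs.hosts.exclude is set in hdfs-site.xml:

```shell
# List the slaves to retire in the exclude file referenced by
# dfs.hosts.exclude, then tell the namenode to re-read it.
printf '%s\n' slave5 slave6 slave7 slave8 > /etc/hadoop/excludes
bin/hadoop dfsadmin -refreshNodes

# Watch until those nodes report "Decommissioned"; their blocks are
# re-replicated to the remaining slaves before they go away.
bin/hadoop dfsadmin -report

# If the namenode is stuck in safe mode after the reconfiguration,
# check the state before forcing it out.
bin/hadoop dfsadmin -safemode get
bin/hadoop dfsadmin -safemode leave
```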