Block re-balancing speed/slowness

2008-05-09 Thread Otis Gospodnetic
Hi, First off, big thanks to Lohit and Hairong for their help with HDFS "corruption", DN decommissioning and block re-balancing! I'm now re-balancing, but as Ted Dunning noted in http://markmail.org/message/fzd33k7a3isijto5 , this seems to be a veeery slow process. Here are some concrete numbers

Re: Read timed out, Abandoning block blk_-5476242061384228962

2008-05-09 Thread James Moore
On Fri, May 9, 2008 at 12:00 PM, Hairong Kuang <[EMAIL PROTECTED]> wrote: >> I'm using the machine running the namenode to run maps as well. > Please do not run maps on the machine that is running the namenode. This > would cause CPU contention and slow down the namenode, making it easier to > hit SocketTimeoutException.

I need help to set HADOOP

2008-05-09 Thread sigma syd
Hello! I am trying to set up Hadoop on two PCs. In conf/master I have: master master.visid.com, and in conf/slave: master.visid.com slave3.visid.com. When I execute bin/start-dfs.sh and bin/start-mapred.sh, the following error is displayed in logs/hadoop-hadoop-datanode-slave3.visid.com.log:

[ANNOUNCE] Hadoop release 0.16.4 available

2008-05-09 Thread Nigel Daley
Release 0.16.4 fixes 4 critical bugs in 0.16.3. For Hadoop release details and downloads, visit: http://hadoop.apache.org/core/releases.html Thanks to all who contributed to this release! Nigel

Re: Hadoop Permissions Question -> [Fwd: Hbase on hadoop]

2008-05-09 Thread stack
[EMAIL PROTECTED] wrote: The stack trace is good enough. HMaster calls DistributedFileSystem.setSafeMode(...), which requires superuser privilege. Nicholas HBase won't start if HDFS is in safe mode. HADOOP-3066, committed to hadoop-0.17, made it so querying whether hdfs is in 'safe mode' no longer requires superuser privilege.

RE: Hadoop Permission Problem

2008-05-09 Thread Natarajan, Senthil
Hi Nicholas, Sorry, I didn't realize there is a datastore directory inside HDFS. I have now set its permissions to 777 and the permission problem is resolved. But now I am getting this error: Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: /usr/local/hadoop/datastore/hadoop-hadoop/m

Re: Hadoop Permission Problem

2008-05-09 Thread s29752-hadoopuser
Hi Senthil, drwxrwxrwx 5 hadoop hadoop 4096 May 7 18:02 datastore This one is your local directory. I think you might have mixed up the local and hdfs directories. Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.or

RE: Hadoop Permission Problem

2008-05-09 Thread Natarajan, Senthil
Hi Nicholas, You are right, the permission problem is with datastore; that's what I mentioned in my previous mails. But I gave it 777 permissions. Here is the datastore permission on the master: drwxrwxrwx 5 hadoop hadoop 4096 May 7 18:02 datastore I am not seeing any datastore in t

Re: Hadoop Permission Problem

2008-05-09 Thread s29752-hadoopuser
Hi Senthil, Let me explain the error message "Permission denied: user=test, access=WRITE, inode="datastore":hadoop:supergroup:rwxr-xr-x". It says that the current user "test" is trying to WRITE to the inode "datastore", which has owner hadoop:supergroup and permission 755. So the problem is in the
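A minimal sketch of checking and loosening these permissions with the Hadoop FileSystem API (the path below is hypothetical; the method names follow the permissions API introduced around 0.16):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class CheckPerms {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path dir = new Path("/usr/local/hadoop/datastore"); // hypothetical HDFS path
            FileStatus st = fs.getFileStatus(dir);
            // Show owner, group, and mode -- the same fields quoted in the error message.
            System.out.println(st.getOwner() + ":" + st.getGroup() + " " + st.getPermission());
            // Grant world write so user "test" can write (what "fs -chmod 777" does).
            fs.setPermission(dir, new FsPermission((short) 0777));
        }
    }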

RE: Hadoop Permission Problem

2008-05-09 Thread Natarajan, Senthil
Hi Nicholas, Here I tried as user test after I got the error (does the exception come from the slave machine?): Exception in thread "main" org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=test, access=WRITE, inode="datastore":hadoo

Re: Hadoop Permission Problem

2008-05-09 Thread s29752-hadoopuser
Hi Senthil, I cannot see why it does not work. Could you try again, and do a fs -ls right after you see the error message? Nicholas - Original Message From: "Natarajan, Senthil" <[EMAIL PROTECTED]> To: "core-user@hadoop.apache.org" Sent: Friday, May 9, 2008 11:49:49 AM Subject: RE: Had

Re: Hadoop Permissions Question -> [Fwd: Hbase on hadoop]

2008-05-09 Thread s29752-hadoopuser
Hi Rick, > the hbase master must be run on the same machine as the hadoop hdfs (what > part of it?) if one wants to use the hdfs permissions system or that right > now we must run without permissions? Hdfs and hbase (and all clients) should run under the same administrative domain, but not the

Re: Read timed out, Abandoning block blk_-5476242061384228962

2008-05-09 Thread Hairong Kuang
> I'm using the machine running the namenode to run maps as well. Please do not run maps on the machine that is running the namenode. This would cause CPU contention and slow down the namenode, making it easier to hit SocketTimeoutException. Hairong On 5/9/08 11:24 AM, "James Moore" <[EMAIL PROTECTE

Re: Hadoop Permissions Question -> [Fwd: Hbase on hadoop]

2008-05-09 Thread Rick Hangartner
Hi Nicholas, I was the original poster of this question. Thanks for your response. (And thanks for elevating attention to this, Stack.) Am I missing something, or is one implication of how hdfs derives privileges from the Linux filesystem that the hbase master must be run on the same m

RE: Hadoop Permission Problem

2008-05-09 Thread Natarajan, Senthil
Hi Nicholas, No, I am running map/red jobs over HDFS files. That permission is for datastore (hadoop.tmp.dir). Here is the HDFS listing: /usr/local/hadoop/bin/hadoop dfs -ls / Found 2 items /user 2008-05-07 17:55 rwxrwxrwx hadoop supergroup /usr 2008-05-07 17:18

Re: Hadoop Permission Problem

2008-05-09 Thread s29752-hadoopuser
Hi Senthil, drwxrwxrwx 4 hadoop hadoop 4096 May 8 16:31 hadoop-hadoop drwxrwxrwx 2 test test 4096 May 9 09:29 hadoop-test From the output format, the directories above do not seem to be HDFS directories. Are you running map/red jobs over the local file system (e.g. Linux)? Nicholas

Re: Corrupt HDFS and salvaging data

2008-05-09 Thread Hairong Kuang
The default replication factor takes effect only at file creation time. If you want to increase the replication factor of existing blocks, you need to run the command "hadoop fs -setrep". It's better to finish the decommission first, remove the old DN, and then rebalance. Rebalancing moves blocks around
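A minimal programmatic sketch of the same operation, using FileSystem.setReplication (the path and target factor are made-up illustration values; the shell command above is the usual route):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class Setrep {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Equivalent of: hadoop fs -setrep 3 /user/otis/data  (illustrative path and factor)
            boolean ok = fs.setReplication(new Path("/user/otis/data"), (short) 3);
            System.out.println(ok ? "replication change scheduled" : "path is not a file");
        }
    }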

Re: Hadoop Permissions Question -> [Fwd: Hbase on hadoop]

2008-05-09 Thread s29752-hadoopuser
Hi Stack, > One question this raises is if the "hbase:hbase" user and group are being > derived from the Linux file system user and group, or if they are the hdfs > user and group? HDFS currently does not manage user and group information. User and group in HDFS are being derived from the unde

Re: Read timed out, Abandoning block blk_-5476242061384228962

2008-05-09 Thread James Moore
On Wed, May 7, 2008 at 2:45 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote: > Hi James > > Were you able to start all the nodes in the same 'availability zone'? You > using the new AMI kernels? After I saw your note, I restarted new instances with the new kernels (aki-b51cf9dc and ari-b31cf9da) and

Re: Corrupt HDFS and salvaging data

2008-05-09 Thread lohit
Hi Otis, Thanks for the reports. It looks like you have a lot of blocks with a replication factor of 1, and when the node which had these blocks was stopped, the namenode started reporting them as missing, since it could not find any other replica. Here is what I did: find all blocks with replication fac

Re: How to handle tif image files in hadoop

2008-05-09 Thread Ted Dunning
Your read loop has a bug in it and also allocates way more garbage than is necessary. Also, the small buffer size will slow things down somewhat. Try this instead: byte[] buffer = new byte[10]; int readBytes = instream.read(buffer); while (readBytes > 0) { fimb.write(buffer, 0, readBytes); readBytes = instream.read(buffer); }

Re: How to re-balance, NN safe mode

2008-05-09 Thread Ted Dunning
On 5/8/08 9:11 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote: > I've removed 1 old DN and added 1 new DN. The cluster has 4 nodes total (all > 4 act as DNs) ... I pumped the replication factor to 6 and > restarted all daemons, but still nothing. 6 > 4. Not good.

Re: Corrupt HDFS and salvaging data

2008-05-09 Thread Otis Gospodnetic
Hi, > A default replication factor of 3 does not mean that every block's > replication factor in the file system is 3. Hm, and I thought that was exactly what it meant. What does it mean then? Or are you saying: the number of block replicas matches the r.f. that was in place when the block w

primary namenode not starting

2008-05-09 Thread Colin Freas
The primary namenode on my cluster seems to have stopped working. The secondary namenode starts, but the primary fails with the error message below. I've scoured the cluster, particularly this node, for changes, but I haven't found any that I believe would cause this problem. If anyone has an id

Re: Corrupt HDFS and salvaging data

2008-05-09 Thread Hairong Kuang
A default replication factor of 3 does not mean that every block's replication factor in the file system is 3. In case (1), some blocks have a replication factor which is less than 3, so the average replication factor is less than 3, but there are no missing replicas. In case (2), some blocks have zero replica

Re: How to re-balance, NN safe mode

2008-05-09 Thread Hairong Kuang
Otis, I would recommend the following steps: 1. Bring up all 4 DNs (both old and new). 2. Decommission the DN that you want to remove. See http://wiki.apache.org/hadoop/FAQ#17 3. Run the balancer. Hairong On 5/8/08 9:11 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote: > Hi, > > (I s
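For readers who don't follow the FAQ link: decommissioning is driven by an exclude file named in the namenode's configuration. A sketch under assumptions (the file path is made up, and the dfs.hosts.exclude setting must be in place before the namenode starts):

    <!-- in conf/hadoop-site.xml on the namenode -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/usr/local/hadoop/conf/excludes</value>
    </property>

List the hostname of the DN to retire in that file (one per line) and tell the namenode to re-read it with bin/hadoop dfsadmin -refreshNodes; the namenode then copies that node's blocks elsewhere before marking it decommissioned.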

RE: single node Hbase

2008-05-09 Thread Jim Kellerman
Did you first create the table and then try to delete it? In hbase-0.1.1 and older, offlining a table did not work reliably. There is a fix for this problem in the hbase-0.1.2 release candidate. Also, if you direct hbase questions to the hbase mailing list [EMAIL PROTECTED] you will get a more t

Recover the deprecated mapred.tasktracker.tasks.maximum

2008-05-09 Thread Iván de Prado
https://issues.apache.org/jira/browse/HADOOP-1274 replaced the configuration attribute mapred.tasktracker.tasks.maximum with mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, because it sometimes makes sense to have more mappers than reducers assigned to each node. Bu
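For reference, the replacement settings go in hadoop-site.xml like this (the counts shown are illustrative, not recommendations):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>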

Re: Corrupt HDFS and salvaging data

2008-05-09 Thread Otis Gospodnetic
Hi, Here are 2 "bin/hadoop fsck / -files -blocks -locations" reports: 1) For the old HDFS cluster, reportedly HEALTHY, but with this inconsistency: http://www.krumpir.com/fsck-old.txt.zip ( < 1MB) Total blocks: 32264 (avg. block size 11591245 B) Minimally replicated blocks: 32264 (100.0 %)

Re: Corrupt HDFS and salvaging data

2008-05-09 Thread Otis Gospodnetic
Hi, Yes, when I say "all daemons" I mean all 4 types - NN, DNs, JT, and TTs. The cluster did initially use a replication factor of 1. This was later changed to 3; about 90% of the cluster's runtime was spent with a repl. factor of 3. If I run bin/hadoop balancer with all 4 old DNs it tells me that the

RE: Hadoop Permission Problem

2008-05-09 Thread Natarajan, Senthil
Hi Nicholas, That's what I was wondering. Here is the datastore directory permission on the master machine: drwxrwxrwx 5 hadoop hadoop 4096 May 7 18:02 datastore This datastore directory is present only on the master, right, not on the slaves? I couldn't find it there. After I changed the

Re: How to handle tif image files in hadoop

2008-05-09 Thread charan
Hi, Thank you, sir, for letting me know one more aspect of hadoop. But I used JAI and processed our files by reading them as bytes from HDFS and sending them to the JAI library for TIFF decoding. And it worked :) For those who want to work with tiff files in hdfs, here is a way: Path inFile = new Path(i
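Charan's code is cut off above; a hedged reconstruction of the general approach he describes — read the HDFS file into a byte array, then hand it to JAI's TIFF codec (class names are from the JAI codec package; the path argument is whatever your job supplies):

    import java.awt.image.RenderedImage;
    import java.io.ByteArrayOutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import com.sun.media.jai.codec.ByteArraySeekableStream;
    import com.sun.media.jai.codec.ImageCodec;
    import com.sun.media.jai.codec.ImageDecoder;

    public class TiffFromHdfs {
        public static RenderedImage readTiff(FileSystem fs, String path) throws Exception {
            Path inFile = new Path(path);
            // Copy the TIFF bytes out of HDFS; fine for modestly sized images.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            FSDataInputStream in = fs.open(inFile);
            byte[] buffer = new byte[64 * 1024];
            int n;
            while ((n = in.read(buffer)) > 0) {
                bytes.write(buffer, 0, n);
            }
            in.close();
            // Decode the first page with JAI's TIFF codec.
            ImageDecoder dec = ImageCodec.createImageDecoder(
                    "tiff", new ByteArraySeekableStream(bytes.toByteArray()), null);
            return dec.decodeAsRenderedImage(0);
        }
    }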

Re: Problem in 2-node cluster

2008-05-09 Thread Sridhar Raman
Woah! I managed to fix it, but god! It was a bug that was so ... I don't know ... hard to name. The first thing I noticed was that if my conf/slaves file had, > master > slave > I would get "no address associated with name master", and if it was, > slave > master > I would get "no ad

Re: [Reduce task stalls] Problem Detailed Report

2008-05-09 Thread Amar Kamat
From the logs it looks like the reducer is able to fetch the data from the slave on the master node ('cse' machine) but is not able to fetch it from the other node ('mtech' machine here). The 16% shown in the reducer is fetched from the local machine. It seems like the jetty on the 'mtech' mach

upgrade from 0.15.0 to 0.16?

2008-05-09 Thread adam
Hi, guys, I have a cluster running 0.15.0 and a lot of data on HDFS. If I just update hadoop to 0.16.x, would the existing data be affected? Xiance

Re: single node Hbase

2008-05-09 Thread Shiraz Memon
Hi, I have installed and started hbase using the getting-started guide at http://hadoop.apache.org/hbase/docs/r0.1.1/api/overview-summary.html#overview_description. I successfully installed and started the hbase server, and furthermore I can perform basic operations on the table(s) via hbase-shell, but I am

RE: [Reduce task stalls] Problem Detailed Report

2008-05-09 Thread Amit Kumar Singh
Hi, I checked it; thanks for the reply, Mohan. I tried them but not much progress. 1) No such errors in the logs. 2) /etc/hosts file is fine (only 1 master and 1 slave). 3) fsck output: everything's clean. It seems that the output file of a mapper is not available to the reducer (in 0.16.3) and hence all the maps

Problem in 2-node cluster

2008-05-09 Thread Sridhar Raman
This is the setup I have: 2 machines - master, slave. master is both a namenode and a datanode. slave is just a datanode. I have configured the conf/hadoop-site.xml files on both machines so that fs.default.name = hdfs://master:54310, and mapred.job.tracker = master:54311. A
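For anyone reproducing this setup, the hadoop-site.xml entries Sridhar describes would look like this (values taken from his message):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:54310</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>master:54311</value>
    </property>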