Splitting input file - increasing number of mappers

2013-07-06 Thread parnab kumar
Hi, I have an input file where each line is of the form: URL A NUMBER. URLs whose numbers are within a threshold are considered similar. My task is to group together all similar URLs. For this I wrote a *custom writable* where I implemented the threshold check in the
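
As a rough sketch of the kind of custom writable being described (the class name, field layout and threshold value are all invented for illustration, nothing here comes from the thread), the key might look like this against Hadoop's old API:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;

    public class UrlScoreWritable implements WritableComparable<UrlScoreWritable> {
        // Example threshold; the thread does not state an actual value.
        private static final long THRESHOLD = 5;

        private Text url = new Text();
        private long number;

        public void set(String u, long n) { url.set(u); number = n; }

        @Override
        public void write(DataOutput out) throws IOException {
            url.write(out);
            out.writeLong(number);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            url.readFields(in);
            number = in.readLong();
        }

        // Keys whose numbers differ by at most THRESHOLD compare as equal, so the
        // framework delivers their URLs to the same reduce() call.
        @Override
        public int compareTo(UrlScoreWritable other) {
            long diff = number - other.number;
            if (Math.abs(diff) <= THRESHOLD) {
                return 0;
            }
            return diff < 0 ? -1 : 1;
        }
    }

Note that threshold-based equality is not transitive, so a scheme like this only groups reliably if one reducer (or a matching partitioner) sees all candidate keys.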

Re: New to hadoop - which version to start with ?

2013-07-06 Thread sudhir543-...@yahoo.com
That did not work, the same error comes up. So I tried bin/hadoop fs -chmod -R 755 /tmp/hadoop-Sudhir/mapred/staging/sudhir-1664368101/.staging as well. However, the next run would give 13/07/06 14:09:46 ERROR security.UserGroupInformation: PriviledgedActionException as:Sudhir

Re: New to hadoop - which version to start with ?

2013-07-06 Thread Azuryy Yu
Oh, this is a fair scheduler issue. You need a patch for that; I will send it to you later. Or can you use another scheduler instead of FS? On Jul 6, 2013 4:42 PM, sudhir543-...@yahoo.com wrote: That did not work, the same error comes up. So I tried bin/hadoop fs -chmod

New to hadoop - which version to start with ?

2013-07-06 Thread Sudhir N
I am new to Hadoop, just started reading 'Hadoop: The Definitive Guide'. I downloaded Hadoop 1.1.2 and tried to run a sample MapReduce job using Cygwin, but I got the following error: java.io.IOException: Failed to set permissions of path:

Using CapacityScheduler to divide resources between jobs (not users)

2013-07-06 Thread Amit Sela
Hi all, I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). The jobs running on the cluster can be divided (resource wise) as follows:

Re: Using CapacityScheduler to divide resources between jobs (not users)

2013-07-06 Thread Amit Sela
Sorry, Gmail tab error, please disregard and I will re-send, Thanks. On Sat, Jul 6, 2013 at 5:02 PM, Amit Sela am...@infolinks.com wrote: Hi all, I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). The jobs running on the cluster can be divided (resource wise) as follows:

Using CapacityScheduler to divide resources between jobs (not users)

2013-07-06 Thread Amit Sela
Hi all, I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). The jobs running on the cluster can be divided (resource wise) as follows: 1. Very short jobs: less than 1 minute. 2. Normal jobs: 2-3 minutes up to an hour or two. 3. Very long jobs: days of processing. (still not active and
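
For context, with the CapacityScheduler on Hadoop 1.x the usual way to divide resources between kinds of jobs (rather than users) is to give each kind its own queue and route jobs to it at submission time. A minimal, hypothetical sketch (the queue name is invented, and the queue itself would still need to be declared in mapred.queue.names and given a capacity in capacity-scheduler.xml; nothing here reflects Amit's actual setup):

    import org.apache.hadoop.mapred.JobConf;

    public class ShortJobSubmitter {
        /**
         * Routes a job to a hypothetical "shortjobs" queue. The queue's share of the
         * cluster would be set via mapred.capacity-scheduler.queue.shortjobs.capacity.
         */
        public static JobConf configureForShortQueue(JobConf conf) {
            conf.setQueueName("shortjobs"); // same as setting mapred.job.queue.name
            return conf;
        }
    }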

Re: New to hadoop - which version to start with ?

2013-07-06 Thread sudhir543-...@yahoo.com
Using HDFS instead of the local file system would fix the issue? Sudhir From: Azuryy Yu azury...@gmail.com To: user@hadoop.apache.org Sent: Saturday, 6 July 2013 2:15 PM Subject: Re: New to hadoop - which version to start with ? oh, this is fair

Re: hadoop datanode capacity issue

2013-07-06 Thread Ian Wrigley
You're correct: each directory is assumed to be a different storage device. There's really no reason to specify two directories on the same physical disk in dfs.data.dir -- just use one directory. Ian. On Jul 6, 2013, at 9:18 AM, Amit Anand aan...@aquratesolutions.com wrote: Hi All, I

Re: Splitting input file - increasing number of mappers

2013-07-06 Thread Zhen Ren
Hi, can you paste your code here? In addition, Hadoop: The Definitive Guide, 2nd edition, introduces a tool named 'HPROF' to analyze the performance of your mapper etc. on page 161. Hope this helps! On 6 July 2013 15:50, parnab kumar parnab.2...@gmail.com wrote: Hi, I have an input file where each
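
For reference, the HPROF hook the book describes is switched on through job configuration. A minimal sketch against the Hadoop 1.x JobConf API (the agent options and task ranges below are illustrative values, not something taken from this thread):

    import org.apache.hadoop.mapred.JobConf;

    public class ProfilingSetup {
        /** Enables HPROF sampling on a handful of map and reduce tasks. */
        public static void enableHprof(JobConf conf) {
            conf.setProfileEnabled(true);
            // HPROF agent options; %s is replaced with the profile output file name.
            conf.setProfileParams(
                "-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");
            conf.setProfileTaskRange(true, "0-2");   // profile map tasks 0-2 only
            conf.setProfileTaskRange(false, "0-2");  // profile reduce tasks 0-2 only
        }
    }

Hadoop retrieves the profile output to the machine the job was launched from, and limiting the task range keeps the profiling overhead small.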

Re: Splitting input file - increasing number of mappers

2013-07-06 Thread Shumin Guo
You also need to pay attention to the split boundary, because you don't want one line split across different mappers. Maybe you can think about a multi-line input format. Simon. On Jul 6, 2013 10:18 AM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com wrote: More mappers will make it
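
One way to control how many whole lines each mapper receives, along the lines suggested above, is the N-line input format that ships with Hadoop. A minimal old-API sketch (the lines-per-map value is an arbitrary example, not tuned for the poster's data):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class NLineSetup {
        /** Feeds each mapper a fixed number of complete input lines. */
        public static void useNLineSplits(JobConf conf) {
            conf.setInputFormat(NLineInputFormat.class);
            // Each split (and therefore each map task) gets 1000 whole lines.
            conf.setInt("mapred.line.input.format.linespermap", 1000);
        }
    }

With either TextInputFormat or NLineInputFormat the record reader delivers only whole lines to a mapper; what NLineInputFormat changes is how many lines, and therefore how many map tasks, each split carries.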

Data node EPERM not permitted.

2013-07-06 Thread Jay Vyas
Hi: I've mounted my own ext4 disk on /mnt/sdb and chmodded it to 777. However, when starting the data node with /etc/init.d/hadoop-hdfs-datanode start, I get the following error in my logs (bottom of this message). What is the EPERM

hadoop mahout

2013-07-06 Thread Manal Helal
Hi, I installed Hadoop 1.1.2 on 4 nodes (one is CentOS and the rest are Ubuntu) and I successfully did the word count example. Then I installed Mahout, compiled it, and defined the environment variable MAHOUT_HOME, then the Hadoop in Practice repository (

Re: hadoop mahout

2013-07-06 Thread Ravi Mutyala
The code samples use Hadoop 0.20.2 as I see it in pom.xml, so there is a version mismatch between the client libs and the Hadoop cluster. See if you can change the dependencies to 1.1.2 and whether it builds. On Jul 6, 2013, at 3:22 PM, Manal Helal wrote: Hi, I installed hadoop 1.1.2 on 4

Re: data loss after cluster wide power loss

2013-07-06 Thread Dave Latham
On Wed, Jul 3, 2013 at 7:03 AM, Michael Segel michael_se...@hotmail.com wrote: How did you lose power to the entire cluster? I realize that this question goes beyond HBase, but it is an Ops question. Do you have redundant power sources and redundant power supplies to the racks and machines in

Re: data loss after cluster wide power loss

2013-07-06 Thread Dave Latham
Thanks for the detailed information Michael, Colin, Suresh, Kiwhal. It looks like we're on CentOS 5.7 (kernel 2.6.18-274.el5) so if the fix was included in 5.4 then it sounds like we should have it. I don't believe we're using LVM. It sounds like HDFS could improve handling of this scenario by