Re: reducer gets values with empty attributes

2013-04-29 Thread Mahesh Balija
Hi Alex, Can you please attach your code and the sample input data? Best, Mahesh Balija, Calsoft Labs. On Tue, Apr 30, 2013 at 2:29 AM, wrote: > > Hello, > > I am trying to write a MapReduce program in Hadoop 1.0.4 using the mapred libs. I have > a map function which gets > > keys and cr

Re: Incompatible clusterIDs

2013-04-29 Thread Kevin Burton
"It" is '/'? On Apr 29, 2013, at 5:09 PM, Mohammad Tariq wrote: > make it 755. > > Warm Regards, > Tariq > https://mtariq.jux.com/ > cloudfront.blogspot.com > > > On Tue, Apr 30, 2013 at 3:30 AM, Kevin Burton > wrote: >> Thank you the HDFS system seems to be up. Now I am having a problem wi

Re: Incompatible clusterIDs

2013-04-29 Thread Mohammad Tariq
make it 755. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Tue, Apr 30, 2013 at 3:30 AM, Kevin Burton wrote: > Thank you the HDFS system seems to be up. Now I am having a problem with > getting the JobTracker and TaskTracker up. According to the logs on the > JobTracker

Re: Incompatible clusterIDs

2013-04-29 Thread Kevin Burton
Thank you, the HDFS system seems to be up. Now I am having a problem with getting the JobTracker and TaskTracker up. According to the logs on the JobTracker, mapred doesn't have write permission to /. I am not clear on what the permissions should be. Anyway, thank you. On Apr 29, 2013, at 4:30

Re: Incompatible clusterIDs

2013-04-29 Thread Mohammad Tariq
Hello Kevin, Have you reformatted the NN (unsuccessfully)? Was your NN serving some other cluster earlier, or were your DNs part of some other cluster? Datanodes bind themselves to the namenode through the namespaceID, and in your case the IDs of the DNs and NN seem to be different. As a workaround you c

Re: Hardware Selection for Hadoop

2013-04-29 Thread Mohammad Tariq
If I were to start with a 5 node cluster, I would do this: Machine 1: NN+JT - 32GB RAM, 2x Quad Core Proc, 500GB SATA HDD along with a NAS (to make sure metadata is safe). Machine 2: SNN - 32GB RAM, 2x Quad Core Proc, 500GB SATA HDD. Machines 3,4,5: DN+TT - 16GB RAM, 2x Quad Core Proc, 5 x 200GB

Incompatible clusterIDs

2013-04-29 Thread rkevinburton
I am trying to start up a cluster and in the datanode log on the NameNode server I get the error: 2013-04-29 15:50:20,988 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/hadoop/dfs/data/in_use.lock acquired by nodename 1406@devUbuntu05 2013-04-29 15:50:20,990 FATAL org.apac

reducer gets values with empty attributes

2013-04-29 Thread alxsss
Hello, I am trying to write a MapReduce program in Hadoop 1.0.4 using the mapred libs. I have a map function which gets keys and creates a different object with a few attributes (id, etc.) and passes it to the reducer function using output.collect(key, value); The reducer gets the keys, but the values have empt
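A common cause of this symptom (an assumption here, since the original code isn't attached) is a custom value class whose write()/readFields() methods don't serialize all of its attributes: Hadoop serializes map output between output.collect() and reduce(), so any field omitted there arrives at the reducer with its default, "empty" value. A minimal, framework-free sketch of that round trip, with hypothetical class and field names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical value class mimicking a Writable; the names are illustrative.
class RecordValue {
    String id = "";
    int count = 0;

    // Both fields must be written here. Omitting one reproduces the
    // "empty attributes" symptom: the reducer-side copy keeps the default.
    void write(DataOutput out) throws IOException {
        out.writeUTF(id);
        out.writeInt(count);
    }

    void readFields(DataInput in) throws IOException {
        id = in.readUTF();
        count = in.readInt();
    }
}

public class WritableRoundTrip {
    public static void main(String[] args) throws IOException {
        RecordValue v = new RecordValue();
        v.id = "doc-42";
        v.count = 7;

        // Simulate what the framework does between map and reduce:
        // serialize, then deserialize into a fresh object.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        v.write(new DataOutputStream(buf));

        RecordValue copy = new RecordValue();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));

        System.out.println(copy.id + " " + copy.count); // prints "doc-42 7"
    }
}
```

If the value class forgets these serialization methods entirely, the attributes likewise arrive empty; attaching the code, as Mahesh asks, would confirm which case applies here.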

Permissions

2013-04-29 Thread rkevinburton
I look in the name node log and I get the following errors: 2013-04-29 15:25:11,646 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE,

Re: Hardware Selection for Hadoop

2013-04-29 Thread Raj Hadoop
Hi, In a 5 node cluster - you mean Name Node, Job Tracker, Secondary Name Node all on 1 machine with 64 GB RAM (Processor - 2 x Quad core Intel, Storage - ?); Data Nodes and Task Trackers - on 4 machines - each with 32 GB RAM (Processor - 2 x Quad core Intel, Storage - ?); NIC?

Re: Warnings?

2013-04-29 Thread Harsh J
The env-var is auto-created by the "hadoop" script for you when you invoke "hadoop jar". You do not necessarily have to manually set it, nor do you have to compile the native libs if what you're using is pre-built for your OS. On Tue, Apr 30, 2013 at 12:52 AM, wrote: > I don't have this environm

Gap in logs?

2013-04-29 Thread rkevinburton
I see a startup error in the /var/log/hadoop-hdfs/hadoop-hdfs-namenode-.log 2013-04-29 14:12:36,095 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join java.io.IOException: There appears to be a gap in the edit log. We expected txid 1, but got txid 2103.

Re: Warnings?

2013-04-29 Thread Omkar Joshi
Hi, did you check for the "libhadoop" binary in your Ubuntu installation? It is present in my installation (I used the Apache installation) at a relative path of "hadoop-common-project/hadoop-common/target/native/target/usr/local/lib". If present, add it to your LD_LIBRARY_PATH. If not present then

Re: Hardware Selection for Hadoop

2013-04-29 Thread Ted Dunning
I think that having more than 6 drives is better. More memory never hurts. If you have too little, you may have to run with fewer slots than optimal. 10GbE networking is good. If not, having more than 2 1GbE ports is good, at least on distributions that can deal with them properly. On Mon, Apr

Re: Hardware Selection for Hadoop

2013-04-29 Thread Patai Sangbutsarakum
2 x Quad core Intel, 2-3 TB x 6 SATA, 64GB mem, 2 NICs teaming - my 2 cents. On Apr 29, 2013, at 9:24 AM, Raj Hadoop wrote: Hi, I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw

Re: Warnings?

2013-04-29 Thread Kevin Burton
If it doesn't work what are my options? Is there source that I can download and compile? On Apr 29, 2013, at 10:31 AM, Ted Xu wrote: > Hi Kevin, > > Native libraries are those implemented using C/C++, which only provide code > level portability (instead of binary level portability, as Java do

Re: M/R job to a cluster?

2013-04-29 Thread Harsh J
To validate if your jobs are running locally, look for the classname "LocalJobRunner" in the runtime output. Configs are sourced either from the classpath (if a dir or jar on the classpath has the XMLs at their root, they're read), or via the code (conf.set("mapred.job.tracker", "foo:349");) or al

Re: M/R job to a cluster?

2013-04-29 Thread Michel Segel
This is one of the reasons we set up edge nodes in the cluster. This is a node where Hadoop is loaded yet none of the Hadoop services are running. This allows jobs to automatically pick up the right Hadoop configuration from the node and point to the right cluster. The edge nodes are used for

Re: Relationship between HDFS_BYTES_READ and Map input bytes

2013-04-29 Thread Vinod Kumar Vavilapalli
They can be different if maps read HDFS files directly instead of or on top of getting key-val pairs via the map interface. HDFS_BYTES_READ will always be greater than or equal to map-input-bytes. Thanks, +Vinod On Apr 29, 2013, at 1:50 AM, Pralabh Kumar wrote: > Hi > > What's the relationsh

Re: Hardware Selection for Hadoop

2013-04-29 Thread Marcos Luis Ortiz Valmaseda
Regards, Raj. Knowing the data that you want to process with Hadoop is critical for this, at least an approximation of the data. I think that Hadoop Operations is an invaluable resource for this: - Hadoop uses RAM heavily, so the first resource that you have to consider is to use all available RA

Hardware Selection for Hadoop

2013-04-29 Thread Raj Hadoop
Hi, I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw the Cloudera website. But I just wanted to know from the group - what the requirements are if I have to plan for a 5 node cluster. I don't know at this time, t

Add jars to worker classpaths

2013-04-29 Thread Mark
What's the best way to add a number of jars to the workers' classpath? Preferably only adding something to one of the main configuration files (core-site.xml, mapred-site.xml), since we don't really want to mess with any of the startup scripts. Thanks

Re: M/R job optimization

2013-04-29 Thread Ted Xu
Hi Han, I think your point is valid. In fact you can change the progress report logic by manually calling the Reporter API, but by default it is quite straightforward. Reducer progress is divided into 3 phases, namely the copy phase, merge/sort phase and reduce phase, each with ~33%. In your case it
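Ted's three-phase breakdown can be expressed as simple arithmetic. The sketch below mirrors the weighting he describes (each phase contributing roughly a third of the reported percentage); it is a model of the idea, not Hadoop's actual internal code:

```java
// Approximate model of reduce-task progress: copy (shuffle), merge/sort,
// and reduce each contribute ~1/3 of the reported figure. Each argument is
// that phase's completion fraction in [0, 1].
public class ReduceProgress {
    static double overall(double copy, double sort, double reduce) {
        return (copy + sort + reduce) / 3.0;
    }

    public static void main(String[] args) {
        // Copy and sort finished, reduce() halfway through: ~83%. This is
        // also why a job can sit near 66% while reduce() has barely begun.
        System.out.println(overall(1.0, 1.0, 0.5));
    }
}
```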

Re: Warnings?

2013-04-29 Thread Ted Xu
Hi Kevin, Native libraries are those implemented in C/C++, which only provide code level portability (instead of binary level portability, as Java does). That is to say, the binaries provided by the CDH4 distribution will in most cases be broken in your environment. To check if your native libraries

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
Yes, this indeed seems to be the case. After running java -version and seeing 1.5 it rang a bell, because all our servers (as far as I knew) were 1.6 or above. So I never thought that this would be any issue!! But boy was I wrong, and it indeed turned out to be something so obvious. Thanks guys for yo

Re: M/R job optimization

2013-04-29 Thread Han JU
Thanks Ted and .. Ted .. I've been looking at the progress while the job is executing. In fact, I think it's not a skewed partition problem. I've looked at the mapper output files, all are of the same size, and each reducer takes a single group. What I want to know is how the Hadoop M/R framewor

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Harsh J
Well… Bingo! :) We don't write our projects for 1.5 JVMs, and especially not the GCJ (1.5 didn't have annotations either IIRC? We depend on that here). Try with a Sun/Oracle/OpenJDK 1.6 or higher and your problem is solved. On Mon, Apr 29, 2013 at 8:24 PM, Shahab Yunus wrote: > The output of "ja

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
The output of "java -version" is: java -version java version "1.5.0" gij (GNU libgcj) version 4.4.6 20120305 (Red Hat 4.4.6-4) Copyright (C) 2007 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FIT

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Harsh J
This is rather odd and am unable to reproduce this across several versions. It may even be something to do with all that static loading done in the VersionInfo class but am unsure at the moment. What does "java -version" print for you? On Mon, Apr 29, 2013 at 8:12 PM, Shahab Yunus wrote: > Okay,

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
Okay, I think I know what you mean. Those were back ticks! So I tried the following: java -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo and I still get: 13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown 13/04/29 09:40:31 INFO util.VersionInfo: Subversion Unknown -r Unknow

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
Ted, Sorry I didn't understand. What do you mean exactly by "specifying `hbase classpath`"? You mean declare an environment variable 'HBASE_CLASSPATH'? Regards, Shahab On Mon, Apr 29, 2013 at 10:31 AM, Ted Yu wrote: > bq. 'java -cp /usr/lib/hbase/hbase... > > Instead of hard coding class path

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Ted Yu
bq. 'java -cp /usr/lib/hbase/hbase... Instead of hard coding class path, can you try specifying `hbase classpath` ? Cheers On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus wrote: > Hello, > > This might be something very obvious that I am missing but this has been > bugging me and I am unable to

VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
Hello, This might be something very obvious that I am missing, but this has been bugging me and I am unable to find what I am missing. I have Hadoop and HBase installed on a Linux machine, versions 2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working and I can invoke the hbase shell and hado

Re: Multiple ways to write Hadoop program driver - Which one to choose?

2013-04-29 Thread Jens Scheidtmann
Dear Chandrash3khar K0tekar, Using the run() method implies implementing Tool and using ToolRunner. This gives the additional benefit that some "standard" Hadoop command line options are available. See here: http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera
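The pattern Jens refers to can be sketched without Hadoop on the classpath. This simplified stand-in (the Tool interface and MiniToolRunner below are illustrative, not Hadoop's org.apache.hadoop.util classes) shows why ToolRunner-style drivers get "standard" options such as -D for free:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-in for org.apache.hadoop.util.Tool.
interface Tool {
    int run(String[] args) throws Exception;
}

// Minimal stand-in for ToolRunner: peel off "-D key=value" pairs into a
// configuration map, then hand the remaining args to the tool.
class MiniToolRunner {
    static int run(Tool tool, String[] args, Map<String, String> conf) throws Exception {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                remaining.add(args[i]);
            }
        }
        return tool.run(remaining.toArray(new String[0]));
    }
}

public class DriverDemo implements Tool {
    public int run(String[] args) {
        // A real driver would configure and submit the job here; this one
        // just checks it got its input and output paths.
        return args.length == 2 ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        int rc = MiniToolRunner.run(new DriverDemo(),
                new String[] {"-D", "mapred.reduce.tasks=4", "in", "out"}, conf);
        System.out.println(rc + " " + conf.get("mapred.reduce.tasks")); // prints "0 4"
    }
}
```

Hadoop's real ToolRunner goes further: via GenericOptionsParser it also parses -files, -libjars and -archives into the job Configuration, which is why drivers written this way accept those flags without any extra code.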

Relationship between HDFS_BYTES_READ and Map input bytes

2013-04-29 Thread Pralabh Kumar
Hi, What's the relationship between the HDFS_BYTES_READ and Map input bytes counters? Why can they be different for a particular MR job? Thanks and Regards, Pralabh Kumar