Hadoop Certification

2016-09-04 Thread X Dev KG
Hi, I'm a newbie on Hadoop and I want to prepare for a certification. Which is the best one: the Hortonworks or the Cloudera certification? Thank you in advance.

Re: What's the best way to do Outer join and Inner join of two SequentialTextFiles using Hadoop streaming and Python ?

2016-01-23 Thread Rex X
Googled, but did not find any sample code. On Fri, Jan 22, 2016 at 9:50 AM, Rex X wrote: > The two SequentialTextFiles correspond to two Hive tables, say tableA and > tableB below on > > hdfs://hive/tableA//MM/DD/*/part-0 > and > hdfs://hive/tableB//

What's the best way to do Outer join and Inner join of two SequentialTextFiles using Hadoop streaming and Python ?

2016-01-22 Thread Rex X
The two SequentialTextFiles correspond to two Hive tables, say tableA and tableB below on hdfs://hive/tableA//MM/DD/*/part-0 and hdfs://hive/tableB//MM/DD/*/part-0 Both of them are partitioned by date, for example, hdfs://hive/tableA/2016/01/01/*/part-0 Now we wa
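
A common way to approach this with Hadoop streaming is a reduce-side join: the mapper prefixes each record with its join key and a tag naming the source table, and the reducer, which receives records sorted by key, pairs up the two sides. The sketch below shows only the reducer logic as a plain Python function. The tags "A" and "B", the function name join_sorted and the (key, tag, value) record shape are assumptions made for illustration, not something taken from the thread.

```python
from itertools import groupby, product
from operator import itemgetter


def join_sorted(records, how="inner"):
    """Join tagged records that are already sorted by key.

    records: iterable of (key, tag, value) tuples, tag being "A" or "B",
             sorted by key (the order a streaming reducer sees after the shuffle).
    how:     "inner" keeps only keys present on both sides;
             "outer" keeps every key and fills the missing side with None.
    """
    for key, group in groupby(records, key=itemgetter(0)):
        left, right = [], []
        for _, tag, value in group:
            (left if tag == "A" else right).append(value)
        if how == "outer":
            # Full outer join: a missing side becomes a single None.
            left = left or [None]
            right = right or [None]
        # With an empty side, product() yields nothing, which is the inner join.
        for a, b in product(left, right):
            yield key, a, b
```

In a real streaming job the reducer would parse these tuples from tab-separated lines on standard input and print the joined rows. All values for one key are buffered in memory here, so a heavily skewed key would need a different strategy.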

Re: What is the best way to locate the offset and length of all fields in a Hadoop sequential text file?

2016-01-22 Thread Rex X
have any information about your data. > > I don't think we can help you with this. Also, I cannot understand what > you are trying to achieve. Please also tell us why you are using hadoop > streaming instead of hive to do your operations. > > Regards, > LLoyd > > O

What is the best way to locate the offset and length of all fields in a Hadoop sequential text file?

2016-01-21 Thread Rex X
The given sequential files correspond to an external Hive table. They are stored in /tableName/part-0 /tableName/part-1 ... There are about 2000 attributes in the table. Now I want to process the data using Hadoop streaming and MapReduce. The first step is to find the offset and length fo
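
If the files turn out to be delimited text, the offset and length of every field in a line can be computed in a single pass. The sketch below assumes Hive's default field delimiter (Ctrl-A, "\x01") and one row per line. The thread does not state the actual storage format, and the name field_spans is hypothetical.

```python
def field_spans(line, delimiter="\x01"):
    """Return a list of (offset, length) pairs, one per field in the line.

    Offsets and lengths are counted in characters of the decoded string,
    not in bytes of the underlying file.
    """
    spans = []
    offset = 0
    for field in line.rstrip("\n").split(delimiter):
        spans.append((offset, len(field)))
        # Skip past this field and the delimiter that follows it.
        offset += len(field) + len(delimiter)
    return spans
```

A mapper could call field_spans on each input line and then slice out only the attributes it needs, which avoids re-splitting a row of roughly 2000 fields more than once.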

Re: Hadoop Streaming: How to partition output into subfolders?

2016-01-21 Thread Rex X
Hi Camusensei, Thank you. That's very helpful! Rex On Thu, Jan 21, 2016 at 1:41 AM, Namikaze Minato wrote: > Hi Rex X, > > We are using the -outputFormat option of hadoop-streaming. > Here is the detail: http://www.infoq.com/articles/HadoopOutputFormat > > Regards,

Re: Hadoop Streaming: How to partition output into subfolders?

2016-01-20 Thread Rex X
> > . > > Regards > Rohit Sarewar > > > On Thu, Jan 21, 2016 at 5:13 AM, Rex X wrote: > >> Dear all, >> >> To be specific, for example, given >> >> hadoop jar hadoop-streaming.jar \ >> -input myInputDirs \ >> -output

Hadoop Streaming: How to partition output into subfolders?

2016-01-20 Thread Rex X
Dear all, To be specific, for example, given hadoop jar hadoop-streaming.jar \ -input myInputDirs \ -output myOutputDir \ -mapper /bin/cat \ -reducer /usr/bin/wc Where myInputDirs has a *dated* subfolder structure of /input_dir//mm/dd/part-* I want myOutp
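
One piece of this can be handled in the mapper itself: deriving the dated subfolder from the path of the file being read and prefixing it to each record, so that a later stage can route on it. The sketch below assumes a year/month/day directory layout written as four, two and two digits. The names DATE_IN_PATH, date_subfolder and tag_line are hypothetical, and the "unknown" fallback is an arbitrary choice.

```python
import re

# Assumed layout: .../<4-digit year>/<2-digit month>/<2-digit day>/...
DATE_IN_PATH = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def date_subfolder(path):
    """Return 'yyyy/mm/dd' extracted from an input path, or None if absent."""
    match = DATE_IN_PATH.search(path)
    return "/".join(match.groups()) if match else None


def tag_line(line, path):
    """Prefix a record with its dated subfolder so it can be routed later."""
    subfolder = date_subfolder(path) or "unknown"
    return f"{subfolder}\t{line}"
```

To my understanding, Hadoop streaming exposes job configuration values to the mapper as environment variables with dots replaced by underscores, so the current input path is usually available as mapreduce_map_input_file (map_input_file on older releases). Writing each prefix into its own output subfolder still requires an output format class passed through the -outputFormat option; this sketch only prepares the key.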

RE: SSH passwordless & Hadoop startup/shutdown scripts

2014-11-26 Thread John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco)
Please disregard. Issue resolved. -John From: John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco) Sent: Wednesday, November 26, 2014 9:34 AM To: user@hadoop.apache.org Subject: SSH passwordless & Hadoop startup/shutdown scripts Hello, I had originally configured our

SSH passwordless & Hadoop startup/shutdown scripts

2014-11-26 Thread John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco)
Hello, I had originally configured our dev cluster with passwordless SSH connectivity to the datanodes, but the key had a passphrase. I have since switched to a key with no passphrase, copied the new public key to all datanodes, updated their known_hosts files, and tested SSH with no passphrase from the nam

hdfs over http error

2014-06-12 Thread John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco)
Hello, Apache Hadoop 0.20.203.0 A colleague is using a Spark shell on a remote host, over the HDFS protocol, to run a job on our Hadoop cluster, but the job errors out before finishing, with the following noted in the namenode log. 2014-06-11 16:13:24,958 WARN org.apache.hadoop.se

[no subject]

2014-02-20 Thread x

FW: start-dfs.sh requesting password for user used to start daemon

2012-11-08 Thread John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco)
Hi All, Never mind. Found the error of my ways. I did not have SSH keys set up for localhost. Thanks -John From: John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco) Sent: Thursday, November 08, 2012 10:01 AM To: 'user@hadoop.apache.org' Subject: start-dfs.sh requestin

start-dfs.sh requesting password for user used to start daemon

2012-11-08 Thread John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco)
Hello, Apache Hadoop 0.20.203.0 (tarball) Java HotSpot (build 1.6.0_21-b07) I have a 4 datanode cluster sandbox I'm trying to start up, but when I initiate start-dfs.sh as the local user I created, and after the namenode and all the datanodes start, the output stops and asks for the password f