Hadoop Certification

2016-09-04 Thread X Dev KG
Hi, I'm a newbie to Hadoop and I want to prepare for a certification. Which is the better one, the Hortonworks or the Cloudera certification? Thank you in advance.

Re: What's the best way to do Outer join and Inner join of two SequentialTextFiles using Hadoop streaming and Python ?

2016-01-23 Thread Rex X
Googled, but did not find any sample code. On Fri, Jan 22, 2016 at 9:50 AM, Rex X <dnsr...@gmail.com> wrote: > The two SequentialTextFiles correspond to two Hive tables, say tableA and > tableB below on > > hdfs://hive/tableA//MM/DD/*/part-0 > and > hdfs:

Re: What is the best way to locate the offset and length of all fields in a Hadoop sequential text file?

2016-01-22 Thread Rex X
> > On 22 January 2016 at 06:30, Rex X <dnsr...@gmail.com> wrote: > >> The given sequential files correspond to an external Hive table. >> >> They are stored in >> /tableName/part-0 >> /tableName/part-1 >> ... >> >> There are

What's the best way to do Outer join and Inner join of two SequentialTextFiles using Hadoop streaming and Python ?

2016-01-22 Thread Rex X
The two SequentialTextFiles correspond to two Hive tables, say tableA and tableB below on hdfs://hive/tableA//MM/DD/*/part-0 and hdfs://hive/tableB//MM/DD/*/part-0 Both of them are partitioned by date, for example, hdfs://hive/tableA/2016/01/01/*/part-0 Now we
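One common way to approach this with Hadoop streaming is a reduce-side join. The following is only a minimal sketch under stated assumptions, not the approach settled on in the thread: it assumes a mapper (not shown) has already emitted tab-separated lines of the form `key<TAB>tag<TAB>value`, where the hypothetical tag `A` marks rows from tableA and `B` marks rows from tableB, and that the lines reach the reducer sorted by key. The names `reduce_join`, `parse`, and `main` are illustrative.

```python
import sys
from itertools import groupby


def parse(line):
    """Split one reducer input line into (key, tag, value)."""
    key, tag, value = line.rstrip("\n").split("\t", 2)
    return key, tag, value


def reduce_join(lines, how="inner", missing="NULL"):
    """Join tagged records that share a key.

    Assumes `lines` are sorted by key, so all records for one key
    are adjacent. `how` is "inner" or "outer" (full outer join).
    """
    records = (parse(line) for line in lines if line.strip())
    for key, group in groupby(records, key=lambda r: r[0]):
        left, right = [], []
        for _, tag, value in group:
            # Tag "A" is the assumed marker for tableA; anything else is tableB.
            (left if tag == "A" else right).append(value)
        if how == "outer":
            # Pad the empty side so unmatched rows are still emitted.
            left = left or [missing]
            right = right or [missing]
        # For an inner join, an empty side makes this loop emit nothing.
        for a in left:
            for b in right:
                yield "\t".join((key, a, b))


def main():
    """Entry point for a reducer script: join type is the first argument."""
    how = sys.argv[1] if len(sys.argv) > 1 else "inner"
    for row in reduce_join(sys.stdin, how):
        print(row)
```

Saved as a reducer script with a call to `main()` at the bottom, this could be passed to `-reducer`. It buffers all values for one key in memory, which is a simplification that may not suit heavily skewed keys.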

Re: Hadoop Streaming: How to partition output into subfolders?

2016-01-21 Thread Rex X
Hi Camusensei, Thank you. That's very helpful! Rex On Thu, Jan 21, 2016 at 1:41 AM, Namikaze Minato <lloydsen...@gmail.com> wrote: > Hi Rex X, > > We are using the -outputFormat option of hadoop-streaming. > Here is the detail: http://www.infoq.com/articles/HadoopOutput

What is the best way to locate the offset and length of all fields in a Hadoop sequential text file?

2016-01-21 Thread Rex X
The given sequential files correspond to an external Hive table. They are stored in /tableName/part-0 /tableName/part-1 ... There are about 2000 attributes in the table. Now I want to process the data using Hadoop streaming and MapReduce. The first step is to find the offset and length
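If the files are delimited text, the offset and length of each field in a line can be computed in a single pass. This is only a sketch under an assumption the message does not confirm: that fields are separated by Hive's default text delimiter, Ctrl-A (`\x01`). The function name `field_spans` is illustrative, and offsets are counted in characters, not bytes.

```python
def field_spans(line, delimiter="\x01"):
    """Return a list of (offset, length) pairs, one per field in `line`.

    Offsets are character positions from the start of the line.
    The default delimiter is an assumption (Hive's default Ctrl-A).
    """
    spans = []
    offset = 0
    for field in line.rstrip("\n").split(delimiter):
        spans.append((offset, len(field)))
        # Skip past this field and the delimiter that follows it.
        offset += len(field) + len(delimiter)
    return spans
```

For byte offsets, the same logic could be applied to `bytes` input with a `bytes` delimiter and a `b"\n"` strip.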

Re: Hadoop Streaming: How to partition output into subfolders?

2016-01-20 Thread Rex X
/mapreduce/lib/output/MultipleOutputs.html> > . > > Regards > Rohit Sarewar > > > On Thu, Jan 21, 2016 at 5:13 AM, Rex X <dnsr...@gmail.com> wrote: > >> Dear all, >> >> To be specific, for example, given >> >> hadoop jar hadoop-streaming.jar \ >

Hadoop Streaming: How to partition output into subfolders?

2016-01-20 Thread Rex X
Dear all, To be specific, for example, given hadoop jar hadoop-streaming.jar \ -input myInputDirs \ -output myOutputDir \ -mapper /bin/cat \ -reducer /usr/bin/wc Where myInputDirs has a *dated* subfolder structure of /input_dir//mm/dd/part-* I want
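One piece of this can be handled in a streaming mapper: recovering the date from the input path and attaching it to each record, so that a later stage can route records by date. The sketch below covers only that tagging step and makes several assumptions: that the input path contains a `/YYYY/MM/DD/` segment, and that Hadoop streaming exposes the current input file through the environment variable `mapreduce_map_input_file` (or `map_input_file` on older releases). Actually writing into subfolders still needs a suitable output format on the Java side. The names `date_prefix`, `tag_lines`, and `current_input_path` are illustrative.

```python
import os
import re

# Assumed layout: .../YYYY/MM/DD/part-*
DATE_IN_PATH = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def date_prefix(path):
    """Return 'YYYY/MM/DD' extracted from `path`, or None if absent."""
    match = DATE_IN_PATH.search(path)
    return "/".join(match.groups()) if match else None


def current_input_path():
    """Input file of the running map task, as exported by streaming.

    Both variable names are assumptions about the Hadoop version in use.
    """
    return (os.environ.get("mapreduce_map_input_file")
            or os.environ.get("map_input_file", ""))


def tag_lines(lines, path):
    """Prefix each line with the date subfolder derived from `path`."""
    prefix = date_prefix(path) or "unknown"
    for line in lines:
        yield prefix + "\t" + line.rstrip("\n")
```

A mapper script would call `tag_lines(sys.stdin, current_input_path())` and print each result.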

SSH passwordless Hadoop startup/shutdown scripts

2014-11-26 Thread John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco)
Hello, I had originally configured our dev cluster with SSH passwordless connectivity to the datanodes, but the key had a passphrase. I have since updated it to use no passphrase, copied the new public key to all datanodes, updated their known_hosts files, and tested SSH with no passphrase from the

hdfs over http error

2014-06-12 Thread John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco)
Hello, Apache Hadoop 0.20.203.0. A colleague is using a Spark shell on a remote host, over the HDFS protocol, to run a job on our Hadoop cluster, but the job errors out before finishing, with the following noted in the namenode log. 2014-06-11 16:13:24,958 WARN

[no subject]

2014-02-20 Thread x

FW: start-dfs.sh requesting password for user used to start daemon

2012-11-08 Thread John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco)
Hi All, Never mind. Found the error of my ways. I did not have SSH keys set up for localhost. Thanks -John From: John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco) Sent: Thursday, November 08, 2012 10:01 AM To: 'user@hadoop.apache.org' Subject: start-dfs.sh requesting password

Re: installation of Hadoop 0.21

2011-01-25 Thread Jim X
:9101 mentioned in the tutorial. Jim On Tue, Jan 25, 2011 at 12:04 AM, li ping li.j...@gmail.com wrote: The exception java.io.IOException: NameNode is not formatted. indicates you should format the NameNode first. hadoop namenode -format On Tue, Jan 25, 2011 at 12:47 PM, Jim X jim.p

installation of Hadoop 0.21

2011-01-24 Thread Jim X
I am trying to install Hadoop by following the instructions from http://alans.se/blog/2010/hadoop-hbase-cygwin-windows-7-x64/. 1. I cannot open http://localhost:9100 or http://localhost:9101 after I run bin/start-dfs.sh and bin/start-mapred.sh, and no error message is printed. 2. I

java.io.IOException: Error opening job jar

2009-12-26 Thread Purnima Balu -X (pbalu - Linkwex at Cisco)
Can you please send me the solution for this exception? I got the stack trace printed below: at org.apache.hadoop.util.RunJar.main(RunJar.java:90) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

java.io.IOException: Error opening job jar

2009-12-26 Thread Purnima Balu -X (pbalu - Linkwex at Cisco)
Hey, I am trying to run a MapReduce job and I get an exception. Can anyone give me a solution based on the trace below: java.io.IOException: Error opening job jar: jar-on-local-fs at org.apache.hadoop.util.RunJar.main(RunJar.java:91) at