Count lines example

2013-06-05 Thread Pedro Sá da Costa
I am trying to create a MapReduce example that adds the values of identical keys. E.g. the input A 1, A 2, B 4 gives the output A 3, B 4. The problem is that I cannot make the program read 2 inputs. How do I do that? Here is my example: package org.apache.hadoop.examples; import java.io.IOException;

Re: Count lines example

2013-06-05 Thread Pedro Sá da Costa
I made a mistake in my example. Given 2 files with the same content (file 1: A 3, B 4; file 2: A 3, B 4), the job should give the output A 6, B 8. On 5 June 2013 21:08, Pedro Sá da Costa psdc1...@gmail.com wrote: I am trying to create a MapReduce example that adds the values of identical keys. E.g. the
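The aggregation described in this thread can be sketched in plain Java, without the Hadoop API, to make the expected result concrete. This is only an illustration under stated assumptions: the class name KeySumSketch and the method sumByKey are hypothetical, each input "file" is modelled as an in-memory list of "KEY VALUE" lines, and the parsing and summing stand in for what a mapper and reducer would do. In an actual Hadoop job, more than one input path can be registered by calling FileInputFormat.addInputPath once per path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: sums the values of identical keys across several inputs.
class KeySumSketch {

    // Each inner list models the lines of one input file ("KEY VALUE" per line).
    static Map<String, Integer> sumByKey(List<List<String>> inputs) {
        // TreeMap keeps keys sorted, similar to how reducer output is ordered by key.
        Map<String, Integer> totals = new TreeMap<>();
        for (List<String> lines : inputs) {
            for (String line : lines) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank lines
                }
                // "Map" step: split the line into key and value.
                String[] parts = trimmed.split("\\s+");
                // "Reduce" step: add the value to the running total for the key.
                totals.merge(parts[0], Integer.parseInt(parts[1]), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("A 3", "B 4");
        List<String> file2 = Arrays.asList("A 3", "B 4");
        System.out.println(sumByKey(Arrays.asList(file1, file2))); // prints {A=6, B=8}
    }
}
```

The same method applied to a single input containing A 1, A 2, B 4 yields A 3 and B 4, which matches the first example in the thread.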

Re: Recover dfs/name

2013-06-05 Thread Ted Xu
Hi Han, HDFS metadata cannot be fully reconstructed from datanode reports. If you have deployed a checkpoint node/secondary namenode, you can copy its metadata to the namenode and restart. This could recover most of the metadata. On Wed, Jun 5, 2013 at 5:30 PM, Han JU ju.han.fe...@gmail.com wrote:

Re: YARN servers and ports

2013-06-05 Thread Harsh J
If you're asking in terms of discovering where to communicate at, then basically just the RM scheduler address and port (yarn.resourcemanager.scheduler.address). The NodeManager addresses and ports are carried back from the RM to the requesting AM as part of container requests and needn't be in

YARN servers and ports

2013-06-05 Thread John Lilley
What service addresses and ports does a YARN ApplicationMaster need to know about? Thanks, John

Hadoop JARs and Eclipse

2013-06-05 Thread John Lilley
Well, I've failed and given up on building Hadoop in Eclipse. Too many things go wrong with Maven plugins and m2e. But Hadoop builds just fine using the command-line, and it runs using Sandy's development-node instructions. My strategy now is 1) Tell Eclipse about all of the Hadoop JARs

RE: yarn-site.xml and aux-services

2013-06-05 Thread John Lilley
Wow, thanks. Is this documented anywhere other than the code? I hate to waste y'alls time on things that can be RTFMed. John -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Wednesday, June 05, 2013 9:35 AM To: user@hadoop.apache.org Subject: Re: yarn-site.xml and

Re: Logs directory for HBASE and Task Tracker

2013-06-05 Thread Ted Yu
Rams: For Hadoop-related log directories, you can use the ps command to see the command line of the namenode. You would see the log dir in the command line, e.g.: -Dhadoop.log.dir=/homes/zy/deploy/hadoop-common-2.0.5-SNAPSHOT/logs Cheers On Wed, Jun 5, 2013 at 8:38 AM, Jean-Marc Spaggiari
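Once the process command line has been copied from the ps output, the log directory can be picked out by looking for the -Dhadoop.log.dir= option. The following is a minimal sketch of that lookup; the class name LogDirSketch, the method findLogDir, and the sample command line in main are made up for illustration, and only the option name and the path come from the message above.

```java
import java.util.Optional;

// Hypothetical sketch: extracts the value of -Dhadoop.log.dir= from a command line string.
class LogDirSketch {

    static Optional<String> findLogDir(String commandLine) {
        String prefix = "-Dhadoop.log.dir=";
        // Split on whitespace; this assumes the path itself contains no spaces.
        for (String token : commandLine.trim().split("\\s+")) {
            if (token.startsWith(prefix)) {
                return Optional.of(token.substring(prefix.length()));
            }
        }
        return Optional.empty(); // option not present on this command line
    }

    public static void main(String[] args) {
        // Illustrative command line, not real ps output.
        String sample = "java -Xmx1000m "
                + "-Dhadoop.log.dir=/homes/zy/deploy/hadoop-common-2.0.5-SNAPSHOT/logs "
                + "org.apache.hadoop.hdfs.server.namenode.NameNode";
        System.out.println(findLogDir(sample).orElse("(not found)"));
    }
}
```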

Re: Hadoop JARs and Eclipse

2013-06-05 Thread Harsh J
If your goal is simply to build an application, then you can use a Maven project. Why do you need all of Hadoop's own projects in Eclipse when you can simply pull in the dependencies via a Maven pom.xml? The following is what you can use in a simple Maven app, to include all necessary

How to test the performance of NN?

2013-06-05 Thread Mark Kerzner
Hi, I am trying to create a more efficient namenode, and for that I need to benchmark the standard distribution, and then compare it to my version. Which benchmark should I run? I am running nnbench, but it is not telling me anything about performance, only about potential failures. Thank you. Sincerely,

Re: How to test the performance of NN?

2013-06-05 Thread Suresh Srinivas
What do you mean by "it is not telling me anything about performance"? Also, I do not understand the part "only about potential failures". Can you add more details? nnbench is the best microbenchmark for NN performance tests. On Wed, Jun 5, 2013 at 3:17 PM, Mark Kerzner

RE: How to test the performance of NN?

2013-06-05 Thread Ivan Mitic
Hi Mark, NNBench is a namenode load test. The output of the test is a set of performance numbers, such as transactions per second, average latency of operations, etc. What do you mean by trying to create a more efficient namenode? What dimension are you trying to optimize? Depending on this, people

streaming/pipes interface and multiple inputs / outputs

2013-06-05 Thread John Lilley
Is it possible to use Hadoop streaming or Hadoop pipes for multiple inputs and outputs? Consider for example an equality join that accepts two inputs (left, right), and produces three outputs (left unmatched, right unmatched, joined). That's not actually what I'm trying to implement, but

Re: Mapreduce using JSONObjects

2013-06-05 Thread Max Lebedev
I’ve taken your advice and made a wrapper class which implements WritableComparable. Thank you very much for your help. I believe everything is working fine on that front. I used Google’s Gson for the comparison. public int compareTo(Object o) { JsonElement o1 =

Re: Hadoop JARs and Eclipse

2013-06-05 Thread Harsh J
Hi John, On Thu, Jun 6, 2013 at 1:21 AM, John Lilley john.lil...@redpoint.net wrote: -- From where will it fetch the Hadoop JARs? From the Maven Central repository (we publish our jars and dependencies are also available there), or a custom defined repository if you lack internet access. --

Re: ClassNotFound Error when use hdfs namenode -format in 2.0.4

2013-06-05 Thread Harsh J
Do not use HADOOP_HOME anymore. Try removing the below line (and any other references in your env to HADOOP_HOME): export HADOOP_HOME=/hadoop-2.0.4-alpha On Thu, Jun 6, 2013 at 1:18 AM, Boyu Zhang boyuzhan...@gmail.com wrote: Dear All, I just moved from version 0.20.2 to 2.0.4, there are a

Re: ClassNotFound Error when use hdfs namenode -format in 2.0.4

2013-06-05 Thread Boyu Zhang
Thanks Harsh, I got it working. The problem for me was the Java home: I reinstalled Java and pointed the Java home to the new one, and then it worked. Thanks, Boyu On Wed, Jun 5, 2013 at 7:22 PM, Harsh J ha...@cloudera.com wrote: Do not use HADOOP_HOME anymore. Try removing the below line (and any

Re:

2013-06-05 Thread Serge Blazhievsky
This does not seem like a Hadoop question. Maybe try the GridGain list? Sent from my iPhone On Jun 5, 2013, at 8:27 PM, Job Thomas j...@suntecgroup.com wrote: Hi all, When I start my jobtracker in a combined GridGain and Hadoop project, I get the following error: Exception in

How to commit

2013-06-05 Thread Lokesh Basu
Hi, It has not been long since I started learning about Hadoop/HDFS/MapReduce, and I have been implementing things with them. Now I want to dive into the source code and see whether I can be useful by providing patches. I have a good foundation in programming and algorithms, owing to my computer science