Including third party jar files in Map Reduce job

2012-04-03 Thread Utkarsh Gupta
Hi All, I am new to Hadoop and was trying to generate random numbers using the Apache Commons Math library. I used NetBeans to build the jar file, and the manifest has the path to the commons-math jar as lib/commons-math3.jar. I have placed this jar file in the HADOOP_HOME/lib folder, but I am still getting Class
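A common way to ship a third-party jar with a job (a sketch, not Utkarsh's actual code; the class and path names are hypothetical) is to implement Hadoop's Tool interface so that the generic `-libjars` option is parsed, then pass the jar on the command line instead of copying it into HADOOP_HOME/lib on every node:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: implementing Tool lets ToolRunner parse generic
// options such as -libjars, which ships the listed jars to the cluster
// and puts them on the task classpath.
public class RandomNumberDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "random-numbers");
        job.setJarByClass(RandomNumberDriver.class);
        // ... set mapper/reducer, input/output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(),
                                   new RandomNumberDriver(), args));
    }
}
```

Invoked along the lines of `hadoop jar myjob.jar RandomNumberDriver -libjars /path/to/commons-math3.jar <in> <out>`.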

Re: how to overwrite output in HDFS?

2012-04-03 Thread Arko Provo Mukherjee
Hi, Check the links below. Read from HDFS: https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs Write to HDFS: https://sites.google.com/site/hadoopandhive/home/how-to-write-a-file-in-hdfs-using-hadoop Hope they help! Thanks & regards Arko On Tue, Apr 3, 2012 a

AW: how to overwrite output in HDFS?

2012-04-03 Thread Christoph Schmitz
Hi Xin, when you're running your MapReduce job, at some point you'll have to wire it together, i.e., say what the mapper class is, what the reducer class is, etc. There you can also configure the job to use your new OutputFormat class. Something like this: -- Job job = new Job(conf
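The wiring Christoph describes might look like the following sketch (the mapper, reducer, and driver names are placeholders, not code from the thread); the key line for this question is `setOutputFormatClass`:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: MyMapper, MyReducer and OverwritingTextOutputFormat
// stand in for your own classes.
public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "my-job");
        job.setJarByClass(MyDriver.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Plug in the custom OutputFormat here:
        job.setOutputFormatClass(OverwritingTextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```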

Re: how to overwrite output in HDFS?

2012-04-03 Thread Fang Xin
Hi Christoph, Thank you for your reply. I created such a class in the project, built an instance of it in main, and tried to use the method it includes, but it didn't work. Can you explain a little more about how to make this work? Thank you! On Tue, Apr 3, 2012 at 6:39 PM, Christoph S

Re: how to overwrite output in HDFS?

2012-04-03 Thread Fang Xin
Hi Bejoy, Could you kindly elaborate further? What should I insert, and where? Thank you! On Tue, Apr 3, 2012 at 7:36 PM, Bejoy Ks wrote: > Hi Xin >      In a very simple way, just include the line of code in your Driver > class to check whether the output dir exists in hdfs, if exists de

Re: Map reduce example - is it possible?

2012-04-03 Thread madhu phatak
Hi, The following code creates a cross product between two files. To cross a file with itself, specify the same file for both arguments. package com.example.hadoopexamples.joinnew; import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader; import java.util.ArrayList; import ja

Re: how to overwrite output in HDFS?

2012-04-03 Thread Bejoy Ks
Hi Xin In a very simple way, just include a line of code in your Driver class that checks whether the output dir exists in HDFS and, if it exists, deletes it. Regards Bejoy KS On Tue, Apr 3, 2012 at 4:09 PM, Christoph Schmitz < christoph.schm...@1und1.de> wrote: > Hi Xin, > > you can derive your ow
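Bejoy's suggestion might look like this sketch in the driver, before the job is submitted (the output path is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteOutputDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path outputDir = new Path(args[0]); // e.g. "output"; placeholder
        FileSystem fs = FileSystem.get(conf);
        // Recursively remove the output directory left by a previous run,
        // so FileOutputFormat's existence check does not fail the new job.
        if (fs.exists(outputDir)) {
            fs.delete(outputDir, true); // true = recursive
        }
    }
}
```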

Re: Send a map to all nodes

2012-04-03 Thread Radim Kolar
YARN in hadoop 0.23.1 can do this.

AW: how to overwrite output in HDFS?

2012-04-03 Thread Christoph Schmitz
Hi Xin, you can derive your own output format class from one of the Hadoop OutputFormats and make sure the "checkOutputSpecs" method, which usually does the checking, is empty: --- public final class OverwritingTextOutputFormat extends TextOutputFormat { @Override public void c
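A complete version of the class Christoph sketches might look like this (a hedged reconstruction, not his exact code; in FileOutputFormat subclasses, checkOutputSpecs is what normally throws when the output directory already exists):

```java
import java.io.IOException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Subclass TextOutputFormat and override checkOutputSpecs with a no-op,
// so the "output directory already exists" check is skipped.
public class OverwritingTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
    @Override
    public void checkOutputSpecs(JobContext context) throws IOException {
        // Intentionally empty: do not fail if the output directory exists.
        // Caveat: stale part files from an earlier run can linger next to
        // the new ones, so deleting the directory first is often safer.
    }
}
```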

how to overwrite output in HDFS?

2012-04-03 Thread Fang Xin
Hi, all I'm writing my own map-reduce code using Eclipse with the Hadoop plug-in. I've specified input and output directories in the project properties (two folders, namely input and output). My problem is that each time I make some modification and try to run it again, I have to manually delete the

Re: What determines the map task / reduce task capacity? average task per node?

2012-04-03 Thread Bejoy Ks
hi Xin To add on: the primary factor you need to consider in deciding the slots is memory. If your tasks need 1 GB each and you have 12 GB of memory available, you can host 12 slots. Divide the same between mapper and reducer slots proportionally based on the jobs in
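Bejoy's arithmetic can be sketched as follows (the memory figures and the 2:1 map-to-reduce split are illustrative, not a recommendation from the thread):

```java
// Rough per-node slot estimate: total slots = available memory / memory
// per task JVM, then split between map and reduce slots by workload.
public class SlotEstimate {
    static int totalSlots(int availableMemMb, int perTaskMemMb) {
        return availableMemMb / perTaskMemMb;
    }

    public static void main(String[] args) {
        int total = totalSlots(12 * 1024, 1024); // 12 GB free, 1 GB per task
        int mapSlots = total * 2 / 3;            // e.g. map-heavy jobs: 2:1
        int reduceSlots = total - mapSlots;
        System.out.println(mapSlots + " map / " + reduceSlots + " reduce");
        // prints "8 map / 4 reduce"
    }
}
```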

Re: What determines the map task / reduce task capacity? average task per node?

2012-04-03 Thread Bejoy Ks
Hi Xin Yes, the number of worker nodes does count toward the map and reduce capacity of the cluster. The map and reduce task capacity/slots is dependent on each node and, of course, on the requirements of the applications that use the cluster. Based on the available memory, number of cores, etc. you nee

What determines the map task / reduce task capacity? average task per node?

2012-04-03 Thread Fang Xin
Hi all, of course it's sensible that the number of nodes in the cluster will influence map/reduce task capacity, but what determines the average tasks per node? Can the number be set manually? Are there any hardware constraints on setting it? Thank you! Xin
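To the "can it be set manually" part: in the Hadoop 1.x / 0.20 line discussed here, the per-node slot counts are set in mapred-site.xml on each TaskTracker (the values below are illustrative only):

```xml
<!-- mapred-site.xml on each TaskTracker; values are illustrative -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```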