Re: Adding mahout math jar to hadoop mapreduce execution

2012-01-30 Thread Daniel Quach
I compiled using javac: javac -classpath :/usr/local/hadoop/hadoop-core-0.20.203.0.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar:/usr/local/mahout/math/target/mahout-math-0.6-SNAPSHOT.jar -d makevector_classes/ MakeVector.java. If I don't include the mahout-math jar, it gives me a compile error.

Re: Adding mahout math jar to hadoop mapreduce execution

2012-01-30 Thread Prashant Kommireddi
How are you building the mapreduce jar? Try not to include the Mahout dist while building the MR jar, and include it only via the "-libjars" option. On Mon, Jan 30, 2012 at 10:33 PM, Daniel Quach wrote: > I have been compiling my mapreduce with the jars in the classpath, and I > believe I need to also ad

Adding mahout math jar to hadoop mapreduce execution

2012-01-30 Thread Daniel Quach
I have been compiling my mapreduce with the jars in the classpath, and I believe I also need to pass the jars to hadoop via the -libjars option. However, even when I do this, I still get an error complaining about missing classes at runtime (compilation works fine). Here is my command: hadoop
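A minimal sketch of the run side, using the mahout-math path from this thread (the jar name makevector.jar and the input/output paths are assumptions). Two things matter: the client JVM needs the extra jar on HADOOP_CLASSPATH, and -libjars is only honored when the driver parses its arguments through ToolRunner/GenericOptionsParser:

```shell
# Make the mahout-math jar visible to the client/driver JVM:
export HADOOP_CLASSPATH=/usr/local/mahout/math/target/mahout-math-0.6-SNAPSHOT.jar

# Ship the same jar to the map/reduce tasks with -libjars.
# Note: -libjars is parsed by GenericOptionsParser, so the main class
# must go through ToolRunner.run(...) for this option to take effect.
hadoop jar makevector.jar MakeVector \
  -libjars /usr/local/mahout/math/target/mahout-math-0.6-SNAPSHOT.jar \
  input output
```

If the driver does not use ToolRunner, -libjars is silently ignored, which matches the "compiles fine, ClassNotFound at runtime" symptom described above.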

Re: ClassNotFound just started with custom mapper

2012-01-30 Thread Anil Gupta
Hi Hema, I had set up a Hadoop cluster in which the name has a hyphen character and it works fine, so I don't think this problem is related to the hyphen character. The problem is related to your Hadoop classpath settings, so check your Hadoop classpath. I don't have experience of running the jo

Re: ClassNotFound just started with custom mapper

2012-01-30 Thread hadoop hive
hey Hema, I'm not sure, but the problem is with your hdfs name *hdfs://vm-acd2-4c51:54310/*; change your host name and it'll run fine — especially, remove "-" from the hostname. regards Vikas Srivastava On Tue, Jan 31, 2012 at 4:07 AM, Subramanian, Hema < hema.subraman...@citi.com> wrote: > I am facing

Re: refresh namenode topology cache

2012-01-30 Thread Harsh J
Sateesh, On Tue, Jan 31, 2012 at 12:24 AM, Sateesh Lakkarsu wrote: > A couple of nodes got registered under /default-rack as NN was unable to > resolve when the nodes got added. NN can resolve them now, so wondering how > to fix this without bouncing NN... any way to have it refresh cache? > > Th

Re: NameNode per-block memory usage?

2012-01-30 Thread ke yuan
How many blocks a file has depends on your data, so please consider it according to your data's characteristics. 2012/1/18 prasenjit mukherjee > Does it mean that on average 1 file has only 2 blocks (with > replication=1)? > > > On 1/18/12, M. C. Srivas wrote: > > Konstantin's paper > > http:
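A back-of-the-envelope sketch of the block count and NameNode memory question (the numbers are assumptions: a 1 GB file, the 0.20-era default 64 MB block size, and the rough figure of ~150 bytes of NameNode heap per block object cited around Shvachko's paper):

```shell
FILE_MB=1024   # hypothetical file size
BLOCK_MB=64    # default dfs.block.size in 0.20.x
# Number of blocks = ceiling(file size / block size)
BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
echo "$BLOCKS blocks"
# Rough NameNode heap cost: ~150 bytes per block plus ~150 for the inode
echo "approx $(( BLOCKS * 150 + 150 )) bytes of NameNode heap"
```

So a file averages 2 blocks only if it averages roughly twice the block size; a 1 GB file at 64 MB blocks has 16.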

Re: Namenode service not running on the Configured IP address

2012-01-30 Thread Harsh J
What does "host 192.168.1.99" output? (Also, slightly OT, but you need to fix this:) Do not use IPs in your fs location. Do the following instead: 1. Append an entry to /etc/hosts, across all nodes: 192.168.1.99 nn-host.remote nn-host 2. Set fs.default.name to "hdfs://nn-host.remote" On Tue,
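The two steps above as concrete fragments (the hostname nn-host.remote is taken from the example in this message; file locations are the usual defaults). First the /etc/hosts entry, replicated on every node:

```
192.168.1.99  nn-host.remote  nn-host
```

Then the filesystem URI in core-site.xml, by hostname rather than IP:

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://nn-host.remote</value>
</property>
```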

Re: Namenode service not running on the Configured IP address

2012-01-30 Thread praveenesh kumar
Have you configured your hostname and localhost with your IP in the /etc/hosts file? Thanks, Praveenesh On Tue, Jan 31, 2012 at 3:18 AM, anil gupta wrote: > Hi All, > > I am using hadoop-0.20.2 and doing a fresh installation of a distributed > Hadoop cluster along with HBase. I am having virtualized

Re: Hadoop Datacenter Setup

2012-01-30 Thread Michael Segel
If you are going this route, why not net-boot the nodes in the cluster? Sent from my iPhone On Jan 30, 2012, at 8:17 PM, "Patrick Angeles" wrote: > Hey Aaron, > > I'm still skeptical when it comes to flash drives, especially as pertains > to Hadoop. The write cycle limit is impractical to make

Re: Hadoop Datacenter Setup

2012-01-30 Thread Patrick Angeles
Hey Aaron, I'm still skeptical when it comes to flash drives, especially as pertains to Hadoop. The write cycle limit makes them impractical for dfs.data.dir and mapred.local.dir, and as you pointed out, you can't use them for logs either. If you put HADOOP_LOG_DIR in /mnt/d0, you wil

Re: Killing hadoop jobs automatically

2012-01-30 Thread Masoud
Dear Praveenesh, I think there are only two ways to kill a job: 1- the kill command (not a perfect way, because you need to know the job id) 2- mapred.task.timeout (in the "bin/hadoop jar" command, using -Dmapred.task.timeout= set your desired value in msec) sometimes it's happened for me too, not on all mac
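The two options above as commands (the job id and jar name are placeholders; note that -Dmapred.task.timeout is only picked up from the command line if the job's driver uses ToolRunner/GenericOptionsParser):

```shell
# 1. Kill an already-running job by id (list ids with "hadoop job -list"):
hadoop job -kill job_201201301234_0001

# 2. Have the framework fail tasks that report no progress for 10 minutes:
hadoop jar myjob.jar MyJobDriver -Dmapred.task.timeout=600000 in out
```

mapred.task.timeout can also be set cluster-wide in mapred-site.xml, in which case no per-job flag is needed.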

Re: Hadoop Datacenter Setup

2012-01-30 Thread Aaron Tokhy
I forgot to add: Are there use cases for using a swap partition for Hadoop nodes if our combined planned heap size is not expected to go over 24GB for any particular node type? I've noticed that if HBase starts to GC, it will pause for unreasonable amounts of time if old pages get swapped to
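One common mitigation for the GC-pause-under-swap problem (an assumption on my part, not something stated in this thread) is to tell the kernel to avoid swapping application pages in the first place:

```shell
# Strongly discourage swapping of application pages (requires root):
sysctl -w vm.swappiness=0

# Persist the setting across reboots:
echo 'vm.swappiness = 0' >> /etc/sysctl.conf
```

With swappiness at or near 0, idle JVM heap pages are far less likely to be paged out, so a full GC touches resident memory instead of disk.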

Re: ClassNotFound just started with custom mapper

2012-01-30 Thread Subramanian, Hema
I am facing issues while trying to run a job from Windows (through Eclipse) on my Hadoop cluster on my RHEL VMs. When I run it as "run on hadoop" it works fine, but when I run it as a Java application, it throws a ClassNotFound exception: INFO: Task Id : attempt_201201101527_0037_m_00_0, Statu

Hadoop Datacenter Setup

2012-01-30 Thread Aaron Tokhy
Hi, Our group is trying to set up a prototype for what will eventually become a cluster of ~50 nodes. Does anyone have experience with a stateless Hadoop cluster setup using this method on CentOS? Are there any caveats with a read-only root file system approach? This would save us from having

[Announcing Release] YSmart: An SQL-to-MapReduce Translator

2012-01-30 Thread Yin Huai
We would like to announce that YSmart Release 12.01 (effectively version 0.1) is available. YSmart is software that translates SQL queries into Hadoop Java programs. Compared to other existing SQL-to-MapReduce translators, YSmart has the following advantages: (1) High Performance: YSmart can dete

Re: Sorting text data

2012-01-30 Thread John Conwell
If you use TextInputFormat as your mapreduce job's input format, then Hadoop doesn't need your input data to be in a sequence file. It will read your text file and call the mapper for each line in the text file (\n delimited), where the key value is the byte offset of that line from the begin
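The key/value contract can be illustrated outside Hadoop: for each line, TextInputFormat hands the mapper the line's starting byte offset as the key (a LongWritable) and the line text as the value. A small awk sketch over hypothetical sample data:

```shell
# Print (byte offset, line) pairs the way TextInputFormat would key them:
printf 'apple\nbanana\ncherry\n' > /tmp/sample.txt
awk 'BEGIN { off = 0 } { print off "\t" $0; off += length($0) + 1 }' /tmp/sample.txt
# Output:
# 0     apple
# 6     banana
# 13    cherry
```

Each offset advances by the line length plus one byte for the \n delimiter, exactly as the real record reader tracks its position in the split.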

Re: Regarding security in hadoop

2012-01-30 Thread Owen O'Malley
On Mon, Jan 30, 2012 at 12:45 AM, renuka wrote: > > > Hi All, > > As per the below link, the security feature (strong authentication via the > Kerberos authentication protocol) is added in the hadoop 1.0.0 release. > http://www.infoq.com/news/2012/01/apache-hadoop-1.0.0 Actually, it was first release

Sorting text data

2012-01-30 Thread sangroya
Hello, I have a large text file (1GB) that I want to sort. So far, I know of hadoop examples that take a sequence file as input to the sort program. Does anyone know of any implementation that uses text data as input? Thanks, Amit - Sangroya -- View this message in context: http://
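One possible approach, sketched under assumptions (the jar name and HDFS paths are placeholders): the bundled examples' Sort program accepts -inFormat/-outFormat flags, so it can be pointed at text input instead of sequence files:

```shell
hadoop jar hadoop-examples-0.20.205.0.jar sort \
  -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
  -outFormat org.apache.hadoop.mapred.TextOutputFormat \
  -outKey org.apache.hadoop.io.Text \
  -outValue org.apache.hadoop.io.Text \
  /user/amit/text-in /user/amit/text-sorted
```

KeyValueTextInputFormat splits each line at the first tab into key and value; sorting then happens on the text key via the normal shuffle/sort machinery.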

Best practices for hadoop shuffling/tunning ?

2012-01-30 Thread praveenesh kumar
Hey guys, Just wanted to ask, are there any best practices to be followed for hadoop shuffling improvements? I am running Hadoop 0.20.205 on an 8-node cluster. Each node has 24 cores/CPUs with 48 GB RAM. I have set the following parameters: fs.inmemory.size.mb=2000 io.sort.mb=2000 io.sort
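For reference, the shuffle-side knobs of that era live in mapred-site.xml; the values below are illustrative assumptions, not recommendations:

```xml
<!-- Parallel fetch threads per reducer (default 5) -->
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
<!-- Fraction of reducer heap used to buffer map outputs during shuffle -->
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.70</value>
</property>
<!-- Number of streams merged at once when sorting spills -->
<property>
  <name>io.sort.factor</name>
  <value>100</value>
</property>
```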

Re: Best Linux Operating system used for Hadoop

2012-01-30 Thread Ioan Eugen Stan
On 27.01.2012 11:15, Sujit Dhamale wrote: Hi All, I am new to Hadoop. Can anyone tell me which is the best Linux operating system for installing & running Hadoop? Nowadays I am using Ubuntu 11.4 and installed Hadoop on it, but it crashes a number of times. Can someone please help me out???

Re: Killing hadoop jobs automatically

2012-01-30 Thread praveenesh kumar
@ Harsh - Yeah, mapred.task.timeout is the valid option, but for some reason it's not happening the way it should. I am not sure what could be the cause. The thing is my jobs are running fine; it's just that they are slow in the shuffle phase, sometimes... not every time... so I was thinking "as an admi

Regarding security in hadoop

2012-01-30 Thread renuka
Hi All, As per the below link, the security feature (strong authentication via the Kerberos authentication protocol) is added in the hadoop 1.0.0 release. http://www.infoq.com/news/2012/01/apache-hadoop-1.0.0 But we didn't find any documentation related to this in the 1.0.0 documentation. http://hado