Re: Configuring Multiple Data Nodes on Pseudo-distributed mode ?

2011-06-10 Thread Kumar Kandasami
Thank you Harsh. Perfect, worked as expected. :) Kumar_/|\_ www.saisk.com ku...@saisk.com "making a profound difference with knowledge and creativity..." On Sat, Jun 11, 2011 at 12:48 AM, Harsh J wrote: > Kumar, > > Your config seems alright. That post described it for 0.21/trunk > script

Re: Configuring Multiple Data Nodes on Pseudo-distributed mode ?

2011-06-10 Thread Harsh J
Kumar, Your config seems alright. That post described it for the 0.21/trunk scripts, I believe. On a 0.20.x-based release, like CDH3, you can also simply use hadoop-daemon.sh to do it. You just have to mess with some PID files. Here's how I do it on my Mac to start 3 DNs: $ ls conf* conf conf.1 conf.2

java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

2011-06-10 Thread relesabe
Greetings, I am getting this error despite many different tries. The error occurs when I invoke the getConnection() method of org.apache.hadoop.mapreduce.lib.db.DBConfiguration. I have looked at the jar file (generated using Maven) for the Hadoop job and indeed I find that the jar file lib/mysql
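
A common cause of this ClassNotFoundException is that the MySQL connector jar is not visible to the JVM that actually calls getConnection(), which can be the client JVM during job setup rather than the task JVMs. Bundling the jar under lib/ inside the job jar only helps code running inside tasks; adding it to HADOOP_CLASSPATH or passing it with -libjars (honored only when the job goes through ToolRunner/GenericOptionsParser) covers the client side as well. As a minimal sketch, not the poster's code and with placeholder URL and credentials, the driver is registered through DBConfiguration like this:

    // Sketch only: register the MySQL JDBC driver for a DBInputFormat-style job.
    // The connection URL, user and password are hypothetical placeholders.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;

    public class DbJobSetup {
      public static Job newJob() throws Exception {
        Configuration conf = new Configuration();
        DBConfiguration.configureDB(conf,
            "com.mysql.jdbc.Driver",          // must be loadable wherever getConnection() runs
            "jdbc:mysql://dbhost:3306/mydb",  // placeholder URL
            "user", "password");              // placeholder credentials
        Job job = new Job(conf, "db-import");
        job.setJarByClass(DbJobSetup.class);
        return job;
      }
    }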

Re: compress files in hadoop

2011-06-10 Thread Dhruv
Can you be more specific? Tom White's book has a whole section devoted to it. On Fri, Jun 10, 2011 at 7:24 PM, Madhu Ramanna wrote: > Hello, > > What is the best way to compress several files already in Hadoop? > >

Re: Configuring Multiple Data Nodes on Pseudo-distributed mode ?

2011-06-10 Thread Kumar Kandasami
Thank you Harsh. I have been following the documentation in the mailing list, and have an issue starting the second data node (because of a port conflict). - First, I don't see bin/hdfs in the directory (I am using a Mac and installed Hadoop using the CDH3 tarball). - I am using the following co

compress files in hadoop

2011-06-10 Thread Madhu Ramanna
Hello, What is the best way to compress several files already in Hadoop?
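
One concrete option, sketched here under the assumption that the files are line-oriented text (the paths and the choice of GzipCodec are illustrative, not from the question), is a map-only identity job that rewrites the existing files with compressed output:

    // Minimal sketch: rewrite existing line-oriented HDFS files as gzip-compressed
    // output using a map-only job. The input/output paths are made-up examples.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CompressExistingFiles {

      // Drops the byte-offset key so the output is just the original lines.
      public static class PassThroughMapper
          extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
          ctx.write(line, NullWritable.get());
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "compress-existing-files");
        job.setJarByClass(CompressExistingFiles.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setNumReduceTasks(0);                        // map-only: one output file per input split
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/madhu/raw"));        // hypothetical
        FileOutputFormat.setOutputPath(job, new Path("/user/madhu/gzipped"));  // hypothetical
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Keep in mind that gzip output is not splittable, so if the compressed files need to be processed in parallel later, a block-compressed SequenceFile or a splittable codec may be a better fit.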

Parallel map side join

2011-06-10 Thread Shi Yu
Hi, How do I configure a map-side join across multiple mappers in parallel? Suppose I have data sets a1, a2, a3 and data sets b1, b2, b3. I want a1 to join with b1, a2 with b2, and a3 with b3, and the joins to run in parallel. I think it should be possible to configure mapper 1 to j
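
One way to get exactly this pairing, sketched below with the old "mapred" join API (the paths are hypothetical, and this is not necessarily what the poster ends up using): if each a_i and b_i are sorted on the join key and partitioned identically, CompositeInputFormat hands matching partitions to the same map task, so the different partition pairs are joined by different map tasks in parallel.

    // Sketch: /data/a and /data/b are hypothetical directories whose part files
    // are sorted on the join key and partitioned the same way, so part i of a
    // is joined with part i of b inside its own map task.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;
    import org.apache.hadoop.mapred.join.CompositeInputFormat;

    public class MapSideJoinConf {
      public static void configure(JobConf job) {
        job.setInputFormat(CompositeInputFormat.class);
        job.set("mapred.join.expr", CompositeInputFormat.compose(
            "inner", KeyValueTextInputFormat.class,
            new Path("/data/a"), new Path("/data/b")));
        // Each mapper then receives a TupleWritable holding the matching
        // records from both sides for a given key.
      }
    }

The identical partitioning is usually arranged by producing a and b with earlier jobs that use the same partitioner and the same number of reducers.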

Re: Hadoop on windows with bat and ant scripts

2011-06-10 Thread Matthew Foley
>> Java is supposed to be Write Once - Run Anywhere, but a lot of java projects >> seem to forget that. It's not that Hadoop developers forgot this goal, but that Java 6 doesn't support some system-level disk i/o operations that HDFS needs. This left us a choice of coding JNI calls (requiring

Re: Automatic line number in reducer output

2011-06-10 Thread Shi Yu
Yes, it works perfectly. I actually didn't realize the flexibility of using different classes for the combiner and the reducer. In that case it would have a three-layer architecture; I think that would be interesting and useful. Shi On 6/10/2011 9:31 AM, Robert Evans wrote: In this case you probably want

Re: NameNode heapsize

2011-06-10 Thread si...@ugcv.com
On 10/06/2011 10:00 PM, Edward Capriolo wrote: On Fri, Jun 10, 2011 at 8:22 AM, Brian Bockelman wrote: On Jun 10, 2011, at 6:32 AM, si...@ugcv.com wrote: Dear all, I'm looking for ways to improve the namenode heap size usage of an 800-node 10PB testing Hadoop cluster that stores around 30 mi

RE: Question about DFS Reserved Space

2011-06-10 Thread Bible, Landy
-Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Thursday, June 09, 2011 12:14 PM To: common-user@hadoop.apache.org Subject: Re: Question about DFS Reserved Space >Landy, >>On Thu, Jun 9, 2011 at 10:05 PM, Bible, Landy wrote: >> Hi all, >> >> I'm planning a rather non-

Re: Automatic line number in reducer output

2011-06-10 Thread Robert Evans
In this case you probably want two different classes. You can have a base Reducer class that adds in the line count, and then subclass it for the combiner, setting a flag so it does not output the line numbers. --Bobby On 6/9/11 12:57 PM, "Shi Yu" wrote: Hi, Thanks for the reply. The line coun
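
A rough sketch of that shape (the class and field names are invented for illustration, and the "line numbers" are simply a per-reduce-task counter):

    // Illustration only: a numbering reducer plus a subclass used as the combiner
    // that switches the numbering off.
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class NumberingReducer extends Reducer<Text, Text, Text, Text> {
      protected boolean emitLineNumbers = true;  // the combiner subclass clears this
      private long lineNo = 0;

      @Override
      protected void reduce(Text key, Iterable<Text> values, Context ctx)
          throws IOException, InterruptedException {
        for (Text value : values) {
          if (emitLineNumbers) {
            ctx.write(key, new Text((++lineNo) + "\t" + value));  // numbered output
          } else {
            ctx.write(key, value);  // combiner path: pass records through unchanged
          }
        }
      }

      /** Same logic, used as the combiner, with the numbering turned off. */
      public static class NoNumbers extends NumberingReducer {
        public NoNumbers() {
          emitLineNumbers = false;
        }
      }
    }

Wired up with job.setReducerClass(NumberingReducer.class) and job.setCombinerClass(NumberingReducer.NoNumbers.class); note that the numbers are only unique within each reduce task, not across the whole output.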

RE: Hadoop on windows with bat and ant scripts

2011-06-10 Thread Bible, Landy
Hi Raja, I'm currently running HDFS on Windows 7 desktops. I had to create a hadoop.bat that provides the same functionality as the shell scripts, and some Java Service Wrapper configs to run the DataNodes and NameNode as Windows services. Once I get my system more functional I plan to do a w

Re: NameNode heapsize

2011-06-10 Thread Edward Capriolo
On Fri, Jun 10, 2011 at 8:22 AM, Brian Bockelman wrote: > > On Jun 10, 2011, at 6:32 AM, si...@ugcv.com wrote: > > > Dear all, > > > > I'm looking for ways to improve the namenode heap size usage of an > 800-node 10PB testing Hadoop cluster that stores > > around 30 million files. > > > > Here's so

RE: Hadoop on windows with bat and ant scripts

2011-06-10 Thread Habermaas, William
It is more than just getting away from shell script usage. Hadoop invokes the BASH shell internally to run commands like df, du, etc. to perform operating system functions. Cygwin is not intended as a production platform, and its BASH shell doesn't always work, which becomes a problem for Hadoop.

Re: NameNode heapsize

2011-06-10 Thread Brian Bockelman
On Jun 10, 2011, at 6:32 AM, si...@ugcv.com wrote: > Dear all, > > I'm looking for ways to improve the namenode heap size usage of an 800-node > 10PB testing Hadoop cluster that stores > around 30 million files. > > Here's some info: > > 1 x namenode: 32GB RAM, 24GB heap size > 800 x datan

Re: NameNode heapsize

2011-06-10 Thread James Seigel
Are you using compressed pointers? Sent from my mobile. Please excuse the typos. On 2011-06-10, at 5:33 AM, "si...@ugcv.com" wrote: > Dear all, > > I'm looking for ways to improve the namenode heap size usage of an 800-node > 10PB testing Hadoop cluster that stores > around 30 million files. >

frequency with which hadoop or hdfs daemons run and refresh information

2011-06-10 Thread George Kousiouris
Hi all, We are having some problems with the refresh rate of the information coming from Hadoop commands. For example, when we decommission a node and then reinstate it, it seems that it has 0KB capacity attached. If we stop and start the dfs then it is immediately shown with the correct capa

NameNode heapsize

2011-06-10 Thread si...@ugcv.com
Dear all, I'm looking for ways to improve the namenode heap size usage of an 800-node 10PB testing Hadoop cluster that stores around 30 million files. Here's some info: 1 x namenode: 32GB RAM, 24GB heap size; 800 x datanode: 8GB RAM, 13TB HDD; * 33050825 files and directories, 47708724 blo
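
As a rough back-of-the-envelope check (a commonly quoted rule of thumb, not a figure from this thread): at very roughly 150 bytes of NameNode heap per namespace object, 33,050,825 files and directories plus 47,708,724 blocks is about 80 million objects, on the order of 80,000,000 x 150 B, i.e. roughly 12 GB of live namespace data before GC headroom and transient load are accounted for, which gives a sense of how much of the 24 GB heap the namespace alone can occupy.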

Re: Hardware specs

2011-06-10 Thread Steve Loughran
On 06/09/2011 03:59 PM, Mark wrote: Can someone give some brief guidance on some (ballpark) specs for a Hadoop cluster? - What are the typical size and memory requirements for a dedicated namenode? - this depends on cluster size - What are the typical size and memory requirements ar

Re: about eclipse-plugin

2011-06-10 Thread praveenesh kumar
Hi, The plugin that comes with the default Hadoop folder is outdated; it usually does not work well. You need to be very specific about which version of Eclipse you use; not all Eclipse versions are compatible with the Hadoop plugin. Here we are using Eclipse SDK Helios 3.6.2. You can download it from here

Re: org.apache.hadoop.mapred.Utils can not be resolved

2011-06-10 Thread Mark question
Thanks again Harsh. Yah, learning this would make creating new projects faster! Mark On Thu, Jun 9, 2011 at 10:23 PM, Harsh J wrote: > "mapred" package would indicate the Hadoop Map/Reduce jar is required. > > (Note: If this is for a client op, you may require a few jars from > lib/ too, like a

about eclipse-plugin

2011-06-10 Thread stream
Hi guys, I'm a novice with Hadoop. I've set up a single server following the Getting Started article from the web site. Then I wanted to run an example with Eclipse; I know there is a plugin in the contrib folder, so I've used it. But there is a problem when I've configured the Hadoop location and Host Port. The