Fw: problem in assigning an array

2010-04-13 Thread pinky priya
--- On Tue, 4/13/10, Dharani Selvaraj wrote: From: Dharani Selvaraj Subject: problem in assigning an array To: k_gokulapr...@yahoo.com Date: Tuesday, April 13, 2010, 4:16 PM Hello, While we are trying to assign the values to a 3-d array, in the map function we have got 

Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-13 Thread stephen mulcahy
Todd Lipcon wrote: Most likely a kernel bug. In previous versions of Debian there was a buggy forcedeth driver, for example, that caused it to drop off the network in high load. Who knows what new bug is in 2.6.32 which is brand spanking new. Yes, it looks like it is a kernel bug alright (see t

How do I use MapFile Reader and Writer

2010-04-13 Thread Placebo
I have a large text file, approximately 500 MB, containing key-value pairs on each line. I would like to use a Hadoop MapFile so that I can access any key-value pair fairly quickly. To construct either the Reader or Writer, the MapFile requires a Configuration object and a FileSystem object. I

Re: Optimal setup for a test problem

2010-04-13 Thread Andrew Nguyen
Correction, they are 100Mbps NICs... iperf shows that we're getting about 95 Mbits/sec from one node to another. On Apr 12, 2010, at 1:05 PM, Andrew Nguyen wrote: > @Todd: > > I do need the sorting behavior, eventually. However, I'll try it with zero > reduce jobs to see. > > @Alex: > > Ye
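To put the iperf figure in perspective, here is a small back-of-the-envelope sketch. The 95 Mbit/s value comes from the message above; the 1000 MB payload and the class and method names are hypothetical, chosen only for illustration, and protocol overhead is ignored.

```java
// Rough transfer-time arithmetic for a measured link speed.
// Assumption: decimal units (1 MB = 8 Mbit), no protocol overhead.
final class LinkMath {
    /** Converts megabits per second to megabytes per second. */
    static double mbitToMBytePerSec(double mbitPerSec) {
        return mbitPerSec / 8.0;
    }

    /** Seconds needed to move payloadMB megabytes over a link running at mbitPerSec. */
    static double transferSeconds(double payloadMB, double mbitPerSec) {
        return payloadMB / mbitToMBytePerSec(mbitPerSec);
    }

    public static void main(String[] args) {
        double measured = 95.0;  // Mbit/s, the iperf result quoted above
        double payload = 1000.0; // MB, a hypothetical amount of shuffle data
        System.out.printf("%.3f MB/s%n", mbitToMBytePerSec(measured));
        System.out.printf("%.1f s to move %.0f MB%n",
                transferSeconds(payload, measured), payload);
    }
}
```

At 95 Mbit/s a node moves a little under 12 MB per second, so any job that shuffles gigabytes between nodes will likely spend minutes on the network alone.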

Announcement: Hadoop Training - new courses (Hive and HBase), new locations, and discounts

2010-04-13 Thread Christophe Bisciglia
Hadoop Fans, we wanted to share some news with the Hadoop community about new upcoming courses, new locations, and a substantial discount on next week's session in the Bay Area. We're excited to offer an extended sysadmin course and new courses on Hive and HBase at this year's Hadoop Summit. You c

Re: Optimal setup for a test problem

2010-04-13 Thread alex kamil
Andrew, here are some tips for Hadoop runtime config: http://cloudepr.blogspot.com/2009/09/cluster-facilities-hardware-and.html Also, here are some results from my cluster (using 1GE NICs, Fiber), Dell 5500, 24GB, 8-core (16 hypervised), JBOD. I saw slightly better numbers on a different 4-nodes c

Re: Optimal setup for a test problem

2010-04-13 Thread alex kamil
also http://www.slideshare.net/cloudera/hw09-optimizing-hadoop-deployments On Tue, Apr 13, 2010 at 12:58 PM, alex kamil wrote: > Andrew, > > here are some tips for hadoop runtime config: > http://cloudepr.blogspot.com/2009/09/cluster-facilities-hardware-and.html > also > > here are some results

Re: Optimal setup for a test problem

2010-04-13 Thread Todd Lipcon
On Mon, Apr 12, 2010 at 1:45 PM, Andrew Nguyen <andrew-lists-had...@ucsfcti.org> wrote: > I don't think you can :-). Sorry, they are 100Mbps NICs... I get > 95Mbit/sec from one node to another with iperf. > > Should I still be expecting such dismal performance with just 100Mbps? > Yes - in my

Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-13 Thread Todd Lipcon
On Tue, Apr 13, 2010 at 4:13 AM, stephen mulcahy wrote: > Todd Lipcon wrote: > >> Most likely a kernel bug. In previous versions of Debian there was a buggy >> forcedeth driver, for example, that caused it to drop off the network in >> high load. Who knows what new bug is in 2.6.32 which is brand

Re: Optimal setup for a test problem

2010-04-13 Thread Andrew Nguyen
Good to know... The problem is that I'm in an academic environment that needs a lot of convincing regarding new computational technologies. I need to show proven benefit before getting the funds to actually implement anything. These servers were the best I could come up with for this proof-of-co

Re: Optimal setup for a test problem

2010-04-13 Thread Todd Lipcon
On Tue, Apr 13, 2010 at 11:40 AM, Andrew Nguyen < andrew-lists-had...@ucsfcti.org> wrote: > Good to know... The problem is that I'm in an academic environment that > needs a lot of convincing regarding new computational technologies. I need > to show proven benefit before getting the funds to ac

Re: Optimal setup for a test problem

2010-04-13 Thread Brian Bockelman
Hey Andrew, I can name 3 California universities (San Diego, Caltech, Santa Barbara) that use Hadoop at a small (~20TB raw) or medium scale (~800TB raw). Why not go talk to those guys? Otherwise, you might just be able to confirm old hardware is old (there's good money that you might be hard

Per-file block size

2010-04-13 Thread Andrew Nguyen
I thought I saw a way to specify the block size for individual files from the command line using "hadoop dfs -put/copyFromLocal..." However, I can't seem to find the reference anywhere. I see that I can do it via the API, but I see no references to a command-line mechanism. Am I just remembering so

Re: Per-file block size

2010-04-13 Thread Amogh Vasekar
Hi, Pass the -D property on the command line, e.g.: hadoop fs -Ddfs.block.size= . You can check if it's actually set the way you needed with hadoop fs -stat %o HTH, Amogh On 4/14/10 9:01 AM, "Andrew Nguyen" wrote: I thought I saw a way to specify the block size for individual files using the com
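The value to pass is left blank in the message above. As far as I know, dfs.block.size in this Hadoop generation is given as a plain byte count, so a small helper for the arithmetic may be useful. This is only a sketch: the 128 MB block size and the 500 MB file are hypothetical figures, and the class and method names are made up for illustration.

```java
// Arithmetic helper for choosing a per-file block size.
// Assumption: the block size is expressed in bytes, with 1 MB = 1024 * 1024 bytes.
final class BlockSizeMath {
    /** Converts a size in megabytes to bytes. */
    static long megabytesToBytes(long megabytes) {
        return megabytes * 1024L * 1024L;
    }

    /** Number of blocks a file occupies, rounding the last partial block up. */
    static long blocksNeeded(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long blockBytes = megabytesToBytes(128); // hypothetical block size
        long fileBytes = megabytesToBytes(500);  // hypothetical file size
        System.out.println("block size in bytes: " + blockBytes);
        System.out.println("blocks needed: " + blocksNeeded(fileBytes, blockBytes));
    }
}
```

The byte count printed on the first line is the kind of number that would follow the equals sign, and the same number is what hadoop fs -stat %o should report back if the setting took effect.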

Re: How do I use MapFile Reader and Writer

2010-04-13 Thread Amogh Vasekar
Hi, The file system object will contain the scheme, authority, etc. for the given URI or path. The conf object acts as a reference (I can't think of a better term) to this info. Looking at the MapFileOutputFormat should help provide better understanding as to how writers and readers are initia
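To make "scheme, authority, etc." concrete, here is a minimal sketch using only the JDK's java.net.URI, not the Hadoop API itself. The host name, port, and path are hypothetical; the point is just to show which parts of a location string the file system object is resolved from.

```java
// Splits a storage location into the parts a file system is resolved from.
// Uses only the JDK; this does not touch Hadoop's FileSystem class.
final class UriParts {
    /** Returns a one-line summary of the scheme, authority, and path of a location. */
    static String describe(String location) {
        java.net.URI uri = java.net.URI.create(location);
        return "scheme=" + uri.getScheme()
                + " authority=" + uri.getAuthority()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // Hypothetical NameNode address and MapFile directory.
        System.out.println(describe("hdfs://namenode.example.com:8020/user/placebo/pairs.map"));
    }
}
```

The scheme selects the file system implementation, the authority names the server to talk to, and the path is what the Reader or Writer is then given.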

stop scripts not working properly

2010-04-13 Thread abhishek sharma
Hi all, I am using the Cloudera Hadoop distribution version 0.20.2+228. I have a small 9-node cluster, and when I try to stop the Hadoop DFS and Mapred using the stop-mapred.sh and stop-dfs.sh scripts, it doesn't shut down some of the TaskTrackers and DataNodes. I get a message saying no tasktracker

Re: stop scripts not working properly

2010-04-13 Thread Todd Lipcon
Hi Abhishek, Are you using the tarball or the RPMs/debs? The issue is most likely that your pid files are ending up in /tmp and thus getting cleaned out periodically. -Todd On Tue, Apr 13, 2010 at 11:07 PM, abhishek sharma wrote: > Hi all, > > I am using the Cloudera Hadoop distribution version

Re: stop scripts not working properly

2010-04-13 Thread abhishek sharma
Hi Todd, I am using the tarball. Let me try configuring the pid files to be stored somewhere else. Thanks for the tip, Abhishek On Tue, Apr 13, 2010 at 11:10 PM, Todd Lipcon wrote: > Hi Abhishek, > > Are you using the tarball or the RPMs/debs? The issue is most likely that > your pid files are en

Re: cluster under-utilization with Hadoop Fair Scheduler

2010-04-13 Thread abhishek sharma
Hi Ted, Were you referring to the Hadoop 0.20.2 distribution or the CDH version? I just looked at the FairScheduler assignTasks function in Hadoop dist. 0.20.2 and it is the same as version 0.20.0, and it will assign only 1 map and 1 reduce task to a tasktracker per heartbeat (as far as I can tell b
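A rough way to see why one assignment per heartbeat can leave slots idle: if a tasktracker receives at most one new map task every heartbeat, and each task runs for a fixed time, then the number of tasks running at once cannot exceed the task duration divided by the heartbeat interval. The sketch below is a simplified model of that bound, not the scheduler's actual code; the slot count, task duration, heartbeat interval, and all names are hypothetical.

```java
// Simplified steady-state model: one new task per heartbeat, fixed task duration.
// This is an illustration of the bound, not FairScheduler logic.
final class HeartbeatMath {
    /** Upper bound on tasks running at once on a single tasktracker. */
    static double maxConcurrentTasks(int slots, double taskSeconds, double heartbeatSeconds) {
        return Math.min(slots, taskSeconds / heartbeatSeconds);
    }

    /** Fraction of the tasktracker's slots that can be kept busy under this model. */
    static double utilization(int slots, double taskSeconds, double heartbeatSeconds) {
        return maxConcurrentTasks(slots, taskSeconds, heartbeatSeconds) / slots;
    }

    public static void main(String[] args) {
        // Hypothetical node: 8 map slots, 12-second tasks, 3-second heartbeats.
        System.out.println(maxConcurrentTasks(8, 12.0, 3.0));
        System.out.println(utilization(8, 12.0, 3.0));
    }
}
```

Under these assumed numbers only half of the slots are ever busy, while tasks lasting a minute or more would fill all of them, which matches the general observation that short tasks suffer most from per-heartbeat assignment.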