Re: Reading Records from a Sequence File

2011-04-01 Thread Harsh J
On Fri, Apr 1, 2011 at 9:00 AM, maha wrote: > Hello Everyone, > As far as I know, when my Java program opens a sequence file from HDFS for map calculations, using SequenceFile.Reader(key,value) will actually read the file in blocks of dfs.block.size and then grab record-by-record from memory
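A rough, stdlib-only illustration of the distinction being asked about: a reader that pulls length-prefixed records one at a time through a small fixed-size buffer, so memory use is bounded by the buffer rather than by the file or block size. The record format and the class `BufferedRecordReader` are hypothetical and are not SequenceFile's on-disk layout (which adds a header, sync markers, and optional compression). My understanding is that SequenceFile.Reader also reads through a buffered stream sized by `io.file.buffer.size` rather than loading a whole `dfs.block.size` block, but treat that as an assumption to check against the Reader source for your version.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy format: [4-byte length][UTF-8 bytes] per record. Not SequenceFile. */
class BufferedRecordReader implements AutoCloseable {
    private final DataInputStream in;

    BufferedRecordReader(InputStream raw, int bufferSize) {
        // At most bufferSize bytes are buffered at a time, however large the
        // underlying file (or HDFS block) is.
        this.in = new DataInputStream(new BufferedInputStream(raw, bufferSize));
    }

    /** Returns the next record, or null when the input is exhausted. */
    String next() throws IOException {
        int len;
        try {
            len = in.readInt();      // length prefix
        } catch (EOFException eof) {
            return null;             // end of input (a cut-off prefix also ends this toy reader)
        }
        byte[] body = new byte[len];
        in.readFully(body);          // refills the small buffer as many times as needed
        return new String(body, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Builds input in the same toy format. */
    static byte[] encode(List<String> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String r : records) {
                byte[] b = r.getBytes(StandardCharsets.UTF_8);
                out.writeInt(b.length);
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads every record one at a time through a buffer of the given size. */
    static List<String> readAll(byte[] data, int bufferSize) {
        List<String> result = new ArrayList<>();
        try (BufferedRecordReader r =
                 new BufferedRecordReader(new ByteArrayInputStream(data), bufferSize)) {
            for (String rec = r.next(); rec != null; rec = r.next()) {
                result.add(rec);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

Records longer than the buffer still come back whole, because `readFully` keeps pulling bytes until the record is complete; the buffer size only limits how much is read ahead.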

Re: sorting reducer input numerically in hadoop streaming

2011-04-01 Thread Harsh J
You will need to supply your own Key-comparator Java class by setting an appropriate parameter for it, as noted in: http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#A+Useful+Comparator+Class [The -D mapred.output.key.comparator.class=xyz part] On Thu, Mar 31, 2011 at 6:26 PM, Dieter Pla
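Streaming keys are plain text (by default, the part of the line before the first tab), so the default sort is byte-wise and "10" lands before "9". The sketch shows the difference with stdlib comparators only; the class `NumericKeyOrder` is hypothetical and is not a Hadoop comparator. In a real job the ordering has to come from a Hadoop comparator class named by `mapred.output.key.comparator.class`; the linked 0.20.2 page uses `org.apache.hadoop.mapred.lib.KeyFieldBasedComparator` together with `mapred.text.key.comparator.options` (for example `-n` for numeric), which should be checked against the docs for the version in use.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration: text ordering vs numeric ordering of streaming-style keys. */
class NumericKeyOrder {
    // Default text ordering: compares characters, so "10" sorts before "9".
    static final Comparator<String> LEXICOGRAPHIC = Comparator.naturalOrder();

    // Numeric ordering on the key, i.e. the text before the first tab.
    static final Comparator<String> NUMERIC_FIRST_FIELD =
            Comparator.comparing((String line) -> new BigDecimal(firstField(line)));

    static String firstField(String line) {
        int tab = line.indexOf('\t');
        return (tab < 0 ? line : line.substring(0, tab)).trim();
    }

    /** Returns a sorted copy, leaving the input list untouched. */
    static List<String> sorted(List<String> lines, Comparator<String> order) {
        List<String> copy = new ArrayList<>(lines);
        copy.sort(order);
        return copy;
    }
}
```

For the lines "10\ta", "9\tb", "100\tc", the text ordering yields 10, 100, 9 while the numeric ordering yields 9, 10, 100.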

Re: Awareness of Map tasks

2011-04-01 Thread Harsh J
Hey Matthew, You can gain some more knowledge on this by reading up on how the MapReduce parts interact with their DFS counterparts in Hadoop's architecture. Yahoo's resources carry a good graphical representation and description, for starters: http://developer.yahoo.com/hadoop/tutorial/module4.h

Problem in Job/TasK scheduling

2011-04-01 Thread Nitin Khandelwal
Hi, I am right now stuck on the issue of division of tasks among slaves for a job. Currently, as far as I know, hadoop does not allow us to fix/determine in advance how many tasks of a job would run on each slave. I am trying to design a system, where I want a slave to execute only one task of on

Lost Task Tracker

2011-04-01 Thread baran cakici
Hi, First of all, I use Hadoop-0.20.2 on Windows XP Pro with the Eclipse plug-in. I have a cluster with 1 master (JobTracker and NameNode) and 4 slaves (DataNode and TaskTracker). I have had some problems with my Hadoop cluster over the last few weeks. When I start a job with big input (4GB - it`s may be not to bi

Re: How to avoid receiving threads send by other people.

2011-04-01 Thread ke xie
Please see the FAQ. You can subscribe to a digest list. Regards, Ke Xie On Thu, Mar 31, 2011 at 9:01 PM, XiaoboGu wrote: > Hi, > I have subscribed to the digest mode, but I still get all the messages instantly from other people in the list. But other mailing lists won’t do this, they will send

RE: How to avoid receiving threads send by other people.

2011-04-01 Thread XiaoboGu
Delivery to the following recipient failed permanently: common-user-digest-sc.1301575901.ekjafjdnhkdohmdpgfpk-guxiaobo1982=gmail@hadoop.apache.org Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend co

runtime resource change of applications

2011-04-01 Thread bikash sharma
Hi, Can we dynamically vary the resource allocation/consumption (say memory, cores) of Hadoop MR applications like sort? Thanks, Bikash

NameNode reliability on EC2

2011-04-01 Thread Chetan Sarva
Hi, I'm exploring solutions to handle instance and/or volume failures of my NameNode. DRBD seems to be the most common solution in general, but it's not an option on EC2 as DRBD won't work in that environment (AFAIK). These are the two options I've come up with so far: * glusterfs to replicate th

Re: questions on map-side spills

2011-04-01 Thread Shrinivas Joshi
I would appreciate any input on this. Thanks, -Shrinivas On Thu, Mar 31, 2011 at 11:29 AM, Shrinivas Joshi wrote: > I am trying TeraSort with the Apache 0.21.0 build. io.sort.mb is 360M, map.sort.spill.percent is 0.8, dfs.blocksize is 256M. I am having some difficulty understanding spill relate
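A back-of-envelope check with the settings quoted (io.sort.mb = 360 MB, spill percent 0.8): as a first approximation, a background spill starts once about 360 MB × 0.8 = 288 MB of the sort buffer is in use. This assumes the whole buffer holds both data and record accounting; releases that carve out a separate accounting area (io.sort.record.percent) can trigger earlier. How much of the buffer a 256 MB split's output fills depends on serialized record size plus per-record metadata, so the overhead is a caller-supplied, hypothetical parameter in this `SpillEstimate` sketch, not a Hadoop constant.

```java
/** Hypothetical first-order spill estimate; ignores combiners and spill timing. */
class SpillEstimate {
    static final long MB = 1024L * 1024L;

    /** Buffered bytes at which a background spill would start. */
    static long spillThresholdBytes(long ioSortMb, double spillPercent) {
        return (long) (ioSortMb * MB * spillPercent);
    }

    /**
     * Rough spill count: serialized output plus an assumed per-record
     * overhead, divided by the threshold and rounded up.
     */
    static long roughSpillCount(long outputBytes, long records,
                                long overheadPerRecord, long thresholdBytes) {
        long buffered = outputBytes + records * overheadPerRecord;
        if (buffered <= 0) {
            return 0;
        }
        return (buffered + thresholdBytes - 1) / thresholdBytes; // ceiling division
    }
}
```

With these numbers, output that stays under 288 MB would be expected to spill once, at the final flush; anything above it means at least two spill files and a merge on the map side.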

Chukwa setup issues

2011-04-01 Thread bikash sharma
Hi, I am trying to set up Chukwa for a 16-node Hadoop cluster. I followed the admin guide (http://incubator.apache.org/chukwa/docs/r0.4.0/admin.html#Agents). However, I ran into the following issues: 1. What collector port needs to be specified in the conf/collectors file? 2. Am unable t

Re: Problem in Job/TasK scheduling

2011-04-01 Thread Harsh J
Perhaps you can have a look at the available job schedulers, the Capacity Scheduler and the Fair Scheduler? I think they ought to fit your requirements pretty well. On Fri, Apr 1, 2011 at 3:33 PM, Nitin Khandelwal wrote: > Hi, > I am right now stuck on the issue of division of tasks among slave

Is Rumen broken in Hadoop 0.21.0?

2011-04-01 Thread Zhenhua Guo
I am trying to use Rumen to process Hadoop logs. However, it always gives me errors. 1) Name of job log file incorrect: In my Hadoop installation, the name of the job log file looks like "job___". But Rumen expects "__job___", which seems to be used by old versions of Hadoop. I manually renamed the log file to fix t

Re: Chukwa setup issues

2011-04-01 Thread Bill Graham
Unfortunately conf/collectors is used in two different ways in Chukwa, each with a different syntax. This should really be fixed. 1. The script that starts the collectors looks at it for a list of hostnames (no ports) to start collectors on. To start it just on one host, set it to localhost. 2. Th

Re: Chukwa setup issues

2011-04-01 Thread bikash sharma
Thanks Bill. I am able to connect via the web now; I had actually put the wrong HTTP port in the config file. One follow-up question: if I run a MapReduce program, say TeraSort, how can we link Chukwa to collect job metrics via the web? On Fri, Apr 1, 2011 at 5:37 PM, Bill Graham wrote: > Unfortunately conf/coll

Re: Chukwa setup issues

2011-04-01 Thread bikash sharma
I was trying to install HICC in Chukwa, but hicc.sh does not exist in the repository. Any idea? -bikash On Fri, Apr 1, 2011 at 5:57 PM, bikash sharma wrote: > Thanks Bill. > I am able to connect via web now, actually had put wrong http port in > config file. > One following question - if i run a

Question on Streaming with RVM. Perhaps environment settings related.

2011-04-01 Thread Guang-Nan Cheng
Hi, I'm using Cloudera's distribution with the pseudo-distributed config. I'm also using a system-wide install of RVM, which manages Ruby and Gems. My mapper is a Ruby script like this: #!/bin/env ruby ... The problem is that the MapReduce process can't seem to load RVM. I added /etc/profile.d/rvm.sh in hadoop-env.sh

Re: Question on Streaming with RVM. Perhaps environment settings related.

2011-04-01 Thread Harsh J
Is the 'ruby' binary available in the $PATH for the 'mapred' user? You can see if it finds one using 'which ruby'. On Sat, Apr 2, 2011 at 10:17 AM, Guang-Nan Cheng wrote: >   /bin/env: ruby: No such file or directory -- Harsh J http://harshj.com

Re: Question on Streaming with RVM. Perhaps environment settings related.

2011-04-01 Thread Guang-Nan Cheng
If I log in manually as `mapred', then yes, whether it's an interactive or a non-interactive shell. But streaming seems to launch the program differently. I've tried using `-mapper env' for diagnosis; it prints the output below. So, right: `ruby' is not in $PATH when streaming launches it. fs_s3n_block
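Harsh's `which ruby` check can be mimicked in a few lines, which makes the failure mode concrete: `/bin/env ruby` resolves the bare name `ruby` by scanning the PATH that the child process inherited, so if the streaming task's PATH lacks RVM's bin directories, the lookup fails exactly as in the error above. `PathLookup` is a hypothetical stdlib-only sketch; unlike a POSIX shell, it skips empty PATH entries instead of treating them as the current directory.

```java
import java.io.File;
import java.util.Optional;

/** Hypothetical emulation of `which`: first executable named `command` on a PATH string. */
class PathLookup {
    static Optional<File> which(String command, String pathVar) {
        if (pathVar == null) {
            return Optional.empty();          // no PATH at all: nothing can be found
        }
        for (String dir : pathVar.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;                     // this sketch ignores empty entries
            }
            File candidate = new File(dir, command);
            if (candidate.isFile() && candidate.canExecute()) {
                return Optional.of(candidate); // same first-match rule as env/which
            }
        }
        return Optional.empty();
    }
}
```

Feeding it the PATH printed by `-mapper env` and then a login shell's PATH shows why the same script works in one and not the other. One possible workaround, to be verified for the distribution in use, is to hand the task the needed PATH with streaming's `-cmdenv name=value` option, or to put an absolute interpreter path in the shebang line.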