Re: How do I create per-reducer temporary files?

2011-05-04 Thread Matt Pouttu-Clarke
Bryan, Not sure you should be concerned with whether the output is on local vs. HDFS. I wouldn't think there would be much of a performance difference if you are doing streaming output (append) in both cases. Hadoop already uses local storage wherever possible (including for the task working d

Re: bin/start-dfs/mapred.sh with input slave file

2011-05-04 Thread Harsh J
Keep two configuration directories with different slaves files (say conf.dfs/ and conf.mr/) and use `hadoop-daemons.sh --config {conf dir path} start {daemon}` to start up DN/TT daemons. On Thu, May 5, 2011 at 8:06 AM, Matthew John wrote: > Hi all, > > I see that there is an option to provide a s

When is Map updated for a Job

2011-05-04 Thread Matthew John
Hi all, I went about figuring out how the JobTracker, JobInProgress and TaskScheduler together work out the problem of giving a Task (corresponding to an InputSplit) to a Node (corresponding to the TaskTracker on that node). I understand that a set of methods in JobInProgress like obtain

bin/start-dfs/mapred.sh with input slave file

2011-05-04 Thread Matthew John
Hi all, I see that there is an option to provide a slaves_file as input to bin/start-dfs.sh and bin/start-mapred.sh so that slaves are parsed from this input file rather than the default conf/slaves. Can someone please help me with the syntax for this? I am not able to figure this out. Thanks, M

Re: Cluster hard drive ratios

2011-05-04 Thread M. C. Srivas
Hey Matt, we are using the same Dell boxes, and we can get 2 GB/s per node (read and write) without problems. On Wed, May 4, 2011 at 8:43 AM, Matt Goeke wrote: > I have been reviewing quite a few presentations on the web from > various businesses, in addition to the ones I watched first hand

Re: Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
Got it. Thank you, Harsh. BTW, it is `hadoop dfs -Ddfs.blocksize=size -put file file`. No dot between "block" and "size". On Wed, May 4, 2011 at 3:18 PM, He Chen wrote: > Tried second solution. Does not work, still 2 64M blocks. h > > > On Wed, May 4, 2011 at 3:16 PM, He Chen wrote: > >> Hi Har

(nfs) outputdir

2011-05-04 Thread gabriel
Hello, I'm using a small fully distributed Hadoop cluster. All Hadoop daemons run under the "hadoop" user, and I submit jobs as "user". I ran into a couple of problems when I set mapred.output.dir to an (nfs) file:// location. 1. The output dir gets created, but it belongs to "hadoop". It sort o

Re: don't want to output anything

2011-05-04 Thread Gang Luo
Exactly what I want. Thanks Harsh J. -Gang - Original Message From: Harsh J To: common-user@hadoop.apache.org Sent: 2011/5/4 (Wed) 4:03:35 PM Subject: Re: don't want to output anything Hello Gang, On Thu, May 5, 2011 at 1:22 AM, Gang Luo wrote: > > > Hi, > > I use MapReduce to process and output

Re: Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
Tried second solution. Does not work, still 2 64M blocks. h On Wed, May 4, 2011 at 3:16 PM, He Chen wrote: > Hi Harsh > > Thank you for the reply. > > Actually, the hadoop directory is on my NFS server, every node reads the > same file from NFS server. I think this is not a problem. > > I li

Re: Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
Hi Harsh, Thank you for the reply. Actually, the hadoop directory is on my NFS server; every node reads the same file from the NFS server. I think this is not a problem. I like your second solution, but I am not sure whether the namenode will divide those 128MB blocks into smaller ones in the future or

Re: How do I create per-reducer temporary files?

2011-05-04 Thread Bryan Keller
Am I mistaken or are side-effect files on HDFS? I need my temp files to be on the local filesystem. Also, the java working directory is not the reducer's local processing directory, thus "./tmp" doesn't get me what I'm after. As it stands now I'm using java.io.tmpdir which is not a long-term sol

Re: don't want to output anything

2011-05-04 Thread Harsh J
Hello Gang, On Thu, May 5, 2011 at 1:22 AM, Gang Luo wrote: > > > Hi, > > I use MapReduce to process and output my own stuff, in a customized way. I > don't > use context.write to output anything, and thus I don't want the empty files > part-r-x on my fs. Is there someway to eliminate the ou
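The reply above is cut off; one common way to get rid of the empty part-r-* files (not necessarily the exact suggestion in the truncated message) is to set the job's output format to NullOutputFormat, roughly along these lines (job name is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class NoOutputJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "no-output-example"); // hypothetical job name
    job.setJarByClass(NoOutputJob.class);
    // Mapper/Reducer classes, input paths, etc. would be configured here as usual.
    // NullOutputFormat discards anything handed to context.write() and, more to
    // the point here, creates no part-r-* files at all.
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```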

Re: Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread Harsh J
Your client (put) machine must have the same block size configuration during upload as well. Alternatively, you may do something explicit like `hadoop dfs -Ddfs.block.size=size -put file file` On Thu, May 5, 2011 at 12:59 AM, He Chen wrote: > Hi all > > I met a problem about changing block size
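The same thing can be done from client code instead of the command line; a minimal sketch (the paths are placeholders, the 134217728 value is the example from this thread, and note that older releases read dfs.block.size while newer ones use dfs.blocksize):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutWithBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Block size is a client-side setting: it only affects files this client writes.
    conf.setLong("dfs.block.size", 134217728L); // 128M
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical paths; equivalent to:
    //   hadoop dfs -Ddfs.block.size=134217728 -put data.dat /user/hadoop/data.dat
    fs.copyFromLocalFile(new Path("data.dat"), new Path("/user/hadoop/data.dat"));
    fs.close();
  }
}
```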

don't want to output anything

2011-05-04 Thread Gang Luo
Hi, I use MapReduce to process and output my own stuff, in a customized way. I don't use context.write to output anything, and thus I don't want the empty files part-r-x on my fs. Is there someway to eliminate the output? Thanks. -Gang

Re: How do I create per-reducer temporary files?

2011-05-04 Thread Matt Pouttu-Clarke
Hi Bryan, These are called side-effect files, and I use them extensively: O'Reilly Hadoop 2nd Edition, p. 187; Pro Hadoop, p. 279. You get the path to save the file(s) using: http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath%28org
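The link above is the old mapred API; the newer mapreduce API has an equivalent FileOutputFormat.getWorkOutputPath(context). A rough sketch of a reducer creating a side-effect file there (the class and file names are placeholders):

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SideEffectReducer extends Reducer<Text, Text, NullWritable, NullWritable> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // The work output path is the task attempt's temporary output directory;
    // files written here are promoted to the job output dir only if the attempt
    // succeeds, so failed or speculative attempts leave no partial side-effect files.
    Path workDir = FileOutputFormat.getWorkOutputPath(context);
    Path sideFile = new Path(workDir, "side-" + key.toString()); // placeholder name
    FileSystem fs = sideFile.getFileSystem(context.getConfiguration());
    try (FSDataOutputStream out = fs.create(sideFile)) {
      for (Text value : values) {
        out.writeBytes(value + "\n");
      }
    }
  }
}
```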

Re: How do I create per-reducer temporary files?

2011-05-04 Thread Harsh J
Bryan, The relative ./tmp directory is under the work-directory of the task's attempt on a node. I believe it should be sufficient to use that as a base, to create directories or files under? Or are you simply looking for a way to expand this relative directory name to a fully absolute one (which
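For the absolute-path case, one way (a sketch; the class and method names are mine) is simply to resolve the relative "tmp" against the task JVM's working directory, which already sits on one of the node's local mapred.local.dir drives:

```java
import java.io.File;
import java.io.IOException;

public class TaskLocalTmp {
  // Resolve the default mapred.child.tmp ("./tmp") against the task attempt's
  // working directory so the result is an absolute path on local disk.
  public static File taskLocalTmpDir() throws IOException {
    File tmp = new File(System.getProperty("user.dir"), "tmp");
    if (!tmp.exists() && !tmp.mkdirs()) {
      throw new IOException("Could not create " + tmp);
    }
    return tmp;
  }

  // Example: a scratch file that lives on the reducer's local drive and is
  // cleaned up with the rest of the attempt directory when the task finishes.
  public static File newScratchFile() throws IOException {
    return File.createTempFile("reducer-scratch-", ".dat", taskLocalTmpDir());
  }
}
```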

Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
Hi all, I met a problem changing the block size from 64M to 128M. I am sure I modified the correct configuration file, hdfs-site.xml, because I can change the replication number correctly. However, it does not work for changing the block size. For example: I change dfs.block.size to 134217728 byt

RE: Cluster hard drive ratios

2011-05-04 Thread Michael Segel
Hey! Sorry, my math is off. I keep thinking in terms of TB per core and not drives. :-) To be honest I don't know if I would recommend 6-core CPUs. We're running on what is now considered 'old hardware' (Intel Xeon e5500 series). Yes, we saw that with 8 cores and 4 drives, we were limited by the

Re: How do I create per-reducer temporary files?

2011-05-04 Thread Bryan Keller
Right. What I am struggling with is how to retrieve the path/drive that the reducer is using, so I can use the same path for local temp files. On May 4, 2011, at 9:03 AM, Robert Evans wrote: > Bryan, > > I believe that map/reduce gives you a single drive to write to so that your > reducer has

RE: Cluster hard drive ratios

2011-05-04 Thread Matt Goeke
Mike, Thanks for the response. It looks like this discussion forked on the CDH list so I have two different conversations now. Also, you're dead on that one of the presentations I was referencing was Ravi's. With your setup I agree that it would have made no sense to go the 2.5" drive route given

RE: Cluster hard drive ratios

2011-05-04 Thread Michael Segel
Hi Matt. I think you attended Ravi's presentation. One of the reasons we used 4 drives per node is that our nodes are in 1U boxes and you can only fit four 3.5" SATA drives in those boxes. Could we have gone for more drives using 2.5" SATA drives? Yes, but then you would reduce the amount of

Cluster hard drive ratios

2011-05-04 Thread Matt Goeke
I have been reviewing quite a few presentations on the web from various businesses, in addition to the ones I watched first-hand at the Cloudera data summit last week, and I am curious as to others' thoughts around hard drive ratios. Various sources including Cloudera have cited 1 HDD x 2 cores x 4

Re: How do I create per-reducer temporary files?

2011-05-04 Thread Robert Evans
Bryan, I believe that map/reduce gives you a single drive to write to so that your reducer has less of an impact on other reducers/mappers running on the same box. If you want to write to more drives I thought the idea would then be to increase the number of reducers you have and let mapred as

Re: How do I configure a Partitioner in the new API?

2011-05-04 Thread W.P. McNeill
I only wanted the context as a way of getting at the configuration, so making the class implement Configurable will solve my problem. On Tue, May 3, 2011 at 11:35 PM, Harsh J wrote: > Hello, > > On Wed, May 4, 2011 at 5:13 AM, W.P. McNeill wrote: > > I have a new-API > > Partitioner< > http://h
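For reference, a minimal sketch of that approach (the key/value types and the property name are placeholders): a new-API Partitioner that also implements Configurable has setConf() called on it when the framework instantiates it, so the job Configuration is available without a context object.

```java
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class ConfigurablePartitioner extends Partitioner<Text, IntWritable>
    implements Configurable {

  private Configuration conf;
  private int numBuckets; // placeholder setting read from the job configuration

  @Override
  public void setConf(Configuration conf) {
    // The framework (ReflectionUtils.newInstance) calls this right after construction.
    this.conf = conf;
    this.numBuckets = conf.getInt("example.partition.buckets", 1); // placeholder property
  }

  @Override
  public Configuration getConf() {
    return conf;
  }

  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    int bucket = (key.hashCode() & Integer.MAX_VALUE) % Math.max(numBuckets, 1);
    return bucket % numPartitions;
  }
}
```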

Re: How do I create per-reducer temporary files?

2011-05-04 Thread Bryan Keller
I too am looking for the best place to put local temp files I create during reduce processing. I am hoping there is a variable or property someplace that defines a per-reducer temp directory. The "mapred.child.tmp" property is by default simply the relative directory "./tmp" so it isn't useful o

What is GEO_RSS_URI ???

2011-05-04 Thread praveenesh kumar
Hello Hadoop users, I came across some Map-Reduce examples on Google Code. Here is the link: http://code.google.com/p/hadoop-map-reduce-examples/wiki/Wikipedia_GeoLocation In the Mapper class, the writer has used GEO_RSS_URI. If someone has used this code, can you help me figure out w

Use of Transaction Log

2011-05-04 Thread hadoop maniac
Hello, Can anyone please explain the use of the Transaction Log on the NameNode? AFAIK, it logs details of files created and deleted in the Hadoop cluster. Thanks.

33 Days left to Berlin Buzzwords 2011

2011-05-04 Thread Simon Willnauer
hey folks, Berlin Buzzwords 2011 is close: only 33 days left until the big Search, Store and Scale open source crowd gathers in Berlin on June 6th/7th. The conference again focuses on the topics of search, data analysis and NoSQL. It is to take place on June 6/7th 2011 in Berlin. We are looking f