Input and Output types?

2008-04-18 Thread Sridhar Raman
Hi, I am new to MapReduce and Hadoop, and I have managed to find my way through with a few programs. But I still have some doubts that keep nagging at me. I am not sure whether these are basic doubts, or just some documentation that I missed somewhere. 1) Should my input

Re: Lease expired on open file

2008-04-18 Thread Luca
dhruba Borthakur wrote: The DFSClient has a thread that renews leases periodically for all files that are being written to. I suspect that this thread is not getting a chance to run because the gunzip program is eating all the CPU. You might want to put in a Sleep() after every few seconds on

Re: Not able to back up to S3

2008-04-18 Thread Steve Loughran
Chris K Wensel wrote: you cannot have underscores in a bucket name. it freaks out java.net.URI. freaks out DNS, too, which is why the java.net classes whine. minus signs should work -- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action
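Steve's point is easy to reproduce with the JDK alone: an underscore makes the authority an invalid hostname, so java.net.URI cannot expose it as a host. A small sketch (the bucket names are made up):

```java
import java.net.URI;

public class BucketNameCheck {
    public static void main(String[] args) throws Exception {
        // Underscores are not legal in host names (RFC 2396), so URI treats
        // the authority as registry-based and getHost() returns null.
        URI bad = new URI("s3://my_bucket/key");
        System.out.println(bad.getHost());   // null

        // A minus sign is fine: the authority parses as a normal host.
        URI good = new URI("s3://my-bucket/key");
        System.out.println(good.getHost());  // my-bucket
    }
}
```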

Map Intermediate key/value pairs written to file system

2008-04-18 Thread Kayla Jay
Hi, I have no reduce tasks. I would like to write my map results directly to disk as they are produced, after each map has completed. I don't want to collect and then write to output. If I wanted to directly write my map output 1-by-1 (intermediate key/value pairs) after each map completes into

RE: Lease expired on open file

2008-04-18 Thread Runping Qi
Sounds like you also hit this problem: https://issues.apache.org/jira/browse/HADOOP-2669 Runping -Original Message- From: Luca [mailto:[EMAIL PROTECTED] Sent: Friday, April 18, 2008 1:21 AM To: core-user@hadoop.apache.org Subject: Re: Lease expired on open file dhruba Borthakur

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Devaraj Das
Will your requirement be addressed if, from within the map method, you create a sequence file using the SequenceFile.createWriter API, write a key/value using the writer's append(key, value) API, and then close the file? You can do this for every key/value. Please have a look at the createWriter APIs and
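A minimal sketch of what Devaraj describes, against the 0.16-era mapred API. The output path, key/value types, and class names are assumptions for illustration, not anything from the thread:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DirectWriteMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private JobConf conf;

  public void configure(JobConf job) {
    this.conf = job;
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    // "/user/kayla/out" is a hypothetical directory; one file per key/value.
    Path file = new Path("/user/kayla/out/" + key.get());
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, file, Text.class, Text.class);
    try {
      writer.append(new Text(key.toString()), value);
    } finally {
      writer.close();
    }
    // Note: no output.collect(...) call, so no regular map output is produced.
  }
}
```

This requires a running Hadoop installation on the classpath, so it is a sketch rather than something to run standalone.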

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Kayla Jay
Thanks for the quick reply. I will look into that and post further questions. I'm not familiar with the API, so I may have more questions. I'm assuming I will have to take out the output.collect call so that it doesn't try to create that final output file in addition to the individual files I will

RE: Reusing jobs

2008-04-18 Thread Devaraj Das
Jason, didn't get that. The jvm should exit naturally even without calling System.exit. Where exactly did you insert the System.exit? Please clarify. Thanks! -Original Message- From: Jason Venner [mailto:[EMAIL PROTECTED] Sent: Friday, April 18, 2008 6:48 PM To:

Make NameNode listen on multiple interfaces

2008-04-18 Thread David Alves
Hi. In my setup I have a cluster in which each server has two network interfaces: one for Hadoop network traffic (let's call it A) and one for traffic to the rest of the network (let's call it B). Until now I only needed to make the nodes communicate with the master and vice-versa

Red Hat Linux and Window XP in the same cluster

2008-04-18 Thread Leon Yu
I am new to Hadoop. I would like to put a Red Hat Linux box and a Windows XP box (with Cygwin and OpenSSH installed) in the same cluster. Both work in standalone mode and they can ping each other, but ssh does not work in either direction. I suppose working ssh is a prerequisite. Can anyone give a

Re: Input and Output types?

2008-04-18 Thread Owen O'Malley
On Apr 17, 2008, at 11:20 PM, Sridhar Raman wrote: I am new to MapReduce and Hadoop, and I have managed to find my way through with a few programs. But I still have some doubts that are constantly clinging onto me. I am not too sure whether these are basic doubts, or just some

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Kayla Jay
Hi. I don't know how to create unique individual file names for each mapper's key/value pairs. How do you create individual files per mapper's key/value pairs so they don't overwrite one another? I.e., how do you create a new file each time, use that code for all the mappers, and not have

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Devaraj Das
Within a task you can get the task id (which is unique). Define public void configure(JobConf job) and in it get the task id by doing job.get("mapred.task.id"). Now create filenames starting with that as the prefix and maybe a monotonically increasing integer as the suffix (defined as a static
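Devaraj's recipe (task id prefix plus a per-task counter) can be sketched in plain Java. The task id literal below is hypothetical; in a real mapper it would come from job.get("mapred.task.id") inside configure(JobConf):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class UniqueFileNames {
    // One counter per task JVM; combined with the unique task id this
    // yields names that cannot collide across mappers.
    private static final AtomicInteger counter = new AtomicInteger(0);

    static String nextName(String taskId) {
        return taskId + "-part-" + counter.getAndIncrement();
    }

    public static void main(String[] args) {
        String taskId = "task_200804180000_0001_m_000003_0"; // hypothetical
        System.out.println(nextName(taskId)); // ends in -part-0
        System.out.println(nextName(taskId)); // ends in -part-1
    }
}
```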

Re: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Ted Dunning
Isn't this just what Hadoop does when you set numReduces = 0? On 4/18/08 10:45 AM, Devaraj Das [EMAIL PROTECTED] wrote: Within a task you can get the task id (which is unique). Define public void configure(JobConf job) and in it get the task id by doing job.get("mapred.task.id"). Now
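What Ted describes is a one-line job configuration. A sketch against the 0.16-era JobConf API (the driver class name is a placeholder), which needs a Hadoop installation to actually run:

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MapOnlyJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapOnlyJob.class);
    // With zero reduces, each map task's output is written straight to the
    // output directory (one part file per map), with no sort or shuffle.
    conf.setNumReduceTasks(0);
    JobClient.runJob(conf);
  }
}
```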

Re: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Ted Dunning
Yes, but Kayla is likely misguided in this respect. (my apologies for sounding doctrinaire) On 4/18/08 11:08 AM, Devaraj Das [EMAIL PROTECTED] wrote: Ted, note that Kayla wants one file per output key/value. -Original Message- From: Ted Dunning [mailto:[EMAIL PROTECTED] Sent:

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Devaraj Das
Well, Kayla specifically mentioned wanting one file per key/value. Kayla should clarify this. -Original Message- From: Ted Dunning [mailto:[EMAIL PROTECTED] Sent: Friday, April 18, 2008 11:54 PM To: core-user@hadoop.apache.org Subject: Re: Map Intermediate key/value pairs

Re: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Ted Dunning
Having one file per key/value has several bad consequences: A) you are essentially doing a sort, but inefficiently; better to have the framework do it well, even if you really want this. B) having one file per key leads (often, not always) to having many files. This means that subsequent steps

Re: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Kayla Jay
Here's what I am trying to do. I get a set of huge text files, each of which contains many XML docs. Within each of those XML docs in one huge text file there is some junk in the XML, so I basically want to clean it up. My custom input file format doesn't split the

How to instruct Job Tracker to use certain hosts only

2008-04-18 Thread Htin Hlaing
Hi - I have a situation for which I cannot seem to find a good answer myself. I am using 0.16.2. Basically, I have two jobs that I run in order from the same Java driver process. I would like the first job to run on all the compute hosts in the cluster (which is the default), and then, I

jar files on NFS instead of DistributedCache

2008-04-18 Thread Mikhail Bautin
Hello, We are using Hadoop here at Stony Brook University to power the next-generation text analytics backend for www.textmap.com. We also have an NFS partition that is mounted on all machines of our 100-node cluster. I found it much more convenient to store manually created files (e.g.

Re: jar files on NFS instead of DistributedCache

2008-04-18 Thread Doug Cutting
Mikhail Bautin wrote: Specifically, I just need a way to alter the child JVM's classpath via JobConf, without having the framework copy anything in and out of HDFS, because all my files are already accessible from all nodes. I see how to do that by adding a couple of lines to TaskRunner's run()

TestDU.testDU() throws assertionfailederror

2008-04-18 Thread Edward J. Yoon
Hi, community. On my local computer (CentOS 5), TestDU.testDU() throws an AssertionFailedError. Where do the extra 4096 bytes come from? Testsuite: org.apache.hadoop.fs.TestDU Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 5.143 sec Testcase: testDU took 5.138 sec FAILED expected:32768 but was:36864

Re: Hadoop summit video capture?

2008-04-18 Thread Jeremy Zawodny
Almost... The videos and slides are up (as of yesterday) but there appears to be an ACL problem with the videos. http://developer.yahoo.com/blogs/hadoop/2008/04/hadoop_summit_slides_and_video.html Jeremy On 4/17/08, wuqi [EMAIL PROTECTED] wrote: Are the videos and slides available now?

Re: TestDU.testDU() throws assertionfailederror

2008-04-18 Thread Konstantin Shvachko
Edward, testDU() writes a 32K file to the local fs and then verifies that the value reported by du changes by exactly the amount written. Although this is true for most block-oriented file systems, it might not be true for some. I suspect that in your case the file is written to tmpfs, which
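The block accounting Konstantin alludes to can be illustrated in plain Java. A sketch, assuming the common 4096-byte block size: du reports space in whole blocks, so any size that does not land exactly on a block boundary gets rounded up:

```java
public class BlockRounding {
    // File systems account for space in whole blocks, so `du` reports the
    // file size rounded up to a multiple of the block size.
    static long roundUpToBlock(long bytes, long blockSize) {
        return ((bytes + blockSize - 1) / blockSize) * blockSize;
    }

    public static void main(String[] args) {
        // A 32768-byte file fits exactly in eight 4096-byte blocks:
        System.out.println(roundUpToBlock(32768, 4096)); // 32768
        // One byte more spills into a ninth block (36864 bytes), which is
        // why a file system charging an extra block reports 36864 = 32768 + 4096:
        System.out.println(roundUpToBlock(32769, 4096)); // 36864
    }
}
```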