Hi
I am new to MapReduce and Hadoop, and I have managed to find my way through
with a few programs. But I still have some doubts that keep nagging at me.
I am not too sure whether these are basic doubts, or just some documentation
that I missed somewhere.
1) Should my input
dhruba Borthakur wrote:
The DFSClient has a thread that renews leases periodically for all files
that are being written to. I suspect that this thread is not getting a
chance to run because the gunzip program is eating all the CPU. You
might want to put in a sleep() every few seconds on
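A minimal sketch of what that might look like, assuming a hand-rolled read loop around the gunzip stream (the class name, batch size, and sleep interval are all illustrative, not from the thread):

    import java.io.BufferedReader;
    import java.io.IOException;

    // Hypothetical worker loop illustrating the suggestion above: yield the
    // CPU periodically so background threads (like the DFSClient lease
    // renewer) get a chance to run.
    public class ThrottledLoop {
        public static void drain(BufferedReader in) throws IOException {
            long lines = 0;
            String line;
            while ((line = in.readLine()) != null) {
                process(line);                 // the CPU-heavy work
                if (++lines % 10000 == 0) {
                    try {
                        Thread.sleep(50);      // brief pause every batch
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        }

        private static void process(String line) {
            // placeholder for the real per-record work
        }
    }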
Chris K Wensel wrote:
You cannot have underscores in a bucket name; it freaks out java.net.URI.
Freaks out DNS, too, which is why the java.net classes whine. Minus
signs should work.
--
Steve Loughran http://www.1060.org/blogxter/publish/5
Author: Ant in Action
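A quick standalone way to see what Chris and Steve mean (nothing Hadoop-specific; the bucket names are made up):

    import java.net.URI;

    // Underscores are not legal in DNS host names, so java.net.URI does not
    // parse the authority as a host: getHost() comes back null, which breaks
    // any code expecting a host. Minus signs are DNS-legal and parse fine.
    public class BucketNames {
        public static void main(String[] args) throws Exception {
            URI bad = new URI("s3://my_bucket/data");
            System.out.println(bad.getHost());      // null
            System.out.println(bad.getAuthority()); // my_bucket (raw authority only)

            URI good = new URI("s3://my-bucket/data");
            System.out.println(good.getHost());     // my-bucket
        }
    }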
Hi
I have no reduces. I would like to write my map results directly to disk as
they are produced, after each map has completed. I don't want to collect and
then write to output.
If I wanted to directly write my map output 1-by-1 (intermediate key/value
pairs) after each map completes into
Sounds like you also hit this problem:
https://issues.apache.org/jira/browse/HADOOP-2669
Runping
-----Original Message-----
From: Luca [mailto:[EMAIL PROTECTED]
Sent: Friday, April 18, 2008 1:21 AM
To: core-user@hadoop.apache.org
Subject: Re: Lease expired on open file
dhruba Borthakur
Will your requirement be addressed if, from within the map method, you
create a sequence file using the SequenceFile.createWriter API, write a
key/value using the writer's append(key, value) API, and then close the
file? You can do this for every key/value.
Please have a look at the createWriter APIs and
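A minimal sketch of what dhruba is describing, against the 0.16-era API (the output directory and the per-pair file-naming scheme are my own placeholders, not from the thread):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch: write each intermediate key/value pair to its own SequenceFile
    // from inside map(), instead of going through output.collect().
    public class PerPairMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, LongWritable, Text> {

        private JobConf conf;
        private long pairCount = 0;

        public void configure(JobConf job) {
            this.conf = job;
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<LongWritable, Text> output,
                        Reporter reporter) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            // "/tmp/map-out" and the naming scheme are assumptions
            Path file = new Path("/tmp/map-out/pair-" + (pairCount++));
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, file, LongWritable.class, Text.class);
            try {
                writer.append(key, value);   // one key/value per file
            } finally {
                writer.close();              // close right away, as suggested
            }
            // note: no output.collect() call
        }
    }

Note that the file names above would collide across map tasks; the task-id trick discussed later in this thread fixes that.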
Thanks for the quick reply. I will look into that and post further questions.
I'm not familiar with the API, so I may have more questions.
I'm assuming I will have to take out the output.collect so that it doesn't try
to create that end file in addition to the individual files I will
Jason, I didn't get that. The JVM should exit naturally even without calling
System.exit. Where exactly did you insert the System.exit? Please clarify.
Thanks!
-----Original Message-----
From: Jason Venner [mailto:[EMAIL PROTECTED]
Sent: Friday, April 18, 2008 6:48 PM
To:
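For context, the placement being debated is presumably something like this, at the very end of the driver's main() (my assumption about what is being discussed, not Jason's actual code):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // The usual reason for an explicit System.exit in a driver: if any
    // non-daemon thread is still alive after runJob() returns, the JVM
    // will not exit on its own.
    public class Driver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(Driver.class);
            // ... job configuration elided ...
            JobClient.runJob(conf);   // blocks until the job finishes
            System.exit(0);           // force exit even if stray threads remain
        }
    }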
Hi
In my setup I have a cluster in which each server has two network
interfaces, one for Hadoop network traffic (let's call it A) and one for
traffic to the rest of the network (let's call it B).
Until now I only needed to make the nodes communicate with the master
and vice-versa
I am new to Hadoop. I would like to put a Red Hat Linux box and a Windows XP
box (with Cygwin and OpenSSH installed) in the same cluster. Both work in
standalone mode and they can ping each other, but ssh does not work in either
direction. I suppose working ssh is a prerequisite. Can anyone give a
On Apr 17, 2008, at 11:20 PM, Sridhar Raman wrote:
I am new to MapReduce and Hadoop, and I have managed to find my way through
with a few programs. But I still have some doubts that keep nagging at me.
I am not too sure whether these are basic doubts, or just some
Hi.
I don't know how to create unique individual file names for each mapper's
key/value pairs. How do you create individual files per mapper's key/value
pairs so they don't overwrite one another?
I.e., how do you create a new file each time and use that code for all the
mappers and not have
Within a task you can get the task id (which is unique). Define public void
configure(JobConf job) and in that get the task id by doing
job.get("mapred.task.id").
Now create filenames starting with that as the prefix and maybe a
monotonically increasing integer as the suffix (defined as a static
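A sketch of Devaraj's recipe against the 0.16-era API (only the naming logic is shown; class and method names are placeholders):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    // Unique per-task file names: the task id is unique across the job, and
    // the static counter is unique within this task's JVM, so the combination
    // never collides.
    public class UniqueNames extends MapReduceBase {

        private static int counter = 0;   // the "monotonically increasing integer"
        private String taskId;

        public void configure(JobConf job) {
            taskId = job.get("mapred.task.id");  // e.g. task_200804180001_m_000003_0
        }

        public String nextFileName() {
            return taskId + "-" + (counter++);
        }
    }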
Isn't this just what Hadoop does when you set numReduces = 0?
On 4/18/08 10:45 AM, Devaraj Das [EMAIL PROTECTED] wrote:
Within a task you can get the task id (which is unique). Define public void
configure(JobConf job) and in that get the task id by doing
job.get("mapred.task.id").
Now
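Concretely, the map-only mode Ted is referring to (0.16-era API; input/output paths and the mapper class are elided):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // With zero reduces, map output records skip the sort/shuffle entirely
    // and go straight to the output format, one part file per map task.
    public class MapOnly {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MapOnly.class);
            // ... input/output paths and mapper class elided ...
            conf.setNumReduceTasks(0);   // "numReduces = 0"
            JobClient.runJob(conf);
        }
    }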
Yes, but Kayla is likely misguided in this respect.
(my apologies for sounding doctrinaire)
On 4/18/08 11:08 AM, Devaraj Das [EMAIL PROTECTED] wrote:
Ted, note that Kayla wants one file per output key/value.
-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent:
Well... Kayla specifically mentioned that he wants one file per key/value.
Kayla should clarify this.
-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Friday, April 18, 2008 11:54 PM
To: core-user@hadoop.apache.org
Subject: Re: Map Intermediate key/value pairs
Having one file per key/value has several bad consequences:
A) You are essentially doing a sort, but inefficiently. Better to have the
framework do it well, even if you really want this.
B) Having one file per key leads (often, not always) to having many files.
This means that subsequent steps
Here's what I am trying to do.
I get a set of huge text files, each containing many XML docs. Within each
of those XML docs there is some junk in the XML, so basically I want to
clean it up.
My custom input file format doesn't split the
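The message is cut off, but a custom input format that refuses to split its files, as described, typically looks like this (the class name is a placeholder; the 0.16-era API is assumed):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // A non-splitting input format: each huge XML-bearing file goes intact
    // to a single map task, so documents are never cut at a split boundary.
    public class WholeFileTextInputFormat extends TextInputFormat {
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;
        }
    }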
Hi - I have a situation that I cannot seem to answer for myself. I am
using 0.16.2.
Basically, I have two jobs that I run in order from the same Java driver
process.
I would like the first job to run on all the compute hosts in the
cluster (which is the default) and then, I
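For reference, the driver pattern being described (0.16-era API; configuration details and the class name are placeholders):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // Two jobs run back-to-back from the same driver process; the second is
    // only submitted after the first completes.
    public class TwoStepDriver {
        public static void main(String[] args) throws Exception {
            JobConf first = new JobConf(TwoStepDriver.class);
            // ... configure the first job ...
            JobClient.runJob(first);    // blocks until job 1 finishes

            JobConf second = new JobConf(TwoStepDriver.class);
            // ... configure the second job ...
            JobClient.runJob(second);
        }
    }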
Hello,
We are using Hadoop here at Stony Brook University to power the
next-generation text analytics backend for www.textmap.com. We also have an
NFS partition that is mounted on all machines of our 100-node cluster. I
found it much more convenient to store manually created files (e.g.
Mikhail Bautin wrote:
Specifically, I just need a way to alter the child JVM's classpath via
JobConf, without having the framework copy anything in and out of HDFS,
because all my files are already accessible from all nodes. I see how to do
that by adding a couple of lines to TaskRunner's run()
Hi, community.
On my local computer (CentOS 5), TestDU.testDU() throws an AssertionFailedError.
Where do the extra 4096 bytes come from?
Testsuite: org.apache.hadoop.fs.TestDU
Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 5.143 sec
Testcase: testDU took 5.138 sec
FAILED
expected:32768 but was:36864
Almost... The videos and slides are up (as of yesterday) but there appears
to be an ACL problem with the videos.
http://developer.yahoo.com/blogs/hadoop/2008/04/hadoop_summit_slides_and_video.html
Jeremy
On 4/17/08, wuqi [EMAIL PROTECTED] wrote:
Are the videos and slides available now?
Edward,
testDU() writes a 32K file to the local fs and then verifies that the value
reported by du matches exactly the amount written.
Although this is true for most block-oriented file systems, it might not be
true for some; the 4096-byte difference you see (36864 - 32768) is exactly
one extra 4K block.
I suspect that in your case the file is written to tmpfs, which