Hi
I am new to MapReduce and Hadoop, and I have managed to find my way through
with a few programs. But I still have some doubts that keep nagging at me.
I am not too sure whether these are basic doubts, or just some documentation
that I missed somewhere.
1) Should my input
dhruba Borthakur wrote:
The DFSClient has a thread that renews leases periodically for all files
that are being written to. I suspect that this thread is not getting a
chance to run because the gunzip program is eating all the CPU. You
might want to put in a sleep() every few seconds on
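A minimal sketch of what that might look like, assuming a hand-rolled read loop around the gunzip stream (the class name, batch size, and sleep interval are all illustrative, not from the thread):

    import java.io.BufferedReader;
    import java.io.IOException;

    // Hypothetical worker loop illustrating the suggestion above: yield the
    // CPU periodically so background threads (like the DFSClient lease
    // renewer) get a chance to run.
    public class ThrottledLoop {
        public static void drain(BufferedReader in) throws IOException {
            long lines = 0;
            String line;
            while ((line = in.readLine()) != null) {
                process(line);                 // the CPU-heavy work
                if (++lines % 10000 == 0) {
                    try {
                        Thread.sleep(50);      // brief pause every batch
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        }

        private static void process(String line) {
            // placeholder for the real per-record work
        }
    }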
Chris K Wensel wrote:
You cannot have underscores in a bucket name; it freaks out java.net.URI.
Freaks out DNS, too, which is why the java.net classes whine. Minus
signs should work.
--
Steve Loughran http://www.1060.org/blogxter/publish/5
Author: Ant in Action
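A quick standalone way to see what Chris and Steve mean (nothing Hadoop-specific; the bucket names are made up):

    import java.net.URI;

    // Underscores are not legal in DNS host names, so java.net.URI does not
    // parse the authority as a host: getHost() comes back null, which breaks
    // any code expecting a host. Minus signs are DNS-legal and parse fine.
    public class BucketNames {
        public static void main(String[] args) throws Exception {
            URI bad = new URI("s3://my_bucket/data");
            System.out.println(bad.getHost());      // null
            System.out.println(bad.getAuthority()); // my_bucket (raw authority only)

            URI good = new URI("s3://my-bucket/data");
            System.out.println(good.getHost());     // my-bucket
        }
    }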
Hi
I have no reduces. I would like to write my map results directly to disk as
they are produced, after each map has completed. I don't want to collect and
then write to output.
If I wanted to directly write my map output 1-by-1 (intermediate key/value
pairs) after each map completes into
Sounds like you also hit this problem:
https://issues.apache.org/jira/browse/HADOOP-2669
Runping
-----Original Message-----
From: Luca [mailto:[EMAIL PROTECTED]
Sent: Friday, April 18, 2008 1:21 AM
To: core-user@hadoop.apache.org
Subject: Re: Lease expired on open file
dhruba Borthakur
Will your requirement be addressed if, from within the map method, you
create a sequence file using the SequenceFile.createWriter API, write a
key/value using the writer's append(key, value) API, and then close the
file? You can do this for every key/value.
Please have a look at the createWriter APIs and
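A minimal sketch of what dhruba is describing, against the 0.16-era API (the output directory and the per-pair file-naming scheme are my own placeholders, not from the thread):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch: write each intermediate key/value pair to its own SequenceFile
    // from inside map(), instead of going through output.collect().
    public class PerPairMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, LongWritable, Text> {

        private JobConf conf;
        private long pairCount = 0;

        public void configure(JobConf job) {
            this.conf = job;
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<LongWritable, Text> output,
                        Reporter reporter) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            // "/tmp/map-out" and the naming scheme are assumptions
            Path file = new Path("/tmp/map-out/pair-" + (pairCount++));
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, file, LongWritable.class, Text.class);
            try {
                writer.append(key, value);   // one key/value per file
            } finally {
                writer.close();              // close right away, as suggested
            }
            // note: no output.collect() call
        }
    }

Note that the file names above would collide across map tasks; the task-id trick discussed later in this thread fixes that.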
Thanks for the quick reply. I will look into that and post further questions.
I'm not familiar with the API, so I may have more questions.
I'm assuming I will have to take out the output.collect so that it doesn't try
to create that end file in addition to the individual files I will
Jason, I didn't get that. The JVM should exit naturally even without calling
System.exit. Where exactly did you insert the System.exit? Please clarify.
Thanks!
-----Original Message-----
From: Jason Venner [mailto:[EMAIL PROTECTED]
Sent: Friday, April 18, 2008 6:48 PM
To:
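For context, the placement being debated is presumably something like this, at the very end of the driver's main() (my assumption about what is being discussed, not Jason's actual code):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // The usual reason for an explicit System.exit in a driver: if any
    // non-daemon thread is still alive after runJob() returns, the JVM
    // will not exit on its own.
    public class Driver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(Driver.class);
            // ... job configuration elided ...
            JobClient.runJob(conf);   // blocks until the job finishes
            System.exit(0);           // force exit even if stray threads remain
        }
    }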
Hi
In my setup I have a cluster in which each server has two network
interfaces, one for Hadoop network traffic (let's call it A) and one for
traffic to the rest of the network (let's call it B).
Until now I only needed to make the nodes communicate with the master
and vice-versa
I am new to Hadoop. I would like to put a Red Hat Linux box and a Windows XP
box (with Cygwin and OpenSSH installed) in the same cluster. Both work in
standalone mode and they can ping each other, but ssh does not work in either
direction. I suppose working ssh is a prerequisite. Can anyone give a
On Apr 17, 2008, at 11:20 PM, Sridhar Raman wrote:
I am new to MapReduce and Hadoop, and I have managed to find my way through
with a few programs. But I still have some doubts that keep nagging at me.
I am not too sure whether these are basic doubts, or just some
Hi.
I don't know how to create unique individual file names for each mapper's
key/value pairs. How do you create individual files per mapper's key/value
pairs so they don't overwrite one another?
I.e., how do you create a new file each time and use that code for all the
mappers and not have
Within a task you can get the task id (which is unique). Define public void
configure(JobConf job) and in that get the task id by doing
job.get("mapred.task.id").
Now create filenames starting with that as the prefix and maybe a
monotonically increasing integer as the suffix (defined as a static
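A sketch of Devaraj's recipe against the 0.16-era API (only the naming logic is shown; class and method names are placeholders):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    // Unique per-task file names: the task id is unique across the job, and
    // the static counter is unique within this task's JVM, so the combination
    // never collides.
    public class UniqueNames extends MapReduceBase {

        private static int counter = 0;   // the "monotonically increasing integer"
        private String taskId;

        public void configure(JobConf job) {
            taskId = job.get("mapred.task.id");  // e.g. task_200804180001_m_000003_0
        }

        public String nextFileName() {
            return taskId + "-" + (counter++);
        }
    }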
Isn't this just what Hadoop does when you set numReduces = 0?
On 4/18/08 10:45 AM, Devaraj Das [EMAIL PROTECTED] wrote:
Within a task you can get the task id (which is unique). Define public void
configure(JobConf job) and in that get the task id by doing
job.get("mapred.task.id").
Now
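Concretely, the map-only mode Ted is referring to (0.16-era API; input/output paths and the mapper class are elided):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // With zero reduces, map output records skip the sort/shuffle entirely
    // and go straight to the output format, one part file per map task.
    public class MapOnly {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MapOnly.class);
            // ... input/output paths and mapper class elided ...
            conf.setNumReduceTasks(0);   // "numReduces = 0"
            JobClient.runJob(conf);
        }
    }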
Yes, but Kayla is likely misguided in this respect.
(my apologies for sounding doctrinaire)
On 4/18/08 11:08 AM, Devaraj Das [EMAIL PROTECTED] wrote:
Ted, note that Kayla wants one file per output key/value.
-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent:
Well... Kayla specifically mentioned that he wants one file per key/value.
Kayla should clarify this.
-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Friday, April 18, 2008 11:54 PM
To: core-user@hadoop.apache.org
Subject: Re: Map Intermediate key/value pairs
Having one file per key/value has several bad consequences:
A) You are essentially doing a sort, but inefficiently. Better to have the
framework do it well, even if you really want this.
B) Having one file per key leads (often, not always) to having many files.
This means that subsequent steps
Here's what I am trying to do.
I get a set of huge text files, each containing many XML docs. Within each
of those XML docs there is some junk in the XML, so basically I want to
clean it up.
My custom input file format doesn't split the
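The message is cut off, but a custom input format that refuses to split its files, as described, typically looks like this (the class name is a placeholder; the 0.16-era API is assumed):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // A non-splitting input format: each huge XML-bearing file goes intact
    // to a single map task, so documents are never cut at a split boundary.
    public class WholeFileTextInputFormat extends TextInputFormat {
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;
        }
    }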
Hi - I have a situation that I cannot seem to answer for myself. I am
using 0.16.2.
Basically, I have two jobs that I run in order from the same Java driver
process.
I would like the first job to run on all the compute hosts in the
cluster (which is the default) and then, I
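For reference, the driver pattern being described (0.16-era API; configuration details and the class name are placeholders):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // Two jobs run back-to-back from the same driver process; the second is
    // only submitted after the first completes.
    public class TwoStepDriver {
        public static void main(String[] args) throws Exception {
            JobConf first = new JobConf(TwoStepDriver.class);
            // ... configure the first job ...
            JobClient.runJob(first);    // blocks until job 1 finishes

            JobConf second = new JobConf(TwoStepDriver.class);
            // ... configure the second job ...
            JobClient.runJob(second);
        }
    }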
Hello,
We are using Hadoop here at Stony Brook University to power the
next-generation text analytics backend for www.textmap.com. We also have an
NFS partition that is mounted on all machines of our 100-node cluster. I
found it much more convenient to store manually created files (e.g.
Mikhail Bautin wrote:
Specifically, I just need a way to alter the child JVM's classpath via
JobConf, without having the framework copy anything in and out of HDFS,
because all my files are already accessible from all nodes. I see how to do
that by adding a couple of lines to TaskRunner's run()
Hi, community.
On my local computer (CentOS 5), TestDU.testDU() throws an AssertionFailedError.
Where do the extra 4096 bytes come from?
Testsuite: org.apache.hadoop.fs.TestDU
Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 5.143 sec
Testcase: testDU took 5.138 sec
FAILED
expected:32768 but was:36864
Almost... The videos and slides are up (as of yesterday) but there appears
to be an ACL problem with the videos.
http://developer.yahoo.com/blogs/hadoop/2008/04/hadoop_summit_slides_and_video.html
Jeremy
On 4/17/08, wuqi [EMAIL PROTECTED] wrote:
Are the videos and slides available now?
Edward,
testDU() writes a 32K file to the local fs and then verifies that the value
reported by du matches exactly the amount written.
Although this is true for most block-oriented file systems, it might not be
true for some; the 4096-byte difference you see (36864 - 32768) is exactly
one extra 4K block.
I suspect that in your case the file is written to tmpfs, which