thout patching the code within the
> aggregator package?
>
> It sure doesn't look like it, but just to make sure.
>
> Thanks again,
> -Dan M
>
>
> On Apr 24, 2009, at 12:56 PM, Runping Qi wrote:
>
> A couple of general goals behind the aggregate package:
A couple of general goals behind the aggregate package:
1. If you are an application developer using the aggregate package, you only
need to develop your own (user-defined) value aggregator descriptor classes,
which are typically subclasses of ValueAggregatorDescriptor. You can use
the existing aggregator types
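For illustration, here is a minimal sketch of such a user-defined descriptor
(the class name and the word-count use case are hypothetical; API names are as
I recall them from the 0.18-era aggregate package). It requests one
LONG_VALUE_SUM aggregator per word, so the framework's stock combiner and
reducer do all the counting:

import java.util.ArrayList;
import java.util.Map.Entry;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorBaseDescriptor;

// Hypothetical descriptor: emit one LONG_VALUE_SUM request per word so the
// generic aggregate combiner/reducer produce word counts.
public class WordCountDescriptor extends ValueAggregatorBaseDescriptor {
  public ArrayList<Entry<Text, Text>> generateKeyValPairs(Object key, Object val) {
    ArrayList<Entry<Text, Text>> pairs = new ArrayList<Entry<Text, Text>>();
    for (String word : val.toString().split("\\s+")) {
      // generateEntry(type, id, value) builds the "type:id" aggregation key
      pairs.add(generateEntry(LONG_VALUE_SUM, word, ONE));
    }
    return pairs;
  }
}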
You need to implement your own OutputFormat.
See the MultipleOutputFormat class for examples.
Runping
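A hedged sketch of that approach with the old mapred API (the class name and
the "data" prefix are hypothetical): MultipleOutputFormat exposes a hook that
receives the default leaf name, e.g. "part-00000", and returns the name to
actually use.

import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Hypothetical output format: swap the default "part" prefix for "data".
public class PrefixedTextOutputFormat<K, V>
    extends MultipleTextOutputFormat<K, V> {
  @Override
  protected String generateFileNameForKeyValue(K key, V value, String name) {
    // "name" arrives as the default file name, e.g. "part-00000"
    return name.replaceFirst("^part", "data");
  }
}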
On Wed, Mar 18, 2009 at 9:11 PM, Rodrigo Schmidt wrote:
>
> In a Hadoop job, how do I set the prefix of the output files to something
> different than "part-" ?
>
> I mean, what should I do if I w
out at the individual map task level. What would be the best
> way
> for me to determine that?
>
> -Sean
>
> On Wed, Mar 4, 2009 at 12:13 PM, Runping Qi wrote:
>
> > Do you know the breakdown of the time a mapper task takes to initialize
> > and to execute the map
Do you know the breakdown of the time a mapper task takes to initialize
and to execute the map function?
On Wed, Mar 4, 2009 at 8:44 AM, Sean Laurent wrote:
> On Tue, Mar 3, 2009 at 10:14 PM, Amar Kamat wrote:
>
> > Yeah. Maybe it's not a problem with the JobTracker. Can you check (via
> >
Were TaskTrackers blacklisted?
On Tue, Mar 3, 2009 at 3:25 PM, Nathan Marz wrote:
> I'm seeing some really bizarre behavior from Hadoop 0.19.1. I have a fairly
> large job with about 29000 map tasks and 72 reducers. There are 304 map task
> slots in the cluster. When the job starts, it runs 3
)
> 5) Run #4 281.96 (secs)
>
> I don't think that's the problem here... :(
>
> -S
>
> On Tue, Mar 3, 2009 at 2:33 PM, Runping Qi wrote:
>
> > The JobTracker's memory increased as you ran more and more jobs because
> th
e greatly appreciated.
>
> -Sean
>
> On Mon, Mar 2, 2009 at 7:50 PM, Runping Qi wrote:
>
> > Your problem may be related to
> > https://issues.apache.org/jira/browse/HADOOP-4766
> >
> > Runping
> >
> >
> > On Mon,
Your job tracker out-of-memory problem may be related to
https://issues.apache.org/jira/browse/HADOOP-4766
Runping
On Mon, Mar 2, 2009 at 4:29 PM, bzheng wrote:
>
> Thanks for all the info. Upon further investigation, we are dealing with
> two
> separate issues:
>
> 1. problem processing a l
Your problem may be related to
https://issues.apache.org/jira/browse/HADOOP-4766
Runping
On Mon, Mar 2, 2009 at 4:46 PM, Sean Laurent wrote:
> Hi all,
> I'm conducting some initial tests with Hadoop to better understand how well
> it will handle and scale with some of our specific problems. As
Yes, all the machines in the tests are new, with the same spec.
The 30% to 50% throughput variations were observed across the disks
of the same machines.
Runping
On 1/15/09 2:41 AM, "Steve Loughran" wrote:
> Runping Qi wrote:
>> Hi,
>>
>> We at Yah
Hi,
We at Yahoo did some Hadoop benchmarking experiments on clusters with JBOD
and RAID0. We found that under heavy loads (such as gridmix), the JBOD
cluster performed better.
Gridmix tests:
Load: gridmix2
Cluster size: 190 nodes
Test results:
RAID0: 75 minutes
JBOD: 67 minutes
Difference: 10%
T
If you have turned on IPv6 on your Hadoop cluster, it may cause a severe
performance hit!
When I ran the gridmix2 benchmark on a newly constructed cluster, it took
30% more time than the baseline time that was obtained on a similar cluster.
I noticed that some task processes on some machines
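The snippet cuts off before the remedy; a common mitigation (an assumption
here, not something stated in this message) is to pin the JVM to IPv4 in
conf/hadoop-env.sh:

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"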
Each mapper works on only one file split, which is either from file1 or
file2 in your case. So the value for map.input.file gives you the exact
information you need.
Runping
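A minimal sketch of reading that property with the old mapred API (the class
name and the "file1" test are hypothetical):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// configure() reads map.input.file to learn which input file this mapper's
// split came from; map() can then branch on it.
public class SourceAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  private boolean fromFile1;

  public void configure(JobConf job) {
    String path = job.get("map.input.file");            // path of the split's file
    fromFile1 = path != null && path.contains("file1"); // hypothetical test
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    out.collect(new Text(fromFile1 ? "file1" : "file2"), value);
  }
}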
On 10/23/08 11:09 AM, "Steve Gao" <[EMAIL PROTECTED]> wrote:
> Thanks, Amogh. But my case is slightly different. The
All this is because you were using streaming.
Streaming treats each line in the stream as one "record" and then breaks it
into a key/value pair (using '\t' as the separator by default).
If you write your mapper class in Java, the values passed to the calls to
your map function should be the whole text line.
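A hedged illustration of the framing rule just described (the helper class is
hypothetical, not part of streaming itself): each line is one record, split at
the first tab, and a line with no tab becomes a key with an empty value.

// Mimics streaming's default record framing, for illustration only.
public class StreamingFraming {
  public static String[] splitRecord(String line) {
    int tab = line.indexOf('\t');
    if (tab < 0) {
      return new String[] { line, "" }; // no separator: whole line is the key
    }
    return new String[] { line.substring(0, tab), line.substring(tab + 1) };
  }
}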
Your record reader must be able to find the beginning of the next record
beyond the start position of a given split, which means your file format
must make record boundaries detectable in the byte stream. It seems to me
that is not possible based on the
info I s
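For reference, a hedged sketch of the usual "sync" trick for line-oriented
formats, which is exactly what becomes impossible when record boundaries are
not detectable in the byte stream:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// A reader whose split does not start at byte 0 discards everything up to
// the next newline; the previous split's reader finishes that record by
// reading past its own end.
public class SplitSync {
  public static long syncToNextRecord(FSDataInputStream in, long splitStart)
      throws IOException {
    if (splitStart == 0) {
      return 0;                 // the first split owns the first record
    }
    in.seek(splitStart);
    int b;
    while ((b = in.read()) != -1 && b != '\n') {
      // skip bytes of the record owned by the previous split
    }
    return in.getPos();         // this split's first record starts here
  }
}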
Looks like the reducer is stuck in the shuffle phase.
What progress percentage do you see for the reducer in the web GUI?
It is known that 0.17 does not handle shuffling well.
Runping
> -----Original Message-----
> From: Andreas Kostyrka [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 30, 20
Right.
Please open a Jira for that.
Runping
> -----Original Message-----
> From: Goel, Ankur [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 27, 2008 6:33 AM
> To: core-user@hadoop.apache.org
> Subject: RE: Using value aggregator framework with
> MultipleTextOutputFormat
>
> I guess I made a m
This is a known problem for 0.17.0:
https://issues.apache.org/jira/browse/HADOOP-3442
It should be fixed in 0.17.1
Runping
> -----Original Message-----
> From: Colin Freas [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 09, 2008 12:56 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Stack Ov
You can run another map-only job to read the deflated files and
write them out in the format you want.
Runping
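A hedged sketch of such a pass with the old mapred API (the paths are
hypothetical): a map-only job that reads the deflated text and rewrites it
uncompressed, dropping the byte-offset keys that TextInputFormat supplies.

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class ConvertJob {
  // Keeps each line and drops its byte-offset key.
  public static class CatMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, NullWritable> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, NullWritable> out, Reporter reporter)
        throws IOException {
      out.collect(line, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(ConvertJob.class);
    job.setJobName("deflate-to-text");
    FileInputFormat.setInputPaths(job, new Path("/data/deflated")); // hypothetical
    FileOutputFormat.setOutputPath(job, new Path("/data/plain"));   // hypothetical
    job.setMapperClass(CatMapper.class);
    job.setNumReduceTasks(0);                       // map-only: no shuffle/reduce
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileOutputFormat.setCompressOutput(job, false); // write plain text
    JobClient.runJob(job);
  }
}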
> -----Original Message-----
> From: Jim R. Wilson [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 04, 2008 4:13 PM
> To: core-user@hadoop.apache.org
> Subject: [core-user] H
Chris,
Your version will use LongWritable as the map output key type, which
completely changes the nature of the job. You should use
${hadoop} jar hadoop-0.17-examples.jar sort -m \
  -r 88 \
  -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
  -outFormat org.apache.hadoop.mapred.
From: Flavio Junqueira [mailto:[EMAIL PROTECTED]
Sent: Saturday, May 31, 2008 2:27 AM
To: [EMAIL PROTECTED]
Subject: bug on jute?
Hi, I found a small bug on jute, and I was wondering how to proceed with
fixing it. The problem is the following. If I decla
My experience is to call Thread.sleep(100) after every N
(say 1000) DFS writes.
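A minimal sketch of that workaround (the wrapper class is hypothetical):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

// Pauses briefly after every N writes so the DFS client can drain its pipeline.
public class ThrottledWriter {
  private static final int N = 1000;   // "say 1000" from above
  private final FSDataOutputStream out;
  private long writes = 0;

  public ThrottledWriter(FSDataOutputStream out) {
    this.out = out;
  }

  public void write(byte[] record) throws IOException, InterruptedException {
    out.write(record);
    if (++writes % N == 0) {
      Thread.sleep(100);               // back off, per the suggestion
    }
  }
}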
> -----Original Message-----
> From: Xavier Stevens [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 14, 2008 10:47 AM
> To: core-user@hadoop.apache.org
> Subject: FileSystem.create
>
> I'm having some probl
Your diagnosis sounds reasonable.
Since the mappers of your optimized solution output 3 key/value pairs
for each input key/value pair, the map output size may be three times
the input size for each mapper. That size may exceed the value of
io.sort.mb in your configuration. If so, the mappers h
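If that is the cause, one knob to try is the map-side sort buffer; a hedged
sketch (the 300 MB figure is purely illustrative, not a recommendation):

import org.apache.hadoop.mapred.JobConf;

public class SortBufferConfig {
  public static JobConf withBiggerSortBuffer(JobConf job) {
    // old-style key; the default in this era was 100 (MB)
    job.setInt("io.sort.mb", 300);
    return job;
  }
}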
Sounds like you also hit this problem:
https://issues.apache.org/jira/browse/HADOOP-2669
Runping
> -----Original Message-----
> From: Luca [mailto:[EMAIL PROTECTED]
> Sent: Friday, April 18, 2008 1:21 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Lease expired on open file
>
> dhruba Bor
Here is a related jira:
https://issues.apache.org/jira/browse/HADOOP-3126
> -----Original Message-----
> From: Devaraj Das [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, April 16, 2008 3:56 AM
> To: core-user@hadoop.apache.org
> Subject: RE: Counters giving double values
>
> Also, in those cases
Observing a few emails on this list, I think the following email
exchange between me and John may be of interest to a broader audience.
Runping
From: Runping Qi
Sent: Sunday, April 13, 2008 8:58 AM
To: 'JJ'
Subject: RE: streaming + bi
Actually, there is an old jira about the same issue:
https://issues.apache.org/jira/browse/HADOOP-1722
Runping
> -----Original Message-----
> From: John Menzer [mailto:[EMAIL PROTECTED]
> Sent: Saturday, April 12, 2008 2:45 PM
> To: core-user@hadoop.apache.org
> Subject: RE: streaming + binary
Looks like you used your reducer class as the combiner.
The combiner will be called from mappers, potentially multiple times.
If you want to create side files in the reducer, you cannot use that
class as the combiner.
Runping
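A hedged sketch of the safe wiring (LongSumReducer stands in as an example of
a side-effect-free combiner; the point is only that the side-file logic must
live solely in the class passed to setReducerClass):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.lib.LongSumReducer;

public class CombinerSetup {
  public static void configure(JobConf job,
                               Class<? extends Reducer> sideFileReducer) {
    job.setCombinerClass(LongSumReducer.class); // may run 0..n times per mapper
    job.setReducerClass(sideFileReducer);       // side files created only here
  }
}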
> -----Original Message-----
> From: Zhang, jian [mailto:[EMAIL PROT
This is a known issue:
https://issues.apache.org/jira/browse/HADOOP-3033
Your best bet for now is to use the 0.16.2 release.
Runping
> -----Original Message-----
> From: Iván de Prado [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 28, 2008 6:08 AM
> To: core-user@hadoop.apache.org
> Subject: DFS get b
If you want to output data to different files based on the date or other
parts of the value, you may want to check
https://issues.apache.org/jira/browse/HADOOP-2906
Runping
> -----Original Message-----
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 20, 2008 4:00 PM
> To: core-us
There is a package for joining data from multiple sources:
contrib/data-join.
It implements the basic joining logic and allows the user to provide
application-specific logic for filtering/projecting and combining
multiple records into one.
Runping
> -----Original Message-----
> From: Ted Dun