On Thu, Apr 23, 2009 at 11:06 PM, Zhou, Yunqing wrote:
Currently I'm managing a 64-node Hadoop 0.19.1 cluster with 100 TB of data.
I've found 0.19.1 to be buggy, and I have already applied some patches from
the Hadoop JIRA to work around problems.
But I'm looking forward to a more stable release of Hadoop.
Do you know when 0.19.2 will be released?
Thanks.
Here I have a job that contains 2000 map tasks, and each map takes about an
hour (the maps cannot be split further because their input is a compressed
archive).
How can I cap this job's number of concurrently running tasks (map and
reduce) to leave resources for other, more urgent jobs?
Thanks.
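The stock FIFO scheduler in 0.19 has no per-job slot cap; one common route is the contrib Fair Scheduler, which lets you place the long job in its own pool so other pools keep receiving slots. A sketch of an allocations file, assuming the Fair Scheduler is installed and enabled (pool names and limits here are illustrative, not from the original thread):

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml sketch: submit the long 2000-map job to the
     "batch" pool; jobs in "urgent" get a guaranteed minimum of slots
     and a higher weight, so they are scheduled ahead of the batch job. -->
<allocations>
  <pool name="urgent">
    <minMaps>20</minMaps>
    <minReduces>10</minReduces>
    <weight>3.0</weight>
  </pool>
  <pool name="batch">
    <maxRunningJobs>1</maxRunningJobs>
    <weight>1.0</weight>
  </pool>
</allocations>
```

Element names vary somewhat across Hadoop versions, so check the scheduler documentation shipped with your release.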
I'm running a job on data of about 5 TB. It currently reports a checksum
error on one block of the file. That causes a map task to fail, which then
fails the whole job.
But losing a single 64 MB block will barely affect the final result.
So can I ignore a few map task failures and let the job continue?
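If a bounded fraction of lost input is acceptable, the old mapred API exposes a failure-tolerance knob for exactly this; a sketch for 0.19 (the same thing can be set programmatically with `JobConf.setMaxMapTaskFailuresPercent`):

```xml
<!-- Allow up to 1% of map tasks to fail without failing the whole
     job; the map that keeps hitting the corrupt block then counts as
     a tolerated failure instead of killing the job. -->
<property>
  <name>mapred.max.map.failures.percent</name>
  <value>1</value>
</property>
```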
Here is a cluster with 13 machines. Due to a lack of storage space, we set
the replication factor to 1.
But recently we found that 2 machines in the cluster are unstable, so I'd
like to exclude them from the cluster.
But I can't simply raise the replication factor and then remove them,
because of that same lack of space.
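The usual route here is HDFS decommissioning rather than just stopping the datanodes, since with replication 1 each block on those machines is the only copy. A sketch (paths and the exclude-file location are examples, not from the thread):

```xml
<!-- hadoop-site.xml on the namenode: point it at an exclude file. -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/excludes</value>
</property>
<!-- Put the two unstable hostnames in /etc/hadoop/conf/excludes, one
     per line, then run:  hadoop dfsadmin -refreshNodes
     During decommissioning the namenode re-replicates each node's
     blocks onto the remaining machines before marking it decommissioned,
     so the single replicas are copied off rather than lost. -->
```

This only works if the other 11 machines have enough free space to absorb the blocks from the two nodes being removed.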
> …serialization and deserialization yourself.
>
> -Bryan
>
> On Nov 1, 2008, at 8:01 PM, Alex Loddengaard wrote:
>
> Take a look at Thrift:
>> <http://developers.facebook.com/thrift/>
>>
>> Alex
>>
>> On Sat, Nov 1, 2008 at 7:15 PM, Zhou, Yunqing <[EMAIL PROTECTED]> wrote:
> Consider an embedded database? Berkeley DB is written in C++ and has
> interfaces for many languages.
> On 2008-11-02 10:15:22, "Zhou, Yunqing" <[EMAIL PROTECTED]> wrote:
The project I'm focused on has many modules written in different languages
(several modules are Hadoop jobs).
So I'd like to use a common record-based data file format for data
exchange.
XML is not efficient for appending new records, and SequenceFile does not
seem to have APIs for languages other than Java.
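As the replies suggest, Thrift fits this case: records are defined once in an IDL, code generation produces serializers for many languages, and the compact binary encoding is append-friendly. A hypothetical record definition (names are illustrative, not from the original project):

```thrift
namespace java example.records

// One record per exchanged entry; the Thrift compiler generates
// matching Java, C++, Python, etc. classes from this single definition.
struct ExchangeRecord {
  1: required i64 id,
  2: required string payload,
  3: optional map<string, string> attrs,
}
```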
Recently, tasks on our cluster have been failing at random (both map tasks
and reduce tasks). When rerun, they all succeed.
The whole job is IO-bound (250 GB input, 500 GB map output, and 10 GB
final output).
From the jobtracker, the failed job reports:
task_200810220830_0004_m_000653_0
tip_2008
> Determines where on the local filesystem a DFS data node should store its
> blocks. If this is a comma-delimited list of directories, then data will
> be stored in all named directories, typically on different devices.
>
> Miles
>
> 2008/10/7 Zhou, Yunqing <[EMAIL PROTECTED]>:
Here I have an existing Hadoop 0.17.1 cluster. Now I'd like to add a second
disk to every machine.
Can I start multiple datanodes on one machine? Or do I have to set up
software RAID on each machine? (There is no RAID support on the
mainboards.)
Thanks
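Per the quoted reply, no extra datanode or RAID is needed: `dfs.data.dir` accepts a comma-delimited list, and the datanode spreads blocks across all listed directories. A sketch (the paths are examples):

```xml
<!-- hadoop-site.xml on each machine: one data directory per disk.
     The datanode round-robins new blocks across the listed paths. -->
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/dfs/data,/disk2/dfs/data</value>
</property>
```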
> Then the reducers will collect all values with the same UID, so here is
> what we get:
>
> 1. -> Reducer -> <{}, null>
> 2. -> Reducer -> <{a,b}, null>
> 3. -> Reducer -> <{c,d,e}, null>
> 4. -> Reducer -> <{f}, null>
> Hello,
>
> Does MapReduceBase.close() fit your needs? Take a look at
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapReduceBase.html#close()
>
> On Fri, October 3, 2008 11:36 pm, Zhou, Yunqing said:
the input is as follows.
flag
a
b
flag
c
d
e
flag
f
Then I used a mapper that first buffers the values and emits them all when
it meets a line containing "flag".
But when the file reaches its end, I get no chance to emit the last record
(in this case, f).
So how can I detect the end of the mapper's life?
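The close() suggestion above can be sketched in plain Java (no Hadoop; the class and method names are illustrative, mirroring Mapper.map() and MapReduceBase.close()): buffer lines until the next "flag", and flush the final buffer from close(), which the framework calls once after the last input record.

```java
import java.util.ArrayList;
import java.util.List;

public class FlagMapperSketch {
    private final List<String> buffer = new ArrayList<>();
    final List<List<String>> output = new ArrayList<>();

    // Called once per input line, like Mapper.map().
    void map(String line) {
        if (line.equals("flag")) {
            flush();          // "flag" starts a new record: emit the old one
        } else {
            buffer.add(line);
        }
    }

    // Called once after the last line, like MapReduceBase.close();
    // without this flush, the final record ("f") would be lost.
    void close() {
        flush();
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            output.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        FlagMapperSketch m = new FlagMapperSketch();
        for (String line : new String[]{"flag", "a", "b", "flag",
                                        "c", "d", "e", "flag", "f"}) {
            m.map(line);
        }
        m.close();
        System.out.println(m.output);  // [[a, b], [c, d, e], [f]]
    }
}
```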
I've tried it and it works.
Thank you very much
On Mon, Jul 21, 2008 at 6:33 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> then just do what I said -- set the number of reducers to zero. This
> should just run the mapper phase.
>
> 2008/7/21 Zhou, Yunqing <[EMAIL PROTECTED]>:
conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
>
> Cheers,
> Christian
>
>
> Zhou, Yunqing wrote:
I only use it to do something in parallel, but the reduce step would cost
me several additional days. Is it possible to make Hadoop skip the reduce
step?
Thanks
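The zero-reducer suggestion from the replies looks like this in the old mapred API (a driver fragment, as a sketch; `MyJob` is a placeholder for your driver class):

```java
JobConf conf = new JobConf(MyJob.class);
conf.setNumReduceTasks(0);  // map-only: no sort/shuffle/reduce phase runs
// With zero reducers, map output records are written directly to the
// job's output directory instead of being fed to a reduce step.
JobClient.runJob(conf);
```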
Thirded. I'm running my machine-learning experiments on a Hadoop cluster
and am eager to learn more about it. :-)
2008/5/7, Leon Mergen <[EMAIL PROTECTED]>:
>
> On Tue, May 6, 2008 at 6:59 PM, Cole Flournoy <[EMAIL PROTECTED]>
> wrote:
>
> > Is there any way we could set up some off-site web cam c