Re: When will hadoop 0.19.2 be released?

2009-04-23 Thread Zhou, Yunqing
On Thu, Apr 23, 2009 at 11:06 PM, Zhou, Yunqing wrote: > >> Currently I'm managing a 64-node Hadoop 0.19.1 cluster with 100TB of data. >> I have found 0.19.1 buggy and have already applied some patches from the >> Hadoop JIRA to solve problems. >> But I'm looking f

When will hadoop 0.19.2 be released?

2009-04-23 Thread Zhou, Yunqing
Currently I'm managing a 64-node Hadoop 0.19.1 cluster with 100TB of data. I have found 0.19.1 buggy and have already applied some patches from the Hadoop JIRA to solve problems. But I'm looking forward to a more stable release of Hadoop. Do you know when 0.19.2 will be released? Thanks.

How to limit concurrent task numbers of a job.

2009-03-12 Thread Zhou, Yunqing
I have a job that contains 2000 map tasks, and each map needs an hour or so (a map cannot be split because its input is a compressed archive). How can I set this job's maximum number of concurrent tasks (map and reduce) to leave resources for other urgent jobs? Thanks.

Can I ignore some errors in map step?

2008-12-03 Thread Zhou, Yunqing
I'm running a job on 5TB of data. It currently reports a block with a checksum error in the file, which causes a map task failure, and then the whole job fails. But the lack of one 64MB block will barely affect the final result. So can I ignore some map task failure and continue with

How to exclude machines from a cluster

2008-11-13 Thread Zhou, Yunqing
Here is a cluster with 13 machines. Due to the lack of storage space, we set the replication factor to 1. But recently we found that 2 machines in the cluster are not stable, so I'd like to exclude them from the cluster. But I can't simply set the replication factor to 1 and remove them due to the l

Re: Can anyone recommend me a inter-language data file format?

2008-11-02 Thread Zhou, Yunqing
n and deserialization yourself. > > -Bryan > > > On Nov 1, 2008, at 8:01 PM, Alex Loddengaard wrote: > > Take a look at Thrift: >> <http://developers.facebook.com/thrift/> >> >> Alex >> >> On Sat, Nov 1, 2008 at 7:15 PM, Zhou, Yunqing <[EMA

Re: Can anyone recommend me a inter-language data file format?

2008-11-01 Thread Zhou, Yunqing
> Alex > > On Sat, Nov 1, 2008 at 7:15 PM, Zhou, Yunqing <[EMAIL PROTECTED]> wrote: > > > The project I focused on has many modules written in different languages > > (several modules are hadoop jobs). > > So I'd like to utilize a common record based data f

Re: Can anyone recommend me a inter-language data file format?

2008-11-01 Thread Zhou, Yunqing
TED]> wrote: > Consider an embedded database? Berkeley DB, written in C++, has interfaces > for many languages. > > > > > > On 2008-11-02 10:15:22, "Zhou, Yunqing" <[EMAIL PROTECTED]> wrote: > >The project I focused on has many modules written in different l

Can anyone recommend me a inter-language data file format?

2008-11-01 Thread Zhou, Yunqing
The project I am working on has many modules written in different languages (several modules are Hadoop jobs), so I'd like to use a common record-based data file format for data exchange. XML is not efficient for appending new records. SequenceFile does not seem to have an API for languages other than Ja
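
To make the "record-based, cheap to append" requirement concrete, here is a minimal plain-Java sketch of a hand-rolled length-prefixed record file. It is only an illustration under stated assumptions: nobody in this thread proposed this layout (the replies point to Thrift and Berkeley DB), and the class name `LengthPrefixedRecords` and its methods are hypothetical. Each record is written as a 4-byte big-endian length followed by UTF-8 bytes, which a reader in another language could parse with a few lines of code.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop or any library named in the thread.
class LengthPrefixedRecords {

    // Appends one record: a 4-byte big-endian length, then the UTF-8 payload.
    static void append(String path, String record) throws IOException {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        // The second constructor argument 'true' opens the file in append mode.
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(path, true))) {
            out.writeInt(payload.length);
            out.write(payload);
        }
    }

    // Reads every record back, stopping cleanly at end of file.
    static List<String> readAll(String path) throws IOException {
        List<String> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfFile) {
                    break; // no more length prefixes to read
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
                records.add(new String(payload, StandardCharsets.UTF_8));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("records", ".bin");
        file.deleteOnExit();
        append(file.getPath(), "first record");
        append(file.getPath(), "second record");
        System.out.println(readAll(file.getPath()));
    }
}
```

Appending never rewrites existing data, which is the property XML lacks here. A real exchange format would also need a schema for the payload, which is what a tool such as Thrift provides.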

Task Random Fail

2008-10-22 Thread Zhou, Yunqing
Recently the tasks on our cluster have been failing randomly (both map tasks and reduce tasks). When we rerun them, they are all OK. The whole job is an IO-bound job (250G input, 500G map output, and 10G final output). From the jobtracker, I can see the failed job says: task_200810220830_0004_m_000653_0 tip_2008

Re: Can I startup 2 datanodes on 1 machine?

2008-10-07 Thread Zhou, Yunqing
should store its blocks. If this is a > comma-delimited list of directories, then data will be stored in all > named directories, typically on different devices. > > > > Miles > > 2008/10/7 Zhou, Yunqing <[EMAIL PROTECTED]>: > > Here I have an existing hadoo

Can I startup 2 datanodes on 1 machine?

2008-10-07 Thread Zhou, Yunqing
I have an existing Hadoop 0.17.1 cluster. Now I'd like to add a second disk to every machine. Can I start up multiple datanodes on 1 machine? Or do I have to set up each machine with software RAID configured? (There is no RAID support on the mainboards.) Thanks

Re: A question about Mapper

2008-10-04 Thread Zhou, Yunqing
> > Then the reducers will collect all values with the same UID, so here is > what we get: > > 1. -> Reducer -> <{}, null> > 2. -> Reducer -> <{a,b}, null> > 3. -> Reducer -> <{c,d,e}, null> > 4. -> Reducer -> <{f}, null>

Re: A question about Mapper

2008-10-03 Thread Zhou, Yunqing
> Hello, > > Does MapReduceBase.close() fit your needs? Take a look at > http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapReduceBase.html#close() > > On Fri, October 3, 2008 11:36 pm, Zhou, Yunqing said: > > the input is as follows. flag a b flag c d

A question about Mapper

2008-10-03 Thread Zhou, Yunqing
The input is as follows: flag a b flag c d e flag f. I used a mapper that first stores values and then emits them all when it meets a line that contains "flag". But when the file reaches its end, I have no chance to emit the last record (in this case, f). So how can I detect the mapper's end of its life
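
The buffering pattern in this question, combined with the `MapReduceBase.close()` hint from the reply, can be sketched in plain Java. This is a stand-in, not Hadoop code: the class `FlagGroupingMapper` and its methods are hypothetical, and it assumes each token of the sample input (`flag`, `a`, `b`, ...) arrives on its own line. In a real mapper built on the old `org.apache.hadoop.mapred` API, `close()` receives no output collector, so the mapper would likely need to keep a reference to the collector it was handed in `map()`.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for a mapper; no Hadoop classes are used here.
class FlagGroupingMapper {
    private final List<List<String>> emitted = new ArrayList<>();
    private List<String> buffer = null; // null until the first "flag" line is seen

    // Called once per input line, in the role of map().
    void map(String line) {
        String token = line.trim();
        if (token.isEmpty()) {
            return;
        }
        if (token.equals("flag")) {
            flush();                    // emit the previous record, if there is one
            buffer = new ArrayList<>(); // start collecting the next record
        } else if (buffer != null) {
            buffer.add(token);
        }
    }

    // Called once after the last line, in the role of MapReduceBase.close().
    void close() {
        flush(); // emits the final record, which map() alone would never emit
    }

    private void flush() {
        if (buffer != null) {
            emitted.add(buffer);
            buffer = null;
        }
    }

    List<List<String>> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        FlagGroupingMapper mapper = new FlagGroupingMapper();
        String[] lines = {"flag", "a", "b", "flag", "c", "d", "e", "flag", "f"};
        for (String line : lines) {
            mapper.map(line);
        }
        mapper.close();
        System.out.println(mapper.emitted()); // prints [[a, b], [c, d, e], [f]]
    }
}
```

Without the call to `close()`, the output would stop at `[c, d, e]`, which is exactly the missing last record described in the question.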

Re: Can a MapReduce task only consist of a Map step?

2008-07-21 Thread Zhou, Yunqing
I've tried it and it works. Thank you very much. On Mon, Jul 21, 2008 at 6:33 PM, Miles Osborne <[EMAIL PROTECTED]> wrote: > then just do what i said --set the number of reducers to zero. this should > just run the mapper phase > > 2008/7/21 Zhou, Yunqing <[EMAIL PR

Re: Can a MapReduce task only consist of a Map step?

2008-07-21 Thread Zhou, Yunqing
conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class); > > Cheers, > Christian > > > Zhou, Yunqing wrote: > >> I only use it to do something in parallel,but the reduce step will cost me >> additional several days, is it possible to make hadoop do not use a reduce >> step? >> >> Thanks

Can a MapReduce task only consist of a Map step?

2008-07-21 Thread Zhou, Yunqing
I only use it to do something in parallel, but the reduce step will cost me several additional days. Is it possible to make Hadoop skip the reduce step? Thanks

Re: Monthly Hadoop user group meetings

2008-05-06 Thread Zhou, Yunqing
Thirded. I'm doing my machine learning experiment on a Hadoop cluster and am eager to acquire more info on it. :-) 2008/5/7, Leon Mergen <[EMAIL PROTECTED]>: > > On Tue, May 6, 2008 at 6:59 PM, Cole Flournoy <[EMAIL PROTECTED]> > wrote: > > > Is there any way we could set up some off site web cam c