How to select N random records using MapReduce?

2011-06-27 Thread Jeff Zhang
Hi all, I'd like to select N random records from a large amount of data using Hadoop, and I wonder how I can achieve this. Currently my idea is to let each mapper task select N / mapper_number records. Does anyone have experience with this? -- Best Regards Jeff Zhang
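One way to make the per-mapper idea exact, sketched here under assumptions rather than as the approach settled on in this thread: give every record an independent random tag, have each mapper keep only the N records with the smallest tags in its split, and have a single reducer keep the N smallest tags overall. The N globally smallest tags form a uniform random subset, so the result does not depend on how evenly the splits are sized, whereas a fixed quota of N / mapper_number per mapper does. The names below (RandomTagSample, Tagged, offer, kept) are hypothetical, and the Hadoop Mapper/Reducer wiring is left out; this is plain Java showing only the selection logic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Keeps the n entries with the smallest random tags (hypothetical helper, not a Hadoop class). */
final class RandomTagSample {
    /** A record paired with the random tag drawn for it. */
    static final class Tagged {
        final double tag;
        final String record;
        Tagged(double tag, String record) { this.tag = tag; this.record = record; }
    }

    private final int n;
    // Max-heap on tag: the entry with the largest tag is evicted first.
    private final PriorityQueue<Tagged> heap;

    RandomTagSample(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0");
        this.n = n;
        this.heap = new PriorityQueue<>(Math.max(1, n),
                Comparator.comparingDouble((Tagged t) -> t.tag).reversed());
    }

    /** Offers one entry; memory use stays bounded by n entries. */
    void offer(Tagged t) {
        heap.add(t);
        if (heap.size() > n) heap.poll();
    }

    /** Returns the entries currently kept, in no particular order. */
    List<Tagged> kept() { return new ArrayList<>(heap); }

    public static void main(String[] args) {
        Random rng = new Random();
        int n = 3;
        String[][] splits = {{"a", "b"}, {"c", "d", "e", "f", "g"}};

        // "Map" phase: each split tags its records and keeps its n smallest tags.
        List<Tagged> mapOutputs = new ArrayList<>();
        for (String[] split : splits) {
            RandomTagSample mapper = new RandomTagSample(n);
            for (String record : split) mapper.offer(new Tagged(rng.nextDouble(), record));
            mapOutputs.addAll(mapper.kept());
        }

        // "Reduce" phase: a single reducer keeps the n smallest tags overall.
        RandomTagSample reducer = new RandomTagSample(n);
        for (Tagged t : mapOutputs) reducer.offer(t);
        for (Tagged t : reducer.kept()) System.out.println(t.record);
    }
}
```

In a real job the same object would be filled in map() and emitted once at the end of the task, so each mapper sends at most N records to the reducer.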

Re: What is the property for setting the number of tolerated failed tasks in one job?

2011-05-11 Thread Jeff Zhang
lure. > Amar > > > > On 5/10/11 2:02 PM, "Jeff Zhang" wrote: > > > Hi all, > > I remember there's a property for setting the number of failed tasks > that can be tolerated in one job. Does anyone know what the property name is? > > -- Best Regards Jeff Zhang

What is the property for setting the number of tolerated failed tasks in one job?

2011-05-10 Thread Jeff Zhang
Hi all, I remember there's a property for setting the number of failed tasks that can be tolerated in one job. Does anyone know what the property name is? -- Best Regards Jeff Zhang

Re: Is it possible to get the number of mapper tasks?

2010-12-03 Thread Jeff Zhang
> In my mapper code I need to know the total number of mappers, which is the > same as the number of input splits. > (I need it for unique int ID generation.) > > > Basically I'm looking for an analog of context.getNumReduceTasks() but can't > find it. > > > Thanks > > > >> > -- Best Regards Jeff Zhang
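The quoted message does not say how the IDs are built from the mapper count. One common scheme that matches the stated need, shown here purely as an assumption, is interleaving: mapper i out of M issues i, i + M, i + 2M, and so on, so no two mappers can ever produce the same value. The class name InterleavedIds is hypothetical, and how a task learns its own index and the total M is left outside the sketch. It uses long rather than int so that large jobs do not overflow.

```java
/** Issues IDs i, i + M, i + 2M, ... for mapper i of M (hypothetical helper). */
final class InterleavedIds {
    private final int numMappers;
    private long next;

    InterleavedIds(int mapperIndex, int numMappers) {
        if (numMappers <= 0 || mapperIndex < 0 || mapperIndex >= numMappers) {
            throw new IllegalArgumentException("need 0 <= mapperIndex < numMappers");
        }
        this.numMappers = numMappers;
        this.next = mapperIndex;   // the first ID is the mapper's own index
    }

    /** Returns the next ID; values from different mapper indexes never collide. */
    long nextId() {
        long id = next;
        next += numMappers;        // step over the slots owned by the other mappers
        return id;
    }

    public static void main(String[] args) {
        InterleavedIds mapper1 = new InterleavedIds(1, 3);
        // prints "1 4 7"
        System.out.println(mapper1.nextId() + " " + mapper1.nextId() + " " + mapper1.nextId());
    }
}
```

The scheme only works if every task sees the same M, which is why the total number of mappers matters to the question above.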

Re: Starting a Hadoop job programmatically

2010-11-25 Thread Jeff Zhang
erver A), but on Server B, I can't telnet to Server A. (The Hadoop server > is running on Server A.) > If I use netstat -a to check the port, I can't find port 9001. > I have no idea why I can't run the job on the other server. If anyone can > give me some suggestions, it would be much appreciated. > Thanks > Best Regards > -- > -李平 > -- > -李平 > -- Best Regards Jeff Zhang

Re: Yahoo Open Source Real-Time MapReduce

2010-11-09 Thread Jeff Zhang
en, Yes, "stream process" should be more accurate than "real-time". On Tue, Nov 9, 2010 at 6:36 PM, Bibek Paudel wrote: > On Tue, Nov 9, 2010 at 10:49 AM, Jeff Zhang wrote: >> Not sure whether this has been posted to this mailing list. But I strongly >> feel to

Yahoo Open Source Real-Time MapReduce

2010-11-09 Thread Jeff Zhang
Not sure whether this has been posted to this mailing list, but I feel strongly that everyone here should know about "Yahoo Open Source Real-Time MapReduce". See http://s4.io/ for more details. Thanks again to Yahoo for its contribution to the open-source world. -- Best Regards Jeff Zhang

Re: Job without Output files

2010-11-08 Thread Jeff Zhang
My guess is that HBase keeps versions on cells, so inserting multiple times is OK, but I'm not sure my guess is correct. On Mon, Nov 8, 2010 at 8:32 PM, Harsh J wrote: > Hi Jeff, > > On Mon, Nov 8, 2010 at 3:17 PM, Jeff Zhang wrote: >> Hi Harsh, >> >> you point is

Re: Job without Output files

2010-11-08 Thread Jeff Zhang
you handle speculative execution of > tasks (if it is turned on)? > > -- > Harsh J > www.harshj.com > -- Best Regards Jeff Zhang

Re: Job without Output files

2010-11-07 Thread Jeff Zhang
Thanks > -- > Regards > Shuja-ur-Rehman Baig > > > -- Best Regards Jeff Zhang

Re: help with rewriting hadoop java code for new API: RecordReader getPos()

2010-10-27 Thread Jeff Zhang
ed for the type > RecordReader > > Any pointers or help will be highly appreciated. > > Thanks, > Bibek > > [0] > http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/RecordReader.html#getPos%28%29 > [1] http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api > -- Best Regards Jeff Zhang

Re: How to modify task assignment algorithm?

2010-10-07 Thread Jeff Zhang
10:38 AM, Shen LI wrote: > Hi, Thank you very much for your reply. I want to run my own algorithm for > this part to see if we can achieve a better outcome in a specific scenario. So > how can I modify it? > Thanks a lot! > Shen > > On Thu, Oct 7, 2010 at 6:33 PM, Jeff Zhang wr

Re: How to modify task assignment algorithm?

2010-10-07 Thread Jeff Zhang
scheduler) > Big thanks, > Shen -- Best Regards Jeff Zhang

Re: Hdfs Block Size

2010-10-07 Thread Jeff Zhang
block defragmentation etc. ? > > Thanks, > -Rakesh > -- Best Regards Jeff Zhang

Re: Is Hadoop suitable for web site visitor analysis?

2010-07-08 Thread Jeff Zhang
lternative > approach. > > > Any pointers would be greatly appreciated. > > Thanks, > Tim > > > > > > -- Best Regards Jeff Zhang

Re: name of input file which has the key value pair

2010-05-17 Thread Jeff Zhang
o you believe in fate, Neo? > Neo: No. > Morpheus: Why Not? > Neo: Because I don't like the idea that I'm not in control of my life. > > > > > -- Best Regards Jeff Zhang

Re: Configured & PathFilter

2010-04-14 Thread Jeff Zhang
Kris, try using /test-batchEventLog/metrics<http://hadoop-eventlog01.socialmedia.com/test-batchEventLog/metrics>/* (append an asterisk). On Wed, Apr 14, 2010 at 7:26 AM, Kris Nuttycombe wrote: > On Wed, Apr 14, 2010 at 2:16 AM, Jeff Zhang wrote: > > Hi Kris, > > > > I a

Re: Configured & PathFilter

2010-04-14 Thread Jeff Zhang
stStatus(SequenceFileInputFormat.java:55) > >> at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241) > >>at > org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885) > >>at > org.apach

Re: Configured & PathFilter

2010-04-13 Thread Jeff Zhang
> >> This indicates that reflection will be used to instantiate the > >> required PathFilter object, and I need to be able to access the > >> minimum and maximum date for a given run. I don't want to have to > >> implement a separate PathFilter class for each set o

Re: Configured & PathFilter

2010-04-12 Thread Jeff Zhang
> >> This indicates that reflection will be used to instantiate the > >> required PathFilter object, and I need to be able to access the > >> minimum and maximum date for a given run. I don't want to have to > >> implement a separate PathFilter class for each set o

Re: Configured & PathFilter

2010-04-12 Thread Jeff Zhang
have to > hard-code a separate PathFilter instance for each date range I'm > interested in, obviously. If I make my PathFilter extend Configured, > will it do the right thing? > > Thanks! > > Kris > -- Best Regards Jeff Zhang

Re: job.jar

2010-03-15 Thread Jeff Zhang
Is it possible to create a job.jar file in the bash command line? > > > PS: > I've put some posts in the MR mailing list that weren't answered. These > posts can be viewed by other users? > > > Regards > -- > Pedro > -- Best Regards Jeff Zhang

Re: How can I get system environment variable in core-site.xml

2010-03-11 Thread Jeff Zhang
oint, it's the same directory, and it can only be locked once. > That's why I can't deploy Hadoop. > > Best Regards > welman Lu > -- Best Regards Jeff Zhang

Re: How can I get system environment variable in core-site.xml

2010-03-11 Thread Jeff Zhang
ame contents inside this $HOME > directory. > > I borrowed these three computers from a big cluster, and I only use ssh to > control them remotely. > I am not sure what they did to this cluster, but it is really terrible > for me. > > Regards > welman Lu > -- Best Regards Jeff Zhang

Re: How can I get system environment variable in core-site.xml

2010-03-11 Thread Jeff Zhang
>> Unfortunately, I don't know where I can set the code you mentioned. >> Can you tell me more about that? >> Thanks! >> >> Regards >> welman Lu >> > > -- Best Regards Jeff Zhang

Re: How can I get system environment variable in core-site.xml

2010-03-11 Thread Jeff Zhang
> Regards > welman Lu > -- Best Regards Jeff Zhang

Re: How can I get system environment variable in core-site.xml

2010-03-11 Thread Jeff Zhang
{HOSTNAME}, ${env.hostname}, but neither of them works. > Each just returns the literal string "${HOSTNAME}" or "${env.hostname}". > > So can anybody tell me what I should use to get this environment variable? > Thank you! > > welman Lu > -- Best Regards Jeff Zhang
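The literal strings coming back are consistent with how Hadoop's Configuration class of that era expanded ${name}: against Java system properties and other configuration keys, not against shell environment variables. A workaround often used, stated here as an assumption about the setup rather than something confirmed in this thread, is to hand the value to the JVM as a system property (for example a -Dhostname=... option) and reference ${hostname} in the XML. The sketch below is a simplified stand-in for that lookup rule, not Hadoop's own code; SysPropExpansion and expand are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified stand-in for ${name} expansion against Java system properties. */
final class SysPropExpansion {
    // Matches ${name}, where name contains no '}', '$' or whitespace.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}$\\s]+)\\}");

    /** Replaces each ${name} with System.getProperty(name); unknown names stay as written. */
    static String expand(String value) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String resolved = System.getProperty(m.group(1));
            String replacement = (resolved != null) ? resolved : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stands in for passing -Dhostname=node01 on the JVM command line.
        System.setProperty("hostname", "node01");
        System.out.println(expand("/data/${hostname}/dfs"));   // prints "/data/node01/dfs"
        // An environment variable alone is not consulted, so this is left unchanged
        // unless a system property with the same name happens to be set.
        System.out.println(expand("/data/${HOSTNAME}/dfs"));
    }
}
```

With this rule, each machine can share one configuration file and still resolve a different directory, as long as every daemon is started with its own value for the property.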

Re: Using SequenceFiles in Hadoop for an imaging application.

2010-01-19 Thread Jeff Zhang
{ > > JobClient.runJob(conf); > } catch (Exception e) { > e.printStackTrace(); > } > } > > > > Thanks, > > Regards, > > Suhail Rehman > MS by Research in Computer Science > International Institute of Information Technology - Hyderabad > reh...@research.iiit.ac.in > - > http://research.iiit.ac.in/~rehman <http://research.iiit.ac.in/%7Erehman> > -- Best Regards Jeff Zhang

Re: Question about setting the number of mappers.

2010-01-18 Thread Jeff Zhang
b, submitSplitFile); > } > job.set("mapred.job.split.file", submitSplitFile.toString()); > job.setNumMapTasks(maps); > > // Write job file to JobTracker's fs > FSDataOutputStream out = > FileSystem.create(fs, submitJobFile, > new FsPermission(JOB_FILE_PERMISSION)); > > try { > job.writeXml(out); > } finally { > out.close(); >. > > 737,0-1 39% > } > > > *** > > Is there anything I can do to get the number of mappers to be more > flexible? > > > Cheers, > > Teryl > > -- Best Regards Jeff Zhang

Problems configuring FairScheduler

2009-12-10 Thread Jeff Zhang
uster Although I did all of this, I cannot open the page http:///scheduler. Did I miss something? Thank you for any help. Jeff Zhang

Re: Question regarding wordCount example

2009-10-25 Thread Jeff Zhang
as you can Jeff zhang On Mon, Oct 26, 2009 at 6:35 AM, felix gao wrote: > Hi all, I have a question regarding how to compile a simple Hadoop > program. > > setup > Java 1.6 > Ubuntu 9.02 > Hadoop 0.19.2 > > > //below is the mapper class > imp