Map Result Caching

2011-04-16 Thread DoomUs
I'd like to find out whether caching Map outputs is worth the effort. The idea is that for certain jobs, many of the Map tasks will do the same thing they did the last time they were run — for instance, a monthly report over vast data, but with very little changing data. So every month the job is run, and some % of th

Re: Estimating Time required to compute M/Rjob

2011-04-16 Thread Ted Dunning
Sounds like this paper might help you: "Predicting Multiple Performance Metrics for Queries: Better Decisions Enabled by Machine Learning" by Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Janet Wiener, Armando Fox, Michael Jordan, and David Patterson. http://radlab.cs.berkeley.edu/publication/187

Re: Estimating Time required to compute M/Rjob

2011-04-16 Thread Stephen Boesch
Some additional thoughts about the 'variables' involved in characterizing the M/R application itself: the configuration of the cluster for numbers of mappers vs. reducers, compared to the characteristics (amount of work/processing) required in each of the map/shuffle/reduce stages

Re: Estimating Time required to compute M/Rjob

2011-04-16 Thread Stephen Boesch
You could consider two scenarios / sets of requirements for your estimator: 1. Allow it to 'learn' from certain input data and then project running times of similar (or moderately dissimilar) workloads. So the first steps could be to define a couple of relatively small "control" M/R jo
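The 'learn from control jobs' approach can be sketched with an ordinary least-squares fit of runtime against input size, then extrapolating to larger workloads. The measurements below are invented placeholders; a real estimator would need many more features (cluster size, mapper/reducer counts, shuffle volume) than input size alone.

```python
def fit_linear(xs, ys):
    # Ordinary least squares for y = a*x + b with a single predictor.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Hypothetical control-job measurements: (input size in GB, runtime in seconds).
sizes = [1, 2, 4, 8]
times = [70, 110, 190, 350]

a, b = fit_linear(sizes, times)

def predict_seconds(gb):
    # Project the runtime of a larger job from the fitted line.
    return a * gb + b
```

With these sample points the fit is exact (40 s/GB plus a 30 s fixed overhead), which is the intercept-vs-slope decomposition you would want the control jobs to expose.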

Re: io.sort.mb based on HDFS block size

2011-04-16 Thread 顾荣
Hi Shrinivas, sorry for the late reply. Yes, I understand what you mean; I also don't mean that io.sort.mb is equal to the block size. The point is that the data in the buffer are spilled to local disk several times, each spill writing only a little. Before being written out, the spilled data will be comb
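The spill arithmetic behind this discussion can be made concrete. In stock Hadoop the map-side sort buffer (io.sort.mb, default 100 MB) spills to the tasktracker's local disk each time it fills to io.sort.spill.percent (default 0.80) of its capacity, and the spill files are merged afterwards. A rough back-of-the-envelope estimate, as a sketch only:

```python
import math

def estimated_spills(map_output_mb, io_sort_mb=100, spill_percent=0.80):
    # Each spill is triggered when the sort buffer reaches
    # io.sort.spill.percent of io.sort.mb; at least one spill
    # (the final flush) always happens.
    threshold_mb = io_sort_mb * spill_percent
    return max(1, math.ceil(map_output_mb / threshold_mb))
```

For example, a mapper emitting 400 MB with the defaults would spill about five times, while one emitting 50 MB would spill once — which is why sizing io.sort.mb against per-mapper output (not the HDFS block size directly) matters.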

Re: Hadoop 0.21.0 and eclipse europa compatibility issue

2011-04-16 Thread 顾荣
Hi, You can try hadoop-0.20.2 with Eclipse 3.3; that combination is recommended. Regards, Walker Gu. 2011/4/15 Vandana Kanchanapalli > Hi, > I am facing this problem with the Eclipse Europa version. I am using > hadoop-0.21.0. After adding the hadoop-eclipse plugin to the IDE, I started > to use the map/reduce

Re: I got errors from hdfs about DataStreamer Exceptions.

2011-04-16 Thread 茅旭峰
I double-checked the cluster; all of the disks have plenty of free space, yet I could not put any data into the cluster. The cluster summary shows: Cluster Summary: 1987209 files and directories, 1993327 blocks = 3980536 total. Heap Size is 4.2 GB / 5.56 GB (75%). Configured Capacity: 120.88 TB. DF

Re: Estimating Time required to compute M/Rjob

2011-04-16 Thread Sonal Goyal
What is your MR job doing? What is the amount of data it is processing? What kind of a cluster do you have? Would you be able to share some details about what you are trying to do? If you are looking for metrics, you could look at the Terasort run .. Thanks and Regards, Sonal

Re: Question on hadoop installation and setup - Pseudo-distributed mode

2011-04-16 Thread Sonal Goyal
I see a space in fs. default.name after fs, and in hdfs: //. Is that intentional or a typo? Thanks and Regards, Sonal. Hadoop ETL and Data Integration, Nube Technologies

Question on hadoop installation and setup - Pseudo-distributed mode

2011-04-16 Thread Rajesh Balwani
I need help setting up Hadoop on my local machine. I am able to run Hadoop in standalone mode without any problem. Next I tried to run it in pseudo-distributed mode but ran into the following exception: *Configuration changed for Pseudo-distributed mode* fs. default.name
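For reference, the standard pseudo-distributed setting for this property in Hadoop 0.20.x goes in conf/core-site.xml; note there are no spaces anywhere in the property name or the URI. The port 9000 follows the Hadoop single-node setup guide; any free port works.

```xml
<!-- conf/core-site.xml: property name must be exactly fs.default.name -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

After changing this, the NameNode must be formatted once (bin/hadoop namenode -format) before bin/start-dfs.sh.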

Estimating Time required to compute M/Rjob

2011-04-16 Thread real great..
Hi, As part of my final-year BE project I want to estimate the time required by an M/R job, given an application and a base file system. Can you folks please help me by posting some thoughts on this issue, or some links here? -- Regards, R.V.