I double-checked the cluster, and all of the disks have plenty of free space,
but I could not put any data into the cluster. The cluster summary shows:
Cluster Summary
1987209 files and directories, 1993327 blocks = 3980536 total.
Heap Size is 4.2 GB / 5.56 GB (75%)
Configured Capacity : 120.88 TB
Hi,
You can try hadoop-0.20.2 with Eclipse 3.3; that combination is recommended.
Regards,
Walker Gu.
2011/4/15 Vandana Kanchanapalli <vandana@gmail.com>
Hi,
I am facing this problem with the Eclipse Europa version. I am using
hadoop-0.21.0. After adding the hadoop-eclipse plugin to the IDE, I started
to
Hi Shrinivas,
Sorry for the late reply.
Yes, I understand what you mean. I also don't mean that io.sort.mb is
equal to the block size. The point is that the data in the buffer are spilled to
HDFS in several passes, and each pass only spills a small amount. Before writing to
HDFS, the spilled data will be
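For reference, the spill behaviour under discussion is governed by io.sort.mb and
io.sort.spill.percent in the 0.20/0.21 line; a minimal sketch of setting them from
Java, with values invented purely for illustration:

    import org.apache.hadoop.conf.Configuration;

    public class SpillTuningSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // In-memory sort buffer for map output, in MB (200 is a made-up example value).
            // It does not have to equal the HDFS block size; it only bounds how much map
            // output is buffered before a spill file is written.
            conf.setInt("io.sort.mb", 200);
            // Begin spilling in the background once the buffer is 80% full (the default).
            conf.setFloat("io.sort.spill.percent", 0.80f);
            // The Configuration would then be passed to the job being tuned.
        }
    }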
You could consider two scenarios / sets of requirements for your estimator
(a toy sketch of the first follows below):
1. Allow it to 'learn' from certain input data and then project the running
times of similar (or moderately dissimilar) workloads. So the first steps
could be to define a couple of relatively small control M/R
Some additional thoughts about the 'variables' involved in
characterizing the M/R application itself:
- the configuration of the cluster for numbers of mappers vs. reducers,
compared to the characteristics (amount of work/processing) required in each
of the map/shuffle/reduce stages
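As a concrete (if oversimplified) illustration of the "learn from small control jobs,
then project" approach mentioned above, here is a toy one-variable least-squares
estimator; the class, the feature (input size only), and the numbers are all hypothetical:

    import java.util.List;

    /** Toy runtime estimator: fit runtime ~ a + b * inputGB from a few control runs.
     *  Real workloads would need more features (mapper/reducer counts, shuffle volume, ...). */
    public class RuntimeEstimator {
        private double a, b;  // intercept and slope of the fitted line

        /** Each observation is {inputSizeGB, observedRuntimeSeconds} from a control job. */
        public void fit(List<double[]> observations) {
            double n = observations.size(), sx = 0, sy = 0, sxx = 0, sxy = 0;
            for (double[] o : observations) {
                sx += o[0]; sy += o[1]; sxx += o[0] * o[0]; sxy += o[0] * o[1];
            }
            b = (n * sxy - sx * sy) / (n * sxx - sx * sx);  // least-squares slope
            a = (sy - b * sx) / n;                          // least-squares intercept
        }

        /** Project the runtime of a similar workload with a different input size. */
        public double estimateSeconds(double inputSizeGB) {
            return a + b * inputSizeGB;
        }

        public static void main(String[] args) {
            RuntimeEstimator est = new RuntimeEstimator();
            // Made-up measurements from three small control runs.
            est.fit(List.of(new double[]{1, 70}, new double[]{5, 260}, new double[]{10, 505}));
            System.out.printf("Projected runtime for 50 GB: %.0f s%n", est.estimateSeconds(50));
        }
    }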
Sounds like this paper might help you:
"Predicting Multiple Performance Metrics for Queries: Better Decisions
Enabled by Machine Learning" by Archana Ganapathi, Harumi Kuno,
Umeshwar Dayal, Janet Wiener, Armando Fox, Michael Jordan, and David
Patterson.
http://radlab.cs.berkeley.edu/publication/187
I'd like to see if caching Map outputs is worth the time. The idea is that
for certain jobs, many of the Map tasks will do the same thing they did the last
time they were run, for instance a monthly report over a vast dataset in which very
little of the data changes. So every month the job is run, and some % of
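A minimal, non-Hadoop sketch of that caching idea, keying a map task's output by a
digest of its input split so an unchanged split can skip the map work on the next run;
the class name and on-disk layout are invented purely for illustration:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.function.Function;

    /** Hypothetical sketch: cache map output under a SHA-256 digest of the input split. */
    public class MapOutputCache {
        private final Path cacheDir;

        public MapOutputCache(Path cacheDir) throws Exception {
            this.cacheDir = Files.createDirectories(cacheDir);
        }

        /** Return cached output if this exact input was mapped before; else compute and store it. */
        public String mapWithCache(String splitContents, Function<String, String> mapFn) throws Exception {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(splitContents.getBytes(StandardCharsets.UTF_8));
            StringBuilder key = new StringBuilder();
            for (byte b : digest) key.append(String.format("%02x", b));
            Path cached = cacheDir.resolve(key.toString());
            if (Files.exists(cached)) {
                return Files.readString(cached);          // unchanged split: reuse last run's output
            }
            String output = mapFn.apply(splitContents);   // new or changed split: run the map function
            Files.writeString(cached, output);
            return output;
        }
    }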