Check the bin/hadoop script and search for the -jvm option in there that is getting passed to the datanode launch command. Removing it should get around this issue. I am not aware of the significance of this flag, though.
On Oct 16, 2011 12:32 PM, Majid Azimi majid.merk...@gmail.com wrote:
I have tested both
This blog post on the YDN website, http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/, has a detailed discussion of the different steps involved in Hadoop IO operations and the opportunities for optimization. Could someone please comment on the current state of these potential optimizations?
We are using Oracle JDK 6 update 26 and have not observed any problems so far. An EA build of JDK 6 update 27 is available now. We are planning to move to update 27 when the GA release is made available.
-Shrinivas
On Jul 18, 2011 7:52 PM, Michael Segel michael_se...@hotmail.com wrote:
Any release after ...
(Basically, I think it's to prevent overloading the TT a bit by keeping the connection open too long for lots of files?)
On Sun, Jun 19, 2011 at 4:01 AM, Shrinivas Joshi jshrini...@gmail.com
wrote:
We see the following type of lines in our reducer log files. Based on my understanding, it looks like the target map host has 53 map outputs that are ready to be fetched. The shuffle scheduler seems to be allowing only 20 of them to be fetched at a time. This is controlled by the MAX_MAPS_AT_ONCE variable.
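For illustration, the batching this produces looks roughly like the sketch below. This is not the actual shuffle scheduler code; it just mimics the capping behavior described above, with MAX_MAPS_AT_ONCE mirroring the hard-coded limit.

import java.util.ArrayList;
import java.util.List;

public class FetchBatcher {
    // Mirrors the hard-coded cap in the 0.21-era shuffle scheduler.
    private static final int MAX_MAPS_AT_ONCE = 20;

    // Split the map outputs advertised by one host into fetch rounds.
    static List<List<String>> fetchRounds(List<String> mapOutputsOnHost) {
        List<List<String>> rounds = new ArrayList<List<String>>();
        for (int i = 0; i < mapOutputsOnHost.size(); i += MAX_MAPS_AT_ONCE) {
            int end = Math.min(i + MAX_MAPS_AT_ONCE, mapOutputsOnHost.size());
            rounds.add(new ArrayList<String>(mapOutputsOnHost.subList(i, end)));
        }
        return rounds; // 53 outputs -> rounds of 20, 20, and 13
    }
}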
...speculation, though; there must be another problem causing that.
Matei
On Jun 1, 2011, at 12:42 PM, Shrinivas Joshi wrote:
To find out whether it had any positive performance impact, I am experimenting with turning OFF speculative execution. Surprisingly, the job starts to fail in the reduce phase with OOM errors when I disable speculative execution for both map and reduce tasks. Has anybody noticed similar behavior? Is there a
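For anyone trying to reproduce this, here is a minimal sketch of how speculative execution is toggled on 0.20/0.21-era clusters; the JobConf setters below correspond to the mapred.*.tasks.speculative.execution properties.

import org.apache.hadoop.mapred.JobConf;

public class SpeculationToggle {
    public static void main(String[] args) {
        JobConf conf = new JobConf(SpeculationToggle.class);
        // Equivalent to setting these properties in mapred-site.xml:
        //   mapred.map.tasks.speculative.execution = false
        //   mapred.reduce.tasks.speculative.execution = false
        conf.setMapSpeculativeExecution(false);
        conf.setReduceSpeculativeExecution(false);
    }
}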
, Shrinivas Joshi jshrini...@gmail.com wrote:
Noticed this on a TeraSort run: map JVM processes do not exit even a long while after successful execution of all map tasks. The resources consumed by these JVM processes do not seem to be relinquished either, and that causes poor performance.
Hi Geoffry,
A good answer to your question will probably involve more discussion of the nature of the workload and the main bottlenecks that you are seeing with it. My 2 cents: if your workload is IO intensive, adding more disks and increasing the amount of physical memory on these systems should help.
Looking at workloads like TeraSort, where intermediate map output is proportional to HDFS block size, I was wondering whether it would be beneficial to have a mechanism for setting buffer sizes like io.sort.mb to a certain factor of the HDFS block size. I am sure there are other config parameters that could be treated similarly.
I would appreciate any inputs on this.
Thanks,
-Shrinivas
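The idea can be sketched as a hypothetical driver like the one below. Hadoop has no built-in support for deriving one setting from the other; the 1.4x factor here is made up for illustration, loosely matching 360 MB of io.sort.mb against a 256 MB block.

import org.apache.hadoop.conf.Configuration;

public class SortBufferSizing {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Property name varies by version: dfs.block.size in 0.20,
        // dfs.blocksize in 0.21+. Default shown here is 64 MB.
        long blockBytes = conf.getLong("dfs.block.size", 64L * 1024 * 1024);
        int blockMb = (int) (blockBytes / (1024 * 1024));
        int sortMb = blockMb * 14 / 10; // hypothetical 1.4x scaling factor
        conf.setInt("io.sort.mb", sortMb);
        System.out.println("io.sort.mb = " + sortMb);
    }
}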
On Thu, Mar 31, 2011 at 11:29 AM, Shrinivas Joshi jshrini...@gmail.com wrote:
I am trying TeraSort with the Apache 0.21.0 build. io.sort.mb is 360M, map.sort.spill.percent is 0.8, dfs.blocksize is 256M. I am having some difficulty understanding the spill-related decisions from the log files. Here are the relevant log lines:
2011-03-30 13:46:51,591 INFO
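Some back-of-the-envelope arithmetic for these settings. This mirrors only the buffer-threshold idea, not actual Hadoop code; the real 0.21 collector also splits the buffer between record data and per-record accounting metadata.

public class SpillMath {
    public static void main(String[] args) {
        int ioSortMb = 360;      // io.sort.mb
        double spillPct = 0.8;   // map.sort.spill.percent
        double thresholdMb = ioSortMb * spillPct;
        // Spilling starts once ~288 MB of the 360 MB buffer is filled.
        System.out.printf("spill threshold ~= %.0f MB%n", thresholdMb);
    }
}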
It seems like when JVM reuse is enabled, map task log data is not getting written to the corresponding log files; log data from certain map tasks gets appended to the log files of some other map task. For example, I have a case here where 8 map JVMs are running simultaneously and all
Noticed this on a TeraSort run: map JVM processes do not exit even a long while after successful execution of all map tasks. The resources consumed by these JVM processes do not seem to be relinquished either, and that causes poor performance in the rest of the reduce phase, which
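Both this and the log-mixing issue above involve JVM reuse. For reference, a minimal sketch of the knob that controls it on 0.20/0.21 (JobConf.setNumTasksToExecutePerJvm backs the mapred.job.reuse.jvm.num.tasks property); turning reuse off can help isolate whether it is the culprit.

import org.apache.hadoop.mapred.JobConf;

public class JvmReuseToggle {
    public static void main(String[] args) {
        JobConf conf = new JobConf(JvmReuseToggle.class);
        // mapred.job.reuse.jvm.num.tasks:
        //    1 = fresh JVM per task (reuse off, the default)
        //   -1 = unlimited reuse (JVM lives until the job finishes)
        conf.setNumTasksToExecutePerJvm(1);
    }
}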
The default GridMix2 config is set up in such a way that the number of reducer tasks increases from small to medium to large data set jobs. For example, in the case of StreamingSorter it is 15, 170, and 370, in that order. On a single node setup I noticed that decreasing the number of reducer tasks for
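For single-node experiments like this, the underlying knob is the per-job reducer count. A minimal sketch follows; GridMix2 itself sets this through its own config files, so this is just an illustration of the property involved.

import org.apache.hadoop.mapred.JobConf;

public class ReducerCount {
    public static void main(String[] args) {
        JobConf conf = new JobConf(ReducerCount.class);
        // Sets mapred.reduce.tasks. The GridMix2 StreamingSorter defaults
        // quoted above are 15/170/370 for small/medium/large jobs.
        conf.setNumReduceTasks(15);
    }
}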
I am not sure about this, but you might want to take a look at the GridMix config file. From what I understand, it lets you define the number of jobs for different workloads and categories.
HTH,
-Shrinivas
On Tue, Feb 22, 2011 at 10:46 AM, David Saile da...@uni-koblenz.de wrote:
Hello everybody,
I am trying to
Which workloads are used for serious benchmarking of Hadoop clusters? Do you care about any of the following workloads: TeraSort; GridMix v1, v2, or v3; MalStone; CloudBurst; MRBench; NNBench; or the sample apps shipped with the Hadoop distro, like PiEstimator, dbcount, etc.?
Thanks,
-Shrinivas
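For example, TeraSort can be driven programmatically as well as from the command line. A minimal sketch, assuming the TeraGen/TeraSort example classes from the hadoop-examples jar are on the classpath (package names are from the 0.20/0.21-era examples); the row count and paths are made up.

import org.apache.hadoop.examples.terasort.TeraGen;
import org.apache.hadoop.examples.terasort.TeraSort;
import org.apache.hadoop.util.ToolRunner;

public class TeraSortBenchmark {
    public static void main(String[] args) throws Exception {
        // Generate 10 million 100-byte rows (~1 GB), then sort them.
        ToolRunner.run(new TeraGen(), new String[] {"10000000", "/tera/in"});
        ToolRunner.run(new TeraSort(), new String[] {"/tera/in", "/tera/out"});
    }
}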
seemed somewhat interested in disk bandwidth. That is facilitated by having more than one disk in the box.
On Sat, Feb 12, 2011 at 8:26 AM, Michael Segel michael_se...@hotmail.com wrote:
On Tue, Feb 15, 2011 at 11:49 AM, Shrinivas Joshi jshrini...@gmail.com
wrote:
Thanks much to all
Are there any openly published job traces that one can use for cluster benchmarking using GridMix? We are looking for something that could run on a 10 node cluster.
Thanks,
-Shrinivas
Thanks much to all who shared their inputs. This really helps. It would be nice to have a wiki page collecting all this good information; I will look into that. We are definitely going with large capacity disks (>= 1 TB).
-Shrinivas
On Sat, Feb 12, 2011 at 1:22 PM, Ted Dunning wrote:
... at 12:26 PM, Shrinivas Joshi jshrini...@gmail.com wrote:
What would be a good hard drive for a 7 node cluster which is targeted to run a mix of IO and CPU intensive Hadoop workloads? We are looking for around 1 TB of storage on each node, distributed amongst 4 or 5 disks: so either 250 GB * 4 disks or 160 GB * 5 disks. Also, it should be less than $100 each.
Hi All,
I just wanted to check if anybody had a comment on this query.
Thanks,
-Shrinivas
On Wed, Jun 16, 2010 at 9:50 PM, Shrinivas Joshi jshrini...@gmail.com wrote:
Sorry if this is a repeat email for you; I did send this to the common-dev list as well.
Hello,
I am trying to get profiles for a workload running on top of the Hadoop 0.20.2 framework. The workload jars and the Hadoop jars have been compiled with debug symbols enabled. I could see local variable tables and
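One built-in option for this (not necessarily what was used here) is Hadoop's task profiling hook, which attaches a profiler agent to a chosen range of task JVMs; the HPROF agent string below is the stock default.

import org.apache.hadoop.mapred.JobConf;

public class TaskProfiling {
    public static void main(String[] args) {
        JobConf conf = new JobConf(TaskProfiling.class);
        conf.setProfileEnabled(true);          // mapred.task.profile
        conf.setProfileTaskRange(true, "0-2"); // profile map tasks 0..2 only
        // Agent options passed to each profiled task JVM; %s is replaced
        // with the per-task output file. HPROF ships with the JDK.
        conf.setProfileParams(
            "-agentlib:hprof=cpu=samples,heap=sites,force=n,"
            + "thread=y,verbose=n,file=%s");
    }
}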