I am using Yarn, and
1 - I want to know the average IO throughput of HDFS (i.e., how fast the datanodes are writing to disk) so that I can compare between 2 HDFS instances. The command hdfs dfsadmin -report doesn't give me that. Does HDFS have a command for that?
2 - and there is a similar guide at
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
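(For question 1, the TestDFSIO benchmark described in that guide measures exactly this kind of aggregate read/write throughput; a minimal sketch, assuming the jobclient tests jar bundled with a Hadoop 2 tarball - the jar name and path vary by version:)

  # Write 10 files of 1000 MB each; throughput and IO rate are printed
  # and appended to TestDFSIO_results.log in the local working directory.
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
      TestDFSIO -write -nrFiles 10 -fileSize 1000
  # Read benchmark over the same files, then clean up /benchmarks in HDFS.
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
      TestDFSIO -read -nrFiles 10 -fileSize 1000
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
      TestDFSIO -clean

Running the same invocation against both clusters gives directly comparable throughput numbers.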
I tried the command mapred job -list all to get the history of completed jobs, but the output doesn't show when a job started and ended, the number of map and reduce tasks, or the size of data read and written. Can I get this info from a shell command?
I am using Yarn.
Hi,
You can get all the details for a job using this mapred command:
mapred job -status <job-id>
For this you need to have the Job History Server running and the same job history server address configured on the client side.
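(For reference, a minimal sketch of that client-side setting, assuming Hadoop 2 defaults; the hostname is a placeholder and 10020 is the default JobHistoryServer RPC port:)

  <!-- mapred-site.xml on the client -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>historyserver.example.com:10020</value>
  </property>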
Thanks & Regards
Devaraj K
From: Pedro Sá da Costa
In the last 4-5 days, the TaskTracker on one of my slave machines has gone down a couple of times. It had been working fine for the past 4-5 months.
The cluster configuration is:
4 machine cluster on AWS
1 m2.xlarge master
3 m2.xlarge slaves
The cluster is dedicated to running Hive queries, with the data
Hi Harsh,
What will happen when I specify local host as the required host? Doesn't
the resource manager give me all the containers on the local host? I don't
want to constrain myself to the local host, which might be busy while other
nodes in the cluster have enough resources available for
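(A hedged sketch of how such a preference can be expressed with the AMRMClient API, assuming Hadoop 2.2+ where ContainerRequest takes a relaxLocality flag; the class name and memory size here are illustrative, not from the original thread:)

  import org.apache.hadoop.yarn.api.records.Priority;
  import org.apache.hadoop.yarn.api.records.Resource;
  import org.apache.hadoop.yarn.client.api.AMRMClient;
  import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
  import org.apache.hadoop.yarn.util.Records;

  public class LocalityPreference {
    // Ask for a container preferring `host`, but with relaxLocality=true
    // so the scheduler may fall back to other nodes/racks when it is busy.
    static void requestPreferringHost(AMRMClient<ContainerRequest> client,
                                      String host) {
      Resource capability = Records.newRecord(Resource.class);
      capability.setMemory(1024);                  // 1 GB, illustrative
      Priority priority = Records.newRecord(Priority.class);
      priority.setPriority(0);
      ContainerRequest req = new ContainerRequest(
          capability, new String[] { host }, null /* racks */,
          priority, true /* relaxLocality */);
      client.addContainerRequest(req);
    }
  }

With relaxLocality=false the request would be pinned to the named host; true keeps it a preference rather than a constraint.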
Rita,
There aren't any specs as far as I know, but in my experience the interface is
stable enough from version to version, with the occasional extra field added
here or there. If you query specifically for the beans you want (e.g.
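(A hedged illustration of such a targeted query, using the JMX JSON servlet on the NameNode web port; the hostname and default port 50070 are placeholders for your setup, and NameNodeInfo is one commonly queried bean:)

  curl 'http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'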
Rahul-da
I found bz2 pretty slow (although splittable), so I switched to snappy (only sequence files are splittable, but compression/decompression is fast)
Thanks
Sanjay
From: Rahul Bhattacharjee <rahul.rec@gmail.com>
Yeah, I too found that quite slow and memory hungry!
Thanks,
Rahul-da
On Wed, Jun 12, 2013 at 11:13 PM, Sanjay Subramanian
sanjay.subraman...@wizecommerce.com wrote:
Rahul-da
I found bz2 pretty slow (although splittable) so I switched to snappy
(only sequence files are splittable but
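(For reference, a hedged sketch of enabling snappy for intermediate map output; Hadoop 2 property names, assuming the native snappy libraries are installed on every node:)

  <!-- mapred-site.xml -->
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>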
In reading this link as well as the sailfish report, it strikes me that Hadoop
skipped a potentially significant optimization. Namely, why are multiple
sorted spill files merged into a single output file? Why not have the
auxiliary service merge on the fly, thus avoiding landing them to disk?
Hello,
I'm new to Hadoop.
I have a large quantity of JSON documents with a structure similar to
what is shown below.
{
  "g" : "some-group-identifier",
  "sg" : "some-subgroup-identifier",
  "j" : "some-job-identifier",
  "page" : 23,
  ... // other fields omitted
Hi folks,
I am trying to install CDH4 from tarballs with MRv1, not the YARN version (MRv2).
I downloaded two tarballs (mr1-0.20.2+n and hadoop-2.0.0+n) from this location: http://archive.cloudera.com/cdh4/cdh/4/
As per the Cloudera instructions, I found:
If you install CDH4 from a tarball, you will
Hi everyone,
We have a pig script scheduled to run every 4 hours. Someone accidentally deleted the pig script (rm). Is there any way to recover it?
I am guessing Hadoop copies the program to every node before running it, so perhaps a copy still exists on one of the nodes.
Best regards,
Feng Jiang
Where was the pig script? On HDFS?
How often does your cluster clean up the trash?
(Deleted stuff doesn't get cleaned up the moment the file is deleted...) It's a configurable setting, so YMMV.
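(A hedged way to check: if the script lived on HDFS and trash is enabled via fs.trash.interval in core-site.xml, an rm moves files under the owner's .Trash directory until the interval (in minutes) expires; the username below is a placeholder:)

  # List everything currently sitting in this user's HDFS trash.
  hdfs dfs -ls -R /user/<username>/.Trash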
On Jun 12, 2013, at 8:58 PM, feng jiang jiangfut...@gmail.com wrote:
Hi everyone,
We have a pig
I could have sworn there was a thread on this already. (Maybe the HBase list?)
Andrew P. kinda nailed it when he pointed out that you still have to write the replica(s).
If you wanted improved performance, why not look at the hybrid drives that have
a small SSD buffer and a spinning
Hi, all,
I was wondering: can an application written against the Hadoop 0.20.3 API run on a Hadoop 1.0.3 cluster?
If not, is there any way to run this application on Hadoop 1.0.3 without rewriting all the code?
--
Lin Yang
Hi,
I have a SequenceFile which contains several jpeg images with (image name, image bytes) as key-value pairs. My objective is to count the number of images, grouping them by source, something like this:
Nikon Coolpix 100
Sony Cybershot 251
N82 100
The MR code is:
package
You're not using the recommended @Override annotations, and are
hitting a classic programming mistake. Your issue is same as this
earlier discussion: http://search-hadoop.com/m/gqA3rAaVQ7 (and the
ones before it).
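(To illustrate the class of mistake being pointed at - a hedged sketch, not the original poster's code: in the new API, a reduce method declared with Iterator instead of Iterable does not override Reducer.reduce, so the default identity reduce runs and the output looks like the map output. @Override turns the mismatch into a compile error:)

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  public class ImageCountReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override  // compile-time proof that this really overrides reduce()
    protected void reduce(Text source, Iterable<IntWritable> counts,
                          Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable c : counts) {
        sum += c.get();   // sum the 1s emitted per image by the mapper
      }
      context.write(source, new IntWritable(sum));
    }
  }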
On Thu, Jun 13, 2013 at 9:52 AM, Omkar Joshi
omkar.jo...@lntinfotech.com wrote:
Ok but that link is broken - can you provide a working one?
Regards,
Omkar Joshi
-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, June 13, 2013 11:01 AM
To: user@hadoop.apache.org
Subject: Re: Reducer not getting called
You're not using the recommended
Hi, Vinod,
Thanks.
2013/6/13 Vinod Kumar Vavilapalli vino...@hortonworks.com
It should mostly work. I just checked our CHANGES.txt file and haven't seen many incompatibilities introduced between those releases.
But 0.20.3 is pretty old, so there's only one way to know for sure - compile and