to them? Can you give an example?
Many thanks.
Bill
On Mon, Dec 9, 2013 at 3:30 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
YARN currently is unable to handle requests with different resource
requirements at the same priority (YARN-314). Using different priorities
would likely solve
Hi Geelong,
Check out Todd Lipcon's presentation on tuning MapReduce performance:
http://www.slideshare.net/cloudera/mr-perf
-Sandy
On Mon, Dec 2, 2013 at 11:14 PM, Geelong Yao geelong...@gmail.com wrote:
Hi Everyone
I am now testing the best performance of my cluster
Can anyone give me
allowed.
On Tue, Dec 3, 2013 at 4:26 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Geelong,
Check out Todd Lipcon's presentation on tuning MapReduce performance:
http://www.slideshare.net/cloudera/mr-perf
-Sandy
On Mon, Dec 2, 2013 at 11:14 PM, Geelong Yao geelong...@gmail.com wrote:
What scheduler are you using? What do you mean by start? For the first
map task to start?
-Sandy
On Thu, Nov 28, 2013 at 6:07 AM, Juan Martin Pampliega jpampli...@gmail.com
wrote:
Hi,
I have a map-reduce job that was developed for MRV1 and is now being run
daily with no modifications in
</allocations>
*From:* Sandy Ryza [mailto:sandy.r...@cloudera.com]
*Sent:* November 27, 2013 16:33
*To:* user@hadoop.apache.org
*Subject:* Re: problems of FairScheduler in hadoop2.2.0
Hi,
Can you share the contents of your fair-scheduler.xml? If you submit just
a single job, does it run? What do you
For MapReduce and YARN, we recently published a couple blog posts on
migrating:
http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-users/
http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/
hope that helps,
Sandy
On Fri, Nov 22, 2013 at
Unfortunately, this is not possible in the MR1 fair scheduler without
setting the jobs for individual pools. In MR2, fair scheduler hierarchical
queues will allow setting maxRunningApps at the top of the hierarchy, which
would have the effect you're looking for.
-Sandy
On Tue, Nov 19, 2013 at
());
amRmClient.releaseAssignedContainer(containerId);
}
return amRmClient.allocate(0);
-Gaurav
On 11/13/2013 07:36 PM, Sandy Ryza wrote:
In that case, the AMRMClient code looks correct to me. Can you share the
code you've written against it that's not receiving the correct
on and relax locality set to true without
requesting rack, I don’t get the containers on the required host
What scheduler are you using and what properties are you using to turn the
scheduler delay on?
Thanks
-Gaurav
*From:* Sandy Ryza [mailto:sandy.r...@cloudera.com]
*Sent:* Thursday
by default, set to -1.
</description>
</property>
-Gaurav
*From:* Sandy Ryza [mailto:sandy.r...@cloudera.com]
*Sent:* Thursday, November 14, 2013 12:41 PM
*To:* user@hadoop.apache.org
*Subject:* Re: Allocating Containers on a particular Node in Yarn
Great to hear. Other
-Gaurav
On 11/13/2013 4:02 PM, Sandy Ryza wrote:
Yeah, specifying a host name with relaxLocality is meaningful. Schedulers
use delay scheduling (
http://www.cs.berkeley.edu/~matei/talks/2010/eurosys_delaysched.pdf) to
achieve locality when relaxLocality is on. But it is turned off by
default
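The delay-scheduling idea described above can be sketched in a few lines of plain Java. This is an illustrative toy under made-up names (class, method, and threshold are all assumptions), not the actual YARN scheduler code:

```java
// Toy sketch of delay scheduling: skip off-node offers until a
// threshold of missed scheduling opportunities is reached, then
// relax locality and accept any node. All names are illustrative.
public class DelaySchedulerSketch {
    private final String preferredHost;
    private final int maxMissedOpportunities; // often ~number of nodes
    private int missed = 0;

    public DelaySchedulerSketch(String preferredHost, int maxMissedOpportunities) {
        this.preferredHost = preferredHost;
        this.maxMissedOpportunities = maxMissedOpportunities;
    }

    /** Called once per node heartbeat; returns true if the container is placed there. */
    public boolean offer(String host) {
        if (host.equals(preferredHost)) {
            return true;                  // local offer: always take it
        }
        if (missed >= maxMissedOpportunities) {
            return true;                  // waited long enough: relax locality
        }
        missed++;                         // keep waiting for the preferred host
        return false;
    }
}
```

The key design point is the trade-off the paper describes: waiting a bounded number of offers buys locality without starving the request forever.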
-Gaurav
On 11/13/2013 4:24 PM, Sandy Ryza wrote:
[moving to user list]
Right. relaxLocality needs to be set on the next level up. It
determines whether locality can be relaxed to that level. Confusing, I
know. If you are using AMRMClient, you should be able to accomplish what
you're
-Gaurav
On 11/13/2013 5:04 PM, gaurav wrote:
I have hadoop-2.2.0
Thanks
-Gaurav
On 11/13/2013 4:59 PM, Sandy Ryza wrote:
What version are you using? Setting the relax locality to true for nodes
is always correct. For racks, this is not necessarily the case. When I
look at trunk
that needs to consider HDFS
data-locality. thx.
r.
On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Ricky,
The input splits contain the locations of the blocks they cover. The AM
gets the information from the input splits and submits requests for those
locations. Each container request spans all the replicas that the block is
located on. Are you interested in something more specific?
Hi Sam,
Have you tried changing the map or reduce classes and seeing if that has
any effect?
-Sandy
On Fri, Oct 18, 2013 at 8:05 AM, Ravi Prakash ravi...@ymail.com wrote:
Sam, I would guess that the jar file you think is running, is not actually
the one. I am guessing that in the task
Just a clarification: Cloudera Manager is now free for any number of nodes.
Ref:
http://www.cloudera.com/content/cloudera/en/products/cloudera-manager.html
-Sandy
On Fri, Oct 11, 2013 at 7:05 AM, DSuiter RDX dsui...@rdx.com wrote:
Sagar,
It sounds like you want a management console. We are
Hi Andre,
Try setting yarn.scheduler.capacity.node-locality-delay to a positive number
of scheduling opportunities (typically on the order of the number of nodes in
the cluster). This will turn on delay scheduling - here's the doc on how this
works:
For applications that request containers on particular nodes, the number of
scheduling opportunities since the last container
is a
scheduling opportunity, how many are there?). It does not seem to be in the
current documentation
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
2013/10/3 Sandy Ryza sandy.r...@cloudera.com
Hi Andre,
Try setting yarn.scheduler.capacity.node-locality
Hi Himanshu,
Changing the ratio is definitely a reasonable thing to do. The capacities
come from the mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum tasktracker configurations.
You can tweak these on your nodes to get your desired ratio.
-Sandy
On Mon, Sep
Average map time includes everything the map task is doing, i.e. all the
things you mentioned. Reduce time does not cover shuffle time. Reduce
time is the time spent calling the reducer function and writing its output
to HDFS. Merge time is related to reduce, not map.
-Sandy
On Tue, Sep 24,
Hi Albert,
You're correct about used.
Reserved is a little bit more arcane - it refers to a mechanism that
schedulers use to prevent applications with larger container sizes from
starving. Applications place container reservations on nodes, and no
other containers can be placed on the node
Hi Mohit,
answers inline
On Fri, Sep 20, 2013 at 1:33 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
I am going through the concepts of resource manager, application master
and node manager. As I understand, the resource manager receives the job
submission and launches application master. It also
Hi John,
YARN schedulers handle this with the concept of reservations. Scheduling
decisions occur on node heartbeats. When a node that is full heartbeats,
the next application that should be able to place a container on it gets to
place a reservation on it. Each node has space for a single
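The reservation mechanism sketched in that answer can be illustrated with a small stdlib-only Java toy. The class and method names here are made up for illustration; this is not the YARN scheduler code:

```java
// Toy sketch of scheduler reservations: when a request does not fit on a
// heartbeating node, the application reserves the node (one reservation
// per node) and gets first claim once capacity frees up.
public class NodeSketch {
    private final int capacityMb;
    private int usedMb = 0;
    private String reservedBy = null;     // at most one reservation per node

    public NodeSketch(int capacityMb) { this.capacityMb = capacityMb; }

    /** Called on heartbeat: try to place app's container of the given size. */
    public boolean tryPlace(String app, int containerMb) {
        boolean hasClaim = reservedBy == null || reservedBy.equals(app);
        if (hasClaim && usedMb + containerMb <= capacityMb) {
            usedMb += containerMb;
            if (app.equals(reservedBy)) reservedBy = null; // claim fulfilled
            return true;
        }
        if (reservedBy == null) reservedBy = app;          // wait here for space
        return false;
    }

    public void release(int containerMb) { usedMb -= containerMb; }
}
```

This captures why reservations prevent starvation: a smaller request cannot repeatedly slip into freed capacity ahead of the larger reserving application.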
That's right that the other 2 apps will end up getting 10 resources each,
but as more resources become released, eventually the cluster will converge
to a fair state. I.e. if the first app requested additional resources
after releasing resources, it would not receive any more until either
another
Moving to cdh-user,
Hi,
The Fair Scheduler in 4.3 is stable and is recommended by Cloudera.
-Sandy
On Aug 22, 2013, at 6:20 PM, ch huang justlo...@gmail.com wrote:
hi,all:
I use CDH 4.3 YARN; its default scheduler is the Capacity Scheduler. I
want to switch to the Fair Scheduler, but I
Hi Lin,
It might be worth checking out Apache Flume, which was built for highly
parallel ingest into HDFS.
-Sandy
On Thu, Aug 15, 2013 at 11:16 AM, Adam Faris afa...@linkedin.com wrote:
If every device can send its information as an 'event', you could use a
publish-subscribe messaging
To add to that, if you want to take advantage of MapReduce, e.g. you need
to do a distributed grouping or sort, pipes or streaming would be the way
to go. If you're mainly interested in running your code in parallel on a
cluster, distributed shell, a YARN application outside of MapReduce, could
Hi Pavan,
Configuration properties generally aren't included in the jar itself unless you
explicitly set them in your java code. Rather they're picked up from the
mapred-site.xml file located in the Hadoop configuration directory on the host
you're running your job from.
Is there an issue
Nothing in your pom.xml should affect the configurations your job runs with.
Are you running your job from a node on the cluster? When you say localhost
configurations, do you mean it's using the LocalJobRunner?
-sandy
(iphnoe tpying)
On Aug 13, 2013, at 9:07 AM, Pavan Sudheendra
Hi devdoer,
What version are you using?
-Sandy
On Thu, Aug 8, 2013 at 4:25 AM, devdoer bird devd...@gmail.com wrote:
HI:
I configure the FairScheduler with default settings and my job has 19
reduce tasks. I found that all the reduce tasks are scheduled to run on one
node.
While with
is this white list feature is supported with. But am not sure
what is meant by submitting ResourceRequests directly to RM. Can you please
elaborate on this or give me a pointer to some example code on how to do
it...
Thanks for the reply,
-Kishore
On Mon, Jul 8, 2013 at 10:53 PM, Sandy Ryza
relaxLocality)
that means the old argument containerCount is gone! How would I be able to
specify how many containers I need?
We now expect that you submit a ContainerRequest for each container you
want.
-Kishore
On Wed, Aug 7, 2013 at 11:37 AM, Sandy Ryza sandy.r
Hi Sam,
LinuxResourceCalculatorPlugin and DominantResourceCalculator control
separate things. The former is for a NodeManager to calculate the resource
usage of a container process so that it can kill it if it gets too large.
The latter is used by the Capacity Scheduler to allocate containers,
Hi Li Yu,
I don't think it has been published yet, but a document on the MapReduce
changes was recently completed at
https://issues.apache.org/jira/browse/MAPREDUCE-5184.
-Sandy
On Thu, Jul 11, 2013 at 4:18 AM, Yu Li car...@gmail.com wrote:
Dear all,
I have some applications used to run on
Hi Andrea,
For copying the full sky map to each node, look up the distributed cache.
It works by placing the sky map file on HDFS and each task will pull it
down when needed. For feeding the input data into Hadoop, what format is
it in currently? One simple way would be to have a text file
LocalDirAllocator should help with this. You can look through MapReduce
code to see how it's used.
-Sandy
On Mon, Jul 1, 2013 at 11:01 PM, Devaraj k devara...@huawei.com wrote:
You can make use of this configuration to do the same.
<property>
<description>List of
CPU limits are only enforced if cgroups is turned on. With cgroups on,
they are only limited when there is contention, in which case tasks are
given CPU time in proportion to the number of cores requested for/allocated
to them. Does that make sense?
-Sandy
On Tue, Jul 2, 2013 at 9:50 AM,
cores and simply fight it
out in the OS thread scheduler.
Thanks,
john
*From:* Sandy Ryza [mailto:sandy.r...@cloudera.com]
*Sent:* Tuesday, July 02, 2013 11:56 AM
*To:* user@hadoop.apache.org
*Subject:* Re: Containers and CPU
CPU limits are only enforced
containers per 8-core node?
John
*From:* Sandy Ryza [mailto:sandy.r...@cloudera.com]
*Sent:* Tuesday, July 02, 2013 1:26 PM
*To:* user@hadoop.apache.org
*Subject:* Re: Containers and CPU
Use of cgroups for controlling CPU is off by default, but can
:28 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Siddhi,
Moving this question to the CDH list.
Does setting yarn.scheduler.capacity.maximum-am-resource-percent to .5
help?
Have you tried using the Fair Scheduler?
-Sandy
On Fri, Jun 21, 2013 at 4:21 PM, Siddhi Mehta smehtau
Hi Yuzhang,
Moving this question to the Hadoop user list.
Are you using MapReduce or writing your own YARN application? In
MapReduce, all maps must request the same amount of memory and all reduces
must request the same amount of memory. It would be trivial to do this in
your own YARN
Hey Sam,
Thanks for sharing your results. I'm definitely curious about what's
causing the difference.
A couple observations:
It looks like you've got yarn.nodemanager.resource.memory-mb in there twice
with two different values.
Your max JVM memory of 1000 MB is (dangerously?) close to the
Hi Rahul,
The job history server is currently specific to MapReduce.
-Sandy
On Fri, Jun 7, 2013 at 8:56 AM, Rahul Bhattacharjee rahul.rec@gmail.com
wrote:
Hello,
I was doing some sort of prototyping on top of YARN. I was able to launch
AM and then AM in turn was able to spawn a few
Hi Lin,
This is by no means a comprehensive answer to your question, but I've found
that I'm able to iterate fastest by writing unit tests using MRUnit (
http://mrunit.apache.org/)
-Sandy
On Thu, Jun 6, 2013 at 7:02 PM, Lin Yang lin.yang.ja...@gmail.com wrote:
Hi,dear friends,
I have setup
(and running them from Eclipse)
On Thu, Jun 6, 2013 at 7:21 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Lin,
This is by no means a comprehensive answer to your question, but I've
found that I'm able to iterate fastest by writing unit tests using MRUnit (
http://mrunit.apache.org
Hi John,
Here's how I deploy/debug Hadoop locally:
To build and tar Hadoop:
mvn clean package -Pdist -Dtar -DskipTests=true
The tar will be located in the project directory under hadoop-dist/target/.
I untar it into my deploy directory.
I then copy these scripts into the same directory:
-reduce-jobs-with-eclipse
looks promising as a Hadoop-in-Eclipse strategy, but it is over a year old
and I’m not sure if it applies to Hadoop 2.0 and YARN.
John
*From:* Sandy Ryza [mailto:sandy.r...@cloudera.com]
*Sent:* Friday, May 31, 2013 12:13 PM
*To:* user
In MR1, the tasktracker serves the mapper files (so that tasks don't have
to stick around taking up resources). In MR2, the shuffle service, which
lives inside the nodemanager, serves them.
-Sandy
On Thu, May 23, 2013 at 10:22 AM, John Lilley john.lil...@redpoint.net wrote:
Ling,
Hi Mehmet,
Are you using MR1 or MR2?
The fair scheduler, present in both versions, but configured slightly
differently, allows you to limit the number of map and reduce tasks in a
queue. The configuration can be updated at runtime by modifying the
scheduler's allocations file. It also has a
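The runtime-editable allocations file mentioned above looks roughly like the following for the MR1 fair scheduler. The pool name and the numbers are made-up examples:

```xml
<?xml version="1.0"?>
<!-- Example MR1 fair scheduler allocations file (pool name and numbers
     are illustrative). The scheduler rereads this file periodically,
     so edits take effect without a restart. -->
<allocations>
  <pool name="etl">
    <maxMaps>20</maxMaps>        <!-- cap concurrent map tasks -->
    <maxReduces>10</maxReduces>  <!-- cap concurrent reduce tasks -->
    <maxRunningJobs>5</maxRunningJobs>
  </pool>
</allocations>
```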
Hi John,
You are correct that both 0.23 and 2.0 contain YARN, and that 1.x does not.
The (confusing) reason for this is that the 1.x line descends from the
0.20 line, while the 2.0 line descends from the 0.23 line.
-Sandy
On Thu, May 16, 2013 at 11:46 AM, John Lilley
Hi Raj,
The web UIs are located on different ports than the RPC ports you
specified. If you are using MR1, the HDFS UI is typically located on port
50070, and the MapReduce UI is typically located on port 50030.
-Sandy
On Thu, May 16, 2013 at 2:58 PM, Raj Hadoop hadoop...@yahoo.com wrote:
This shouldn't be asked on the dev lists, so putting mapreduce-dev and
hdfs-dev in the bcc. Have you made sure you're not using the local job
runner? Did you restart the resourcemanager after running the job?
-Sandy
On Thu, May 2, 2013 at 6:31 PM, sam liu samliuhad...@gmail.com wrote:
Can
/hadoop-
mapreduce-examples-2.0.3-alpha.jar pi 2 30'
2013/5/3 Sandy Ryza sandy.r...@cloudera.com
This shouldn't be asked on the dev lists, so putting mapreduce-dev and
hdfs-dev in the bcc. Have you made sure you're not using the local job
runner? Did you restart the resourcemanager after
The yarn-default.xml file in the Hadoop repository contains the default
ports for all of the YARN protocols.
-Sandy
On Mon, Apr 22, 2013 at 8:27 AM, Marcos Luis Ortiz Valmaseda
marcosluis2...@gmail.com wrote:
A great overview of MR2, you can find it in the Cloudera´s blog:
This is great, Keith.
On Wed, Apr 17, 2013 at 12:58 PM, Keith Wiley kwi...@keithwiley.com wrote:
I've posted an article on my website that details precisely how to deploy
Hadoop 2.0 with Yarn on AWS (or at least how I did it, whether or not such an
approach will translate to others'
. Huffman
bhuff...@etinternational.com wrote:
I get a container, but not on the node I'm asking for.
Thanks,
Brian
On 04/12/2013 03:01 PM, Sandy Ryza wrote:
What do you mean when you say it doesn't seem to use the code? That
you're not getting containers back?
-Sandy
On Fri, Apr 12
Hi Jerry,
I assume you're providing your own Writable implementation? The Writable
readFields method is given a stream. Are you able to perform your
processing while reading it there?
-Sandy
On Sat, Mar 30, 2013 at 10:52 AM, Jerry Lam chiling...@gmail.com wrote:
Hi
Hi Rahul,
I don't think saving the stream for later use would work - I was just
suggesting that if only some aggregate statistics needed to be calculated,
they could be calculated at read time instead of in the mapper. Nothing
requires a Writable to contain all the data that it reads.
That's a
Hi tmp,
YARN doesn't provide an explicit protocol for doing this. Applications are
expected to have their own mechanism for communication between task
containers, other task containers, and app masters. If you want to see how
this is done in MapReduce, I would suggest looking at the
Hi Bala,
A standard benchmark program for mapreduce is terasort, which is included
in the hadoop examples jar. You can generate data for it using teragen,
which runs a map-only job:
hadoop jar <path-to-examples-jar> teragen <number of records> <directory to
put them in>
and then sort the data using
Hi Kishore,
50010 is the datanode port. Does your lsof indicate that the sockets are in
CLOSE_WAIT? I had come across an issue like this where that was a symptom.
-Sandy
On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri
write2kish...@gmail.com wrote:
Hi,
I am running a date
the
jar command vs mapred job command (looks like the hadoop job command is
deprecated).
Thanks
Kay
On Wed, Mar 13, 2013 at 10:14 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Kay,
The jar is just executed locally. If the jar fires up a mapreduce job
and sets itself as the job jar
Hi,
Essentially what you want to do is group your data points by their position
in the column, and have each reduce call construct the data for each row
into a row. To have each record that the mapper processes be one of the
columns, you can use TextInputFormat with
Hi Balachandar,
In MapReduce, interpreting input files as key value pairs is accomplished
through InputFormats. Some common InputFormats are TextInputFormat, which
uses lines in a text file as values and their byte offset into the file as
keys, KeyValueTextInputFormat, which interprets the first
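The way KeyValueTextInputFormat interprets a line can be illustrated without Hadoop at all: everything before the first separator (tab by default) is the key, the rest is the value. This is a stdlib-only sketch with made-up names, not the Hadoop class itself:

```java
// Stdlib illustration of how KeyValueTextInputFormat splits a line:
// the text before the first separator becomes the key, the remainder
// becomes the value. Not the actual Hadoop class.
public class KeyValueLineSketch {
    /** Returns {key, value}; a line with no separator is all key, empty value. */
    public static String[] parse(String line, char separator) {
        int idx = line.indexOf(separator);
        if (idx < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, idx), line.substring(idx + 1) };
    }
}
```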
Hi Aji,
Oozie is a mature project for managing MapReduce workflows.
http://oozie.apache.org/
-Sandy
On Mon, Mar 4, 2013 at 8:17 AM, Justin Woody justin.wo...@gmail.com wrote:
Aji,
Why don't you just chain the jobs together?
http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
Hi Paul,
To do this, you need to make your Dog class implement Hadoop's Writable
interface, so that it can be serialized to and deserialized from bytes.
http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/io/Writable.html
The methods you implement would look something like this:
public
)
out.writeInt(12)
the following would be correct
text = in.readUTF();
number = in.readInt();
and this would fail:
number = in.readInt();
text = in.readUTF();
?
2013/2/27 Sandy Ryza sandy.r...@cloudera.com:
Hi Paul,
To do this, you need to make your Dog class implement Hadoop's Writable
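Paul's ordering question can be checked with plain java.io streams, which provide exactly the DataOutput/DataInput interfaces that a Writable's write and readFields methods receive. A self-contained demo (names are mine, not from the thread):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Demonstrates that fields must be read back in the same order they
// were written: the stream is just bytes, with no field markers.
public class WritableOrderDemo {
    public static String roundTrip() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF("dog");   // write order: UTF string first...
        out.writeInt(12);      // ...then the int

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        String text = in.readUTF();   // reads must follow the same order;
        int number = in.readInt();    // swapping them would misinterpret
        return text + ":" + number;   // the bytes or throw an exception
    }
}
```

So yes: readUTF-then-readInt matches writeUTF-then-writeInt, while the reversed read order would fail.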
Hi Nikhil,
The jobtracker doesn't do any deployment of other daemons. They are
expected to be installed and started on other nodes separately.
If I understand your question more broadly, MR doesn't necessarily run its
map and reduce tasks on the nodes that contain the data. All data is read
A map-only job does not result in the standard shuffle-sort. Map outputs
are written directly to HDFS.
-Sandy
On Fri, Feb 15, 2013 at 12:23 PM, Jay Vyas jayunit...@gmail.com wrote:
Maybe im mistaken about what is meant by map-only. Does a map-only job
still result in standard shuffle-sort ?
Hi Amit,
One way to accomplish this would be to create a custom writable
implementation, TextOrIntWritable, that has fields for both. It could look
something like:
class TextOrIntWritable implements Writable {
private boolean isText;
private Text text;
private IntWritable integer;
void
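A stdlib-only sketch of the tagged-union idea behind TextOrIntWritable, using String in place of Hadoop's Text (field and method names follow the snippet above but the rest is my assumption): serialize a flag saying which variant is present, then only that variant's bytes.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Tagged-union serialization sketch mirroring the TextOrIntWritable
// idea: a leading boolean records which variant follows, so readFields
// knows which branch to deserialize. Uses String instead of Text.
public class TextOrIntSketch {
    boolean isText;
    String text;
    int number;

    public void write(DataOutput out) throws IOException {
        out.writeBoolean(isText);   // tag first
        if (isText) {
            out.writeUTF(text);
        } else {
            out.writeInt(number);
        }
    }

    public void readFields(DataInput in) throws IOException {
        isText = in.readBoolean();  // read the tag, then the matching field
        if (isText) {
            text = in.readUTF();
        } else {
            number = in.readInt();
        }
    }
}
```

In the real Hadoop version the fields would be a Text and an IntWritable, and the same tag-then-payload pattern applies.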