Hi Dave,
Looks like your analysis is correct. I have faced a similar issue some time
back.
See the discussion link:
http://markmail.org/message/ruev3aa4x5zh2l4w#query:+page:1+mid:33gcdcu3coodkks3+state:results
On sudden restarts, it can lose the OS filesystem edits. A similar thing
happened in ou
Hi Uma,
I think there is minimal performance degradation if you set
dfs.datanode.synconclose to true.
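For reference, a sketch of the corresponding hdfs-site.xml entry (the property name is taken from this thread; its default is false — verify against your Hadoop version's hdfs-default.xml):

```xml
<!-- hdfs-site.xml: ask DataNodes to sync block files to disk on close,
     trading a little write latency for durability across power loss -->
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>
```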
On Tue, Jul 2, 2013 at 3:31 PM, Uma Maheswara Rao G wrote:
> Hi Dave,
>
> Looks like your analysis is correct. I have faced a similar issue some time
> back.
> See the discussion link:
> http://markm
Could anyone help answer the above questions? Thanks a lot!
2013/7/1 sam liu
> Thanks Pramod and Clark!
>
> 1. What's the relationship of Hadoop 2.x branch and mpich2-yarn project?
> 2. Does Hadoop 2.x branch plan to include MPI implementation? I mentioned
> there is already a JIRA:
> https://issu
Sam,
The fundamental idea of YARN is to split up the two major functionalities of
the JobTracker, resource management and job scheduling/monitoring, into
separate daemons. With YARN, we're going to be able to run multiple
workloads, one of them being MapReduce, another being MPI. So mpich2-yarn
Hi,
Which branch of Hadoop has the latest DistCp code?
The branch-1 mentions something like distcp2
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/tools/org/apache/hadoop/tools/distcp2/util/DistCpUtils.java
The trunk has no mention of distcp2
http://svn.apache.org/repos/asf/h
The trunk's distcp is by default distcp2. Branch-1 received a
backport of distcp2 recently, so it is named differently.
In general we try not to have a new feature introduced in branch-1.
All new features must go to the trunk first, before being back-ported
into maintained release branches.
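As an illustration, distcp on a trunk build is invoked the same way as before (the namenode addresses and paths below are hypothetical):

```
hadoop distcp hdfs://nn1:8020/source/dir hdfs://nn2:8020/dest/dir
```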
On T
Hello Harsh,
Thank you very much for your reply.
Regards,
Jagat
On Tue, Jul 2, 2013 at 8:29 PM, Harsh J wrote:
> The trunk's distcp is by default distcp2. Branch-1 received a
> backport of distcp2 recently, so it is named differently.
>
> In general we try not to have a new feature introduc
Hi Uma,
Thanks for the pointer. Your case sounds very similar. The main
differences that I see are that in my case it happened on all 3 replicas
and the power failure occurred merely seconds after the blocks were
finalized. So I guess the question is whether HDFS can do anything to
better recov
I wrote a script as below.
Data = LOAD 'part-r-0' AS (session_start_gmt:long)
FilterData = FILTER Data BY session_start_gmt=1369546091667
I get below error
2013-07-01 22:48:06,510 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1200: For input string: "1369546091667"
In detail log it
Replication also has downstream effects: it puts pressure on the available
network bandwidth and disk I/O bandwidth when the cluster is loaded.
john
From: Mohammad Tariq [mailto:donta...@gmail.com]
Sent: Monday, July 01, 2013 6:35 PM
To: user@hadoop.apache.org
Subject: Re: intermediate results fi
Devaraj,
Thanks, this is also good information. But I was really asking if a child
*process* that was spawned by a task can persist, in addition to the data.
john
From: Devaraj k [mailto:devara...@huawei.com]
Sent: Monday, July 01, 2013 11:50 PM
To: user@hadoop.apache.org
Subject: RE: YARN tasks
Hi John!
If your block is going to be replicated to three nodes, then in the default
block placement policy, 2 of them will be on the same rack, and a third one
will be on a different rack. Depending on the network bandwidths available
intra-rack and inter-rack, writing with replication factor=3
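As a toy illustration of that default policy (a deliberate simplification, not the actual BlockPlacementPolicyDefault code), the rack choices for replication factor 3 can be modeled like this:

```python
import random

def place_replicas(local_rack, racks, seed=None):
    """Toy model of HDFS default placement for replication factor 3:
    replica 1 stays on the writer's rack, replicas 2 and 3 go together
    onto one other rack. Returns one rack name per replica."""
    rng = random.Random(seed)
    # Pick any rack other than the writer's for the off-rack copies.
    remote = rng.choice([r for r in racks if r != local_rack])
    return [local_rack, remote, remote]
```

This makes the bandwidth trade-off visible: only one replica crosses racks during the write pipeline, while two replicas share a rack for read locality.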
Nopes! The node manager kills the entire process tree when the task reports
that it is done. Now if you were able to figure out a way for one of the
children to break out of the process tree, maybe?
However, your approach is obviously not recommended. You would be stealing from
the resources tha
I'm sure this has been asked a zillion times, so please just point me to the
JIRA comments: is there a feature underway to allow for re-writing of HDFS file
sections?
Thanks
John
I would like to hear your experiences working with large JSON data sets,
specifically:
1) How large is each JSON document?
2) Do they tend to be a single JSON doc per file, or multiples per file?
3) Do the JSON schemas change over time?
4) Are there interesting public data
I have YARN tasks that benefit from multicore scaling. However, they don't
*always* use more than one core. I would like to allocate containers based
only on memory, and let each task use as many cores as needed, without
allocating exclusive CPU "slots" in the scheduler. For example, on an 8-
Geelong,
1. These files will probably be some standard format like .gz or .bz2 or
.zip. In that case, pick an appropriate InputFormat. See e.g.
http://cotdp.com/2012/07/hadoop-processing-zip-files-in-mapreduce/,
http://stackoverflow.com/questions/14497572/reading-gzipped-file-in-hadoop
Hi all,
I would like some help/direction on implementing a custom join class. I
believe this is the way to address my task at hand, which is given 2
matrices in SequenceFile format, I wish to run operations on all pairs of
rows between them. The rows may not be equal in number. The actual
operatio
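As a minimal sketch of the all-pairs idea (plain Python over in-memory rows, not the SequenceFile/MapReduce version; `op` is a hypothetical per-pair operation you would supply):

```python
from itertools import product

def all_pairs(rows_a, rows_b, op):
    """Apply op to every (row_a, row_b) pair of two row collections;
    the two collections need not be equal in length."""
    return [op(a, b) for a, b in product(rows_a, rows_b)]
```

In MapReduce terms, a common way to realize this is a map-side cross join: replicate the smaller matrix to every task (e.g. via the distributed cache) and stream the larger one through the mappers.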
I believe this is the default behavior.
By default, only the memory limit on resources is enforced.
The capacity scheduler will use DefaultResourceCalculator to compute resource
allocation for containers by default, which also does not take CPU into account.
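For context, a sketch of the capacity-scheduler.xml switch that makes the scheduler consider CPU as well as memory (class name as of Hadoop 2.x; verify against your version's docs):

```xml
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```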
-Chuan
From: John Lilley [mailto:john.lil
CPU limits are only enforced if cgroups is turned on. With cgroups on,
they are only limited when there is contention, in which case tasks are
given CPU time in proportion to the number of cores requested for/allocated
to them. Does that make sense?
-Sandy
On Tue, Jul 2, 2013 at 9:50 AM, Chuan
Blah blah,
One point you might have missed: multiple tasks cannot all write the same HDFS
file at the same time. So you can't just split an output file into sections
and say "task1 write block1, etc". Typically each task outputs a separate file
and these file-parts are read or merged later.
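A minimal sketch of that pattern in plain Python (simulating per-task part files in a local temp directory, not actual HDFS; the final pass mirrors what `hadoop fs -getmerge` does):

```python
import os
import tempfile

def write_parts_and_merge(task_outputs):
    """Each 'task' writes its own part-NNNNN file; a final pass
    concatenates the parts in name order into one result."""
    with tempfile.TemporaryDirectory() as d:
        for i, data in enumerate(task_outputs):
            with open(os.path.join(d, "part-%05d" % i), "w") as f:
                f.write(data)
        parts = sorted(p for p in os.listdir(d) if p.startswith("part-"))
        merged = []
        for p in parts:
            with open(os.path.join(d, p)) as f:
                merged.append(f.read())
        return "".join(merged)
```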
jo
Blah blah,
Can you build and run the DistributedShell example? If it does not run
correctly, this would tend to implicate your configuration. If it runs
correctly, then your code is suspect.
John
From: blah blah [mailto:tmp5...@gmail.com]
Sent: Tuesday, June 25, 2013 6:09 PM
To: user@hadoop.apac
I don’t know the answer… but if it is possible to make the DNs report a
domain name instead of an IP quad, it may help.
John
From: Robin East [mailto:robin.e...@xense.co.uk]
Sent: Thursday, June 27, 2013 12:18 AM
To: user@hadoop.apache.org
Subject: Re: Exception in createBlockOutputStream - poss
Hadoop does not yet have an easy learning curve, so I'd recommend that you
start with Amazon Elastic MapReduce as a platform for experimenting and learning.
John
From: Vijaya Narayana Reddy Bhoomi Reddy [mailto:vijaya.bho...@huawei.com]
Sent: Friday, June 28, 2013 7:10 AM
To: user@hadoop.apache.org
Su
HDFS only supports regular writes and append. Random write is not
supported. I do not know of any feature/jira that is underway to support
this feature.
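For completeness, append is also exposed on the command line (the file paths below are hypothetical, and append support depends on your Hadoop version and configuration):

```
hdfs dfs -appendToFile localfile.txt /user/john/data.txt
```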
On Tue, Jul 2, 2013 at 9:01 AM, John Lilley wrote:
> I’m sure this has been asked a zillion times, so please just point me to
> the JIRA comme
Thanks, that answers my question. I am trying to explore alternatives to a
YARN auxiliary service, but apparently this isn’t an option.
John
From: Ravi Prakash [mailto:ravi...@ymail.com]
Sent: Tuesday, July 02, 2013 9:55 AM
To: user@hadoop.apache.org
Subject: Re: YARN tasks and child processes
Sandy,
Sorry, I don't completely follow.
When you say "with cgroups on", is that an attribute of the AM, the Scheduler,
or the Site/RM? In other words is it site-wide or something that my
application can control?
With cgroups on, is there still a way to get my desired behavior? I'd really
like
To explain my reasoning, suppose that I have an application that performs some
CPU-intensive calculation, and can scale to multiple cores internally, but it
doesn't need those cores all the time because the CPU-intensive phase is only a
part of the overall computation. I'm not sure I understand
Is there any convention for clients/applications wishing to use temporary file
space in HDFS? For example, my application wants to:
1) Load data into some temporary space in HDFS as an external client
2) Run an AM, which produces HDFS output (also in the temporary space)
3) Read
Use of cgroups for controlling CPU is off by default, but can be turned on
as a nodemanager configuration with
yarn.nodemanager.linux-container-executor.resources-handler.class. So it
is site-wide. If you want tasks to purely fight it out in the OS thread
scheduler, simply don't change from the d
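A sketch of the nodemanager setting Sandy mentions, in yarn-site.xml (class name as in Hadoop 2.x; check your distribution's documentation before relying on it):

```xml
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
```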
Sandy,
Thanks, I think I understand. So it only makes a difference if cgroups is on
AND the AM requests multiple cores? E.g. if each task wants 4 cores the RM
would only allow two containers per 8-core node?
John
From: Sandy Ryza [mailto:sandy.r...@cloudera.com]
Sent: Tuesday, July 02, 2013 1
Hi
Just a quick short reply (tomorrow is my prototype presentation).
@Omkar Joshi
- RM port 8030 already running when I start my AM
- I'll do the client thread size AM
- Only AM communicates with RM
- RM/NM no exceptions there (as far as I remember will check later [sorry])
Furthermore in fully
I found this:
https://issues.apache.org/jira/browse/HADOOP-5215
Doesn't seem to have attracted much interest.
John
From: Suresh Srinivas [mailto:sur...@hortonworks.com]
Sent: Tuesday, July 02, 2013 1:03 PM
To: hdfs-u...@hadoop.apache.org
Subject: Re: HDFS file section rewrite
HDFS only supports
That's correct.
-Sandy
On Tue, Jul 2, 2013 at 12:28 PM, John Lilley wrote:
> Sandy,
>
> Thanks, I think I understand. So it only makes a difference if cgroups is
> on AND the AM requests multiple cores? E.g. if each task wants 4 cores the
> RM would only allow two containers per 8-core n
Hi John, exactly what I was thinking, however I haven't found a way to do that.
If I ever have time I'll trawl through the code, however I've managed to avoid
the issue by placing both machines inside the firewall.
Regards
Robin
Sent from my iPhone
On 2 Jul 2013, at 19:48, John Lilley wrote:
Hello,
I have a Hadoop 2.0.5 Alpha cluster. When I execute any Hadoop command, I
see the following message.
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
Is it at the lib/native folder? How do I configure the s
Take a look here: http://search-hadoop.com/m/FXOOOTJruq1
On Tue, Jul 2, 2013 at 3:25 PM, Chui-Hui Chiu wrote:
> Hello,
>
> I have a Hadoop 2.0.5 Alpha cluster. When I execute any Hadoop command, I
> see the following message.
>
> WARN util.NativeCodeLoader: Unable to load native-hadoop library
Hi Dear all,
I just found it by chance; maybe you all already know it, but I'll share it here
again.
Yet Another Resource Negotiator—YARN
from:
http://adtmag.com/blogs/watersworks/2012/08/apache-yarn-promotion.aspx
Hi,
With Hadoop 2.0.4-alpha, yarn.nodemanager.resource.cpu-cores does not work
for me:
1. The performance of the same terasort job does not change, even after
increasing or decreasing the value of 'yarn.nodemanager.resource.cpu-cores'
in yarn-site.xml and restarting the YARN cluster.
2. Even if I