Ramya Sunil created MAPREDUCE-5766:
--
Summary: Ping messages from attempts should be moved to DEBUG
Key: MAPREDUCE-5766
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5766
Project: Hadoop Map
+1
Deployed on secure and nonsecure clusters. Ran some hdfs, MR and yarn
tests. Seems good to me.
On Mon, Oct 7, 2013 at 12:21 PM, Alejandro Abdelnur wrote:
> +1
>
> * downloaded source tarball
> * verified MD5
> * verified signature
> * verified CHANGES.txt files, release # and date
> * run 'mv
-1.
Some of the cli and distcp system tests which use hftp:// and webhdfs://
are failing on secure cluster (HDFS-4841 and HDFS-4952/HDFS-4896). This is
a regression and we need to make sure they work before we call a release.
On Wed, Jun 26, 2013 at 1:17 AM, Arun C Murthy wrote:
> Folks,
>
> I'
We have started testing branch-2.1-beta and for the most part the code looks
very stable. We have deployed both secure and non-secure multinode clusters.
We had some minor hiccups with some of our e2e tests breaking due to
additional setsid info being logged by the bin scripts and errors while
buildi
[
https://issues.apache.org/jira/browse/MAPREDUCE-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ramya Sunil resolved MAPREDUCE-858.
---
Resolution: Fixed
This is no longer an issue. JT gracefully shuts down if
+1 for the merge.
As someone who has been testing the code for many months now, both on
singlenode and multinode clusters, I am very confident about the stability
and the quality of the code. I have run several regression tests to verify
distributed cache, streaming, compression, capacity schedule
Type: Bug
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
The Local logs link redirects to the cluster page and the Server metrics link
opens an empty page on the RM/JHS homepage. So do the links from the nodemanager UI.
--
This message is automatically generated by JIRA
: 0.23.1
Reporter: Ramya Sunil
Priority: Critical
hadoop jar hadoop-mapreduce-test.jar loadgen -outKey org.apache.hadoop.io.Text
-outValue org.apache.hadoop.io.Text
The tasks fail with the following exception:
{noformat}
Error: java.lang.NullPointerException
at
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Priority: Minor
When containers run beyond memory limits, they are killed without logging any
useful message. Diagnostics message reads "Task attemptID failed 0 times" and
the console ou
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Priority: Minor
Type: Bug
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
"mapred job -list" lists only the jobs submitted by the user who ran the
command. This behavior is different from 1.x.
Versions: 0.23.1
Reporter: Ramya Sunil
Priority: Minor
Currently, there are no checks being made for misconfigured userLimit (such as
negative values/values >100). This can potentially be a problem if the RM comes
up with incorrect userLimit values.
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
We ran into this issue while testing on small clusters.
On a 7node cluster with 8G per node, for a queue with absolute capacity 30%,
user limit 100%, maxActiveApplications and
Type: Bug
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Priority: Minor
Occasionally, the capacity of the queue as displayed by "queue -list" has
incorrect values.
For example:
yarn.scheduler.capa
Type: Bug
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Fix For: 0.23.1
Currently the history for applications which were terminated/killed/failed
before the AM was launched redirects to a page that does not exist.
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Fix For: 0.23.1
RM attempts to assign containers to killed applications. The applications were
killed when they were inactive and waiting for AM allocation.
: Bug
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Assignee: Arun C Murthy
Fix For: 0.23.1
After a queue addition to capacity scheduler and submission of an application,
root queue utilization and used memory have negative
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Fix For: 0.23.1
Stumbled upon this problem while refreshing queues with incorrect
configuration. The exact scenario was:
1. Added a new queue
: Improvement
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Fix For: 0.23.1
The AM info field on "bin/mapred job -list" currently has a value
:8088/proxy/appID. This info is irrelevant unless it
shows the real information of where
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Priority: Minor
Fix For: 0.23.1
Blacklisted NMs appear in both "Active Nodes" and "Unhealthy nodes" on the RM
UI. This should be fixed.
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Fix For: 0.23.1
When there are a few blacklisted nodes in the cluster, "bin/mapred job
-list-active-trackers"
: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Priority: Minor
Fix For: 0.23.1
Currently, the RM has nodeUpdate logs per NM per second such as the following:
2012-01-27 21:51:32,429 INFO
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Fix For: 0.23.1
Memory Total on the RM UI is not refreshed until an application is launched.
This is a problem when the cluster is started for the first time or when there
are any lost/decommissioned
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Fix For: 0.23.1
Nodemanagers are not automatically shut down after decommissioning.
MAPREDUCE-2775 does not seem to fix the issue.
nts: pipes
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Fix For: 0.23.1
Pipes jobs fail with "Hadoop Pipes Exception: Illegal text protocol command"
Type: Bug
Components: client
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Priority: Minor
Fix For: 0.23.1
The URL information to track the job is printed for all the "mapred job" mrv2
commands. This information is redundant and has to be removed.
E
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Fix For: 0.23.1
MR command line options are not supported in bin/hadoop.
{noformat}
bin/hadoop job
Exception in thread "main" java.lang.NoClassDefFoundError: job
Caused by: java.lang.ClassNotFoundException: job
Reporter: Ramya Sunil
Priority: Critical
Fix For: 0.23.0
compile-mapred-test target is failing once again.
Details:
https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-Mapreduce-0.23-Build/83/consoleFull
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
Make provision to report the AM hostname of an application in the RM/JHS UI.
It is difficult to trace back the AM on which an app ran when there are 100+
jobs in history. Digging through the logs is an
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Giridharan Kesavan
Priority: Critical
Fix For: 0.23.0
MR builds are failing due to unresolved dependencies.
[ivy:resolve] :: problems summary ::
[ivy:resolve] WARNINGS
[ivy:resolve
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
There is a lot of redundant information being printed on the console and a not
so intuitive flow of events. We should improve the logging on console during
job execution. More details
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
bin/mapred job -list-blacklisted-trackers currently prints
"getBlacklistedTrackers - Not implemented yet". This is a long-pending issue.
Could not find a tracking tic
Type: Improvement
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
The reason for killing a speculated task has to be logged. Currently, a
speculated task is killed with a note of "Container killed by the
ApplicationM
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
When apps fail, the reason for failure is not correctly reflected in the UI.
For one such app failure, the UI reports "Application failed 1 times
due to . Failing the applic
: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
In secure mode, saw an app failure due to
"org.apache.hadoop.security.token.SecretManager$InvalidToken: token
(HDFS_DELEGATION_TOKEN token for ) can't be foun
Reporter: Ramya Sunil
Fix For: 0.23.0
bin/mapred -list-active-trackers throws NPE in mrV2. Trace in the next comment.
Reporter: Ramya Sunil
Priority: Blocker
Fix For: 0.23.0
Hadoop mapreduce 0.23 builds are failing.
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
bin/mapred queue fails with the following exception:
{code}
-bash$ bin/mapred queue
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apa
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
bin/yarn classpath does not display the complete classpath. Below is what the
classpath looks like:
{noformat}
$HADOOP_CONF_DIR:$HADOOP_CONF_DIR::$TOOLS_JAR:$HADOOP_COMMON_HOME
Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
bin/mapred job -list throws exception when mapreduce.framework.name is set to
"yarn"
Reporter: Ramya Sunil
Fix For: 0.23.0
Lost nodemanagers fail to join back.
When the NM is lost, RM log reads
{noformat}
INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:
Timed out after 600 secs
INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl
Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
When nodemanagers are lost, the "Lost Nodes" list and count are not
updated. Either we,
1. Fix the lost nodes list when a nodemanager is lost - The problem with
tracking lost nodes is, if the nodeman
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Priority: Blocker
A decommissioned node is not being removed from the "Total nodes" list and is
not added to the "Decommissioned nodes" list.
The list of nodes to decommission is added
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Priority: Blocker
The jobsummary logs are not being moved to a separate file. Below is the
configuration in log4j.properties:
{noformat}
mapred.jobsummary.logger=INFO,console
Versions: 0.23.0
Reporter: Ramya Sunil
Compile mapred test target is broken due to which the builds are not archiving
the test jars.
Type: Bug
Components: contrib/streaming, mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
ClassNotFoundException: org.apache.hadoop.streaming.PipeMapRunner encountered
while running streaming jobs. Stack trace in the next comment.
, mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
The tracking URL for streaming jobs currently displays "http://N/A"
{noformat}
INFO streaming.StreamJob: To kill this job, run:
INFO streaming.StreamJob: hadoop job -kill
INFO streaming
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 0.20.205.0
Reporter: Ramya Sunil
When the libjars option is used with streaming, the symlink to the jar file is
not created in the working dir of the task. Any map/reduce task which uses this
jar fails with
Type: Bug
Components: contrib/streaming
Affects Versions: 0.20.205.0
Reporter: Ramya Sunil
Fix For: 0.20.205.0
Dfs calls from streaming seem to fail with the following error:
{noformat}
Exception in thread "main" java.lang.ExceptionInInitia
Reporter: Ramya Sunil
Priority: Critical
Mapreduce trunk commit builds are failing due to test failures.
See
https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-Mapreduce-trunk-Commit/946/testReport/
for more details.
[
https://issues.apache.org/jira/browse/MAPREDUCE-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ramya Sunil resolved MAPREDUCE-2763.
Resolution: Fixed
I see this issue being fixed in the latest code base. Hence
: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Priority: Minor
Fix For: 0.23.0
HADOOP_CONF_DIR is exported twice in the classpath during RM, NM and container
startup time. Not an issue so far but seems redundant.
/Reduce
Issue Type: Bug
Components: jobhistoryserver, mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Priority: Minor
Fix For: 0.23.0
The elapsed time filter on the jobhistory server filters incorrect information.
For e.g. on a cluster
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
A simple example pipes job gets stuck without making any progress. The AM is
launched but the maps do not make any progress.
: resourcemanager
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
The containers info on the nodes page on the RM seems to be missing. This was
useful in understanding the usage on each of the nodemanagers.
: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
The node ID info for the nodemanager entries on the RM UI incorrectly displays
the value of $yarn.server.nodemanager.address instead of the ID.
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
The nodemanager entries on the RM UI are not sortable, unlike the other web
pages.
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
Currently, if File sink is enabled for MRAppMaster or Resourcemanager, it does
not populate the file with all the available attributes. It would be useful for
debugging and admin purpose to have
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Priority: Minor
Fix For: 0.23.0
The following fields are missing in the job summary logs in mrv2:
- numSlotsPerMap
- numSlotsPerReduce
- clusterCapacity (Earlier known as clusterMapCapacity and
/Reduce
Issue Type: Improvement
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Priority: Minor
Fix For: 0.23.0
For jobID such as job_1312933838300_0007, jobhistory file names are named as
job%5F1312933838300
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
For the child tasks in mrv2, java.library.path is set to just $PWD and the
native libs are not included. Whereas in 0.20.x, java.library.path for child
tasks was set to :$PWD
Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
clockSplits, cpuUsages, vMemKbytes, physMemKbytes are set to -1 for all the map
tasks for the last 4 progress intervals in the jobhistory
[
https://issues.apache.org/jira/browse/MAPREDUCE-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ramya Sunil resolved MAPREDUCE-2799.
Resolution: Duplicate
Devaraj, there is already a known bug MAPREDUCE-2686 for the
: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
The start time for all the apps in the output of "job -list" is set to 0
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
The AppsKilled metric is never incremented even though there are killed jobs in
the system.
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
AvailableGB per queue is not the same as AvailableGB per queue per user when
the user limit is set to 100%.
i.e. if the total available GB of the cluster is 60
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
appIDs, jobIDs and attempt/container ids are not consistently named in the
logs, console and UI. For consistency, they all have to follow a common
: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
Currently, all the logs, UI, CLI have IP addresses of the NM/RM, which are
difficult to manage. It will be useful to have hostnames like in 0.20.x for
easier debugging and maintenance purpose
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
There are a couple of details missing/incorrect on the job -status command line
output for completed jobs:
1. Incorrect job file
2. map() completion is always 0
3. reduce() completion is
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
bin/mapred job [-list [all]] displays the AM or job history location in the
"SchedulingInfo" field. An additional column has to be added to d
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
The scheduling information such as number of containers running, memory usage
and reservations per job is not available on bin/mapred job -list CLI.
: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
A Nodemanager which is decommissioned by an admin via refreshnodes does not
automatically shut down.
Versions: 0.23.0
Reporter: Ramya Sunil
Priority: Minor
Fix For: 0.23.0
Add a startup message while starting the NM/RM indicating the version, build
details, etc. This will help in easier parsing of logs and debugging.
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
The NMs are not being blacklisted via the node health script. Below is the
configuration used:
yarn.server.nodemanager.healthchecker.script.path
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
Currently, the files in both the public and private dist cache have 777
permissions. Also, the group ownership of files in the private cache has to be
set to $TT_SPECIAL_GROUP
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
IllegalArgumentException is seen while using distributed cache to cache some
files and custom jars in classpath.
A simple way to reproduce this error is by using a streaming job:
hadoop jar
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
The files created under the staging dir have to be deleted after job
completion. Currently, all job.* files remain forever in the
${yarn.apps.stagingDir}
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Priority: Minor
Fix For: 0.23.0
Currently, there is no log info available about the actual location of the
file/archive in dist cache being used by the task except for the "ln" command
in t
Type: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Priority: Minor
Fix For: 0.23.0
A redundant directory called "file:" is being created under
${yarn.server.nodemanager.local-dir}/usercache/${username}/appcache/appI
: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
Currently the AM logs are written to $YARN_LOG_DIR/appID/containerID/stderr. In
order to maintain consistency with other container logs, it probably should be
moved to syslog.
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Priority: Minor
Fix For: 0.23.0
Error messages flagging the reason for app failures are currently being moved
to stdout of container logs instead of stderr.
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
Cluster usage information such as the following is currently not available in
the RM UI:
- Total number of apps submitted so far
- Total number of containers running/total memory usage
- Total capacity of the
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
The progress of the jobs is not being correctly updated on the client side.
The map progress halts at 66% and neither the map nor the reduce progress
displays 100% when the job completes.
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
Currently, the web page for default scheduler reads as "Under construction".
This is a long known issue, but could not find a tracking ticket. Hence opening
one.
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
In cases where an AM is not being assigned to a job, RELEASED at COMPLETED
invalid event is observed. This is easily reproducible in cases such as
MAPREDUCE-2687.
: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
Apps submitted by non-superusers fail in a non-secure environment. Only the
superuser (i.e. the one who started/owns the mrv2 cluster) is able to launch apps
successfully. However
[
https://issues.apache.org/jira/browse/MAPREDUCE-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ramya Sunil resolved MAPREDUCE-1986.
Resolution: Duplicate
This issue is exactly the same as MAPREDUCE-2463. Since there is
: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
While performing job related operations such as job -kill, -status, -events etc
for an unknown job, the following NPE is seen:
Exception in thread "main" java.lang.NullPointerException
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Fix For: 0.23.0
Accessing the following pages from the history server causes a 404 HTTP error:
1. Cluster-> About
2. Cluster -> Applications
3. Cluster -> Scheduler
4. Application