Hi
The hadoop-daemon.sh script prints "no $command to stop" if it doesn't find the pid file. You should echo the $pid variable and see if you have a correct pid file there.
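For example, something along these lines (a rough sketch only; the pid file is named hadoop-<user>-<command>.pid, and you mentioned your pid files land in /var/run, so adjust the path to your setup; the script falls back to /tmp if HADOOP_PID_DIR is unset):

# which pid directory is configured?
grep HADOOP_PID_DIR $HADOOP_CONF_DIR/hadoop-env.sh
# does the datanode pid file exist, and is the pid in it alive?
cat /var/run/hadoop-hdfs-datanode.pid
ps -fp "$(cat /var/run/hadoop-hdfs-datanode.pid)"
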
Ulul

On 02/03/2015 13:53, Daniel Klinger wrote:

Thanks for your help. But unfortunately this didn't do the job. Here's the shell script I've written to start my cluster (the scripts on the other node contain only the single command to start the DataNode or the NodeManager there, run as the right user (hdfs / yarn); see the note below the script):

#!/bin/bash

# Start HDFS -------------------------------------------------------------------------------------------------------------------------

# Start NameNode
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode"
wait

# Start all DataNodes
export HADOOP_SECURE_DN_USER=hdfs
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode"
wait
ssh root@hadoop-data.klinger.local 'bash startDatanode.sh'
wait

# Start ResourceManager
su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager"
wait

# Start NodeManager on all nodes
su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager"
wait
ssh root@hadoop-data.klinger.local 'bash startNodemanager.sh'
wait

# Start ProxyServer
#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn start proxyserver --config $HADOOP_CONF_DIR"
#wait

# Start HistoryServer
su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR"
wait
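(As a note: startDatanode.sh on the other node is essentially just this one line, and startNodemanager.sh is the corresponding yarn-daemon.sh line run as the yarn user:)

#!/bin/bash
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode"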

This script generates the following output:

starting namenode, logging to /var/log/cluster/hadoop/hadoop-hdfs-namenode-hadoop.klinger.local.out
starting datanode, logging to /var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop.klinger.local.out
starting datanode, logging to /var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop-data.klinger.local.out
starting resourcemanager, logging to /var/log/cluster/yarn/yarn-yarn-resourcemanager-hadoop.klinger.local.out
starting nodemanager, logging to /var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop.klinger.local.out
starting nodemanager, logging to /var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop-data.klinger.local.out
starting historyserver, logging to /var/log/cluster/mapred/mapred-mapred-historyserver-hadoop.klinger.local.out

Following are my stop script and its output:

#!/bin/bash

# Stop HDFS------------------------------------------------------------------------------------------------

# Stop NameNode
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode"

# Stop all DataNodes
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode"
ssh root@hadoop-data.klinger.local 'bash stopDatanode.sh'

# Stop ResourceManager
su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager"

# Stop NodeManager on all hosts
su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager"
ssh root@hadoop-data.klinger.local 'bash stopNodemanager.sh'

# Stop ProxyServer
#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn stop proxyserver --config $HADOOP_CONF_DIR"

# Stop HistoryServer
su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR"

stopping namenode
no datanode to stop
no datanode to stop
stopping resourcemanager
stopping nodemanager
stopping nodemanager
nodemanager did not stop gracefully after 5 seconds: killing with kill -9
stopping historyserver

Is there maybe anything wrong with my commands?

Greets

DK

*From:* Varun Kumar [mailto:varun....@gmail.com]
*Sent:* Monday, March 2, 2015 05:28
*To:* user
*Subject:* Re: Hadoop 2.6.0 - No DataNode to stop

1. Stop the service.

2. Change the permissions for the log and pid directory once again to hdfs.

3. Start the service as hdfs.

This will resolve the issue.
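For example (just a sketch: the pid and log paths here are guesses based on this thread, /var/run and /var/log/cluster/hadoop, and the hadoop group is an assumption; adjust all of them to your layout):

# 1. stop the datanode (as the user who started it)
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode"
# 2. hand the log and pid locations back to hdfs
chown -R hdfs:hadoop /var/log/cluster/hadoop
chown hdfs:hadoop /var/run/hadoop-*.pid
# 3. start the datanode again as hdfs
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode"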

On Sun, Mar 1, 2015 at 6:40 PM, Daniel Klinger <d...@web-computing.de> wrote:

    Thanks for your answer.

    I put the FQDN of the DataNodes in the slaves file on each node
    (one FQDN per line), i.e. on this cluster the slaves file on both
    hosts is just:
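        hadoop.klinger.local
        hadoop-data.klinger.local

    Here's the full DataNode log after the start (the log of the other
    DataNode is exactly the same):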

    2015-03-02 00:29:41,841 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
    2015-03-02 00:29:42,207 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
    2015-03-02 00:29:42,312 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
    2015-03-02 00:29:42,313 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
    2015-03-02 00:29:42,319 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is hadoop.klinger.local
    2015-03-02 00:29:42,327 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 0
    2015-03-02 00:29:42,350 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010
    2015-03-02 00:29:42,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
    2015-03-02 00:29:42,358 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 5
    2015-03-02 00:29:42,458 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
    2015-03-02 00:29:42,462 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined
    2015-03-02 00:29:42,474 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
    2015-03-02 00:29:42,476 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
    2015-03-02 00:29:42,476 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
    2015-03-02 00:29:42,476 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
    2015-03-02 00:29:42,494 INFO org.apache.hadoop.http.HttpServer2: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.datanode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
    2015-03-02 00:29:42,499 INFO org.mortbay.log: jetty-6.1.26
    2015-03-02 00:29:42,555 WARN org.mortbay.log: Can't reuse /tmp/Jetty_0_0_0_0_50075_datanode____hwtdwq, using /tmp/Jetty_0_0_0_0_50075_datanode____hwtdwq_3168831075162569402
    2015-03-02 00:29:43,205 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50075
    2015-03-02 00:29:43,635 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hdfs
    2015-03-02 00:29:43,635 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = supergroup
    2015-03-02 00:29:43,802 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
    2015-03-02 00:29:43,823 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
    2015-03-02 00:29:43,875 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:50020
    2015-03-02 00:29:43,913 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
    2015-03-02 00:29:43,953 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
    2015-03-02 00:29:43,973 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to hadoop.klinger.local/10.0.1.148:8020 starting to offer service
    2015-03-02 00:29:43,981 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
    2015-03-02 00:29:43,982 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
    2015-03-02 00:29:44,620 INFO org.apache.hadoop.hdfs.server.common.Storage: DataNode version: -56 and NameNode layout version: -60
    2015-03-02 00:29:44,641 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /cluster/storage/datanode/in_use.lock acquired by nodename 1660@hadoop.klinger.local
    2015-03-02 00:29:44,822 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-158097147-10.0.1.148-1424966425688
    2015-03-02 00:29:44,822 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled
    2015-03-02 00:29:44,825 INFO org.apache.hadoop.hdfs.server.common.Storage: Restored 0 block files from trash.
    2015-03-02 00:29:44,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=330980018;bpid=BP-158097147-10.0.1.148-1424966425688;lv=-56;nsInfo=lv=-60;cid=CID-a2c81934-b3ce-44aa-b920-436ee2f0d5a7;nsid=330980018;c=0;bpid=BP-158097147-10.0.1.148-1424966425688;dnuuid=a3b6c890-41ca-4bde-855c-015c67e6e0df
    2015-03-02 00:29:44,996 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: /cluster/storage/datanode/current
    2015-03-02 00:29:44,998 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /cluster/storage/datanode/current, StorageType: DISK
    2015-03-02 00:29:45,035 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean
    2015-03-02 00:29:45,057 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Periodic Directory Tree Verification scan starting at 1425265856057 with interval 21600000
    2015-03-02 00:29:45,064 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-158097147-10.0.1.148-1424966425688
    2015-03-02 00:29:45,071 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-158097147-10.0.1.148-1424966425688 on volume /cluster/storage/datanode/current...
    2015-03-02 00:29:45,128 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-158097147-10.0.1.148-1424966425688 on /cluster/storage/datanode/current: 56ms
    2015-03-02 00:29:45,128 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to scan all replicas for block pool BP-158097147-10.0.1.148-1424966425688: 64ms
    2015-03-02 00:29:45,128 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-158097147-10.0.1.148-1424966425688 on volume /cluster/storage/datanode/current...
    2015-03-02 00:29:45,129 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-158097147-10.0.1.148-1424966425688 on volume /cluster/storage/datanode/current: 0ms
    2015-03-02 00:29:45,134 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to add all replicas to map: 5ms
    2015-03-02 00:29:45,138 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid null) service to hadoop.klinger.local/10.0.1.148:8020 beginning handshake with NN
    2015-03-02 00:29:45,316 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid null) service to hadoop.klinger.local/10.0.1.148:8020 successfully registered with NN
    2015-03-02 00:29:45,316 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode hadoop.klinger.local/10.0.1.148:8020 using DELETEREPORT_INTERVAL of 300000 msec BLOCKREPORT_INTERVAL of 21600000msec CACHEREPORT_INTERVAL of 10000msec Initial delay: 0msec; heartBeatInterval=3000
    2015-03-02 00:29:45,751 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Namenode Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid a3b6c890-41ca-4bde-855c-015c67e6e0df) service to hadoop.klinger.local/10.0.1.148:8020 trying to claim ACTIVE state with txid=24
    2015-03-02 00:29:45,751 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE Namenode Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid a3b6c890-41ca-4bde-855c-015c67e6e0df) service to hadoop.klinger.local/10.0.1.148:8020
    2015-03-02 00:29:45,883 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Sent 1 blockreports 0 blocks total. Took 4 msec to generate and 126 msecs for RPC and NN processing. Got back commands org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@3d528774
    2015-03-02 00:29:45,883 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for block pool BP-158097147-10.0.1.148-1424966425688
    2015-03-02 00:29:45,891 INFO org.apache.hadoop.util.GSet: Computing capacity for map BlockMap
    2015-03-02 00:29:45,891 INFO org.apache.hadoop.util.GSet: VM type       = 64-bit
    2015-03-02 00:29:45,893 INFO org.apache.hadoop.util.GSet: 0.5% max memory 966.7 MB = 4.8 MB
    2015-03-02 00:29:45,893 INFO org.apache.hadoop.util.GSet: capacity      = 2^19 = 524288 entries
    2015-03-02 00:29:45,894 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-158097147-10.0.1.148-1424966425688
    2015-03-02 00:29:45,900 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added bpid=BP-158097147-10.0.1.148-1424966425688 to blockPoolScannerMap, new size=1

    dfsadmin -report (called as user hdfs on the NameNode) generated the
    following output. It looks like both DataNodes are available:

    Configured Capacity: 985465716736 (917.79 GB)
    Present Capacity: 929892360192 (866.03 GB)
    DFS Remaining: 929892302848 (866.03 GB)
    DFS Used: 57344 (56 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0

    -------------------------------------------------
    Live datanodes (2):

    Name: 10.0.1.148:50010 (hadoop.klinger.local)
    Hostname: hadoop.klinger.local
    Decommission Status : Normal
    Configured Capacity: 492732858368 (458.89 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 27942051840 (26.02 GB)
    DFS Remaining: 464790777856 (432.87 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 94.33%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Mon Mar 02 00:38:00 CET 2015

    Name: 10.0.1.89:50010 (hadoop-data.klinger.local)
    Hostname: hadoop-data.klinger.local
    Decommission Status : Normal
    Configured Capacity: 492732858368 (458.89 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 27631304704 (25.73 GB)
    DFS Remaining: 465101524992 (433.16 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 94.39%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Mon Mar 02 00:37:59 CET 2015

    Any further thoughts?

    Greets

    DK

    *From:* Ulul [mailto:had...@ulul.org]
    *Sent:* Sunday, March 1, 2015 13:12
    *To:* user@hadoop.apache.org
    *Subject:* Re: Hadoop 2.6.0 - No DataNode to stop

    Hi

    Did you check that your slaves file is correct?
    That the datanode process is actually running?
    Did you check its log file?
    That the datanode is available (dfsadmin -report, or through the web UI)?
    See the quick sketch below for the corresponding commands.
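
    For example (a rough sketch; 50070 is the default NameNode web UI port in Hadoop 2.x):

    cat $HADOOP_CONF_DIR/slaves               # one datanode FQDN per line?
    su - hdfs -c 'jps'                        # is a DataNode process listed?
    su - hdfs -c 'hdfs dfsadmin -report'      # does the NN see the datanodes?
    # web UI: http://<namenode-host>:50070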

    We need more detail.

    Ulul

    On 28/02/2015 22:05, Daniel Klinger wrote:

        Thanks, but I know how to kill a process in Linux. That didn't
        answer the question why the command says "no datanode to stop"
        instead of stopping the DataNode:

        $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode

        *From:* Surbhi Gupta [mailto:surbhi.gupt...@gmail.com]
        *Sent:* Saturday, February 28, 2015 20:16
        *To:* user@hadoop.apache.org
        *Subject:* Re: Hadoop 2.6.0 - No DataNode to stop

        Issue jps and get the process id, or
        try to get the process id of the datanode.

        Issue ps -fu <userid> for the user the datanode is running as.

        Then kill the process using kill -9.
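
        For example (a rough sketch; assuming the datanode runs as the hdfs user):

        jps                      # note the DataNode pid
        ps -fu hdfs              # or find it in the hdfs user's process list
        kill -9 <datanode-pid>   # <datanode-pid> is the pid found above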

        On 28 Feb 2015 09:38, "Daniel Klinger" <d...@web-computing.de> wrote:

            Hello,

            I have used a lot of Hadoop distributions. Now I'm trying to
            install a pure Hadoop on a little "cluster" for testing
            (2 CentOS VMs: one Name+DataNode, one DataNode). I followed
            the instructions on the documentation site:
            http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

            I'm starting the cluster as described in the chapter
            "Operating the Hadoop Cluster" (with different users). The
            starting process works great: the PID files are created in
            /var/run, you can see that folders and files are created in
            the Data- and NameNode directories, and I'm getting no errors
            in the log files.

            When I try to stop the cluster, all services are stopped
            (NameNode, ResourceManager etc.). But when I stop the
            DataNodes I get the message "No DataNode to stop". The PID
            file and the in_use.lock file are still there, and if I try
            to start the DataNode again I get the error that the process
            is already running. When I stop the DataNode as hdfs instead
            of root, the PID and in_use files are removed, but I'm still
            getting the message "No DataNode to stop".

            What am I doing wrong?

            Greets

            dk



--

Regards,

Varun Kumar.P

