[ https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LINTE updated HDFS-8897:
------------------------
    Summary: Loadbalancer always exits with : java.io.IOException: Another Balancer is running..  Exiting ...  (was: Loadbalancer )

> Loadbalancer always exits with : java.io.IOException: Another Balancer is running..  Exiting ...
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8897
>                 URL: https://issues.apache.org/jira/browse/HDFS-8897
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.7.1
>         Environment: Centos 6.6
>            Reporter: LINTE
>
> When the balancer is launched, it should check whether a /system/balancer.id file already exists in HDFS.
> Even when the file does not exist, the balancer refuses to run:
> 15/08/14 16:35:12 INFO balancer.Balancer: namenodes  = [hdfs://sandbox/, hdfs://sandbox]
> 15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
> Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
> 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
> 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
> 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
> 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
> 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
> java.io.IOException: Another Balancer is running..  Exiting ...
> Aug 14, 2015 4:35:14 PM  Balancing took 2.408 seconds
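The `namenodes` line in the log lists the same nameservice twice, once with a trailing slash. A minimal sketch using only `java.net.URI` shows why both spellings can survive set-based deduplication: `URI.equals` compares the path component, and `/` differs from the empty path. `NameNodeUriCheck` and `normalize` are hypothetical names, not Hadoop code, and whether this duplicate entry is what triggers the error is an assumption to verify, not a confirmed diagnosis.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;

public class NameNodeUriCheck {
    // Deduplicates URIs with a set; URI.equals compares paths, so
    // "hdfs://sandbox/" (path "/") and "hdfs://sandbox" (path "") stay distinct.
    static Set<URI> collect(String... uris) {
        Set<URI> result = new LinkedHashSet<>();
        for (String u : uris) {
            result.add(URI.create(u));
        }
        return result;
    }

    // Hypothetical normalization: drop a lone trailing "/" path so both
    // spellings of the same nameservice collapse to one entry.
    static URI normalize(URI uri) {
        if ("/".equals(uri.getPath())) {
            return URI.create(uri.getScheme() + "://" + uri.getAuthority());
        }
        return uri;
    }

    public static void main(String[] args) {
        Set<URI> raw = collect("hdfs://sandbox/", "hdfs://sandbox");
        System.out.println(raw.size()); // 2

        Set<URI> normalized = new LinkedHashSet<>();
        for (URI u : raw) {
            normalized.add(normalize(u));
        }
        System.out.println(normalized.size()); // 1
    }
}
```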
> The NameNode audit log for a balancer run shows that the balancer creates /system/balancer.id and then deletes it on exit:
> 2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true   ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x       cmd=getfileinfo src=/system/balancer.id dst=null        perm=null       proto=rpc
> 2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true   ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x       cmd=create      src=/system/balancer.id dst=null        perm=hdfs:hadoop:rw-r-----      proto=rpc
> 2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true   ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x       cmd=getfileinfo src=/system/balancer.id dst=null        perm=null       proto=rpc
> 2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true   ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x       cmd=getfileinfo src=/system/balancer.id dst=null        perm=null       proto=rpc
> 2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true   ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x       cmd=getfileinfo src=/system/balancer.id dst=null        perm=null       proto=rpc
> 2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true   ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x       cmd=delete      src=/system/balancer.id dst=null        perm=null       proto=rpc
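The audit entries can be reduced to the sequence of operations on the id file. This is a minimal sketch that assumes the whitespace-separated `key=value` layout seen in these entries, one entry per line as in the audit log file itself; `AuditTrace` and `commandsFor` are hypothetical helpers, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AuditTrace {
    // Captures the cmd= and src= fields of one audit entry. Keying on field
    // names avoids splitting the ugi value, which contains a space.
    private static final Pattern CMD_SRC =
            Pattern.compile("\\bcmd=(\\S+)\\s+src=(\\S+)");

    // Returns the commands applied to `path`, in the order they were logged.
    static List<String> commandsFor(List<String> lines, String path) {
        List<String> cmds = new ArrayList<>();
        for (String line : lines) {
            Matcher m = CMD_SRC.matcher(line);
            if (m.find() && m.group(2).equals(path)) {
                cmds.add(m.group(1));
            }
        }
        return cmds;
    }

    public static void main(String[] args) {
        String prefix = "INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x ";
        List<String> lines = Arrays.asList(
                prefix + "cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc",
                prefix + "cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r----- proto=rpc",
                prefix + "cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc");
        System.out.println(commandsFor(lines, "/system/balancer.id"));
        // [getfileinfo, create, delete]
    }
}
```

Applied to the six entries quoted in the report, this would yield getfileinfo, create, getfileinfo, getfileinfo, getfileinfo, delete: a single create, followed by several existence checks before the delete.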
> The error appears to be in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java.
> The method checkAndMarkRunning returns null even when /system/balancer.id does not exist before it is called; if the file does exist, it is deleted and the balancer exits with the same error.
> ----
>   private OutputStream checkAndMarkRunning() throws IOException {
>     try {
>       if (fs.exists(idPath)) {
>         // try appending to it so that it will fail fast if another balancer is
>         // running.
>         // running.
>         IOUtils.closeStream(fs.append(idPath));
>         fs.delete(idPath, true);
>       }
>       final FSDataOutputStream fsout = fs.create(idPath, false);
>       // mark balancer idPath to be deleted during filesystem closure
>       fs.deleteOnExit(idPath);
>       if (write2IdFile) {
>         fsout.writeBytes(InetAddress.getLocalHost().getHostName());
>         fsout.hflush();
>       }
>       return fsout;
>     } catch (RemoteException e) {
>       if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
>         return null;
>       } else {
>         throw e;
>       }
>     }
>   }
> ----
> Regards
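The control flow of `checkAndMarkRunning` can be modeled without a cluster. The following is a hypothetical in-memory simulation, not Hadoop code: `Namespace`, `AlreadyBeingCreated` and the client names are invented, and HDFS lease handling is reduced to one lease holder per path. Under these assumptions the method only reports another balancer while some other client still holds the id file open, so a `null` on a fresh run points to a second writer within the same run. The balancer log above lists two namenode URIs and shows the block-key setup twice; whether two connectors in one process are that second writer is an assumption to verify.

```java
import java.util.HashMap;
import java.util.Map;

public class BalancerIdSimulation {
    static final String ID_PATH = "/system/balancer.id";

    // Stand-in for a RemoteException wrapping AlreadyBeingCreatedException.
    static class AlreadyBeingCreated extends Exception {
        AlreadyBeingCreated(String message) { super(message); }
    }

    // Hypothetical in-memory namespace: each path maps to the client holding
    // its write lease, or to null once the file has been closed.
    static class Namespace {
        private final Map<String, String> files = new HashMap<>();

        boolean exists(String path) { return files.containsKey(path); }

        // Models create(path, overwrite=false).
        void create(String path, String client) throws AlreadyBeingCreated {
            String holder = files.get(path);
            if (holder != null) {
                throw new AlreadyBeingCreated(path + " is open by " + holder);
            }
            if (files.containsKey(path)) {
                throw new IllegalStateException(path + " already exists");
            }
            files.put(path, client);
        }

        // Models append: refused while any client holds the lease.
        void append(String path, String client) throws AlreadyBeingCreated {
            String holder = files.get(path);
            if (holder != null) {
                throw new AlreadyBeingCreated(path + " is open by " + holder);
            }
            files.put(path, client);
        }

        // Releases the lease but keeps the file.
        void close(String path) {
            if (files.containsKey(path)) {
                files.put(path, null);
            }
        }

        void delete(String path) { files.remove(path); }
    }

    // Mirrors the control flow of checkAndMarkRunning: true stands for a
    // returned stream, false for the null behind "Another Balancer is running".
    static boolean checkAndMarkRunning(Namespace ns, String client) {
        try {
            if (ns.exists(ID_PATH)) {
                ns.append(ID_PATH, client); // fails fast if another writer holds the file
                ns.close(ID_PATH);
                ns.delete(ID_PATH);
            }
            ns.create(ID_PATH, client);
            return true;
        } catch (AlreadyBeingCreated e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical: two connectors in the same run, one per namenodes entry.
        Namespace ns = new Namespace();
        System.out.println(checkAndMarkRunning(ns, "connector-1")); // true
        System.out.println(checkAndMarkRunning(ns, "connector-2")); // false

        // A closed id file left by an earlier run is deleted and recreated.
        Namespace stale = new Namespace();
        stale.create(ID_PATH, "old-run");
        stale.close(ID_PATH);
        System.out.println(checkAndMarkRunning(stale, "connector-1")); // true
    }
}
```

The stale-file case shows that, in this model, a leftover closed id file on its own does not produce the error; only a concurrent open writer does.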



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
