
Hadoop 2.4 Source Code Analysis

Published: 2025-12-01 Author: 千家信息网 editor

Last updated by 千家信息网: 2025-12-01



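ZKFailoverController is the coordinator of the whole HA mechanism. Below we analyze two practical questions.

1. How is the election coordinated, and how is the active node chosen?

2. After the active node goes down, what happens, and how does the switch take place?

Let's start with the first question: how is the election coordinated, and how is the active node chosen?

Step 1: The NameNode source shows that an HA-enabled NN always starts in the Standby state. The only exception is an upgrade (StartupOption.UPGRADE), in which case it starts as Active.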

protected HAState createHAState(StartupOption startOpt) {
  if (!haEnabled || startOpt == StartupOption.UPGRADE) {
    return ACTIVE_STATE;
  } else {
    return STANDBY_STATE; // standby state
  }
}

Step 2: The HealthMonitor watches the NN. When it finds the NN healthy, it executes:

if (healthy) {
  // set the state, which is used to notify the callbacks
  enterState(State.SERVICE_HEALTHY);
}

enterState notifies the registered callbacks, which handle the change. For the SERVICE_HEALTHY state, the callback starts the election:

elector.joinElection(targetToData(localTarget));
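The hand-off from health monitoring to the election can be pictured with a small stand-alone model. This is only a sketch under assumptions: the Monitor, Callback and Elector types below are hypothetical stand-ins, not the Hadoop classes, and leaving the election on any non-healthy state is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the HealthMonitor -> elector hand-off.
// None of these types are Hadoop classes; the names are illustrative only.
class HealthCallbackSketch {
    enum State { INITIALIZING, SERVICE_HEALTHY, SERVICE_UNHEALTHY }

    interface Callback { void enteredState(State newState); }

    /** Minimal monitor: stores the state and notifies callbacks on a change. */
    static class Monitor {
        private State state = State.INITIALIZING;
        private final List<Callback> callbacks = new ArrayList<>();

        void addCallback(Callback cb) { callbacks.add(cb); }

        void enterState(State newState) {
            if (newState == state) return;          // no change, nothing to report
            state = newState;
            for (Callback cb : callbacks) cb.enteredState(newState);
        }
    }

    /** Stand-in for the elector: only records whether we are in the election. */
    static class Elector {
        boolean inElection = false;
        void joinElection() { inElection = true; }
        void quitElection() { inElection = false; }
    }

    /** Wires a monitor to an elector: healthy joins, anything else quits (assumption). */
    static Monitor wire(Elector elector) {
        Monitor monitor = new Monitor();
        monitor.addCallback(newState -> {
            if (newState == State.SERVICE_HEALTHY) {
                elector.joinElection();
            } else {
                elector.quitElection();
            }
        });
        return monitor;
    }

    public static void main(String[] args) {
        Elector elector = new Elector();
        Monitor monitor = wire(elector);
        monitor.enterState(State.SERVICE_HEALTHY);
        System.out.println("in election: " + elector.inElection);
        monitor.enterState(State.SERVICE_UNHEALTHY);
        System.out.println("in election: " + elector.inElection);
    }
}
```

The point of the callback indirection is that the monitor does not need to know anything about elections; it only reports state changes.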

The elector competes for the lock by creating an ephemeral znode; whoever creates it becomes Active:

createLockNodeAsync();

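Creating the node triggers a ZooKeeper event that reports the result of the create.

That event is handled in the following part of the source: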

public synchronized void processResult(int rc, String path, Object ctx,
    String name) {
  if (isStaleClient(ctx)) return;
  LOG.debug("CreateNode result: " + rc + " for path: " + path
      + " connectionState: " + zkConnectionState +
      " for " + this);
  Code code = Code.get(rc); // map the raw return code to a Code value for convenience
  if (isSuccess(code)) { // success: the lock znode was created
    // we successfully created the znode. we are the leader. start monitoring
    if (becomeActive()) { // switch the NN on this node to active
      monitorActiveStatus(); // keep monitoring the node's status
    } else {
      reJoinElectionAfterFailureToBecomeActive(); // failed: try the election again
    }
    return;
  }
  if (isNodeExists(code)) { // node exists: there is already an active, so just wait
    if (createRetryCount == 0) {
      // znode exists and we did not retry the operation. so a different
      // instance has created it. become standby and monitor lock.
      becomeStandby();
    }
    // if we had retried then the znode could have been created by our first
    // attempt to the server (that we lost) and this node exists response is
    // for the second attempt. verify this case via ephemeral node owner. this
    // will happen on the callback for monitoring the lock.
    monitorActiveStatus(); // but the effort to become active must not stop
    return;
  }
  String errorMessage = "Received create error from Zookeeper. code:"
      + code.toString() + " for path " + path;
  LOG.debug(errorMessage);
  if (shouldRetry(code)) {
    if (createRetryCount < maxRetryNum) {
      LOG.debug("Retrying createNode createRetryCount: " + createRetryCount);
      ++createRetryCount;
      createLockNodeAsync();
      return;
    }
    errorMessage = errorMessage
        + ". Not retrying further znode create connection errors.";
  } else if (isSessionExpired(code)) {
    // This isn't fatal - the client Watcher will re-join the election
    LOG.warn("Lock acquisition failed because session was lost");
    return;
  }
  fatalError(errorMessage);
}
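The branches of processResult can be condensed into a small decision function, which makes the outcomes easier to compare side by side. This is a sketch, not Hadoop code: the Code and Action enums and the decide() helper are hypothetical, RETRYABLE stands for whatever shouldRetry() accepts, and the case where becomeActive() itself fails (which leads to a re-join) is left out.

```java
// Hypothetical condensation of the branches in processResult. These enums and
// the decide() helper are illustrative only and are not part of Hadoop.
class CreateResultSketch {
    /** Simplified result codes; RETRYABLE stands for whatever shouldRetry() accepts. */
    enum Code { OK, NODE_EXISTS, RETRYABLE, SESSION_EXPIRED, OTHER }

    enum Action {
        BECOME_ACTIVE, BECOME_STANDBY_AND_MONITOR, MONITOR_ONLY,
        RETRY_CREATE, WAIT_FOR_REJOIN, FATAL_ERROR
    }

    static Action decide(Code code, int createRetryCount, int maxRetryNum) {
        switch (code) {
            case OK:
                // We created the lock znode, so we are the leader.
                return Action.BECOME_ACTIVE;
            case NODE_EXISTS:
                // Without a retry, someone else owns the lock. After a retry the
                // node might be ours, so only the monitoring step decides.
                return createRetryCount == 0
                        ? Action.BECOME_STANDBY_AND_MONITOR
                        : Action.MONITOR_ONLY;
            case RETRYABLE:
                return createRetryCount < maxRetryNum
                        ? Action.RETRY_CREATE
                        : Action.FATAL_ERROR;
            case SESSION_EXPIRED:
                // Not fatal: the watcher re-joins the election later.
                return Action.WAIT_FOR_REJOIN;
            default:
                return Action.FATAL_ERROR;
        }
    }

    public static void main(String[] args) {
        for (Code code : Code.values()) {
            System.out.println(code + " -> " + decide(code, 0, 3));
        }
    }
}
```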

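On the machine that wins the lock, the elector's becomeActive() calls back into ZKFailoverController, whose becomeActive() method does the actual work: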

private synchronized void becomeActive() throws ServiceFailedException {
  LOG.info("Trying to make " + localTarget + " active...");
  try {
    HAServiceProtocolHelper.transitionToActive(localTarget.getProxy(
        conf, FailoverController.getRpcTimeoutToNewActive(conf)),
        createReqInfo());
    String msg = "Successfully transitioned " + localTarget +
        " to active state";
    LOG.info(msg);
    serviceState = HAServiceState.ACTIVE;
    recordActiveAttempt(new ActiveAttemptRecord(true, msg));
  } catch (Throwable t) {
    String msg = "Couldn't make " + localTarget + " active";
    LOG.fatal(msg, t);
    recordActiveAttempt(new ActiveAttemptRecord(false, msg + "\n" +
        StringUtils.stringifyException(t)));
    if (t instanceof ServiceFailedException) {
      throw (ServiceFailedException)t;
    } else {
      throw new ServiceFailedException("Couldn't transition to active",
          t);
    }
  }
}

After a chain of RPC calls, this finally executes the NameNode's transitionToActive():

synchronized void transitionToActive()
    throws ServiceFailedException, AccessControlException {
  namesystem.checkSuperuserPrivilege();
  if (!haEnabled) {
    throw new ServiceFailedException("HA for namenode is not enabled");
  }
  state.setState(haContext, ACTIVE_STATE);
}
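The call state.setState(haContext, ACTIVE_STATE) reads like a classic state pattern: the current state object is asked to hand over to the target state. A minimal sketch of that idea follows, assuming the hand-over runs an exit hook on the old state and an enter hook on the new one. The Context and State types and the log messages are hypothetical; the real HAState classes may do more (locking, transition checks).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical state-pattern sketch; not the Hadoop HAState implementation.
class HaStateSketch {
    /** Holds the current state; a stand-in for the HA context. */
    static class Context {
        State current;
        final List<String> log = new ArrayList<>();
        Context(State initial) { current = initial; }
    }

    abstract static class State {
        final String name;
        State(String name) { this.name = name; }

        abstract void enter(Context ctx);
        abstract void exit(Context ctx);

        /** Leave this state and enter the target; a no-op if already there. */
        void setState(Context ctx, State target) {
            if (this == target) return;
            exit(ctx);
            ctx.current = target;
            target.enter(ctx);
        }
    }

    static final State STANDBY = new State("standby") {
        @Override void enter(Context ctx) { ctx.log.add("start standby services"); }
        @Override void exit(Context ctx)  { ctx.log.add("stop standby services"); }
    };

    static final State ACTIVE = new State("active") {
        @Override void enter(Context ctx) { ctx.log.add("start active services"); }
        @Override void exit(Context ctx)  { ctx.log.add("stop active services"); }
    };

    /** Mirrors the shape of transitionToActive(): delegate to the current state. */
    static void transitionToActive(Context ctx) {
        ctx.current.setState(ctx, ACTIVE);
    }

    public static void main(String[] args) {
        Context ctx = new Context(STANDBY);
        transitionToActive(ctx);
        System.out.println("current state: " + ctx.current.name);
        System.out.println("steps: " + ctx.log);
    }
}
```

Delegating to the current state keeps the NameNode method itself short: it only checks privileges and HA configuration, then lets the state objects decide what leaving and entering involve.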

That answers the first question.

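2. After the active node goes down, what happens, and how does the switch take place?

When the active node crashes or misbehaves, either its ZK znode disappears or the monitored state becomes unhealthy. Either case triggers a new round of election, which works the same way as described above.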
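The re-election after a crash can be illustrated with an in-memory stand-in for the ephemeral lock znode. This is a toy model under stated assumptions: LockNode and Candidate are hypothetical classes, no real ZooKeeper client is involved, and the session loss is simulated by calling sessionClosed() directly.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the ephemeral-lock election; not ZooKeeper or Hadoop code.
class FailoverSketch {
    /** In-memory stand-in for the single ephemeral lock znode. */
    static class LockNode {
        private Candidate owner;                       // null means the znode is absent
        private final List<Candidate> watchers = new ArrayList<>();

        /** Try to create the node; this only succeeds when it does not exist. */
        boolean tryCreate(Candidate c) {
            if (owner == null) {
                owner = c;
                return true;
            }
            if (!watchers.contains(c)) watchers.add(c); // the loser watches the node
            return false;
        }

        /** The session ended: an owned node is deleted and the watchers fire. */
        void sessionClosed(Candidate c) {
            watchers.remove(c);
            if (owner != c) return;
            owner = null;
            List<Candidate> toNotify = new ArrayList<>(watchers);
            watchers.clear();
            for (Candidate w : toNotify) w.joinElection();
        }
    }

    static class Candidate {
        final String name;
        final LockNode lock;
        boolean active = false;

        Candidate(String name, LockNode lock) {
            this.name = name;
            this.lock = lock;
        }

        /** Whoever creates the lock node becomes active; everyone else stays standby. */
        void joinElection() { active = lock.tryCreate(this); }

        /** Simulated crash: the session is lost, so the ephemeral node goes away. */
        void crash() {
            active = false;
            lock.sessionClosed(this);
        }
    }

    public static void main(String[] args) {
        LockNode lock = new LockNode();
        Candidate nn1 = new Candidate("nn1", lock);
        Candidate nn2 = new Candidate("nn2", lock);
        nn1.joinElection();
        nn2.joinElection();
        System.out.println("nn1 active=" + nn1.active + ", nn2 active=" + nn2.active);
        nn1.crash();
        System.out.println("nn1 active=" + nn1.active + ", nn2 active=" + nn2.active);
    }
}
```

In this model the standby does nothing special on failover: it simply runs the same joinElection() path again once the lock node is gone.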

