千家信息网

How the Kubernetes Node Controller Starts

Posted: 2025-12-01  Author: 千家信息网 editor

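This article explains how the Kubernetes Node Controller starts. Many people run into this question when working through real cases, so let's walk through it step by step. We hope you read it carefully and get something out of it!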


Starting the Node Controller

    if ctx.IsControllerEnabled(nodeControllerName) {
        // Parse the Cluster CIDR; clusterCIDR is the CIDR range for Pods in the cluster.
        _, clusterCIDR, err := net.ParseCIDR(s.ClusterCIDR)
        if err != nil {
            glog.Warningf("Unsuccessful parsing of cluster CIDR %v: %v", s.ClusterCIDR, err)
        }
        // Parse the Service CIDR; serviceCIDR is the CIDR range for Services in the cluster.
        _, serviceCIDR, err := net.ParseCIDR(s.ServiceCIDR)
        if err != nil {
            glog.Warningf("Unsuccessful parsing of service CIDR %v: %v", s.ServiceCIDR, err)
        }
        // Create the NodeController instance.
        nodeController, err := nodecontroller.NewNodeController(
            sharedInformers.Core().V1().Pods(),
            sharedInformers.Core().V1().Nodes(),
            sharedInformers.Extensions().V1beta1().DaemonSets(),
            cloud,
            clientBuilder.ClientOrDie("node-controller"),
            s.PodEvictionTimeout.Duration,
            s.NodeEvictionRate,
            s.SecondaryNodeEvictionRate,
            s.LargeClusterSizeThreshold,
            s.UnhealthyZoneThreshold,
            s.NodeMonitorGracePeriod.Duration,
            s.NodeStartupGracePeriod.Duration,
            s.NodeMonitorPeriod.Duration,
            clusterCIDR,
            serviceCIDR,
            int(s.NodeCIDRMaskSize),
            s.AllocateNodeCIDRs,
            s.EnableTaintManager,
            utilfeature.DefaultFeatureGate.Enabled(features.TaintBasedEvictions),
        )
        if err != nil {
            return fmt.Errorf("failed to initialize nodecontroller: %v", err)
        }
        // Call Run to start the controller.
        nodeController.Run()
        // Sleep for a random duration of
        // ControllerStartInterval + rand.Float64()*1.0*float64(ControllerStartInterval),
        // where ControllerStartInterval is set with kube-controller-manager's
        // --controller-start-interval flag.
        time.Sleep(wait.Jitter(s.ControllerStartInterval.Duration, ControllerStartJitter))
    }

The key parts of the startup code are therefore these two steps:

  • nodeController, err := nodecontroller.NewNodeController(...) creates the NodeController instance.

  • nodeController.Run() calls the Run method to start the controller.

The NodeController definition

Before analyzing how the NodeController works, we should first look at how it is defined. The complete definition is as follows:

    type NodeController struct {
        allocateNodeCIDRs bool
        cloud             cloudprovider.Interface
        clusterCIDR       *net.IPNet
        serviceCIDR       *net.IPNet
        knownNodeSet      map[string]*v1.Node
        kubeClient        clientset.Interface
        // Method for easy mocking in unittest.
        lookupIP func(host string) ([]net.IP, error)
        // Value used if sync_nodes_status=False. NodeController will not proactively
        // sync node status in this case, but will monitor node status updated from kubelet. If
        // it doesn't receive update for this amount of time, it will start posting "NodeReady==
        // ConditionUnknown". The amount of time before which NodeController start evicting pods
        // is controlled via flag 'pod-eviction-timeout'.
        // Note: be cautious when changing the constant, it must work with nodeStatusUpdateFrequency
        // in kubelet. There are several constraints:
        // 1. nodeMonitorGracePeriod must be N times more than nodeStatusUpdateFrequency, where
        //    N means number of retries allowed for kubelet to post node status. It is pointless
        //    to make nodeMonitorGracePeriod be less than nodeStatusUpdateFrequency, since there
        //    will only be fresh values from Kubelet at an interval of nodeStatusUpdateFrequency.
        //    The constant must be less than podEvictionTimeout.
        // 2. nodeMonitorGracePeriod can't be too large for user experience - larger value takes
        //    longer for user to see up-to-date node status.
        nodeMonitorGracePeriod time.Duration
        // Value controlling NodeController monitoring period, i.e. how often does NodeController
        // check node status posted from kubelet. This value should be lower than nodeMonitorGracePeriod.
        // TODO: Change node status monitor to watch based.
        nodeMonitorPeriod time.Duration
        // Value used if sync_nodes_status=False, only for node startup. When node
        // is just created, e.g. cluster bootstrap or node creation, we give a longer grace period.
        nodeStartupGracePeriod time.Duration
        // per Node map storing last observed Status together with a local time when it was observed.
        // This timestamp is to be used instead of LastProbeTime stored in Condition. We do this
        // to avoid the problem with time skew across the cluster.
        nodeStatusMap map[string]nodeStatusData
        now           func() metav1.Time
        // Lock to access evictor workers
        evictorLock sync.Mutex
        // workers that evict pods from unresponsive nodes.
        zonePodEvictor map[string]*RateLimitedTimedQueue
        // workers that are responsible for tainting nodes.
        zoneNotReadyOrUnreachableTainer map[string]*RateLimitedTimedQueue
        podEvictionTimeout              time.Duration
        // The maximum duration before a pod evicted from a node can be forcefully terminated.
        maximumGracePeriod time.Duration
        recorder           record.EventRecorder
        nodeLister         corelisters.NodeLister
        nodeInformerSynced cache.InformerSynced
        daemonSetStore          extensionslisters.DaemonSetLister
        daemonSetInformerSynced cache.InformerSynced
        podInformerSynced cache.InformerSynced
        // allocate/recycle CIDRs for node if allocateNodeCIDRs == true
        cidrAllocator CIDRAllocator
        // manages taints
        taintManager *NoExecuteTaintManager
        forcefullyDeletePod        func(*v1.Pod) error
        nodeExistsInCloudProvider  func(types.NodeName) (bool, error)
        computeZoneStateFunc       func(nodeConditions []*v1.NodeCondition) (int, zoneState)
        enterPartialDisruptionFunc func(nodeNum int) float32
        enterFullDisruptionFunc    func(nodeNum int) float32
        zoneStates                  map[string]zoneState
        evictionLimiterQPS          float32
        secondaryEvictionLimiterQPS float32
        largeClusterThreshold       int32
        unhealthyZoneThreshold      float32
        // if set to true NodeController will start TaintManager that will evict Pods from
        // tainted nodes, if they're not tolerated.
        runTaintManager bool
        // if set to true NodeController will taint Nodes with 'TaintNodeNotReady' and 'TaintNodeUnreachable'
        // taints instead of evicting Pods itself.
        useTaintBasedEvictions bool
    }

NodeController behavior configuration

The NodeController struct is quite complex, with more than 30 fields. We will focus on the following:

  • clusterCIDR - set via --cluster-cidr; the CIDR range for Pods in the cluster.

  • serviceCIDR - set via --service-cluster-ip-range; the CIDR range for Services in the cluster.

  • knownNodeSet - the set of Nodes the NodeController has observed.

  • nodeMonitorGracePeriod - set via --node-monitor-grace-period, default 40s: how long a Node may be unresponsive before it is marked unhealthy.

  • nodeMonitorPeriod - set via --node-monitor-period, default 5s: the period at which the NodeController syncs NodeStatus.

  • nodeStatusMap - records the most recently observed Status of each Node, together with the local time it was observed.

  • zonePodEvictor - workers that evict Pods from unresponsive Nodes.

  • zoneNotReadyOrUnreachableTainer - workers that are responsible for tainting nodes.

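  • podEvictionTimeout - set via --pod-eviction-timeout, default 5min: the grace period for deleting Pods on failed Nodes, i.e. how long a Node may stay NotReady, or go without reporting status, before the NodeController starts evicting its Pods.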

  • maximumGracePeriod - the maximum duration before a Pod evicted from a Node can be forcefully terminated. Not configurable; hard-coded to 5min.

  • nodeLister - the interface used to fetch Node data.

  • daemonSetStore - the interface used to fetch DaemonSet data. When Pods are deleted by eviction, all Pods on that Node that belong to a DaemonSet are skipped.

  • taintManager - a NoExecuteTaintManager object. When runTaintManager (default true) is true:

    • after the PodInformer and NodeInformer observe PodAdd, PodDelete, PodUpdate and NodeAdd, NodeDelete, NodeUpdate events,

    • the TaintManager runs the corresponding NoExecuteTaintManager.PodUpdated and NoExecuteTaintManager.NodeUpdated methods,

    • which add the events to the corresponding queue (podUpdateQueue and nodeUpdateQueue); the TaintController consumes the messages from these queues

    • and handles them with handlePodUpdate and handleNodeUpdate respectively.

    • The TaintController's processing logic will be analyzed separately in a later article.

  • forcefullyDeletePod - the method the NodeController uses to force-delete a Pod through the apiserver. It deletes Pods scheduled onto Nodes whose kubelet version is older than v1.1.0, because kubelets before v1.1.0 do not support graceful termination.

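  • computeZoneStateFunc - returns the number of NotReady Nodes in a zone together with the zone's state:

    • if there is no Ready Node at all, the zone state is FullDisruption;

    • if there are at least 3 unhealthy Nodes and their proportion is greater than or equal to unhealthyZoneThreshold, the zone state is PartialDisruption;

    • otherwise the zone state is Normal.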

  • enterPartialDisruptionFunc - compares the current number of Nodes (nodeNum) with largeClusterThreshold:

    • if nodeNum > largeClusterThreshold, it returns secondaryEvictionLimiterQPS (default 0.01);

    • otherwise it returns 0, which stops evictions.

  • enterFullDisruptionFunc - returns evictionLimiterQPS (default 0.1); evictionLimiterQPS is explained below.

  • zoneStates - the state of each zone; possible values are:

    • Initial;

    • Normal;

    • FullDisruption;

    • PartialDisruption;

  • evictionLimiterQPS - set via --node-eviction-rate, default 0.1: the number of Nodes per second whose Pods are evicted when a zone is healthy, i.e. one Node every 10s.

  • secondaryEvictionLimiterQPS - set via --secondary-node-eviction-rate, default 0.01: the number of Nodes per second whose Pods are evicted when a zone is unhealthy, i.e. one Node every 100s.

  • largeClusterThreshold - set via --large-cluster-size-threshold, default 50: when a cluster (zone) has 50 or fewer Nodes, secondary-node-eviction-rate is effectively set to 0.

  • unhealthyZoneThreshold - set via --unhealthy-zone-threshold, default 0.55: a zone is considered unhealthy once the proportion of unhealthy Nodes in it (at least 3 of them) reaches 0.55.

  • runTaintManager - set via --enable-taint-manager, default true. If true, the NodeController starts the TaintManager, which evicts Pods that do not tolerate a Node's taints from that Node.

  • useTaintBasedEvictions - set via --feature-gates; TaintBasedEvictions defaults to false and is still an Alpha feature. If true, Pods are evicted by tainting Nodes.
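
The computeZoneStateFunc rules above, together with the "at least 3" condition noted under unhealthyZoneThreshold, can be sketched as follows. This is a simplified illustration: readiness is passed as a []bool instead of the []*v1.NodeCondition the real function receives, computeZoneState and the state constants are local stand-ins, and treating an empty zone as Normal is an assumption of the sketch.

```go
package main

import "fmt"

type zoneState string

const (
	stateInitial           zoneState = "Initial"
	stateNormal            zoneState = "Normal"
	stateFullDisruption    zoneState = "FullDisruption"
	statePartialDisruption zoneState = "PartialDisruption"
)

// computeZoneState classifies a zone from the Ready status of its Nodes
// (ready[i] is true when Node i reports Ready) and returns the number of
// not-ready Nodes plus the zone state:
//   - no Ready Node at all                          -> FullDisruption
//   - at least 3 not-ready Nodes and their share is
//     >= unhealthyZoneThreshold                     -> PartialDisruption
//   - otherwise                                     -> Normal
func computeZoneState(ready []bool, unhealthyZoneThreshold float32) (int, zoneState) {
	readyNodes, notReadyNodes := 0, 0
	for _, r := range ready {
		if r {
			readyNodes++
		} else {
			notReadyNodes++
		}
	}
	switch {
	case readyNodes == 0 && notReadyNodes > 0:
		return notReadyNodes, stateFullDisruption
	case notReadyNodes > 2 &&
		float32(notReadyNodes)/float32(notReadyNodes+readyNodes) >= unhealthyZoneThreshold:
		return notReadyNodes, statePartialDisruption
	default:
		return notReadyNodes, stateNormal
	}
}

func main() {
	fmt.Println(computeZoneState([]bool{false, false}, 0.55))              // 2 FullDisruption
	fmt.Println(computeZoneState([]bool{true, false, false, false}, 0.55)) // 3 PartialDisruption
	fmt.Println(computeZoneState([]bool{true, true, false, false}, 0.55))  // 2 Normal
}
```

The last call shows why the minimum of 3 matters: half the zone is down, but with only 2 unhealthy Nodes the zone is still Normal.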
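
The TaintManager event flow described under taintManager (informer event, then PodUpdated/NodeUpdated, then podUpdateQueue/nodeUpdateQueue, then handlePodUpdate/handleNodeUpdate) is sketched below with Go channels standing in for the real work queues. The types, the channel-based Run loop and the record-only handlers are assumptions for illustration, not the NoExecuteTaintManager implementation.

```go
package main

import "fmt"

type podUpdateItem struct{ podName string }
type nodeUpdateItem struct{ nodeName string }

// taintManager is a hypothetical stand-in for NoExecuteTaintManager in which
// buffered channels play the role of podUpdateQueue and nodeUpdateQueue.
type taintManager struct {
	podUpdateQueue  chan podUpdateItem
	nodeUpdateQueue chan nodeUpdateItem
	handled         []string // what the handlers saw, in processing order
}

func newTaintManager() *taintManager {
	return &taintManager{
		podUpdateQueue:  make(chan podUpdateItem, 16),
		nodeUpdateQueue: make(chan nodeUpdateItem, 16),
	}
}

// PodUpdated and NodeUpdated are what the Pod/Node informer event handlers
// would call; they only enqueue.
func (tm *taintManager) PodUpdated(name string)  { tm.podUpdateQueue <- podUpdateItem{name} }
func (tm *taintManager) NodeUpdated(name string) { tm.nodeUpdateQueue <- nodeUpdateItem{name} }

func (tm *taintManager) handlePodUpdate(it podUpdateItem) {
	tm.handled = append(tm.handled, "pod/"+it.podName)
}

func (tm *taintManager) handleNodeUpdate(it nodeUpdateItem) {
	tm.handled = append(tm.handled, "node/"+it.nodeName)
}

// Run consumes both queues, dispatching each item to its handler, and
// returns once both queues are closed and drained.
func (tm *taintManager) Run() {
	pods, nodes := tm.podUpdateQueue, tm.nodeUpdateQueue
	for pods != nil || nodes != nil {
		select {
		case it, ok := <-nodes:
			if !ok {
				nodes = nil // a nil channel is never selected again
				continue
			}
			tm.handleNodeUpdate(it)
		case it, ok := <-pods:
			if !ok {
				pods = nil
				continue
			}
			tm.handlePodUpdate(it)
		}
	}
}

func main() {
	tm := newTaintManager()
	done := make(chan struct{})
	go func() {
		tm.Run()
		close(done)
	}()
	tm.NodeUpdated("node-1")
	tm.PodUpdated("default/web-1")
	close(tm.podUpdateQueue)
	close(tm.nodeUpdateQueue)
	<-done
	fmt.Println(len(tm.handled)) // 2
}
```

Decoupling the informer callbacks from the handlers through queues keeps the callbacks cheap; the slow work happens in the consumer loop.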
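
runTaintManager's behavior hinges on whether a Pod tolerates a Node's NoExecute taints. A minimal sketch of that matching follows. It assumes trimmed-down taint/toleration structs and the usual matching rules (an empty key or effect in the toleration acts as a wildcard, Exists ignores the value, Equal compares it). tolerationSeconds is ignored, and the taint key example.com/maintenance is made up.

```go
package main

import "fmt"

// Trimmed-down, hypothetical stand-ins for the Kubernetes Taint and Toleration types.
type taint struct{ key, value, effect string }
type toleration struct{ key, operator, value, effect string }

// tolerates reports whether tol matches t: an empty effect or key in the
// toleration matches any; "Exists" ignores the value; "Equal" (or an empty
// operator) requires equal values.
func tolerates(tol toleration, t taint) bool {
	if tol.effect != "" && tol.effect != t.effect {
		return false
	}
	if tol.key != "" && tol.key != t.key {
		return false
	}
	switch tol.operator {
	case "", "Equal":
		return tol.value == t.value
	case "Exists":
		return true
	default:
		return false
	}
}

// mustEvict reports whether a Pod with tols has to leave a Node carrying
// taints: true if some NoExecute taint is not tolerated. tolerationSeconds
// is not modeled.
func mustEvict(tols []toleration, taints []taint) bool {
	for _, t := range taints {
		if t.effect != "NoExecute" {
			continue
		}
		tolerated := false
		for _, tol := range tols {
			if tolerates(tol, t) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return true
		}
	}
	return false
}

func main() {
	// example.com/maintenance is a made-up taint key.
	taints := []taint{{key: "example.com/maintenance", value: "true", effect: "NoExecute"}}
	fmt.Println(mustEvict(nil, taints)) // true
	tols := []toleration{{key: "example.com/maintenance", operator: "Exists", effect: "NoExecute"}}
	fmt.Println(mustEvict(tols, taints)) // false
}
```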
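
That concludes "How the Kubernetes Node Controller Starts". Thanks for reading! If you would like to learn more about the industry, follow the site; we will keep publishing high-quality, practical articles.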
