千家信息网

K8S in Practice IX (Cluster Monitoring)

Published: 2025-12-02 | Last updated: 2025-12-02 | Author: 千家信息网 editor

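I. Introduction to Prometheus Operator

Prometheus Operator is an open-source controller from CoreOS for managing Prometheus on Kubernetes clusters. It simplifies deploying, managing, and running Prometheus and Alertmanager clusters on Kubernetes.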

II. Deployment

1. Download the deployment files from the official repository

# git clone https://github.com/coreos/kube-prometheus.git

2. Change the image registry addresses

# mkdir prometheus
# cp kube-prometheus/manifests/* prometheus/
# sed -i 's#k8s.gcr.io#gcr.azk8s.cn/google_containers#g' prometheus/*
# sed -i 's#quay.io#quay.azk8s.cn#g' prometheus/*
# cat prometheus/* | grep image
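The two substitutions can be tried on a throwaway file before touching the real manifests. The sketch below is a minimal dry run under stated assumptions: the `rewrite_registries` helper name and the two sample image lines are made up for illustration, GNU `sed` is assumed (for `-i` without a suffix), and the mirror hosts are simply the ones used in this article, so check that they are still reachable from your network.

```shell
# Hypothetical helper: apply the article's two registry substitutions
# to every file in the given directory.
rewrite_registries() {
  dir=$1
  sed -i 's#k8s.gcr.io#gcr.azk8s.cn/google_containers#g' "$dir"/*
  sed -i 's#quay.io#quay.azk8s.cn#g' "$dir"/*
}

# Dry run on a scratch directory with a made-up two-line manifest.
demo_dir=$(mktemp -d)
printf '%s\n' \
  '        image: quay.io/prometheus/prometheus:v2.11.0' \
  '        image: k8s.gcr.io/addon-resizer:1.8.4' > "$demo_dir/sample.yaml"
rewrite_registries "$demo_dir"
rewritten=$(grep image "$demo_dir/sample.yaml")
printf '%s\n' "$rewritten"
rm -rf "$demo_dir"
```

If the printed image lines point at the mirror hosts, the same function can be run against the `prometheus/` directory.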

3. Deploy all resources

# kubectl apply -f prometheus/

4. Check the created namespace and CRDs

# kubectl get ns |grep monitoring
monitoring        Active   3m30s
# kubectl get crd
NAME                                    CREATED AT
alertmanagers.monitoring.coreos.com     2019-09-10T09:13:00Z
podmonitors.monitoring.coreos.com       2019-09-10T09:13:00Z
prometheuses.monitoring.coreos.com      2019-09-10T09:13:01Z
prometheusrules.monitoring.coreos.com   2019-09-10T09:13:02Z
servicemonitors.monitoring.coreos.com   2019-09-10T09:13:03Z

5. List all pods and services in the monitoring namespace

# kubectl get pod -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          23h
alertmanager-main-1                    2/2     Running   0          23h
alertmanager-main-2                    2/2     Running   0          23h
grafana-57bfdd47f8-bhkvv               1/1     Running   0          23h
kube-state-metrics-8cf4797dc-7dg4w     4/4     Running   0          23h
node-exporter-446xd                    2/2     Running   0          23h
node-exporter-8sbsf                    2/2     Running   0          23h
node-exporter-dk7qk                    2/2     Running   0          23h
node-exporter-vdsqg                    2/2     Running   0          23h
node-exporter-w7czt                    2/2     Running   0          23h
node-exporter-wx7vj                    2/2     Running   0          23h
prometheus-adapter-6b9989ccbd-bcl2h    1/1     Running   0          23h
prometheus-k8s-0                       3/3     Running   1          23h
prometheus-k8s-1                       3/3     Running   1          23h
prometheus-operator-7894d75578-rg2gl   1/1     Running   0          23h
# kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       NodePort    10.97.155.71    <none>        9093:30093/TCP               23h
alertmanager-operated   ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   23h
grafana                 NodePort    10.110.28.251   <none>        3000:30030/TCP               23h
kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP            23h
node-exporter           ClusterIP   None            <none>        9100/TCP                     23h
prometheus-adapter      ClusterIP   10.111.75.114   <none>        443/TCP                      23h
prometheus-k8s          NodePort    10.109.3.70     <none>        9090:30090/TCP               23h
prometheus-operated     ClusterIP   None            <none>        9090/TCP                     23h
prometheus-operator     ClusterIP   None            <none>        8080/TCP                     23h

6. Change the service type to NodePort to expose the ports

# kubectl edit svc prometheus-k8s -n monitoring
service/prometheus-k8s edited
# kubectl edit svc grafana -n monitoring
service/grafana edited
# kubectl edit svc alertmanager-main -n monitoring
service/alertmanager-main edited
# kubectl get svc -n monitoring | grep NodePort
alertmanager-main       NodePort    10.97.155.71    <none>        9093:30093/TCP               21h
grafana                 NodePort    10.110.28.251   <none>        3000:30030/TCP               21h
prometheus-k8s          NodePort    10.109.3.70     <none>        9090:30090/TCP               21h
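`kubectl edit` opens an interactive editor, so the change is hard to script. As a non-interactive alternative, the same result can usually be reached with `kubectl patch`. The sketch below assumes the service ports and NodePort numbers shown in the output of this step; `nodeport_patch` and `expose_nodeport` are hypothetical helper names, not part of kubectl.

```shell
# Build a strategic-merge patch that switches a Service to NodePort
# and pins the nodePort for one existing service port.
# $1 = service port, $2 = desired nodePort
nodeport_patch() {
  printf '{"spec":{"type":"NodePort","ports":[{"port":%s,"nodePort":%s}]}}' "$1" "$2"
}

# Apply the patch to a Service in the monitoring namespace (requires cluster access).
# $1 = service name, $2 = service port, $3 = desired nodePort
expose_nodeport() {
  kubectl patch svc "$1" -n monitoring -p "$(nodeport_patch "$2" "$3")"
}
```

For example, `expose_nodeport prometheus-k8s 9090 30090`, `expose_nodeport grafana 3000 30030`, and `expose_nodeport alertmanager-main 9093 30093` should reproduce the mapping listed in this step.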

7. Test access
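With the NodePort mappings from step 6, each web UI should be reachable at `http://<node-ip>:<nodePort>`. A quick command-line check is sketched below, assuming `curl` is installed and relying on the health endpoints that Prometheus and Alertmanager (`/-/healthy`) and Grafana (`/api/health`) expose. The `health_urls` and `check_monitoring` names and the node IP argument are placeholders.

```shell
# Print one health-check URL per UI, using the NodePorts from step 6.
# $1 = IP address of any cluster node
health_urls() {
  printf 'http://%s:30090/-/healthy\n' "$1"   # Prometheus
  printf 'http://%s:30093/-/healthy\n' "$1"   # Alertmanager
  printf 'http://%s:30030/api/health\n' "$1"  # Grafana
}

# Print the HTTP status code returned by each endpoint (200 means healthy).
check_monitoring() {
  for url in $(health_urls "$1"); do
    printf '%s -> ' "$url"
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
  done
}
```

Calling `check_monitoring` with the address of one of your nodes prints one line per UI.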

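III. Configuration

1. Check the Prometheus targets page

The kube-controller-manager and kube-scheduler system components are not being monitored. This is related to how their ServiceMonitor resources are defined.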

# cat prometheus/prometheus-serviceMonitorKubeScheduler.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler

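namespaceSelector restricts the search to the kube-system namespace, and selector.matchLabels selects Services there that carry the label k8s-app=kube-scheduler. No such Service exists in the cluster, so Prometheus has nothing to scrape.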
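This can be confirmed from the command line by listing Services with the same namespace and label that the ServiceMonitor uses. A minimal sketch follows; `matching_services` is a hypothetical helper name.

```shell
# List the Services that a ServiceMonitor would select (requires cluster access).
# $1 = namespace from namespaceSelector.matchNames
# $2 = label (key=value) from selector.matchLabels
matching_services() {
  kubectl get svc -n "$1" -l "$2" -o name
}
```

If `matching_services kube-system k8s-app=kube-scheduler` prints nothing, no Service carries the label, which is consistent with the missing target. The same check applies to `k8s-app=kube-controller-manager`.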

2. Create Services for kube-controller-manager and kube-scheduler

# cat cms-svc.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
# kubectl describe pod kube-controller-manager-k8s-master01 -n kube-system
Labels:               component=kube-controller-manager
                      tier=control-plane
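The listing only displays the manifest. It still has to be applied, and the Services only help if their selectors actually pick up the control-plane pods. A hedged sketch of those two steps, assuming the file name `cms-svc.yaml` used in this step; `apply_and_check` is a hypothetical helper name.

```shell
# Create both Services, then show their Endpoints (requires cluster access).
# Endpoints with no addresses mean the selector did not match any pod.
# $1 = path to the manifest containing the two Services
apply_and_check() {
  kubectl apply -f "$1" &&
  kubectl get endpoints -n kube-system kube-controller-manager kube-scheduler
}
```

Running `apply_and_check cms-svc.yaml` should list pod addresses on ports 10252 and 10251. If the targets still show as down afterwards, one possible cause is that the components listen only on 127.0.0.1, which some installers (kubeadm, for instance) configure by default.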

3. Check that kube-controller-manager and kube-scheduler now appear as healthy targets

4. Access Grafana
