千家信息网

What to Do About Ceph Monitor Failures and an OSD Crash Caused by an IP Change

Published: 2025-12-01 Author: 千家信息网 editorial staff
Last updated by 千家信息网: 2025-12-01

This article describes how to deal with Ceph monitor failures and an OSD disk crash caused by a change of IP addresses.

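The company moved offices, so the IP addresses of all servers changed. After configuring the new IPs on the Ceph servers and starting them, the monitor processes failed to start: each monitor kept trying to bind to its old IP address, which of course could not succeed. At first I suspected the servers' IP settings, but changing the hostname, ceph.conf, and so on did not help. Step-by-step analysis showed that the monmap still held the old IP addresses. Ceph reads the monmap to start the monitor process, so the monmap has to be modified. The procedure is as follows: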

#Add the new monitor locations
# monmaptool --create --add mon0 192.168.32.2:6789 --add osd1 192.168.32.3:6789 \
    --add osd2 192.168.32.4:6789 --fsid 61a520db-317b-41f1-9752-30cedc5ffb9a \
    --clobber monmap

#Retrieve the monitor map
#(Note: `ceph mon getmap` needs a reachable monitor quorum. The map created above
# is written to `monmap`; inject whichever file holds the new addresses.)
# ceph mon getmap -o monmap.bin

#Check new contents
# monmaptool --print monmap.bin

#Inject the monmap
# ceph-mon -i mon0 --inject-monmap monmap.bin
# ceph-mon -i osd1 --inject-monmap monmap.bin
# ceph-mon -i osd2 --inject-monmap monmap.bin
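When a cluster has more than a few monitors, typing the `--add` pairs by hand is error-prone. The following is a minimal sketch, not part of the original procedure: `monmap_create_cmd` is a hypothetical helper that only prints a `monmaptool --create` command in the same form as the one above, so it can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a monmaptool command that builds a
# fresh monmap from NAME=IP:PORT pairs.
# Usage: monmap_create_cmd FSID OUTFILE NAME=IP:PORT [NAME=IP:PORT ...]
monmap_create_cmd() {
    fsid=$1
    outfile=$2
    shift 2
    cmd="monmaptool --create"
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first '='
        addr=${pair#*=}    # text after the first '='
        cmd="$cmd --add $name $addr"
    done
    printf '%s --fsid %s --clobber %s\n' "$cmd" "$fsid" "$outfile"
}
```

For example, `monmap_create_cmd 61a520db-317b-41f1-9752-30cedc5ffb9a monmap mon0=192.168.32.2:6789 osd1=192.168.32.3:6789 osd2=192.168.32.4:6789` prints a single-line equivalent of the command used in this article.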

After that, the monitors started and everything worked normally.

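However, one OSD disk then failed, as described in the previous article. After searching around, all I found was a statement on the official Ceph website that this is a Ceph bug. Unable to repair it, I deleted the OSD and reinstalled it: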

# service ceph stop osd.4
#No need to run ceph osd crush remove osd.4
# ceph auth del osd.4
# ceph osd rm 4

# umount /cephmp1
# mkfs.xfs -f /dev/sdc
# mount /dev/sdc /cephmp1
#Running "create" here fails to install the osd properly
# ceph-deploy osd prepare osd2:/cephmp1:/dev/sdf1
# ceph-deploy osd activate osd2:/cephmp1:/dev/sdf1

Once that was done, I restarted the OSD and it ran successfully. Ceph rebalances the data automatically; the final state was:

[root@osd2 ~]# ceph -s
    cluster 61a520db-317b-41f1-9752-30cedc5ffb9a
     health HEALTH_WARN 9 pgs incomplete; 9 pgs stuck inactive; 9 pgs stuck unclean; 3 requests are blocked > 32 sec
     monmap e3: 3 mons at {mon0=192.168.32.2:6789/0,osd1=192.168.32.3:6789/0,osd2=192.168.32.4:6789/0}, election epoch 76, quorum 0,1,2 mon0,osd1,osd2
     osdmap e689: 6 osds: 6 up, 6 in
      pgmap v189608: 704 pgs, 5 pools, 34983 MB data, 8966 objects
            69349 MB used, 11104 GB / 11172 GB avail
                 695 active+clean
                   9 incomplete
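Since the whole incident started with monitors bound to stale addresses, it can be worth confirming that the `monmap` line in this status output really lists the new IPs. The sketch below assumes the old-style `ceph -s` layout shown above (newer releases format the status differently); `monmap_addrs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph -s` output on stdin and print one
# "name ip:port" pair per monitor, taken from the "mons at {...}" line.
monmap_addrs() {
    sed -n 's/.*mons at {\([^}]*\)}.*/\1/p' |   # keep only the text inside {...}
        tr ',' '\n' |                           # one monitor per line
        sed 's|/[0-9]*$||; s/=/ /'              # drop the trailing "/0", split name and address
}
```

It could be used as `ceph -s | monmap_addrs`, which for the output above would list mon0, osd1 and osd2 with their 192.168.32.x addresses.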

Nine PGs ended up in the incomplete state.

[root@osd2 ~]# ceph health detail
HEALTH_WARN 9 pgs incomplete; 9 pgs stuck inactive; 9 pgs stuck unclean; 3 requests are blocked > 32 sec; 1 osds have slow requests
pg 5.95 is stuck inactive for 838842.634721, current state incomplete, last acting [1,4]
pg 5.66 is stuck inactive since forever, current state incomplete, last acting [4,0]
pg 5.de is stuck inactive for 808270.105968, current state incomplete, last acting [0,4]
pg 5.f5 is stuck inactive for 496137.708887, current state incomplete, last acting [0,4]
pg 5.11 is stuck inactive since forever, current state incomplete, last acting [4,1]
pg 5.30 is stuck inactive for 507062.828403, current state incomplete, last acting [0,4]
pg 5.bc is stuck inactive since forever, current state incomplete, last acting [4,1]
pg 5.a7 is stuck inactive for 499713.993372, current state incomplete, last acting [1,4]
pg 5.22 is stuck inactive for 496125.831204, current state incomplete, last acting [0,4]
pg 5.95 is stuck unclean for 838842.634796, current state incomplete, last acting [1,4]
pg 5.66 is stuck unclean since forever, current state incomplete, last acting [4,0]
pg 5.de is stuck unclean for 808270.106039, current state incomplete, last acting [0,4]
pg 5.f5 is stuck unclean for 496137.708958, current state incomplete, last acting [0,4]
pg 5.11 is stuck unclean since forever, current state incomplete, last acting [4,1]
pg 5.30 is stuck unclean for 507062.828475, current state incomplete, last acting [0,4]
pg 5.bc is stuck unclean since forever, current state incomplete, last acting [4,1]
pg 5.a7 is stuck unclean for 499713.993443, current state incomplete, last acting [1,4]
pg 5.22 is stuck unclean for 496125.831274, current state incomplete, last acting [0,4]
pg 5.de is incomplete, acting [0,4]
pg 5.bc is incomplete, acting [4,1]
pg 5.a7 is incomplete, acting [1,4]
pg 5.95 is incomplete, acting [1,4]
pg 5.66 is incomplete, acting [4,0]
pg 5.30 is incomplete, acting [0,4]
pg 5.22 is incomplete, acting [0,4]
pg 5.11 is incomplete, acting [4,1]
pg 5.f5 is incomplete, acting [0,4]
2 ops are blocked > 8388.61 sec
1 ops are blocked > 4194.3 sec
2 ops are blocked > 8388.61 sec on osd.0
1 ops are blocked > 4194.3 sec on osd.0
1 osds have slow requests
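The health output is long, so it helps to reduce it to the PG ids and the OSDs involved. The following is a rough sketch that assumes the line format shown above (`pg <id> is incomplete, acting [a,b]`); the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers that read `ceph health detail` output on stdin.

# Print the ids of PGs reported as "pg <id> is incomplete, acting [...]".
incomplete_pgs() {
    awk '$1 == "pg" && $3 == "is" && $4 == "incomplete," { print $2 }'
}

# Count how many incomplete PGs each OSD appears in (acting set).
incomplete_per_osd() {
    awk '$1 == "pg" && $3 == "is" && $4 == "incomplete," {
        gsub(/\[|\]/, "", $6)              # "[0,4]" -> "0,4"
        n = split($6, osds, ",")
        for (i = 1; i <= n; i++) count[osds[i]]++
    }
    END { for (o in count) printf "osd.%s %d\n", o, count[o] }' | sort
}
```

Run as `ceph health detail | incomplete_per_osd`. For the listing above, every one of the nine incomplete PGs has osd.4 (the OSD that was wiped and recreated) in its acting set, alongside either osd.0 or osd.1.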

More searching turned up nothing. Here is what someone who ran into the same problem wrote:

I already tried "ceph pg repair 4.77", stop/start OSDs, "ceph osd lost", "ceph pg force_create_pg 4.77".  Most scary thing is "force_create_pg" does not work. At least it should be a way to wipe out a incomplete PG  without destroying a whole pool.

I tried the methods above and none of them worked. The problem remains unresolved for now, which is rather frustrating.

PS: commonly used pg operations

[root@osd2 ~]# ceph pg map 5.de
osdmap e689 pg 5.de (5.de) -> up [0,4] acting [0,4]
[root@osd2 ~]# ceph pg 5.de query
[root@osd2 ~]# ceph pg scrub 5.de
instructing pg 5.de on osd.0 to scrub
[root@osd2 ~]# ceph pg 5.de mark_unfound_lost revert
pg has no unfound objects
#ceph pg dump_stuck stale
#ceph pg dump_stuck inactive
#ceph pg dump_stuck unclean
[root@osd2 ~]# ceph osd lost 1
Error EPERM: are you SURE?  this might mean real, permanent data loss.  pass --yes-i-really-mean-it if you really do.
[root@osd2 ~]#
[root@osd2 ~]# ceph osd lost 4 --yes-i-really-mean-it
osd.4 is not down or doesn't exist
[root@osd2 ~]# service ceph stop osd.4
=== osd.4 ===
Stopping Ceph osd.4 on osd2...kill 22287...kill 22287...done
[root@osd2 ~]# ceph osd lost 4 --yes-i-really-mean-it
marked osd lost in epoch 690
[root@osd1 mnt]# ceph pg repair 5.de
instructing pg 5.de on osd.0 to repair
[root@osd1 mnt]# ceph pg repair 5.de
instructing pg 5.de on osd.0 to repair
