一次oplog损坏事故
发表于:2025-11-09 作者:千家信息网编辑
千家信息网最后更新 2025年11月09日,问题1)程序报警Execution TimeoutCouldn't get a connection within the time limit2)mongod日志Jun 11 21:48:35 mo
千家信息网最后更新 2025年11月09日一次oplog损坏事故
问题
1)程序报警
Execution TimeoutCouldn't get a connection within the time limit
2)mongod日志
Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.122+0800 I NETWORK [initandlisten] connection accepted from 10.0.0.1:24321 #32 (32 connections now open)Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.136+0800 I ACCESS [conn32] Successfully authenticated as principal __system on localJun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.349+0800 I - [rsSync] Assertion: 10334:BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOOJun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.358+0800 I CONTROL [rsSync]Jun 11 21:48:35 mongod mongod: 0x132c032 0x12c9988 0x12b29f8 0x12b2aac 0x9da659 0xae692f 0x106e4dc 0x1066b1e 0x1066d69 0xfd64f5 0xaea0fe 0xaea621 0xebe304 0xf563ae 0xf57c78 0xf4d29b 0x1b5c330 0x7efd30830dc5 0x7efd3055f73dJun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE -----Jun 11 21:48:35 mongod mongod: {"backtrace":[{"b":"400000","o":"F2C032","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EC9988","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"EB29F8","s":"_ZN5mongo11msgassertedEiPKc"},{"b":"400000","o":"EB2AAC"},{"b":"400000","o":"5DA659","s":"_ZNK5mongo7BSONObj14_assertInvalidEv"},{"b":"400000","o":"6E692F","s":"_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE"},{"b":"400000","o":"C6E4DC","s":"_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib"},{"b":"400000","o":"C66B1E","s":"_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"C66D69","s":"_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"BD64F5","s":"_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb"},{"b":"400000","o":"6EA0FE","s":"_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b"},{"b":"400000","o":"6EA621","s":"_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb"},{"b":"400000","o":"ABE304","s":"_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE"},{"b":"400000","o":"B563AE","s":"_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE"},{"b":"400000","o":"B57C78","s":"_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE"},{"b":"400000","o":"B4D29B","s":"_ZN5mongo4repl13runSyncThreadEv"},{"b":"400000","o":"175C330","s":"execute_native_thread_routine"},{"b":"7EFD30829000","o":"7DC5"},{"b":"7EFD30468000","o":"F773D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.12", "gitVersion" : "ef3e1bc78e997f0d9f22f45aeb1d8e3b6ac14a14", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-514.6.2.el7.x86_64", "version" : "#1 SMP Thu Feb 23 03:04:39 UTC 2017", "machine" : "x86_64" }, "somap" : [ {Jun 11 21:48:35 mongod mongod: "elfType" : 2, "b" : "400000", "buildId" : "BAAC1A970F6D0F06B88D0DE75BF06E4C260939EC" }, { "b" : "7FFE4D008000", "elfType" : 3, "buildId" : "7112DB1E073B211AB0CF8DE793F40F3FF996F5B4" }, { "b" : "7EFD31753000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "90EAF65D9B0EEEB1424241281F7F197451D4317D" }, { "b" : "7EFD31369000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "7278C69EE161D98DDD0FA00F92B67AD78C7B7F40" }, { "b" : "7EFD31161000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "7EFD30F5D000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "7EFD30C5B000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48AF713AB27DBE48EF3E" }, { "b" : "7EFD30A45000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "7EFD30829000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : "7EFD30468000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8B2C421716985B927AA0CAF2A05D0B1F452367F7" }, { "b" : "7EFD319C1000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8F3E366E2DB73C330A3791DEAE31AE9579099B44" }, { "b" : "7EFD3021A000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "7EFD2FF33000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "7EFD2FD2F000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "BF54B7C8932E450769FBBB8B18864D1DD70BBC67" }, { "b" : "7EFD2FAFD000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "7EFD2F8E7000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "7EFD2F6D8000", "path" : "/lib64Jun 11 21:48:35 mongod mongod: /libkrb5support.so.0", "elfType" : 3, "buildId" : "1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "7EFD2F4D4000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7EFD2F2BA000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "7EFD2F093000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "7EFD2EE32000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "AE64AA461A26E01F60408013D361749D56DD0AE1" } ] }}Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x132c032]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10logContextEPKc+0x138) [0x12c9988]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x12b29f8]Jun 11 21:48:35 mongod mongod: mongod(+0xEB2AAC) [0x12b2aac]Jun 11 21:48:35 mongod mongod: mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3B9) [0x9da659]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE+0xBF) [0xae692f]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib+0x46C) [0x106e4dc]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib+0x5E) [0x1066b1e]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib+0xA9) [0x1066d69]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb+0xB5) [0xfd64f5]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b+0x16E) [0xaea0fe]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb+0x1B1) [0xaea621]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE+0x144) [0xebe304]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE+0x98E) [0xf563ae]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE+0xD08) [0xf57c78]Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl13runSyncThreadEv+0x2BB) [0xf4d29b]Jun 11 21:48:35 mongod mongod: mongod(execute_native_thread_routine+0x20) [0x1b5c330]Jun 11 21:48:35 mongod mongod: libpthread.so.0(+0x7DC5) [0x7efd30830dc5]Jun 11 21:48:35 mongod mongod: libc.so.6(clone+0x6D) [0x7efd3055f73d]Jun 11 21:48:35 mongod mongod: ----- END BACKTRACE -----Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.389+0800 F - [rsSync] terminate() called. An exception is active; attempting to gather more informationJun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.393+0800 F - [rsSync] DBException::toString(): 10334 BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOOJun 11 21:48:35 mongod mongod: Actual exception type: mongo::MsgAssertionExceptionJun 11 21:48:35 mongod mongod: 0x132c032 0x132bb82 0x1b14356 0x1b14383 0xf4d3e0 0x1b5c330 0x7efd30830dc5 0x7efd3055f73dJun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE -----一、集群环境
3个mongos、3个mongo config、3个shard(每个分片各三台,一主两从)
二、事故描述
次数1:某次活动,由于并发太多,导致mongod连接数太多,内存消耗太多(每个连接上来会分配一定的内存空间),导致整体query慢,产生大量堆积(应用报超时等错误)。直接重启mongod,重启前未stepDown,其中一个从节点2发现错误。临时剔除节点,打算后续有时间解决(原因未知)
次数2:此时距离上次事故6天,节点2还未修复。同一个分片主节点1意外出现同样的错误(原因未知)此时只有一个主节点,熬夜紧急修复这两个节点
次数3:距离第二次事故两天,同一个分片主节点3又出现了同样的错误(原因未知)此时一阵冷汗,同一个分片轮番坏了一遍,辛亏解决的及时。
三、解决步骤
1、查看报错信息,感觉像是数据块损坏问题。并且此分片不能做更新操作,但是可以正常查询。
2、问题定位,oplog.rs损坏
> db> local db.oplog.rs.find() error: { "$err" : "BSONObj size: 1852142352 (0x1073656E) is invalid. Size must be between 0 and 16793600(16MB) First element: Status: ?type=100", "code" : 10334 }3、重建oplog
1)找到最近的一条oplog记录
> db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next(){"ts" : Timestamp(1497172747, 46), "h" : NumberLong("4489544342319430008")}2)保存记录
> db.temp.save(db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next() )//确认操作,很重要>db.temp.find()3)删除oplog.rs,物理文件不会删除
> db.oplog.rs.drop()
4)建立新的oplog
> db.runCommand( { create: "oplog.rs", capped: true, size: (50 * 1024 * 1024 * 1024) } ){ "ok" : 0, "errmsg" : "not authorized on local to execute command { create: \"oplog.rs\", capped: true, size: 53687091200.0 }", "code" : 13}//很悲催,重建capped类型的表,显示没权限,而oplog.rs必须要是capped类型,可是已经是root最大权限将shardsvr角色更改为configsvr角色,去掉keyfile认证(相当于去掉权限),重启服务,再次执行
configsvr> db.runCommand( { create: "oplog.rs", capped: true, size: (50 * 1024 * 1024 * 1024) } );{ "ok" : 1 }为什么要更改角色呢?
--因为shardsvr角色,去掉keyfile认证,服务启动不了。哭~~~
5)将上次操作的最近一条记录写到oplog中
> db.oplog.rs.save( db.temp.findOne() )
6)确认操作
> db.oplog.rs.find(){"ts" : Timestamp(1497172747, 46), "h" : NumberLong("4489544342319430008")}7)修改配置文件,更改会原来的配置,重启服务。重新加入到集群中。查看延迟状态
rs:SECONDARY> db.printSlaveReplicationInfo();source: mongod122:10000 syncedTo: Mon Jun 12 2017 03:11:41 GMT+0800 (CST) 0 secs (0 hrs) behind the primary source: mongod121:10000 syncedTo: Sun Jun 11 2017 23:52:12 GMT+0800 (CST) 11969 secs (3.32 hrs) behind the primary
待延时追上后,一切归于正常。
至今,还不知道是什么原因导致的该分片频繁发生此类错误,后续如有进展,再更新。
节点
错误
原因
角色
事故
权限
次数
问题
服务
内存
文件
类型
集群
更新
认证
配置
最大
重要
频繁
意外
数据库的安全要保护哪些东西
数据库安全各自的含义是什么
生产安全数据库录入
数据库的安全性及管理
数据库安全策略包含哪些
海淀数据库安全审计系统
建立农村房屋安全信息数据库
易用的数据库客户端支持安全管理
连接数据库失败ssl安全错误
数据库的锁怎样保障安全
数据库具有什么的基本特点
企业服务器管理助手
电脑服务器解说
甘肃玖贰柒互联网科技
网络安全三维动画
限制数据库条数据类型
数据库管理员最需要哪种能力
软件开发项目管理松弛时间
海康人脸识别机怎么添加到服务器
网络安全的英语小报
相比其他商用关系型数据库产品
游戏软件开发容易学吗
千万级数据库连表查询优化
数据库关系最基本的要求
曙光服务器开启ahci
网络安全提示用户
软件开发项目所属学科
影响服务器带宽
网络安全态势通报
上海归虹网络技术有限公司
新闻周刊网络安全评论
企业网络安全工程师简笔画
数据库原理求闭包
kafka自动删除数据库
浦东新区咨询软件开发活动方案
dell服务器售后哈尔滨
数据库设置环境变量
服务器维护好了不能开机
tplink改打印机服务器
发送和接收邮件的服务器叫什么