PostgreSQL 跟踪checkpointer出现死锁
发表于:2025-11-10 作者:千家信息网编辑
千家信息网最后更新 2025年11月10日,gdb跟踪checkpointer进程,出现死锁,Mark一下.跟踪checkpointer进程,查看共享内存中的信(heckpointerShmem->requests)(gdb) p Checkp
千家信息网最后更新 2025年11月10日PostgreSQL 跟踪checkpointer出现死锁
gdb跟踪checkpointer进程,出现死锁,Mark一下.
跟踪checkpointer进程,查看共享内存中的信(heckpointerShmem->requests)
(gdb) p CheckpointerShmem->requests[150] ... $16 = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 26185}, forknum = MAIN_FORKNUM, segno = 0} (gdb) p CheckpointerShmem->requests[200] Cannot access memory at address 0xf9fb18 ... 然后,请求checkpoint的进程报错
testdb=# update t_wal_ckpt set c2 = 'C2#'||substr(c2,4,40); UPDATE 8192 testdb=# checkpoint; 2019-01-07 12:30:32.114 CST [1418] PANIC: stuck spinlock detected at RequestCheckpoint, checkpointer.c:1050 2019-01-07 12:30:32.114 CST [1418] STATEMENT: checkpoint; 2019-01-07 12:30:37.081 CST [1390] PANIC: stuck spinlock detected at FirstCallSinceLastCheckpoint, checkpointer.c:1376 2019-01-07 12:30:38.610 CST [1370] LOG: background writer process (PID 1390) was terminated by signal 6: Aborted 2019-01-07 12:30:38.610 CST [1370] LOG: terminating any other active server processes 2019-01-07 12:30:38.611 CST [1392] WARNING: terminating connection because of crash of another server process 2019-01-07 12:30:38.611 CST [1392] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2019-01-07 12:30:38.611 CST [1392] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2019-01-07 12:30:38.613 CST [1558] WARNING: terminating connection because of crash of another server process 2019-01-07 12:30:38.613 CST [1558] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2019-01-07 12:30:38.613 CST [1558] HINT: In a moment you should be able to reconnect to the database and repeat your command. PANIC: stuck spinlock detected at RequestCheckpoint, checkpointer.c:1050 server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: 2019-01-07 12:30:54.041 CST [1560] FATAL: the database system is in recovery mode Failed. !> 尝试重新连接,发现DB已coredump.
[xdb@localhost ~]$ [xdb@localhost ~]$ psql -d testdb 2019-01-07 14:10:16.114 CST [1629] FATAL: the database system is in recovery mode psql: FATAL: the database system is in recovery mode 执行恢复
[xdb@localhost ~]$ pg_ctl start pg_ctl: another server might be running; trying to start server anyway waiting for server to start....2019-01-07 14:11:50.821 CST [1632] FATAL: lock file "postmaster.pid" already exists 2019-01-07 14:11:50.821 CST [1632] HINT: Is another postmaster (PID 1370) running in data directory "/data/xdb/pg111db"? stopped waiting pg_ctl: could not start server Examine the log output. [xdb@localhost ~]$ find /data/xdb -name postmaster.pid /data/xdb/pg111db/postmaster.pid [xdb@localhost ~]$ rm -rf /data/xdb/pg111db/postmaster.pid [xdb@localhost ~]$ pg_ctl start waiting for server to start....2019-01-07 14:12:44.578 CST [1639] LOG: could not bind IPv6 address "::1": Address already in use [xdb@localhost ~]$ ps -ef|grep postgres xdb 1370 1 0 12:01 pts/0 00:00:02 /appdb/atlasdb/pg11.1/bin/postgres xdb 1389 1370 0 12:01 ? 00:00:00 [postgres] xdb 1641 1332 0 14:12 pts/0 00:00:00 grep --color=auto postgres [xdb@localhost ~]$ kill -9 1370 [xdb@localhost ~]$ pg_ctl start waiting for server to start....2019-01-07 14:13:33.125 CST [1648] LOG: listening on IPv6 address "::1", port 5432 2019-01-07 14:13:33.125 CST [1648] LOG: listening on IPv4 address "127.0.0.1", port 5432 2019-01-07 14:13:33.142 CST [1648] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" .2019-01-07 14:13:34.361 CST [1649] LOG: database system was interrupted; last known up at 2019-01-07 12:26:22 CST 2019-01-07 14:13:34.818 CST [1649] LOG: database system was not properly shut down; automatic recovery in progress 2019-01-07 14:13:34.863 CST [1649] LOG: redo starts at 1/48F9ED08 .2019-01-07 14:13:35.467 CST [1649] LOG: invalid record length at 1/4914FF58: wanted 24, got 0 2019-01-07 14:13:35.467 CST [1649] LOG: redo done at 1/4914FF30 2019-01-07 14:13:35.467 CST [1649] LOG: last completed transaction was at log time 2019-01-07 12:28:37.521542+08 2019-01-07 14:13:35.977 CST [1648] LOG: database system is ready to accept connections done server started 经分析,是因为共享内存结构中的CheckpointerShmem->ckpt_lck导致的.
在跟踪checkpointer进程时,执行
SpinLockRelease(&CheckpointerShmem->ckpt_lck); 释放lock后,不再出现上述问题.
进程
跟踪
内存
死锁
结构
问题
分析
尝试
数据库的安全要保护哪些东西
数据库安全各自的含义是什么
生产安全数据库录入
数据库的安全性及管理
数据库安全策略包含哪些
海淀数据库安全审计系统
建立农村房屋安全信息数据库
易用的数据库客户端支持安全管理
连接数据库失败ssl安全错误
数据库的锁怎样保障安全
网络安全自查情况项目
网络安全角色模型
网络安全公益广播获奖作品
aspen数据库怎么选择
广州新华互联网科技学校官网
税控盘提示访问数据库错
广东绿力网络技术有限公司地址
山东省瑞希互联网科技有限公司
商务网络安全项目
国际认证软件开发证书
国防网络安全ppt
浙江网络安全课程
服务器 数字进程管理
一万人在线直播服务器配置
想学手机软件开发零基础怎么学
数据库的脱机与联机的基本概念
java中导出表格数据库
无锡智能化软件开发框架
科美c1060服务器怎么进入
网络安全主题班会图片素材
软件开发做项目有多忙
税控盘提示访问数据库错
html5显示数据库数据
jsp数据库插入数据
红色警戒电脑版哪个服务器好
网络安全宣传理念
拼装服务器算原装正品吗
做软件开发月薪怎么样
中国电子信息网络安全产业基地
软件开发不适合