Which function does a PostgreSQL checkpoint use to flush a single dirty page?
Published: 2025-11-07 Author: 千家信息网 editorial staff
This article explains which function a PostgreSQL checkpoint uses to flush a single dirty page. It covers the relevant macro definitions, reads through the source of SyncOneBuffer and FlushBuffer, and then traces the call in gdb.
I. Data structures
Macro definitions
Checkpoint request flag bits.
/*
 * OR-able request flag bits for checkpoints.  The "cause" bits are used only
 * for logging purposes.  Note: the flags must be defined so that it's
 * sensible to OR together request flags arising from different requestors.
 */

/* These directly affect the behavior of CreateCheckPoint and subsidiaries */
#define CHECKPOINT_IS_SHUTDOWN      0x0001  /* Checkpoint is for shutdown */
#define CHECKPOINT_END_OF_RECOVERY  0x0002  /* Like shutdown checkpoint, but
                                             * issued at end of WAL recovery */
#define CHECKPOINT_IMMEDIATE        0x0004  /* Do it without delays */
#define CHECKPOINT_FORCE            0x0008  /* Force even if no activity */
#define CHECKPOINT_FLUSH_ALL        0x0010  /* Flush all pages, including those
                                             * belonging to unlogged tables */
/* These are important to RequestCheckpoint */
#define CHECKPOINT_WAIT             0x0020  /* Wait for completion */
#define CHECKPOINT_REQUESTED        0x0040  /* Checkpoint request has been made */
/* These indicate the cause of a checkpoint request */
#define CHECKPOINT_CAUSE_XLOG       0x0080  /* XLOG consumption */
#define CHECKPOINT_CAUSE_TIME       0x0100  /* Elapsed time */
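II. Source code walkthrough
SyncOneBuffer processes a single buffer during syncing. Its main logic is:
1. Get the buffer descriptor
2. Lock the buffer header
3. Check the buffer state and the input parameters, returning early if the buffer should be skipped or is not dirty
4. Pin the dirty page, take a share lock on its content, and call FlushBuffer to write it out
5. Release the lock, unpin the buffer, and schedule the writeback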
/*
 * SyncOneBuffer -- process a single buffer during syncing.
 *
 * If skip_recently_used is true, we don't write currently-pinned buffers, nor
 * buffers marked recently used, as these are not replacement candidates.
 *
 * Returns a bitmask containing the following flag bits:
 *  BUF_WRITTEN: we wrote the buffer.
 *  BUF_REUSABLE: buffer is available for replacement, ie, it has
 *      pin count 0 and usage count 0.
 *
 * (BUF_WRITTEN could be set in error if FlushBuffers finds the buffer clean
 * after locking it, but we don't care all that much.)
 *
 * Note: caller must have done ResourceOwnerEnlargeBuffers.
 */
static int
SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
{
    BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
    int         result = 0;
    uint32      buf_state;
    BufferTag   tag;

    ReservePrivateRefCountEntry();

    /*
     * Check whether buffer needs writing.
     *
     * We can make this check without taking the buffer content lock so long
     * as we mark pages dirty in access methods *before* logging changes with
     * XLogInsert(): if someone marks the buffer dirty just after our check we
     * don't worry because our checkpoint.redo points before log record for
     * upcoming changes and so we are not required to write such dirty buffer.
     */
    buf_state = LockBufHdr(bufHdr);

    if (BUF_STATE_GET_REFCOUNT(buf_state) == 0 &&
        BUF_STATE_GET_USAGECOUNT(buf_state) == 0)
    {
        result |= BUF_REUSABLE;
    }
    else if (skip_recently_used)
    {
        /* Caller told us not to write recently-used buffers */
        UnlockBufHdr(bufHdr, buf_state);
        return result;
    }

    if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))
    {
        /* It's clean (invalid or not dirty), so nothing to do */
        UnlockBufHdr(bufHdr, buf_state);
        return result;
    }

    /*
     * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
     * buffer is clean by the time we've locked it.)
     */
    PinBuffer_Locked(bufHdr);
    LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);

    /*
     * Call FlushBuffer.  If the caller has an smgr reference for the
     * buffer's relation, pass it as the second parameter.  If not, pass NULL.
     */
    FlushBuffer(bufHdr, NULL);

    LWLockRelease(BufferDescriptorGetContentLock(bufHdr));

    tag = bufHdr->tag;

    UnpinBuffer(bufHdr, true);

    ScheduleBufferTagForWriteback(wb_context, &tag);

    return result | BUF_WRITTEN;
}
FlushBuffer
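FlushBuffer physically writes out a shared buffer. The write itself is performed by smgrwrite (storage manager write), which hands the page to the operating system kernel.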
/*
 * FlushBuffer
 *      Physically write out a shared buffer.
 *
 * NOTE: this actually just passes the buffer contents to the kernel; the
 * real write to disk won't happen until the kernel feels like it.  This
 * is okay from our point of view since we can redo the changes from WAL.
 * However, we will need to force the changes to disk via fsync before
 * we can checkpoint WAL.
 *
 * The caller must hold a pin on the buffer and have share-locked the
 * buffer contents.  (Note: a share-lock does not prevent updates of
 * hint bits in the buffer, so the page could change while the write
 * is in progress, but we assume that that will not invalidate the data
 * written.)
 *
 * If the caller has an smgr reference for the buffer's relation, pass it
 * as the second parameter.  If not, pass NULL.
 */
static void
FlushBuffer(BufferDesc *buf, SMgrRelation reln)
{
    XLogRecPtr  recptr;
    ErrorContextCallback errcallback;
    instr_time  io_start,
                io_time;
    Block       bufBlock;
    char       *bufToWrite;
    uint32      buf_state;

    /*
     * Acquire the buffer's io_in_progress lock.  If StartBufferIO returns
     * false, then someone else flushed the buffer before we could, so we need
     * not do anything.
     */
    if (!StartBufferIO(buf, false))
        return;

    /* Setup error traceback support for ereport() */
    errcallback.callback = shared_buffer_write_error_callback;
    errcallback.arg = (void *) buf;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* Find smgr relation for buffer */
    if (reln == NULL)
        reln = smgropen(buf->tag.rnode, InvalidBackendId);

    TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
                                        buf->tag.blockNum,
                                        reln->smgr_rnode.node.spcNode,
                                        reln->smgr_rnode.node.dbNode,
                                        reln->smgr_rnode.node.relNode);

    buf_state = LockBufHdr(buf);

    /*
     * Run PageGetLSN while holding header lock, since we don't have the
     * buffer locked exclusively in all cases.
     */
    recptr = BufferGetLSN(buf);

    /* To check if block content changes while flushing. - vadim 01/17/97 */
    buf_state &= ~BM_JUST_DIRTIED;
    UnlockBufHdr(buf, buf_state);

    /*
     * Force XLOG flush up to buffer's LSN.  This implements the basic WAL
     * rule that log updates must hit disk before any of the data-file changes
     * they describe do.
     *
     * However, this rule does not apply to unlogged relations, which will be
     * lost after a crash anyway.  Most unlogged relation pages do not bear
     * LSNs since we never emit WAL records for them, and therefore flushing
     * up through the buffer LSN would be useless, but harmless.  However,
     * GiST indexes use LSNs internally to track page-splits, and therefore
     * unlogged GiST pages bear "fake" LSNs generated by
     * GetFakeLSNForUnloggedRel.  It is unlikely but possible that the fake
     * LSN counter could advance past the WAL insertion point; and if it did
     * happen, attempting to flush WAL through that location would fail, with
     * disastrous system-wide consequences.  To make sure that can't happen,
     * skip the flush if the buffer isn't permanent.
     */
    if (buf_state & BM_PERMANENT)
        XLogFlush(recptr);

    /*
     * Now it's safe to write buffer to disk.  Note that no one else should
     * have been able to write it while we were busy with log flushing because
     * we have the io_in_progress lock.
     */
    bufBlock = BufHdrGetBlock(buf);

    /*
     * Update page checksum if desired.  Since we have only shared lock on the
     * buffer, other processes might be updating hint bits in it, so we must
     * copy the page to private storage if we do checksumming.
     */
    bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);

    if (track_io_timing)
        INSTR_TIME_SET_CURRENT(io_start);

    /*
     * bufToWrite is either the shared buffer or a copy, as appropriate.
     */
    smgrwrite(reln,
              buf->tag.forkNum,
              buf->tag.blockNum,
              bufToWrite,
              false);

    if (track_io_timing)
    {
        INSTR_TIME_SET_CURRENT(io_time);
        INSTR_TIME_SUBTRACT(io_time, io_start);
        pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
        INSTR_TIME_ADD(pgBufferUsage.blk_write_time, io_time);
    }

    pgBufferUsage.shared_blks_written++;

    /*
     * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
     * end the io_in_progress state.
     */
    TerminateBufferIO(buf, true, 0);

    TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(buf->tag.forkNum,
                                       buf->tag.blockNum,
                                       reln->smgr_rnode.node.spcNode,
                                       reln->smgr_rnode.node.dbNode,
                                       reln->smgr_rnode.node.relNode);

    /* Pop the error context stack */
    error_context_stack = errcallback.previous;
}
III. Trace analysis
Test script
testdb=# update t_wal_ckpt set c2 = 'C4#'||substr(c2,4,40);
UPDATE 1
testdb=# checkpoint;
Trace
(gdb) handle SIGINT print nostop pass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y
Signal        Stop  Print  Pass to program  Description
SIGINT        No    Yes    Yes              Interrupt
(gdb) b SyncOneBuffer
Breakpoint 1 at 0x8a7167: file bufmgr.c, line 2357.
(gdb) c
Continuing.
Program received signal SIGINT, Interrupt.
Breakpoint 1, SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7fff27f5ae00) at bufmgr.c:2357
2357      BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
(gdb) n
2358      int result = 0;
(gdb) p *bufHdr
$1 = {tag = {rnode = {spcNode = 1663, dbNode = 16384, relNode = 221290}, forkNum = MAIN_FORKNUM, blockNum = 0}, buf_id = 0, state = {value = 3548905472}, wait_backend_pid = 0, freeNext = -2, content_lock = {tranche = 53, state = {
      value = 536870912}, waiters = {head = 2147483647, tail = 2147483647}}}
(gdb) n
2362      ReservePrivateRefCountEntry();
(gdb)
2373      buf_state = LockBufHdr(bufHdr);
(gdb)
2375      if (BUF_STATE_GET_REFCOUNT(buf_state) == 0 &&
(gdb)
2376          BUF_STATE_GET_USAGECOUNT(buf_state) == 0)
(gdb)
2375      if (BUF_STATE_GET_REFCOUNT(buf_state) == 0 &&
(gdb)
2380      else if (skip_recently_used)
(gdb)
2387      if (!(buf_state & BM_VALID) || !(buf_state & BM_DIRTY))
(gdb)
2398      PinBuffer_Locked(bufHdr);
(gdb) p buf_state
$2 = 3553099776
(gdb) n
2399      LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_SHARED);
(gdb)
2401      FlushBuffer(bufHdr, NULL);
(gdb) step
FlushBuffer (buf=0x7fedc4a68300, reln=0x0) at bufmgr.c:2687
2687      if (!StartBufferIO(buf, false))
(gdb) n
2691      errcallback.callback = shared_buffer_write_error_callback;
(gdb)
2692      errcallback.arg = (void *) buf;
(gdb)
2693      errcallback.previous = error_context_stack;
(gdb)
2694      error_context_stack = &errcallback;
(gdb)
2697      if (reln == NULL)
(gdb)
2698          reln = smgropen(buf->tag.rnode, InvalidBackendId);
(gdb)
2700      TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
(gdb)
2706      buf_state = LockBufHdr(buf);
(gdb)
2712      recptr = BufferGetLSN(buf);
(gdb)
2715      buf_state &= ~BM_JUST_DIRTIED;
(gdb) p recptr
$3 = 16953421760
(gdb) n
2716      UnlockBufHdr(buf, buf_state);
(gdb)
2735      if (buf_state & BM_PERMANENT)
(gdb)
2736          XLogFlush(recptr);
(gdb)
2743      bufBlock = BufHdrGetBlock(buf);
(gdb)
2750      bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);
(gdb) p bufBlock
$4 = (Block) 0x7fedc4e68300
(gdb) n
2752      if (track_io_timing)
(gdb)
2758      smgrwrite(reln,
(gdb)
2764      if (track_io_timing)
(gdb)
2772      pgBufferUsage.shared_blks_written++;
(gdb)
2778      TerminateBufferIO(buf, true, 0);
(gdb)
2780      TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(buf->tag.forkNum,
(gdb)
2787      error_context_stack = errcallback.previous;
(gdb)
2788    }
(gdb)
SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7fff27f5ae00) at bufmgr.c:2403
2403      LWLockRelease(BufferDescriptorGetContentLock(bufHdr));
(gdb)
2405      tag = bufHdr->tag;
(gdb)
2407      UnpinBuffer(bufHdr, true);
(gdb)
2409      ScheduleBufferTagForWriteback(wb_context, &tag);
(gdb)
2411      return result | BUF_WRITTEN;
(gdb)
2412    }
(gdb)
Thanks for reading. That covers which function a PostgreSQL checkpoint uses to flush a single dirty page: SyncOneBuffer, which delegates the write to FlushBuffer. How this behaves in a specific deployment still needs to be verified in practice.