What does the mdread function do in PostgreSQL?
Posted: 2025-11-10 Author: 千家信息网 editorial staff
千家信息网, last updated 2025-11-10.
This article explains what the mdread function does in PostgreSQL: the data structures it relies on, its source code, and a gdb trace of a real call.
mdread is the read routine of PostgreSQL's magnetic disk (md) storage manager.
1. Data structures
smgrsw
The f_smgr struct of function pointers defines the API between smgr.c and any individual storage manager module.
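The dispatch idea behind such a table can be sketched in isolation. The following is only a toy under stated assumptions: toy_smgr, toy_mdread, toy_smgrsw and toy_smgrread are hypothetical names invented for illustration, and the struct keeps just two of the real slots. It is not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for f_smgr: one slot per operation. */
typedef struct toy_smgr
{
    void (*init) (void);    /* may be NULL, like smgr_init */
    int  (*read) (unsigned blocknum, char *buffer, size_t len);
} toy_smgr;

/* Hypothetical "md" read: fills the buffer with a marker derived from blocknum. */
static int
toy_mdread(unsigned blocknum, char *buffer, size_t len)
{
    memset(buffer, (int) (blocknum & 0xFF), len);
    return (int) len;
}

/* The switch table: entry 0 is the only storage manager. */
static const toy_smgr toy_smgrsw[] = {
    {.init = NULL, .read = toy_mdread},
};

#define TOY_NSMGR (sizeof(toy_smgrsw) / sizeof(toy_smgrsw[0]))

/* Generic entry point: callers pick a table index, never a concrete function. */
static int
toy_smgrread(size_t which, unsigned blocknum, char *buffer, size_t len)
{
    assert(which < TOY_NSMGR);
    return toy_smgrsw[which].read(blocknum, buffer, len);
}
```

Calling toy_smgrread(0, 7, buf, sizeof(buf)) goes through the table to toy_mdread. Adding a second storage manager would mean adding a second table entry, with no change to the callers.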
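md stands for magnetic disk.
Besides md, PostgreSQL once also supported a Sony WORM optical disk jukebox and persistent main memory as storage managers, but both were later dropped, leaving only magnetic disk.
The name "magnetic disk" is itself misleading: md works on any kind of device for which the operating system provides a standard file system.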
/*
 * This struct of function pointers defines the API between smgr.c and
 * any individual storage manager module.  Note that smgr subfunctions are
 * generally expected to report problems via elog(ERROR).  An exception is
 * that smgr_unlink should use elog(WARNING), rather than erroring out,
 * because we normally unlink relations during post-commit/abort cleanup,
 * and so it's too late to raise an error.  Also, various conditions that
 * would normally be errors should be allowed during bootstrap and/or WAL
 * recovery --- see comments in md.c for details.
 */
typedef struct f_smgr
{
    void        (*smgr_init) (void);        /* may be NULL */
    void        (*smgr_shutdown) (void);    /* may be NULL */
    void        (*smgr_close) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_create) (SMgrRelation reln, ForkNumber forknum,
                                bool isRedo);
    bool        (*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_unlink) (RelFileNodeBackend rnode, ForkNumber forknum,
                                bool isRedo);
    void        (*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
                                BlockNumber blocknum, char *buffer, bool skipFsync);
    void        (*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
                                  BlockNumber blocknum);
    void        (*smgr_read) (SMgrRelation reln, ForkNumber forknum,
                              BlockNumber blocknum, char *buffer);
    void        (*smgr_write) (SMgrRelation reln, ForkNumber forknum,
                               BlockNumber blocknum, char *buffer, bool skipFsync);
    void        (*smgr_writeback) (SMgrRelation reln, ForkNumber forknum,
                                   BlockNumber blocknum, BlockNumber nblocks);
    BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
                                  BlockNumber nblocks);
    void        (*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
    void        (*smgr_pre_ckpt) (void);    /* may be NULL */
    void        (*smgr_sync) (void);        /* may be NULL */
    void        (*smgr_post_ckpt) (void);   /* may be NULL */
} f_smgr;

/*
 * md stands for magnetic disk.  PostgreSQL formerly also supported a Sony
 * WORM optical disk jukebox and persistent main memory, but only magnetic
 * disk remains.  The name is misleading: md supports any kind of device for
 * which the operating system provides a standard file system.
 */
static const f_smgr smgrsw[] = {
    /* magnetic disk */
    {
        .smgr_init = mdinit,
        .smgr_shutdown = NULL,
        .smgr_close = mdclose,
        .smgr_create = mdcreate,
        .smgr_exists = mdexists,
        .smgr_unlink = mdunlink,
        .smgr_extend = mdextend,
        .smgr_prefetch = mdprefetch,
        .smgr_read = mdread,
        .smgr_write = mdwrite,
        .smgr_writeback = mdwriteback,
        .smgr_nblocks = mdnblocks,
        .smgr_truncate = mdtruncate,
        .smgr_immedsync = mdimmedsync,
        .smgr_pre_ckpt = mdpreckpt,
        .smgr_sync = mdsync,
        .smgr_post_ckpt = mdpostckpt
    }
};

MdfdVec
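The magnetic disk storage manager keeps track of open file descriptors in its own descriptor pool.
This makes it easier to support relations that are larger than the operating system's file size limit (often 2 GB).
To do so, a relation is broken up into "segment" files, each shorter than the OS file size limit.
The segment size is set by the RELSEG_SIZE configuration constant in pg_config.h.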
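How a block number maps onto a segment file and a byte offset can be shown with a little arithmetic. This is a minimal sketch, assuming the default build values of 8192-byte blocks and 131072 blocks per segment; toy_block_location and toy_locate_block are hypothetical names, not part of PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default build values: 8 kB blocks, 131072 blocks (1 GB) per segment. */
#define TOY_BLCKSZ      8192
#define TOY_RELSEG_SIZE 131072

/* Hypothetical result type: where a block lives on disk. */
typedef struct toy_block_location
{
    uint32_t segno;     /* index of the segment file holding the block */
    int64_t  seekpos;   /* byte offset of the block inside that file */
} toy_block_location;

/* Split a relation-wide block number into (segment, offset within segment). */
static toy_block_location
toy_locate_block(uint32_t blocknum)
{
    toy_block_location loc;

    loc.segno = blocknum / TOY_RELSEG_SIZE;
    loc.seekpos = (int64_t) TOY_BLCKSZ * (blocknum % TOY_RELSEG_SIZE);

    /* The offset must always fall inside a single segment file. */
    assert(loc.seekpos < (int64_t) TOY_BLCKSZ * TOY_RELSEG_SIZE);
    return loc;
}
```

For block 50 this gives segment 0 and offset 409600, which matches the mdfd_segno and seekpos values printed in the gdb session in the trace section. Block 131072 would be the first block of segment 1, at offset 0.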
/*
 * The magnetic disk storage manager keeps track of open file
 * descriptors in its own descriptor pool.  This is done to make it
 * easier to support relations that are larger than the operating
 * system's file size limit (often 2GBytes).  In order to do that,
 * we break relations up into "segment" files that are each shorter than
 * the OS file size limit.  The segment size is set by the RELSEG_SIZE
 * configuration constant in pg_config.h.
 *
 * On disk, a relation must consist of consecutively numbered segment
 * files in the pattern
 *  -- Zero or more full segments of exactly RELSEG_SIZE blocks each
 *  -- Exactly one partial segment of size 0 <= size < RELSEG_SIZE blocks
 *  -- Optionally, any number of inactive segments of size 0 blocks.
 * The full and partial segments are collectively the "active" segments.
 * Inactive segments are those that once contained data but are currently
 * not needed because of an mdtruncate() operation.  The reason for leaving
 * them present at size zero, rather than unlinking them, is that other
 * backends and/or the checkpointer might be holding open file references to
 * such segments.  If the relation expands again after mdtruncate(), such
 * that a deactivated segment becomes active again, it is important that
 * such file references still be valid --- else data might get written
 * out to an unlinked old copy of a segment file that will eventually
 * disappear.
 *
 * File descriptors are stored in the per-fork md_seg_fds arrays inside
 * SMgrRelation.  The length of these arrays is stored in md_num_open_segs.
 * Note that a fork's md_num_open_segs having a specific value does not
 * necessarily mean the relation doesn't have additional segments; we may
 * just not have opened the next segment yet.  (We could not have "all
 * segments are in the array" as an invariant anyway, since another backend
 * could extend the relation while we aren't looking.)  We do not have
 * entries for inactive segments, however; as soon as we find a partial
 * segment, we assume that any subsequent segments are inactive.
 *
 * The entire MdfdVec array is palloc'd in the MdCxt memory context.
 */
typedef struct _MdfdVec
{
    File        mdfd_vfd;       /* fd number in fd.c's pool */
    BlockNumber mdfd_segno;     /* segment number, from 0 */
} MdfdVec;

2. Source code walkthrough
mdread() reads the specified block from a relation.
The code is fairly simple: the actual read is done by calling FileRead.
/*
 *  mdread() -- Read the specified block from a relation.
 */
void
mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
       char *buffer)
{
    off_t       seekpos;    /* byte offset of the block within its segment */
    int         nbytes;     /* number of bytes actually read */
    MdfdVec    *v;          /* descriptor of the segment holding the block */

    TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
                                        reln->smgr_rnode.node.spcNode,
                                        reln->smgr_rnode.node.dbNode,
                                        reln->smgr_rnode.node.relNode,
                                        reln->smgr_rnode.backend);

    /* Locate the segment that contains the block */
    v = _mdfd_getseg(reln, forknum, blocknum, false,
                     EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);

    /* Compute the block's offset within that segment */
    seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));

    /* Sanity check: the offset must lie inside one segment */
    Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);

    /* Read the block into buffer; FileRead returns the number of bytes read */
    nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_READ);

    /* Tracing */
    TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
                                       reln->smgr_rnode.node.spcNode,
                                       reln->smgr_rnode.node.dbNode,
                                       reln->smgr_rnode.node.relNode,
                                       reln->smgr_rnode.backend,
                                       nbytes,
                                       BLCKSZ);

    if (nbytes != BLCKSZ)
    {
        /* The number of bytes read differs from the block size */
        if (nbytes < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not read block %u in file \"%s\": %m",
                            blocknum, FilePathName(v->mdfd_vfd))));

        /*
         * Short read: we are at or past EOF, or we read a partial block at
         * EOF.  Normally this is an error; upper levels should never try to
         * read a nonexistent block.  However, if zero_damaged_pages is ON or
         * we are InRecovery, we should instead return zeroes without
         * complaining.  This allows, for example, the case of trying to
         * update a block that was later truncated away.
         */
        if (zero_damaged_pages || InRecovery)
            MemSet(buffer, 0, BLCKSZ);
        else
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg("could not read block %u in file \"%s\": read only %d of %d bytes",
                            blocknum, FilePathName(v->mdfd_vfd),
                            nbytes, BLCKSZ)));
    }
}

3. Trace analysis
Test script
11:15:11 (xdb@[local]:5432)testdb=# insert into t1(id) select generate_series(100,500);
Start gdb and trace the backend.
Inspect the call stack
(gdb) b mdread
Breakpoint 3 at 0x8b669b: file md.c, line 738.
(gdb) c
Continuing.

Breakpoint 3, mdread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at md.c:738
738     TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
(gdb) bt
#0  mdread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at md.c:738
#1  0x00000000008b92d5 in smgrread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at smgr.c:628
#2  0x00000000008793f9 in ReadBuffer_common (smgr=0x2d09be0, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=50, mode=RBM_NORMAL, strategy=0x0, hit=0x7ffd5fb2948b) at bufmgr.c:890
#3  0x0000000000878cd4 in ReadBufferExtended (reln=0x7f3836e1e788, forkNum=MAIN_FORKNUM, blockNum=50, mode=RBM_NORMAL, strategy=0x0) at bufmgr.c:664
#4  0x0000000000878bb1 in ReadBuffer (reln=0x7f3836e1e788, blockNum=50) at bufmgr.c:596
#5  0x00000000004eeb96 in ReadBufferBI (relation=0x7f3836e1e788, targetBlock=50, bistate=0x0) at hio.c:87
#6  0x00000000004ef387 in RelationGetBufferForTuple (relation=0x7f3836e1e788, len=32, otherBuffer=0, options=0, bistate=0x0, vmbuffer=0x7ffd5fb295ec, vmbuffer_other=0x0) at hio.c:415
#7  0x00000000004df1f8 in heap_insert (relation=0x7f3836e1e788, tup=0x2ca6770, cid=0, options=0, bistate=0x0) at heapam.c:2468
#8  0x0000000000709dda in ExecInsert (mtstate=0x2ca4c40, slot=0x2ca3418, planSlot=0x2ca3418, estate=0x2ca48d8, canSetTag=true) at nodeModifyTable.c:529
#9  0x000000000070c475 in ExecModifyTable (pstate=0x2ca4c40) at nodeModifyTable.c:2159
#10 0x00000000006e05cb in ExecProcNodeFirst (node=0x2ca4c40) at execProcnode.c:445
#11 0x00000000006d552e in ExecProcNode (node=0x2ca4c40) at ../../../src/include/executor/executor.h:247
#12 0x00000000006d7d66 in ExecutePlan (estate=0x2ca48d8, planstate=0x2ca4c40, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x2d41a30, execute_once=true) at execMain.c:1723
#13 0x00000000006d5af8 in standard_ExecutorRun (queryDesc=0x2ca24b8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
#14 0x00000000006d5920 in ExecutorRun (queryDesc=0x2ca24b8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:307
#15 0x00000000008c1092 in ProcessQuery (plan=0x2d418b8, sourceText=0x2c7eec8 "insert into t1(id) select generate_series(100,500);", params=0x0, queryEnv=0x0, dest=0x2d41a30,
---Type <return> to continue, or q to quit---
    completionTag=0x7ffd5fb29b80 "") at pquery.c:161
#16 0x00000000008c29a1 in PortalRunMulti (portal=0x2ce4488, isTopLevel=true, setHoldSnapshot=false, dest=0x2d41a30, altdest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:1286
#17 0x00000000008c1f7a in PortalRun (portal=0x2ce4488, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2d41a30, altdest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:799
#18 0x00000000008bbf16 in exec_simple_query (query_string=0x2c7eec8 "insert into t1(id) select generate_series(100,500);") at postgres.c:1145
#19 0x00000000008c01a1 in PostgresMain (argc=1, argv=0x2ca8af8, dbname=0x2ca8960 "testdb", username=0x2c7bba8 "xdb") at postgres.c:4182
#20 0x000000000081e07c in BackendRun (port=0x2ca0940) at postmaster.c:4361
#21 0x000000000081d7ef in BackendStartup (port=0x2ca0940) at postmaster.c:4033
#22 0x0000000000819be9 in ServerLoop () at postmaster.c:1706
#23 0x000000000081949f in PostmasterMain (argc=1, argv=0x2c79b60) at postmaster.c:1379
#24 0x0000000000742941 in main (argc=1, argv=0x2c79b60) at main.c:228
(gdb)
Compute the read offset
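(gdb) n
744     v = _mdfd_getseg(reln, forknum, blocknum, false,
(gdb)
747     seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
(gdb) p *v
$1 = {mdfd_vfd = 26, mdfd_segno = 0}
(gdb) p BLCKSZ
$2 = 8192
(gdb) p blocknum
$3 = 50
(gdb) p RELSEG_SIZE
$4 = 131072
(gdb) n
749     Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
(gdb) p seekpos
$5 = 409600
(gdb)

Perform the read
The traced build first positions the file with FileSeek and then calls FileRead without an offset argument, whereas the mdread listing above passes seekpos to FileRead directly; the two come from different source versions, but the logic is the same.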
(gdb) n
751     if (FileSeek(v->mdfd_vfd, seekpos, SEEK_SET) != seekpos)
(gdb)
757     nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ, WAIT_EVENT_DATA_FILE_READ);
(gdb)
759     TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
(gdb) p nbytes
$6 = 8192
(gdb) p *buffer
$7 = 1 '\001'
(gdb) n
767     if (nbytes != BLCKSZ)
(gdb)
792 }
(gdb)
smgrread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "\001") at smgr.c:629
629 }
(gdb)
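The control flow the trace steps through (compute the offset, read one block, then distinguish a full read, an I/O error and a short read) can be approximated outside PostgreSQL with plain POSIX calls. This is a minimal sketch, assuming the whole relation lives in a single file with no segments. toy_read_block and TOY_BLCKSZ are hypothetical names, the zero_on_short_read flag stands in for the zero_damaged_pages || InRecovery test, and errors are returned to the caller instead of being raised through ereport.

```c
#define _XOPEN_SOURCE 700   /* for pread() and mkstemp() under strict C modes */

#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BLCKSZ 8192     /* assumed block size, matching the gdb output */

/*
 * Read block `blocknum` from `fd` into `buffer` (TOY_BLCKSZ bytes).
 * Returns 0 on success and -1 on failure.
 */
static int
toy_read_block(int fd, unsigned blocknum, char *buffer, bool zero_on_short_read)
{
    off_t   seekpos = (off_t) TOY_BLCKSZ * blocknum;
    ssize_t nbytes;

    assert(buffer != NULL);

    /* pread() takes the offset directly, so no separate seek is needed. */
    nbytes = pread(fd, buffer, TOY_BLCKSZ, seekpos);

    if (nbytes == TOY_BLCKSZ)
        return 0;

    if (nbytes < 0)
    {
        /* Genuine I/O error. */
        fprintf(stderr, "could not read block %u: %s\n",
                blocknum, strerror(errno));
        return -1;
    }

    /* Short read: at or past EOF, or only a partial block at EOF. */
    if (zero_on_short_read)
    {
        memset(buffer, 0, TOY_BLCKSZ);
        return 0;
    }

    fprintf(stderr, "could not read block %u: read only %zd of %d bytes\n",
            blocknum, nbytes, TOY_BLCKSZ);
    return -1;
}
```

With zero_on_short_read set to false, a block that lies beyond the end of the file is reported as a failure; with it set to true, the caller receives a zero-filled block, which mirrors the lenient branch in mdread.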