暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 源码解读:通过 oid 获取Relation对象简析

原创 liaju 2020-11-12
3937

前言

此文基于postgresql源码(version:devel 12)简单了解其内部表格存储的组织方式,对于不相关的细节留作以后深入理解。

源码解析

入口 table.c table_open()

/* ----------------
 *		table_open - open a table relation by relation OID
 *
 *		This is essentially relation_open plus check that the relation
 *		is not an index nor a composite type.  (The caller should also
 *		check that it's not a view or foreign table before assuming it has
 *		storage.)
 * ----------------
 */
Relation
table_open(Oid relationId, LOCKMODE lockmode)
{
	Relation	r;
r = relation_open(relationId, lockmode);

if (r->rd_rel->relkind == RELKIND_INDEX ||
	r->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
	ereport(ERROR,
			(errcode(ERRCODE_WRONG_OBJECT_TYPE),
			 errmsg(""%s" is an index",
					RelationGetRelationName(r))));
else if (r->rd_rel->relkind == RELKIND_COMPOSITE_TYPE)
	ereport(ERROR,
			(errcode(ERRCODE_WRONG_OBJECT_TYPE),
			 errmsg(""%s" is a composite type",
					RelationGetRelationName(r))));

return r;

}

该函数主要有两个参数, 其中的lockmode指获取相应table所需要获取的锁,postgresql内置9种模式。 可以看到这里将主要工作交给了relation_open函数,table_open的主要作用为安全性检查,继续查看table_relation.

relation.c relation_open()

/* ----------------
 *		relation_open - open any relation by relation OID
 *
 *		If lockmode is not "NoLock", the specified kind of lock is
 *		obtained on the relation.  (Generally, NoLock should only be
 *		used if the caller knows it has some appropriate lock on the
 *		relation already.)
 *
 *		An error is raised if the relation does not exist.
 *
 *		NB: a "relation" is anything with a pg_class entry.  The caller is
 *		expected to check whether the relkind is something it can handle.
 * ----------------
 */
Relation
relation_open(Oid relationId, LOCKMODE lockmode)
{
	Relation	r;
Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);

/* Get the lock before trying to open the relcache entry */
if (lockmode != NoLock)
	LockRelationOid(relationId, lockmode);

/* The relcache does all the real work... */
r = RelationIdGetRelation(relationId);

if (!RelationIsValid(r))
	elog(ERROR, "could not open relation with OID %u", relationId);

/*
 * If we didn't get the lock ourselves, assert that caller holds one,
 * except in bootstrap mode where no locks are used.
 */
Assert(lockmode != NoLock ||
	   IsBootstrapProcessingMode() ||
	   CheckRelationLockedByMe(r, AccessShareLock, true));

/* Make note that we've accessed a temporary relation */
if (RelationUsesLocalBuffers(r))
	MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;

pgstat_initstats(r);

return r;

}

核心为调用RelationIdGetRelation()pgstat_inistats()

relcache.c RelationIdGetRelation()

/* ----------------------------------------------------------------
 *				 Relation Descriptor Lookup Interface
 * ----------------------------------------------------------------
 */

/*
 *		RelationIdGetRelation
 *
 *		Lookup a reldesc by OID; make one if not already in cache.
 *
 *		Returns NULL if no pg_class row could be found for the given relid
 *		(suggesting we are trying to access a just-deleted relation).
 *		Any other error is reported via elog.
 *
 *		NB: caller should already have at least AccessShareLock on the
 *		relation ID, else there are nasty race conditions.
 *
 *		NB: relation ref count is incremented, or set to 1 if new entry.
 *		Caller should eventually decrement count.  (Usually,
 *		that happens by calling RelationClose().)
 */
Relation
RelationIdGetRelation(Oid relationId)
{
	Relation	rd;

	/* Make sure we're in an xact, even if this ends up being a cache hit */
	Assert(IsTransactionState());

	/*
	 * first try to find reldesc in the cache
	 */
	RelationIdCacheLookup(relationId, rd);

	if (RelationIsValid(rd))
	{
		RelationIncrementReferenceCount(rd);
		/* revalidate cache entry if necessary */
		if (!rd->rd_isvalid)
		{
			/*
			 * Indexes only have a limited number of possible schema changes,
			 * and we don't want to use the full-blown procedure because it's
			 * a headache for indexes that reload itself depends on.
			 */
			if (rd->rd_rel->relkind == RELKIND_INDEX ||
				rd->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
				RelationReloadIndexInfo(rd);
			else
				RelationClearRelation(rd, true);

			/*
			 * Normally entries need to be valid here, but before the relcache
			 * has been initialized, not enough infrastructure exists to
			 * perform pg_class lookups. The structure of such entries doesn't
			 * change, but we still want to update the rd_rel entry. So
			 * rd_isvalid = false is left in place for a later lookup.
			 */
			Assert(rd->rd_isvalid ||
				   (rd->rd_isnailed && !criticalRelcachesBuilt));
		}
		return rd;
	}

	/*
	 * no reldesc in the cache, so have RelationBuildDesc() build one and add
	 * it.
	 */
	rd = RelationBuildDesc(relationId, true);
	if (RelationIsValid(rd))
		RelationIncrementReferenceCount(rd);
	return rd;
}

该函数涉及postgresql自己的缓存管理模块,由于我们需要了解table的存储组织结构,此处忽略缓存相关代码, 继续hackRelationBuildDesc()

relcache.c RelationBuildDesc()

/*
 *		RelationBuildDesc
 *
 *		Build a relation descriptor.  The caller must hold at least
 *		AccessShareLock on the target relid.
 *
 *		The new descriptor is inserted into the hash table if insertIt is true.
 *
 *		Returns NULL if no pg_class row could be found for the given relid
 *		(suggesting we are trying to access a just-deleted relation).
 *		Any other error is reported via elog.
 */
static Relation
RelationBuildDesc(Oid targetRelId, bool insertIt)
{
	Relation	relation;
	Oid			relid;
	HeapTuple	pg_class_tuple;
	Form_pg_class relp;
/*
 * This function and its subroutines can allocate a good deal of transient
 * data in CurrentMemoryContext.  Traditionally we've just leaked that
 * data, reasoning that the caller's context is at worst of transaction
 * scope, and relcache loads shouldn't happen so often that it's essential
 * to recover transient data before end of statement/transaction.  However
 * that's definitely not true in clobber-cache test builds, and perhaps
 * it's not true in other cases.  If RECOVER_RELATION_BUILD_MEMORY is not
 * zero, arrange to allocate the junk in a temporary context that we'll
 * free before returning.  Make it a child of caller's context so that it
 * will get cleaned up appropriately if we error out partway through.
 */

#if RECOVER_RELATION_BUILD_MEMORY
MemoryContext tmpcxt;
MemoryContext oldcxt;

tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
							   "RelationBuildDesc workspace",
							   ALLOCSET_DEFAULT_SIZES);
oldcxt = MemoryContextSwitchTo(tmpcxt);

#endif

/*
 * find the tuple in pg_class corresponding to the given relation id
 */
pg_class_tuple = ScanPgRelation(targetRelId, true, false);

/*
 * if no such tuple exists, return NULL
 */
if (!HeapTupleIsValid(pg_class_tuple))
{

#if RECOVER_RELATION_BUILD_MEMORY
/* Return to caller's context, and blow away the temporary context */
MemoryContextSwitchTo(oldcxt);
MemoryContextDelete(tmpcxt);
#endif
return NULL;
}

/*
 * get information from the pg_class_tuple
 */
relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
relid = relp->oid;
Assert(relid == targetRelId);

/*
 * allocate storage for the relation descriptor, and copy pg_class_tuple
 * to relation->rd_rel.
 */
relation = AllocateRelationDesc(relp);

/*
 * initialize the relation's relation id (relation->rd_id)
 */
RelationGetRelid(relation) = relid;

/*
 * normal relations are not nailed into the cache; nor can a pre-existing
 * relation be new.  It could be temp though.  (Actually, it could be new
 * too, but it's okay to forget that fact if forced to flush the entry.)
 */
relation->rd_refcnt = 0;
relation->rd_isnailed = false;
relation->rd_createSubid = InvalidSubTransactionId;
relation->rd_newRelfilenodeSubid = InvalidSubTransactionId;
switch (relation->rd_rel->relpersistence)
{
	case RELPERSISTENCE_UNLOGGED:
	case RELPERSISTENCE_PERMANENT:
		relation->rd_backend = InvalidBackendId;
		relation->rd_islocaltemp = false;
		break;
	case RELPERSISTENCE_TEMP:
		if (isTempOrTempToastNamespace(relation->rd_rel->relnamespace))
		{
			relation->rd_backend = BackendIdForTempRelations();
			relation->rd_islocaltemp = true;
		}
		else
		{
			/*
			 * If it's a temp table, but not one of ours, we have to use
			 * the slow, grotty method to figure out the owning backend.
			 *
			 * Note: it's possible that rd_backend gets set to MyBackendId
			 * here, in case we are looking at a pg_class entry left over
			 * from a crashed backend that coincidentally had the same
			 * BackendId we're using.  We should *not* consider such a
			 * table to be "ours"; this is why we need the separate
			 * rd_islocaltemp flag.  The pg_class entry will get flushed
			 * if/when we clean out the corresponding temp table namespace
			 * in preparation for using it.
			 */
			relation->rd_backend =
				GetTempNamespaceBackendId(relation->rd_rel->relnamespace);
			Assert(relation->rd_backend != InvalidBackendId);
			relation->rd_islocaltemp = false;
		}
		break;
	default:
		elog(ERROR, "invalid relpersistence: %c",
			 relation->rd_rel->relpersistence);
		break;
}

/*
 * initialize the tuple descriptor (relation->rd_att).
 */
RelationBuildTupleDesc(relation);

/*
 * Fetch rules and triggers that affect this relation
 */
if (relation->rd_rel->relhasrules)
	RelationBuildRuleLock(relation);
else
{
	relation->rd_rules = NULL;
	relation->rd_rulescxt = NULL;
}

if (relation->rd_rel->relhastriggers)
	RelationBuildTriggers(relation);
else
	relation->trigdesc = NULL;

if (relation->rd_rel->relrowsecurity)
	RelationBuildRowSecurity(relation);
else
	relation->rd_rsdesc = NULL;

/* foreign key data is not loaded till asked for */
relation->rd_fkeylist = NIL;
relation->rd_fkeyvalid = false;

/* if a partitioned table, initialize key and partition descriptor info */
if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
	RelationBuildPartitionKey(relation);
	RelationBuildPartitionDesc(relation);
}
else
{
	relation->rd_partkey = NULL;
	relation->rd_partkeycxt = NULL;
	relation->rd_partdesc = NULL;
	relation->rd_pdcxt = NULL;
}
/* ... but partcheck is not loaded till asked for */
relation->rd_partcheck = NIL;
relation->rd_partcheckvalid = false;
relation->rd_partcheckcxt = NULL;

/*
 * initialize access method information
 */
switch (relation->rd_rel->relkind)
{
	case RELKIND_INDEX:
	case RELKIND_PARTITIONED_INDEX:
		Assert(relation->rd_rel->relam != InvalidOid);
		RelationInitIndexAccessInfo(relation);
		break;
	case RELKIND_RELATION:
	case RELKIND_TOASTVALUE:
	case RELKIND_MATVIEW:
		Assert(relation->rd_rel->relam != InvalidOid);
		RelationInitTableAccessMethod(relation);
		break;
	case RELKIND_SEQUENCE:
		Assert(relation->rd_rel->relam == InvalidOid);
		RelationInitTableAccessMethod(relation);
		break;
	case RELKIND_VIEW:
	case RELKIND_COMPOSITE_TYPE:
	case RELKIND_FOREIGN_TABLE:
	case RELKIND_PARTITIONED_TABLE:
		Assert(relation->rd_rel->relam == InvalidOid);
		break;
}

/* extract reloptions if any */
RelationParseRelOptions(relation, pg_class_tuple);

/*
 * initialize the relation lock manager information
 */
RelationInitLockInfo(relation); /* see lmgr.c */

/*
 * initialize physical addressing information for the relation
 */
RelationInitPhysicalAddr(relation);

/* make sure relation is marked as having no open file yet */
relation->rd_smgr = NULL;

/*
 * now we can free the memory allocated for pg_class_tuple
 */
heap_freetuple(pg_class_tuple);

/*
 * Insert newly created relation into relcache hash table, if requested.
 *
 * There is one scenario in which we might find a hashtable entry already
 * present, even though our caller failed to find it: if the relation is a
 * system catalog or index that's used during relcache load, we might have
 * recursively created the same relcache entry during the preceding steps.
 * So allow RelationCacheInsert to delete any already-present relcache
 * entry for the same OID.  The already-present entry should have refcount
 * zero (else somebody forgot to close it); in the event that it doesn't,
 * we'll elog a WARNING and leak the already-present entry.
 */
if (insertIt)
	RelationCacheInsert(relation, true);

/* It's fully valid */
relation->rd_isvalid = true;

#if RECOVER_RELATION_BUILD_MEMORY
/* Return to caller's context, and blow away the temporary context */
MemoryContextSwitchTo(oldcxt);
MemoryContextDelete(tmpcxt);
#endif

return relation;

}

很关键的一个函数,用来构建关系的描述符。需要仔细了解下Relation, Oid, HeapTuple, Form_pg_class这几个结构体

Relation
typedef struct RelationData
{
	RelFileNode rd_node;		/* relation physical identifier */
	/* use "struct" here to avoid needing to include smgr.h: */
	struct SMgrRelationData *rd_smgr;	/* cached file handle, or NULL */
	int			rd_refcnt;		/* reference count */
	BackendId	rd_backend;		/* owning backend id, if temporary relation */
	bool		rd_islocaltemp; /* rel is a temp rel of this session */
	bool		rd_isnailed;	/* rel is nailed in cache */
	bool		rd_isvalid;		/* relcache entry is valid */
	bool		rd_indexvalid;	/* is rd_indexlist valid? (also rd_pkindex and
								 * rd_replidindex) */
	bool		rd_statvalid;	/* is rd_statlist valid? */
/*
 * rd_createSubid is the ID of the highest subtransaction the rel has
 * survived into; or zero if the rel was not created in the current top
 * transaction.  This can be now be relied on, whereas previously it could
 * be "forgotten" in earlier releases. Likewise, rd_newRelfilenodeSubid is
 * the ID of the highest subtransaction the relfilenode change has
 * survived into, or zero if not changed in the current transaction (or we
 * have forgotten changing it). rd_newRelfilenodeSubid can be forgotten
 * when a relation has multiple new relfilenodes within a single
 * transaction, with one of them occurring in a subsequently aborted
 * subtransaction, e.g. BEGIN; TRUNCATE t; SAVEPOINT save; TRUNCATE t;
 * ROLLBACK TO save; -- rd_newRelfilenode is now forgotten
 */
SubTransactionId rd_createSubid;	/* rel was created in current xact */
SubTransactionId rd_newRelfilenodeSubid;	/* new relfilenode assigned in
											 * current xact */

Form_pg_class rd_rel;		/* RELATION tuple */
TupleDesc	rd_att;			/* tuple descriptor */
Oid			rd_id;			/* relation's object id */
LockInfoData rd_lockInfo;	/* lock mgr's info for locking relation */
RuleLock   *rd_rules;		/* rewrite rules */
MemoryContext rd_rulescxt;	/* private memory cxt for rd_rules, if any */
TriggerDesc *trigdesc;		/* Trigger info, or NULL if rel has none */
/* use "struct" here to avoid needing to include rowsecurity.h: */
struct RowSecurityDesc *rd_rsdesc;	/* row security policies, or NULL */

/* data managed by RelationGetFKeyList: */
List	   *rd_fkeylist;	/* list of ForeignKeyCacheInfo (see below) */
bool		rd_fkeyvalid;	/* true if list has been computed */

struct PartitionKeyData *rd_partkey;	/* partition key, or NULL */
MemoryContext rd_partkeycxt;	/* private context for rd_partkey, if any */
struct PartitionDescData *rd_partdesc;	/* partitions, or NULL */
MemoryContext rd_pdcxt;		/* private context for rd_partdesc, if any */
List	   *rd_partcheck;	/* partition CHECK quals */
bool		rd_partcheckvalid;	/* true if list has been computed */
MemoryContext rd_partcheckcxt;	/* private cxt for rd_partcheck, if any */

/* data managed by RelationGetIndexList: */
List	   *rd_indexlist;	/* list of OIDs of indexes on relation */
Oid			rd_pkindex;		/* OID of primary key, if any */
Oid			rd_replidindex; /* OID of replica identity index, if any */

/* data managed by RelationGetStatExtList: */
List	   *rd_statlist;	/* list of OIDs of extended stats */

/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset  *rd_indexattr;	/* identifies columns used in indexes */
Bitmapset  *rd_keyattr;		/* cols that can be ref'd by foreign keys */
Bitmapset  *rd_pkattr;		/* cols included in primary key */
Bitmapset  *rd_idattr;		/* included in replica identity index */

PublicationActions *rd_pubactions;	/* publication actions */

/*
 * rd_options is set whenever rd_rel is loaded into the relcache entry.
 * Note that you can NOT look into rd_rel for this data.  NULL means "use
 * defaults".
 */
bytea	   *rd_options;		/* parsed pg_class.reloptions */

/*
 * Oid of the handler for this relation. For an index this is a function
 * returning IndexAmRoutine, for table like relations a function returning
 * TableAmRoutine.  This is stored separately from rd_indam, rd_tableam as
 * its lookup requires syscache access, but during relcache bootstrap we
 * need to be able to initialize rd_tableam without syscache lookups.
 */
Oid			rd_amhandler;	/* OID of index AM's handler function */

/*
 * Table access method.
 */
const struct TableAmRoutine *rd_tableam;

/* These are non-NULL only for an index relation: */
Form_pg_index rd_index;		/* pg_index tuple describing this index */
/* use "struct" here to avoid needing to include htup.h: */
struct HeapTupleData *rd_indextuple;	/* all of pg_index tuple */

/*
 * index access support info (used only for an index relation)
 *
 * Note: only default support procs for each opclass are cached, namely
 * those with lefttype and righttype equal to the opclass's opcintype. The
 * arrays are indexed by support function number, which is a sufficient
 * identifier given that restriction.
 *
 * Note: rd_amcache is available for index AMs to cache private data about
 * an index.  This must be just a cache since it may get reset at any time
 * (in particular, it will get reset by a relcache inval message for the
 * index).  If used, it must point to a single memory chunk palloc'd in
 * rd_indexcxt.  A relcache reset will include freeing that chunk and
 * setting rd_amcache = NULL.
 */
MemoryContext rd_indexcxt;	/* private memory cxt for this stuff */
/* use "struct" here to avoid needing to include amapi.h: */
struct IndexAmRoutine *rd_indam;	/* index AM's API struct */
Oid		   *rd_opfamily;	/* OIDs of op families for each index col */
Oid		   *rd_opcintype;	/* OIDs of opclass declared input data types */
RegProcedure *rd_support;	/* OIDs of support procedures */
FmgrInfo   *rd_supportinfo; /* lookup info for support procedures */
int16	   *rd_indoption;	/* per-column AM-specific flags */
List	   *rd_indexprs;	/* index expression trees, if any */
List	   *rd_indpred;		/* index predicate tree, if any */
Oid		   *rd_exclops;		/* OIDs of exclusion operators, if any */
Oid		   *rd_exclprocs;	/* OIDs of exclusion ops' procs, if any */
uint16	   *rd_exclstrats;	/* exclusion ops' strategy numbers, if any */
void	   *rd_amcache;		/* available for use by index AM */
Oid		   *rd_indcollation;	/* OIDs of index collations */

/*
 * foreign-table support
 *
 * rd_fdwroutine must point to a single memory chunk palloc'd in
 * CacheMemoryContext.  It will be freed and reset to NULL on a relcache
 * reset.
 */

/* use "struct" here to avoid needing to include fdwapi.h: */
struct FdwRoutine *rd_fdwroutine;	/* cached function pointers, or NULL */

/*
 * Hack for CLUSTER, rewriting ALTER TABLE, etc: when writing a new
 * version of a table, we need to make any toast pointers inserted into it
 * have the existing toast table's OID, not the OID of the transient toast
 * table.  If rd_toastoid isn't InvalidOid, it is the OID to place in
 * toast pointers inserted into this rel.  (Note it's set on the new
 * version of the main heap, not the toast table itself.)  This also
 * causes toast_save_datum() to try to preserve toast value OIDs.
 */
Oid			rd_toastoid;	/* Real TOAST table's OID, or InvalidOid */

/* use "struct" here to avoid needing to include pgstat.h: */
struct PgStat_TableStatus *pgstat_info; /* statistics collection area */

} RelationData;

Relation为RelationData结构的指针

postgres_ext.h Oid

/*
 * Object ID is a fundamental type in Postgres.
 */
typedef unsigned int Oid;

htup.h HeapTuple

/*
 * HeapTupleData is an in-memory data structure that points to a tuple.
 *
 * There are several ways in which this data structure is used:
 *
 * * Pointer to a tuple in a disk buffer: t_data points directly into the
 *	 buffer (which the code had better be holding a pin on, but this is not
 *	 reflected in HeapTupleData itself).
 *
 * * Pointer to nothing: t_data is NULL.  This is used as a failure indication
 *	 in some functions.
 *
 * * Part of a palloc'd tuple: the HeapTupleData itself and the tuple
 *	 form a single palloc'd chunk.  t_data points to the memory location
 *	 immediately following the HeapTupleData struct (at offset HEAPTUPLESIZE).
 *	 This is the output format of heap_form_tuple and related routines.
 *
 * * Separately allocated tuple: t_data points to a palloc'd chunk that
 *	 is not adjacent to the HeapTupleData.  (This case is deprecated since
 *	 it's difficult to tell apart from case #1.  It should be used only in
 *	 limited contexts where the code knows that case #1 will never apply.)
 *
 * * Separately allocated minimal tuple: t_data points MINIMAL_TUPLE_OFFSET
 *	 bytes before the start of a MinimalTuple.  As with the previous case,
 *	 this can't be told apart from case #1 by inspection; code setting up
 *	 or destroying this representation has to know what it's doing.
 *
 * t_len should always be valid, except in the pointer-to-nothing case.
 * t_self and t_tableOid should be valid if the HeapTupleData points to
 * a disk buffer, or if it represents a copy of a tuple on disk.  They
 * should be explicitly set invalid in manufactured tuples.
 */
typedef struct HeapTupleData
{
	uint32		t_len;			/* length of *t_data */
	ItemPointerData t_self;		/* SelfItemPointer */
	Oid			t_tableOid;		/* table the tuple came from */
#define FIELDNO_HEAPTUPLEDATA_DATA 3
	HeapTupleHeader t_data;		/* -> tuple header and data */
} HeapTupleData;

可以看出该结构主要用来表示一个内存中的元组,ItemPointerData表示该元祖在硬盘上的位置,由blockID和在block上的偏移量确定. t_data一般情况下存储着元组头,紧跟着元祖头的内存空间存着元祖的数据。

itemptr.h
/*
 * ItemPointer:
 *
 * This is a pointer to an item within a disk page of a known file
 * (for example, a cross-link from an index to its parent table).
 * blkid tells us which block, posid tells us which entry in the linp
 * (ItemIdData) array we want.
 *
 * Note: because there is an item pointer in each tuple header and index
 * tuple header on disk, it's very important not to waste space with
 * structure padding bytes.  The struct is designed to be six bytes long
 * (it contains three int16 fields) but a few compilers will pad it to
 * eight bytes unless coerced.  We apply appropriate persuasion where
 * possible.  If your compiler can't be made to play along, you'll waste
 * lots of space.
 */
typedef struct ItemPointerData
{
	BlockIdData ip_blkid;
	OffsetNumber ip_posid;
}

Form_pg_class


/* ----------------
 *		pg_class definition.  cpp turns this into
 *		typedef struct FormData_pg_class
 * ----------------
 */
CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,RelationRelation_Rowtype_Id) BKI_SCHEMA_MACRO
{
	/* oid */
	Oid			oid;
/* class name */
NameData	relname;

/* OID of namespace containing this class */
Oid			relnamespace BKI_DEFAULT(PGNSP);

/* OID of entry in pg_type for table's implicit row type */
Oid			reltype BKI_LOOKUP(pg_type);

/* OID of entry in pg_type for underlying composite type */
Oid			reloftype BKI_DEFAULT(0) BKI_LOOKUP(pg_type);

/* class owner */
Oid			relowner BKI_DEFAULT(PGUID);

/* access method; 0 if not a table / index */
Oid			relam BKI_LOOKUP(pg_am);

/* identifier of physical storage file */
/* relfilenode == 0 means it is a "mapped" relation, see relmapper.c */
Oid			relfilenode;

/* identifier of table space for relation (0 means default for database) */
Oid			reltablespace BKI_DEFAULT(0) BKI_LOOKUP(pg_tablespace);

/* # of blocks (not always up-to-date) */
int32		relpages;

/* # of tuples (not always up-to-date) */
float4		reltuples;

/* # of all-visible blocks (not always up-to-date) */
int32		relallvisible;

/* OID of toast table; 0 if none */
Oid			reltoastrelid;

/* T if has (or has had) any indexes */
bool		relhasindex;

/* T if shared across databases */
bool		relisshared;

/* see RELPERSISTENCE_xxx constants below */
char		relpersistence;

/* see RELKIND_xxx constants below */
char		relkind;

/* number of user attributes */
int16		relnatts;

/*
 * Class pg_attribute must contain exactly "relnatts" user attributes
 * (with attnums ranging from 1 to relnatts) for this class.  It may also
 * contain entries with negative attnums for system attributes.
 */

/* # of CHECK constraints for class */
int16		relchecks;

/* has (or has had) any rules */
bool		relhasrules;

/* has (or has had) any TRIGGERs */
bool		relhastriggers;

/* has (or has had) child tables or indexes */
bool		relhassubclass;

/* row security is enabled or not */
bool		relrowsecurity;

/* row security forced for owners or not */
bool		relforcerowsecurity;

/* matview currently holds query results */
bool		relispopulated;

/* see REPLICA_IDENTITY_xxx constants */
char		relreplident;

/* is relation a partition? */
bool		relispartition;

/* heap for rewrite during DDL, link to original rel */
Oid			relrewrite BKI_DEFAULT(0);

/* all Xids < this are frozen in this rel */
TransactionId relfrozenxid;

/* all multixacts in this rel are >= this; it is really a MultiXactId */
TransactionId relminmxid;

#ifdef CATALOG_VARLEN /* variable-length fields start here /
/
NOTE: These fields are not present in a relcache entry's rd_rel field. /
/
access permissions */
aclitem relacl[1];

/* access-method-specific options */
text		reloptions[1];

/* partition bound node tree */
pg_node_tree relpartbound;

#endif
} FormData_pg_class;

可以把FormData_pg_class理解为一个表的概括。

RelationBuildDesc

有了上面的基础认知,继续阅读RelationBuildDesc的逻辑。

	/*
	 * find the tuple in pg_class corresponding to the given relation id
	 */
	pg_class_tuple = ScanPgRelation(targetRelId, true, false);

可以看出pg_class也是存储在一个关系中,根据oid可以找到对应关系的pg_class元组

	/*
	 * get information from the pg_class_tuple
	 */
	relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);

GETSTRUCT的作用主要是获取元组的data部分(不含头~),然后将这部分数据标示为Form_pg_class. 忽略cache相关代码,观察如何获取关系的属性字典

	/*
	 * initialize the tuple descriptor (relation->rd_att).
	 */
	RelationBuildTupleDesc(relation);
relcache.c RelationBuildTupleDesc()
/*
 *		RelationBuildTupleDesc
 *
 *		Form the relation's tuple descriptor from information in
 *		the pg_attribute, pg_attrdef & pg_constraint system catalogs.
 */
static void
RelationBuildTupleDesc(Relation relation)
{
	HeapTuple	pg_attribute_tuple;
	Relation	pg_attribute_desc;
	SysScanDesc pg_attribute_scan;
	ScanKeyData skey[2];
	int			need;
	TupleConstr *constr;
	AttrDefault *attrdef = NULL;
	AttrMissing *attrmiss = NULL;
	int			ndef = 0;
/* copy some fields from pg_class row to rd_att */
relation->rd_att->tdtypeid = relation->rd_rel->reltype;
relation->rd_att->tdtypmod = -1;	/* unnecessary, but... */

constr = (TupleConstr *) MemoryContextAlloc(CacheMemoryContext,
											sizeof(TupleConstr));
constr->has_not_null = false;
constr->has_generated_stored = false;

/*
 * Form a scan key that selects only user attributes (attnum > 0).
 * (Eliminating system attribute rows at the index level is lots faster
 * than fetching them.)
 */
ScanKeyInit(&skey[0],
			Anum_pg_attribute_attrelid,
			BTEqualStrategyNumber, F_OIDEQ,
			ObjectIdGetDatum(RelationGetRelid(relation)));
ScanKeyInit(&skey[1],
			Anum_pg_attribute_attnum,
			BTGreaterStrategyNumber, F_INT2GT,
			Int16GetDatum(0));

/*
 * Open pg_attribute and begin a scan.  Force heap scan if we haven't yet
 * built the critical relcache entries (this includes initdb and startup
 * without a pg_internal.init file).
 */
pg_attribute_desc = table_open(AttributeRelationId, AccessShareLock);
pg_attribute_scan = systable_beginscan(pg_attribute_desc,
									   AttributeRelidNumIndexId,
									   criticalRelcachesBuilt,
									   NULL,
									   2, skey);

/*
 * add attribute data to relation->rd_att
 */
need = RelationGetNumberOfAttributes(relation);

while (HeapTupleIsValid(pg_attribute_tuple = systable_getnext(pg_attribute_scan)))
{
	Form_pg_attribute attp;
	int			attnum;

	attp = (Form_pg_attribute) GETSTRUCT(pg_attribute_tuple);

	attnum = attp->attnum;
	if (attnum <= 0 || attnum > RelationGetNumberOfAttributes(relation))
		elog(ERROR, "invalid attribute number %d for %s",
			 attp->attnum, RelationGetRelationName(relation));


	memcpy(TupleDescAttr(relation->rd_att, attnum - 1),
		   attp,
		   ATTRIBUTE_FIXED_PART_SIZE);

	/* Update constraint/default info */
	if (attp->attnotnull)
		constr->has_not_null = true;
	if (attp->attgenerated == ATTRIBUTE_GENERATED_STORED)
		constr->has_generated_stored = true;

	/* If the column has a default, fill it into the attrdef array */
	if (attp->atthasdef)
	{
		if (attrdef == NULL)
			attrdef = (AttrDefault *)
				MemoryContextAllocZero(CacheMemoryContext,
									   RelationGetNumberOfAttributes(relation) *
									   sizeof(AttrDefault));
		attrdef[ndef].adnum = attnum;
		attrdef[ndef].adbin = NULL;

		ndef++;
	}

	/* Likewise for a missing value */
	if (attp->atthasmissing)
	{
		Datum		missingval;
		bool		missingNull;

		/* Do we have a missing value? */
		missingval = heap_getattr(pg_attribute_tuple,
								  Anum_pg_attribute_attmissingval,
								  pg_attribute_desc->rd_att,
								  &missingNull);
		if (!missingNull)
		{
			/* Yes, fetch from the array */
			MemoryContext oldcxt;
			bool		is_null;
			int			one = 1;
			Datum		missval;

			if (attrmiss == NULL)
				attrmiss = (AttrMissing *)
					MemoryContextAllocZero(CacheMemoryContext,
										   relation->rd_rel->relnatts *
										   sizeof(AttrMissing));

			missval = array_get_element(missingval,
										1,
										&one,
										-1,
										attp->attlen,
										attp->attbyval,
										attp->attalign,
										&is_null);
			Assert(!is_null);
			if (attp->attbyval)
			{
				/* for copy by val just copy the datum direct */
				attrmiss[attnum - 1].am_value = missval;
			}
			else
			{
				/* otherwise copy in the correct context */
				oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
				attrmiss[attnum - 1].am_value = datumCopy(missval,
														  attp->attbyval,
														  attp->attlen);
				MemoryContextSwitchTo(oldcxt);
			}
			attrmiss[attnum - 1].am_present = true;
		}
	}
	need--;
	if (need == 0)
		break;
}

/*
 * end the scan and close the attribute relation
 */
systable_endscan(pg_attribute_scan);
table_close(pg_attribute_desc, AccessShareLock);

if (need != 0)
	elog(ERROR, "catalog is missing %d attribute(s) for relid %u",
		 need, RelationGetRelid(relation));

/*
 * The attcacheoff values we read from pg_attribute should all be -1
 * ("unknown").  Verify this if assert checking is on.  They will be
 * computed when and if needed during tuple access.
 */

#ifdef USE_ASSERT_CHECKING
{
int i;

	for (i = 0; i < RelationGetNumberOfAttributes(relation); i++)
		Assert(TupleDescAttr(relation->rd_att, i)->attcacheoff == -1);
}

#endif

/*
 * However, we can easily set the attcacheoff value for the first
 * attribute: it must be zero.  This eliminates the need for special cases
 * for attnum=1 that used to exist in fastgetattr() and index_getattr().
 */
if (RelationGetNumberOfAttributes(relation) > 0)
	TupleDescAttr(relation->rd_att, 0)->attcacheoff = 0;

/*
 * Set up constraint/default info
 */
if (constr->has_not_null || ndef > 0 ||
	attrmiss || relation->rd_rel->relchecks)
{
	relation->rd_att->constr = constr;

	if (ndef > 0)			/* DEFAULTs */
	{
		if (ndef < RelationGetNumberOfAttributes(relation))
			constr->defval = (AttrDefault *)
				repalloc(attrdef, ndef * sizeof(AttrDefault));
		else
			constr->defval = attrdef;
		constr->num_defval = ndef;
		AttrDefaultFetch(relation);
	}
	else
		constr->num_defval = 0;

	constr->missing = attrmiss;

	if (relation->rd_rel->relchecks > 0)	/* CHECKs */
	{
		constr->num_check = relation->rd_rel->relchecks;
		constr->check = (ConstrCheck *)
			MemoryContextAllocZero(CacheMemoryContext,
								   constr->num_check * sizeof(ConstrCheck));
		CheckConstraintFetch(relation);
	}
	else
		constr->num_check = 0;
}
else
{
	pfree(constr);
	relation->rd_att->constr = NULL;
}

}

该函数主要用来从pg_attribute表中读出与查询的relation相关的属性结构描述元组,即:

/*
 * This struct is passed around within the backend to describe the structure
 * of tuples.  For tuples coming from on-disk relations, the information is
 * collected from the pg_attribute, pg_attrdef, and pg_constraint catalogs.
 * Transient row types (such as the result of a join query) have anonymous
 * TupleDesc structs that generally omit any constraint info; therefore the
 * structure is designed to let the constraints be omitted efficiently.
 *
 * Note that only user attributes, not system attributes, are mentioned in
 * TupleDesc.
 *
 * If the tupdesc is known to correspond to a named rowtype (such as a table's
 * rowtype) then tdtypeid identifies that type and tdtypmod is -1.  Otherwise
 * tdtypeid is RECORDOID, and tdtypmod can be either -1 for a fully anonymous
 * row type, or a value >= 0 to allow the rowtype to be looked up in the
 * typcache.c type cache.
 *
 * Note that tdtypeid is never the OID of a domain over composite, even if
 * we are dealing with values that are known (at some higher level) to be of
 * a domain-over-composite type.  This is because tdtypeid/tdtypmod need to
 * match up with the type labeling of composite Datums, and those are never
 * explicitly marked as being of a domain type, either.
 *
 * Tuple descriptors that live in caches (relcache or typcache, at present)
 * are reference-counted: they can be deleted when their reference count goes
 * to zero.  Tuple descriptors created by the executor need no reference
 * counting, however: they are simply created in the appropriate memory
 * context and go away when the context is freed.  We set the tdrefcount
 * field of such a descriptor to -1, while reference-counted descriptors
 * always have tdrefcount >= 0.
 */
typedef struct TupleDescData
{
	int			natts;			/* number of attributes in the tuple */
	Oid			tdtypeid;		/* composite type ID for tuple type */
	int32		tdtypmod;		/* typmod for tuple type */
	int			tdrefcount;		/* reference count, or -1 if not counting */
	TupleConstr *constr;		/* constraints, or NULL if none */
	/* attrs[N] is the description of Attribute Number N+1 */
	FormData_pg_attribute attrs[FLEXIBLE_ARRAY_MEMBER];
} TupleDescData;
typedef struct TupleDescData *TupleDesc;

至此,关于relation对象的主要构造功能已完成(忽略缺失值处理,默认值等等细节)

总结

postgresql的许多结构描述性信息均以元组的形式存储在表格中,关系字典存储在pg_class表格中,属性字典存储在pg_attribute表中。 构造Relation对象的过程为,根据oid从pg_class表中获取与该关系相关的关系数据字典,而后根据relation id获取属性字典。 后续获取Relation的各个元组后,需要根据属性字典对它们进行解析。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论