A First Look at the pg_control File (Part 1)

pg_control is said to be PostgreSQL's control file.

To see what that means in practice, I first renamed the pg_control file and then started Postgres. Startup failed with this message:

[postgres@pg101 bin]$ postgres: could not find the database system
Expected to find it in the directory "/usr/local/pgsql/bin/../data",
but could not open file "/usr/local/pgsql/bin/../data/global/pg_control": No Such file or Directory

The corresponding source code is in the checkDataDir function in postmaster.c:

        snprintf(path, sizeof(path), "%s/global/pg_control", DataDir);

        fp = AllocateFile(path, PG_BINARY_R);
        if (fp == NULL)
        {
                write_stderr("%s: could not find the database system\n"
                                         "Expected to find it in the directory \"%s\",\n"
                                         "but could not open file \"%s\": %s\n",
                                         progname, DataDir, path, strerror(errno));
                ExitPostmaster(2);
        }
        FreeFile(fp);

After I renamed pg_control back to its original name and restarted PostgreSQL, the server came up without any problem.
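The check in the excerpt boils down to building the path and trying to open it. Here is a minimal sketch of the same idea using only the C standard library rather than PostgreSQL's AllocateFile/FreeFile. The names demo_control_path and demo_control_file_exists are made up for illustration and do not exist in the PostgreSQL source.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<data_dir>/global/pg_control" into buf. */
static void
demo_control_path(char *buf, size_t buflen, const char *data_dir)
{
    snprintf(buf, buflen, "%s/global/pg_control", data_dir);
}

/*
 * Hypothetical helper: return 0 if the control file can be opened for
 * reading, -1 (after printing the reason to stderr) otherwise.
 */
static int
demo_control_file_exists(const char *data_dir)
{
    char  path[1024];
    FILE *fp;

    demo_control_path(path, sizeof(path), data_dir);
    fp = fopen(path, "rb");
    if (fp == NULL)
    {
        fprintf(stderr, "could not open file \"%s\": %s\n",
                path, strerror(errno));
        return -1;
    }
    fclose(fp);
    return 0;
}
```

Calling demo_control_file_exists on a data directory whose pg_control has been renamed returns -1, which corresponds to the case where the postmaster in the excerpt prints its message and exits with status 2.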

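main.c contains the following code.

Judging from the comment, once a database has been initialized, settings such as LC_CTYPE and LC_COLLATE have already been written to the pg_control file.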

        /*
         * Set up locale information from environment.  Note that LC_CTYPE and
         * LC_COLLATE will be overridden later from pg_control if we are in an
         * already-initialized database.  We set them here so that they will be
         * available to fill pg_control during initdb.  LC_MESSAGES will get set
         * later during GUC option processing, but we set it here to allow startup
         * error messages to be localized.
         */

        set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("postgres"));

src/backend/access/transam/xlog.c contains the following code:

/*
 * We maintain an image of pg_control in shared memory.
 */
static ControlFileData *ControlFile = NULL;

So the server keeps a structure in memory that mirrors the contents of the pg_control file.
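To make the idea of an in-memory image concrete, here is a rough sketch that reads the start of a file straight into a struct. It is an assumption-laden toy: DemoControlHeader, demo_image and demo_load_image are hypothetical names, the struct only copies the three leading fields of ControlFileData, and the image lives in ordinary process memory rather than in shared memory as in PostgreSQL.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-in for ControlFileData (leading fields only). */
typedef struct DemoControlHeader
{
    uint64_t system_identifier;
    uint32_t pg_control_version;
    uint32_t catalog_version_no;
} DemoControlHeader;

/* Process-local image; PostgreSQL keeps its copy in shared memory instead. */
static DemoControlHeader demo_image;

/*
 * Fill demo_image from an already opened file.
 * Returns 0 on success, -1 if the file is too short or unreadable.
 */
static int
demo_load_image(FILE *fp)
{
    if (fread(&demo_image, sizeof(demo_image), 1, fp) != 1)
        return -1;
    return 0;
}
```

Because the bytes are copied verbatim, the result is only meaningful when the file was written by a build with the same struct layout and byte order.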

The ControlFileData structure looks like this:

/*
 * Contents of pg_control.
 *
 * NOTE: try to keep this under 512 bytes so that it will fit on one physical
 * sector of typical disk drives.  This reduces the odds of corruption due to
 * power failure midway through a write.
 */

typedef struct ControlFileData
{
    /*
     * Unique system identifier --- to ensure we match up xlog files with the
     * installation that produced them.
     */
    uint64        system_identifier;

    /*
     * Version identifier information.    Keep these fields at the same offset,
     * especially pg_control_version; they won't be real useful if they move
     * around.    (For historical reasons they must be 8 bytes into the file
     * rather than immediately at the front.)
     *
     * pg_control_version identifies the format of pg_control itself.
     * catalog_version_no identifies the format of the system catalogs.
     *
     * There are additional version identifiers in individual files; for
     * example, WAL logs contain per-page magic numbers that can serve as
     * version cues for the WAL log.
     */
    uint32        pg_control_version;        /* PG_CONTROL_VERSION */
    uint32        catalog_version_no;        /* see catversion.h */

    /*
     * System status data
     */
    DBState        state;            /* see enum above */
    pg_time_t    time;            /* time stamp of last pg_control update */
    XLogRecPtr    checkPoint;        /* last check point record ptr */
    XLogRecPtr    prevCheckPoint; /* previous check point record ptr */

    CheckPoint    checkPointCopy; /* copy of last check point record */

    /*
     * These two values determine the minimum point we must recover up to
     * before starting up:
     *
     * minRecoveryPoint is updated to the latest replayed LSN whenever we
     * flush a data change during archive recovery. That guards against
     * starting archive recovery, aborting it, and restarting with an earlier
     * stop location. If we've already flushed data changes from WAL record X
     * to disk, we mustn't start up until we reach X again. Zero when not
     * doing archive recovery.
     *
     * backupStartPoint is the redo pointer of the backup start checkpoint, if
     * we are recovering from an online backup and haven't reached the end of
     * backup yet. It is reset to zero when the end of backup is reached, and
     * we mustn't start up before that. A boolean would suffice otherwise, but
     * we use the redo pointer as a cross-check when we see an end-of-backup
     * record, to make sure the end-of-backup record corresponds the base
     * backup we're recovering from.
     */
    XLogRecPtr    minRecoveryPoint;
    XLogRecPtr    backupStartPoint;

    /*
     * Parameter settings that determine if the WAL can be used for archival
     * or hot standby.
     */
    int            wal_level;
    int            MaxConnections;
    int            max_prepared_xacts;
    int            max_locks_per_xact;

    /*
     * This data is used to check for hardware-architecture compatibility of
     * the database and the backend executable.  We need not check endianness
     * explicitly, since the pg_control version will surely look wrong to a
     * machine of different endianness, but we do need to worry about MAXALIGN
     * and floating-point format.  (Note: storage layout nominally also
     * depends on SHORTALIGN and INTALIGN, but in practice these are the same
     * on all architectures of interest.)
     *
     * Testing just one double value is not a very bulletproof test for
     * floating-point compatibility, but it will catch most cases.
     */
    uint32        maxAlign;        /* alignment requirement for tuples */
    double        floatFormat;    /* constant 1234567.0 */
    #define FLOATFORMAT_VALUE    1234567.0

    /*
     * This data is used to make sure that configuration of this database is
     * compatible with the backend executable.
     */
    uint32        blcksz;            /* data block size for this DB */
    uint32        relseg_size;    /* blocks per segment of large relation */

    uint32        xlog_blcksz;    /* block size within WAL files */
    uint32        xlog_seg_size;    /* size of each WAL segment */

    uint32        nameDataLen;    /* catalog name field width */
    uint32        indexMaxKeys;    /* max number of columns in an index */

    uint32        toast_max_chunk_size;    /* chunk size in TOAST tables */

    /* flag indicating internal format of timestamp, interval, time */
    bool        enableIntTimes; /* int64 storage enabled? */

    /* flags indicating pass-by-value status of various types */
    bool        float4ByVal;    /* float4 pass-by-value? */
    bool        float8ByVal;    /* float8, int8, etc pass-by-value? */

    /* CRC of all above ... MUST BE LAST! */
    pg_crc32    crc;
} ControlFileData;
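The last field deserves a quick illustration: because crc must be the final member, the checksum can cover every byte from the start of the struct up to offsetof(..., crc). The sketch below shows that pattern on a hypothetical, much smaller struct. It uses a plain bitwise CRC-32 as a stand-in, which is an assumption on my part; PostgreSQL computes pg_crc32 with its own macros, so the values will not match a real pg_control file. The last line also mimics the "keep this under 512 bytes" note as a compile-time check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ControlFileData; fields are illustrative only. */
typedef struct DemoControlData
{
    uint64_t system_identifier;
    uint32_t pg_control_version;
    uint32_t catalog_version_no;
    uint32_t blcksz;
    uint32_t crc;               /* CRC of all above, must be last */
} DemoControlData;

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), not pg_crc32. */
static uint32_t
demo_crc32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    size_t   i;
    int      bit;

    for (i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Store the checksum of everything that precedes the crc field. */
static void
demo_seal(DemoControlData *c)
{
    c->crc = demo_crc32(c, offsetof(DemoControlData, crc));
}

/* Return 1 if the stored checksum matches the contents, 0 otherwise. */
static int
demo_is_valid(const DemoControlData *c)
{
    return c->crc == demo_crc32(c, offsetof(DemoControlData, crc));
}

/* Compile-time check: the struct must fit in one 512-byte sector. */
typedef char demo_fits_in_sector[(sizeof(DemoControlData) <= 512) ? 1 : -1];
```

A caller would zero the struct with memset, fill in the fields, call demo_seal before writing it out, and call demo_is_valid after reading it back. Any later change to a covered field, such as blcksz, makes the validation fail.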

Next, let's go through the fields one by one.

Original article: https://www.cnblogs.com/gaojian/p/3227525.html