Understanding MyProc in PostgreSQL's bgwriter

[Author: 技术者高健 @ cnblogs  mail: luckyjackgao@gmail.com]

In the bgwriter code there is the following snippet, in which MyProc appears rather abruptly:

/*                        
 * Loop forever                        
 */                        
for (;;)                        
{                        
    bool        can_hibernate;            
    int            rc;        
                        
    /* Clear any already-pending wakeups */                    
    ResetLatch(&MyProc->procLatch);                    
                        
    if (got_SIGHUP)                    
    {                    
        got_SIGHUP = false;                
        ProcessConfigFile(PGC_SIGHUP);                
    }                    
    if (shutdown_requested)                    
    {                    
        /*                
         * From here on, elog(ERROR) should end with exit(1), not send                
         * control back to the sigsetjmp block above                
         */                
        ExitOnAnyError = true;                
        /* Normal exit from the bgwriter is here */                
        proc_exit(0);        /* done */        
    }                    
                        
    /*                    
     * Do one cycle of dirty-buffer writing.                    
     */                    
    can_hibernate = BgBufferSync();                    
                        
    /*                    
     * Send off activity statistics to the stats collector                    
     */                    
    pgstat_send_bgwriter();                    
                        
    if (FirstCallSinceLastCheckpoint())                    
    {                    
        /*                
         * After any checkpoint, close all smgr files.    This is so we            
         * won't hang onto smgr references to deleted files indefinitely.                
         */                
        smgrcloseall();                
    }                    
                        
    /*                    
     * Sleep until we are signaled or BgWriterDelay has elapsed.                    
     *                    
     * Note: the feedback control loop in BgBufferSync() expects that we                    
     * will call it every BgWriterDelay msec.  While it's not critical for                    
     * correctness that that be exact, the feedback loop might misbehave                    
     * if we stray too far from that.  Hence, avoid loading this process                    
     * down with latch events that are likely to happen frequently during                    
     * normal operation.                    
     */                    
    rc = WaitLatch(&MyProc->procLatch,                    
                   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,        
                   BgWriterDelay /* ms */ );        
                        
    /*                    
     * If no latch event and BgBufferSync says nothing's happening, extend                    
     * the sleep in "hibernation" mode, where we sleep for much longer                    
     * than bgwriter_delay says.  Fewer wakeups save electricity.  When a                    
     * backend starts using buffers again, it will wake us up by setting                    
     * our latch.  Because the extra sleep will persist only as long as no                    
     * buffer allocations happen, this should not distort the behavior of                    
     * BgBufferSync's control loop too badly; essentially, it will think                    
     * that the system-wide idle interval didn't exist.                    
     *                    
     * There is a race condition here, in that a backend might allocate a                    
     * buffer between the time BgBufferSync saw the alloc count as zero                    
     * and the time we call StrategyNotifyBgWriter.  While it's not                    
     * critical that we not hibernate anyway, we try to reduce the odds of                    
     * that by only hibernating when BgBufferSync says nothing's happening                    
     * for two consecutive cycles.    Also, we mitigate any possible                
     * consequences of a missed wakeup by not hibernating forever.                    
     */                    
    if (rc == WL_TIMEOUT && can_hibernate && prev_hibernate)                    
    {                    
        /* Ask for notification at next buffer allocation */                
        StrategyNotifyBgWriter(&MyProc->procLatch);                
        /* Sleep ... */                
        rc = WaitLatch(&MyProc->procLatch,                
                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,    
                       BgWriterDelay * HIBERNATE_FACTOR);    
        /* Reset the notification request in case we timed out */                
        StrategyNotifyBgWriter(NULL);                
    }                    
                        
    /*                    
     * Emergency bailout if postmaster has died.  This is to avoid the                    
     * necessity for manual cleanup of all postmaster children.                    
     */                    
    if (rc & WL_POSTMASTER_DEATH)                    
        exit(1);                
                        
    prev_hibernate = can_hibernate;                    
}                        

So where exactly does MyProc come from? proc.h explains:

/*
 * Each backend has a PGPROC struct in shared memory.  There is also a list of
 * currently-unused PGPROC structs that will be reallocated to new backends.
 *
 * links: list link for any list the PGPROC is in.    When waiting for a lock,
 * the PGPROC is linked into that lock's waitProcs queue.  A recycled PGPROC
 * is linked into ProcGlobal's freeProcs list.
 *
 * Note: twophase.c also sets up a dummy PGPROC struct for each currently
 * prepared transaction.  These PGPROCs appear in the ProcArray data structure
 * so that the prepared transactions appear to be still running and are
 * correctly shown as holding locks.  A prepared transaction PGPROC can be
 * distinguished from a real one at need by the fact that it has pid == 0.
 * The semaphore and lock-activity fields in a prepared-xact PGPROC are unused,
 * but its myProcLocks[] lists are valid.
 */
struct PGPROC
{
    /* proc->links MUST BE FIRST IN STRUCT (see ProcSleep,ProcWakeup,etc) */
    SHM_QUEUE    links;            /* list link if process is in a list */

    PGSemaphoreData sem;        /* ONE semaphore to sleep on */
    int            waitStatus;        /* STATUS_WAITING, STATUS_OK or STATUS_ERROR */

    Latch        procLatch;        /* generic latch for process */

    LocalTransactionId lxid;    /* local id of top-level transaction currently
                                 * being executed by this proc, if running;
                                 * else InvalidLocalTransactionId */
    int            pid;            /* Backend's process ID; 0 if prepared xact */
    int            pgprocno;

    /* These fields are zero while a backend is still starting up: */
    BackendId    backendId;        /* This backend's backend ID (if assigned) */
    Oid            databaseId;        /* OID of database this backend is using */
    Oid            roleId;            /* OID of role using this backend */

    /*
     * While in hot standby mode, shows that a conflict signal has been sent
     * for the current transaction. Set/cleared while holding ProcArrayLock,
     * though not required. Accessed without lock, if needed.
     */
    bool        recoveryConflictPending;

    /* Info about LWLock the process is currently waiting for, if any. */
    bool        lwWaiting;        /* true if waiting for an LW lock */
    uint8        lwWaitMode;        /* lwlock mode being waited for */
    struct PGPROC *lwWaitLink;    /* next waiter for same LW lock */

    /* Info about lock the process is currently waiting for, if any. */
    /* waitLock and waitProcLock are NULL if not currently waiting. */
    LOCK       *waitLock;        /* Lock object we're sleeping on ... */
    PROCLOCK   *waitProcLock;    /* Per-holder info for awaited lock */
    LOCKMODE    waitLockMode;    /* type of lock we're waiting for */
    LOCKMASK    heldLocks;        /* bitmask for lock types already held on this
                                 * lock object by this backend */

    /*
     * Info to allow us to wait for synchronous replication, if needed.
     * waitLSN is InvalidXLogRecPtr if not waiting; set only by user backend.
     * syncRepState must not be touched except by owning process or WALSender.
     * syncRepLinks used only while holding SyncRepLock.
     */
    XLogRecPtr    waitLSN;        /* waiting for this LSN or higher */
    int            syncRepState;    /* wait state for sync rep */
    SHM_QUEUE    syncRepLinks;    /* list link if process is in syncrep queue */

    /*
     * All PROCLOCK objects for locks held or awaited by this backend are
     * linked into one of these lists, according to the partition number of
     * their lock.
     */
    SHM_QUEUE    myProcLocks[NUM_LOCK_PARTITIONS];

    struct XidCache subxids;    /* cache for subtransaction XIDs */

    /* Per-backend LWLock.    Protects fields below. */
    LWLockId    backendLock;    /* protects the fields below */

    /* Lock manager data, recording fast-path locks taken by this backend. */
    uint64        fpLockBits;        /* lock modes held for each fast-path slot */
    Oid            fpRelId[FP_LOCK_SLOTS_PER_BACKEND];        /* slots for rel oids */
    bool        fpVXIDLock;        /* are we holding a fast-path VXID lock? */
    LocalTransactionId fpLocalTransactionId;    /* lxid for fast-path VXID
                                                 * lock */
};

/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */


extern PGDLLIMPORT PGPROC *MyProc;
extern PGDLLIMPORT struct PGXACT *MyPgXact;

One passage in that comment says:

/*
 * Each backend has a PGPROC struct in shared memory.  There is also a list of
 * currently-unused PGPROC structs that will be reallocated to new backends.
 * ...
 */

In other words, whether it is the bgwriter or the wal writer, each background process has such a struct in shared memory, and the background processes communicate with one another through these shared-memory structures.
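
So how does MyProc come to point at one of these shared-memory structs in the first place? For an auxiliary process such as the bgwriter, this happens during process startup in src/backend/storage/lmgr/proc.c. The fragment below is only a simplified sketch of that flow (a paraphrase, not the verbatim source: the function name InitAuxiliaryProcess_sketch is illustrative, and the real code also takes ProcStructLock and does more bookkeeping): the process picks one of the reserved PGPROC slots, makes MyProc point at it, and takes ownership of its procLatch so it can later sleep in WaitLatch() and be woken by SetLatch().

/*
 * Simplified sketch of how an auxiliary process (e.g. the bgwriter) gets its
 * MyProc, loosely paraphrasing InitAuxiliaryProcess() in proc.c.  The real
 * code holds ProcStructLock while scanning and does additional setup.
 */
static void
InitAuxiliaryProcess_sketch(void)
{
    PGPROC *auxproc = NULL;
    int     proctype;

    /* Find an unused slot among the PGPROCs reserved for auxiliary processes */
    for (proctype = 0; proctype < NUM_AUXILIARY_PROCS; proctype++)
    {
        auxproc = &AuxiliaryProcs[proctype];
        if (auxproc->pid == 0)
            break;              /* pid == 0 means the slot is free */
    }

    /* From here on, MyProc points into shared memory */
    MyProc = auxproc;
    MyProc->pid = MyProcPid;

    /*
     * Take ownership of the per-process latch; afterwards this process can
     * sleep with WaitLatch(&MyProc->procLatch, ...) and any other process
     * that can see this PGPROC can wake it with SetLatch(&proc->procLatch).
     */
    OwnLatch(&MyProc->procLatch);
}

Because the PGPROC (and therefore the latch inside it) lives in shared memory, no signal or pipe is needed for another process to wake the bgwriter; knowing the PGPROC is enough.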

Now look at this call again:

rc = WaitLatch(&MyProc->procLatch,
               WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
               BgWriterDelay /* ms */ );
Look at the unix_latch.c code used on Unix platforms:

/*
 * Wait for a given latch to be set, or for postmaster death, or until timeout
 * is exceeded. 'wakeEvents' is a bitmask that specifies which of those events
 * to wait for. If the latch is already set (and WL_LATCH_SET is given), the
 * function returns immediately.
 *
 * The 'timeout' is given in milliseconds. It must be >= 0 if WL_TIMEOUT flag
 * is given.  On some platforms, signals do not interrupt the wait, or even
 * cause the timeout to be restarted, so beware that the function can sleep
 * for several times longer than the requested timeout.  However, this
 * difficulty is not so great as it seems, because the signal handlers for any
 * signals that the caller should respond to ought to be programmed to end the
 * wait by calling SetLatch.  Ideally, the timeout parameter is vestigial.
 *
 * The latch must be owned by the current process, ie. it must be a
 * backend-local latch initialized with InitLatch, or a shared latch
 * associated with the current process by calling OwnLatch.
 *
 * Returns bit mask indicating which condition(s) caused the wake-up. Note
 * that if multiple wake-up conditions are true, there is no guarantee that
 * we return all of them in one call, but we will return at least one.
 */
int
WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
{
    return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
}

The first argument is the latch itself, which is required. The second is the bitmask of wake events to wait for, and the third is the timeout in milliseconds.
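
The other half of the mechanism is whoever sets this latch. Before hibernating, the bgwriter hands its procLatch to StrategyNotifyBgWriter(), and the buffer-allocation path in freelist.c then wakes it up as soon as a backend actually needs a buffer. The fragment below is a rough sketch of that wake-up side (paraphrased, not the verbatim source; the exact code differs between versions):

/*
 * Rough sketch of the wake-up side inside StrategyGetBuffer() in freelist.c
 * (paraphrased).  If the bgwriter registered its latch via
 * StrategyNotifyBgWriter(&MyProc->procLatch), the next buffer allocation
 * sets that latch, which ends the long WaitLatch() sleep in the hibernation
 * branch shown above.
 */
if (StrategyControl->bgwriterLatch)
{
    Latch  *bgwriterLatch = StrategyControl->bgwriterLatch;

    StrategyControl->bgwriterLatch = NULL;      /* one-shot notification */
    SetLatch(bgwriterLatch);                    /* wake the sleeping bgwriter */
}

This is why the bgwriter can afford to hibernate: it does not poll, it simply waits on its shared-memory latch until either the timeout expires or some backend's buffer allocation sets the latch.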

[Author: 技术者高健 @ cnblogs  mail: luckyjackgao@gmail.com]

The End

Original post: https://www.cnblogs.com/gaojian/p/2744690.html