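Dissecting the Unreal Rendering System (06) - UE5 Special, Part 2 (Lumen and Others)

6.5 Lumen

6.5.1 Lumen Technical Features

Section 6.2.2.2, Lumen Dynamic Global Illumination, briefly introduced Lumen's features, including indirect lighting, sky light, emissive lighting, soft and hard shadows, and reflections. This section describes its technical features in more detail.

First, it should be made clear that Lumen combines several techniques rather than relying on a single one. For example, Lumen uses software ray tracing against signed distance fields (SDF) by default, but when hardware ray tracing is enabled it can achieve higher quality on supported graphics cards.

The main techniques involved in Lumen are listed below.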

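6.5.1.1 Surface Cache

Lumen generates an automatic parameterization of nearby scene surfaces, called the Surface Cache, which is used to quickly look up lighting at ray hit points in the scene. Lumen captures the material properties of each mesh from multiple angles. These capture positions are called Cards and are generated offline, per mesh. The console command r.Lumen.Visualize.CardPlacement 1 visualizes the Lumen Cards:

Top: the normally rendered frame; bottom: Lumen Card visualization.

Nanite accelerates mesh capture, which keeps the Surface Cache in sync with the triangle scene. High-polygon meshes in particular need Nanite to be captured efficiently.

Once the Surface Cache has been populated with material properties, Lumen computes direct and indirect lighting at these surface positions. The updates are amortized over multiple frames, which provides efficient support for many dynamic lights and multi-bounce global illumination.

Only meshes with simple interiors are supported. Walls, floors, and ceilings should each be a separate mesh rather than being merged into one large mesh.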

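6.5.1.2 Screen Tracing

Lumen traces rays against the screen first (called screen tracing or screen-space tracing), and falls back to a more reliable method if no hit is found or the ray passes behind a surface.

The drawback of screen tracing is that it greatly limits the art-direction controls that apply only to indirect lighting, such as the Indirect Lighting Scale and Emissive Boost lighting properties.

Lumen's ray tracing uses screen traces first and only then moves on to the other, more expensive tracing options. If screen traces are disabled for GI and reflections, only the Lumen Scene is visible. Screen tracing supports any geometry type and helps hide mismatches between the Lumen Scene and the triangle scene.

Use r.Lumen.ScreenProbeGather.ScreenTraces 0|1 to turn screen tracing off or on and compare the results:

Top: with Lumen screen tracing enabled; bottom: with Lumen screen tracing disabled. The difference is most visible in reflections, followed by some of the indirect lighting.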

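6.5.1.3 Lumen Ray Tracing

Lumen supports two ray tracing modes:

1. Software ray tracing. It runs on the widest range of hardware and platforms.

2. Hardware ray tracing. It requires graphics card and operating system support.

  • Software ray tracing

By default Lumen uses software ray tracing that relies on signed distance fields, which means it can run on hardware that supports SM5.

Generate Mesh Distance Fields must be enabled in the project settings; it is enabled by default in UE5.

The renderer merges the mesh distance fields into a Global Distance Field to accelerate tracing. By default, Lumen traces each mesh's own distance field for accuracy over the first two meters of a ray, and uses the merged global distance field for the rest of the ray. If a project needs precise control over Lumen's software ray tracing, the Software Ray Tracing Mode can be selected in the project settings:

Detail Tracing is the default tracing method. It uses the individual mesh distance fields to achieve high-quality GI (only for the first two meters; the global distance field is used beyond that). Global Tracing traces only the global distance field, which is fast but costs some image quality.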
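The split between the two distance fields can be illustrated with a small sphere-tracing sketch. This is not Lumen's implementation: the scene (a single sphere), the names DetailSDF, GlobalSDF, and SphereTrace, the step limit, and the hit threshold are all assumptions made for illustration. Only the idea of sampling the per-mesh field over the first two meters (200 cm) and the global field afterwards comes from the description above.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float X, Y, Z; };

static Vec3 Add(Vec3 A, Vec3 B) { return {A.X + B.X, A.Y + B.Y, A.Z + B.Z}; }
static Vec3 Scale(Vec3 A, float S) { return {A.X * S, A.Y * S, A.Z * S}; }
static float Length(Vec3 A) { return std::sqrt(A.X * A.X + A.Y * A.Y + A.Z * A.Z); }

// Hypothetical scene: a sphere of radius 100 cm centered 500 cm along +X.
static float SceneSDF(Vec3 P) { return Length(Add(P, Vec3{-500.0f, 0.0f, 0.0f})) - 100.0f; }

// Stand-ins for the two representations. A real global field is coarser than
// the per-mesh fields; here both return the same exact distance for simplicity.
static float DetailSDF(Vec3 P) { return SceneSDF(P); }
static float GlobalSDF(Vec3 P) { return SceneSDF(P); }

struct TraceResult
{
    bool bHit = false;
    float HitDistance = 0.0f;
    int DetailSteps = 0; // steps that sampled the per-mesh field
    int GlobalSteps = 0; // steps that sampled the global field
};

// Sphere tracing: repeatedly advance along the ray by the sampled distance,
// which is always a safe step because no surface is closer than that.
TraceResult SphereTrace(Vec3 Origin, Vec3 Direction, float MaxDistance, float DetailRange = 200.0f)
{
    assert(std::fabs(Length(Direction) - 1.0f) < 1e-3f); // direction must be normalized
    TraceResult Result;
    float T = 0.0f;
    for (int Step = 0; Step < 128 && T < MaxDistance; ++Step)
    {
        const Vec3 P = Add(Origin, Scale(Direction, T));
        const bool bUseDetail = T < DetailRange; // per-mesh field only near the ray origin
        const float Distance = bUseDetail ? DetailSDF(P) : GlobalSDF(P);
        if (bUseDetail) { ++Result.DetailSteps; } else { ++Result.GlobalSteps; }

        if (Distance < 0.5f) // close enough to the surface: report a hit
        {
            Result.bHit = true;
            Result.HitDistance = T;
            return Result;
        }
        T += Distance;
    }
    return Result;
}
```

Calling SphereTrace from the origin along +X reaches the sphere surface 400 cm away; the first step samples the detail field and the second, already beyond 200 cm, samples the global field.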

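Mesh distance fields are streamed in and out dynamically as the camera moves through the world. They are packed into an atlas, whose statistics can be printed with the console command r.DistanceFields.LogAtlasStats 1:

Because the quality of Lumen's software ray tracing depends heavily on the mesh distance fields, paying attention to their quality improves Lumen's GI. The figure below shows the menu for visualizing mesh distance fields and the global distance field:

The next two figures visualize the mesh distance fields and the global distance field, respectively:

However, software ray tracing has a number of limitations, chiefly: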

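  • Geometry limitations:

    • The Lumen Scene only supports Static Meshes, Instanced Static Meshes, and Hierarchical Instanced Static Meshes.
    • Landscape geometry is not supported, so it does not bounce indirect light. Support is planned for the future.
  • Material limitations:

    • World Position Offset (WPO) is not supported.
    • Translucent objects are not supported, and Masked objects are treated as opaque.
    • Distance field data is built from the material properties of the static mesh asset, not from component overrides. This means that changing the material at runtime does not affect Lumen's GI.
  • Workflow limitations:

    • Software ray tracing requires levels to be built from modular pieces. Walls, floors, and ceilings should be separate meshes. Large meshes (such as a mountain) are represented poorly and may cause self-occlusion shadow artifacts.
    • Walls should be at least 10 cm thick to avoid light leaking.
    • Distance field resolution depends on the static mesh's import settings; if the compression is too aggressive, the resulting distance field data will be of poor quality.
    • Distance fields cannot represent very thin objects.

That covers Lumen's software ray tracing; next comes hardware ray tracing.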

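  • Hardware ray tracing

Hardware ray tracing supports a wider range of geometry types than software ray tracing; in particular, it can trace against skinned meshes. It also delivers higher image quality: rays intersect the actual triangles, and lighting can optionally be evaluated at the ray hit point instead of being read from the lower-quality Surface Cache.

However, hardware ray tracing has a high scene setup cost and currently does not scale to scenes with more than 100,000 instances. Dynamically deforming meshes (such as skinned meshes) also incur a large cost for updating the ray tracing acceleration structures every frame, and that cost is proportional to the number of skinned triangles.

For static meshes that use Nanite, hardware ray tracing, for the sake of rendering efficiency, can only operate on the proxy mesh generated from the Nanite Proxy Triangle Percent setting in the Static Mesh Editor. These proxy meshes can be visualized by toggling the console command r.Nanite 0|1:

Top: the full-detail triangle mesh; bottom: the corresponding Nanite proxy mesh.

Screen tracing is used to hide the mismatch between the full-detail triangle mesh rendered by Nanite and the proxy mesh that Lumen's rays are traced against. In some cases, though, the mismatch is too large to hide. The two figures above show self-shadowing artifacts caused by a Proxy Triangle Percent value that is too small.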

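Lumen only uses hardware ray tracing when the following conditions are met:

  • Use Hardware Ray Tracing when available and Support Hardware Ray Tracing are both enabled in the project settings.
  • The project runs on a supported operating system, RHI, and graphics card. Currently only the following platforms support hardware ray tracing:
    • Windows 10 with DirectX 12.
    • PlayStation 5.
    • Xbox Series S / X.
    • The graphics card must be an NVIDIA RTX 2000 series or newer, or an AMD RX 6000 series or newer.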

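6.5.1.4 Other Notes on Lumen

The Lumen Scene covers the world near the camera rather than the entire world, which makes large worlds and streaming possible. Lumen relies on Nanite's LOD and multi-view rasterization to capture the scene quickly and maintain the Surface Cache, with all operations throttled to prevent hitches. Lumen does not require Nanite to operate, but in scenes without Nanite, Lumen's scene capture becomes very slow, especially when assets do not have good LOD setups.

Lumen's Surface Cache covers positions up to 200 meters from the camera. Beyond that range, only screen tracing is active for global illumination.

Lumen also has other limitations:

  • Lumen global illumination cannot be used together with lightmaps. In the future, Lumen's reflections should be extended to work with global illumination from lightmaps, which would further improve rendering quality.
  • Foliage is not yet well supported, because Lumen relies heavily on downsampled rendering and temporal filtering.
  • Lumen's Final Gather can add significant noise around moving objects; this is still under active development.
  • Translucent materials do not yet support Lumen reflections.
  • Translucent materials do not get high-quality dynamic GI.

The following shows Lumen-related debug and visualization output:

Top: the normal frame; middle: Lumen Scene visualization; bottom: Lumen GI visualization.

Besides the visualization options shown above, Lumen has many other visualization console commands: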

r.Lumen.RadianceCache.Visualize    
r.Lumen.RadianceCache.VisualizeClipmapIndex
r.Lumen.RadianceCache.VisualizeProbeRadius
r.Lumen.RadianceCache.VisualizeRadiusScale

r.Lumen.ScreenProbeGather.VisualizeTraces    
r.Lumen.ScreenProbeGather.VisualizeTracesFreeze

r.Lumen.Visualize.CardInterpolateInfluenceRadius    
r.Lumen.Visualize.CardPlacement
r.Lumen.Visualize.CardPlacementDistance    
r.Lumen.Visualize.CardPlacementIndex    
r.Lumen.Visualize.CardPlacementOrientation    
r.Lumen.Visualize.ClipmapIndex    
r.Lumen.Visualize.ConeAngle    
r.Lumen.Visualize.ConeStepFactor
r.Lumen.Visualize.GridPixelSize    
r.Lumen.Visualize.HardwareRayTracing
r.Lumen.Visualize.HardwareRayTracing.DeferredMaterial    
r.Lumen.Visualize.HardwareRayTracing.DeferredMaterial.TileDimension
r.Lumen.Visualize.HardwareRayTracing.LightingMode
r.Lumen.Visualize.HardwareRayTracing.MaxTranslucentSkipCount
r.Lumen.Visualize.MaxMeshSDFTraceDistance
r.Lumen.Visualize.MaxTraceDistance    
r.Lumen.Visualize.MinTraceDistance    
r.Lumen.Visualize.Stats    
r.Lumen.Visualize.TraceMeshSDFs    
r.Lumen.Visualize.TraceRadianceCache
r.Lumen.Visualize.VoxelFaceIndex
r.Lumen.Visualize.Voxels
r.Lumen.Visualize.VoxelStepFactor    

ShowFlag.LumenGlobalIllumination
ShowFlag.LumenReflections
ShowFlag.VisualizeLumenIndirectDiffuse
ShowFlag.VisualizeLumenScene

In addition, there are many control commands; some of them are shown below:

r.Lumen.DiffuseIndirect.Allow
r.Lumen.DiffuseIndirect.CardInterpolateInfluenceRadius
r.Lumen.DiffuseIndirect.CardTraceEndDistanceFromCamera    

r.Lumen.DirectLighting    
r.Lumen.DirectLighting.BatchSize    
r.Lumen.DirectLighting.CardUpdateFrequencyScale    

r.Lumen.HardwareRayTracing
r.Lumen.HardwareRayTracing.PullbackBias
r.Lumen.IrradianceFieldGather
r.Lumen.IrradianceFieldGather.ClipmapDistributionBase
r.Lumen.IrradianceFieldGather.ClipmapWorldExtent

r.Lumen.MaxConeSteps
r.Lumen.MaxTraceDistance
r.Lumen.ProbeHierarchy
r.Lumen.ProbeHierarchy.AdditionalSpecularRayThreshold
r.Lumen.ProbeHierarchy.AntiTileAliasing

r.Lumen.RadianceCache.DownsampleDistanceFromCamera
r.Lumen.RadianceCache.ForceFullUpdate    
r.Lumen.RadianceCache.NumFramesToKeepCachedProbes    

r.Lumen.Radiosity    
r.Lumen.Radiosity.CardUpdateFrequencyScale    
r.Lumen.Radiosity.ComputeScatter    
r.Lumen.Radiosity.ConeAngleScale

r.Lumen.Reflections.Allow
r.Lumen.Reflections.DownsampleFactor    
r.Lumen.Reflections.GGXSamplingBias    
r.Lumen.Reflections.HardwareRayTracing
r.Lumen.Reflections.HardwareRayTracing.DeferredMaterial

r.Lumen.Reflections.HierarchicalScreenTraces.UncertainTraceRelativeDepthThreshold
r.Lumen.Reflections.MaxRayIntensity
r.Lumen.Reflections.MaxRoughnessToTrace    
r.Lumen.Reflections.RoughnessFadeLength    
r.Lumen.Reflections.ScreenSpaceReconstruction

r.Lumen.Reflections.ScreenTraces
r.Lumen.Reflections.Temporal
r.Lumen.Reflections.Temporal.DistanceThreshold
r.Lumen.Reflections.Temporal.HistoryWeight
r.Lumen.Reflections.TraceMeshSDFs

r.Lumen.ScreenProbeGather
r.Lumen.ScreenProbeGather.AdaptiveProbeAllocationFraction
r.Lumen.ScreenProbeGather.AdaptiveProbeMinDownsampleFactor
r.Lumen.ScreenProbeGather.DiffuseIntegralMethod
r.Lumen.ScreenProbeGather.DownsampleFactor
r.Lumen.ScreenProbeGather.FixedJitterIndex
r.Lumen.ScreenProbeGather.FullResolutionJitterWidth
r.Lumen.ScreenProbeGather.GatherNumMips
r.Lumen.ScreenProbeGather.GatherOctahedronResolutionScale
r.Lumen.ScreenProbeGather.HardwareRayTracing

r.Lumen.ScreenProbeGather.ImportanceSample.ProbeRadianceHistory
r.Lumen.ScreenProbeGather.MaxRayIntensity
r.Lumen.ScreenProbeGather.OctahedralSolidAngleTextureSize
r.Lumen.ScreenProbeGather.RadianceCache
r.Lumen.ScreenProbeGather.RadianceCache.ClipmapDistributionBase

r.Lumen.ScreenProbeGather.ReferenceMode
r.Lumen.ScreenProbeGather.ScreenSpaceBentNormal
r.Lumen.ScreenProbeGather.ScreenTraces
r.Lumen.ScreenProbeGather.ScreenTraces.HZBTraversal

r.Lumen.ScreenProbeGather.SpatialFilterHalfKernelSize
r.Lumen.ScreenProbeGather.SpatialFilterMaxRadianceHitAngle

r.Lumen.ScreenProbeGather.Temporal    
r.Lumen.ScreenProbeGather.Temporal.ClearHistoryEveryFrame    

r.Lumen.ScreenProbeGather.TraceMeshSDFs    
r.Lumen.ScreenProbeGather.TracingOctahedronResolution
r.Lumen.TraceMeshSDFs
r.Lumen.TraceMeshSDFs.Allow    
r.Lumen.TranslucencyVolume.ConeAngleScale    
r.Lumen.TranslucencyVolume.Enable    
r.Lumen.TranslucencyVolume.EndDistanceFromCamera    

r.LumenParallelBeginUpdate
r.LumenScene.CardAtlasAllocatorBinSize    
r.LumenScene.CardAtlasSize    
r.LumenScene.CardCameraDistanceTexelDensityScale
r.LumenScene.CardCaptureMargin

r.LumenScene.ClipmapResolution    
r.LumenScene.ClipmapWorldExtent    
r.LumenScene.ClipmapZResolutionDivisor    
r.LumenScene.DiffuseReflectivityOverride    
r.LumenScene.DistantScene
r.LumenScene.DistantScene.CardResolution    

r.LumenScene.FastCameraMode
r.LumenScene.GlobalDFClipmapExtent    
r.LumenScene.GlobalDFResolution    
r.LumenScene.HeightfieldSlopeThreshold    
r.LumenScene.MaxInstanceAddsPerFrame
r.LumenScene.MeshCardsCullFaces    
r.LumenScene.MeshCardsMaxLOD

r.LumenScene.NaniteMultiViewCapture    
r.LumenScene.NumClipmapLevels    
r.LumenScene.PrimitivesPerPacket
r.LumenScene.RecaptureEveryFrame    
r.LumenScene.Reset
r.LumenScene.UploadCardBufferEveryFrame    
r.LumenScene.VoxelLightingAverageObjectsPerVisBufferTile

r.SSGI.AllowStandaloneLumenProbeHierarchy
r.Water.SingleLayer.LumenReflections

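Lumen has more than a hundred related console commands, which shows just how complex Lumen rendering is.

6.5.2 Lumen Rendering Basics

This section covers the basic concepts and types related to Lumen.

6.5.2.1 FLumenCard

FLumenCard is the Card mentioned in the previous subsection, and is the basic building block of FLumenMeshCards.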

// Engine\Source\Runtime\Renderer\Private\Lumen\LumenSceneData.h

// Lumen card type.
class FLumenCard
{
public:
    FLumenCard();
    ~FLumenCard();

    // World-space bounding box.
    FBox WorldBounds;
    // Rotation (the card's local axes expressed in world space).
    FVector LocalToWorldRotationX;
    FVector LocalToWorldRotationY;
    FVector LocalToWorldRotationZ;
    // Position.
    FVector Origin;
    // Local-space extent.
    FVector LocalExtent;
    
    // Whether the card is visible.
    bool bVisible = false;
    // Whether the card belongs to the distant scene.
    bool bDistantScene = false;

    // Atlas allocation info.
    bool bAllocated = false;
    FIntPoint DesiredResolution;
    FIntRect AtlasAllocation;

    // Orientation.
    int32 Orientation = -1;
    // Index in the visible card index buffer.
    int32 IndexInVisibleCardIndexBuffer = -1;
    // Index in the card list of the owning FLumenMeshCards.
    int32 IndexInMeshCards = -1;
    // Index of the owning FLumenMeshCards.
    int32 MeshCardsIndex = -1;
    // Resolution scale.
    float ResolutionScale = 1.0f;

    // Initialization.
    void Initialize(float InResolutionScale, const FMatrix& LocalToWorld, const FLumenCardBuildData& CardBuildData, int32 InIndexInMeshCards, int32 InMeshCardsIndex);

    // Set the transform data.
    void SetTransform(const FMatrix& LocalToWorld, FVector CardLocalCenter, FVector CardLocalExtent, int32 InOrientation);
    void SetTransform(const FMatrix& LocalToWorld, const FVector& LocalOrigin, const FVector& CardToLocalRotationX, const FVector& CardToLocalRotationY, const FVector& CardToLocalRotationZ, const FVector& InLocalExtent);

    // Remove from the atlas (scene).
    void RemoveFromAtlas(FLumenSceneData& LumenSceneData);

    int32 GetNumTexels() const
    {
        return AtlasAllocation.Area();
    }

    inline FVector TransformWorldPositionToCardLocal(FVector WorldPosition) const
    {
        FVector Offset = WorldPosition - Origin;
        return FVector(Offset | LocalToWorldRotationX, Offset | LocalToWorldRotationY, Offset | LocalToWorldRotationZ);
    }

    inline FVector TransformCardLocalPositionToWorld(FVector CardPosition) const
    {
        return Origin + CardPosition.X * LocalToWorldRotationX + CardPosition.Y * LocalToWorldRotationY + CardPosition.Z * LocalToWorldRotationZ;
    }
};
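The two transform helpers at the end of FLumenCard are inverses of each other as long as the three rotation axes are orthonormal: one projects the world-space offset onto the card axes, the other recombines the axes weighted by the local coordinates. The following standalone sketch checks this round trip. Vec3, CardFrame, and MakeFrameRotatedAboutZ are hypothetical stand-ins written for this example, not engine types.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double X, Y, Z; };

static Vec3 operator+(Vec3 A, Vec3 B) { return {A.X + B.X, A.Y + B.Y, A.Z + B.Z}; }
static Vec3 operator-(Vec3 A, Vec3 B) { return {A.X - B.X, A.Y - B.Y, A.Z - B.Z}; }
static Vec3 operator*(double S, Vec3 V) { return {S * V.X, S * V.Y, S * V.Z}; }
static double Dot(Vec3 A, Vec3 B) { return A.X * B.X + A.Y * B.Y + A.Z * B.Z; }

// Simplified stand-in for the transform part of FLumenCard:
// an origin plus three axes that are assumed to be orthonormal.
struct CardFrame
{
    Vec3 Origin;
    Vec3 AxisX, AxisY, AxisZ;

    // World -> card local: project the offset onto each card axis.
    Vec3 WorldToCardLocal(Vec3 WorldPosition) const
    {
        const Vec3 Offset = WorldPosition - Origin;
        return {Dot(Offset, AxisX), Dot(Offset, AxisY), Dot(Offset, AxisZ)};
    }

    // Card local -> world: recombine the axes weighted by the local coordinates.
    Vec3 CardLocalToWorld(Vec3 CardPosition) const
    {
        return Origin + CardPosition.X * AxisX + CardPosition.Y * AxisY + CardPosition.Z * AxisZ;
    }
};

// Hypothetical helper: a frame rotated about the world Z axis by AngleRadians.
CardFrame MakeFrameRotatedAboutZ(Vec3 Origin, double AngleRadians)
{
    const double C = std::cos(AngleRadians);
    const double S = std::sin(AngleRadians);
    return CardFrame{Origin, Vec3{C, S, 0.0}, Vec3{-S, C, 0.0}, Vec3{0.0, 0.0, 1.0}};
}
```

With a frame rotated 90 degrees about Z, a world point 5 units along +Y from the card origin lands on the card's local X axis, and transforming it back recovers the original world position.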

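6.5.2.2 FLumenMeshCards

FLumenMeshCards is the basic element for computing the Surface Cache and the basic unit that makes up the Lumen Scene. It can store FLumenCard information for up to 6 faces (orientations), and each orientation can hold 0 to N FLumenCards (given by NumCardsPerOrientation).

// Engine\Source\Runtime\Renderer\Private\Lumen\LumenMeshCards.h

class FLumenMeshCards
{
public:
    // Initialization.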
    void Initialize(
        const FMatrix& InLocalToWorld, 
        const FBox& InBounds,
        uint32 InFirstCardIndex,
        uint32 InNumCards,
        uint32 InNumCardsPerOrientation[6],
        uint32 InCardOffsetPerOrientation[6])
    {
        Bounds = InBounds;
        SetTransform(InLocalToWorld);
        FirstCardIndex = InFirstCardIndex;
        NumCards = InNumCards;

        for (uint32 OrientationIndex = 0; OrientationIndex < 6; ++OrientationIndex)
        {
            NumCardsPerOrientation[OrientationIndex] = InNumCardsPerOrientation[OrientationIndex];
            CardOffsetPerOrientation[OrientationIndex] = InCardOffsetPerOrientation[OrientationIndex];
        }
    }

    // Set the transform matrix.
    void SetTransform(const FMatrix& InLocalToWorld)
    {
        LocalToWorld = InLocalToWorld;
    }

    // Local-to-world matrix.
    FMatrix LocalToWorld;
    // Local-space bounding box.
    FBox Bounds;

    // Index of the first FLumenCard.
    uint32 FirstCardIndex = 0;
    // Number of FLumenCards.
    uint32 NumCards = 0;
    // Number of FLumenCards for each of the 6 orientations.
    uint32 NumCardsPerOrientation[6];
    // FLumenCard offset for each of the 6 orientations.
    uint32 CardOffsetPerOrientation[6];
};
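The fields of FLumenMeshCards suggest that the cards of one mesh occupy a contiguous range starting at FirstCardIndex and are grouped by orientation, with CardOffsetPerOrientation acting as a prefix sum over NumCardsPerOrientation. That layout is an assumption drawn from the field names, not something the listing itself proves. Under that assumption, a minimal sketch of looking up the cards for one orientation could look as follows; MeshCardsLayout, MakeLayout, and CardIndicesForOrientation are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the index-related fields of FLumenMeshCards.
struct MeshCardsLayout
{
    uint32_t FirstCardIndex = 0;
    uint32_t NumCards = 0;
    uint32_t NumCardsPerOrientation[6] = {};
    uint32_t CardOffsetPerOrientation[6] = {};
};

// Build a layout from per-orientation card counts. Offsets are the running
// sum of the counts (assumed convention, see the note above).
MeshCardsLayout MakeLayout(uint32_t FirstCardIndex, const uint32_t (&Counts)[6])
{
    MeshCardsLayout Layout;
    Layout.FirstCardIndex = FirstCardIndex;
    uint32_t Offset = 0;
    for (int Orientation = 0; Orientation < 6; ++Orientation)
    {
        Layout.NumCardsPerOrientation[Orientation] = Counts[Orientation];
        Layout.CardOffsetPerOrientation[Orientation] = Offset;
        Offset += Counts[Orientation];
    }
    Layout.NumCards = Offset;
    return Layout;
}

// Global card indices of all cards that face the given orientation.
std::vector<uint32_t> CardIndicesForOrientation(const MeshCardsLayout& Layout, int Orientation)
{
    assert(Orientation >= 0 && Orientation < 6);
    std::vector<uint32_t> Indices;
    const uint32_t Begin = Layout.FirstCardIndex + Layout.CardOffsetPerOrientation[Orientation];
    for (uint32_t Index = 0; Index < Layout.NumCardsPerOrientation[Orientation]; ++Index)
    {
        Indices.push_back(Begin + Index);
    }
    return Indices;
}
```

Storing a count and an offset per orientation lets a consumer that only cares about surfaces facing a particular direction skip the other cards without scanning the whole range.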

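6.5.2.3 FLumenSceneData

FLumenSceneData is the scene representation that Lumen uses for global illumination. It is not built from Nanite's high-detail meshes but is a coarse scene whose basic elements are FLumenCard and FLumenMeshCards. Its definition and related types are as follows:

// Engine\Source\Runtime\Renderer\Private\Lumen\LumenSceneData.h

// Lumen primitive instance.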
class FLumenPrimitiveInstance
{
public:
    FBox WorldSpaceBoundingBox;
    // FLumenMeshCards index.
    int32 MeshCardsIndex;
    bool bValidMeshCards;
};

// Lumen primitive.
class FLumenPrimitive
{
public:
    // World-space bounding box.
    FBox WorldSpaceBoundingBox;
    // Maximum extent of the cards belonging to this primitive, used for early culling.
    float MaxCardExtent;

    // List of primitive instances.
    TArray<FLumenPrimitiveInstance, TInlineAllocator<1>> Instances;

    // The corresponding primitive in the real scene.
    FPrimitiveSceneInfo* Primitive = nullptr;

    // Whether the instances are merged.
    bool bMergedInstances = false;
    // Card resolution scale.
    float CardResolutionScale = 1.0f;
    // Number of FLumenMeshCards.
    int32 NumMeshCards = 0;

    // Mapping into LumenDFInstanceToDFObjectIndex.
    uint32 LumenDFInstanceOffset = UINT32_MAX;
    int32 LumenNumDFInstances = 0;

    // Get the FLumenMeshCards index.
    int32 GetMeshCardsIndex(int32 InstanceIndex) const
    {
        if (bMergedInstances)
        {
            return Instances[0].MeshCardsIndex;
        }

        if (InstanceIndex < Instances.Num())
        {
            return Instances[InstanceIndex].MeshCardsIndex;
        }

        return -1;
    }
};

// Lumen scene data.
class FLumenSceneData
{
public:
    int32 Generation;

    // Buffers for uploading to the GPU.
    FScatterUploadBuffer CardUploadBuffer;
    FScatterUploadBuffer UploadMeshCardsBuffer;
    FScatterUploadBuffer ByteBufferUploadBuffer;
    FScatterUploadBuffer UploadPrimitiveBuffer;

    FUniqueIndexList CardIndicesToUpdateInBuffer;
    FRWBufferStructured CardBuffer;

    TArray<FBox> PrimitiveModifiedBounds;

    // All Lumen primitives in the Lumen Scene.
    TArray<FLumenPrimitive> LumenPrimitives;

    // FLumenMeshCards data.
    FUniqueIndexList MeshCardsIndicesToUpdateInBuffer;
    TSparseSpanArray<FLumenMeshCards> MeshCards;
    TSparseSpanArray<FLumenCard> Cards;
    TArray<int32, TInlineAllocator<8>> DistantCardIndices;
    FRWBufferStructured MeshCardsBuffer;
    FRWByteAddressBuffer DFObjectToMeshCardsIndexBuffer;

    // Mapping from primitives to LumenDFInstance.
    FUniqueIndexList PrimitivesToUpdate;
    FRWByteAddressBuffer PrimitiveToDFLumenInstanceOffsetBuffer;
    uint32 PrimitiveToLumenDFInstanceOffsetBufferSize = 0;

    // Mapping from LumenDFInstance to DFObjectIndex.
    FUniqueIndexList DFObjectIndicesToUpdateInBuffer;
    FUniqueIndexList LumenDFInstancesToUpdate;
    TSparseSpanArray<int32> LumenDFInstanceToDFObjectIndex;
    FRWByteAddressBuffer LumenDFInstanceToDFObjectIndexBuffer;
    uint32 LumenDFInstanceToDFObjectIndexBufferSize = 0;

    // List of visible FLumenCard indices.
    TArray<int32> VisibleCardsIndices;
    TRefCountPtr<FRDGPooledBuffer> VisibleCardsIndexBuffer;

    // --- Data captured from the triangle scene ---
    TRefCountPtr<IPooledRenderTarget> AlbedoAtlas;
    TRefCountPtr<IPooledRenderTarget> NormalAtlas;
    TRefCountPtr<IPooledRenderTarget> EmissiveAtlas;

    // --- Generated data ---
    TRefCountPtr<IPooledRenderTarget> DepthAtlas;
    TRefCountPtr<IPooledRenderTarget> FinalLightingAtlas;
    TRefCountPtr<IPooledRenderTarget> IrradianceAtlas;
    TRefCountPtr<IPooledRenderTarget> IndirectIrradianceAtlas;
    TRefCountPtr<IPooledRenderTarget> RadiosityAtlas;
    TRefCountPtr<IPooledRenderTarget> OpacityAtlas;

    // Other data.
    bool bFinalLightingAtlasContentsValid;
    FIntPoint MaxAtlasSize;
    FBinnedTextureLayout AtlasAllocator;
    int32 NumCardTexels = 0;
    int32 NumMeshCardsToAddToSurfaceCache = 0;

    // Pending primitive add/update/remove data.
    bool bTrackAllPrimitives;
    TSet<FPrimitiveSceneInfo*> PendingAddOperations;
    TSet<FPrimitiveSceneInfo*> PendingUpdateOperations;
    TArray<FLumenPrimitiveRemoveInfo> PendingRemoveOperations;

    FLumenSceneData(EShaderPlatform ShaderPlatform, EWorldType::Type WorldType);
    ~FLumenSceneData();

    // Primitive add/update/remove operations.
    void AddPrimitiveToUpdate(int32 PrimitiveIndex);
    void AddPrimitive(FPrimitiveSceneInfo* InPrimitive);
    void UpdatePrimitive(FPrimitiveSceneInfo* InPrimitive);
    void RemovePrimitive(FPrimitiveSceneInfo* InPrimitive, int32 PrimitiveIndex);

    // Add/remove cards and FLumenMeshCards.
    void AddCardToVisibleCardList(int32 CardIndex);
    void RemoveCardFromVisibleCardList(int32 CardIndex);
    void AddMeshCards(int32 LumenPrimitiveIndex, int32 LumenInstanceIndex);
    void UpdateMeshCards(const FMatrix& LocalToWorld, int32 MeshCardsIndex, const FMeshCardsBuildData& MeshCardsBuildData);
    void RemoveMeshCards(FLumenPrimitive& LumenPrimitive, FLumenPrimitiveInstance& LumenPrimitiveInstance);

    bool HasPendingOperations() const
    {
        return PendingAddOperations.Num() > 0 || PendingUpdateOperations.Num() > 0 || PendingRemoveOperations.Num() > 0;
    }

    void UpdatePrimitiveToDistanceFieldInstanceMapping(FScene& Scene, FRHICommandListImmediate& RHICmdList);

private:
    // Add FLumenMeshCards from build data.
    int32 AddMeshCardsFromBuildData(const FMatrix& LocalToWorld, const FMeshCardsBuildData& MeshCardsBuildData, float ResolutionScale);
};

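As shown above, FLumenSceneData stores the FLumenMeshCards together with the primitives (FLumenPrimitive) and primitive instances (FLumenPrimitiveInstance) that are built on them. Each FLumenPrimitive in turn references a number of FLumenMeshCards and holds an FPrimitiveSceneInfo pointer that identifies which FPrimitiveSceneInfo in the real scene it is a coarse representation of.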
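The lookup from a primitive instance to its mesh cards follows the logic of FLumenPrimitive::GetMeshCardsIndex in the listing: when the instances are merged, every instance shares the mesh cards of the first entry; otherwise each instance has its own. The sketch below restates that logic with standard containers. PrimitiveInstance and LumenPrimitiveSketch are hypothetical simplified types, and the extra guard against an empty instance list is an addition of this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified counterpart of FLumenPrimitiveInstance.
struct PrimitiveInstance
{
    int MeshCardsIndex = -1;
};

// Hypothetical simplified counterpart of FLumenPrimitive.
struct LumenPrimitiveSketch
{
    std::vector<PrimitiveInstance> Instances;
    bool bMergedInstances = false;

    // Returns the mesh cards index for an instance, or -1 if there is none.
    int GetMeshCardsIndex(int InstanceIndex) const
    {
        // Merged instances share one set of mesh cards stored in the first entry.
        if (bMergedInstances && !Instances.empty())
        {
            return Instances[0].MeshCardsIndex;
        }

        if (InstanceIndex >= 0 && InstanceIndex < static_cast<int>(Instances.size()))
        {
            return Instances[InstanceIndex].MeshCardsIndex;
        }

        return -1;
    }
};
```

Merging trades per-instance detail for a smaller Surface Cache footprint: many instances of one primitive can share a single set of cards instead of each getting its own.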

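6.5.3 Lumen Data Building

Before the actual rendering starts, Lumen builds a lot of data, including Mesh Distance Fields, the Global Distance Field, and mesh cards.

When a Lumen project is launched for the first time, a lot of data is built, including the mesh distance fields.

6.5.3.1 CardRepresentation

To build the mesh card representation, UE5 separates out the MeshCardRepresentation module. Its core concepts and types are as follows:

// Engine\Source\Runtime\Engine\Public\MeshCardRepresentation.h

// Build data for an FLumenCard.
class FLumenCardBuildData
{
public:
    // Center and extent.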
    FVector Center;
    FVector Extent;

    // Orientation order: -X, +X, -Y, +Y, -Z, +Z
    int32 Orientation;
    int32 LODLevel;

    // Permute the extent according to the orientation.
    static FVector TransformFaceExtent(FVector Extent, int32 Orientation)
    {
        if (Orientation / 2 == 2) // Orientation: -Z, +Z
        {
            return FVector(Extent.Y, Extent.X, Extent.Z);
        }
        else if (Orientation / 2 == 1) // Orientation: -Y, +Y
        {
            return FVector(Extent.Z, Extent.X, Extent.Y);
        }
        else // (Orientation / 2 == 0), orientation: -X, +X
        {
            return FVector(Extent.Y, Extent.Z, Extent.X);
        }
    }
};

// Build data for FLumenMeshCards.
class FMeshCardsBuildData
{
public:
    FBox Bounds;
    int32 MaxLODLevel;
    // List of FLumenCard build data.
    TArray<FLumenCardBuildData> CardBuildData;

    (......)
};

// Unique id for each card representation data instance.
class FCardRepresentationDataId
{
public:
    uint32 Value = 0;

    bool IsValid() const
    {
        return Value != 0;
    }

    bool operator==(FCardRepresentationDataId B) const
    {
        return Value == B.Value;
    }

    friend uint32 GetTypeHash(FCardRepresentationDataId DataId)
    {
        return GetTypeHash(DataId.Value);
    }
};

// Payload and output data of the mesh card representation build process.
class FCardRepresentationData : public FDeferredCleanupInterface
{
public:
    // Mesh card build data and id.
    FMeshCardsBuildData MeshCardsBuildData;
    FCardRepresentationDataId CardRepresentationDataId;

    (......)

#if WITH_EDITORONLY_DATA
    // Cache the card representation data.
    void CacheDerivedData(const FString& InDDCKey, const ITargetPlatform* TargetPlatform, UStaticMesh* Mesh, UStaticMesh* GenerateSource, bool bGenerateDistanceFieldAsIfTwoSided, FSourceMeshDataForDerivedDataTask* OptionalSourceMeshData);
#endif
};

// Build task worker.
class FAsyncCardRepresentationTaskWorker : public FNonAbandonableTask
{
public:
    (.....)
    
    void DoWork();

private:
    FAsyncCardRepresentationTask& Task;
};

// Data carrier for a build task.
class FAsyncCardRepresentationTask
{
public:
    bool bSuccess = false;

#if WITH_EDITOR
    TArray<FSignedDistanceFieldBuildMaterialData> MaterialBlendModes;
#endif

    FSourceMeshDataForDerivedDataTask SourceMeshData;
    bool bGenerateDistanceFieldAsIfTwoSided = false;
    UStaticMesh* StaticMesh = nullptr;
    UStaticMesh* GenerateSource = nullptr;
    FString DDCKey;
    FCardRepresentationData* GeneratedCardRepresentation;
    TUniquePtr<FAsyncTask<FAsyncCardRepresentationTaskWorker>> AsyncTask = nullptr;
};

// Manages the asynchronous building of mesh card representations.
class FCardRepresentationAsyncQueue : public FGCObject
{
public:
    // Add a new build task.
    ENGINE_API void AddTask(FAsyncCardRepresentationTask* Task);
    
    // Process asynchronous tasks.
    ENGINE_API void ProcessAsyncTasks(bool bLimitExecutionTime = false);
    
    // Cancel builds.
    ENGINE_API void CancelBuild(UStaticMesh* StaticMesh);
    ENGINE_API void CancelAllOutstandingBuilds();

    // Block until builds complete.
    ENGINE_API void BlockUntilBuildComplete(UStaticMesh* StaticMesh, bool bWarnIfBlocked);
    ENGINE_API void BlockUntilAllBuildsComplete();

    (......)
};

// Global build queue.
extern ENGINE_API FCardRepresentationAsyncQueue* GCardRepresentationAsyncQueue;

extern ENGINE_API FString BuildCardRepresentationDerivedDataKey(const FString& InMeshKey);

extern ENGINE_API void BeginCacheMeshCardRepresentation(const ITargetPlatform* TargetPlatform, UStaticMesh* StaticMeshAsset, class FStaticMeshRenderData& RenderData, const FString& DistanceFieldKey, FSourceMeshDataForDerivedDataTask* OptionalSourceMeshData);
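The orientation convention used by FLumenCardBuildData packs an axis and a sign into one integer in the order -X, +X, -Y, +Y, -Z, +Z. Dividing by 2 yields the axis and the parity yields the sign. The sketch below decodes that and reproduces the permutation from TransformFaceExtent, which always moves the extent along the facing axis into the last component. Vec3, OrientationAxis, and OrientationSign are hypothetical names introduced for this example.

```cpp
#include <cassert>

struct Vec3 { float X, Y, Z; };

// Axis the card faces along: 0 = X, 1 = Y, 2 = Z.
int OrientationAxis(int Orientation)
{
    assert(Orientation >= 0 && Orientation < 6);
    return Orientation / 2;
}

// Facing sign, following the documented order -X, +X, -Y, +Y, -Z, +Z:
// even indices face the negative direction, odd indices the positive one.
int OrientationSign(int Orientation)
{
    assert(Orientation >= 0 && Orientation < 6);
    return (Orientation % 2 == 0) ? -1 : +1;
}

// Same permutation as FLumenCardBuildData::TransformFaceExtent in the listing:
// the first two components span the card plane, the third is the depth along
// the facing axis.
Vec3 TransformFaceExtent(Vec3 Extent, int Orientation)
{
    switch (OrientationAxis(Orientation))
    {
        case 2:  return {Extent.Y, Extent.X, Extent.Z}; // -Z, +Z
        case 1:  return {Extent.Z, Extent.X, Extent.Y}; // -Y, +Y
        default: return {Extent.Y, Extent.Z, Extent.X}; // -X, +X
    }
}
```

For a box with extent (1, 2, 3), a card facing -X gets the extent (2, 3, 1): it spans Y and Z in its plane and is 1 unit deep along X.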

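6.5.3.2 GCardRepresentationAsyncQueue

To build the data Lumen needs, UE5 declares two global queue variables: GCardRepresentationAsyncQueue and GDistanceFieldAsyncQueue. The former builds Lumen Card data and the latter builds distance field data. They are created and updated as follows:

// Engine\Source\Runtime\Launch\Private\LaunchEngineLoop.cpp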

int32 FEngineLoop::PreInitPreStartupScreen(const TCHAR* CmdLine)
{
    (......)
    
    if (!FPlatformProperties::RequiresCookedData())
    {
        (......)
        
        // Create the global async queues.
        GDistanceFieldAsyncQueue = new FDistanceFieldAsyncQueue();
        GCardRepresentationAsyncQueue = new FCardRepresentationAsyncQueue();

        (......)
    }
    
    (......)
}

void FEngineLoop::Tick()
{
    (......)
    
    // Update the global async queues every frame.
    if (GDistanceFieldAsyncQueue)
    {
        QUICK_SCOPE_CYCLE_COUNTER(STAT_FEngineLoop_Tick_GDistanceFieldAsyncQueue);
        GDistanceFieldAsyncQueue->ProcessAsyncTasks();
    }
    if (GCardRepresentationAsyncQueue)
    {
        QUICK_SCOPE_CYCLE_COUNTER(STAT_FEngineLoop_Tick_GCardRepresentationAsyncQueue);
        GCardRepresentationAsyncQueue->ProcessAsyncTasks();
    }
    
    (......)
}

Because GDistanceFieldAsyncQueue already existed in UE4, this section skips it and focuses on GCardRepresentationAsyncQueue.
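The general pattern behind such a queue is that tasks are handed to worker threads when they are added, and the main loop polls once per frame for finished results without blocking. The following sketch shows that pattern with the C++ standard library only. It is not how FCardRepresentationAsyncQueue is implemented: AsyncBuildQueue, BuildTask, and the use of std::async and an int result are assumptions chosen to keep the example small. Only the method names AddTask, ProcessAsyncTasks, and BlockUntilAllBuildsComplete echo the engine interface shown earlier.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical task record: a key (playing the role of the DDC key) and the
// pending worker result, here simply an int such as a generated card count.
struct BuildTask
{
    std::string Key;
    std::future<int> Result;
};

class AsyncBuildQueue
{
public:
    // Start the work on another thread and remember the pending task.
    void AddTask(std::string Key, std::function<int()> Work)
    {
        BuildTask Task;
        Task.Key = std::move(Key);
        Task.Result = std::async(std::launch::async, std::move(Work));
        Pending.push_back(std::move(Task));
    }

    // Intended to be called once per frame: harvest finished tasks without blocking.
    void ProcessAsyncTasks()
    {
        for (std::size_t Index = 0; Index < Pending.size();)
        {
            BuildTask& Task = Pending[Index];
            if (Task.Result.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
            {
                const int Value = Task.Result.get();
                Completed.emplace_back(Task.Key, Value);
                Pending.erase(Pending.begin() + static_cast<std::ptrdiff_t>(Index));
            }
            else
            {
                ++Index;
            }
        }
    }

    // Wait for every outstanding task, then harvest the results.
    void BlockUntilAllBuildsComplete()
    {
        for (BuildTask& Task : Pending)
        {
            Task.Result.wait();
        }
        ProcessAsyncTasks();
    }

    std::size_t NumPending() const { return Pending.size(); }

    // Finished builds in the order they were harvested.
    std::vector<std::pair<std::string, int>> Completed;

private:
    std::vector<BuildTask> Pending;
};
```

Polling with a zero timeout keeps the per-frame cost of ProcessAsyncTasks small, while BlockUntilAllBuildsComplete covers the cases where the results are needed immediately, such as cooking.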

The point at which a CardRepresentation is added to the global build queue GCardRepresentationAsyncQueue can be found in MeshCardRepresentation.cpp:

FCardRepresentationAsyncQueue* GCardRepresentationAsyncQueue = NULL;

// Begin caching the mesh card representation.
void BeginCacheMeshCardRepresentation(const ITargetPlatform* TargetPlatform, UStaticMesh* StaticMeshAsset, FStaticMeshRenderData& RenderData, const FString& DistanceFieldKey, FSourceMeshDataForDerivedDataTask* OptionalSourceMeshData)
{
    static const auto CVarCards = IConsoleManager::Get().FindTConsoleVariableDataInt(TEXT("r.MeshCardRepresentation"));

    if (CVarCards->GetValueOnAnyThread() != 0)
    {
        FString Key = BuildCardRepresentationDerivedDataKey(DistanceFieldKey);
        if (RenderData.LODResources.IsValidIndex(0))
        {
            // Create the FCardRepresentationData instance.
            if (!RenderData.LODResources[0].CardRepresentationData)
            {
                RenderData.LODResources[0].CardRepresentationData = new FCardRepresentationData();
            }

            const FMeshBuildSettings& BuildSettings = StaticMeshAsset->GetSourceModel(0).BuildSettings;
            UStaticMesh* MeshToGenerateFrom = StaticMeshAsset;

            // Cache the FCardRepresentationData.
            RenderData.LODResources[0].CardRepresentationData->CacheDerivedData(Key, TargetPlatform, StaticMeshAsset, MeshToGenerateFrom, BuildSettings.bGenerateDistanceFieldAsIfTwoSided, OptionalSourceMeshData);
        }
    }
}

// Cache the FCardRepresentationData.
void FCardRepresentationData::CacheDerivedData(const FString& InDDCKey, const ITargetPlatform* TargetPlatform, UStaticMesh* Mesh, UStaticMesh* GenerateSource, bool bGenerateDistanceFieldAsIfTwoSided, FSourceMeshDataForDerivedDataTask* OptionalSourceMeshData)
{
    TArray<uint8> DerivedData;

    (......)
    {
        COOK_STAT(Timer.TrackCyclesOnly());
        
        // Create a new build task, FAsyncCardRepresentationTask.
        FAsyncCardRepresentationTask* NewTask = new FAsyncCardRepresentationTask;
        NewTask->DDCKey = InDDCKey;
        check(Mesh && GenerateSource);
        NewTask->StaticMesh = Mesh;
        NewTask->GenerateSource = GenerateSource;
        NewTask->GeneratedCardRepresentation = new FCardRepresentationData();
        NewTask->bGenerateDistanceFieldAsIfTwoSided = bGenerateDistanceFieldAsIfTwoSided;

        // Collect the material blend modes.
        for (int32 MaterialIndex = 0; MaterialIndex < Mesh->GetStaticMaterials().Num(); MaterialIndex++)
        {
            FSignedDistanceFieldBuildMaterialData MaterialData;
            // Default material blend mode
            MaterialData.BlendMode = BLEND_Opaque;
            MaterialData.bTwoSided = false;

            if (Mesh->GetStaticMaterials()[MaterialIndex].MaterialInterface)
            {
                MaterialData.BlendMode = Mesh->GetStaticMaterials()[MaterialIndex].MaterialInterface->GetBlendMode();
                MaterialData.bTwoSided = Mesh->GetStaticMaterials()[MaterialIndex].MaterialInterface->IsTwoSided();
            }

            NewTask->MaterialBlendModes.Add(MaterialData);
        }

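        // Nanite overrides the source static mesh with a coarse representation, so the original data must be loaded before building the mesh SDF.
        if (OptionalSourceMeshData)
        {
            NewTask->SourceMeshData = *OptionalSourceMeshData;
        }
        // For Nanite-enabled meshes, build the vertex positions and triangle indices of the source mesh.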
        else if (Mesh->NaniteSettings.bEnabled)
        {
            IMeshBuilderModule& MeshBuilderModule = IMeshBuilderModule::GetForPlatform(TargetPlatform);
            if (!MeshBuilderModule.BuildMeshVertexPositions(Mesh, NewTask->SourceMeshData.TriangleIndices, NewTask->SourceMeshData.VertexPositions))
            {
                UE_LOG(LogStaticMesh, Error, TEXT("Failed to build static mesh. See previous line(s) for details."));
            }
        }

        // Add to the global queue GCardRepresentationAsyncQueue.
        GCardRepresentationAsyncQueue->AddTask(NewTask);
    }
}

6.5.3.3 GenerateCardRepresentationData

Following the call stack of FCardRepresentationAsyncQueue, it is easy to see that it ends up in FMeshUtilities::GenerateCardRepresentationData, which performs the actual mesh card building:

// Engine\Source\Developer\MeshUtilities\Private\MeshCardRepresentationUtilities.cpp

bool FMeshUtilities::GenerateCardRepresentationData(
    FString MeshName,
    const FSourceMeshDataForDerivedDataTask& SourceMeshData,
    const FStaticMeshLODResources& LODModel,
    class FQueuedThreadPool& ThreadPool,
    const TArray<FSignedDistanceFieldBuildMaterialData>& MaterialBlendModes,
    const FBoxSphereBounds& Bounds,
    const FDistanceFieldVolumeData* DistanceFieldVolumeData,
    bool bGenerateAsIfTwoSided,
    FCardRepresentationData& OutData)
{
    // Build the Embree scene.
    FEmbreeScene EmbreeScene;
    MeshRepresentation::SetupEmbreeScene(MeshName,
        SourceMeshData,
        LODModel,
        MaterialBlendModes,
        bGenerateAsIfTwoSided,
        EmbreeScene);

    if (!EmbreeScene.EmbreeScene)
    {
        return false;
    }

    // Set up the build context.
    FGenerateCardMeshContext Context(MeshName, EmbreeScene.EmbreeScene, EmbreeScene.EmbreeDevice, OutData);
    // Build the mesh cards.
    BuildMeshCards(DistanceFieldVolumeData ? DistanceFieldVolumeData->LocalSpaceMeshBounds : Bounds.GetBox(), Context, OutData);

    MeshRepresentation::DeleteEmbreeScene(EmbreeScene);
    
    (......)

    return true;
}

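As shown above, the mesh card build process uses the third-party Embree library.

About Embree

Embree is an open-source library developed and maintained by Intel. It is a collection of high-performance ray tracing kernels that helps developers improve the performance of photorealistic rendering applications. Its features include advanced hair geometry, motion blur, dynamic scenes, and multi-level instancing:

Embree's implementation has the following characteristics:

  • The kernels are optimized for the latest Intel processors with support for SSE, AVX, AVX2, and AVX-512 instructions.
  • Runtime code selection chooses the traversal and build algorithms that best match the CPU's instruction set.
  • Applications written with the Intel SPMD Program Compiler (ISPC) are supported, and an ISPC interface to the core ray tracing algorithms is also provided.
  • It contains algorithms optimized for incoherent workloads (such as Monte Carlo ray tracing) and for coherent workloads (such as primary visibility and hard shadow rays).

In short, Embree is a highly optimized CPU-based ray tracing accelerator that does not use GPU hardware acceleration. Because of this, the build time of Lumen's mesh cards depends mainly on CPU performance.

The core build logic is in BuildMeshCards: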

void BuildMeshCards(const FBox& MeshBounds, const FGenerateCardMeshContext& Context, FCardRepresentationData& OutData)
{
    static const auto CVarMeshCardRepresentationMinSurface = IConsoleManager::Get().FindTConsoleVariableDataFloat(TEXT("r.MeshCardRepresentation.MinSurface"));
    const float MinSurfaceThreshold = CVarMeshCardRepresentationMinSurface->GetValueOnAnyThread();

    // Make sure the generated card bounds are not empty.
    const FVector MeshCardsBoundsCenter = MeshBounds.GetCenter();
    const FVector MeshCardsBoundsExtent = FVector::Max(MeshBounds.GetExtent() + 1.0f, FVector(5.0f));
    const FBox MeshCardsBounds(MeshCardsBoundsCenter - MeshCardsBoundsExtent, MeshCardsBoundsCenter + MeshCardsBoundsExtent);

    // Initialize part of the output data.
    OutData.MeshCardsBuildData.Bounds = MeshCardsBounds;
    OutData.MeshCardsBuildData.MaxLODLevel = 1;
    OutData.MeshCardsBuildData.CardBuildData.Reset();

    // Set up the sampling and voxel data.
    const float SamplesPerWorldUnit = 1.0f / 10.0f;
    const int32 MinSamplesPerAxis = 4;
    const int32 MaxSamplesPerAxis = 64;
    FIntVector VolumeSizeInVoxels;
    VolumeSizeInVoxels.X = FMath::Clamp<int32>(MeshCardsBounds.GetSize().X * SamplesPerWorldUnit, MinSamplesPerAxis, MaxSamplesPerAxis);
    VolumeSizeInVoxels.Y = FMath::Clamp<int32>(MeshCardsBounds.GetSize().Y * SamplesPerWorldUnit, MinSamplesPerAxis, MaxSamplesPerAxis);
    VolumeSizeInVoxels.Z = FMath::Clamp<int32>(MeshCardsBounds.GetSize().Z * SamplesPerWorldUnit, MinSamplesPerAxis, MaxSamplesPerAxis);

    // Size of a single voxel.
    const FVector VoxelExtent = MeshCardsBounds.GetSize() / FVector(VolumeSizeInVoxels);

    // Generate stratified random ray directions over the hemisphere.
    TArray<FVector4> RayDirectionsOverHemisphere;
    {
        FRandomStream RandomStream(0);
        MeshUtilities::GenerateStratifiedUniformHemisphereSamples(64, RandomStream, RayDirectionsOverHemisphere);
    }
    
    // Iterate over the 6 orientations and generate card data for each.
    for (int32 Orientation = 0; Orientation < 6; ++Orientation)
    {
        // Initialize the heightfield and ray data.
        FIntPoint HeighfieldSize(0, 0);
        FVector RayDirection(0.0f, 0.0f, 0.0f);
        FVector RayOriginFrame = MeshCardsBounds.Min;
        FVector HeighfieldStepX(0.0f, 0.0f, 0.0f);
        FVector HeighfieldStepY(0.0f, 0.0f, 0.0f);
        float MaxRayT = 0.0f;
        int32 MeshSliceNum = 0;

        // Set the heightfield and ray data according to the orientation's axis.
        switch (Orientation / 2)
        {
            case 0: // Orientation: -X, +X
                MaxRayT = MeshCardsBounds.GetSize().X + 0.1f;
                MeshSliceNum = VolumeSizeInVoxels.X;
                HeighfieldSize.X = VolumeSizeInVoxels.Y;
                HeighfieldSize.Y = VolumeSizeInVoxels.Z;
                HeighfieldStepX = FVector(0.0f, MeshCardsBounds.GetSize().Y / HeighfieldSize.X, 0.0f);
                HeighfieldStepY = FVector(0.0f, 0.0f, MeshCardsBounds.GetSize().Z / HeighfieldSize.Y);
                break;

            case 1: // Orientation: -Y, +Y
                MaxRayT = MeshCardsBounds.GetSize().Y + 0.1f;
                MeshSliceNum = VolumeSizeInVoxels.Y;
                HeighfieldSize.X = VolumeSizeInVoxels.X;
                HeighfieldSize.Y = VolumeSizeInVoxels.Z;
                HeighfieldStepX = FVector(MeshCardsBounds.GetSize().X / HeighfieldSize.X, 0.0f, 0.0f);
                HeighfieldStepY = FVector(0.0f, 0.0f, MeshCardsBounds.GetSize().Z / HeighfieldSize.Y);
                break;

            case 2: // Orientation: -Z, +Z
                MaxRayT = MeshCardsBounds.GetSize().Z + 0.1f;
                MeshSliceNum = VolumeSizeInVoxels.Z;
                HeighfieldSize.X = VolumeSizeInVoxels.X;
                HeighfieldSize.Y = VolumeSizeInVoxels.Y;
                HeighfieldStepX = FVector(MeshCardsBounds.GetSize().X / HeighfieldSize.X, 0.0f, 0.0f);
                HeighfieldStepY = FVector(0.0f, MeshCardsBounds.GetSize().Y / HeighfieldSize.Y, 0.0f);
                break;
        }

        // Set the ray direction (and the origin for negative directions) for the current orientation.
        switch (Orientation)
        {
            case 0: 
                RayDirection.X = +1.0f; 
                break;

            case 1: 
                RayDirection.X = -1.0f; 
                RayOriginFrame.X = MeshCardsBounds.Max.X;
                break;

            case 2: 
                RayDirection.Y = +1.0f; 
                break;

            case 3: 
                RayDirection.Y = -1.0f; 
                RayOriginFrame.Y = MeshCardsBounds.Max.Y;
                break;

            case 4: 
                RayDirection.Z = +1.0f; 
                break;

            case 5: 
                RayDirection.Z = -1.0f; 
                RayOriginFrame.Z = MeshCardsBounds.Max.Z;
                break;

            default: 
                check(false);
        };

        TArray<TArray<FSurfacePoint, TInlineAllocator<16>>> HeightfieldLayers;
        HeightfieldLayers.SetNum(HeighfieldSize.X * HeighfieldSize.Y);

        // Fill in the surface point data.
        {
            TRACE_CPUPROFILER_EVENT_SCOPE(FillSurfacePoints);

            TArray<float> Heightfield;
            Heightfield.SetNum(HeighfieldSize.X * HeighfieldSize.Y);
            for (int32 HeighfieldY = 0; HeighfieldY < HeighfieldSize.Y; ++HeighfieldY)
            {
                for (int32 HeighfieldX = 0; HeighfieldX < HeighfieldSize.X; ++HeighfieldX)
                {
                    Heightfield[HeighfieldX + HeighfieldY * HeighfieldSize.X] = -1.0f;
                }
            }

            for (int32 HeighfieldY = 0; HeighfieldY < HeighfieldSize.Y; ++HeighfieldY)
            {
                for (int32 HeighfieldX = 0; HeighfieldX < HeighfieldSize.X; ++HeighfieldX)
                {
                    FVector RayOrigin = RayOriginFrame;
                    RayOrigin += (HeighfieldX + 0.5f) * HeighfieldStepX;
                    RayOrigin += (HeighfieldY + 0.5f) * HeighfieldStepY;

                    float StepTMin = 0.0f;

                    for (int32 StepIndex = 0; StepIndex < 64; ++StepIndex)
                    {
                        FEmbreeRay EmbreeRay;
                        EmbreeRay.ray.org_x = RayOrigin.X;
                        EmbreeRay.ray.org_y = RayOrigin.Y;
                        EmbreeRay.ray.org_z = RayOrigin.Z;
                        EmbreeRay.ray.dir_x = RayDirection.X;
                        EmbreeRay.ray.dir_y = RayDirection.Y;
                        EmbreeRay.ray.dir_z = RayDirection.Z;
                        EmbreeRay.ray.tnear = StepTMin;
                        EmbreeRay.ray.tfar = FLT_MAX;

                        FEmbreeIntersectionContext EmbreeContext;
                        rtcInitIntersectContext(&EmbreeContext);
                        rtcIntersect1(Context.FullMeshEmbreeScene, &EmbreeContext, &EmbreeRay);

                        if (EmbreeRay.hit.geomID != RTC_INVALID_GEOMETRY_ID && EmbreeRay.hit.primID != RTC_INVALID_GEOMETRY_ID)
                        {
                            const FVector SurfacePoint = RayOrigin + RayDirection * EmbreeRay.ray.tfar;
                            const FVector SurfaceNormal = EmbreeRay.GetHitNormal();

                            const float NdotD = FVector::DotProduct(RayDirection, SurfaceNormal);
                            const bool bPassCullTest = EmbreeContext.IsHitTwoSided() || NdotD <= 0.0f;
                            const bool bPassProjectionAngleTest = FMath::Abs(NdotD) >= FMath::Cos(75.0f * (PI / 180.0f));

                            const float MinDistanceBetweenPoints = (MaxRayT / MeshSliceNum);
                            const bool bPassDistanceToAnotherSurfaceTest = EmbreeRay.ray.tnear <= 0.0f || (EmbreeRay.ray.tfar - EmbreeRay.ray.tnear > MinDistanceBetweenPoints);

                            if (bPassCullTest && bPassProjectionAngleTest && bPassDistanceToAnotherSurfaceTest)
                            {
                                const bool bIsInsideMesh = IsSurfacePointInsideMesh(Context.FullMeshEmbreeScene, SurfacePoint, SurfaceNormal, RayDirectionsOverHemisphere);
                                if (!bIsInsideMesh)
                                {
                                    HeightfieldLayers[HeighfieldX + HeighfieldY * HeighfieldSize.X].Add(
                                        { EmbreeRay.ray.tnear, EmbreeRay.ray.tfar }
                                    );
                                }
                            }

                            StepTMin = EmbreeRay.ray.tfar + 0.01f;
                        }
                        else
                        {
                            break;
                        }
                    }
                }
            }
        }

        const int32 MinCardHits = FMath::Floor(HeighfieldSize.X * HeighfieldSize.Y * MinSurfaceThreshold);

        TArray<FPlacedCard, TInlineAllocator<16>> PlacedCards;
        int32 PlacedCardsHits = 0;

        // Place one default card that spans every slice.
        {
            FPlacedCard PlacedCard;
            PlacedCard.SliceMin = 0;
            PlacedCard.SliceMax = MeshSliceNum;
            PlacedCards.Add(PlacedCard);

            PlacedCardsHits = UpdatePlacedCards(PlacedCards, RayOriginFrame, RayDirection, HeighfieldStepX, HeighfieldStepY, HeighfieldSize, MeshSliceNum, MaxRayT, MinCardHits, VoxelExtent, HeightfieldLayers);

            if (PlacedCardsHits < MinCardHits)
            {
                PlacedCards.Reset();
            }
        }

        SerializePlacedCards(PlacedCards, /*LOD level*/ 0, Orientation, MinCardHits, MeshCardsBounds, OutData);

        // Try to place more cards by splitting the existing ones.
        for (uint32 CardPlacementIteration = 0; CardPlacementIteration < 4; ++CardPlacementIteration)
        {
            TArray<FPlacedCard, TInlineAllocator<16>> BestPlacedCards;
            int32 BestPlacedCardHits = PlacedCardsHits;

            for (int32 PlacedCardIndex = 0; PlacedCardIndex < PlacedCards.Num(); ++PlacedCardIndex)
            {
                const FPlacedCard& PlacedCard = PlacedCards[PlacedCardIndex];
                for (int32 SliceIndex = PlacedCard.SliceMin + 2; SliceIndex < PlacedCard.SliceMax; ++SliceIndex)
                {
                    TArray<FPlacedCard, TInlineAllocator<16>> TempPlacedCards(PlacedCards);

                    FPlacedCard NewPlacedCard;
                    NewPlacedCard.SliceMin = SliceIndex;
                    NewPlacedCard.SliceMax = PlacedCard.SliceMax;

                    TempPlacedCards[PlacedCardIndex].SliceMax = SliceIndex - 1;
                    TempPlacedCards.Insert(NewPlacedCard, PlacedCardIndex + 1);

                    const int32 NumHits = UpdatePlacedCards(TempPlacedCards, RayOriginFrame, RayDirection, HeighfieldStepX, HeighfieldStepY, HeighfieldSize, MeshSliceNum, MaxRayT, MinCardHits, VoxelExtent, HeightfieldLayers);

                    if (NumHits > BestPlacedCardHits)
                    {
                        BestPlacedCards = TempPlacedCards;
                        BestPlacedCardHits = NumHits;
                    }
                }
            }

            if (BestPlacedCardHits >= PlacedCardsHits + MinCardHits)
            {
                PlacedCards = BestPlacedCards;
                PlacedCardsHits = BestPlacedCardHits;
            }
        }

        SerializePlacedCards(PlacedCards, /*LOD level*/ 1, Orientation, MinCardHits, MeshCardsBounds, OutData);
    } // for (int32 Orientation = 0; Orientation < 6; ++Orientation)
}

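The code above shows that card data construction is accelerated with height field ray tracing (Height Field Ray Tracing), a technique that has existed for many years. Its core idea is to discretize the mesh into equally sized 3D voxels and then, according to the target resolution, cast one ray from the camera position through each pixel and test it for intersection against the voxels, which renders the silhouette of the height field. This silhouette is the boundary that divides the screen into the region covered by the height field and the region above it: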

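A silhouette obtained this way shows obvious aliasing. The paper Ray Tracing Height Fields offers several ways to reconstruct the surface and reduce it, including height field planes, linearly approximated planes, triangles, and bilinear surfaces.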
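To make the difference between a stair-stepped and a reconstructed surface concrete, here is a minimal sketch of a fixed-step ray march over a 2D height grid. It is neither the Embree-based code UE5 uses nor the algorithm from the paper; the types FHeightGrid and FSimpleRay and the function MarchHeightGrid are hypothetical names invented for this illustration, and the grid is assumed to have unit spacing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical height field over the XY plane: one height per grid point, unit spacing.
struct FHeightGrid
{
    int SizeX = 0;
    int SizeY = 0;
    std::vector<float> Heights; // row-major, SizeX * SizeY entries

    // Height at an integer grid point, clamped to the grid border.
    float At(int X, int Y) const
    {
        X = std::max(0, std::min(X, SizeX - 1));
        Y = std::max(0, std::min(Y, SizeY - 1));
        return Heights[X + Y * SizeX];
    }

    // Piecewise-constant reconstruction: snap to the nearest grid point (stair-stepped).
    float SampleNearest(float X, float Y) const
    {
        return At((int)std::floor(X + 0.5f), (int)std::floor(Y + 0.5f));
    }

    // Bilinear reconstruction: blend the four surrounding grid points.
    float SampleBilinear(float X, float Y) const
    {
        const float FloorX = std::floor(X);
        const float FloorY = std::floor(Y);
        const int X0 = (int)FloorX;
        const int Y0 = (int)FloorY;
        const float Tx = X - FloorX;
        const float Ty = Y - FloorY;
        const float Row0 = At(X0, Y0) + (At(X0 + 1, Y0) - At(X0, Y0)) * Tx;
        const float Row1 = At(X0, Y0 + 1) + (At(X0 + 1, Y0 + 1) - At(X0, Y0 + 1)) * Tx;
        return Row0 + (Row1 - Row0) * Ty;
    }
};

struct FSimpleRay
{
    float Origin[3];
    float Direction[3];
};

// Fixed-step march: returns true and writes OutT at the first sample on or below the surface.
bool MarchHeightGrid(const FHeightGrid& Grid, const FSimpleRay& Ray, float MaxT, float StepT, bool bBilinear, float& OutT)
{
    for (float T = 0.0f; T <= MaxT; T += StepT)
    {
        const float X = Ray.Origin[0] + Ray.Direction[0] * T;
        const float Y = Ray.Origin[1] + Ray.Direction[1] * T;
        const float Z = Ray.Origin[2] + Ray.Direction[2] * T;
        const float Height = bBilinear ? Grid.SampleBilinear(X, Y) : Grid.SampleNearest(X, Y);
        if (Z <= Height)
        {
            OutT = T;
            return true;
        }
    }
    return false;
}
```

On a ramp whose height equals its X coordinate, a horizontal ray meets the bilinear surface where the ramp actually reaches the ray's height, while the nearest-sample version only reports a hit once the ray crosses into the next grid cell. That offset is the stair-stepping the reconstruction methods are meant to reduce.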

After the construction above, mesh card data like the following is produced:

Top: the mesh rendered normally; bottom: visualization of the mesh card data.
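The card-splitting loop in the build function can be read in isolation as a greedy search: each iteration tries every possible split of every card and keeps the best split only if it gains at least MinCardHits hits. The sketch below mirrors that structure under simplifying assumptions. FCardRange, CountHits, and SplitCards are hypothetical names; CountHits is a stand-in for the engine's UpdatePlacedCards and simply counts texels that have a surface point inside a card's slice range, which is not how the engine scores cards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for FPlacedCard: an inclusive range of slices along the capture axis.
struct FCardRange
{
    int SliceMin;
    int SliceMax;
};

// Simplified stand-in for UpdatePlacedCards: a texel counts as a hit for a card
// when at least one of its surface points lies inside the card's slice range.
int CountHits(const std::vector<FCardRange>& Cards, const std::vector<std::vector<int>>& TexelSlices)
{
    int NumHits = 0;
    for (const FCardRange& Card : Cards)
    {
        for (const std::vector<int>& Slices : TexelSlices)
        {
            for (int Slice : Slices)
            {
                if (Slice >= Card.SliceMin && Slice <= Card.SliceMax)
                {
                    ++NumHits;
                    break;
                }
            }
        }
    }
    return NumHits;
}

// Greedy refinement: per iteration, try every split and keep the best one if it gains enough hits.
std::vector<FCardRange> SplitCards(const std::vector<std::vector<int>>& TexelSlices, int NumSlices, int MinCardHits, int MaxIterations)
{
    std::vector<FCardRange> Cards = { FCardRange{ 0, NumSlices } };
    int CardsHits = CountHits(Cards, TexelSlices);

    for (int Iteration = 0; Iteration < MaxIterations; ++Iteration)
    {
        std::vector<FCardRange> BestCards;
        int BestHits = CardsHits;

        for (std::size_t CardIndex = 0; CardIndex < Cards.size(); ++CardIndex)
        {
            for (int SliceIndex = Cards[CardIndex].SliceMin + 2; SliceIndex < Cards[CardIndex].SliceMax; ++SliceIndex)
            {
                // Split the card at SliceIndex into [SliceMin, SliceIndex - 1] and [SliceIndex, SliceMax].
                std::vector<FCardRange> Candidate = Cards;
                Candidate[CardIndex].SliceMax = SliceIndex - 1;
                Candidate.insert(Candidate.begin() + CardIndex + 1, FCardRange{ SliceIndex, Cards[CardIndex].SliceMax });

                const int NumHits = CountHits(Candidate, TexelSlices);
                if (NumHits > BestHits)
                {
                    BestCards = Candidate;
                    BestHits = NumHits;
                }
            }
        }

        // Accept only when the split adds at least MinCardHits hits.
        // The emptiness guard is an addition of this sketch, not part of the listing.
        if (!BestCards.empty() && BestHits >= CardsHits + MinCardHits)
        {
            Cards = BestCards;
            CardsHits = BestHits;
        }
    }
    return Cards;
}
```

With four texels that each see one surface at slice 1 and another at slice 6, a single card records only one surface per texel, whereas one split lets both layers be captured. Further splits add no hits, so the search stops growing the card list.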

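Mesh card data has LODs, and the LOD level is selected according to the distance from the camera. In the build code above, LOD 0 stores the single default card for each orientation and LOD 1 stores the cards after splitting.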

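In addition, UE5 improves the mesh distance field data it builds: sparse storage raises the precision (left in the figure below), which is clearly better than UE4 (right in the figure below).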

6.5.4 Lumen Rendering Pipeline

The main Lumen rendering flow still lives in FDeferredShadingSceneRenderer::Render:

void FDeferredShadingSceneRenderer::Render(FRDGBuilder& GraphBuilder)
{
    (......)
    
    bool bAnyLumenEnabled = false;
    if (!IsSimpleForwardShadingEnabled(ShaderPlatform))
    {
        (......)

        // Check whether any view has Lumen enabled.
        for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
        {
            FViewInfo& View = Views[ViewIndex];
            bAnyLumenEnabled = bAnyLumenEnabled 
                || GetViewPipelineState(View).DiffuseIndirectMethod == EDiffuseIndirectMethod::Lumen
                || GetViewPipelineState(View).ReflectionsMethod == EReflectionsMethod::Lumen;
        }

        (......)
    }
    
    (......)
    
    // PrePass.
    RenderPrePass(...);
    
    (......)
    
    // Update the Lumen scene.
    UpdateLumenScene(GraphBuilder);

    // If occlusion culling runs before the base pass, render the Lumen scene lighting before RenderBasePass.
    // bOcclusionBeforeBasePass defaults to false.
    if (bOcclusionBeforeBasePass)
    {
        {
            LLM_SCOPE_BYTAG(Lumen);
            RenderLumenSceneLighting(GraphBuilder, Views[0]);
        }

        ComputeVolumetricFog(GraphBuilder);
    }
    
    (......)
    
    // BasePass.
    RenderBasePass(...);
    
    (......)
    
    // Lumen lighting after the base pass.
    if (!bOcclusionBeforeBasePass)
    {
        const bool bAfterBasePass = true;
        // Render shadows.
        AllocateVirtualShadowMaps(bAfterBasePass);
        RenderShadowDepthMaps(GraphBuilder, InstanceCullingManager);
        
        {
            LLM_SCOPE_BYTAG(Lumen);
            // Render the Lumen scene lighting.
            RenderLumenSceneLighting(GraphBuilder, Views[0]);
        }

        AddServiceLocalQueuePass(GraphBuilder);
    }
    
    (......)
    
    // Render the Lumen visualization.
    RenderLumenSceneVisualization(GraphBuilder, SceneTextures);
    // Render diffuse indirect lighting and ambient occlusion.
    RenderDiffuseIndirectAndAmbientOcclusion(GraphBuilder, SceneTextures, LightingChannelsTexture, true);
    
    (......)
}

The red boxes below mark the Lumen steps in a RenderDoc frame capture:

Lumen lighting has two main stages: updating the scene in UpdateLumenScene and computing the scene lighting in RenderLumenSceneLighting.
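As a reading aid for the Render excerpt, the following sketch lists the Lumen-related calls in the order that excerpt issues them, depending on bOcclusionBeforeBasePass. It is only a model of the excerpt: ListLumenFrameSteps and IndexOfStep are hypothetical helpers, the strings are the function names from the listing, and every elided part of the real function is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the Lumen-related steps of the Render excerpt, in issue order.
std::vector<std::string> ListLumenFrameSteps(bool bOcclusionBeforeBasePass)
{
    std::vector<std::string> Steps;
    Steps.push_back("RenderPrePass");
    Steps.push_back("UpdateLumenScene");

    if (bOcclusionBeforeBasePass)
    {
        // Lumen scene lighting is computed before the base pass on this path.
        Steps.push_back("RenderLumenSceneLighting");
        Steps.push_back("ComputeVolumetricFog");
    }

    Steps.push_back("RenderBasePass");

    if (!bOcclusionBeforeBasePass)
    {
        // Path taken when the flag is false: shadows first, then Lumen scene lighting.
        Steps.push_back("RenderShadowDepthMaps");
        Steps.push_back("RenderLumenSceneLighting");
    }

    Steps.push_back("RenderLumenSceneVisualization");
    Steps.push_back("RenderDiffuseIndirectAndAmbientOcclusion");
    return Steps;
}

// Hypothetical helper: position of a step in the list, or -1 when it is absent.
int IndexOfStep(const std::vector<std::string>& Steps, const std::string& Name)
{
    for (std::size_t Index = 0; Index < Steps.size(); ++Index)
    {
        if (Steps[Index] == Name)
        {
            return (int)Index;
        }
    }
    return -1;
}
```

In both orderings UpdateLumenScene comes before RenderLumenSceneLighting, so the surface cache is refreshed before it is lit.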

6.5.5 Lumen Scene Update

6.5.5.1 UpdateLumenScene

The Lumen scene update is mainly handled by UpdateLumenScene:

// Engine/Source/Runtime/Renderer/Private/Lumen/LumenSceneRendering.cpp

void FDeferredShadingSceneRenderer::UpdateLumenScene(FRDGBuilder& GraphBuilder)
{
    LLM_SCOPE_BYTAG(Lumen);

    FViewInfo& View = Views[0];
    const FPerViewPipelineState& ViewPipelineState = GetViewPipelineState(View);
    const bool bAnyLumenActive = ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::Lumen || ViewPipelineState.ReflectionsMethod == EReflectionsMethod::Lumen;

    if (bAnyLumenActive
        // Only the main view updates the scene: skip planar reflections, scene captures, and reflection captures.
        && !View.bIsPlanarReflection 
        && !View.bIsSceneCapture
        && !View.bIsReflectionCapture
        && View.ViewState)
    {
        const double StartTime = FPlatformTime::Seconds();

        // Fetch the Lumen scene data and the list of cards to render.
        FLumenSceneData& LumenSceneData = *Scene->LumenSceneData;
        TArray<FCardRenderData, SceneRenderingAllocator>& CardsToRender = LumenCardRenderer.CardsToRender;

        RDG_EVENT_SCOPE(GraphBuilder, "UpdateLumenScene: %u card captures %.3fM texels", CardsToRender.Num(), LumenCardRenderer.NumCardTexelsToCapture / 1e6f);

        // Update the card scene buffer.
        UpdateCardSceneBuffer(GraphBuilder.RHICmdList, ViewFamily, Scene);

        // The Lumen primitive mapping buffer was updated, so the view uniform buffer must be recreated.
        Lumen::SetupViewUniformBufferParameters(Scene, *View.CachedViewUniformShaderParameters);
        View.ViewUniformBuffer = TUniformBufferRef<FViewUniformShaderParameters>::CreateUniformBufferImmediate(*View.CachedViewUniformShaderParameters, UniformBuffer_SingleFrame);
        
        LumenCardRenderer.CardIdsToRender.Empty(CardsToRender.Num());

        // Temporary depth-stencil buffer used for card capture.
        const FRDGTextureDesc DepthStencilAtlasDesc = FRDGTextureDesc::Create2D(LumenSceneData.MaxAtlasSize, PF_DepthStencil, FClearValueBinding::DepthZero, TexCreate_ShaderResource | TexCreate_DepthStencilTargetable | TexCreate_NoFastClear);
        FRDGTextureRef DepthStencilAtlasTexture = GraphBuilder.CreateTexture(DepthStencilAtlasDesc, TEXT("Lumen.DepthStencilAtlas"));

        if (CardsToRender.Num() > 0)
        {
            FRHIBuffer* PrimitiveIdVertexBuffer = nullptr;
            FInstanceCullingResult InstanceCullingResult;
            // Cull the card draws; both the GPU-culling and the non-GPU-culling paths are supported.
#if GPUCULL_TODO
            if (Scene->GPUScene.IsEnabled())
            {
                int32 MaxInstances = 0;
                int32 VisibleMeshDrawCommandsNum = 0;
                int32 NewPassVisibleMeshDrawCommandsNum = 0;

                FInstanceCullingContext InstanceCullingContext(nullptr, TArrayView<const int32>(&View.GPUSceneViewId, 1));

                SetupGPUInstancedDraws(InstanceCullingContext, LumenCardRenderer.MeshDrawCommands, false, MaxInstances, VisibleMeshDrawCommandsNum, NewPassVisibleMeshDrawCommandsNum);
                // Not supposed to do any compaction here.
                ensure(VisibleMeshDrawCommandsNum == LumenCardRenderer.MeshDrawCommands.Num());

                InstanceCullingContext.BuildRenderingCommands(GraphBuilder, Scene->GPUScene, View.DynamicPrimitiveCollector.GetPrimitiveIdRange(), InstanceCullingResult);
            }
            else
#endif // GPUCULL_TODO
            {
                // Prepare primitive Id VB for rendering mesh draw commands.
                if (LumenCardRenderer.MeshDrawPrimitiveIds.Num() > 0)
                {
                    const uint32 PrimitiveIdBufferDataSize = LumenCardRenderer.MeshDrawPrimitiveIds.Num() * sizeof(int32);

                    FPrimitiveIdVertexBufferPoolEntry Entry = GPrimitiveIdVertexBufferPool.Allocate(PrimitiveIdBufferDataSize);
                    PrimitiveIdVertexBuffer = Entry.BufferRHI;

                    void* RESTRICT Data = RHILockBuffer(PrimitiveIdVertexBuffer, 0, PrimitiveIdBufferDataSize, RLM_WriteOnly);
                    FMemory::Memcpy(Data, LumenCardRenderer.MeshDrawPrimitiveIds.GetData(), PrimitiveIdBufferDataSize);
                    RHIUnlockBuffer(PrimitiveIdVertexBuffer);

                    GPrimitiveIdVertexBufferPool.ReturnToFreeList(Entry);
                }
            }
            FRDGTextureRef AlbedoAtlasTexture = GraphBuilder.RegisterExternalTexture(LumenSceneData.AlbedoAtlas);
            FRDGTextureRef NormalAtlasTexture = GraphBuilder.RegisterExternalTexture(LumenSceneData.NormalAtlas);
            FRDGTextureRef EmissiveAtlasTexture = GraphBuilder.RegisterExternalTexture(LumenSceneData.EmissiveAtlas);

            uint32 NumRects = 0;
            FRDGBufferRef RectMinMaxBuffer = nullptr;
            {
                // Upload card ids for the batched draws that operate on the cards to render.
                TArray<FUintVector4, SceneRenderingAllocator> RectMinMaxToRender;
                RectMinMaxToRender.Reserve(CardsToRender.Num());
                for (const FCardRenderData& CardRenderData : CardsToRender)
                {
                    FIntRect AtlasRect = CardRenderData.AtlasAllocation;

                    FUintVector4 Rect;
                    Rect.X = FMath::Max(AtlasRect.Min.X, 0);
                    Rect.Y = FMath::Max(AtlasRect.Min.Y, 0);
                    Rect.Z = FMath::Max(AtlasRect.Max.X, 0);
                    Rect.W = FMath::Max(AtlasRect.Max.Y, 0);
                    RectMinMaxToRender.Add(Rect);
                }

                NumRects = CardsToRender.Num();
                RectMinMaxBuffer = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateUploadDesc(sizeof(FUintVector4), FMath::RoundUpToPowerOfTwo(NumRects)), TEXT("Lumen.RectMinMaxBuffer"));

                FPixelShaderUtils::UploadRectMinMaxBuffer(GraphBuilder, RectMinMaxToRender, RectMinMaxBuffer);

                FRDGBufferSRVRef RectMinMaxBufferSRV = GraphBuilder.CreateSRV(FRDGBufferSRVDesc(RectMinMaxBuffer, PF_R32G32B32A32_UINT));
                ClearLumenCards(GraphBuilder, View, AlbedoAtlasTexture, NormalAtlasTexture, EmissiveAtlasTexture, DepthStencilAtlasTexture, LumenSceneData.MaxAtlasSize, RectMinMaxBufferSRV, NumRects);
            }

            // Snapshot the view info so it can be shared by the card captures.
            FViewInfo* SharedView = View.CreateSnapshot();
            {
                SharedView->DynamicPrimitiveCollector = FGPUScenePrimitiveCollector(&GetGPUSceneDynamicContext());
                SharedView->StereoPass = eSSP_FULL;
                SharedView->DrawDynamicFlags = EDrawDynamicFlags::ForceLowestLOD;

                // Don't do material texture mip biasing in proxy card rendering
                SharedView->MaterialTextureMipBias = 0;

                TRefCountPtr<IPooledRenderTarget> NullRef;
                FPlatformMemory::Memcpy(&SharedView->PrevViewInfo.HZB, &NullRef, sizeof(SharedView->PrevViewInfo.HZB));

                SharedView->CachedViewUniformShaderParameters = MakeUnique<FViewUniformShaderParameters>();
                SharedView->CachedViewUniformShaderParameters->PrimitiveSceneData = Scene->GPUScene.PrimitiveBuffer.SRV;
                SharedView->CachedViewUniformShaderParameters->InstanceSceneData = Scene->GPUScene.InstanceDataBuffer.SRV;
                SharedView->CachedViewUniformShaderParameters->LightmapSceneData = Scene->GPUScene.LightmapDataBuffer.SRV;
                SharedView->ViewUniformBuffer = TUniformBufferRef<FViewUniformShaderParameters>::CreateUniformBufferImmediate(*SharedView->CachedViewUniformShaderParameters, UniformBuffer_SingleFrame);
            }

            // Set up the scene texture uniform parameters.
            FLumenCardPassUniformParameters* PassUniformParameters = GraphBuilder.AllocParameters<FLumenCardPassUniformParameters>();
            SetupSceneTextureUniformParameters(GraphBuilder, Scene->GetFeatureLevel(), /*SceneTextureSetupMode*/ ESceneTextureSetupMode::None, PassUniformParameters->SceneTextures);

            // Capture the mesh cards.
            {
                FLumenCardPassParameters* PassParameters = GraphBuilder.AllocParameters<FLumenCardPassParameters>();
                PassParameters->View = Scene->UniformBuffers.LumenCardCaptureViewUniformBuffer;
                PassParameters->CardPass = GraphBuilder.CreateUniformBuffer(PassUniformParameters);
                PassParameters->RenderTargets[0] = FRenderTargetBinding(AlbedoAtlasTexture, ERenderTargetLoadAction::ELoad);
                PassParameters->RenderTargets[1] = FRenderTargetBinding(NormalAtlasTexture, ERenderTargetLoadAction::ELoad);
                PassParameters->RenderTargets[2] = FRenderTargetBinding(EmissiveAtlasTexture, ERenderTargetLoadAction::ELoad);
                PassParameters->RenderTargets.DepthStencil = FDepthStencilBinding(DepthStencilAtlasTexture, ERenderTargetLoadAction::ELoad, FExclusiveDepthStencil::DepthWrite_StencilNop);

                InstanceCullingResult.GetDrawParameters(PassParameters->InstanceCullingDrawParams);

                // Mesh card capture pass.
                GraphBuilder.AddPass(
                    RDG_EVENT_NAME("MeshCardCapture"),
                    PassParameters,
                    ERDGPassFlags::Raster,
                    [this, Scene = Scene, PrimitiveIdVertexBuffer, SharedView, &CardsToRender, PassParameters](FRHICommandList& RHICmdList)
                    {
                        QUICK_SCOPE_CYCLE_COUNTER(MeshPass);

                        // Prepare the data for every card to render and submit its draw commands.
                        for (FCardRenderData& CardRenderData : CardsToRender)
                        {
                            if (CardRenderData.NumMeshDrawCommands > 0)
                            {
                                FIntRect AtlasRect = CardRenderData.AtlasAllocation;
                                RHICmdList.SetViewport(AtlasRect.Min.X, AtlasRect.Min.Y, 0.0f, AtlasRect.Max.X, AtlasRect.Max.Y, 1.0f);

                                CardRenderData.PatchView(RHICmdList, Scene, SharedView);
                                Scene->UniformBuffers.LumenCardCaptureViewUniformBuffer.UpdateUniformBufferImmediate(*SharedView->CachedViewUniformShaderParameters);

                                FGraphicsMinimalPipelineStateSet GraphicsMinimalPipelineStateSet;
#if GPUCULL_TODO
                                if (Scene->GPUScene.IsEnabled())
                                {
                                    FRHIBuffer* DrawIndirectArgsBuffer = nullptr;
                                    FRHIBuffer* InstanceIdOffsetBuffer = nullptr;
                                    FInstanceCullingDrawParams& InstanceCullingDrawParams = PassParameters->InstanceCullingDrawParams;
                                    if (InstanceCullingDrawParams.DrawIndirectArgsBuffer != nullptr && InstanceCullingDrawParams.InstanceIdOffsetBuffer != nullptr)
                                    {
                                        DrawIndirectArgsBuffer = InstanceCullingDrawParams.DrawIndirectArgsBuffer->GetRHI();
                                        InstanceIdOffsetBuffer = InstanceCullingDrawParams.InstanceIdOffsetBuffer->GetRHI();
                                    }

                                    // With GPU culling, submit through the GPU-instanced interface.
                                    SubmitGPUInstancedMeshDrawCommandsRange(
                                        LumenCardRenderer.MeshDrawCommands,
                                        GraphicsMinimalPipelineStateSet,
                                        CardRenderData.StartMeshDrawCommandIndex,
                                        CardRenderData.NumMeshDrawCommands,
                                        1,
                                        InstanceIdOffsetBuffer,
                                        DrawIndirectArgsBuffer,
                                        RHICmdList);
                                }
                                else
#endif // GPUCULL_TODO
                                {
                                    // Without GPU culling, submit through the regular draw interface.
                                    SubmitMeshDrawCommandsRange(
                                        LumenCardRenderer.MeshDrawCommands,
                                        GraphicsMinimalPipelineStateSet,
                                        PrimitiveIdVertexBuffer,
                                        0,
                                        false,
                                        CardRenderData.StartMeshDrawCommandIndex,
                                        CardRenderData.NumMeshDrawCommands,
                                        1,
                                        RHICmdList);
                                }
                            }
                        }
                    }
                );
            }

            // Record the ids of the cards to render and detect whether any Nanite meshes need rendering.
            bool bAnyNaniteMeshes = false;
            for (FCardRenderData& CardRenderData : CardsToRender)
            {
                bAnyNaniteMeshes = bAnyNaniteMeshes || CardRenderData.NaniteInstanceIds.Num() > 0 || CardRenderData.bDistantScene;
                LumenCardRenderer.CardIdsToRender.Add(CardRenderData.CardIndex);
            }

            // Render the Nanite meshes of the Lumen scene.
            if (UseNanite(ShaderPlatform) && ViewFamily.EngineShowFlags.NaniteMeshes && bAnyNaniteMeshes)
            {
                TRACE_CPUPROFILER_EVENT_SCOPE(NaniteMeshPass);
                QUICK_SCOPE_CYCLE_COUNTER(NaniteMeshPass);

                const FIntPoint DepthStencilAtlasSize = DepthStencilAtlasDesc.Extent;
                const FIntRect DepthAtlasRect = FIntRect(0, 0, DepthStencilAtlasSize.X, DepthStencilAtlasSize.Y);
                FRDGBufferSRVRef RectMinMaxBufferSRV = GraphBuilder.CreateSRV(FRDGBufferSRVDesc(RectMinMaxBuffer, PF_R32G32B32A32_UINT));

                // Raster context.
                Nanite::FRasterContext RasterContext = Nanite::InitRasterContext(
                    GraphBuilder,
                    FeatureLevel,
                    DepthStencilAtlasSize,
                    Nanite::EOutputBufferMode::VisBuffer,
                    true,
                    RectMinMaxBufferSRV,
                    NumRects);

                const bool bUpdateStreaming = false;
                const bool bSupportsMultiplePasses = true;
                const bool bForceHWRaster = RasterContext.RasterScheduling == Nanite::ERasterScheduling::HardwareOnly;
                // Not the primary context (to distinguish it from Nanite's main pass).
                const bool bPrimaryContext = false;

                // Culling context.
                Nanite::FCullingContext CullingContext = Nanite::InitCullingContext(
                    GraphBuilder,
                    *Scene,
                    nullptr,
                    FIntRect(),
                    false,
                    bUpdateStreaming,
                    bSupportsMultiplePasses,
                    bForceHWRaster,
                    bPrimaryContext);

                // Multi-view rendering.
                if (GLumenSceneNaniteMultiViewCapture)
                {
                    const uint32 NumCardsToRender = CardsToRender.Num();

                    // The outer while loop splits the cards into batches so that no batch exceeds MAX_VIEWS_PER_CULL_RASTERIZE_PASS.
                    uint32 NextCardIndex = 0;
                    while(NextCardIndex < NumCardsToRender)
                    {
                        TArray<Nanite::FPackedView, SceneRenderingAllocator> NaniteViews;
                        TArray<Nanite::FInstanceDraw, SceneRenderingAllocator> NaniteInstanceDraws;

                        // Build one FPackedViewParams per card that has Nanite instances and add the packed view to NaniteViews, until NaniteViews reaches the maximum view count.
                        while(NextCardIndex < NumCardsToRender && NaniteViews.Num() < MAX_VIEWS_PER_CULL_RASTERIZE_PASS)
                        {
                            const FCardRenderData& CardRenderData = CardsToRender[NextCardIndex];

                            if(CardRenderData.NaniteInstanceIds.Num() > 0)
                            {
                                for(uint32 InstanceID : CardRenderData.NaniteInstanceIds)
                                {
                                    NaniteInstanceDraws.Add(Nanite::FInstanceDraw { InstanceID, (uint32)NaniteViews.Num() });
                                }

                                Nanite::FPackedViewParams Params;
                                Params.ViewMatrices = CardRenderData.ViewMatrices;
                                Params.PrevViewMatrices = CardRenderData.ViewMatrices;
                                Params.ViewRect = CardRenderData.AtlasAllocation;
                                Params.RasterContextSize = DepthStencilAtlasSize;
                                Params.LODScaleFactor = CardRenderData.NaniteLODScaleFactor;
                                NaniteViews.Add(Nanite::CreatePackedView(Params));
                            }

                            NextCardIndex++;
                        }

                        // Rasterize the cards.
                        if (NaniteInstanceDraws.Num() > 0)
                        {
                            RDG_EVENT_SCOPE(GraphBuilder, "Nanite::RasterizeLumenCards");

                            Nanite::FRasterState RasterState;
                            Nanite::CullRasterize(
                                GraphBuilder,
                                *Scene,
                                NaniteViews,
                                CullingContext,
                                RasterContext,
                                RasterState,
                                &NaniteInstanceDraws
                            );
                        }
                    }
                }
                else // Single-view rendering.
                {
                    RDG_EVENT_SCOPE(GraphBuilder, "RenderLumenCardsWithNanite");

                    // Single-view rendering is brute force: walk all cards linearly, build one view per card, and issue one draw for each.
                    for(FCardRenderData& CardRenderData : CardsToRender)
                    {
                        if(CardRenderData.NaniteInstanceIds.Num() > 0)
                        {                        
                            TArray<Nanite::FInstanceDraw, SceneRenderingAllocator> NaniteInstanceDraws;
                            for( uint32 InstanceID : CardRenderData.NaniteInstanceIds )
                            {
                                NaniteInstanceDraws.Add( Nanite::FInstanceDraw { InstanceID, 0u } );
                            }
                        
                            CardRenderData.PatchView(GraphBuilder.RHICmdList, Scene, SharedView);
                            Nanite::FPackedView PackedView = Nanite::CreatePackedViewFromViewInfo(*SharedView, DepthStencilAtlasSize, 0);

                            Nanite::CullRasterize(
                                GraphBuilder,
                                *Scene,
                                { PackedView },
                                CullingContext,
                                RasterContext,
                                Nanite::FRasterState(),
                                &NaniteInstanceDraws
                            );
                        }
                    }
                }

                extern float GLumenDistantSceneMinInstanceBoundsRadius;

                // Render the whole scene for the distant cards.
                for (FCardRenderData& CardRenderData : CardsToRender)
                {
                    // bDistantScene marks whether this is a distant-scene card.
                    if (CardRenderData.bDistantScene)
                    {
                        Nanite::FRasterState RasterState;
                        RasterState.bNearClip = false;

                        CardRenderData.PatchView(GraphBuilder.RHICmdList, Scene, SharedView);
                        Nanite::FPackedView PackedView = Nanite::CreatePackedViewFromViewInfo(
                            *SharedView,
                            DepthStencilAtlasSize,
                            /*Flags*/ 0,
                            /*StreamingPriorityCategory*/ 0,
                            GLumenDistantSceneMinInstanceBoundsRadius,
                            Lumen::GetDistanceSceneNaniteLODScaleFactor());

                        Nanite::CullRasterize(
                            GraphBuilder,
                            *Scene,
                            { PackedView },
                            CullingContext,
                            RasterContext,
                            RasterState);
                    }
                }

                // Lumen mesh capture pass.
                Nanite::DrawLumenMeshCapturePass(
                    GraphBuilder,
                    *Scene,
                    SharedView,
                    CardsToRender,
                    CullingContext,
                    RasterContext,
                    PassUniformParameters,
                    RectMinMaxBufferSRV,
                    NumRects,
                    LumenSceneData.MaxAtlasSize,
                    AlbedoAtlasTexture,
                    NormalAtlasTexture,
                    EmissiveAtlasTexture,
                    DepthStencilAtlasTexture
                );
            }

            ConvertToExternalTexture(GraphBuilder, AlbedoAtlasTexture, LumenSceneData.AlbedoAtlas);
            ConvertToExternalTexture(GraphBuilder, NormalAtlasTexture, LumenSceneData.NormalAtlas);
            ConvertToExternalTexture(GraphBuilder, EmissiveAtlasTexture, LumenSceneData.EmissiveAtlas);
        }

        // Upload the card data.
        {
            QUICK_SCOPE_CYCLE_COUNTER(UploadCardIndexBuffers);

            // Upload the index buffer.
            {
                FRDGBufferRef CardIndexBuffer = GraphBuilder.CreateBuffer(
                    FRDGBufferDesc::CreateUploadDesc(sizeof(uint32), FMath::Max(LumenCardRenderer.CardIdsToRender.Num(), 1)),
                    TEXT("Lumen.CardsToRenderIndexBuffer"));

                FLumenCardIdUpload* PassParameters = GraphBuilder.AllocParameters<FLumenCardIdUpload>();
                PassParameters->CardIds = CardIndexBuffer;

                const uint32 CardIdBytes = LumenCardRenderer.CardIdsToRender.GetTypeSize() * LumenCardRenderer.CardIdsToRender.Num();
                const void* CardIdPtr = LumenCardRenderer.CardIdsToRender.GetData();

                GraphBuilder.AddPass(
                    RDG_EVENT_NAME("Upload CardsToRenderIndexBuffer NumIndices=%d", LumenCardRenderer.CardIdsToRender.Num()),
                    PassParameters,
                    ERDGPassFlags::Copy,
                    [PassParameters, CardIdBytes, CardIdPtr](FRHICommandListImmediate& RHICmdList)
                    {
                        if (CardIdBytes > 0)
                        {
                            void* DestCardIdPtr = RHILockBuffer(PassParameters->CardIds->GetRHI(), 0, CardIdBytes, RLM_WriteOnly);
                            FPlatformMemory::Memcpy(DestCardIdPtr, CardIdPtr, CardIdBytes);
                            RHIUnlockBuffer(PassParameters->CardIds->GetRHI());
                        }
                    });

                ConvertToExternalBuffer(GraphBuilder, CardIndexBuffer, LumenCardRenderer.CardsToRenderIndexBuffer);
            }

                // Upload the hash map buffer.
            {
                const uint32 NumHashMapUInt32 = FLumenCardRenderer::NumCardsToRenderHashMapBucketUInt32;
                const uint32 NumHashMapBytes = 4 * NumHashMapUInt32;
                const uint32 NumHashMapBuckets = 32 * NumHashMapUInt32;

                FRDGBufferRef CardHashMapBuffer = GraphBuilder.CreateBuffer(
                    FRDGBufferDesc::CreateUploadDesc(sizeof(uint32), NumHashMapUInt32),
                    TEXT("Lumen.CardsToRenderHashMapBuffer"));

                LumenCardRenderer.CardsToRenderHashMap.Init(0, NumHashMapBuckets);

                for (int32 CardIndex : LumenCardRenderer.CardIdsToRender)
                {
                    LumenCardRenderer.CardsToRenderHashMap[CardIndex % NumHashMapBuckets] = 1;
                }

                FLumenCardIdUpload* PassParameters = GraphBuilder.AllocParameters<FLumenCardIdUpload>();
                PassParameters->CardIds = CardHashMapBuffer;

                const void* HashMapDataPtr = LumenCardRenderer.CardsToRenderHashMap.GetData();

                GraphBuilder.AddPass(
                    RDG_EVENT_NAME("Upload CardsToRenderHashMapBuffer NumUInt32=%d", NumHashMapUInt32),
                    PassParameters,
                    ERDGPassFlags::Copy,
                    [PassParameters, NumHashMapBytes, HashMapDataPtr](FRHICommandListImmediate& RHICmdList)
                    {
                        if (NumHashMapBytes > 0)
                        {
                            void* DestCardIdPtr = RHILockBuffer(PassParameters->CardIds->GetRHI(), 0, NumHashMapBytes, RLM_WriteOnly);
                            FPlatformMemory::Memcpy(DestCardIdPtr, HashMapDataPtr, NumHashMapBytes);
                            RHIUnlockBuffer(PassParameters->CardIds->GetRHI());
                        }
                    });

                ConvertToExternalBuffer(GraphBuilder, CardHashMapBuffer, LumenCardRenderer.CardsToRenderHashMapBuffer);
            }

                // Upload the visible card index buffer.
            {
                FRDGBufferRef VisibleCardsIndexBuffer = GraphBuilder.CreateBuffer(
                    FRDGBufferDesc::CreateUploadDesc(sizeof(uint32), FMath::Max(LumenSceneData.VisibleCardsIndices.Num(), 1)),
                    TEXT("Lumen.VisibleCardsIndexBuffer"));

                FLumenCardIdUpload* PassParameters = GraphBuilder.AllocParameters<FLumenCardIdUpload>();
                PassParameters->CardIds = VisibleCardsIndexBuffer;

                const uint32 CardIdBytes = sizeof(uint32) * LumenSceneData.VisibleCardsIndices.Num();
                const void* CardIdPtr = LumenSceneData.VisibleCardsIndices.GetData();

                GraphBuilder.AddPass(
                    RDG_EVENT_NAME("Upload VisibleCardIndices NumIndices=%d", LumenSceneData.VisibleCardsIndices.Num()),
                    PassParameters,
                    ERDGPassFlags::Copy,
                    [PassParameters, CardIdBytes, CardIdPtr](FRHICommandListImmediate& RHICmdList)
                    {
                        if (CardIdBytes > 0)
                        {
                            void* DestCardIdPtr = RHILockBuffer(PassParameters->CardIds->GetRHI(), 0, CardIdBytes, RLM_WriteOnly);
                            FPlatformMemory::Memcpy(DestCardIdPtr, CardIdPtr, CardIdBytes);
                            RHIUnlockBuffer(PassParameters->CardIds->GetRHI());
                        }
                    });

                ConvertToExternalBuffer(GraphBuilder, VisibleCardsIndexBuffer, LumenSceneData.VisibleCardsIndexBuffer);
            }
        }

        // Prefilter the Lumen scene depth.
        if (LumenCardRenderer.CardIdsToRender.Num() > 0)
        {
            TRDGUniformBufferRef<FLumenCardScene> LumenCardSceneUniformBuffer;
            {
                FLumenCardScene* LumenCardSceneParameters = GraphBuilder.AllocParameters<FLumenCardScene>();
                SetupLumenCardSceneParameters(GraphBuilder, Scene, *LumenCardSceneParameters);
                LumenCardSceneUniformBuffer = GraphBuilder.CreateUniformBuffer(LumenCardSceneParameters);
            }

            PrefilterLumenSceneDepth(GraphBuilder, LumenCardSceneUniformBuffer, DepthStencilAtlasTexture, LumenCardRenderer.CardIdsToRender, View);
        }
    }

    FLumenSceneData& LumenSceneData = *Scene->LumenSceneData;
    LumenSceneData.CardIndicesToUpdateInBuffer.Reset();
    LumenSceneData.MeshCardsIndicesToUpdateInBuffer.Reset();
    LumenSceneData.DFObjectIndicesToUpdateInBuffer.Reset();
}

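Updating the Lumen scene mainly involves the following steps: culling cards, uploading card IDs, caching the view and scene textures, capturing mesh cards, rasterizing the Lumen scene with cards as views, rendering distant cards, drawing the mesh capture, and uploading card data and visibility data.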

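There are too many steps to cover each one in detail, so this section focuses on the stages involved in capturing mesh cards and rasterizing mesh cards.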
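Before moving on, one detail in the listing above is worth a small illustration: the "hash map" uploaded alongside the card indices is filled by setting one bit per card at `CardIndex % NumHashMapBuckets`, with 32 buckets per `uint32`. The following is a minimal standalone sketch of that idea, not engine code; the type `CardBitFilter` and its methods are hypothetical names introduced only for illustration. Because several card indices can map to the same bucket, such a structure can only answer "possibly in the render list" or "definitely not".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bit-per-bucket membership filter.
// Each uint32 word holds 32 buckets, as in the listing above.
struct CardBitFilter
{
    std::vector<uint32_t> Words;

    explicit CardBitFilter(uint32_t NumWords) : Words(NumWords, 0u)
    {
        assert(NumWords > 0);
    }

    uint32_t NumBuckets() const
    {
        return 32u * static_cast<uint32_t>(Words.size());
    }

    // Mark the bucket that this card index hashes to.
    void Add(uint32_t CardIndex)
    {
        const uint32_t Bucket = CardIndex % NumBuckets();
        Words[Bucket / 32u] |= (1u << (Bucket % 32u));
    }

    // True for every added card; may also be true for a card that was
    // never added if it collides with an added one (false positive).
    bool MayContain(uint32_t CardIndex) const
    {
        const uint32_t Bucket = CardIndex % NumBuckets();
        return ((Words[Bucket / 32u] >> (Bucket % 32u)) & 1u) != 0u;
    }
};
```

A consumer of such a filter would still need the exact index list (uploaded separately in the listing) whenever a false positive is not acceptable.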

6.5.5.2 CardsToRender

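To explain the capture and rasterization stages, we first need to understand how LumenCardRenderer.CardsToRender is populated. The code below works out which cards in the Lumen scene need to be captured and rendered; this is handled by BeginUpdateLumenSceneTasks, which runs during the InitView stage: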

// Engine\Source\Runtime\Renderer\Private\Lumen\LumenSceneRendering.cpp

void FDeferredShadingSceneRenderer::BeginUpdateLumenSceneTasks(FRDGBuilder& GraphBuilder)
{
    LLM_SCOPE_BYTAG(Lumen);

    const FViewInfo& MainView = Views[0];
    const bool bAnyLumenActive = ShouldRenderLumenDiffuseGI(Scene, MainView, true)
        || ShouldRenderLumenReflections(MainView, true);

    if (bAnyLumenActive
        && !ViewFamily.EngineShowFlags.HitProxies)
    {
        SCOPED_NAMED_EVENT(FDeferredShadingSceneRenderer_BeginUpdateLumenSceneTasks, FColor::Emerald);
        QUICK_SCOPE_CYCLE_COUNTER(BeginUpdateLumenSceneTasks);
        const double StartTime = FPlatformTime::Seconds();

        FLumenSceneData& LumenSceneData = *Scene->LumenSceneData;
        // Get the list of cards to render and reset the renderer state.
        TArray<FCardRenderData, SceneRenderingAllocator>& CardsToRender = LumenCardRenderer.CardsToRender;
        LumenCardRenderer.Reset();

        const int32 LocalLumenSceneGeneration = GLumenSceneGeneration;
        const bool bRecaptureLumenSceneOnce = LumenSceneData.Generation != LocalLumenSceneGeneration;
        LumenSceneData.Generation = LocalLumenSceneGeneration;
        const bool bReallocateAtlas = LumenSceneData.MaxAtlasSize != GetDesiredAtlasSize() 
            || (LumenSceneData.RadiosityAtlas && LumenSceneData.RadiosityAtlas->GetDesc().Extent != GetRadiosityAtlasSize(LumenSceneData.MaxAtlasSize))
            || GLumenSceneReset;

        if (GLumenSceneReset != 2)
        {
            GLumenSceneReset = 0;
        }

        LumenSceneData.NumMeshCardsToAddToSurfaceCache = 0;

        // Update dirty cards.
        UpdateDirtyCards(Scene, bReallocateAtlas, bRecaptureLumenSceneOnce);
        // Update the primitive information of the Lumen scene.
        UpdateLumenScenePrimitives(Scene);
        // Update the distant scene.
        UpdateDistantScene(Scene, Views[0]);

        const FVector LumenSceneCameraOrigin = GetLumenSceneViewOrigin(MainView, GetNumLumenVoxelClipmaps() - 1);
        const float MaxCardUpdateDistanceFromCamera = ComputeMaxCardUpdateDistanceFromCamera();

        // Reallocate the card atlas.
        if (bReallocateAtlas)
        {
            LumenSceneData.MaxAtlasSize = GetDesiredAtlasSize();
            // Everything should have been freed before the atlas is recreated.
            ensure(LumenSceneData.NumCardTexels == 0);

            LumenSceneData.AtlasAllocator = FBinnedTextureLayout(LumenSceneData.MaxAtlasSize, GLumenSceneCardAtlasAllocatorBinSize);
        }

        // Per-frame budgets for card captures and card texels; both become unlimited (INT_MAX) when GLumenSceneRecaptureLumenSceneEveryFrame (console variable r.LumenScene.RecaptureEveryFrame) is non-zero.
        const int32 CardCapturesPerFrame = GLumenSceneRecaptureLumenSceneEveryFrame != 0 ? INT_MAX : GetMaxLumenSceneCardCapturesPerFrame();
        const int32 CardTexelsToCapturePerFrame = GLumenSceneRecaptureLumenSceneEveryFrame != 0 ? INT_MAX : GetLumenSceneCardResToCapturePerFrame() * GetLumenSceneCardResToCapturePerFrame();

        if (CardCapturesPerFrame > 0 && CardTexelsToCapturePerFrame > 0)
        {
            QUICK_SCOPE_CYCLE_COUNTER(FillCardsToRender);

            TArray<FLumenSurfaceCacheUpdatePacket, SceneRenderingAllocator> Packets;
            TArray<FMeshCardsAdd, SceneRenderingAllocator> MeshCardsAddsSortedByPriority;

            // Prepare the surface cache update.
            {
                TRACE_CPUPROFILER_EVENT_SCOPE(PrepareSurfaceCacheUpdate);

                const int32 NumPrimitivesPerPacket = FMath::Max(GLumenScenePrimitivesPerPacket, 1);
                const int32 NumPackets = FMath::DivideAndRoundUp(LumenSceneData.LumenPrimitives.Num(), NumPrimitivesPerPacket);

                CardsToRender.Reset(GetMaxLumenSceneCardCapturesPerFrame());
                Packets.Reserve(NumPackets);

                for (int32 PacketIndex = 0; PacketIndex < NumPackets; ++PacketIndex)
                {
                    Packets.Emplace(
                        LumenSceneData.LumenPrimitives,
                        LumenSceneData.MeshCards,
                        LumenSceneData.Cards,
                        LumenSceneCameraOrigin,
                        MaxCardUpdateDistanceFromCamera,
                        PacketIndex * NumPrimitivesPerPacket,
                        NumPrimitivesPerPacket);
                }
            }

            // Run the prepare-surface-cache-update tasks.
            {
                TRACE_CPUPROFILER_EVENT_SCOPE(RunPrepareSurfaceCacheUpdate);
                const bool bExecuteInParallel = FApp::ShouldUseThreadingForPerformance();

                ParallelFor(Packets.Num(),
                    [&Packets](int32 Index)
                    {
                        Packets[Index].AnyThreadTask();
                    },
                    !bExecuteInParallel
                );
            }

            // Gather the results of the tasks above.
            {
                TRACE_CPUPROFILER_EVENT_SCOPE(PacketResults);

                const float CARD_DISTANCE_BUCKET_SIZE = 100.0f;
                uint32 NumMeshCardsAddsPerBucket[MAX_ADD_PRIMITIVE_PRIORITY + 1];

                for (int32 BucketIndex = 0; BucketIndex < UE_ARRAY_COUNT(NumMeshCardsAddsPerBucket); ++BucketIndex)
                {
                    NumMeshCardsAddsPerBucket[BucketIndex] = 0;
                }

                // Count how many cards fall into each bucket
                for (int32 PacketIndex = 0; PacketIndex < Packets.Num(); ++PacketIndex)
                {
                    const FLumenSurfaceCacheUpdatePacket& Packet = Packets[PacketIndex];
                    LumenSceneData.NumMeshCardsToAddToSurfaceCache += Packet.MeshCardsAdds.Num();

                    for (int32 CardIndex = 0; CardIndex < Packet.MeshCardsAdds.Num(); ++CardIndex)
                    {
                        const FMeshCardsAdd& MeshCardsAdd = Packet.MeshCardsAdds[CardIndex];
                        ++NumMeshCardsAddsPerBucket[MeshCardsAdd.Priority];
                    }
                }

                int32 NumMeshCardsInBucketsUpToMaxBucket = 0;
                int32 MaxBucketIndexToAdd = 0;

                // Pick the first N buckets for allocation.
                for (int32 BucketIndex = 0; BucketIndex < UE_ARRAY_COUNT(NumMeshCardsAddsPerBucket); ++BucketIndex)
                {
                    NumMeshCardsInBucketsUpToMaxBucket += NumMeshCardsAddsPerBucket[BucketIndex];
                    MaxBucketIndexToAdd = BucketIndex;

                    if (NumMeshCardsInBucketsUpToMaxBucket > CardCapturesPerFrame)
                    {
                        break;
                    }
                }

                MeshCardsAddsSortedByPriority.Reserve(GetMaxLumenSceneCardCapturesPerFrame());

                // Copy the first N buckets into MeshCardsAddsSortedByPriority.
                for (int32 PacketIndex = 0; PacketIndex < Packets.Num(); ++PacketIndex)
                {
                    const FLumenSurfaceCacheUpdatePacket& Packet = Packets[PacketIndex];

                    for (int32 CardIndex = 0; CardIndex < Packet.MeshCardsAdds.Num() && MeshCardsAddsSortedByPriority.Num() < CardCapturesPerFrame; ++CardIndex)
                    {
                        const FMeshCardsAdd& MeshCardsAdd = Packet.MeshCardsAdds[CardIndex];

                        if (MeshCardsAdd.Priority <= MaxBucketIndexToAdd)
                        {
                            MeshCardsAddsSortedByPriority.Add(MeshCardsAdd);
                        }
                    }
                }

                // Remove all mesh cards that are no longer visible.
                for (int32 PacketIndex = 0; PacketIndex < Packets.Num(); ++PacketIndex)
                {
                    const FLumenSurfaceCacheUpdatePacket& Packet = Packets[PacketIndex];

                    for (int32 MeshCardsToRemoveIndex = 0; MeshCardsToRemoveIndex < Packet.MeshCardsRemoves.Num(); ++MeshCardsToRemoveIndex)
                    {
                        const FMeshCardsRemove& MeshCardsRemove = Packet.MeshCardsRemoves[MeshCardsToRemoveIndex];
                        FLumenPrimitive& LumenPrimitive = LumenSceneData.LumenPrimitives[MeshCardsRemove.LumenPrimitiveIndex];
                        FLumenPrimitiveInstance& LumenPrimitiveInstance = LumenPrimitive.Instances[MeshCardsRemove.LumenInstanceIndex];

                        LumenSceneData.RemoveMeshCards(LumenPrimitive, LumenPrimitiveInstance);
                    }
                }
            }

            // Allocate the distant scene.
            extern int32 GLumenUpdateDistantSceneCaptures;
            if (GLumenUpdateDistantSceneCaptures)
            {
                for (int32 DistantCardIndex : LumenSceneData.DistantCardIndices)
                {
                    FLumenCard& DistantCard = LumenSceneData.Cards[DistantCardIndex];

                    extern int32 GLumenDistantSceneCardResolution;
                    DistantCard.DesiredResolution = FIntPoint(GLumenDistantSceneCardResolution, GLumenDistantSceneCardResolution);

                    if (!DistantCard.bVisible)
                    {
                        LumenSceneData.AddCardToVisibleCardList(DistantCardIndex);
                        DistantCard.bVisible = true;
                    }

                    DistantCard.RemoveFromAtlas(LumenSceneData);
                    LumenSceneData.CardIndicesToUpdateInBuffer.Add(DistantCardIndex);

                    // Add to the CardsToRender list.
                    CardsToRender.Add(FCardRenderData(
                        DistantCard,
                        nullptr,
                        -1,
                        FeatureLevel,
                        DistantCardIndex));
                }
            }

            // Allocate new cards.
            for (int32 SortedCardIndex = 0; SortedCardIndex < MeshCardsAddsSortedByPriority.Num(); ++SortedCardIndex)
            {
                const FMeshCardsAdd& MeshCardsAdd = MeshCardsAddsSortedByPriority[SortedCardIndex];
                FLumenPrimitive& LumenPrimitive = LumenSceneData.LumenPrimitives[MeshCardsAdd.LumenPrimitiveIndex];
                FLumenPrimitiveInstance& LumenPrimitiveInstance = LumenPrimitive.Instances[MeshCardsAdd.LumenInstanceIndex];

                LumenSceneData.AddMeshCards(MeshCardsAdd.LumenPrimitiveIndex, MeshCardsAdd.LumenInstanceIndex);

                if (LumenPrimitiveInstance.MeshCardsIndex >= 0)
                {
                    // Get the mesh cards of the primitive instance.
                    const FLumenMeshCards& MeshCards = LumenSceneData.MeshCards[LumenPrimitiveInstance.MeshCardsIndex];

                    // Iterate over every card of the mesh cards and add the valid ones to CardsToRender.
                    for (uint32 CardIndex = MeshCards.FirstCardIndex; CardIndex < MeshCards.FirstCardIndex + MeshCards.NumCards; ++CardIndex)
                    {
                        FLumenCard& LumenCard = LumenSceneData.Cards[CardIndex];

                        // Compute the card allocation.
                        FCardAllocationOutput CardAllocation;
                        ComputeCardAllocation(LumenCard, LumenSceneCameraOrigin, MaxCardUpdateDistanceFromCamera, CardAllocation);

                        LumenCard.DesiredResolution = CardAllocation.TextureAllocationSize;

                        if (LumenCard.bVisible != CardAllocation.bVisible)
                        {
                            LumenCard.bVisible = CardAllocation.bVisible;
                            if (LumenCard.bVisible)
                            {
                                LumenSceneData.AddCardToVisibleCardList(CardIndex);
                            }
                            else
                            {
                                LumenCard.RemoveFromAtlas(LumenSceneData);
                                LumenSceneData.RemoveCardFromVisibleCardList(CardIndex);
                            }
                            LumenSceneData.CardIndicesToUpdateInBuffer.Add(CardIndex);
                        }

                        // Add to CardsToRender only if the card is visible and its atlas allocation differs from the desired resolution.
                        if (LumenCard.bVisible && LumenCard.AtlasAllocation.Width() != LumenCard.DesiredResolution.X && LumenCard.AtlasAllocation.Height() != LumenCard.DesiredResolution.Y)
                        {
                            LumenCard.RemoveFromAtlas(LumenSceneData);
                            LumenSceneData.CardIndicesToUpdateInBuffer.Add(CardIndex);

                            // Add to the CardsToRender list.
                            CardsToRender.Add(FCardRenderData(
                                LumenCard,
                                LumenPrimitive.Primitive,
                                LumenPrimitive.bMergedInstances ? -1 : MeshCardsAdd.LumenInstanceIndex,
                                FeatureLevel,
                                CardIndex));

                            LumenCardRenderer.NumCardTexelsToCapture += LumenCard.AtlasAllocation.Area();
                        }
                    } // for

                    // Stop once the card count or card texel budget for this frame is exceeded.
                    if (CardsToRender.Num() >= CardCapturesPerFrame
                        || LumenCardRenderer.NumCardTexelsToCapture >= CardTexelsToCapturePerFrame)
                    {
                        break;
                    }
                }
            }
        }

        // Allocate and update the card atlases.
        AllocateOptionalCardAtlases(GraphBuilder, LumenSceneData, MainView, bReallocateAtlas);
        UpdateLumenCardAtlasAllocation(GraphBuilder, MainView, bReallocateAtlas, bRecaptureLumenSceneOnce);

        // Process the cards to render.
        if (CardsToRender.Num() > 0)
        {
            // Set up the mesh pass.
            {
                QUICK_SCOPE_CYCLE_COUNTER(MeshPassSetup);

                // Make sure all mesh rendering data is ready before rendering.
                {
                    QUICK_SCOPE_CYCLE_COUNTER(PrepareStaticMeshData);

                    // Set of unique primitives requiring static mesh update
                    TSet<FPrimitiveSceneInfo*> PrimitivesToUpdateStaticMeshes;

                    for (FCardRenderData& CardRenderData : CardsToRender)
                    {
                        FPrimitiveSceneInfo* PrimitiveSceneInfo = CardRenderData.PrimitiveSceneInfo;

                        if (PrimitiveSceneInfo && PrimitiveSceneInfo->Proxy->AffectsDynamicIndirectLighting())
                        {
                            if (PrimitiveSceneInfo->NeedsUniformBufferUpdate())
                            {
                                PrimitiveSceneInfo->UpdateUniformBuffer(GraphBuilder.RHICmdList);
                            }

                            if (PrimitiveSceneInfo->NeedsUpdateStaticMeshes())
                            {
                                PrimitivesToUpdateStaticMeshes.Add(PrimitiveSceneInfo);
                            }
                        }
                    }

                    if (PrimitivesToUpdateStaticMeshes.Num() > 0)
                    {
                        TArray<FPrimitiveSceneInfo*> UpdatedSceneInfos;
                        UpdatedSceneInfos.Reserve(PrimitivesToUpdateStaticMeshes.Num());
                        for (FPrimitiveSceneInfo* PrimitiveSceneInfo : PrimitivesToUpdateStaticMeshes)
                        {
                            UpdatedSceneInfos.Add(PrimitiveSceneInfo);
                        }

                        FPrimitiveSceneInfo::UpdateStaticMeshes(GraphBuilder.RHICmdList, Scene, UpdatedSceneInfos, true);
                    }
                }

                // Add the card capture draws.
                for (FCardRenderData& CardRenderData : CardsToRender)
                {
                    CardRenderData.StartMeshDrawCommandIndex = LumenCardRenderer.MeshDrawCommands.Num();
                    CardRenderData.NumMeshDrawCommands = 0;
                    int32 NumNanitePrimitives = 0;

                    const FLumenCard& Card = LumenSceneData.Cards[CardRenderData.CardIndex];
                    checkSlow(Card.bVisible && Card.bAllocated);

                    // Create or process the FVisibleMeshDrawCommand entries for this card.
                    AddCardCaptureDraws(Scene, 
                        GraphBuilder.RHICmdList, 
                        CardRenderData, 
                        LumenCardRenderer.MeshDrawCommands, 
                        LumenCardRenderer.MeshDrawPrimitiveIds);

                    CardRenderData.NumMeshDrawCommands = LumenCardRenderer.MeshDrawCommands.Num() - CardRenderData.StartMeshDrawCommandIndex;
                }
            }

            (.....)
        }
    }
}

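As the code shows, mesh cards are not recaptured every frame. A card is added to the render list only when it is visible and its atlas allocation does not match the desired resolution. The number of cards and card texels captured per frame is also capped, so that a single frame never has to update and draw so many cards that it becomes a performance bottleneck. The caps are lifted (set to INT_MAX) only when GLumenSceneRecaptureLumenSceneEveryFrame (console variable r.LumenScene.RecaptureEveryFrame) is enabled.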
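The budgeting in BeginUpdateLumenSceneTasks can be summarized as three passes: count the pending requests per priority bucket, find the last bucket needed to fill the per-frame budget, then copy requests from those buckets until the cap is reached. The sketch below reproduces only that control flow in standalone C++. `CardRequest` and `SelectCardsForFrame` are hypothetical names, and priorities are assumed to be small non-negative integers with 0 as the most urgent bucket.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified request: a priority bucket and an identifier.
struct CardRequest
{
    int Priority; // 0 is assumed to be the most urgent bucket
    int Id;
};

// Returns at most MaxCaptures requests taken from the lowest buckets.
std::vector<CardRequest> SelectCardsForFrame(
    const std::vector<CardRequest>& Requests, int NumBuckets, int MaxCaptures)
{
    assert(NumBuckets > 0 && MaxCaptures >= 0);

    // Pass 1: count how many requests fall into each bucket.
    std::vector<int> CountPerBucket(NumBuckets, 0);
    for (const CardRequest& Request : Requests)
    {
        assert(Request.Priority >= 0 && Request.Priority < NumBuckets);
        ++CountPerBucket[Request.Priority];
    }

    // Pass 2: find the last bucket needed to exceed the budget.
    int Accumulated = 0;
    int MaxBucketToAdd = 0;
    for (int Bucket = 0; Bucket < NumBuckets; ++Bucket)
    {
        Accumulated += CountPerBucket[Bucket];
        MaxBucketToAdd = Bucket;
        if (Accumulated > MaxCaptures)
        {
            break;
        }
    }

    // Pass 3: copy requests from the accepted buckets, in input order,
    // until the per-frame cap is reached.
    std::vector<CardRequest> Selected;
    for (const CardRequest& Request : Requests)
    {
        if (static_cast<int>(Selected.size()) >= MaxCaptures)
        {
            break;
        }
        if (Request.Priority <= MaxBucketToAdd)
        {
            Selected.push_back(Request);
        }
    }
    return Selected;
}
```

Note that this is a filter rather than a full sort: inside the accepted buckets the input order is kept, so a less urgent request that appears early can be picked ahead of a more urgent one that appears late. The trade-off is that no per-frame sort of all pending requests is needed.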

6.5.5.3 MeshCardCapture

Now that we have seen how mesh cards are added to the render list, we can look at how the cards are actually captured:

// Capture mesh cards.
{
    FLumenCardPassParameters* PassParameters = GraphBuilder.AllocParameters<FLumenCardPassParameters>();
    // Card view information.
    PassParameters->View = Scene->UniformBuffers.LumenCardCaptureViewUniformBuffer;
    PassParameters->CardPass = GraphBuilder.CreateUniformBuffer(PassUniformParameters);
    // Three atlas render targets: albedo, normal, emissive.
    PassParameters->RenderTargets[0] = FRenderTargetBinding(AlbedoAtlasTexture, ERenderTargetLoadAction::ELoad);
    PassParameters->RenderTargets[1] = FRenderTargetBinding(NormalAtlasTexture, ERenderTargetLoadAction::ELoad);
    PassParameters->RenderTargets[2] = FRenderTargetBinding(EmissiveAtlasTexture, ERenderTargetLoadAction::ELoad);
    // Depth-stencil target.
    PassParameters->RenderTargets.DepthStencil = FDepthStencilBinding(DepthStencilAtlasTexture, ERenderTargetLoadAction::ELoad, FExclusiveDepthStencil::DepthWrite_StencilNop);

    InstanceCullingResult.GetDrawParameters(PassParameters->InstanceCullingDrawParams);

    // Mesh card capture pass.
    GraphBuilder.AddPass(
        RDG_EVENT_NAME("MeshCardCapture"),
        PassParameters,
        ERDGPassFlags::Raster,
        [this, Scene = Scene, PrimitiveIdVertexBuffer, SharedView, &CardsToRender, PassParameters](FRHICommandList& RHICmdList)
        {
            QUICK_SCOPE_CYCLE_COUNTER(MeshPass);

            // Prepare the data for every card to render and submit its draw commands.
            for (FCardRenderData& CardRenderData : CardsToRender)
            {
                if (CardRenderData.NumMeshDrawCommands > 0)
                {
                    FIntRect AtlasRect = CardRenderData.AtlasAllocation;
                    // Set the viewport to the card's rectangle in the atlas.
                    RHICmdList.SetViewport(AtlasRect.Min.X, AtlasRect.Min.Y, 0.0f, AtlasRect.Max.X, AtlasRect.Max.Y, 1.0f);

                    // Patch the view data for this card.
                    CardRenderData.PatchView(RHICmdList, Scene, SharedView);
                    Scene->UniformBuffers.LumenCardCaptureViewUniformBuffer.UpdateUniformBufferImmediate(*SharedView->CachedViewUniformShaderParameters);

                    FGraphicsMinimalPipelineStateSet GraphicsMinimalPipelineStateSet;
                #if GPUCULL_TODO
                    if (Scene->GPUScene.IsEnabled())
                    {
                        FRHIBuffer* DrawIndirectArgsBuffer = nullptr;
                        FRHIBuffer* InstanceIdOffsetBuffer = nullptr;
                        FInstanceCullingDrawParams& InstanceCullingDrawParams = PassParameters->InstanceCullingDrawParams;
                        if (InstanceCullingDrawParams.DrawIndirectArgsBuffer != nullptr && InstanceCullingDrawParams.InstanceIdOffsetBuffer != nullptr)
                        {
                            DrawIndirectArgsBuffer = InstanceCullingDrawParams.DrawIndirectArgsBuffer->GetRHI();
                            InstanceIdOffsetBuffer = InstanceCullingDrawParams.InstanceIdOffsetBuffer->GetRHI();
                        }

                        // With GPU culling, submit through the GPU-instanced path.
                        SubmitGPUInstancedMeshDrawCommandsRange(
                            LumenCardRenderer.MeshDrawCommands,
                            GraphicsMinimalPipelineStateSet,
                            CardRenderData.StartMeshDrawCommandIndex,
                            CardRenderData.NumMeshDrawCommands,
                            1,
                            InstanceIdOffsetBuffer,
                            DrawIndirectArgsBuffer,
                            RHICmdList);
                    }
                #endif // GPUCULL_TODO
                    (......)
                }
            }
        }
    );
}

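During the card draw stage, each mesh card captures a low-resolution projection of the mesh surface attributes from its own direction, and the projected attributes are stored in texture atlases. Unlike the traditional rendering pipeline, only three attributes of the Nanite meshes inside the card's view are rasterized here: base color, normal, and emissive (see the figure below).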

Mesh attribute atlases projected onto mesh cards during the card capture stage. Top: base color atlas; bottom: normal atlas.

Below are the VS and PS used to capture mesh cards:

// Engine\Shaders\Private\Lumen\LumenCardVertexShader.usf

struct FLumenCardInterpolantsVSToPS
{
};

struct FLumenCardVSToPS
{
    FVertexFactoryInterpolantsVSToPS FactoryInterpolants;
    FLumenCardInterpolantsVSToPS PassInterpolants;
    float4 Position : SV_POSITION;
};

// Main entry point of the mesh card VS.
void Main(
    FVertexFactoryInput Input,
    OPTIONAL_VertexID
    out FLumenCardVSToPS Output
    )
{    
    uint EyeIndex = 0;
    ResolvedView = ResolveView();

    FVertexFactoryIntermediates VFIntermediates = GetVertexFactoryIntermediates(Input);
    float4 WorldPositionExcludingWPO = VertexFactoryGetWorldPosition(Input, VFIntermediates);
    float4 WorldPosition = WorldPositionExcludingWPO;
    float4 ClipSpacePosition;

    float3x3 TangentToLocal = VertexFactoryGetTangentToLocal(Input, VFIntermediates);    
    FMaterialVertexParameters VertexParameters = GetMaterialVertexParameters(Input, VFIntermediates, WorldPosition.xyz, TangentToLocal);

    ISOLATE
    {
        // Apply the material world position offset.
        WorldPosition.xyz += GetMaterialWorldPositionOffset(VertexParameters);
        // World position used for rasterization.
        float4 RasterizedWorldPosition = VertexFactoryGetRasterizedWorldPosition(Input, VFIntermediates, WorldPosition);
        // Transform the position to clip space.
        ClipSpacePosition = INVARIANT(mul(RasterizedWorldPosition, ResolvedView.TranslatedWorldToClip));
        Output.Position = INVARIANT(ClipSpacePosition);
    }

    bool bClampToNearPlane = false;// GetPrimitiveData(Input.PrimitiveId).ObjectWorldPositionAndRadius.w < .5f * max();

    if (bClampToNearPlane && Output.Position.z < 0)
    {
        Output.Position.z = 0.01f;
        Output.Position.w = 1.0f;
    }

    Output.FactoryInterpolants = VertexFactoryGetInterpolantsVSToPS(Input, VFIntermediates, VertexParameters);
}


// Engine\Shaders\Private\Lumen\LumenCardPixelShader.usf

struct FLumenCardInterpolantsVSToPS
{
};

// Main entry point of the mesh card PS.
void Main(
    FVertexFactoryInterpolantsVSToPS Interpolants,
    FLumenCardInterpolantsVSToPS PassInterpolants,
    in INPUT_POSITION_QUALIFIERS float4 SvPosition : SV_Position        // after all interpolators
    OPTIONAL_IsFrontFace,
    out float4 OutTarget0 : SV_Target0,
    out float4 OutTarget1 : SV_Target1,
    out float4 OutTarget2 : SV_Target2)
{
    ResolvedView = ResolveView();

    // Get the basic material pixel parameters.
    FMaterialPixelParameters MaterialParameters = GetMaterialPixelParameters(Interpolants, SvPosition);
    FPixelMaterialInputs PixelMaterialInputs;
    
    // Compute the extended material parameters.
    {
        float4 ScreenPosition = SvPositionToResolvedScreenPosition(SvPosition);
        float3 TranslatedWorldPosition = SvPositionToResolvedTranslatedWorld(SvPosition);
        CalcMaterialParametersEx(MaterialParameters, PixelMaterialInputs, SvPosition, ScreenPosition, bIsFrontFace, TranslatedWorldPosition, TranslatedWorldPosition);
    }

    // Apply material coverage and clipping.
    GetMaterialCoverageAndClipping(MaterialParameters, PixelMaterialInputs);

    float3 BaseColor = GetMaterialBaseColor(PixelMaterialInputs);
    float  Metallic = GetMaterialMetallic(PixelMaterialInputs);
    float  Specular = GetMaterialSpecular(PixelMaterialInputs);

    float Roughness = GetMaterialRoughness(PixelMaterialInputs);
    float Opacity = GetMaterialOpacity(PixelMaterialInputs);

    float3 DiffuseColor = BaseColor - BaseColor * Metallic;
    float3 SpecularColor = lerp(0.08 * Specular.xxx, BaseColor, Metallic.xxx);

    // Apply the fully rough environment BRDF approximation (adjusts DiffuseColor and SpecularColor in place).
    EnvBRDFApproxFullyRough(DiffuseColor, SpecularColor);

    // Store albedo (sqrt of the diffuse color, with opacity in alpha), normal, and emissive.
    //@todo DynamicGI better encoding for low precision, hemispherical normal encoding
    OutTarget0 = float4(sqrt(DiffuseColor), Opacity);
    OutTarget1 = float4(MaterialParameters.WorldNormal * .5f + .5f, 0);
    OutTarget2 = float4(GetMaterialEmissive(PixelMaterialInputs), 0);
}

Here the VS input is a box in local space and the VS output is a box in clip space:

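After the PS has run, the data is stored at the corresponding locations in the three render-target atlases for base color, normal, and emissive. It is worth pointing out that the VS and PS logic here is far simpler than that of the traditional BasePass, which is one of the important optimizations that lets Lumen run in real time.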
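The last three lines of the pixel shader define how values are packed into the atlases: albedo is written as the square root of the diffuse color, and the world normal is remapped from [-1, 1] to [0, 1]. The C++ sketch below mirrors that encoding on the CPU for illustration. The decode functions are an assumption (simply the inverse of the encode shown in the shader, not quoted from the engine), the `Float3` type is hypothetical, and the EnvBRDFApproxFullyRough step is left out.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type for the sketch.
struct Float3
{
    float X, Y, Z;
};

// Diffuse color as in the shader: BaseColor - BaseColor * Metallic.
inline Float3 ComputeDiffuseColor(Float3 BaseColor, float Metallic)
{
    return { BaseColor.X - BaseColor.X * Metallic,
             BaseColor.Y - BaseColor.Y * Metallic,
             BaseColor.Z - BaseColor.Z * Metallic };
}

// Albedo is stored as sqrt(DiffuseColor), which spends more of a
// low-precision target's range on dark values.
inline Float3 EncodeAlbedo(Float3 Diffuse)
{
    assert(Diffuse.X >= 0.0f && Diffuse.Y >= 0.0f && Diffuse.Z >= 0.0f);
    return { std::sqrt(Diffuse.X), std::sqrt(Diffuse.Y), std::sqrt(Diffuse.Z) };
}

// Assumed inverse of EncodeAlbedo: square each channel.
inline Float3 DecodeAlbedo(Float3 Encoded)
{
    return { Encoded.X * Encoded.X, Encoded.Y * Encoded.Y, Encoded.Z * Encoded.Z };
}

// The normal is remapped from [-1, 1] to [0, 1]: N * 0.5 + 0.5.
inline Float3 EncodeNormal(Float3 Normal)
{
    return { Normal.X * 0.5f + 0.5f, Normal.Y * 0.5f + 0.5f, Normal.Z * 0.5f + 0.5f };
}

// Assumed inverse of EncodeNormal: E * 2 - 1.
inline Float3 DecodeNormal(Float3 Encoded)
{
    return { Encoded.X * 2.0f - 1.0f, Encoded.Y * 2.0f - 1.0f, Encoded.Z * 2.0f - 1.0f };
}
```

The shader's own todo comment notes that a better encoding for low precision (for example a hemispherical normal encoding) is still open, so this scheme should be read as the simple baseline it is.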

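In addition, deciding where a new card lands in the atlas is an instance of the bin packing problem; at render time it is enough to set the origin and the width and height on the viewport. The corresponding type is FBinnedTextureLayout, and related types include FTextureLayout and FTextureLayout3d. For example, in one captured frame the card's viewport is at (0, 0) with a size of (64, 64), meaning the card is rendered into the first 64x64 region of the atlas.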
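To make the packing idea concrete, here is a deliberately simple shelf packer. It is not how FBinnedTextureLayout works (that type is binned and also supports freeing); it is only a minimal sketch of handing out non-overlapping rectangles whose min and max corners can be passed to a viewport call. `AtlasRect` and `ShelfAtlasAllocator` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rectangle in atlas texel coordinates (max is exclusive).
struct AtlasRect
{
    int MinX, MinY, MaxX, MaxY;
};

// Minimal shelf packer: fills rows left to right and opens a new row
// (shelf) below the tallest rectangle of the current row when needed.
class ShelfAtlasAllocator
{
public:
    ShelfAtlasAllocator(int InWidth, int InHeight)
        : Width(InWidth), Height(InHeight)
    {
        assert(Width > 0 && Height > 0);
    }

    // Returns true and fills OutRect on success; on failure the
    // allocator state is left unchanged.
    bool Allocate(int W, int H, AtlasRect& OutRect)
    {
        assert(W > 0 && H > 0);

        int X = CursorX;
        int Y = CursorY;
        int Shelf = ShelfHeight;

        // Not enough room on this shelf: move to the start of the next one.
        if (X + W > Width)
        {
            X = 0;
            Y += Shelf;
            Shelf = 0;
        }

        if (W > Width || Y + H > Height)
        {
            return false;
        }

        OutRect = { X, Y, X + W, Y + H };
        CursorX = X + W;
        CursorY = Y;
        ShelfHeight = std::max(Shelf, H);
        return true;
    }

private:
    int Width;
    int Height;
    int CursorX = 0;
    int CursorY = 0;
    int ShelfHeight = 0;
};
```

In a 128x128 atlas the first 64x64 request receives the rectangle from (0, 0) to (64, 64), which corresponds to the viewport described above; the capture pass then passes the rectangle's min and max corners to RHICmdList.SetViewport. A shelf packer wastes space when card heights vary a lot, which is one reason a binned layout is attractive for many cards of a few fixed sizes.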

Incidentally, the draw commands for mesh cards are built in FLumenCardMeshProcessor:

// Engine\Source\Runtime\Renderer\Private\Lumen\LumenSceneRendering.cpp

void FLumenCardMeshProcessor::AddMeshBatch(const FMeshBatch& RESTRICT MeshBatch, uint64 BatchElementMask, const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy, int32 StaticMeshId)
{
    LLM_SCOPE_BYTAG(Lumen);

    if (MeshBatch.bUseForMaterial && DoesPlatformSupportLumenGI(GetFeatureLevelShaderPlatform(FeatureLevel)))
    {
        // Resolve the material.
        const FMaterialRenderProxy* FallbackMaterialRenderProxyPtr = nullptr;
        const FMaterial& Material = MeshBatch.MaterialRenderProxy->GetMaterialWithFallback(FeatureLevel, FallbackMaterialRenderProxyPtr);

        const FMaterialRenderProxy& MaterialRenderProxy = FallbackMaterialRenderProxyPtr ? *FallbackMaterialRenderProxyPtr : *MeshBatch.MaterialRenderProxy;

        // Set up the render state.
        const EBlendMode BlendMode = Material.GetBlendMode();
        const FMaterialShadingModelField ShadingModels = Material.GetShadingModels();
        const bool bIsTranslucent = IsTranslucentBlendMode(BlendMode);
        const FMeshDrawingPolicyOverrideSettings OverrideSettings = ComputeMeshOverrideSettings(MeshBatch);
        const ERasterizerFillMode MeshFillMode = ComputeMeshFillMode(MeshBatch, Material, OverrideSettings);
        const ERasterizerCullMode MeshCullMode = ComputeMeshCullMode(MeshBatch, Material, OverrideSettings);

        if (!bIsTranslucent
            && (PrimitiveSceneProxy && PrimitiveSceneProxy->ShouldRenderInMainPass() && PrimitiveSceneProxy->AffectsDynamicIndirectLighting())
            && ShouldIncludeDomainInMeshPass(Material.GetMaterialDomain()))
        {
            // Select the shaders (VS and PS).
            const FVertexFactory* VertexFactory = MeshBatch.VertexFactory;
            FVertexFactoryType* VertexFactoryType = VertexFactory->GetType();

            TMeshProcessorShaders<FLumenCardVS, FLumenCardPS> PassShaders;

            PassShaders.VertexShader = Material.GetShader<FLumenCardVS>(VertexFactoryType);
            PassShaders.PixelShader = Material.GetShader<FLumenCardPS>(VertexFactoryType);

            FMeshMaterialShaderElementData ShaderElementData;
            ShaderElementData.InitializeMeshMaterialData(ViewIfDynamicMeshCommand, PrimitiveSceneProxy, MeshBatch, StaticMeshId, false);

            const FMeshDrawCommandSortKey SortKey = CalculateMeshStaticSortKey(PassShaders.VertexShader, PassShaders.PixelShader);

            // Build the mesh draw commands.
            BuildMeshDrawCommands(
                MeshBatch,
                BatchElementMask,
                PrimitiveSceneProxy,
                MaterialRenderProxy,
                Material,
                PassDrawRenderState,
                PassShaders,
                MeshFillMode,
                MeshCullMode,
                SortKey,
                EMeshPassFeatures::Default,
                ShaderElementData);
        }
    }
}

6.5.5.4 RasterizeLumenCards

The logic for rasterizing Lumen cards is as follows:

if (UseNanite(ShaderPlatform) && ViewFamily.EngineShowFlags.NaniteMeshes && bAnyNaniteMeshes)
{
    (......)

    Nanite::FRasterContext RasterContext = Nanite::InitRasterContext(...);

    (......)

    Nanite::FCullingContext CullingContext = Nanite::InitCullingContext(...);

    if (GLumenSceneNaniteMultiViewCapture) // Multi-view capture mode.
    {
        const uint32 NumCardsToRender = CardsToRender.Num();

        // Split the views into batches so that no batch exceeds the per-pass maximum.
        uint32 NextCardIndex = 0;
        while(NextCardIndex < NumCardsToRender)
        {
            TArray<Nanite::FPackedView, SceneRenderingAllocator> NaniteViews;
            TArray<Nanite::FInstanceDraw, SceneRenderingAllocator> NaniteInstanceDraws;

            while(NextCardIndex < NumCardsToRender && NaniteViews.Num() < MAX_VIEWS_PER_CULL_RASTERIZE_PASS)
            {
                const FCardRenderData& CardRenderData = CardsToRender[NextCardIndex];

                if(CardRenderData.NaniteInstanceIds.Num() > 0)
                {
                    for(uint32 InstanceID : CardRenderData.NaniteInstanceIds)
                    {
                        NaniteInstanceDraws.Add(Nanite::FInstanceDraw { InstanceID, (uint32)NaniteViews.Num() });
                    }

                    Nanite::FPackedViewParams Params;
                    Params.ViewMatrices = CardRenderData.ViewMatrices;
                    Params.PrevViewMatrices = CardRenderData.ViewMatrices;
                    Params.ViewRect = CardRenderData.AtlasAllocation;
                    Params.RasterContextSize = DepthStencilAtlasSize;
                    Params.LODScaleFactor = CardRenderData.NaniteLODScaleFactor;
                    NaniteViews.Add(Nanite::CreatePackedView(Params));
                }

                NextCardIndex++;
            }

            // Instanced draw.
            if (NaniteInstanceDraws.Num() > 0)
            {
                RDG_EVENT_SCOPE(GraphBuilder, "Nanite::RasterizeLumenCards");

                Nanite::FRasterState RasterState;
                Nanite::CullRasterize(
                    GraphBuilder,
                    *Scene,
                    NaniteViews,
                    CullingContext,
                    RasterContext,
                    RasterState,
                    &NaniteInstanceDraws
                );
            }
        }
    }
    else // Single-view mode.
    {
        (......)
    }
    
    extern float GLumenDistantSceneMinInstanceBoundsRadius;

    // Render the distant-scene cards.
    for (FCardRenderData& CardRenderData : CardsToRender)
    {
        if (CardRenderData.bDistantScene)
        {
            (......)
        }
    }

    // Draw the Lumen mesh capture pass.
    Nanite::DrawLumenMeshCapturePass(
        GraphBuilder,
        *Scene,
        SharedView,
        CardsToRender,
        CullingContext,
        RasterContext,
        PassUniformParameters,
        RectMinMaxBufferSRV,
        NumRects,
        LumenSceneData.MaxAtlasSize,
        AlbedoAtlasTexture,
        NormalAtlasTexture,
        EmissiveAtlasTexture,
        DepthStencilAtlasTexture
    );
}
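
The inner `while` loop in the multi-view branch above splits the cards into batches so that no single cull/rasterize pass receives more views than it can handle. The following is a minimal standalone sketch of that batching idea only. `CardSketch`, `InstanceDrawSketch`, `BatchSketch`, and `SplitCardsIntoBatches` are hypothetical stand-ins for illustration, not engine types, and `maxViewsPerPass` plays the role of `MAX_VIEWS_PER_CULL_RASTERIZE_PASS`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for FCardRenderData and Nanite::FInstanceDraw.
struct CardSketch { std::vector<std::uint32_t> instanceIds; };
struct InstanceDrawSketch { std::uint32_t instanceId; std::uint32_t viewIndex; };

struct BatchSketch {
    std::size_t numViews = 0;               // one view per card that has Nanite instances
    std::vector<InstanceDrawSketch> draws;  // each draw references a view inside this batch
};

// Walk the cards once, closing a batch whenever it holds maxViewsPerPass views.
std::vector<BatchSketch> SplitCardsIntoBatches(const std::vector<CardSketch>& cards,
                                               std::size_t maxViewsPerPass)
{
    assert(maxViewsPerPass > 0);
    std::vector<BatchSketch> batches;
    std::size_t next = 0;
    while (next < cards.size()) {
        BatchSketch batch;
        while (next < cards.size() && batch.numViews < maxViewsPerPass) {
            const CardSketch& card = cards[next];
            if (!card.instanceIds.empty()) {
                // View indices are local to the batch, so they restart at 0 each time.
                for (std::uint32_t id : card.instanceIds) {
                    batch.draws.push_back({id, static_cast<std::uint32_t>(batch.numViews)});
                }
                ++batch.numViews;
            }
            ++next;
        }
        if (!batch.draws.empty()) {
            batches.push_back(batch);  // mirrors the "NaniteInstanceDraws.Num() > 0" check
        }
    }
    return batches;
}
```

Cards without Nanite instances consume no view slot, which is why the view index is taken from the batch's current view count rather than from the card index.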

The card rasterization stage is essentially the same as the Nanite pipeline:

The rasterization output is also the same, containing visibility, the depth-stencil buffer, triangle IDs, and related data:

The next step draws the mesh cards, and this stage is also essentially the same as in Nanite:

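The GBuffer output is still the three atlases mentioned above (base color, normal, and emissive), but the new data is written into their unoccupied regions.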

6.5.6 Lumen Scene Lighting

6.5.6.1 Voxel Cone Tracing

Later subsections rely heavily on Voxel Cone Tracing, so this subsection first covers the necessary background. It draws on two references: "Interactive Indirect Illumination Using Voxel Cone Tracing" and "Voxel Cone Tracing and Sparse Voxel Octree for Real-time Global Illumination".

The first step of running Voxel Cone Tracing on a scene is to build a Sparse Voxel Octree of the scene geometry. For this, UE5 uses mesh distance fields with sparse HLOD.

The figure below shows the Sponza scene after voxelization:

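Rendering engines (such as UE) generally use a hybrid pipeline: direct lighting (primary rays) is obtained through conventional rasterization, while secondary lighting uses cone tracing: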

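Before voxel cone tracing, the geometry is prefiltered and then traced as if it were a participating medium (volume ray casting can be used). The voxels represent scene geometry as an opacity field plus incident radiance, so quadrilinearly interpolated samples can approximate the footprint covered by a cone ray: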
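
Treating the prefiltered voxels as a participating medium means the samples along a cone can be composited front to back, with each sample attenuated by the opacity already accumulated in front of it. The sketch below illustrates that accumulation under simplifying assumptions: `VoxelSample`, `ConeResult`, and `AccumulateFrontToBack` are hypothetical names, the radiance is not premultiplied by opacity, and the samples are assumed to be already fetched in order from near to far.

```cpp
#include <cassert>
#include <vector>

// One prefiltered sample along the cone: radiance plus opacity in [0, 1].
struct VoxelSample { float r, g, b, opacity; };

// Accumulated radiance and the fraction of light that still passes through.
struct ConeResult { float r, g, b, transparency; };

ConeResult AccumulateFrontToBack(const std::vector<VoxelSample>& samples)
{
    ConeResult result{0.0f, 0.0f, 0.0f, 1.0f};
    for (const VoxelSample& s : samples) {
        assert(s.opacity >= 0.0f && s.opacity <= 1.0f);
        // A sample contributes only through what is not yet occluded.
        const float weight = result.transparency * s.opacity;
        result.r += weight * s.r;
        result.g += weight * s.g;
        result.b += weight * s.b;
        result.transparency *= (1.0f - s.opacity);
        if (result.transparency <= 0.001f) {
            break;  // effectively opaque: farther samples cannot contribute
        }
    }
    return result;
}
```

The early exit is what makes cones cheap in dense scenes: once the accumulated opacity saturates, the march stops.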

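Tracing a single cone ray in the steps above requires a MIP map. The MIP map is generated with Gaussian weights: the weight is largest at the voxel center and falls off with distance from it: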

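MIP levels generated with Gaussian weights get blurrier as the level increases, which matches the shape of a cone: the farther a cone ray travels from its origin, the larger the area it covers and the blurrier the lighting it receives. Given this, the MIP level can be chosen from the distance between the sample point on the cone ray and its origin, and that level is sampled quadrilinearly to quickly obtain the radiance at that point: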

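Voxel rendering can be split into three passes: the first is the light pass, which bakes irradiance (reflective shadow map, RSM); the second is the prefiltering pass, which downsamples radiance through the sparse octree; the third is the camera pass, which gathers irradiance for every visible fragment (pixel). (See the figure below.)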

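Likewise, voxel tracing can be used for specular reflection, AO, and soft shadows. Specular reflection is traced in a similar way, except that fewer and narrower specular cones are generated: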

In fact, cone tracing can build a different number and size of cones depending on surface roughness:

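Left: a high-roughness surface, i.e. diffuse reflection, requires tracing several cones. Middle: a fairly rough specular reflection needs only one cone with a wide angle. Right: a low-roughness specular reflection needs only one cone with a narrow angle.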
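
The relationship between cone angle, travel distance, and MIP level can be made concrete with a little geometry. The sketch below assumes the common formulation in which the cone footprint diameter at distance d is 2·d·tan(halfAngle) and the MIP level is the base-2 logarithm of that diameter measured in finest-level voxels. `ConeDiameter` and `MipLevelForCone` are hypothetical helper names, not functions from the paper or the engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Footprint diameter of a cone at a given distance from its apex.
float ConeDiameter(float distance, float halfAngleRadians)
{
    return 2.0f * distance * std::tan(halfAngleRadians);
}

// Fractional MIP level whose voxel size matches the cone footprint.
// Level 0 is the finest level, with voxels of size voxelSize.
float MipLevelForCone(float distance, float halfAngleRadians, float voxelSize)
{
    assert(voxelSize > 0.0f);
    // Clamp so footprints smaller than one voxel still sample level 0.
    const float diameter = std::max(ConeDiameter(distance, halfAngleRadians), voxelSize);
    return std::log2(diameter / voxelSize);
}
```

A wide diffuse cone reaches coarse, blurry levels after a short distance, while a narrow specular cone stays on fine levels much longer, which matches the three cases in the caption above.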

AO uses a combination of multi-sample cone tracing at close range, far-field AO, and offline occlusion:

Soft shadows can be sampled with one cone per pixel, and the softer the shadow, the cheaper it is to compute:

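The paper also describes a technique for voxelizing in a single pass, as well as the technique and process for building the sparse octree with compute shaders: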

6.5.6.2 RenderLumenSceneLighting

Lumen scene lighting is handled by RenderLumenSceneLighting, whose code is as follows:

// Engine\Source\Runtime\Renderer\Private\Lumen\LumenSceneLighting.cpp

void FDeferredShadingSceneRenderer::RenderLumenSceneLighting(
    FRDGBuilder& GraphBuilder,
    FViewInfo& View)
{
    FLumenSceneData& LumenSceneData = *Scene->LumenSceneData;
    // Check whether Lumen is enabled: it is enough for either the diffuse indirect method or the reflections method to be Lumen.
    const bool bAnyLumenEnabled = GetViewPipelineState(Views[0]).DiffuseIndirectMethod == EDiffuseIndirectMethod::Lumen 
        || GetViewPipelineState(Views[0]).ReflectionsMethod == EReflectionsMethod::Lumen;

    if (bAnyLumenEnabled)
    {
        RDG_EVENT_SCOPE(GraphBuilder, "LumenSceneLighting");

        FGlobalShaderMap* GlobalShaderMap = View.ShaderMap;
        FLumenCardTracingInputs TracingInputs(GraphBuilder, Scene, Views[0]);

        if (LumenSceneData.VisibleCardsIndices.Num() > 0)
        {
            FRDGTextureRef RadiosityAtlas = GraphBuilder.RegisterExternalTexture(LumenSceneData.RadiosityAtlas, TEXT("Lumen.RadiosityAtlas"));

            // Render radiosity.
            RenderRadiosityForLumenScene(GraphBuilder, TracingInputs, GlobalShaderMap, RadiosityAtlas);

            ConvertToExternalTexture(GraphBuilder, RadiosityAtlas, LumenSceneData.RadiosityAtlas);

            FLumenCardScatterContext DirectLightingCardScatterContext;
            extern float GLumenSceneCardDirectLightingUpdateFrequencyScale;

            // Build the indirect args and write out the card quads whose direct lighting will be updated this frame.
            DirectLightingCardScatterContext.Init(
                GraphBuilder,
                View,
                LumenSceneData,
                LumenCardRenderer,
                ECullCardsMode::OperateOnSceneForceUpdateForCardsToRender,
                1);

            // Cull the cards to the given shape.
            DirectLightingCardScatterContext.CullCardsToShape(
                GraphBuilder,
                View,
                LumenSceneData,
                LumenCardRenderer,
                TracingInputs.LumenCardSceneUniformBuffer,
                ECullCardsShapeType::None,
                FCullCardsShapeParameters(),
                GLumenSceneCardDirectLightingUpdateFrequencyScale,
                0);

            // Build the scatter indirect args.
            DirectLightingCardScatterContext.BuildScatterIndirectArgs(
                GraphBuilder,
                View);

            extern int32 GLumenSceneRecaptureLumenSceneEveryFrame;

            // Clear the lighting atlases: final lighting, irradiance, and indirect irradiance.
            if (GLumenSceneRecaptureLumenSceneEveryFrame)
            {
                ClearAtlasRDG(GraphBuilder, TracingInputs.FinalLightingAtlas);
                if (Lumen::UseIrradianceAtlas(View))
                {
                    ClearAtlasRDG(GraphBuilder, TracingInputs.IrradianceAtlas);
                }
                if (Lumen::UseIndirectIrradianceAtlas(View))
                {
                    ClearAtlasRDG(GraphBuilder, TracingInputs.IndirectIrradianceAtlas);
                }
            }

            // Combine the scene lighting.
            CombineLumenSceneLighting(
                Scene,
                View,
                GraphBuilder,
                TracingInputs.LumenCardSceneUniformBuffer,
                TracingInputs.FinalLightingAtlas,
                TracingInputs.OpacityAtlas,
                RadiosityAtlas,
                GlobalShaderMap, 
                DirectLightingCardScatterContext);

            // Copy TracingInputs.FinalLightingAtlas into TracingInputs.IndirectIrradianceAtlas.
            if (Lumen::UseIndirectIrradianceAtlas(View))
            {
                CopyLumenCardAtlas(
                    Scene,
                    View,
                    GraphBuilder,
                    TracingInputs.LumenCardSceneUniformBuffer,
                    TracingInputs.FinalLightingAtlas,
                    TracingInputs.IndirectIrradianceAtlas,
                    GlobalShaderMap,
                    DirectLightingCardScatterContext);
            }

            // Render direct lighting for the Lumen scene.
            RenderDirectLightingForLumenScene(
                GraphBuilder,
                TracingInputs.LumenCardSceneUniformBuffer,
                TracingInputs.FinalLightingAtlas,
                TracingInputs.OpacityAtlas,
                GlobalShaderMap,
                DirectLightingCardScatterContext);

            if (Lumen::UseIrradianceAtlas(View))
            {
                CopyLumenCardAtlas(
                    Scene,
                    View,
                    GraphBuilder,
                    TracingInputs.LumenCardSceneUniformBuffer,
                    TracingInputs.FinalLightingAtlas,
                    TracingInputs.IrradianceAtlas,
                    GlobalShaderMap,
                    DirectLightingCardScatterContext);
            }

            FRDGTextureRef AlbedoAtlas = GraphBuilder.RegisterExternalTexture(LumenSceneData.AlbedoAtlas, TEXT("Lumen.AlbedoAtlas"));
            FRDGTextureRef EmissiveAtlas = GraphBuilder.RegisterExternalTexture(LumenSceneData.EmissiveAtlas, TEXT("Lumen.EmissiveAtlas"));
            // Apply the Lumen card albedo.
            ApplyLumenCardAlbedo(
                Scene,
                View,
                GraphBuilder,
                TracingInputs.LumenCardSceneUniformBuffer,
                TracingInputs.FinalLightingAtlas,
                AlbedoAtlas,
                EmissiveAtlas,
                GlobalShaderMap,
                DirectLightingCardScatterContext);

            LumenSceneData.bFinalLightingAtlasContentsValid = true;

            // Prefilter the lighting.
            PrefilterLumenSceneLighting(GraphBuilder, View, TracingInputs, GlobalShaderMap, DirectLightingCardScatterContext);

            ConvertToExternalTexture(GraphBuilder, TracingInputs.FinalLightingAtlas, LumenSceneData.FinalLightingAtlas);
            if (Lumen::UseIrradianceAtlas(View))
            {
                ConvertToExternalTexture(GraphBuilder, TracingInputs.IrradianceAtlas, LumenSceneData.IrradianceAtlas);
            }
            if (Lumen::UseIndirectIrradianceAtlas(View))
            {
                ConvertToExternalTexture(GraphBuilder, TracingInputs.IndirectIrradianceAtlas, LumenSceneData.IndirectIrradianceAtlas);
            }
        }

        // Compute voxel lighting.
        ComputeLumenSceneVoxelLighting(GraphBuilder, TracingInputs, GlobalShaderMap);

        // GI volume for translucent objects.
        ComputeLumenTranslucencyGIVolume(GraphBuilder, TracingInputs, GlobalShaderMap);
    }
}

A RenderDoc frame capture shows the flow above at a glance:

The following subsections analyze some of the main steps.
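
The order of the passes in RenderLumenSceneLighting suggests how the atlases relate per texel: radiosity (indirect) is written into the final lighting atlas first, direct lighting is added on top, and only then is albedo applied together with emissive. The sketch below encodes that reading as plain arithmetic. It is an assumption inferred from the pass order and atlas names in the listing, not a transcription of the shaders, and it ignores any normalization constants the real shaders may apply. `Rgb`, `ComposeIrradiance`, and `ComposeFinalLighting` are hypothetical names.

```cpp
#include <cassert>

struct Rgb { float r, g, b; };

// Assumed content of the irradiance atlas: indirect (radiosity) plus direct lighting.
Rgb ComposeIrradiance(const Rgb& indirect, const Rgb& direct)
{
    return {indirect.r + direct.r, indirect.g + direct.g, indirect.b + direct.b};
}

// Assumed final lighting: irradiance modulated by albedo, plus emissive.
Rgb ComposeFinalLighting(const Rgb& indirect, const Rgb& direct,
                         const Rgb& albedo, const Rgb& emissive)
{
    assert(albedo.r >= 0.0f && albedo.g >= 0.0f && albedo.b >= 0.0f);
    const Rgb irradiance = ComposeIrradiance(indirect, direct);
    return {irradiance.r * albedo.r + emissive.r,
            irradiance.g * albedo.g + emissive.g,
            irradiance.b * albedo.b + emissive.b};
}
```

Under this reading, the copies into IndirectIrradianceAtlas and IrradianceAtlas in the listing capture the intermediate values before albedo is applied.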

6.5.6.3 RenderRadiosityForLumenScene

RenderRadiosityForLumenScene renders the radiosity of the Lumen scene. Its code is as follows:

// Engine\Source\Runtime\Renderer\Private\Lumen\LumenRadiosity.cpp

void FDeferredShadingSceneRenderer::RenderRadiosityForLumenScene(
    FRDGBuilder& GraphBuilder, 
    const FLumenCardTracingInputs& TracingInputs, 
    FGlobalShaderMap* GlobalShaderMap, 
    FRDGTextureRef RadiosityAtlas)
{
    LLM_SCOPE_BYTAG(Lumen);

    const FViewInfo& MainView = Views[0];
    FLumenSceneData& LumenSceneData = *Scene->LumenSceneData;

    extern int32 GLumenSceneRecaptureLumenSceneEveryFrame;

    if (IsRadiosityEnabled() 
        && !GLumenSceneRecaptureLumenSceneEveryFrame
        && LumenSceneData.bFinalLightingAtlasContentsValid
        && TracingInputs.NumClipmapLevels > 0)
    {
        RDG_EVENT_SCOPE(GraphBuilder, "Radiosity");

        FLumenCardScatterContext VisibleCardScatterContext;

        // Build the indirect args and write out the card quads whose radiosity will be updated this frame.
        VisibleCardScatterContext.Init(
            GraphBuilder,
            MainView,
            LumenSceneData,
            LumenCardRenderer,
            ECullCardsMode::OperateOnSceneForceUpdateForCardsToRender);

        VisibleCardScatterContext.CullCardsToShape(
            GraphBuilder,
            MainView,
            LumenSceneData,
            LumenCardRenderer,
            TracingInputs.LumenCardSceneUniformBuffer,
            ECullCardsShapeType::None,
            FCullCardsShapeParameters(),
            GLumenSceneCardRadiosityUpdateFrequencyScale,
            0);

        // Build the scatter indirect args.
        VisibleCardScatterContext.BuildScatterIndirectArgs(
            GraphBuilder,
            MainView);

        // Generate the sample directions.
        RadiosityDirections.GenerateSamples(
            FMath::Clamp(GLumenRadiosityNumTargetCones, 1, (int32)MaxRadiosityConeDirections),
            1,
            GLumenRadiosityNumTargetCones,
            false,
            true /* Cosine distribution */);

        const bool bRenderSkylight = Lumen::ShouldHandleSkyLight(Scene, ViewFamily);

        // Render radiosity via scatter.
        if (GLumenRadiosityComputeTraceBlocksScatter) // Compute shader path
        {
            RenderRadiosityComputeScatter(
                GraphBuilder,
                Scene,
                Views[0],
                bRenderSkylight,
                LumenSceneData,
                RadiosityAtlas,
                TracingInputs,
                VisibleCardScatterContext.Parameters,
                GlobalShaderMap);
        }
        else // Pixel shader path
        {
            FLumenCardRadiosity* PassParameters = GraphBuilder.AllocParameters<FLumenCardRadiosity>();

            PassParameters->RenderTargets[0] = FRenderTargetBinding(RadiosityAtlas, ERenderTargetLoadAction::ENoAction);

            PassParameters->VS.LumenCardScene = TracingInputs.LumenCardSceneUniformBuffer;
            PassParameters->VS.CardScatterParameters = VisibleCardScatterContext.Parameters;
            PassParameters->VS.ScatterInstanceIndex = 0;
            PassParameters->VS.CardUVSamplingOffset = FVector2D::ZeroVector;

            SetupTraceFromTexelParameters(Views[0], TracingInputs, LumenSceneData, PassParameters->PS.TraceFromTexelParameters);

            FLumenCardRadiosityPS::FPermutationDomain PermutationVector;
            PermutationVector.Set<FLumenCardRadiosityPS::FDynamicSkyLight>(bRenderSkylight);
            auto PixelShader = GlobalShaderMap->GetShader<FLumenCardRadiosityPS>(PermutationVector);

            FScene* LocalScene = Scene;
            const int32 RadiosityDownsampleArea = GLumenRadiosityDownsampleFactor * GLumenRadiosityDownsampleFactor;

            // Trace radiosity from the atlas texels.
            GraphBuilder.AddPass(
                RDG_EVENT_NAME("TraceFromAtlasTexels: %u Cones", RadiosityDirections.SampleDirections.Num()),
                PassParameters,
                ERDGPassFlags::Raster,
                [LocalScene, PixelShader, PassParameters, GlobalShaderMap](FRHICommandListImmediate& RHICmdList)
            {
                FIntPoint ViewRect = FIntPoint::DivideAndRoundDown(LocalScene->LumenSceneData->MaxAtlasSize, GLumenRadiosityDownsampleFactor);
                DrawQuadsToAtlas(ViewRect, PixelShader, PassParameters, GlobalShaderMap, TStaticBlendState<>::GetRHI(), RHICmdList);
            });
        }
    }
    else
    {
        ClearAtlasRDG(GraphBuilder, RadiosityAtlas);
    }
}
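
The `GenerateSamples` call above requests a cosine distribution for the radiosity cone directions. As an illustration of what such a distribution looks like, the sketch below maps two uniform random numbers to a tangent-space direction whose density is proportional to the cosine of the angle to the normal. It uses a simple polar disk mapping projected up onto the hemisphere, which is one standard way to obtain a cosine distribution; it is not a transcription of the engine's sample generator. `Vec3` and `CosineSampleHemisphere` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Map (u1, u2) in [0, 1]^2 to a unit direction in the +Z hemisphere with
// density cos(theta) / pi: pick a point uniformly on the unit disk, then
// project it up onto the hemisphere.
Vec3 CosineSampleHemisphere(float u1, float u2)
{
    assert(u1 >= 0.0f && u1 <= 1.0f && u2 >= 0.0f && u2 <= 1.0f);
    const float kTwoPi = 6.28318530718f;
    const float radius = std::sqrt(u1);  // sqrt keeps the disk density uniform
    const float phi = kTwoPi * u2;
    const float x = radius * std::cos(phi);
    const float y = radius * std::sin(phi);
    const float z = std::sqrt(std::max(0.0f, 1.0f - u1));  // z = sqrt(1 - x^2 - y^2)
    return {x, y, z};
}
```

Because the sample density already follows the cosine term, each cone can be weighted equally when the results are averaged.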

The final stage of the code above computes radiosity. It normally takes the compute shader path, RenderRadiosityComputeScatter, which is analyzed next:

void RenderRadiosityComputeScatter(
    FRDGBuilder& GraphBuilder,
    const FScene* Scene,
    const FViewInfo& View,
    bool bRenderSkylight, 
    const FLumenSceneData& LumenSceneData,
    FRDGTextureRef RadiosityAtlas,
    const FLumenCardTracingInputs& TracingInputs,
    const FLumenCardScatterParameters& CardScatterParameters,
    FGlobalShaderMap* GlobalShaderMap)
{
    const bool bUseIrradianceCache = GLumenRadiosityUseIrradianceCache != 0;

    // Build the indirect args for setting up the trace blocks.
    FRDGBufferRef SetupCardTraceBlocksIndirectArgsBuffer = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateIndirectDesc<FRHIDispatchIndirectParameters>(1), TEXT("SetupCardTraceBlocksIndirectArgsBuffer"));
    {
        FRDGBufferUAVRef SetupCardTraceBlocksIndirectArgsBufferUAV = GraphBuilder.CreateUAV(FRDGBufferUAVDesc(SetupCardTraceBlocksIndirectArgsBuffer));

        FPlaceProbeIndirectArgsCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FPlaceProbeIndirectArgsCS::FParameters>();
        PassParameters->RWIndirectArgs = SetupCardTraceBlocksIndirectArgsBufferUAV;
        PassParameters->QuadAllocator = CardScatterParameters.QuadAllocator;

        auto ComputeShader = GlobalShaderMap->GetShader< FPlaceProbeIndirectArgsCS >(0);

        ensure(GSetupCardTraceBlocksGroupSize == GPlaceRadiosityProbeGroupSize);
        const FIntVector GroupSize(1, 1, 1);

        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("SetupCardTraceBlocksIndirectArgsCS"),
            ComputeShader,
            PassParameters,
            GroupSize);
    }

    const int32 TraceBlockMaxSize = 2;
    extern int32 GLumenSceneCardLightingForceFullUpdate;
    const int32 Divisor = TraceBlockMaxSize * GLumenRadiosityDownsampleFactor * (GLumenSceneCardLightingForceFullUpdate ? 1 : GLumenRadiosityTraceBlocksAllocationDivisor);
    const int32 NumTraceBlocksToAllocate = (LumenSceneData.MaxAtlasSize.X / Divisor) 
        * (LumenSceneData.MaxAtlasSize.Y / Divisor);

    FRDGBufferRef CardTraceBlockAllocator = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateBufferDesc(sizeof(uint32), 1), TEXT("CardTraceBlockAllocator"));
    FRDGBufferRef CardTraceBlockData = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateBufferDesc(sizeof(FIntVector4), NumTraceBlocksToAllocate), TEXT("CardTraceBlockData"));
    FRDGBufferUAVRef CardTraceBlockAllocatorUAV = GraphBuilder.CreateUAV(FRDGBufferUAVDesc(CardTraceBlockAllocator, PF_R32_UINT));
    FRDGBufferUAVRef CardTraceBlockDataUAV = GraphBuilder.CreateUAV(FRDGBufferUAVDesc(CardTraceBlockData, PF_R32G32B32A32_UINT));

    FComputeShaderUtils::ClearUAV(GraphBuilder, View.ShaderMap, CardTraceBlockAllocatorUAV, 0);

    // Set up the card trace blocks.
    {
        FSetupCardTraceBlocksCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FSetupCardTraceBlocksCS::FParameters>();
        PassParameters->RWCardTraceBlockAllocator = CardTraceBlockAllocatorUAV;
        PassParameters->RWCardTraceBlockData = CardTraceBlockDataUAV;
        PassParameters->QuadAllocator = CardScatterParameters.QuadAllocator;
        PassParameters->QuadData = CardScatterParameters.QuadData;
        PassParameters->CardBuffer = LumenSceneData.CardBuffer.SRV;
        PassParameters->RadiosityAtlasSize = FIntPoint::DivideAndRoundDown(LumenSceneData.MaxAtlasSize, GLumenRadiosityDownsampleFactor);
        PassParameters->IndirectArgs = SetupCardTraceBlocksIndirectArgsBuffer;

        auto ComputeShader = GlobalShaderMap->GetShader<FSetupCardTraceBlocksCS>();

        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("SetupCardTraceBlocksCS"),
            ComputeShader,
            PassParameters,
            SetupCardTraceBlocksIndirectArgsBuffer,
            0);
    }

    // Build the indirect args for tracing the blocks.
    FRDGBufferRef TraceBlocksIndirectArgsBuffer = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateIndirectDesc<FRHIDispatchIndirectParameters>(1), TEXT("TraceBlocksIndirectArgsBuffer"));
    {
        FRDGBufferUAVRef TraceBlocksIndirectArgsBufferUAV = GraphBuilder.CreateUAV(FRDGBufferUAVDesc(TraceBlocksIndirectArgsBuffer));

        FTraceBlocksIndirectArgsCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FTraceBlocksIndirectArgsCS::FParameters>();
        PassParameters->RWIndirectArgs = TraceBlocksIndirectArgsBufferUAV;
        PassParameters->CardTraceBlockAllocator = GraphBuilder.CreateSRV(FRDGBufferSRVDesc(CardTraceBlockAllocator, PF_R32_UINT));

        FTraceBlocksIndirectArgsCS::FPermutationDomain PermutationVector;
        PermutationVector.Set<FTraceBlocksIndirectArgsCS::FIrradianceCache>(bUseIrradianceCache);
        auto ComputeShader = GlobalShaderMap->GetShader< FTraceBlocksIndirectArgsCS >(PermutationVector);

        const FIntVector GroupSize(1, 1, 1);

        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("TraceBlocksIndirectArgsCS"),
            ComputeShader,
            PassParameters,
            GroupSize);
    }

    LumenRadianceCache::FRadianceCacheInterpolationParameters RadianceCacheParameters;

    // Render the radiance cache (used here as the irradiance cache).
    if (bUseIrradianceCache)
    {
        const LumenRadianceCache::FRadianceCacheInputs RadianceCacheInputs = LumenRadiosity::SetupRadianceCacheInputs();

        FRadiosityMarkUsedProbesData MarkUsedProbesData;
        MarkUsedProbesData.Parameters.View = View.ViewUniformBuffer;
        MarkUsedProbesData.Parameters.DepthAtlas = LumenSceneData.DepthAtlas->GetRenderTargetItem().ShaderResourceTexture;
        MarkUsedProbesData.Parameters.CurrentOpacityAtlas = LumenSceneData.OpacityAtlas->GetRenderTargetItem().ShaderResourceTexture;
        MarkUsedProbesData.Parameters.CardTraceBlockAllocator = GraphBuilder.CreateSRV(FRDGBufferSRVDesc(CardTraceBlockAllocator, PF_R32_UINT));
        MarkUsedProbesData.Parameters.CardTraceBlockData = GraphBuilder.CreateSRV(FRDGBufferSRVDesc(CardTraceBlockData, PF_R32G32B32A32_UINT));
        MarkUsedProbesData.Parameters.CardBuffer = LumenSceneData.CardBuffer.SRV;
        MarkUsedProbesData.Parameters.RadiosityAtlasSize = FIntPoint::DivideAndRoundDown(LumenSceneData.MaxAtlasSize, GLumenRadiosityDownsampleFactor);
        MarkUsedProbesData.Parameters.IndirectArgs = TraceBlocksIndirectArgsBuffer;

        RenderRadianceCache(
            GraphBuilder, 
            TracingInputs, 
            RadianceCacheInputs, 
            Scene,
            View, 
            nullptr, 
            nullptr, 
            FMarkUsedRadianceCacheProbes::CreateStatic(&RadianceCacheMarkUsedProbes), 
            &MarkUsedProbesData, 
            View.ViewState->RadiosityRadianceCacheState, 
            RadianceCacheParameters);
    }

    // Trace radiosity for the card texels in the atlas.
    {
        FLumenCardRadiosityTraceBlocksCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FLumenCardRadiosityTraceBlocksCS::FParameters>();
        PassParameters->RWRadiosityAtlas = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(RadiosityAtlas));
        PassParameters->RadianceCacheParameters = RadianceCacheParameters;
        PassParameters->CardTraceBlockAllocator = GraphBuilder.CreateSRV(FRDGBufferSRVDesc(CardTraceBlockAllocator, PF_R32_UINT));
        PassParameters->CardTraceBlockData = GraphBuilder.CreateSRV(FRDGBufferSRVDesc(CardTraceBlockData, PF_R32G32B32A32_UINT));
        PassParameters->ProbeOcclusionNormalBias = GLumenRadiosityIrradianceCacheProbeOcclusionNormalBias;
        PassParameters->IndirectArgs = TraceBlocksIndirectArgsBuffer;

        SetupTraceFromTexelParameters(View, TracingInputs, LumenSceneData, PassParameters->TraceFromTexelParameters);

        FLumenCardRadiosityTraceBlocksCS::FPermutationDomain PermutationVector;
        PermutationVector.Set<FLumenCardRadiosityTraceBlocksCS::FDynamicSkyLight>(bRenderSkylight);
        PermutationVector.Set<FLumenCardRadiosityTraceBlocksCS::FIrradianceCache>(bUseIrradianceCache);
        auto ComputeShader = GlobalShaderMap->GetShader< FLumenCardRadiosityTraceBlocksCS >(PermutationVector);

        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("TraceFromAtlasTexels: %u Cones", RadiosityDirections.SampleDirections.Num()),
            ComputeShader,
            PassParameters,
            TraceBlocksIndirectArgsBuffer,
            0);
    }
}
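
The size of `CardTraceBlockData` in the function above is a fixed budget derived from the atlas size. The sketch below reproduces that arithmetic in isolation so the effect of each factor is easy to see. The function name `NumTraceBlocksToAllocate` is hypothetical, and the numeric values used in the usage note are illustrative assumptions rather than the engine's defaults.

```cpp
#include <cassert>

// Mirrors the budget computation in RenderRadiosityComputeScatter:
// Divisor = TraceBlockMaxSize * DownsampleFactor * (forceFullUpdate ? 1 : AllocationDivisor)
// Blocks  = (AtlasX / Divisor) * (AtlasY / Divisor), using integer division.
int NumTraceBlocksToAllocate(int atlasSizeX, int atlasSizeY,
                             int traceBlockMaxSize, int downsampleFactor,
                             int allocationDivisor, bool forceFullUpdate)
{
    const int divisor = traceBlockMaxSize * downsampleFactor *
                        (forceFullUpdate ? 1 : allocationDivisor);
    assert(divisor > 0);
    return (atlasSizeX / divisor) * (atlasSizeY / divisor);
}
```

For example, with a 4096x4096 atlas, a block size of 2, a downsample factor of 2, and an allocation divisor of 2, the budget is 512 * 512 = 262144 blocks. Forcing a full update drops the allocation divisor, so the budget grows to 1024 * 1024 = 1048576 blocks, enough to cover every downsampled texel of the atlas.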

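As the code shows, computing radiosity involves quite a few steps, including culling, building the trace arguments, and tracing atlas texels: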

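The final texel-tracing stage mainly builds sample directions and traces one cone per direction to gather nearby radiosity. Its main inputs are the global distance field atlas, scene depth, scene opacity, scene normals, and VoxelLighting: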

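Data needed to trace card texels: top left, the global distance field atlas; top right, the scene depth atlas; bottom left, scene opacity; bottom right, scene normals.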

The output is the scene radiosity atlas:

The corresponding compute shader code is as follows:

// Engine\Shaders\Private\Lumen\LumenRadiosity.usf

float ProbeOcclusionNormalBias;
// Holds the per-thread lighting results of the thread group; note that it is groupshared.
groupshared float3 ThreadLighting[THREADGROUP_SIZE];

[numthreads(THREADGROUP_SIZE, 1, 1)]
void LumenCardRadiosityTraceBlocksCS(
    uint3 DispatchThreadId : SV_DispatchThreadID,
    uint3 GroupThreadId : SV_GroupThreadID)
{
#if IRRADIANCE_CACHE // Irradiance cache mode
    uint ThreadIndex = DispatchThreadId.x;

    uint GlobalBlockIndex = ThreadIndex / (CARD_TRACE_BLOCK_SIZE * CARD_TRACE_BLOCK_SIZE);

    if (GlobalBlockIndex < CardTraceBlockAllocator[0])
    {
        // Compute the texel index.
        uint TexelIndexInBlock = ThreadIndex % (CARD_TRACE_BLOCK_SIZE * CARD_TRACE_BLOCK_SIZE);
        uint2 TexelOffsetInBlock = uint2(TexelIndexInBlock % CARD_TRACE_BLOCK_SIZE, TexelIndexInBlock / CARD_TRACE_BLOCK_SIZE);

        // Fetch the trace block data.
        uint4 TraceBlockData = CardTraceBlockData[GlobalBlockIndex];
        uint CardId = TraceBlockData.x;
        uint ProbeIndex = TraceBlockData.y;
        uint BlockIndex = TraceBlockData.z;

        // Fetch the card data.
        FLumenCardData CardData = GetLumenCardData(CardId, CardBuffer);

        float2 CardSizeTexels = abs(CardData.LocalExtent.xy * 2 * CardData.LocalPositionToAtlasUVScale * RadiosityAtlasSize);
        uint2 NumBlocksXY = ((uint2)CardSizeTexels + CARD_TRACE_BLOCK_SIZE - 1) / CARD_TRACE_BLOCK_SIZE;
        uint2 BlockOffset = uint2(BlockIndex % NumBlocksXY.x, BlockIndex / NumBlocksXY.x);
        float2 TexelCoord = BlockOffset * CARD_TRACE_BLOCK_SIZE + TexelOffsetInBlock;

        if (all(TexelCoord < CardSizeTexels))
        {
            // Compute the card UV.
            float2 CardUV = (TexelCoord + .5f) / (float2)CardSizeTexels;
            float2 CardUVToAtlasScale = GetCardUVToAtlasScale(CardData.LocalPositionToAtlasUVScale, CardData.LocalExtent);
            float2 CardUVToAtlasBias = GetCardUVToAtlasBias(CardUVToAtlasScale, CardData.LocalPositionToAtlasUVBias);
            float2 AtlasUV = CardUV * CardUVToAtlasScale + CardUVToAtlasBias;

            float Opacity = Texture2DSampleLevel(CurrentOpacityAtlas, GlobalBilinearClampedSampler, AtlasUV, 0).x;

            float3 DiffuseLighting = 0;

            // Radiosity is only meaningful where opacity is greater than 0.
            if (Opacity > 0)
            {
                float Depth = 1.0f - Texture2DSampleLevel(DepthAtlas, GlobalBilinearClampedSampler, AtlasUV, 0).x;

                float3 LocalPosition;
                LocalPosition.xy = (AtlasUV - CardData.LocalPositionToAtlasUVBias) / CardData.LocalPositionToAtlasUVScale;
                LocalPosition.z = -CardData.LocalExtent.z + Depth * 2 * CardData.LocalExtent.z;

                // Compute the world-space position and normal.
                float3 WorldPosition = mul(CardData.WorldToLocalRotation, LocalPosition) + CardData.Origin;
                float3 WorldNormal = normalize(Texture2DSampleLevel(NormalAtlas, GlobalBilinearClampedSampler, AtlasUV, 0).xyz * 2 - 1);
                uint ClipmapIndex = GetRadianceProbeClipmap(WorldPosition);

                // Compute diffuse lighting. If the clipmap is valid, interpolate from it.
                if (ClipmapIndex < NumRadianceProbeClipmaps)
                {
                    float3 BiasOffset = WorldNormal * ProbeOcclusionNormalBias;
                    // Sample RadianceProbeIndirectionTexture to compute the diffuse lighting.
                    DiffuseLighting = SampleIrradianceCacheInterpolated(WorldPosition, WorldNormal, BiasOffset, ClipmapIndex);
                }
                else // No valid clipmap: compute diffuse lighting from the sky light's spherical harmonics.
                {
                    DiffuseLighting = GetSkySHDiffuse(WorldNormal) * View.SkyLightColor.rgb;
                }
            }

            // Store the radiosity.
            uint2 AtlasCoord = uint2(AtlasUV * RadiosityAtlasSize);
            RWRadiosityAtlas[AtlasCoord] = float4(DiffuseLighting * PI, 0);
        }
    }
#else // Mode without the irradiance cache
    ThreadLighting[GroupThreadId.x] = 0;

    uint ThreadIndex = DispatchThreadId.x;
    uint GlobalBlockIndex = ThreadIndex / (CARD_TRACE_BLOCK_SIZE * CARD_TRACE_BLOCK_SIZE * THREADS_PER_RADIOSITY_TEXEL);
    int2 AtlasCoord = -1;

    if (GlobalBlockIndex < CardTraceBlockAllocator[0])
    {
        uint TexelIndexInBlock = (ThreadIndex / THREADS_PER_RADIOSITY_TEXEL) % (CARD_TRACE_BLOCK_SIZE * CARD_TRACE_BLOCK_SIZE);
        uint2 TexelOffsetInBlock = uint2(TexelIndexInBlock % CARD_TRACE_BLOCK_SIZE, TexelIndexInBlock / CARD_TRACE_BLOCK_SIZE);

        uint4 TraceBlockData = CardTraceBlockData[GlobalBlockIndex];
        uint CardId = TraceBlockData.x;
        uint ProbeIndex = TraceBlockData.y;
        uint BlockIndex = TraceBlockData.z;

        FLumenCardData CardData = GetLumenCardData(CardId, CardBuffer);

        float2 CardSizeTexels = abs(CardData.LocalExtent.xy * 2 * CardData.LocalPositionToAtlasUVScale * RadiosityAtlasSize);
        uint2 NumBlocksXY = ((uint2)CardSizeTexels + CARD_TRACE_BLOCK_SIZE - 1) / CARD_TRACE_BLOCK_SIZE;
        uint2 BlockOffset = uint2(BlockIndex % NumBlocksXY.x, BlockIndex / NumBlocksXY.x);
        float2 TexelCoord = BlockOffset * CARD_TRACE_BLOCK_SIZE + TexelOffsetInBlock;

        if (all(TexelCoord < CardSizeTexels))
        {
            uint TraceThreadIndex = ThreadIndex % THREADS_PER_RADIOSITY_TEXEL;

            float2 CardUV = (TexelCoord + .5f) / (float2)CardSizeTexels;
            float2 CardUVToAtlasScale = GetCardUVToAtlasScale(CardData.LocalPositionToAtlasUVScale, CardData.LocalExtent);
            float2 CardUVToAtlasBias = GetCardUVToAtlasBias(CardUVToAtlasScale, CardData.LocalPositionToAtlasUVBias);
            float2 AtlasUV = CardUV * CardUVToAtlasScale + CardUVToAtlasBias;

            uint NumTracesPerThread = NumCones / THREADS_PER_RADIOSITY_TEXEL;
            uint ConeStartIndex = TraceThreadIndex * NumTracesPerThread;
            AtlasCoord = int2(AtlasUV * RadiosityAtlasSize);
            // Trace radiosity from the card texel.
            float3 Lighting = RadiosityTraceFromTexel(AtlasUV, AtlasCoord, ProbeIndex, CardData, ConeStartIndex, ConeStartIndex + NumTracesPerThread);
            ThreadLighting[GroupThreadId.x] = Lighting;
        }
    }

    // Wait for the other threads in the group to finish.
    GroupMemoryBarrierWithGroupSync();

    uint TraceThreadIndex = ThreadIndex % THREADS_PER_RADIOSITY_TEXEL;

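    // Sum the results of the threads that share this texel and store the total. TraceThreadIndex == 0 means only the first of each texel's THREADS_PER_RADIOSITY_TEXEL threads does this.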
    if (TraceThreadIndex == 0 && all(AtlasCoord >= 0))
    {
        float3 Lighting = 0;

        for (uint OtherThreadIndex = GroupThreadId.x; OtherThreadIndex < GroupThreadId.x + THREADS_PER_RADIOSITY_TEXEL; OtherThreadIndex += 1)
        {
            Lighting += ThreadLighting[OtherThreadIndex];
        }

        RWRadiosityAtlas[AtlasCoord] = float4(Lighting, 0);
    }
#endif
}
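
The index arithmetic in the branch without the irradiance cache packs three things into one flat thread index: the trace block, the texel inside the block, and the slice of cones handled by the thread. The sketch below mirrors that decomposition on the CPU so it can be checked by hand. `TraceThreadCoords`, `DecomposeThreadIndex`, and `ConeStartIndex` are hypothetical names, and the block size and threads-per-texel values in the usage note are illustrative, since the section does not state the shader's actual `CARD_TRACE_BLOCK_SIZE` and `THREADS_PER_RADIOSITY_TEXEL` values.

```cpp
#include <cassert>
#include <cstdint>

struct TraceThreadCoords {
    std::uint32_t globalBlockIndex;  // which trace block this thread belongs to
    std::uint32_t texelX;            // texel offset inside the block
    std::uint32_t texelY;
    std::uint32_t traceThreadIndex;  // which slice of the cones this thread traces
};

// Same divisions and modulos as the shader: consecutive threads first cover the
// cone slices of one texel, then the texels of one block, then the next block.
TraceThreadCoords DecomposeThreadIndex(std::uint32_t threadIndex,
                                       std::uint32_t blockSize,
                                       std::uint32_t threadsPerTexel)
{
    assert(blockSize > 0 && threadsPerTexel > 0);
    const std::uint32_t texelsPerBlock = blockSize * blockSize;
    const std::uint32_t texelIndexInBlock = (threadIndex / threadsPerTexel) % texelsPerBlock;
    TraceThreadCoords coords;
    coords.globalBlockIndex = threadIndex / (texelsPerBlock * threadsPerTexel);
    coords.texelX = texelIndexInBlock % blockSize;
    coords.texelY = texelIndexInBlock / blockSize;
    coords.traceThreadIndex = threadIndex % threadsPerTexel;
    return coords;
}

// First cone traced by a thread; it traces numCones / threadsPerTexel cones from there.
std::uint32_t ConeStartIndex(std::uint32_t traceThreadIndex,
                             std::uint32_t numCones,
                             std::uint32_t threadsPerTexel)
{
    assert(threadsPerTexel > 0);
    return traceThreadIndex * (numCones / threadsPerTexel);
}
```

With a block size of 2 and 4 threads per texel, thread 27 lands in block 1, at texel (0, 1), as cone slice 3. With 8 cones in total, that thread traces cones 6 and 7. Since the threads of one texel are adjacent in the group, the thread with slice index 0 can sum the following entries of the groupshared array after the barrier.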

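As shown above, radiosity tracing supports two modes: with and without the irradiance cache. The irradiance cache mode obtains radiosity by sampling and interpolating the 3D RadianceProbeIndirectionTexture, whereas the mode without the irradiance cache traces the radiosity around each card texel in real time and sums the results. The latter uses RadiosityTraceFromTexel, whose logic is as follows: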

float3 RadiosityTraceFromTexel(float2 AtlasUV, int2 AtlasCoord, uint ProbeIndex, FLumenCardData LumenCardData, uint ConeStartIndex, uint ConeEndIndex)
{
    float Opacity = Texture2DSampleLevel(CurrentOpacityAtlas, GlobalBilinearClampedSampler, AtlasUV, 0).x;

    float3 Lighting = 0;

    if (Opacity > 0)
    {
        float Depth = 1.0f - Texture2DSampleLevel(DepthAtlas, GlobalBilinearClampedSampler, AtlasUV, 0).x;

        // Reconstruct the local position.
        float3 LocalPosition;
        LocalPosition.xy = (AtlasUV - LumenCardData.LocalPositionToAtlasUVBias) / LumenCardData.LocalPositionToAtlasUVScale;
        LocalPosition.z = -LumenCardData.LocalExtent.z + Depth * 2 * LumenCardData.LocalExtent.z;

        // World-space position and normal.
        float3 WorldPosition = mul(LumenCardData.WorldToLocalRotation, LocalPosition) + LumenCardData.Origin;
        float3 WorldNormal = normalize(Texture2DSampleLevel(NormalAtlas, GlobalBilinearClampedSampler, AtlasUV, 0).xyz * 2 - 1);

        //@todo - derive bias from texel world size
        WorldPosition += WorldNormal * SurfaceBias;

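        // Distance along the ray at which voxel tracing starts.
        float VoxelTraceStartDistance = CalculateVoxelTraceStartDistance(MinTraceDistance, MaxTraceDistance, MaxMeshSDFTraceDistance, false);

        // Loop over the cones in [ConeStartIndex, ConeEndIndex) and accumulate their results.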
        for (uint ConeIndex = ConeStartIndex; ConeIndex < ConeEndIndex; ConeIndex++)
        {
            //uint ConeIndex = ConeStartIndex;
            float3x3 TangentBasis = GetTangentBasisFrisvad(WorldNormal);

            // Compute the cone direction.
            #define PRECOMPUTED_SAMPLE_DIRECTIONS 1
            #if PRECOMPUTED_SAMPLE_DIRECTIONS // Precomputed directions.
                float3 LocalConeDirection = RadiosityConeDirections[ConeIndex].xyz;
                float3 WorldConeDirection = mul(LocalConeDirection, TangentBasis);
            #else // Not precomputed: generate the direction from a low-discrepancy sequence.
                uint2 Seed0 = Rand3DPCG16(int3(AtlasCoord + 17, 0)).xy;
                float2 E = Hammersley16(ConeIndex, NumCones, Seed0);
                float2 DiskE = UniformSampleDiskConcentric(E.xy);
                float TangentZ = sqrt(1 - length2(DiskE));
                float3 WorldConeDirection = mul(float3(DiskE, TangentZ), TangentBasis);
            #endif

            //@todo - derive bias from texel world size
            // Sample position.
            float3 SamplePosition = WorldPosition + SurfaceBias * WorldConeDirection;

            // Build the cone trace input.
            FConeTraceInput TraceInput;
            TraceInput.Setup(SamplePosition, WorldConeDirection, DiffuseConeHalfAngle, MinSampleRadius, MinTraceDistance, MaxTraceDistance, StepFactor);
            TraceInput.VoxelStepFactor = VoxelStepFactor;
            TraceInput.VoxelTraceStartDistance = VoxelTraceStartDistance;
            TraceInput.SDFStepFactor = 1;

            // Run the cone trace and store the result.
            FConeTraceResult TraceResult;
            ConeTraceVoxels(TraceInput, TraceResult);

            // Evaluate the sky light radiance for the cone.
            EvaluateSkyRadianceForCone(WorldConeDirection, TraceInput.TanConeAngle, TraceResult);

            // Accumulate the lighting of this sample.
            Lighting += TraceResult.Lighting;
        }
    }

    // Normalize the accumulated result so that energy is conserved.
    Lighting *= PI / (float)NumCones;
    return Lighting;
}
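
The final scale by PI / NumCones is what a cosine-weighted Monte Carlo estimate of irradiance looks like: if directions are drawn with density cos(theta) / PI, the cosine term cancels and the estimate reduces to (PI / N) times the sum of the sampled radiance. The C++ sketch below illustrates this under the assumption that the cone directions are cosine-distributed, which matches the concentric-disk mapping in the non-precomputed branch but is not confirmed here for the precomputed table. All names (SampleDiskConcentric, CosineSampleHemisphere, EstimateIrradiance) and the sample pattern are illustrative, not engine code.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

constexpr float kPi = 3.14159265358979f;

// Shirley-Chiu concentric mapping from the unit square to the unit disk.
void SampleDiskConcentric(float u, float v, float& outX, float& outY)
{
    const float a = 2.0f * u - 1.0f;
    const float b = 2.0f * v - 1.0f;
    if (a == 0.0f && b == 0.0f) { outX = 0.0f; outY = 0.0f; return; }

    float r, phi;
    if (std::fabs(a) > std::fabs(b)) { r = a; phi = (kPi / 4.0f) * (b / a); }
    else                             { r = b; phi = (kPi / 2.0f) - (kPi / 4.0f) * (a / b); }

    outX = r * std::cos(phi);
    outY = r * std::sin(phi);
}

// Cosine-weighted direction in tangent space (z is the surface normal):
// project a uniformly sampled disk point up onto the hemisphere.
Vec3 CosineSampleHemisphere(float u, float v)
{
    float x, y;
    SampleDiskConcentric(u, v, x, y);
    const float z = std::sqrt(std::fmax(0.0f, 1.0f - (x * x + y * y)));
    return {x, y, z};
}

// Irradiance estimate for pdf = cos(theta) / pi: E ~= (pi / N) * sum(L_i).
template <typename RadianceFn>
float EstimateIrradiance(int numCones, RadianceFn radiance)
{
    float sum = 0.0f;
    for (int i = 0; i < numCones; ++i)
    {
        // Simple deterministic sample pattern, for illustration only.
        const float u = (i + 0.5f) / numCones;
        const float v = std::fmod(i * 0.61803398875f, 1.0f);
        sum += radiance(CosineSampleHemisphere(u, v));
    }
    return sum * kPi / static_cast<float>(numCones);
}
```

For a constant incoming radiance of 1 the estimate equals PI regardless of the number of cones, which is the analytic irradiance of a uniform hemisphere. Without the division by NumCones, the result would scale with the cone count.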

The cone tracing entry point used above, ConeTraceVoxels, is the approach described in 6.5.6.1 Voxel Cone Tracing. Its code is as follows:

// Engine\Shaders\Private\Lumen\LumenTracingCommon.ush

void ConeTraceVoxels(
    FConeTraceInput TraceInput,
    inout FConeTraceResult OutResult)
{
    FGlobalSDFTraceResult SDFTraceResult;

    // Trace the SDF ray.
    {
        FGlobalSDFTraceInput SDFTraceInput = SetupGlobalSDFTraceInput(TraceInput.ConeOrigin, TraceInput.ConeDirection, TraceInput.MinTraceDistance, TraceInput.MaxTraceDistance, TraceInput.SDFStepFactor, TraceInput.VoxelStepFactor);
        SDFTraceInput.bExpandSurfaceUsingRayTimeInsteadOfMaxDistance = TraceInput.bExpandSurfaceUsingRayTimeInsteadOfMaxDistance;
        SDFTraceInput.InitialMaxDistance = TraceInput.InitialMaxDistance;

        // Trace the global distance field.
        SDFTraceResult = RayTraceGlobalDistanceField(SDFTraceInput);
    }

    float4 LightingAndAlpha = float4(0, 0, 0, 1);

    // The logic below only runs when the global distance field trace hit something.
    if (GlobalSDFTraceResultIsHit(SDFTraceResult))
    {
        float3 SampleWorldPosition = TraceInput.ConeOrigin + TraceInput.ConeDirection * SDFTraceResult.HitTime;

        uint VoxelClipmapIndex = 0;
        float3 VoxelClipmapCenter = ClipmapWorldCenter[VoxelClipmapIndex].xyz;
        float3 VoxelClipmapExtent = ClipmapWorldSamplingExtent[VoxelClipmapIndex].xyz;

        bool bOutsideValidRegion = any(SampleWorldPosition > VoxelClipmapCenter + VoxelClipmapExtent || SampleWorldPosition < VoxelClipmapCenter - VoxelClipmapExtent);

        // Find the finest voxel clipmap whose valid region contains the sample position.
        while (bOutsideValidRegion && VoxelClipmapIndex + 1 < NumClipmapLevels)
        {
            VoxelClipmapIndex++;
            VoxelClipmapCenter = ClipmapWorldCenter[VoxelClipmapIndex].xyz;
            VoxelClipmapExtent = ClipmapWorldSamplingExtent[VoxelClipmapIndex].xyz;
            bOutsideValidRegion = any(SampleWorldPosition > VoxelClipmapCenter + VoxelClipmapExtent || SampleWorldPosition < VoxelClipmapCenter - VoxelClipmapExtent);
        }

        LightingAndAlpha.xyzw = 0.0f;

        // If the sample is inside a valid region, compute the voxel lighting.
        if (!bOutsideValidRegion)
        {
            float3 DistanceFieldGradient = -TraceInput.ConeDirection;

            float3 ClipmapVolumeUV = ComputeGlobalUV(SampleWorldPosition, SDFTraceResult.HitClipmapIndex);
            uint PageIndex = GetGlobalDistanceFieldPage(ClipmapVolumeUV, SDFTraceResult.HitClipmapIndex);

            if (PageIndex < GLOBAL_DISTANCE_FIELD_INVALID_PAGE_ID)
            {
                float3 PageUV = ComputeGlobalDistanceFieldPageUV(ClipmapVolumeUV, PageIndex);
                DistanceFieldGradient = GlobalDistanceFieldPageCentralDiff(PageUV);
            }

            float DistanceFieldGradientLength = length(DistanceFieldGradient);
            float3 SampleNormal = DistanceFieldGradientLength > 0.001 ? DistanceFieldGradient / DistanceFieldGradientLength : -TraceInput.ConeDirection;

            // Sample the VoxelLighting 3D texture to get the lighting.
            float4 StepLighting = SampleVoxelLighting(SampleWorldPosition, -SampleNormal, VoxelClipmapIndex);

            StepLighting.xyz = StepLighting.xyz * (1.0f / max(StepLighting.w, 0.1));

            // Compute the self-lighting bias.
            float VoxelSelfLightingBias = 1.0f;
            if (TraceInput.bExpandSurfaceUsingRayTimeInsteadOfMaxDistance)
            {
                // For diffuse rays it is better to over-occlude than to leak light.
                VoxelSelfLightingBias = smoothstep(1.5 * ClipmapVoxelSizeAndRadius[VoxelClipmapIndex].w, 2.0 * ClipmapVoxelSizeAndRadius[VoxelClipmapIndex].w, SDFTraceResult.HitTime);
            }

            // Lighting after applying the self-lighting bias.
            LightingAndAlpha.xyz = StepLighting.xyz * VoxelSelfLightingBias;
        }
    }

    // Fade the lighting result based on opacity.
    LightingAndAlpha = FadeOutVoxelConeTraceMinTransparency(LightingAndAlpha);

    // Store the result.
    OutResult = (FConeTraceResult)0;
    #if !VISIBILITY_ONLY_TRACE
        OutResult.Lighting = LightingAndAlpha.rgb;
    #endif
    OutResult.Transparency = LightingAndAlpha.a;
    OutResult.NumSteps = SDFTraceResult.TotalStepsTaken;
    OutResult.OpaqueHitDistance = GlobalSDFTraceResultIsHit(SDFTraceResult) ? SDFTraceResult.HitTime : TraceInput.MaxTraceDistance;
}
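
The clipmap lookup in ConeTraceVoxels is a linear search from the finest level outward. Here is a minimal C++ sketch of that search, assuming each level is described by a world-space center and a half-extent, as ClipmapWorldCenter and ClipmapWorldSamplingExtent suggest. The Clipmap struct and FindClipmapLevel are hypothetical names, and the inclusive boundary test is a simplification of the shader's strict comparisons.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

struct Clipmap
{
    Vec3 center;  // world-space center of the level
    Vec3 extent;  // half-size of the valid sampling region
};

// True when p lies inside the axis-aligned box of the clipmap level.
bool IsInside(const Vec3& p, const Clipmap& c)
{
    return std::fabs(p.x - c.center.x) <= c.extent.x
        && std::fabs(p.y - c.center.y) <= c.extent.y
        && std::fabs(p.z - c.center.z) <= c.extent.z;
}

// Returns the index of the first (finest) level containing p,
// or -1 when p is outside every level.
int FindClipmapLevel(const Vec3& p, const std::vector<Clipmap>& clipmaps)
{
    for (std::size_t i = 0; i < clipmaps.size(); ++i)
    {
        if (IsInside(p, clipmaps[i]))
        {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Because the levels are nested and ordered from fine to coarse, the first hit is also the highest-resolution data available at that position. The shader skips the lighting lookup entirely when no level contains the hit point.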

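The cone trace above samples the VoxelLighting 3D texture, which is also a clipmap. In the capture I took, its dimensions are 64x256x384. Most of its slices are black, and only a few contain pixels, in small regions: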

6.5.6.4 CombineLumenSceneLighting

CombineLumenSceneLighting combines the lighting. Its logic is as follows:

void CombineLumenSceneLighting(
    FScene* Scene, 
    FViewInfo& View,
    FRDGBuilder& GraphBuilder,
    TRDGUniformBufferRef<FLumenCardScene> LumenCardSceneUniformBuffer,
    FRDGTextureRef FinalLightingAtlas, 
    FRDGTextureRef OpacityAtlas, 
    FRDGTextureRef RadiosityAtlas, 
    FGlobalShaderMap* GlobalShaderMap,
    const FLumenCardScatterContext& VisibleCardScatterContext)
{
    LLM_SCOPE_BYTAG(Lumen);

    FLumenSceneData& LumenSceneData = *Scene->LumenSceneData;

    {
        FLumenCardLightingEmissive* PassParameters = GraphBuilder.AllocParameters<FLumenCardLightingEmissive>();
        
        extern int32 GLumenRadiosityDownsampleFactor;
        FVector2D CardUVSamplingOffset = FVector2D::ZeroVector;
        if (GLumenRadiosityDownsampleFactor > 1)
        {
            // Offset bilinear samples in order to not sample outside of the lower res radiosity card bounds
            CardUVSamplingOffset.X = (GLumenRadiosityDownsampleFactor * 0.25f) / LumenSceneData.MaxAtlasSize.X;
            CardUVSamplingOffset.Y = (GLumenRadiosityDownsampleFactor * 0.25f) / LumenSceneData.MaxAtlasSize.Y;
        }

        PassParameters->RenderTargets[0] = FRenderTargetBinding(FinalLightingAtlas, ERenderTargetLoadAction::ENoAction);
        PassParameters->VS.LumenCardScene = LumenCardSceneUniformBuffer;
        PassParameters->VS.CardScatterParameters = VisibleCardScatterContext.Parameters;
        PassParameters->VS.ScatterInstanceIndex = 0;
        PassParameters->VS.CardUVSamplingOffset = CardUVSamplingOffset;
        PassParameters->PS.View = View.ViewUniformBuffer;
        PassParameters->PS.LumenCardScene = LumenCardSceneUniformBuffer;
        PassParameters->PS.RadiosityAtlas = RadiosityAtlas;
        PassParameters->PS.OpacityAtlas = OpacityAtlas;

        // Add the lighting combine pass, which uses the traditional rasterization pipeline.
        GraphBuilder.AddPass(
            RDG_EVENT_NAME("LightingCombine"),
            PassParameters,
            ERDGPassFlags::Raster,
            [MaxAtlasSize = Scene->LumenSceneData->MaxAtlasSize, PassParameters, GlobalShaderMap](FRHICommandListImmediate& RHICmdList)
        {
            FLumenCardLightingInitializePS::FPermutationDomain PermutationVector;
            auto PixelShader = GlobalShaderMap->GetShader< FLumenCardLightingInitializePS >(PermutationVector);

            DrawQuadsToAtlas(MaxAtlasSize, PixelShader, PassParameters, GlobalShaderMap, TStaticBlendState<>::GetRHI(), RHICmdList);
        });
    }
}

This stage takes the scene radiosity atlas from the previous section as input and writes the radiosity color to SceneFinalLighting.
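
The only arithmetic in the pass setup is the bilinear sampling offset. The C++ sketch below restates it as a standalone function; ComputeCardUVSamplingOffset and Vec2 are hypothetical names. If the radiosity atlas is stored at 1 / factor of the full resolution (an assumption based on the variable name GLumenRadiosityDownsampleFactor), then factor * 0.25 full-resolution texels corresponds to a quarter of a low-resolution texel in atlas UV space.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Hypothetical standalone restatement of the offset computed in
// CombineLumenSceneLighting. Returns zero when no downsampling is used.
Vec2 ComputeCardUVSamplingOffset(int downsampleFactor, int atlasWidth, int atlasHeight)
{
    Vec2 offset = {0.0f, 0.0f};
    if (downsampleFactor > 1)
    {
        offset.x = (downsampleFactor * 0.25f) / static_cast<float>(atlasWidth);
        offset.y = (downsampleFactor * 0.25f) / static_cast<float>(atlasHeight);
    }
    return offset;
}
```

For a downsample factor of 2 and a 4096-wide atlas, the horizontal offset is 0.5 / 4096, that is, half a full-resolution texel.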

6.5.6.5 RenderDirectLightingForLumenScene

RenderDirectLightingForLumenScene computes the direct lighting of the Lumen scene. Its flow is somewhat similar to traditional lighting:

// Engine\Source\Runtime\Renderer\Private\Lumen\LumenSceneDirectLighting.cpp

void FDeferredShadingSceneRenderer::RenderDirectLightingForLumenScene(
    FRDGBuilder& GraphBuilder,
    TRDGUniformBufferRef<FLumenCardScene> LumenCardSceneUniformBuffer,
    FRDGTextureRef FinalLightingAtlas,
    FRDGTextureRef OpacityAtlas,
    FGlobalShaderMap* GlobalShaderMap,
    const FLumenCardScatterContext& VisibleCardScatterContext)
{
    LLM_SCOPE_BYTAG(Lumen);

    if (GLumenDirectLighting)
    {
        RDG_EVENT_SCOPE(GraphBuilder, "DirectLighting");
        QUICK_SCOPE_CYCLE_COUNTER(RenderDirectLightingForLumenScene);

        const FViewInfo& MainView = Views[0];
        FLumenSceneData& LumenSceneData = *Scene->LumenSceneData;
        const bool bLumenUseHardwareRayTracedShadow = Lumen::UseHardwareRayTracedShadows(MainView);
        FLumenDirectLightingHardwareRayTracingData LumenDirectLightingHardwareRayTracingData;
        
        if(bLumenUseHardwareRayTracedShadow)
        {
            LumenDirectLightingHardwareRayTracingData.Initialize(GraphBuilder, Scene);
        }

        TArray<const FLightSceneInfo*, TInlineAllocator<64>> GatheredLocalLights;

        // Iterate over all lights in the scene.
        for (TSparseArray<FLightSceneInfoCompact>::TConstIterator LightIt(Scene->Lights); LightIt; ++LightIt)
        {
            const FLightSceneInfoCompact& LightSceneInfoCompact = *LightIt;
            const FLightSceneInfo* LightSceneInfo = LightSceneInfoCompact.LightSceneInfo;

            if (LightSceneInfo->ShouldRenderLightViewIndependent()
                && LightSceneInfo->ShouldRenderLight(MainView, true)
                && LightSceneInfo->Proxy->GetIndirectLightingScale() > 0.0f)
            {
                const ELightComponentType LightType = (ELightComponentType)LightSceneInfo->Proxy->GetLightType();

                // Directional light.
                if (LightType == LightType_Directional)
                {
                    // No culling needed, draw directly.

                    FString LightNameWithLevel;
                    FSceneRenderer::GetLightNameForDrawEvent(LightSceneInfo->Proxy, LightNameWithLevel);

                    // Render the direct light into the Lumen cards.
                    RenderDirectLightIntoLumenCards(
                        GraphBuilder,
                        Scene,
                        MainView,
                        ViewFamily.EngineShowFlags,
                        VisibleLightInfos,
                        LumenCardSceneUniformBuffer,
                        FinalLightingAtlas,
                        OpacityAtlas,
                        LightSceneInfo,
                        LightNameWithLevel,
                        VisibleCardScatterContext,
                        0,
                        LumenDirectLightingHardwareRayTracingData,
                        VirtualShadowMapArray);
                }
                else // Non-directional light: gather it into GatheredLocalLights.
                {
                    GatheredLocalLights.Add(LightSceneInfo);
                }
            }
        }

        const int32 LightBatchSize = FMath::Clamp(GLumenDirectLightingBatchSize, 1, 256);

        // Cull and draw the local lights in batches.
        for (int32 LightBatchIndex = 0; LightBatchIndex * LightBatchSize < GatheredLocalLights.Num(); ++LightBatchIndex)
        {
            const int32 FirstLightIndex = LightBatchIndex * LightBatchSize;
            const int32 LastLightIndex = FMath::Min((LightBatchIndex + 1) * LightBatchSize, GatheredLocalLights.Num());

            FLumenCardScatterContext CardScatterContext;

            {
                RDG_EVENT_SCOPE(GraphBuilder, "Cull Cards %d Lights", LastLightIndex - FirstLightIndex);

                // Initialize the scatter context.
                CardScatterContext.Init(
                    GraphBuilder,
                    MainView,
                    LumenSceneData,
                    LumenCardRenderer,
                    ECullCardsMode::OperateOnSceneForceUpdateForCardsToRender,
                    LightBatchSize);

                // Cull the cards against the shape of each light in the batch.
                for (int32 LightIndex = FirstLightIndex; LightIndex < LastLightIndex; ++LightIndex)
                {
                    const int32 ScatterInstanceIndex = LightIndex - FirstLightIndex;
                    const FLightSceneInfo* LightSceneInfo = GatheredLocalLights[LightIndex];
                    const ELightComponentType LightType = (ELightComponentType)LightSceneInfo->Proxy->GetLightType();
                    const FSphere LightBounds = LightSceneInfo->Proxy->GetBoundingSphere();

                    ECullCardsShapeType ShapeType = ECullCardsShapeType::None;

                    if (LightType == LightType_Point)
                    {
                        ShapeType = ECullCardsShapeType::PointLight;
                    }
                    else if (LightType == LightType_Spot)
                    {
                        ShapeType = ECullCardsShapeType::SpotLight;
                    }
                    else if (LightType == LightType_Rect)
                    {
                        ShapeType = ECullCardsShapeType::RectLight;
                    }
                    else
                    {
                        ensureMsgf(false, TEXT("Need Lumen card culling for new light type"));
                    }

                    FCullCardsShapeParameters ShapeParameters;
                    ShapeParameters.InfluenceSphere = FVector4(LightBounds.Center, LightBounds.W);
                    ShapeParameters.LightPosition = LightSceneInfo->Proxy->GetPosition();
                    ShapeParameters.LightDirection = LightSceneInfo->Proxy->GetDirection();
                    ShapeParameters.LightRadius = LightSceneInfo->Proxy->GetRadius();
                    ShapeParameters.CosConeAngle = FMath::Cos(LightSceneInfo->Proxy->GetOuterConeAngle());
                    ShapeParameters.SinConeAngle = FMath::Sin(LightSceneInfo->Proxy->GetOuterConeAngle());

                    // Cull the cards against the light shape.
                    CardScatterContext.CullCardsToShape(
                        GraphBuilder,
                        MainView,
                        LumenSceneData,
                        LumenCardRenderer,
                        LumenCardSceneUniformBuffer,
                        ShapeType,
                        ShapeParameters,
                        GLumenSceneCardDirectLightingUpdateFrequencyScale,
                        ScatterInstanceIndex);
                }

                // Build the indirect draw arguments for the scatter.
                CardScatterContext.BuildScatterIndirectArgs(
                    GraphBuilder,
                    MainView);
            }

            // Draw the non-directional lights.
            {
                RDG_EVENT_SCOPE(GraphBuilder, "Draw %d Lights", LastLightIndex - FirstLightIndex);

                for (int32 LightIndex = FirstLightIndex; LightIndex < LastLightIndex; ++LightIndex)
                {
                    const int32 ScatterInstanceIndex = LightIndex - FirstLightIndex;
                    const FLightSceneInfo* LightSceneInfo = GatheredLocalLights[LightIndex];

                    FString LightNameWithLevel;
                    FSceneRenderer::GetLightNameForDrawEvent(LightSceneInfo->Proxy, LightNameWithLevel);

                    // Render the non-directional light into the Lumen cards.
                    RenderDirectLightIntoLumenCards(
                        GraphBuilder,
                        Scene,
                        MainView,
                        ViewFamily.EngineShowFlags,
                        VisibleLightInfos,
                        LumenCardSceneUniformBuffer,
                        FinalLightingAtlas,
                        OpacityAtlas,
                        LightSceneInfo,
                        LightNameWithLevel,
                        CardScatterContext,
                        ScatterInstanceIndex,
                        LumenDirectLightingHardwareRayTracingData,
                        VirtualShadowMapArray);
                }
            }
        }
    }
}
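
The batching loop above splits the gathered local lights into half-open index ranges, each culled once and then drawn light by light. A minimal C++ sketch of the same index arithmetic follows. MakeLightBatches is a hypothetical helper, and the clamp to [1, 256] mirrors the FMath::Clamp call in the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns the half-open [first, last) light index range of every batch.
std::vector<std::pair<int, int>> MakeLightBatches(int numLights, int requestedBatchSize)
{
    // Same clamp as the listing: at least 1 light, at most 256 lights per batch.
    const int batchSize = std::min(std::max(requestedBatchSize, 1), 256);

    std::vector<std::pair<int, int>> batches;
    for (int batchIndex = 0; batchIndex * batchSize < numLights; ++batchIndex)
    {
        const int first = batchIndex * batchSize;
        const int last = std::min((batchIndex + 1) * batchSize, numLights);
        batches.emplace_back(first, last);
    }
    return batches;
}
```

Ten lights with a batch size of four produce the ranges [0, 4), [4, 8) and [8, 10). Inside a batch, the scatter instance index is simply the light index minus the first index of the batch.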

Below is the code of RenderDirectLightIntoLumenCards, which draws a single light:

void RenderDirectLightIntoLumenCards(
    FRDGBuilder& GraphBuilder,
    const FScene* Scene,
    const FViewInfo& View,
    const FEngineShowFlags& EngineShowFlags,
    TArray<FVisibleLightInfo, SceneRenderingAllocator>& VisibleLightInfos,
    TRDGUniformBufferRef<FLumenCardScene> LumenCardSceneUniformBuffer,
    FRDGTextureRef FinalLightingAtlas,
    FRDGTextureRef OpacityAtlas,
    const FLightSceneInfo* LightSceneInfo,
    const FString& LightName,
    const FLumenCardScatterContext& CardScatterContext,
    int32 ScatterInstanceIndex,
    FLumenDirectLightingHardwareRayTracingData& LumenDirectLightingHardwareRayTracingData,
    const FVirtualShadowMapArray& VirtualShadowMapArray)
{
    FLumenSceneData& LumenSceneData = *Scene->LumenSceneData;
    const FSphere LightBounds = LightSceneInfo->Proxy->GetBoundingSphere();
    const ELightComponentType LightType = (ELightComponentType)LightSceneInfo->Proxy->GetLightType();
    bool bShadowed = LightSceneInfo->Proxy->CastsDynamicShadow();

    // Convert the light type.
    ELumenLightType LumenLightType = ELumenLightType::MAX;
    {
        switch (LightType)
        {
        case LightType_Directional: LumenLightType = ELumenLightType::Directional;    break;
        case LightType_Point:        LumenLightType = ELumenLightType::Point;        break;
        case LightType_Spot:        LumenLightType = ELumenLightType::Spot;            break;
        case LightType_Rect:        LumenLightType = ELumenLightType::Rect;            break;
        }
        check(LumenLightType != ELumenLightType::MAX);
    }

    // Set up the shadow information.
    FVisibleLightInfo& VisibleLightInfo = VisibleLightInfos[LightSceneInfo->Id];
    FLumenShadowSetup ShadowSetup = GetShadowForLumenDirectLighting(VisibleLightInfo);

    const bool bDynamicallyShadowed = ShadowSetup.DenseShadowMap != nullptr;

    FDistanceFieldObjectBufferParameters ObjectBufferParameters = DistanceField::SetupObjectBufferParameters(Scene->DistanceFieldSceneData);

    FLightTileIntersectionParameters LightTileIntersectionParameters;
    FDistanceFieldCulledObjectBufferParameters CulledObjectBufferParameters;
    FMatrix WorldToMeshSDFShadowValue = FMatrix::Identity;

    const bool bLumenUseHardwareRayTracedShadow = Lumen::UseHardwareRayTracedShadows(View) && bShadowed;
    const bool bTraceMeshSDFs = bShadowed 
        && LumenLightType == ELumenLightType::Directional 
        && DoesPlatformSupportDistanceFieldShadowing(View.GetShaderPlatform())
        && GLumenDirectLightingOffscreenShadowingTraceMeshSDFs != 0
        && Lumen::UseMeshSDFTracing()
        && ObjectBufferParameters.NumSceneObjects > 0;

    // Resolve the virtual shadow map ID.
    int32 VirtualShadowMapId = -1;
    if (bDynamicallyShadowed
        && !bLumenUseHardwareRayTracedShadow
        && GLumenDirectLightingVirtualShadowMap != 0
        && VirtualShadowMapArray.IsAllocated())
    {
        if (LightType == LightType_Directional)
        {
            VirtualShadowMapId = VisibleLightInfo.VirtualShadowMapClipmaps[0]->GetVirtualShadowMap()->ID;
        }
        else if (ShadowSetup.VirtualShadowMap)
        {
            VirtualShadowMapId = ShadowSetup.VirtualShadowMap->VirtualShadowMaps[0]->ID;
        }
    }

    const bool bUseVirtualShadowMap = VirtualShadowMapId >= 0;
    if (!bUseVirtualShadowMap)
    {
        // Fallback to a complete shadow map
        ShadowSetup.VirtualShadowMap = nullptr;
        ShadowSetup.DenseShadowMap = GetShadowForInjectionIntoVolumetricFog(VisibleLightInfo);
    }

    if (bLumenUseHardwareRayTracedShadow)
    {
        RenderHardwareRayTracedShadowIntoLumenCards(
            GraphBuilder, Scene, View, LumenCardSceneUniformBuffer, OpacityAtlas, 
            LightSceneInfo, LightName, CardScatterContext, ScatterInstanceIndex,
            LumenDirectLightingHardwareRayTracingData, bDynamicallyShadowed, LumenLightType);
    }
    else if (bTraceMeshSDFs)
    {
        CullMeshSDFsForLightCards(GraphBuilder, Scene, View, LightSceneInfo, ObjectBufferParameters, WorldToMeshSDFShadowValue, CulledObjectBufferParameters, LightTileIntersectionParameters);
    }

    FLumenCardDirectLighting* PassParameters = GraphBuilder.AllocParameters<FLumenCardDirectLighting>();
    {
        PassParameters->RenderTargets[0] = FRenderTargetBinding(FinalLightingAtlas, ERenderTargetLoadAction::ELoad);
        PassParameters->VS.InfluenceSphere = FVector4(LightBounds.Center, LightBounds.W);
        PassParameters->VS.LumenCardScene = LumenCardSceneUniformBuffer;
        PassParameters->VS.CardScatterParameters = CardScatterContext.Parameters;
        PassParameters->VS.ScatterInstanceIndex = ScatterInstanceIndex;
        PassParameters->VS.CardUVSamplingOffset = FVector2D::ZeroVector;

        // Get the volume shadowing shader parameters.
        GetVolumeShadowingShaderParameters(
            GraphBuilder,
            View,
            LightSceneInfo,
            ShadowSetup.DenseShadowMap,
            0,
            bDynamicallyShadowed,
            PassParameters->PS.VolumeShadowingShaderParameters);

        // Deferred light uniform parameters.
        FDeferredLightUniformStruct DeferredLightUniforms = GetDeferredLightParameters(View, *LightSceneInfo);

        if (LightSceneInfo->Proxy->IsInverseSquared())
        {
            DeferredLightUniforms.LightParameters.FalloffExponent = 0;
        }

        PassParameters->PS.View = View.ViewUniformBuffer;
        PassParameters->PS.LumenCardScene = LumenCardSceneUniformBuffer;
        PassParameters->PS.OpacityAtlas = OpacityAtlas;
        DeferredLightUniforms.LightParameters.Color *= LightSceneInfo->Proxy->GetIndirectLightingScale();
        PassParameters->PS.DeferredLightUniforms = CreateUniformBufferImmediate(DeferredLightUniforms, UniformBuffer_SingleDraw);
        PassParameters->PS.ForwardLightData = View.ForwardLightingResources->ForwardLightDataUniformBuffer;
        SetupLightFunctionParameters(LightSceneInfo, 1.0f, PassParameters->PS.LightFunctionParameters);

        PassParameters->PS.VirtualShadowMapId = VirtualShadowMapId;
        if (bUseVirtualShadowMap)
        {
            PassParameters->PS.VirtualShadowMapSamplingParameters = VirtualShadowMapArray.GetSamplingParameters(GraphBuilder);
        }
        
        PassParameters->PS.ObjectBufferParameters = ObjectBufferParameters;
        PassParameters->PS.CulledObjectBufferParameters = CulledObjectBufferParameters;
        PassParameters->PS.LightTileIntersectionParameters = LightTileIntersectionParameters;

        FDistanceFieldAtlasParameters DistanceFieldAtlasParameters = DistanceField::SetupAtlasParameters(Scene->DistanceFieldSceneData);

        // Distance field atlas.
        PassParameters->PS.DistanceFieldAtlasParameters = DistanceFieldAtlasParameters;
        PassParameters->PS.WorldToShadow = WorldToMeshSDFShadowValue;
        extern float GTwoSidedMeshDistanceBias;
        PassParameters->PS.TwoSidedMeshDistanceBias = GTwoSidedMeshDistanceBias;

        PassParameters->PS.TanLightSourceAngle = FMath::Tan(LightSceneInfo->Proxy->GetLightSourceAngle());
        PassParameters->PS.MaxTraceDistance = GOffscreenShadowingMaxTraceDistance;
        PassParameters->PS.StepFactor = FMath::Clamp(GOffscreenShadowingTraceStepFactor, .1f, 10.0f);
        PassParameters->PS.SurfaceBias = FMath::Clamp(GShadowingSurfaceBias, .01f, 100.0f);
        PassParameters->PS.SlopeScaledSurfaceBias = FMath::Clamp(GShadowingSlopeScaledSurfaceBias, .01f, 100.0f);
        PassParameters->PS.SDFSurfaceBiasScale = FMath::Clamp(GOffscreenShadowingSDFSurfaceBiasScale, .01f, 100.0f);
        PassParameters->PS.VirtualShadowMapSurfaceBias = FMath::Clamp(GLumenDirectLightingVirtualShadowMapBias, .01f, 100.0f);
        PassParameters->PS.ForceOffscreenShadowing = GLumenDirectLightingForceOffscreenShadowing;

        if (bLumenUseHardwareRayTracedShadow)
        {
            PassParameters->PS.ShadowMaskAtlas = LumenDirectLightingHardwareRayTracingData.ShadowMaskAtlas;
        }

        // IES
        {
            FTexture* IESTextureResource = LightSceneInfo->Proxy->GetIESTextureResource();

            if (View.Family->EngineShowFlags.TexturedLightProfiles && IESTextureResource)
            {
                PassParameters->PS.UseIESProfile = 1;
                PassParameters->PS.IESTexture = IESTextureResource->TextureRHI;
            }
            else
            {
                PassParameters->PS.UseIESProfile = 0;
                PassParameters->PS.IESTexture = GWhiteTexture->TextureRHI;
            }

            PassParameters->PS.IESTextureSampler = TStaticSamplerState<SF_Bilinear,AM_Clamp,AM_Clamp,AM_Clamp>::GetRHI();
        }
    }

    FRasterizeToCardsVS::FPermutationDomain VSPermutationVector;
    VSPermutationVector.Set< FRasterizeToCardsVS::FClampToInfluenceSphere >(LightType != LightType_Directional);
    auto VertexShader = View.ShaderMap->GetShader<FRasterizeToCardsVS>(VSPermutationVector);
    const FMaterialRenderProxy* LightFunctionMaterialProxy = LightSceneInfo->Proxy->GetLightFunctionMaterial();
    bool bUseLightFunction = true;

    if (!LightFunctionMaterialProxy
        || !LightFunctionMaterialProxy->GetIncompleteMaterialWithFallback(Scene->GetFeatureLevel()).IsLightFunction()
        || !EngineShowFlags.LightFunctions)
    {
        bUseLightFunction = false;
        LightFunctionMaterialProxy = UMaterial::GetDefaultMaterial(MD_LightFunction)->GetRenderProxy();
    }

    const bool bUseCloudTransmittance = SetupLightCloudTransmittanceParameters(Scene, View, GLumenDirectLightingCloudTransmittance != 0 ? LightSceneInfo : nullptr, PassParameters->PS.LightCloudTransmittanceParameters);

    // Set up the shader permutation.
    FLumenCardDirectLightingPS::FPermutationDomain PermutationVector;
    PermutationVector.Set< FLumenCardDirectLightingPS::FLightType >(LumenLightType);
    PermutationVector.Set< FLumenCardDirectLightingPS::FDynamicallyShadowed >(bDynamicallyShadowed);
    PermutationVector.Set< FLumenCardDirectLightingPS::FShadowed >(bShadowed);
    PermutationVector.Set< FLumenCardDirectLightingPS::FTraceMeshSDFs >(bTraceMeshSDFs);
    PermutationVector.Set< FLumenCardDirectLightingPS::FVirtualShadowMap >(bUseVirtualShadowMap);
    PermutationVector.Set< FLumenCardDirectLightingPS::FLightFunction >(bUseLightFunction);
    PermutationVector.Set< FLumenCardDirectLightingPS::FRayTracingShadowPassCombine>(bLumenUseHardwareRayTracedShadow);
    PermutationVector.Set< FLumenCardDirectLightingPS::FCloudTransmittance >(bUseCloudTransmittance);
    
    PermutationVector = FLumenCardDirectLightingPS::RemapPermutation(PermutationVector);

    const FMaterial& Material = LightFunctionMaterialProxy->GetMaterialWithFallback(Scene->GetFeatureLevel(), LightFunctionMaterialProxy);
    const FMaterialShaderMap* MaterialShaderMap = Material.GetRenderingThreadShaderMap();
    auto PixelShader = MaterialShaderMap->GetShader<FLumenCardDirectLightingPS>(PermutationVector);

    ClearUnusedGraphResources(PixelShader, &PassParameters->PS);

    const uint32 CardIndirectArgOffset = CardScatterContext.GetIndirectArgOffset(ScatterInstanceIndex);

    // Lighting draw pass.
    GraphBuilder.AddPass(
        RDG_EVENT_NAME("%s %s", *LightName, bDynamicallyShadowed ? TEXT("Shadowmap") : TEXT("")),
        PassParameters,
        ERDGPassFlags::Raster,
        [MaxAtlasSize = LumenSceneData.MaxAtlasSize, PassParameters, LightSceneInfo, VertexShader, PixelShader, GlobalShaderMap = View.ShaderMap, LightFunctionMaterialProxy, &Material, &View, CardIndirectArgOffset](FRHICommandListImmediate& RHICmdList)
        {
            DrawQuadsToAtlas(
                MaxAtlasSize,
                VertexShader,
                PixelShader,
                PassParameters,
                GlobalShaderMap,
                TStaticBlendState<CW_RGBA, BO_Add, BF_One, BF_One>::GetRHI(),
                RHICmdList,
                [LightFunctionMaterialProxy, &Material, &View](FRHICommandListImmediate& RHICmdList, TShaderRefBase<FLumenCardDirectLightingPS, FShaderMapPointerTable> Shader, FRHIPixelShader* ShaderRHI, const FLumenCardDirectLightingPS::FParameters& Parameters)
                {
                    Shader->SetParameters(RHICmdList, ShaderRHI, LightFunctionMaterialProxy, Material, View);
                },
                CardIndirectArgOffset);
        });
}
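
The shadowing flags in RenderDirectLightIntoLumenCards interact in a way that is easy to lose in the full listing. The C++ sketch below condenses them into a small pure function. It is a simplification under stated assumptions: LightShadowInputs, ShadowPermutation and SelectShadowPermutation are hypothetical names, several engine checks are collapsed into a single boolean each, and the final RemapPermutation step is not modeled.

```cpp
#include <cassert>

struct LightShadowInputs
{
    bool castsDynamicShadow;        // LightSceneInfo->Proxy->CastsDynamicShadow()
    bool hardwareRayTracedShadows;  // Lumen::UseHardwareRayTracedShadows(View)
    bool hasDenseShadowMap;         // ShadowSetup.DenseShadowMap != nullptr
    bool virtualShadowMapUsable;    // cvar on, array allocated, valid ID (collapsed)
    bool isDirectional;             // LightType == LightType_Directional
    bool meshSdfTracingUsable;      // platform support, cvars, SDF objects present (collapsed)
};

struct ShadowPermutation
{
    bool shadowed;
    bool dynamicallyShadowed;
    bool rayTracedShadowPassCombine;
    bool virtualShadowMap;
    bool traceMeshSDFs;
};

ShadowPermutation SelectShadowPermutation(const LightShadowInputs& in)
{
    ShadowPermutation out{};
    out.shadowed = in.castsDynamicShadow;
    out.dynamicallyShadowed = in.hasDenseShadowMap;

    // Hardware ray traced shadows are only used for shadow-casting lights.
    out.rayTracedShadowPassCombine = in.hardwareRayTracedShadows && in.castsDynamicShadow;

    // Virtual shadow maps are skipped whenever the ray traced path is active.
    out.virtualShadowMap = out.dynamicallyShadowed
        && !out.rayTracedShadowPassCombine
        && in.virtualShadowMapUsable;

    // Mesh SDF tracing is restricted to shadow-casting directional lights.
    out.traceMeshSDFs = in.castsDynamicShadow
        && in.isDirectional
        && in.meshSdfTracingUsable;
    return out;
}
```

When the virtual shadow map flag ends up false, the listing falls back to the dense shadow map returned by GetShadowForInjectionIntoVolumetricFog. The light's contribution is then added on top of FinalLightingAtlas through the additive blend state (BF_One, BF_One) with the render target loaded rather than cleared.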

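A frame capture of the direct lighting stage shows the following flow:

The texture inputs to the lighting computation vary by light type, but every light type reads depth, normal, Opacity and similar data. The difference is that local (non-directional) lights also read distance-field-related textures and a 16x16x16 Perlin noise 3D texture, whereas directional lights read a 128x128x128 3D texture, VolumeTexture (the figure below shows slice 0 magnified 4x):

After the lighting computation, the output looks like this:

The pixel shader used by the direct lighting computation is shown below:

// Engine\Shaders\Private\Lumen\LumenSceneDirectLighting.usf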

void LumenCardDirectLightingPS(
    FCardVSToPS CardInterpolants,
    out float4 OutColor : SV_Target0)
{
    float Opacity = Texture2DSampleLevel(OpacityAtlas, GlobalBilinearClampedSampler, CardInterpolants.AtlasCoord, 0).x;
    float3 Irradiance = 0;

    if (Opacity > 0)
    {
        // Build the light data.
        FDeferredLightData LightData;
        {
            LightData.Position = DeferredLightUniforms.Position;
            LightData.InvRadius = DeferredLightUniforms.InvRadius;
            LightData.Color = DeferredLightUniforms.Color;
            LightData.FalloffExponent = DeferredLightUniforms.FalloffExponent;
            LightData.Direction = DeferredLightUniforms.Direction;  
            LightData.Tangent = DeferredLightUniforms.Tangent;
            LightData.SpotAngles = DeferredLightUniforms.SpotAngles;
            LightData.SourceRadius = DeferredLightUniforms.SourceRadius;
            LightData.SourceLength = DeferredLightUniforms.SourceLength;
            LightData.SoftSourceRadius = DeferredLightUniforms.SoftSourceRadius;
            LightData.SpecularScale = DeferredLightUniforms.SpecularScale;
            LightData.ContactShadowLength = abs(DeferredLightUniforms.ContactShadowLength);
            LightData.ContactShadowLengthInWS = DeferredLightUniforms.ContactShadowLength < 0.0f;
            LightData.DistanceFadeMAD = DeferredLightUniforms.DistanceFadeMAD;
            LightData.ShadowMapChannelMask = DeferredLightUniforms.ShadowMapChannelMask;
            LightData.ShadowedBits = DeferredLightUniforms.ShadowedBits;
            LightData.RectLightBarnCosAngle = DeferredLightUniforms.RectLightBarnCosAngle;
            LightData.RectLightBarnLength = DeferredLightUniforms.RectLightBarnLength;

            LightData.bInverseSquared = LightData.FalloffExponent == 0.0f;
            LightData.bRadialLight = LIGHT_TYPE != LIGHT_TYPE_DIRECTIONAL;
            LightData.bSpotLight = LIGHT_TYPE == LIGHT_TYPE_SPOT;
            LightData.bRectLight = LIGHT_TYPE == LIGHT_TYPE_RECT;
        }

        // Fetch the Lumen card data.
        FLumenCardData LumenCardData = GetLumenCardData(CardInterpolants.CardId);

        float Depth = 1.0f - Texture2DSampleLevel(LumenCardScene.DepthAtlas, GlobalBilinearClampedSampler, CardInterpolants.AtlasCoord, 0).x;

        // Reconstruct the card-local position, then the world position.
        float3 LocalPosition;
        LocalPosition.xy = (CardInterpolants.AtlasCoord - LumenCardData.LocalPositionToAtlasUVBias) / LumenCardData.LocalPositionToAtlasUVScale;
        LocalPosition.z = -LumenCardData.LocalExtent.z + Depth * 2 * LumenCardData.LocalExtent.z;

        float3 WorldPosition = mul(LumenCardData.WorldToLocalRotation, LocalPosition) + LumenCardData.Origin;

        float3 LightColor = DeferredLightUniforms.Color;
        float3 L = LightData.Direction;
        float3 ToLight = L;
    
        // Compute the light attenuation.
#if LIGHT_TYPE == LIGHT_TYPE_DIRECTIONAL
        float CombinedAttenuation = 1;
#else
        float LightMask = 1;
        if (LightData.bRadialLight)
        {
            LightMask = GetLocalLightAttenuation(WorldPosition, LightData, ToLight, L);
        }

        float Attenuation;

        if (LightData.bRectLight)
        {
            FRect Rect = GetRect(ToLight, LightData);
            FRectTexture RectTexture = InitRectTexture(DeferredLightUniforms.SourceTexture);
            Attenuation = IntegrateLight(Rect, RectTexture);
        }
        else
        {
            FCapsuleLight Capsule = GetCapsule(ToLight, LightData);
            Capsule.DistBiasSqr = 0;
            Attenuation = IntegrateLight(Capsule, LightData.bInverseSquared);
        }

        float CombinedAttenuation = Attenuation * LightMask;
#endif

        if (CombinedAttenuation > 0)
        {
            float3 WorldNormal = Texture2DSampleLevel(LumenCardScene.NormalAtlas, GlobalBilinearClampedSampler, CardInterpolants.AtlasCoord, 0).xyz * 2 - 1;

                // Only surfaces facing the light are lit.
            if (dot(WorldNormal, L) > 0)
            {
                float ShadowFactor = 1.0f;

                #if SHADOWED_LIGHT  // Shadowed light
                {
                    // Hardware ray-traced shadows
                    #if HARDWARE_RAYTRACING_SHADOW_PASS_COMBINE
                    {
                        float2 AtlasTextureSize = LumenCardScene.AtlasSize;
                        uint2 Pos2D = CardInterpolants.AtlasCoord * AtlasTextureSize.xy - float2(0.5, 0.5) / AtlasTextureSize.xy;
                        ShadowFactor = ShadowMaskAtlas.Load(uint3(Pos2D, 0));
                    }
                    #else // Not hardware ray-traced shadows
                    {
                        bool bShadowFactorComplete = false;
                        bool bVSMValid = false;

                        // Use virtual shadow maps
                        #if VIRTUAL_SHADOW_MAP
                        {
                            // Bias only ray start to maximize chances of hitting an allocated page
                            FVirtualShadowMapSampleResult VirtualShadowMapSample = SampleVirtualShadowMap(VirtualShadowMapId, WorldPosition, VirtualShadowMapSurfaceBias, WorldNormal);

                            bVSMValid = VirtualShadowMapSample.bValid;
                            bShadowFactorComplete = VirtualShadowMapSample.bValid && VirtualShadowMapSample.bOccluded;
                            ShadowFactor = VirtualShadowMapSample.ShadowFactor;
                        }
                        #endif

                        // Compute the shadow factor (ShadowFactor).
                        if (!bShadowFactorComplete)
                        {
                            float3 WorldPositionForShadowing = GetWorldPositionForShadowing(WorldPosition, L, WorldNormal, 1.0f);

                            #if LIGHT_TYPE == LIGHT_TYPE_DIRECTIONAL
                            {
                                #if DYNAMICALLY_SHADOWED
                                    float SceneDepth = dot(WorldPositionForShadowing - View.WorldCameraOrigin, View.ViewForward);

                                    bool bShadowingFromValidUVArea = false;
                                    float NewShadowFactor = ComputeDirectionalLightDynamicShadowing(WorldPositionForShadowing, SceneDepth, bShadowingFromValidUVArea);

                                    float4 PostProjectionPosition = mul(float4(WorldPosition, 1.0), View.WorldToClip);
                                    // CSM's are culled so only query points inside the view are valid
                                    float2 ValidTexelSize = float2(length(ddx(WorldPosition)), length(ddy(WorldPosition))) * 2;
                                    if (bShadowingFromValidUVArea && all(PostProjectionPosition.xy - ValidTexelSize < PostProjectionPosition.w&& PostProjectionPosition.xy + ValidTexelSize > -PostProjectionPosition.w))
                                    { 
                                        ShadowFactor *= NewShadowFactor;
                                        bShadowFactorComplete = VIRTUAL_SHADOW_MAP ? bVSMValid : true;
                                    }
                                #endif
                            }
                            #else
                            {
                                bool bShadowingFromValidUVArea = false;
                                float NewShadowFactor = ComputeVolumeShadowing(WorldPositionForShadowing, LightData.bRadialLight && !LightData.bSpotLight, LightData.bSpotLight, bShadowingFromValidUVArea);

                                if (bShadowingFromValidUVArea) 
                                {
                                    ShadowFactor *= NewShadowFactor;
                                    bShadowFactorComplete = VIRTUAL_SHADOW_MAP ? bVSMValid : true;
                                }
                            }
                            #endif
                        }

                        // Handle offscreen shadowing.
                        bool bOffscreenShadowing = !bShadowFactorComplete;
                        if (ForceOffscreenShadowing != 0)
                        {
                            ShadowFactor = 1.0;
                            bOffscreenShadowing = true;
                        }

                        if (bOffscreenShadowing)
                        {
                            ShadowFactor *= TraceOffscreenShadows(WorldPosition, L, ToLight, WorldNormal);
                        }
                    }
                    #endif // End hardware/software shadow selection        
                }
                #endif // End ShadowLight

                // Light function
                #if LIGHT_FUNCTION
                    ShadowFactor *= GetLightFunction(WorldPosition);
                #endif

                // Cloud transmittance
                #if USE_CLOUD_TRANSMITTANCE
                {
                    float OutOpticalDepth = 0.0f;
                    ShadowFactor *= lerp(1.0f, GetCloudVolumetricShadow(WorldPosition, CloudShadowmapWorldToLightClipMatrix, CloudShadowmapFarDepthKm, CloudShadowmapTexture, CloudShadowmapSampler, OutOpticalDepth), CloudShadowmapStrength);
                }
                #endif

                // IES
                if (UseIESProfile > 0)
                {
                    ShadowFactor *= ComputeLightProfileMultiplier(WorldPosition, DeferredLightUniforms.Position, -DeferredLightUniforms.Direction, DeferredLightUniforms.Tangent);
                }

                // Final irradiance
                float NoL = saturate(dot(WorldNormal, L));
                Irradiance = LightColor * (CombinedAttenuation * NoL * ShadowFactor);
                //Irradiance = bShadowFactorValid ? float3(0, 1, 0) : float3(0.2f, 0.0f, 0.0f);
            }
        }
    }
        
    OutColor = float4(Irradiance, 0);
}
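
The two purely arithmetic steps of the shader above, rebuilding the card-local position from the atlas coordinate and stored depth, and folding attenuation, N·L, and the shadow factor into one irradiance scale, can be sketched on the CPU. This is a minimal sketch and not engine code: `Vec3`, `CardSketch`, `CardLocalPosition`, and `IrradianceScale` are hypothetical names, and the rotation into world space is left out.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };

static float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Hypothetical stand-in for the FLumenCardData fields the shader reads.
struct CardSketch
{
    float LocalExtentZ;        // LumenCardData.LocalExtent.z
    float UVScaleX, UVScaleY;  // LumenCardData.LocalPositionToAtlasUVScale
    float UVBiasX, UVBiasY;    // LumenCardData.LocalPositionToAtlasUVBias
};

// Mirrors the LocalPosition computation: invert the atlas UV mapping for xy,
// then remap the inverted depth from [0, 1] to [-LocalExtentZ, +LocalExtentZ].
Vec3 CardLocalPosition(const CardSketch& card, float u, float v, float storedDepth)
{
    const float depth = 1.0f - storedDepth; // the shader reads 1 - DepthAtlas
    Vec3 p;
    p.x = (u - card.UVBiasX) / card.UVScaleX;
    p.y = (v - card.UVBiasY) / card.UVScaleY;
    p.z = -card.LocalExtentZ + depth * 2.0f * card.LocalExtentZ;
    return p;
}

// Mirrors the final combination: zero when unattenuated light never arrives
// or the surface faces away, otherwise attenuation * saturate(N.L) * shadow.
float IrradianceScale(const Vec3& normal, const Vec3& toLight,
                      float combinedAttenuation, float shadowFactor)
{
    const float nDotL = Dot(normal, toLight);
    if (combinedAttenuation <= 0.0f || nDotL <= 0.0f)
    {
        return 0.0f;
    }
    const float saturated = std::min(std::max(nDotL, 0.0f), 1.0f);
    return combinedAttenuation * saturated * shadowFactor;
}
```

Multiplying the returned scale by the light color gives the `Irradiance` value that the shader writes to `OutColor`.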

6.5.6.6 PrefilterLumenSceneLighting

This pass is similar to the Geometry Prefiltering mentioned in 6.5.6.1 Voxel Cone Tracing:

// Engine/Source/Runtime/Renderer/Private/Lumen/LumenScenePrefilter.cpp

void FDeferredShadingSceneRenderer::PrefilterLumenSceneLighting(
    FRDGBuilder& GraphBuilder,
    const FViewInfo& View,
    FLumenCardTracingInputs& TracingInputs,
    FGlobalShaderMap* GlobalShaderMap,
    const FLumenCardScatterContext& VisibleCardScatterContext)
{
    LLM_SCOPE_BYTAG(Lumen);
    RDG_EVENT_SCOPE(GraphBuilder, "Prefilter");

    FLumenSceneData& LumenSceneData = *Scene->LumenSceneData;

    // Derive the mip count from the atlas resolution.
    const int32 NumMips = FMath::CeilLogTwo(FMath::Max(LumenSceneData.MaxAtlasSize.X, LumenSceneData.MaxAtlasSize.Y)) + 1;
    {
        FIntPoint SrcSize = LumenSceneData.MaxAtlasSize;
        FIntPoint DestSize = SrcSize / 2;

        // Loop NumMips - 1 times (level 0 is the source texture itself), generating one mip per iteration.
        for (int32 MipIndex = 1; MipIndex < NumMips; MipIndex++)
        {
            SrcSize.X = FMath::Max(SrcSize.X, 1);
            SrcSize.Y = FMath::Max(SrcSize.Y, 1);
            DestSize.X = FMath::Max(DestSize.X, 1);
            DestSize.Y = FMath::Max(DestSize.Y, 1);

            FLumenCardPrefilterLighting* PassParameters = GraphBuilder.AllocParameters<FLumenCardPrefilterLighting>();
            
            // Bind up to three render targets: the final lighting atlas, the irradiance atlas, and the indirect irradiance atlas.
            PassParameters->RenderTargets[0] = FRenderTargetBinding(TracingInputs.FinalLightingAtlas, ERenderTargetLoadAction::ENoAction, MipIndex);
            bool bUseIrradianceAtlas = Lumen::UseIrradianceAtlas(View);
            bool bUseIndirectIrradianceAtlas = Lumen::UseIndirectIrradianceAtlas(View);
            if (bUseIrradianceAtlas)
            {
                PassParameters->RenderTargets[1] = FRenderTargetBinding(TracingInputs.IrradianceAtlas, ERenderTargetLoadAction::ENoAction, MipIndex);
                if (bUseIndirectIrradianceAtlas)
                {
                    PassParameters->RenderTargets[2] = FRenderTargetBinding(TracingInputs.IndirectIrradianceAtlas, ERenderTargetLoadAction::ENoAction, MipIndex);
                }
            }
            else if (bUseIndirectIrradianceAtlas)
            {
                PassParameters->RenderTargets[1] = FRenderTargetBinding(TracingInputs.IndirectIrradianceAtlas, ERenderTargetLoadAction::ENoAction, MipIndex);
            }
            PassParameters->VS.LumenCardScene = TracingInputs.LumenCardSceneUniformBuffer;
            PassParameters->VS.CardScatterParameters = VisibleCardScatterContext.Parameters;
            PassParameters->VS.ScatterInstanceIndex = 0;
            PassParameters->VS.CardUVSamplingOffset = FVector2D::ZeroVector;
            PassParameters->PS.View = View.ViewUniformBuffer;
            PassParameters->PS.LumenCardScene = TracingInputs.LumenCardSceneUniformBuffer;
            PassParameters->PS.ParentFinalLightingAtlas = GraphBuilder.CreateSRV(FRDGTextureSRVDesc::CreateForMipLevel(TracingInputs.FinalLightingAtlas, MipIndex - 1));
            // Note that the SRVs are created with CreateForMipLevel, so each pass reads only the parent mip (MipIndex - 1).
            if (bUseIrradianceAtlas)
            {
                PassParameters->PS.ParentIrradianceAtlas = GraphBuilder.CreateSRV(FRDGTextureSRVDesc::CreateForMipLevel(TracingInputs.IrradianceAtlas, MipIndex - 1));
            }
            if (bUseIndirectIrradianceAtlas)
            {
                PassParameters->PS.ParentIndirectIrradianceAtlas = GraphBuilder.CreateSRV(FRDGTextureSRVDesc::CreateForMipLevel(TracingInputs.IndirectIrradianceAtlas, MipIndex - 1));
            }
            PassParameters->PS.InvSize = FVector2D(1.0f / SrcSize.X, 1.0f / SrcSize.Y);

            FScene* LocalScene = Scene;

            // Add the prefilter pass.
            GraphBuilder.AddPass(
                RDG_EVENT_NAME("PrefilterMip"),
                PassParameters,
                ERDGPassFlags::Raster,
                [LocalScene, PassParameters, DestSize, GlobalShaderMap, bUseIrradianceAtlas, bUseIndirectIrradianceAtlas](FRHICommandListImmediate& RHICmdList)
            {
                FLumenCardPrefilterLightingPS::FPermutationDomain PermutationVector;
                PermutationVector.Set<FLumenCardPrefilterLightingPS::FUseIrradianceAtlas>(bUseIrradianceAtlas != 0);
                PermutationVector.Set<FLumenCardPrefilterLightingPS::FUseIndirectIrradianceAtlas>(bUseIndirectIrradianceAtlas != 0);
                auto PixelShader = GlobalShaderMap->GetShader< FLumenCardPrefilterLightingPS >(PermutationVector);
                DrawQuadsToAtlas(DestSize, PixelShader, PassParameters, GlobalShaderMap, TStaticBlendState<>::GetRHI(), RHICmdList);
            });

            SrcSize /= 2;
            DestSize /= 2;
        }
    }
}
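
The mip bookkeeping in the loop above can be reproduced in isolation to see how many passes a given atlas size produces. The sketch below is standalone C++ rather than engine code: `CeilLogTwo` is a local reimplementation of what `FMath::CeilLogTwo` is used for here, and `MipPass` and `BuildPrefilterPasses` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// ceil(log2(v)) for 1 <= v <= 2^31.
inline int32_t CeilLogTwo(uint32_t v)
{
    int32_t bits = 0;
    while (bits < 31 && (1u << bits) < v)
    {
        ++bits;
    }
    return bits;
}

// One entry per PrefilterMip pass: which mip is written and at what size.
struct MipPass
{
    int32_t MipIndex;
    int32_t SrcX, SrcY;
    int32_t DestX, DestY;
};

// Follows the loop structure above: mip 0 is the source atlas itself, so
// there are NumMips - 1 passes, and every size is clamped to at least 1.
std::vector<MipPass> BuildPrefilterPasses(int32_t atlasX, int32_t atlasY)
{
    const int32_t numMips =
        CeilLogTwo(static_cast<uint32_t>(std::max(atlasX, atlasY))) + 1;

    int32_t srcX = atlasX, srcY = atlasY;
    int32_t destX = srcX / 2, destY = srcY / 2;

    std::vector<MipPass> passes;
    for (int32_t mip = 1; mip < numMips; ++mip)
    {
        srcX = std::max(srcX, 1);
        srcY = std::max(srcY, 1);
        destX = std::max(destX, 1);
        destY = std::max(destY, 1);

        passes.push_back(MipPass{mip, srcX, srcY, destX, destY});

        srcX /= 2;
        srcY /= 2;
        destX /= 2;
        destY /= 2;
    }
    return passes;
}
```

For a 4096x4096 atlas this yields 13 mip levels and therefore 12 passes, the last of which writes a 1x1 mip. A non-square atlas such as 4096x2048 produces the same number of passes, because the clamp keeps the shorter axis at 1 texel once it runs out.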

The shader it uses is as follows:

// Engine/Shaders/Private/Lumen/LumenSceneLighting.usf

Texture2D ParentFinalLightingAtlas;
Texture2D ParentIrradianceAtlas;
Texture2D ParentIndirectIrradianceAtlas;

void LumenCardPrefilterLightingPS(
    FCardVSToPS CardInterpolants,
    out float4 OutLighting : SV_Target0,
    out float4 OutColor1 : SV_Target1,
    out float4 OutColor2 : SV_Target2)
{
    // Sample the parent mip directly with bilinear filtering to get this mip level's color; unlike section 6.5.6.1, no Gaussian weights are used.
    OutLighting = Texture2DSampleLevel(ParentFinalLightingAtlas, GlobalBilinearClampedSampler, CardInterpolants.AtlasCoord, 0);
#if USE_IRRADIANCE_ATLAS
    OutColor1 = Texture2DSampleLevel(ParentIrradianceAtlas, GlobalBilinearClampedSampler, CardInterpolants.AtlasCoord, 0);
    #if USE_INDIRECTIRRADIANCE_ATLAS
        OutColor2 = Texture2DSampleLevel(ParentIndirectIrradianceAtlas, GlobalBilinearClampedSampler, CardInterpolants.AtlasCoord, 0);
    #endif
#elif USE_INDIRECTIRRADIANCE_ATLAS
    OutColor1 = Texture2DSampleLevel(ParentIndirectIrradianceAtlas, GlobalBilinearClampedSampler, CardInterpolants.AtlasCoord, 0);
#endif
}
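
When the destination texel center falls exactly on the corner shared by four parent texels, which holds for an exact 2:1 reduction, a single bilinear fetch returns the plain average of those four texels. The sketch below shows that equivalent 2x2 box filter on a single-channel image. It rests on that alignment assumption, which card quads in the atlas may not meet everywhere, and `Image` and `DownsampleBox2x2` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-channel image stored in row-major order (hypothetical helper type).
struct Image
{
    int Width;
    int Height;
    std::vector<float> Texels;

    float At(int x, int y) const
    {
        return Texels[static_cast<std::size_t>(y) * Width + x];
    }
};

// Produces the next mip by averaging each 2x2 block of the parent, which is
// what one bilinear sample at the shared corner of the block evaluates to.
Image DownsampleBox2x2(const Image& parent)
{
    Image child;
    child.Width = std::max(parent.Width / 2, 1);
    child.Height = std::max(parent.Height / 2, 1);
    child.Texels.resize(static_cast<std::size_t>(child.Width) * child.Height);

    for (int y = 0; y < child.Height; ++y)
    {
        for (int x = 0; x < child.Width; ++x)
        {
            // Clamp like a clamped sampler so 1-texel-wide parents still work.
            const int x0 = std::min(2 * x, parent.Width - 1);
            const int x1 = std::min(2 * x + 1, parent.Width - 1);
            const int y0 = std::min(2 * y, parent.Height - 1);
            const int y1 = std::min(2 * y + 1, parent.Height - 1);

            const float sum = parent.At(x0, y0) + parent.At(x1, y0) +
                              parent.At(x0, y1) + parent.At(x1, y1);
            child.Texels[static_cast<std::size_t>(y) * child.Width + x] = 0.25f * sum;
        }
    }
    return child;
}
```

Applying the function repeatedly to its own output builds the whole chain, in the same way each PrefilterMip pass reads mip `MipIndex - 1` and writes mip `MipIndex`.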

A frame capture shows that the number of PrefilterMip passes corresponds to the texture's mip levels (one pass per mip above level 0):

6.5.6.7 ComputeLumenSceneVoxelLighting

ComputeLumenSceneVoxelLighting computes the voxel lighting of the Lumen scene. The code is as follows:

// Engine/Source/Runtime/Renderer/Private/Lumen/LumenVoxelLighting.cpp

void FDeferredShadingSceneRenderer::ComputeLumenSceneVoxelLighting(
    FRDGBuilder& GraphBuilder,
    FLumenCardTracingInputs& TracingInputs,
    FGlobalShaderMap* GlobalShaderMap)
{
    LLM_SCOPE_BYTAG(Lumen);

    const FViewInfo& View = Views[0];

    const int32 ClampedNumClipmapLevels = GetNumLumenVoxelClipmaps();
    const FIntVector ClipmapResolution = GetClipmapResolution();
    bool bForceFullUpdate = GLumenSceneVoxelLightingForceFullUpdate != 0;

    // Set up the voxel lighting 3D texture.
    FRDGTextureRef VoxelLighting = TracingInputs.VoxelLighting;
    {
        FRDGTextureDesc LightingDesc(FRDGTextureDesc::Create3D(
            FIntVector(
                ClipmapResolution.X,
                ClipmapResolution.Y * ClampedNumClipmapLevels,
                ClipmapResolution.Z * GNumVoxelDirections),
            PF_FloatRGBA,
            FClearValueBinding::Black,
            TexCreate_ShaderResource | TexCreate_UAV | TexCreate_3DTiling));

        if (!VoxelLighting || VoxelLighting->Desc != LightingDesc)
        {
            bForceFullUpdate = true;
            VoxelLighting = GraphBuilder.CreateTexture(LightingDesc, TEXT("Lumen.VoxelLighting"));
        }
    }

    // Set up the visibility buffer texture.
    FRDGTextureRef VoxelVisBuffer = View.ViewState->Lumen.VoxelVisBuffer ? GraphBuilder.RegisterExternalTexture(View.ViewState->Lumen.VoxelVisBuffer) : nullptr;
    {
        FRDGTextureDesc VoxelVisBufferDesc(FRDGTextureDesc::Create3D(
            FIntVector(
                ClipmapResolution.X,
                ClipmapResolution.Y * ClampedNumClipmapLevels,
                ClipmapResolution.Z * GNumVoxelDirections),
            PF_R32_UINT,
            FClearValueBinding::Black,
            TexCreate_ShaderResource | TexCreate_UAV | TexCreate_3DTiling));

        if (!VoxelVisBuffer
            || VoxelVisBuffer->Desc.Extent != VoxelVisBufferDesc.Extent
            || VoxelVisBuffer->Desc.Depth != VoxelVisBufferDesc.Depth)
        {
            bForceFullUpdate = true;
            VoxelVisBuffer = GraphBuilder.CreateTexture(VoxelVisBufferDesc, TEXT("Lumen.VoxelVisBuffer"));

            uint32 VisBufferClearValue[4] = { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF };
            AddClearUAVPass(GraphBuilder, GraphBuilder.CreateUAV(VoxelVisBuffer), VisBufferClearValue);
        }
    }

    // The visibility buffer data is only valid for a specific scene, so a full update is forced when the scene changes.
    if (View.ViewState->Lumen.VoxelVisBufferCachedScene != Scene)
    {
        bForceFullUpdate = true;
        View.ViewState->Lumen.VoxelVisBufferCachedScene = Scene;
    }

    // Collect the clipmaps that need updating.
    TArray<int32, SceneRenderingAllocator> ClipmapsToUpdate;
    ClipmapsToUpdate.Empty(ClampedNumClipmapLevels);

    for (int32 ClipmapIndex = 0; ClipmapIndex < ClampedNumClipmapLevels; ClipmapIndex++)
    {
        if (bForceFullUpdate || ShouldUpdateVoxelClipmap(ClipmapIndex, ClampedNumClipmapLevels, View.ViewState->GetFrameIndex()))
        {
            ClipmapsToUpdate.Add(ClipmapIndex);
        }
    }

    ensureMsgf(bForceFullUpdate || ClipmapsToUpdate.Num() <= 1, TEXT("Tweak ShouldUpdateVoxelClipmap for better clipmap update distribution"));

    FString ClipmapsToUpdateString;

    for (int32 ToUpdateIndex = 0; ToUpdateIndex < ClipmapsToUpdate.Num(); ++ToUpdateIndex)
    {
        ClipmapsToUpdateString += FString::FromInt(ClipmapsToUpdate[ToUpdateIndex]);
        if (ToUpdateIndex + 1 < ClipmapsToUpdate.Num())
        {
            ClipmapsToUpdateString += TEXT(",");
        }
    }

    RDG_EVENT_SCOPE(GraphBuilder, "VoxelizeCards Clipmaps=[%s]", *ClipmapsToUpdateString);

    // Update the visibility buffer, then voxelize it.
    if (ClipmapsToUpdate.Num() > 0)
    {
        TracingInputs.VoxelLighting = VoxelLighting;
        TracingInputs.VoxelGridResolution = GetClipmapResolution();
        TracingInputs.NumClipmapLevels = ClampedNumClipmapLevels;

        // Update the visibility buffer
        UpdateVoxelVisBuffer(GraphBuilder, Scene, View, TracingInputs, VoxelVisBuffer, ClipmapsToUpdate, bForceFullUpdate);
        // Voxelize the visibility buffer
        VoxelizeVisBuffer(View, Scene, TracingInputs, VoxelLighting, VoxelVisBuffer, ClipmapsToUpdate, GraphBuilder);

        ConvertToExternalTexture(GraphBuilder, VoxelLighting, View.ViewState->Lumen.VoxelLighting);
        View.ViewState->Lumen.VoxelGridResolution = TracingInputs.VoxelGridResolution;
        View.ViewState->Lumen.NumClipmapLevels = TracingInputs.NumClipmapLevels;
    }

    ConvertToExternalTexture(GraphBuilder, VoxelVisBuffer, View.ViewState->Lumen.VoxelVisBuffer);
}
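
The texture descriptors above pack every clipmap and every voxel direction into one 3D texture: clipmaps are stacked along Y and directions along Z. The sketch below computes the resulting size, memory footprint, and one plausible texel address. The addressing formula is an assumption inferred from the dimensions rather than taken from the engine shaders, `Int3` and `VoxelAtlasLayout` are hypothetical names, and the values used in the usage note (64³ voxels, 4 clipmaps, 6 directions) are purely illustrative.

```cpp
#include <cassert>
#include <cstdint>

struct Int3 { int32_t X, Y, Z; };

// Hypothetical description of how the voxel lighting texture is laid out.
struct VoxelAtlasLayout
{
    Int3 ClipmapResolution;      // voxels per clipmap along each axis
    int32_t NumClipmapLevels;    // ClampedNumClipmapLevels in the code above
    int32_t NumVoxelDirections;  // GNumVoxelDirections in the code above

    // Matches the Create3D extents: X, Y * clipmaps, Z * directions.
    Int3 TextureSize() const
    {
        return Int3{ClipmapResolution.X,
                    ClipmapResolution.Y * NumClipmapLevels,
                    ClipmapResolution.Z * NumVoxelDirections};
    }

    // Assumed addressing: each clipmap is a slab along Y and each direction
    // is a slab along Z. The real shaders may order the slabs differently.
    Int3 TexelCoord(Int3 voxel, int32_t clipmapIndex, int32_t directionIndex) const
    {
        return Int3{voxel.X,
                    voxel.Y + clipmapIndex * ClipmapResolution.Y,
                    voxel.Z + directionIndex * ClipmapResolution.Z};
    }

    // Total size for a given texel size, e.g. 8 bytes for PF_FloatRGBA
    // (four 16-bit floats) or 4 bytes for PF_R32_UINT.
    int64_t SizeInBytes(int32_t bytesPerTexel) const
    {
        const Int3 size = TextureSize();
        return static_cast<int64_t>(size.X) * size.Y * size.Z * bytesPerTexel;
    }
};
```

With the illustrative values, the texture is 64 x 256 x 384 texels, which comes to 48 MiB for the PF_FloatRGBA lighting texture and 24 MiB for the PF_R32_UINT visibility buffer. That cost is one reason to refresh clipmaps incrementally instead of rebuilding all of them every frame.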

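The visibility buffer update and voxelization steps above are not analyzed in detail here, but a frame capture of the process looks like this:

The final stage of UpdateVoxelVisBuffer, VoxelTraceCS, takes the distance field brick 3D texture as input and outputs the VoxelVisBuffer 3D texture:

The final stage of VoxelizeVisBuffer, VisBufferShading, takes SceneFinalLighting, SceneOpacity, SceneDepth, the distance field brick 3D texture, and VoxelVisBuffer as inputs and outputs the VoxelLighting 3D texture. After this stage, the lighting of the Lumen scene is stored in the voxelized 3D texture: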

6.5.7 Lumen Indirect Lighting

6.5.7.1 RenderDiffuseIndirectAndAmbientOcclusion

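This stage uses the information Lumen generated earlier to compute the final indirect lighting and thereby simulate global illumination. Its passes are shown below:

As shown, it consists of several stages: SSGI denoising, screen probe gather, reflections, and indirect lighting composition. The corresponding source, RenderDiffuseIndirectAndAmbientOcclusion, is as follows: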

// Engine/Source/Runtime/Renderer/Private/IndirectLightRendering.cpp

void FDeferredShadingSceneRenderer::RenderDiffuseIndirectAndAmbientOcclusion(
    FRDGBuilder& GraphBuilder,
    FSceneTextures& SceneTextures,
    FRDGTextureRef LightingChannelsTexture,
    bool bIsVisualizePass)
{
    using namespace HybridIndirectLighting;

    if (ViewFamily.EngineShowFlags.VisualizeLumenIndirectDiffuse != bIsVisualizePass)
    {
        return;
    }

    RDG_EVENT_SCOPE(GraphBuilder, "DiffuseIndirectAndAO");

    FSceneTextureParameters SceneTextureParameters = GetSceneTextureParameters(GraphBuilder, SceneTextures.UniformBuffer);
    FRDGTextureRef SceneColorTexture = SceneTextures.Color.Target;

    const FRDGSystemTextures& SystemTextures = FRDGSystemTextures::Get(GraphBuilder);

    // Each view is processed separately.
    for (FViewInfo& View : Views)
    {
        RDG_GPU_MASK_SCOPE(GraphBuilder, View.GPUMask);

        const FPerViewPipelineState& ViewPipelineState = GetViewPipelineState(View);

        int32 DenoiseMode = CVarDiffuseIndirectDenoiser.GetValueOnRenderThread();

        // Set up the common diffuse indirect parameters.
        FCommonParameters CommonDiffuseParameters;
        SetupCommonDiffuseIndirectParameters(GraphBuilder, SceneTextureParameters, View, /* out */ CommonDiffuseParameters);

        // Update the legacy ray tracing config for the denoiser.
        IScreenSpaceDenoiser::FAmbientOcclusionRayTracingConfig RayTracingConfig;
        {
            RayTracingConfig.RayCountPerPixel = CommonDiffuseParameters.RayCountPerPixel;
            RayTracingConfig.ResolutionFraction = 1.0f / float(CommonDiffuseParameters.DownscaleFactor);
        }

        // Previous frame's scene color
        ScreenSpaceRayTracing::FPrevSceneColorMip PrevSceneColorMip;
        if ((ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::Lumen || ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::SSGI) && View.PrevViewInfo.ScreenSpaceRayTracingInput.IsValid())
        {
            PrevSceneColorMip = ScreenSpaceRayTracing::ReducePrevSceneColorMip(GraphBuilder, SceneTextureParameters, View);
        }

        // Denoiser inputs and outputs
        FSSDSignalTextures DenoiserOutputs;
        IScreenSpaceDenoiser::FDiffuseIndirectInputs DenoiserInputs;
        IScreenSpaceDenoiser::FDiffuseIndirectHarmonic DenoiserSphericalHarmonicInputs;
        FLumenReflectionCompositeParameters LumenReflectionCompositeParameters;
        bool bLumenUseDenoiserComposite = ViewPipelineState.bUseLumenProbeHierarchy;

        // Fill in the denoiser inputs or outputs depending on the indirect lighting method.
        
        // Lumen probe hierarchy
        if (ViewPipelineState.bUseLumenProbeHierarchy)
        {
            check(ViewPipelineState.DiffuseIndirectDenoiser == IScreenSpaceDenoiser::EMode::Disabled);
            DenoiserOutputs = RenderLumenProbeHierarchy(
                GraphBuilder,
                SceneTextures,
                CommonDiffuseParameters, PrevSceneColorMip,
                View, &View.PrevViewInfo);
        }
        // Screen space global illumination (SSGI)
        else if (ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::SSGI)
        {
            RDG_EVENT_SCOPE(GraphBuilder, "SSGI %dx%d", CommonDiffuseParameters.TracingViewportSize.X, CommonDiffuseParameters.TracingViewportSize.Y);
            DenoiserInputs = ScreenSpaceRayTracing::CastStandaloneDiffuseIndirectRays(
                GraphBuilder, CommonDiffuseParameters, PrevSceneColorMip, View);
        }
        // Ray-traced global illumination (RTGI)
        else if (ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::RTGI)
        {
            // TODO: Refactor under the HybridIndirectLighting standard API.
            // TODO: hybrid SSGI / RTGI
            RenderRayTracingGlobalIllumination(GraphBuilder, SceneTextureParameters, View, /* out */ &RayTracingConfig, /* out */ &DenoiserInputs);
        }
        // Lumen global illumination
        else if (ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::Lumen)
        {
            check(ViewPipelineState.DiffuseIndirectDenoiser == IScreenSpaceDenoiser::EMode::Disabled);

            FLumenMeshSDFGridParameters MeshSDFGridParameters;

            DenoiserOutputs = RenderLumenScreenProbeGather(
                GraphBuilder, 
                SceneTextures,
                PrevSceneColorMip, 
                LightingChannelsTexture,
                View,
                &View.PrevViewInfo,
                bLumenUseDenoiserComposite,
                MeshSDFGridParameters);

            if (ViewPipelineState.ReflectionsMethod == EReflectionsMethod::Lumen)
            {
                DenoiserOutputs.Textures[2] = RenderLumenReflections(
                    GraphBuilder,
                    View,
                    SceneTextures, 
                    MeshSDFGridParameters,
                    LumenReflectionCompositeParameters);
            }

            if (!DenoiserOutputs.Textures[2])
            {
                DenoiserOutputs.Textures[2] = DenoiserOutputs.Textures[1];
            }
        }

        FRDGTextureRef AmbientOcclusionMask = DenoiserInputs.AmbientOcclusionMask;

        // Denoising.
        if (ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::Lumen)
        {
            // Lumen's output is already denoised, so nothing needs to be done here.
        }
        else if (ViewPipelineState.DiffuseIndirectDenoiser == IScreenSpaceDenoiser::EMode::Disabled)
        {
            DenoiserOutputs.Textures[0] = DenoiserInputs.Color;
            DenoiserOutputs.Textures[1] = SystemTextures.White;
        }
        else
        {
            const IScreenSpaceDenoiser* DefaultDenoiser = IScreenSpaceDenoiser::GetDefaultDenoiser();
            const IScreenSpaceDenoiser* DenoiserToUse = 
                ViewPipelineState.DiffuseIndirectDenoiser == IScreenSpaceDenoiser::EMode::DefaultDenoiser
                ? DefaultDenoiser : GScreenSpaceDenoiser;

            RDG_EVENT_SCOPE(GraphBuilder, "%s%s(DiffuseIndirect) %dx%d",
                DenoiserToUse != DefaultDenoiser ? TEXT("ThirdParty ") : TEXT(""),
                DenoiserToUse->GetDebugName(),
                View.ViewRect.Width(), View.ViewRect.Height());

            if (ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::RTGI)
            {
                // Denoise the RTGI result.
                DenoiserOutputs = DenoiserToUse->DenoiseDiffuseIndirect(
                    GraphBuilder,
                    View,
                    &View.PrevViewInfo,
                    SceneTextureParameters,
                    DenoiserInputs,
                    RayTracingConfig);

                AmbientOcclusionMask = DenoiserOutputs.Textures[1];
            }
            else if (ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::SSGI)
            {
                // Denoise the SSGI result.
                DenoiserOutputs = DenoiserToUse->DenoiseScreenSpaceDiffuseIndirect(
                    GraphBuilder,
                    View,
                    &View.PrevViewInfo,
                    SceneTextureParameters,
                    DenoiserInputs,
                    RayTracingConfig);

                AmbientOcclusionMask = DenoiserOutputs.Textures[1];
            }
        }

        // Render ambient occlusion
        bool bWritableAmbientOcclusionMask = true;
        if (ViewPipelineState.AmbientOcclusionMethod == EAmbientOcclusionMethod::Disabled)
        {
            ensure(!HasBeenProduced(SceneTextures.ScreenSpaceAO));
            AmbientOcclusionMask = nullptr;
            bWritableAmbientOcclusionMask = false;
        }
        else if (ViewPipelineState.AmbientOcclusionMethod == EAmbientOcclusionMethod::RTAO)
        {
            RenderRayTracingAmbientOcclusion(
                GraphBuilder,
                View,
                SceneTextureParameters,
                &AmbientOcclusionMask);
        }
        else if (ViewPipelineState.AmbientOcclusionMethod == EAmbientOcclusionMethod::SSGI)
        {
            check(AmbientOcclusionMask);
        }
        else if (ViewPipelineState.AmbientOcclusionMethod == EAmbientOcclusionMethod::SSAO)
        {
            // Fetch result of SSAO that was done earlier.
            if (HasBeenProduced(SceneTextures.ScreenSpaceAO))
            {
                AmbientOcclusionMask = SceneTextures.ScreenSpaceAO;
            }
            else
            {
                AmbientOcclusionMask = GetScreenSpaceAOFallback(SystemTextures);
                bWritableAmbientOcclusionMask = false;
            }
        }
        else
        {
            unimplemented();
            bWritableAmbientOcclusionMask = false;
        }

        // Extract the dynamic AO for application of AO beyond RenderDiffuseIndirectAndAmbientOcclusion()
        if (AmbientOcclusionMask && ViewPipelineState.AmbientOcclusionMethod != EAmbientOcclusionMethod::SSAO)
        {
            ensureMsgf(Views.Num() == 1, TEXT("Need to add support for one AO texture per view in FSceneTextures"));
            SceneTextures.ScreenSpaceAO = AmbientOcclusionMask;
        }

        if (HairStrands::HasViewHairStrandsData(View) && (ViewPipelineState.AmbientOcclusionMethod == EAmbientOcclusionMethod::SSGI || ViewPipelineState.AmbientOcclusionMethod == EAmbientOcclusionMethod::SSAO) && bWritableAmbientOcclusionMask)
        {
            RenderHairStrandsAmbientOcclusion(
                GraphBuilder,
                View,
                AmbientOcclusionMask);
        }

        // Apply diffuse indirect lighting and ambient occlusion to the scene color.
        if ((DenoiserOutputs.Textures[0] || AmbientOcclusionMask) && (!bIsVisualizePass || ViewPipelineState.DiffuseIndirectDenoiser != IScreenSpaceDenoiser::EMode::Disabled || ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::Lumen)
            && !IsMetalPlatform(ShaderPlatform))
        {
            // The pixel shader used is FDiffuseIndirectCompositePS
            FDiffuseIndirectCompositePS::FParameters* PassParameters = GraphBuilder.AllocParameters<FDiffuseIndirectCompositePS::FParameters>();
            
            PassParameters->AmbientOcclusionStaticFraction = FMath::Clamp(View.FinalPostProcessSettings.AmbientOcclusionStaticFraction, 0.0f, 1.0f);

            PassParameters->ApplyAOToDynamicDiffuseIndirect = 0.0f;

            if (ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::Lumen)
            {
                PassParameters->ApplyAOToDynamicDiffuseIndirect = 1.0f;
            }

            const FIntPoint BufferExtent = SceneTextureParameters.SceneDepthTexture->Desc.Extent;

            {
                // Placeholder texture for textures pulled in from SSDCommon.ush
                FRDGTextureDesc Desc = FRDGTextureDesc::Create2D(
                    FIntPoint(1),
                    PF_R32_UINT,
                    FClearValueBinding::Black,
                    TexCreate_ShaderResource);
                FRDGTextureRef CompressedMetadataPlaceholder = GraphBuilder.CreateTexture(Desc, TEXT("CompressedMetadataPlaceholder"));

                PassParameters->CompressedMetadata[0] = CompressedMetadataPlaceholder;
                PassParameters->CompressedMetadata[1] = CompressedMetadataPlaceholder;
            }

            PassParameters->BufferUVToOutputPixelPosition = BufferExtent;
            PassParameters->EyeAdaptation = GetEyeAdaptationTexture(GraphBuilder, View);
            PassParameters->LumenReflectionCompositeParameters = LumenReflectionCompositeParameters;

            PassParameters->bVisualizeDiffuseIndirect = bIsVisualizePass;

            PassParameters->DiffuseIndirect = DenoiserOutputs;
            PassParameters->DiffuseIndirectSampler = TStaticSamplerState<SF_Point>::GetRHI();

            PassParameters->PreIntegratedGF = GSystemTextures.PreintegratedGF->GetRenderTargetItem().ShaderResourceTexture;
            PassParameters->PreIntegratedGFSampler = TStaticSamplerState<SF_Bilinear, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();

            PassParameters->AmbientOcclusionTexture = AmbientOcclusionMask;
            PassParameters->AmbientOcclusionSampler = TStaticSamplerState<SF_Point>::GetRHI();
            
            if (!PassParameters->AmbientOcclusionTexture || bIsVisualizePass)
            {
                PassParameters->AmbientOcclusionTexture = SystemTextures.White;
            }

            // Set up the denoiser's common shader parameters.
            Denoiser::SetupCommonShaderParameters(
                View, SceneTextureParameters,
                View.ViewRect,
                1.0f / CommonDiffuseParameters.DownscaleFactor,
                /* out */ &PassParameters->DenoiserCommonParameters);
            PassParameters->SceneTextures = SceneTextureParameters;
            PassParameters->ViewUniformBuffer = View.ViewUniformBuffer;

            PassParameters->RenderTargets[0] = FRenderTargetBinding(
                SceneColorTexture, ERenderTargetLoadAction::ELoad);

            {
                FRDGTextureDesc Desc = FRDGTextureDesc::Create2D(
                    SceneColorTexture->Desc.Extent,
                    PF_FloatRGBA,
                    FClearValueBinding::None,
                    TexCreate_ShaderResource | TexCreate_UAV);

                PassParameters->PassDebugOutput = GraphBuilder.CreateUAV(
                    GraphBuilder.CreateTexture(Desc, TEXT("DebugDiffuseIndirectComposite")));
            }

            const TCHAR* DiffuseIndirectSampling = TEXT("Disabled");
            FDiffuseIndirectCompositePS::FPermutationDomain PermutationVector;
            bool bUpscale = false;

            if (DenoiserOutputs.Textures[0])
            {
                if (bLumenUseDenoiserComposite)
                {
                    PermutationVector.Set<FDiffuseIndirectCompositePS::FApplyDiffuseIndirectDim>(2);
                    DiffuseIndirectSampling = TEXT("ProbeHierarchy");
                }
                else if (ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::RTGI)
                {
                    PermutationVector.Set<FDiffuseIndirectCompositePS::FApplyDiffuseIndirectDim>(3);
                    DiffuseIndirectSampling = TEXT("RTGI");
                }
                else if (ViewPipelineState.DiffuseIndirectMethod == EDiffuseIndirectMethod::Lumen)
                {
                    PermutationVector.Set<FDiffuseIndirectCompositePS::FApplyDiffuseIndirectDim>(4);
                    DiffuseIndirectSampling = TEXT("ScreenProbeGather");
                }
                else
                {
                    PermutationVector.Set<FDiffuseIndirectCompositePS::FApplyDiffuseIndirectDim>(1);
                    DiffuseIndirectSampling = TEXT("SSGI");
                    bUpscale = DenoiserOutputs.Textures[0]->Desc.Extent != SceneColorTexture->Desc.Extent;
                }

                PermutationVector.Set<FDiffuseIndirectCompositePS::FUpscaleDiffuseIndirectDim>(bUpscale);
            }

            TShaderMapRef<FDiffuseIndirectCompositePS> PixelShader(View.ShaderMap, PermutationVector);
            // Clear (and thereby optimize away) unused shader resource bindings.
            ClearUnusedGraphResources(PixelShader, PassParameters);

            FRHIBlendState* BlendState = TStaticBlendState<CW_RGBA, BO_Add, BF_One, BF_Source1Color, BO_Add, BF_One, BF_Source1Alpha>::GetRHI();

            if (bIsVisualizePass)
            {
                BlendState = TStaticBlendState<>::GetRHI();
            }

            // Composite the indirect lighting in a fullscreen pass.
            FPixelShaderUtils::AddFullscreenPass(
                GraphBuilder,
                View.ShaderMap,
                RDG_EVENT_NAME(
                    "DiffuseIndirectComposite(DiffuseIndirect=%s%s%s%s) %dx%d",
                    DiffuseIndirectSampling,
                    PermutationVector.Get<FDiffuseIndirectCompositePS::FUpscaleDiffuseIndirectDim>() ? TEXT(" UpscaleDiffuseIndirect") : TEXT(""),
                    AmbientOcclusionMask ? TEXT(" ApplyAOToSceneColor") : TEXT(""),
                    PassParameters->ApplyAOToDynamicDiffuseIndirect > 0.0f ? TEXT(" ApplyAOToDynamicDiffuseIndirect") : TEXT(""),
                    View.ViewRect.Width(), View.ViewRect.Height()),
                PixelShader,
                PassParameters,
                View.ViewRect,
                BlendState);
        } // if (DenoiserOutputs.Color || bApplySSAO)

        // Apply the ambient cubemaps.
        if (IsAmbientCubemapPassRequired(View) && !bIsVisualizePass && !ViewPipelineState.bUseLumenProbeHierarchy)
        {
            FAmbientCubemapCompositePS::FParameters* PassParameters = GraphBuilder.AllocParameters<FAmbientCubemapCompositePS::FParameters>();
            
            PassParameters->PreIntegratedGF = GSystemTextures.PreintegratedGF->GetRenderTargetItem().ShaderResourceTexture;
            PassParameters->PreIntegratedGFSampler = TStaticSamplerState<SF_Bilinear, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();
            
            PassParameters->AmbientOcclusionTexture = AmbientOcclusionMask;
            PassParameters->AmbientOcclusionSampler = TStaticSamplerState<SF_Point>::GetRHI();
            
            if (!PassParameters->AmbientOcclusionTexture)
            {
                PassParameters->AmbientOcclusionTexture = SystemTextures.White;
            }

            PassParameters->SceneTextures = SceneTextureParameters;
            PassParameters->ViewUniformBuffer = View.ViewUniformBuffer;

            PassParameters->RenderTargets[0] = FRenderTargetBinding(
                SceneColorTexture, ERenderTargetLoadAction::ELoad);
        
            TShaderMapRef<FAmbientCubemapCompositePS> PixelShader(View.ShaderMap);
            GraphBuilder.AddPass(
                RDG_EVENT_NAME("AmbientCubemapComposite %dx%d", View.ViewRect.Width(), View.ViewRect.Height()),
                PassParameters,
                ERDGPassFlags::Raster,
                [PassParameters, &View, PixelShader](FRHICommandList& RHICmdList)
            {
                TShaderMapRef<FPostProcessVS> VertexShader(View.ShaderMap);
                
                RHICmdList.SetViewport(View.ViewRect.Min.X, View.ViewRect.Min.Y, 0.0f, View.ViewRect.Max.X, View.ViewRect.Max.Y, 0.0);

                FGraphicsPipelineStateInitializer GraphicsPSOInit;
                RHICmdList.ApplyCachedRenderTargets(GraphicsPSOInit);

                // set the state
                GraphicsPSOInit.BlendState = TStaticBlendState<CW_RGB, BO_Add, BF_One, BF_One, BO_Add, BF_One, BF_One>::GetRHI();
                GraphicsPSOInit.RasterizerState = TStaticRasterizerState<>::GetRHI();
                GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<false, CF_Always>::GetRHI();

                GraphicsPSOInit.BoundShaderState.VertexDeclarationRHI = GFilterVertexDeclaration.VertexDeclarationRHI;
                GraphicsPSOInit.BoundShaderState.VertexShaderRHI = VertexShader.GetVertexShader();
                GraphicsPSOInit.BoundShaderState.PixelShaderRHI = PixelShader.GetPixelShader();
                GraphicsPSOInit.PrimitiveType = PT_TriangleList;

                SetGraphicsPipelineState(RHICmdList, GraphicsPSOInit);

                uint32 Count = View.FinalPostProcessSettings.ContributingCubemaps.Num();
                for (const FFinalPostProcessSettings::FCubemapEntry& CubemapEntry : View.FinalPostProcessSettings.ContributingCubemaps)
                {
                    FAmbientCubemapCompositePS::FParameters ShaderParameters = *PassParameters;
                    SetupAmbientCubemapParameters(CubemapEntry, &ShaderParameters.AmbientCubemap);
                    SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), ShaderParameters);
                    
                    DrawPostProcessPass(
                        RHICmdList,
                        0, 0,
                        View.ViewRect.Width(), View.ViewRect.Height(),
                        View.ViewRect.Min.X, View.ViewRect.Min.Y,
                        View.ViewRect.Width(), View.ViewRect.Height(),
                        View.ViewRect.Size(),
                        GetSceneTextureExtent(),
                        VertexShader,
                        View.StereoPass, 
                        false, // TODO.
                        EDRF_UseTriangleOptimization);
                }
            });
        } // if (IsAmbientCubemapPassRequired(View))
    } // for (FViewInfo& View : Views)
}
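
The blend state chosen for the composite pass above is a dual-source blend: the color source factor is `BF_One`, the destination factor is `BF_Source1Color`, and the operation is `BO_Add`. Under the standard fixed-function blend equation this amounts to `result = Src0 + Dst * Src1`, so a single pass can both add indirect light and scale what is already in scene color. The CPU-side sketch below only models that arithmetic. `Color4`, `NearlyEqual` and `BlendDualSource` are hypothetical names used for illustration and are not engine API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical RGBA value type, used only for this illustration.
struct Color4 { float r, g, b, a; };

// Small float comparison helper for checking results.
inline bool NearlyEqual(float a, float b, float eps = 1e-6f)
{
    return std::fabs(a - b) <= eps;
}

// CPU model of BO_Add with source factor BF_One and destination factors
// BF_Source1Color (RGB) / BF_Source1Alpha (A):
//   result = src0 * 1 + dst * src1
// src0 is the pixel shader's first output (light to add), src1 its second
// output (a per-channel scale applied to the existing scene color).
Color4 BlendDualSource(const Color4& src0, const Color4& src1, const Color4& dst)
{
    return Color4{
        src0.r + dst.r * src1.r,
        src0.g + dst.g * src1.g,
        src0.b + dst.b * src1.b,
        src0.a + dst.a * src1.a };
}
```

For example, with `src0 = (0.1, 0.2, 0.3)`, `src1 = (0.5, 0.5, 0.5)` and a white destination, the red channel becomes `0.1 + 1.0 * 0.5`. The visualize path swaps in the default `TStaticBlendState<>`, which, judging by its default template arguments, overwrites the target instead of blending.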

6.5.7.2 RenderLumenScreenProbeGather

RenderLumenScreenProbeGather renders Lumen's screen-space probe gather (Screen Probe Gather). Its code is as follows:

// Engine\Source\Runtime\Renderer\Private\Lumen\LumenScreenProbeGather.cpp

FSSDSignalTextures FDeferredShadingSceneRenderer::RenderLumenScreenProbeGather(
    FRDGBuilder& GraphBuilder,
    const FSceneTextures& SceneTextures,
    const ScreenSpaceRayTracing::FPrevSceneColorMip& PrevSceneColorMip,
    FRDGTextureRef LightingChannelsTexture,
    const FViewInfo& View,
    FPreviousViewInfo* PreviousViewInfos,
    bool& bLumenUseDenoiserComposite,
    FLumenMeshSDFGridParameters& MeshSDFGridParameters)
{
    LLM_SCOPE_BYTAG(Lumen);

    // Render the Lumen irradiance field gather instead, if it is enabled.
    if (GLumenIrradianceFieldGather != 0)
    {
        bLumenUseDenoiserComposite = false;
        return RenderLumenIrradianceFieldGather(GraphBuilder, SceneTextures, View);
    }

    RDG_EVENT_SCOPE(GraphBuilder, "LumenScreenProbeGather");
    RDG_GPU_STAT_SCOPE(GraphBuilder, LumenScreenProbeGather);

    check(ShouldRenderLumenDiffuseGI(Scene, View, true));
    const FRDGSystemTextures& SystemTextures = FRDGSystemTextures::Get(GraphBuilder);

    if (!LightingChannelsTexture)
    {
        LightingChannelsTexture = SystemTextures.Black;
    }

    // If LumenScreenProbeGather is disabled, return cleared denoiser inputs directly.
    if (!GLumenScreenProbeGather)
    {
        FSSDSignalTextures ScreenSpaceDenoiserInputs;
        ScreenSpaceDenoiserInputs.Textures[0] = SystemTextures.Black;
        FRDGTextureDesc RoughSpecularIndirectDesc = FRDGTextureDesc::Create2D(SceneTextures.Config.Extent, PF_FloatRGB, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV);
        ScreenSpaceDenoiserInputs.Textures[1] = GraphBuilder.CreateTexture(RoughSpecularIndirectDesc, TEXT("Lumen.ScreenProbeGather.RoughSpecularIndirect"));
        AddClearUAVPass(GraphBuilder, GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ScreenSpaceDenoiserInputs.Textures[1])), FLinearColor::Black);
        bLumenUseDenoiserComposite = false;
        return ScreenSpaceDenoiserInputs;
    }

    // Pull the scene textures from the uniform buffer to get fallback textures.
    const FSceneTextureParameters SceneTextureParameters = GetSceneTextureParameters(GraphBuilder, SceneTextures.UniformBuffer);

    // Set up the screen-space probe parameters.
    FScreenProbeParameters ScreenProbeParameters;
    ScreenProbeParameters.ScreenProbeTracingOctahedronResolution = LumenScreenProbeGather::GetTracingOctahedronResolution(View);
    ensureMsgf(ScreenProbeParameters.ScreenProbeTracingOctahedronResolution < (1 << 6) - 1, TEXT("Tracing resolution %u was larger than supported by PackRayInfo()"), ScreenProbeParameters.ScreenProbeTracingOctahedronResolution);
    ScreenProbeParameters.ScreenProbeGatherOctahedronResolution = LumenScreenProbeGather::GetGatherOctahedronResolution(ScreenProbeParameters.ScreenProbeTracingOctahedronResolution);
    ScreenProbeParameters.ScreenProbeGatherOctahedronResolutionWithBorder = ScreenProbeParameters.ScreenProbeGatherOctahedronResolution + 2 * (1 << (GLumenScreenProbeGatherNumMips - 1));
    ScreenProbeParameters.ScreenProbeDownsampleFactor = LumenScreenProbeGather::GetScreenDownsampleFactor(View);

    ScreenProbeParameters.ScreenProbeViewSize = FIntPoint::DivideAndRoundUp(View.ViewRect.Size(), (int32)ScreenProbeParameters.ScreenProbeDownsampleFactor);
    ScreenProbeParameters.ScreenProbeAtlasViewSize = ScreenProbeParameters.ScreenProbeViewSize;
    ScreenProbeParameters.ScreenProbeAtlasViewSize.Y += FMath::TruncToInt(ScreenProbeParameters.ScreenProbeViewSize.Y * GLumenScreenProbeGatherAdaptiveProbeAllocationFraction);

    ScreenProbeParameters.ScreenProbeAtlasBufferSize = FIntPoint::DivideAndRoundUp(SceneTextures.Config.Extent, (int32)ScreenProbeParameters.ScreenProbeDownsampleFactor);
    ScreenProbeParameters.ScreenProbeAtlasBufferSize.Y += FMath::TruncToInt(ScreenProbeParameters.ScreenProbeAtlasBufferSize.Y * GLumenScreenProbeGatherAdaptiveProbeAllocationFraction);

    ScreenProbeParameters.ScreenProbeGatherMaxMip = GLumenScreenProbeGatherNumMips - 1;
    ScreenProbeParameters.RelativeSpeedDifferenceToConsiderLightingMoving = GLumenScreenProbeRelativeSpeedDifferenceToConsiderLightingMoving;
    ScreenProbeParameters.ScreenTraceNoFallbackThicknessScale = Lumen::UseHardwareRayTracedScreenProbeGather() ? 1.0f : GLumenScreenProbeScreenTracesThicknessScaleWhenNoFallback;
    ScreenProbeParameters.NumUniformScreenProbes = ScreenProbeParameters.ScreenProbeViewSize.X * ScreenProbeParameters.ScreenProbeViewSize.Y;
    ScreenProbeParameters.MaxNumAdaptiveProbes = FMath::TruncToInt(ScreenProbeParameters.NumUniformScreenProbes * GLumenScreenProbeGatherAdaptiveProbeAllocationFraction);
    extern int32 GLumenScreenProbeGatherVisualizeTraces;
    ScreenProbeParameters.FixedJitterIndex = GLumenScreenProbeGatherVisualizeTraces == 0 ? GLumenScreenProbeFixedJitterIndex : 6;

    FRDGTextureDesc DownsampledDepthDesc(FRDGTextureDesc::Create2D(ScreenProbeParameters.ScreenProbeAtlasBufferSize, PF_R32_UINT, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV));
    ScreenProbeParameters.ScreenProbeSceneDepth = GraphBuilder.CreateTexture(DownsampledDepthDesc, TEXT("Lumen.ScreenProbeGather.ScreenProbeSceneDepth"));

    FRDGTextureDesc DownsampledSpeedDesc(FRDGTextureDesc::Create2D(ScreenProbeParameters.ScreenProbeAtlasBufferSize, PF_R16F, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV));
    ScreenProbeParameters.ScreenProbeWorldSpeed = GraphBuilder.CreateTexture(DownsampledSpeedDesc, TEXT("Lumen.ScreenProbeGather.ScreenProbeWorldSpeed"));

    FBlueNoise BlueNoise;
    InitializeBlueNoise(BlueNoise);
    ScreenProbeParameters.BlueNoise = CreateUniformBufferImmediate(BlueNoise, EUniformBufferUsage::UniformBuffer_SingleDraw);

    ScreenProbeParameters.OctahedralSolidAngleParameters.OctahedralSolidAngleTextureResolutionSq = GLumenOctahedralSolidAngleTextureSize * GLumenOctahedralSolidAngleTextureSize;
    ScreenProbeParameters.OctahedralSolidAngleParameters.OctahedralSolidAngleTexture = InitializeOctahedralSolidAngleTexture(GraphBuilder, View.ShaderMap, GLumenOctahedralSolidAngleTextureSize, View.ViewState->Lumen.ScreenProbeGatherState.OctahedralSolidAngleTextureRT);

    // Downsample depth for the uniformly placed probes.
    {
        FScreenProbeDownsampleDepthUniformCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenProbeDownsampleDepthUniformCS::FParameters>();
        PassParameters->RWScreenProbeSceneDepth = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ScreenProbeParameters.ScreenProbeSceneDepth));
        PassParameters->RWScreenProbeWorldSpeed = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ScreenProbeParameters.ScreenProbeWorldSpeed));
        PassParameters->View = View.ViewUniformBuffer;
        PassParameters->SceneTexturesStruct = SceneTextures.UniformBuffer;
        PassParameters->SceneTextures = SceneTextureParameters;
        PassParameters->ScreenProbeParameters = ScreenProbeParameters;

        auto ComputeShader = View.ShaderMap->GetShader<FScreenProbeDownsampleDepthUniformCS>(0);

        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("UniformPlacement DownsampleFactor=%u", ScreenProbeParameters.ScreenProbeDownsampleFactor),
            ComputeShader,
            PassParameters,
            FComputeShaderUtils::GetGroupCount(ScreenProbeParameters.ScreenProbeViewSize, FScreenProbeDownsampleDepthUniformCS::GetGroupSize()));
    }

    FRDGBufferRef NumAdaptiveScreenProbes = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateBufferDesc(sizeof(uint32), 1), TEXT("Lumen.ScreenProbeGather.NumAdaptiveScreenProbes"));
    FRDGBufferRef AdaptiveScreenProbeData = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateBufferDesc(sizeof(uint32), FMath::Max<uint32>(ScreenProbeParameters.MaxNumAdaptiveProbes, 1)), TEXT("Lumen.ScreenProbeGather.daptiveScreenProbeData"));

    ScreenProbeParameters.NumAdaptiveScreenProbes = GraphBuilder.CreateSRV(FRDGBufferSRVDesc(NumAdaptiveScreenProbes, PF_R32_UINT));
    ScreenProbeParameters.AdaptiveScreenProbeData = GraphBuilder.CreateSRV(FRDGBufferSRVDesc(AdaptiveScreenProbeData, PF_R32_UINT));

    const FIntPoint ScreenProbeViewportBufferSize = FIntPoint::DivideAndRoundUp(SceneTextures.Config.Extent, (int32)ScreenProbeParameters.ScreenProbeDownsampleFactor);
    FRDGTextureDesc ScreenTileAdaptiveProbeHeaderDesc(FRDGTextureDesc::Create2D(ScreenProbeViewportBufferSize, PF_R32_UINT, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV));
    FIntPoint ScreenTileAdaptiveProbeIndicesBufferSize = FIntPoint(ScreenProbeViewportBufferSize.X * ScreenProbeParameters.ScreenProbeDownsampleFactor, ScreenProbeViewportBufferSize.Y * ScreenProbeParameters.ScreenProbeDownsampleFactor);
    FRDGTextureDesc ScreenTileAdaptiveProbeIndicesDesc(FRDGTextureDesc::Create2D(ScreenTileAdaptiveProbeIndicesBufferSize, PF_R16_UINT, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV));
    ScreenProbeParameters.ScreenTileAdaptiveProbeHeader = GraphBuilder.CreateTexture(ScreenTileAdaptiveProbeHeaderDesc, TEXT("Lumen.ScreenProbeGather.ScreenTileAdaptiveProbeHeader"));
    ScreenProbeParameters.ScreenTileAdaptiveProbeIndices = GraphBuilder.CreateTexture(ScreenTileAdaptiveProbeIndicesDesc, TEXT("Lumen.ScreenProbeGather.ScreenTileAdaptiveProbeIndices"));

    FComputeShaderUtils::ClearUAV(GraphBuilder, View.ShaderMap, GraphBuilder.CreateUAV(FRDGBufferUAVDesc(NumAdaptiveScreenProbes, PF_R32_UINT)), 0);
    uint32 ClearValues[4] = {0, 0, 0, 0};
    AddClearUAVPass(GraphBuilder, GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ScreenProbeParameters.ScreenTileAdaptiveProbeHeader)), ClearValues);

    const uint32 AdaptiveProbeMinDownsampleFactor = FMath::Clamp(GLumenScreenProbeGatherAdaptiveProbeMinDownsampleFactor, 1, 64);

    if (ScreenProbeParameters.MaxNumAdaptiveProbes > 0 && AdaptiveProbeMinDownsampleFactor < ScreenProbeParameters.ScreenProbeDownsampleFactor)
    { 
        // Place additional probes adaptively.
        uint32 PlacementDownsampleFactor = ScreenProbeParameters.ScreenProbeDownsampleFactor;
        do
        {
            PlacementDownsampleFactor /= 2;
            FScreenProbeAdaptivePlacementCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenProbeAdaptivePlacementCS::FParameters>();
            PassParameters->RWScreenProbeSceneDepth = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ScreenProbeParameters.ScreenProbeSceneDepth));
            PassParameters->RWScreenProbeWorldSpeed = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ScreenProbeParameters.ScreenProbeWorldSpeed));
            PassParameters->RWNumAdaptiveScreenProbes = GraphBuilder.CreateUAV(FRDGBufferUAVDesc(NumAdaptiveScreenProbes, PF_R32_UINT));
            PassParameters->RWAdaptiveScreenProbeData = GraphBuilder.CreateUAV(FRDGBufferUAVDesc(AdaptiveScreenProbeData, PF_R32_UINT));
            PassParameters->RWScreenTileAdaptiveProbeHeader = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ScreenProbeParameters.ScreenTileAdaptiveProbeHeader));
            PassParameters->RWScreenTileAdaptiveProbeIndices = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ScreenProbeParameters.ScreenTileAdaptiveProbeIndices));
            PassParameters->View = View.ViewUniformBuffer;
            PassParameters->SceneTexturesStruct = SceneTextures.UniformBuffer;
            PassParameters->SceneTextures = SceneTextureParameters;
            PassParameters->ScreenProbeParameters = ScreenProbeParameters;
            PassParameters->PlacementDownsampleFactor = PlacementDownsampleFactor;

            auto ComputeShader = View.ShaderMap->GetShader<FScreenProbeAdaptivePlacementCS>(0);

            FComputeShaderUtils::AddPass(
                GraphBuilder,
                RDG_EVENT_NAME("AdaptivePlacement DownsampleFactor=%u", PlacementDownsampleFactor),
                ComputeShader,
                PassParameters,
                FComputeShaderUtils::GetGroupCount(FIntPoint::DivideAndRoundDown(View.ViewRect.Size(), (int32)PlacementDownsampleFactor), FScreenProbeAdaptivePlacementCS::GetGroupSize()));
        }
        while (PlacementDownsampleFactor > AdaptiveProbeMinDownsampleFactor);
    }
    else
    {
        FComputeShaderUtils::ClearUAV(GraphBuilder, View.ShaderMap, GraphBuilder.CreateUAV(FRDGBufferUAVDesc(AdaptiveScreenProbeData, PF_R32_UINT)), 0);
        AddClearUAVPass(GraphBuilder, GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ScreenProbeParameters.ScreenTileAdaptiveProbeIndices)), ClearValues);
    }

    FRDGBufferRef ScreenProbeIndirectArgs = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateIndirectDesc<FRHIDispatchIndirectParameters>((uint32)EScreenProbeIndirectArgs::Max), TEXT("Lumen.ScreenProbeGather.ScreenProbeIndirectArgs"));

    // Set up the indirect dispatch arguments for the adaptive probes.
    {
        FSetupAdaptiveProbeIndirectArgsCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FSetupAdaptiveProbeIndirectArgsCS::FParameters>();
        PassParameters->RWScreenProbeIndirectArgs = GraphBuilder.CreateUAV(FRDGBufferUAVDesc(ScreenProbeIndirectArgs, PF_R32_UINT));
        PassParameters->ScreenProbeParameters = ScreenProbeParameters;

        auto ComputeShader = View.ShaderMap->GetShader<FSetupAdaptiveProbeIndirectArgsCS>(0);

        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("SetupAdaptiveProbeIndirectArgs"),
            ComputeShader,
            PassParameters,
            FIntVector(1, 1, 1));
    }

    ScreenProbeParameters.ProbeIndirectArgs = ScreenProbeIndirectArgs;

    FLumenCardTracingInputs TracingInputs(GraphBuilder, Scene, View);

    FRDGTextureRef BRDFProbabilityDensityFunction = nullptr;
    FRDGBufferSRVRef BRDFProbabilityDensityFunctionSH = nullptr;
    GenerateBRDF_PDF(GraphBuilder, View, SceneTextures, BRDFProbabilityDensityFunction, BRDFProbabilityDensityFunctionSH, ScreenProbeParameters);

    const LumenRadianceCache::FRadianceCacheInputs RadianceCacheInputs = LumenScreenProbeGatherRadianceCache::SetupRadianceCacheInputs();
    LumenRadianceCache::FRadianceCacheInterpolationParameters RadianceCacheParameters;

    // Radiance cache.
    if (LumenScreenProbeGather::UseRadianceCache(View))
    {
        FScreenGatherMarkUsedProbesData MarkUsedProbesData;
        MarkUsedProbesData.Parameters.View = View.ViewUniformBuffer;
        MarkUsedProbesData.Parameters.SceneTexturesStruct = SceneTextures.UniformBuffer;
        MarkUsedProbesData.Parameters.ScreenProbeParameters = ScreenProbeParameters;
        MarkUsedProbesData.Parameters.VisualizeLumenScene = View.Family->EngineShowFlags.VisualizeLumenScene != 0 ? 1 : 0;
        MarkUsedProbesData.Parameters.RadianceCacheParameters = RadianceCacheParameters;

        // Render the radiance cache.
        RenderRadianceCache(
            GraphBuilder, 
            TracingInputs, 
            RadianceCacheInputs, 
            Scene,
            View, 
            &ScreenProbeParameters, 
            BRDFProbabilityDensityFunctionSH, 
            FMarkUsedRadianceCacheProbes::CreateStatic(&ScreenGatherMarkUsedProbes), 
            &MarkUsedProbesData, 
            View.ViewState->RadianceCacheState, 
            RadianceCacheParameters);
    }

    if (LumenScreenProbeGather::UseImportanceSampling(View))
    {
        // Generate the importance-sampled rays.
        GenerateImportanceSamplingRays(
            GraphBuilder,
            View,
            SceneTextures,
            RadianceCacheParameters,
            BRDFProbabilityDensityFunction,
            BRDFProbabilityDensityFunctionSH,
            ScreenProbeParameters);
    }

    const FIntPoint ScreenProbeTraceBufferSize = ScreenProbeParameters.ScreenProbeAtlasBufferSize * ScreenProbeParameters.ScreenProbeTracingOctahedronResolution;
    FRDGTextureDesc TraceRadianceDesc(FRDGTextureDesc::Create2D(ScreenProbeTraceBufferSize, PF_FloatRGB, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV));
    ScreenProbeParameters.TraceRadiance = GraphBuilder.CreateTexture(TraceRadianceDesc, TEXT("Lumen.ScreenProbeGather.TraceRadiance"));
    ScreenProbeParameters.RWTraceRadiance = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ScreenProbeParameters.TraceRadiance));

    FRDGTextureDesc TraceHitDesc(FRDGTextureDesc::Create2D(ScreenProbeTraceBufferSize, PF_R32_UINT, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV));
    ScreenProbeParameters.TraceHit = GraphBuilder.CreateTexture(TraceHitDesc, TEXT("Lumen.ScreenProbeGather.TraceHit"));
    ScreenProbeParameters.RWTraceHit = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ScreenProbeParameters.TraceHit));

    // Trace the screen-space probes.
    TraceScreenProbes(
        GraphBuilder, 
        Scene,
        View, 
        GLumenGatherCvars.TraceMeshSDFs != 0 && Lumen::UseMeshSDFTracing(),
        SceneTextures.UniformBuffer,
        PrevSceneColorMip,
        LightingChannelsTexture,
        TracingInputs,
        RadianceCacheParameters,
        ScreenProbeParameters,
        MeshSDFGridParameters);
    
    FScreenProbeGatherParameters GatherParameters;
    // Filter the screen-space probes.
    FilterScreenProbes(GraphBuilder, View, ScreenProbeParameters, GatherParameters);

    FScreenSpaceBentNormalParameters ScreenSpaceBentNormalParameters;
    ScreenSpaceBentNormalParameters.UseScreenBentNormal = 0;
    ScreenSpaceBentNormalParameters.ScreenBentNormal = SystemTextures.Black;
    ScreenSpaceBentNormalParameters.ScreenDiffuseLighting = SystemTextures.Black;

    // Compute the screen-space bent normal.
    if (LumenScreenProbeGather::UseScreenSpaceBentNormal())
    {
        ScreenSpaceBentNormalParameters = ComputeScreenSpaceBentNormal(GraphBuilder, Scene, View, SceneTextures, LightingChannelsTexture, ScreenProbeParameters);
    }

    FRDGTextureDesc DiffuseIndirectDesc = FRDGTextureDesc::Create2D(SceneTextures.Config.Extent, PF_FloatRGBA, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV);
    FRDGTextureRef DiffuseIndirect = GraphBuilder.CreateTexture(DiffuseIndirectDesc, TEXT("Lumen.ScreenProbeGather.DiffuseIndirect"));

    FRDGTextureDesc RoughSpecularIndirectDesc = FRDGTextureDesc::Create2D(SceneTextures.Config.Extent, PF_FloatRGB, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV);
    FRDGTextureRef RoughSpecularIndirect = GraphBuilder.CreateTexture(RoughSpecularIndirectDesc, TEXT("Lumen.ScreenProbeGather.RoughSpecularIndirect"));

    {
        FScreenProbeIndirectCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenProbeIndirectCS::FParameters>();
        PassParameters->RWDiffuseIndirect = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(DiffuseIndirect));
        PassParameters->RWRoughSpecularIndirect = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(RoughSpecularIndirect));
        PassParameters->GatherParameters = GatherParameters;
        PassParameters->ScreenProbeParameters = ScreenProbeParameters;
        PassParameters->View = View.ViewUniformBuffer;
        PassParameters->SceneTexturesStruct = SceneTextures.UniformBuffer;
        PassParameters->FullResolutionJitterWidth = GLumenScreenProbeFullResolutionJitterWidth;
        extern float GLumenReflectionMaxRoughnessToTrace;
        extern float GLumenReflectionRoughnessFadeLength;
        PassParameters->MaxRoughnessToTrace = GLumenReflectionMaxRoughnessToTrace;
        PassParameters->RoughnessFadeLength = GLumenReflectionRoughnessFadeLength;
        PassParameters->ScreenSpaceBentNormalParameters = ScreenSpaceBentNormalParameters;

        FScreenProbeIndirectCS::FPermutationDomain PermutationVector;
        PermutationVector.Set< FScreenProbeIndirectCS::FDiffuseIntegralMethod >(LumenScreenProbeGather::GetDiffuseIntegralMethod());
        auto ComputeShader = View.ShaderMap->GetShader<FScreenProbeIndirectCS>(PermutationVector);

        // Compute indirect lighting from the screen-space probes.
        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("ComputeIndirect %ux%u", View.ViewRect.Width(), View.ViewRect.Height()),
            ComputeShader,
            PassParameters,
            FComputeShaderUtils::GetGroupCount(View.ViewRect.Size(), FScreenProbeIndirectCS::GetGroupSize()));
    }

    FSSDSignalTextures DenoiserOutputs;
    DenoiserOutputs.Textures[0] = DiffuseIndirect;
    DenoiserOutputs.Textures[1] = RoughSpecularIndirect;
    bLumenUseDenoiserComposite = false;

    // Temporal filtering of the screen-space probe results.
    if (GLumenScreenProbeTemporalFilter)
    {
        if (GLumenScreenProbeUseHistoryNeighborhoodClamp)
        {
            FRDGTextureRef CompressedDepthTexture;
            FRDGTextureRef CompressedShadingModelTexture;
            {
                FRDGTextureDesc Desc = FRDGTextureDesc::Create2D(
                    SceneTextures.Depth.Resolve->Desc.Extent,
                    PF_R16F,
                    FClearValueBinding::None,                    
                    /* InTargetableFlags = */ TexCreate_ShaderResource | TexCreate_UAV);

                CompressedDepthTexture = GraphBuilder.CreateTexture(Desc, TEXT("Lumen.ScreenProbeGather.CompressedDepth"));

                Desc.Format = PF_R8_UINT;
                CompressedShadingModelTexture = GraphBuilder.CreateTexture(Desc, TEXT("Lumen.ScreenProbeGather.CompressedShadingModelID"));
            }

            {
                FGenerateCompressedGBuffer::FParameters* PassParameters = GraphBuilder.AllocParameters<FGenerateCompressedGBuffer::FParameters>();
                PassParameters->RWCompressedDepthBufferOutput = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(CompressedDepthTexture));
                PassParameters->RWCompressedShadingModelOutput = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(CompressedShadingModelTexture));
                PassParameters->View = View.ViewUniformBuffer;
                PassParameters->SceneTextures = SceneTextureParameters;

                auto ComputeShader = View.ShaderMap->GetShader<FGenerateCompressedGBuffer>(0);

                FComputeShaderUtils::AddPass(
                    GraphBuilder,
                    RDG_EVENT_NAME("GenerateCompressedGBuffer"),
                    ComputeShader,
                    PassParameters,
                    FComputeShaderUtils::GetGroupCount(View.ViewRect.Size(), FGenerateCompressedGBuffer::GetGroupSize()));
            }

            FSSDSignalTextures ScreenSpaceDenoiserInputs;
            ScreenSpaceDenoiserInputs.Textures[0] = DiffuseIndirect;
            ScreenSpaceDenoiserInputs.Textures[1] = RoughSpecularIndirect;

            DenoiserOutputs = IScreenSpaceDenoiser::DenoiseIndirectProbeHierarchy(
                GraphBuilder,
                View, 
                PreviousViewInfos,
                SceneTextureParameters,
                ScreenSpaceDenoiserInputs,
                CompressedDepthTexture,
                CompressedShadingModelTexture);

            bLumenUseDenoiserComposite = true;
        }
        else
        {
            UpdateHistoryScreenProbeGather(
                GraphBuilder,
                View,
                SceneTextures,
                DiffuseIndirect,
                RoughSpecularIndirect);

            DenoiserOutputs.Textures[0] = DiffuseIndirect;
            DenoiserOutputs.Textures[1] = RoughSpecularIndirect;
        }
    }

    return DenoiserOutputs;
}
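
The probe grid sizes computed near the top of RenderLumenScreenProbeGather are easier to follow with concrete numbers. The sketch below repeats the same integer arithmetic outside the engine. `IntPoint`, `ProbeLayout` and `ComputeProbeLayout` are hypothetical stand-ins, and the sample values (a 1920x1080 view, a downsample factor of 16, an adaptive allocation fraction of 0.5) are assumptions chosen for illustration, not values read from the engine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for FIntPoint.
struct IntPoint { int32_t x, y; };

// Integer division rounding up, applied per component like FIntPoint::DivideAndRoundUp.
inline int32_t DivideAndRoundUp(int32_t value, int32_t divisor)
{
    assert(divisor > 0);
    return (value + divisor - 1) / divisor;
}

struct ProbeLayout
{
    IntPoint viewSize;             // uniform probes per axis (ScreenProbeViewSize)
    IntPoint atlasViewSize;        // uniform grid plus extra rows for adaptive probes
    uint32_t numUniformProbes;     // viewSize.x * viewSize.y
    uint32_t maxNumAdaptiveProbes; // budget for adaptively placed probes
};

ProbeLayout ComputeProbeLayout(IntPoint viewRectSize, int32_t downsampleFactor, float adaptiveFraction)
{
    ProbeLayout layout{};
    layout.viewSize = IntPoint{
        DivideAndRoundUp(viewRectSize.x, downsampleFactor),
        DivideAndRoundUp(viewRectSize.y, downsampleFactor) };

    // Adaptive probes are appended as extra rows below the uniform grid.
    layout.atlasViewSize = layout.viewSize;
    layout.atlasViewSize.y += static_cast<int32_t>(layout.viewSize.y * adaptiveFraction); // truncates

    layout.numUniformProbes = static_cast<uint32_t>(layout.viewSize.x) * static_cast<uint32_t>(layout.viewSize.y);
    layout.maxNumAdaptiveProbes = static_cast<uint32_t>(layout.numUniformProbes * adaptiveFraction); // truncates
    return layout;
}
```

With the assumed inputs, `ComputeProbeLayout({1920, 1080}, 16, 0.5f)` yields a 120x68 uniform grid, an atlas view of 120x102, 8160 uniform probes and room for 4080 adaptive probes.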

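Combining the source code with RenderDoc frame captures shows that the screen-space probe gather stage is extremely complex. The main steps of the regular flow are: uniform and then adaptive probe placement, computing the BRDF, rendering the radiance cache, computing the lighting PDF, generating the sample rays, tracing the screen-space probes, compacting the trace results, tracing voxels, compositing the trace results, filtering the gathered radiance, handling the bent normal, computing indirect lighting, and updating the history data.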

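These steps are too numerous to cover in full, so only a few important ones are analyzed below with the help of the frame capture data.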
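
Before moving on, the adaptive placement loop in the listing above is worth a closer look. It is a `do`/`while` that halves the placement downsample factor before each pass and stops once the factor is no longer greater than the clamped minimum. The sketch below reproduces only that control flow, to show which factors receive a pass. `AdaptivePlacementFactors` is a hypothetical helper, and the sample values used afterwards are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the downsample factor of each adaptive placement pass, in the
// order the passes would be added to the render graph.
std::vector<uint32_t> AdaptivePlacementFactors(
    uint32_t uniformFactor,     // ScreenProbeDownsampleFactor
    uint32_t minFactor,         // clamped minimum factor (1..64 in the listing)
    uint32_t maxAdaptiveProbes) // adaptive probe budget
{
    assert(minFactor >= 1);
    std::vector<uint32_t> factors;

    // Same guard as the listing: skip adaptive placement entirely when there
    // is no budget or the minimum is not finer than the uniform grid.
    if (maxAdaptiveProbes > 0 && minFactor < uniformFactor)
    {
        uint32_t factor = uniformFactor;
        do
        {
            factor /= 2;               // halve first, then run a pass
            factors.push_back(factor);
        }
        while (factor > minFactor);
    }
    return factors;
}
```

Assuming a uniform factor of 16 and a minimum of 4, two adaptive passes run, at factors 8 and 4. If the minimum equals the uniform factor, or the adaptive budget is zero, no pass runs and the listing clears the adaptive buffers instead.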

  • RadianceCache

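The radiance cache (RadianceCache) is itself a very complex process. It goes through clearing, marking, updating and allocating probes, setting up the draw arguments, tracing the probes, filtering the probe radiance, and other stages.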

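The most important part of RadianceCache is the probe tracing pass (TraceFromProbes in the capture). Its inputs include the global distance field, VoxelLighting and other textures.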
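
To give a feel for what tracing against a distance field involves, here is a generic sphere-tracing sketch. It is not Lumen's cone tracer: an analytic sphere SDF stands in for the global distance field, and `Vec3`, `SphereSDF`, `SphereTrace`, the step limit and the hit threshold are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type for this illustration.
struct Vec3 { float x, y, z; };

inline Vec3 PointAlongRay(Vec3 origin, Vec3 dir, float t)
{
    return Vec3{ origin.x + dir.x * t, origin.y + dir.y * t, origin.z + dir.z * t };
}

// Analytic signed distance to a sphere; a stand-in for sampling a distance field texture.
inline float SphereSDF(Vec3 p, Vec3 center, float radius)
{
    const float dx = p.x - center.x, dy = p.y - center.y, dz = p.z - center.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - radius;
}

// Marches along the ray, stepping by the distance the field guarantees is empty.
// Returns the hit distance, or a negative value if nothing is hit before maxT.
float SphereTrace(Vec3 origin, Vec3 dir, float minT, float maxT, Vec3 center, float radius)
{
    assert(minT >= 0.0f && maxT > minT);
    const float hitThreshold = 1e-3f; // assumed surface tolerance
    float t = minT;
    for (int step = 0; step < 64 && t < maxT; ++step)
    {
        const float d = SphereSDF(PointAlongRay(origin, dir, t), center, radius);
        if (d < hitThreshold)
        {
            return t; // close enough to the surface: report a hit
        }
        t += d; // safe step: no surface is nearer than d
    }
    return -1.0f;
}
```

A ray fired from the origin along +Z at a unit sphere centered at (0, 0, 5) reports a hit at a distance of about 4, while a ray along +X leaves the trace range without hitting anything.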

The output is a 4096x4096 radiance probe atlas plus depth:

The probe atlas output by TraceFromProbes (zoomed in on a region).
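
Two pieces of index arithmetic in the TraceFromProbes shader listed next decide where each probe lands in this atlas and how finely it is traced. The sketch below restates them in C++. The function names are hypothetical, and the sample values (a probe resolution of 32 and 128 probes per atlas row, which would fill a 4096-texel-wide atlas) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for HLSL's uint2.
struct UInt2 { uint32_t x, y; };

// Per-tile trace resolution: half the stored probe resolution at level 0,
// doubling with each trace tile level.
uint32_t TraceResolutionForLevel(uint32_t radianceProbeResolution, uint32_t traceTileLevel)
{
    return (radianceProbeResolution / 2) << traceTileLevel;
}

// Top-left texel of a probe inside the atlas: probes are laid out row by row,
// each occupying a probeResolution x probeResolution square.
UInt2 ProbeAtlasBaseCoord(uint32_t probeIndex, uint32_t probeResolution, uint32_t atlasWidthInProbes)
{
    assert(atlasWidthInProbes > 0);
    return UInt2{
        probeResolution * (probeIndex % atlasWidthInProbes),
        probeResolution * (probeIndex / atlasWidthInProbes) };
}
```

With a probe resolution of 32, level 0 traces at 16x16, so the shader takes its upsample branch and writes each traced texel to a 2x2 block. Level 1 matches the stored resolution. Level 2 traces at 64x64 and goes through the downsample branch, where 2x2 traced texels are accumulated per atlas texel. Under the assumed layout, probe 130 starts at atlas texel (64, 32).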

The compute shader used by TraceFromProbes is as follows:

// Engine\Shaders\Private\Lumen\LumenRadianceCache.usf

groupshared float3 SharedTraceRadiance[THREADGROUP_SIZE][THREADGROUP_SIZE];
groupshared float SharedTraceHitDistance[THREADGROUP_SIZE][THREADGROUP_SIZE];

[numthreads(THREADGROUP_SIZE, THREADGROUP_SIZE, 1)]
void TraceFromProbesCS(
    uint3 GroupId : SV_GroupID,
    uint2 GroupThreadId : SV_GroupThreadID)
{
    uint TraceTileIndex = GroupId.y * TRACE_TILE_GROUP_STRIDE + GroupId.x;

    if (TraceTileIndex < ProbeTraceTileAllocator[0])
    {
        uint2 TraceTileCoord;
        uint TraceTileLevel;
        uint ProbeTraceIndex;
        // Unpack the trace tile info.
        UnpackTraceTileInfo(ProbeTraceTileData[TraceTileIndex], TraceTileCoord, TraceTileLevel, ProbeTraceIndex);

        uint TraceResolution = (RadianceProbeResolution / 2) << TraceTileLevel;
        // Probe texel coordinate.
        uint2 ProbeTexelCoord = TraceTileCoord * THREADGROUP_SIZE + GroupThreadId.xy;


        float3 ProbeWorldCenter;
        uint ClipmapIndex;
        uint ProbeIndex;
        // Fetch the probe's trace data.
        GetProbeTraceData(ProbeTraceIndex, ProbeWorldCenter, ClipmapIndex, ProbeIndex);

        if (all(ProbeTexelCoord < TraceResolution))
        {
            float2 ProbeTexelCenter = float2(0.5, 0.5);
            float2 ProbeUV = (ProbeTexelCoord + ProbeTexelCenter) / float(TraceResolution);
            float3 WorldConeDirection = OctahedralMapToDirection(ProbeUV);

            float FinalMinTraceDistance = max(MinTraceDistance, GetRadianceProbeTMin(ClipmapIndex));
            float FinalMaxTraceDistance = MaxTraceDistance;
            float EffectiveStepFactor = StepFactor;

            // Distribute the sphere's solid angle evenly across all cones, rather than following the octahedral distortion.
            float ConeHalfAngle = acosFast(1.0f - 1.0f / (float)(TraceResolution * TraceResolution));

            // Set up the cone trace input.
            FConeTraceInput TraceInput;
            TraceInput.Setup(
                ProbeWorldCenter, WorldConeDirection,
                ConeHalfAngle, MinSampleRadius,
                FinalMinTraceDistance, FinalMaxTraceDistance,
                EffectiveStepFactor);
            TraceInput.VoxelStepFactor = VoxelStepFactor;

            bool bContinueCardTracing = false;

            TraceInput.VoxelTraceStartDistance = CalculateVoxelTraceStartDistance(FinalMinTraceDistance, FinalMaxTraceDistance, MaxMeshSDFTraceDistance, bContinueCardTracing);

            // Run the cone trace for this probe texel.
            FConeTraceResult TraceResult = TraceForProbeTexel(TraceInput);

            // Store the traced lighting.
            SharedTraceRadiance[GroupThreadId.y][GroupThreadId.x] = TraceResult.Lighting;

            // Store the traced hit distance.
            #if RADIANCE_CACHE_STORE_DEPTHS
                SharedTraceHitDistance[GroupThreadId.y][GroupThreadId.x] = TraceResult.OpaqueHitDistance;
            #endif
        }

        GroupMemoryBarrierWithGroupSync();

        uint2 ProbeAtlasBaseCoord = RadianceProbeResolution * uint2(ProbeIndex % ProbeAtlasResolutionInProbes.x, ProbeIndex / ProbeAtlasResolutionInProbes.x);

        // Write the lighting and the hit distance to the probe atlas.
        if (TraceResolution < RadianceProbeResolution)
        {
            uint UpsampleFactor = RadianceProbeResolution / TraceResolution;
            ProbeAtlasBaseCoord += (THREADGROUP_SIZE * TraceTileCoord + GroupThreadId.xy) * UpsampleFactor;

            float3 Lighting = SharedTraceRadiance[GroupThreadId.y][GroupThreadId.x];

            for (uint Y = 0; Y < UpsampleFactor; Y++)
            {
                for (uint X = 0; X < UpsampleFactor; X++)
                {
                    RWRadianceProbeAtlasTexture[ProbeAtlasBaseCoord + uint2(X, Y)] = Lighting;
                }
            }

            #if RADIANCE_CACHE_STORE_DEPTHS
                float HitDistance = min(SharedTraceHitDistance[GroupThreadId.y][GroupThreadId.x], MaxHalfFloat);

                for (uint Y = 0; Y < UpsampleFactor; Y++)
                {
                    for (uint X = 0; X < UpsampleFactor; X++)
                    {
                        RWDepthProbeAtlasTexture[ProbeAtlasBaseCoord + uint2(X, Y)] = HitDistance;
                    }
                }
            #endif
        }
        else
        {
            uint DownsampleFactor = TraceResolution / RadianceProbeResolution;
            uint WriteTileSize = THREADGROUP_SIZE / DownsampleFactor;

            if (all(GroupThreadId.xy < WriteTileSize))
            {
                float3 Lighting = 0;

                for (uint Y = 0; Y < DownsampleFactor; Y++)
                {
                    for (uint X = 0; X < DownsampleFactor; X++)
                    {
                        Lighting += SharedTraceRadiance[GroupThreadId.y * DownsampleFactor + Y][GroupThreadId.x * DownsampleFactor + X];
                    }
                }

                ProbeAtlasBaseCoord += WriteTileSize * TraceTileCoord + GroupThreadId.xy;
                RWRadianceProbeAtlasTexture[ProbeAtlasBaseCoord] = Lighting / (float)(DownsampleFactor * DownsampleFactor);

                #if RADIANCE_CACHE_STORE_DEPTHS
                    float HitDistance = MaxHalfFloat;

                    for (uint Y = 0; Y < DownsampleFactor; Y++)
                    {
                        for (uint X = 0; X < DownsampleFactor; X++)
                        {
                            HitDistance = min(HitDistance, SharedTraceHitDistance[GroupThreadId.y * DownsampleFactor + Y][GroupThreadId.x * DownsampleFactor + X]);
                        }
                    }

                    RWDepthProbeAtlasTexture[ProbeAtlasBaseCoord] = HitDistance;
                #endif
            }
        }
    }
}
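
The downsampling branch at the end of the shader above can be illustrated on the CPU. The following is a minimal C++ sketch, not engine code: the tile size of 8 (standing in for THREADGROUP_SIZE, whose value is not shown in the listing) and the names `TraceTile`, `ResolvedTexel`, and `ResolveTexel` are assumptions made for illustration, and radiance is reduced to a single channel. Each output texel averages a DownsampleFactor x DownsampleFactor block of traced radiance and keeps the smallest hit distance in that block.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical tile size standing in for THREADGROUP_SIZE (an assumption).
constexpr unsigned kTileSize = 8;

struct TraceTile
{
    // Per-thread results, mirroring SharedTraceRadiance / SharedTraceHitDistance.
    std::array<std::array<float, kTileSize>, kTileSize> Radiance{};
    std::array<std::array<float, kTileSize>, kTileSize> HitDistance{};
};

struct ResolvedTexel
{
    float Radiance;
    float HitDistance;
};

// Resolve one output texel (X, Y) of the downsampled tile.
// Radiance is box-filtered; the hit distance keeps the closest hit in the block.
ResolvedTexel ResolveTexel(const TraceTile& Tile, unsigned X, unsigned Y,
                           unsigned DownsampleFactor, float MaxDistance)
{
    assert(DownsampleFactor > 0);
    assert((X + 1) * DownsampleFactor <= kTileSize);
    assert((Y + 1) * DownsampleFactor <= kTileSize);

    float Sum = 0.0f;
    float MinHit = MaxDistance;
    for (unsigned Dy = 0; Dy < DownsampleFactor; ++Dy)
    {
        for (unsigned Dx = 0; Dx < DownsampleFactor; ++Dx)
        {
            const unsigned SrcY = Y * DownsampleFactor + Dy;
            const unsigned SrcX = X * DownsampleFactor + Dx;
            Sum += Tile.Radiance[SrcY][SrcX];
            MinHit = std::min(MinHit, Tile.HitDistance[SrcY][SrcX]);
        }
    }
    return { Sum / float(DownsampleFactor * DownsampleFactor), MinHit };
}
```

Taking the minimum rather than the average for the hit distance keeps the stored depth conservative: a coarse texel reports the nearest occluder found by any of the finer traces it covers.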

Next, step into TraceForProbeTexel to analyze the tracing call stack for a probe texel:

FConeTraceResult TraceForProbeTexel(FConeTraceInput TraceInput)
{
    // Initialize the trace result struct.
    FConeTraceResult TraceResult;
    TraceResult = (FConeTraceResult)0;
    TraceResult.Lighting = 0.0;
    TraceResult.Transparency = 1.0;
    TraceResult.OpaqueHitDistance = TraceInput.MaxTraceDistance;

    // Cone-trace the Lumen scene voxels (analyzed below).
    ConeTraceLumenSceneVoxels(TraceInput, TraceResult);

    // Trace the distant scene.
#if TRACE_DISTANT_SCENE
    if (TraceResult.Transparency > .01f)
    {
        FConeTraceResult DistantTraceResult;
        // Cone-trace the Lumen distant scene (analyzed below).
        ConeTraceLumenDistantScene(TraceInput, DistantTraceResult);
        TraceResult.Lighting += DistantTraceResult.Lighting * TraceResult.Transparency;
        TraceResult.Transparency *= DistantTraceResult.Transparency;
    }
#endif

    // Apply the sky light.
#if ENABLE_DYNAMIC_SKY_LIGHT
    if (ReflectionStruct.SkyLightParameters.y > 0)
    {
        float SkyAverageBrightness = 1.0f;
        float Roughness = TanConeAngleToRoughness(tan(TraceInput.ConeAngle));

        TraceResult.Lighting = TraceResult.Lighting + GetSkyLightReflection(TraceInput.ConeDirection, Roughness, SkyAverageBrightness) * TraceResult.Transparency;
    }
#endif

    return TraceResult;
}

// Cone-trace the Lumen scene voxels.
void ConeTraceLumenSceneVoxels(
    FConeTraceInput TraceInput,
    inout FConeTraceResult OutResult)
{
#if SCENE_TRACE_VOXELS
    if (TraceInput.VoxelTraceStartDistance < TraceInput.MaxTraceDistance)
    {
        FConeTraceInput VoxelTraceInput = TraceInput;
        VoxelTraceInput.MinTraceDistance = TraceInput.VoxelTraceStartDistance;
        FConeTraceResult VoxelTraceResult;
        // Cone-trace the voxels (already analyzed earlier).
        ConeTraceVoxels(VoxelTraceInput, VoxelTraceResult);

        // Composite using the accumulated transparency.
        #if !VISIBILITY_ONLY_TRACE
            OutResult.Lighting += VoxelTraceResult.Lighting * OutResult.Transparency;
        #endif
        OutResult.Transparency *= VoxelTraceResult.Transparency;
        OutResult.NumSteps += VoxelTraceResult.NumSteps;
        OutResult.OpaqueHitDistance = min(OutResult.OpaqueHitDistance, VoxelTraceResult.OpaqueHitDistance);
    }
#endif
}

// Cone-trace the Lumen distant scene.
void ConeTraceLumenDistantScene(
    FConeTraceInput TraceInput,
    inout FConeTraceResult OutResult)
{
    float3 debug = 0;
    TraceInput.MaxTraceDistance = LumenCardScene.DistantSceneMaxTraceDistance;
    TraceInput.bBlackOutSteepIntersections = true;

    FCardTraceBlendState CardTraceBlendState;
    CardTraceBlendState.Initialize(TraceInput.MaxTraceDistance);

    if (LumenCardScene.NumDistantCards > 0)
    {
        // Derive the minimum trace distance from the last clipmap level.
        if (NumClipmapLevels > 0)
        {
            float3 VoxelLightingCenter = ClipmapWorldCenter[NumClipmapLevels - 1].xyz;
            float3 VoxelLightingExtent = ClipmapWorldSamplingExtent[NumClipmapLevels - 1].xyz;

            float3 RayEnd = TraceInput.ConeOrigin + TraceInput.ConeDirection * TraceInput.MaxTraceDistance;
            float2 IntersectionTimes = LineBoxIntersect(TraceInput.ConeOrigin, RayEnd, VoxelLightingCenter - VoxelLightingExtent, VoxelLightingCenter + VoxelLightingExtent);

            // If we are starting inside the voxel clipmaps, move the start of the trace past the voxel clipmaps
            if (IntersectionTimes.x < IntersectionTimes.y && IntersectionTimes.x < .001f)
            {
                TraceInput.MinTraceDistance = IntersectionTimes.y * TraceInput.MaxTraceDistance;
            }
        }

        float TraceEndDistance = TraceInput.MinTraceDistance;

        {
            uint ListIndex = 0;
            uint CardIndex = LumenCardScene.DistantCardIndices[ListIndex];

            // Cone-trace a single Lumen card (analyzed below).
            ConeTraceSingleLumenCard(
                TraceInput,
                CardIndex,
                debug,
                TraceEndDistance,
                CardTraceBlendState);
        }
    }

    OutResult = (FConeTraceResult)0;

    // Store the results.
    #if !VISIBILITY_ONLY_TRACE
        OutResult.Lighting = CardTraceBlendState.GetFinalLighting();
    #endif
    OutResult.Transparency = CardTraceBlendState.GetTransparency();
    OutResult.NumSteps = CardTraceBlendState.NumSteps;
    OutResult.NumOverlaps = CardTraceBlendState.NumOverlaps;
    OutResult.OpaqueHitDistance = CardTraceBlendState.OpaqueHitDistance;
    OutResult.Debug = debug;
}

// Cone-trace a single Lumen card.
void ConeTraceSingleLumenCard(
    FConeTraceInput TraceInput,
    uint CardIndex,
    inout float3 Debug,
    inout float OutTraceEndDistance,
    inout FCardTraceBlendState CardTraceBlendState)
{
    // Fetch the card data.
    FLumenCardData LumenCardData = GetLumenCardData(CardIndex);

    // Transform the cone into the card's local space.
    float3 LocalConeOrigin = mul(TraceInput.ConeOrigin - LumenCardData.Origin, LumenCardData.WorldToLocalRotation);
    float3 LocalConeDirection = mul(TraceInput.ConeDirection, LumenCardData.WorldToLocalRotation);
    float3 LocalTraceEnd = LocalConeOrigin + LocalConeDirection * TraceInput.MaxTraceDistance;

    // Intersection range of the ray with the card's bounding box.
    float2 IntersectionRange = LineBoxIntersect(LocalConeOrigin, LocalTraceEnd, -LumenCardData.LocalExtent, LumenCardData.LocalExtent);
    IntersectionRange.x = max(IntersectionRange.x, TraceInput.MinTraceDistance / TraceInput.MaxTraceDistance);
    OutTraceEndDistance = IntersectionRange.y * TraceInput.MaxTraceDistance;

    if (IntersectionRange.y > IntersectionRange.x
        && LumenCardData.bVisible)
    {
        {
            // Blend state for this card trace.
            FCardTraceBlendState ConeStepBlendState;
            ConeStepBlendState.Initialize(TraceInput.MaxTraceDistance);

            float StepTime = IntersectionRange.x * TraceInput.MaxTraceDistance;
            float3 SamplePosition = LocalConeOrigin + StepTime * LocalConeDirection;
            float TraceEndDistance = IntersectionRange.y * TraceInput.MaxTraceDistance;

            float IntersectionLength = (IntersectionRange.y - IntersectionRange.x) * TraceInput.MaxTraceDistance;
            float MinStepSize = IntersectionLength / (float)LumenCardScene.MaxConeSteps;

            float PreviousStepTime = StepTime;
            float3 PreviousSamplePosition = SamplePosition;
            // Magic value to prevent linear intersection approximation on first step
            float PreviousHeightfieldZ = -2;

            bool bClampedToEnd = false;
            bool bFoundSurface = false;
            bool bRayAboveSurface = false;
            float IntersectionStepTime = 0;
            float2 IntersectionSamplePositionXY = SamplePosition.xy;
            float IntersectionSlope = 0;

            uint NumStepsPerLoop = 4; // Take 4 samples per loop iteration.
            for (uint StepIndex = 0; StepIndex < LumenCardScene.MaxConeSteps && StepTime < TraceEndDistance; StepIndex += NumStepsPerLoop)
            {
                float SampleRadius = max(TraceInput.ConeStartRadius + TraceInput.TanConeAngle * StepTime, TraceInput.MinSampleRadius);
                float StepSize = max(SampleRadius * TraceInput.StepFactor, MinStepSize);
                float TraceClampDistance = TraceEndDistance - StepSize * .0001f;

                float DepthMip;
                float2 DepthValidRegionScale;
                CalculateMip(SampleRadius, LumenCardData, LumenCardData.LocalExtent, LumenCardData.MaxMip, DepthMip, DepthValidRegionScale);

                // The 4 sample positions.
                float3 SamplePosition1 = LocalConeOrigin + min(StepTime + 0 * StepSize, TraceClampDistance) * LocalConeDirection;
                float3 SamplePosition2 = LocalConeOrigin + min(StepTime + 1 * StepSize, TraceClampDistance) * LocalConeDirection;
                float3 SamplePosition3 = LocalConeOrigin + min(StepTime + 2 * StepSize, TraceClampDistance) * LocalConeDirection;
                float3 SamplePosition4 = LocalConeOrigin + min(StepTime + 3 * StepSize, TraceClampDistance) * LocalConeDirection;

                // The 4 depth atlas UVs.
                float2 DepthAtlasUV1 = CalculateAtlasUV(SamplePosition1.xy, DepthValidRegionScale, LumenCardData);
                float2 DepthAtlasUV2 = CalculateAtlasUV(SamplePosition2.xy, DepthValidRegionScale, LumenCardData);
                float2 DepthAtlasUV3 = CalculateAtlasUV(SamplePosition3.xy, DepthValidRegionScale, LumenCardData);
                float2 DepthAtlasUV4 = CalculateAtlasUV(SamplePosition4.xy, DepthValidRegionScale, LumenCardData);

                // The 4 depth samples.
                float Depth1 = Texture2DSampleLevel(DepthAtlas, TRACING_ATLAS_SAMPLER, DepthAtlasUV1, DepthMip).x;
                float Depth2 = Texture2DSampleLevel(DepthAtlas, TRACING_ATLAS_SAMPLER, DepthAtlasUV2, DepthMip).x;
                float Depth3 = Texture2DSampleLevel(DepthAtlas, TRACING_ATLAS_SAMPLER, DepthAtlasUV3, DepthMip).x;
                float Depth4 = Texture2DSampleLevel(DepthAtlas, TRACING_ATLAS_SAMPLER, DepthAtlasUV4, DepthMip).x;

                // The 4 heightfield Z values.
                float HeightfieldZ1 = LumenCardData.LocalExtent.z - Depth1 * 2 * LumenCardData.LocalExtent.z;
                float HeightfieldZ2 = LumenCardData.LocalExtent.z - Depth2 * 2 * LumenCardData.LocalExtent.z;
                float HeightfieldZ3 = LumenCardData.LocalExtent.z - Depth3 * 2 * LumenCardData.LocalExtent.z;
                float HeightfieldZ4 = LumenCardData.LocalExtent.z - Depth4 * 2 * LumenCardData.LocalExtent.z;

                ConeStepBlendState.RegisterStep(NumStepsPerLoop);

                // Whether each sample lies below the heightfield.
                bool4 HeightfieldHit = bool4(
                    SamplePosition1.z < HeightfieldZ1,
                    SamplePosition2.z < HeightfieldZ2,
                    SamplePosition3.z < HeightfieldZ3,
                    SamplePosition4.z < HeightfieldZ4);

                bool bRayBelowHeightfield = any(HeightfieldHit);
                bool bRayWasAboveSurface = bRayAboveSurface;

                if (!bRayBelowHeightfield)
                {
                    bRayAboveSurface = true;
                }

                // A trace that starts below the heightfield must first rise above it before it can register a hit.
                if (bRayBelowHeightfield && bRayWasAboveSurface)
                {
                    float HeightfieldZ;
                    if (HeightfieldHit.x)
                    {
                        SamplePosition = SamplePosition1;
                        HeightfieldZ = HeightfieldZ1;
                        StepTime = StepTime + 0 * StepSize;
                    }
                    else if (HeightfieldHit.y)
                    {
                        PreviousSamplePosition = SamplePosition1;
                        PreviousHeightfieldZ = HeightfieldZ1;
                        PreviousStepTime = StepTime + 0 * StepSize;

                        SamplePosition = SamplePosition2;
                        HeightfieldZ = HeightfieldZ2;
                        StepTime = StepTime + 1 * StepSize;
                    }
                    else if (HeightfieldHit.z)
                    {
                        PreviousSamplePosition = SamplePosition2;
                        PreviousHeightfieldZ = HeightfieldZ2;
                        PreviousStepTime = StepTime + 1 * StepSize;

                        SamplePosition = SamplePosition3;
                        HeightfieldZ = HeightfieldZ3;
                        StepTime = StepTime + 2 * StepSize;
                    }
                    else
                    {
                        PreviousSamplePosition = SamplePosition3;
                        PreviousHeightfieldZ = HeightfieldZ3;
                        PreviousStepTime = StepTime + 2 * StepSize;

                        SamplePosition = SamplePosition4;
                        HeightfieldZ = HeightfieldZ4;
                        StepTime = StepTime + 3 * StepSize;
                    }

                    StepTime = min(StepTime, TraceClampDistance);

                    if (PreviousHeightfieldZ != -2)
                    {
                        // Solve for the intersection step time by linear interpolation between the two samples.
                        IntersectionStepTime = PreviousStepTime + ((PreviousSamplePosition.z - PreviousHeightfieldZ) * (StepTime - PreviousStepTime)) / (HeightfieldZ - PreviousHeightfieldZ + PreviousSamplePosition.z - SamplePosition.z);

                        float2 LocalPositionSlopeXY = (SamplePosition.xy - PreviousSamplePosition.xy) / (StepTime - PreviousStepTime);
                        IntersectionSamplePositionXY = LocalPositionSlopeXY * (IntersectionStepTime - PreviousStepTime) + PreviousSamplePosition.xy;

                        IntersectionSlope = abs(PreviousHeightfieldZ - HeightfieldZ) / max(length(PreviousSamplePosition.xy - SamplePosition.xy), .0001f);

                        PreviousHeightfieldZ = -2;
                        // A surface was found.
                        bFoundSurface = true;
                    }
                    break;
                }

                PreviousStepTime = StepTime + 3 * StepSize;
                PreviousSamplePosition = SamplePosition4;
                PreviousHeightfieldZ = HeightfieldZ4;
                StepTime += 4 * StepSize;

                if (StepTime >= TraceEndDistance && !bClampedToEnd)
                {
                    bClampedToEnd = true;
                    // Stop the last step just before the intersection end, since the linear approximation needs to step past the surface to detect a hit, without terminating the loop
                    StepTime = TraceClampDistance;
                }
            }

            // If a surface point was found.
            if (bFoundSurface)
            {
                float IntersectionSampleRadius = TraceInput.ConeStartRadius + TraceInput.TanConeAngle * IntersectionStepTime;

                float MaxMip;
                float2 ValidRegionScale;
                CalculateMip(IntersectionSampleRadius, LumenCardData, LumenCardData.LocalExtent, LumenCardData.MaxMip, MaxMip, ValidRegionScale);

                float2 IntersectionAtlasUV = CalculateAtlasUV(IntersectionSamplePositionXY, ValidRegionScale, LumenCardData);

                float DistanceToSurface = 0;
                float ConeIntersectSurface = saturate(DistanceToSurface / IntersectionSampleRadius);
                float ConeVisibility = ConeIntersectSurface;

                float MaxDistanceFade = 1;

                ConeStepBlendState.RegisterOpaqueHit(IntersectionStepTime);
                OutTraceEndDistance = IntersectionStepTime;

                float Opacity = Texture2DSampleLevel(OpacityAtlas, TRACING_ATLAS_SAMPLER, IntersectionAtlasUV, MaxMip).x;
                float ConeOcclusion = (1.0f - ConeVisibility) * Opacity * MaxDistanceFade;

                #if VISIBILITY_ONLY_TRACE
                    float3 StepLighting = 0;
                #else
                    float3 StepLighting = Texture2DSampleLevel(FinalLightingAtlas, TRACING_ATLAS_SAMPLER, IntersectionAtlasUV, MaxMip).rgb;
                #endif
            
                if (TraceInput.bBlackOutSteepIntersections)
                {
                    // Assume steep parts are covered by other cards and fade them out.
                    float SlopeFade = 1 - saturate((IntersectionSlope - 5) / 1.0f);
                    StepLighting = lerp(0, StepLighting, SlopeFade);
                    ConeOcclusion = lerp(0, ConeOcclusion, SlopeFade);
                }

                ConeStepBlendState.AddLighting(StepLighting, ConeOcclusion, IntersectionStepTime);
            }

            CardTraceBlendState.AddCardTrace(ConeStepBlendState);
        }
    }
}
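
The intersection formula in ConeTraceSingleLumenCard is a linear interpolation: between two consecutive samples, both the ray height and the heightfield height are treated as linear in the step time, so their difference crosses zero at a point that can be solved for directly. The following is a minimal C++ sketch of that step and of the steep-slope fade, not engine code: `HeightfieldSample`, `IntersectLinear`, and `SlopeFade` are hypothetical names introduced here, while the constants 5 and 1.0 in the fade come from the listing.

```cpp
#include <cassert>
#include <cmath>

struct HeightfieldSample
{
    float StepTime;      // distance travelled along the ray
    float RayZ;          // ray height in card-local space
    float HeightfieldZ;  // surface height reconstructed from the card depth
};

// Step time at which the ray crosses the heightfield, assuming the ray height
// and the surface height both vary linearly between the two samples.
// Algebraically equal to the IntersectionStepTime expression in the listing.
float IntersectLinear(const HeightfieldSample& Prev, const HeightfieldSample& Curr)
{
    const float PrevGap = Prev.RayZ - Prev.HeightfieldZ; // >= 0: ray above the surface
    const float CurrGap = Curr.RayZ - Curr.HeightfieldZ; // <  0: ray below the surface
    assert(PrevGap >= 0.0f && CurrGap < 0.0f);
    return Prev.StepTime + PrevGap * (Curr.StepTime - Prev.StepTime) / (PrevGap - CurrGap);
}

// Fade applied when bBlackOutSteepIntersections is set:
// 1 for slopes up to 5, falling linearly to 0 at a slope of 6.
float SlopeFade(float IntersectionSlope)
{
    const float T = (IntersectionSlope - 5.0f) / 1.0f;
    return 1.0f - std::fmin(std::fmax(T, 0.0f), 1.0f);
}
```

For example, if the ray is 1 unit above the surface at step time 1 and 1 unit below it at step time 2, the crossing is placed halfway, at step time 1.5.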

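As the code above shows, the RadianceCache stage goes through an intricate rendering process: TraceFromProbes alone first cone-traces the voxel lighting, then the cards of the distant scene, and finally has to account for the sky light.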
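
The way TraceForProbeTexel combines these three sources is front-to-back compositing: each farther layer is weighted by the transparency left over from the nearer ones. The following is a minimal C++ sketch under simplifying assumptions, not engine code: lighting is a single channel, the compile-time switches (TRACE_DISTANT_SCENE, ENABLE_DYNAMIC_SKY_LIGHT) are treated as always on, and `ConeResult`, `CompositeBehind`, and `TraceProbeTexelSketch` are hypothetical names.

```cpp
#include <cassert>

struct ConeResult
{
    float Lighting;      // single channel for brevity
    float Transparency;  // 1 = nothing hit yet, 0 = fully occluded
};

// Composite a farther layer behind what has been accumulated so far.
void CompositeBehind(ConeResult& Accum, const ConeResult& Layer)
{
    Accum.Lighting += Layer.Lighting * Accum.Transparency;
    Accum.Transparency *= Layer.Transparency;
}

// Voxels first, then the distant scene, then the sky light.
ConeResult TraceProbeTexelSketch(const ConeResult& Voxels,
                                 const ConeResult& DistantScene,
                                 float SkyRadiance)
{
    ConeResult Result{0.0f, 1.0f};
    CompositeBehind(Result, Voxels);

    // The distant scene is skipped once the cone is almost fully occluded.
    if (Result.Transparency > 0.01f)
    {
        CompositeBehind(Result, DistantScene);
    }

    // The sky only contributes through whatever transparency remains.
    Result.Lighting += SkyRadiance * Result.Transparency;
    return Result;
}
```

With voxels returning lighting 2 at transparency 0.5, a distant scene returning lighting 4 at transparency 0.5, and a sky radiance of 8, the result is 2 + 4 * 0.5 + 8 * 0.25 = 6.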

  • TraceScreenProbes

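TraceScreenProbes covers tracing the screen probes against the screen, the mesh distance fields, the voxel lighting, and so on. The code is as follows: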

// Engine\Source\Runtime\Renderer\Private\Lumen\LumenScreenProbeTracing.cpp

void TraceScreenProbes(
    FRDGBuilder& GraphBuilder, 
    const FScene* Scene,
    const FViewInfo& View, 
    bool bTraceMeshSDFs,
    TRDGUniformBufferRef<FSceneTextureUniformParameters> SceneTexturesUniformBuffer,
    const ScreenSpaceRayTracing::FPrevSceneColorMip& PrevSceneColor,
    FRDGTextureRef LightingChannelsTexture,
    const FLumenCardTracingInputs& TracingInputs,
    const LumenRadianceCache::FRadianceCacheInterpolationParameters& RadianceCacheParameters,
    FScreenProbeParameters& ScreenProbeParameters,
    FLumenMeshSDFGridParameters& MeshSDFGridParameters)
{
    const FSceneTextureParameters SceneTextures = GetSceneTextureParameters(GraphBuilder, SceneTexturesUniformBuffer);

    // Clear the traces.
    {
        FClearTracesCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FClearTracesCS::FParameters>();
        PassParameters->ScreenProbeParameters = ScreenProbeParameters;

        auto ComputeShader = View.ShaderMap->GetShader<FClearTracesCS>(0);

        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("ClearTraces %ux%u", ScreenProbeParameters.ScreenProbeTracingOctahedronResolution, ScreenProbeParameters.ScreenProbeTracingOctahedronResolution),
            ComputeShader,
            PassParameters,
            ScreenProbeParameters.ProbeIndirectArgs,
            (uint32)EScreenProbeIndirectArgs::ThreadPerTrace * sizeof(FRHIDispatchIndirectParameters));
    }

    FLumenIndirectTracingParameters IndirectTracingParameters;
    SetupLumenDiffuseTracingParameters(IndirectTracingParameters);

    const bool bTraceScreen = View.PrevViewInfo.ScreenSpaceRayTracingInput.IsValid() 
        && GLumenScreenProbeGatherScreenTraces != 0
        && !View.Family->EngineShowFlags.VisualizeLumenIndirectDiffuse;

    // Trace the screen probes in screen space.
    if (bTraceScreen)
    {
        FScreenProbeTraceScreenTexturesCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenProbeTraceScreenTexturesCS::FParameters>();

        ScreenSpaceRayTracing::SetupCommonScreenSpaceRayParameters(GraphBuilder, SceneTextures, PrevSceneColor, View, /* out */ &PassParameters->ScreenSpaceRayParameters);

        PassParameters->ScreenSpaceRayParameters.CommonDiffuseParameters.SceneTextures = SceneTextures;

        {
            const FVector2D HZBUvFactor(
                float(View.ViewRect.Width()) / float(2 * View.HZBMipmap0Size.X),
                float(View.ViewRect.Height()) / float(2 * View.HZBMipmap0Size.Y));

            const FVector4 ScreenPositionScaleBias = View.GetScreenPositionScaleBias(SceneTextures.SceneDepthTexture->Desc.Extent, View.ViewRect);
            const FVector2D HZBUVToScreenUVScale = FVector2D(1.0f / HZBUvFactor.X, 1.0f / HZBUvFactor.Y) * FVector2D(2.0f, -2.0f) * FVector2D(ScreenPositionScaleBias.X, ScreenPositionScaleBias.Y);
            const FVector2D HZBUVToScreenUVBias = FVector2D(-1.0f, 1.0f) * FVector2D(ScreenPositionScaleBias.X, ScreenPositionScaleBias.Y) + FVector2D(ScreenPositionScaleBias.W, ScreenPositionScaleBias.Z);
            PassParameters->HZBUVToScreenUVScaleBias = FVector4(HZBUVToScreenUVScale, HZBUVToScreenUVBias);
        }

        checkf(View.ClosestHZB, TEXT("Lumen screen tracing: ClosestHZB was not setup, should have been setup by FDeferredShadingSceneRenderer::RenderHzb"));
        PassParameters->ClosestHZBTexture = View.ClosestHZB;
        PassParameters->SceneDepthTexture = SceneTextures.SceneDepthTexture;
        PassParameters->LightingChannelsTexture = LightingChannelsTexture;
        PassParameters->HZBBaseTexelSize = FVector2D(1.0f / View.ClosestHZB->Desc.Extent.X, 1.0f / View.ClosestHZB->Desc.Extent.Y);
        PassParameters->MaxHierarchicalScreenTraceIterations = GLumenScreenProbeGatherHierarchicalScreenTracesMaxIterations;
        PassParameters->UncertainTraceRelativeDepthThreshold = GLumenScreenProbeGatherUncertainTraceRelativeDepthThreshold;
        PassParameters->NumThicknessStepsToDetermineCertainty = GLumenScreenProbeGatherNumThicknessStepsToDetermineCertainty;

        PassParameters->ScreenProbeParameters = ScreenProbeParameters;
        PassParameters->IndirectTracingParameters = IndirectTracingParameters;
        PassParameters->RadianceCacheParameters = RadianceCacheParameters;

        FScreenProbeTraceScreenTexturesCS::FPermutationDomain PermutationVector;
        PermutationVector.Set< FScreenProbeTraceScreenTexturesCS::FRadianceCache >(LumenScreenProbeGather::UseRadianceCache(View));
        PermutationVector.Set< FScreenProbeTraceScreenTexturesCS::FHierarchicalScreenTracing >(GLumenScreenProbeGatherHierarchicalScreenTraces != 0);
        PermutationVector.Set< FScreenProbeTraceScreenTexturesCS::FStructuredImportanceSampling >(LumenScreenProbeGather::UseImportanceSampling(View));
        auto ComputeShader = View.ShaderMap->GetShader<FScreenProbeTraceScreenTexturesCS>(PermutationVector);

        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("TraceScreen"),
            ComputeShader,
            PassParameters,
            ScreenProbeParameters.ProbeIndirectArgs,
            (uint32)EScreenProbeIndirectArgs::ThreadPerTrace * sizeof(FRHIDispatchIndirectParameters));
    }

    // Trace the mesh distance fields.
    if (bTraceMeshSDFs)
    {
        // Hardware ray tracing path.
        if (Lumen::UseHardwareRayTracedScreenProbeGather())
        {
            FCompactedTraceParameters CompactedTraceParameters = CompactTraces(
                GraphBuilder,
                View,
                ScreenProbeParameters,
                WORLD_MAX,
                IndirectTracingParameters.MaxTraceDistance);

            RenderHardwareRayTracingScreenProbe(GraphBuilder,
                Scene,
                SceneTextures,
                ScreenProbeParameters,
                View,
                TracingInputs,
                IndirectTracingParameters,
                RadianceCacheParameters,
                CompactedTraceParameters);
        }
        // Software ray tracing path.
        else
        {
            CullForCardTracing(
                GraphBuilder,
                Scene, View,
                TracingInputs,
                IndirectTracingParameters,
                /* out */ MeshSDFGridParameters);

            if (MeshSDFGridParameters.TracingParameters.DistanceFieldObjectBuffers.NumSceneObjects > 0)
            {
                FCompactedTraceParameters CompactedTraceParameters = CompactTraces(
                    GraphBuilder,
                    View,
                    ScreenProbeParameters,
                    IndirectTracingParameters.CardTraceEndDistanceFromCamera,
                    IndirectTracingParameters.MaxMeshSDFTraceDistance);

                {
                    FScreenProbeTraceMeshSDFsCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenProbeTraceMeshSDFsCS::FParameters>();
                    GetLumenCardTracingParameters(View, TracingInputs, PassParameters->TracingParameters);
                    PassParameters->MeshSDFGridParameters = MeshSDFGridParameters;
                    PassParameters->ScreenProbeParameters = ScreenProbeParameters;
                    PassParameters->IndirectTracingParameters = IndirectTracingParameters;
                    PassParameters->SceneTexturesStruct = SceneTexturesUniformBuffer;
                    PassParameters->CompactedTraceParameters = CompactedTraceParameters;

                    FScreenProbeTraceMeshSDFsCS::FPermutationDomain PermutationVector;
                    PermutationVector.Set< FScreenProbeTraceMeshSDFsCS::FStructuredImportanceSampling >(LumenScreenProbeGather::UseImportanceSampling(View));
                    auto ComputeShader = View.ShaderMap->GetShader<FScreenProbeTraceMeshSDFsCS>(PermutationVector);

                    FComputeShaderUtils::AddPass(
                        GraphBuilder,
                        RDG_EVENT_NAME("TraceMeshSDFs"),
                        ComputeShader,
                        PassParameters,
                        CompactedTraceParameters.IndirectArgs,
                        0);
                }
            }
        }
    }

    // Compact the traces.
    FCompactedTraceParameters CompactedTraceParameters = CompactTraces(
        GraphBuilder,
        View,
        ScreenProbeParameters,
        WORLD_MAX,
        // Make sure the shader runs on all misses to apply radiance cache + skylight
        IndirectTracingParameters.MaxTraceDistance + 1);

    // Trace the voxel lighting.
    {
        FScreenProbeTraceVoxelsCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenProbeTraceVoxelsCS::FParameters>();
        PassParameters->RadianceCacheParameters = RadianceCacheParameters;
        GetLumenCardTracingParameters(View, TracingInputs, PassParameters->TracingParameters);
        PassParameters->ScreenProbeParameters = ScreenProbeParameters;
        PassParameters->IndirectTracingParameters = IndirectTracingParameters;
        PassParameters->SceneTexturesStruct = SceneTexturesUniformBuffer;
        PassParameters->CompactedTraceParameters = CompactedTraceParameters;

        const bool bRadianceCache = LumenScreenProbeGather::UseRadianceCache(View);

        FScreenProbeTraceVoxelsCS::FPermutationDomain PermutationVector;
        PermutationVector.Set< FScreenProbeTraceVoxelsCS::FDynamicSkyLight >(Lumen::ShouldHandleSkyLight(Scene, *View.Family));
        PermutationVector.Set< FScreenProbeTraceVoxelsCS::FTraceDistantScene >(Scene->LumenSceneData->DistantCardIndices.Num() > 0);
        PermutationVector.Set< FScreenProbeTraceVoxelsCS::FRadianceCache >(bRadianceCache);
        PermutationVector.Set< FScreenProbeTraceVoxelsCS::FStructuredImportanceSampling >(LumenScreenProbeGather::UseImportanceSampling(View));
        auto ComputeShader = View.ShaderMap->GetShader<FScreenProbeTraceVoxelsCS>(PermutationVector);

        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("TraceVoxels"),
            ComputeShader,
            PassParameters,
            CompactedTraceParameters.IndirectArgs,
            0);
    }

    if (GLumenScreenProbeGatherVisualizeTraces)
    {
        SetupVisualizeTraces(GraphBuilder, Scene, View, ScreenProbeParameters);
    }
}
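
The control flow of TraceScreenProbes boils down to a fixed order of passes with a few conditions. The following is a minimal C++ sketch of that ordering, not engine code: `TraceConfig` and `ListTracePasses` are hypothetical, the flags are simplified stand-ins for the conditions in the function above, and the label "HardwareRayTracing" is a placeholder for whatever RenderHardwareRayTracingScreenProbe dispatches. The other labels are the RDG event names from the listing.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct TraceConfig
{
    bool bTraceScreen;         // previous-frame screen data valid and screen traces enabled
    bool bTraceMeshSDFs;       // mesh SDF tracing requested by the caller
    bool bHardwareRayTracing;  // hardware ray traced screen probe gather in use
    unsigned NumSceneObjects;  // distance field objects left after culling
};

// Returns the passes in the order TraceScreenProbes would add them.
std::vector<std::string> ListTracePasses(const TraceConfig& Config)
{
    std::vector<std::string> Passes;
    Passes.push_back("ClearTraces");

    if (Config.bTraceScreen)
    {
        Passes.push_back("TraceScreen");
    }

    if (Config.bTraceMeshSDFs)
    {
        if (Config.bHardwareRayTracing)
        {
            Passes.push_back("HardwareRayTracing"); // placeholder label
        }
        else if (Config.NumSceneObjects > 0)
        {
            Passes.push_back("TraceMeshSDFs");
        }
    }

    // The voxel pass always runs; per the comment in the listing, it must also
    // run on misses so that the radiance cache and the skylight are applied.
    Passes.push_back("TraceVoxels");
    return Passes;
}
```

The ordering goes from the cheapest and most view-dependent source (the screen) to the coarsest fallback (the voxel lighting), which matches the fallback behavior of screen tracing described in 6.5.1.2.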

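Let us first analyze TraceScreen together with frame-capture data. Its inputs are textures such as BlueNoise, Velocity, depth, probe velocity, ray info, HZB, and SSRReducedSceneColor. Its outputs are the TraceRadiance texture in R11G11B10 format and the TraceHit texture in R32 format: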

Left: TraceRadiance; right: TraceHit.

The compute shader it uses is as follows:

// Engine\Shaders\Private\Lumen\LumenScreenProbeTracing.usf

[numthreads(PROBE_THREADGROUP_SIZE_2D, PROBE_THREADGROUP_SIZE_2D, 1)]
void ScreenProbeTraceScreenTexturesCS(
    uint3 GroupId : SV_GroupID,
    uint3 DispatchThreadId : SV_DispatchThreadID,
    uint3 GroupThreadId : SV_GroupThreadID)
{
#define DEINTERLEAVED_SCREEN_TRACING 1
    // Compute the probe atlas coordinate and the trace texel coordinate.
#if DEINTERLEAVED_SCREEN_TRACING
    uint2 AtlasSizeInProbes = uint2(ScreenProbeAtlasViewSize.x, (GetNumScreenProbes() + ScreenProbeAtlasViewSize.x - 1) / ScreenProbeAtlasViewSize.x);
    uint2 ScreenProbeAtlasCoord = DispatchThreadId.xy % AtlasSizeInProbes;
    uint2 TraceTexelCoord = DispatchThreadId.xy / AtlasSizeInProbes;
#else
    uint2 ScreenProbeAtlasCoord = DispatchThreadId.xy / ScreenProbeTracingOctahedronResolution;
    uint2 TraceTexelCoord = DispatchThreadId.xy - ScreenProbeAtlasCoord * ScreenProbeTracingOctahedronResolution;
#endif

    uint ScreenProbeIndex = ScreenProbeAtlasCoord.y * ScreenProbeAtlasViewSize.x + ScreenProbeAtlasCoord.x;

    uint2 ScreenProbeScreenPosition = GetScreenProbeScreenPosition(ScreenProbeIndex);
    uint2 ScreenTileCoord = GetScreenTileCoord(ScreenProbeScreenPosition);

    if (ScreenProbeIndex < GetNumScreenProbes() && all(TraceTexelCoord < ScreenProbeTracingOctahedronResolution))
    {
        float2 ScreenUV = GetScreenUVFromScreenProbePosition(ScreenProbeScreenPosition);
        float SceneDepth = GetScreenProbeDepth(ScreenProbeAtlasCoord);

        if (SceneDepth > 0.0f)
        {
            float3 WorldPosition = GetWorldPositionFromScreenUV(ScreenUV, SceneDepth);

            float2 ProbeUV;
            float ConeHalfAngle;
            // Get the probe tracing UV.
            GetProbeTracingUV(ScreenProbeAtlasCoord, TraceTexelCoord, GetProbeTexelCenter(ScreenTileCoord), 1, ProbeUV, ConeHalfAngle);

            float3 WorldConeDirection = OctahedralMapToDirection(ProbeUV);

            float DepthThresholdScale = HasDistanceFieldRepresentation(ScreenUV) ? 1.0f : ScreenTraceNoFallbackThicknessScale;

            {
                float TraceDistance = MaxTraceDistance;
                bool bCoveredByRadianceCache = false;
                #if RADIANCE_CACHE
                    float ProbeOcclusionDistance = GetRadianceProbeOcclusionDistanceWithInterpolation(WorldPosition, WorldConeDirection, bCoveredByRadianceCache);
                    TraceDistance = min(TraceDistance, ProbeOcclusionDistance);
                #endif


#if HIERARCHICAL_SCREEN_TRACING // Hierarchical screen tracing

                bool bHit;
                bool bUncertain;
                float3 HitUVz;

                // Screen trace.
                TraceScreen(
                    WorldPosition + View.PreViewTranslation,
                    WorldConeDirection,
                    TraceDistance,
                    HZBUvFactorAndInvFactor,
                    MaxHierarchicalScreenTraceIterations, 
                    UncertainTraceRelativeDepthThreshold * DepthThresholdScale,
                    NumThicknessStepsToDetermineCertainty,
                    bHit,
                    bUncertain,
                    HitUVz);
                
                float Level = 1;
                bool bWriteDepthOnMiss = true;
#else // Non-hierarchical screen tracing
    
                uint NumSteps = 16;
                float StartMipLevel = 1.0f;
                float MaxScreenTraceFraction = .2f;

                // Fixed step count screen traces only give decent quality when the trace distance is limited.
                float MaxWorldTraceDistance = SceneDepth * MaxScreenTraceFraction * 2.0 * GetTanHalfFieldOfView().x;
                TraceDistance = min(TraceDistance, MaxWorldTraceDistance);

                uint2 NoiseCoord = ScreenProbeAtlasCoord * ScreenProbeTracingOctahedronResolution + TraceTexelCoord;
                float StepOffset = InterleavedGradientNoise(NoiseCoord + 0.5f, 0);

                float RayRoughness = .2f;
                StepOffset = StepOffset - .9f;

                FSSRTCastingSettings CastSettings = CreateDefaultCastSettings();
                CastSettings.bStopWhenUncertain = true;

                bool bHit = false;
                float Level;
                float3 HitUVz;
                bool bRayWasClipped;

                // Initialize the screen-space ray from a world-space ray.
                FSSRTRay Ray = InitScreenSpaceRayFromWorldSpace(
                    WorldPosition + View.PreViewTranslation, WorldConeDirection,
                    /* WorldTMax = */ TraceDistance,
                    /* SceneDepth = */ SceneDepth,
                    /* SlopeCompareToleranceScale */ 2.0f * DepthThresholdScale,
                    /* bExtendRayToScreenBorder = */ false,
                    /* out */ bRayWasClipped);

                bool bUncertain;
                float3 DebugOutput;

                // Cast the screen-space ray.
                CastScreenSpaceRay(
                    FurthestHZBTexture, FurthestHZBTextureSampler,
                    StartMipLevel,
                    CastSettings,
                    Ray, RayRoughness, NumSteps, StepOffset,
                    HZBUvFactorAndInvFactor, false,
                    /* out */ DebugOutput,
                    /* out */ HitUVz,
                    /* out */ Level,
                    /* out */ bHit,
                    /* out */ bUncertain);

                // CastScreenSpaceRay skips Mesh SDF tracing in a lot of places where it shouldn't, in particular missing thin occluders due to low NumSteps.  
                bool bWriteDepthOnMiss = !bUncertain;

#endif
                bHit = bHit && !bUncertain;

                uint2 TraceCoord = GetTraceBufferCoord(ScreenProbeAtlasCoord, TraceTexelCoord);
                bool bFastMoving = false;

                // Handle a hit.
                if (bHit)
                {
                    float2 ReducedColorUV = HitUVz.xy * ColorBufferScaleBias.xy + ColorBufferScaleBias.zw;
                    ReducedColorUV = min(ReducedColorUV, ReducedColorUVMax);

                    float3 Lighting = ColorTexture.SampleLevel(ColorTextureSampler, ReducedColorUV, Level).rgb;
                    
                    #if DEBUG_VISUALIZE_TRACE_TYPES
                        RWTraceRadiance[TraceCoord] = float3(.5f, 0, 0) * View.PreExposure;
                    #else
                        RWTraceRadiance[TraceCoord] = Lighting;
                    #endif

                    float3 HitWorldVelocity;
                    {
                        float2 HitScreenUV = HitUVz.xy;
                        float2 HitScreenPosition = (HitScreenUV.xy - View.ScreenPositionScaleBias.wz) / View.ScreenPositionScaleBias.xy;

                        float HitDeviceZ = HitUVz.z;
                        float HitSceneDepth = ConvertFromDeviceZ(HitUVz.z);
                        float3 HitHistoryScreenPosition = GetHistoryScreenPosition(HitScreenPosition, HitScreenUV, HitDeviceZ);

                        float3 HitTranslatedWorldPosition = mul(float4(HitScreenPosition * HitSceneDepth, HitSceneDepth, 1), View.ScreenToTranslatedWorld).xyz;
                        HitWorldVelocity = HitTranslatedWorldPosition - GetPrevTranslatedWorldPosition(HitHistoryScreenPosition);
                    }

                    float ProbeWorldSpeed = ScreenProbeWorldSpeed.Load(int3(ScreenProbeAtlasCoord, 0)).x;
                    float HitWorldSpeed = length(HitWorldVelocity);

                    bFastMoving = abs(ProbeWorldSpeed - HitWorldSpeed) / max(SceneDepth, 100.0f) > RelativeSpeedDifferenceToConsiderLightingMoving;
                }

                // Store the hit distance on a hit, or when depth must be written on a miss.
                if (bHit || bWriteDepthOnMiss)
                {
                    float HitDistance = min(sqrt(ComputeRayHitSqrDistance(WorldPosition + View.PreViewTranslation, HitUVz)), MaxTraceDistance);
                    RWTraceHit[TraceCoord] = EncodeProbeRayDistance(HitDistance, bHit, bFastMoving);
                }
            }
        }
    }
}
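
The distance limit in the fixed-step branch can be read geometrically: `2 * SceneDepth * tan(FOV / 2)` is the world-space width of the view frustum at the probe's depth, so `MaxScreenTraceFraction = .2f` caps the trace at about a fifth of the screen width. The C++ sketch below mirrors only that arithmetic; the function names are hypothetical and are not part of the engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// World-space width of the view frustum at a given depth: 2 * depth * tan(FOV / 2).
float FrustumWidthAtDepth(float SceneDepth, float HalfFovRadians)
{
    assert(SceneDepth >= 0.0f);
    return 2.0f * SceneDepth * std::tan(HalfFovRadians);
}

// Mirrors the shader expression
//   SceneDepth * MaxScreenTraceFraction * 2.0 * GetTanHalfFieldOfView().x
// followed by TraceDistance = min(TraceDistance, MaxWorldTraceDistance).
float LimitScreenTraceDistance(float TraceDistance, float SceneDepth,
                               float HalfFovRadians, float MaxScreenTraceFraction)
{
    const float MaxWorldTraceDistance =
        FrustumWidthAtDepth(SceneDepth, HalfFovRadians) * MaxScreenTraceFraction;
    return std::min(TraceDistance, MaxWorldTraceDistance);
}
```

With a 90 degree horizontal field of view and a probe at depth 1000, a requested trace distance of 10000 is clamped to about 400, while a request of 100 passes through unchanged.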

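Depending on whether HIERARCHICAL_SCREEN_TRACING is enabled, the shader above takes one of two screen tracing paths. The frame capture shows HIERARCHICAL_SCREEN_TRACING set to 1, so execution enters TraceScreen rather than CastScreenSpaceRay. TraceScreen is analyzed next: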

// Engine\Shaders\Private\Lumen\LumenScreenTracing.ush

// Trace in screen space by traversing the HZB. Accurate, but relatively slow.
void TraceScreen(
    float3 RayTranslatedWorldOrigin, 
    float3 RayWorldDirection,
    float MaxWorldTraceDistance,
    float4 HZBUvFactorAndInvFactor,
    float MaxIterations,
    float UncertainTraceRelativeDepthThreshold,
    float NumThicknessStepsToDetermineCertainty,
    inout bool bHit,
    inout bool bUncertain,
    inout float3 OutScreenUV)
{
    // Compute the screen UV of the ray start.
    float3 RayStartScreenUV;
    {
        float4 RayStartClip = mul(float4(RayTranslatedWorldOrigin, 1.0f), View.TranslatedWorldToClip);
        float3 RayStartScreenPosition = RayStartClip.xyz / max(RayStartClip.w, 1.0f);
        RayStartScreenUV = float3((RayStartScreenPosition.xy * float2(0.5f, -0.5f) + 0.5f) * HZBUvFactorAndInvFactor.xy, RayStartScreenPosition.z);
    }
    
    // Compute the screen UV of the ray end.
    float3 RayEndScreenUV;
    {
        float3 ViewRayDirection = mul(float4(RayWorldDirection, 0.0), View.TranslatedWorldToView).xyz;
        float SceneDepth = mul(float4(RayTranslatedWorldOrigin, 1.0f), View.TranslatedWorldToView).z;
        // Clamp the ray to end at the Z == 0 plane so that the end point is valid in NDC space.
        float RayEndWorldDistance = ViewRayDirection.z < 0.0 ? min(-0.99f * SceneDepth / ViewRayDirection.z, MaxWorldTraceDistance) : MaxWorldTraceDistance;

        float3 RayWorldEnd = RayTranslatedWorldOrigin + RayWorldDirection * RayEndWorldDistance;
        float4 RayEndClip = mul(float4(RayWorldEnd, 1.0f), View.TranslatedWorldToClip);
        float3 RayEndScreenPosition = RayEndClip.xyz / RayEndClip.w;
        RayEndScreenUV = float3((RayEndScreenPosition.xy * float2(0.5f, -0.5f) + 0.5f) * HZBUvFactorAndInvFactor.xy, RayEndScreenPosition.z);

        float2 ScreenEdgeIntersections = LineBoxIntersect(RayStartScreenUV, RayEndScreenUV, float3(0, 0, 0), float3(HZBUvFactorAndInvFactor.xy, 1));

        // Recompute the end point where the ray leaves the screen.
        RayEndScreenUV = RayStartScreenUV + (RayEndScreenUV - RayStartScreenUV) * ScreenEdgeIntersections.y;
    }

    float BaseMipLevel = HZB_TRACE_INCLUDE_FULL_RES_DEPTH ? -1 : 0;
    float MipLevel = BaseMipLevel;

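    // Step out of the current tile without hit testing to avoid self-intersection. This is necessary because HZB mip 0 is the closest of 2x2 depths and the HZB is stored as 16-bit float.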
    bool bStepOutOfCurrentTile = true;
    if (bStepOutOfCurrentTile)
    {
        float2 HZBTileSize = exp2(MipLevel) * HZBBaseTexelSize;
        float2 BiasedUV = RayStartScreenUV.xy;
        float3 HZBTileMin = float3(floor(BiasedUV.xy / HZBTileSize) * HZBTileSize, 0.0f);
        float3 HZBTileMax = float3(HZBTileMin.xy + HZBTileSize, 1);
        float2 TileIntersections = LineBoxIntersect(RayStartScreenUV, RayEndScreenUV, HZBTileMin, HZBTileMax);

        {
            float3 RayTileHit = RayStartScreenUV + (RayEndScreenUV - RayStartScreenUV) * TileIntersections.y;
            RayStartScreenUV = RayTileHit;
        }
    }

    bHit = false;
    bUncertain = false;

    float RayLength2D = length(RayEndScreenUV.xy - RayStartScreenUV.xy);
    float2 RayDirectionScreenUV = (RayEndScreenUV.xy - RayStartScreenUV.xy) / max(RayLength2D, .0001f);
    float3 RayScreenUV = RayStartScreenUV;
    float NumIterations = 0;
    
    // Stackless HZB traversal.
    while (MipLevel >= BaseMipLevel && NumIterations < MaxIterations)
    {
        float2 HZBTileSize = exp2(MipLevel) * HZBBaseTexelSize;
        // RayScreenUV is on a tile boundary due to bStepOutOfCurrentTile
        // Offset the UV along the ray direction so it always quantizes to the next tile
        float2 BiasedUV = RayScreenUV.xy + .01f * RayDirectionScreenUV.xy * HZBTileSize;
        float3 HZBTileMin = float3(floor(BiasedUV / HZBTileSize) * HZBTileSize, 0.0f);
        float3 HZBTileMax = float3(HZBTileMin.xy + HZBTileSize, 1);
        float2 TileIntersections = LineBoxIntersect(RayStartScreenUV, RayEndScreenUV, HZBTileMin, HZBTileMax);
        float3 RayTileHit = RayStartScreenUV + (RayEndScreenUV - RayStartScreenUV) * TileIntersections.y;

        float TileZ;
        float AvoidSelfIntersectionZScale = 1.0f;

#if HZB_TRACE_INCLUDE_FULL_RES_DEPTH
        if (MipLevel < 0)
        {
            TileZ = SceneDepthTexture.SampleLevel(GlobalPointClampedSampler, BiasedUV * HZBUVToScreenUVScaleBias.xy + HZBUVToScreenUVScaleBias.zw, 0).x;
        }
        else
#endif
        {
            TileZ = ClosestHZBTexture.SampleLevel(GlobalPointClampedSampler, BiasedUV, MipLevel).x;
            // Heuristic to avoid false self-intersection, because HZB mip 0 is the closest of 2x2 depths and the HZB is stored as 16-bit float.
            AvoidSelfIntersectionZScale = lerp(.99f, 1.0f, saturate(TileIntersections.y * 10.0f));
        }

        if (RayTileHit.z > TileZ * AvoidSelfIntersectionZScale)
        {
            RayScreenUV = RayTileHit;
            MipLevel++;

            if (TileIntersections.y == 1.0f)
            {
                // The ray reached its end without intersecting the HZB; terminate the traversal.
                MipLevel = BaseMipLevel - 1;
            }
        }
        else
        {
            if (abs(MipLevel - BaseMipLevel) < .1f)
            {
                // Snap the hit UV to the texel center for the SceneColor lookup.
                RayScreenUV = float3(.5f * (HZBTileMin.xy + HZBTileMax.xy), RayTileHit.z);
                bHit = true;
                float IntersectionDepth = ConvertFromDeviceZ(TileZ);
                float RayTileEnterZ = RayStartScreenUV.z + (RayEndScreenUV.z - RayStartScreenUV.z) * TileIntersections.x;
                bUncertain = (ConvertFromDeviceZ(RayTileEnterZ) - IntersectionDepth) / max(IntersectionDepth, .00001f) > UncertainTraceRelativeDepthThreshold;
            }

            MipLevel--;
        }

        NumIterations++;
    }

    // Take linear steps of a fixed thickness along the ray to reject hits behind very thin surfaces (grass, hair, foliage).
    if (bHit && !bUncertain && NumThicknessStepsToDetermineCertainty > 0)
    {
        float ThicknessSearchMipLevel = 0.0f;
        float MipNumTexels = exp2(ThicknessSearchMipLevel);
        float2 HZBTileSize = MipNumTexels * HZBBaseTexelSize;
        float NumSteps = NumThicknessStepsToDetermineCertainty / MipNumTexels;
        float ThicknessSearchEndTime = min(length(RayDirectionScreenUV * HZBTileSize * NumSteps) / length(RayEndScreenUV.xy - RayScreenUV.xy), 1.0f);

        for (float I = 0; I < NumSteps; I++)
        {
            float3 SampleUV = RayScreenUV + (I / NumSteps) * ThicknessSearchEndTime * (RayEndScreenUV - RayScreenUV);

            if (all(SampleUV.xy > 0 && SampleUV.xy < HZBUvFactorAndInvFactor.xy))
            {
                float SampleTileZ = ClosestHZBTexture.SampleLevel(GlobalPointClampedSampler, SampleUV.xy, ThicknessSearchMipLevel).x;

                if (SampleUV.z > SampleTileZ)
                {
                    bUncertain = true;
                }
            }
        }
    }

    OutScreenUV.xy = RayScreenUV.xy * HZBUVToScreenUVScaleBias.xy + HZBUVToScreenUVScaleBias.zw;
    OutScreenUV.z = RayScreenUV.z;
}
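
The stackless traversal in TraceScreen is easier to follow in one dimension. The C++ sketch below is a simplified CPU analogue, not the engine code. It assumes a linear depth row where smaller values are closer (the shader compares device Z instead), a power-of-two width, and a ray that moves in +X. It also omits the initial step out of the starting tile. The names `FHZB`, `BuildClosestHZB` and `TraceHZB1D` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using FHZB = std::vector<std::vector<float>>;

// Mips[0] is the input row; Mips[k][i] is the closest (minimum) depth of 2^k texels.
FHZB BuildClosestHZB(const std::vector<float>& Depth)
{
    assert(!Depth.empty() && (Depth.size() & (Depth.size() - 1)) == 0); // power of two
    FHZB Mips;
    Mips.push_back(Depth);
    while (Mips.back().size() > 1)
    {
        const std::vector<float>& Prev = Mips.back();
        std::vector<float> Next(Prev.size() / 2);
        for (std::size_t i = 0; i < Next.size(); ++i)
        {
            Next[i] = std::min(Prev[2 * i], Prev[2 * i + 1]);
        }
        Mips.push_back(std::move(Next));
    }
    return Mips;
}

// Walks a ray from (StartX, StartZ) to (EndX, EndZ) in texel units.
// Returns the index of the first texel hit, or -1 on a miss.
int TraceHZB1D(const FHZB& Mips, float StartX, float StartZ,
               float EndX, float EndZ, int MaxIterations)
{
    assert(StartX >= 0.0f && EndX > StartX);
    assert(EndX <= static_cast<float>(Mips[0].size()));
    const int TopMip = static_cast<int>(Mips.size()) - 1;
    const float InvLength = 1.0f / (EndX - StartX);
    int Mip = 0;
    float X = StartX;

    for (int Iter = 0; Mip >= 0 && Iter < MaxIterations; ++Iter)
    {
        const float TileSize = static_cast<float>(1 << Mip);
        const int NumTiles = static_cast<int>(Mips[Mip].size());
        // Small bias so a position on a tile boundary quantizes to the next tile.
        const int Tile = std::min(static_cast<int>(std::floor((X + 0.001f) / TileSize)), NumTiles - 1);
        const float TileEndX = std::min((Tile + 1) * TileSize, EndX);

        // Farthest ray depth inside the tile (conservative for either ray slope).
        const float ZHere = StartZ + (EndZ - StartZ) * (X - StartX) * InvLength;
        const float ZExit = StartZ + (EndZ - StartZ) * (TileEndX - StartX) * InvLength;
        const float RayZ = std::max(ZHere, ZExit);

        if (RayZ < Mips[Mip][Tile])
        {
            // The ray passes in front of everything in this tile: skip it and go coarser.
            if (TileEndX >= EndX)
            {
                return -1; // reached the end of the ray without a hit
            }
            X = TileEndX;
            Mip = std::min(Mip + 1, TopMip);
        }
        else
        {
            if (Mip == 0)
            {
                return Tile; // hit at the finest level
            }
            --Mip; // possible hit: refine
        }
    }
    return -1;
}
```

For the depth row {10, 10, 10, 10, 10, 3, 10, 10} and a ray from X = 0.5 at depth 1 to X = 8 at depth 5, the traversal climbs to the coarsest mip over the empty texels, descends around the occluder and reports texel 5. The same ray ending at depth 2 never reaches the occluder's depth and returns -1.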

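For ray tracing through the HZB in screen space, Lingqi Yan's graphics course "GAMES202: High-Quality Real-Time Rendering", Lecture 9 "Real-Time Global Illumination (Screen Space)", is recommended. Its video walks through HZB traversal and tracing in detail with animations. The figure below is a single frame taken from that video: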

  • TraceVoxels

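The inputs to voxel tracing include the global distance field, normals, depth, sky light, blue noise, VoxelLighting, RadianceProbeIndirectTexture, FinalRadianceAtlas and ray information. The outputs are TraceHit in R32 format and TraceRadiance in R11G11B10 format:

The TraceVoxels output texture TraceHit, which stores the encoded hit distance. Note that the range setting in the top-right corner has been adjusted.

The TraceVoxels output texture TraceRadiance, which stores the radiance at the hit points.

Next, the compute shader it uses: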

// Engine\Shaders\Private\Lumen\LumenScreenProbeTracing.usf

[numthreads(PROBE_THREADGROUP_SIZE_1D, 1, 1)]
void ScreenProbeTraceVoxelsCS(
    uint3 GroupId : SV_GroupID,
    uint3 DispatchThreadId : SV_DispatchThreadID,
    uint3 GroupThreadId : SV_GroupThreadID)
{
    if (DispatchThreadId.x < CompactedTraceTexelAllocator[0])
    {
        uint ScreenProbeIndex;
        uint2 TraceTexelCoord;
        float TraceHitDistance;
        // Decode the trace texel to process.
        DecodeTraceTexel(CompactedTraceTexelData[DispatchThreadId.x], ScreenProbeIndex, TraceTexelCoord, TraceHitDistance);

        // Compute the probe's coordinate in the atlas.
        uint2 ScreenProbeAtlasCoord = uint2(ScreenProbeIndex % ScreenProbeAtlasViewSize.x, ScreenProbeIndex / ScreenProbeAtlasViewSize.x);
        // Trace voxel lighting for this probe texel.
        TraceVoxels(ScreenProbeAtlasCoord, TraceTexelCoord, ScreenProbeIndex, TraceHitDistance);
    }
}

void TraceVoxels(
    uint2 ScreenProbeAtlasCoord,
    uint2 TraceTexelCoord,
    uint ScreenProbeIndex,
    float TraceHitDistance)
{
    // Compute the coordinates used for tracing.
    uint2 ScreenProbeScreenPosition = GetScreenProbeScreenPosition(ScreenProbeIndex);
    uint2 ScreenTileCoord = GetScreenTileCoord(ScreenProbeScreenPosition);

    uint2 TraceCoord = GetTraceBufferCoord(ScreenProbeAtlasCoord, TraceTexelCoord);
    
    {
        // Fetch the screen-space data.
        float2 ScreenUV = GetScreenUVFromScreenProbePosition(ScreenProbeScreenPosition);
        float SceneDepth = GetScreenProbeDepth(ScreenProbeAtlasCoord);
        float3 SceneNormal = DecodeNormal(SceneTexturesStruct.GBufferATexture.Load(int3(ScreenUV * View.BufferSizeAndInvSize.xy, 0)).xyz);

        bool bHit = false;

        {
            // Compute the world position.
            float3 WorldPosition = GetWorldPositionFromScreenUV(ScreenUV, SceneDepth);

            float2 ProbeUV;
            float ConeHalfAngle;
            // Get the UV used for probe tracing.
            GetProbeTracingUV(ScreenProbeAtlasCoord, TraceTexelCoord, GetProbeTexelCenter(ScreenTileCoord), 1, ProbeUV, ConeHalfAngle);

            // Convert the octahedral map UV back to a direction.
            float3 WorldConeDirection = OctahedralMapToDirection(ProbeUV);

            // Sample position, biased along the cone direction and the surface normal.
            float3 SamplePosition = WorldPosition + SurfaceBias * WorldConeDirection;
            SamplePosition += SurfaceBias * SceneNormal;

            float TraceDistance = MaxTraceDistance;
            bool bCoveredByRadianceCache = false;
#if RADIANCE_CACHE
            float ProbeOcclusionDistance = GetRadianceProbeOcclusionDistanceWithInterpolation(WorldPosition, WorldConeDirection, bCoveredByRadianceCache);
            TraceDistance = min(TraceDistance, ProbeOcclusionDistance);
#endif

            // Set up the cone trace input.
            FConeTraceInput TraceInput;
            TraceInput.Setup(SamplePosition, WorldConeDirection, ConeHalfAngle, MinSampleRadius, MinTraceDistance, TraceDistance, StepFactor);
            TraceInput.VoxelStepFactor = VoxelStepFactor;
            TraceInput.VoxelTraceStartDistance = max(MinTraceDistance, TraceHitDistance);

            // Set up the cone trace output.
            FConeTraceResult TraceResult = (FConeTraceResult)0;
            TraceResult.Lighting = 0;
            TraceResult.Transparency = 1;
            TraceResult.OpaqueHitDistance = TraceInput.MaxTraceDistance;

            // Cone trace the lighting voxels of the Lumen scene.
            ConeTraceLumenSceneVoxels(TraceInput, TraceResult);

            if (TraceResult.Transparency <= .5f)
            {
                // Self-intersection from grazing-angle traces causes noise that the spatial filter cannot remove.
                #define USE_VOXEL_TRACE_HIT_DISTANCE 0
                #if USE_VOXEL_TRACE_HIT_DISTANCE
                    TraceHitDistance = TraceResult.OpaqueHitDistance;
                #else
                    TraceHitDistance = TraceDistance;
                #endif
                bHit = true;
            }

#if RADIANCE_CACHE
            if (bCoveredByRadianceCache)
            {
                if (TraceResult.Transparency > .5f)
                {
                    // Do not store the depth of radiance cache hits.
                    TraceHitDistance = MaxTraceDistance;
                }

                SampleRadianceCacheAndApply(WorldPosition, WorldConeDirection, ConeHalfAngle, float3(0, 0, 0), TraceResult.Lighting, TraceResult.Transparency);
            }
            else
#endif
            {
#if TRACE_DISTANT_SCENE
                // Trace the distant scene.
                if (TraceResult.Transparency > .01f)
                {
                    FConeTraceResult DistantTraceResult;
                    ConeTraceLumenDistantScene(TraceInput, DistantTraceResult);
                    TraceResult.Lighting += DistantTraceResult.Lighting * TraceResult.Transparency;
                    TraceResult.Transparency *= DistantTraceResult.Transparency;
                }
#endif
                // Evaluate sky light.
                EvaluateSkyRadianceForCone(WorldConeDirection, tan(ConeHalfAngle), TraceResult);

                if (TraceHitDistance >= GetProbeMaxHitDistance())
                {
                    TraceHitDistance = MaxTraceDistance;
                }
            }
            
            #if USE_PREEXPOSURE
                TraceResult.Lighting *= View.PreExposure;
            #endif

            #if DEBUG_VISUALIZE_TRACE_TYPES
                RWTraceRadiance[TraceCoord] = float3(0, 0, .5f) * View.PreExposure;
            #else
                RWTraceRadiance[TraceCoord] = TraceResult.Lighting;
            #endif
        }

        // Store the trace result: hit distance, hit flag and moving flag are encoded into a 32-bit unsigned integer.
        RWTraceHit[TraceCoord] = EncodeProbeRayDistance(TraceHitDistance, bHit, false);
    }
}
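
The listing calls EncodeProbeRayDistance to pack the hit distance and two flags into one 32-bit value, but the encoding itself is not shown. The C++ sketch below illustrates one way such packing could work, by reusing the two lowest mantissa bits of the float for the flags. This layout is an assumption for illustration only and may differ from the engine's real encoding; `EncodeRayDistance`, `DecodeRayDistance` and `FDecodedRay` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct FDecodedRay
{
    float HitDistance;
    bool bHit;
    bool bMoving;
};

// Hypothetical layout: float bits with the two lowest mantissa bits replaced by flags.
uint32_t EncodeRayDistance(float HitDistance, bool bHit, bool bMoving)
{
    assert(HitDistance >= 0.0f);
    uint32_t Bits = 0;
    std::memcpy(&Bits, &HitDistance, sizeof(Bits)); // bit cast, like asuint() in HLSL
    Bits &= 0xFFFFFFFCu;                            // clear the two lowest mantissa bits
    Bits |= bHit ? 1u : 0u;
    Bits |= bMoving ? 2u : 0u;
    return Bits;
}

FDecodedRay DecodeRayDistance(uint32_t Encoded)
{
    FDecodedRay Out;
    Out.bHit = (Encoded & 1u) != 0;
    Out.bMoving = (Encoded & 2u) != 0;
    const uint32_t Bits = Encoded & 0xFFFFFFFCu;
    std::memcpy(&Out.HitDistance, &Bits, sizeof(Bits)); // bit cast, like asfloat() in HLSL
    return Out;
}
```

Dropping two mantissa bits costs only a few units in the last place of the distance, which is negligible for a trace distance, and keeps the result in a single R32 texel as described for TraceHit.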
  • CompositeTraces

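CompositeTraces takes the TraceHit, RayInfo and TraceRadiance produced by the previous steps and generates the ScreenProbeRadiance, ScreenProbeHitDistance and ScreenProbeTraceMoving textures. Its compute shader lives in LumenScreenProbeFiltering.usf, with ScreenProbeCompositeTracesWithScatterCS as the entry point. The code is omitted here.

  • FilterRadianceWithGather

After CompositeTraces come several FilterRadianceWithGather passes, which filter the probe radiance:

Left: ScreenProbeRadiance before filtering; right: ScreenProbeRadiance after several filtering passes.

  • ComputeIndirect

This stage uses the screen-space probe data generated earlier (depth, normals, base color, FilteredScreenProbeRadiance, BentNormal) to compute the final indirect lighting color of the scene (figure below):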

6.5.7.3 RenderLumenReflections

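RenderLumenReflections renders reflections for the smoother, low-roughness surfaces in the Lumen scene. Its flow is similar to RenderLumenScreenProbeGather, but simpler and with fewer steps:

The C++ rendering code involved is as follows: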

// Engine\Source\Runtime\Renderer\Private\Lumen\LumenReflections.cpp

FRDGTextureRef FDeferredShadingSceneRenderer::RenderLumenReflections(
    FRDGBuilder& GraphBuilder, 
    const FViewInfo& View,
    const FSceneTextures& SceneTextures,
    const FLumenMeshSDFGridParameters& MeshSDFGridParameters,
    FLumenReflectionCompositeParameters& OutCompositeParameters)
{
    // Maximum roughness for reflection tracing; rougher surfaces are skipped.
    OutCompositeParameters.MaxRoughnessToTrace = GLumenReflectionMaxRoughnessToTrace;
    OutCompositeParameters.InvRoughnessFadeLength = 1.0f / GLumenReflectionRoughnessFadeLength;

    (......)

    {
        (......)

        auto ComputeShader = View.ShaderMap->GetShader<FReflectionGenerateRaysCS>(0);

        // Ray generation pass.
        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("GenerateRaysCS"),
            ComputeShader,
            PassParameters,
            ReflectionTileParameters.TracingIndirectArgs,
            0);
    }

    FLumenCardTracingInputs TracingInputs(GraphBuilder, Scene, View);

    (......)

    // Trace reflections.
    TraceReflections(
        GraphBuilder, 
        Scene,
        View, 
        GLumenReflectionTraceMeshSDFs != 0 && Lumen::UseMeshSDFTracing(),
        SceneTextures,
        TracingInputs,
        ReflectionTracingParameters,
        ReflectionTileParameters,
        MeshSDFGridParameters);
    
    (......)

    {
        FReflectionResolveCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FReflectionResolveCS::FParameters>();
        
        (......)
        
        auto ComputeShader = View.ShaderMap->GetShader<FReflectionResolveCS>(PermutationVector);

        // Resolve reflections.
        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("ReflectionResolve"),
            ComputeShader,
            PassParameters,
            ReflectionTileParameters.ResolveIndirectArgs,
            0);
    }

    (......)

    // Update the history data.
    UpdateHistoryReflections(
        GraphBuilder,
        View,
        SceneTextures,
        ReflectionTileParameters,
        ResolvedSpecularIndirect,
        SpecularIndirect);

    return SpecularIndirect;
}

void TraceReflections(
    FRDGBuilder& GraphBuilder,
    const FScene* Scene,
    const FViewInfo& View,
    bool bTraceMeshSDFs,
    const FSceneTextures& SceneTextures,
    const FLumenCardTracingInputs& TracingInputs,
    const FLumenReflectionTracingParameters& ReflectionTracingParameters,
    const FLumenReflectionTileParameters& ReflectionTileParameters,
    const FLumenMeshSDFGridParameters& InMeshSDFGridParameters)
{
    {
        (......)

        auto ComputeShader = View.ShaderMap->GetShader<FReflectionClearTracesCS>(0);

        // Clear the trace output textures.
        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("ClearTraces"),
            ComputeShader,
            PassParameters,
            ReflectionTileParameters.TracingIndirectArgs,
            0);
    }

    FLumenIndirectTracingParameters IndirectTracingParameters;
    SetupIndirectTracingParametersForReflections(IndirectTracingParameters);

    const FSceneTextureParameters& SceneTextureParameters = GetSceneTextureParameters(GraphBuilder, SceneTextures);

    const bool bScreenTraces = GLumenReflectionScreenTraces != 0;

    if (bScreenTraces)
    {
        FReflectionTraceScreenTexturesCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FReflectionTraceScreenTexturesCS::FParameters>();

        (......)

        FReflectionTraceScreenTexturesCS::FPermutationDomain PermutationVector;
        auto ComputeShader = View.ShaderMap->GetShader<FReflectionTraceScreenTexturesCS>(PermutationVector);

        // Screen tracing.
        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("TraceScreen"),
            ComputeShader,
            PassParameters,
            ReflectionTileParameters.TracingIndirectArgs,
            0);
    }
    
    // Mesh distance field tracing.
    if (bTraceMeshSDFs)
    {
        if (Lumen::UseHardwareRayTracedReflections()) // Hardware ray traced reflections.
        {
            FCompactedReflectionTraceParameters CompactedTraceParameters = CompactTraces(
                GraphBuilder,
                View,
                ReflectionTracingParameters,
                ReflectionTileParameters,
                WORLD_MAX,
                IndirectTracingParameters.MaxTraceDistance);

            RenderLumenHardwareRayTracingReflections(
                GraphBuilder,
                SceneTextureParameters,
                View,
                ReflectionTracingParameters,
                ReflectionTileParameters,
                TracingInputs,
                CompactedTraceParameters,
                IndirectTracingParameters.MaxTraceDistance);
        }
        else
        {
            FLumenMeshSDFGridParameters MeshSDFGridParameters = InMeshSDFGridParameters;
            if (!MeshSDFGridParameters.NumGridCulledMeshSDFObjects)
            {
                CullForCardTracing(
                    GraphBuilder,
                    Scene, View,
                    TracingInputs,
                    IndirectTracingParameters,
                    /* out */ MeshSDFGridParameters);
            }

            if (MeshSDFGridParameters.TracingParameters.DistanceFieldObjectBuffers.NumSceneObjects > 0)
            {
                // Compact the traces.
                FCompactedReflectionTraceParameters CompactedTraceParameters = CompactTraces(
                    GraphBuilder,
                    View,
                    ReflectionTracingParameters,
                    ReflectionTileParameters,
                    IndirectTracingParameters.CardTraceEndDistanceFromCamera,
                    IndirectTracingParameters.MaxMeshSDFTraceDistance);

                {
                    (......)
                    
                    auto ComputeShader = View.ShaderMap->GetShader<FReflectionTraceMeshSDFsCS>(PermutationVector);

                    // Trace mesh distance fields.
                    FComputeShaderUtils::AddPass(
                        GraphBuilder,
                        RDG_EVENT_NAME("TraceMeshSDFs"),
                        ComputeShader,
                        PassParameters,
                        CompactedTraceParameters.IndirectArgs,
                        0);
                }
            }
        }
    }

    FCompactedReflectionTraceParameters CompactedTraceParameters = CompactTraces(...);

    {
        (......)
        
        auto ComputeShader = View.ShaderMap->GetShader<FReflectionTraceVoxelsCS>(PermutationVector);

        // Trace voxel lighting.
        FComputeShaderUtils::AddPass(
            GraphBuilder,
            RDG_EVENT_NAME("TraceVoxels"),
            ComputeShader,
            PassParameters,
            CompactedTraceParameters.IndirectArgs,
            0);
    }
}

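The main difference between Lumen's indirect reflections and Lumen's indirect diffuse lies in the number of rays traced and how they are traced. Lumen reflections need a maximum roughness to trace, GLumenReflectionMaxRoughnessToTrace (0.4 by default, adjustable via the console variable r.Lumen.Reflections.MaxRoughnessToTrace), and the resulting TraceHit and TraceRadiance differ as well.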
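
RenderLumenReflections stores MaxRoughnessToTrace together with InvRoughnessFadeLength, which suggests that traced reflections fade out as roughness approaches the limit rather than cutting off abruptly. The C++ sketch below shows one plausible form of that fade. The formula and the names `FReflectionFadeParams`, `MakeReflectionParams` and `ReflectionFade` are assumptions for illustration and are not taken from the engine source. The fade length of 0.1 used in the example is also a made-up value.

```cpp
#include <algorithm>
#include <cassert>

struct FReflectionFadeParams
{
    float MaxRoughnessToTrace;
    float InvRoughnessFadeLength;
};

// Mirrors the two assignments at the top of RenderLumenReflections.
FReflectionFadeParams MakeReflectionParams(float MaxRoughnessToTrace, float RoughnessFadeLength)
{
    assert(RoughnessFadeLength > 0.0f);
    return { MaxRoughnessToTrace, 1.0f / RoughnessFadeLength };
}

// Assumed fade: 1 for smooth surfaces, falling linearly to 0 at MaxRoughnessToTrace.
float ReflectionFade(float Roughness, const FReflectionFadeParams& Params)
{
    const float Fade = (Params.MaxRoughnessToTrace - Roughness) * Params.InvRoughnessFadeLength;
    return std::min(std::max(Fade, 0.0f), 1.0f);
}
```

With MaxRoughnessToTrace = 0.4 and a fade length of 0.1, a surface with roughness 0.2 gets full traced reflections, roughness 0.35 gets about half, and roughness 0.5 gets none.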

Because reflections and diffuse rely on highly similar techniques, this article does not dig further into the details.

6.5.7.4 DiffuseIndirectComposite

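This stage combines the probe results produced by RenderLumenScreenProbeGather (DiffuseIndirect, RoughSpecularIndirect) with the reflection results produced by RenderLumenReflections (SpecularIndirect), together with the scene GBuffer and related data, to generate the final scene color:

Scene color after combining the diffuse and specular GI. (Magnified 1.5x, with the color range adjusted.)

The combining process can be found in the pixel shader it uses: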

// Engine\Shaders\Private\DiffuseIndirectComposite.usf

void MainPS(
    float4 SvPosition : SV_POSITION
    , out float4 OutAddColor : SV_Target0
    , out float4 OutMultiplyColor : SV_Target1
)
{
    float2 SceneBufferUV = SvPositionToBufferUV(SvPosition);
    float2 ScreenPosition = SvPositionToScreenPosition(SvPosition).xy;

    // Sample the scene GBuffer.
    FGBufferData GBuffer = GetGBufferDataFromSceneTextures(SceneBufferUV);

    // Sample the AO generated dynamically each frame.
    float DynamicAmbientOcclusion = AmbientOcclusionTexture.SampleLevel(AmbientOcclusionSampler, SceneBufferUV, 0).r;

    // Compute the final AO to apply.
    float AOMask = (GBuffer.ShadingModelID != SHADINGMODELID_UNLIT);
    float FinalAmbientOcclusion = lerp(1.0f, GBuffer.GBufferAO * DynamicAmbientOcclusion, AOMask * AmbientOcclusionStaticFraction);

    float3 TranslatedWorldPosition = mul(float4(ScreenPosition * GBuffer.Depth, GBuffer.Depth, 1), View.ScreenToTranslatedWorld).xyz;

    float3 N = GBuffer.WorldNormal;
    float3 V = normalize(View.TranslatedWorldCameraOrigin - TranslatedWorldPosition);
    float NoV = saturate(dot(N, V));

    // Apply indirect diffuse.
#if DIM_APPLY_DIFFUSE_INDIRECT
    {
        float3 DiffuseIndirectLighting = 0;
        float3 RoughSpecularIndirectLighting = 0;
        float3 SpecularIndirectLighting = 0;

        #if DIM_APPLY_DIFFUSE_INDIRECT == 4
            DiffuseIndirectLighting = DiffuseIndirect_Textures_0.SampleLevel(GlobalPointClampedSampler, SceneBufferUV, 0).rgb;
            RoughSpecularIndirectLighting = DiffuseIndirect_Textures_1.SampleLevel(GlobalPointClampedSampler, SceneBufferUV, 0).rgb;
            SpecularIndirectLighting = DiffuseIndirect_Textures_2.SampleLevel(GlobalPointClampedSampler, SceneBufferUV, 0).rgb;
        #else
        {
            // Sample the denoiser output.
            FSSDKernelConfig KernelConfig = CreateKernelConfig();
                
            #if DEBUG_OUTPUT
            {
                KernelConfig.DebugPixelPosition = uint2(SvPosition.xy);
                KernelConfig.DebugEventCounter = 0;
            }
            #endif

            // Compile time.
            KernelConfig.bSampleKernelCenter = true;
            KernelConfig.BufferLayout = CONFIG_SIGNAL_INPUT_LAYOUT;
            KernelConfig.bUnroll = true;

            #if DIM_UPSCALE_DIFFUSE_INDIRECT
            {
                KernelConfig.SampleSet = SAMPLE_SET_2X2_BILINEAR;
                KernelConfig.BilateralDistanceComputation = SIGNAL_WORLD_FREQUENCY_REF_METADATA_ONLY;
                KernelConfig.WorldBluringDistanceMultiplier = 16.0;
                
                KernelConfig.BilateralSettings[0] = BILATERAL_POSITION_BASED(3);
                
                // SGPRs (Scalar General Purpose Registers)
                KernelConfig.BufferSizeAndInvSize = View.BufferSizeAndInvSize * float4(0.5, 0.5, 2.0, 2.0);
                KernelConfig.BufferBilinearUVMinMax = View.BufferBilinearUVMinMax;
            }
            #else
            {
                KernelConfig.SampleSet = SAMPLE_SET_1X1;
                KernelConfig.bNormalizeSample = true;
                
                // SGPRs
                KernelConfig.BufferSizeAndInvSize = View.BufferSizeAndInvSize;
                KernelConfig.BufferBilinearUVMinMax = View.BufferBilinearUVMinMax;
            }
            #endif

            // VGPRs (Vector General Purpose Registers)
            KernelConfig.BufferUV = SceneBufferUV; 
            {
                KernelConfig.CompressedRefSceneMetadata = GBufferDataToCompressedSceneMetadata(GBuffer);
                KernelConfig.RefBufferUV = SceneBufferUV;
                KernelConfig.RefSceneMetadataLayout = METADATA_BUFFER_LAYOUT_DISABLED;
            }
            KernelConfig.HammersleySeed = Rand3DPCG16(int3(SvPosition.xy, View.StateFrameIndexMod8)).xy;
                
            FSSDSignalAccumulatorArray UncompressedAccumulators = CreateSignalAccumulatorArray();
            FSSDCompressedSignalAccumulatorArray CompressedAccumulators = CompressAccumulatorArray(
                UncompressedAccumulators, CONFIG_ACCUMULATOR_VGPR_COMPRESSION);

            // Accumulate the kernel.
            AccumulateKernel(
                KernelConfig,
                DiffuseIndirect_Textures_0,
                DiffuseIndirect_Textures_1,
                DiffuseIndirect_Textures_2,
                DiffuseIndirect_Textures_3,
                /* inout */ UncompressedAccumulators,
                /* inout */ CompressedAccumulators);

            // Fetch the accumulated sample.
            FSSDSignalSample Sample;
            #if DIM_UPSCALE_DIFFUSE_INDIRECT
                Sample = NormalizeToOneSample(UncompressedAccumulators.Array[0].Moment1);
            #else
                Sample = UncompressedAccumulators.Array[0].Moment1;
            #endif
            
            // When DIM_APPLY_DIFFUSE_INDIRECT is 1 or 3, only indirect diffuse is available.
            #if DIM_APPLY_DIFFUSE_INDIRECT == 1 || DIM_APPLY_DIFFUSE_INDIRECT == 3
            {
                DiffuseIndirectLighting = Sample.SceneColor.rgb;
            }
            // When DIM_APPLY_DIFFUSE_INDIRECT is 2, both indirect diffuse and indirect specular are available.
            #elif DIM_APPLY_DIFFUSE_INDIRECT == 2
            {
                DiffuseIndirectLighting = UncompressedAccumulators.Array[0].Moment1.ColorArray[0];
                SpecularIndirectLighting = UncompressedAccumulators.Array[0].Moment1.ColorArray[1];
            }
            #else
                #error Unimplemented
            #endif
        }
        #endif

        float3 DiffuseColor = bVisualizeDiffuseIndirect ? float3(.18f, .18f, .18f) : GBuffer.DiffuseColor;
        float3 SpecularColor = GBuffer.SpecularColor;

        #if DIM_APPLY_DIFFUSE_INDIRECT == 4
            RemapClearCoatDiffuseAndSpecularColor(GBuffer, NoV, DiffuseColor, SpecularColor);
        #endif

        #if DIM_APPLY_DIFFUSE_INDIRECT == 2 || DIM_APPLY_DIFFUSE_INDIRECT == 4
            float DiffuseIndirectAO = 1;
        #else
            float DiffuseIndirectAO = lerp(1, FinalAmbientOcclusion, ApplyAOToDynamicDiffuseIndirect);
        #endif

        FDirectLighting IndirectLighting;
        if (GBuffer.ShadingModelID == SHADINGMODELID_HAIR)
        {
            IndirectLighting.Diffuse = DiffuseIndirectLighting * GBuffer.BaseColor;
            IndirectLighting.Specular = 0;
        }
        else
        {
            IndirectLighting.Diffuse = DiffuseIndirectLighting * DiffuseColor * DiffuseIndirectAO;
            IndirectLighting.Transmission = 0;

            #if DIM_APPLY_DIFFUSE_INDIRECT == 4
                IndirectLighting.Specular = CombineRoughSpecular(GBuffer, NoV, SpecularIndirectLighting, RoughSpecularIndirectLighting, SpecularColor);
            #else
                IndirectLighting.Specular = SpecularIndirectLighting * EnvBRDF(SpecularColor, GBuffer.Roughness, NoV);
            #endif
        }

        const bool bNeedsSeparateSubsurfaceLightAccumulation = UseSubsurfaceProfile(GBuffer.ShadingModelID);

        if (bNeedsSeparateSubsurfaceLightAccumulation &&
            View.bSubsurfacePostprocessEnabled > 0 && View.bCheckerboardSubsurfaceProfileRendering > 0)
        {
            bool bChecker = CheckerFromSceneColorUV(SceneBufferUV);

            // Adjust for checkerboard. only apply non-diffuse lighting (including emissive) 
            // to the specular component, otherwise lighting is applied twice
            IndirectLighting.Specular *= !bChecker;
        }

        // Accumulate the lighting result.
        FLightAccumulator LightAccumulator = (FLightAccumulator)0;
        LightAccumulator_Add(
            LightAccumulator,
            IndirectLighting.Diffuse + IndirectLighting.Specular,
            IndirectLighting.Diffuse,
            1.0f,
            bNeedsSeparateSubsurfaceLightAccumulation);
        // Fetch the accumulated lighting result.
        OutAddColor = LightAccumulator_GetResult(LightAccumulator);
    }
    #else
    {
        OutAddColor = 0;
    }
    #endif

    OutMultiplyColor = FinalAmbientOcclusion;
}

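6.5.8 Lumen Summary

Lumen involves many intricate steps, but they can be summarized as follows:

1. Build the MeshCards and LumenCards, and keep them updated.

2. Using the Card information of the Lumen scene, trace and update the corresponding texels.

3. In the diffuse and specular stages, trace and compute the lighting of screen-space surfaces using several methods.

4. Combine the indirect diffuse and specular results from the previous steps to obtain the final scene color with indirect lighting applied.

In addition, tracing relies on several methods whose results are blended by weight (see the figure below).

Hybrid tracing diagram. Red denotes screen tracing, green denotes mesh distance field tracing, and blue denotes Voxel Lighting tracing. Color gradients indicate the transitions between trace types.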
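
The color coding in the diagram can be read as a weighted mix of three channels. The following C++ sketch shows one way such a debug color could be derived from per-pixel trace weights. It is an illustration only: `FTraceWeights` and `TraceTypeDebugColor` are hypothetical, and the engine's shader may compute its visualization differently.

```cpp
#include <array>

// Hypothetical per-pixel weights: how much each trace type contributed.
struct FTraceWeights
{
    float Screen;   // screen traces
    float MeshSdf;  // mesh distance field traces
    float Voxel;    // Voxel Lighting traces
};

// Maps the weights to RGB: red = screen, green = mesh SDF, blue = voxel.
// The weights are normalized so that a pure trace type yields a pure color
// and a transition region yields a blend of two colors.
inline std::array<float, 3> TraceTypeDebugColor(const FTraceWeights& W)
{
    const float Sum = W.Screen + W.MeshSdf + W.Voxel;
    if (Sum <= 0.0f)
    {
        return {0.0f, 0.0f, 0.0f}; // nothing was hit
    }
    return {W.Screen / Sum, W.MeshSdf / Sum, W.Voxel / Sum};
}
```

A pixel resolved half by screen traces and half by mesh distance field traces would come out as an even red-green mix, which matches the gradient bands in the diagram.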

Setting DEBUG_VISUALIZE_TRACE_TYPES to 1 and disabling ShowFlag.DirectLighting from the console enables the trace-weight visualization mode:

// Engine\Shaders\Private\Lumen\LumenScreenProbeTracing.usf

#define DEBUG_VISUALIZE_TRACE_TYPES 1 // Enable trace-weight visualization (defaults to 0)

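Overall, Lumen combines several tracing techniques (SSGI, SDF tracing against both Mesh SDFs and the Global SDF, Lumen Cards, and Voxel Cone tracing) and builds a range of supporting data (adaptive Screen Space Probes, Irradiance Probes, the Surface Cache, Prefilter Radiance, Voxel Lighting, RSM, Virtual Textures, and Clipmaps) to compute indirect diffuse and specular lighting, which is finally blended by weight into the scene color.

Lumen's diffuse GI supports both software and hardware tracing. With default settings, the software path uses the following trace types: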

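Trace type             | Range                  | Description
Screen Trace           | Entire scene           | That is, SSGI. Whenever a trace finds an intersection, its bounce information takes priority.
Voxel Lighting Trace   | Within 200 m of camera | Cone-based ray tracing; samples MIP levels to quickly obtain information at different hit distances.
Detail MeshCard Trace  | 2 to 40 m              | When sampling MeshCard lighting, occlusion is estimated probabilistically, in a way similar to VSM.
Distant MeshCard Trace | 200 to 1000 m          | Traces the pre-generated Global Distance Field; occlusion estimation is no longer used.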
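
To make the ranges in the table easier to reason about, the sketch below encodes them as data and lists which trace types are candidates at a given distance. This is only a reading aid under simplifying assumptions: it treats every range as a single distance in meters with inclusive bounds, and `FTraceRange` and `CandidateTraces` are hypothetical names, not engine code.

```cpp
#include <limits>
#include <string>
#include <vector>

// One row of the table: a trace type and the distance range it covers.
struct FTraceRange
{
    const char* Name;
    float MinMeters;
    float MaxMeters;
};

// Returns the trace types whose range contains DistanceMeters,
// in the same order as the table.
inline std::vector<std::string> CandidateTraces(float DistanceMeters)
{
    static const FTraceRange Ranges[] = {
        {"Screen Trace", 0.0f, std::numeric_limits<float>::infinity()},
        {"Voxel Lighting Trace", 0.0f, 200.0f},
        {"Detail MeshCard Trace", 2.0f, 40.0f},
        {"Distant MeshCard Trace", 200.0f, 1000.0f},
    };

    std::vector<std::string> Result;
    for (const FTraceRange& Range : Ranges)
    {
        if (DistanceMeters >= Range.MinMeters && DistanceMeters <= Range.MaxMeters)
        {
            Result.emplace_back(Range.Name);
        }
    }
    return Result;
}
```

At 10 m, for example, screen, voxel lighting, and detail MeshCard traces all qualify, and the weighted blending described earlier decides how their results combine.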

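Lumen's specular GI also supports both software and hardware tracing; the software path combines SSR with SDF tracing (Mesh SDF and Global SDF).

6.6 Other Rendering Techniques

6.6.1 Temporal Super Resolution

Temporal Super Resolution (TSR) is a new-generation temporal anti-aliasing algorithm that replaces the traditional (UE4) TAA. It produces high-resolution output from low-resolution input at a quality approaching native resolution, with less ghosting and less flickering in high-frequency content. It is optimized for platforms such as PS5, but requires a graphics platform that supports SM5.0 or above.

TSR is comparable to NVIDIA's DLSS and AMD's FidelityFX Super Resolution (FSR). The difference is that DLSS relies on deep learning accelerated by Tensor Cores, whereas TSR does not depend on Tensor Cores. In other words, TSR does not require an RTX GPU and can run on hardware from other vendors. Because TSR produces a high-resolution image from a lower-resolution render, it improves not only anti-aliasing but also rendering performance, and it reduces power consumption.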
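
The performance gain comes from shading fewer pixels before upscaling. The C++ sketch below works through the arithmetic, assuming the internal render resolution is the output resolution scaled per axis by a percentage. `FResolution`, `RenderResolution`, and `ShadedPixelRatio` are hypothetical helpers written for this example.

```cpp
#include <cmath>

struct FResolution
{
    int Width;
    int Height;
};

// Internal render resolution for a given per-axis percentage of the output.
inline FResolution RenderResolution(FResolution Output, float ScreenPercentage)
{
    const float Scale = ScreenPercentage / 100.0f;
    return {static_cast<int>(std::ceil(Output.Width * Scale)),
            static_cast<int>(std::ceil(Output.Height * Scale))};
}

// Fraction of output pixels that are actually shaded each frame.
inline double ShadedPixelRatio(FResolution Render, FResolution Output)
{
    return (static_cast<double>(Render.Width) * Render.Height) /
           (static_cast<double>(Output.Width) * Output.Height);
}
```

With a 3840x2160 output and a 50% scale, the scene is rendered at 1920x1080, so only a quarter of the output pixels are shaded per frame; the temporal upscaler reconstructs the rest from history.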

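Unlike UE4, as long as TemporalAA is not explicitly disabled in the configuration, UE5 goes through the temporal upscaler in post-processing regardless of which anti-aliasing method is selected, and takes the TSR path when the Gen5 TAA algorithm is enabled and the platform supports it. The call chain is as follows: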

// Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessing.cpp

void AddPostProcessingPasses(FRDGBuilder& GraphBuilder, const FViewInfo& View, ...)
{
    (......)
    
    // TAA anti-aliasing.
    EMainTAAPassConfig TAAConfig = ITemporalUpscaler::GetMainTAAPassConfig(View);
    // The TAA configuration is not disabled.
    if (TAAConfig != EMainTAAPassConfig::Disabled)
    {
        (......)
        
        // Calls FDefaultTemporalUpscaler::AddPasses; see the analysis below.
        UpscalerToUse->AddPasses(
            GraphBuilder,
            View,
            UpscalerPassInputs,
            &SceneColor.Texture,
            &SecondaryViewRect,
            &DownsampledSceneColor.Texture,
            &DownsampledSceneColor.ViewRect);
    }
    
    (......)
}

// Engine\Source\Runtime\Renderer\Private\PostProcess\TemporalAA.cpp

void FDefaultTemporalUpscaler::AddPasses(FRDGBuilder& GraphBuilder, const FViewInfo& View,...) const final
{
    // If Gen5 TAA is enabled and supported by the platform, take the TSR path.
    if (CVarTAAAlgorithm.GetValueOnRenderThread() && DoesPlatformSupportGen5TAA(View.GetShaderPlatform()))
    {
        *OutSceneColorHalfResTexture = nullptr;

        return AddTemporalSuperResolutionPasses(
            GraphBuilder,
            View,
            PassInputs,
            OutSceneColorTexture,
            OutSceneColorViewRect);
    }
    (......)
}
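
The two functions above amount to a small decision tree. The C++ sketch below condenses it into one function for clarity. `EAntiAliasingPath` and `SelectAntiAliasingPath` are hypothetical names, and the Gen4 TAA fallback is an assumption about the code elided by `(......)` in `AddPasses`, not something the excerpt shows.

```cpp
// Hypothetical summary of which temporal path ends up running.
enum class EAntiAliasingPath
{
    None,     // the main TAA pass config is Disabled
    Gen4TAA,  // assumed fallback: the UE4-style temporal AA
    TSR       // Temporal Super Resolution (Gen5)
};

// bTAAConfigEnabled      : TAAConfig != EMainTAAPassConfig::Disabled
// TAAAlgorithmCVar       : value read from CVarTAAAlgorithm
// bPlatformSupportsGen5  : result of DoesPlatformSupportGen5TAA(...)
inline EAntiAliasingPath SelectAntiAliasingPath(
    bool bTAAConfigEnabled, int TAAAlgorithmCVar, bool bPlatformSupportsGen5)
{
    if (!bTAAConfigEnabled)
    {
        return EAntiAliasingPath::None;
    }
    if (TAAAlgorithmCVar != 0 && bPlatformSupportsGen5)
    {
        return EAntiAliasingPath::TSR;
    }
    return EAntiAliasingPath::Gen4TAA;
}
```

The sketch leaves out third-party upscalers, which can replace the default one behind `UpscalerToUse`.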

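When the TSR branch is taken, execution enters AddTemporalSuperResolutionPasses. Below is the TSR rendering process as captured in RenderDoc:

As the capture shows, TSR has many more passes than UE4's TAA. The main stages are: clearing the previous frame's textures, dilating the velocity buffer, rejecting invalid velocity data, filtering frequencies, comparing against history, post-filtering the reprojection, dilating the reprojection, and updating the history.

The most important of these stages is the history update. It takes the input scene color, depth, dilated velocity, parallax factor, and previous-frame history (dilated reprojection, reprojection, high frequency, low frequency, metadata, and subpixel information), and produces the final anti-aliased scene color together with the history data for the current frame.

Left: input scene color; right: scene color output after TSR.

History data output by TSR: low frequency, high frequency, metadata, and subpixel information.

The Compute Shader used by the history update stage is analyzed below: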

// /Engine/Private/TemporalAA/TAAUpdateHistory.usf

[numthreads(TILE_SIZE, TILE_SIZE, 1)]
void MainCS(
    uint2 GroupId : SV_GroupID,
    uint GroupThreadIndex : SV_GroupIndex)
{
    uint GroupWaveIndex = GetGroupWaveIndex(GroupThreadIndex, /* GroupSize = */ TILE_SIZE * TILE_SIZE);

    float4 Debug = 0.0;

    // History pixel position.
    taa_short2 HistoryPixelPos = (
        taa_short2(GroupId) * taa_short2(TILE_SIZE, TILE_SIZE) +
        Map8x8Tile2x2Lane(GroupThreadIndex));

    float2 ViewportUV = (float2(HistoryPixelPos) + 0.5f) * HistoryInfo_ViewportSizeInverse;
    float2 ScreenPos = ViewportUVToScreenPos(ViewportUV);
    
    // Pixel coordinate of the center of output pixel O in the input viewport.
    float2 PPCo = ViewportUV * InputInfo_ViewportSize + InputJitter;

    // Pixel coordinate of the center of the nearest input pixel K.
    float2 PPCk = floor(PPCo) + 0.5;
    
    taa_short2 InputPixelPos = ClampPixelOffset(
        taa_short2(InputPixelPosMin) + taa_short2(PPCo),
        InputPixelPosMin, InputPixelPosMax);

    // Fetch the reprojection-related information.
    float2 PrevScreenPos = ScreenPos;
    taa_half ParallaxRejectionMask = taa_half(1.0);
    taa_half LowFrequencyRejection = taa_half(1.0);
    taa_half OutputPixelVelocity = taa_half(0.0);
    #if 1
    {
        float2 EncodedVelocity = DilatedVelocityTexture[InputPixelPos];
        ParallaxRejectionMask = ParallaxRejectionMaskTexture[InputPixelPos];

        float2 ScreenVelocity = DecodeVelocityFromTexture(float4(EncodedVelocity, 0.0, 0.0)).xy;

        PrevScreenPos = ScreenPos - ScreenVelocity;
        OutputPixelVelocity = taa_half(length(ScreenVelocity * HistoryInfo_ViewportSize));

        taa_ushort2 RejectionPixelPos = (taa_ushort2(InputPixelPos) - taa_short2(InputPixelPosMin)) / 2;
        LowFrequencyRejection = HistoryRejectionTexture[RejectionPixelPos];
        
        #if !CONFIG_CLAMP
        {
            ParallaxRejectionMask = taa_half(1.0);
            LowFrequencyRejection = taa_half(1.0);
        }
        #endif
    }
    #endif

    // Determine whether the pixel is flagged for responsive AA.
    bool bIsResponsiveAAPixel = false;
    #if CONFIG_RESPONSIVE_STENCIL
    {
        const uint kResponsiveStencilMask = 1 << 3;
            
        uint SceneStencilRef = InputSceneStencilTexture.Load(int3(InputPixelPos, 0)) STENCIL_COMPONENT_SWIZZLE;

        bIsResponsiveAAPixel = (SceneStencilRef & kResponsiveStencilMask) != 0;
    }
    #endif
    
    // Detect whether HistoryBufferUV falls outside the viewport.
    bool bOffScreen = IsOffScreen(bCameraCut, PrevScreenPos, ParallaxRejectionMask);
    
    taa_half TotalRejection = bOffScreen ? 0.0 : saturate(LowFrequencyRejection * 4.0);


    // Filter the input scene color at the predictor frequency.
    taa_half3 FilteredInputColor;
    taa_half3 InputMinColor;
    taa_half3 InputMaxColor;
    taa_half InputPixelAlignement;
    taa_half ClosestInputLuma4;
    
    ISOLATE
    {
        // Vector from pixel K to pixel O.
        taa_half2 dKO = taa_half2(PPCo - PPCk);

        FilteredInputColor = taa_half(0.0);

        taa_half FilteredInputColorWeight = taa_half(0.0);
        
        #if 0 // shader compiler bug :'(
            taa_half InputToHistoryFactor = taa_half(HistoryInfo_ViewportSize.x * InputInfo_ViewportSizeInverse.x);
            taa_half FinalInputToHistoryFactor = bOffScreen ? taa_half(1.0) : InputToHistoryFactor;
        #else
            float InputToHistoryFactor = float(HistoryInfo_ViewportSize.x * InputInfo_ViewportSizeInverse.x);
            float FinalInputToHistoryFactor = lerp(1.0, InputToHistoryFactor, TotalRejection);
        #endif

        InputMinColor = taa_half(INFINITE_FLOAT);
        InputMaxColor = taa_half(-INFINITE_FLOAT);

        // Generate sample coordinates according to CONFIG_SAMPLES and sample the input scene color.
        UNROLL_N(CONFIG_SAMPLES)
        for (uint SampleId = 0; SampleId < CONFIG_SAMPLES; SampleId++)
        {
            taa_short2 SampleInputPixelPos;
            taa_half2 PixelOffset;
            
            #if CONFIG_SAMPLES == 9
            {
                taa_short2 iPixelOffset = taa_short2(kOffsets3x3[kSquareIndexes3x3[SampleId]]);
                PixelOffset = taa_half2(iPixelOffset);
                
                SampleInputPixelPos = AddAndClampPixelOffset(
                    InputPixelPos,
                    iPixelOffset, iPixelOffset,
                    InputPixelPosMin, InputPixelPosMax);
            }
            #elif CONFIG_SAMPLES == 5 || CONFIG_SAMPLES == 6
            {
                if (SampleId == 5)
                {
                    taa_short2 iPixelOffset;
                    #if CONFIG_COMPILE_FP16
                        iPixelOffset = int16_t2(1, 1) - int16_t2((asuint16(dKO) & uint16_t(0x8000)) >> uint16_t(14));
                        PixelOffset = asfloat16(asuint16(half(1.0)).xx | (asuint16(dKO) & uint16_t(0x8000)));
                    #else
                        iPixelOffset = SignFastInt(dKO);
                        PixelOffset = asfloat(asuint(1.0).xx | (asuint(dKO) & uint(0x80000000)));
                    #endif
                        
                    SampleInputPixelPos = ClampPixelOffset(InputPixelPos, InputPixelPosMin, InputPixelPosMax);
                }
                else
                {
                    taa_short2 iPixelOffset = taa_short2(kOffsets3x3[kPlusIndexes3x3[SampleId]]);
                    PixelOffset = taa_half2(iPixelOffset);
                    
                    SampleInputPixelPos = AddAndClampPixelOffset(
                        InputPixelPos,
                        iPixelOffset, iPixelOffset,
                        InputPixelPosMin, InputPixelPosMax);
                }
            }
            #else
                #error Unknown sample count
            #endif

            taa_half3 InputColor = InputSceneColorTexture[SampleInputPixelPos];

            taa_half2 dPP = PixelOffset - dKO;
            taa_half SampleSpatialWeight = ComputeSampleWeigth(FinalInputToHistoryFactor, dPP, /* MinimalContribution = */ float(0.005));

            taa_half ToneWeight = HdrWeight4(InputColor);

            FilteredInputColor       += (SampleSpatialWeight * ToneWeight) * InputColor;
            FilteredInputColorWeight += (SampleSpatialWeight * ToneWeight);

            if (SampleId == 0)
            {
                ClosestInputLuma4 = Luma4(InputColor);
                InputMinColor = TransformColorForClampingBox(InputColor);
                InputMaxColor = TransformColorForClampingBox(InputColor);
            }
            else
            {
                InputMinColor = min(InputMinColor, TransformColorForClampingBox(InputColor));
                InputMaxColor = max(InputMaxColor, TransformColorForClampingBox(InputColor));
            }
        }
        
        FilteredInputColor *= rcp(FilteredInputColorWeight);

        InputPixelAlignement = ComputeSampleWeigth(InputToHistoryFactor, dKO, /* MinimalContribution = */ float(0.0));
    }
        
    // Spill to LDS to free VGPRs for sampling the history.
    #if CONFIG_MANUAL_LDS_SPILL
    ISOLATE
    {
        uint LocalGroupThreadIndex = GetGroupThreadIndex(GroupThreadIndex, GroupWaveIndex);

        SharedArray0[LocalGroupThreadIndex] = taa_half4(FilteredInputColor, LowFrequencyRejection);
        SharedArray1[LocalGroupThreadIndex] = taa_half4(InputMinColor, InputPixelAlignement);
        SharedArray2[LocalGroupThreadIndex] = taa_half4(InputMaxColor, OutputPixelVelocity);
    }
    #endif
    
    // Reproject the history.
    taa_half3 PrevHistoryMoment1;
    taa_half PrevHistoryValidity;
    
    taa_half3 PrevHistoryMommentMin;
    taa_half3 PrevHistoryMommentMax;

    taa_half3 PrevFallbackColor;
    taa_half PrevFallbackWeight;
    
    taa_subpixel_details PrevSubpixelDetails;

    ISOLATE
    {
        // Reprojected history data.
        taa_half3 RawHistory0 = taa_half(0);
        taa_half3 RawHistory1 = taa_half(0);
        taa_half2 RawHistory2 = taa_half(0);

        taa_half3 RawHistory1Min = INFINITE_FLOAT;
        taa_half3 RawHistory1Max = -INFINITE_FLOAT;

        // Sample the raw history.
        {
            float2 PrevHistoryBufferUV = (PrevHistoryInfo_ScreenPosToViewportScale * PrevScreenPos + PrevHistoryInfo_ScreenPosToViewportBias) * PrevHistoryInfo_ExtentInverse;
            PrevHistoryBufferUV = clamp(PrevHistoryBufferUV, PrevHistoryInfo_UVViewportBilinearMin, PrevHistoryInfo_UVViewportBilinearMax);

            #if 1
            {
                FCatmullRomSamples Samples = GetBicubic2DCatmullRomSamples(PrevHistoryBufferUV, PrevHistoryInfo_Extent, PrevHistoryInfo_ExtentInverse);

                UNROLL
                for (uint i = 0; i < Samples.Count; i++)
                {
                    float2 SampleUV = clamp(Samples.UV[i], PrevHistoryInfo_UVViewportBilinearMin, PrevHistoryInfo_UVViewportBilinearMax);

                    taa_half3 Sample0 = PrevHistory_Textures_0.SampleLevel(GlobalBilinearClampedSampler, SampleUV, 0);
                    taa_half3 Sample1 = PrevHistory_Textures_1.SampleLevel(GlobalBilinearClampedSampler, SampleUV, 0);
                    taa_half2 Sample2 = PrevHistory_Textures_2.SampleLevel(GlobalBilinearClampedSampler, SampleUV, 0);

                    RawHistory1Min = min(RawHistory1Min, Sample1 * SafeRcp(Sample2.g));
                    RawHistory1Max = max(RawHistory1Max, Sample1 * SafeRcp(Sample2.g));

                    RawHistory0 += Sample0 * taa_half(Samples.Weight[i]);
                    RawHistory1 += Sample1 * taa_half(Samples.Weight[i]);
                    RawHistory2 += Sample2 * taa_half(Samples.Weight[i]);
                }
                RawHistory0 *= taa_half(Samples.FinalMultiplier);
                RawHistory1 *= taa_half(Samples.FinalMultiplier);
                RawHistory2 *= taa_half(Samples.FinalMultiplier);
            }
            #else
            {
                RawHistory0 = PrevHistory_Textures_0.SampleLevel(GlobalBilinearClampedSampler, PrevHistoryBufferUV, 0);
                RawHistory1 = PrevHistory_Textures_1.SampleLevel(GlobalBilinearClampedSampler, PrevHistoryBufferUV, 0);
                RawHistory2 = PrevHistory_Textures_2.SampleLevel(GlobalBilinearClampedSampler, PrevHistoryBufferUV, 0);
            }
            #endif
            
            FSubpixelNeighborhood SubpixelNeighborhood = GatherPrevSubpixelNeighborhood(PrevHistory_Textures_3, PrevHistoryBufferUV);
            {
                PrevSubpixelDetails = 0;
                UNROLL_N(SUB_PIXEL_COUNT)
                for (uint SubpixelId = 0; SubpixelId < SUB_PIXEL_COUNT; SubpixelId++)
                {
                    taa_subpixel_payload SubpixelPayload = GetSubpixelPayload(SubpixelNeighborhood, SubpixelId);
                    PrevSubpixelDetails |= SubpixelPayload << (SUB_PIXEL_BIT_COUNT * SubpixelId);
                }
            }

            RawHistory0 = -min(-RawHistory0, taa_half(0.0));
            RawHistory1 = -min(-RawHistory1, taa_half(0.0));
            RawHistory2 = -min(-RawHistory2, taa_half(0.0));
        }
        
        // Unpack the history.
        {
            PrevFallbackColor = RawHistory0;
            PrevFallbackWeight = RawHistory2.r;
            
            PrevHistoryMommentMin = RawHistory1Min;
            PrevHistoryMommentMax = RawHistory1Max;

            PrevHistoryMoment1 = RawHistory1;
            PrevHistoryValidity = RawHistory2.g;
        }

        // Correct the history (pre-exposure correction).
        {
            PrevHistoryMommentMin *= taa_half(HistoryPreExposureCorrection);
            PrevHistoryMommentMax *= taa_half(HistoryPreExposureCorrection);
            PrevHistoryMoment1 *= taa_half(HistoryPreExposureCorrection);
            PrevFallbackColor *= taa_half(HistoryPreExposureCorrection);
        }
    }
    
    // Read the data back from LDS.
    #if CONFIG_MANUAL_LDS_SPILL
    ISOLATE
    {
        uint LocalGroupThreadIndex = GetGroupThreadIndex(GroupThreadIndex, GroupWaveIndex);

        taa_half4 RawLDS0 = SharedArray0[LocalGroupThreadIndex];
        taa_half4 RawLDS1 = SharedArray1[LocalGroupThreadIndex];
        taa_half4 RawLDS2 = SharedArray2[LocalGroupThreadIndex];

        FilteredInputColor = RawLDS0.rgb;
        InputMinColor = RawLDS1.rgb;
        InputMaxColor = RawLDS2.rgb;
        
        LowFrequencyRejection = RawLDS0.a;
        InputPixelAlignement = RawLDS1.a;
        OutputPixelVelocity = RawLDS2.a;
    }
    #endif

    // Reject high-frequency detail if the current low frequency has drifted from the history's low frequency.
    #if CONFIG_LOW_FREQUENCY_DRIFT_REJECTION
    {
        taa_half3 PrevHighFrequencyYCoCg = TransformColorForClampingBox(PrevHistoryMoment1 * SafeRcp(PrevHistoryValidity));
        taa_half3 PrevYCoCg = TransformColorForClampingBox(PrevFallbackColor);
        taa_half3 ClampedPrevYCoCg = TransformColorForClampingBox(clamp(PrevFallbackColor, PrevHistoryMommentMin, PrevHistoryMommentMax));

        taa_half HighFrequencyRejection = MeasureRejectionFactor(
            PrevYCoCg, ClampedPrevYCoCg,
            PrevHighFrequencyYCoCg, InputMinColor, InputMaxColor);
        
        PrevHistoryMoment1 *= HighFrequencyRejection;
        PrevHistoryValidity *= HighFrequencyRejection;
    }
    #endif

    // Contribute the current frame's input to the predictor for the next frame.
    const taa_half Histeresis = rcp(taa_half(MAX_SAMPLE_COUNT));
    const taa_half PredictionOnlyValidity = Histeresis * taa_half(2.0);
    
    // Clamp the fallback data.
    taa_half LumaMin;
    taa_half LumaMax;
    taa_half3 ClampedFallbackColor;
    taa_half FallbackRejection;
    {
        LumaMin = InputMinColor.x;
        LumaMax = InputMaxColor.x;

        taa_half3 PrevYCoCg = TransformColorForClampingBox(PrevFallbackColor);
        taa_half3 ClampedPrevYCoCg = clamp(PrevYCoCg, InputMinColor, InputMaxColor);
        taa_half3 InputCenterYCoCg = TransformColorForClampingBox(FilteredInputColor);

        ClampedFallbackColor = YCoCgToRGB(ClampedPrevYCoCg);
        
        FallbackRejection = MeasureRejectionFactor(
            PrevYCoCg, ClampedPrevYCoCg,
            InputCenterYCoCg, InputMinColor, InputMaxColor);

        #if !CONFIG_CLAMP
        {
            ClampedFallbackColor = PrevFallbackColor;
            FallbackRejection = taa_half(1.0);
        }
        #endif
    }

    taa_half3 FinalHistoryMoment1;
    taa_half FinalHistoryValidity;
    {
        // Compute how much history to reject, based on completeness.
        taa_half PrevHistoryRejectionWeight = LowFrequencyRejection;
            
        FLATTEN
        if (bOffScreen)
        {
            PrevHistoryRejectionWeight = taa_half(0.0);
        }

        taa_half DesiredCurrentContribution = max(Histeresis * InputPixelAlignement, taa_half(0.0));

        // Determine whether the prediction-based rejection is confident enough.
        taa_half RejectionConfidentEnough = taa_half(1); // saturate(RejectionValidity * MAX_SAMPLE_COUNT - 3.0);

        // Compute the new rejected validity.
        taa_half RejectedValidity = (
            min(PrevHistoryValidity, PredictionOnlyValidity - DesiredCurrentContribution) +
            max(PrevHistoryValidity - PredictionOnlyValidity + DesiredCurrentContribution, taa_half(0.0)) * PrevHistoryRejectionWeight);

        RejectedValidity = PrevHistoryValidity * PrevHistoryRejectionWeight;

        // Compute the maximum output validity.
        taa_half OutputValidity = (
            clamp(RejectedValidity + DesiredCurrentContribution, taa_half(0.0), PredictionOnlyValidity) +
            clamp(RejectedValidity + DesiredCurrentContribution * PrevHistoryRejectionWeight * RejectionConfidentEnough - PredictionOnlyValidity, 0.0, 1.0 - PredictionOnlyValidity));

        FLATTEN
        if (bIsResponsiveAAPixel)
        {
            OutputValidity = taa_half(0.0);
        }
        
        taa_half InvPrevHistoryValidity = SafeRcp(PrevHistoryValidity);

        taa_half PrevMomentWeight = max(OutputValidity - DesiredCurrentContribution, taa_half(0.0));
        taa_half CurrentMomentWeight = min(DesiredCurrentContribution, OutputValidity);
        
        {
            taa_half PrevHistoryToneWeight = HdrWeightY(Luma4(PrevHistoryMoment1) * InvPrevHistoryValidity);
            taa_half FilteredInputToneWeight = HdrWeight4(FilteredInputColor);
            
            taa_half BlendPrevHistory = PrevMomentWeight * PrevHistoryToneWeight;
            taa_half BlendFilteredInput = CurrentMomentWeight * FilteredInputToneWeight;

            taa_half CommonWeight = OutputValidity * SafeRcp(BlendPrevHistory + BlendFilteredInput);

            FinalHistoryMoment1 = (
                PrevHistoryMoment1 * (CommonWeight * BlendPrevHistory * InvPrevHistoryValidity) +
                FilteredInputColor * (CommonWeight * BlendFilteredInput));
        }

        // Adjust for the 8-bit encoding of the validity to avoid numerical drift.
        taa_half OutputInvValidity = SafeRcp(OutputValidity);
        FinalHistoryValidity = ceil(taa_half(255.0) * OutputValidity) * rcp(taa_half(255.0));
        FinalHistoryMoment1 *= FinalHistoryValidity * OutputInvValidity;
    }

    // Compute the fallback history.
    taa_half3 FinalFallbackColor;
    taa_half FinalFallbackWeight;
    {
        const taa_half TargetHesteresisCurrentFrameWeight = rcp(taa_half(MAX_FALLBACK_SAMPLE_COUNT));

        taa_half LumaHistory = Luma4(PrevFallbackColor);
        taa_half LumaFiltered = Luma4(FilteredInputColor);

        {
            taa_half OutputBlend = ComputeFallbackContribution(FinalHistoryValidity);
        }

        taa_half BlendFinal;
        #if 1
        {
            taa_half CurrentFrameSampleCount = max(InputPixelAlignement, taa_half(0.005));
            
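            // Use a single sample count to recover from history rejection very quickly, then stabilize immediately so sub-pixel frequencies can be used as soon as possible.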
            taa_half PrevFallbackSampleCount;
            FLATTEN
            if (PrevFallbackWeight < taa_half(1.0))
            {
                PrevFallbackSampleCount = PrevFallbackWeight;
            }
            else
            {
                PrevFallbackSampleCount = taa_half(MAX_FALLBACK_SAMPLE_COUNT);
            }

            // Reject the history based on the low frequencies.
            #if 1
            {
                taa_half PrevFallbackRejectionFactor = saturate(LowFrequencyRejection * (CurrentFrameSampleCount + PrevFallbackSampleCount) / PrevFallbackSampleCount);

                PrevFallbackSampleCount *= PrevFallbackRejectionFactor;
            }
            #endif

            BlendFinal = CurrentFrameSampleCount / (CurrentFrameSampleCount + PrevFallbackSampleCount);

            // Increase the blend weight under motion.
            #if 1
            {
                BlendFinal = lerp(BlendFinal, max(taa_half(0.2), BlendFinal), saturate(OutputPixelVelocity * rcp(taa_half(40.0))));
            }
            #endif

            // Anti-flicker.
            #if 1
            {
                taa_half DistToClamp = min( abs(LumaHistory - LumaMin), abs(LumaHistory - LumaMax) ) / max3( LumaHistory, LumaFiltered, taa_half(1e-4) );
                BlendFinal *= taa_half(0.2) + taa_half(0.8) * saturate(taa_half(0.5) * DistToClamp);
            }
            #endif
            
            // Ensure at least a small contribution.
            #if 1
            {
                BlendFinal = max( BlendFinal, saturate( taa_half(0.01) * LumaHistory / abs( LumaFiltered - LumaHistory ) ) );
            }
            #endif

            // Responsive AA pixels force a 1/4 contribution from the new frame.
            BlendFinal = bIsResponsiveAAPixel ? taa_half(1.0/4.0) : BlendFinal;

            // Completely reject the history.
            {
                PrevFallbackSampleCount *= TotalRejection;
                BlendFinal = lerp(1.0, BlendFinal, TotalRejection);
            }

            FinalFallbackWeight = saturate(CurrentFrameSampleCount + PrevFallbackSampleCount);
            
            #if 1
                FinalFallbackWeight = saturate(floor(255.0 * (CurrentFrameSampleCount + PrevFallbackSampleCount)) * rcp(255.0));
            #endif
        }
        #endif

        {
            taa_half FilterWeight = HdrWeight4(FilteredInputColor);
            taa_half ClampedHistoryWeight = HdrWeight4(ClampedFallbackColor);

            taa_half2 Weights = WeightedLerpFactors(ClampedHistoryWeight, FilterWeight, BlendFinal);

            FinalFallbackColor = ClampedFallbackColor * Weights.x + FilteredInputColor * Weights.y;
        }
    }

    // Update the subpixel details.
    taa_subpixel_details FinalSubpixelDetails;
    {
        taa_half2 dKO = taa_half2(PPCo - PPCk);

        bool bUpdate = all(abs(dKO) < 0.5 * (InputInfo_ViewportSize.x * HistoryInfo_ViewportSizeInverse.x));

        FinalSubpixelDetails = PrevSubpixelDetails;

        taa_subpixel_payload ParallaxFactorBits = ParallaxFactorTexture[InputPixelPos] & SUB_PIXEL_PARALLAX_FACTOR_BIT_MASK;

        {
            const uint ParallaxFactorMask = (
                (SUB_PIXEL_PARALLAX_FACTOR_BIT_MASK << (SUB_PIXEL_PARALLAX_FACTOR_BIT_OFFSET + 0 * SUB_PIXEL_BIT_COUNT)) | 
                (SUB_PIXEL_PARALLAX_FACTOR_BIT_MASK << (SUB_PIXEL_PARALLAX_FACTOR_BIT_OFFSET + 1 * SUB_PIXEL_BIT_COUNT)) | 
                (SUB_PIXEL_PARALLAX_FACTOR_BIT_MASK << (SUB_PIXEL_PARALLAX_FACTOR_BIT_OFFSET + 2 * SUB_PIXEL_BIT_COUNT)) | 
                (SUB_PIXEL_PARALLAX_FACTOR_BIT_MASK << (SUB_PIXEL_PARALLAX_FACTOR_BIT_OFFSET + 3 * SUB_PIXEL_BIT_COUNT)) | 
                0x0);
            
            // Reset the parallax factor.
            FLATTEN
            if (bOffScreen)
            {
                FinalSubpixelDetails = FinalSubpixelDetails & ~ParallaxFactorMask;
            }
        }

        FLATTEN
        if (bUpdate)
        {
            bool2 bBool = dKO < 0.0;

            uint SubpixelId = dot(uint2(bBool), uint2(1, SUB_PIXEL_GRID_SIZE));
            uint SubpixelShift = SubpixelId * SUB_PIXEL_BIT_COUNT;

            taa_subpixel_payload SubpixelPayload = (ParallaxFactorBits << SUB_PIXEL_PARALLAX_FACTOR_BIT_OFFSET);

            FinalSubpixelDetails = (FinalSubpixelDetails & (~(SUB_PIXEL_BIT_MASK << SubpixelShift))) | (SubpixelPayload << SubpixelShift);
        }
    }

    // Compute the final output.
    taa_half3 FinalOutputColor;
    taa_half FinalOutputValidity;
    {
        taa_half OutputBlend = ComputeFallbackContribution(FinalHistoryValidity);

        FinalOutputValidity = lerp(taa_half(1.0), saturate(FinalHistoryValidity), OutputBlend);

        taa_half3 NormalizedFinalHistoryMoment1 = taa_half3(FinalHistoryMoment1 * float(SafeRcp(FinalHistoryValidity)));

        taa_half FallbackWeight = HdrWeight4(FinalFallbackColor);
        taa_half Moment1Weight = HdrWeight4(NormalizedFinalHistoryMoment1);

        taa_half2 Weights = WeightedLerpFactors(FallbackWeight, Moment1Weight, OutputBlend);

        #if DEBUG_FALLBACK_BLENDING
            taa_half3 FallbackColor = taa_half3(1, 0.25, 0.25);
            taa_half3 HighFrequencyColor = taa_half3(0.25, 1, 0.25);

            FinalOutputColor = FinalFallbackColor * Weights.x * FallbackColor + NormalizedFinalHistoryMoment1 * Weights.y * HighFrequencyColor;
        #elif DEBUG_LOW_FREQUENCY_REJECTION
            taa_half3 DebugColor = lerp(taa_half3(1, 0.5, 0.5), taa_half3(0.5, 1, 0.5), LowFrequencyRejection);
            
            FinalOutputColor = FinalFallbackColor * Weights.x * DebugColor + NormalizedFinalHistoryMoment1 * Weights.y * DebugColor;
        #else
            FinalOutputColor = FinalFallbackColor * Weights.x + NormalizedFinalHistoryMoment1 * Weights.y;
        #endif
    }

    ISOLATE
    {
        uint LocalGroupThreadIndex = GetGroupThreadIndex(GroupThreadIndex, GroupWaveIndex);

        taa_short2 LocalHistoryPixelPos = (
            taa_short2(GroupId) * taa_short2(TILE_SIZE, TILE_SIZE) +
            Map8x8Tile2x2Lane(LocalGroupThreadIndex));
            
        LocalHistoryPixelPos = InvalidateOutputPixelPos(LocalHistoryPixelPos, HistoryInfo_ViewportMax);

        // Output the final history.
        {
            #if CONFIG_ENABLE_STOCASTIC_QUANTIZATION
            {
                uint2 Random = Rand3DPCG16(int3(LocalHistoryPixelPos, View.StateFrameIndexMod8)).xy;
                float2 E = Hammersley16(0, 1, Random);

                FinalHistoryMoment1 = QuantizeForFloatRenderTarget(FinalHistoryMoment1, E.x, HistoryQuantizationError);
                FinalFallbackColor = QuantizeForFloatRenderTarget(FinalFallbackColor, E.x, HistoryQuantizationError);
            }
            #endif

            FinalFallbackColor = -min(-FinalFallbackColor, taa_half(0.0));
            FinalHistoryMoment1 = -min(-FinalHistoryMoment1, taa_half(0.0));
            FinalFallbackColor = min(FinalFallbackColor, taa_half(Max10BitsFloat));
            FinalHistoryMoment1 = min(FinalHistoryMoment1, taa_half(Max10BitsFloat));
            
            HistoryOutput_Textures_0[LocalHistoryPixelPos] = FinalFallbackColor;
            HistoryOutput_Textures_1[LocalHistoryPixelPos] = FinalHistoryMoment1;
            HistoryOutput_Textures_2[LocalHistoryPixelPos] = taa_half2(FinalFallbackWeight, FinalHistoryValidity);
            HistoryOutput_Textures_3[LocalHistoryPixelPos] = FinalSubpixelDetails;

            #if DEBUG_OUTPUT
            {
                DebugOutput[LocalHistoryPixelPos] = Debug;
            }
            #endif
        }

        // Output the final scene color.
        {
            taa_half3 OutputColor = FinalOutputColor;
                
            OutputColor = -min(-OutputColor, taa_half(0.0));
            OutputColor = min(OutputColor, taa_half(Max10BitsFloat));

            SceneColorOutput[LocalHistoryPixelPos] = OutputColor;
        }
    }
}
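
The first thing `MainCS` does is map each output (history) pixel O into the lower-resolution input viewport and find the nearest input pixel K. The C++ sketch below reproduces that arithmetic with scalar math so it can be checked by hand. `FFloat2`, `FInputMapping`, `MapOutputPixelToInput`, and `ShouldUpdateSubpixel` are hypothetical names; the formulas follow `ViewportUV`, `PPCo`, `PPCk`, `dKO`, and `bUpdate` in the shader.

```cpp
#include <cmath>

struct FFloat2
{
    float X;
    float Y;
};

struct FInputMapping
{
    FFloat2 PPCo;  // center of output pixel O, in input pixel coordinates
    FFloat2 PPCk;  // center of the nearest input pixel K
    FFloat2 dKO;   // offset from K to O, each component in [-0.5, 0.5)
};

// Mirrors: ViewportUV = (HistoryPixelPos + 0.5) * HistoryInfo_ViewportSizeInverse
//          PPCo = ViewportUV * InputInfo_ViewportSize + InputJitter
//          PPCk = floor(PPCo) + 0.5
inline FInputMapping MapOutputPixelToInput(
    int PixelX, int PixelY, FFloat2 HistorySizeInverse, FFloat2 InputSize, FFloat2 InputJitter)
{
    const FFloat2 ViewportUV = {(PixelX + 0.5f) * HistorySizeInverse.X,
                                (PixelY + 0.5f) * HistorySizeInverse.Y};
    FInputMapping M;
    M.PPCo = {ViewportUV.X * InputSize.X + InputJitter.X,
              ViewportUV.Y * InputSize.Y + InputJitter.Y};
    M.PPCk = {std::floor(M.PPCo.X) + 0.5f, std::floor(M.PPCo.Y) + 0.5f};
    M.dKO = {M.PPCo.X - M.PPCk.X, M.PPCo.Y - M.PPCk.Y};
    return M;
}

// Mirrors bUpdate: the output pixel only refreshes its subpixel details
// when it sits close enough to the center of input pixel K.
inline bool ShouldUpdateSubpixel(FFloat2 dKO, float InputWidth, float HistoryWidthInverse)
{
    const float Threshold = 0.5f * (InputWidth * HistoryWidthInverse);
    return std::fabs(dKO.X) < Threshold && std::fabs(dKO.Y) < Threshold;
}
```

With a 4x4 input, a 16x16 history, and a jitter of 0.0625 pixels, output pixel (5, 5) lands at 1.4375 in input space, 0.0625 away from the center of input pixel 1, which passes the update test. Without jitter the same pixel sits exactly on the 0.125 threshold and is skipped, which shows how the per-frame jitter changes which output pixels receive fresh samples.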

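As the shader shows, TSR maintains far more data than traditional TAA, including current and historical high-frequency and low-frequency data, parallax factors, and reprojection data. It uses this information to reject or recover history, derives the blend weights for the current frame, and finally computes the anti-aliased scene color and the updated history data.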

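The MainCS compute shader analyzed in this section covers only the last TSR stage, the history update. Many earlier steps produce the data this stage consumes; they are not analyzed here and are left for readers to explore.

6.6.2 Strata

From a cursory read of the Strata code, Strata appears similar to UE4's Material Layer, but it seems to be applied mainly to material projection, blending, and lighting for Nanite geometry. Strata has its own materials, material nodes, shading models, visualization modes, and shader modules. However, in the current EA release it is still experimental and has many limitations. The main files involving Strata are: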

  • Strata.h/cpp
  • StrataMaterial.h/cpp
  • StrataDefinitions.h
  • MaterialExpressionStrata.h
  • Strata.ush
  • BasePassPixelShader.usf
  • DeferredLightPixelShaders.usf
  • Code related to the scene rendering pipeline and lighting.

Interested readers can study the relevant source code on their own.

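6.7 Summary

This article covered UE5's editor features, Nanite, Lumen, and related rendering techniques. Because UE5 changes so much, it cannot cover every technical point. Beyond what is discussed here, many techniques remain untouched, and interested readers will need to explore the UE source code themselves.

In the UE5 EA stage, both Nanite and Lumen have many flaws: Nanite supports only static objects, Lumen suffers from noise and light leaking, TSR flickers and blurs, shadow precision is insufficient (see the figure below), and a large number of traditional features are unsupported...

Blurred objects and shadow artifacts that appear when the camera is close enough to an object.

Although UE5 currently has many flaws, it is a sapling bathed in sunshine and rain. Carefully cultivated by Epic Games, it will in time grow into a towering, leafy tree that shelters every industry connected to the UE engine. UE5 really No.1!!!

Special Notes

  • Thanks to the authors of all references. Some images come from the references and the web; they will be removed on request if they infringe.
  • This series is the author's original work and is published only on cnblogs (博客园). Sharing the link is welcome, but reposting without permission is not allowed.
  • The series is ongoing; for the complete table of contents, see the content outline.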

References

Original article: https://www.cnblogs.com/timlly/p/15007236.html