Deferred Shading

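Deferred shading is a popular real-time rendering technique. It decouples geometry from lighting, reducing the cost of forward shading from roughly (geometry passes × lighting passes) to (geometry passes + lighting passes), which makes it particularly well suited to scenes with many dynamic lights. This article walks quickly through the stages of a deferred shading implementation and provides a simple sample program with source code. The sample is written against the DX9 API and runs on hardware that supports SM2.0 or later.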
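To make the "multiply versus add" claim concrete, here is a deliberately simplified cost model. It only counts passes (one draw per object per light for multi-pass forward shading, versus one geometry draw per object plus one screen-space pass per light for deferred shading). The function names are made up for this illustration, and real costs also depend on resolution, overdraw, and bandwidth, which this sketch ignores.

```python
def forward_pass_count(num_objects: int, num_lights: int) -> int:
    """Multi-pass forward shading: every object is drawn once per light."""
    return num_objects * num_lights


def deferred_pass_count(num_objects: int, num_lights: int) -> int:
    """Deferred shading: one geometry draw per object, then one
    screen-space lighting pass per light."""
    return num_objects + num_lights


if __name__ == "__main__":
    # Print how the two counts grow as lights are added to a 100-object scene.
    for objects, lights in [(100, 1), (100, 8), (100, 32)]:
        print(objects, lights,
              forward_pass_count(objects, lights),
              deferred_pass_count(objects, lights))
```

Under this model the forward count grows with the product of the two inputs, while the deferred count grows with their sum.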

For an introduction to deferred shading, see "Real-Time Rendering", 3rd edition, section 7.9.2, as well as "GPU Gems 2" and "GPU Gems 3". The "Deferred Shading Tutorial" also gives a detailed walkthrough of an OpenGL implementation. Sample code available online includes the nVidia SDK 9.52 and Intel's "Deferred Rendering for Current and Future Rendering Pipelines", at http://software.intel.com/en-us/articles/deferred-rendering-for-current-and-future-rendering-pipelines/.

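Deferred shading can be divided into four stages: Geometry, Lighting, Post-Processing, and MergeOutput. The third stage is optional. Each stage writes its output to textures, so deferred shading relies on Render To Texture (RTT) and Multiple Render Targets (MRT). The stages and their outputs are listed in the table below: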

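Stage	Output	Purpose
Geometry	G-Buffer	Records the geometric information of the whole scene, e.g. normal, depth (position), diffuse color, specular intensity
Lighting	P-Buffer1	Computes per-pixel lighting from the G-Buffer data
Post-Processing	P-Buffer2	Post-processing, e.g. motion blur, Bloom, Anti-Aliasing
MergeOutput	BackBuffer	Blends the data from all previous buffers and writes the result to the BackBuffer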

GeometryStage G-Buffer

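This stage renders (records) the geometric information of every 3D model in the scene into the G-Buffer. The G-Buffer has the same resolution as the screen so that later stages can work per pixel. It may consist of several textures, and MRT is normally used to write all of these attributes in a single batch. In this stage the geometric attributes are passed along as texture coordinates, interpolated, and projected onto the G-Buffer, so the transformation matrices between the various spaces must be set up correctly. The sample program outputs normal, depth, diffuse color, and specular intensity to the G-Buffer. Because the sample computes lighting in view space, the normals written here are transformed into view space. The depth written here is already in normalized device space; during lighting, the depth together with the projection matrix is enough to recover the view-space position. The G-Buffer output is shown in the figure below: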

From top to bottom: normal (view space), depth, diffuse color, specular intensity.
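As a rough CPU-side illustration of what the geometry stage writes for one surface point, the sketch below projects a view-space position and records the four attributes listed above. It is not the sample program's shader: the function names and the dictionary layout are invented for this example, the projection is assumed to be the usual D3D left-handed perspective matrix used with row vectors, and the position and normal are assumed to be in view space already.

```python
import math


def perspective_fov_lh(fovy, aspect, near, far):
    """Row-major, D3D-style left-handed projection (row vector * matrix)."""
    y_scale = 1.0 / math.tan(fovy / 2.0)
    x_scale = y_scale / aspect
    q = far / (far - near)
    return [
        [x_scale, 0.0, 0.0, 0.0],
        [0.0, y_scale, 0.0, 0.0],
        [0.0, 0.0, q, 1.0],
        [0.0, 0.0, -near * q, 0.0],
    ]


def mul_vec_mat(v, m):
    """Multiply a 4-component row vector by a 4x4 matrix."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def geometry_pass_sample(pos_view, normal_view, diffuse, spec_intensity, proj):
    """Return the G-Buffer values for one surface point (hypothetical layout)."""
    clip = mul_vec_mat([pos_view[0], pos_view[1], pos_view[2], 1.0], proj)
    depth_ndc = clip[2] / clip[3]  # perspective divide: w equals view-space z
    return {
        "normal": normalize(normal_view),  # view-space normal
        "depth": depth_ndc,                # normalized device depth in [0, 1]
        "diffuse": diffuse,
        "specular": spec_intensity,
    }
```

With this matrix, a point on the near plane gets depth 0 and a point on the far plane gets depth 1.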

LightingStage P-Buffer1

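This stage combines the lighting model and the light positions with the geometric information in the G-Buffer to compute the color of every G-Buffer pixel. If there are several lights, the stage is executed once per light and the results are accumulated into P-Buffer1. As noted above, the sample program computes lighting in view space, so the depth in the G-Buffer has to be turned back into a view-space position. To understand how that recovery works, a few concepts are needed first: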

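The depth stored in the G-Buffer is in normalized device space. Going from view space to normalized device space takes two steps, view --> homogeneous --> normalized device. The view --> homogeneous step is performed by the projection matrix, and the homogeneous --> normalized device step divides all four components of the vector by w, where w is the view-space z. The projection matrix is shown below (note that D3D uses row-major matrices with pre-multiplication, i.e. row vector times matrix). It is written here in the standard D3D left-handed perspective form, where a is fovy and R is the aspect ratio:

    [ cot(a/2)/R   0          0            0 ]
    [ 0            cot(a/2)   0            0 ]
    [ 0            0          f/(f-n)      1 ]
    [ 0            0          -n*f/(f-n)   0 ]

From this, the z in homogeneous space is z_h = z*f/(f-n) - n*f/(f-n), and dividing it by the view-space z gives the depth in normalized device space: depth = z_h/z = f/(f-n) - n*f/((f-n)*z). Every z in these expressions is the view-space z, f is the far plane, and n is the near plane. Both f and n are chosen when the projection matrix is defined, and the value of the expression (the stored depth) is known, so the view-space z can be solved for.

The normalized device space x and y are also known: they are u*2-1 and -(v*2-1), where (u, v) is the texture coordinate. This is because the G-Buffer is rendered onto the P-Buffer texel for texel, and the texture coordinate range [0,1] has to be mapped to the normalized device space xy range [-1,1]. For details on the individual spaces and the derivation of the transformation matrices, see "Real-Time Rendering", 3rd edition, and "Introduction to 3D Game Programming with DirectX 9.0c: A Shader Approach".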
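The relations in this section can be checked numerically. The sketch below assumes the standard D3D left-handed perspective mapping, under which depth = f/(f-n) - n*f/((f-n)*z), and solves it for z. The function names are chosen for this example and do not come from the sample program.

```python
def view_z_to_ndc_depth(z_view, near, far):
    """Forward mapping: view-space z to normalized device depth."""
    q = far / (far - near)
    return q - near * q / z_view


def ndc_depth_to_view_z(depth, near, far):
    """Inverse mapping, obtained by solving the expression above for z."""
    return near * far / (far - depth * (far - near))


def uv_to_ndc_xy(u, v):
    """Map a texture coordinate in [0, 1] to normalized device xy in [-1, 1].

    v is flipped because texture v grows downward while NDC y grows upward.
    """
    return u * 2.0 - 1.0, -(v * 2.0 - 1.0)


if __name__ == "__main__":
    depth = view_z_to_ndc_depth(37.5, near=1.0, far=1000.0)
    print(depth, ndc_depth_to_view_z(depth, near=1.0, far=1000.0))
    print(uv_to_ndc_xy(0.0, 0.0), uv_to_ndc_xy(1.0, 1.0))
```

The top-left texel (u, v) = (0, 0) maps to the top-left corner (-1, 1) of normalized device space, and a depth value fed through both functions returns the original view-space z up to floating-point error.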

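Once the normalized device space xyz and the view-space z are known, there are two ways to get back to view space. In the first, the normalized device space xyzw (with w=1) is multiplied component-wise by the view-space z to return to homogeneous clip space, and the inverse of the projection matrix then takes the point back to view space. (The projection matrix does not actually project points onto a plane; it only maps them into homogeneous space, so it is invertible.) In the second, the (0,0) and (1,1) elements of the projection matrix are used to compute the view-space x and y directly. With the standard D3D perspective matrix those elements are cot(a/2)/R and cot(a/2), so x_view = x_ndc * z * R * tan(a/2) and y_view = y_ndc * z * tan(a/2), where R is the aspect ratio and a is fovy. The sample program uses the second method.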
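A minimal sketch of the second method follows, checked by projecting a known view-space point and reconstructing it. It assumes the standard D3D left-handed perspective matrix, so that element (0,0) is cot(a/2)/R and element (1,1) is cot(a/2). Both function names are illustrative only.

```python
import math


def project_view_to_ndc(pos_view, fovy, aspect, near, far):
    """Forward path: view space -> normalized device space."""
    x, y, z = pos_view
    y_scale = 1.0 / math.tan(fovy / 2.0)   # projection element (1,1)
    x_scale = y_scale / aspect             # projection element (0,0)
    q = far / (far - near)
    return (x * x_scale / z, y * y_scale / z, q - near * q / z)


def reconstruct_view_position(ndc_x, ndc_y, depth, fovy, aspect, near, far):
    """Inverse path: recover the view-space position from NDC xy and depth."""
    z = near * far / (far - depth * (far - near))
    tan_half = math.tan(fovy / 2.0)
    x = ndc_x * z * aspect * tan_half      # ndc_x * z / element (0,0)
    y = ndc_y * z * tan_half               # ndc_y * z / element (1,1)
    return (x, y, z)


if __name__ == "__main__":
    params = dict(fovy=math.pi / 4.0, aspect=16.0 / 9.0, near=1.0, far=500.0)
    ndc = project_view_to_ndc((3.0, -2.0, 40.0), **params)
    print(reconstruct_view_position(*ndc, **params))
```

The appeal of this route is that it needs only two scalars derived from the projection matrix instead of a full inverse matrix multiply per pixel.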

With the view-space position of every pixel available, per-pixel lighting can be computed to produce P-Buffer1, shown in the figure below.
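The per-light accumulation described in this section can be sketched on the CPU as follows. The article does not say which lighting model the sample uses, so Blinn-Phong with point lights is an assumption here, as are the function names, the dictionary keys, and the shininess value. The pixel's position is taken to be the view-space position already reconstructed from depth.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]


def shade_light(pixel, light, shininess=32.0):
    """Blinn-Phong contribution of one point light, evaluated in view space."""
    to_light = normalize([l - p for l, p in zip(light["position"], pixel["position"])])
    to_eye = normalize([-p for p in pixel["position"]])  # eye is the view-space origin
    half = normalize([a + b for a, b in zip(to_light, to_eye)])
    n_dot_l = max(dot(pixel["normal"], to_light), 0.0)
    n_dot_h = max(dot(pixel["normal"], half), 0.0)
    specular = pixel["specular"] * n_dot_h ** shininess if n_dot_l > 0.0 else 0.0
    return [c * (d * n_dot_l + specular)
            for c, d in zip(light["color"], pixel["diffuse"])]


def accumulate_lights(pixel, lights):
    """Run the lighting stage once per light and sum the results,
    mirroring additive blending into the lighting buffer."""
    total = [0.0, 0.0, 0.0]
    for light in lights:
        total = [t + c for t, c in zip(total, shade_light(pixel, light))]
    return total
```

Because each light only adds to the running total, lights can be processed in any order, which is what makes one pass per light with additive blending workable.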

Post-Processing P-Buffer2

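The sample program applies AA and Bloom. The AA pass uses the G-Buffer normals to detect triangle edges and to decide the blend weights of the 3x3 neighboring pixels, which are then blended to produce the center pixel. For more effective post-processing AA, see MLAA and SRAA (to be covered in later articles). Bloom is simply a vertical and a horizontal blur of P-Buffer1. The output of the post-processing stage is shown in the figure below.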
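The two-direction blur used for Bloom can be sketched as a separable filter: one 1D pass along rows, then one along columns. The kernel weights, the clamp-to-edge border handling, and the single-channel image layout below are assumptions made for this example, not details taken from the sample program.

```python
def blur_1d(image, weights, horizontal):
    """Apply a 1D kernel along rows (horizontal) or columns (vertical).

    image is a list of rows of floats; borders are clamped to the edge.
    """
    height, width = len(image), len(image[0])
    radius = len(weights) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(weights):
                offset = k - radius
                sx = min(max(x + offset, 0), width - 1) if horizontal else x
                sy = y if horizontal else min(max(y + offset, 0), height - 1)
                acc += weight * image[sy][sx]
            out[y][x] = acc
    return out


def bloom_blur(image, weights):
    """Horizontal pass followed by a vertical pass over the result."""
    return blur_1d(blur_1d(image, weights, True), weights, False)


if __name__ == "__main__":
    impulse = [[0.0] * 5 for _ in range(5)]
    impulse[2][2] = 1.0
    for row in bloom_blur(impulse, [0.25, 0.5, 0.25]):
        print(row)
```

Running two 1D passes costs on the order of 2k samples per pixel for a kernel of width k, instead of k*k for the equivalent 2D kernel, which is the usual reason for splitting the blur this way.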

MergeOutput Backbuffer

This step is simple: blend the outputs of the post-processing stage and render the result to the Backbuffer. The figure below shows the complete rendered result:

Finally, note that the sample program uses the framework code and textures from "Introduction to 3D Game Programming with DirectX 9.0c: A Shader Approach".

Sample program source code download: https://files.cnblogs.com/rickerliang/AmbientDiffuseSpecularDemo-DeferredShading.zip

I hope this article is helpful to anyone who wants to learn about deferred shading.