Performance, Part 1: Exploring Mobile vs. Desktop OpenGL Performance

OpenGL Insights

Exploring Mobile vs. Desktop
OpenGL Performance

Jon McCaffrey

I have read the earlier parts of the chapter before and probably already wrote them up in some other post.

Fullscreen effect

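The key to post-processing (pp) optimization is reducing the number of passes, because every pass is a synchronized read-modify-write against system memory.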

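1. If a fullscreen effect does not need neighboring pixels, use framebuffer fetch and fold it into the base pass; this gets bandwidth as low as it can go.

2. Remove the separate bloom pass for particles: use a mask to tell them apart so everything shares a single bloom pass, and output the mask through MRT.

3. Use an ubershader to merge the post-processing steps together; see the Unreal GDC 2014 talk.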
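
To put a rough number on why cutting passes matters, here is a minimal back-of-the-envelope sketch. It assumes every fullscreen pass reads the whole previous target once and writes the whole new target once, with no framebuffer compression; the function name and the 1280x720 RGBA8 figures are hypothetical, not taken from the chapter.

```python
def postprocess_traffic_bytes(width, height, bytes_per_pixel, passes):
    """Estimate framebuffer traffic for a chain of fullscreen passes.

    Assumption: each pass reads the full previous target once and writes
    the full new target once (no compression, no extra neighbor taps).
    """
    pixels = width * height
    per_pass = pixels * bytes_per_pixel * 2  # one read + one write
    return per_pass * passes


# Hypothetical numbers: 1280x720, RGBA8, three separate passes vs one merged pass.
separate = postprocess_traffic_bytes(1280, 720, 4, passes=3)
merged = postprocess_traffic_bytes(1280, 720, 4, passes=1)
print(f"separate: {separate / 2**20:.2f} MiB per frame")
print(f"merged:   {merged / 2**20:.2f} MiB per frame")
```

Under these assumptions the traffic scales linearly with the pass count, which is why merging three passes into one saves two thirds of it.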

Offscreen Effect

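1. Use downsampling plus bilinear filtering to do the blur for bloom and environment reflections.

2. With this approach every pixel is shaded only once, which is the advantage of deferred lighting; exploit it to reduce overdraw.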
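
The downsample + bilinear blur from item 1 can be sketched on the CPU as below. This is only an illustration on a grayscale image stored as a list of rows; on the GPU the 2x2 average would come from a single bilinear tap placed between four texels, and the function names here are made up for the example.

```python
def downsample_2x(img):
    """Average each 2x2 block, which is what one bilinear tap placed at the
    shared corner of four texels returns."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def sample_bilinear(img, u, v):
    """Sample img at normalized (u, v) with clamp-to-edge bilinear filtering."""
    h, w = len(img), len(img[0])
    # Texel centers sit at half-integer positions, as in OpenGL.
    x = min(max(u * w - 0.5, 0.0), w - 1.0)
    y = min(max(v * h - 0.5, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def blur(img, levels=2):
    """Downsample `levels` times, then read back at full resolution."""
    h, w = len(img), len(img[0])
    small = img
    for _ in range(levels):
        small = downsample_2x(small)
    return [
        [sample_bilinear(small, (x + 0.5) / w, (y + 0.5) / h) for x in range(w)]
        for y in range(h)
    ]
```

Each extra level quarters the number of pixels the blur has to touch, so the wide blur is paid for at low resolution rather than with many taps at full resolution.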

Fragment

The key to optimizing here: use a profiler to measure what you are actually bound on.

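1. float

2. If the vertex count is low, consider doing the computation per vertex.
3. lightmap

4. Lookup textures (trade bandwidth and memory for ALU), for example a Beckmann term parameterized in one dimension; the cache hit rate is high.

5. Bound on texture fetch: if there are too many varyings, register pressure keeps the GPU from having more fragments in flight at once, so the cost of the TSP or USSE can no longer be hidden by scheduling.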
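
As an illustration of item 4, here is a minimal sketch of baking the Beckmann distribution into a 1D table indexed by N.H and reading it back with linear filtering, the way a GL_LINEAR fetch from a one-row texture would. The table size, the fixed roughness per table, and the lower clamp on N.H are assumptions made for the example.

```python
import math


def beckmann(n_dot_h, roughness):
    """Beckmann microfacet distribution D for a given N.H and roughness m."""
    c2 = n_dot_h * n_dot_h
    m2 = roughness * roughness
    tan2 = (1.0 - c2) / c2
    return math.exp(-tan2 / m2) / (math.pi * m2 * c2 * c2)


def build_lut(roughness, size=64, min_cos=1.0 / 64):
    """Bake D into a 1D table indexed by N.H in [min_cos, 1].

    min_cos avoids the division by zero at N.H = 0 (an assumption of this sketch).
    """
    step = (1.0 - min_cos) / (size - 1)
    return [beckmann(min_cos + i * step, roughness) for i in range(size)]


def sample_lut(lut, n_dot_h, min_cos=1.0 / 64):
    """Linearly filtered lookup, mimicking a filtered 1D texture fetch."""
    t = (min(max(n_dot_h, min_cos), 1.0) - min_cos) / (1.0 - min_cos)
    pos = t * (len(lut) - 1)
    i = min(int(pos), len(lut) - 2)
    f = pos - i
    return lut[i] * (1.0 - f) + lut[i + 1] * f


lut = build_lut(roughness=0.5, size=256)
print(sample_lut(lut, 0.9), beckmann(0.9, 0.5))
```

The shader then pays one texture fetch instead of an exp and several multiplies; whether that is a win depends on whether the profiler says the shader is ALU bound or texture bound.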

Vertex shading

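2. Limit the number of varyings: too many not only adds memory cost but also restricts how many fragments can be processed in parallel.

3. Vertex bandwidth: every tile pulls the vertex data again unless it is already in the cache (for example, the next tile's data is already cached). Use low-precision vertex data: OES_vertex_half_float.

4. Interleave vertex attributes so the cache hit rate is high: pos normal uv pos normal uv.

5. Keep frequently changing vertex attributes in a separate buffer from static ones. For example, if each update only touches pos while normal and uv stay the same, split those two groups of data apart (this seems to matter only for animation; scene data is all static).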
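
Items 3 to 5 can be combined into one buffer layout. The sketch below packs the static attributes (normal, uv) interleaved as half floats and keeps the frequently updated position in its own float stream. The exact split and the choice of which attributes get half precision are assumptions for illustration; the byte strings are what you would hand to glBufferData, with the strides and offsets passed to glVertexAttribPointer.

```python
import struct

# Static stream: normal (3 x half) + uv (2 x half), interleaved per vertex.
# '<' means little-endian with no implicit padding; 'e' is an IEEE 754 half float.
STATIC_FMT = "<3e2e"
STATIC_STRIDE = struct.calcsize(STATIC_FMT)
NORMAL_OFFSET = 0
UV_OFFSET = struct.calcsize("<3e")

# Dynamic stream: position only, full floats, in its own buffer so that an
# animation update never has to re-upload the normals and uvs.
POS_FMT = "<3f"
POS_STRIDE = struct.calcsize(POS_FMT)


def pack_static(normals, uvs):
    """Interleave normal and uv per vertex: n0 uv0 n1 uv1 ..."""
    buf = bytearray()
    for n, uv in zip(normals, uvs):
        buf += struct.pack(STATIC_FMT, *n, *uv)
    return bytes(buf)


def pack_positions(positions):
    """Tightly packed positions for the dynamic buffer."""
    return b"".join(struct.pack(POS_FMT, *p) for p in positions)


normals = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
uvs = [(0.0, 0.0), (1.0, 0.5)]
positions = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]

static_buf = pack_static(normals, uvs)    # upload once
dynamic_buf = pack_positions(positions)   # re-upload when the mesh animates

full_float_size = struct.calcsize("<3f2f") * len(normals)
print(len(static_buf), "bytes static vs", full_float_size, "bytes as full floats")
```

Half floats cannot represent large or very fine values exactly, so positions are left as full floats here; whether positions can also drop to half precision depends on the mesh.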

====================================

https://cdn2.unrealengine.com/Resources/files/GDC2014_Next_Generation_Mobile_Rendering-2033767592.pdf

=============

cover latency

https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf