
Today we continue with a selection of questions on development and optimization. It should take about 10 minutes to read, and a careful read will be worth your time.

  1. Adding a Cookie mask to a directional light
  2. Time cost of ToLua tables: cache vs. New
  3. Some optimizations for real-time specular reflection
  4. Using BVH to build a dynamic occlusion culling plug-in in Unity
  5. Recording variables of a class instance inside a PlayableBehaviour


Q: We wrote a Shader that handles lighting and shadows for an ordinary directional light. When the artist added a Cookie mask to the light, the model turned out dim.


        Name "FORWARD"
        Tags { "LightMode"="ForwardBase" }

        #pragma vertex vert
        #pragma fragment frag
        #include "UnityCG.cginc"
        #include "AutoLight.cginc"
        uniform fixed4 _LightColor0;

        struct VertexInput
        {
            half4 vertex : POSITION;
            half3 normal : NORMAL;
            half2 texcoord : TEXCOORD0;
        };

        struct VertexOutput
        {
            half4 pos : SV_POSITION;
            half2 uvMain : TEXCOORD0;
            half3 normalDir : TEXCOORD1;
        };

        fixed4 frag(VertexOutput i) : SV_Target
        {
            // (body not included in the original post; it reads _LightColor0.rgb)
        }

_LightColor0.rgb is used here. If the light has a Cookie, it comes out all black; if no Cookie is set, it returns the color of the light. What is going on? Do I have to add a separate ForwardAdd Pass to handle it?

The art team wants an extra layer of light and shadow variation to set the atmosphere, for example the dappled light and dark you see in a forest, but without doubling the DrawCalls. Is there a good way to achieve this? Thanks.

(Picture: the scene without a Cookie, which looks flat and dull.)

A: The only approach we have for now borrows from a simplified fog technique. It works here because the map is fairly flat. Place a mesh patch at the same height as the ground, then set the Shader's render queue as follows:

Tags { "Queue" = "Transparent+150" "IgnoreProjector" = "True" "RenderType" = "Transparent" }

With a small change, the same method can also produce a simple cloudy-weather effect. (The blend mode may need to change; the artists reported some color distortion on the buildings.)

Shader "Studio1/Flow2"
{
    Properties
    {
        _MainTex ("Texture", 2D) = "white" {}
        _MashAlpha ("MashAlpha", Range(0.00, 1.00)) = 0.60
        [KeywordEnum(OFF, ON)] _MoveUV ("uv Move", Float) = 0
        _MoveSpeed ("uv MoveSpeed", Float) = 1
    }
    SubShader
    {
        Tags { "Queue" = "Transparent+150" "IgnoreProjector" = "True" "RenderType" = "Transparent" }
        LOD 100

        Blend SrcAlpha OneMinusSrcAlpha
        ZWrite Off
        ZTest Off
        Cull Back

        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            // make fog work
            #pragma multi_compile_fog
            #pragma multi_compile _MOVEUV_OFF _MOVEUV_ON
            #include "UnityCG.cginc"

            struct appdata
            {
                float4 vertex : POSITION;
                float2 uv : TEXCOORD0;
            };

            struct v2f
            {
                float2 uv : TEXCOORD0;
                UNITY_FOG_COORDS(1) // declares fogCoord, which UNITY_APPLY_FOG reads
                float4 vertex : SV_POSITION;
            };

            sampler2D _MainTex;
            float4 _MainTex_ST;
            float _MashAlpha;
            #ifdef _MOVEUV_ON
            float _MoveSpeed;
            #endif

            v2f vert (appdata v)
            {
                v2f o;
                o.vertex = UnityObjectToClipPos(v.vertex);
                o.uv = TRANSFORM_TEX(v.uv, _MainTex);
                UNITY_TRANSFER_FOG(o, o.vertex); // fill fogCoord for the fragment stage
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                // float rather than fixed, so the scrolled UVs keep enough precision
                float2 tep = i.uv;

                #ifdef _MOVEUV_ON
                // scroll the UVs over time
                tep.x += _Time.y * _MoveSpeed;
                tep.y += _Time.y * _MoveSpeed;
                #endif

                // sample the texture and scale its alpha by the mask strength
                fixed4 col = tex2D(_MainTex, tep);
                col = fixed4(col.rgb, col.a * _MashAlpha);
                // apply fog
                UNITY_APPLY_FOG(i.fogCoord, col);
                return col;
            }
            ENDCG
        }
    }
}


Q: Generally speaking, I assumed that caching and reusing Lua tables is the better choice. But when I timed it myself, for a small table like Vector3 the cost of New was much lower than reusing a cached table and changing its values. Does the difference between these two usages have a big impact on performance? Do I need to pay attention to it?

Attached picture: the time cost of the cached version is on the left and the time cost of New is on the right; the gap is about 8x.

A1: You just need to write the following code:

local VSet = Vector.Set

for i = 1, 100000000 do
    VSet(c, a.x+b.x, a.y+b.y, a.z+b.z)
end

Your test actually compares calling a function through the Metatable with calling it directly. Mainstream object-oriented implementations in Lua call methods with the obj:XXX() syntax, and every such call is resolved through the Metatable. This approach is well suited to supporting inheritance, but the extra Metatable lookup makes each call slower. Caching Vector.Set in a local saves that lookup time.

In addition:

local VNew = Vector.New

This is also faster than calling Vector.New directly, because it saves one field lookup (finding the New field in Vector).

This optimization applies anywhere such method calls occur. Use it only in places where the cost is relatively high; there is no need to write the entire codebase this way.
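To make the difference concrete, here is a minimal sketch of the two call styles. The `Vector` class below is a hypothetical stand-in written for this example, not the real ToLua Vector3, so only the lookup pattern should be taken from it.

```lua
-- Hypothetical minimal Vector class (an assumption for illustration only).
local Vector = {}
Vector.__index = Vector

function Vector.New(x, y, z)
  return setmetatable({ x = x, y = y, z = z }, Vector)
end

function Vector.Set(v, x, y, z)
  v.x, v.y, v.z = x, y, z
end

local a = Vector.New(1, 2, 3)
local b = Vector.New(4, 5, 6)
local c = Vector.New(0, 0, 0)

-- Method-call syntax: c has no "Set" field of its own, so Lua falls back to
-- the metatable's __index table (Vector) on every single call.
c:Set(a.x + b.x, a.y + b.y, a.z + b.z)
local viaMethod = { c.x, c.y, c.z }

-- Cached local: the field lookup happens once, outside the loop, and the
-- loop body calls the function value directly.
local VSet = Vector.Set
for i = 1, 1000 do
  VSet(c, a.x + b.x, a.y + b.y, a.z + b.z)
end
```

Both styles write the same values into `c`; only the path Lua takes to find the function differs.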

A2: This comes down to test methodology:

  1. Run a GC before and after each test, so that memory accumulated earlier does not skew the later part of the test.
  2. The number of iterations in the test should match the order of magnitude the game actually runs. A count that is far too large or far too small does not reflect the real impact.
  3. Reduce external calls: where possible, write the external implementation directly inside the loop. If you must call an external interface, evaluate the performance of that interface separately as well.

Finally, the conclusion from my own recent performance tests: reusing a Vector is faster than New, and it avoids time-consuming memory allocation and GC.
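The three testing rules above can be sketched as a small harness. Everything here is an assumption made for illustration: the `measure` helper, the plain `{x, y, z}` tables standing in for Vector3, and the iteration count `N`, which you would replace with a figure close to what your game really does.

```lua
-- Runs fn(n) with a full GC before and after (rule 1) and returns the elapsed
-- CPU seconds plus the Lua heap growth in KB, sampled before the final GC.
local function measure(fn, n)
  collectgarbage("collect")
  local memBefore = collectgarbage("count")
  local t0 = os.clock()
  fn(n)
  local elapsed = os.clock() - t0
  local memGrowth = collectgarbage("count") - memBefore
  collectgarbage("collect")
  return elapsed, memGrowth
end

-- Reuse one table and overwrite its fields; the work is written inline in
-- the loop so no external call is measured (rule 3).
local function reuseLoop(n)
  local v = { x = 0, y = 0, z = 0 }
  for i = 1, n do
    v.x, v.y, v.z = i, i, i
  end
  return v
end

-- Allocate a fresh table on every iteration.
local function allocLoop(n)
  local v
  for i = 1, n do
    v = { x = i, y = i, z = i }
  end
  return v
end

-- Assumption: choose N to match the magnitude seen in the game (rule 2).
local N = 100000
local reuseTime, reuseMem = measure(reuseLoop, N)
local allocTime, allocMem = measure(allocLoop, N)
print(string.format("reuse: %.4f s, heap growth %.1f KB", reuseTime, reuseMem))
print(string.format("new:   %.4f s, heap growth %.1f KB", allocTime, allocMem))
```

The numbers depend on the Lua runtime and device, so run it on the target platform rather than trusting results from the editor.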


Q: Some scenes in our game use real-time specular reflection on the ground. The reflection is rather expensive, and the face count peaks at more than 200,000.

I checked with the FrameDebugger and found that some parts that hardly need it are also being reflected, which raises the DrawCall and face counts considerably. For example:

  1. Character outline (stroke). Suppose a character has 6,000 faces: the outline adds 6,000, the reflected character adds 6,000, and the reflected outline adds another 6,000, so a single character costs 24,000 faces. The reflected outline is not visible at all and can be removed entirely.
  2. Model LOD. The reflection map itself is not very sharp, so a lower model LOD level would be good enough.


For these two problems, I came up with the following solutions:

  1. In the OnWillRenderObject method that renders the reflection map, force Shader.globalMaximumLOD to 100. The LOD 100 Shader has no outline pass and does far less computation. After rendering, restore the original value.
  2. In the same OnWillRenderObject method, force QualitySettings.lodBias to 0 so that the camera renders with the low-level model LOD, then restore the original value after rendering.

My question is this: the real-time reflection map has to be rendered every frame, which means I would modify Shader.globalMaximumLOD and QualitySettings.lodBias every frame. Will this cause any problems?

I made a build and tried it. In the Profiler, setting Shader.globalMaximumLOD and QualitySettings.lodBias appears to cost more time than the rendering time it saves.

So is there any other optimization method, apart from controlling which layers of objects are reflected? (We do plan to do that, but the current project still has some difficulties to resolve first.)

A1: I am not sure whether this helps the original poster, but Material.SetShaderPassEnabled can turn off the outline Pass while the reflection camera is rendering. I have not measured its performance; this is only an idea.


A2: Later I thought of using a ReplacementShader. Replace the shaders with a diffuse-only one. This has two benefits: first, there is no outline and the amount of computation drops; second, only opaque objects are rendered, which saves the DrawCalls of special effects and 3D UI.


Q: The objects in my project are all loaded dynamically, so Unity's built-in occlusion culling cannot be used. I have also tried dynamic occlusion culling plug-ins such as InstantOC, but they rely on raycasting against all objects, so the CPU cost in our project is high and outweighs the gains.

So I plan to use BVH to build a dynamic occlusion culling plug-in in Unity. Are there any ideas you can offer? Thanks.

A: Even with a BVH, you are still computing occlusion relationships, right? Doing that accurately along the line of sight is likely to be expensive.

I do not know whether your scene objects are merely loaded dynamically and stay put once loading is complete, or whether the key objects in the scene actually move or the scene is randomly generated. In the former case, the occlusion relationships can still be computed offline in advance; in the latter case, they may indeed have to be computed at runtime.

Such a module is costly, and it may not beat plain brute-force distance culling plus LOD. Judging from your screenshot, most objects are opaque, so occlusion culling would mainly reduce DrawCalls and save CPU time. Compare that saving against the time the culling itself takes and see which is larger.


Q: Timeline can record variable changes, commonly known as keyframing by hand. The examples I have seen record changes to variables inside a PlayableBehaviour class, such as floats and vectors. I would like to keep an instance of a custom config class inside a custom PlayableBehaviour. But when I record changes to variables in that config instance, the config instance is a null reference during playback. How should this be done correctly? Is the config being initialized at the wrong time? When no keyframes are recorded, config works as a member variable without being initialized with New, and there are no errors or runtime problems.

A: I asked about this on the Unity forum as well. The answer was that in a custom Behaviour, changes inside a nested class instance cannot be recorded; a struct must be used instead.

That's all for today's sharing. Life is finite, but knowledge is boundless. Over a long development cycle, the problems you see here are just the tip of the iceberg. We have prepared more technical topics on the UWA Q&A Blogs, waiting for you to explore and share. Everyone who loves to improve is welcome to join in. Your method may solve someone else's urgent problem, and, as the saying goes, a stone from another mountain can polish your own jade.


UWA website:

UWA Blog:



