
Compute Shader is a fairly popular technology today; it has been used, for example, in "Moonlight Blade Mobile Game" and in the recently popular "Naraka: Bladepoint". Unity's official introduction to Compute Shaders is here: https://docs.unity3d.com/Manual/class-ComputeShader.html. A Compute Shader runs on the GPU like other shaders, but it is independent of the rendering pipeline. We can use it to implement massively parallel GPU algorithms to speed up our games.

 

In Unity, we right-click in the Project window to create a Compute Shader file:

 

The generated file is a kind of Asset, with .compute as its file suffix. Let's take a look at its default content:

#pragma kernel CSMain

RWTexture2D<float4> Result;

[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    Result[id.xy] = float4(id.x & id.y, (id.x & 15)/15.0, (id.y & 15)/15.0, 0.0);
}

 

The main purpose of this article is to help new learners understand what these lines of code mean; only after learning the basics can one read more advanced code.


Language

Unity Compute Shaders use DirectX 11 HLSL, which is automatically compiled for each target platform.

Kernel

Then let’s look at the first line:

#pragma kernel CSMain

 

CSMain is actually a function, as can be seen further down in the code, and "kernel" means exactly that: this line declares the function named CSMain as a kernel, or kernel function. The kernel function is what ultimately executes on the GPU.

 

A Compute Shader must contain at least one kernel that can be invoked. The declaration syntax is:
#pragma kernel functionName

 

We can also use this directive to declare multiple kernels in one Compute Shader, and we can define preprocessor macros after it, as follows:
#pragma kernel KernelOne SOME_DEFINE DEFINE_WITH_VALUE=1337
#pragma kernel KernelTwo OTHER_DEFINE
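Each kernel is compiled separately, with only the macros declared on its own #pragma line defined. As a hedged sketch (the kernel bodies below are ours, not Unity's defaults):

#pragma kernel KernelOne SOME_DEFINE
#pragma kernel KernelTwo

RWTexture2D<float4> Result;

[numthreads(8,8,1)]
void KernelOne (uint3 id : SV_DispatchThreadID)
{
#ifdef SOME_DEFINE
    Result[id.xy] = float4(1, 0, 0, 0); // compiled only for KernelOne
#endif
}

[numthreads(8,8,1)]
void KernelTwo (uint3 id : SV_DispatchThreadID)
{
    // SOME_DEFINE is not defined when KernelTwo is compiled
    Result[id.xy] = float4(0, 1, 0, 0);
}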

 

We cannot write comments after this directive; they must go on a new line. For example, the following will cause a compilation error:
#pragma kernel functionName // some comments

RWTexture2D

Then let’s look at the second line:

RWTexture2D<float4> Result;

 

It looks like a texture-related variable is being declared; let's go through the meaning of these keywords.

 

In RWTexture2D, RW stands for Read and Write, and Texture2D is a two-dimensional texture, so the whole thing is a two-dimensional texture that the Compute Shader can both read and write.

 

If we only want to read and not write, we can use the Texture2D type.

 

We know that a texture is composed of pixels, and each pixel has a subscript, so we can access pixels by subscript, for example Result[uint2(0,0)].

 

Each pixel has a corresponding value, which is what we read or write. The type of this value is written in the <>; it usually corresponds to an RGBA value, hence float4. Typically we process the texture in the Compute Shader and then sample the processed texture in the fragment shader.

 

So we now roughly understand this line of code: it declares a readable and writable two-dimensional texture named Result, in which the value of each pixel is a float4.

 

In addition to RWTexture, the readable and writable types in Compute Shader include RWBuffer and RWStructuredBuffer, which will be introduced later.

RWTexture2D – Win32 apps

numthreads

Then the next line (very important!):

[numthreads(8,8,1)]

 

num and threads again: this must be related to the number of threads. Indeed, it defines the number of threads (Threads) that execute within one thread group (Thread Group). The format is as follows:
numthreads(tX, tY, tZ)
Note: the t prefix on X, Y, Z is added here for convenience, to distinguish these values from the thread-group counts gX, gY, gZ used later.

 

The product tX*tY*tZ is the total number of threads per group; for example, numthreads(4, 4, 1) and numthreads(16, 1, 1) both represent 16 threads. So why not simply write numthreads(num) instead of splitting it into three dimensions tX, tY, tZ? Read on to the end and the mystery will resolve itself.

 

numthreads must be defined before every kernel function; otherwise compilation will fail.

 

The three values tX, tY, and tZ cannot be chosen arbitrarily; tX=99999, for instance, is not acceptable. They are constrained by the Shader Model: in cs_4_x, tZ must be 1 and tX*tY*tZ at most 768; in cs_5_0, tZ can be up to 64 and tX*tY*tZ at most 1024.

In Direct3D 11, you can create gX*gY*gZ thread groups through the ID3D11DeviceContext::Dispatch(gX, gY, gZ) method, and each thread group contains multiple threads (the number being defined by numthreads).

 

Pay attention to the order: numthreads first defines the number of threads per thread group (tX*tY*tZ) for each kernel function, and then Dispatch defines how many thread groups (gX*gY*gZ) run that kernel. Threads within one thread group execute in parallel; threads in different thread groups may or may not execute at the same time. In general, a GPU executes on the order of 1,000 to 10,000 threads simultaneously.
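As a quick sanity check on the arithmetic, here is a hedged C# sketch (the computeShader and kernelIndex variables are placeholders, set up the same way as in the C# part later):

// Assuming the kernel declares [numthreads(8,8,1)]:
// Dispatch(kernelIndex, 4, 4, 1) launches 4*4*1 = 16 thread groups,
// each containing 8*8*1 = 64 threads, i.e. 16 * 64 = 1024 threads in total.
computeShader.Dispatch(kernelIndex, 4, 4, 1);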

 

Then we use a schematic diagram to see the structure of threads and thread groups, as shown below:

 

The upper part represents the thread-group structure, and the lower part represents the thread structure within a single group. Because both are defined by (X, Y, Z), they are like three-dimensional arrays with subscripts starting from 0. We can think of them as tables: there are Z identical tables, each with X columns and Y rows. For example, thread group (2,1,0) is the group in the third column, second row of the first table, and the same applies to the threads in the lower half.

 

By clarifying the structure, we can understand the meaning of the following thread-related semantics:

  • SV_GroupID: the (X, Y, Z) subscript of the thread group within the whole dispatch.
  • SV_GroupThreadID: the (X, Y, Z) subscript of the thread within its thread group.
  • SV_DispatchThreadID: the global subscript of the thread across the whole dispatch (formula below).
  • SV_GroupIndex: the flattened subscript of the thread within its group, i.e. SV_GroupThreadID.z*tX*tY + SV_GroupThreadID.y*tX + SV_GroupThreadID.x.

Note that for both groups and threads the order is X first, then Y, then Z; in terms of our tables, go along the row first (X), then down the columns (Y), then on to the next table (Z). For example, with tX=5 and tY=6, the first thread has SV_GroupThreadID=(0,0,0), the second (1,0,0), the sixth (0,1,0), the 30th (4,5,0), and the 31st (0,0,1). Groups behave the same way; once the order is clear, the formula for SV_GroupIndex is easy to understand.

 

For another example, take two groups with SV_GroupID (0,0,0) and (1,0,0): the first thread inside each has SV_GroupThreadID (0,0,0) and SV_GroupIndex 0, but the former's SV_DispatchThreadID is (0,0,0) while the latter's is (tX,0,0).

 

These semantics are very important inside the kernel function.
numthreads – Win32 apps

Kernel Function

void CSMain (uint3 id : SV_DispatchThreadID)
{
    Result[id.xy] = float4(id.x & id.y, (id.x & 15)/15.0, (id.y & 15)/15.0, 0.0);
}

 

Finally we reach the kernel function we declared. The meaning of the parameter SV_DispatchThreadID was introduced above. Besides it, the other semantics mentioned earlier can also be passed into the kernel function, chosen according to actual needs. The complete signature is:

void KernelFunction(uint3 groupId : SV_GroupID,           // subscript of the thread group in the dispatch
    uint3 groupThreadId : SV_GroupThreadID,               // subscript of the thread within its group
    uint3 dispatchThreadId : SV_DispatchThreadID,         // groupId * numthreads + groupThreadId
    uint groupIndex : SV_GroupIndex)                      // flattened in-group subscript
{

}

 

The code executed in the function assigns a color to the pixel at subscript id.xy in our texture, and this is exactly where the Compute Shader shines.

 

For example, suppose we want to assign a value to every pixel of a texture with resolution x*y. Single-threaded, our code would typically be:

for (int i = 0; i < x; i++)
    for (int j = 0; j < y; j++)
        Result[uint2(i, j)] = float4(a, b, c, d);

 

The two loops slowly assign pixels one by one. If we wanted to process many 2048*2048 textures per frame this way, you can imagine how it would stall.

If multi-threading is used, then to avoid different threads operating on the same pixel, we usually split the work into segments. For example, with four threads, each handling a quarter of the rows:

void Thread1()
{
    for (int i = 0; i < x/4; i++)
        for (int j = 0; j < y; j++)
            Result[uint2(i, j)] = float4(a, b, c, d);
}

void Thread2()
{
    for (int i = x/4; i < x/2; i++)
        for (int j = 0; j < y; j++)
            Result[uint2(i, j)] = float4(a, b, c, d);
}

void Thread3()
{
    for (int i = x/2; i < x/4*3; i++)
        for (int j = 0; j < y; j++)
            Result[uint2(i, j)] = float4(a, b, c, d);
}

void Thread4()
{
    for (int i = x/4*3; i < x; i++)
        for (int j = 0; j < y; j++)
            Result[uint2(i, j)] = float4(a, b, c, d);
}

 

Isn't it clumsy to write it like this? With more threads and more segments it becomes a pile of repetitive code. But if each thread knows its start and end subscripts, we can unify the code:

void Thread(int start, int end)
{
    for (int i = start; i < end; i++)
        for (int j = 0; j < y; j++)
            Result[uint2(i, j)] = float4(a, b, c, d);
}

 

And if we can launch enough threads, can't each thread process exactly one pixel?

void Thread(int x, int y)
{
    Result[uint2(x, y)] = float4(a, b, c, d);
}

 

We cannot do this on the CPU, but on the GPU, with a Compute Shader, we can. In fact, the kernel function of the default Compute Shader above does exactly this.

 

Next, let's take a look at the beauty of the Compute Shader through the value of id.xy. The semantic of id is SV_DispatchThreadID; recall its formula:
assuming a thread's SV_GroupID = (a, b, c) and SV_GroupThreadID = (i, j, k), then SV_DispatchThreadID = (a*tX+i, b*tY+j, c*tZ+k)

 

Here we used [numthreads(8,8,1)], i.e. tX=8, tY=8, tZ=1, so i and j range from 0 to 7 and k=0. Then for all threads in thread group (0,0,0), SV_DispatchThreadID.xy, that is, id.xy, ranges from (0,0) to (7,7); in thread group (1,0,0) it ranges from (8,0) to (15,7); in (0,1,0) from (0,8) to (7,15); …; and in thread group (a,b,0) from (a*8, b*8) to (a*8+7, b*8+7).

 

Let’s look at it with a schematic diagram, assuming that each grid in the following image contains 64 pixels:

 

That is to say, each thread group has 64 threads processing 64 pixels simultaneously, and threads in different thread groups never touch the same pixel. To process a texture with a resolution of 1024*1024, we only need to Dispatch(1024/8, 1024/8, 1) thread groups.

 

In this way, thousands of threads can each process their own pixel at the same time, something impossible on the CPU. Isn't it wonderful?

 

We can also see that the values set in numthreads are worth careful thought. For example, if we have a 4*4 matrix to process and set numthreads(4,4,1), then each thread's SV_GroupThreadID.xy corresponds exactly to the subscript of one entry of the matrix.
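As a hedged sketch of that correspondence (the buffer and kernel names here are hypothetical, and RWStructuredBuffer is covered in detail later), a single dispatched group can double every entry of a 4*4 matrix stored row by row:

#pragma kernel DoubleMatrix

RWStructuredBuffer<float> Matrix; // 16 floats: a 4*4 matrix stored row by row

[numthreads(4, 4, 1)]
void DoubleMatrix (uint3 gtid : SV_GroupThreadID)
{
    // gtid.xy is exactly the (column, row) subscript of one matrix entry
    Matrix[gtid.y * 4 + gtid.x] *= 2.0;
}

Dispatched from C# with computeShader.Dispatch(kernelIndex, 1, 1, 1), one thread handles one entry.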

 

How do we invoke kernel functions from Unity, dispatch thread groups, and use the RWTexture? This brings us to the C# part.

C# part

With traditional vertex & fragment shaders we associate the shader with a Material, but a Compute Shader is different: it is driven from C#.

First create a new MonoBehaviour script. Unity provides the ComputeShader type to reference the .compute file we generated earlier:

public ComputeShader computeShader;

Associate the .compute file in the Inspector window

 

In addition, we associate a Material, because the texture processed by the Compute Shader still needs to be sampled and displayed by a fragment shader.
public Material material;

 

For this Material, we use an Unlit Shader, and the texture does not need to be set, as follows:

 

Then we can assign a Unity RenderTexture to the RWTexture2D in the Compute Shader. Note that because multiple threads write pixels in no particular order, we must set the RenderTexture's enableRandomWrite property to true, with the following code:

RenderTexture mRenderTexture = new RenderTexture(256, 256, 16);
mRenderTexture.enableRandomWrite = true;
mRenderTexture.Create();

 

We create a RenderTexture with a resolution of 256*256. First we assign it to our Material so that our Cube displays it; then we assign it to the Result variable of our Compute Shader:

material.mainTexture = mRenderTexture;
computeShader.SetTexture(kernelIndex, "Result", mRenderTexture);

 

The kernelIndex variable is the subscript of the kernel function; we can use FindKernel to look up the kernel we declared:

int kernelIndex = computeShader.FindKernel("CSMain");

 

This way, when we sample in the fragment shader, the sampled texture is the one processed by the Compute Shader:

fixed4 frag (v2f i) : SV_Target
{
    // _MainTex is the processed RenderTexture
    fixed4 col = tex2D(_MainTex, i.uv);
    return col;
}

 

The last step is to launch the thread groups and invoke our kernel function; ComputeShader's Dispatch method does both in one step:

computeShader.Dispatch(kernelIndex, 256 / 8, 256 / 8, 1);

 

Why 256/8 has been explained earlier. Let’s see the effect:

 

The picture above shows the effect of the Compute Shader code Unity generates by default. We can also use it to process a 2048*2048 texture, which is just as fast.
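Putting the C# pieces together, a minimal driver script might look like this (a hedged sketch: the class name is ours, and it assumes the default CSMain kernel and a Material assigned in the Inspector):

using UnityEngine;

public class ComputeShaderDemo : MonoBehaviour
{
    public ComputeShader computeShader;
    public Material material;

    RenderTexture mRenderTexture;

    void Start()
    {
        // the texture the compute shader will write into
        mRenderTexture = new RenderTexture(256, 256, 16);
        mRenderTexture.enableRandomWrite = true;
        mRenderTexture.Create();

        // show the texture on our object and bind it to the kernel
        material.mainTexture = mRenderTexture;
        int kernelIndex = computeShader.FindKernel("CSMain");
        computeShader.SetTexture(kernelIndex, "Result", mRenderTexture);

        // 256/8 groups per axis, because numthreads(8,8,1) covers 8*8 pixels per group
        computeShader.Dispatch(kernelIndex, 256 / 8, 256 / 8, 1);
    }
}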


Next, let’s look at an example of particle effects:

First of all, a particle usually has two attributes, position and color, which we will process in the Compute Shader, so we create a struct in the Compute Shader to store them:

struct ParticleData {
    float3 pos;
    float4 color;
};

 

There will be many particles, so we need something like a List to store them; for this, Compute Shaders provide the RWStructuredBuffer type.

RWStructuredBuffer

It is a readable and writable buffer, and we can specify the data type of its elements as our own struct type, no longer limited to basic types such as int and float.

So we can define our particle data like this:

RWStructuredBuffer<ParticleData> ParticleBuffer;

RWStructuredBuffer – Win32 apps

 

For animation, we can also add a time-dependent value so that we can modify the position and color of the particles over time:

float Time;

 

Next comes modifying the particle data in the kernel function. To modify a particle we must know its subscript in the buffer, and this subscript must not repeat across threads; otherwise several threads might modify the same particle.

According to the earlier introduction, SV_GroupIndex is unique within a thread group but not across groups: if each group has 1000 threads, SV_GroupIndex runs from 0 to 999 in every group. We can therefore offset it by SV_GroupID: group (0,0,0) covers particles 0 to 999, group (1,0,0) covers 1000 to 1999, and so on. For convenience we dispatch the thread groups as (X,1,1). Then we can move the particles however we like based on Time and the index. The complete Compute Shader code:

#pragma kernel UpdateParticle

struct ParticleData {
    float3 pos;
    float4 color;
};

RWStructuredBuffer<ParticleData> ParticleBuffer;

float Time;

[numthreads(10, 10, 10)]
void UpdateParticle(uint3 gid : SV_GroupID, uint index : SV_GroupIndex)
{
    // 10*10*10 = 1000 threads per group, so the global particle subscript is:
    int pindex = gid.x * 1000 + index;

    float x = sin(index);
    float y = sin(index * 1.2f);
    float3 forward = float3(x, y, -sqrt(1 - x * x - y * y));
    ParticleBuffer[pindex].color = float4(forward.x, forward.y, cos(index) * 0.5f + 0.5, 1);
    if (Time > gid.x)
        ParticleBuffer[pindex].pos += forward * 0.005f;
}

 

Next, we need to initialize the particles in C# and pass them to the Compute Shader, that is, assign to the RWStructuredBuffer declared above. Unity provides the ComputeBuffer class to correspond to RWStructuredBuffer or StructuredBuffer.

ComputeBuffer

In Compute Shaders we often need to read and write our own custom struct data in a memory buffer; ComputeBuffer exists exactly for this. We create and fill it in C#, then pass it to a Compute Shader or another shader for use.

 

Usually, we create it in the following way:

ComputeBuffer buffer = new ComputeBuffer(int count, int stride)

 

Here count is the number of elements in the buffer, and stride is the space in bytes occupied by each element. For example, to pass 10 float values, count=10 and stride=4. Note that the stride of the ComputeBuffer must equal the size of each element of the RWStructuredBuffer.
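Rather than counting bytes by hand, we can let C# compute the stride. A hedged sketch, using the ParticleData struct defined just below:

// Marshal.SizeOf computes the stride for us; for a struct of 3 + 4 floats it returns 28.
int stride = System.Runtime.InteropServices.Marshal.SizeOf(typeof(ParticleData));
ComputeBuffer buffer = new ComputeBuffer(1000, stride);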

 

After the declaration is complete, we can fill the buffer with the SetData method, whose parameter is an array of our custom struct:

buffer.SetData(T[]);

 

Finally, we use the SetBuffer method of the ComputeShader class to pass it to the Compute Shader:

public void SetBuffer(int kernelIndex, string name, ComputeBuffer buffer)

 

Remember to Release() it after use.
https://docs.unity3d.com/ScriptReference/ComputeBuffer.html

 

In C#, we define an identical struct to guarantee the same size as on the Compute Shader side:

public struct ParticleData
{
    public Vector3 pos;//Equal to float3
    public Color color;//Equal to float4
}

 

Then we declare our ComputeBuffer in the Start method and find our kernel function:

void Start()
{
    //the struct holds 7 floats, so size = 7 * 4 = 28 bytes
    mParticleDataBuffer = new ComputeBuffer(mParticleCount, 28);
    ParticleData[] particleDatas = new ParticleData[mParticleCount];
    mParticleDataBuffer.SetData(particleDatas);
    kernelId = computeShader.FindKernel("UpdateParticle");
}

 

Since we want our particles to move, i.e. to have their data modified every frame, we set the buffer and dispatch in the Update method:

void Update()
{
    computeShader.SetBuffer(kernelId, "ParticleBuffer", mParticleDataBuffer);
    computeShader.SetFloat("Time", Time.time);
    computeShader.Dispatch(kernelId, mParticleCount / 1000, 1, 1);
}

 

At this point the particle position and color computation is done, but the data cannot yet be displayed in Unity; for that we need a vertex & fragment shader. We create a new UnlitShader and modify its code as follows:

Shader "Unlit/ParticleShader"
{
    SubShader
    {
        Tags { "RenderType"="Opaque" }
        LOD 100

        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag

            #include "UnityCG.cginc"

            struct v2f
            {
                float4 col : COLOR0;
                float4 vertex : SV_POSITION;
            };

            struct particleData
            {
                float3 pos;
                float4 color;
            };

            StructuredBuffer<particleData> _particleDataBuffer;

            v2f vert (uint id : SV_VertexID)
            {
                v2f o;
                o.vertex = UnityObjectToClipPos(float4(_particleDataBuffer[id].pos, 0));
                o.col = _particleDataBuffer[id].color;
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                return i.col;
            }
            ENDCG
        }
    }
}

 

We said earlier that a ComputeBuffer can also be passed to an ordinary shader, so we create a struct with the same layout in the shader and receive the buffer with StructuredBuffer.

 

SV_VertexID: a parameter passed into the vertex shader, representing the subscript of the vertex. We have as many vertices as particles, and the vertex data comes from the buffer we processed in the Compute Shader.

 

Finally, in C# we associate a Material using the shader above, pass in the particle data, and draw it. The complete code:

public class ParticleEffect : MonoBehaviour
{
    public ComputeShader computeShader;
    public Material material;

    ComputeBuffer mParticleDataBuffer;
    const int mParticleCount = 20000;
    int kernelId;

    struct ParticleData
    {
        public Vector3 pos;
        public Color color;
    }

    void Start()
    {
        //the struct holds 7 floats, so size = 7 * 4 = 28 bytes
        mParticleDataBuffer = new ComputeBuffer(mParticleCount, 28);
        ParticleData[] particleDatas = new ParticleData[mParticleCount];
        mParticleDataBuffer.SetData(particleDatas);
        kernelId = computeShader.FindKernel("UpdateParticle");
    }

    void Update()
    {
        computeShader.SetBuffer(kernelId, "ParticleBuffer", mParticleDataBuffer);
        computeShader.SetFloat("Time", Time.time);
        computeShader.Dispatch(kernelId, mParticleCount / 1000, 1, 1);
        material.SetBuffer("_particleDataBuffer", mParticleDataBuffer);
    }

    void OnRenderObject()
    {
        material.SetPass(0);
        Graphics.DrawProceduralNow(MeshTopology.Points, mParticleCount);
    }

    void OnDestroy()
    {
        mParticleDataBuffer.Release();
        mParticleDataBuffer = null;
    }
}

 

material.SetBuffer: Pass ComputeBuffer to our Shader.

OnRenderObject: In this method, we can customize the drawing geometry.

DrawProceduralNow: We can use this method to draw geometry, the first parameter is the topology, and the second parameter is the number of vertices.

https://docs.unity3d.com/ScriptReference/Graphics.DrawProceduralNow.html

 

The final result is as follows:

 

The Demo link is as follows:
https://github.com/luckyWjr/ComputeShaderDemo/tree/master/Assets/Particle

ComputeBufferType

In the example, when we created the ComputeBuffer we did not pass the ComputeBufferType parameter, so ComputeBufferType.Default is used. In fact, a ComputeBuffer can have many different types, corresponding to different buffers in HLSL for different scenarios:

  • Default: a structured buffer (StructuredBuffer/RWStructuredBuffer)
  • Raw: a byte-address buffer (ByteAddressBuffer/RWByteAddressBuffer)
  • Append: an append-consume buffer (AppendStructuredBuffer/ConsumeStructuredBuffer)
  • Counter: a structured buffer with an element counter
  • Constant: a buffer that can be bound as a constant buffer (cbuffer)
  • Structured: a structured buffer, combinable with other flags
  • IndirectArguments: a buffer of arguments for indirect draw or dispatch calls

For example, an Append buffer is often used in GPU culling (for example, the Compute Shader frustum culling described later). The C# declaration is:

var buffer = new ComputeBuffer(count, sizeof(float), ComputeBufferType.Append);

 

Note: for the Default, Append, Counter, and Structured types, the element size of the buffer, i.e. the stride, must be a multiple of 4 and less than 2048.

 

The ComputeBuffer above corresponds to an AppendStructuredBuffer in the Compute Shader, and we can then use the Append method in the Compute Shader to add elements to the buffer, for example:

AppendStructuredBuffer<float> result;

[numthreads(640, 1, 1)]
void ViewPortCulling(uint3 id : SV_DispatchThreadID)
{
    if (/* some custom condition is met */)
        result.Append(value);
}

 

So how many elements are now in our buffer? A counter can give us the answer.

 

In C#, we first initialize the counter value with the ComputeBuffer.SetCounterValue method, for example:

buffer.SetCounterValue(0); // the counter starts at 0

 

Each AppendStructuredBuffer.Append call automatically increments the counter. When the Compute Shader finishes, we can fetch the counter value with the ComputeBuffer.CopyCount method:

public static void CopyCount(ComputeBuffer src, ComputeBuffer dst, int dstOffsetBytes);

 

Append, Consume, and Counter buffers maintain a counter storing the number of elements in the buffer. CopyCount copies the counter value of src into dst at byte offset dstOffsetBytes. On the DX11 platform, dst must be of type Raw or IndirectArguments, while on other platforms it can be any type.

 

So the code to get the number of elements in the buffer is as follows:

uint[] countBufferData = new uint[1] { 0 };
var countBuffer = new ComputeBuffer(1, sizeof(uint), ComputeBufferType.IndirectArguments);
ComputeBuffer.CopyCount(buffer, countBuffer, 0);
countBuffer.GetData(countBufferData);
//The number of elements in the buffer is: countBufferData[0]

From the two basic examples above we can see that the Compute Shader's data is passed in from C#, that is, from the CPU to the GPU, and after the Compute Shader finishes, the results travel back from the GPU to the CPU. This round trip introduces latency, and the transfer bandwidth between the two is also a bottleneck.

 

But if we have heavy computation to do, don't hesitate to use a Compute Shader; it can improve performance dramatically.

UAV (Unordered Access View)

Unordered means out of order, Access means to access, and View refers to "data in the required format", i.e. the format in which the data must be provided.

 

What does that mean? Our Compute Shader is multi-threaded and parallel, so the data must support out-of-order access. If a texture could only be accessed in the order (0,0), (1,0), (2,0), …, or a buffer only in the order [0], [1], [2], …, then modifying them from many threads would clearly be impossible. Hence the concept of a UAV: a data format that can be accessed out of order.

 

We mentioned earlier that RWTexture and RWStructuredBuffer are UAV data types: they support writing, and they can only be bound in fragment shaders and compute shaders.

 

If our RenderTexture does not set enableRandomWrite, or we pass an ordinary Texture to a RWTexture, the runtime will report an error:
the texture wasn’t created with the UAV usage flag set!

 

Read-only data types such as Texture2D are called SRVs (Shader Resource Views).

Direct3D 12 Glossary – Win32 apps

Warp / Wavefront

Earlier we said that numthreads defines the number of threads in each thread group. So does numthreads(1,1,1) really give each thread group only one thread? No!

 

This question comes down to the hardware. GPUs use the SIMT (single-instruction, multiple-thread) execution model. On NVIDIA graphics cards, one SM (Streaming Multiprocessor) schedules multiple warps, and each warp contains 32 threads, so we can roughly say that one instruction drives at least 32 parallel threads. On AMD graphics cards the number is 64, and the unit is called a wavefront.

 

That is to say, on an NVIDIA card numthreads(1,1,1) still occupies a full 32-thread warp, and the extra 31 threads simply go to waste. Therefore, when using numthreads, it is best to make the number of threads per group a multiple of 64, which suits both vendors' hardware.

https://www.cvg.ethz.ch/teaching/2011spring/gpgpu/GPU-Optimization.pdf

Mobile Support Issues

We can call SystemInfo.supportsComputeShaders at runtime to determine whether the current device supports Compute Shaders. OpenGL ES supports Compute Shaders only from version 3.1, while Android devices using Vulkan and iOS devices using Metal both support them.
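A hedged sketch of such a runtime guard (the class name is ours; the fallback is whatever your project provides):

using UnityEngine;

public class ComputeCapabilityCheck : MonoBehaviour
{
    void Start()
    {
        if (!SystemInfo.supportsComputeShaders)
        {
            Debug.LogWarning("Compute Shaders are not supported on this device.");
            // fall back to a CPU or fragment-shader path here
            return;
        }
        // safe to dispatch Compute Shader work from here on
    }
}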

 

However, even among Android phones that support Compute Shaders, support for RWStructuredBuffer can be unfriendly; for example, some OpenGL ES 3.1 phones only support accessing a StructuredBuffer in fragment shaders.

 

To use a StructuredBuffer in an ordinary shader, the minimum Shader Model requirement is 4.5, namely:

#pragma target 4.5

Shader.PropertyToID

Variables defined in a Compute Shader can still be referred to by the unique id returned by Shader.PropertyToID("name"). When we frequently call ComputeShader.SetBuffer on the same variables, caching these ids ahead of time avoids the GC pressure of the string-based overloads.

int grassMatrixBufferId;
void Start() {
    grassMatrixBufferId = Shader.PropertyToID("grassMatrixBuffer");
}
void Update() {
    compute.SetBuffer(kernel, grassMatrixBufferId, grassMatrixBuffer);

    // don't look the name up by string every frame:
    //compute.SetBuffer(kernel, "grassMatrixBuffer", grassMatrixBuffer);
}

Global Variables or Constants?

Suppose we want to check in the Compute Shader whether a vertex lies inside a fixed-size bounding box. Following our C# habits, we might define the box size in one of these ways:

#pragma kernel CSMain

float3 boxSize1 = float3(1.0f, 1.0f, 1.0f); // Method 1
const float3 boxSize2 = float3(2.0f, 2.0f, 2.0f); // Method 2
static float3 boxSize3 = float3(3.0f, 3.0f, 3.0f); // Method 3

[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    // make judgments
}

 

In testing, with methods 1 and 2 the value read inside CSMain is float3(0.0f, 0.0f, 0.0f); only method 3 retains the value assigned at definition.
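If the value needs to be configurable, a safe alternative is to declare it as a plain uniform without an initializer and set it from C# (a hedged sketch; boxSize is a hypothetical name):

// HLSL side declares:  float3 boxSize;   (no initializer)
computeShader.SetVector("boxSize", new Vector3(1.0f, 1.0f, 1.0f));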

Shader Variants and Keywords

Compute Shaders also support shader variants, used in much the same way as ordinary shader variants. For example:

#pragma kernel CSMain
#pragma multi_compile __ COLOR_WHITE COLOR_BLACK

RWTexture2D<float4> Result;

[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
#if defined(COLOR_WHITE)
    Result[id.xy] = float4(1.0, 1.0, 1.0, 1.0);
#elif defined(COLOR_BLACK)
    Result[id.xy] = float4(0.0, 0.0, 0.0, 1.0);
#else
    Result[id.xy] = float4(id.x & id.y, (id.x & 15) / 15.0, (id.y & 15) / 15.0, 0.0);
#endif
}

 

Then we can enable or disable a variant on the C# side:

  • Global keywords declared with #pragma multi_compile can be toggled with Shader.EnableKeyword/Shader.DisableKeyword or ComputeShader.EnableKeyword/ComputeShader.DisableKeyword
  • Local keywords declared with #pragma multi_compile_local can be toggled with ComputeShader.EnableKeyword/ComputeShader.DisableKeyword

 

An example is as follows:

public class DrawParticle : MonoBehaviour
{
    public ComputeShader computeShader;

    void Start() {
        ......
        computeShader.EnableKeyword("COLOR_WHITE");
    }
}

That's all for today's sharing. Of course, life is finite while knowledge is infinite; over a long development cycle, the problems covered here may be just the tip of the iceberg. We have prepared more technical topics on the UWA Q&A website, waiting for you to explore and share together, and everyone who loves progress is welcome to join. Perhaps your method can solve someone else's urgent problem, and the stones from other hills may serve to polish your own jade.

YOU MAY ALSO LIKE!!!

UWA Website: https://en.uwa4d.com

UWA Blogs: https://blog.en.uwa4d.com

UWA Product: https://en.uwa4d.com/feature/got 
