
Basic Knowledge


By exploiting the properties of the Gaussian function, the previous chapters reduced the number of texture reads and arithmetic operations. The GPU itself offers a further speed-up. The earlier algorithms assumed that one texture read returns the information of only one pixel, but this is not always the case on a GPU. When bilinear sampling is enabled for texture reads, the GPU can fetch information from several pixels in a single read, and the bilinear interpolation adds essentially no extra cost on the GPU.


A brief introduction to bilinear interpolation sampling:

(Schematic diagram of bilinear interpolation)


As shown in the figure, first perform two linear interpolations in the x direction to obtain two temporary points R1(x, y1) and R2(x, y2), then perform one linear interpolation in the y direction to obtain P(x, y).
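The two-step interpolation can be sketched on the CPU as follows. This is a minimal Python illustration rather than shader code. The function names `lerp` and `bilinear` and the corner names `q11` to `q22` are assumptions made for this example and are not taken from the figure.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b, with t in [0, 1]."""
    return a + (b - a) * t


def bilinear(q11, q21, q12, q22, x1, x2, y1, y2, x, y):
    """Interpolate at (x, y) from four corner values.

    q11 is the value at (x1, y1), q21 at (x2, y1),
    q12 at (x1, y2) and q22 at (x2, y2).
    """
    tx = (x - x1) / (x2 - x1)
    ty = (y - y1) / (y2 - y1)
    r1 = lerp(q11, q21, tx)  # R1(x, y1): interpolate along x at y1
    r2 = lerp(q12, q22, tx)  # R2(x, y2): interpolate along x at y2
    return lerp(r1, r2, ty)  # P(x, y): interpolate along y


# Hypothetical corner values on a unit square, sampled at (0.25, 0.5).
p = bilinear(0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 0.0, 1.0, 0.25, 0.5)
print(p)
```

A GPU with bilinear filtering enabled performs the equivalent of `bilinear` in hardware for every texture read.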


This means that a texture does not have to be read at the center of a texel. By choosing a suitable position between two texels, a single read returns the information of both texels through the GPU's bilinear interpolation.


To get the same result as discrete sampling, the read coordinates must be adjusted so that the weight of the merged sample equals the sum of the weights of the two texels, and the read position lies between the two texels in proportion to those weights. That is, the simplified Gaussian kernel weights and positions should meet the following requirements:
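Written out, the two requirements are usually stated as below. The notation is an assumption made for this sketch: w_D(t) and o_D(t) are the discrete weight and offset of texel t, and w_L and o_L are the weight and offset of the single linearly sampled read that replaces the neighbouring texels t_1 and t_2.

```latex
\begin{aligned}
w_L(t_1, t_2) &= w_D(t_1) + w_D(t_2) \\
o_L(t_1, t_2) &= \frac{o_D(t_1)\, w_D(t_1) + o_D(t_2)\, w_D(t_2)}{w_L(t_1, t_2)}
\end{aligned}
```

The second line places the read position at the weighted average of the two texel offsets, so bilinear filtering returns the two texels in the ratio w_D(t_1) : w_D(t_2).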


For example, a 5×5 Gaussian kernel can be split into two one-dimensional Gaussian kernels, where the kernel in the horizontal direction is as follows:





It can be replaced with:





If the rounding error introduced by the hardware implementation is ignored, linear sampling gives the same result as discrete sampling. In this way, a (2ω+1)×(2ω+1) Gaussian kernel is reduced to a (ω+1)×(ω+1) Gaussian kernel. Blurring one pixel takes (2ω+2) multiplications and 2ω additions, so the time complexity is O(ω).
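The equivalence can be checked numerically in one dimension. The sketch below is a CPU emulation in Python, not shader code. It assumes a normalized Gaussian kernel with an even radius ω = 2 (five taps), emulates a linearly filtered read with `sample_linear`, and merges the texel pairs on each side of the center into single taps. All function names and the sample data in `row` are hypothetical.

```python
import math


def gauss_kernel(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def sample_linear(row, pos):
    """Emulate a 1D linearly filtered read at fractional position pos."""
    i0 = math.floor(pos)
    t = pos - i0
    i1 = min(i0 + 1, len(row) - 1)
    return row[i0] * (1.0 - t) + row[i1] * t


def merge_taps(weights, radius):
    """Merge texel pairs (1,2), (3,4), ... on each side into single taps.

    Assumes an even radius so every off-center texel has a partner.
    Returns (offset, weight) pairs, center tap first.
    """
    taps = [(0.0, weights[radius])]
    for i in range(1, radius, 2):
        w1, w2 = weights[radius + i], weights[radius + i + 1]
        w = w1 + w2                       # merged weight
        o = (i * w1 + (i + 1) * w2) / w   # weighted read offset
        taps.append((o, w))
        taps.append((-o, w))              # the kernel is symmetric
    return taps


row = [0.1, 0.7, 0.3, 0.9, 0.5, 0.2, 0.8, 0.4, 0.6]  # made-up pixel row
radius, sigma, x = 2, 1.0, 4

weights = gauss_kernel(radius, sigma)
discrete = sum(weights[radius + i] * row[x + i]
               for i in range(-radius, radius + 1))
taps = merge_taps(weights, radius)
linear = sum(w * sample_linear(row, x + o) for o, w in taps)
print(len(weights), len(taps), abs(discrete - linear))
```

The five discrete taps collapse to three linear taps, and the two results differ only by floating-point rounding.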


To blur an entire 1024×1024 image with a 33×33 convolution kernel, you need 1024×1024×17×2 ≈ 3.56×10^7 texture reads, or about 35.65 million. This is far fewer than the previous discrete sampling required.
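The count can be reproduced with a few lines of Python. The helper `texture_reads` is a hypothetical name. The comparison figure assumes that the earlier discrete method is the two-pass one-dimensional version, which reads all 33 taps in each pass.

```python
def texture_reads(width, height, taps_per_pass, passes=2):
    """Total texture reads for a separable blur over the whole image."""
    return width * height * taps_per_pass * passes


kernel = 33
omega = kernel // 2  # kernel radius, 16 for a 33-tap kernel

discrete_reads = texture_reads(1024, 1024, kernel)    # 33 reads per pass
linear_reads = texture_reads(1024, 1024, omega + 1)   # 17 reads per pass
print(discrete_reads, linear_reads)
```

Under that assumption, linear sampling needs roughly half as many reads as two-pass discrete sampling.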

Unity Implementation

We now implement the above algorithm in Unity:


To keep things simple, each sampling point is placed at the midpoint between the two texels, so the result differs slightly from the previous one. The basic flow of the blur algorithm is the same as in the previous method. Note that the Gaussian kernel no longer needs to be read one texel at a time, so the loop step can be changed to 2:


float4 GaussianBlurLinearSampling(pixel_info pinfo, float sigma, float2 dir)
{
       float4 o = 0;
       float sum = 0;
       float2 uvOffset;
       float weight;
       // Step by 2: each bilinear read covers texels kernelStep and kernelStep + 1.
       for (int kernelStep = -KERNEL_SIZE / 2; kernelStep <= KERNEL_SIZE / 2; kernelStep += 2)
       {
              // Sample at the midpoint between the two texels along dir.
              uvOffset = pinfo.uv;
              uvOffset.x += ((kernelStep + 0.5f) * pinfo.texelSize.x) * dir.x;
              uvOffset.y += ((kernelStep + 0.5f) * pinfo.texelSize.y) * dir.y;
              // The merged weight is the sum of the two texel weights.
              weight = gauss(kernelStep, sigma) + gauss(kernelStep + 1, sigma);
              o += tex2D(pinfo.tex, uvOffset) * weight;
              sum += weight;
       }
       // Normalize so that the weights sum to 1.
       o *= (1.0f / sum);
       return o;
}
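To see how far the midpoint used in the function above is from the weight-preserving read position, the following Python sketch compares the two offsets for each texel pair. It is an illustration under assumptions: `gauss` here is a plain unnormalized Gaussian, which may differ from the shader's own `gauss` helper, and `KERNEL_SIZE` is taken to be 5.

```python
import math


def gauss(x, sigma):
    """Unnormalized Gaussian weight at integer offset x (assumed form)."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))


def pair_offsets(k, sigma):
    """Return (midpoint, weighted) read offsets for texels k and k + 1."""
    w1, w2 = gauss(k, sigma), gauss(k + 1, sigma)
    midpoint = k + 0.5
    weighted = (k * w1 + (k + 1) * w2) / (w1 + w2)
    return midpoint, weighted


for k in (-2, 0, 2):  # the steps the loop visits when KERNEL_SIZE is 5
    mid, exact = pair_offsets(k, sigma=1.0)
    print(k, mid, round(exact, 4))
```

The weighted offset always leans toward the texel with the larger weight, so replacing `kernelStep + 0.5f` with such a precomputed offset would remove the approximation at the cost of a little extra setup.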



The Gaussian blur algorithm produces a good-quality blur, but its performance is not satisfactory. The convolution algorithm needs too many texture samples, and the linear sampling algorithm needs two passes, which doubles the number of DrawCalls. Both limit how well Gaussian blur performs in real applications. This motivates developers to optimize the blur algorithm, improving performance to meet practical needs while staying as close as possible to the quality of Gaussian blur. In the following chapters, we will look at some improved image blurring algorithms.


Gaussian Kernel Calculator:

Screen Post Processing Effects Series

Screen Post Processing Effects Chapter 2: Two-Step One-Dimensional Operation Algorithm of Gaussian Blur and Its Implementation

Screen Post Processing Effects Chapter 1 – Basic Algorithm of Gaussian Blur and Its Implementation


That's all for today. Of course, life is finite while knowledge is boundless. Over a long development cycle, the problems covered here may be just the tip of the iceberg. We have prepared more technical topics on the UWA Q&A website for you to explore and share. You are welcome to join us. Your method may solve someone else's urgent problem, and stones from other hills may serve to polish your own jade.


