
Memory is a very important aspect of game performance optimization. This is especially true on mobile devices, where hardware is limited and low-end models still need to be covered and supported. Games are developed on PC or Mac but ultimately run (mostly) on mobile (only Android and iOS are considered here); if memory is not well controlled, the app can easily be killed by the mobile OS due to OOM.

 

But in fact, memory management strategies differ considerably between operating systems. Each platform has its own more specialized memory analysis tools, but these tools produce statistical deviations due to different platforms, different counting strategies, and even different system versions. For example, the numbers on XCode’s Memory Report page differ from XCode’s own statistics under Instruments.

 

 

There are also some wrapping overheads. Take a Texture: most of its data goes into native memory, but it still needs a small wrapper class for the logic layer to call, and that part goes into the Mono heap. Another example: part of the constant data of the system frameworks goes into Clean memory, while another part goes into Dirty memory, which may be further compressed into swap memory by the system.


1. The Memory from the Perspective of Unity

Memory analysis from the perspective of Unity focuses on the memory managed by Unity itself (memory that Unity did not allocate cannot be managed by it). But a Unity game actually runs on a platform (such as Android or iOS), so in addition to the memory allocated by Unity itself, there is also a portion coming from the system’s shared libraries. Furthermore, a complex Unity game often references many third-party plug-ins, and the native memory allocated by these plug-ins is beyond Unity’s view.

 

1.1 The Origin of Memory

Actually, Unity can be regarded as a game engine developed in C++ that embeds a .NET scripting virtual machine. Unity allocates memory for native (C++) objects and for the virtual machine from virtual memory. Similarly, memory allocated by third-party plugins also comes from the virtual memory pool.

 

Native Memory is the part of virtual memory that is used to allocate memory pages for all required objects, including the Mono Heap.

 

As mentioned above, the Mono heap is a portion of native memory that is specially allocated for the needs of the virtual machine. It contains all types of managed memory allocated from C#, and this memory is managed by the Garbage Collector, or GC for short.

 

There are several specialized allocators inside Unity that manage the short-term and long-term allocation needs of virtual memory. All Unity assets are stored in native memory, but these assets are lightly wrapped into classes for logical access and invocation. That is to say, if we create a Texture in C#, most of its raw data (RawData) lives in native memory, while a small class object enters the virtual machine, that is, the Mono heap, as sketched below.
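A minimal sketch of this split, assuming the Unity Profiler scripting API (UnityEngine.Profiling); the exact byte counts are illustrative, not guaranteed:

```csharp
// Minimal sketch: create a texture in C# and compare the size of its native raw data with the
// small managed wrapper object that lands in the Mono heap. Assumes the Unity Profiler API.
using UnityEngine;
using UnityEngine.Profiling;

public class TextureMemoryExample : MonoBehaviour
{
    void Start()
    {
        long monoBefore = Profiler.GetMonoUsedSizeLong();
        var tex = new Texture2D(1024, 1024, TextureFormat.RGBA32, false);
        long monoAfter = Profiler.GetMonoUsedSizeLong();

        // The raw pixel data lives in native memory (~4 MB for a 1024x1024 RGBA32 texture)...
        Debug.Log($"Native size of texture: {Profiler.GetRuntimeMemorySizeLong(tex)} bytes");
        // ...while the growth of the managed (Mono) heap is only the tiny wrapper object.
        Debug.Log($"Managed heap growth: {monoAfter - monoBefore} bytes");
    }
}
```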

 

1.2 Reserved/Used

Memory paging (mostly 16K per page on iOS) is the smallest unit of memory management. When Unity needs to request memory, it requests it in blocks (several pages at a time). If a page is still empty after several GCs (8 times on iOS), it is freed and the actual physical memory is returned to the system.

 

But since Unity manages memory within virtual memory, the virtual address range is not returned, so it becomes “Reserved”. That is to say, these Reserved address ranges can no longer be handed back and re-allocated by Unity.
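These Reserved and Used totals can be observed at runtime. A minimal sketch, assuming the UnityEngine.Profiling scripting API available in recent Unity versions:

```csharp
// Minimal sketch: log how much virtual memory Unity is actively using versus how much it keeps
// Reserved. Assumes the UnityEngine.Profiling.Profiler scripting API.
using UnityEngine;
using UnityEngine.Profiling;

public class ReservedUsedLogger : MonoBehaviour
{
    void Update()
    {
        long reserved = Profiler.GetTotalReservedMemoryLong();       // used + reserved-but-empty
        long used     = Profiler.GetTotalAllocatedMemoryLong();      // pages Unity is actually using
        long unused   = Profiler.GetTotalUnusedReservedMemoryLong(); // held, but currently empty

        Debug.Log($"Reserved: {reserved / (1024 * 1024)} MB, " +
                  $"Used: {used / (1024 * 1024)} MB, " +
                  $"Unused (still reserved): {unused / (1024 * 1024)} MB");
    }
}
```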

 

Therefore, on a 32-bit operating system, if virtual memory addresses are frequently allocated and released, the address space will eventually be exhausted and the system will kill the app.

 

1.3 GC and Memory Fragmentation

The memory requested by the Mono heap must be contiguous, and it is very time-consuming for the Mono heap to ask the operating system for additional memory. So in most cases, the Mono heap tries to keep the physical memory it has already requested, in case it is needed later. Thus, in addition to the virtual address space, the memory requested by the Mono heap also has the concept of Reserved.

 

Since the allocation unit of memory is the page, even if a page stores only a single int value, the whole page is still counted as Used and its physical memory will not be released. Of course, if an allocation is larger than one page, multiple consecutive pages are requested.

 

 

If at some point part of the heap memory is garbage-collected, that portion of the physical memory is vacated.

 

 

The next time heap memory is needed, the Mono heap first checks whether there is enough contiguous space in the current heap to accommodate the request. If there is not, a GC is performed, which is the GC operation we hate the most. Afterward, if a suitable block still cannot be found, the Mono heap performs an expansion and asks the operating system for more memory.
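The effect is easy to observe from script. A small sketch, assuming the UnityEngine.Profiling API; the explicit GC.Collect call here only stands in for the collection that allocation pressure would normally trigger:

```csharp
// Small sketch: after a (forced) GC the Mono heap's used size drops, but the reserved heap size
// normally stays where it was, which is exactly the Reserved behavior described above.
using UnityEngine;
using UnityEngine.Profiling;

public static class MonoHeapProbe
{
    public static void LogAroundGC()
    {
        Debug.Log($"Before GC - heap: {Profiler.GetMonoHeapSizeLong()} B, " +
                  $"used: {Profiler.GetMonoUsedSizeLong()} B");

        System.GC.Collect(); // stand-in for the collection normally triggered by allocation pressure

        Debug.Log($"After GC  - heap: {Profiler.GetMonoHeapSizeLong()} B, " +
                  $"used: {Profiler.GetMonoUsedSizeLong()} B");
    }
}
```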

 

 

The memory that has been vacated but cannot be reused becomes memory fragmentation. It can neither be utilized nor released.

 

 

For example, in the above picture, the relationship between Mono Reserved and Used is:
Reserved size: 256 KB + 256 KB + 128 KB = 640 KB
Used size: 88,562 B

 

1.4 Profiler Simple View

When using Unity’s Profiler for memory analysis, in Simple mode, you can see a screenshot similar to the following:

 

Shown here is the virtual memory managed by Unity itself.

 

This is straightforward: the first row shows Used memory, and the second row shows Reserved.

 

  • Total: The overall total of the items below.
  • Unity: All memory requested and managed by Unity, minus <Profiler>, <FMOD>, and <Video>. That is, the Mono heap is included.
  • Mono: Managed heap memory.
  • GfxDriver: GPU memory cost, mainly composed of textures, vertex buffers, and index buffers. Render targets are not included. (Driver layers of other platforms are not included either.)
  • FMOD: Memory requested by FMOD.
  • Video: Memory required for video file playback.
  • Profiler: The Profiler’s own cost.
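These same counters can also be read from script for on-device logging. A minimal sketch, assuming Unity 2020.2+ where Unity.Profiling.ProfilerRecorder and these built-in counter names are available:

```csharp
// Minimal sketch: read a few of the Simple-view memory counters at runtime.
// Assumes Unity 2020.2+ and that these counter names exist in your Unity version.
using Unity.Profiling;
using UnityEngine;

public class SimpleViewCounters : MonoBehaviour
{
    ProfilerRecorder totalReserved;
    ProfilerRecorder gcReserved;
    ProfilerRecorder systemUsed;

    void OnEnable()
    {
        totalReserved = ProfilerRecorder.StartNew(ProfilerCategory.Memory, "Total Reserved Memory");
        gcReserved    = ProfilerRecorder.StartNew(ProfilerCategory.Memory, "GC Reserved Memory");
        systemUsed    = ProfilerRecorder.StartNew(ProfilerCategory.Memory, "System Used Memory");
    }

    void OnDisable()
    {
        totalReserved.Dispose();
        gcReserved.Dispose();
        systemUsed.Dispose();
    }

    void Update()
    {
        Debug.Log($"Total Reserved: {totalReserved.LastValue} B, " +
                  $"GC Reserved: {gcReserved.LastValue} B, " +
                  $"System Used: {systemUsed.LastValue} B");
    }
}
```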

 

The Total Reserved here is also not the exact value of the game’s virtual memory, because:

  • It does not include the size of the game’s binary executable, loaded libraries, and frameworks.
  • The GfxDriver value does not include the render target and various buffers allocated by the driver layer.
  • The Profiler only sees allocations made by Unity code, not allocations by third-party native plugins or the operating system.

 

1.5 Profiler Detailed

The detailed sample view is as follows:

 

It shows the detailed allocation of virtual memory.

  • Assets — Total assets currently loaded from scenes, Resources, and Asset Bundles.
  • Built-in Resources — Unity Editor resources or Unity default resources.
  • Not Saved — Objects marked as DontSave (HideFlags.DontSave).
  • Scene Memory — GameObjects and their attached Components.
  • Other — Others that are not in the above categories.

 

Most of the time, memory hotspots can be found under Assets. For example, you can find leaked textures and resources by checking their reference counts (leaked resources generally have no references).
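A rough sketch of that workflow, assuming the standard Unity APIs Resources.FindObjectsOfTypeAll and Profiler.GetRuntimeMemorySizeLong: dump every loaded texture with its native size, then investigate suspiciously large or duplicated entries in the detailed view.

```csharp
// Rough sketch: list all currently loaded textures with their native sizes so that
// suspicious entries (potential leaks) can be cross-checked in the Profiler's detailed view.
using UnityEngine;
using UnityEngine.Profiling;

public static class TextureAudit
{
    public static void DumpLoadedTextures()
    {
        foreach (Texture tex in Resources.FindObjectsOfTypeAll<Texture>())
        {
            long bytes = Profiler.GetRuntimeMemorySizeLong(tex);
            Debug.Log($"{tex.name} ({tex.GetType().Name}): {bytes / 1024} KB");
        }
    }
}
```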

 

 

Here I want to draw attention to the Objects item under the Other category.

 

 

Actually, this value is caused by a bug of sorts. The item represents various objects inherited from Object, including textures, meshes, and so on, which have become disconnected from their actual objects at some point; it can be ignored.

  • System.ExecutableAndDlls: This is an estimate made by Unity.
    It tries to estimate the memory consumed by the loaded binaries by summing up their file sizes.
  • ShaderLab: These are allocations related to compiling shaders.
    Shaders themselves have their own object root and are listed under Shaders.

 

1.6 Limitations of Unity’s Perspective

Unity’s memory analysis options go far beyond the built-in Profiler. We also use:

  • MemoryProfiler

 

  • MemoryProfiler Extension

 

But they all share the same problem: they rely on the Profiler API provided by Unity itself. In other words, although the tools differ in how they present data and how they are operated, they measure the same results.
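For instance, both viewers consume snapshots produced by Unity’s own profiling API. A minimal sketch, assuming the experimental MemoryProfiler namespace of older Unity versions (in newer versions the equivalent API lives under Unity.Profiling.Memory):

```csharp
// Minimal sketch: the snapshot both tools visualize is captured by Unity's own API.
// Assumes the experimental namespace; newer Unity versions moved it to Unity.Profiling.Memory.
using UnityEngine;
using UnityEngine.Profiling.Memory.Experimental;

public static class SnapshotHelper
{
    public static void Capture()
    {
        string path = Application.persistentDataPath + "/snapshot.snap";
        MemoryProfiler.TakeSnapshot(path, (resultPath, success) =>
            Debug.Log(success ? $"Snapshot written to {resultPath}" : "Snapshot failed"));
    }
}
```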

 

That is to say, tools from the Unity perspective only see allocations performed by Unity code; they cannot see allocations made by third-party native plug-ins or by the operating system.

 

However, a complete Unity project ultimately runs on a platform, so these results deviate significantly from the statistics of the platform’s own memory analysis tools, and it is difficult to grasp the real memory distribution and cost of a Unity project from them.

 

Next, we introduce the memory situation of Unity games on the iOS platform.


2. Memory from the Perspective of XCode

When Unity’s own tools cannot provide a panoramic view of the memory distribution, we turn to the XCode toolset, which has the best debugging capabilities. Generally, we export the Unity project as an XCode project, and then use XCode and its Instruments for profiling.

 

2.1 Memory Management from the Perspective of iOS

Let’s start from the beginning. Memory from the iOS perspective is completely different from the Unity perspective, whether in concept, management, or categorization.

 

As an operating system, iOS does not track memory in as much detail as Unity (in fact, it cannot). It is more concerned with memory management at the operating-system level and with recording the low-level memory operations of sandboxed apps, and even that can only be reflected through the call stack. In essence, it records the app’s requests for memory from the operating system.

 

A Unity project runs on the iOS platform as an app, so it is recorded and analyzed by the iOS system only as an ordinary app. In this respect Unity has a natural disadvantage compared with native apps: for iOS-native apps and controls the system knows what the memory is for, but for Unity it does not know the purpose of each memory request. In other words, the operating system pays no attention to how the memory requested by the app is actually used.

 

This is like a parent (iOS) giving a child (Unity) pocket money. The parent records that you asked me for $100 the day before yesterday and said it was to buy test papers (the stack record); yesterday you asked for another $50 for class fees, and today you asked for another $100 to have dinner with classmates. When the child asks for too much and exceeds the parent’s tolerance, the child is cut off, that is, there is no more pocket money this month.

 

The child’s perspective is different. What he or she thinks is: “I asked for $100 the day before yesterday: $20 for the grammar course, $20 for mathematics, $20 for history, $8 for the round trip, with $32 left. But I may need to pay for the physics or chemistry course in a few days, so I will keep the rest instead of returning it to my parents.”

 

2.2 The Types of Memory Used by iOS

Unlike Unity, which only pays attention to virtual memory, the OS needs to pay attention to physical memory (RAM). Especially on mobile platforms, the limited memory must be used to the fullest. Therefore, memory strategies common on PC platforms cannot be used on mobile, such as swap space (iOS can only page out Clean-type memory).

 

Next, list the memory types used by the iOS system:

Physical Memory: The physical chip memory on the iOS device, that is, what we usually call machine memory. On mobile devices, the physical memory actually available to apps is what remains after the operating system’s own usage is deducted. The iOS Memory Crash Threshold article records the amount of physical memory that can be used by apps on various iOS devices.

Virtual Memory (VM): The virtual address space the OS allocates to each app, similar to Unity’s virtual address space. It is managed by the memory management unit (MMU) and mapped to actual physical memory. As mentioned earlier, memory is allocated by page. Early iOS processors used 4K pages (and some generations used 64K); since the A9, 16K pages are used uniformly. Virtual memory generally consists of code segments, dynamic libraries, the GPU driver, the malloc heap, and some other parts.

GPU Driver Memory: The iOS system uses a so-called unified memory architecture, meaning the GPU and the CPU share part of the memory; texture and mesh data, for example, are allocated by the driver. In addition, there are the GPU driver layer itself and GPU-exclusive memory (video memory).

 

 

Malloc Heap: The place where the app actually requests memory. Unity’s memory requests happen here, via the malloc and calloc functions. Apple does not disclose the maximum size of the virtual heap that can be used; in theory it is limited only by the pointer size (32-bit or 64-bit), but practical experience is far lower than the theoretical value, and there is no fixed rule. Practical experience is summarized as follows:

 

Like Unity’s own virtual addresses, it’s best for applications not to request and release memory frequently.

Resident Memory: The physical memory actually occupied by the game or app. Generally speaking, when an application requests memory from the system, virtual memory grows immediately, but if no data is written to the allocated memory, no actual physical memory is committed. So virtual memory >= resident memory.

Clean Memory: Clean memory is part of resident memory, but it is read-only. Common clean memory includes the constant parts of system frameworks, application binaries, memory-mapped files, etc. Since it is read-only, it can be paged out when the application runs low on memory.

 

 

Dirty Memory: The opposite of Clean is Dirty memory. This refers to memory that cannot be paged out by the OS.

 

 

Swapped Compressed Memory: Swapped/compressed memory actually belongs to Dirty memory. When application memory is insufficient, the OS compresses the less frequently used portions of dirty memory and decompresses them again when they are needed. The algorithms and policies are not published, but from experience the OS compresses fairly aggressively to reduce the total amount of dirty memory. Note that the compressed swap here is not the disk swap space of traditional operating systems.

 

 

The figure above shows the compression process. Both compression and decompression are CPU-intensive.

 

2.3 Footprint

The footprint is the memory measurement and optimization indicator recommended by Apple. When the memory footprint reaches the limit, a memory warning is triggered, and going further leads to an OOM kill.

 

The footprint is mainly composed of dirty and compressed memory; put another way, resident memory is made up of the footprint plus clean memory. There is no uniform limit for the footprint; it varies with the device, the operating system, and even the current running environment.
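The remaining footprint headroom can also be queried at runtime. A minimal sketch, assuming iOS 13+ (os_proc_available_memory) and an iOS IL2CPP build where system symbols are resolved through the “__Internal” library; treat this as an experiment to verify on your own devices:

```csharp
// Minimal sketch: query how many bytes the app can still allocate before hitting the
// footprint limit. Assumes iOS 13+ and a Unity iOS build ("__Internal" P/Invoke).
using System.Runtime.InteropServices;

public static class IOSMemoryInfo
{
#if UNITY_IOS && !UNITY_EDITOR
    // size_t os_proc_available_memory(void); available since iOS 13. size_t assumed 64-bit here.
    [DllImport("__Internal")]
    private static extern ulong os_proc_available_memory();
#endif

    // Approximate bytes left before the jetsam (OOM) limit, or -1 when unavailable.
    public static long AvailableBytes()
    {
#if UNITY_IOS && !UNITY_EDITOR
        return (long)os_proc_available_memory();
#else
        return -1;
#endif
    }
}
```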

 

 

ByteDance’s existing testing tool GamePerf captures footprint memory when testing on iOS. For detailed data, please refer to the GamePerf documentation.

 

2.4 Xcode Memory Gauge

This is the simplest interface for XCode debugging. You can see it by switching to the Debug tab.

 

 

Green indicates that memory usage is fine, and yellow is the danger zone; if it is not handled in time, the app will soon be killed by the OS.

 

The maximum of the gauge is the device’s physical memory, but Apple does not formally state what kind of memory the needle itself measures. From test results, it is always 10-15 MB larger than Dirty Memory + Swapped Memory as measured by the VM Tracker tool.

 

But in fact, this value is not the only criterion the OS uses to kill an app. Generally, the OS goes through the following steps before killing an app:

  • It first tries to page out Clean memory pages.
  • If an app takes up too much Dirty memory, the OS sends it a memory warning so it can free up resources.
  • After several warnings, the app is killed if its dirty memory usage is still too high.

Since iOS’s app-killing strategy is also opaque, the only way to prevent the app from being terminated by the system is to reduce dirty memory as much as possible.
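One practical hook for this in Unity is the low-memory callback. A minimal sketch, assuming Application.lowMemory (a standard Unity event that maps to iOS memory warnings); the pooled-object helper named in the comment is hypothetical:

```csharp
// Minimal sketch: react to iOS memory warnings from Unity so dirty memory can be reduced
// before the OS kills the app.
using UnityEngine;

public class LowMemoryResponder : MonoBehaviour
{
    void OnEnable()  { Application.lowMemory += OnLowMemory; }
    void OnDisable() { Application.lowMemory -= OnLowMemory; }

    void OnLowMemory()
    {
        // Drop project-specific caches first (hypothetical helper):
        // ObjectPoolManager.ReleasePooledObjects();

        // Then unload assets that are no longer referenced and shrink the managed heap.
        Resources.UnloadUnusedAssets();
        System.GC.Collect();
    }
}
```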

 

2.5 VM Tracker

VM Tracker is one of the XCode Instruments. It provides a more detailed breakdown of virtual memory and is also the tool that reports Dirty Memory information. Unfortunately, it does not show the purpose or timing of allocations.

 

A typical profiler snapshot of VM Tracker is as follows:

 

The headers are:

Type — memory type
Resident Size — resident memory
Dirty Size — dirty memory
Swapped Size — compressed/swapped memory
Virtual Size — virtual memory
Res. % — ratio of resident memory to virtual memory

 

Next come the specific values of the actual types at various levels. I won’t introduce the types one by one, but will pick a few key ones to explain.

*All* — all allocations
*Dirty* — dirty memory
IOKit — graphics driver memory, such as render targets, textures, meshes, compiled shaders, etc.
VM_ALLOCATE — mainly the Mono heap; if this value is too large, use the Unity Profiler to look at the specific allocations
MALLOC_* — mainly memory allocated by Unity native code or third-party plugins
__TEXT — read-only executable code segments and static data
__DATA — writable executable code/data
__LINKEDIT — metadata for the various linked libraries, such as symbols, strings, relocation tables, etc.

 

By comparing the information of each group such as virtual memory and dirty memory, you can do some memory root cause analysis. For example, in the snapshot above, what conclusions can be drawn from the analysis?

 

2.5.1 Regions Map

Regions Map is another perspective of VM Tracker, which mainly provides the display of memory paging and the structure of virtual address space.

 

 

For example, as can be seen from the above figure, the memory of the Mono Heap block is not contiguous.

 

2.6 Allocations

The Allocations tool displays all allocations made in the application’s address space on all threads (including Unity native code, the garbage collector, the IL2CPP virtual machine, third-party plugins, etc.).

 

Compared with VM Tracker, the advantage of Allocations is that it can examine memory allocations over any period of time, and the call stack shows which piece of code performed each allocation.

 

But it still has limitations: it can only look at allocations, not residency. From the Call Trees view, we can see the stack.

 

 

For example, as you can see from the screenshot above, the code creates an instance of the Pooler class, which clones some Prefabs, resulting in the allocation of memory.

 

Under Summary, you can see a detailed list of virtual machine memory allocations. 

 

Clicking an entry shows the allocation in more detail. For example, the following memory allocation is caused by JSON parsing.

 

 

2.7 Memory Debugger

In addition to the analysis tools, Xcode also provides the Memory Debugger function. It requires turning on Malloc Stack in the scheme’s diagnostics settings.

 

 

Then click the icon under the Debug tab to capture a memory graph for analysis.

 

 

Here you can view the allocation of each byte. But it’s too detailed.

 

2.8 vmmap

By exporting memory snapshots, we can also use command-line tools to perform a more detailed analysis of memory.

 

 

The exported memgraph file can be analyzed with the vmmap command-line tool to show the actual allocation of physical memory.

 

Of course, vmmap has more command-line options for viewing further details. If you are interested, you can check the second half of the “iOS Memory Debugging Guide for Unity Developers” written by Mr. Jiadong.

 

 

If you want to analyze memory leaks, you can also use the leaks App.memgraph command, for example on the following circular reference:

 

2.9 Limitations of the XCode Perspective

The XCode perspective also has its own limitations. Unity mainly focuses on and manages virtual memory, while XCode looks more at physical memory from the OS’s point of view.

 

Unity cannot count system libraries and third-party plug-ins; XCode can count them, but it has difficulty telling them apart. For the OS, all of this memory is requested by the app, and it all goes through the malloc heap, so it is lumped into one category. If we really want to distinguish between libraries, we have to manually collect all the allocation call stacks and then manually classify which belong to Unity, which to a given plugin, which to Lua, and so on.

 

Another issue is that XCode has many analysis tools, but each sums its statistics along different dimensions; even among XCode’s own tools the standards are not fully unified. Finally, the statistics from XCode’s toolchain are too low-level and detailed: we can use them to quickly locate memory anomalies, but it is hard to quickly classify all memory with them.


3. The Memory from the Perspective of Android

The iOS operating system is based on Unix, the Android operating system is based on Linux, and Linux is based on Unix, so the Android operating system is very similar to iOS in the kernel.

 

Therefore, Android’s memory management strategy is very similar to iOS’s. The difference is that iOS is a closed system whose hardware in each generation is known, even enumerable, whereas Android, being open source, runs on a huge variety of hardware and is much harder to control; on the other hand, being open source also opens up more possibilities.

 

3.1 Memory Management from the Perspective of Android

Although the memory strategies are similar, there are slight differences in terms and actual management processes. For example, Android divides memory into three types:

RAM: Also known as memory; its size is usually limited. High-end devices usually have larger RAM capacities.
zRAM: The RAM partition used as swap space. When memory is insufficient, the OS compresses a portion of the data in RAM and stores it in zRAM. Device manufacturers can set an upper limit on the zRAM size.
Storage: Commonly referred to as storage. Apps, photos, and cached files all live here.

 

 

Unlike Footprint on iOS, Android uses a different set of terms for memory:
VSS – Virtual Set Size: virtual memory consumption (including memory occupied by shared libraries)
RSS – Resident Set Size: physical memory actually used (including memory occupied by shared libraries)
PSS – Proportional Set Size: physical memory actually used, with shared-library memory divided proportionally among the processes that use it
USS – Unique Set Size: physical memory occupied by the process alone (excluding memory occupied by shared libraries)

 

Generally speaking, the size of memory usage is as follows: VSS >= RSS >= PSS >= USS.

 

At present, memory indicators for Unity games on Android use PSS by default. What does that mean?

 

For example, we have a memory page as follows:

 

One of the pages belongs to a location-sharing service used by both Google Play services and a certain game app. This makes it hard to decide which app “uses” it more. If we count all the memory of the location-sharing service against both applications, the counting perspective is RSS. This does reflect each process’s physical memory fairly accurately, but the location-sharing service is then counted twice, or three times for three applications, which is obviously not right.

 

So, simply, the processes split the shared service’s memory equally; that counting perspective is PSS. Although it is not completely fair, it is the most balanced solution at present.
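As a concrete worked example (illustrative numbers only): if two apps each use 50 MB of private memory and share a 30 MB library, then for each app RSS = 50 + 30 = 80 MB, PSS = 50 + 30/2 = 65 MB, and USS = 50 MB.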

 

 

3.2 LMK Low Memory Killer

On iOS, an app is killed by the OS when its footprint reaches the critical value, and the same happens on Android. However, compared with iOS, Android’s LMK process is more transparent.

 

LMK uses an “out of memory” score called oom_adj_score to prioritize running processes and decide which processes to kill. The process with the highest score is terminated first. Background applications are terminated first, and system processes last. The table below lists the LMK score categories from high to low; the category with the highest score, i.e. the item in the first row, is terminated first.

 

 

Here’s a description of the various categories in the table above:
Background apps: Apps that have been running before and are not currently active. LMK will kill background apps first, starting with the app with the highest oom_adj_score.
Previous app: The most recently used background app. The previous app has a higher priority (lower score) than a background app because the user is more likely to switch to the previous app than a background app.
Home screen app: Launcher app. Terminating the app will make the wallpaper disappear.
Service: A service initiated by the app, which may include syncing or uploading to the cloud.
Perceivable applications: Non-foreground applications that the user can perceive in some way, such as running a search process that displays a small interface or listening to music.
Foreground app: The app that is currently in use. Terminating a foreground app looks like the app has crashed and may alert the user that something is wrong with the device.
Persistence (Services): These are the core services of the device, such as telephony and WLAN.

System: System processes. After these processes are killed, the phone may restart.
Native: A very low-level process used by the system (eg: kswapd).

 

Device manufacturers can change the behavior of the LMK.

 

3.3 Android Profiler

Because building Unity packages for Android is convenient, and Android Studio’s own memory and performance analysis tools feel insufficient, much of the time we simply attach the Unity Profiler to an Android build for memory debugging.

But in fact, Android now has a lot of tools that can analyze performance. For example:
https://developer.android.com/studio/profile

Since XCode is still used for analysis most of the time, I have not yet put these tools into practice; in the future I will find time to investigate their usage and techniques in depth.

In addition, at the Google conference, Android developers recommended the use of the latest performance analysis tool Perfetto.
https://perfetto.dev/docs/quickstart/android-tracing

For now, Android’s tools, like XCode’s, have no way of distinguishing whether a memory allocation in an application was made by Unity or by a plugin, so the advice they give is likewise to isolate and measure separately.


4. More Options and Extension

 

4.1 Break-down One by One

In view of the above research results, two solutions have been considered.

The first is from the perspective of Unity: since Unity cannot count the consumption of third-party plug-ins, we use a “difference method” to break down each third-party plug-in one by one.

For example, starting from a minimal project, we first measure its current memory indicators. Then we integrate the third-party plug-in and run the same test case to measure the same indicators again. The resulting difference is approximately the memory consumption of the plugin.

When we have measured the indicators of all plug-ins in this way, adding Unity’s own Reserved memory gives an approximation of the current memory distribution, as sketched below.
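A minimal sketch of the measurement step, assuming the UnityEngine.Profiling API; the particular metric set and the CSV log file are assumptions, not a fixed recipe. Run it in the baseline build and in the build with one extra plugin, then diff the two logs offline:

```csharp
// Minimal sketch of the "difference method": capture the same Unity memory indicators in a
// baseline build and in a build with one extra plugin, then compare the two logs.
using System.IO;
using UnityEngine;
using UnityEngine.Profiling;

public static class MemorySnapshotLogger
{
    public static void Log(string label)
    {
        string line = string.Format(
            "{0},reserved={1},used={2},monoHeap={3},monoUsed={4},gfx={5}",
            label,
            Profiler.GetTotalReservedMemoryLong(),
            Profiler.GetTotalAllocatedMemoryLong(),
            Profiler.GetMonoHeapSizeLong(),
            Profiler.GetMonoUsedSizeLong(),
            Profiler.GetAllocatedMemoryForGraphicsDriver());

        // Append to a simple CSV-style log; diff the baseline run against the plugin run afterwards.
        File.AppendAllText(Path.Combine(Application.persistentDataPath, "mem_log.csv"), line + "\n");
        Debug.Log(line);
    }
}
```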

 

This method naturally has its drawbacks:

The difference is affected by the shared libraries already present in the environment on each device, so the measured value for a plugin may vary considerably across machines.

The method is not a white box; we cannot determine what external conditions or internal logic actually affect the result.

Due to the dirty-memory policy of mobile platforms, only the difference in virtual memory can actually be measured. For details, see the test conclusions on the impact of different memory allocation methods on actual memory.

It is difficult to control variables. Therefore, it is necessary to write test cases with sufficient coverage for the plug-in.

 

4.2 Underlying Logic Analysis

Starting from the malloc layer at the bottom, write hook functions to monitor memory requests, then summarize and analyze them. See, for example, the article “mallochook memory allocation callback (glibc-3-memory)”.

This is essentially the same working model as the mobile platforms’ own memory tools; the difference is that we can customize how the tool classifies and displays the data. Although the solution is feasible, it has the same problem as the platform tools: how should the stacks be classified? How do we determine which function belongs to the engine, which to a third-party plug-in, and which to a system shared library or framework?


That’s all for today’s sharing. Of course, life is finite while knowledge is boundless. Over a long development cycle, the problems discussed here may be just the tip of the iceberg. We have prepared more technical topics on the UWA Q&A website, waiting for you to explore and share together. You are welcome to join us, fellow lovers of progress. Perhaps your method can solve someone else’s urgent need, and the “stones” from other mountains can also polish your “jade”.

YOU MAY ALSO LIKE!!!

UWA Website: https://en.uwa4d.com

UWA Blogs: https://blog.en.uwa4d.com

UWA Product: https://en.uwa4d.com/feature/got 
