Understand the CPU scheduler
The CPU scheduler manages how CPU time is distributed among the processes running on Linux systems. Qualcomm Linux uses the earliest eligible virtual deadline first (EEVDF) CPU scheduler, which is provided by the Linux kernel. The EEVDF CPU scheduler uses per-entity load tracking [LWN.net] to monitor the task load. Utilization clamping (UCLAMP or util clamp) is a scheduler mechanism that helps manage performance requirements for tasks. For more information, see Customize CPU scheduler.
Understand the CPU frequency governor
A CPU frequency governor adjusts the CPU frequency based on the task load. The CPU scheduler provides the necessary inputs for this process. Qualcomm Linux uses the schedutil governor, provided by the Linux kernel.
This governor increases the CPU frequency when the system is heavily loaded and reduces it when the load is low, ensuring an optimal balance between power consumption and performance.
For more information, see the following:
- CPU frequency and voltage scaling code in the Linux kernel
- Configure CPU
- Customize the CPU frequency governor
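As a quick check of which governor is active, the following sketch reads the standard cpufreq sysfs entries. It assumes the kernel exposes `/sys/devices/system/cpu/cpufreq/policy*/scaling_governor`; the `show_governors` helper name is illustrative and is not part of Qualcomm Linux.

```shell
# show_governors [SYSFS_ROOT]
# Print the active frequency governor of every cpufreq policy.
# SYSFS_ROOT defaults to /sys and exists only to allow dry runs
# against a copy of the directory tree.
show_governors() (
    root="${1:-/sys}"
    for policy in "$root"/devices/system/cpu/cpufreq/policy*; do
        # Skip the unexpanded glob when no policy directory exists.
        [ -r "$policy/scaling_governor" ] || continue
        printf '%s: %s\n' "${policy##*/}" "$(cat "$policy/scaling_governor")"
    done
)
```

Run `show_governors` on the device. Each output line pairs a policy name with the governor currently selected for it.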
Understand DVFS governors
DVFS governors control the frequencies of CPU caches (L3), the last level cache controller (LLCC), and the DDR based on the system workload. These governors increase the frequency when the workload is high and decrease it when the workload is low, ensuring an optimal balance between power consumption and performance. Qualcomm Linux supports the following two types of DVFS governors for L3 cache:
- LLCC
- DDR
Configure the static map DVFS governor
This governor aligns the frequencies of the CPU L3 cache and the DDR with the current CPU frequency to balance the power and the performance requirements. For example, if the CPU frequency is at its maximum, the L3 cache and DDR frequencies must also be at their maximum levels. The static mapping is available in the source code at arch/arm64/boot/dts/qcom/<target>.dtsi.
For customization options, see Customize static map DVFS governor.
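The idea of a static map can be sketched as a table lookup: each row names a CPU frequency threshold and the L3 and DDR frequencies that go with it. The table values and the `static_map_lookup` helper below are hypothetical placeholders chosen for illustration; they are not taken from any `<target>.dtsi` file, and the real governor may select rows differently.

```shell
# static_map_lookup CPU_KHZ
# Print the L3 and DDR frequencies mapped to a CPU frequency.
# Columns: minimum CPU kHz, L3 kHz, DDR kHz (all values are placeholders).
static_map_lookup() (
    cpu_khz="$1"
    awk -v cpu="$cpu_khz" '
        # Rows are sorted in ascending order, so the last row whose
        # threshold does not exceed the request is the one that applies.
        $1 <= cpu { l3 = $2; ddr = $3 }
        END {
            if (l3 == "") exit 1    # below the lowest threshold
            printf "l3=%s ddr=%s\n", l3, ddr
        }
    ' <<'EOF'
300000   300000  200000
1000000  800000  1017000
2000000  1500000 2092000
EOF
)
```

With this placeholder table, `static_map_lookup 2000000` selects the last row, which mirrors the rule that a CPU at its maximum frequency pulls the L3 cache and DDR to their maximum levels.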
Understand the BWMON governor
The bandwidth monitoring (BWMON) governor dynamically adjusts the frequencies of the LLCC and DDR based on the measured traffic flow from the CPU to the LLCC and then to the DDR. The BWMON hardware block measures this traffic. It monitors the data throughput between memory and the other subsystems within a specified sampling window and uses this information to scale the LLCC and DDR frequencies to meet the required bandwidth. The BWMON governor driver is available in the source code at drivers/soc/qcom/icc-bwmon.c.
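The scaling decision described above can be illustrated with a simplified sketch: convert the bytes counted in one sampling window to a bandwidth, then pick the lowest level that covers it. The `pick_bw_level` helper and the bandwidth levels used with it are hypothetical; the actual driver works with hardware thresholds and interrupts rather than a shell loop.

```shell
# pick_bw_level BYTES WINDOW_MS LEVEL_MBPS...
# BYTES is the traffic measured in one sampling window of WINDOW_MS
# milliseconds. Levels are given in MB/s in ascending order.
pick_bw_level() (
    bytes="$1"; window_ms="$2"; shift 2
    # bytes per window -> bytes per second -> MB/s (integer arithmetic)
    need=$(( bytes * 1000 / window_ms / 1000000 ))
    chosen=""
    for level in "$@"; do
        chosen="$level"
        # Stop at the first level that meets the measured demand.
        [ "$level" -ge "$need" ] && break
    done
    # If no level is high enough, the loop ends on the highest one.
    echo "measured=${need}MB/s level=${chosen}MB/s"
)
```

For example, `pick_bw_level 50000000 50 200 1017 2092` treats 50 MB moved in a 50 ms window as 1000 MB/s and selects the 1017 MB/s level.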
Understand the Userspace Resource Manager
The Userspace Resource Manager (URM) is an open-source, lightweight, and extensible framework designed to intelligently manage and provision system resources from userspace. Modern workloads vary significantly across segments such as servers, compute, XR, mobile, and IoT, with each use case exhibiting distinct characteristics. Some workloads demand high CPU frequencies, others require sustained GPU throughput, while many depend on efficient caching or increased memory bandwidth. At the same time, these workloads run on a wide range of hardware platforms with varying capabilities, power envelopes, and user expectations. Consequently, a uniform tuning approach is insufficient to meet the diverse performance and power requirements of such environments. URM addresses these challenges by providing the following capabilities:
- Enabling application-level tuning
- Enabling use case and workload-level tuning
- Providing signal and tuning APIs
Understand memory management
RAM is used for all memory allocations made by Qualcomm Linux. RAM must be managed to meet performance requirements and ensure smooth application behavior. The following figure shows memory partitioning:
Figure: Memory partitioning
- System RAM is partitioned between non-Linux and Linux components.
- Non-Linux section includes a large block labeled Reserved, indicating memory allocated for non-Linux operations.
- Linux section is divided into four blocks under Memory total (system RAM):
  - Kernel static
  - Kernel dynamic
  - User space process
  - Free memory
- Implementation of virtual memory and demand paging
- Allocation of memory to both kernel internal structures and userspace programs
- Mapping of files into the address space of the processes
- Other memory management operations
Configure RAM memory partitioning
The following table describes various types of memory allocations. The commands specified in the following table should be run on the device.
| RAM classification | Memory segment | Allocation types | Description |
|---|---|---|---|
| Non-Linux | None | None | Memory is reserved in the form of carveouts by various subsystems other than Linux. These carveouts are specified in the respective DTSI files. |
| Linux (system RAM) | Kernel static | Vmlinux + kernel page structures | The kernel reserves this memory at boot time for its own usage. Vmlinux is the memory used to store the vmlinux image. The size and breakdown of the vmlinux image can be obtained from the dmesg logs at boot. The kernel page structure memory is calculated as 16 MB per GB of RAM size. |
| Linux (system RAM) | Kernel dynamic | Slab | The slab is used by the kernel for faster and more efficient memory usage of frequently used data structures.<br />Check slab usage: `cat /proc/meminfo \| grep -i slab`<br />Detailed slab information: enable CONFIG_SLUB_DEBUG and run `cat /proc/slabinfo` |
| Linux (system RAM) | Kernel dynamic | Kernel stack | The kernel stack stores the call stack of every process.<br />Check kernel stack usage: `cat /proc/meminfo \| grep -i kernelstack` |
| Linux (system RAM) | Kernel dynamic | PageTables | The kernel uses memory to store PageTables that map virtual addresses to physical addresses.<br />Check PageTables usage: `cat /proc/meminfo \| grep -i PageTables` |
| Linux (system RAM) | Kernel dynamic | Modules | Represents kernel entities dynamically loaded as kernel modules.<br />List loaded kernel modules: `cat /proc/modules` |
| Linux (system RAM) | Kernel dynamic | Vmalloc | Used to allocate virtually contiguous memory.<br />Check Vmalloc breakdown: `cat /proc/vmallocinfo` |
| Linux (system RAM) | Kernel dynamic | Cached (kernel + userspace) | The amount of file-backed memory that resides in RAM.<br />Check cached memory usage: `cat /proc/meminfo \| grep -i cached` |
| Linux (system RAM) | Kernel dynamic | Buffers | Fixed-size buffers that contain blocks of information read from or written to disk.<br />Check buffer memory usage: `cat /proc/meminfo \| grep -i buffers` |
| Linux (system RAM) | Kernel dynamic | Shmem | Shared memory mapped into the address spaces of two or more processes.<br />Check shmem usage: `cat /proc/meminfo \| grep -i shmem` |
| Linux (system RAM) | User space | ZUSED (ZRAM) | Anonymous memory after compression by ZRAM. |
| Linux (system RAM) | User space | CMA | Contiguous physical memory typically mapped to hardware IPs such as video and display but allocated at runtime. Only movable allocations such as userspace process allocations can use CMA free memory; kernel allocations cannot. |
| Linux (system RAM) | User space | ANON | Memory allocated by userspace applications using malloc() or new.<br />Check per-process ANON usage: `cat /proc/<pid>/smaps` |
| Linux (system RAM) | User space | ION | Enables buffer sharing between hardware IPs such as video, camera, and Qualcomm Linux. ION manages memory pools reserved at boot time.<br />Mount debugfs: `mount -t debugfs none /sys/kernel/debug`<br />Check ION buffer usage: `cat /sys/kernel/debug/dma_buf/bufinfo \| grep bytes` |
| Linux (system RAM) | User space | KGSL | Memory allocated by the graphics driver.<br />Overall KGSL usage: `cat /sys/class/kgsl/kgsl/page_alloc`<br />Process-level breakdown: `cat /sys/class/kgsl/kgsl/proc/<pid>/kernel` |
| Free memory | None | None | Free memory available for any allocation.<br />Check available memory: `cat /proc/meminfo \| grep -i MemFree` |
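
The individual checks in the table can be gathered into one pass. The following sketch reads the standard `/proc/meminfo` and `/proc/<pid>/smaps` formats; the helper names `meminfo_summary` and `anon_kb` are illustrative and do not ship with Qualcomm Linux.

```shell
# meminfo_summary [FILE]
# Print the /proc/meminfo fields referenced in the table, in kB.
# Field names are matched exactly, so SwapCached is not reported
# under Cached.
meminfo_summary() (
    file="${1:-/proc/meminfo}"
    awk '
        /^(Slab|KernelStack|PageTables|Cached|Buffers|Shmem|MemFree):/ {
            sub(":", "", $1)
            printf "%-12s %10d kB\n", $1, $2
        }
    ' "$file"
)

# anon_kb SMAPS_FILE
# Sum the Anonymous fields of one process, in kB.
# Example: anon_kb /proc/<pid>/smaps
anon_kb() (
    awk '/^Anonymous:/ { total += $2 } END { print total + 0 }' "$1"
)
```

Run `meminfo_summary` on the device for a one-screen view of the kernel dynamic and free memory rows, and `anon_kb /proc/<pid>/smaps` for the ANON usage of a single process.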
Understand the real-time (RT) kernel
A real-time system is a deterministic system, where response to an event is expected in a set time. A system is classified as compatible with RT if:
- It’s devoid of unbounded latency.
- The maximum response time is calculated with precision.
- It meets the set criteria for scheduling of tasks (latency and deadline).
Real-time support applies to kernel-space processes and not to userspace.
This section isn’t applicable for QCS5430.
Figure: Build sequence
Set up the workspace
The Qualcomm Linux kernel supports the LTS RT kernel (6.18.x), which is maintained through the Yocto recipe in the meta-qcom layer in the recipes-kernel/linux/linux-qcom-rt_6.18.bb file.
For more information about cloning the workspace and getting all Qualcomm Linux meta layers to use Qualcomm RT Linux, see Sync.
Enable the RT kernel
The Qualcomm Linux meta-qcom layer supports the linux-qcom-rt_6.18.bb recipe, which fetches and builds the Qualcomm Linux kernel for the supported machines by default.
The meta-qcom layer applies the changes on top of the existing layer. During the kernel build, the meta-qcom layer enables PREEMPT_RT using rt.config based on the kernel version, and allows real-time configurations.
- Use linux-qcom-rt_6.18.bb for QLI 2.0.
- Use linux-qcom-next-rt_git.bb for qcom-next.
Customize the RT kernel
- If you are carrying any changes for the RT kernel, maintain them in the recipe as follows:
  - Maintain the patch file under recipes-kernel/linux/linux-qcom-6.18/<your_patch_file>.patch
  - Append the patch file to SRC_URI in the recipes-kernel/linux/linux-qcom-rt_6.18.bb file.
- To apply any external configurations on the RT kernel:
  - Maintain the configuration file at recipes-kernel/linux/linux-qcom-6.18/configs/<your_config>.cfg
  - Append the configuration file to SRC_URI in the recipes-kernel/linux/linux-qcom-rt_6.18.bb file.
- To modify the kernel command line, add the command-line parameters to the KERNEL_CMDLINE_EXTRA variable in the meta-qcom/ci/base.yml file.
Configure kernel settings for the RT kernel
Optional and mandatory kernel configurations are used in the RT kernel. To enable full preemption in the RT kernel, use CONFIG_PREEMPT_RT.
The CONFIG_PREEMPT_RT flag is enabled by default as part of the rt.config used in the linux-qcom-rt_6.18.bb recipe.
The following example shows the kernel configuration:
- CONFIG_NO_HZ_COMMON: When enabled, it configures the kernel infrastructure for tickless operation.
- CONFIG_NO_HZ_FULL: When enabled, it configures the kernel to avoid sending scheduling-clock interrupts to CPUs with a single runnable task.
- CONFIG_CPUSETS: Use the CONFIG_CPUSETS configuration option to enable cpuset, where CPUs are grouped to form a set.
Figure: RT kernel verification
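One way to confirm that these options reached the running kernel is to check them against the kernel configuration. The sketch below assumes the configuration is available as a plain text file, for example extracted with `zcat /proc/config.gz` (which requires CONFIG_IKCONFIG_PROC, an assumption about the build). The `check_kconfig` helper is illustrative.

```shell
# check_kconfig CONFIG_FILE OPTION...
# Report whether each option is set to y in a kernel configuration file.
# Exits with status 1 if any option is missing.
#
# Example on the device (assumes /proc/config.gz is present):
#   zcat /proc/config.gz > /tmp/kconfig
#   check_kconfig /tmp/kconfig CONFIG_PREEMPT_RT CONFIG_NO_HZ_FULL CONFIG_CPUSETS
check_kconfig() (
    config="$1"; shift
    status=0
    for opt in "$@"; do
        if grep -q "^${opt}=y\$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: not set"
            status=1
        fi
    done
    # The function body is a subshell, so exit only ends the function.
    exit "$status"
)
```

On kernels built with PREEMPT_RT, the version string printed by `uname -v` typically also contains `PREEMPT_RT`, which gives a faster first check.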
Build the RT kernel
To build the RT kernel, run the following commands:
- To ensure that you are in the KAS shell, run the following command:
The qcom-distro-kvm.yml configuration is supported with KVM only for QLI 2.0. The RT kernel (PREEMPT_RT) is hypervisor-agnostic. KVM is selected as the default hypervisor for QLI 2.0 due to the following reasons:
- Upstream community support: KVM and its drivers are part of the mainline Linux kernel, enabling broader ecosystem contributions and long-term support.
- Standard VirtIO support: KVM supports all upstream VirtIO devices and is compatible with standard VMMs such as QEMU.
- Open development: Active upstream community with broader developer familiarity.
xbl_config_kvm.elf in the generated qcomflash package. No manual UEFI BDS configuration is required.
Copy the kas lock file from meta-qcom-releases to meta-qcom. For details, see Build a BSP image.
- To compile the BitBake Qualcomm Linux multimedia image, run the following command:
Tune the RT kernel
Tune the RT kernel to achieve deterministic latency for RT tasks on the device. Set the CPU cores that run RT tasks to run at the maximum operating frequency while preventing thermal mitigation of CPU frequency. For example, in an idle sleep scenario, RT tasks face scheduling latency due to the CPU wake-up delay.
These optimizations are applied automatically by the URM when an RT test (cyclictest) is launched.
Manual configuration is not required under normal operation. The following steps are provided for reference, debugging, or custom setups.
To configure the system before running the tests, do the following for QCS6490, Qualcomm Dragonwing™ IQ-9075, and Qualcomm Dragonwing™ IQ-615 Development Kit:
- Disable the timer migration:
- Affine all kernel work queues in /sys/devices/virtual/workqueue/* to the housekeeping CPUs:
- Set CPU frequency governor to performance:
- Disable RT accounting/throttling:
- Set IRQ affinity to housekeeping CPUs:
meta-qcom/conf/machine/<machine-name.conf>:
- CPU cores 7
- IRQ affine to core 0-6
- RCU no call back 7
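The tuning steps listed in this section can be sketched with standard Linux kernel interfaces. This is a reference sketch, not the commands applied by the URM: the `rt_tune` helper is hypothetical, the housekeeping mask `7f` assumes cores 0-6 are housekeeping CPUs and core 7 runs RT tasks, and the procfs and sysfs paths are the generic upstream ones, which may differ on a given build.

```shell
# rt_tune [ROOT] [MASK]
# ROOT is a filesystem prefix: leave it empty on a real device (requires
# root privileges) or point it at a scratch directory for a dry run.
# MASK is the hexadecimal housekeeping CPU mask; 7f selects cores 0-6.
rt_tune() (
    root="${1:-}"
    mask="${2:-7f}"

    # Write a value, skipping files that are missing or reject the write
    # (some IRQs cannot be moved, for example).
    put() { [ -w "$2" ] && printf '%s\n' "$1" > "$2" 2>/dev/null; return 0; }

    # 1. Disable the timer migration.
    put 0 "$root/proc/sys/kernel/timer_migration"

    # 2. Affine kernel work queues to the housekeeping CPUs.
    for f in "$root"/sys/devices/virtual/workqueue/cpumask \
             "$root"/sys/devices/virtual/workqueue/*/cpumask; do
        put "$mask" "$f"
    done

    # 3. Set the CPU frequency governor to performance.
    for f in "$root"/sys/devices/system/cpu/cpufreq/policy*/scaling_governor; do
        put performance "$f"
    done

    # 4. Disable RT accounting/throttling.
    put -1 "$root/proc/sys/kernel/sched_rt_runtime_us"

    # 5. Set IRQ affinity to the housekeeping CPUs.
    put "$mask" "$root/proc/irq/default_smp_affinity"
    for f in "$root"/proc/irq/*/smp_affinity; do
        put "$mask" "$f"
    done
)
```

Calling `rt_tune` with no arguments applies the settings to the live system until the next reboot. Passing a scratch directory as the first argument lets you inspect what would be written first.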
Test the RT kernel
A suite of tests is available in the Linux Foundation RT test suite. The RT Linux kernel test obtains the following information:
- Real-time performance of the RT Linux kernel
- RT Linux kernel latencies and key performance indicators (KPIs)
Don’t reboot the system during the RT Linux kernel test as it runs for over 24 hours.
- Cyclictest with no-load: System load isn’t added to perform this test.
- Cyclictest with stress-ng (next-generation): A specific percentage of load is added to perform this test and measure the worst-case system latencies.
- Run the following cyclic test for QCS6490, Qualcomm Dragonwing™ IQ-9075, and Qualcomm Dragonwing™ IQ-615 development kits:
- Run the following cyclic test for the Dragonwing™ IQ-8275 Development Kit:
- Note the latencies.
- Set the target CPU load. Define the desired CPU load percentage. If not specified, the default load is 60%:
- Run stress-ng on selected CPU cores in the background. Apply CPU load on cores 0, 1, 2, 4, 5, 6, and 7 using stress-ng. Each instance runs for one day and is pinned to a specific CPU core:
- Run the cyclic latency test. Execute cyclictest with high priority to measure scheduling latency. The test runs for approximately 27.78 hours:
- Note the worst-case latencies.
- Set the target CPU load:
- Run stress-ng on selected CPU cores:
- Run the cyclic latency test:
- Note the worst-case latencies.
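The stress-ng and cyclictest steps above can be sketched as a small command generator. The helpers below only print commands so that they can be reviewed before being run. The helper names, the priority of 99, and the loop and interval defaults are assumptions for illustration, not the exact commands of the original procedure. The defaults of 100000000 loops at a 1000 µs interval do reproduce the run time of approximately 27.78 hours quoted above.

```shell
# load_cmds LOAD_PERCENT CORE...
# Print one stress-ng command per core. Each instance is pinned with
# taskset, runs a single CPU worker at the target load, and stops
# after one day.
load_cmds() (
    load="$1"; shift
    for core in "$@"; do
        echo "taskset -c $core stress-ng --cpu 1 --cpu-load $load --timeout 1d &"
    done
)

# cyclictest_cmd RT_CORE [LOOPS] [INTERVAL_US]
# Print a cyclictest command that locks memory (-m), runs one thread (-t 1)
# at priority 99 on RT_CORE, and reports only the summary (-q).
cyclictest_cmd() (
    core="$1"; loops="${2:-100000000}"; interval="${3:-1000}"
    echo "cyclictest -m -p 99 -a $core -t 1 -i $interval -l $loops -q"
)

# run_hours LOOPS INTERVAL_US
# Estimate the cyclictest run time in hours.
run_hours() {
    awk -v l="$1" -v i="$2" 'BEGIN { printf "%.2f\n", l * i / 1000000 / 3600 }'
}
```

For example, `load_cmds 60 0 1 2 4 5 6 7` prints the background load commands for the default 60% load, and `cyclictest_cmd <rt_core>` prints the latency test for the core reserved for RT tasks on your device.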
RT test results
| Device | Distro and image | Cyclic test use cases (Duration: 24 hours) | Minimum latency (µs) | Average latency (µs) | Maximum latency (µs) |
|---|---|---|---|---|---|
| QCS6490; RT core: 7; boot flow: KVM | performance_linux-qcom-rt-6.18_qcom-distro-kvm and qcom-multimedia-image | No load | 1 | 1 | 6 |
| QCS6490; RT core: 7; boot flow: KVM | performance_linux-qcom-rt-6.18_qcom-distro-kvm and qcom-multimedia-image | Stress-NG | 1 | 1 | 6 |
| Dragonwing IQ-615; RT core: 7; boot flow: KVM | performance_linux-qcom-rt-6.18_qcom-distro-kvm and qcom-multimedia-image | No load | 1 | 2 | 7 |
| Dragonwing IQ-615; RT core: 7; boot flow: KVM | performance_linux-qcom-rt-6.18_qcom-distro-kvm and qcom-multimedia-image | Stress-NG | 1 | 2 | 9 |

