Features impacting performance - Qualcomm Dragonwing Documentation

The Qualcomm® Linux® kernel includes features such as the CPU scheduler, CPU frequency governor, dynamic voltage and frequency scaling (DVFS), and memory management. This guide provides an overview of each feature and related reference links. Additionally, Qualcomm uses a feature called Userspace Resource Manager (URM) to enhance the performance of Qualcomm Linux.

Understand the CPU scheduler

The CPU scheduler manages how CPU time is distributed among the processes running on a Linux system. Qualcomm Linux uses the earliest eligible virtual deadline first (EEVDF) CPU scheduler, which is provided by the Linux kernel. The EEVDF CPU scheduler uses per-entity load tracking [LWN.net] to monitor the task load. Utilization clamping (UCLAMP or util clamp) is a scheduler feature that helps manage the performance requirements of tasks. For more information, see Customize CPU scheduler.

Understand the CPU frequency governor

A CPU frequency governor adjusts the CPU frequency based on the task load. The CPU scheduler provides the necessary inputs for this process. Qualcomm Linux uses the schedutil governor, provided by the Linux kernel. This governor increases the CPU frequency when the system is heavily loaded and reduces it when the load is low, ensuring an optimal balance between power consumption and performance. For more information, see the following:

Understand DVFS governors

DVFS governors control the frequencies of CPU caches (L3), the last level cache controller (LLCC), and the DDR based on the system workload. These governors increase the frequency when the workload is high and decrease it when the workload is low, ensuring an optimal balance between power consumption and performance. Qualcomm Linux supports the following two types of DVFS governors:

Static map DVFS governor for the L3 cache and DDR
BWMON governor for the LLCC and DDR

Configure the static map DVFS governor

This governor aligns the frequencies of the CPU L3 cache and the DDR with the current CPU frequency to balance the power and the performance requirements. For example, if the CPU frequency is at its maximum, the L3 cache and DDR frequencies must also be at their maximum levels. The static mapping is available in the source code at arch/arm64/boot/dts/qcom/<target>.dtsi. For customization options, see Customize static map DVFS governor.

Understand the BWMON governor

The bandwidth monitoring (BWMON) governor dynamically adjusts the frequencies of the LLCC and DDR based on the measured traffic flow from the CPU to the LLCC and then to the DDR. The BWMON hardware block measures this traffic. It monitors the data throughput between memory and the other subsystems within a specified sampling window and uses this information to scale the LLCC and DDR frequencies to meet the required bandwidth. The BWMON governor driver is available in the source code at drivers/soc/qcom/icc-bwmon.c. For more information, see the following:

Understand the Userspace Resource Manager

The Userspace Resource Manager (URM) is an open-source, lightweight, and extensible framework designed to intelligently manage and provision system resources from userspace. Modern workloads vary significantly across segments such as servers, compute, XR, mobile, and IoT, with each use case exhibiting distinct characteristics. Some workloads demand high CPU frequencies, others require sustained GPU throughput, while many depend on efficient caching or increased memory bandwidth. At the same time, these workloads run on a wide range of hardware platforms with varying capabilities, power envelopes, and user expectations. Consequently, a uniform tuning approach is insufficient to meet the diverse performance and power requirements of such environments. URM addresses these challenges by providing the following capabilities:

Enabling application-level tuning
Enabling use case and workload-level tuning
Providing signal and tuning APIs

URM automatically detects use cases and applies tuning parameters specified in per-application or use case YAML configuration files. Use case detection can be customized through extensions. URM can also modify system behavior to efficiently manage intermittent workloads. The Signal API or Tune API can be invoked within specific code segments to temporarily boost or limit system resources. For example, a critical code path can be executed at a higher CPU frequency for a defined duration. URM efficiently handles concurrent requests from multiple clients. When multiple requests target the same resource, URM aggregates them to determine and apply the optimal performance level required by the device. For more information, see Userspace Resource Manager and Userspace Resource Manager Extensions.

Understand memory management

RAM is used for all memory allocations made by Qualcomm Linux. RAM must be managed to meet performance requirements and ensure smooth application behavior. The following figure shows memory partitioning:

Figure: Memory partitioning

The figure shows RAM allocation in systems supporting both Linux and non-Linux environments.

System RAM is partitioned between non-Linux and Linux components.
Non-Linux section includes a large block labeled Reserved, indicating memory allocated for non-Linux operations.
Linux section is divided into four blocks under Memory total (system RAM):
- Kernel static
- Kernel dynamic
- User space process
- Free memory

Certain sections of RAM are managed independently of the Linux system. For example, firmware for the modem, video, and audio subsystems runs from these specific RAM partitions. The Linux kernel manages all other RAM partitions. The Linux kernel features its own memory management subsystem, which includes:

Implementation of virtual memory and demand paging
Allocation of memory to both kernel internal structures and userspace programs
Mapping of files into the address space of the processes
Other memory management operations

Configure RAM memory partitioning

The following table describes various types of memory allocations.

The commands specified in the following table should be run on the device.

Table: RAM classification

RAM classification	Memory segment	Allocation types	Description
Non-Linux	None	None	Memory is reserved in the form of carveouts by various subsystems other than Linux. These carveouts are specified in the respective DTSI files.
Linux (system RAM)	Kernel static	Vmlinux + kernel page structures	The kernel reserves this memory at boot time for its own usage. Vmlinux is the memory used to store the vmlinux image. The size and breakdown of the vmlinux image can be obtained from the `dmesg` logs at boot. The kernel page structure memory is calculated as 16 MB per GB of RAM size.
Linux (system RAM)	Kernel dynamic	Slab	The kernel uses the slab allocator for faster and more efficient allocation of frequently used data structures. Check slab usage: `cat /proc/meminfo | grep -i slab`. Detailed slab information: enable `CONFIG_SLUB_DEBUG` and run `cat /proc/slabinfo`.
Linux (system RAM)	Kernel dynamic	Kernel stack	The kernel stack stores the call stack of every process. Check kernel stack usage: `cat /proc/meminfo | grep -i kernelstack`
Linux (system RAM)	Kernel dynamic	PageTables	The kernel uses memory to store PageTables that map virtual addresses to physical addresses. Check PageTables usage: `cat /proc/meminfo | grep -i PageTables`
Linux (system RAM)	Kernel dynamic	Modules	Represents kernel entities dynamically loaded as kernel modules. List loaded kernel modules: `cat /proc/modules`
Linux (system RAM)	Kernel dynamic	Vmalloc	Used to allocate virtually contiguous memory. Check Vmalloc breakdown: `cat /proc/vmallocinfo`
Linux (system RAM)	Kernel dynamic	Cached (kernel + userspace)	The amount of file-backed memory that resides in RAM. Check cached memory usage: `cat /proc/meminfo | grep -i cached`
Linux (system RAM)	Kernel dynamic	Buffers	Fixed-size buffers that contain blocks of information read from or written to disk. Check buffer memory usage: `cat /proc/meminfo | grep -i buffers`
Linux (system RAM)	Kernel dynamic	Shmem	Shared memory mapped into the address spaces of two or more processes. Check shmem usage: `cat /proc/meminfo | grep -i shmem`
Linux (system RAM)	User space	ZUSED (ZRAM)	Anonymous memory post compression by ZRAM.
Linux (system RAM)	User space	CMA	Contiguous physical memory typically mapped to hardware IPs such as video and display but allocated at runtime. Only movable allocations such as userspace process allocations can use CMA free memory; kernel allocations cannot.
Linux (system RAM)	User space	ANON	Memory allocated by userspace applications using `malloc()` or `new()`. Check per-process ANON usage: `cat /proc/<pid>/smaps`
Linux (system RAM)	User space	ION	Enables buffer sharing between hardware IPs such as video, camera, and Qualcomm Linux. ION manages memory pools reserved at boot time. Mount debugfs: `mount -t debugfs none /sys/kernel/debug`. Check ION buffer usage: `cat /sys/kernel/debug/dma_buf/bufinfo | grep bytes`
Linux (system RAM)	User space	KGSL	Memory allocated by the graphics driver. Overall KGSL usage: `cat /sys/class/kgsl/kgsl/page_alloc`. Process-level breakdown: `cat /sys/class/kgsl/kgsl/proc/<pid>/kernel`
Free memory	None	None	Free memory available for any allocation. Check available memory: `cat /proc/meminfo | grep -i MemFree`
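
The checks in the table can be gathered into one small helper. The following is a minimal sketch, not part of Qualcomm Linux: the function names `meminfo_kb` and `page_struct_mb` are hypothetical, and the page-structure estimate assumes 4 KiB pages and a 64-byte page structure, which is one way to arrive at the 16 MB per GB figure quoted above.

```shell
#!/bin/sh
# Sketch: read selected /proc/meminfo fields and estimate the memory
# used by kernel page structures. Function names are hypothetical.

# Print the value (in kB) of one /proc/meminfo field, for example "Slab".
# The optional second argument (a meminfo-style file) is for testing only.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Estimate page-structure overhead in MB for a RAM size given in GB.
# Assumption: 4 KiB pages and 64 bytes per page structure.
page_struct_mb() {
    ram_gb=$1
    pages=$(( ram_gb * 1024 * 1024 / 4 ))   # number of 4 KiB pages
    echo $(( pages * 64 / 1024 / 1024 ))    # bytes converted to MB
}

# Summarize the segments from the table when /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    for field in Slab KernelStack PageTables Cached Buffers Shmem MemFree; do
        printf '%-12s %s kB\n' "$field" "$(meminfo_kb "$field")"
    done
fi
```

Unlike `grep -i cached`, which also matches `SwapCached`, `meminfo_kb Cached` matches the field name exactly. Under the stated assumptions, `page_struct_mb 8` gives 128 for a device with 8 GB of RAM.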

Understand the real-time (RT) kernel

A real-time system is a deterministic system, where the response to an event is expected within a set time. A system is classified as compatible with RT if:

It’s devoid of unbounded latency.
The maximum response time is calculated with precision.
It meets the set criteria for scheduling of tasks (latency and deadline).

Linux can be configured as a real-time operating system (RTOS) in which real-time tasks have well-defined periodic execution cycles (cycle time) and meet execution criteria within specified limits (jitter). To install the patches, see Versions of PREEMPT_RT patches.

Real-time support applies to kernelspace processes and not to userspace processes.

This section isn’t applicable for QCS5430.

Figure: Build sequence

Set up the workspace

The Qualcomm Linux kernel supports the LTS RT kernel (6.18.x), which is maintained through the Yocto recipe in the meta-qcom layer in the recipes-kernel/linux/linux-qcom-rt_6.18.bb file. For more information about cloning the workspace and getting all Qualcomm Linux meta layers to use Qualcomm RT Linux, see Sync.

Enable the RT kernel

The Qualcomm Linux meta-qcom layer provides the linux-qcom-rt_6.18.bb recipe, which fetches and builds the Qualcomm Linux kernel for the supported machines by default. The meta-qcom layer applies the changes on top of the existing layer. During the kernel build, the meta-qcom layer enables PREEMPT_RT using rt.config based on the kernel version, and allows real-time configurations.

Use linux-qcom-rt_6.18.bb for QLI.2.0.
Use linux-qcom-next-rt_git.bb for qcom-next.

For more information about supported machines, see Selecting MACHINE, DISTRO, and image.

Customize the RT kernel

If you are carrying any changes for the RT kernel, maintain them in the recipe as shown below:
- Maintain the patch file under recipes-kernel/linux/linux-qcom-6.18/<your_patch_file>.patch
- Append the patch file to SRC_URI in the recipes-kernel/linux/linux-qcom-rt_6.18.bb file.

SRC_URI += " \
    file://<your_patch_file>.patch \
    "

To apply any external configurations on the RT kernel:
- Maintain the configuration file under recipes-kernel/linux/linux-qcom-6.18/configs/<your_config>.cfg
- Append the configuration file to SRC_URI in the recipes-kernel/linux/linux-qcom-rt_6.18.bb file.

SRC_URI += " \
    file://configs/qcom_rt.cfg \
"

To modify the kernel command line, add the parameters to the KERNEL_CMDLINE_EXTRA variable in the meta-qcom/ci/base.yml file.

KERNEL_CMDLINE_EXTRA:append = " qcom_scm.download_mode=1 <new_parameter>"
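
After flashing the rebuilt image, one way to confirm that an added parameter reached the kernel is to look for it in `/proc/cmdline` on the device. The following is a minimal sketch; the helper name `cmdline_has` is hypothetical and not part of meta-qcom.

```shell
#!/bin/sh
# Return success if a parameter is present on the kernel command line.
# The optional second argument (a file holding a command line) is for
# testing; by default the function reads /proc/cmdline on the device.
cmdline_has() {
    param=$1
    file=${2:-/proc/cmdline}
    # Split the command line into one token per line and match exactly.
    tr ' ' '\n' < "$file" | grep -Fx -- "$param" > /dev/null
}

# Example check for the parameter shown above (skipped without /proc).
if [ -r /proc/cmdline ]; then
    if cmdline_has "qcom_scm.download_mode=1"; then
        echo "qcom_scm.download_mode=1 is set"
    else
        echo "qcom_scm.download_mode=1 is not set"
    fi
fi
```

The match is on whole tokens, so `cmdline_has qcom_scm.download_mode` fails unless the parameter appears without a value.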

Configure kernel settings for the RT kernel

Both optional and mandatory kernel configurations are used in the RT kernel. To enable full preemption in the RT kernel, use CONFIG_PREEMPT_RT. The CONFIG_PREEMPT_RT flag is enabled by default as part of the rt.config used in the linux-qcom-rt_6.18.bb recipe. The following example shows the kernel configuration:

zcat /proc/config.gz | grep CONFIG_PREEMPT
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_RT=y

Set the following kernel configuration options when affining the RT task:

CONFIG_NO_HZ_COMMON - When enabled, it configures the kernel infrastructure for tickless operation.
CONFIG_NO_HZ_FULL - When enabled, it configures the kernel to avoid sending scheduling-clock interrupts to CPUs with a single runnable task.
CONFIG_CPUSETS - When enabled, it provides cpusets, which group CPUs into sets that tasks can be assigned to.
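
The options above can be checked in one pass on a running device. The following is a minimal sketch that reads a kernel configuration from standard input; the function name `check_kconfig` is hypothetical, and the sketch assumes the kernel exposes its configuration at /proc/config.gz.

```shell
#!/bin/sh
# Report whether each named option is built in (=y) in a kernel
# configuration read from standard input. Returns non-zero if any
# option is missing.
check_kconfig() {
    cfg=$(cat)          # read the whole configuration once
    status=0
    for opt in "$@"; do
        if printf '%s\n' "$cfg" | grep "^${opt}=y\$" > /dev/null; then
            echo "$opt: enabled"
        else
            echo "$opt: NOT enabled"
            status=1
        fi
    done
    return $status
}

# Usage on the device (skipped when /proc/config.gz is unavailable).
if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | check_kconfig CONFIG_PREEMPT_RT \
        CONFIG_NO_HZ_COMMON CONFIG_NO_HZ_FULL CONFIG_CPUSETS ||
        echo "One or more RT options are not enabled"
fi
```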

Figure: RT kernel verification

Build the RT kernel

To build the RT kernel, run the following commands:

To ensure that you are in the KAS shell, run the following command:

kas shell meta-qcom/ci/qcs6490-rb3gen2-core-kit.yml:meta-qcom/ci/linux-qcom-rt-6.18.yml:meta-qcom/ci/qcom-distro-kvm.yml:meta-qcom/ci/lock.yml

The qcom-distro-kvm.yml configuration is supported with KVM only for QLI 2.0. The RT kernel (PREEMPT_RT) is hypervisor-agnostic. KVM is selected as the default hypervisor for QLI 2.0 for the following reasons:

Upstream community support — KVM and its drivers are part of the mainline Linux kernel, enabling broader ecosystem contributions and long-term support.
Standard VirtIO support — KVM supports all upstream VirtIO devices and is compatible with standard VMMs such as QEMU.
Open development — Active upstream community with broader developer familiarity.

The kas build command automatically includes xbl_config_kvm.elf in the generated qcomflash package. No manual UEFI BDS configuration is required.

Copy the kas lock file from meta-qcom-releases to meta-qcom. For more information, see Build a BSP image.

To build the Qualcomm Linux multimedia image with BitBake, run the following command:

bitbake qcom-multimedia-image

Tune the RT kernel

Tune the RT kernel to achieve a deterministic latency for RT tasks in the device. Set the CPU cores that run RT tasks to run at maximum operating frequency while preventing thermal mitigation of CPU frequency. For example, in an idle sleep scenario, RT tasks face scheduling latency due to CPU wake time delay.

URM automatically applies these optimizations when an RT test (cyclictest) is launched.

Manual configuration is not required under normal operation. The following steps are provided for reference, debugging, or custom setups.

To configure the system before running the tests, do the following for QCS6490, Qualcomm Dragonwing™ IQ-9075, and Qualcomm Dragonwing™ IQ-615 Development Kit:

Disable the timer migration:

echo 0 > /proc/sys/kernel/timer_migration

Affine all kernel work queues in /sys/devices/virtual/workqueue/* to the housekeeping CPUs:

for wq in /sys/devices/virtual/workqueue/*; do
   [ -w "$wq/cpumask" ] && echo 7F > "$wq/cpumask"
done
# 7F = CPUs 0-6 (binary 0111 1111)

Set CPU frequency governor to performance:

for policy in /sys/devices/system/cpu/cpufreq/policy*; do
   [ -w "$policy/scaling_governor" ] && echo performance > "$policy/scaling_governor"
done

Disable RT accounting/throttling:

echo -1 > /proc/sys/kernel/sched_rt_runtime_us

Set IRQ affinity to housekeeping CPUs:

ALLOW_CPUS="0,1,2,3,4,5,6"
cpu_list_to_mask() {
   MASK=0
   for cpu in $(echo $1 | tr ',' ' '); do
      MASK=$((MASK | (1 << cpu)))
   done
   printf "%x\n" "$MASK"
}
MASK=$(cpu_list_to_mask "$ALLOW_CPUS")
echo "Setting IRQ affinity to CPUs: $ALLOW_CPUS (mask=0x$MASK)"
for irq in /proc/irq/[0-9]*; do
   smp_file="$irq/smp_affinity"
   [ -w "$smp_file" ] && echo "$MASK" > "$smp_file" 2>/dev/null
done
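
The hexadecimal masks used in this section (7F and F7) can be decoded back into CPU lists to double-check which cores they cover. The following is a minimal sketch; the helper name `mask_to_cpu_list` is hypothetical, and it handles a single hexadecimal word without comma-separated groups, which is enough for the eight-core examples here.

```shell
#!/bin/sh
# Convert a hexadecimal CPU mask (for example 7F) into a CPU list.
mask_to_cpu_list() {
    mask=$(( 0x$1 ))
    cpu=0
    list=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            list="${list:+$list,}$cpu"   # append with a comma separator
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$list"
}

mask_to_cpu_list 7F   # prints 0,1,2,3,4,5,6
mask_to_cpu_list F7   # prints 0,1,2,4,5,6,7
```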

To configure the system before running the tests, do the following for the Dragonwing™ IQ-8275 Development Kit:

Disable the timer migration:

echo 0 > /proc/sys/kernel/timer_migration

Affine all kernel work queues in /sys/devices/virtual/workqueue/* to the housekeeping CPUs:

for wq in /sys/devices/virtual/workqueue/*; do
   [ -w "$wq/cpumask" ] && echo F7 > "$wq/cpumask"
done
# F7 = CPUs 0-2,4-7 (binary 1111 0111)

Set CPU frequency governor to performance:

for policy in /sys/devices/system/cpu/cpufreq/policy*; do
   [ -w "$policy/scaling_governor" ] && echo performance > "$policy/scaling_governor"
done

Disable RT accounting/throttling:

echo -1 > /proc/sys/kernel/sched_rt_runtime_us

Set IRQ affinity to housekeeping CPUs:

ALLOW_CPUS="0,1,2,4,5,6,7"
cpu_list_to_mask() {
   MASK=0
   for cpu in $(echo $1 | tr ',' ' '); do
      MASK=$((MASK | (1 << cpu)))
   done
   printf "%x\n" "$MASK"
}
MASK=$(cpu_list_to_mask "$ALLOW_CPUS")
echo "Setting IRQ affinity to CPUs: $ALLOW_CPUS (mask=0x$MASK)"
for irq in /proc/irq/[0-9]*; do
   smp_file="$irq/smp_affinity"
   [ -w "$smp_file" ] && echo "$MASK" > "$smp_file" 2>/dev/null
done
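
When debugging a custom setup, it can help to read the settings back after applying the steps above. The following is a minimal sketch that prints the current values of the files those steps write to; the function name `show_rt_tuning` and its optional root-directory argument are hypothetical and exist only to make the sketch testable off-device.

```shell
#!/bin/sh
# Print the current values of the settings changed by the tuning steps.
# The optional argument is a directory that mimics / (for testing);
# leave it empty on a device.
show_rt_tuning() {
    root=${1:-}
    for f in "$root/proc/sys/kernel/timer_migration" \
             "$root/proc/sys/kernel/sched_rt_runtime_us" \
             "$root"/sys/devices/system/cpu/cpufreq/policy*/scaling_governor \
             "$root"/sys/devices/virtual/workqueue/*/cpumask; do
        # Skip files that do not exist or are not readable.
        [ -r "$f" ] && printf '%s = %s\n' "${f#"$root"}" "$(cat "$f")"
    done
    return 0
}

show_rt_tuning
```

After the tuning steps, the expected readings are 0 for timer_migration, -1 for sched_rt_runtime_us, and performance for every scaling_governor.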

The following example shows how to add kernel command-line parameters, including the one that offloads RCU callbacks (rcu_nocbs) from the RT core, in meta-qcom/conf/machine/<machine-name>.conf:

RT CPU core: 7
IRQs affined to cores: 0-6
RCU no-callback core: 7

QCOM_RT_CPU        = "7"
QCOM_IRQAFF        = "0-6"
QCOM_RCU_NOCBS     = "7"
QCOM_RCU_EXPEDITED = "1"
QCOM_CPUIDLE_OFF   = "1"

Test the RT kernel

A suite of tests is available in the Linux Foundation RT test suite. The RT Linux kernel test obtains the following information:

Real-time performance of the RT Linux kernel
RT Linux kernel latencies and key performance indicators (KPIs)

Don’t reboot the system during the RT Linux kernel test as it runs for over 24 hours.

The cyclictest tool is used to benchmark RT Linux kernel systems and to evaluate the relative performance of real-time systems. The Qualcomm Linux build includes the cyclictest tool. For more information, see Cyclictest. This guide describes the following cyclictests:

Cyclictest with no-load: No system load is added for this test.
Cyclictest with stress-ng (next-generation): A specific percentage of load is added for this test to measure the worst-case system latencies.

Run the following cyclic test for QCS6490, Qualcomm Dragonwing™ IQ-9075, and Qualcomm Dragonwing™ IQ-615 development kits:

cyclictest -a 7 -t 1 -m -l 100000000 -i 1000 -p 99 -h 100
# -a 7 → pin the thread to CPU 7 (RT core)
# -t 1 → 1 thread
# -m → lock memory (avoid page faults)
# -l 100000000 → long run
# -i 1000 → 1 ms interval
# -p 99 → RT priority
# -h 100 → histogram up to 100 µs
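
The loop count and interval together determine how long the test runs. The following is a minimal sketch of that arithmetic, assuming one wake-up per interval and ignoring startup time; the helper name `cyclictest_duration_s` is hypothetical.

```shell
#!/bin/sh
# Estimate cyclictest run time in seconds from the loop count (-l)
# and the interval in microseconds (-i).
cyclictest_duration_s() {
    echo $(( $1 * $2 / 1000000 ))
}

secs=$(cyclictest_duration_s 100000000 1000)
echo "$secs seconds"                                          # 100000 seconds
awk -v s="$secs" 'BEGIN { printf "%.2f hours\n", s / 3600 }'  # 27.78 hours
```

With the options above, the run therefore lasts a little over one day.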

Run the following cyclic test for the Dragonwing™ IQ-8275 Development Kit:
```
cyclictest -a 3 -t 1 -m -l 100000000 -i 1000 -p 99 -h 100
```

To access the RT test suite source code, see rt-tests/rt-tests.git. For more information, see RT-Tests.

Note the latencies.

To run the cyclic test with stress-ng on QCS6490, Qualcomm Dragonwing™ IQ-9075, and Qualcomm Dragonwing™ IQ-615, do the following:

Set the target CPU load. Define the desired CPU load percentage. If not specified, the default load is 60%:
```
LOAD=60
```
Run stress-ng on selected CPU cores in the background. Apply CPU load on cores 0 through 6 using stress-ng, which leaves the RT core (7) free for cyclictest. Each instance runs for one day and is pinned to a specific CPU core:
```
for cpu in 0 1 2 3 4 5 6; do
    taskset -c $cpu stress-ng --cpu 1 --cpu-load "$LOAD" \
    --temp-path . -t 1d &
done
```
Run the cyclic latency test. Execute cyclictest with high priority to measure scheduling latency. The test runs for approximately 27.78 hours:
```
cyclictest -a 7 -t 1 -m -l 100000000 -i 1000 -p 99 -h 100 --mainaffinity 6
```
Note the worst-case latencies.

To run cyclic test with stress-ng on Dragonwing IQ-8275, do the following:

Set the target CPU load:
```
LOAD=60
```

Run stress-ng on selected CPU cores:

for cpu in 0 1 2 4 5 6 7; do
     taskset -c $cpu stress-ng --cpu 1 --cpu-load "$LOAD" \
     --temp-path . -t 1d &
done

Run the cyclic latency test:

cyclictest -a 3 -t 1 -m -l 100000000 -i 1000 -p 99 -h 100 --mainaffinity 2

Note the worst-case latencies.
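
If the cyclictest output is redirected to a file, the latencies can be extracted from its histogram summary. The following is a minimal sketch; the function name `latency_summary` is hypothetical. It also assumes that, when run with `-h`, cyclictest ends its output with summary lines of the form `# Min Latencies: 00001`, `# Avg Latencies: 00001`, and `# Max Latencies: 00006`, which should be confirmed against the rt-tests version in the build. With `-t 1`, each line carries a single value.

```shell
#!/bin/sh
# Extract minimum, average, and maximum latency (in microseconds) from
# cyclictest histogram output, read from the named files or from
# standard input. Assumption: summary lines look like
# "# Max Latencies: 00006".
latency_summary() {
    awk '/^# (Min|Avg|Max) Latencies:/ {
        printf "%s=%d\n", tolower($2), $4
    }' "$@"
}
```

For example, after running `cyclictest ... > cyclictest.log`, calling `latency_summary cyclictest.log` prints one `min=`, `avg=`, and `max=` line for the run.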

RT test results

Device	Distro and image	Cyclic test use cases (Duration: 24 hours)	Minimum latency (µs)	Avg latency (µs)	Max latency (µs)
QCS6490 RT core: 7 Boot Flow - KVM	performance_linux-qcom-rt-6.18_qcom-distro-kvm and qcom-multimedia-image	No load	1	1	6
QCS6490 RT core: 7 Boot Flow - KVM		Stress-NG	1	1	6
Dragonwing IQ-615 RT core: 7 Boot Flow - KVM	performance_linux-qcom-rt-6.18_qcom-distro-kvm and qcom-multimedia-image	No load	1	2	7
Dragonwing IQ-615 RT core: 7 Boot Flow - KVM		Stress-NG	1	2	9
