A Close Look at ARM Mali and its CVE History

published: March 5, 2024

ARM Mali graphics processing units (GPUs) ship in a wide range of devices, from entry-level phones to flagships, so the security of their drivers matters to a very large number of users. Alongside its graphics capabilities, Mali has a history of vulnerabilities, many of them documented in Common Vulnerabilities and Exposures (CVE) reports.

This post looks at ARM Mali and the CVE reports that have surfaced over time: what the vulnerabilities are and what they mean for device security. Whether you’re a developer, a tech enthusiast, or simply curious about the security of mobile GPUs, the aim is to give you enough background on the hardware and its driver to follow Mali’s CVE history.

A Brief Overview of ARM Mali: History and GPU Architecture

ARM Mali, a prominent name in mobile graphics processing, has been part of ARM’s portfolio since 2006, when ARM Holdings, a British semiconductor and software design company, acquired Falanx Microsystems, the Norwegian company that originated the Mali designs. The Mali series of GPUs quickly gained recognition for their efficiency and performance in mobile devices.

History:

The ARM Mali GPUs were designed to cater to the growing demand for high-quality graphics in smartphones, tablets, and other mobile devices. Since its launch, the Mali series has gone through several iterations, each bringing advancements in performance and power efficiency.

GPU Architecture:

The architecture of ARM Mali GPUs is designed to deliver impressive graphics performance while maintaining a balance between power consumption and efficiency. Here are some key aspects of Mali’s GPU architecture:

Unified Shader Architecture:

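Since the Midgard generation, Mali GPUs have used a unified shader architecture, in which the same shader cores run both vertex and pixel (fragment) work. The earlier Utgard parts instead had separate geometry and pixel processors. Unifying the cores lets the GPU allocate processing power flexibly between the two kinds of work and make better use of its resources across different graphical tasks.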

Scalability:

Mali GPUs are designed to be scalable, accommodating a wide range of devices with varying performance requirements. This scalability allows Mali GPUs to be integrated into entry-level to flagship smartphones and tablets.

Texture Compression:

Mali GPUs implement advanced texture compression techniques such as Adaptive Scalable Texture Compression (ASTC). This technology reduces memory bandwidth requirements while maintaining high-quality textures, resulting in improved performance and reduced power consumption.

Compute Capabilities:

In addition to graphics rendering, Mali GPUs also offer compute capabilities through APIs like OpenCL and Vulkan. This allows developers to harness the GPU’s parallel processing power for tasks beyond traditional graphics, such as machine learning and computational photography.

Efficiency and Power Management:

Mali GPUs are known for their efficiency, achieved through optimized architecture and power management techniques. Features like Dynamic Voltage and Frequency Scaling (DVFS) dynamically adjust the GPU’s clock speed and voltage based on workload, maximizing performance while minimizing power consumption.

The Mali GPU architectures take their names from Norse mythology: “Utgard”, “Midgard”, “Bifrost”, and, more recently, “Valhall”. Most modern Android phones use either the “Valhall” or the “Bifrost” architecture, and the kernel drivers for the two share much of their code. Because these newer architectures are largely based on “Midgard”, the “Valhall” and “Bifrost” drivers still contain macros with the “MIDGARD” prefix (e.g. MIDGARD_MMU_LEVEL). These macros may still be in active use in the newer drivers; the prefix merely reflects their historical origin.

Mali GPU Architecture

Source: https://images.anandtech.com/doci/14385/Mali-G77-4.png

Here’s an overview of the different Mali GPU generations:

Mali-200 Series:

Release Year: 2008
The Mali-200 series was ARM’s first GPU IP to enter the market. It was designed for low-power, small form factor devices such as smartphones and smartwatches. This series supported OpenGL ES 2.0 and was capable of handling basic graphics tasks.

Mali-300 Series:

Release Year: 2010
The Mali-300 series was a step up from the Mali-200, offering improved performance and efficiency. It also supported OpenGL ES 2.0 and was used in mid-range smartphones and tablets.

Mali-400 Series:

Release Year: 2010
The Mali-400 series was a significant advancement, providing a substantial boost in performance compared to its predecessors. It supported OpenGL ES 2.0 and 1.1, making it suitable for mid-range to high-end devices.

Mali-T600 Series (Midgard):

Release Year: 2012
The Mali-T600 series introduced the Midgard architecture, a major overhaul in GPU design. It brought support for OpenGL ES 3.0 and offered significant improvements in performance and efficiency. This series was used in high-end smartphones and tablets.

Mali-T700 Series (Midgard):

Release Year: 2014
The Mali-T700 series continued the Midgard architecture, further refining performance and efficiency. It added support for advanced graphics features and APIs like OpenGL ES 3.1 and DirectX 11. This series powered flagship devices.

Mali-T800 Series (Midgard):

Release Year: 2014
Building upon the success of the Mali-T700 series, the Mali-T800 series offered even greater performance and efficiency improvements. It supported advanced features such as Vulkan and improved tessellation. It was used in flagship and high-performance devices.

Mali-G31/G51/G52 Series (Bifrost):

Release Year: 2016-2018
The Mali-G31, G51, and G52 are mainstream parts built on the Bifrost architecture; the G51 was announced in 2016, and the G31 and G52 followed in 2018. These GPUs offered improved performance and efficiency over their predecessors and supported modern APIs like Vulkan and OpenGL ES 3.2. They were used in entry-level to mid-range devices.

Mali-G71/G72/G76/G77 Series (Bifrost/Valhall):

Release Year: 2016-2019
The Mali-G71, G72, and G76 are the high-end Bifrost parts, and the Mali-G77 introduced the Valhall architecture with even greater performance improvements. These GPUs improved throughput for graphics and for machine learning workloads. They do not include hardware ray tracing, and video encoding and decoding are handled by a separate video processor rather than by the GPU. They were used in flagship and high-performance devices.

Mali-G78 and Later (Valhall):

Release Year: 2020-Present
The Mali-G78 continued the Valhall architecture, and later parts such as the Mali-G710 built on it. These GPUs offer improved performance and efficiency and better support for AI processing. Hardware ray tracing arrived later in the Valhall line, with the Immortalis-G715. These GPUs are designed for flagship devices.

COTS Boards

COTS stands for Commercial Off-The-Shelf. COTS boards refer to pre-built, ready-to-use computer hardware boards or systems that are commercially available off the shelf. These boards are designed and manufactured by companies for general-purpose use rather than for a specific custom application.

COTS boards are designed to be readily available for purchase without the need for custom design or manufacturing. They are pre-built, standardized, and often come with documentation and support. These boards are not tailored for any specific application or industry. They are versatile and can be used in a wide range of applications, from industrial automation to embedded systems, research, and more.

COTS boards typically include essential components such as a processor (CPU), memory (RAM), storage (often in the form of an onboard eMMC or SSD), input/output ports (USB, Ethernet, HDMI, etc.), and sometimes expansion slots for additional modules or peripherals. Many COTS boards come with extensive documentation, software libraries, and support from the manufacturer, which can be beneficial for developers and engineers.

Some popular examples of COTS boards include Raspberry Pi, Arduino, BeagleBone, etc. These boards are widely used in hobbyist projects, educational settings, prototyping, and even in commercial products.

Of particular interest for Mali work are the “HiKey” boards, popular development boards designed by the Linaro Community Board Group. The HiKey boards provide a platform for developers to create software and test it on ARM-based hardware, and they are built around ARM Cortex-A series processors. Here are a couple of examples:

HiKey 960
HiKey 970

Here are some popular development boards, their corresponding GPUs, and the operating systems (OS) they support:

Raspberry Pi 4 Model B:

Board: Raspberry Pi 4 Model B
GPU: Broadcom VideoCore VI GPU
OS Support: Raspberry Pi OS (formerly Raspbian), Ubuntu, Fedora, and various other Linux distributions.

HiKey 960:

Board: HiKey 960
GPU: ARM Mali-G71 GPU
OS Support: Android, Linux (Debian, Ubuntu, etc.)

HiKey 970:

Board: HiKey 970
GPU: ARM Mali-G72 GPU
OS Support: Android, Linux (Debian, Ubuntu, etc.)

Arduino Uno:

Board: Arduino Uno
GPU: N/A (Arduino Uno is a microcontroller board, not a GPU-equipped board)
OS Support: Arduino IDE programming environment; no specific operating system.

BeagleBone Black:

Board: BeagleBone Black
GPU: PowerVR SGX530 (in the AM335x SoC)
OS Support: Debian-based distributions (such as Debian, Ubuntu, etc.), and other Linux distributions.

Odroid XU4:

Board: Odroid XU4
GPU: ARM Mali-T628 MP6 GPU
OS Support: Android, Ubuntu, and other Linux distributions.

ARM Mali GPU Software Stack

ARM provides an official software stack for their Mali GPUs, tailored for different operating systems and development environments. Understanding this stack is crucial for developers looking to harness the power of Mali GPUs in their applications. Let’s delve into the two main branches of ARM’s official stack:

Linux/Debian Stack:

ARM’s Linux/Debian stack is designed for devices running Linux-based operating systems like Debian. This stack includes drivers and support for OpenGL ES, which is specifically designed for embedded systems, smartphones, and other devices with limited hardware resources. Additionally, OpenCL support is included, enabling developers to utilize the compute power of Mali GPUs for tasks such as image processing and machine learning. One notable aspect of this stack is the absence of support for OpenGL. Unlike desktop systems, which commonly use OpenGL for graphics rendering, ARM’s Linux/Debian stack focuses on OpenGL ES.

AOSP (Android Open Source Project) Stack:

The AOSP stack is tailored for Android-based devices, including smartphones and tablets. It offers a broader range of graphics API support than the Linux/Debian stack. OpenGL ES is still supported, but newer Android versions are leaning towards Vulkan as the preferred graphics API. Vulkan provides lower-overhead, more explicit access to the GPU, which helps graphics-intensive applications run efficiently. OpenCL support may also be available for compute tasks, depending on the device and Android version.

Mali GPU Driver Components

The Mali GPU driver consists of two distinct parts, each playing a crucial role in enabling the GPU to function efficiently:

Open Source Kernel Driver:

Availability: Open source, updated on Arm Developer page.
Purpose: Manages communication between GPU hardware and OS.
Features: Handles memory, power management, task scheduling.
Benefits: Community contributions, regular updates ensure compatibility.

Proprietary User Space Driver:

Purpose: Compiles shader programs (e.g., the GLSL ES shaders used with OpenGL ES).
Function: Translates shader code into Mali GPU instructions.
Shading Languages: Supports the OpenGL ES shading language for 3D graphics rendering.
Efficiency: Optimizes shader code for Mali GPU execution.

Kernel Driver Components

Here’s a brief explanation of the terms related to the kernel driver for Mali GPUs:

Kbdev (kbase device):

Represents a GPU device in the kernel driver.
Responsible for managing the GPU device at the kernel level.

Kctx (GPU context):

Represents a GPU context: the execution environment, with its own GPU address space, that the kernel driver creates for a user space client.
Tracks the resources that belong to that client, such as its memory regions and submitted jobs.

GP (Geometry Processor):

Part of the Mali GPU responsible for handling geometry processing tasks.
Tasks include vertex processing, transformations, and other geometric calculations.

PP (Pixel Processor):

Component of the Mali GPU responsible for pixel processing tasks.
Tasks include fragment shading, texture mapping, and blending.

Group (Render Group):

Refers to all cores sharing the same Mali Memory Management Unit (MMU).
Helps in organizing and managing parallel processing tasks.

Kbase (Kernel Driver Instance for Midgard):

The kernel driver instance originally written for Midgard-architecture Mali GPUs; the Bifrost and Valhall drivers share much of its code.
Manages interactions between the GPU hardware and the operating system.

TLstream (Timeline Stream):

A timeline stream used for trace recording purposes.
Allows for monitoring and recording of GPU activities over time.

JS (Job Slot):

Represents a job slot exposed by the GPUs.
Jobs are scheduled and executed in these slots.

JC (Job Chain):

Refers to a sequence of jobs that are linked together for execution.
Helps in organizing and optimizing GPU tasks.

JD (Job Dispatcher):

Part of the kernel driver responsible for dispatching and managing job execution on the GPU.

AS (Address Space):

Refers to the address space allocated for the GPU.
Manages memory addresses for GPU operations.

LPU (Logical Processing Unit):

Used for timeline display purposes.
Represents a logical unit of processing, possibly related to trace recording and monitoring.

Exploring ARM Mali GPU Architecture: An Overview

Mali GPU Overview

When diving into the ARM Mali GPU architecture, one crucial aspect to understand is the concept of job chains. These job chains are essential components that encapsulate GPU executable code along with its metadata, providing a streamlined way for the GPU to process tasks efficiently.

Job Chains

Definition: A job chain can be described as a binary blob of GPU executable and its associated metadata.

Atom Structure: These job chains are wrapped in an atom structure. “struct base_jd_atom_v2”, declared in “mali_base_kernel.h”, is used on the user/kernel (u/k) interface, while “struct kbase_jd_atom”, defined in “mali_kbase_defs.h”, is the kernel’s internal representation of the same atom.

Purpose of kbase_jd_atom: This internal data structure, kbase_jd_atom, is specific to the kernel’s operation and is not shared with the hardware. It serves as an abstraction layer to separate the u/k interface (atom_v2), making it easier to modify and update.

Pointer to GPU Instructions: Both base_jd_atom_v2 and kbase_jd_atom contain a jc field, which holds the GPU address of the job chain and so points to the GPU kernel instructions.

Job Submission: The process of submitting a job chain begins with the kbase_api_job_submit() function in “mali_kbase_core_linux.c”. This function is invoked when a job is submitted from user space to the kernel.

Similarity of jc Fields: The jc fields of base_jd_atom_v2 and kbase_jd_atom likely refer to the same GPU instructions. These instructions are bare-metal code that the GPU executes directly.

GPU Model Specification: When compiling OpenCL (OCL) kernels, it is necessary to specify the GPU model, or else the OpenCL runtime checks the available GPUs in the system for compatibility.

When using the Mali GPU driver, the first step is to create a kbase_context through a series of ioctl calls. The kbase_context defines the execution environment through which the user space application communicates with the GPU. Each open file descriptor for the GPU device file gets its own distinct kbase_context.

Understanding Mali GPU Memory Mapping in Linux

Let’s look at a simple but fundamental operation: mapping memory to the GPU. The example below maps GPU memory through the Mali driver on a Pixel 6.

Mapping Pages to the GPU

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include "mali.h"
#include "mali_base_jm_kernel.h"
#include <stdbool.h>

#define POOL_SIZE 16384

// Function prototypes
void setup_mali(int fd);
void mem_alloc(int fd, union kbase_ioctl_mem_alloc* alloc);
void* map_gpu(int mali_fd, unsigned int pages, bool read_only, int group);
void* setup_tracking_page(int fd);
void* drain_mem_pool(int mali_fd);

int main() {
    int fd = open("/dev/mali0", O_RDWR);
    if (fd < 0) {
        perror("Error opening Mali device");
        return 1;
    }

    setup_mali(fd); // Setup Mali device

    void* tracking_page = setup_tracking_page(fd);
    printf("Tracking page address: %p\n", tracking_page);

    // Allocate enough pages so the page freed later will spill into the device pool
    void* drain = drain_mem_pool(fd);
    printf("Drain memory pool address: %p\n", drain);

    // Allocate GPU memory using map_gpu
    void* region = map_gpu(fd, 3, false, 1);

    // Print address of the mapped GPU memory region
    printf("Mapped GPU Memory Region Address: %p\n", region);

    close(fd);
    return 0;
}

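// Setup Mali device: do the version handshake, then set the context-creation flags
void setup_mali(int fd) {
    struct kbase_ioctl_version_check param = {0};
    if (ioctl(fd, KBASE_IOCTL_VERSION_CHECK, &param) < 0) {
        // perror appends ": <error text>" and a newline, so the message has no "\n"
        perror("version check failed");
        exit(1);
    }
    // Context-creation flags (bit 3 set)
    struct kbase_ioctl_set_flags set_flags = {1 << 3};
    if (ioctl(fd, KBASE_IOCTL_SET_FLAGS, &set_flags) < 0) {
        perror("set flags failed");
        exit(1);
    }
}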

// Allocate memory for GPU
void mem_alloc(int fd, union kbase_ioctl_mem_alloc* alloc) {
    if (ioctl(fd, KBASE_IOCTL_MEM_ALLOC, alloc) < 0) {
        perror("mem_alloc failed");
        exit(1);
    }
}

// Map GPU memory: allocate `pages` pages in memory group `group`, then mmap them
void* map_gpu(int mali_fd, unsigned int pages, bool read_only, int group) {
    union kbase_ioctl_mem_alloc alloc = {0};
    // The memory group ID is encoded in the allocation flags from bit 22 upwards
    alloc.in.flags = BASE_MEM_PROT_CPU_RD | BASE_MEM_PROT_GPU_RD | BASE_MEM_PROT_CPU_WR | (group << 22);
    // Start from a read-only CPU mapping; add write access only when requested
    int prot = PROT_READ;
    if (!read_only) {
        alloc.in.flags |= BASE_MEM_PROT_GPU_WR;
        prot |= PROT_WRITE;
    }
    alloc.in.va_pages = pages;
    alloc.in.commit_pages = pages;
    mem_alloc(mali_fd, &alloc);
    // The gpu_va returned by the allocation is passed as the mmap offset to select it
    void* region = mmap(NULL, 0x1000 * pages, prot, MAP_SHARED, mali_fd, alloc.out.gpu_va);
    if (region == MAP_FAILED) {
        perror("mmap failed");
        exit(1);
    }
    return region;
}

// Setup tracking page
void* setup_tracking_page(int fd) {
    void* region = mmap(NULL, 0x1000, 0, MAP_SHARED, fd, BASE_MEM_MAP_TRACKING_HANDLE);
    if (region == MAP_FAILED) {
        perror("setup tracking page failed");
        exit(1);
    }
    return region;
}

// Drain memory pool
void* drain_mem_pool(int mali_fd) {
    return map_gpu(mali_fd, POOL_SIZE, false, 1);
}