Part 4: Comprehensive Research of Linux Operating System
Mitigations: User and Kernel Modes
We already know that the Linux operating system is written almost entirely in C, and not just the operating system: many of the binaries that run on it are written in C as well. Although there are multiple reasons for this, the main one is that C is a very fast and powerful language. However, while speed and power are C’s big strengths, they come with disadvantages.
C is not a memory-safe language: a single programmer mistake can let an attacker interfere with memory. We saw this with the sudo vulnerability in the Vulnerability Analysis section. The ability to access and manipulate memory brings many problems with it, and Code Execution is one of the best known: an attacker with access to memory can manipulate the application’s resources in various ways, changing how the application works and turning it to malicious purposes. So, what do the Linux Operating System and compilers do about it?
User Mode
Since the problems brought by non-memory-safe languages are well known, operating systems take certain precautions against them. Let us examine the precautions used in User Mode.
-
Position Independent Executables (PIE)
An executable sits on disk while it is inactive and waits there until it is used. When it is executed, the operating system starts to allocate space for it in virtual memory: the dynamic linker, libc, the stack, and the application’s own machine code are mapped into this virtual address space, and the runtime environment is prepared. If no measures are taken, this allocation lands on the same addresses in every execution. In other words, every time the application and the libraries it needs are loaded, they end up at the same virtual addresses, so memory addresses effectively have static values: on every run, the same data can be found at the same addresses. Right here, I would like to mention a well-known exploitation method – Return Oriented Programming (ROP).
For ROP, we can simply say that by overwriting a return address, the attacker redirects the application to any desired place. If the executable is loaded at static addresses (that is, built without PIE), the success probability of a ROP attack increases: if the attacker knows the addresses the application runs at, and those addresses never change, exploitation is easy. PIE emerged as a solution to this. A position-independent executable can be loaded at a randomized base address each time it runs; the offsets within the binary do not change, but the base address does. Thus, every time the application runs, its addresses in virtual memory change (provided ASLR is enabled). Since the attacker no longer has static addresses to aim at, exploitation is made difficult and perhaps completely blocked.
-
Non-Executable Stack (NX)
NX, one of the User Mode mitigation methods, gives the application a non-executable stack area, as the name suggests. Data received from the user is typically stored on the stack until it is copied elsewhere. We talked about ROP earlier: through an overflow, the attacker can overwrite the return address, which effectively means redirecting the program’s control flow. If the stack is executable, the attacker can point the return address at malicious code written on the stack and run the shellcode they uploaded there. NX is one of the measures taken to prevent exactly this.
The starting point of this mitigation is the W^X (“write xor execute”) principle: no area of memory should be both writable and executable at the same time. With NX, when the application is mapped into virtual memory, the stack is mapped without execute permission; it can only be written and read. This way, an overflow alone is no longer sufficient for exploitation. On the other hand, the W^X principle that we see in NX is not only valid for the stack.
Modern operating systems and compilers pay particular attention to this principle: it is very important that no memory area allocated in virtual memory is writable and executable at the same time. However, exploitation techniques have evolved in response, and attackers can use methods such as ret2libc that need no executable attacker-controlled memory at all.
Another detail worth mentioning is that although such protections provide security, they do not guarantee 100% security. If the developer accidentally leaves enough bugs for the attacker, many of these methods can be rendered useless.
-
Stack Canary / Canaries
We know that many of the low-level vulnerabilities emerging today are somehow related to memory. Every application has its own virtual memory while it is running, and it uses this space for everything from the instructions it executes to the data it stores at runtime. If there is any unauthorized intervention in this area, the working structure of the application, or the data it stores, changes completely.
Let us discuss how this intervention is done. The well-known method is Buffer Overflow. How can attackers trigger Buffer Overflow?
The code where the application fills a fixed-size buffer is the most important target in this regard. If an area in virtual memory receives more data than it should, the excess overwrites the adjacent data and breaks its structure. This is where the Stack Canary mitigation comes into play.
Stack Canary is a stack protection method that guards against such overwrites by placing a known value between a function’s local buffers and its saved control data. For example, let’s say you have a 50-byte char array in your application, and a user can write data into it using scanf. If the programmer imposes no length restriction (a bare “%s”, for instance), the user can write more than 50 bytes of data into the array.
They can therefore overwrite places on the stack that they could not reach under normal conditions, and this brings many potentially critical problems with it. With Stack Canary, the compiler itself inserts a guard slot between the buffers created by the programmer and the function’s saved return address. When the function is called, this slot is filled with a random value chosen at program start. Just before the function returns, the slot is compared against that same random value. In this way, it can be inferred whether the buffers that sit below the canary, and are written by the user, were exposed to any overflow.
If an overflow has occurred, the canary value written at the start of the function must have changed. This check is made when exiting the function, and if something is wrong, the application is terminated. This provides protection against possible attacks. There are different types of stack canaries.
Although there are different variants such as the null canary, the 8-bit canary, and custom canaries, the working logic is the same: an additional canary value is placed between the function’s return address and its buffers, and on the way out of the function it is checked whether that value has changed. However, as with every protection method, there are also certain bypass techniques for stack canaries.
Another shortcoming of the stack canary is that it provides no protection against heap overflows. If the overflow occurs in a buffer allocated on the heap, the stack canary is useless.
-
RELRO (Relocation Read-Only)
Dynamically linked ELF binaries locate linked functions in virtual memory with the help of the PLT (Procedure Linkage Table) and the GOT (Global Offset Table). Such binaries do not contain the code of library functions themselves; they get these functions from libraries such as libc provided by the operating system they run on. This makes them much more portable and much smaller.
After the linked libc is loaded into their virtual memory, these binaries need a table for the functions they want to use; this is where the PLT and GOT come in. A call to a library function goes through its PLT stub, and the stub jumps through the corresponding GOT entry. If the function has not been called before, the dynamic linker determines where it is in memory and writes its address into the GOT; once written, the function is reached through the address in the GOT on every subsequent use.
This is the PLT and GOT procedure for a dynamically linked ELF binary. The source of the security problem here is that the GOT lives in memory that is both readable and writable. Because of this, it can serve as an arbitrarily readable and writable area during the exploitation phase, and different exploitation scenarios can be produced by changing the data in it, the classic GOT overwrite being one example.
RELRO exists to prevent such attacks, and it comes in two flavors. With Full RELRO, the addresses of all the library functions the binary uses are resolved from libc at the beginning of execution (lazy binding is disabled), and the entire GOT is then remapped read-only. This removes the GOT as an attack vector in a possible exploitation. With Partial RELRO, only sections such as .init_array, .fini_array, and the non-PLT portion of the GOT are made read-only; the part of the GOT used for lazy binding (.got.plt) stays writable. Since exploitation scenarios target exactly those writable GOT entries, Partial RELRO cannot provide protection as strong as Full RELRO.
Kernel Mode
We talked about mitigations for User Mode, but the Linux operating system also offers certain mitigations for Kernel Mode. The main purpose of the User Mode mitigations we covered is to prevent a vulnerable application running on the operating system from harming the operating system as a whole.
So, let’s say we have an application running on the operating system. Using a vulnerability discovered in this application, Code Execution can be obtained on the operating system, meaning a problem in a single application affects the entire system. With User Mode protection methods, that exploitation becomes much more difficult, or even impossible. There is, in effect, a chain of protection; but it must be noted that a break in any link can affect the whole chain and render the protection completely dysfunctional.
As we mentioned before, the Kernel is a huge body of code written in C. It works separately from User Mode: it has its own virtual memory, runs at the highest privilege level, and is capable of doing anything on the computer. In this context, the Linux Operating System has certain mitigations for Kernel Mode as well, aimed at preventing attacks that target the Kernel. The logic is the same as with User Mode mitigations: create a chain of protection and make exploitation difficult. In this section, we will address four of them.
-
ASLR (Address Space Layout Randomization)
Although ASLR could be counted among the User Mode mitigations, I wanted to cover it under Kernel Mode, because all applications running on the computer are under the Kernel’s control, and all their resources are provided by the operating system. Every application running on Linux has virtual memory, allocated by the operating system specifically for that application.
For example, when the application starts, the operating system maps the application’s machine code, segments such as the stack and the heap, and the required libraries into the application’s virtual memory. While the application is running, it uses these components. Each virtual memory space is allocated specifically to one application: under normal conditions, an application cannot access the virtual memory of another application. Only the Kernel can do that.
This virtual memory is of critical importance for attackers: through an overflow, an attacker who can manipulate the application’s virtual memory can jump anywhere in it with ROP. An attacker who can run arbitrary instruction sequences inside the application’s address space can then work toward operating-system-level Code Execution.
To prevent such attacks, ASLR is used in Linux and other modern operating systems. As the name suggests, ASLR offers a kind of randomization. As we said at the beginning, the virtual memory of applications is allocated by the operating system; with ASLR enabled, this allocation is done at randomized base addresses. In this way, the application’s memory regions land at different addresses on every run, even though the offsets within each region do not change.
This makes it difficult, or blocks it completely, for an attacker who discovers an overflow in the application to obtain Code Execution at the operating-system level. ASLR is used in almost all operating systems today and provides strong protection. Another feature that makes it valuable is that it is not a one-time protection: every time the application is started, its virtual memory is randomized again by ASLR.
-
KASLR (Kernel Address Space Layout Randomization)
As we mentioned before, the Kernel uses its own virtual memory. The working logic of KASLR is the same as that of ASLR: randomize the virtual memory used by the Kernel and thereby take a precaution against possible attacks. But there is an important detail worth mentioning.
We discussed that ASLR is not one-time protection. It can provide repeated randomization for applications running in User Mode. Each time the application is started, the virtual memory addresses will be randomized again and again.
However, the case is different for KASLR, because the Kernel is loaded during the boot phase and stays running until the computer shuts down. So KASLR provides one-time protection: while the computer boots, the Kernel’s virtual memory layout is randomized once, and the Kernel uses that layout until shutdown. Because of this one-time nature, KASLR is not considered as effective as ASLR.
Still, although it provides one-time protection, the main idea here is randomization. What deserves attention is that randomization exists at all, rather than how many times it is performed, because both ASLR and KASLR become completely useless after any info leak. At this point, the important thing is the existence of the mitigation.
Despite all the mitigations we have listed, exploitation is still possible in today’s operating systems. Mitigations cannot prevent exploitation completely; they just make it harder. And even though KASLR is considered a weak mitigation on its own, if we were using operating systems without KASLR today, we would be facing many more Kernel exploits.
-
SELinux (Security-Enhanced Linux)
Security-Enhanced Linux (SELinux) is a security feature that acts like a filter between User Mode and Kernel Mode in the Linux Operating System, a mandatory access control (MAC) mechanism. With this feature, which is active by default on Linux-based systems such as Red Hat Enterprise Linux, CentOS, Fedora, and Android, what applications can ask of Kernel Mode can be limited.
The basic logic here is security policies. Rules are defined, and with these rules SELinux decides what can and cannot be accessed through Kernel Mode; in this way, it prevents unauthorized access. For example, when an application tries to use a syscall it is not authorized for, or wants to access a blocked file, SELinux stops the attempt in line with those rules. As a result, harmful activity in the operating system can be prevented.
It should be mentioned that SELinux is one of the most effective mitigations in the Linux Operating System; with the correct configuration, it provides very strong protection. In Android especially, it is very difficult nowadays to go from memory corruption to code execution, and SELinux is one of the main reasons. Although SELinux on desktop and server systems sometimes interferes with the operation of legitimate applications, it would not be wrong to say that Android owes much of its strong protection to SELinux.
With this filtering between User Mode and Kernel Mode, even if malicious users somehow manipulate an application to the point of running arbitrary commands, SELinux can still contain the damage. Another point to note is that the rule sets we mentioned above must be implemented properly, and strictly where necessary.
-
Supervisor Mode Access Prevention / Supervisor Mode Execution Prevention (SMAP/SMEP)
As we discussed above, User Space and Kernel Space are two different modes that use different virtual memory spaces. Modern operating systems try to keep User Space and Kernel Space completely abstracted from each other, and the main reason for this is security.
The less a User Space application interacts with the Kernel, and the less data they exchange, the safer the environment. In this context, modern CPU vendors add certain hardware security features, and SMAP and SMEP are two of them. In general, their purpose is to completely prevent unauthorized crossings between Kernel Space and User Space.
Through specific APIs, the Kernel can receive data from the virtual memory of a User Space application it communicates with; “copy_from_user” is one example of such a function. Apart from that, if you try to access memory mapped as User Space with a shellcode-style operation from a Kernel with SMAP enabled, you will inevitably encounter error messages in the dmesg output, followed by a Kernel panic.
The main difference between SMAP and SMEP is that one came out later and provides more comprehensive protection. SMEP was released first. Its purpose is to prevent any User Mode code from being executed in Kernel Mode. In this context, SMEP killed the classic trick (ret2usr) in which a user-mode application abuses a Kernel vulnerability to make the Kernel jump into attacker-controlled user-space shellcode and run it with kernel privileges.
SMAP was developed in later years. The main purpose of SMAP, which offers a more comprehensive measure than SMEP, is to minimize unauthorized communication between User Space and Kernel Space as a whole. Not only execution: even reading data from User Space without going through an API is prevented by SMAP.
On the other hand, SMAP can be momentarily lifted by changing the AC flag in RFLAGS while the Kernel’s helper APIs run. For example, if the Kernel needs to receive data from User Space, helper APIs come into play: they set the AC flag for a short time and clear it again after copying the data. These operations are performed with the stac and clac instructions, which are Ring 0 specific and can only be executed from Kernel Space.
In our final Linux Research part, we will take a detailed look at some code examples for Syscall, Driver Entry, and Exploits per Mitigation.
Previous parts of the research can be found here Comprehensive Research of Linux OS.