Linux 5.9: New kernel version closes the license gap and supports FSGSBASE

Source: Heise.de, added 16 October 2020

After the exceptionally extensive update to Linux 5.8, Linus Torvalds had promised a “normal” release when he published the first release candidate (rc1) of 5.9. It turned out differently: because developers were still submitting a number of changes at rc7, Torvalds extended the 5.9 development phase by another week and one more release candidate.

A large share of the commits in 5.9 consists of new and improved drivers. With the final closing of a license gap and the now-complete FSGSBASE support, there are also a few eye-catchers. Under the hood, the new release improves real-time scheduling on asymmetric CPU configurations, memory management and thread prioritization.

License gap closed

Loadable kernel modules must declare, via the MODULE_LICENSE() macro, whether they are proprietary (closed source) or open source under the GNU General Public License (GPL). Because GPL modules get access to “GPL-only symbols” in the kernel that proprietary modules are denied, cheating has been common in the past. In particular, resourceful developers exploited a design gap in the kernel: GPL modules could depend on proprietary modules. Instead of placing the proprietary module under the GPL, these developers simply put an open-source, GPL-licensed shim module between the kernel and the proprietary module as glue and translator.

Linux 5.9 closes this gap. In the recent past, for example, Jonathan Lemon from Facebook tried to build a “GPL adapter module” between the proprietary Nvidia driver and the NetGPU core for performance reasons, an incident that may have contributed to the kernel developers' decision.
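
A small user-space model can make the rule concrete. It is a hypothetical simplification, not kernel code: `struct module_model`, `struct symbol_model` and `import_symbol()` are invented names. It assumes the behaviour as we understand the 5.9 change: a module that imports a symbol from a proprietary module inherits the proprietary taint, a module carrying that taint is refused GPL-only symbols, and a module that already uses GPL-only symbols may not import from a proprietary module at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of symbol resolution between modules.
 * Not kernel code: all names are invented for illustration. */
enum symbol_kind { SYM_NORMAL, SYM_GPL_ONLY };

struct module_model {
    const char *name;
    bool proprietary;    /* proprietary, by own license or inherited taint */
    bool uses_gpl_only;  /* has already imported a GPL-only symbol */
};

struct symbol_model {
    const char *name;
    enum symbol_kind kind;
    const struct module_model *owner;  /* NULL: exported by the core kernel */
};

/* Returns true if `importer` may link against `sym` and updates its state. */
bool import_symbol(struct module_model *importer, const struct symbol_model *sym)
{
    assert(importer != NULL && sym != NULL);

    if (sym->owner != NULL && sym->owner->proprietary) {
        /* A shim that already relies on GPL-only symbols is refused. */
        if (importer->uses_gpl_only)
            return false;
        importer->proprietary = true;  /* the taint is inherited */
    }

    if (sym->kind == SYM_GPL_ONLY) {
        if (importer->proprietary)
            return false;              /* proprietary code: no GPL-only symbols */
        importer->uses_gpl_only = true;
    }
    return true;
}
```

In this model a shim that first links against a GPL-only symbol is stopped when it reaches for the proprietary module, and a shim that links against the proprietary module first becomes proprietary itself and loses access to GPL-only symbols. Either way, the glue-module trick no longer works.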

FSGSBASE finally ready for use

Linux 5.9 brings a seemingly never-ending story to an end: the new version supports Intel's FSGSBASE instruction family. FSGSBASE comprises the instructions RDFSBASE, RDGSBASE, WRFSBASE and WRGSBASE, which read and write the base addresses of the FS and GS segment registers directly. What sounds like an insignificant detail reaches deep into the system and opens new horizons for Linux on x86_64 and for security-sensitive application scenarios.

Background: threads commonly use the FS register to address their thread-local storage. Each thread has its own FS base and can therefore transparently address its own memory. The thread does not have to know where that memory area actually lives: it applies its offsets to the base address held in FS. The situation is similar for the GS register, which the Linux kernel uses to manage per-CPU data.
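
The base-plus-offset idea can be shown with a toy model. The flat `memory` array, `struct thread_ctx`, `tls_store()` and `tls_load()` are hypothetical and only illustrate the addressing scheme; real programs simply declare thread-local variables (for example with C11 `_Thread_local`) and let the compiler and the C library set up FS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of FS-relative addressing: one flat memory array and a
 * per-thread base address. All names are hypothetical. */
enum { MEM_SIZE = 4096, TLS_BLOCK = 256 };

static uint8_t memory[MEM_SIZE];

struct thread_ctx {
    size_t fs_base;  /* per-thread base address, like the FS base register */
};

/* Every thread uses the same offset; only the base differs per thread. */
void tls_store(const struct thread_ctx *t, size_t offset, uint32_t value)
{
    assert(t->fs_base + offset + sizeof value <= MEM_SIZE);
    memcpy(memory + t->fs_base + offset, &value, sizeof value);
}

uint32_t tls_load(const struct thread_ctx *t, size_t offset)
{
    uint32_t value;
    assert(t->fs_base + offset + sizeof value <= MEM_SIZE);
    memcpy(&value, memory + t->fs_base + offset, sizeof value);
    return value;
}
```

Two contexts with bases `0` and `TLS_BLOCK` can both write to offset 16 without interfering, just as two threads can both access the same thread-local variable and still see their own copy.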

Changing these base addresses was previously reserved for privileged kernel code. If user space wants to change the FS or GS base, it has to take a detour through system calls and the associated context switches, which hurts performance. What is negligible when the FS register is set once per thread can become a drag in modern application scenarios. Intel therefore introduced the FSGSBASE instructions in 2012 with the third generation of its Core processors (code name “Ivy Bridge”). They allow FS and GS to be changed directly from user space, removing the system-call overhead. However, the kernel must explicitly set a special bit (FSGSBASE in control register CR4) to enable the instructions.
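
The difference between the two paths can be sketched in user space. This is a minimal sketch assuming x86_64 Linux: since 5.9 the kernel reports enabled FSGSBASE support through the `HWCAP2_FSGSBASE` bit in the `AT_HWCAP2` auxiliary-vector entry, and without it the FS base has to be fetched with the `arch_prctl` system call. The helper names `fsgsbase_enabled()` and `read_fs_base()` are our own; the constants are copied from the kernel's uapi headers so the sketch does not need them installed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__linux__)
#include <sys/auxv.h>     /* getauxval() */
#include <sys/syscall.h>  /* SYS_arch_prctl */
#include <unistd.h>       /* syscall() */

#ifndef AT_HWCAP2
#define AT_HWCAP2 26
#endif
#define FSGSBASE_HWCAP2_BIT (1UL << 1)  /* HWCAP2_FSGSBASE, asm/hwcap2.h */
#define ARCH_GET_FS_CODE    0x1003      /* ARCH_GET_FS, asm/prctl.h */
#endif

/* True if the running kernel has enabled FSGSBASE for user space. */
bool fsgsbase_enabled(void)
{
#if defined(__x86_64__) && defined(__linux__)
    return (getauxval(AT_HWCAP2) & FSGSBASE_HWCAP2_BIT) != 0;
#else
    return false;
#endif
}

/* Reads the FS base: directly with RDFSBASE when the kernel allows it,
 * otherwise through the arch_prctl system call. Returns 0 on other
 * platforms or if the system call fails. */
uint64_t read_fs_base(void)
{
    uint64_t base = 0;
#if defined(__x86_64__) && defined(__linux__)
    if (fsgsbase_enabled()) {
        /* Executing this without kernel support would raise SIGILL. */
        __asm__ volatile("rdfsbase %0" : "=r"(base));
    } else if (syscall(SYS_arch_prctl, ARCH_GET_FS_CODE, &base) != 0) {
        base = 0;
    }
#endif
    return base;
}
```

On x86_64 Linux the result is the address of the thread's control block set up by the C library, so it is non-zero; on kernels before 5.9 or CPUs without the feature, the function quietly takes the slower system-call path.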

The kernel relies on FS and GS being set correctly whenever it is entered from user space. A manipulated GS in particular could have fatal consequences: since the kernel uses GS to locate per-CPU data, an attacker could slip false data to the kernel and build attack scenarios on that basis. Making the kernel ready for FSGSBASE was therefore a long process: between 2012 and 2019, Intel submitted patches in seven versions, none of which made it into practice.

SGX done properly

The FSGSBASE support in Linux 5.9 also benefits SGX projects such as the prominent, Intel-backed Graphene project.

Intel's Software Guard Extensions (SGX) enable the creation of enclaves: memory areas that the CPU seals off using transparent encryption and integrity protection. The CPU can even keep privileged processes from accessing these enclaves. SGX thus allows code to run securely and without interference even on a system that is already compromised.

SGX projects depend on a fast way of setting FS and GS from user space. Until now, Graphene loaded its own small kernel module that enabled FSGSBASE as a stopgap. Such special paths carry security risks and, fortunately, will no longer be necessary: thanks to the official FSGSBASE support from the kernel team, SGX systems can be implemented securely, efficiently and, above all, in a controlled way.

Flexible IP/port combinations

Linux 5.9 adds a new Berkeley Packet Filter (BPF) program type called BPF_PROG_TYPE_SK_LOOKUP. Such programs run when the transport layer looks up a listening socket for an incoming packet. This happens when a new TCP connection request arrives or when a UDP packet arrives for an unconnected socket. At that point, the BPF program can flexibly decide which socket receives which packet.
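
The decision such a program makes can be modelled in plain C. A real sk_lookup program is compiled to BPF, receives a `struct bpf_sk_lookup` context and selects a socket with the `bpf_sk_assign()` helper, typically taken from a socket map; none of that is shown here. The rule table, `struct lookup_rule` and `sk_lookup_model()` are hypothetical and only illustrate the dispatch logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule: packets for `ip` (host byte order, 0 = any address)
 * whose destination port lies in [port_lo, port_hi] go to `sock_id`. */
struct lookup_rule {
    uint32_t ip;
    uint16_t port_lo, port_hi;
    int sock_id;
};

/* Returns the socket chosen by the first matching rule, or -1 to fall
 * back to the regular lookup based on bind(). */
int sk_lookup_model(const struct lookup_rule *rules, size_t n,
                    uint32_t dst_ip, uint16_t dst_port)
{
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct lookup_rule *r = &rules[i];
        if ((r->ip == 0 || r->ip == dst_ip) &&
            dst_port >= r->port_lo && dst_port <= r->port_hi)
            return r->sock_id;
    }
    return -1;
}
```

With a rule `{ 0xC0000201, 7000, 7999, 3 }`, every connection to 192.0.2.1 on ports 7000 to 7999 lands on socket 3, something a single bind() call cannot express.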

This lifts the restrictions of the traditional bind() API and allows more flexible combinations of IP addresses and ports. One possible scenario is a socket that listens on an IP address for a port range, or even all ports, instead of a single port. Another is services on different IP addresses that share a port. Because bind() ties each socket to one address and one port, such setups are otherwise not possible.
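
The bind() restriction is easy to observe from user space. The sketch below uses plain POSIX sockets on Linux; the helper name `bind_loopback()` is our own. It binds one TCP socket to a kernel-chosen loopback port and shows that a second socket cannot bind the same address and port, so covering a port range this way would require one socket per port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Binds a new TCP socket to 127.0.0.1:port (0 = let the kernel choose).
 * Returns the socket fd and stores the bound port in *bound_port,
 * or returns -errno on failure. */
int bind_loopback(uint16_t port, uint16_t *bound_port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        int err = errno;
        close(fd);
        return -err;
    }

    /* Ask the kernel which port was actually assigned. */
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) != 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    if (bound_port != NULL)
        *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

Neither socket sets SO_REUSEADDR or SO_REUSEPORT here, so the second bind() on the same address and port fails with `EADDRINUSE`.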

ZSTD compression

New in 5.9 is the option to compress the kernel and the initrd (initramfs) with ZSTD (Zstandard). ZSTD combines high compression ratios with very fast decompression, and the latter can significantly speed up the boot process.

The kernel developers cite figures from Facebook as a reference: when the company switched from an xz-compressed initrd to a ZSTD-compressed one, decompression time at boot dropped from twelve to three seconds. Switching the compressed kernel image from xz to ZSTD saved the company two seconds of boot time, according to the kernel team.
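
Expressed as arithmetic, Facebook's initrd figures amount to a fourfold speed-up of decompression, or a 75 percent cut in decompression time:

```latex
\text{speed-up} = \frac{12\,\mathrm{s}}{3\,\mathrm{s}} = 4,
\qquad
\text{reduction} = \frac{12\,\mathrm{s} - 3\,\mathrm{s}}{12\,\mathrm{s}} = 75\,\%
```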

Asymmetry and real-time scheduling

Thanks to a patch by developer Dietmar Eggemann, the deadline scheduler for real-time tasks in Linux 5.9 can now handle asymmetric CPU configurations.

Unlike the POSIX real-time schedulers (SCHED_FIFO and SCHED_RR), which assign CPU time to tasks based on priority levels, the deadline scheduler does not work with priorities. Instead, it evaluates each task by its required run time, its activation period and its “deadline”, the time span within which the task should be completed. From these values, the kernel determines which task needs the CPU and when.
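
For illustration, this is how a program hands those three values to the deadline scheduler via the sched_setattr() system call and the SCHED_DEADLINE policy (available since Linux 3.14). It is a minimal sketch assuming Linux: `struct dl_sched_attr` is a local copy of the kernel's `struct sched_attr` layout under a hypothetical name (to avoid clashing with libc headers that may define their own), and `make_deadline_attr()` and `apply_deadline_attr()` are invented helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* First-version layout of the kernel's struct sched_attr (48 bytes). */
struct dl_sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;   /* ns of CPU time needed per period */
    uint64_t sched_deadline;  /* ns after activation by which to finish */
    uint64_t sched_period;    /* ns between activations */
};

#define POLICY_SCHED_DEADLINE 6   /* SCHED_DEADLINE in <linux/sched.h> */

/* Fills *attr. Returns false unless 0 < runtime <= deadline <= period,
 * the basic ordering the kernel requires; other limits are not checked. */
bool make_deadline_attr(struct dl_sched_attr *attr, uint64_t runtime_ns,
                        uint64_t deadline_ns, uint64_t period_ns)
{
    assert(attr != NULL);
    if (runtime_ns == 0 || runtime_ns > deadline_ns || deadline_ns > period_ns)
        return false;
    *attr = (struct dl_sched_attr){
        .size = sizeof *attr,
        .sched_policy = POLICY_SCHED_DEADLINE,
        .sched_runtime = runtime_ns,
        .sched_deadline = deadline_ns,
        .sched_period = period_ns,
    };
    return true;
}

/* Applies the parameters to the calling thread; returns 0 or -errno. */
int apply_deadline_attr(const struct dl_sched_attr *attr)
{
#ifdef SYS_sched_setattr
    if (syscall(SYS_sched_setattr, 0, attr, 0) != 0)
        return -errno;
    return 0;
#else
    (void)attr;
    return -ENOSYS;
#endif
}
```

For example, `make_deadline_attr(&a, 10000000, 30000000, 100000000)` describes a task that needs 10 ms of CPU time in every 100 ms period, completed within 30 ms of each activation. Applying it requires root or CAP_SYS_NICE; unprivileged callers typically get `-EPERM`.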

Until now, this only worked reliably on symmetric CPU configurations. On asymmetric configurations, which combine CPUs of different performance in one system (Arm's big.LITTLE designs are an example), the deadline scheduler could stumble under heavy load. The new version changes that: the scheduler now knows how to handle such configurations.

Capacity-based scheduling

Eggemann's patch introduces a capacity-based calculation model: instead of assuming a uniform CPU capacity when evaluating a task's deadline, it takes into account the capacity each CPU actually provides, which may differ from CPU to CPU. As a result, deadlines can be evaluated correctly on an asymmetric system and tasks can be distributed precisely.
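
The core of such a check can be sketched as follows. To our understanding, the kernel's test (dl_task_fits_capacity()) scales the task's deadline by the CPU's capacity, where 1024 stands for the most powerful CPU, and compares the result with the task's runtime. The code below is a simplified user-space model of that idea, not the kernel implementation; `dl_fits_capacity()` and `pick_dl_cpu()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAPACITY_SHIFT 10                     /* full capacity = 1024 */
#define CAPACITY_FULL  (1UL << CAPACITY_SHIFT)

/* Scales a time span by a CPU's relative capacity. */
static uint64_t cap_scale(uint64_t time_ns, unsigned long capacity)
{
    return (time_ns * capacity) >> CAPACITY_SHIFT;
}

/* A task fits a CPU if the share of its deadline that this CPU can
 * deliver (measured at full capacity) covers the task's runtime. */
bool dl_fits_capacity(uint64_t runtime_ns, uint64_t deadline_ns,
                      unsigned long capacity)
{
    assert(capacity <= CAPACITY_FULL);
    return cap_scale(deadline_ns, capacity) >= runtime_ns;
}

/* Returns the index of the first CPU the task fits on, or -1. */
int pick_dl_cpu(const unsigned long *caps, size_t ncpus,
                uint64_t runtime_ns, uint64_t deadline_ns)
{
    for (size_t i = 0; i < ncpus; i++)
        if (dl_fits_capacity(runtime_ns, deadline_ns, caps[i]))
            return (int)i;
    return -1;
}
```

A task needing 10 ms within a 30 ms deadline does not fit a CPU with capacity 256 (30 ms × 256/1024 = 7.5 ms), but fits the full-capacity CPU; a uniform-capacity model would have treated both CPUs as equally suitable.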

However, the new solution assumes that at least one CPU is not busy executing deadline tasks. Otherwise, task distribution can still get out of hand, because there may not be enough CPU capacity left to compute the distribution itself. The developers want to address this problem, which affects high-performance and heavily loaded systems, at a later date.

A further open issue is that powerful CPUs can become loaded with small tasks. This can lead to a kind of “fragmentation” of CPU capacity: a larger task may then find no CPU that can provide the computing capacity it needs within its deadline. Solutions for this are being considered but have not yet been implemented.
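
The fragmentation effect can be reproduced with a toy model. It is hypothetical: the kernel's deadline admission control and task placement work differently, and capacity is counted here in abstract units per CPU. `place_task()` and the two strategies are invented names; the "least room that fits" strategy is just one conceivable remedy, not the solution the kernel developers are planning.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: room[i] is the capacity still free on CPU i; a task needs
 * `demand` units on a single CPU. Not the kernel's algorithm. */
enum placement { MOST_ROOM, LEAST_ROOM_THAT_FITS };

/* Places one task; returns the CPU index used, or -1 if none has room. */
int place_task(unsigned *room, size_t ncpus, unsigned demand,
               enum placement how)
{
    int best = -1;
    for (size_t i = 0; i < ncpus; i++) {
        if (room[i] < demand)
            continue;
        if (best < 0 ||
            (how == MOST_ROOM && room[i] > room[best]) ||
            (how == LEAST_ROOM_THAT_FITS && room[i] < room[best]))
            best = (int)i;
    }
    if (best >= 0)
        room[best] -= demand;  /* reserve the capacity on the chosen CPU */
    return best;
}
```

With one big CPU (1024 units) and two small ones (512 each), three tasks of 200 units placed on the CPU with the most room all land on the big core and leave it 424 units, so a later 900-unit task fits nowhere; placing them on the tightest CPU that still fits keeps the big core free for it.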



Read the full article at Heise.de
