Ingo Molnar

Removing the Big Kernel Lock

"As some of the latency junkies on lkml already know, commit 8e3e076 in v2.6.26-rc2 removed the preemptible BKL feature and made the Big Kernel Lock a spinlock and thus turned it into non-preemptible code again. This commit returned the BKL code to the 2.6.7 state of affairs in essence," began Ingo Molnar. He noted that this had a very negative effect on the real-time kernel efforts, adding that Linux creator Linus Torvalds had indicated that the only acceptable way forward was to remove the BKL completely. Ingo explained:

"This task is not easy at all. 12 years after Linux has been converted to an SMP OS we still have 1300+ legacy BKL using sites. There are 400+ lock_kernel() critical sections and 800+ ioctls. They are spread out across rather difficult areas of often legacy code that few people understand and few people dare to touch. It takes top people like Alan Cox to map the semantics and to remove BKL code, and even for Alan (who is doing this for the TTY code) it is a long and difficult task."

Defaulting To 4K Stacks

Andrew Morton replied to a commit message making 4K stacks the default, saying, "this patch will cause kernels to crash." Ingo Molnar replied, "what mainline kernels crash and how will they crash? Fedora and other distros have had 4K stacks enabled for years." He added, "we've conducted tens of thousands of bootup tests with all sorts of drivers and kernel options enabled and have yet to see a single crash due to 4K stacks." During the lengthy discussion it was suggested that nfs+xfs+raid kernel configurations and the use of ndiswrapper are the most common causes of 4K stack overflows.

2.6.25, "Long Promised"

"It's been long promised, but there it is now," began Linux creator Linus Torvalds, announcing the 2.6.25 Linux kernel. He continued, "special thanks to Ingo who found and fixed a nasty-looking regression that turned out to not be a regression at all, but an old bug that just had not been triggering as reliably before. That said, that was just the last particular regression fix I was holding things up for, and it's not like there weren't a lot of other fixes too, they just didn't end up being the final things that triggered my particular worries." Linus added:

Memory Corruption Bug Solved, 2.6.25 Expected Today

"Finally found it ... the patch below solves the sparsemem crash and the test system boots up fine now," announced Ingo Molnar. He described the patch as fixing a "memory corruption and crash on 32-bit x86 systems. If a !PAE x86 kernel is booted on a 32-bit system with more than 4GB of RAM, then we call memory_present() with a start/end that goes outside the scope of MAX_PHYSMEM_BITS." He included a source snippet with the loop that could corrupt memory, "depending on what that memory is, we might crash, misbehave or just not notice the bug." Ingo went on to note that the bug was first introduced with sparsemem support in the 2.6.16 kernel:

"I believe this was the reason why my many bisection attempts were unsuccessful: the bug pattern was not stable and seemingly working kernels had the memory corruption too. It was pure luck that v2.6.24 'worked' and v2.6.25-rc9 broke visibly."
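
The failure mode described above, a loop that marks memory sections present for a page-frame range reaching past what the section table covers, can be illustrated with a small user-space sketch. This is not the patch Ingo posted: the table size, the shift value, and the names `mark_present`, `section_present`, `SECTION_SHIFT`, `NR_SECTIONS` and `MAX_PFN` are all assumptions made for illustration. The point is only that the range is checked and clamped before the loop indexes the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for illustration; not the kernel's real constants. */
#define SECTION_SHIFT 4u                      /* 16 page frames per section */
#define NR_SECTIONS   8u                      /* entries in the section table */
#define MAX_PFN ((unsigned long)NR_SECTIONS << SECTION_SHIFT)

static unsigned char section_present[NR_SECTIONS];

/*
 * Mark every section overlapping the page-frame range [start, end) as
 * present. Returns the number of sections newly marked.
 */
static unsigned mark_present(unsigned long start, unsigned long end)
{
    unsigned marked = 0;

    /* Sanity checks: never index past the end of the section table. */
    if (start >= MAX_PFN)
        return 0;
    if (end > MAX_PFN)
        end = MAX_PFN;

    /* Round start down to a section boundary so no section is skipped. */
    start &= ~((1ul << SECTION_SHIFT) - 1);

    for (unsigned long pfn = start; pfn < end; pfn += 1ul << SECTION_SHIFT) {
        unsigned long section = pfn >> SECTION_SHIFT;

        assert(section < NR_SECTIONS);
        if (!section_present[section]) {
            section_present[section] = 1;
            marked++;
        }
    }
    return marked;
}
```

Without the two checks at the top, a call such as `mark_present(100, 1000)` would write well past the eight-entry table, which is the kind of silent corruption the quote describes.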

Kgdb Light

"While this is probably one of the last days of the merge window, please still consider pulling the 'kgdb light' git tree," began Ingo Molnar, explaining:

"This is a slimmed-down and cleaned up version of KGDB that i've created out of the original patches that we submitted two weeks ago. I went over the kgdb patches with Thomas and we cut out everything that we did not like, and cleaned up the result. KGDB is still just as functional as it was before (i tested it on 32-bit and 64-bit x86) - and any desired extra capability or complexity should be added as a delta improvement, not in this initial merge."

Debugging With kmemcheck

"With a lot of help from Ingo Molnar and Pekka Enberg over the last couple of weeks, we've been able to produce a new version of kmemcheck!" announced Vegard Nossum, adding, "the current version of the patch boots on real hardware, but we've seen freezes on some machines, so it's not perfect yet. (In other words, this patch is HIGHLY experimental, and run at your own risk, etc.)". He also offered a high-level summary of the patch:

"kmemcheck is a patch to the linux kernel that detects use of uninitialized memory. It does this by trapping every read and write to memory that was allocated dynamically (e.g. using kmalloc()). If a memory address is read that has not previously been written to, a message is printed to the kernel log."
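
The idea in that summary, tracking which bytes of a dynamically allocated buffer have been written and reporting reads of bytes that have not, can be sketched in user space with a shadow array. This is only an approximation under stated assumptions: the summary says kmemcheck traps the accesses itself, whereas this sketch requires every access to go through explicit helper functions. The names `tracked_buf`, `tracked_alloc`, `tracked_write`, `tracked_read` and `tracked_free` are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tracked buffer: data plus one shadow byte per data byte. */
struct tracked_buf {
    unsigned char *data;
    unsigned char *shadow;   /* 0 = never written, 1 = written */
    size_t size;
};

/* Allocate a buffer whose bytes all start out as "uninitialized". */
static bool tracked_alloc(struct tracked_buf *b, size_t size)
{
    b->data = malloc(size);
    b->shadow = calloc(size, 1);
    b->size = size;
    if (!b->data || !b->shadow) {
        free(b->data);
        free(b->shadow);
        b->data = b->shadow = NULL;
        b->size = 0;
        return false;
    }
    return true;
}

/* A write stores the value and marks the byte as initialized. */
static void tracked_write(struct tracked_buf *b, size_t off, unsigned char v)
{
    assert(off < b->size);
    b->data[off] = v;
    b->shadow[off] = 1;
}

/*
 * A read succeeds only if the byte was written earlier; otherwise a
 * message is logged and false is returned without touching *out.
 */
static bool tracked_read(const struct tracked_buf *b, size_t off,
                         unsigned char *out)
{
    assert(off < b->size);
    if (!b->shadow[off]) {
        fprintf(stderr, "read of uninitialized byte at offset %zu\n", off);
        return false;
    }
    *out = b->data[off];
    return true;
}

static void tracked_free(struct tracked_buf *b)
{
    free(b->data);
    free(b->shadow);
    b->data = b->shadow = NULL;
    b->size = 0;
}
```

One shadow byte per data byte doubles the memory cost, which is acceptable in a sketch but is a design trade-off any real checker has to weigh.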

x86 Architecture Merges in 2.6.25

Ingo Molnar summarized his pull request for changes to the x86 architecture bound for mainline inclusion in 2.6.25 noting, "it's not a small merge, it consists of 908 commits from 96 individual arch/x86 developers (!)". He continued, "a number of core files are changed as well: most notably percpu, debugging details, timers, the firewire remote debugging patch and ... the KGDB remote debugging stub in kernel/kgdb.c." He went on to detail the extent of the testing this tree has received, "in the past few weeks tens of thousands of random x86.git bzImages were successfully built and booted on a number of (commodity) 32-bit and 64-bit testsystems - and there has been a fair amount of test exposure on -mm as well." Regarding the remote kernel debugger, Ingo explained:

Scheduler Merges for 2.6.25

Ingo Molnar posted a merge request for the latest git scheduler tree summarizing, "it contains various enhancements to the scheduler - find the full shortlog is below. 96 commits from 19 authors - scheduler developers have been busy again. :-/" He added, "the scheduling behavior of the kernel to normal users should not change over v2.6.24, but there are a good number of new features and enhancements under the hood." Ingo went on to list a number of these new features, including:

"Various instrumentation and debugging enhancements from Arjan van de Ven; Peter Zijlstra's RT time limit and RT throttling code for the RT scheduling class; Paul E. McKenney's preemptible RCU code; refcount based CPU-hotplug rework by Gautham R Shenoy; there's serious interest in running RT tasks on enterprise-class hardware, so Steven Rostedt and Gregory Haskins wrote a large number of enhancements to the RT scheduling class and load-balancer; Peter Zijlstra's high-resolution scheduler tick code; [...] and a good number of other, smaller enhancements."

x86 Architecture Changes Merging in 2.6.25

The final 2.6.24 Linux kernel is expected any day now, so the various subsystem maintainers have begun summarizing what changes are expected to be merged into the mainline kernel during the 2.6.25 merge window. Ingo Molnar spoke to changes for the x86 architecture, "there are 763 commits in x86.git so far, from more than 90 contributors, so it would be difficult to mention and credit every contribution in this mail." Along with a lengthy list of other changes, he included:

"Continued, intense arch/x86 unification and cleanup work by lots of people; FIFO ticket spinlocks for better spinlock scalability; 'regset' generalizations - the most important step towards utrace support (==next-gen ptrace); support for more than 255 CPUs [up to 4096 - in theory up to 65535]; almost complete 64-bit paravirt guest support; KGDB support on x86, finally!"
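
For readers unfamiliar with the "FIFO ticket spinlocks" item in that list, the general ticket-lock technique can be sketched in portable C11. This is a user-space illustration of the idea only, not the x86 kernel code: the type `ticket_lock` and the functions `ticket_lock_init`, `ticket_lock_acquire` and `ticket_lock_release` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space ticket lock; names are illustrative only. */
struct ticket_lock {
    atomic_uint next;    /* next ticket number to hand out */
    atomic_uint owner;   /* ticket number currently being served */
};

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/*
 * Take a ticket, then spin until that ticket is being served.
 * Returns the ticket number, which makes the FIFO order observable.
 */
static unsigned ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned me = atomic_fetch_add_explicit(&l->next, 1,
                                            memory_order_relaxed);

    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;   /* busy-wait until our turn comes */
    return me;
}

/* Serve the next ticket in line. */
static void ticket_lock_release(struct ticket_lock *l)
{
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

Because waiters are served strictly in the order they took tickets, no thread can be starved by later arrivals, which is the fairness property the "FIFO" in the name refers to.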

Scheduler Fixes

Ingo Molnar sent a merge request to Linus Torvalds for the latest CFS fixes. CFS, the Completely Fair Scheduler, was merged into the mainline Linux kernel in July of 2007. It was first included in the 2.6.23 kernel, released in October of 2007. The scheduler appears to be stabilizing quickly, as shown by the small set of fixes contained in the latest source code push. Ingo Molnar summarized the changes:

Memory Management Improvements

A recent report on lkml suggested improved IO/writeback performance in the recently released 2.6.24-rc1 kernel compared to the earlier 2.6.19.2 and 2.6.22.6 kernels. Credit was given to some patches by Peter Zijlstra. Ingo Molnar replied, "wow, really nice results! Peter does know how to make stuff fast :) Now lets pick up some of Peter's other, previously discarded patches as well :-)" He pointed to several patches "as a starter", then quipped, "I think the MM should get out of deep-feature-freeze mode - there's tons of room to improve :-/"

Unified x86 Architecture Code Quality

"Can we please finish up this merge a little more before we freeze 2.6.24?

Checkpatch --strict Mode

"[The] latest checkpatch.pl works really well on sched.c," commented Ingo Molnar, noting considerable improvements since the last release of the script. Andy Whitcroft recently released version 0.11 of the script, "this version brings a more cautious checkpatch.pl by default. The more subjective checks are only applied with the --strict option. It also brings the usual slew of corrections for false positives."