Linux And Android Waste Coding Effort
Written by Harry Fairhead
Wednesday, 14 September 2022
For many years it has been standard practice to test that you get the memory you ask for, but it has all been a huge waste of time. Operating systems get in on the act before you have a chance to do anything about it.
We try to write code that behaves well - or most of us do. One particular catastrophe that we have all been schooled in avoiding is running out of memory. A C/C++ programmer uses the malloc function to allocate memory. The function usually returns a pointer to the memory requested, but if there isn't enough memory it returns NULL. So we have all been writing code that checks the result of every malloc call for NULL and handles the failure.
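As a minimal sketch of that familiar pattern, and not code taken from the paper, the check might look like the following. The helper name `make_buffer` and the error message are hypothetical, chosen only for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n ints.
   Returns NULL if the request cannot be satisfied. */
int *make_buffer(size_t n)
{
    int *buf = NULL;

    /* Only call malloc if n * sizeof(int) does not overflow size_t. */
    if (n <= SIZE_MAX / sizeof *buf)
        buf = malloc(n * sizeof *buf);

    if (buf == NULL) {
        /* The error-handling branch we have all been taught to write. */
        fprintf(stderr, "out of memory allocating %zu ints\n", n);
        return NULL;
    }
    return buf;
}
```

The caller is expected to test the result for NULL as well and to release the buffer with `free` when it is done.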
This is so standard that it's a copy-and-pastable code snippet.
Now for the shock.
Malloc almost never returns a NULL, even if there is no memory available! In short, all that error-handling code is wasted code.
The point is that operating systems are in the business of allocating memory and they monitor the entire global system. Your program running out of memory is a small consideration compared with what it implies: the entire machine, the operating system and all of the programs it is looking after are at risk. The answer to the problem is the OOM - Out Of Memory - killer. This is a monitor that checks to see if an application is about to use more memory than the system has. If this is the case then it kills the process and hence frees up memory to keep the whole thing going.
OOM killers generally use heuristics to work out which processes to kill along with the one that actually precipitated the crisis. Usually memory-hungry programs and low-priority programs are selected, but it is difficult to predict the collateral damage from an OOM killer. This in itself is claimed to be a disadvantage of the approach in that the heuristic isn't designed to be fair in any sense.
The big problem is that the OOM killer doesn't give the process any scope for gracefully handling the problem. That is, the user could lose work as the result of running out of memory.
Some recent research, "When malloc() Never Returns NULL - Reliability as an Illusion", to be presented at ISSRE 2022, suggests that this is a bigger problem than you might think and needs more attention. Gunnar Kudrjavets, a PhD candidate at the University of Groningen, and his colleagues tried an experiment to see what actually happens under different operating systems and discovered that only Windows allowed malloc() to return NULL. Linux, Android, FreeBSD, iOS and macOS all killed the process, so malloc never got the chance to return NULL. In the case of all of these operating systems the process consuming the memory was terminated along with others that fitted the heuristics being applied by the OOM killer. Why Windows? And is this a good thing or a bad thing? Is it that Windows is simply slow to adopt an OOM killer?
Basically, you cannot rely on testing for NULL when you ask for more memory. The paper outlines a number of strategies for overcoming the problem, including monitoring memory availability before requesting an allocation.
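To give a feel for what monitoring availability before allocating could look like, here is a rough sketch. It is not the paper's implementation. It assumes Linux with a C library that supports `sysconf(_SC_AVPHYS_PAGES)`, and the helper names `available_bytes` and `cautious_malloc`, along with the `reserve` parameter, are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: bytes of physical memory currently free,
   or 0 if the figure cannot be determined.
   Assumption: Linux, where _SC_AVPHYS_PAGES is supported. */
size_t available_bytes(void)
{
    long pages = sysconf(_SC_AVPHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0)
        return 0;
    /* Saturate rather than overflow on very large machines. */
    if ((uintmax_t)pages > SIZE_MAX / (uintmax_t)page_size)
        return SIZE_MAX;
    return (size_t)pages * (size_t)page_size;
}

/* Hypothetical helper: refuse any request that would leave fewer than
   `reserve` bytes free. Also refuses non-empty requests when
   availability is unknown. */
void *cautious_malloc(size_t size, size_t reserve)
{
    size_t avail = available_bytes();

    if (avail < reserve || size > avail - reserve)
        return NULL;   /* the caller can now degrade gracefully */
    return malloc(size);
}
```

A check like this is only advisory. Another process can consume the memory between the query and the allocation, and free pages are a conservative measure because they ignore reclaimable cache. It can still give a program the chance to save the user's work before the OOM killer steps in.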
Its conclusion is:
"Universally checking the result of a request to allocate memory has been a standard practice for decades. Our recommendation to ignore that guidance on a subset of OSs is clearly contrarian. However, software development practices need to adapt to a new reality. That new reality means, for example, in the case of popular mobile OSs such as Android and iOS, an application is not in control of what happens in case of an OOM event. The typical desktop applications that execute in non-administrative mode have the same limitations. They cannot change the OS settings, query the details about the memory usage of other applications, and cannot circumvent an official OOM killer to prolong their existence. As a result, all the code that is supposed to execute when an OOM condition happens will never run. Therefore, there is no reason for that code to be present."
We all need to remember that the interaction of our code with the operating system is more complex than we usually assume and asking for a resource may result in termination without the opportunity to handle the problem.
There must be a better way.
Gunnar Kudrjavets (University of Groningen), Jeff Thomas (Meta Platforms, Inc.), Aditya Kumar (Snap, Inc.), Nachiappan Nagappan (Meta Platforms, Inc.) and Ayushi Rastogi (University of Groningen)
Last Updated ( Friday, 16 September 2022 )