Lab 13.1 Invoking the OOM Killer

rroberts · December 2020

Good afternoon,
Lab 13.1 has you turn off swap and then run stress-ng -m 12 -t 10s to fill memory and invoke the OOM killer. I tried this on an antiX VM with 3 GB of memory and monitored dmesg, /var/log/messages, and /var/log/syslog. I could see stress-ng run, but I don't see anything about the OOM killer being called or any processes being stopped. What should I see?

Lab Text:

You should see the OOM (Out of Memory) killer swoop in and try to kill processes in a struggle to stay alive. You can see what is going on by running dmesg or monitoring /var/log/messages or /var/log/syslog, or through graphical interfaces that expose the system logs.
Who gets clobbered first?

Thanks in advance!

schuam · December 2020

Hello rroberts,

In order to see what's going on, I used two terminals. In one of them I ran:

free -s 1 -h

which shows you how big your memory is and how much is used/free. With -s 1 it updates every second.

In the second terminal I ran:

stress -m 12 -t 10

I did that on Arch Linux, hence stress rather than stress-ng, but I think stress-ng does pretty much the same.

Now (in the first terminal) I could observe the free part of the memory shrinking while stress ran. -m 12 didn't stress my system enough to invoke the OOM killer, so I kept increasing the number (eventually to 40). At that point stress tried to allocate so much memory that the OOM killer kicked in and killed the process. Afterwards I ran dmesg and the last few lines looked like this:

[  767.925606] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-1.scope,task=stress,pid=1433,uid=1000
[  767.925620] Out of memory: Killed process 1433 (stress) total-vm:265804kB, anon-rss:222856kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:496kB oom_score_adj:0
[  767.973170] oom_reaper: reaped process 1433 (stress), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

It tells you that the OOM killer killed the stress process.

Well, I hope that helps.

rroberts · December 2020

Weird. I ran it on both my antiX and OpenSUSE VMs and even at -m 8000 I could never get it to make a dent in the memory. I tried it on my actual PC after that and got the OOM Killer to fire, so I'm wondering if VirtualBox has some safeguards in place or something.

Anyway, thanks for the feedback! I didn't even know what I was supposed to be looking for.

coop · December 2020

I confess I don't know much about "antiX linux" as it is not one we support for this course, although I don't see why it would matter.

As an alternative to using stress, try the attached C program, lab_wastemem.c. Compile it with gcc -o lab_wastemem lab_wastemem.c and then run it as ./lab_wastemem 8096, which would waste 8 GB of memory. (Put in a number high enough to run out of memory.) If you put in way too much it may not even run. It will terminate itself after a while no matter what. In another window I recommend running gnome-system-monitor and watching the memory usage. (It will also be easier to trigger if you do "sudo swapoff -a" first.)

This tool does only the memory stress and is simpler than stress. We used to give it out, but too many students had to be walked through compiling and running the program, so there were signal/noise ratio problems.

(Sorry, this software doesn't allow me to attach a C language file for some reason, so you will have to cut and paste it. Watch out for underscores etc. if you have a dumb browser and/or editor.)

/* simple program to defragment memory, J. Cooperstein 2/04
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/sysinfo.h>
#include <signal.h>

#define MB (1024*1024)
#define BS 16 /* will allocate BS*MB at each step */
#define CHUNK (MB*BS)
#define QUIT_TIME 20

void quit_on_timeout(int sig)
{
    printf("\n\nTime expired, quitting\n");
    exit(EXIT_SUCCESS);
}

int main(int argc, char **argv)
{
    struct sysinfo si;
    int j, m;
    char *c;

    /* get total memory on the system */
    sysinfo(&si);
    m = si.totalram / MB;
    printf("Total System Memory in MB = %d MB\n", m);
    m = (9 * m) / 10;   /* drop 10 percent */
    printf("Using somewhat less: %d MB\n", m);

    if (argc == 2) {
        m = atoi(argv[1]);
        printf("Choosing instead mem = %d MB\n", m);
    }

    signal(SIGALRM, quit_on_timeout);
    printf("Will quit in %d seconds if no normal termination\n", QUIT_TIME);
    alarm(QUIT_TIME);

    for (j = 0; j <= m; j += BS) {
        /* yes we know this is a memory leak, no free,
         * that's the idea!
         */
        c = malloc(CHUNK);
        if (c == NULL) {
            /* allocation refused outright (e.g. strict overcommit) */
            printf("\n\nmalloc failed at %d MB, quitting\n", j);
            exit(EXIT_FAILURE);
        }
        /* just fill the block with j over and over */
        memset(c, j, CHUNK);
        printf("%8d", j);
        fflush(stdout);
    }
    printf("\n\n    Quitting and releasing memory\n");
    exit(EXIT_SUCCESS);
}

luisviveropena · December 2020

Hi @rroberts ,

How about the following commands?

stress-ng --vm 1 --vm-bytes 2g -t 60s

stress-ng --vm 1 --vm-bytes 4g -t 60s

stress-ng --vm 1 --vm-bytes 100% -t 60s

I don't think VirtualBox is playing an important role here (it works on my Ubuntu 18.04 VM).

Regards,
Luis.

rroberts · December 2020

Thanks for the feedback, guys. These suggestions worked on both VMs and got the available memory down to 75 MB, but neither antiX nor OpenSUSE fired off the OOM Killer. I did get the OOM Killer to work on my non-VM Fedora setup, so I'm not really concerned about it at this point.

luisviveropena · December 2020

Hi @rroberts ,

If you want to see how memory is managed on *SUSE, you can take a look at the following documentation:

Overcommit Memory in SLES

https://www.suse.com/support/kb/doc/?id=000016945

Regards,
Luis.

MelvynDrag · January 2021

I also could not get the OOM Killer to work on a Fedora Digital Ocean server.

The free memory dropped down to 75MB and then recovered. There was nothing in dmesg about it.

When I ran the C program provided above, I got the oom_reaper to appear in dmesg.

coop · January 2021

I don't know why stress-ng should behave differently than the simple C program. I hope you are all turning off swap before doing this exercise (as in sudo swapoff -a) as otherwise triggering can be very slow if not impossible.

lee42x · January 2021

It seems there is a difference between "stress" and "stress-ng". It looks like they fixed "stress-ng" to handle the mmap failure instead of letting it fail. See attached:

smarques · April 2021

Indeed, the Lab 13.1 command should be updated, as it is not working as described.
Using Ubuntu 20.04 LTS on VirtualBox with 4 GB RAM, I could invoke the OOM killer only with a command similar to the one shared by Luis (using the --vm-bytes option with 100% for at least 20 seconds).

Below is an extract from /var/log/syslog.
The stress-ng process gets killed because it has the highest oom_score.

ubuntu stress-ng: invoked with 'stress-n' by user 1000
ubuntu stress-ng: system: 'ubuntu' Linux 5.8.0-49-generic #55~20.04.1-Ubuntu SMP Fri Mar 26 01:01:07 UTC 2021 x86_64
ubuntu stress-ng: memory (MB): total 3935.66, free 2870.41, shared 26.48, buffer 4.44, swap 0.00, free swap 0.00
ubuntu kernel: [ 1615.497838] free invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
ubuntu kernel: [ 1615.497843] CPU: 1 PID: 4065 Comm: free Tainted: G OE 5.8.0-49-generic #55~20.04.1-Ubuntu
ubuntu kernel: [ 1615.497845] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
(...)

ubuntu kernel: [ 1615.498328] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service,task=stress-ng,pid=4072,uid=1000
ubuntu kernel: [ 1615.498340] Out of memory: Killed process 4072 (stress-ng) total-vm:2966460kB, anon-rss:2929204kB, file-rss:0kB, shmem-rss:32kB, UID:1000 pgtables:5800kB oom_score_adj:1000
ubuntu kernel: [ 1615.533143] oom_reaper: reaped process 4072 (stress-ng), now anon-rss:0kB, file-rss:0kB, shmem-rss:32kB

coop · April 2021

Exactly how systems respond to OOM situations depends on quite a few variables including:

Linux Distribution
Kernel Version (including distro customizations)
Physical Machine? or Virtual, and if so which hypervisor and version and host OS
Actual RAM available
How much swap
How the condition is triggered. (last but not least)

Trying to account for all these variations with a one line command is just whack-a-mole; any fix will break something else. As long as you triggered the OOM killer I could not care less how.

Once upon a time the OOM killer was so naive it always killed X, as the guilty programs were generally running under X on workstations. It is smarter today!
