HOWTO: Solve / investigate high Linux system load

On support websites I notice a lot of confusion about system load. This post is my attempt at a short guide to understanding (and fixing) high Linux system load. Disclaimer: I might be wrong ;).

Get your current sysload:

$ w
 18:54:44 up 22 days, 22:47,  1 user,  load average: 2.54, 2.69, 2.40
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
karlo    pts/0    5ed07f32.cm-7-1b 18:49    0.00s  0.09s  0.00s w
$ cat /proc/loadavg 
2.49 2.68 2.39 1/149 2085

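As you can see here, I have a system load of around 2.5. The three numbers are the load averages over the last 1, 5 and 15 minutes. A load of 2.5 means that, on average, about 2.5 tasks were either running or waiting their turn (on Linux, tasks stuck in uninterruptible sleep, usually waiting for disk IO, are counted as well). It’s like a queue. On a 20 core system, a system load of 2 would be no problem, but I have 2 cores: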

$ grep -c ^processor /proc/cpuinfo
2
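
A rough way to put the two numbers together is to divide the load by the number of cores. The sketch below does that; load_per_core is a made-up helper name, not a standard tool, and "more than 1.0 per core means busy" is only a rule of thumb.

```shell
# load_per_core is a hypothetical helper (not a standard command).
# $1 = load average, $2 = number of cores; prints load divided by cores.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a live Linux system: take the 1-minute load from /proc/loadavg
# and the core count from nproc (part of coreutils).
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r load1 rest < /proc/loadavg
    load_per_core "$load1" "$(nproc)"
fi
```

With the numbers from my box (2.54 on 2 cores) this gives 1.27, so there is more work than the cores can handle right away.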

So. What to do?

Possible cause: CPU usage

First, let’s see if there are processes which use a lot of CPU. Use top (or, better!, htop) to see processes. If you are using htop, this is very easy. Start htop, press ‘<’ or ‘>’ and choose CPU%. This will sort by CPU usage percent. If there is a process which consumes a lot of CPU, this is probably the evildoer.
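
If you want the same information without an interactive program (for example to paste into a support ticket), ps can sort by CPU as well. A small sketch, assuming the procps version of ps found on most Linux distributions; top_cpu is just a name I made up. Keep in mind that ps reports CPU usage averaged over the lifetime of each process, so it will not match the live numbers in top or htop exactly.

```shell
# top_cpu is a hypothetical helper: print the N biggest CPU users (default 5).
# Assumes procps ps, which supports --sort; the extra line is the header.
top_cpu() {
    ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -n "$(( ${1:-5} + 1 ))"
}

top_cpu 5
```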

Possible cause 2: high IO

High input/output on your disks can lead to a high sysload. I use iostat and iotop to investigate this. You might have to install them first (which can be a pain on a system with high cpu load). On RHEL based systems, iostat is in the package sysstat.

To get the current IO, try “iostat 1”:

$ iostat 1
Linux 3.2.2 (srv02.leanback.eu) 	06/26/2013 	_x86_64_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.53    0.14    1.19    2.73    0.25   95.16

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            0.84         2.63         2.59    5222348    5128012
xvdap2            5.87        33.14        44.00   65726181   87260968

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.50   75.38    0.00   23.62

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            0.00         0.00         0.00          0          0
xvdap2           46.00         0.00       788.00          0        788

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00   49.49    0.00   50.51

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            0.00         0.00         0.00          0          0
xvdap2            7.00         0.00        32.00          0         32

^C
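
When you only care about the %iowait column, awk can pick it out of the iostat output. This is a sketch that assumes the avg-cpu layout shown above (with %iowait as the fourth column); iowait_values is a hypothetical name.

```shell
# iowait_values is a hypothetical helper: read iostat output on stdin and
# print the %iowait value of every report. Assumes the avg-cpu layout shown
# above, where %iowait is the 4th number on the line after "avg-cpu:".
iowait_values() {
    awk '/^avg-cpu:/ { if ((getline) > 0) print $4 }'
}

# Usage: 5 reports, 1 second apart. The first value is the average since boot.
# iostat 1 5 | iowait_values
```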

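If %iowait is high (like the 75% in the second sample above) together with high read/write numbers, this is the problem. Probably. Note that the first block iostat prints shows averages since boot, so look at the later ones. Investigate what process is causing it with iotop (we need sudo):

$ sudo iotop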
Total DISK READ:      96.98 K/s | Total DISK WRITE:    1318.98 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                      
  766 be/4 root        0.00 B/s   50.43 K/s  0.00 % 12.26 % [kjournald]
 1321 be/4 root       31.03 K/s    0.00 B/s  1.29 %  0.00 % rsyslogd -n -c 5
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init

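The process on top is the one which uses most IO. There might be something wrong there. If it is a runaway user process, kill it: try a plain kill first, and only then with fire (-9). Kernel threads shown in square brackets, like [kjournald] above, cannot be killed; they do IO on behalf of other processes, so look for whoever is writing that much to the file system instead.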
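
Processes that are waiting for the disk sit in uninterruptible sleep (state D in ps), and on Linux those count towards the load. If iotop is not installed, listing them gives a hint about who is stuck on IO. A sketch; only_d_state is a made-up name.

```shell
# only_d_state is a hypothetical helper: keep the header line and every
# process whose state starts with D (uninterruptible sleep, usually disk IO).
only_d_state() {
    awk 'NR == 1 || $1 ~ /^D/'
}

# State, pid and command name of every process, filtered down to D state.
ps -eo state,pid,comm | only_d_state
```

A process that shows up here every time you look is a good candidate for a closer look.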

Not fixed yet? Is your system swapping?

If your system is using swap, this can lead to a high system load. Check out your memory usage with ‘free’:

$ free -m
             total       used       free     shared    buffers     cached
Mem:           990        958         32          0         27        276
-/+ buffers/cache:        653        336
Swap:         1023         73        950

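As can be seen above, I have a total of about 1GB RAM on my VPS. There is only 32MB left. Meh. (To be fair, buffers and cache are given back when programs need memory, so the ‘-/+ buffers/cache’ row is the more honest one: 336MB is really available.) And it has used swap (see the Swap row, under the ‘used’ column: 73MB). This is not a good sign. Scroll back to the part about htop, and sort on memory (MEM%). This will tell you what is using your precious RAM.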
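
The same numbers live in /proc/meminfo, which is handy in scripts. A minimal sketch that prints how much swap is in use, in kB; swap_used_kb is a hypothetical helper, and the optional file argument only exists to make it easy to try on a copy.

```shell
# swap_used_kb is a hypothetical helper.
# $1 = meminfo-style file (defaults to /proc/meminfo); prints SwapTotal - SwapFree in kB.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "${1:-/proc/meminfo}"
}

# On a live system:
if [ -r /proc/meminfo ]; then
    swap_used_kb
fi
```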

If that is done, you can:
1) Clear your swap. Only do this if you have enough free memory to hold everything that is currently swapped out. As root, do: “swapoff -a && swapon -a”.
2) Disable swap as much as possible. As root, do “echo vm.swappiness=0 >> /etc/sysctl.conf” and reboot (or run “sysctl -p” to apply it right away). This tells the kernel to avoid swapping for as long as it can.
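
One catch with “>>” is that running it twice leaves two lines in the file. Below is a sketch that only appends when no vm.swappiness line is there yet, and shows how to check the value the kernel is using right now. ensure_swappiness_line is a name I made up, and it takes the file as an argument so you can try it on a scratch file before touching /etc/sysctl.conf.

```shell
# ensure_swappiness_line is a hypothetical helper.
# $1 = config file; appends vm.swappiness=0 unless a vm.swappiness line exists.
# An existing line with another value is left alone.
ensure_swappiness_line() {
    grep -q '^[[:space:]]*vm\.swappiness' "$1" 2>/dev/null ||
        echo 'vm.swappiness=0' >> "$1"
}

# Value currently in use by the kernel:
if [ -r /proc/sys/vm/swappiness ]; then
    cat /proc/sys/vm/swappiness
fi

# As root, on the real file, then load it without rebooting:
# ensure_swappiness_line /etc/sysctl.conf && sysctl -p
```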

Well, you are still having problems…

There can be a lot of other reasons. Let’s try the next. Some hardware might be broken. For example, if a HDD is broken, or your RAID array is rebuilding, this will impact performance (and system load). You might want to check your dmesg output (“dmesg | less”) for errors. There might be something in /var/log/messages as well. In less, you can search using the ‘/’ key. ‘n’ will jump to the next match, ‘N’ to the previous one. Look for sata/sda/sdb/“error” et cetera.
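
Instead of searching by hand in less, you can let grep do a first pass. A sketch; disk_trouble is a hypothetical name and the pattern is only my guess at useful keywords, so adjust it to your hardware.

```shell
# disk_trouble is a hypothetical helper: keep only lines on stdin that
# mention errors, failures, ATA ports or sdX devices (case-insensitive).
disk_trouble() {
    grep -iE 'error|fail|ata[0-9]+|sd[a-z]'
}

# Kernel ring buffer (may need root on some systems):
# dmesg | disk_trouble
# Syslog, on systems that have this file:
# disk_trouble < /var/log/messages
```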

If your HDDs and BIOS support SMART, you might get info from there. Install smartmontools, find out what devices your file systems are on ( cat /etc/fstab | cut -d' ' -f1 | grep -v none ) and scan them:
$ sudo smartctl -a /dev/sda
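
The cut one-liner above trips over fstab files that use tabs or several spaces between the columns, and it also prints comment lines. awk splits on any whitespace, so this sketch is a bit more forgiving. fstab_sources is a made-up name; note that entries written as UUID=... or LABEL=... are not device paths, so for those you still have to look up the device yourself (for example with blkid).

```shell
# fstab_sources is a hypothetical helper.
# $1 = fstab-style file (defaults to /etc/fstab); prints the first column
# of every entry, skipping comments, blank lines and "none".
fstab_sources() {
    awk 'NF && $1 !~ /^#/ && $1 != "none" { print $1 }' "${1:-/etc/fstab}"
}

if [ -r /etc/fstab ]; then
    fstab_sources
fi

# Overall health verdict only, for one device (needs smartmontools and root):
# sudo smartctl -H /dev/sda
```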

More info on the output of smartctl can be found in this article.

Good luck ;)