On support websites I am noticing there is a lot of confusion about system load. This post is my attempt to make a short guide to understanding (and fixing) high Linux system load. Disclaimer: I might be wrong ;).
Get your current sysload:
$ w
 18:54:44 up 22 days, 22:47,  1 user,  load average: 2.54, 2.69, 2.40
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
karlo    pts/0    5ed07f32.cm-7-1b  18:49    0.00s  0.09s  0.00s w
$ cat /proc/loadavg
2.49 2.68 2.39 1/149 2085
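As you can see here, I have a system load of around 2.5. The three numbers are averages over the last 1, 5 and 15 minutes. Roughly, a load of 2.5 means that on average 2.5 tasks were running or waiting for their turn (for CPU time, and on Linux also for disk IO). It’s like a queue. On a 20 core system, a system load of 2 would be no problem, but I have 2 cores: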
$ cat /proc/cpuinfo | grep processor | wc -l
2
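To put the two numbers together, here is a small sketch that divides the 1-minute load by the number of cores. The function name load_per_core is my own invention, and the rule of thumb that anything well above 1.00 per core deserves a look is an assumption, not a hard limit.

```shell
#!/bin/sh
# load_per_core LOAD CORES: print LOAD divided by CORES with two decimals.
# (Hypothetical helper, not a standard tool.)
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On Linux, take the 1-minute load and the core count from /proc.
if [ -r /proc/loadavg ] && [ -r /proc/cpuinfo ]; then
    read -r one rest < /proc/loadavg
    cores=$(grep -c '^processor' /proc/cpuinfo)
    if [ "$cores" -gt 0 ]; then
        echo "load per core: $(load_per_core "$one" "$cores")"
    fi
fi
```

With my numbers, load_per_core 2.5 2 prints 1.25, so both cores are more than fully booked.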
So. What to do?
Possible cause: CPU usage
First, let’s see if there are processes which use a lot of CPU. Use top (or, even better, htop) to see processes. If you are using htop, this is very easy. Start htop, press ‘<’ or ‘>’ and choose CPU%. This will sort by CPU usage percent. If there is a process which consumes a lot of CPU, this is probably the evildoer.
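If you cannot get an interactive tool running, a rough non-interactive equivalent can be sketched with ps and sort. The helper name top_cpu is made up for this post. Keep in mind that the %CPU reported by ps is averaged over the lifetime of the process, so it is not the same instant snapshot that top and htop show.

```shell
#!/bin/sh
# top_cpu N: read "PID %CPU COMMAND" rows on stdin and print the N rows
# with the highest %CPU. (Hypothetical helper, not a standard tool.)
top_cpu() {
    LC_ALL=C sort -k2,2nr | head -n "$1"
}

# Feed it from ps; the empty "name=" headers suppress the header line.
if command -v ps >/dev/null 2>&1; then
    ps -e -o pid= -o pcpu= -o comm= | top_cpu 5
fi
```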
Possible cause 2: high IO
High input/output on your disks can lead to a high sysload. I use iostat and iotop to investigate this. You might have to install them first (which can be a pain on a system with high cpu load). On RHEL based systems, iostat is in the package sysstat.
To get the current IO, try “iostat 1”:
$ iostat 1
Linux 3.2.2 (srv02.leanback.eu)   06/26/2013   _x86_64_   (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.53    0.14    1.19    2.73    0.25   95.16

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            0.84         2.63         2.59    5222348    5128012
xvdap2            5.87        33.14        44.00   65726181   87260968

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.50   75.38    0.00   23.62

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            0.00         0.00         0.00          0          0
xvdap2           46.00         0.00       788.00          0        788

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00   49.49    0.00   50.51

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdap1            0.00         0.00         0.00          0          0
xvdap2            7.00         0.00        32.00          0         32
^C
If %iowait or the read/write numbers are high, this is the problem. Probably. Investigate what process is causing it with iotop (we need sudo):
$ sudo iotop
Total DISK READ: 96.98 K/s | Total DISK WRITE: 1318.98 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  766  be/4  root      0.00 B/s   50.43 K/s  0.00 %  12.26 %  [kjournald]
 1321  be/4  root     31.03 K/s    0.00 B/s  1.29 %   0.00 %  rsyslogd -n -c 5
    1  be/4  root      0.00 B/s    0.00 B/s  0.00 %   0.00 %  init
The process at the top is the one using the most IO. There might be something wrong there. Kill it. With fire (-9).
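Fire is fun, but -9 gives the process no chance to clean up after itself. A gentler variant, sketched here as a made-up helper called stop_process, sends SIGTERM first and only falls back to SIGKILL after a grace period. The 5 second default is an arbitrary assumption.

```shell
#!/bin/sh
# stop_process PID [GRACE_SECONDS]: ask nicely with SIGTERM, wait up to
# GRACE_SECONDS (default 5), then use SIGKILL if the process is still there.
# (Hypothetical helper, not a standard tool.)
stop_process() {
    pid=$1
    grace=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0    # already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null                # with fire
    return 0
}
```

Call it as root with the PID you found, for example stop_process 1321 10. Entries in square brackets are kernel threads and cannot be killed this way.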
Not fixed yet? Is your system swapping?
If your system is using swap, this can lead to a high system load. Check out your memory usage with ‘free’:
$ free -m
             total       used       free     shared    buffers     cached
Mem:           990        958         32          0         27        276
-/+ buffers/cache:        653        336
Swap:         1023         73        950
As can be seen above, I have a total of about 1GB RAM on my VPS. There is only 32MB free (336MB if you count buffers and cache, which the kernel can give back when needed). Meh. And it has used swap (see the Swap row, and then under the ‘used’ column: 73MB). This is not a good sign. Scroll back to the part about htop, and sort on memory. This will tell you what is using your precious RAM.
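If you want the swap number in a script instead of reading it off the free output, it can be calculated from /proc/meminfo. This is a minimal sketch. The helper name swap_used_kb is my own, and it reads from stdin so you can also feed it a saved copy of the file.

```shell
#!/bin/sh
# swap_used_kb: read /proc/meminfo-style text on stdin and print
# SwapTotal minus SwapFree in kB. (Hypothetical helper.)
swap_used_kb() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free = $2 }
         END { print total - free }'
}

if [ -r /proc/meminfo ]; then
    echo "swap in use: $(swap_used_kb < /proc/meminfo) kB"
fi
```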
If that is done, you can:
1) Clear your swap. Only do this if you have enough free memory to hold everything that is currently swapped out. As root, do: “swapoff -a && swapon -a”.
2) Disable swap as much as possible. As root, do “echo vm.swappiness=0 >> /etc/sysctl.conf” and reboot (or run “sysctl -p” to apply it right away). This will make sure your system only swaps as a last resort, when it is about to run out of memory.
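Whether you have "enough free memory" for option 1 can be checked before you run swapoff. The sketch below compares the swap in use with the memory the kernel could hand out. The helper name can_clear_swap is made up, and falling back to MemFree + Buffers + Cached when MemAvailable is missing (older kernels) is my own approximation.

```shell
#!/bin/sh
# can_clear_swap: read /proc/meminfo-style text on stdin.
# Exit status 0 if the available memory is larger than the swap in use,
# non-zero otherwise. (Hypothetical helper.)
can_clear_swap() {
    awk '$1 == "MemAvailable:" { avail = $2; have_avail = 1 }
         $1 == "MemFree:"      { fallback += $2 }
         $1 == "Buffers:"      { fallback += $2 }
         $1 == "Cached:"       { fallback += $2 }
         $1 == "SwapTotal:"    { total = $2 }
         $1 == "SwapFree:"     { free = $2 }
         END {
             # Older kernels have no MemAvailable line: approximate it.
             if (!have_avail) avail = fallback
             exit !(avail > total - free)
         }'
}

if [ -r /proc/meminfo ]; then
    if can_clear_swap < /proc/meminfo; then
        echo "enough memory to clear swap"
    else
        echo "not enough memory, leave swap alone"
    fi
fi
```

Treat the answer as a hint only. A machine that passes the check by a few MB will still have a bad time during swapoff.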
Well, you are still having problems…
There can be a lot of other reasons. Let’s try the next. Some hardware might be broken. For example, if a HDD is broken, or your RAID array is rebuilding, this will impact performance (and system load). You might want to check your dmesg output (“dmesg | less”) for errors. There might be something in /var/log/messages as well. In less, you can search using the ‘/’ key. ‘n’ will jump to the next match, ‘N’ to the previous one. Look for sata, sda, sdb, “error” et cetera.
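The same search can be done without paging through less. This is a rough sketch with a made-up helper called disk_errors. The pattern only covers the words mentioned above and is deliberately broad, so expect some false hits. On some systems dmesg needs root.

```shell
#!/bin/sh
# disk_errors: keep only the stdin lines that mention "error", "sata"
# or an sdX disk name, ignoring case. (Hypothetical helper.)
disk_errors() {
    grep -iE 'error|sata|sd[a-z]' || true   # no match is not a failure here
}

# Show the last 20 suspicious kernel messages, if any.
dmesg 2>/dev/null | disk_errors | tail -n 20
```

The log file works the same way: disk_errors < /var/log/messages.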
If your HDDs and BIOS support SMART, you might get info from there. Install smartmontools, find out what devices your file systems are on ( cat /etc/fstab | cut -d' ' -f1 | grep -v none ) and scan them:
$ sudo smartctl -a /dev/sda
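The fstab pipeline above breaks when the file uses tabs, comments or UUID= entries. A slightly more forgiving sketch uses awk and keeps only the /dev/ entries. The helper name fstab_devices is made up, and entries written as UUID= or LABEL= are simply skipped, so you still have to look those up yourself.

```shell
#!/bin/sh
# fstab_devices: read fstab-style text on stdin and print the first field
# of every entry that is a /dev/ path. Comments, "none", UUID= and LABEL=
# entries are skipped. (Hypothetical helper.)
fstab_devices() {
    awk '$1 ~ /^\/dev\// { print $1 }'
}

if [ -r /etc/fstab ]; then
    fstab_devices < /etc/fstab
fi

# To run the short SMART health check on each device found:
#   for dev in $(fstab_devices < /etc/fstab); do sudo smartctl -H "$dev"; done
```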
More info on the output of smartctl can be found in this article.
Good luck ;)