Kipmi0 eating up to 99.8% CPU on CentOS 6.4

According to the IPMI documentation in the Linux kernel source:

this thread can use a lot of CPU depending on the interface's performance. This can waste a lot of CPU and cause various issues with detecting idle CPU and using extra power. To avoid this, the kipmid_max_busy_us sets the maximum amount of time, in microseconds, that kipmid will spin before sleeping for a tick. This value sets a balance between performance and CPU waste and needs to be tuned to your needs. Maybe, someday, auto-tuning will be added, but that's not a simple thing and even the auto-tuning would need to be tuned to the user's desired performance.

So, as root, we can run this command to set the kipmid_max_busy_us parameter:

echo 100 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us
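
A slightly more defensive version of that command is sketched below as a hypothetical shell function (set_kipmid_busy is an illustrative name, not an existing tool). It checks that the parameter file exists, which is only the case while ipmi_si is loaded, and prints back the value the kernel reports. The optional second argument exists only so the function can be tried against a scratch file.

```shell
# set_kipmid_busy VALUE [PARAM_FILE]  (hypothetical helper)
# Writes VALUE (microseconds) to the kipmid_max_busy_us module parameter and
# prints what the kernel reports back. PARAM_FILE defaults to the sysfs path
# used in the answer; writing the real file requires root.
set_kipmid_busy() {
    value=$1
    param=${2:-/sys/module/ipmi_si/parameters/kipmid_max_busy_us}

    # Reject anything that is not a non-negative integer.
    case $value in
        ''|*[!0-9]*)
            echo "set_kipmid_busy: not a number: '$value'" >&2
            return 2 ;;
    esac

    # The sysfs file only exists while the ipmi_si module is loaded.
    if [ ! -e "$param" ]; then
        echo "set_kipmid_busy: $param not found; is ipmi_si loaded?" >&2
        return 1
    fi

    echo "$value" > "$param" || return 1
    cat "$param"
}
```

For example, as root: set_kipmid_busy 100. If the machine has more than one IPMI interface, the read-back may list one value per interface.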

On our system, after setting this parameter, kipmi0's CPU usage dropped to 15%.

You can try this.
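
To check the effect on your own system, note that the %CPU column of ps is averaged over the whole lifetime of the process, so it reacts slowly to a change like this one. The sketch below (hypothetical helper names) samples a process's user and system time from /proc over an interval instead, which is closer to what top shows.

```shell
# Hypothetical helpers (names are illustrative) for measuring CPU use from /proc.

# find_pids_by_name NAME: print the PID of every process or kernel thread whose
# "Name:" line in /proc/PID/status is exactly NAME (for example kipmi0).
find_pids_by_name() {
    for d in /proc/[0-9]*; do
        # The process may exit between the glob and the read; skip it if so.
        name=$(awk -F'\t' '$1 == "Name:" { print $2; exit }' "$d/status" 2>/dev/null) || continue
        [ "$name" = "$1" ] && echo "${d#/proc/}"
    done
    return 0
}

# proc_ticks PID: user + system CPU time of PID in clock ticks (fields 14 and
# 15 of /proc/PID/stat). Stripping everything up to the last ") " removes the
# pid and the parenthesised name, so utime/stime become fields 12 and 13.
proc_ticks() {
    sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $12 + $13 }'
}

# cpu_percent PID SECONDS: CPU used by PID during the next SECONDS seconds,
# as a percentage of one CPU.
cpu_percent() {
    # Ticks per second; fall back to 100, the usual Linux value.
    hz=$(getconf CLK_TCK 2>/dev/null) || hz=100
    t0=$(proc_ticks "$1")
    sleep "$2"
    t1=$(proc_ticks "$1")
    awk -v d="$((t1 - t0))" -v hz="$hz" -v s="$2" \
        'BEGIN { printf "%.1f\n", 100 * d / (hz * s) }'
}
```

For example, before and after changing kipmid_max_busy_us:

    for pid in $(find_pids_by_name kipmi0); do echo "kipmi0 (pid $pid): $(cpu_percent "$pid" 5)%"; done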

To make the change persistent, set the option for the ipmi_si kernel module.
Create a file in /etc/modprobe.d/, e.g. /etc/modprobe.d/ipmi.conf, with the following content:
# Prevent kipmi0 from consuming 100% CPU
options ipmi_si kipmid_max_busy_us=100

Now, every time the ipmi_si kernel module is loaded, the parameter should be set automatically.
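
If you need to roll this out to several machines, a minimal sketch of writing that file from a script follows (write_ipmi_modprobe_conf is a hypothetical name; it replaces any existing content of the target file).

```shell
# write_ipmi_modprobe_conf [CONF_FILE] [VALUE]  (hypothetical helper)
# Writes the modprobe option from the answer. CONF_FILE defaults to
# /etc/modprobe.d/ipmi.conf and VALUE to 100. Existing content is replaced.
write_ipmi_modprobe_conf() {
    conf=${1:-/etc/modprobe.d/ipmi.conf}
    value=${2:-100}
    case $value in
        ''|*[!0-9]*)
            echo "write_ipmi_modprobe_conf: not a number: '$value'" >&2
            return 2 ;;
    esac
    printf '%s\n' \
        '# Prevent kipmi0 from consuming 100% CPU' \
        "options ipmi_si kipmid_max_busy_us=$value" > "$conf"
}
```

After running it as root, modprobe -c | grep kipmid_max_busy_us should show the option in modprobe's effective configuration. The file only takes effect the next time ipmi_si is loaded; for the running system, use the echo command above.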


Debugging the issue

Are the other systems identical to this one? You'll need to confirm that they are, because something must be fundamentally different between them. Firmware? Different RPM versions?

You can use tools such as lshw and dmidecode, and look through the dmesg log, for clues about what's different and what the root cause is.
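
One way to make those checks comparable across machines is to dump them into a text file per host. The sketch below uses hypothetical function names; the specific queries (dmidecode -s bios-version, DMI type 38 for the IPMI device, lshw -short) are my choice of what seems relevant here, not something the answer prescribes.

```shell
# run_if_present CMD [ARGS...]: run CMD if it is installed, folding its stderr
# into stdout and noting a non-zero exit status instead of aborting.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "($1 exited with status $?)"
    else
        echo "($1 not installed)"
    fi
}

# collect_ipmi_baseline OUTFILE: write a per-machine snapshot of things worth
# comparing (kernel, firmware, hardware, IPMI messages and modules) to OUTFILE.
# Run it as root so dmidecode and lshw can read everything.
collect_ipmi_baseline() {
    {
        echo "== kernel =="
        uname -r
        echo "== BIOS version (dmidecode) =="
        run_if_present dmidecode -s bios-version
        echo "== IPMI device (dmidecode type 38) =="
        run_if_present dmidecode -t 38
        echo "== system summary (lshw) =="
        run_if_present lshw -short -class system
        echo "== IPMI messages (dmesg) =="
        run_if_present dmesg | grep -i ipmi || true
        echo "== loaded IPMI modules =="
        grep -i '^ipmi' /proc/modules 2>/dev/null || true
    } > "$1"
}
```

Run it on both machines (e.g. collect_ipmi_baseline "$(hostname)_baseline.txt"), copy the files to one place and compare them with sdiff, the same way as the RPM lists.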

I'd get a baseline of the installed RPMs by running the following command on one of the systems that isn't exhibiting the issue and on the one that is, then compare the package lists to make sure everything is at the same version.

 # machine #1
 $ rpm -qa | sort > machine1_rpms.txt

 # machine #2
 $ rpm -qa | sort > machine2_rpms.txt

Then copy both files to one machine and compare them side by side with sdiff:

 sdiff machine1_rpms.txt machine2_rpms.txt
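
On a full package list, sdiff output is long. A small sketch (hypothetical function names) that prints only the lines unique to each file may be easier to scan. It matches whole lines, so it does not depend on how the lists were sorted.

```shell
# pkgs_only_in FILE_A FILE_B: print lines (package name-version-release)
# present in FILE_A but not in FILE_B.
pkgs_only_in() {
    # -v invert, -x whole-line match, -F fixed strings, -f patterns from FILE_B.
    # grep exits 1 when it prints nothing; treat that as "no differences".
    grep -vxFf "$2" "$1" || [ $? -eq 1 ]
}

# compare_pkg_lists FILE_A FILE_B: report the differences in both directions.
compare_pkg_lists() {
    echo "== only in $1 =="
    pkgs_only_in "$1" "$2"
    echo "== only in $2 =="
    pkgs_only_in "$2" "$1"
}
```

For example: compare_pkg_lists machine1_rpms.txt machine2_rpms.txt. A package installed at different versions on the two machines shows up in both sections, once with each version.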

Potential cause #1

IBM's website has a technote about this issue titled "Kipmi0 May Show Increased CPU Utilization on Linux". According to it, you can essentially ignore the problem.

Description of issue

The kipmi0 process may show increased CPU utilization in Linux. The utilization may increase up to 100% when the IPMI (Intelligent Platform Management Interface) device, such as a BMC (Baseboard Management Controller) or IMM (Integrated Management Module), is busy or non-responsive.

Fix

No fix required. You should ignore increased CPU utilization as it has no impact on actual system performance.

Work-around

  1. If using an IPMI device, reset the BMC or reboot the system.
  2. If not using an IPMI device, stop the IPMI service by issuing the following command:

    service ipmi stop
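
service ipmi stop only lasts until the next boot. A sketch that also disables the init script at boot with chkconfig follows; it assumes CentOS 6 SysV tools and the "ipmi" init script that, as far as I know, ships with the OpenIPMI package. The function names are hypothetical, and DRY_RUN=1 makes it print the commands instead of running them.

```shell
# maybe_run CMD [ARGS...]: run the command, or just print it when DRY_RUN=1.
maybe_run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# disable_ipmi_service: stop the ipmi service now and keep it from starting
# at boot (SysV init, as on CentOS 6). Must be run as root.
disable_ipmi_service() {
    maybe_run service ipmi stop && maybe_run chkconfig ipmi off
}
```

Only do this on machines that don't use their IPMI device; DRY_RUN=1 disable_ipmi_service shows what would be run.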

Potential cause #2

I found a post on someone's blog, titled simply "kipmi0 problem", that describes a problem that sounds identical to yours. It was traced to 2 kernel modules that were being loaded as part of the lm_sensors package.

These were the 2 kernel modules:

  • ipmi_si
  • ipmi_msghandler

Work-around

You can manually remove these with the following commands:

# ipmi_si uses ipmi_msghandler, so it has to be removed first
rmmod ipmi_si
rmmod ipmi_msghandler
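
If other IPMI modules are loaded as well, rmmod ipmi_msghandler will refuse to run while they still use it. The sketch below (hypothetical function names) removes whichever IPMI modules are loaded with ipmi_msghandler last. It assumes that ipmisensors (listed in the lm_sensors config), ipmi_watchdog, ipmi_poweroff and ipmi_devintf, like ipmi_si, all depend on ipmi_msghandler. DRY_RUN=1 prints the commands instead of running them.

```shell
# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME is listed in /proc/modules (or MODULES_FILE, for testing).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# unload_ipmi_modules [MODULES_FILE]
# Removes the loaded IPMI modules, ipmi_msghandler last because the others
# use it. Modules that are not loaded are skipped. Must be run as root.
unload_ipmi_modules() {
    for mod in ipmisensors ipmi_watchdog ipmi_poweroff ipmi_devintf ipmi_si ipmi_msghandler; do
        module_loaded "$mod" "${1:-/proc/modules}" || continue
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "rmmod $mod"
        else
            rmmod "$mod" || return 1
        fi
    done
}
```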

To make this fix permanent, you'll need to stop these kernel modules from being loaded by commenting them out in the lm_sensors configuration file, like so:

# /etc/sysconfig/lm_sensors
# MODULE_0=ipmi-si
# MODULE_1=ipmisensors
# MODULE_2=coretemp
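
The same edit can be scripted with sed. This sketch (hypothetical function name) comments out every active MODULE_<n>= line, matching the example above, which disables coretemp as well; narrow the pattern if you want to keep coretemp loaded. GNU sed's -i.bak keeps the original as FILE.bak.

```shell
# comment_out_sensor_modules FILE
# Prefixes "# " to every active MODULE_<n>= line in FILE; lines that are
# already commented out are left alone. The original is saved as FILE.bak.
comment_out_sensor_modules() {
    sed -i.bak 's/^MODULE_[0-9][0-9]*=/# &/' "$1"
}
```

For example, as root: comment_out_sensor_modules /etc/sysconfig/lm_sensors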

Restart lm_sensors after making these changes:

/etc/init.d/lm_sensors restart
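
To confirm the change stuck, after the restart (or a reboot) no IPMI modules should be loaded. A minimal check (hypothetical function name) that lists any that are:

```shell
# loaded_ipmi_modules [MODULES_FILE]
# Prints the names of loaded modules starting with "ipmi", read from
# /proc/modules (or MODULES_FILE, for testing). Empty output means none.
loaded_ipmi_modules() {
    awk '$1 ~ /^ipmi/ { print $1 }' "${1:-/proc/modules}"
}
```

If it prints nothing, kipmi0 should no longer be running either.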
