How can I create an infinite loop that kills a process if something is found in dmesg?

Some issues:

  • You are running this in a busy loop, which will consume as many resources as it can. This is one instance where sleeping could conceivably be justified.
  • However, recent versions of dmesg have a flag to follow the output, so you could rewrite the whole thing as (untested):

    while true
    do
        dmesg --follow | tail --follow --lines=0 | grep --quiet 'BUG: workqueue lockup'
        killall someprocessname
    done
    
  • The code should be indented to be readable.
  • Strange as it looks, [ is a command in its own right and is the same as test; see help [.
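
The last point is easy to check in a shell. A small sketch (value, status_test and status_bracket are made-up names for illustration) showing that both spellings evaluate the same expression and return the same exit status:

```shell
#!/bin/sh
# 'value' is an arbitrary example variable.
value=3

# Spelled as 'test' ...
test "$value" -gt 2
status_test=$?

# ... and spelled as '[' (the closing ']' is just its last argument).
[ "$value" -gt 2 ]
status_bracket=$?

echo "test: $status_test, [: $status_bracket"
```

Both commands succeed here, so both variables end up holding 0.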

A variant of @l0b0's answer:

dmesg --follow | awk '
    # Runs once per matching line. You could add further actions here,
    # such as printing to a log file.
    /BUG: workqueue lockup/ { system("killall someprocessname") }
'

This lets awk do the looping, which has some advantages:

  • It keeps working for as long as the dmesg process is alive.
  • It does not call killall more than once per occurrence of the search string "BUG: workqueue lockup", which improves upon the other answer.
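
Before wiring in killall, the matching logic can be tried as a dry run on canned input. A minimal sketch: sample_log, run_watcher and the sample lines are made up for illustration, and echo stands in for killall.

```shell
#!/bin/sh
# Placeholder lines standing in for kernel messages (not real dmesg output).
sample_log() {
    printf '%s\n' \
        'unrelated sample line 1' \
        'BUG: workqueue lockup (sample occurrence 1)' \
        'unrelated sample line 2' \
        'BUG: workqueue lockup (sample occurrence 2)'
}

# Same shape as the awk watcher, but the command to run is passed in
# through an awk variable so a harmless one can be substituted.
run_watcher() {
    awk -v action="$1" '/BUG: workqueue lockup/ { system(action) }'
}

# Count how many times the action ran.
hits=$(sample_log | run_watcher 'echo would-kill' | grep -c 'would-kill')
echo "$hits"
```

Two of the four sample lines match, so the substituted command runs twice and the script prints 2.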

To test, put this into a script named thescript and run nohup ./thescript &, so that thescript keeps running even after you end your session.

Once you are satisfied it works, kill it, and then you can (instead of running it each time in a shell with nohup) transform it into a daemon script that you can then have started in your current runlevel.
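
The start, check and kill cycle can be sketched as follows. This is an illustration only: sleep 300 stands in for thescript so that the snippet is harmless to run, and watcher_pid and state are made-up names.

```shell
#!/bin/sh
# Stand-in for "nohup ./thescript &"; sleep is used so nothing gets killed.
nohup sleep 300 >/dev/null 2>&1 &
watcher_pid=$!          # PID of the background job just started

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$watcher_pid" 2>/dev/null
then
    state=running
else
    state=gone
fi
echo "watcher $watcher_pid is $state"

# Once satisfied, stop it.
kill "$watcher_pid"
wait "$watcher_pid" 2>/dev/null || true   # reap the job; ignore its exit status
```

Keeping the PID from $! avoids having to search the process list later when it is time to stop the test run.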

That is, using another script as a model (you need at least the start, stop and status sections), modify thescript accordingly and place it in /etc/rc.d/init.d. Then create a symlink to it named Sxxthescript in the appropriate /etc/rc.d/rcN.d directory (or directories), N being the number of your normal runlevel (see the top lines of who -a to find the current runlevel). Also create the corresponding Kxxthescript symlinks in every (or almost every) other runlevel, so that the script is stopped properly when switching runlevels.
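
As a rough idea of what the start, stop and status sections might look like, here is a minimal sketch. It is not modelled on any particular distribution's scripts: the DAEMON and PIDFILE locations and the function names are assumptions, and distribution helper libraries and header comments are deliberately left out.

```shell
#!/bin/sh
# Sketch of an init script for thescript.
# DAEMON and PIDFILE are hypothetical defaults; adjust them to your system.
DAEMON=${DAEMON:-/usr/local/bin/thescript}
PIDFILE=${PIDFILE:-/var/run/thescript.pid}

is_running() {
    # True if the PID file exists and names a live process.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

do_start() {
    if is_running; then
        echo "already running"
        return 0
    fi
    "$DAEMON" >/dev/null 2>&1 &
    echo $! > "$PIDFILE"      # remember the PID for stop and status
}

do_stop() {
    if is_running; then
        kill "$(cat "$PIDFILE")"
    fi
    rm -f "$PIDFILE"
}

do_status() {
    if is_running; then
        echo "running"
    else
        echo "stopped"
        return 3              # LSB convention: 3 means "not running"
    fi
}

case "${1:-}" in
    start)  do_start ;;
    stop)   do_stop ;;
    status) do_status ;;
    '')     : ;;              # no argument: only define the functions
    *)      echo "Usage: $0 {start|stop|status}" >&2; exit 2 ;;
esac
```

The Sxx symlink causes the script to be called with start when the runlevel is entered, and the Kxx symlink causes it to be called with stop when the runlevel is left.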

Or do "the appropriate things" to have it run/stopped via systemd or any equivalent system your distribution uses.
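
On a systemd-based distribution, "the appropriate things" usually means writing a unit file. A minimal sketch, assuming the script is installed as /usr/local/bin/thescript (a hypothetical path) and the unit is saved as /etc/systemd/system/thescript.service:

```ini
# /etc/systemd/system/thescript.service (hypothetical name and location)
[Unit]
Description=Kill someprocessname when a workqueue lockup is logged

[Service]
# Hypothetical install path of thescript
ExecStart=/usr/local/bin/thescript
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

It could then be enabled and started with systemctl enable --now thescript.service. With this approach nohup is not needed, since systemd runs the script outside any login session.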