Should I attempt to 'balance' my threads or does linux do this?

Linux:

The Linux kernel has a solid implementation for this and offers many features/settings for managing the resources available to running processes (via CPU governors, sysctl or cgroups). In such a situation, tuning those settings, along with swap adjustment if required, is recommended; essentially you will be adapting the kernel's default behaviour to your workload.

Benchmarks, stress tests and analysis of the situation after applying the changes are a must, especially on production servers. The performance gain can be significant when the kernel settings are adjusted to the actual usage; on the other hand, this requires testing and a good understanding of the different settings, which is time consuming for an admin.

Linux uses CPU frequency governors to manage how CPU resources are allocated to running applications. Many governors are available; depending on your distro's kernel, some governors may not be present (the kernel can be rebuilt to add missing or non-upstream governors). You can check which governor is currently in use, change it and, more importantly in this case, tune its settings.
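
For example, on most systems the current governor can be inspected and changed through sysfs. This is only a rough sketch; the exact paths, available governors and tunables depend on your kernel and cpufreq driver:

    # Show the governor currently used by CPU 0
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

    # List the governors this kernel/driver offers
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors

    # Switch CPU 0 to the "performance" governor (run as root)
    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

    # Governor tunables, when present, live under e.g.
    # /sys/devices/system/cpu/cpufreq/ondemand/ (kernel dependent)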

Additional documentation: reading, guide, similar question, frequency scaling, choice of governor, the performance governor and cpufreq.

SysCtl:

sysctl is a tool for examining and changing kernel parameters at runtime; adjustments can be made permanent via the config file /etc/sysctl.conf. This is an important part of this answer, as many kernel settings can be changed with sysctl. A full list of available settings can be displayed with the command sysctl -a; details are available in this and this article.
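
A minimal example of the workflow (vm.swappiness is just an illustrative parameter; pick the settings relevant to your workload):

    # Read one parameter
    sysctl vm.swappiness

    # Change it on the running system (as root)
    sysctl -w vm.swappiness=10

    # Make it permanent, then reload the config
    echo 'vm.swappiness = 10' >> /etc/sysctl.conf
    sysctl -p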

Cgroup:

The kernel provides the control groups feature, referred to by its shorter name, cgroups, in this guide. Cgroups allow you to allocate resources such as CPU time, system memory, network bandwidth, or combinations of these resources among user-defined groups of tasks (processes) running on a system. You can monitor the cgroups you configure, deny cgroups access to certain resources, and even reconfigure your cgroups dynamically on a running system. The cgconfig (control group config) service can be configured to start at boot time and re-establish your predefined cgroups, thus making them persistent across reboots.

Source, further reading and question on the matter.
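
As a rough sketch using the cgroup v1 tools from the libcgroup package (the same package that provides the cgconfig service); the group name, the share value and ./worker_process are only placeholders:

    # Create a cgroup that controls the cpu subsystem (as root)
    cgcreate -g cpu:/workers

    # Give it half the default CPU weight (the default cpu.shares is 1024)
    cgset -r cpu.shares=512 workers

    # Start a process inside that cgroup
    # (./worker_process stands in for your own program)
    cgexec -g cpu:workers ./worker_process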

Ram:

This can be useful if the system has a limited amount of RAM; otherwise you can disable swap to rely mainly on RAM. Swap usage can be adjusted per process or via the swappiness setting. If needed, the resources (RAM) can be limited per process with ulimit (which is also used to limit other resources).
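
For instance (a sketch; the limit value and ./worker_process are placeholders):

    # Turn swap off entirely so only RAM is used (only sensible with plenty of RAM)
    swapoff -a

    # Cap the virtual memory of commands started from this shell
    # (value is in KiB, roughly 2 GiB here)
    ulimit -v 2097152
    ./worker_process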

Disk:

The disk I/O settings (the I/O scheduler) can be changed, as can the cluster size.
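
For example, the scheduler can be inspected and changed per block device through sysfs (sda is just an example device; which scheduler names are offered depends on your kernel):

    # The scheduler shown in [brackets] is the active one
    cat /sys/block/sda/queue/scheduler

    # Switch the scheduler for this device at runtime (as root)
    echo deadline > /sys/block/sda/queue/scheduler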

Alternatives:

Other tools such as nice, cpulimit, cpuset, taskset or ulimit can be used as alternatives for the same purpose.
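
A few quick illustrations (the program names and the PID are placeholders):

    # Run a batch job with the lowest scheduling priority
    nice -n 19 ./batch_job

    # Pin a program to cores 0-3 only
    taskset -c 0-3 ./worker_process

    # Throttle an already running process (PID 1234) to about 50% of one core
    # (requires the cpulimit package)
    cpulimit -p 1234 -l 50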


The best answer to this is "suck it and see"... perform some stress tests and see what gives the best results. That's because very minor nuances in the behaviour of your threads can cause differences in performance.


The following is largely based on my own experience...

Where to start?

Linux's ability to prevent threads from being starved is pretty good. It doesn't necessarily mean that every thread will get an even share of the pie, but all threads will at least get some pie. If you have two threads contending for CPU time... let's say one trying to use 100% CPU and another trying to use only 10%... then don't be surprised if that balances out at 91% and 9%, or somewhere around that.

Overall performance can drop where a particular resource is heavily oversubscribed. This is especially true for disk I/O on spinning hard disks. The head has to physically move (seek) between places on the disk, and continually oscillating between different files can cause a significant slowdown. But this effect is often fairly small if one thread is heavily I/O bound and another only wants to do a little I/O.

Together, these two things mean that it is often better to be 20% oversubscribed than 20% undersubscribed. In other words, don't reserve CPU time for threads which are not trying to use much CPU.

E.g.: if you have CPU-bound threads and disk-I/O-bound threads, and you have 8 cores and 1 hard disk, then start with 8 CPU-bound threads and one disk-I/O-bound thread. 7 and 1 might just leave a core idle most of the time. 8 and 1 will almost certainly not starve the disk thread, meaning you fully use both the disk and the CPU.
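
A trivial way to follow that rule without hard-coding the core count (a sketch; my_app and its --cpu-workers/--io-workers options are made-up stand-ins for however your own program is configured):

    # Size the CPU-bound pool to the number of online cores,
    # plus one dedicated disk thread
    NCPU=$(nproc)
    ./my_app --cpu-workers "$NCPU" --io-workers 1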

The danger of short-lived threads

Just be aware that Linux can struggle with a lot of short-lived threads. This is most obvious with deliberate attempts to damage a system, but continually spawning threads/processes can also push Linux into behaving badly.

In your question you have described dedicated worker threads, which sound like long-lived threads. This sounds like the right approach.

The London Bus Effect

You wait half an hour for a bus, then 5 come along at once. This happens because passengers getting on the front bus slow it down. The lack of passengers on the later buses speeds them up, causing a bunching effect.

The same problem can exist in threading, especially with threads contending for resources. If you have threads predictably alternating between tasks, for example reading from one disk then writing to another, they may tend to bunch together rather than dispersing stochastically as you might expect. So one resource may slow the use of another. For this reason it can sometimes be better to further subdivide the tasks of a thread.

cgroups

I'll avoid going into too much detail, but I should mention that Linux has a capability called "cgroups" which allows you to group processes and limit their collective resources. This can be very useful in further performance tuning.

There's a short discussion of them here. But I would advise you to spend a bit of time on Google to see their full capabilities, because they may help you in the long run.