Which real-time priority is the highest priority in Linux

Short Answer

99 will be the winner for real time priority.

PR is the priority level (range -100 to 39). The lower the PR, the higher the priority of the process will be.

PR is calculated as follows:

  • for normal processes: PR = 20 + NI (NI is nice and ranges from -20 to 19)
  • for real time processes: PR = - 1 - real_time_priority (real_time_priority ranges from 1 to 99)
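A minimal C sketch of the two formulas above. The function names are hypothetical, and the asserts only encode the input ranges given in the text.

```c
#include <assert.h>

/* PR of a normal (SCHED_OTHER/BATCH/IDLE) process, as top shows it. */
int pr_from_nice(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return 20 + nice;              /* -20 -> 0, 19 -> 39 */
}

/* PR of a real-time (SCHED_FIFO/SCHED_RR) process, as top shows it. */
int pr_from_rt_priority(int rt_priority)
{
    assert(rt_priority >= 1 && rt_priority <= 99);
    return -1 - rt_priority;       /* 1 -> -2, 99 -> -100 */
}

/* Lower PR means higher priority. */
int pr_is_higher_priority(int pr_a, int pr_b)
{
    return pr_a < pr_b;
}
```

With these, `pr_from_rt_priority(99)` is -100, the lowest PR any process can have, which is why real-time priority 99 wins.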

Long Answer

There are 2 types of processes: the normal ones and the real-time ones. For the normal ones (and only for those), nice is applied as follows:

Nice

The "niceness" scale goes from -20 to 19, where -20 is the highest priority and 19 the lowest priority. The priority level is calculated as follows:

PR = 20 + NI

Where NI is the nice level and PR is the priority level. So a nice value of -20 maps to PR 0, while 19 maps to PR 39.

By default, a program's nice value is 0, but it is possible to launch programs with a specified nice value (a negative nice value, i.e. a higher priority, normally requires root) by using the following command:

nice -n <nice_value> ./myProgram 
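A rough C equivalent of what `nice -n` does, assuming the usual approach of adjusting the caller's own niceness and then exec'ing the program, which inherits it. This is a sketch, not coreutils' actual source; `current_nice`, `add_to_nice` and `run_niced` are hypothetical names. Note that `-n` is really an increment to the current niceness, which equals the resulting value when starting from the default of 0.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Current nice value of the calling process (-1 is a legal nice value). */
int current_nice(void)
{
    return getpriority(PRIO_PROCESS, 0);
}

/* Add `inc` to our own nice value. Returns the new nice value, or -100 on
 * error (a sentinel picked for this sketch, since nice() can legitimately
 * return -1). */
int add_to_nice(int inc)
{
    errno = 0;
    int nv = nice(inc);
    if (nv == -1 && errno != 0) {
        perror("nice");        /* EPERM: lowering niceness without privilege */
        return -100;
    }
    return nv;
}

/* Hypothetical helper: apply the increment, then replace this process with
 * argv[0]; the new program inherits the adjusted nice value. */
int run_niced(int inc, char *const argv[])
{
    if (add_to_nice(inc) == -100)
        return -1;
    execvp(argv[0], argv);     /* returns only on failure */
    perror("execvp");
    return -1;
}
```

For example, `char *args[] = { "./myProgram", NULL }; run_niced(10, args);` would behave roughly like `nice -n 10 ./myProgram`.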

Real Time

We could go even further. The nice value applies to normal user programs. The overall Linux kernel priority range has 140 values (0 to 139), but the nice value only lets a process map to the last part of that range (100 to 139); the PR shown by tools such as top is this kernel value minus 100. The values from 0 to 99 are therefore unreachable through nice; they correspond to negative PR levels (-100 to -1). To access them, the process must be scheduled as "real time".
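A small C sketch of the 140-value range described above. The constants are named after the kernel's `MAX_RT_PRIO` and `MAX_PRIO` from sched.h (100 and 140); the function names are hypothetical.

```c
#include <assert.h>

#define MAX_RT_PRIO 100   /* kernel priorities 0..99 are real-time */
#define MAX_PRIO    140   /* 140 priority values in total: 0..139 */

/* Kernel priority of a normal process: nice -20..19 maps to 100..139. */
int kernel_prio_from_nice(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return MAX_RT_PRIO + 20 + nice;
}

/* top's PR is the kernel priority shifted down by MAX_RT_PRIO (100). */
int pr_from_kernel_prio(int prio)
{
    assert(prio >= 0 && prio < MAX_PRIO);
    return prio - MAX_RT_PRIO;     /* 0..99 -> -100..-1, 100..139 -> 0..39 */
}
```

No nice value produces a kernel priority below 100, which is exactly the gap that real-time scheduling fills.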

There are 5 classic scheduling policies in a Linux environment. Together with the valid priority range of each one, they can be displayed with the following command:

chrt -m

The policies are:

1. SCHED_OTHER   the standard round-robin time-sharing policy
2. SCHED_BATCH   for "batch" style execution of processes
3. SCHED_IDLE    for running very low priority background jobs.
4. SCHED_FIFO    a first-in, first-out policy
5. SCHED_RR      a round-robin policy
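The priority ranges that `chrt -m` reports can also be queried from C with `sched_get_priority_min()` and `sched_get_priority_max()`. A minimal sketch, assuming glibc (where `SCHED_BATCH` and `SCHED_IDLE` need `_GNU_SOURCE`); the output format only imitates `chrt` and is not its exact output.

```c
#define _GNU_SOURCE          /* for SCHED_BATCH and SCHED_IDLE in <sched.h> */
#include <sched.h>
#include <stdio.h>

struct policy_name { int policy; const char *name; };

static const struct policy_name policies[] = {
    { SCHED_OTHER, "SCHED_OTHER" },
    { SCHED_BATCH, "SCHED_BATCH" },
    { SCHED_IDLE,  "SCHED_IDLE"  },
    { SCHED_FIFO,  "SCHED_FIFO"  },
    { SCHED_RR,    "SCHED_RR"    },
};

/* Print each policy's valid static priority range, similar to `chrt -m`. */
void print_priority_ranges(void)
{
    for (size_t i = 0; i < sizeof policies / sizeof policies[0]; i++) {
        int lo = sched_get_priority_min(policies[i].policy);
        int hi = sched_get_priority_max(policies[i].policy);
        printf("%-12s min/max priority: %d/%d\n", policies[i].name, lo, hi);
    }
}
```

On a typical Linux system the normal policies report 0/0 and the real-time ones report 1/99, which is where the 1 to 99 range below comes from.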

The scheduling policies can be divided into 2 groups: the normal scheduling policies (1 to 3) and the real-time scheduling policies (4 and 5). A runnable real-time process always takes priority over normal processes. A real-time process can be launched with the following command (this example uses the SCHED_RR policy; like negative nice values, it normally requires root):

chrt --rr <priority between 1-99> ./myProgram
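A rough C equivalent of `chrt --rr`, assuming the same pattern as with nice: set the policy on the calling process with `sched_setscheduler()`, then exec the program, which inherits it. This is a sketch, not util-linux's source; `set_rr_priority` and `run_rr` are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Switch the calling process to SCHED_RR with the given real-time priority.
 * Returns 0 on success, -1 with errno set on failure (EINVAL for an
 * out-of-range priority; EPERM without root or CAP_SYS_NICE). */
int set_rr_priority(int rt_priority)
{
    int lo = sched_get_priority_min(SCHED_RR);   /* 1 on Linux */
    int hi = sched_get_priority_max(SCHED_RR);   /* 99 on Linux */
    if (rt_priority < lo || rt_priority > hi) {
        errno = EINVAL;
        return -1;
    }
    struct sched_param sp = { .sched_priority = rt_priority };
    return sched_setscheduler(0, SCHED_RR, &sp);
}

/* Hypothetical helper mirroring `chrt --rr <prio> ./myProgram`:
 * set the policy, then exec the program, which inherits it. */
int run_rr(int rt_priority, char *const argv[])
{
    if (set_rr_priority(rt_priority) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    execvp(argv[0], argv);   /* returns only on failure */
    perror("execvp");
    return -1;
}
```

Checking the range up front keeps the error deterministic; the kernel would also reject an out-of-range value with EINVAL.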

To obtain the PR value for a real time process the following equation is applied:

PR = -1 - rt_prior

Where rt_prior is the real-time priority, between 1 and 99. For that reason, the process with the highest priority over all others is the one launched with priority 99, which gives PR = -100.

It is important to note that for real time processes, the nice value is not used.

To see the current "niceness" and PR value of a process the following command can be executed:

top

Which shows the following output:

[Screenshot: top output showing the PR and NI columns]

In the figure, the PR and NI values are displayed. Note the process with PR value -51: it is a real-time process with real-time priority 50 (-1 - 50 = -51). Some processes show "rt" as their PR value; this actually corresponds to a PR value of -100 (real-time priority 99).
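top's PR and NI columns are based on the "priority" (field 18) and "nice" (field 19) values the kernel exposes in /proc/<pid>/stat, as documented in the proc(5) man page. A minimal C sketch that reads those two fields and formats -100 as "rt" the way top does; `read_pr_ni` and `format_pr` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read fields 18 (priority) and 19 (nice) of /proc/<pid>/stat.
 * Returns 0 on success, -1 on failure. */
int read_pr_ni(pid_t pid, long *pr, long *ni)
{
    assert(pr != NULL && ni != NULL);
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) is in parentheses and may contain spaces, so start
     * parsing after the last ')'; fields 3..17 are skipped. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;
    int got = sscanf(p + 1,
                     " %*c %*d %*d %*d %*d %*d %*u"     /* fields 3-9   */
                     " %*lu %*lu %*lu %*lu %*lu %*lu"   /* fields 10-15 */
                     " %*ld %*ld %ld %ld",              /* fields 16-19 */
                     pr, ni);
    return got == 2 ? 0 : -1;
}

/* Format a PR value the way top does: -100 (RT priority 99) shows as "rt". */
void format_pr(long pr, char *out, size_t len)
{
    if (pr == -100)
        snprintf(out, len, "rt");
    else
        snprintf(out, len, "%ld", pr);
}
```

For a normal process the priority field equals nice + 20, matching PR = 20 + NI; for a real-time process at priority 50 it reads -51.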


This comment in sched.h is pretty definitive:

/*
 * Priority of a process goes from 0..MAX_PRIO-1, valid RT
 * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
 * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
 * values are inverted: lower p->prio value means higher priority.
 *
 * The MAX_USER_RT_PRIO value allows the actual maximum
 * RT priority to be separate from the value exported to
 * user-space.  This allows kernel threads to set their
 * priority to a value higher than any user task. Note:
 * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
 */

Note this part:

Priority values are inverted: lower p->prio value means higher priority.


I did an experiment to nail this down, as follows:

  • process1: RT priority = 40, CPU affinity = CPU 0. This process "spins" for 10 seconds so it won't let any lower-priority process run on CPU 0.

  • process2: RT priority = 39, CPU affinity = CPU 0. This process prints a message to stdout every 0.5 seconds, sleeping in between. It prints out the elapsed time with each message.

I'm running a 2.6.33 kernel with the PREEMPT_RT patch.

To run the experiment, I start process2 in one window (as root) and then start process1 (as root) in another window. The result is that process1 appears to preempt process2, not allowing it to run for the full 10 seconds.

In a second experiment, I change process2's RT priority to 41. In this case, process2 is not preempted by process1.
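The experiment's source is not shown, so here is a hedged C sketch of the building blocks process1 and process2 might use. It assumes SCHED_FIFO (the text does not name the policy) and uses `sched_setaffinity()` for the CPU 0 pinning; all function names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>     /* errno: EPERM/EINVAL when a call below fails */
#include <sched.h>
#include <stdio.h>
#include <time.h>

/* Seconds on CLOCK_MONOTONIC (arbitrary starting point). */
static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Pin the calling process to one CPU (the experiment uses CPU 0). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Assumption: SCHED_FIFO; the original text does not name the policy. */
int set_fifo(int rt_priority)
{
    struct sched_param sp = { .sched_priority = rt_priority };
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/* process1's job: busy-loop so nothing of lower priority runs on this CPU. */
double spin_for(double secs)
{
    double start = now_s();
    while (now_s() - start < secs)
        ;  /* deliberately burn CPU */
    return now_s() - start;
}

/* process2's job: print the elapsed time every 0.5 s, sleeping in between. */
void report_every_half_second(double total_secs)
{
    double start = now_s();
    struct timespec half = { .tv_sec = 0, .tv_nsec = 500000000L };
    while (now_s() - start < total_secs) {
        printf("elapsed: %.3f s\n", now_s() - start);
        fflush(stdout);
        nanosleep(&half, NULL);
    }
}
```

process1 would call `pin_to_cpu(0); set_fifo(40); spin_for(10);` and process2 `pin_to_cpu(0); set_fifo(39); report_every_half_second(20);`. A gap of several seconds between process2's messages indicates it was kept off CPU 0; with `set_fifo(41)` in process2 the messages should keep arriving every 0.5 s.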

This experiment shows that a larger RT priority value passed to sched_setscheduler() means a higher priority. This appears to contradict what Michael Foukarakis pointed out from sched.h, but it actually does not. In sched.c in the kernel source, we have:

static void
__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
{
        BUG_ON(p->se.on_rq);

        p->policy = policy;
        p->rt_priority = prio;
        p->normal_prio = normal_prio(p);
        /* we are holding p->pi_lock already */
        p->prio = rt_mutex_getprio(p);
        if (rt_prio(p->prio))
                p->sched_class = &rt_sched_class;
        else
                p->sched_class = &fair_sched_class;
        set_load_weight(p);
}

When no priority-inheritance boosting is in effect, rt_mutex_getprio(p) does the following:

return task->normal_prio;

While normal_prio(), for a task with a real-time policy, does the following:

prio = MAX_RT_PRIO-1 - p->rt_priority;  /* <===== notice! */
...
return prio;

In other words, we have (my own interpretation):

p->prio = p->normal_prio = MAX_RT_PRIO - 1 - p->rt_priority

Wow! That is confusing! To summarize:

  • With p->prio, a smaller value preempts a larger value.

  • With p->rt_priority, a larger value preempts a smaller value. This is the real-time priority set using sched_setscheduler().
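A user-space C sketch of the relationship above, ignoring priority-inheritance boosting. It mirrors the `normal_prio()` line for real-time tasks; the function names are hypothetical and this is not kernel code.

```c
#include <assert.h>

#define MAX_RT_PRIO 100

/* Mirror of normal_prio() for an RT task: p->prio from p->rt_priority. */
int prio_from_rt_priority(int rt_priority)
{
    assert(rt_priority >= 1 && rt_priority <= MAX_RT_PRIO - 1);
    return MAX_RT_PRIO - 1 - rt_priority;   /* 1 -> 98, 99 -> 0 */
}

/* p->prio ordering: a smaller value preempts a larger one. */
int preempts_by_prio(int prio_a, int prio_b) { return prio_a < prio_b; }

/* p->rt_priority ordering: a larger value preempts a smaller one. */
int preempts_by_rt_priority(int rt_a, int rt_b) { return rt_a > rt_b; }
```

Because the mapping is a strictly decreasing function, the two orderings always agree: rt_priority 41 (prio 58) preempts rt_priority 40 (prio 59), just as in the second experiment.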