Why do Rust mutexes not seem to give the lock to the thread that wanted to lock it last?

Mutex and RwLock both defer to OS-specific primitives and cannot be guaranteed to be fair. On Windows, they are both implemented with SRW locks which are specifically documented as not fair. I didn't do research for other operating systems but you definitely cannot rely on fairness with std::sync::Mutex, especially if you need this code to be portable.

A possible solution in Rust is the Mutex implementation provided by the parking_lot crate, which provides an unlock_fair method, which is implemented with a fair algorithm.

From the parking_lot documentation:

By default, mutexes are unfair and allow the current thread to re-lock the mutex before another has the chance to acquire the lock, even if that thread has been blocked on the mutex for a long time. This is the default because it allows much higher throughput as it avoids forcing a context switch on every mutex unlock. This can result in one thread acquiring a mutex many more times than other threads.

However in some cases it can be beneficial to ensure fairness by forcing the lock to pass on to a waiting thread if there is one. This is done by using this method instead of dropping the MutexGuard normally.

While parking_lot::Mutex doesn't claim to be fair without specifically using the unlock_fair method, I found that your code produced the same number of pings as pongs, by just making that switch (playground), not even using the unlock_fair method.

Usually mutexes are unlocked automatically, when a guard goes out of scope. To make it unlock fairly, you need to insert this method call before the guard is dropped:

let b = thread::spawn(move || loop {
    let mut data = data1.lock();
    thread::sleep(ten_millis);
    println!("pong ");
    *data += 1;
    if *data > 10 {
        break;
    }
    MutexGuard::unlock_fair(data);
});

The order of locking the mutex is not guaranteed in any way; it's possible for the first thread to acquire the lock 100% of the time, while the second thread 0%

The threads are scheduled by the OS and the following scenario is quite possible:

  1. the OS gives CPU time to the first thread and it acquires the lock
  2. the OS gives CPU time to the second thread, but the lock is taken, hence it goes to sleep
  3. The fist thread releases the lock, but is still allowed to run by the OS. It goes for another iteration of the loop and re-acquires the lock
  4. The other thread cannot proceed, because the lock is still taken.

If you give the second thread more time to acquire the lock you will see the expected ping-pong pattern, although there is no guarantee (a bad OS may decide to never give CPU time to some of your threads):

use std::sync::{Arc, Mutex};
use std::{thread, time};

fn main() {
    let data1 = Arc::new(Mutex::new(1));
    let data2 = data1.clone();

    let ten_millis = time::Duration::from_millis(10);

    let a = thread::spawn(move || loop {
        let mut data = data1.lock().unwrap();
        *data += 1;
        if *data > 10 {
            break;
        }

        drop(data);
        thread::sleep(ten_millis);
        println!("ping ");
    });

    let b = thread::spawn(move || loop {
        let mut data = data2.lock().unwrap();
        *data += 1;
        if *data > 10 {
            break;
        }

        drop(data);
        thread::sleep(ten_millis);
        println!("pong ");
    });

    a.join().unwrap();
    b.join().unwrap();
}

You can verify that by playing with the sleep time. The lower the sleep time, the more irregular the ping-pong alternations will be, and with values as low as 10ms, you may see ping-ping-pong, etc.

Essentially, a solution based on time is bad by design. You can guarantee that "ping" will be followed by "pong" by improving the algorithm. For instance you can print "ping" on odd numbers and "pong" on even numbers:

use std::sync::{Arc, Mutex};
use std::{thread, time};

const MAX_ITER: i32 = 10;

fn main() {
    let data1 = Arc::new(Mutex::new(1));
    let data2 = data1.clone();

    let ten_millis = time::Duration::from_millis(10);

    let a = thread::spawn(move || 'outer: loop {
        loop {
            thread::sleep(ten_millis);
            let mut data = data1.lock().unwrap();
            if *data > MAX_ITER {
                break 'outer;
            }

            if *data & 1 == 1 {
                *data += 1;
                println!("ping ");
                break;
            }
        }
    });

    let b = thread::spawn(move || 'outer: loop {
        loop {
            thread::sleep(ten_millis);
            let mut data = data2.lock().unwrap();
            if *data > MAX_ITER {
                break 'outer;
            }

            if *data & 1 == 0 {
                *data += 1;
                println!("pong ");
                break;
            }
        }
    });

    a.join().unwrap();
    b.join().unwrap();
}

This isn't the best implementation, but I tried to do it with as few modifications as possible to the original code.

You may also consider an implementation with a Condvar, a better solution, in my opinion, as it avoids the busy waiting on the mutex (ps: also removed the code duplication):

use std::sync::{Arc, Mutex, Condvar};
use std::thread;

const MAX_ITER: i32 = 10;

fn main() {
    let cv1 = Arc::new((Condvar::new(), Mutex::new(1)));
    let cv2 = cv1.clone();

    let a = thread::spawn(ping_pong_task("ping", cv1, |x| x & 1 == 1));
    let b = thread::spawn(ping_pong_task("pong", cv2, |x| x & 1 == 0));

    a.join().unwrap();
    b.join().unwrap();
}

fn ping_pong_task<S: Into<String>>(
        msg: S, 
        cv: Arc<(Condvar, Mutex<i32>)>, 
        check: impl Fn(i32) -> bool) -> impl Fn() 
{
    let message = msg.into();

    move || {
        let (condvar, mutex) = &*cv;

        let mut value = mutex.lock().unwrap();
        loop {
            if check(*value) {
                println!("{} ", message);
                *value += 1;
                condvar.notify_all();
            }

            if *value > MAX_ITER {
                break;
            }

            value = condvar.wait(value).unwrap();
        }
    }
}

I was of the belief that mutexes had some kind of way to determine who wanted to lock it last but it doesn’t look like that’s the case.

Nope. The job of a mutex is just to make the code run as fast as possible. Alternation gives the worst performance because you're constantly blowing out the CPU caches. You are asking for the worst possible implementation of a mutex.

How does the locking actually work?

The scheduler tries to get as much work done as possible. It's your job to write code that only does the work you really want to get done.

How can I get the desired output?

Don't use two threads if you just want to do one thing then something else then the first thing again. Use threads when you don't care about the order in which work is done and just want to get as much work done as possible.