How do I start threads in plain C?

AFAIK, ANSI C doesn't define threading, but there are various libraries available.

If you are running on Windows, link to msvcrt and use _beginthread or _beginthreadex.

If you are running on other platforms, check out the pthreads library (I'm sure there are others as well).


C11 threads + C11 atomic_int

Added to glibc 2.28. Tested in Ubuntu 18.10 amd64 (comes with glic 2.28) and Ubuntu 18.04 (comes with glibc 2.27) by compiling glibc 2.28 from source: Multiple glibc libraries on a single host

Example adapted from: https://en.cppreference.com/w/c/language/atomic

main.c

#include <stdio.h>
#include <threads.h>
#include <stdatomic.h>

atomic_int atomic_counter;
int non_atomic_counter;

int mythread(void* thr_data) {
    (void)thr_data;
    for(int n = 0; n < 1000; ++n) {
        ++non_atomic_counter;
        ++atomic_counter;
        // for this example, relaxed memory order is sufficient, e.g.
        // atomic_fetch_add_explicit(&atomic_counter, 1, memory_order_relaxed);
    }
    return 0;
}

int main(void) {
    thrd_t thr[10];
    for(int n = 0; n < 10; ++n)
        thrd_create(&thr[n], mythread, NULL);
    for(int n = 0; n < 10; ++n)
        thrd_join(thr[n], NULL);
    printf("atomic     %d\n", atomic_counter);
    printf("non-atomic %d\n", non_atomic_counter);
}

GitHub upstream.

Compile and run:

gcc -ggdb3 -std=c11 -Wall -Wextra -pedantic -o main.out main.c -pthread
./main.out

Possible output:

atomic     10000
non-atomic 4341

The non-atomic counter is very likely to be smaller than the atomic one due to racy access across threads to the non-atomic variable.

See also: How to do an atomic increment and fetch in C?

Disassembly analysis

Disassemble with:

gdb -batch -ex "disassemble/rs mythread" main.out

contains:

17              ++non_atomic_counter;
   0x00000000004007e8 <+8>:     83 05 65 08 20 00 01    addl   $0x1,0x200865(%rip)        # 0x601054 <non_atomic_counter>

18              __atomic_fetch_add(&atomic_counter, 1, __ATOMIC_SEQ_CST);
   0x00000000004007ef <+15>:    f0 83 05 61 08 20 00 01 lock addl $0x1,0x200861(%rip)        # 0x601058 <atomic_counter>

so we see that the atomic increment is done at the instruction level with the f0 lock prefix.

With aarch64-linux-gnu-gcc 8.2.0, we get instead:

11              ++non_atomic_counter;
   0x0000000000000a28 <+24>:    60 00 40 b9     ldr     w0, [x3]
   0x0000000000000a2c <+28>:    00 04 00 11     add     w0, w0, #0x1
   0x0000000000000a30 <+32>:    60 00 00 b9     str     w0, [x3]

12              ++atomic_counter;
   0x0000000000000a34 <+36>:    40 fc 5f 88     ldaxr   w0, [x2]
   0x0000000000000a38 <+40>:    00 04 00 11     add     w0, w0, #0x1
   0x0000000000000a3c <+44>:    40 fc 04 88     stlxr   w4, w0, [x2]
   0x0000000000000a40 <+48>:    a4 ff ff 35     cbnz    w4, 0xa34 <mythread+36>

so the atomic version actually has a cbnz loop that runs until the stlxr store succeed. Note that ARMv8.1 can do all of that with a single LDADD instruction.

This is analogous to what we get with C++ std::atomic: What exactly is std::atomic?

Benchmark

TODO. Crate a benchmark to show that atomic is slower.

POSIX threads

main.c

#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <pthread.h>

enum CONSTANTS {
    NUM_THREADS = 1000,
    NUM_ITERS = 1000
};

int global = 0;
int fail = 0;
pthread_mutex_t main_thread_mutex = PTHREAD_MUTEX_INITIALIZER;

void* main_thread(void *arg) {
    int i;
    for (i = 0; i < NUM_ITERS; ++i) {
        if (!fail)
            pthread_mutex_lock(&main_thread_mutex);
        global++;
        if (!fail)
            pthread_mutex_unlock(&main_thread_mutex);
    }
    return NULL;
}

int main(int argc, char **argv) {
    pthread_t threads[NUM_THREADS];
    int i;
    fail = argc > 1;
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, main_thread, NULL);
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
    assert(global == NUM_THREADS * NUM_ITERS);
    return EXIT_SUCCESS;
}

Compile and run:

gcc -std=c99 -Wall -Wextra -pedantic -o main.out main.c -pthread
./main.out
./main.out 1

The first run works fine, the second fails due to missing synchronization.

There don't seem to be POSIX standardized atomic operations: UNIX Portable Atomic Operations

Tested on Ubuntu 18.04. GitHub upstream.

GCC __atomic_* built-ins

For those that don't have C11, you can achieve atomic increments with the __atomic_* GCC extensions.

main.c

#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

enum Constants {
    NUM_THREADS = 1000,
};

int atomic_counter;
int non_atomic_counter;

void* mythread(void *arg) {
    (void)arg;
    for (int n = 0; n < 1000; ++n) {
        ++non_atomic_counter;
        __atomic_fetch_add(&atomic_counter, 1, __ATOMIC_SEQ_CST);
    }
    return NULL;
}

int main(void) {
    int i;
    pthread_t threads[NUM_THREADS];
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, mythread, NULL);
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
    printf("atomic     %d\n", atomic_counter);
    printf("non-atomic %d\n", non_atomic_counter);
}

Compile and run:

gcc -ggdb3 -O3 -std=c99 -Wall -Wextra -pedantic -o main.out main.c -pthread
./main.out

Output and generated assembly: the same as the "C11 threads" example.

Tested in Ubuntu 16.04 amd64, GCC 6.4.0.


Since you mentioned fork() I assume you're on a Unix-like system, in which case POSIX threads (usually referred to as pthreads) are what you want to use.

Specifically, pthread_create() is the function you need to create a new thread. Its arguments are:

int  pthread_create(pthread_t  *  thread, pthread_attr_t * attr, void *
   (*start_routine)(void *), void * arg);

The first argument is the returned pointer to the thread id. The second argument is the thread arguments, which can be NULL unless you want to start the thread with a specific priority. The third argument is the function executed by the thread. The fourth argument is the single argument passed to the thread function when it is executed.