Are threads implemented as processes on Linux?

I think this part of the clone(2) man page may clear up the difference re. the PID:

CLONE_THREAD (since Linux 2.4.0-test8)
If CLONE_THREAD is set, the child is placed in the same thread group as the calling process.
Thread groups were a feature added in Linux 2.4 to support the POSIX threads notion of a set of threads that share a single PID. Internally, this shared PID is the so-called thread group identifier (TGID) for the thread group. Since Linux 2.4, calls to getpid(2) return the TGID of the caller.

The "threads are implemented as processes" phrase refers to the issue of threads having had separate PIDs in the past. Basically, Linux originally didn't have threads within a process, just separate processes (with separate PIDs) that might have had some shared resources, like virtual memory or file descriptors. CLONE_THREAD and the separation of process ID and thread ID make the Linux behaviour look more like other systems and more like the POSIX requirements in this sense. Technically, though, the kernel still doesn't have separate implementations for threads and processes.

Signal handling was another problematic area with the old implementation; this is described in more detail in the paper @FooF refers to in their answer.

As noted in the comments, Linux 2.4 was also released in 2001, the same year as the book, so it's not surprising the news didn't make it into that printing.


You are right: indeed, "something must have changed between 2001 and now". The book you are reading describes the world according to the first historical implementation of POSIX threads on Linux, called LinuxThreads (see also the Wikipedia article for some background).

LinuxThreads had some compatibility issues with the POSIX standard (for example, threads not sharing PIDs) as well as some other serious problems. To fix these flaws, another implementation called NPTL (Native POSIX Thread Library) was spearheaded by Red Hat to add the necessary kernel and user-space library support for better POSIX compliance, taking good parts from yet another competing reimplementation project by IBM called NGPT ("Next Generation POSIX Threads"); see the Wikipedia article on NPTL. The additional flags added to the clone(2) system call (notably CLONE_THREAD, which @ikkkachu points out in his answer) are probably the most evident part of the kernel modifications. The user-space part of the work was eventually incorporated into the GNU C Library.

Even today, some embedded Linux SDKs use the old LinuxThreads implementation because they use uClibc (also called µClibc), a C library with a smaller memory footprint. It took many years before the NPTL user-space implementation from GNU libc was ported to it and adopted as the default POSIX threading implementation, since, generally speaking, these special platforms do not strive to follow the newest fashions at lightning speed. You can tell that LinuxThreads is in use because, indeed, different threads on those platforms have different PIDs, contrary to what the POSIX standard specifies, just as the book you are reading describes. In fact, once you called pthread_create(), the process count suddenly went from one to three, because an additional manager process was needed to keep the mess together.

The Linux pthreads(7) manual page provides a comprehensive and interesting overview of the differences between the two. Another enlightening, though out-of-date, description of the differences is this paper by Ulrich Drepper and Ingo Molnar about the design of NPTL.

I recommend not taking that part of the book too seriously. Instead, I recommend Butenhof's Programming with POSIX Threads and the POSIX and Linux manual pages on the subject. Many tutorials on the subject are inaccurate.


(Userspace) threads are not implemented as processes as such on Linux, in that they do not have their own private address space; they still share the address space of the parent process.

However, these threads are implemented on top of the kernel's process accounting system, so each one is allocated its own thread ID (TID) but is given the same PID and thread group ID (TGID) as the parent process. This is in contrast to a fork, where a new TGID and PID are created, and the TID is the same as the PID.

So recent kernels have a separate TID that can be queried, and it is this value that differs between threads. A suitable snippet to show this in each of main() and thread_function() is:

    long tid = syscall(SYS_gettid);
    printf("%ld\n", tid);

So the complete program becomes:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/syscall.h>   /* SYS_gettid */
    #include <unistd.h>        /* getpid(), syscall() */

    void* thread_function (void* arg)
    {
        long tid = syscall(SYS_gettid);
        printf("child thread TID is %ld\n", tid);
        fprintf (stderr, "child thread pid is %d\n", (int) getpid ());
        /* Spin forever so the thread stays alive for inspection (e.g. ps -L). */
        while (1);
        return NULL;
    }

    int main (void)
    {
        pthread_t thread;
        long tid = syscall(SYS_gettid);
        printf("main TID is %ld\n", tid);
        fprintf (stderr, "main thread pid is %d\n", (int) getpid ());
        pthread_create (&thread, NULL, &thread_function, NULL);
        /* Spin forever. */
        while (1);
        return 0;
    }

This gives output such as:

    main TID is 17963
    main thread pid is 17963
    child thread TID is 17964
    child thread pid is 17963