Running user applications is the reason to have an OS, and process management is a crucial part of any kernel.
The Process
A process is a program (object code stored on some media) in execution.
A process also includes a set of resources such as open files and pending signals, internal kernel data, processor state, a memory address space, one or more threads and a data section for global variables.
More details
Open files: Each process can have files open for reading, writing, or both. The operating system tracks these files using file descriptors, allowing the process to interact with files, devices, or sockets.
Pending signals: Signals are asynchronous notifications sent to a process to notify it of events (like interrupts or exceptions). Pending signals are those that have been sent but not yet handled by the process.
Internal kernel data: The kernel maintains various data structures for each process, such as process control blocks (PCB), scheduling information, and security credentials, which help manage and track the process.
Processor state: This includes the values of CPU registers, program counter, stack pointer, and other hardware-specific information. The processor state is saved and restored during context switches.
Memory address space: Each process has its own virtual memory space, which includes code, data, stack, and heap segments. This isolation ensures processes do not interfere with each other’s memory.
Threads: A process may contain one or more threads, which are independent sequences of execution within the same address space. Threads share resources but have separate execution contexts.
Data section for global variables: This section of memory stores global and static variables used by the process. It is initialized when the process starts and persists throughout its lifetime.
Threads
- A thread is essentially an execution context within a process.
- It shares the same address space (code, data, heap, open files, etc.) as other threads in that process — but each thread must still be able to run independently.
- Each thread has its own stack.
- The kernel schedules individual threads, not processes.
- In Linux, a thread is just a special kind of process. Both are represented by the same data structure in the kernel: `task_struct`.
- CPU cores and threads:
  - The number of CPU cores determines how many threads can run at the same instant in parallel, but not how many threads you can have in total.
  - You can create thousands of threads on a 4-core CPU, but only 4 threads can physically execute at once (ignoring SMT/hyper-threading).
- Thread limitations:
  - Memory per thread stack: Each thread typically reserves 8 MB of virtual address space for its stack by default (`ulimit -s` or `pthread_attr_setstacksize`).
  - OS resource limits: Linux has per-user and system-wide limits (`ulimit -u`, `/proc/sys/kernel/threads-max`).
  - Scheduler overhead: More threads mean more context switching, which can hurt performance.
- So high-performance systems often use thread pools or async I/O instead of spawning thousands of threads.
Virtualizations
- Processes provide 2 virtualizations:
  - Virtualized processor: gives the process the illusion that it alone is running on the processor.
  - Virtualized memory: gives the process the illusion that it alone is using the entire memory.
- Threads share the virtual memory abstraction, whereas each receives its own virtualized processor.
  - Threads share the same virtual address space within a process. This means each thread sees the same heap, global variables, and code.
  - Each thread receives its own virtual CPU: its own set of CPU registers, program counter, and stack, which gives each thread the illusion that it has its own processor core.
  - In reality, the kernel scheduler rapidly switches which thread’s state (registers, PC, stack) is loaded into the physical CPU. This is context switching.
`fork()` system call
- This is how you create new processes in Linux: by duplicating an existing one.
- The process that calls `fork()` is the parent, whereas the new process is the child.
- The parent resumes execution and the child starts executing at the same place: where the call to `fork()` returns.
- `fork()` returns from the kernel twice: once in the parent process and again in the child. So after `fork()`:
  - In the parent, `fork()` returns the child’s PID (> 0).
  - In the child, `fork()` returns 0.
- Sometimes, after a fork it is desirable to execute a new, different program. The `exec()` family of function calls creates a new address space and loads a new program into it.
`exit()` and `wait4()` system calls
- A program exits via the `exit()` system call. It terminates the process and frees its resources, except for the process descriptor, which remains until the parent collects the exit status.
- The parent process can inquire about the status of a terminated child via `wait4()`, which enables a process to wait for termination of a child process and retrieve its exit status.
- When a child process terminates, it becomes a zombie until the parent calls `wait4()`, usually through C library functions like `wait()` or `waitpid()`.
Process Descriptor and Task Structure
The kernel stores the list of processes in a circular doubly linked list called the task list.
Each element in the list is a process descriptor of the type `struct task_struct`, defined in `<linux/sched.h>`. It contains all the information about a specific process.
The size of `task_struct` on my machine at the time of writing: `sudo cat /sys/kernel/slab/task_struct/object_size` prints `6872`, which is about 6.71 KiB (6872 / 1024).
NOTE: Linux uses a specialized memory caching system called the slab allocator.
`task_struct` is also known as the process descriptor.
NOTE: `sched` is short for scheduler or scheduling.
Allocating the Process Descriptor
Evolution:
- Pre-2.6 kernel series:
  - `struct task_struct` was stored at the end of the kernel stack of each process.
  - Architectures like x86 (with few registers) could calculate the location of the process descriptor from the stack pointer without dedicating an extra general-purpose register to it.
- 2.6 - 3.x (early 2000s):
  - `thread_info` was introduced and kept at the bottom of the kernel stack.
  - `thread_info` contained a pointer to the `task_struct`.
- Modern kernels (4.x and later):
  - `thread_info` is embedded inside `task_struct`.
  - Register-based addressing is used to reach `task_struct` directly:
    - x86_64: the `GS` segment base points to the per-CPU area, which holds the pointer to the current `task_struct`.
    - ARM64: the `SP_EL0` register holds the pointer to the current `task_struct` while running in the kernel (`TPIDR_EL1` holds the per-CPU offset).
Table summary
| Era | Versions / period | Design | How the kernel finds current | Key reason |
|---|---|---|---|---|
| 1️⃣ Pre–2.6 | 2.4 and earlier | task_struct physically stored at the end (or beginning) of each process’s kernel stack | Mask the stack pointer to find base address → cast to task_struct * | Save memory and registers |
| 2️⃣ 2.6 – 3.x | Early 2000s onward | task_struct moved to slab allocator; small thread_info placed at base of stack (contains pointer to task_struct) | Mask stack pointer → get thread_info → deref ->task | Needed flexibility + reuse (via slab) while keeping fast lookup |
| 3️⃣ Modern kernels (4.x, 5.x, 6.x) | Today | thread_info embedded inside task_struct; stack allocated separately; a per-CPU variable or dedicated register holds the current task pointer | Read the per-CPU variable through %gs on x86-64; read SP_EL0 on ARM64 | Security, portability, extensibility |
Storing the Process Descriptor
- The system identifies each process by a unique integer called the process ID (PID).
- Evolution of PIDs:
  - PID is `int` in the kernel code, but the maximum used to be 32,768 (the range of a `short int`) for compatibility with older systems. This is still the default on many systems.
  - Modern 64-bit Linux systems allow up to 22 bits of PID space, so PIDs can go up to 4,194,303.
  - The maximum PID can be viewed or modified via `/proc/sys/kernel/pid_max`.
- The kernel supports PID namespaces, which allow processes in different namespaces to have the same PID (useful for containers).
- The kernel maintains a lookup structure (a hash table in older kernels, an IDR in newer ones) to map PIDs to their corresponding `task_struct`.
- Each `task_struct` contains a `pid` field storing the process’s PID.
- Tasks are typically referenced directly by a pointer to their `task_struct`, so it is very useful to be able to quickly look up the process descriptor of the currently executing task. This is done using the `current` macro.
- Underneath, the `current` macro has to be implemented for each architecture independently. You can find the source code for it at:
  - `arch/x86/include/asm/current.h`
  - `arch/arm64/include/asm/current.h`
  - `arch/riscv/include/asm/current.h`
  - `arch/powerpc/include/asm/current.h`
- On modern systems, to access the current task descriptor, just use the `current` macro.
Note: There is no `->task` dereference behind `current` anymore (the old `current_thread_info()->task`), as this design has changed in modern kernels.
Process States
The `__state` field in `task_struct` indicates the current state of the process:
- Main states are:
  - `TASK_RUNNING`: Process is either running or ready to run. Only possible state for a process executing in user space.
  - `TASK_INTERRUPTIBLE`: Process is sleeping but can be awakened by signals.
  - `TASK_UNINTERRUPTIBLE`: Process is sleeping and cannot be awakened by signals.
  - `__TASK_TRACED`: Process is being traced (e.g., by a debugger).
  - `__TASK_STOPPED`: Process is stopped (e.g., via `SIGSTOP`).
- There are many more states; we’ll learn about them later.
- Never modify the `__state` field directly. Always use the provided kernel functions/macros to change process states.
- These are some:

| API | Exists now? | Barrier? | Typical use |
|---|---|---|---|
| `set_current_state(state)` | ✅ | Yes | Before sleeping |
| `__set_current_state(state)` | ✅ | No | When safe (after wake) |
| `set_task_state(task, state)` | ❌ Removed in modern kernels | Yes | Was used by scheduler / freezer internals |
| Direct assignment `task->__state = ...` | ⚠️ Avoid | N/A | Only very low-level code |
Process Context
Program code: loaded from an executable file into the process’s address space.
User space: normal program execution happens here (unprivileged mode).
Kernel space is entered when:
- a system call is made, or
- an exception/fault occurs.
When the kernel runs because of a system call or exception raised by a process, it is “executing on behalf of the process.”
- This is called process context.
- The `current` macro is valid (points to that task’s `task_struct`).
On returning from the kernel:
- the process continues in user space, unless the scheduler picks a higher-priority task.
Interrupt context:
- triggered by hardware/soft interrupts, not tied to any process.
- `current` still points to the interrupted task but is not meaningful here.
- No sleeping or scheduling allowed.
Process Family Tree
All processes are descendants of the `init` process (PID 1).
The kernel starts the `init` process during the last stage of booting. `init` reads the init scripts and starts other processes.
The statically allocated process descriptor `init_task` in `init/init_task.c` belongs to the very first task (PID 0, the idle task `swapper`), from which `init` itself is created.
Each process has a parent process (except the very first task) and may have multiple child processes.
The task list is a circular doubly linked list, so every task can be reached from any other task. Parents and children, however, are reached through dedicated fields: the parent-child relationship is maintained via the `parent` and `children` fields in `task_struct`.
More details
Each process has:
- `children`: head of a linked list containing all its child processes.
- `sibling`: entry in the parent’s children list.
- `parent`: pointer to its immediate parent process (`task_struct`).
Some useful macros/functions to navigate:
- `next_task(task)`: Get the next task in the task list.
- `prev_task(task)`: Get the previous task in the task list.
- `for_each_process(task)`: Iterate over all processes.
- `for_each_thread(task, thread)`: Iterate over each thread in a process.
… and many more.
NOTE: It is expensive to traverse the entire task list. Avoid doing it unless absolutely necessary.
References
- Linux Kernel Development (3rd edition) by Robert Love.