Process Management

Running user applications is the reason to have an OS, and process management is a crucial part of any kernel.

The Process

A process is a program (object code stored on some medium) in execution.
A process also includes a set of resources such as open files and pending signals, internal kernel data, processor state, a memory address space, one or more threads and a data section for global variables.
More details
- Open files: Each process can have files open for reading, writing, or both. The operating system tracks these files using file descriptors, allowing the process to interact with files, devices, or sockets.
- Pending signals: Signals are asynchronous notifications sent to a process to notify it of events (like interrupts or exceptions). Pending signals are those that have been sent but not yet handled by the process.
- Internal kernel data: The kernel maintains various data structures for each process, such as process control blocks (PCB), scheduling information, and security credentials, which help manage and track the process.
- Processor state: This includes the values of CPU registers, program counter, stack pointer, and other hardware-specific information. The processor state is saved and restored during context switches.
- Memory address space: Each process has its own virtual memory space, which includes code, data, stack, and heap segments. This isolation ensures processes do not interfere with each other’s memory.
- Threads: A process may contain one or more threads, which are independent sequences of execution within the same address space. Threads share resources but have separate execution contexts.
- Data section for global variables: This section of memory stores global and static variables used by the process. It is initialized when the process starts and persists throughout its lifetime.
Threads
- A thread is essentially an execution context within a process.
- It shares the same address space (code, data, heap, open files, etc.) as other threads in that process — but each thread must still be able to run independently.
- Each thread has its own stack.
- Kernel schedules individual threads, not processes.
- In Linux, a thread is just a special kind of process. Both are represented by the same data structure in the kernel: task_struct.
- CPU cores and threads:
  - The number of CPU cores determines how many threads can run at the same instant in parallel, but not how many threads you can have total.
  - You can create thousands of threads on a 4-core CPU, but only 4 of them can physically execute at once (8 if each core runs two hardware threads via SMT).
  - Thread limitations:
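    - Memory per thread stack: each thread typically reserves 8 MB of virtual address space for its stack by default (the default follows ulimit -s; pthread_attr_setstacksize() sets it per thread).
    - OS resource limits: Linux caps the number of processes and threads per user (ulimit -u) and system-wide (/proc/sys/kernel/threads-max).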
    - Scheduler overhead: More threads mean more context-switching, which can hurt performance.
  - So high-performance systems often use thread pools or async I/O instead of spawning thousands of threads.
Virtualizations
- Processes provide 2 virtualizations:
  - Virtualized processor: which gives the process the illusion that it alone is running on the processor.
  - Virtualized memory: which gives the process the illusion that it alone is using the entire memory.
- Threads share the virtual memory abstraction, whereas each receives its own virtualized processor.
  - Threads share the same virtual address space within a process. It means, each thread sees the same heap, global variables, and code.
  - Each receives its own virtual CPU - its own set of CPU registers, program counter, and stack, which gives each thread the illusion that it has its own processor core.
    - In reality, the kernel scheduler rapidly switches which thread’s state (registers, PC, stack) is loaded into the physical CPU — this is context switching.
fork() system call
- This is how you create new processes in Linux, by duplicating an existing one.
- The process that calls the fork() is the parent, whereas the new process is the child.
- Both the parent and the child continue executing from the point where the call to fork() returns.
- fork() returns from the kernel twice, once in the parent process and once in the child:
  - In the parent, fork() returns the child’s PID (> 0), or -1 if no child could be created.
  - In the child, fork() returns 0.
- Sometimes, after a fork it is desirable to execute a new, different program. The exec() family of function calls creates a new address space and loads a new program into it.
exit() and wait4() system call
- A program exits via the exit() system call, which terminates the process and frees most of its resources.
- The parent process can inquire about the status of a terminated child via the wait4() system call, which lets a process wait for a child to terminate and retrieve its exit status.
- When a child process terminates, it remains a zombie, with only its process descriptor kept around, until the parent collects its status. The parent does so through C library functions such as wait() or waitpid(), which are implemented on top of wait4().

Process Descriptor and Task Structure

The kernel stores the list of processes in a circular doubly linked list called the task list.
Each element in the list is a process descriptor of type struct task_struct, defined in <linux/sched.h>. It contains all the information the kernel has about a specific process.
The size of task_struct on my machine at the time of writing is:
```
sudo cat /sys/kernel/slab/task_struct/object_size
6872
```
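That is about 6.71 KiB (6872 / 1024).
NOTE: task_struct objects come from the slab allocator, the kernel’s specialized object-caching memory allocator, which is why the size shows up under /sys/kernel/slab.
task_struct is also known as the process descriptor.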
NOTE: sched is short for scheduler or scheduling.

Allocating the Process Descriptor

Evolution:

Pre 2.6 kernel series:
- struct task_struct was stored at the end of the kernel stack of each process.
- Architectures with few registers, like x86, could then compute the location of the process descriptor from the stack pointer, without dedicating a general-purpose register to it.
2.6 - 3.x (early 2000s):
- thread_info introduced, which was kept at the bottom of the kernel stack.
- thread_info contained a pointer to the task_struct.
Modern kernels (4.x and later):
- thread_info embedded inside task_struct.
- The current task is found through per-CPU state instead of stack-pointer masking:
  - x86_64: the GS segment base points to the CPU’s per-CPU data, which holds a pointer to the current task_struct.
  - ARM64: the SP_EL0 register holds the pointer to the current task_struct while the CPU runs in the kernel; TPIDR_EL1 holds the per-CPU offset.

Table summary

| Era | Kernel versions | Design | How the kernel finds `current` | Key reason |
|---|---|---|---|---|
| Pre-2.6 | 2.4 and earlier | `task_struct` physically stored at the end (or beginning) of each process’s kernel stack | Mask the stack pointer to find the base address, then cast to `task_struct *` | Save memory and registers |
| 2.6 to 3.x | From the early 2000s | `task_struct` moved to the slab allocator; small `thread_info` placed at the base of the stack (contains a pointer to `task_struct`) | Mask the stack pointer, get `thread_info`, dereference `->task` | Flexibility and reuse (via slab) while keeping lookup fast |
| Modern kernels | 4.x, 5.x, 6.x (today) | `thread_info` embedded inside `task_struct`; stack allocated separately; per-CPU state identifies the current task | Read the per-CPU current-task pointer through `%gs` on x86-64; read `SP_EL0` on ARM64 | Security, portability, extensibility |

Storing the Process Descriptor

The system identifies each process by a unique integer called the process ID (PID).
Evolution of PIDs:
- PIDs are of type int (pid_t) in kernel code, but the traditional limit was 32,768, the range of a short int, kept for compatibility with older UNIX systems.
- Modern 64-bit Linux systems allow up to 2^22 PIDs, so the highest possible PID is 4,194,303.
- The maximum PID can be viewed or modified via /proc/sys/kernel/pid_max.
The kernel supports PID namespaces, so different processes can have the same PID in different namespaces (useful for containers).
The kernel keeps a lookup structure that maps PIDs to their corresponding task_struct: a hash table in older kernels, an IDR (radix tree) per namespace in newer ones.
Each task_struct contains a pid field storing the process’s PID.
Inside the kernel, tasks are typically referenced directly by a pointer to their task_struct, so it is very useful to be able to look up the process descriptor of the currently executing task quickly. This is done with the current macro.

Underneath, the current macro has to be implemented separately for each architecture. You can find the source code for it at:

arch/x86/include/asm/current.h
arch/arm64/include/asm/current.h
arch/riscv/include/asm/current.h
arch/powerpc/include/asm/current.h

On modern kernels, just use the current macro to access the descriptor of the current task.
Note: On architectures that embed thread_info in task_struct, the older current_thread_info()->task indirection no longer exists.

Process States

The __state field of task_struct holds the current state of the process.

The main states are:
- TASK_RUNNING: The process is either running or ready to run (waiting on a runqueue). This is the only possible state for a process executing in user space.
- TASK_INTERRUPTIBLE: The process is sleeping and wakes up either when the condition it waits for occurs or when it receives a signal.
- TASK_UNINTERRUPTIBLE: The process is sleeping and is not woken by signals; only the awaited condition wakes it.
- __TASK_TRACED: The process is being traced (e.g., by a debugger).
- __TASK_STOPPED: The process is stopped (e.g., via SIGSTOP).
There are many more states; we’ll learn about them later.
Never modify the __state field directly. Always use the provided kernel functions/macros to change process states.

These are some:

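| API | In current kernels? | Memory barrier? | Typical use |
|---|---|---|---|
| `set_current_state(state)` | Yes | Yes | Before checking the wait condition and sleeping |
| `__set_current_state(state)` | Yes | No | When ordering does not matter (e.g., after wake-up) |
| `set_task_state(task, state)` | No (removed in newer kernels) | Yes | Formerly scheduler / freezer internals |
| Direct assignment `task->__state = ...` | Avoid | N/A | Only very low-level code |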

Process Context

Program code: loaded from an executable file into the process’s address space.
User space: normal program execution happens here (unprivileged mode).
Kernel space is entered when:
- a system call is made, or
- an exception/fault occurs.
When the kernel runs because of a system call or exception raised by a process, it is “executing on behalf of the process.”
- This is called process context.
- The current macro is valid (points to that task’s task_struct).
On returning from the kernel:
- process continues in user space, unless the scheduler picks a higher-priority task.
Interrupt context:
- triggered by hardware/soft interrupts, not tied to any process.
- current still points to the interrupted task but is not meaningful here.
- No sleeping or scheduling allowed.

Process Family Tree

All processes are descendants of the init process (PID 1).
The kernel starts the init process during the last stage of booting.
init reads the init scripts and starts other processes.
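The statically allocated descriptor init_task, defined in init/init_task.c, belongs to the boot-time idle task (PID 0, also called swapper); the init process (PID 1) is created from it late in boot.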
Each process has a parent process (except init) and may have multiple child processes.
Since the task list is a circular doubly linked list, any task can be reached from any other by walking the list.
The parent-child relationship is tracked separately, via the parent and children fields in task_struct.
More details
Each process has:
- children: head of a linked list containing all its child processes.
- sibling: entry in the parent’s children list.
- parent: pointer to its immediate parent process (task_struct).
Some useful macros/functions for navigating tasks:
- next_task(task): Get the next task in the task list.
- prev_task(task): Get the previous task in the task list.
- for_each_process(task): Iterate over all processes.
- for_each_thread(task, thread): Iterate over each thread in a process.
… and many more.
NOTE: It is expensive to traverse the entire task list. Avoid doing it unless absolutely necessary.

References

Linux Kernel Development (3rd edition) by Robert Love.
