
Process Management

Running user applications is the reason to have an OS, and process management is a crucial part of any kernel.

The Process

  • A program (object code stored on some media) in execution is a process.

  • A process also includes a set of resources such as open files and pending signals, internal kernel data, processor state, a memory address space, one or more threads and a data section for global variables.

    More details
    • Open files: Each process can have files open for reading, writing, or both. The operating system tracks these files using file descriptors, allowing the process to interact with files, devices, or sockets.

    • Pending signals: Signals are asynchronous notifications sent to a process to notify it of events (like interrupts or exceptions). Pending signals are those that have been sent but not yet handled by the process.

    • Internal kernel data: The kernel maintains various data structures for each process, such as process control blocks (PCB), scheduling information, and security credentials, which help manage and track the process.

    • Processor state: This includes the values of CPU registers, program counter, stack pointer, and other hardware-specific information. The processor state is saved and restored during context switches.

    • Memory address space: Each process has its own virtual memory space, which includes code, data, stack, and heap segments. This isolation ensures processes do not interfere with each other’s memory.

    • Threads: A process may contain one or more threads, which are independent sequences of execution within the same address space. Threads share resources but have separate execution contexts.

    • Data section for global variables: This section of memory stores global and static variables used by the process. It is initialized when the process starts and persists throughout its lifetime.

  • Threads

    • A thread is essentially an execution context within a process.
    • It shares the same address space (code, data, heap) and other resources such as open files with the other threads in that process, but each thread must still be able to run independently.
    • Each thread has its own stack.
    • Kernel schedules individual threads, not processes.
    • In Linux, a thread is just a special kind of process. Both are represented by the same data structure in the kernel: task_struct.
    • CPU cores and threads:
      • The number of CPU cores determines how many threads can run at the same instant in parallel, but not how many threads you can have total.
      • You can create thousands of threads on a 4-core CPU. But only 4 threads can physically execute at once.
      • Thread limitations:
        • Memory per thread stack: each thread typically reserves 8 MB of virtual address space for its stack by default (the ulimit -s value; adjustable per thread with pthread_attr_setstacksize).
        • OS resource limits: Linux caps the number of threads per user (ulimit -u) and system-wide (/proc/sys/kernel/threads-max).
        • Scheduler overhead: More threads mean more context-switching, which can hurt performance.
      • So high-performance systems often use thread pools or async I/O instead of spawning thousands of threads.
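The thread properties listed above (shared address space, one stack per thread) can be observed from user space. The following is a minimal sketch using POSIX threads; the names `run_workers`, `stacks_are_distinct`, `NTHREADS`, and `NITERS` are made up for this illustration and are not part of any API.

```c
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 4      /* hypothetical thread count for this demo */
#define NITERS   1000   /* hypothetical increments per thread */

static long shared_counter;                 /* one copy, visible to every thread */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static uintptr_t stack_addr[NTHREADS];      /* address of a local in each thread */

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    int local = 0;                          /* lives on this thread's own stack */

    stack_addr[id] = (uintptr_t)&local;
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&lock);
        shared_counter++;                   /* same variable in every thread */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs NTHREADS workers and returns the final value of the shared counter. */
long run_workers(void)
{
    pthread_t tid[NTHREADS];

    shared_counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return shared_counter;
}

/* Returns 1 if every worker saw its local variable at a different address. */
int stacks_are_distinct(void)
{
    for (int i = 0; i < NTHREADS; i++)
        for (int j = i + 1; j < NTHREADS; j++)
            if (stack_addr[i] == stack_addr[j])
                return 0;
    return 1;
}
```

Calling `run_workers()` and then `stacks_are_distinct()` from a `main` function (built with `-pthread`) should show that all increments land in the single shared counter while each thread's local variable sits at its own address.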
  • Virtualizations

    • Processes provide 2 virtualizations:
      • Virtualized processor: which gives the process the illusion that it alone is running on the processor.
      • Virtualized memory: which gives the process the illusion that it alone is using the entire memory.
    • Threads share the virtual memory abstraction, whereas each receives its own virtualized processor.
      • Threads share the same virtual address space within a process. It means, each thread sees the same heap, global variables, and code.
      • Each receives its own virtual CPU: its own set of CPU registers, program counter, and stack, which gives each thread the illusion that it has its own processor core.
        • In reality, the kernel scheduler rapidly switches which thread’s state (registers, PC, stack) is loaded into the physical CPU. This is context switching.
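To contrast with threads, the memory virtualization that a process gets can be sketched with fork(): a write made by the child does not show up in the parent, because each process has its own view of memory. This is an illustrative user-space sketch; the function name `parent_value_after_child_write` is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int value = 1;   /* global variable in the data section */

/*
 * Forks a child that overwrites `value`, waits for it, and returns the
 * parent's copy. Returns -1 if fork() or waitpid() fails.
 */
int parent_value_after_child_write(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {          /* child: works on its own copy of the address space */
        value = 99;
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return value;            /* the parent's copy is untouched */
}
```

Had the write happened in a second thread instead of a child process, the parent would have seen the new value, since threads share one address space.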
  • fork() system call

    • This is how you create new processes in Linux, by duplicating an existing one.
    • The process that calls the fork() is the parent, whereas the new process is the child.
    • After fork(), the parent resumes execution and the child starts executing at the same point: where the call to fork() returns.
    • fork() therefore returns from the kernel twice, once in the parent and once in the child:
      • In the parent, fork() returns the child’s PID (> 0).
      • In the child, fork() returns 0.
    • Sometimes, after a fork it is desirable to execute a new, different program. The exec() family of function calls creates a new address space and loads a new program into it.
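The fork-then-exec pattern described above might look like the sketch below. It uses `execlp()` from the exec family so that the program is looked up in PATH. The helper name `run_program` and the choice of 127 as the "exec failed" status are assumptions made for this example (127 mirrors the convention shells use).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs `prog` (looked up in PATH) in a child process and returns its exit
 * status, or -1 on fork/wait failure or abnormal termination.
 * A status of 127 means exec itself failed (convention chosen for this sketch).
 */
int run_program(const char *prog)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: fork() returned 0. Replace this program image with `prog`. */
        execlp(prog, prog, (char *)NULL);
        _exit(127);          /* reached only if exec failed */
    }
    /* Parent: fork() returned the child's PID. Wait for it to finish. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On a typical Linux system, `run_program("true")` should return 0 and `run_program("false")` a non-zero status.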
  • exit() and wait4() system call

    • A program exits via the exit() system call, which terminates the process and frees most of its resources. The process descriptor is kept until the parent collects the exit status.
    • The parent process can inquire about the status of a terminated child via wait4(), which enables a process to wait for termination of a child process and retrieve its exit status.
    • When a child process terminates, it becomes a zombie until the parent calls wait4() using C library functions like wait() or waitpid().

Process Descriptor and Task Structure

  • The kernel stores the list of processes in a circular doubly linked list called the task list.

  • Each element in the list is a process descriptor of the type struct task_struct, defined in <linux/sched.h>. It contains all the info about a specific process.

  • The size of task_struct on my machine at the time of writing is:

    sudo cat /sys/kernel/slab/task_struct/object_size
    6872
    

    That is 6872 bytes, or about 6.71 KiB (6872 / 1024).

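    NOTE: Linux uses a specialized memory caching system called the slab allocator. task_struct objects are allocated from a slab cache, which is why their size is visible under /sys/kernel/slab.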

  • task_struct is also known as the process descriptor.

    NOTE: sched is short for scheduler or scheduling.
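To make the circular doubly linked task list concrete, here is a small user-space model of it. `struct list_head` and `container_of` mirror the kernel's names, but the definitions below are simplified stand-ins written for this sketch, and `struct fake_task` and `sum_pids` are hypothetical.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's list_head and task_struct. */
struct list_head {
    struct list_head *next, *prev;
};

struct fake_task {
    int pid;
    struct list_head tasks;   /* links this task into the circular list */
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

/* Insert `node` just before `head`, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/*
 * Builds a list anchored at an init-like task (pid 0), adds three tasks, and
 * walks it by starting after the anchor and stopping on wrap-around.
 * Returns the sum of the visited PIDs.
 */
int sum_pids(void)
{
    struct fake_task init = { .pid = 0 };
    struct fake_task a = { .pid = 10 }, b = { .pid = 20 }, c = { .pid = 30 };
    int sum = 0;

    list_init(&init.tasks);
    list_add_tail(&a.tasks, &init.tasks);
    list_add_tail(&b.tasks, &init.tasks);
    list_add_tail(&c.tasks, &init.tasks);

    for (struct list_head *p = init.tasks.next; p != &init.tasks; p = p->next)
        sum += container_of(p, struct fake_task, tasks)->pid;
    return sum;
}
```

The list node is embedded in the task rather than the other way round, which is why a `container_of`-style step is needed to get from a node back to its task.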

Allocating the Process Descriptor

Evolution:

  1. Pre-2.6 kernel series:
    • struct task_struct was stored at the end of the kernel stack of each process.
    • Architectures with few registers, such as x86, could compute the location of the process descriptor from the stack pointer without dedicating a general-purpose register to it.
  2. 2.6 through early 4.x:
    • thread_info was introduced and kept at the bottom of the kernel stack.
    • thread_info contained a pointer to the task_struct.
  3. Modern kernels (from around 4.9, depending on the architecture):
    • thread_info is embedded inside task_struct.
    • The current task pointer is reached through a register instead of the stack:
      • x86_64: the GS segment base points to the per-CPU area, and the current task pointer is a per-CPU variable read relative to %gs.
      • ARM64: the current task pointer is kept in SP_EL0 while the CPU runs kernel code; TPIDR_EL1 holds the per-CPU offset.
Table summary

| Era | Kernel versions | Design | How the kernel finds current | Key reason |
|---|---|---|---|---|
| 1. Pre-2.6 | 2.4 and earlier | task_struct physically stored at the end (or beginning) of each process’s kernel stack | Mask the stack pointer to find the base address, then cast to task_struct * | Save memory and registers |
| 2. 2.6 – 3.x | Early 2000s | task_struct moved to slab allocator; small thread_info placed at base of stack (contains pointer to task_struct) | Mask stack pointer, get thread_info, dereference ->task | Needed flexibility + reuse (via slab) while keeping fast lookup |
| 3. Modern kernels (4.x, 5.x, 6.x) | Today | thread_info embedded inside task_struct; stack allocated separately; the current task pointer is held in per-CPU data or a dedicated register | Read a %gs-relative per-CPU variable on x86-64; read SP_EL0 on ARM64 | Security, portability, extensibility |
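The "mask the stack pointer" lookup from the older designs can be modeled in user space. The sketch below assumes an 8 KiB stack aligned to its own size; `THREAD_SIZE` echoes the kernel's constant name, but the value, the `fake_*` structures, and the function names are assumptions chosen for this illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192u   /* assumed kernel stack size: two 4 KiB pages */

struct fake_task { int pid; };

/* Era 2 layout: a small thread_info sits at the lowest address of the stack. */
struct fake_thread_info {
    struct fake_task *task;
};

/* Round any address inside the stack down to the stack's base. */
static struct fake_thread_info *thread_info_from_sp(uintptr_t sp)
{
    return (struct fake_thread_info *)(sp & ~(uintptr_t)(THREAD_SIZE - 1));
}

/*
 * Allocates a THREAD_SIZE-aligned "kernel stack", plants a thread_info at its
 * base, then recovers the task from a stack pointer somewhere in the middle.
 * Returns the recovered PID, or -1 if allocation fails.
 */
int pid_from_stack_pointer(int pid)
{
    struct fake_task task = { .pid = pid };
    unsigned char *stack = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
    struct fake_thread_info *ti;
    int found;

    if (!stack)
        return -1;
    ti = (struct fake_thread_info *)stack;
    ti->task = &task;

    /* Pretend the stack pointer currently sits 5000 bytes above the base. */
    found = thread_info_from_sp((uintptr_t)(stack + 5000))->task->pid;
    free(stack);
    return found;
}
```

The trick only works because the stack is aligned to its size, so clearing the low bits of any address inside it always yields the base.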

Storing the Process Descriptor

  • System identifies each process by a unique integer called the process ID (PID).
  • Evolution of PIDs:
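    • A PID is an int in the kernel code, but the default maximum is 32,768, for compatibility with older Unix systems that stored PIDs in a short int.
    • On 64-bit systems the limit can be raised to 4,194,304 (2^22), so PIDs fit in 22 bits and the largest possible PID is 4,194,303.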
    • The maximum PID can be viewed or modified via /proc/sys/kernel/pid_max.
  • The kernel supports PID namespaces, so processes in different namespaces can have the same PID (useful for containers).
  • The kernel keeps a lookup structure that maps PIDs to their corresponding task_struct. Older kernels used a hash table for this; recent kernels (since about 4.15) use an IDR, a radix-tree-based ID allocator.
  • Each task_struct contains a pid field storing the process’s PID.
  • Tasks are typically referenced directly by a pointer to their task_struct, so it is very useful to be able to look up the process descriptor of the currently executing task quickly. This is done with the current macro.
  • Under the hood, the current macro is implemented separately for each architecture. You can find the source code at:
    arch/x86/include/asm/current.h
    arch/arm64/include/asm/current.h
    arch/riscv/include/asm/current.h
    arch/powerpc/include/asm/current.h
    
  • On modern kernels, accessing the current task descriptor is just a matter of using the current macro.

    Note: thread_info no longer carries a task pointer to follow from current; that design was changed in modern kernels.
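The PID limit mentioned above can be read from user space. This is a small sketch that assumes `/proc` is mounted; the function names `read_pid_max` and `own_pid_within_limit` are hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the value in /proc/sys/kernel/pid_max, or -1 if it cannot be read. */
long read_pid_max(void)
{
    long max = -1;
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &max) != 1)
        max = -1;
    fclose(f);
    return max;
}

/* Returns 1 if this process's PID is below pid_max, 0 if not, -1 if unknown. */
int own_pid_within_limit(void)
{
    long max = read_pid_max();

    if (max < 0)
        return -1;
    return (long)getpid() < max;   /* PIDs are allocated strictly below pid_max */
}
```

The value returned by `read_pid_max()` depends on the system configuration, so no particular number should be expected.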

Process States

The __state field in task_struct (named state before kernel 5.14) indicates the current state of the process:

  • Main states are:
    • TASK_RUNNING: Process is either running or ready to run. Only possible state for a process executing in user-space.
    • TASK_INTERRUPTIBLE: Process is sleeping but can be awakened by signals.
    • TASK_UNINTERRUPTIBLE: Process is sleeping and cannot be awakened by signals.
    • __TASK_TRACED: Process is being traced (e.g., by a debugger).
    • __TASK_STOPPED: Process is stopped (e.g., via SIGSTOP).
  • There are more states; we’ll cover them later.
  • Never modify the __state field directly. Always use the provided kernel functions/macros to change process states.
  • Some of the available helpers:

    | API | Exists now? | Barrier? | Typical use |
    |---|---|---|---|
    | set_current_state(state) | Yes | Yes | Before sleeping |
    | __set_current_state(state) | Yes | No | When safe (after wake) |
    | set_task_state(task, state) | No (removed around 4.11) | Yes | Formerly scheduler / freezer internals |
    | Direct assignment task->__state = ... | Avoid | N/A | Only very low-level code |
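From user space, a task's state is exposed as a single letter in `/proc/<pid>/stat`. The sketch below maps those letters to the states listed above and reads the caller's own state, assuming `/proc` is mounted. The function names `state_name` and `own_state_letter` are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Maps a /proc state letter to the kernel state name used in these notes. */
const char *state_name(char letter)
{
    switch (letter) {
    case 'R': return "TASK_RUNNING";
    case 'S': return "TASK_INTERRUPTIBLE";
    case 'D': return "TASK_UNINTERRUPTIBLE";
    case 'T': return "__TASK_STOPPED";
    case 't': return "__TASK_TRACED";
    default:  return "other";
    }
}

/* Returns the caller's state letter, or '?' if /proc/self/stat is unreadable. */
char own_state_letter(void)
{
    char buf[512];
    char *p;
    FILE *f = fopen("/proc/self/stat", "r");

    if (!f)
        return '?';
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return '?';
    }
    fclose(f);
    p = strrchr(buf, ')');          /* the state follows "pid (comm) " */
    return (p && p[1] == ' ' && p[2]) ? p[2] : '?';
}
```

A process that reads its own entry is on a CPU at that moment, so it should see `R`, matching the note that TASK_RUNNING is the only state possible while executing.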

Process Context

  • Program code: loaded from an executable file into the process’s address space.

  • User space: normal program execution happens here (unprivileged mode).

  • Kernel space entered when:

    • a system call is made, or
    • an exception/fault occurs.
  • When the kernel runs because a process made a system call or triggered an exception, it is “executing on behalf of the process.”

    • This is called process context.
    • The current macro is valid (points to that task’s task_struct).
  • On returning from the kernel:

    • process continues in user space, unless the scheduler picks a higher-priority task.
  • Interrupt context:

    • triggered by hardware/soft interrupts, not tied to any process.
    • current still points to the interrupted task but is not meaningful here.
    • No sleeping or scheduling allowed.
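A system call is the simplest way to enter process context on purpose. The sketch below invokes getpid through the generic `syscall()` wrapper; while the kernel serves the call it runs on behalf of the calling process and answers from that process's own descriptor. The function name `raw_getpid` is hypothetical.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Enters the kernel via the getpid system call, bypassing the dedicated libc
 * wrapper. The kernel handles it in process context for the caller.
 */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

The result should match what the usual `getpid()` wrapper returns, since both end up in the same system call.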

Process Family Tree

  • All user-space processes are descendants of the init process (PID 1).

  • Kernel starts the init process during the last stage of booting.

  • init reads the init scripts and starts other processes.

  • The init task’s process descriptor is statically allocated as init_task in init/init_task.c.

  • Each process has a parent process (except init) and may have multiple child processes.

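  • Because the task list is a circular doubly linked list, any task can be reached from any other by walking it. A process’s parent and children, however, are reached through dedicated fields rather than through the task list.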

  • The parent-child relationship is maintained via parent and children fields in task_struct.

    More details

    Each process has:

    • children: head of a linked list containing all its child processes.
    • sibling: entry in the parent’s children list.
    • parent: pointer to its immediate parent process (task_struct).

  • Some useful Macros/functions to navigate:

    • next_task(task): Get the next task in the task list.
    • prev_task(task): Get the previous task in the task list.
    • for_each_process(task): Iterate over all processes.
    • for_each_thread(task, thread): Iterate over each thread in a process.

    … and many more.

    NOTE: It is expensive to traverse the entire task list. Avoid doing it unless absolutely necessary.
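The family tree can also be followed from user space by reading the parent PID field of `/proc/<pid>/stat` repeatedly, which is a rough analogue of following `parent` pointers inside the kernel. This sketch assumes `/proc` is mounted; `parent_of`, `depth_below_init`, and the 4096-hop safety cap are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the parent PID recorded in /proc/<pid>/stat, or -1 on failure. */
static pid_t parent_of(pid_t pid)
{
    char path[64], buf[512];
    char state;
    int ppid;
    char *p;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    p = strrchr(buf, ')');          /* fields after "(comm)": state, ppid, ... */
    if (!p || sscanf(p + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return (pid_t)ppid;
}

/*
 * Follows parent links upward from `pid` and returns the number of hops needed
 * to reach PID 1 (or PID 0, which is what is reported for a parent outside the
 * current PID namespace). Returns -1 if /proc cannot be read.
 */
int depth_below_init(pid_t pid)
{
    int hops = 0;

    while (pid > 1 && hops < 4096) {    /* cap is a safety net for this sketch */
        pid = parent_of(pid);
        if (pid < 0)
            return -1;
        hops++;
    }
    return hops;
}
```

Calling `depth_below_init(getpid())` gives the caller's depth in the tree; the exact number depends on how the program was launched.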
