Mutexes

Caches, Threads, and Mutexes

Consider the function:

#include <pthread.h>

/* Shared counter and the mutex that protects it. */
static long ctr = 0;
static pthread_mutex_t ctr_lock = PTHREAD_MUTEX_INITIALIZER;

void *thread_handler(void *_) {

  pthread_mutex_lock(&ctr_lock);   /* blocks if another thread holds the lock */

  for (int i = 0; i < 100000; i++) {
    ctr++;
  }

  pthread_mutex_unlock(&ctr_lock);

  return NULL;
}
  • Here, when a thread cannot acquire a lock, it is typically put to sleep (blocked), and the CPU is free to run other threads or processes.

  • Caches are not owned by threads. They are:

    • Per-core (L1, usually L2)
    • Shared across cores (L3)

    So threads do not have their own cache.
  • What happens when a thread sleeps on a mutex?

    1. Thread blocks (goes to sleep):
      • Kernel removes it from the run queue
      • CPU core starts running some other thread
      • That other thread:
        • Uses registers
        • Uses L1/L2 cache
        • Brings its own data into cache
    2. Over time, cache lines get evicted
      • New thread’s data replaces old cache lines
      • Original thread’s working set gradually disappears
    3. When the thread wakes up, there are two possibilities:
      1. Thread wakes on same CPU core
        • Some cache lines might still be present
        • Likely many are gone
        • Partial cache misses
      2. Thread wakes on different CPU core (very common)
        • L1/L2 cache is completely different
        • Almost guaranteed cold cache

          A cold cache means the CPU cache contains little or none of the data that the currently running code needs, so the first accesses will miss the cache and fetch data from slower memory levels.

        • Data must be fetched from:
          • L3 cache, or
          • Main memory
        • This is called loss of cache affinity.
  • Why is this important?

    • It explains why lock-free \(\neq\) faster.
    • It explains why per-core data structures scale better.
    • It explains why “just add threads” often slows programs down.
    • It explains why fine-grained locking is critical.