CSE221 - lec15: Scalability: RCU & Analysis of Linux Scalability

date: Nov 24, 2024
slug: cse221-lec15
status: Published
tags: System
type: Post

Multi-core CPUs

  • early 2000s: CPUs start shipping with 2 cores
  • scalability goal: performance increases linearly with the number of cores
  • initially: one big lock on the kernel
  • better approach: finer granularity locks on different subsystems

RCU Goals

  • concurrent reads, even during updates
  • low overhead: computation and storage overhead should be low
  • deterministic completion time (for example some real-time applications will need this)

Example: Linked-List Operations

  • how to enforce concurrency control?

Approach #1: Spin Lock

  • access is mutually exclusive: one thread at a time, readers included

Approach #2: Read-Write Lock

  • multiple readers or 1 writer
  • benefit: reads scale across cores
  • drawbacks:
    • cannot read while a write is in progress
    • space overhead (an integer per lock)
    • overhead even with only readers (readers serialize on atomic updates to the lock's counter)
Thought experiment: what if readers skip the lock?
  • concurrent writer: readers may see intermediate states
  • no writer: everything is fine

Read-Copy-Update (RCU)

  • RCU is a scalable synchronization mechanism

RCU #1: Writers make a copy

  • readers skip the lock, so they must be able to make progress even while an update is in flight
  • the writer builds the new version privately, off to the side, and switches the data structure from old to new with a single pointer assignment; the live structure is never in an intermediate state, so readers see either the old version or the new one
  • Problem: reordering of instructions (by compilers, or by modern processors at runtime via out-of-order execution) can make the publish visible before the copy is fully built

RCU #2: memory barriers to enforce ordering

  • Problem: use after free
    • readers may hold pointers to freed space
    • e.g., a reader fetches a pointer, the writer updates and frees the old node, and the reader dereferences the stale pointer again

RCU #3: grace period

  • invariant enforced: readers may not hold an RCU-protected pointer across a context switch (readers mark their usage by enclosing it in an RCU critical section)
  • writers delay the free until every CPU has context-switched at least once (tracked via per-CPU state variables)
  • implementation: disable preemption inside the RCU critical section; writers wait out the grace period

Linux RCU API

Reader

  • rcu_read_lock: enter the RCU critical section (classically implemented by disabling preemption)
  • rcu_read_unlock: leave the RCU critical section (re-enable preemption)
  • rcu_dereference(ptr): read an RCU-protected pointer with the memory barriers the reader side needs

Writer

  • synchronize_rcu: block until a grace period elapses (all CPUs have context-switched) - typically waits ~ms
  • call_rcu(head, callback): asynchronous alternative to synchronize_rcu; the callback runs after the grace period
  • rcu_assign_pointer(ptr_addr, ptr): publish an RCU-protected pointer with the memory barriers the writer side needs

RCU’s benefits over RW Lock

  • Readers: under RCU they can always read, at almost no cost
  • Writer: under RCU it can take longer (it must wait out the grace period)

Drawbacks of RCU

  • best when reads vastly outnumber writes
  • complex logic
  • doesn’t help writers, especially under concurrent writers (they still need mutual exclusion among themselves)
  • has to copy data structures
  • some data structures cannot use it (e.g., applications that need stronger consistency semantics)
  • readers can see stale data
  • code has to drop RCU references across yield / sleep
  • relies on frequent context switches

Summary

  • multiple CPUs: need scalable synchronization mechanism
  • RCU: readers don’t wait
  • widely used in Linux

Analysis of Linux Scalability

Sources of scalability bottleneck

  • Applications
  • OS
  • Hardware

Analysis Approach

  • a set of apps: applications that previously revealed bottlenecks, plus parallel, kernel-intensive apps
  • find and fix each bottleneck
  • iterate until the OS is no longer the bottleneck

Cache Coherency


Scalability Bottleneck

  • locks on shared data structures
  • writes to shared variables (cache-coherence traffic)
  • contention for cache and memory bandwidth
  • insufficient concurrency / parallelism in the application

Techniques for improving scalability

  • avoid lock contention & data contention (via per-core data structures)
  • sloppy counters: one logical counter split into a shared central counter plus per-core counts of spare references; true value = value(central) - sum(per-core counter values)
  • wait-free synchronization: e.g. RCU
  • avoid false sharing: logically independent data that share a cache line contend physically, since a write by one core invalidates the line on all others

Summary

  • general techniques for scalability
  • Linux is quite scalable; there may be no need to design a new kernel architecture just for scalability

© Lifan Sun 2023 - 2025