Interior mutability patterns
Rust's type system requires that there is only ever one mutable reference to a value, or any number of shared references. What happens when you need multiple references to some value, but also need to mutate through them? We use a trick called interior mutability: to the outside world you act like the value is immutable, so multiple references are allowed. But internally the type is actually mutable.
All types that provide interior mutability have an UnsafeCell at their core. UnsafeCell is the only primitive that allows multiple mutable pointers to its interior without violating aliasing rules. The only way to use it safely is to only mutate the wrapped value when there are no other readers. No, the guarantee has to be even stronger: we cannot mutate it and cannot create a mutable reference to the wrapped value while there are shared references to its value.
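To make that concrete, here is a minimal sketch of my own (not code from the standard library) of a Cell-like wrapper around UnsafeCell. It is only sound because it never hands out references to the inside, previewing the first pattern below:

use std::cell::UnsafeCell;

struct MyCell<T> {
    value: UnsafeCell<T>,
}

// No `Sync` implementation: UnsafeCell is !Sync, so this stays single-threaded.
impl<T: Copy> MyCell<T> {
    fn new(value: T) -> Self {
        MyCell { value: UnsafeCell::new(value) }
    }

    fn set(&self, value: T) {
        // Sound only because no reference to the interior ever escapes,
        // and a single thread cannot call set() and get() at the same time.
        unsafe { *self.value.get() = value };
    }

    fn get(&self) -> T {
        unsafe { *self.value.get() }
    }
}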
Both the book and the std::cell module give a good alternative explanation of interior mutability.
- What are some patterns that have been developed to use interior mutability safely?
- How do multithreaded synchronization primitives that provide interior mutability follow similar principles?
We are going to look at four patterns:
- Never handing out references
- Ensuring at runtime a mutable reference is unique
- Mutability only during initialization
- Using a separate ‘owner’ or ‘token’ to track references
Never handing out references
One way to make sure there are no shared references when mutating the wrapped value is to never hand out any references at all. The Cell type in the standard library is the prime example of this pattern.
Cell
Basic API (see documentation):
impl<T> Cell<T> {
pub fn set(&self, val: T)
pub fn replace(&self, val: T) -> T
}
impl<T: Copy> Cell<T> {
pub fn get(&self) -> T
}
impl<T: Default> Cell<T> {
pub fn take(&self) -> T
}
The idea of Cell is that when reading its contents you always either set or get the entire contents of the Cell. With a single thread there can only ever be one such operation in progress, so there is no chance of multiple mutable accesses: a safe API for interior mutability.
Before Rust 1.17 (RFC) Cell was only implemented for Copy types with the set and get methods. Crates like movecell and mitochondria demonstrated a possible extension: MoveCell. You can use Cell with non-Copy types by taking its value out when reading, and replacing it with something else.
That brings us to the flexible API above: get is the normal way to read the value of a Cell, but it is only available for Copy types. If a type can’t be copied, you can always replace it with some other value. And if the type comes with a Default, you can take out the value and replace it with the default.
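A small usage sketch of this API: get for Copy types, and replace and take to move non-Copy values in and out.

use std::cell::Cell;

fn main() {
    let counter = Cell::new(0u32);
    counter.set(counter.get() + 1);              // read and write whole values, no references
    assert_eq!(counter.get(), 1);

    let name = Cell::new(String::from("alice"));
    let old = name.replace(String::from("bob")); // move the old value out
    assert_eq!(old, "alice");

    let taken = name.take();                     // replace with String::default()
    assert_eq!(taken, "bob");
    assert_eq!(name.take(), "");
}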
Cell does not work with multiple threads. It has no way to coordinate with another thread so that set is not used while the other uses get. Enter the synchronization primitives with a Cell-like API below.
Atomics
Did you know that Rust’s atomic integers are just an UnsafeCell around an integer?
#[repr(C, align(4))]
pub struct AtomicU32 {
v: UnsafeCell<u32>,
}
With atomic operations the processor ensures any read or write will be synchronized across threads. What can be done for integers can be done for any type that is small enough to fit in an integer. Usually this means types of up to 8 bytes, and most 64-bit platforms support atomic operations on up to 16 bytes.
Multiple crates allow using an UnsafeCell with atomic operations on various small types:
- Crossbeam, with its AtomicCell<T>.
- atomic, with Atomic<T> (requires types to be Copy).
- atomig, with Atomic<T> (requires a #[derive(Atom)]).
If you look past the names of the methods on AtomicCell, notice how closely the atomic operations resemble the methods on Cell. Basic API (see documentation):
impl<T> AtomicCell<T> {
pub fn store(&self, val: T)
pub fn swap(&self, val: T) -> T
}
impl<T: Copy> AtomicCell<T> {
pub fn load(&self) -> T
}
impl<T: Default> AtomicCell<T> {
pub fn take(&self) -> T
}
impl<T: Copy + Eq> AtomicCell<T> {
pub fn compare_and_swap(&self, current: T, new: T) -> T
pub fn compare_exchange(&self, current: T, new: T) -> Result<T, T>
}
unsafe impl<T: Send> Sync for AtomicCell<T> {}
The methods get, set and replace are now load, store and swap. AtomicCell also offers compare_and_swap, one of the essential tools for concurrent algorithms. Currently this operation is questionable, because transmuting small types with padding to an atomic integer and comparing it is probably Undefined Behavior. But I am confident the maintainers will figure out a solution.
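Whichever crate provides it, the shape of a compare-and-swap loop is the same. Here is a minimal sketch using std’s AtomicU32 (the saturating_increment function and max parameter are just for illustration):

use std::sync::atomic::{AtomicU32, Ordering};

// Increment a shared counter, but never past `max`.
fn saturating_increment(counter: &AtomicU32, max: u32) {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        if current == max {
            return; // already at the maximum, nothing to do
        }
        match counter.compare_exchange(current, current + 1, Ordering::SeqCst, Ordering::Relaxed) {
            Ok(_) => return,                      // our value was stored
            Err(observed) => current = observed,  // another thread won the race; retry
        }
    }
}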
In my opinion every one of those crates has some disadvantage. Crossbeam has the potential problem with padding bytes. atomic has the same problem, and requires its types to be Copy. atomig can’t be derived for types from other crates. Still, as long as you are careful to use them on small types without padding, you are good with all of them.
Atomic operations are definitely the best concurrent solution for interior mutability on this list.
Locks
Exactly the same API as the one above for AtomicCell can be provided for larger types, by using some locking scheme.
- A SeqLock or sequence lock protects its wrapped value with a sequence counter. When writing with set the sequence counter is incremented. Any read with get will first check the sequence counter, read the value, and check the sequence counter again. If the sequence counter changed, it knows there was a data race while reading and tries again. This is technically UB in Rust and therefore a questionable locking method. The property that it does not have to do any atomic writes when just reading the value gives it very good performance. Note however that when a thread keeps writing to a seqlock, it can starve readers.
- A spinlock will set and clear a flag on every read or write. It doesn’t allow for multiple concurrent reads (a minimal sketch follows below).

atomic::Atomic<T> does not include a lock with its type, but uses a global array of 64 spinlocks. A hash of the address of the data is used as the index to get a lock. This gives a nice little space efficiency. crossbeam::AtomicCell<T> does the same thing with a set of 97 global seqlocks. For every read it will make one attempt to use a seqlock as a seqlock, and if that fails use it as a spinlock.
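As a rough illustration of the spinlock variant (my own sketch, not how any of these crates is actually structured), the flag-around-a-memcpy idea looks like this:

use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

pub struct SpinCell<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// Safe to share across threads because every access goes through the flag.
unsafe impl<T: Send> Sync for SpinCell<T> {}

impl<T: Copy> SpinCell<T> {
    pub const fn new(value: T) -> Self {
        SpinCell { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    fn with_lock<R>(&self, f: impl FnOnce(*mut T) -> R) -> R {
        // Spin until we manage to set the flag.
        while self.locked.swap(true, Ordering::Acquire) {
            std::hint::spin_loop();
        }
        let result = f(self.value.get());
        self.locked.store(false, Ordering::Release);
        result
    }

    pub fn get(&self) -> T {
        self.with_lock(|ptr| unsafe { *ptr })
    }

    pub fn set(&self, val: T) {
        self.with_lock(|ptr| unsafe { *ptr = val });
    }
}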
With a concurrent Cell-like API it is best to use it like the initial version of Cell: only with types that are Copy. The trick to move non-Copy values in and out of a Cell with replace and take requires that you can always replace the value with something else that remains useful to other threads, which seems often impossible to me. For non-Copy types, use one of the reference-based solutions below.
Note that with a Cell-like API locks are held for only a very short time, about as long as it takes to do a memcpy. So you usually don’t need to ask the OS for assistance to wait until a thread is done (like Mutex, RwLock and co. below).
There is currently no good crate that offers a standalone SeqLock (see also my previous post on writing a seqlock in Rust). The seqlock crate hands out a mutable reference while reads may be in progress, which violates the aliasing rules. I also don’t think there is a crate that offers a spinlock with the API described here. But the fallbacks for larger types in atomic::Atomic<T> and especially in crossbeam::AtomicCell<T> are pretty cool.
Ensuring at runtime a mutable reference is unique
It is entirely possible to hand out references; we just have to keep track of the number of shared and/or mutable references that are currently handed out. Smart pointers give us the tools to do just that.
We register that a reference is handed out when creating a smart pointer. The smart pointer has a custom Drop implementation that unregisters itself when it goes out of scope. With a Deref implementation it acts just like a regular shared reference. And with a DerefMut it can also be used just like a mutable reference.
RefCell
Basic API (see documentation):
impl<T: ?Sized> RefCell<T> {
pub fn borrow(&self) -> Ref<T>
pub fn borrow_mut(&self) -> RefMut<T>
}
impl<T: ?Sized> Deref for Ref<T> { /* ... */ }
impl<T: ?Sized> Drop for Ref<'_, T> { /* ... */ }
impl<T: ?Sized> DerefMut for RefMut<T> { /* ... */ }
impl<T: ?Sized> Deref for RefMut<T> { /* ... */ }
impl<T: ?Sized> Drop for RefMut<'_, T> { /* ... */ }
A RefCell makes use of a simple signed integer, borrow, to keep track of the number of borrows at runtime. Normal references are counted as positive numbers, and a mutable reference as -1. When you try to call borrow_mut while there are any open borrows, i.e. the integer is not 0, it panics: the essential ingredient that makes this a safe API.
The methods on RefCell return the Ref and RefMut types. Those have a Deref and/or DerefMut implementation, through which the contents of the RefCell can be accessed as if through a regular reference. When those ‘smart references’ are dropped, they decrement the reference count of the RefCell again.
The standard library also offers try_borrow and try_borrow_mut methods, which return an error instead of panicking the thread.
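A small usage sketch: the runtime borrow counter is what turns a second, conflicting borrow into a panic, or into an Err with the try_* variants.

use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(vec![1, 2, 3]);

    {
        let read = cell.borrow();                 // first shared borrow
        assert!(cell.try_borrow_mut().is_err());  // mutable borrow refused while `read` lives
        assert_eq!(read.len(), 3);
    }                                             // `read` dropped, borrow count back to 0

    cell.borrow_mut().push(4);                    // now a mutable borrow succeeds
    assert_eq!(cell.borrow().len(), 4);
}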
There currently seems to be Undefined Behavior in some corner cases of Ref and RefMut, because they internally convert a raw pointer to the value inside the RefCell into a normal reference too early. But we can be confident this will be fixed in the standard library soon(ish).
Mutex
Simplified API (see documentation, the actual API is a bit more complex because it allows the mutex to be poisoned):
impl<T: ?Sized> Mutex<T> {
pub fn lock(&self) -> MutexGuard<T>
}
unsafe impl<T: ?Sized + Send> Sync for Mutex<T> {}
impl<T: ?Sized> DerefMut for MutexGuard<T> { /* ... */ }
impl<T: ?Sized> Deref for MutexGuard<T> { /* ... */ }
impl<T: ?Sized> Drop for MutexGuard<'_, T> { /* ... */ }
A mutex provides mutually exclusive access to its value. In a way it is a synchronization primitive with only a limited part of RefCell’s API: all borrows are mutable. Like with RefCell, all accesses happen through a smart pointer, in this case MutexGuard.
In modern implementations a mutex atomically sets a flag when it gets locked, and unsets it when done. If another thread sees the mutex is locked while it tries to access it, it will wait. The waiting can be a spin loop (see the spin crate), but it is much better to ask the operating system for help: “notify me when it gets unlocked”. Because the fast path involves just an atomic write, this makes it a very efficient primitive.
Sadly not all implementations are modern, and some come with restrictions. For example POSIX requires the mutex to be immovable even when not locked, forcing the standard library to put every mutex in an allocation on the heap and not allowing its use in statics. A custom implementation such as the one in parking_lot fixes these issues.
A RefCell panics when you try to take two mutable borrows in the same thread. A Mutex usually does not detect this case, and will just start waiting to acquire the lock. But because it is the current thread that holds the lock, it will never wake up again (and will block any thread that tries to lock the Mutex after that).
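A small usage sketch of the std Mutex: the MutexGuard returned by lock derefs to the wrapped value and releases the lock when it goes out of scope.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                let mut guard = counter.lock().unwrap(); // blocks until the lock is free
                *guard += 1;                             // DerefMut: acts like &mut u32
            })                                           // guard dropped here, lock released
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 4);
}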
RW lock
Simplified API (see documentation, the actual API is a bit more complex because it allows the RwLock to be poisoned):
impl<T: ?Sized> RwLock<T> {
pub fn read(&self) -> RwLockReadGuard<T>
pub fn write(&self) -> RwLockWriteGuard<T>
}
unsafe impl<T: ?Sized + Send + Sync> Sync for RwLock<T> {}
impl<T: ?Sized> Deref for RwLockReadGuard<T> { /* ... */ }
impl<T: ?Sized> Drop for RwLockReadGuard<'_, T> { /* ... */ }
impl<T: ?Sized> DerefMut for RwLockWriteGuard<T> { /* ... */ }
impl<T: ?Sized> Deref for RwLockWriteGuard<T> { /* ... */ }
impl<T: ?Sized> Drop for RwLockWriteGuard<'_, T> { /* ... */ }
/* optional */
impl<T: ?Sized> RwLockWriteGuard<T> {
pub fn downgrade(self) -> RwLockReadGuard<T>
}
A reader-writer lock is the full-featured multithreaded alternative to RefCell. An RwLock will allow any number of readers to acquire the lock as long as a writer is not holding it.
There are several strategies to implement an RW lock. Imagine this case: multiple threads keep taking read locks, and one thread wants to take a write lock. Should the reads be able to block the write indefinitely? Wikipedia lists three priority policies:
- Read-preferring RW locks allow for maximum concurrency, but can lead to write-starvation if contention is high.
- Write-preferring RW locks avoid the problem of writer starvation by preventing any new readers from acquiring the lock if there is a writer queued and waiting for the lock … The downside is that write-preferring locks allows for less concurrency in the presence of writer threads.
- Unspecified priority RW locks does not provide any guarantees with regards read vs. write access.
The standard library explicitly does not specify the priority policy; it is just a thin wrapper around what the operating system provides. parking_lot documents that it offers a write-preferring RW lock. The spinlock-based implementation in spin is an example of a read-preferring RW lock.
A nice extra that parking_lot offers is a downgrade method on its RwLockWriteGuard to turn it into an RwLockReadGuard, without releasing the lock in the meantime.
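A small usage sketch with the std RwLock: multiple read guards may coexist, but a write guard requires exclusive access.

use std::sync::RwLock;

fn main() {
    let config = RwLock::new(String::from("debug=false"));

    {
        let r1 = config.read().unwrap();
        let r2 = config.read().unwrap();     // a second reader is fine
        assert_eq!(*r1, *r2);
    }                                        // read guards dropped here

    {
        let mut w = config.write().unwrap(); // exclusive: no other readers or writers
        w.push_str("\nverbose=true");
    }

    assert!(config.read().unwrap().contains("verbose"));
}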
Reentrant Mutex
Basic API (documentation):
impl<T: ?Sized> ReentrantMutex<T> {
pub fn lock(&self) -> ReentrantMutexGuard<T>
}
unsafe impl<T: ?Sized + Send> Sync for ReentrantMutex<T> {}
impl<T: ?Sized> Deref for ReentrantMutexGuard<T> { /* ... */ }
impl<T: ?Sized> Drop for ReentrantMutexGuard<'_, T> { /* ... */ }
A reentrant mutex may be locked more than once by the same thread, typically because of recursion. The smart pointer returned when locking the ReentrantMutex can’t be mutable, because it may not be unique. So it does not provide interior mutability by itself. But it does give the guarantees that allow combining it with a single-threaded solution like RefCell, because:
- it can hand out multiple shared references;
- its wrapped type does not have to be Sync, because the references are all handed out within the same thread.
Reentrant mutexes are considered a code smell, as they typically indicate you are holding a lock for too long. But there are valid uses, like Stdout, which may be locked by a thread while println! also locks it for each write.
A reentrant mutex requires a little more bookkeeping than a normal mutex. It has to keep track of the id of the thread that currently holds the lock, and of the number of times the lock is held. The standard library does not expose its implementation of ReentrantMutex (remutex does, for an old version), but parking_lot provides one.
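A sketch of the combination described above, assuming the parking_lot crate: the reentrant mutex hands out shared references, and the inner RefCell restores mutability within the single thread that holds the lock.

use std::cell::RefCell;
use parking_lot::ReentrantMutex;

fn main() {
    let shared = ReentrantMutex::new(RefCell::new(Vec::<u32>::new()));

    let guard1 = shared.lock();
    let guard2 = shared.lock();            // the same thread may lock again

    guard1.borrow_mut().push(1);           // mutate through the RefCell
    assert_eq!(guard2.borrow().len(), 1);
}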
Mutability only during initialization
There is a common pattern where you need to do something at runtime to initialize a variable, but can’t do so before sharing it with other threads. Statics are the prime example. lazy_static used to be the go-to solution, but multiple crates created and reinvented the OnceCell + Lazy solution, of which the once_cell crate seems to have arrived at the cleanest API.
OnceCell (unsync)
Basic API (documentation):
impl<T> OnceCell<T> {
pub fn get(&self) -> Option<&T>
pub fn set(&self, value: T) -> Result<(), T>
pub fn get_or_init<F: FnOnce() -> T>(&self, f: F) -> &T
}
impl<T, F> Lazy<T, F> {
pub const fn new(init: F) -> Lazy<T, F>
}
impl<T, F: FnOnce() -> T> Deref for Lazy<T, F>
For the single-threaded case a OnceCell can be in two states (which by the way allows a size optimization using niche filling, like Option):
- uninitialized, which allows mutating it once with set;
- initialized, which allows taking multiple shared references with get.
Any time get is called while uninitialized it will return None, and any time set is called while initialized it will return Err (with the provided value). get_or_init is the nice combination: it can always return a reference, because it will initialize the OnceCell if necessary by running the provided closure.
The key insight that makes OnceCell safe is that a mutation may really be done only once. Recursive initialization (possibly accidental) within the closure of get_or_init therefore must cause a panic.
Lazy acts like a smart pointer built on top of OnceCell. The initialization closure is provided once with new, and run on the first use of Lazy through Deref.
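A small usage sketch, assuming the once_cell crate’s unsync module:

use once_cell::unsync::{Lazy, OnceCell};

fn main() {
    let cell: OnceCell<String> = OnceCell::new();
    assert!(cell.get().is_none());                    // still uninitialized
    assert!(cell.set("hello".to_string()).is_ok());   // the first set succeeds
    assert!(cell.set("again".to_string()).is_err());  // a second set is refused
    assert_eq!(cell.get().map(String::as_str), Some("hello"));

    // Lazy wraps the same idea: the closure runs on the first deref only.
    let lazy: Lazy<Vec<u32>> = Lazy::new(|| (1..=3).collect());
    assert_eq!(lazy.len(), 3);
}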
mutate_once provides a variation based on the same principles that can hand out a mutable smart pointer first. But as soon as one normal reference is returned, it can never be mutated again.
OnceCell (sync)
Basic API (documentation):
impl<T> OnceCell<T> {
pub fn get(&self) -> Option<&T>
pub fn set(&self, value: T) -> Result<(), T>
pub fn get_or_init<F: FnOnce() -> T>(&self, f: F) -> &T
}
unsafe impl<T: Send + Sync> Sync for OnceCell<T> {}
impl<T, F> Lazy<T, F> {
pub const fn new(init: F) -> Lazy<T, F>
}
impl<T, F: FnOnce() -> T> Deref for Lazy<T, F>
unsafe impl<T, F: Send> Sync for Lazy<T, F> where OnceCell<T>: Sync {}
The multi-threaded variant of OnceCell is almost the same. The state has to be kept in an atomic, and there is a third state: other threads may observe that one thread is running the initialization closure. At that point they should wait, either in a spin loop but preferably with assistance from the OS, just like Mutex.
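A sketch of the “initialize a static at runtime” use case mentioned earlier, assuming the once_cell crate (the APP_CONFIG environment variable is just an example):

use once_cell::sync::OnceCell;

static CONFIG: OnceCell<String> = OnceCell::new();

fn config() -> &'static str {
    // The first caller runs the closure; later callers (from any thread)
    // get a reference to the already-initialized value.
    CONFIG.get_or_init(|| std::env::var("APP_CONFIG").unwrap_or_default()).as_str()
}

fn main() {
    println!("config: {}", config());
}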
Using a separate ‘owner’ or ‘token’ to track references
A somewhat experimental concept is to bind a Cell to an ‘owner’ or ‘token’ type. If you have a shared reference to the owner type, you can create a shared reference to the inside of the bound Cell. And if you have a mutable reference to the owner type, you can also get a mutable one to the inside of the Cell.
The owner can be shared freely between threads and passed through channels. I suppose that if you already need to do some communication through channels, this is a nice solution to avoid also taking a mutex (or passing large amounts of data through the channel). Still, when and how to use this pattern is not really clear to me, so this description may not do it justice.
The qcell crate explores this technique (see also the initial post on Reddit and the announcement on The Rust Programming Language Forum).
A problem is how to conjure up a unique owner. One option is to just use a unique integer (QCell). Another is to use your own zero-sized marker type (TCell). And a third is to use the unique lifetime of a closure (LCell).
QCell
Basic API of QCell:
impl QCellOwner {
pub fn new() -> Self
pub fn cell<T>(&self, value: T) -> QCell<T>
pub fn ro<'a, T>(&'a self, qc: &'a QCell<T>) -> &'a T
pub fn rw<'a, T>(&'a mut self, qc: &'a QCell<T>) -> &'a mut T
pub fn rw2<'a, T, U>(
&'a mut self,
qc1: &'a QCell<T>,
qc2: &'a QCell<U>
) -> (&'a mut T, &'a mut U)
pub fn rw3<'a, T, U, V>(
&'a mut self,
qc1: &'a QCell<T>,
qc2: &'a QCell<U>,
qc3: &'a QCell<V>
) -> (&'a mut T, &'a mut U, &'a mut V)
}
impl<T> QCell<T> {
pub const fn new(owner: &QCellOwner, value: T) -> QCell<T>
}
impl<T: Send + Sync> Sync for QCell<T>
A new QCell needs a reference to a QCellOwner when it is created, by either QCellOwner::cell or QCell::new. One ugly part of the API is that the methods to take a reference to the QCell, ro, and to take a mutable reference, rw, live on the QCellOwner.
While one cell is borrowed mutably, it is as if the owner is mutably borrowed. No other cells bound to the same QCellOwner can be borrowed at the same time. That is why, if you need multiple references with at least one mutable, you need to take them all at once with rw2 or rw3.
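A small usage sketch, assuming the qcell crate:

use qcell::{QCell, QCellOwner};

fn main() {
    let mut owner = QCellOwner::new();

    // Cells are bound to the owner at creation time.
    let a = owner.cell(1u32);
    let b = owner.cell(2u32);

    // A shared borrow of the owner gives shared access to its cells...
    assert_eq!(*owner.ro(&a) + *owner.ro(&b), 3);

    // ...and a mutable borrow gives mutable access, to two cells at once with rw2.
    let (a_mut, b_mut) = owner.rw2(&a, &b);
    std::mem::swap(a_mut, b_mut);
    assert_eq!(*owner.ro(&a), 2);
}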
TCell and LCell
The main difference between QCell, TCell and LCell is how the owner is constructed. For QCell it is as simple as QCellOwner::new().
For TCell you need to provide a marker type that must be unique within the process, which is double-checked at runtime (there also is TLCell, which must be unique per thread):
struct Marker1;
let owner1 = TCellOwner::<Marker1>::new();
An LCellOwner can’t simply be constructed. You have to run the code that uses it in a closure, where the owner is an argument to the closure. GhostCell initially explored this approach. The lifetime of the closure is the id of the owner:
LCellOwner::scope(|mut owner| {
/* use `owner` */
});
QCell is the most flexible. TCell and LCell move more checks to compile time, but require more annotations or a certain code structure:
Cell | Owner ID | Cell overhead | Owner check | Owner creation check
---|---|---|---|---
QCell | integer | u32 | Runtime | Runtime
TCell or TLCell | marker type | none | Compile-time | Runtime
LCell | lifetime | none | Compile-time | Compile-time
Overview
You made it to the end ;-). How do the various APIs compare?
Cell-like, allowing no references inside:

type | requirements | Sync? | restrictions
---|---|---|---
Cell<T> | Copy | no | value must be copied or moved
Cell<T> | (none) | no | value must be moved with replace or take
Atomic<T> * | Send, integer-sized | yes | like Cell
AtomicCell<T> * | Send | yes | like Cell
* Note: I used Atomic<T> to describe atomic operations, and AtomicCell<T> to describe a lock-based fallback.
RefCell-like, ensuring at runtime there is only one &mut reference:

type | requirements | Sync? | & | &mut | restrictions
---|---|---|---|---|---
RefCell<T> | (none) | no | yes | yes | may panic
Mutex<T> | Send | yes | no | yes | may block or deadlock
RwLock<T> | Send + Sync | yes | yes | yes | may block or deadlock
ReentrantMutex<RefCell<T>> | Send | yes | yes | yes | may block or panic
OnceCell, only mutable during initialization:

type | requirements | Sync? | & | restrictions
---|---|---|---|---
unsync::OnceCell<T> | (none) | no | yes | only mutable once
sync::OnceCell<T> | Send + Sync | yes | yes | may block, only mutable once
This post concentrated on the possible API choices, and only touched upon the choices that can be made for the synchronization primitives.
Do you know of any other patterns? Or found a mistake? Please let me know.