Heterogenous object graphs in Rust

Published: Tue 21 July 2020

In posts.

We are beginning to write new Shadow code in Rust instead of C. Before we get too far, we need to work out how to make Shadow's data structures play along with Rust's ownership models.

Conceptually, Shadow models many Hosts, each with multiple Processes, each with multiple Threads. These objects often need to access each-other; and sometimes need to mutate each-other. Ownership is mostly a tree, but occasionally references need to be stored outside the tree; e.g. in event objects that are stored in scheduler queues. Because of this, most objects have reference counts that are maintained manually.

Most of the simulation works without locking - safety is maintained by manually ensuring that a Host is the root of a partitioned graph of an objects, such that handing a Host over to a worker threads doesn't leave any pointers behind that might be accessed by some other thread.

How can we model these objects and their relationships in Rust?

Using raw pointers

Let's start by directly translating the existing structure. We'll start with just sketching out Host:

In [2]:
mod literal_translation {
    use std::collections::HashMap;

    struct Host {
        ref_count : i32,
        processes : HashMap<i32, *mut Process>,
    }

    fn host_new() -> *mut Host {
        // Allocate a pointer on the heap, and then convert to a raw pointer.
        Box::into_raw(Box::new(Host{ref_count: 1, processes: HashMap::new()}))
    }

    unsafe fn host_inc(hostp : *mut Host) {
        let host : &mut Host;
        host = &mut *hostp;
        host.ref_count += 1;
    }

    unsafe fn host_dec(hostp : *mut Host) {
        let host : &mut Host;
        host = &mut *hostp;
        host.ref_count -= 1;
        if host.ref_count == 0 {
            // Transfer ownership into the Box object and let it get dropped.
            Box::from_raw(hostp);
        }
    }

    struct Process {}
}

Using Rc (reference counted objects)

A seemingly obvious improvement is to use Rust's reference counting instead of doing it manually. We don't want to introduce synchronization here, so we'll start with Rc. While we're at it, let's replace host_new with a more idiomatic Host::new:

In [3]:
mod using_rc {
    use std::rc::Rc;
    use std::collections::HashMap;

    struct Host {
        // We now store ref-counted Process objects instead of raw pointers.
        processes : HashMap<i32, Rc<Process>>,
    }

    impl Host {
        // Our idiomatic constructor returns a literal Host objects; it's up to the caller
        // to move that into an Rc<Host>.
        fn new() -> Host {
            Host{processes: HashMap::new()}
        }
    }
    
    // Increment and decrement no longer needed!
    
   struct Process {}
}

Storing references

Unfortunately things get complicated when a Process needs to access the Host to which it belongs. Let's try the simplest thing that might work: storing a reference. We have to do add some lifetime specifiers as a result, effectively telling Rust that the Host must outlive the Process, which seems reasonable enough. We can get the structure definitions and Process::new to compile:

In [3]:
mod ref_to_parent {
    use std::rc::Rc;
    use std::collections::HashMap;

    struct Host<'a> {
        processes : HashMap<i32, Rc<Process<'a>>>,
    }

    struct Process<'a> {
        host: &'a Host<'a>,
    }
    
    impl<'a> Process<'a> {
        fn new(host : &'a Host<'a>) -> Process<'a> {
            Process{host}
        }
    }
}

Once we actually try to spawn a new process, though, we run into trouble:

In [5]:
mod ref_to_parent {
    use std::rc::Rc;
    use std::collections::HashMap;

    struct Host<'a> {
        next_pid : i32,
        processes : HashMap<i32, Rc<Process<'a>>>,
    }
    
    impl<'a> Host<'a> {
        fn spawn_process(&'a mut self) {
            self.processes.insert(self.next_pid, Rc::new(Process::new(self)));
            self.next_pid += 1;
        }
    }
    
    struct Process<'a> {
        host: &'a Host<'a>,
    }
    
    impl<'a> Process<'a> {
        fn new(host : &'a Host<'a>) -> Process<'a> {
            Process{host}
        }
    }
}
            self.processes.insert(self.next_pid, Rc::new(Process::new(self)));
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here
            self.processes.insert(self.next_pid, Rc::new(Process::new(self)));
                                                                      ^^^^ immutable borrow occurs here
    impl<'a> Host<'a> {
         ^^ lifetime `'a` defined here
            self.processes.insert(self.next_pid, Rc::new(Process::new(self)));
                                                         ^^^^^^^^^^^^^^^^^^ argument requires that `*self` is borrowed for `'a`
cannot borrow `self.processes` as mutable because it is also borrowed as immutable
            self.processes.insert(self.next_pid, Rc::new(Process::new(self)));
                                                                      ^^^^ borrow of `self.next_pid` occurs here
            self.next_pid += 1;
            ^^^^^^^^^^^^^^^^^^ assignment to borrowed `self.next_pid` occurs here
    impl<'a> Host<'a> {
         ^^ lifetime `'a` defined here
            self.processes.insert(self.next_pid, Rc::new(Process::new(self)));
                                                         ^^^^^^^^^^^^^^^^^^ argument requires that `*self` is borrowed for `'a`
cannot assign to `self.next_pid` because it is borrowed

Rust prevents a mutable reference and an immutable reference to a single object from existing at the same time. If we want to store a reference in a Process to its Host, we can never create a mutable reference to Host again. We can get a bit further though with interior mutability - putting any mutations that need to happen inside a guarded data structure, such as RefCell, such that those operations can be done from an immutable reference.

In [6]:
mod ref_to_parent_with_refcell {
    use std::rc::Rc;
    use std::collections::HashMap;
    use std::cell::RefCell;

    pub struct Host<'a> {
        next_pid : RefCell<i32>,
        processes : RefCell<HashMap<i32, Rc<Process<'a>>>>,
    }
    
    impl<'a> Host<'a> {
        pub fn new() -> Host<'a> {
            Host { next_pid : RefCell::new(0), processes : RefCell::new(HashMap::new())}
        }
        pub fn spawn_process(&'a self) {
            self.processes.borrow_mut().insert(*self.next_pid.borrow(), Rc::new(Process::new(self)));
            *self.next_pid.borrow_mut() += 1;
        }
    }
    
    struct Process<'a> {
        host: &'a Host<'a>,
    }
    
    impl<'a> Process<'a> {
        fn new(parent : &'a Host<'a>) -> Process<'a> {
            Process{host: parent}
        }
    }
}

{
    use ref_to_parent_with_refcell as m;
    let h = m::Host::new();
    h.spawn_process();
}
Out[6]:
()

How does Rust actually prevent the reference from process to host from dangling? Let's try it and see where things go wrong.

In [7]:
mod drop_ref_to_parent_with_refcell {
    use std::rc::Rc;
    use std::collections::HashMap;
    use std::cell::RefCell;

    pub struct Host<'a> {
        next_pid : RefCell<i32>,
        processes : RefCell<HashMap<i32, Rc<Process<'a>>>>,
    }

    impl<'a> Host<'a> {
        pub fn new() -> Host<'a> {
            Host { next_pid : RefCell::new(0), processes : RefCell::new(HashMap::new())}
        }
        pub fn spawn_process(&'a self) {
            let mut next_pid = self.next_pid.borrow_mut();
            self.processes.borrow_mut().insert(*next_pid, Rc::new(Process::new(self, *next_pid)));
            *next_pid += 1;
        }
        pub fn get_process(&self, pid : i32) -> Rc<Process<'a>> {
            self.processes.borrow().get(&pid).unwrap().clone()            
        }
    }
      
    pub struct Process<'a> {
        host: &'a Host<'a>,
        pid: i32,
    }
    
    impl<'a> Process<'a> {
        fn new(parent : &'a Host<'a>, pid: i32) -> Process<'a> {
            Process{host: parent, pid}
        }
    }
}

{
    use drop_ref_to_parent_with_refcell as m;
    let _p = {
      let h = m::Host::new();
      h.spawn_process();
      h.get_process(0)
    };
}
      h.spawn_process();
      ^ borrowed value does not live long enough
    };
    ^ `h` dropped here while still borrowed
    let _p = {
        ^^ borrow later stored here
`h` does not live long enough

Ah ha - because the Process's type is parameterized with its Host's lifetime, Rust won't allow a Process object to outlive its Host.

I suspect we'd run into trouble though proving to the compiler that this constraint is satisfied. Interestingly, it looks like if we implement the drop trait, the host becomes undroppable after calling spawn_process:

In [2]:
mod drop_ref_to_parent_with_refcell {
    use std::rc::Rc;
    use std::collections::HashMap;
    use std::cell::RefCell;

    pub struct Host<'a> {
        next_pid : RefCell<i32>,
        processes : RefCell<HashMap<i32, Rc<Process<'a>>>>,
    }

    impl<'a> Host<'a> {
        pub fn new() -> Host<'a> {
            Host { next_pid : RefCell::new(0), processes : RefCell::new(HashMap::new())}
        }
        pub fn spawn_process(&'a self) {
            let mut next_pid = self.next_pid.borrow_mut();
            self.processes.borrow_mut().insert(*next_pid, Rc::new(Process::new(self, *next_pid)));
            *next_pid += 1;
        }
        pub fn get_process(&self, pid : i32) -> Rc<Process<'a>> {
            self.processes.borrow().get(&pid).unwrap().clone()            
        }
    }
    impl<'a> Drop for Host<'a> {
        fn drop(&mut self) {
            println!("Dropped host!");
        }
    }
      
    pub struct Process<'a> {
        host: &'a Host<'a>,
        pid: i32,
    }
    
    impl<'a> Process<'a> {
        fn new(parent : &'a Host<'a>, pid: i32) -> Process<'a> {
            Process{host: parent, pid}
        }
    }
}

{
    use drop_ref_to_parent_with_refcell as m;
    let h = m::Host::new();
    h.spawn_process();
}
    h.spawn_process();
    ^ borrowed value does not live long enough
}
^ `h` dropped here while still borrowed
}
^ borrow might be used here, when `h` is dropped and runs the `Drop` code for type `drop_ref_to_parent_with_refcell::Host`
`h` does not live long enough

Storing weak pointers

We could relax the static constraints by having the Process store an Rc<Host> instead of a &Host. This is closer to what we do in current Shadow code; it introduces a cycle in the object graph, but we break that cycle when tearing down a Host by clearing its references to its Processes.

In Rust it's probably better to store a Weak<Host> instead. This ensures we don't have a cycle in the graph, but forces the Process to do a (cheap) upgrade to an Rc<Host> before it can actually operate on it.

In [5]:
mod weak_ref {
    use std::rc::{Rc, Weak};
    use std::collections::HashMap;
    use std::cell::RefCell;

    pub struct Host {
        next_pid : RefCell<i32>,
        processes : RefCell<HashMap<i32, Rc<Process>>>,
    }
    
    impl Host {
        pub fn new() -> Host {
            Host { next_pid : RefCell::new(0), processes : RefCell::new(HashMap::new())}
        }

        pub fn get_process(&self, pid : i32) -> Rc<Process> {
            self.processes.borrow().get(&pid).unwrap().clone()            
        }
    }

    // Implementing Drop is now ok.
    impl Drop for Host {
        fn drop(&mut self) {
            println!("Dropped host!");
        }
    }
    
    // To create a weak reference for the host, we need its `Rc` wrapper; `self` is insufficient.
    // Since passing `self` would be redundant (and an opportunity to provide inconsistent inputs),
    // I pulled this out to a standalone function.
    pub fn spawn_process(host: &Rc<Host>) {
        let weak = Rc::downgrade(host);
        host.processes.borrow_mut().insert(*host.next_pid.borrow(), Rc::new(Process::new(weak)));
        *host.next_pid.borrow_mut() += 1;
    }
    
    pub struct Process {
        host: Weak<Host>,
    }
    
    impl Process {
        fn new(parent : Weak<Host>) -> Process {
            Process{host: parent}
        }
        pub fn run(&self) {
            // We need to upgrade the weak pointer to an Rc pointer. The `unwrap` will panic
            // at runtime if the `Host` no longer exists.
            let h = self.host.upgrade().unwrap();
            println!("Process method, accessing Host. Host's next_pid: {}", *h.next_pid.borrow())
        }
    }
}

{
    use weak_ref as m;
    use std::rc::Rc;
    let host = Rc::new(m::Host::new());
    m::spawn_process(&host);
    let process = host.get_process(0);
    process.run();
}
Process method, accessing Host. Host's next_pid: 1
Dropped host!
Out[5]:
()

What about threads?

So far all of the above doesn't use any locks. There are some run-time checks in Rc and RefCell, but they are lockless. RefCell does implement Send, meaning we can transfer it and its containing structs between threads, but Rc implements neither Send nor Sync. An Rc can't be transferred between threads because there could be other outstanding Weak or Rc objects referring to the same underlying data, and it wouldn't be safe to have those being accessed by different threads.

In [10]:
{
    use weak_ref as m;
    use std::rc::Rc;
    use std::thread;
    
    let h = Rc::new(m::Host::new());
    m::spawn_process(&h);
    let worker = thread::spawn(move || {
        let p = h.get_process(0);
        p.run();
    });
}
    let worker = thread::spawn(move || {
                 ^^^^^^^^^^^^^ `std::rc::Rc<weak_ref::Host>` cannot be sent between threads safely
    let worker = thread::spawn(move || {
        let p = h.get_process(0);
        p.run();
    });
                                within this `[closure@src/lib.rs:291:32: 294:6 h:std::rc::Rc<weak_ref::Host>]`
`std::rc::Rc<weak_ref::Host>` cannot be sent between threads safely

We could simply replace these with their thread-safe counterparts Arc and Mutex, but that would introduce a fair amount of locking overhead, which shouldn't be necessary given that the whole object graph conceptually only belongs to a single thread at once.

Unfortunately, with this design its difficult to prove to the Rust compiler that the whole object graph is transferred from one thread to another -- that there aren't any lingering references. We could circumvent Rust's thread-safety checks by using unsafe to "smuggle" a Host across threads as a raw pointer. This should be safe as long as we don't accidentally hang onto any references to the graph's internals, nor the graph stores references to any unlocked data structures that don't conceptually belong to that Host.

In [23]:
{
    use weak_ref as m;
    use std::rc::Rc;
    use std::thread;
    
    let h = Rc::new(m::Host::new());
    m::spawn_process(&h);
    // Smuggle the host into the worker thread as a pointer, disguised as an integer.
    // I *think* this is safe as long as graph of objects reachable from the host really
    // is partitioned from all other objects. e.g. Rust wouldn't know to stop us here if
    // we continued to hold another Rc to the host or its internals, but accessing those
    // objects before we smuggle ownership of the host back to this scope would be undefined behavior.
    let host_ptr = Rc::into_raw(h) as usize;
    let worker = thread::spawn(move || {
        let h = unsafe { 
            Rc::from_raw(host_ptr as *const m::Host)
        };
        let p = h.get_process(0);
        p.run();
        // Smuggle the host back to the scheduler.
        Rc::into_raw(h) as usize
    });
    let h = unsafe { Rc::from_raw(worker.join().unwrap() as *const m::Host) };
}
Process method, accessing Host. Host's next_pid: 1
Dropped host!
Out[23]:
()

The above approach is probably workable, but gives up some of Rust's safety. It's pretty much the model of how safety is maintained in the current C code, and it mostly works, but going with this model puts a damper on taking advantage of Rust's "fearless concurrency".

Passing the graph around explicitly

Instead of objects storing references to other parts of the object graph, we could pass them in as method parameters when appropriate. e.g. since the Host should be the root of the graph that any of these need to access, we could pass it around to most methods, and in the objects themselves only store identifiers that can be used to look up parts of the graph as needed (e.g. pids, tid's, etc).

Note though that from a Host method, we can't pass a mutable reference to the Host itself to a method on one of the child object it owns, since that'd invalidate the reference to the child:

In [29]:
mod passing_mutable_graph_root {
    use std::collections::HashMap;

    pub struct Host {
        processes : HashMap<i32, Process>,
    }
    
    impl Host {
        fn run_processes(&mut self) {
            for (_pid, process) in self.processes.iter() {
                process.run(self);
            }
        }
    }
    
    pub struct Process {}
    
    impl Process {
        fn run(&self, _host : &mut Host) {
        }
    }
}
                process.run(self);
                ^^^^^^^^^^^^^^^^^ mutable borrow occurs here
            for (_pid, process) in self.processes.iter() {
                                   ^^^^^^^^^^^^^^ immutable borrow occurs here
            for (_pid, process) in self.processes.iter() {
                                   ^^^^^^^^^^^^^^^^^^^^^ immutable borrow later used here
cannot borrow `*self` as mutable because it is also borrowed as immutable

So, we'd still need to pass around immutable references and use RefCell or similar to implement internal mutability.

Since we're not using Rc wrappers though, the Host doesn't actually have a way to return a reference to other objects it owns. Here's a failed attempt:

In [10]:
mod passing_immutable_graph_root {
    use std::collections::HashMap;
    use std::cell::{Ref, RefCell};

    pub struct Host {
        next_pid : RefCell<i32>,
        processes : RefCell<HashMap<i32, RefCell<Process>>>,
    }
    
    impl Host {
        pub fn new() -> Host {
            Host { next_pid : RefCell::new(0), processes : RefCell::new(HashMap::new())}
        }

        pub fn get_process<'a>(&'a self, pid : i32) -> Ref<'a, Process> {
            self.processes.borrow().get(&pid).unwrap().borrow()
        }
    
        pub fn spawn_process(&self) {
            let mut next_pid = self.next_pid.borrow_mut();
            let mut processes = self.processes.borrow_mut();
            (*processes).insert(*next_pid, RefCell::new(Process::new(*next_pid)));
            (*next_pid) += 1;
        }
    }
    
    pub struct Process {
        pid : i32,
    }
    
    impl Process {
        fn new(pid : i32) -> Process {
            Process{pid}
        }
        pub fn run(&self, host: &Host) {
            println!("Process method, accessing Host. Host's next_pid: {}", *host.next_pid.borrow())
        }
    }
}

{
    use passing_immutable_graph_root as m;
    let h = m::Host::new();
    h.spawn_process();
    let p = h.get_process(0);
    p.run(&h);
}
            self.processes.borrow().get(&pid).unwrap().borrow()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function
            self.processes.borrow().get(&pid).unwrap().borrow()
            ^^^^^^^^^^^^^^^^^^^^^^^ temporary value created here
cannot return value referencing temporary value

Using arenas

The problem above is due to nested ownership. Since processes are nested under the host, we can't borrow a single process without also borrowing the host to which it belongs.

We can get it working by flattening out ownership, passing around a "dumb" arena object rather than the Host itself. This lets us do most/all of the RefCell manipulation when borrowing individual objects from the arena. As expected, we can safely transfer the arena between threads.

In [6]:
mod passing_arena {
    use std::collections::HashMap;
    use std::cell::{Ref, RefCell};
    
    pub struct HostArena {
        pub host : RefCell<Host>,
        pub processes : RefCell<HashMap<i32, RefCell<Process>>>,
    }
    impl HostArena {
        pub fn new() -> Self {
            Self { host: RefCell::new(Host::new()), processes : RefCell::new(HashMap::new())}
        }
    }
    
    pub struct Host {
        // Notice that internals of Host no longer necessarily need to be wrapped in RefCell,
        // since the host itself is wrapped in a RefCell.
        next_pid : i32,
    }
    
    impl Host {
        pub fn new() -> Host {
            Host { next_pid : 0 }
        }
    
        pub fn spawn_process(&mut self, host_arena : &HostArena) {
            let mut processes = host_arena.processes.borrow_mut();
            (*processes).insert(self.next_pid, RefCell::new(Process::new(self.next_pid)));
            (self.next_pid) += 1;
        }
    }
    
    pub struct Process {
        pid : i32,
    }
    
    impl Process {
        fn new(pid : i32) -> Process {
            Process{pid}
        }
        pub fn run(&self, host_arena : &HostArena) {
            println!("Process method, accessing Host. Host's next_pid: {}", host_arena.host.borrow().next_pid)
        }
    }
}

{
    use std::thread;

    // Set up the host arena with one process on one thread...
    use passing_arena as m;
    let host_arena = m::HostArena::new();
    {
        // Limit the scope of the mutable borrow.
        let mut host = host_arena.host.borrow_mut();
        host.spawn_process(&host_arena);
    }
    
    // Move the arena into a worker thread and run it there.
    // Since we're no longer using Rc, the arena is `Send`
    // (can be safely transferred between threads).
    let worker = thread::spawn(move || {
        let processes = host_arena.processes.borrow();
        processes.get(&0).unwrap().borrow().run(&host_arena);
    });
    worker.join();
}
Process method, accessing Host. Host's next_pid: 1
Out[6]:
()

Dealing with mutation cycles (e.g. fork)

This gives us the start of a workable model. We'll still need to be careful to avoid patterns of circular mutable borrows, which will result in panics. Such cycles could represent real bugs though, and such panics are preferable to the more subtle problems that might arise in corresponding C code.

One clear problem we'll run into is when the execution of a Process wants to call fork - i.e., create another process on the host. Since the collection holding the Process will have already been borrowed to call Process::execute, trying to insert a new Process into that collection will fail (ht @ stevenengler for spotting this). We have a few ideas for breaking the cycle in such cases:

  • Have the return-type of Process::execute be an Enum message to the parent Host which can request that the Host performs some action, such as adding a provided Process object to the HostArena, and then calls execute again to continue.
  • Have Process::execute push the new process and/or a callback onto an event queue to be handled later.
  • Instead of HostArena providing direct access to HashMaps of objects, create (or find) an abstraction that could keep lists of pending additions and deletions. It could provide an api like fn run_with<F: CallOnce>(&self, id: i32, f : F) that borrows the internal HashMap, borrows the object (checking for relevant pending insertions or deletions), runs the function, and then performs the pending operations if there are no other outstanding borrows. Likewise it could provide insert and delete operations that push to the corresponding operation lists if the main HashMap is currently borrowed.

We're still thinking if there are other approaches we could take, including other ways we can tweak Shadow's design to fit better into Rust's ownership model. Happy to hear ideas :)

social