commonware-runtime: The Foundation for Reliable and Performant Consensus
In blockchain engineering, there are few moments more stressful than the production release of a new mechanism (whether a novel consensus optimization or a tweak to peer discovery).
While Twitter fame invariably awaits the successful, failure too often means nights and weekends parsing thousands of log lines (on top of whatever is required to unhalt a borked network). The sting of wayward improvements has deterred many engineers, reasonably so, from modifying well-behaved code with emergent (and difficult to test) stability.
At @commonwarexyz, we believe the relationship between engineers and mechanism iteration must be (substantially) advanced for onchain applications to achieve widespread adoption. If developers are forced to shy away from frequent and aggressive iteration, the onchain future we all hope for will remain just that ... an onchain future.
commonware-runtime: 1 Trait, 2 Dialects
As an early step towards this goal, we are excited to release commonware-runtime (Apache-2 and MIT), a new primitive for the configurable execution of concurrent Rust. The commonware-runtime::deterministic dialect enables code to be deterministically simulated (from a user-provided seed) without kernel-level virtualization. The commonware-runtime::tokio dialect allows for the same code (without modification) to be executed in production (using Tokio).
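To make "1 trait, 2 dialects" concrete, here is a minimal sketch that uses only the Rust standard library. The `Clock` trait, `SystemClock`, `SimClock`, and `retry_with_backoff` are hypothetical names invented for illustration; they are not the traits exported by commonware-runtime. The sketch shows only the general pattern of writing logic once against a trait and swapping the implementation underneath.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

// Hypothetical trait for illustration only (NOT the commonware-runtime API).
trait Clock {
    /// Time elapsed since the clock was created.
    fn elapsed(&self) -> Duration;
    /// Wait until `d` has passed.
    fn sleep(&self, d: Duration);
}

/// "Production" dialect: backed by the operating system clock.
struct SystemClock {
    start: Instant,
}

impl Clock for SystemClock {
    fn elapsed(&self) -> Duration {
        self.start.elapsed()
    }
    fn sleep(&self, d: Duration) {
        std::thread::sleep(d);
    }
}

/// "Deterministic" dialect: virtual time that advances only when asked.
struct SimClock {
    now: Cell<Duration>,
}

impl Clock for SimClock {
    fn elapsed(&self) -> Duration {
        self.now.get()
    }
    fn sleep(&self, d: Duration) {
        // No real waiting: just move the virtual clock forward.
        self.now.set(self.now.get() + d);
    }
}

/// Application logic written once against the trait.
/// Sleeps `attempts` times, doubling the delay each time.
fn retry_with_backoff<C: Clock>(clock: &C, base: Duration, attempts: u32) -> Duration {
    let mut delay = base;
    for _ in 0..attempts {
        clock.sleep(delay);
        delay *= 2;
    }
    clock.elapsed()
}

fn main() {
    // Simulated run: 100ms + 200ms + 400ms of virtual time, finishes instantly.
    let sim = SimClock { now: Cell::new(Duration::ZERO) };
    let simulated = retry_with_backoff(&sim, Duration::from_millis(100), 3);
    println!("simulated elapsed: {:?}", simulated);

    // Production run: same function, unmodified, but it really sleeps.
    let real = SystemClock { start: Instant::now() };
    let measured = retry_with_backoff(&real, Duration::from_millis(5), 1);
    println!("measured elapsed: {:?}", measured);
}
```

The simulated run always reports exactly the same elapsed time, while the production run depends on the scheduler. That difference is what makes the simulated dialect reproducible.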
Want to develop your own simulator or novel production executor? No worries! Just implement the exported traits from commonware-runtime and you can drop in your runtime to any of the Commonware Library primitives. Want to employ commonware-runtime in your own application? Confirm your application behaves deterministically using commonware-runtime::deterministic::Auditor.
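One way to think about auditing for determinism: fold every observable event into a running digest and compare digests across runs with the same seed. The sketch below illustrates that idea only. `EventLog` and `run` are hypothetical names, and nothing here reflects the actual interface of commonware-runtime::deterministic::Auditor.

```rust
/// Hypothetical event log (NOT commonware's Auditor): folds each observed
/// event into a running digest so two runs can be compared cheaply.
struct EventLog {
    digest: u64,
}

impl EventLog {
    fn new() -> Self {
        // FNV-1a 64-bit offset basis, used here as an arbitrary start value.
        EventLog { digest: 0xcbf29ce484222325 }
    }

    fn record(&mut self, event: u64) {
        // FNV-1a style mixing, applied to whole u64 events instead of bytes.
        self.digest = (self.digest ^ event).wrapping_mul(0x100000001b3);
    }
}

/// A toy "application": draws five values from a seeded generator, then
/// reads one value from outside the runtime's control (`external`), for
/// example a wall-clock timestamp.
fn run(seed: u64, external: u64) -> u64 {
    let mut log = EventLog::new();
    let mut state = seed;
    for _ in 0..5 {
        // A linear congruential generator stands in for seeded randomness.
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        log.record(state);
    }
    log.record(external);
    log.digest
}

fn main() {
    // Same seed, same external input: the digests match.
    println!("reproducible: {}", run(7, 0) == run(7, 0));
    // Same seed, but an uncontrolled input changed: the digests diverge.
    println!("leak detected: {}", run(7, 0) != run(7, 1));
}
```

A digest mismatch between two runs with the same seed signals that something outside the simulated runtime influenced execution.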
What is Deterministic Simulation and Why Does it Help?
Consider a not-so-uncommon test where 100 different agents send unique messages to each other. The way those messages are constructed (from external randomness, clocks, and storage) and the order in which they are sent have a significant impact on the outcome. Sending a message before an agent is initialized means it is dropped. Delaying the delivery of a message until after some timeout leads to the sender being penalized.
What if you could control the order in which these agents interact and the external sources they depend on? With integrated deterministic simulation, you can run the same test with different sources of entropy to shuffle the order of agent execution (broadening exploration of the state space) and, critically, can reproduce any failing result (no more waiting for that random flake to occur again).
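The core trick can be sketched in a few lines: derive the agent execution order from a seeded pseudorandom generator, so a given seed always yields the same order. The `SplitMix64` generator and `schedule` function below are illustrative stand-ins chosen for brevity, not how commonware-runtime schedules tasks internally.

```rust
/// Small seeded generator (SplitMix64). Any deterministic PRNG would do.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }
}

/// Returns the order in which `agents` agents run for a given seed,
/// using a Fisher-Yates shuffle driven by the seeded generator.
fn schedule(seed: u64, agents: usize) -> Vec<usize> {
    let mut rng = SplitMix64(seed);
    let mut order: Vec<usize> = (0..agents).collect();
    for i in (1..order.len()).rev() {
        // The modulo introduces a tiny bias, which is fine for a sketch.
        let j = (rng.next() % (i as u64 + 1)) as usize;
        order.swap(i, j);
    }
    order
}

fn main() {
    // Each seed explores one interleaving of the 100 agents.
    for seed in 0..3u64 {
        let order = schedule(seed, 100);
        println!("seed {}: first agents to run = {:?}", seed, &order[..5]);
    }
    // Re-running a seed replays exactly the same interleaving.
    println!("replay matches: {}", schedule(1, 100) == schedule(1, 100));
}
```

When a test fails, recording the seed is enough to replay the exact interleaving that triggered the failure.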
Introducing Deterministic Simulation to commonware-p2p
To demonstrate how to use commonware-runtime (and for our own sanity), we upgraded commonware-p2p to this new runtime. We can now deterministically simulate a collection of peers discovering each other and exchanging encrypted messages (from initialization to shutdown) all locally. In fact, we now run commonware-p2p::authenticated connectivity tests (25 peers) with ~10 different seeds during each run of CI (leading to different sequences of peer discovery, message passing, and ultimately simulation termination).
Basic unit testing is just the start of what we have planned for commonware-runtime. On the path to the production launch of the Commonware Library (and after), we will continuously test the stability of commonware-p2p and commonware-consensus under randomized topologies, participant behavior (honest/byzantine), and network conditions to ensure compatibility across versions and to catch regressions (as measured by observed bandwidth + number of steps required to complete).
Here's to one less Saturday night spent debugging. 🍻
Want to help? Commonware is looking for a small group of founding engineers (1-10% equity). If you enjoy the never-ending challenge of building performant distributed systems, reach out to @commonwarexyz or hiring@commonware.xyz with a link to some of your most interesting work!
Acknowledgements
Our love of deterministic simulation stems from Will Wilson's (Antithesis) solution to Mario and TigerBeetle's "Viewstamped Replication Made Famous" challenge. Thanks to both for doing an incredible job explaining how advanced simulation leads to more reliable distributed systems.