d-engine: A Lightweight Distributed Coordination Engine for Rust

2026-01-16 03:57

A lightweight Raft implementation designed for embedding into Rust applications — the consensus layer for building reliable distributed systems.

Built with a simple vision: make distributed coordination accessible - cheap to run, simple to use.

Built on a core philosophy: choose simple architectures over complex ones.

Architecture

Single-threaded event loop: No race conditions, strict ordering. Throughput is bound by a single core, so scale horizontally.

Role separation (SRP): Leader/Follower/Candidate/Learner each handle their own logic. Main loop just routes events.

This is a standard design for Raft implementations.
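To make the routing idea concrete, here is a minimal sketch using only the standard library. The names `Role`, `Event`, `handle`, and `run` are hypothetical and are not d-engine's actual types; the transition rules are a simplified subset of Raft, assumed for illustration.

```rust
use std::sync::mpsc;

// Hypothetical names throughout; not d-engine's real API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Follower,
    Candidate,
    Leader,
    Learner,
}

#[derive(Debug)]
enum Event {
    ElectionTimeout,
    VotesGranted { votes: usize, cluster_size: usize },
    HigherTermSeen,
}

// Each role decides how to react to an event and which role comes next.
fn handle(role: Role, event: &Event) -> Role {
    match (role, event) {
        // In this sketch a learner never changes role on its own.
        (Role::Learner, _) => Role::Learner,
        // A follower that hears nothing from a leader starts an election.
        (Role::Follower, Event::ElectionTimeout) => Role::Candidate,
        // A candidate with a strict majority of votes becomes leader.
        (Role::Candidate, Event::VotesGranted { votes, cluster_size })
            if *votes > *cluster_size / 2 =>
        {
            Role::Leader
        }
        // Any voting role that sees a higher term steps down.
        (_, Event::HigherTermSeen) => Role::Follower,
        // Everything else leaves the role unchanged.
        (current, _) => current,
    }
}

// The main loop only routes: one thread, one event at a time, in arrival order.
// It returns the final role once every sender has been dropped.
fn run(events: mpsc::Receiver<Event>) -> Role {
    let mut role = Role::Follower;
    for event in events {
        role = handle(role, &event);
    }
    role
}
```

Because a single thread drains the channel, no two handlers ever run concurrently, which is where the "no race conditions, strict ordering" property comes from.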

Two Integration Modes

Embedded mode - runs inside your Rust process:

let engine = EmbeddedEngine::start().await?;
engine.wait_ready(Duration::from_secs(5)).await?;
let client = engine.client();
client.put(b"key".to_vec(), b"value".to_vec()).await?; // <0.1ms
  • Direct memory access via mpsc channels (no serialization)
  • <0.1ms latency for local operations
  • Single binary deployment
  • Benchmark: 203K writes/s, 279K reads/s (M2 Mac)
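The channel-based access pattern can be sketched as follows. This is an illustration, not d-engine's implementation: it uses `std::sync::mpsc` and a plain thread instead of an async runtime, skips Raft replication entirely, and the names `Request`, `spawn_engine`, `put`, and `get` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Hypothetical request type: keys and values are moved through the channel
// as owned buffers, so nothing is serialized on the way to the engine.
enum Request {
    Put { key: Vec<u8>, value: Vec<u8>, reply: mpsc::Sender<()> },
    Get { key: Vec<u8>, reply: mpsc::Sender<Option<Vec<u8>>> },
}

// Spawns a stand-in "engine" thread that owns the map and serves requests
// one at a time until every sender handle is dropped.
fn spawn_engine() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        let mut store: HashMap<Vec<u8>, Vec<u8>> = HashMap::new();
        for request in rx {
            match request {
                Request::Put { key, value, reply } => {
                    store.insert(key, value);
                    let _ = reply.send(());
                }
                Request::Get { key, reply } => {
                    let _ = reply.send(store.get(&key).cloned());
                }
            }
        }
    });
    tx
}

// Sends a write and blocks until the engine thread acknowledges it.
fn put(engine: &mpsc::Sender<Request>, key: &[u8], value: &[u8]) {
    let (reply, done) = mpsc::channel();
    engine
        .send(Request::Put { key: key.to_vec(), value: value.to_vec(), reply })
        .unwrap();
    done.recv().unwrap();
}

// Sends a read and blocks until the engine thread returns the value, if any.
fn get(engine: &mpsc::Sender<Request>, key: &[u8]) -> Option<Vec<u8>> {
    let (reply, done) = mpsc::channel();
    engine.send(Request::Get { key: key.to_vec(), reply }).unwrap();
    done.recv().unwrap()
}
```

Calling `put(&engine, b"key", b"value")` followed by `get(&engine, b"key")` round-trips the bytes inside one process, with no network hop and no encoding step.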

Standalone mode - separate server process with gRPC:

d-engine = { version = "0.2", features = ["client"], default-features = false }
  • Language-agnostic (Go, Python, Java, Rust)
  • Independent scaling
  • Benchmark: 64K writes/s, 12K reads/s

Choose based on your language and latency requirements.

Benchmarks

Embedded mode (M2 Mac, single machine):

  • 203K writes/s, 279K reads/s (linearizable)

Standalone mode (gRPC):

  • 64K writes/s, 12K reads/s (linearizable)

[Figure: d-engine benchmark comparison, v0.2]

Lab numbers only—production performance varies by workload and hardware.
Full details in benches/.

Design: Traits for Extension

d-engine provides working implementations (RocksDB storage, KV operations, gRPC transport). When defaults don't fit, implement the traits:

pub trait StorageEngine {
    type LogStore: LogStore;
    type MetaStore: MetaStore;
}

pub trait StateMachine {
    async fn apply_chunk(&self, entries: Vec<Entry>) -> Result<()>;
}

Examples in the repo show a Sled storage backend and custom HTTP handlers behind HAProxy for HA deployments.
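As a rough idea of what a custom state machine does, here is a synchronous, in-memory stand-in. Everything in it is an assumption made for illustration: the `Entry` fields, the `ApplyChunk` trait name, the `&mut self` receiver, and the `String` error type all differ from the async trait shown above, whose real types are defined by d-engine.

```rust
use std::collections::BTreeMap;

// Hypothetical entry shape; `None` as the value means "delete this key".
struct Entry {
    index: u64,
    key: Vec<u8>,
    value: Option<Vec<u8>>,
}

// Synchronous stand-in for the async `StateMachine` trait, so the sketch
// runs without an async runtime.
trait ApplyChunk {
    fn apply_chunk(&mut self, entries: Vec<Entry>) -> Result<(), String>;
}

#[derive(Default)]
struct MemoryStateMachine {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
    last_applied: u64,
}

impl ApplyChunk for MemoryStateMachine {
    fn apply_chunk(&mut self, entries: Vec<Entry>) -> Result<(), String> {
        for entry in entries {
            // Reject entries at or below the last applied index so that a
            // replayed chunk cannot be applied twice.
            if entry.index <= self.last_applied {
                return Err(format!("entry {} already applied", entry.index));
            }
            match entry.value {
                Some(value) => {
                    self.data.insert(entry.key, value);
                }
                None => {
                    self.data.remove(&entry.key);
                }
            }
            self.last_applied = entry.index;
        }
        Ok(())
    }
}
```

Tracking `last_applied` is the part worth keeping in any real implementation: the state machine must apply committed entries in log order, and the index guard makes replays detectable.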

Consistency Model

Three read policies (configurable per-operation or server-wide):

  • LinearizableRead: Strongest guarantee, verify quorum (~2ms)
  • LeaseRead: Leader lease-based, fast path (~0.3ms, requires NTP)
  • EventualConsistency: Local read, may be stale (~0.1ms)

Trade-offs documented. Defaults are sane.
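The "per-operation or server-wide" choice can be pictured as an optional override on top of a default. The sketch below reuses the three policy names from the list above, but `ReadPath`, `effective_policy`, and `read_path` are hypothetical helpers and not part of d-engine's API.

```rust
// Policy names taken from the list above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReadPolicy {
    LinearizableRead,
    LeaseRead,
    EventualConsistency,
}

// Hypothetical description of what the serving node does before answering.
#[derive(Debug, PartialEq)]
enum ReadPath {
    ConfirmQuorum,    // contact a majority before replying
    CheckLeaderLease, // reply if the leader lease is still valid
    ServeLocal,       // reply from local state, possibly stale
}

// A per-operation policy, when present, wins over the server-wide default.
fn effective_policy(server_default: ReadPolicy, per_operation: Option<ReadPolicy>) -> ReadPolicy {
    per_operation.unwrap_or(server_default)
}

// Maps each policy to the work it implies.
fn read_path(policy: ReadPolicy) -> ReadPath {
    match policy {
        ReadPolicy::LinearizableRead => ReadPath::ConfirmQuorum,
        ReadPolicy::LeaseRead => ReadPath::CheckLeaderLease,
        ReadPolicy::EventualConsistency => ReadPath::ServeLocal,
    }
}
```

For example, a server defaulting to `LinearizableRead` can still serve a dashboard query with `Some(ReadPolicy::EventualConsistency)` and skip the quorum round trip for that one call.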

Start Simple, Scale When Needed

// Start with 1 node (auto-elected leader)
let engine = EmbeddedEngine::start().await?;
// Scale to 3 nodes later: update config, zero code changes
// See examples/single-node-expansion

No Kubernetes, no complex setup. Just Rust + config file.
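The reason a single node works, and what moving to three nodes buys, follows from Raft's majority rule. The helper names below are hypothetical; the arithmetic is standard Raft.

```rust
// Smallest number of nodes that forms a strict majority of `nodes`.
fn quorum(nodes: usize) -> usize {
    nodes / 2 + 1
}

// Number of nodes that can fail while a majority remains reachable.
fn tolerated_failures(nodes: usize) -> usize {
    nodes.saturating_sub(quorum(nodes))
}
```

With one node the quorum is 1, so the node elects itself and commits alone but tolerates no failures. With three nodes the quorum is 2 and one node can be lost. Four nodes need a quorum of 3 and still tolerate only one failure, which is why odd cluster sizes are the usual choice.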

What It Is

  • Raft consensus implementation
  • Pluggable storage (RocksDB, Sled, custom)
  • Flexible consistency (linearizable/lease/eventual)
  • Production-ready core, API stabilizing toward v1.0

Current State & Direction

Version 0.2

The core Raft engine is production-ready (1000+ tests; the alpha version was validated with TLA+ and Jepsen).
APIs are stabilizing toward v1.0. Pre-1.0 means breaking changes are acceptable if they improve the design.

Future Direction:

  • Cloud-native deployment (Cloudflare, AWS, GCP storage backends). The specific timeline depends on real-world use cases from early adopters.
  • Also exploring: etcd-compatible API layer.

Try It

[dependencies]
d-engine = "0.2"

If you're building distributed systems in Rust and need consensus, the code is there. Examples show embedded mode, standalone mode, custom storage, HA deployments with HAProxy.

Looking for Early Adopters

I'm looking for Rust developers building distributed systems who have real problems to solve:

  • Coordination bottlenecks in your architecture (slow consensus, expensive etcd clusters)
  • Specific use cases where strong consistency matters (leader election, distributed locks, metadata stores)
  • Production deployments where you need cheap, simple, and reliable coordination

I'm particularly interested in:

  • Production deployments (not toy projects)
  • Cloud-native scenarios (Cloudflare Workers, AWS Lambda, serverless patterns)
  • Cost-sensitive use cases where etcd is overkill

If you have a specific problem: Open an issue with your use case, or reach out directly.
Show me what's broken, what's expensive, what's too complex. Let's see if d-engine can solve it cheaply.

License: MIT or Apache-2.0 | Platforms: Linux, macOS | MSRV: Rust 1.88