Designing for Reliability: N+1 Redundancy

A mesh network is only as reliable as its weakest single point of failure. In graph theory terms, a node whose removal splits a connected graph into two or more disconnected components is called a cut vertex (also known as an articulation point). Real-world Meshtastic deployments often develop cut vertices without operators realizing it - especially as networks grow organically. N+1 redundancy means ensuring that for every critical backbone node, at least one backup path exists so that the loss of any single node does not partition the network.

Identifying Critical Backbone Nodes

Start by drawing your network on paper or a whiteboard. Place each node as a dot and draw lines between nodes that can hear each other directly. Now ask: if I erase this dot, does the network split into two disconnected groups? Any node where the answer is yes is a cut vertex and a single point of failure.
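For networks too large to check by eye, the same "erase this dot" test can be automated. The following is a minimal sketch, not a Meshtastic tool: it assumes you have written down your links by hand as pairs of node names (the names below are hypothetical) and that the network is connected to begin with. It removes each node in turn and checks whether the remaining nodes can still all reach each other.

```python
from collections import deque


def is_connected(adjacency, removed=None):
    """Return True if all nodes except `removed` can reach each other."""
    nodes = [n for n in adjacency if n != removed]
    if len(nodes) <= 1:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def find_cut_vertices(links):
    """Erase each node in turn; report those whose loss splits the mesh."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return sorted(n for n in adjacency if not is_connected(adjacency, removed=n))


# Hypothetical topology: each pair is two nodes that hear each other directly.
links = [
    ("north-1", "north-2"),
    ("north-1", "hilltop"),
    ("north-2", "hilltop"),
    ("hilltop", "south-1"),
    ("south-1", "south-2"),
]
print(find_cut_vertices(links))  # ['hilltop', 'south-1']
```

Here hilltop is the only bridge between the northern and southern groups, and south-1 is the only way to reach south-2, so both are single points of failure. This brute-force approach re-walks the graph once per node, which is fine for meshes of a few hundred nodes.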

For larger networks, the visual tool meshmap.net can import your mesh topology and highlight connectivity. Alternatively, connect to a node on one side of a suspect backbone node and run meshtastic --traceroute <destination_id>, where the destination is a node on the opposite side - if the only path routes through that one node, it is critical.

Providing Backup Paths

Once critical nodes are identified, the fix is straightforward in concept: ensure each one has a backup. Two approaches work well in practice:

  • Second node at the same site: Place a second ROUTER or REPEATER node at the same location as the critical node. Both nodes cover the same area, so if one fails, the other continues forwarding. This costs hardware but is simple to maintain.
  • Alternate mesh path: Add a node at a different location that bridges the same two network segments. This is more work to plan but is more geographically robust - it protects against site-level failures (power outage, physical damage) rather than just single-node failures.
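The difference between the two approaches can be illustrated with a small simulation. This is a sketch under stated assumptions: the node names and link lists are hypothetical, and a "site failure" is modeled simply as every node at that site going offline at once.

```python
def reachable(links, start, failed):
    """Return the set of nodes reachable from `start` when `failed` nodes are down."""
    adjacency = {}
    for a, b in links:
        if a in failed or b in failed:
            continue  # a link is unusable if either end is offline
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def segments_linked(links, node_a, node_b, failed):
    """True if node_a can still reach node_b with the `failed` nodes offline."""
    return node_b in reachable(links, node_a, set(failed))


# Option 1 (hypothetical): tower-a and tower-b share one site.
same_site = [("west", "tower-a"), ("tower-a", "east"),
             ("west", "tower-b"), ("tower-b", "east")]
# Option 2 (hypothetical): the backup node sits on a separate ridge.
alternate = [("west", "tower-a"), ("tower-a", "east"),
             ("west", "ridge"), ("ridge", "east")]

print(segments_linked(same_site, "west", "east", {"tower-a"}))             # True
print(segments_linked(same_site, "west", "east", {"tower-a", "tower-b"}))  # False
print(segments_linked(alternate, "west", "east", {"tower-a"}))             # True
```

Both layouts survive the loss of a single node, but only the alternate path keeps west and east linked when the whole tower site goes down.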

Testing Redundancy

Testing is non-negotiable. A backup path that exists on paper but has never been verified may fail in practice due to marginal signal, wrong node roles, or misconfigured hop limits. The test procedure is simple:

  1. Identify two nodes on opposite sides of the critical backbone node you want to test.
  2. Run a traceroute between them and record the path.
  3. Briefly power off the backbone node. Do not simulate failure by unplugging its antenna: a LoRa radio that transmits without an antenna attached can be damaged.
  4. Run the traceroute again. The path should route around the powered-down node.
  5. If routing fails, the backup path is insufficient and must be improved before the critical node is trusted.
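Recording the two traceroute results and comparing them can be made less error-prone with a small helper. The sketch below assumes you have transcribed each path by hand into a list of node IDs, from source to destination; the function name, the IDs, and the three result labels are all hypothetical and not part of any Meshtastic tool.

```python
def evaluate_failover(baseline_path, failover_path, backbone_id):
    """Compare traceroute paths recorded before and during the outage.

    baseline_path: node IDs seen with the backbone node powered on.
    failover_path: node IDs seen with it powered off, or None/empty
                   if the traceroute did not complete.
    """
    if backbone_id not in baseline_path:
        # The normal route never used this node, so the test proves nothing.
        return "inconclusive"
    if not failover_path:
        # No route at all while the backbone node was down (step 5).
        return "fail"
    if backbone_id in failover_path:
        # The node still appears in the path, so it was not really offline.
        return "inconclusive"
    return "pass"


# Hypothetical node IDs, transcribed from two traceroute runs.
before = ["!aaaa0001", "!bbbb0002", "!cccc0003"]
during = ["!aaaa0001", "!dddd0004", "!cccc0003"]
print(evaluate_failover(before, during, "!bbbb0002"))  # pass
print(evaluate_failover(before, None, "!bbbb0002"))    # fail
```

An "inconclusive" result means the test should be repeated with a different pair of end nodes, or after confirming that the backbone node was actually offline.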

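Schedule this test at least annually, and repeat it any time the physical environment changes significantly, for example after new construction, seasonal foliage growth, or moving or replacing a node.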

Network Mapping Tools

meshmap.net provides a visual overlay of node positions and can render connectivity based on neighbor data exported from your mesh. Use it to spot topological bottlenecks before they become outage events. The meshtastic --traceroute <destination_id> CLI command reveals the actual hop-by-hop path packets take in real time, which is the ground truth for redundancy verification.
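If you run traceroutes regularly, a thin wrapper around the CLI can save typing and keep the raw output for your records. This is a minimal sketch: it assumes the meshtastic CLI is installed and on your PATH, the function names and the destination ID are hypothetical, and it does not attempt to parse the output, since the exact format may differ between CLI versions.

```python
import subprocess


def build_traceroute_command(destination_id, port=None):
    """Build the argument list for `meshtastic --traceroute <destination_id>`."""
    command = ["meshtastic"]
    if port:
        # Optional: choose which locally attached radio to use.
        command += ["--port", port]
    command += ["--traceroute", destination_id]
    return command


def run_traceroute(destination_id, port=None, timeout_s=120):
    """Run the traceroute and return (exit_code, raw_output).

    Raises FileNotFoundError if the CLI is not installed, and
    subprocess.TimeoutExpired if no result arrives within timeout_s.
    """
    result = subprocess.run(
        build_traceroute_command(destination_id, port),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.returncode, result.stdout


print(build_traceroute_command("!a1b2c3d4"))
```

Calling run_traceroute("!a1b2c3d4") with a radio attached returns the CLI's exit code and its unmodified text output, which you can save alongside the date of each redundancy test.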