
Designing for Redundancy

Why Single-Path Mesh Is Fragile

A tree-topology mesh - where each node has exactly one path back to the network core - is the natural shape that forms when coverage is barely adequate. In a tree topology, the failure of any interior node partitions the network. Nodes "below" the failed repeater become an isolated island: they can hear each other but cannot reach the wider network.

For a community mesh serving emergency communications, this is unacceptable. The very events that trigger heavy mesh use (storms, earthquakes, infrastructure failures) are also the events most likely to take repeaters offline through power loss, physical damage, or access loss.


N+1 Redundancy

N+1 redundancy means that every coverage area is served by at least N+1 repeaters, where N is the minimum number needed for coverage, so that any one repeater can fail without leaving a gap. In practice, for a mesh network, this translates to:

Every node in the network should be able to reach the network core via at least two independent paths through different physical repeaters.

A "network core" is the set of nodes with Internet gateway access or the central coordination point. In a city mesh, the core might be 3 - 5 well-connected anchor repeaters.

To verify N+1 for a given node:

  1. Identify all repeaters that node can directly reach (RSSI > −120 dBm).
  2. For each of those repeaters, confirm it has its own path back to core that does not pass through any of the other repeaters on the list.
  3. If the node can only reach a single repeater, it has no redundancy: if that repeater fails, the node is isolated.
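
The verification steps above can be sketched in a few lines of Python. This is a minimal illustration, not a tool from any mesh firmware: the function names, the node names, and the `links` adjacency dictionary are all hypothetical, and the links are assumed to have already been filtered to those above the RSSI threshold. Rather than enumerating paths, it removes each repeater in turn and checks whether the node can still reach core; a node with two independent paths survives every single removal.

```python
from collections import deque

def reaches_core(links, start, core, removed=None):
    """Breadth-first search: can `start` reach any core node without using `removed`?"""
    if start in core:
        return True
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in links.get(current, ()):
            if neighbour == removed or neighbour in seen:
                continue
            if neighbour in core:
                return True
            seen.add(neighbour)
            queue.append(neighbour)
    return False

def single_failures_that_isolate(links, node, core):
    """Return every non-core node whose individual failure cuts `node` off from core."""
    candidates = set(links) - set(core) - {node}
    return sorted(r for r in candidates
                  if not reaches_core(links, node, core, removed=r))

# Hypothetical network: symmetric adjacency sets, usable links only.
links = {
    "core": {"A", "B"},
    "A": {"core", "R1", "R2"},
    "B": {"core", "R2"},
    "R1": {"A", "n1", "n3"},
    "R2": {"A", "B", "n2", "n3"},
    "n1": {"R1"},        # hears one repeater only
    "n2": {"R2"},        # hears one repeater only
    "n3": {"R1", "R2"},  # hears two repeaters
}
core = {"core"}

print(single_failures_that_isolate(links, "n1", core))  # ['R1']
print(single_failures_that_isolate(links, "n3", core))  # []
```

An empty list means the node meets the two-independent-paths goal for this graph. Note that the sketch treats every node, including client nodes, as able to relay traffic; if some nodes in your network do not relay, leave their outgoing links out of `links`.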

Ring Topology vs. Tree Topology

Tree Topology

Nodes connect to the nearest repeater, which connects to the nearest anchor, which connects to core. This forms a tree. Advantages: simple to plan and understand. Disadvantages: any broken branch isolates all nodes below it, so every interior node is a single point of failure.

Core
 ├── Anchor A
 │    ├── Repeater A1
 │    │    └── Client nodes (isolated if A1 fails)
 │    └── Repeater A2
 └── Anchor B
      └── Repeater B1
           └── Client nodes (isolated if B1 or Anchor B fails)

Ring Topology

Anchor repeaters are interconnected in a ring so that each anchor has two paths back to core. Fill repeaters connect to two anchors where physically possible. This creates a lattice rather than a tree.

Core ── Anchor A ── Anchor B ── Anchor C ── Core
            \           |           /
         Repeater      Fill      Repeater
         (hears A     (hears     (hears B
          and B)     B and C)     and C)

Ring topology requires more careful planning and more anchor sites (each anchor must be within radio range of two others), but it eliminates the single points of failure that make tree topologies fragile.

Recommendation: Design anchor-tier repeaters in a ring or lattice. Fill-tier repeaters can hang off the anchors in a simple tree, but each fill node should reach at least two anchors where terrain permits.


Identifying Single Points of Failure with Path Analysis

A single point of failure (SPOF) is any node whose failure disconnects part of the network. Identify SPOFs through path analysis:

  1. Draw the network graph. Each repeater is a node. Each radio link between repeaters is an edge. Include only links with an RSSI of at least −115 dBm (15 dB margin; reliable links only).
  2. Find cut vertices. A cut vertex (also called an articulation point) is any node whose removal disconnects the graph. Algorithmically, all cut vertices can be found with a single depth-first search (DFS). In practice, on a small network you can identify them visually: any node that is the sole bridge between two subgraphs is a cut vertex and therefore a SPOF.
  3. Prioritise SPOF mitigation. For each SPOF identified, either add a redundant link (find a fill repeater position that bypasses the SPOF) or ensure the SPOF node has UPS backup power, weatherproof housing, and remote monitoring.
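
For networks too large to inspect by eye, steps 1 and 2 can be automated. The sketch below is one way to do it, assuming you have a hand-collected list of `(repeater, repeater, rssi_dbm)` measurements; the function names and the sample data are illustrative, not part of any mesh tool. It builds the graph from links at or above the −115 dBm threshold, then finds cut vertices with the standard DFS low-link method.

```python
def build_graph(measured_links, min_rssi_dbm=-115):
    """Build an undirected adjacency map, keeping only links at or above the threshold."""
    links = {}
    for a, b, rssi_dbm in measured_links:
        links.setdefault(a, set())
        links.setdefault(b, set())
        if rssi_dbm >= min_rssi_dbm:
            links[a].add(b)
            links[b].add(a)
    return links

def cut_vertices(links):
    """Return the cut vertices (articulation points) of an undirected graph."""
    order = {}   # DFS discovery index of each visited node
    low = {}     # lowest discovery index reachable from the node's DFS subtree
    found = set()

    def visit(node, parent):
        order[node] = low[node] = len(order)
        children = 0
        for neighbour in links[node]:
            if neighbour == parent:
                continue
            if neighbour in order:
                # Back edge: the subtree can climb to an earlier node.
                low[node] = min(low[node], order[neighbour])
            else:
                children += 1
                visit(neighbour, node)
                low[node] = min(low[node], low[neighbour])
                # A non-root node is a cut vertex if a child cannot climb above it.
                if parent is not None and low[neighbour] >= order[node]:
                    found.add(node)
        # The DFS root is a cut vertex only if it has more than one DFS child.
        if parent is None and children > 1:
            found.add(node)

    for node in links:
        if node not in order:
            visit(node, None)
    return sorted(found)

# The tree from the earlier diagram (RSSI values are made up).
tree = build_graph([
    ("Core", "Anchor A", -100), ("Anchor A", "A1", -105),
    ("Anchor A", "A2", -108), ("Core", "Anchor B", -102),
    ("Anchor B", "B1", -110),
])
# The anchor ring from the earlier diagram.
ring = build_graph([
    ("Core", "Anchor A", -100), ("Anchor A", "Anchor B", -104),
    ("Anchor B", "Anchor C", -106), ("Anchor C", "Core", -101),
])

print(cut_vertices(tree))  # ['Anchor A', 'Anchor B', 'Core']
print(cut_vertices(ring))  # []
```

The recursive DFS is fine for community-scale networks; a graph with many hundreds of nodes in a single chain could hit Python's recursion limit and would need an iterative version.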

Testing Redundancy by Taking a Node Offline

Theoretical redundancy analysis should be validated with live tests. The procedure is simple:

  1. Notify operators. Announce a planned maintenance window (e.g., "Node X will be taken offline for 30 minutes on Saturday 14:00 UTC for redundancy testing").
  2. Take the target node offline by powering it down or disconnecting its antenna.
  3. Measure impact. Using a network map (MeshMapper, Meshtastic node list, or MeshCore admin panel), observe which nodes lose connectivity. Nodes that disappear from the map are isolated - this is your actual failure impact, which may differ from the theoretical prediction.
  4. Document the partition. Record which nodes were isolated and for how long they would be unreachable in a real failure event.
  5. Restore the node and plan remediation for any isolated segments found.
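
Steps 3 and 4 come down to comparing two node lists: one captured before the test and one captured while the target is offline. A minimal sketch follows, assuming the lists are copied by hand from whichever map or admin panel you use (no tool API is assumed) and that `partition_report` and all node names are hypothetical.

```python
def partition_report(before, during, target, predicted=()):
    """Compare node lists captured before and during the outage of `target`."""
    before, during, predicted = set(before), set(during), set(predicted)
    # Nodes that vanished, excluding the node that was deliberately switched off.
    isolated = before - during - {target}
    return {
        "isolated": sorted(isolated),
        # Lost in the live test but not predicted by the path analysis.
        "unpredicted": sorted(isolated - predicted),
        # Predicted to drop out but stayed reachable.
        "survived_unexpectedly": sorted(predicted - isolated),
    }

report = partition_report(
    before=["Anchor A", "Anchor B", "R1", "R2", "n1", "n2"],
    during=["Anchor A", "Anchor B", "R2"],
    target="R1",
    predicted=["n1"],
)
print(report)
# {'isolated': ['n1', 'n2'], 'unpredicted': ['n2'], 'survived_unexpectedly': []}
```

Entries under "unpredicted" are the most useful output: they point to links that looked reliable on paper but did not carry traffic in practice.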

Perform redundancy tests at least once per year, and after any significant change to the network topology (adding or removing anchor repeaters, or a major coverage expansion).


Practical Redundancy Checklist

  • Every anchor repeater can reach at least two other anchors directly.
  • Every fill repeater can reach at least two anchors directly.
  • No anchor repeater is a single point of failure for more than one fill repeater.
  • All anchor repeaters have UPS or generator backup covering at least 72 hours.
  • Network graph has been drawn and cut vertices identified.
  • Redundancy live-test performed in the last 12 months.
  • Failure impact documented: "If node X fails, Y nodes lose connectivity."
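
The first two checklist items can be checked mechanically from the same kind of adjacency map used for path analysis. The sketch below is only an illustration: `anchor_link_violations`, the node names, and the assumption that every listed node is either an anchor or a fill repeater are all hypothetical.

```python
def anchor_link_violations(links, anchors, minimum=2):
    """Return nodes that directly reach fewer than `minimum` anchors (other than themselves)."""
    anchors = set(anchors)
    violations = {}
    for node in sorted(links):
        heard = (links[node] & anchors) - {node}
        if len(heard) < minimum:
            violations[node] = sorted(heard)
    return violations

# Hypothetical network: three anchors in a triangle plus two fill repeaters.
checklist_links = {
    "A": {"B", "C", "F1"},
    "B": {"A", "C", "F1"},
    "C": {"A", "B", "F2"},
    "F1": {"A", "B"},
    "F2": {"C"},
}
print(anchor_link_violations(checklist_links, anchors={"A", "B", "C"}))
# {'F2': ['C']}
```

Here fill repeater F2 hears only anchor C, so it fails the second checklist item and is a candidate for relocation or an additional anchor link.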