Designing for Redundancy
Why Single-Path Mesh Is Fragile
A tree-topology mesh, in which each node has exactly one path back to the network core, is the natural shape that forms when coverage is barely adequate. In a tree topology, the failure of any interior node partitions the network. Nodes "below" the failed repeater become an isolated island: they can hear each other but cannot reach the wider network.
For a community mesh serving emergency communications, this is unacceptable. The very events that trigger heavy mesh use (storms, earthquakes, infrastructure failures) are also the events most likely to take repeaters offline through power loss, physical damage, or access loss.
Per-Zone (+1) Redundancy
Note on terminology: in classic systems engineering, N+1 redundancy means N components are needed to carry the load plus one shared spare for the whole system. A mesh network needs something stronger than that — a single shared spare for the entire network does not help if the spare cannot reach the area that lost its repeater. What we want for a mesh is per-coverage-zone redundancy: every coverage area should have a backup path, not just the network as a whole. In practice, this translates to:
Every node in the network should be able to reach the network core via at least two independent paths through different physical repeaters.
The "network core" is the set of nodes with Internet gateway access, or the central coordination point. In a city mesh, the core might be 3 to 5 well-connected anchor repeaters.
To verify per-zone redundancy for a given node:
- Identify all repeaters that node can directly reach (RSSI > −120 dBm).
- For each of those repeaters, confirm it has at least one other path back to core.
- If the node can only reach a single repeater, it has no redundancy. If that repeater fails, the node is isolated.
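The three-step check above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive tool: the node names, RSSI readings, and the `backbone` adjacency dictionary are all hypothetical, and the check follows the steps literally rather than proving that the two paths are fully independent further upstream.

```python
from collections import deque

RSSI_FLOOR_DBM = -120  # direct-reach threshold from the steps above


def reaches_core(graph, start, core, blocked):
    """Breadth-first search from start to any core node, avoiding blocked nodes."""
    if start in blocked:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current in core:
            return True
        for neighbour in graph.get(current, ()):
            if neighbour not in seen and neighbour not in blocked:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def usable_first_hops(node, heard, graph, core):
    """Repeaters the node hears above the floor that reach core without the node."""
    hops = []
    for repeater, rssi in heard.items():
        if rssi > RSSI_FLOOR_DBM and reaches_core(graph, repeater, core, blocked={node}):
            hops.append(repeater)
    return sorted(hops)


# Hypothetical repeater-to-repeater links (adjacency sets).
backbone = {
    "A1": {"AnchorA"},
    "A2": {"AnchorA"},
    "AnchorA": {"A1", "A2", "Core"},
    "Core": {"AnchorA"},
    "B9": set(),  # heard by the client, but has no route to core
}
core = {"Core"}

# Hypothetical RSSI readings (dBm) taken at the client node.
heard_by_client = {"A1": -105, "A2": -112, "B9": -118, "A3": -127}

hops = usable_first_hops("client-1", heard_by_client, backbone, core)
has_redundancy = len(hops) >= 2
print(hops, has_redundancy)
```

In this example both usable first hops (A1 and A2) route through AnchorA, so the node passes the per-node check while AnchorA remains a shared point of failure. That is the kind of case the path analysis later in this section is meant to catch.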
Ring Topology vs. Tree Topology
Tree Topology
Nodes connect to the nearest repeater, which connects to the nearest anchor, which connects to core. This forms a tree. Advantages: simple to plan and understand. Disadvantages: any broken branch isolates all nodes below it. Single points of failure are everywhere.
Core
├── Anchor A
│   ├── Repeater A1
│   │   └── Client nodes (isolated if A1 fails)
│   └── Repeater A2
└── Anchor B
    └── Repeater B1
        └── Client nodes (isolated if B1 or Anchor B fails)
Ring Topology
Anchor repeaters are interconnected in a ring so that each anchor has two paths back to core. Fill repeaters connect to two anchors where physically possible. This creates a lattice rather than a tree.
Core ── Anchor A ── Anchor B ── Anchor C ── Core
             \          |          /
          Repeater     Fill     Repeater
          (hears A    (hears    (hears B
           and B)    B and C)    and C)
Ring topology requires more careful planning and more anchor sites (each anchor must be within radio range of two others), but it eliminates the single points of failure that make tree topologies fragile.
Recommendation: Design anchor-tier repeaters in a ring or lattice. Fill-tier repeaters can remain in a simplified tree to anchor, but each fill node should reach at least two anchors where terrain permits.
Identifying Single Points of Failure with Path Analysis
A single point of failure (SPOF) is any node whose failure disconnects part of the network. Identify SPOFs through path analysis:
- Draw the network graph. Each repeater is a node. Each radio link between repeaters is an edge. Include only reliable links — for example, those with about 15 dB of margin above the receiver's sensitivity. With a typical LongFast/LongSlow sensitivity near −130 dBm, that means including only links measured at roughly −115 dBm or stronger. (Adjust the cutoff to your preset's actual sensitivity floor.)
- Find the bridge nodes (cut vertices). A cut vertex is any node whose removal disconnects the graph. You can find these visually: any node that is the only link between two clusters is a single point of failure. If you have your topology in software, a graph tool (such as Gephi, or Python's networkx.articulation_points()) will list them automatically, but for most community meshes, eyeballing the map for any node that is the sole bridge between two groups is enough.
- Prioritise SPOF mitigation. For each SPOF identified, either add a redundant link (find a fill repeater position that bypasses the SPOF) or ensure the SPOF node has UPS backup power, weatherproof housing, and remote monitoring.
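For readers who keep link measurements in a spreadsheet or script, the path analysis can be sketched without any third-party library. The example below is a brute-force illustration under stated assumptions: it filters links using the −130 dBm sensitivity and 15 dB margin quoted above, then removes each node in turn and counts connected components. It is not the linear-time algorithm a graph library would use, and every node name and RSSI value is hypothetical.

```python
SENSITIVITY_DBM = -130  # assumed preset sensitivity; adjust to your preset
MARGIN_DB = 15          # reliability margin from the text above


def build_graph(links):
    """Keep only links at or above sensitivity + margin; return adjacency sets."""
    cutoff = SENSITIVITY_DBM + MARGIN_DB
    graph = {}
    for a, b, rssi in links:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if rssi >= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def component_count(graph, removed=None):
    """Count connected components, ignoring the removed node."""
    remaining = set(graph) - {removed}
    count = 0
    while remaining:
        count += 1
        stack = [remaining.pop()]
        while stack:
            current = stack.pop()
            for neighbour in graph[current]:
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    stack.append(neighbour)
    return count


def cut_vertices(graph):
    """Nodes whose removal increases the number of connected components."""
    baseline = component_count(graph)
    return sorted(n for n in graph if component_count(graph, removed=n) > baseline)


# Hypothetical measured links: (node, node, RSSI in dBm).
links = [
    ("Core", "AnchorA", -100),
    ("Core", "AnchorB", -104),
    ("AnchorA", "AnchorB", -110),
    ("AnchorB", "B1", -108),
    ("AnchorA", "B1", -121),  # below the -115 dBm cutoff, so it is dropped
    ("B1", "B2", -112),
]

graph = build_graph(links)
spofs = cut_vertices(graph)
print(spofs)
```

With these sample values the marginal AnchorA to B1 link is excluded, which leaves AnchorB as the only route to B1 and B2. If that link were improved past the cutoff, AnchorB would no longer be a cut vertex, although B1 would still be one for B2.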
Testing Redundancy by Taking a Node Offline
Theoretical redundancy analysis should be validated with live tests. The procedure is simple:
- Notify operators. Announce a planned maintenance window (e.g., "Node X will be taken offline for 30 minutes on Saturday 14:00 UTC for redundancy testing").
- Take the target node offline by powering it down or disconnecting its antenna.
- Measure impact. Using a network map (MeshMapper, Meshtastic node list, or MeshCore admin panel), observe which nodes lose connectivity. Nodes that disappear from the map are isolated. This is your actual failure impact, which may differ from the theoretical prediction.
- Document the partition. Record which nodes were isolated and for how long they would be unreachable in a real failure event.
- Restore the node and plan remediation for any isolated segments found.
Perform redundancy tests at least once per year, and after any significant change to the network topology (adding or removing anchor repeaters, significant coverage expansion).
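Because the live result may differ from the prediction, it can help to write both down in the same form and compare them. The sketch below assumes you have the topology as an adjacency dictionary and a manually recorded set of nodes that dropped off the map during the test. All names and data are hypothetical, and nothing here talks to MeshMapper, Meshtastic, or MeshCore.

```python
from collections import deque


def predicted_isolated(graph, core, offline):
    """Nodes that cannot reach any core node once `offline` is removed."""
    queue = deque(c for c in core if c != offline)
    reachable = set(queue)
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour != offline and neighbour not in reachable:
                reachable.add(neighbour)
                queue.append(neighbour)
    return set(graph) - reachable - {offline}


def compare(predicted, observed):
    """Split the test result into confirmed and surprising outcomes."""
    return {
        "confirmed": sorted(predicted & observed),
        "unexpected_loss": sorted(observed - predicted),
        "survived_unexpectedly": sorted(predicted - observed),
    }


# Hypothetical topology (adjacency sets).
graph = {
    "Core": {"AnchorA", "AnchorB"},
    "AnchorA": {"Core", "A1", "A2"},
    "AnchorB": {"Core", "B1"},
    "A1": {"AnchorA", "C1"},
    "A2": {"AnchorA"},
    "B1": {"AnchorB"},
    "C1": {"A1"},
}

predicted = predicted_isolated(graph, core={"Core"}, offline="AnchorA")
observed = {"A1", "C1", "B1"}  # hypothetical: nodes that vanished during the test
report = compare(predicted, observed)
print(report)
```

An entry under "unexpected_loss" suggests a link on the map is weaker than assumed, while "survived_unexpectedly" suggests an undocumented path exists. Either one is a reason to update the network graph before the next test.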
Practical Redundancy Checklist
- Every anchor repeater can reach at least two other anchors directly.
- Every fill repeater can reach at least two anchors directly.
- No anchor repeater is a single point of failure for more than one fill repeater.
- All anchor repeaters have UPS or generator backup covering at least 72 hours.
- Network graph has been drawn and cut vertices identified.
- Redundancy live-test performed in the last 12 months.
- Failure impact documented: "If node X fails, Y nodes lose connectivity."
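The first two checklist items lend themselves to a quick automated check. The following is a minimal sketch that assumes you maintain a list of reliable direct links and know which nodes are anchors and which are fill repeaters; the function name, node names, and link list are hypothetical.

```python
def checklist_failures(links, anchors, fills, minimum=2):
    """Return (role, node, anchors_reached) for every node below the minimum."""
    reached = {node: set() for node in anchors | fills}
    for a, b in links:
        # Links are bidirectional; only links that end at an anchor count.
        if a in reached and b in anchors:
            reached[a].add(b)
        if b in reached and a in anchors:
            reached[b].add(a)
    failures = []
    for node in sorted(reached):
        if len(reached[node]) < minimum:
            role = "anchor" if node in anchors else "fill"
            failures.append((role, node, len(reached[node])))
    return failures


# Hypothetical network: three anchors in a ring, two fill repeaters.
anchors = {"A", "B", "C"}
fills = {"F1", "F2"}
links = [
    ("A", "B"), ("B", "C"), ("C", "A"),  # anchor ring
    ("F1", "A"), ("F1", "B"),            # F1 hears two anchors
    ("F2", "C"),                         # F2 hears only one
]

failures = checklist_failures(links, anchors, fills)
print(failures)
```

In this sample the ring satisfies the anchor item, and F2 is flagged because it reaches only one anchor. The remaining checklist items (backup power, documentation, test dates) still need a manual review.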