Planned Maintenance Procedures for Live Networks
Taking a backbone node offline for maintenance (firmware updates, hardware replacement, antenna adjustments) affects every user whose traffic routes through it. With proper planning, that impact can be reduced to a brief interruption rather than a prolonged outage. This page describes a repeatable maintenance procedure for Meshtastic backbone nodes in active community networks.
Pre-Maintenance Checklist
Complete these steps before any planned maintenance window:
- Notify the community: Post advance notice (24-48 hours) in your community communication channels—Discord, Signal group, or whatever your community uses. Include the node name, scheduled time, and estimated duration. Users who depend on that backbone link can plan accordingly.
- Verify backup paths: Run traceroutes from nodes on either side of the target to confirm alternate routing exists (see the traceroute sketch after this list). If no backup path is available, consider deploying a temporary node before taking the backbone node offline.
- Schedule during low-traffic hours: 2-5 AM local time is typically the quietest window for community mesh networks. Emergency networks may have different quiet windows—check your message logs to identify the lowest-traffic period.
- Document current configuration: If replacing hardware, record all node settings (node name, channel configuration, role, hop limit) so the replacement can be configured identically before going live; the export sketch after this list shows one way to snapshot them.
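For the backup-path check, here is a minimal sketch assuming the `sendTraceRoute` helper in the Meshtastic Python library (the CLI equivalent is `meshtastic --traceroute`). The node IDs are placeholders for your own network:

```python
# Sketch: confirm an alternate route exists around the node slated for
# maintenance. Run this from a node on one side of the target, tracing
# to a node on the far side. Node IDs below are placeholders.
import meshtastic.serial_interface

TARGET = "!deadbeef"    # hypothetical ID of the node going down
FAR_SIDE = "!cafef00d"  # hypothetical ID of a node normally reached via TARGET

iface = meshtastic.serial_interface.SerialInterface()
try:
    # Prints the hop-by-hop route of the reply. If the route does not
    # include TARGET, at least one alternate path exists right now.
    iface.sendTraceRoute(FAR_SIDE, hopLimit=7)
finally:
    iface.close()
```

Repeat the trace from the other side of the target; a path that exists in one direction is not guaranteed to exist in the other.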
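For the configuration snapshot, the Meshtastic CLI can dump the attached node's settings as YAML via `--export-config`. A minimal sketch that saves a timestamped copy (the filename scheme is just an example):

```python
# Sketch: snapshot the attached node's settings to a timestamped YAML file
# before a hardware swap, by shelling out to the Meshtastic CLI's
# --export-config (assumes the CLI is installed and the node is on serial).
import subprocess
from datetime import datetime

stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
outfile = f"backbone-node-config-{stamp}.yaml"  # example naming scheme

with open(outfile, "w") as f:
    subprocess.run(["meshtastic", "--export-config"], stdout=f, check=True)
print(f"Configuration saved to {outfile}")
```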
During Maintenance
Where possible, power down gracefully rather than performing a hard power-off. A graceful shutdown allows the node to stop transmitting cleanly and reduces the chance of a neighbor node entering a routing loop while searching for the missing node. For solar-powered nodes, the practical approach is to disconnect the load output of the charge controller rather than physically unplugging the node in the dark.
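Where the node is still reachable over serial, a software-initiated shutdown can precede the power cut. A minimal sketch, assuming firmware that supports the shutdown admin message and the `localNode.shutdown()` helper in the Meshtastic Python library (the CLI equivalent is `meshtastic --shutdown`):

```python
# Sketch: ask the attached node to power down cleanly before cutting power.
# Assumes the node's firmware supports the shutdown admin message.
import meshtastic.serial_interface

iface = meshtastic.serial_interface.SerialInterface()
try:
    iface.localNode.shutdown()  # request a clean stop to transmissions
finally:
    iface.close()
```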
Perform the maintenance task—firmware flash, hardware swap, antenna replacement—as quickly as practical. Every additional minute of downtime increases the chance of a user encountering a failed message delivery.
Post-Maintenance Verification
Before declaring the node returned to service, verify it from multiple directions:
- Send a test message from a node that previously routed through the maintained node and confirm delivery (see the ack-test sketch after this list).
- Run traceroutes from multiple directions to confirm the node is routing normally.
- Confirm the room server sees the node as online and shows recent activity.
- Update your network maintenance log with the date, work performed, and any configuration changes.
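For the delivery test, here is a minimal sketch assuming the `onResponse` hook of `sendText` in the Meshtastic Python library; the destination ID and the 60-second timeout are placeholders:

```python
# Sketch: send a direct test message to a node that previously routed
# through the maintained node, and report when a delivery response
# arrives. DEST is a placeholder node ID.
import time
import meshtastic.serial_interface

DEST = "!cafef00d"  # placeholder: node on the far side of the backbone
got_reply = False

def on_response(packet):
    # Called when the routing layer reports delivery status for our request.
    global got_reply
    got_reply = True
    print("Delivery response:", packet.get("decoded"))

iface = meshtastic.serial_interface.SerialInterface()
try:
    iface.sendText("post-maintenance test", destinationId=DEST,
                   wantAck=True, onResponse=on_response)
    deadline = time.time() + 60
    while not got_reply and time.time() < deadline:
        time.sleep(1)
    print("OK" if got_reply else "No ack within 60 s; investigate routing")
finally:
    iface.close()
```

Repeating the test from the far side, plus a traceroute as in the pre-maintenance sketch, gives reasonable confidence the node is routing normally again.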
Emergency Rollback Procedure
If a replacement node does not work as expected (wrong firmware, hardware fault, or configuration error), act quickly. Restore the original node if it is still functional, or connect a known-good spare configured to match the original settings; the sketch below shows one way to push a saved configuration snapshot onto a spare. If neither is possible, notify the community immediately that the outage is extended and provide an estimated restoration time. Keeping one preconfigured spare per critical site is strongly recommended.
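A minimal restore sketch, assuming the YAML snapshot captured during the pre-maintenance checklist and the Meshtastic CLI's `--configure` import (the filename is a placeholder):

```python
# Sketch: push a saved configuration snapshot onto an attached spare node,
# then print its settings so they can be checked against the original.
# Assumes the Meshtastic CLI is installed and the spare is on serial.
import subprocess

SNAPSHOT = "backbone-node-config-20250101-020000.yaml"  # placeholder filename

subprocess.run(["meshtastic", "--configure", SNAPSHOT], check=True)
subprocess.run(["meshtastic", "--info"], check=True)  # verify applied settings
```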
After any unplanned extended outage, conduct a brief post-incident review: what failed, why, and what process change would prevent recurrence. Even a one-paragraph note in the maintenance log is valuable for future operators.