Building High-Availability SD-WAN Architectures

July 3, 2025

Design an HA SD-WAN infrastructure to eliminate single points of failure.

High Availability in Fortinet SD-WAN Architectures: A Practical Guide

On my third cup of coffee of the day (and yes, that’s the right count) and still significantly under-caffeinated, I find myself thinking about high availability (HA) in SD-WAN architectures, especially Fortinet SD-WAN. I’ve been in networking since 1993, when the internet was first coming into homes, and I started out as a network admin. Back then the job was muxing voice and data over PSTN lines, before the first worms slithered into our systems. (The Slammer worm was a slap in the face and taught me early that downtime isn’t an option.)

Fast forward to today: I run my own security firm, PJ Networks, and we have supported three banks in their most recent migrations to zero-trust architectures. Constant connectivity through an HA SD-WAN setup is not something you should do; it is something you have to do to stay alive.

HA Concepts: So, It Still Matters

High availability means your network (or pieces of it) keeps operating even when parts of it fail. Sounds simple, right? But there’s nuance here, particularly with SD-WANs, which, let’s face it, are more complex animals than a good old router.

Here’s the catch: in the old world of networking, HA was all about hardware redundancy: two routers, one active, one on standby. With SD-WAN, it’s also about software intelligence choosing the best route dynamically.

Redundancy is the core. If one of those connections goes down (MPLS, broadband, LTE), traffic has to route around it without users ever noticing. That’s the endgame.

Active-Passive vs Active-Active: What’s Your Pick?

This debate is as old as networking itself. Spoiler: neither is a silver bullet.

Active-Passive: One link carries the traffic; the other sits idle as a backup.
Active-Active: Traffic is forwarded across all links, with the load balanced among them.

On Fortinet SD-WAN, I find active-active makes better use of bandwidth, but it can complicate failure detection and exposes you to jitter and packet re-ordering penalties (more so for voice).

Active-passive? Simpler failover mechanics. But sometimes your passive link is just a fancy paperweight waiting for the worst to happen.
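To make the difference concrete, here is a minimal sketch of the two selection models. It is not how FortiOS implements them; the `Link` class, the priority convention and the function names are all hypothetical. The active-active picker hashes per flow rather than per packet, which is one common way to keep a session on a single link and limit re-ordering.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str
    priority: int   # hypothetical convention: lower number = preferred
    healthy: bool


def pick_active_passive(links):
    """Return the single preferred healthy link, or None if all are down."""
    candidates = [link for link in links if link.healthy]
    return min(candidates, key=lambda link: link.priority) if candidates else None


def pick_active_active(links, flow_id):
    """Hash a flow onto one of the healthy links.

    Every packet of the same flow lands on the same link, so packets of one
    session are not spread across paths with different latency.
    """
    candidates = sorted((link for link in links if link.healthy),
                        key=lambda link: link.name)
    if not candidates:
        return None
    digest = hashlib.sha256(flow_id.encode()).digest()
    return candidates[int.from_bytes(digest[:4], "big") % len(candidates)]


links = [Link("mpls", 1, True), Link("broadband", 2, True), Link("lte", 3, False)]
primary = pick_active_passive(links)                      # the mpls link
chosen = pick_active_active(links, "10.0.0.5:5060->10.1.0.9:5060/udp")
```

Note the trade-off: with simple modulo hashing, a link going down remaps many flows at once, which is part of why failure handling gets more complicated in active-active.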

For the banks that we assisted, a hybrid model was most effective: active-active for the critical data centers, intelligent active-passive for the remote branches where broadband is predictably unreliable. Flexibility here is key.

Link Redundancy: It’s Not The Number Of Links, It’s What You Do With Them

You might assume more links means greater redundancy. Yes and no.

Not all links are created equal:

MPLS: Steady and low jitter, but costly.
Broadband: Cheap and fast, with the occasional hiccup.
LTE/5G: Great as a backup, but watch for data caps and latency.

At PJ Networks, we always evaluate these link types before designing a topology. The questions we ask ourselves: what happens if that wire gets cut? Or if your ISP has an outage?

Sometimes redundancy isn’t a second or third link but better tuning of your existing paths, with better monitoring and fault resilience:

Monitor end-to-end latency, packet loss and jitter in real time using Fortinet’s built-in performance SLAs.
Run link health detection on a tight loop, and not just with ping: use TCP or UDP probes that can catch silent failures.

Is this basic? Maybe. But so many people rely on ping alone, and ping lies.
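As an illustration of what a non-ping probe measures, here is a rough sketch of a TCP connect probe plus the loss, latency and jitter summary an SLA check would look at. This is not Fortinet’s implementation; the function names, the dictionary keys and the jitter definition (mean absolute difference between consecutive successful samples) are assumptions chosen for the example.

```python
import socket
import statistics
import time


def tcp_probe(host, port, timeout=1.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return None


def summarize(samples):
    """Turn probe results (ms, or None for a failed probe) into SLA metrics."""
    ok = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(ok)) / len(samples) if samples else 100.0
    latency_ms = statistics.mean(ok) if ok else None
    # Assumed jitter definition: mean absolute delta between consecutive
    # successful samples.
    deltas = [abs(b - a) for a, b in zip(ok, ok[1:])]
    jitter_ms = statistics.mean(deltas) if deltas else 0.0
    return {"loss_pct": loss_pct, "latency_ms": latency_ms, "jitter_ms": jitter_ms}


def meets_sla(stats, max_latency_ms, max_jitter_ms, max_loss_pct):
    """True only if every metric is within its threshold."""
    if stats["latency_ms"] is None:
        return False
    return (stats["latency_ms"] <= max_latency_ms
            and stats["jitter_ms"] <= max_jitter_ms
            and stats["loss_pct"] <= max_loss_pct)
```

In practice you would call `tcp_probe` against a service the application actually depends on, collect a rolling window of results, and feed that window to `summarize` and `meets_sla`. The thresholds are yours to choose per application.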

What Keeps Me Up At Night: A Non-Exhaustive Failure Scenario

Been there. I’ve architected dozens of Fortinet SD-WAN deployments with HA and still found faults where you wouldn’t expect them.

For example:

ISP failover that kicked in only after the application had already timed out.
Route instability caused by a flapping (up-down) link.
Sessions breaking during failover, especially on VPN tunnels.
VoIP calls degraded by out-of-order packets because of load balancing.

One client, a mid-tier bank, lost 15 minutes during a peak hour because their HA existed pretty much only in theory. We revamped the design, set more aggressive failover timers, and tuned the SD-WAN path health checks.

This is my takeaway: test your failure scenarios BEFORE deploying. Or regret it later.

Testing: Failover Drills Are Not Optional

PJ Networks didn’t write the book on HA topologies, but we do live by it. We run failover drills routinely (yes, like fire drills, but for your network). It is the only way to catch the subtle issues that slip through simulations.

What we do:

Schedule failure tests on every WAN link.
Simulate partial outages and measure how fast failover actually is.
Verify session state and VPN tunnel failover.
Use the SD-WAN dashboards to assess how traffic shifts.
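Measuring "how fast" is easier with a number than with a gut feeling. Below is a minimal sketch that takes a drill log of timestamped end-to-end probe results and reports the outage windows a user would have seen. The log format and the function names are assumptions for illustration, not output from any Fortinet tool.

```python
def outage_windows(samples):
    """Find runs of failed probes in a drill log.

    samples: list of (timestamp_seconds, ok) tuples sorted by time.
    Returns (start, end) pairs, where start is the first failed probe and end
    is the first successful probe after it, or None if service never came
    back before the log ended.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def worst_failover_seconds(samples):
    """Longest observed outage, or None if the link never recovered."""
    windows = outage_windows(samples)
    if any(end is None for _, end in windows):
        return None
    return max((end - start for start, end in windows), default=0.0)


drill = [(0.0, True), (1.0, False), (2.0, False), (3.0, True),
         (4.0, True), (5.0, False), (6.0, True)]
print(outage_windows(drill))          # [(1.0, 3.0), (5.0, 6.0)]
print(worst_failover_seconds(drill))  # 2.0
```

The resolution of this measurement is limited by the probe interval, so probe faster than the failover time you are trying to prove.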

Our 24×7 monitoring feeds into that. After all, the question isn’t just whether failover works, but when it kicks in. Too late = bad. Too early = annoying flapping.
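The "too late versus too early" trade-off usually comes down to two counters: how many consecutive failed probes it takes to declare a link down, and how many consecutive good probes it takes to trust it again. Here is a small sketch of that hysteresis. The class and parameter names are hypothetical, and the default values are placeholders rather than recommendations.

```python
class LinkState:
    """Declare a link down after `fail_after` consecutive failed probes and
    up again only after `recover_after` consecutive successful probes."""

    def __init__(self, fail_after=3, recover_after=5):
        self.fail_after = fail_after
        self.recover_after = recover_after
        self.up = True
        self._streak = 0  # consecutive probes that disagree with current state

    def observe(self, probe_ok):
        """Feed one probe result; return True if the link is considered up."""
        if probe_ok == self.up:
            # Probe agrees with the current state: reset the counter.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.fail_after if self.up else self.recover_after
            if self._streak >= threshold:
                self.up = not self.up
                self._streak = 0
        return self.up


state = LinkState(fail_after=2, recover_after=3)
probes = [False, True, False, False, True, True, False, True, True, True]
history = [state.observe(p) for p in probes]
# A single lost probe does not trigger failover, and the link is not trusted
# again until three good probes in a row.
```

A lower `fail_after` fails over faster but reacts to blips; a higher `recover_after` keeps a flapping link out of rotation longer. Drills are where you find the values that suit each site.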

PJ Networks Drills: Our Secret Sauce

Look, I admit it. Early in my career, failure scenarios all felt like theory, until I was woken up in the middle of the night because the system I’d built had stopped my customer from being able to send anyone money. That system was a rather large US bank’s entire payment rail. As a result, at PJ Networks, we’ve made failover drills a standard part of our operating procedures.

We’ve seen some eye-openers:

A Fortinet box misconfigured with asymmetric routing, which blackholed traffic after a failover.
Too much dependence on link metrics, with no user-level monitoring.
Underestimating how firewalls handle existing sessions when the path changes.

Our drills are what allow us to catch these before our clients experience them.

A Few Hard Truths

No architecture is perfect. If someone tells you they have a network that has zero downtime, just smile and walk away.
The AI-powered label? Meh. AI can be part of the solution, but I’ve seen vendors overpromise with sometimes ridiculous benchmarks while basic configuration hygiene is ignored.
Password policies with an expiration that’s too aggressive? Annoying. Not effective.
A few old tech rules can still outperform shiny modern gadgets.

Here’s the deal: Fortinet SD-WAN HA is a strong but nuanced feature that requires careful planning, real-world testing, and continual tuning.

Quick Take

High availability in SD-WAN is not plug-and-play – it requires thoughtful design and thorough testing.
Pick your active-passive or active-active models according to your use case, not the buzzwords waved your way.
Don’t just add more links. Monitor their health smartly.
Run real failover drills on a regular basis, backed by 24×7 monitoring.
PJ Networks draws on years of experience, from the PSTN days to DefCon hacking villages, to deliver solid infrastructure.

Final Thought

Running a secure and highly available SD-WAN is about more than tech. It’s knowing that failure will occur, and being prepared for it. After all, if your network is down, how is your security even doing?

If you have questions or want to talk about your HA design, let me know! Believe me — I’ve learned the hard way that getting it right can save you a lot of headaches, money and even, yes, sleep.

Sanjay Seth
