Skip to Content

Day 6 — Supplement A: Dynamic Routing Protocols

Day 6 — Supplement A: Dynamic Routing Protocols

Foundation Networking Course | Civilian Track Companion to Day 6 — Routing Protocol Fundamentals Document Version: v1.1

Document Map

This supplement is organized into seven Stages covering the dynamic routing material end-to-end. Part numbering restarts at 1 since this is a new day.

  • Stage I — Why Dynamic Routing Exists

    • Part 1 — The Problem Static Can't Solve

    • Part 2 — AS / IGP / EGP

  • Stage II — Distance-Vector

    • Part 3 — The Two Families

    • Part 4 — Neighbor Relationships

    • Part 5 — Bellman-Ford in Action

    • Part 6 — Periodic Updates

  • Stage III — Distance-Vector Failure Modes

    • Part 7 — How Loops Form

    • Part 8 — Count to Infinity

    • Part 9 — Loop Prevention

  • Stage IV — Link-State

    • Part 10 — A Different Approach

    • Part 11 — LSA Flooding

    • Part 12 — Dijkstra SPF

  • Stage V — Comparing the Two

    • Part 13 — Side by Side

    • Part 14 — Which to Use When

  • Stage VI — Convergence

    • Part 15 — Six Phases

    • Part 16 — Optimization

  • Stage VII — Metrics and Decisions

    • Part 17 — Metrics

    • Part 18 — AD vs Metric

Purpose of This Supplement

Day 5's supplements built up to one big idea: a routing table is just a list of "destination → next hop" entries, and there are only two ways those entries get there. Either a human types them in (static routing — covered in Day 5 Supplement C), or the device learns them from other devices (dynamic routing — handed off to this document).

Day 6 is that second half of the story. This supplement walks through, in the same step-by-step style as the Day 5 supplements, how dynamic routing actually works — what problem it solves, the two main families of protocols (distance-vector and link-state), the algorithms they use, the failure modes they have, and how networks use them in practice.

The single most important framing for this entire day:
A routing protocol does not replace the routing table — it builds it. The protocol is the mechanism that exchanges information between routers and produces the entries in the routing table. The router then uses that routing table — exactly the way Day 5 Supplement C described — to forward packets. Routing protocols and routing tables are two different things, working together: the protocol fills the table, and the table forwards the packets.

By the end of this supplement, a participant should be able to explain — in plain language — what RIP, OSPF, EIGRP, and IS-IS each do differently, why some networks use one and not another, how a routing loop forms and how it's prevented, what "convergence" actually means, and why administrative distance and metric are two separate decisions made in a specific order.

Stage I — Why Dynamic Routing Exists

The motivation. Day 5 Supplement C ended with "static doesn't scale and doesn't adapt." This stage expands on that and frames the problem space dynamic routing protocols were invented to solve.

Part 1 — The Problem Static Can't Solve

Why Routing Protocols Exist at All

Day 5 Supplement C already gave the short version: static routing doesn't scale and doesn't adapt. Let's expand on both of those, because they're the only two reasons dynamic routing exists.

The Scaling Problem

Imagine a network with 50 routers. Each router connects to several networks. Adding a single new subnet anywhere in this network requires logging into every other router and adding the same static route. If the network has 50 routers and 200 subnets, that's potentially 10,000 manually-configured static routes to maintain. Add one new subnet: 50 more route entries. Delete one: 50 more changes to make, one by one.

This isn't just tedious — it's error-prone. A typo on one router means traffic to that subnet silently fails from that router. A missed router means inconsistent reachability. At scale, manually maintaining routing information becomes mathematically infeasible.

The Adaptation Problem

A static route assumes its next hop is always reachable. If the link to that next hop fails, the route stays in the routing table, packets continue being sent into the void, and they're silently dropped. There's no automatic mechanism for the device to notice the failure and try a different path.

If there is a different path available (a backup link, a different router that knows another route), static routing can't take advantage of it without human intervention. Someone has to log in and manually remove the dead route, manually add a new one, and do it on every affected device.

In a real production network — where uptime matters, where links fail, where traffic must be rerouted automatically — this is unacceptable.

What Dynamic Routing Lets You Do

Dynamic routing protocols solve both problems at once. A routing protocol is a mechanism that lets routers automatically:

  • Discover which networks are reachable, and through which paths

  • Share that information with their neighboring routers

  • Detect when a link or router fails, and recalculate paths around the failure

  • Adapt to topology changes without administrator intervention

That's it. Those four functions are the entire reason dynamic routing exists. Everything else — the specific protocols, the algorithms, the metrics, the convergence behavior — is just how different protocols accomplish these four things differently.

Routing Protocol vs. Routed Protocol — A Quick Terminology Note

Worth being precise about one easily-confused distinction:

  • routed protocol is what carries user data. IP is the routed protocol — it's the thing that gets routed.

  • routing protocol is what carries information about routes. OSPF, RIP, EIGRP, BGP — these are routing protocols. They don't carry your data; they carry routing information that helps routers learn where to send your data.

Routing protocols run on top of the routed protocol they're advertising. OSPF messages travel inside IP packets. BGP messages travel inside TCP connections (which themselves are inside IP packets). The routing protocol's job is to tell other routers about IP destinations, so that those routers can correctly forward IP traffic. The routed protocol (IP) is the cargo; the routing protocol (OSPF, RIP, etc.) is the manifest.

Part 2 — AS / IGP / EGP

The Organizational Structure

Before we look at specific protocols, there's an organizational reality of the internet that shapes how routing protocols are designed. The internet isn't one big network — it's tens of thousands of separately-operated networks that have agreed to interconnect.

Autonomous Systems

Each of those independently-operated networks is called an Autonomous System (AS). An AS is a network — or a collection of networks — under a single administrative control, with its own routing policy.

Examples of Autonomous Systems:

  • An ISP (your local ISP, a backbone carrier like Lumen or NTT)

  • A large enterprise (Google, Microsoft, a university, a multinational corporation)

  • A government network

  • A content delivery network

Each AS has its own unique number assigned by IANA — the AS Number (ASN). For example, Google operates AS 15169. AT&T operates AS 7018. Your ISP has one. These numbers are globally unique, like IP addresses, so any AS in the world can be referenced unambiguously.

Inside one AS, the administrator decides how routing works — which protocols to run, which paths to prefer, what the addressing structure looks like. Between different ASes, there has to be a standard way for them to advertise reachability to each other.

This creates two different routing "domains," and they have different needs.

IGP — Interior Gateway Protocols

Inside one AS, routing happens between routers under the same administrative control. The router operator knows the full topology, controls all the devices, and wants fast convergence and accurate path selection above all else.

The routing protocols designed for this case are called Interior Gateway Protocols (IGPs). Examples include:

  • OSPF (Open Shortest Path First) — the most widely used IGP in enterprises

  • EIGRP (Enhanced Interior Gateway Routing Protocol) — Cisco's protocol, common in Cisco-heavy networks

  • IS-IS (Intermediate System to Intermediate System) — used in large ISP cores

  • RIP (Routing Information Protocol) — older, simpler, mostly legacy now

All four are IGPs. They differ in their mechanics — distance-vector vs. link-state, metric calculation, convergence speed — but they all share the assumption that they're running inside one administrative domain.

EGP — Exterior Gateway Protocols

Between different ASes, the situation is different. There's no shared administrative control, no shared topology view, and routing decisions need to reflect commercial agreements between ASes as much as technical "shortest path" considerations.

The routing protocols designed for this case are called Exterior Gateway Protocols (EGPs). Today, there is exactly one EGP in active use: BGP (Border Gateway Protocol).

BGP was already covered in detail in Day 5 Supplement B (Part 12) — including how it carries route advertisements between ASes, how policy-based routing works, and how the entire public internet is built on BGP-learned routes between thousands of ASes. This supplement won't re-explain BGP in depth; refer to Day 5 Supplement B Part 12 for that material.

Why the IGP/EGP Distinction Matters

The mechanics of an IGP and the mechanics of an EGP are different because the problems they solve are different. Inside an AS, you want speed and accuracy. Between ASes, you want policy control and scalability.

For the rest of this supplement, we focus on IGPs — specifically on the two algorithmic families that all IGPs belong to: distance-vector and link-state. That's where Day 6's main content lives.

Stage II — Distance-Vector

The first of the two protocol families. This stage covers the conceptual framing of both families, the neighbor-relationship mechanics they share, and then dives into distance-vector specifically — Bellman-Ford propagation and the periodic-update model.

Part 3 — The Two Families

Distance-Vector and Link-State

There are only two fundamental ways to design an IGP. Every IGP in existence belongs to one of two families. Understanding these two families is the conceptual key to all of Day 6.

Distance-Vector

A distance-vector router knows only two things about each destination:

  1. The distance — some metric measuring how "far" the destination is (hop count for RIP, a composite of bandwidth and delay for EIGRP).

  2. The direction — which neighbor to send packets to in order to reach it (the next hop).

That's it. A distance-vector router does not know the topology of the network beyond its immediate neighbors. It doesn't know what's behind those neighbors, or what the network looks like as a whole. It just trusts its neighbors when they say "I can reach destination X at distance Y."

How it shares information: A distance-vector router periodically sends its entire routing table to its directly connected neighbors. The neighbors apply the same logic — if a neighbor's "distance to X" plus the cost to reach that neighbor is better than what they currently have, they update.

The standard analogy: Imagine you're driving and you only have road signs to guide you. Each sign tells you the next town and the distance to it. You don't have a map of the whole country — you just trust that following the signs gets you to your destination. That's distance-vector: you know the next hop and the distance, nothing more.

Link-State

A link-state router knows the entire topology of the network. Every router in the same area maintains an identical "map" of the network — every router, every link, every link's cost. This map is called the Link-State Database (LSDB).

How it shares information: A link-state router doesn't share its routing table. Instead, it shares information about its own directly connected links — what links it has, what they cost, and which neighbors are on each. These advertisements are called LSAs (Link-State Advertisements), and they're flooded through the network so every router gets a copy. The LSDB is built by each router collecting these LSAs from every other router in the area.

Once the LSDB is built (and it's identical on every router in the area), each router runs an algorithm called Dijkstra's Shortest Path First (SPF) on its own copy of the LSDB, computing the best path from itself to every other destination. The result is a shortest-path tree, and the first hop on each path becomes the routing-table entry.

The standard analogy: Imagine you have a full road atlas — every road, every town, every distance marked. You can sit down with the atlas, look at where you are and where you want to go, and calculate the best route yourself, without trusting anybody else. That's link-state: you have the full map, and you compute paths independently.

The Fundamental Difference

The deepest difference between the two families isn't about algorithms or messages — it's about trust.

  • Distance-vector routers trust their neighbors. A neighbor says "I can reach X at distance 3," and the router believes them. There's no way to verify, no independent view of the network. This is sometimes called "routing by rumor" — each router knows only what its neighbors tell it.

  • Link-state routers trust nobody, calculate everything themselves. Each router collects raw topology information (LSAs) from every other router, builds the full map independently, and runs SPF on its own. No router needs to trust another router's routing decisions — only their raw link information.

This single difference explains almost everything else about how the two families behave: why link-state converges faster, why distance-vector is prone to loops, why link-state requires more CPU and memory, why distance-vector is simpler to configure. All of those follow from this one design choice.

Part 4 — Neighbor Relationships

How Routers Find Each Other

Both distance-vector and link-state protocols share one foundational mechanic: routers need to find their neighbors before they can exchange any routing information. The mechanism for this is called forming a neighbor relationship or establishing an adjacency.

Hello Packets

When a routing protocol starts on an interface, the router begins sending small periodic messages called Hello packets out that interface. The Hello packet says, in essence: "Hi, I'm running OSPF (or RIP, or EIGRP) on this interface, here are my parameters."

When another router on the same segment is also running the same protocol, it sees the Hello, recognizes a potential neighbor, and responds with its own Hello. The two routers compare parameters (Hello/Dead intervals, area ID, authentication, K-values, etc.), and if everything matches, they form an adjacency — a confirmed neighbor relationship.

Once the adjacency is established, the routers can start exchanging actual routing information. For distance-vector, that's the routing table at the next update interval. For link-state, that's the LSDB synchronization process.

Default Timer Values

For OSPF on Ethernet networks:

  • Hello interval = 10 seconds (a Hello is sent every 10 seconds)

  • Dead interval = 40 seconds (4× Hello)

For OSPF on slower network types (non-broadcast, NBMA):

  • Hello interval = 30 seconds

  • Dead interval = 120 seconds

These defaults can be tuned, and they have a major effect on convergence speed (which Part 16 covers in detail).

The Dead Interval

What happens if Hellos stop arriving? Each router tracks the time since the last Hello from each neighbor. If no Hello arrives within the dead interval, the router declares the neighbor down and removes all routes learned through it from the routing table.

The dead interval is what makes routing protocols adaptive. The moment a neighbor is declared dead, the router knows it needs to recompute paths. This is also why the dead interval is configurable — shorter intervals mean faster failure detection, but more sensitivity to brief network blips.

What Stops an Adjacency From Forming

A common troubleshooting scenario: two routers are physically connected and both have the routing protocol configured, but they refuse to become neighbors. The reasons fall into a small set:

  • Mismatched timers (one router has Hello = 10s, the other has Hello = 30s)

  • Mismatched area ID (in OSPF, both routers must agree which area the interface is in)

  • Mismatched authentication (one has authentication configured, the other doesn't, or the keys differ)

  • Mismatched K-values (for EIGRP — both ends must use the same K-value configuration)

  • Mismatched network type (broadcast vs. point-to-point, etc.)

Any one of these prevents adjacency. The protocol won't try to work around it — it just won't form the relationship, and no routing information flows.

Part 5 — Bellman-Ford in Action

How Distance-Vector Actually Works

Now we get into the mechanics. Distance-vector protocols use an algorithm called Bellman-Ford to propagate routing information across the network.

The Process

Each router follows the same simple procedure:

  1. Start — at startup, each router knows only the networks directly connected to its own interfaces, with distance 0 (or 1, depending on convention).

  2. Send — periodically, each router sends its entire routing table to each of its directly connected neighbors.

  3. Receive — when a router receives an update from a neighbor, it processes each entry: if the neighbor's distance to a destination, plus the cost to reach that neighbor, is better than what the router currently has, the router updates its table.

  4. Repeat — at every update interval, this happens again. Eventually, every router learns about every destination in the network.

Worked Example — A Four-Router Chain

Let's trace this on a four-router chain: A — B — C — D, where each router is directly connected to one network (Net-A, Net-B, Net-C, Net-D respectively).

At startup, each router knows only its own connected network:

Round

What Router A Knows

Initial (round 0)

Net-A (0 hops) only



After the first update round, each router sends its table to its neighbor:

Round

What Router A Knows

After round 1

Net-A (0), Net-B (1 via B)



In this round, A learned about Net-B because B advertised it. But A didn't yet learn about Net-C or Net-D, because B itself didn't know about them yet.

After the second update round, B has now learned about Net-C from C, so B advertises it to A:

Round

What Router A Knows

After round 2

Net-A (0), Net-B (1), Net-C (2 via B)



After the third round, C has learned about Net-D from D, then advertised it to B, which now advertises it to A:

Round

What Router A Knows

After round 3

Net-A (0), Net-B (1), Net-C (2), Net-D (3 via B) — fully converged



A took three update rounds to learn about Net-D, because Net-D is three hops away and information propagates one hop per update interval. This is the fundamental nature of distance-vector: routes spread one hop per update.

Why This Matters

In a 4-router chain, this is fast — three rounds, maybe 90 seconds with RIP's 30-second update interval. But in a 20-router network, the farthest destination might take 19 rounds to propagate. With RIP's defaults, that's nearly 10 minutes of convergence after every topology change.

This slowness is the fundamental limitation of distance-vector. It's what makes RIP unsuitable for anything beyond very small networks, and it's the problem link-state was designed to solve.

Part 6 — Periodic Updates

The Other Source of Slowness

Distance-vector protocols have a second mechanic that makes them slower than they need to be: periodic updates.

What RIP Does

A RIP router sends its entire routing table out every active interface every 30 seconds. This serves two purposes:

  1. Propagate new route information — any changes get carried forward to neighbors at the next interval.

  2. Confirm the sender is still alive — neighbors keep their RIP-learned routes only as long as they keep hearing updates. If updates stop arriving, the routes time out.

So the 30-second interval serves both routing and keepalive purposes — efficient in one sense, but also problematic.

The Downside

Periodic updates consume bandwidth and CPU. On a small network, that's fine. On a network with thousands of routes, sending the full table every 30 seconds wastes substantial bandwidth.

More importantly, periodic updates are a source of slow convergence. If a link fails at second 1, and the next periodic update is at second 30, the failure isn't propagated until second 30 — even though the failure is known immediately. That's 29 seconds of "I know there's a problem, but I haven't told anyone yet."

How EIGRP Improved On This

EIGRP — Cisco's enhanced distance-vector protocol — solved the periodic update problem by getting rid of it. EIGRP sends updates only when something changes, called triggered updates. When a link fails, the affected routers immediately send updates to their neighbors describing the failure. No waiting for the next interval.

This is a huge improvement, but it has a trade-off: without periodic updates, you need a separate mechanism to confirm neighbors are still alive. EIGRP uses Hello packets (every 5 seconds by default on Ethernet) for this purpose. The Hellos are small and frequent enough to detect failures quickly, while the routing updates themselves are only sent when needed.

This combination — Hellos for liveness, triggered updates for changes — became the standard pattern in modern routing protocols. OSPF and IS-IS use the same idea. RIP's 30-second-full-table-broadcast model is essentially obsolete.

Stage III — Distance-Vector Failure Modes

Distance-vector's most famous problem: routing loops. This stage walks through how loops form, what "count to infinity" means, and the four mechanisms used to prevent or contain loops.

Part 7 — How Loops Form

How Routing Loops Form in Distance-Vector

Now we come to the most important — and infamous — failure mode of distance-vector protocols. Understanding why loops form is essential to understanding the loop-prevention mechanisms we'll cover next.

The Setup

Consider three routers in a line: A — B — C. Router A is connected to Net-X. After convergence:

  • A's table: Net-X is directly connected (0 hops)

  • B's table: Net-X is 1 hop away, via A

  • C's table: Net-X is 2 hops away, via B

Everything's fine. Now Net-X fails — the interface on A connected to Net-X goes down.

The Loop Forms — Time Step by Time Step

T=0 — A detects the failure and removes Net-X from its routing table. But A hasn't told B yet.

Router

What it has for Net-X

A

Unreachable

B

1 hop via A (stale — A hasn't sent a triggered update)

C

2 hops via B (stale)



T=1 — Before A can send its update, B's periodic update timer fires. B sends its table to C (and A). B's table still says "Net-X: 1 hop via A" because B doesn't know about the failure yet. C receives this and just maintains its existing entry.

T=2 — A receives B's update saying "I can reach Net-X in 1 hop." A trusts the update — that's how distance-vector works — and concludes: "Oh, B has a path to Net-X. I'll use B. Cost = B's distance (1) + cost to B (1) = 2."

Router

What it has for Net-X

A

2 hops via B (WRONG)

B

1 hop via A

C

2 hops via B



A now has a route to Net-X via B. But B's route to Net-X goes via A. That's the loop. A sends packets for Net-X to B. B sends them back to A. A sends them back to B. Forever.

T=3 — B's next periodic update goes out. A's table now says "Net-X: 2 hops via B." B receives this and updates: "I learned this from A, so the distance via A is now 3 hops."

Router

What it has for Net-X

A

2 hops via B

B

3 hops via A (WRONG, increased)

C

2 hops via B



And the pattern continues. With each update round, the hop count climbs as the loop persists.

Why It Forms

The root cause is exactly what makes distance-vector vulnerable: each router trusts what its neighbor says, with no independent view. B says "I can reach Net-X" and A believes it, even though B's path to Net-X actually goes through A itself. Neither router has the full topology, so neither can detect the circular dependency.

A Side Note Worth Knowing — Why TTL Helps

A Layer 3 loop is bad, but it's less catastrophic than a Layer 2 loop (the kind covered in Day 4's STP material). The reason is the TTL (Time To Live) field in every IP packet — every router that forwards a packet decrements TTL by 1, and when TTL hits 0, the packet is dropped.

So even when a loop forms in distance-vector, each individual packet only loops a limited number of times before being killed by TTL expiration. Routers also send an ICMP Time Exceeded message back to the source when this happens, alerting the sender. The loop still causes packet loss and the destination is unreachable, but the network as a whole isn't destroyed by an infinite traffic storm the way an L2 loop would do.

(This is also exactly the mechanism the traceroute command exploits intentionally — sending packets with deliberately low TTL to see which router drops them.)

Note: This Example Assumes Split Horizon Is Off

A careful reader will spot this: in real RIP with default settings, the loop above wouldn't fully form in this exact way, because of one of the loop-prevention mechanisms we'll cover next — split horizon. Split horizon would have stopped B from advertising Net-X back toward A in the first place.

The example above is the classic textbook walkthrough showing why split horizon was invented. It illustrates what would happen without it. The next part covers split horizon and the other mechanisms that exist precisely to prevent this exact scenario.

Part 8 — Count to Infinity

When Loops Drag On

The previous example showed how a loop forms. Now let's see what happens if nothing prevents the metric from climbing.

The Problem

Once a loop forms, the metric to the failed network keeps growing with each update round:

Round

A → Net-X

B → Net-X

0 (failure)

Unreachable

1 (stale)

1

2 (via B)

3 (via A)

2

4 (via B)

5 (via A)

3

6 (via B)

7 (via A)

...

...

...

8

16 (infinity)

16 (infinity)



This is called count to infinity — the metric grows toward "infinity" with each update cycle.

Why "Infinity" Matters

If there's no upper limit on the metric, the routers will count up forever. The loop will persist indefinitely, and the network has no way to break out.

This is why RIP defines "infinity" as 16 hops. Any route with a hop count of 16 or higher is considered unreachable. So even if a loop forms, the metric eventually hits 16, the route is marked unreachable, and the loop ends.

But this takes time. With RIP's 30-second update interval, going from a metric of 2 to a metric of 16 takes 7 update rounds — roughly 3.5 minutes. That's 3.5 minutes during which traffic for Net-X is looping uselessly between A and B, causing packet loss, and possibly congesting the link with looped traffic.

Count to infinity is bounded — it doesn't run forever — but the convergence delay it causes is unacceptable in modern networks.

Part 9 — Loop Prevention

Split Horizon, Route Poisoning, Holddown, Triggered Updates

Distance-vector protocols use several techniques to prevent or contain loops. No single mechanism solves the loop problem on its own — they're meant to be used together as a defense in depth.

Mechanism 1 — Split Horizon

The rule: "Do not advertise a route back out the interface you learned it from."

In our earlier example, B learned about Net-X from A on B's interface to A. With split horizon, B will not advertise Net-X back out that same interface — i.e., it won't tell A about a route A taught it in the first place.

This prevents the immediate loop. A never receives "B has a path to Net-X," because B never says it. When A's interface to Net-X goes down, A correctly concludes Net-X is unreachable, and stays that way.

Split horizon is enabled by default on RIP. It's the single most important loop-prevention mechanism in distance-vector protocols.

Mechanism 2 — Route Poisoning

The rule: "When you detect a failure, immediately advertise the failed route with a metric of 'infinity.'"

When A detects that Net-X is down, instead of just silently removing the route, A immediately sends an update to its neighbors saying "Net-X is now at metric 16 (unreachable)." This is called poisoning the route.

B receives this poisoned route and immediately knows Net-X is unreachable through A. B then poisons the route in its own table and sends a poisoned update to C. The "unreachable" information propagates rapidly through the network.

Route poisoning works with split horizon — split horizon prevents new loops from forming, and route poisoning rapidly tells everyone about a failure.

Mechanism 3 — Holddown Timer

The rule: "After marking a route as unreachable, ignore any updates about that route for a fixed period."

This addresses a specific timing problem. Suppose A poisons Net-X. The poison propagates to B and C. But what if a slow update — sent before the failure but not yet received — arrives at C saying "Net-X is 2 hops away via B"? Without protection, C might update its table and try to use that stale information, restarting the loop.

The holddown timer prevents this. After C learns that Net-X is unreachable, it ignores any updates trying to re-introduce Net-X for some period (RIP default is 180 seconds). This gives the "unreachable" information time to propagate fully across the network before any new positive information is accepted.

The trade-off: if Net-X actually does come back up during the holddown period, that good news is also ignored until the timer expires. Faster failure handling, slower recovery.

Mechanism 4 — Triggered Updates

The rule: "On any change to your routing table, send an update immediately — don't wait for the next periodic interval."

This is the same idea EIGRP used to eliminate periodic updates entirely. When something changes (a route fails, a route comes back, a metric changes), affected routers send an update right away. This drastically reduces convergence time.

RIP added triggered updates as an enhancement (modern RIP implementations include them by default). They don't replace periodic updates in RIP, but they supplement them — providing immediate notification of changes between the normal 30-second cycles.

Why None of These Alone Is Enough

Each mechanism addresses one aspect of the loop problem:

  • Split horizon prevents the most obvious loops but doesn't handle more complex topologies (e.g., a triangle of three routers where the path could "double back" through a different neighbor).

  • Route poisoning announces failures quickly but doesn't prevent the cause of loops; it just speeds up recovery.

  • Holddown timers prevent flapping but introduce delays.

  • Triggered updates speed everything up but don't prevent loops themselves.

Combined, they make distance-vector workable. But even with all four mechanisms in place, distance-vector still converges slower than link-state, and the residual risk of brief loops never fully goes away. This is why OSPF replaced RIP in most networks — link-state's design eliminates the underlying cause of loops, rather than patching the symptoms.

Stage IV — Link-State

The second protocol family — fundamentally different from distance-vector. This stage covers the conceptual approach, the LSA flooding mechanism, and Dijkstra's SPF algorithm.

Part 10 — A Different Approach

Link-State Concepts

Link-state protocols (OSPF, IS-IS) solve the routing problem in a completely different way than distance-vector. They don't have loop problems, they converge much faster, and they scale to much larger networks — but they're also more complex to operate.

The Core Idea

Where distance-vector says "I trust my neighbors," link-state says: "I trust nobody. I'll gather raw facts from every router in the network and compute everything myself."

Specifically, each link-state router:

  1. Discovers its directly connected neighbors (via Hello packets, as in Part 4).

  2. Advertises the state of each of its own links — what neighbors are on it, what the link cost is.

  3. Receives similar advertisements from every other router in the network.

  4. Builds an internal database (the LSDB) of every link in the network — a complete topology map.

  5. Computes the shortest path from itself to every destination, using Dijkstra's SPF algorithm, based on its own copy of the LSDB.

Every router in the same area ends up with an identical LSDB — the same view of the entire network. From that identical view, each router independently computes its own shortest-path tree to every destination.

LSAs and the LSDB

The advertisement messages are called Link-State Advertisements (LSAs). Each LSA contains structured information about one router's connectivity. Multiple LSAs from multiple routers are flooded through the network until every router has every LSA, and the collected LSAs form the Link-State Database (LSDB).

The LSDB is the complete map of the network. As long as the LSDB on every router is identical (and that's the protocol's job — to make sure it is), every router will independently compute the same shortest paths. No two routers will disagree on the topology, which is exactly what causes distance-vector loops.

LSA Fields — What's in an Advertisement

Each LSA contains specific structured information. The key fields:

Field

Content

Router ID

A unique identifier of the advertising router. Usually a loopback IP. Each LSA is tied to the router that originated it.

Sequence Number

A monotonically increasing number. Higher = newer version of the LSA. When a router receives an LSA with a sequence number higher than the one in its LSDB, it updates and re-floods. Lower or equal = ignore.

Age

A timer in seconds, starting from 0 when the LSA is generated. When age reaches MaxAge (3600 seconds in OSPF), the LSA expires and is removed from the LSDB. The originator re-floods a refreshed LSA (with a new sequence number) every 30 minutes to prevent expiration.

Neighbor List

The router IDs of all adjacencies on each of this router's interfaces. Tells receivers who this router is connected to.

Link Cost

The metric of each connected link. Used by SPF to compute path costs.



These fields together let every router build an accurate picture of every other router's connectivity. Combine all the LSAs in the LSDB and you have a complete graph of the network: nodes (routers), edges (links), and edge weights (costs).

Why This Design Resists Loops

In distance-vector, each router decides what's best based on what its neighbors tell it about paths. If a neighbor's path information is stale or wrong, the router makes wrong decisions.

In link-state, each router decides what's best based on the topology. The topology is the same on every router. Path computation is deterministic — given the same LSDB, every router using the same algorithm arrives at the same result. There's no rumor, no trust, no propagation delay between adjacent routers making decisions on outdated information. Either the LSDB is consistent (in which case all routers agree) or it's in the process of being synchronized (in which case the protocol explicitly handles the transition).

This is why link-state doesn't have loop problems after convergence. Loops in link-state can only happen during the brief synchronization window after a topology change — and the synchronization is fast because LSAs are flooded immediately, not propagated hop-by-hop over update intervals.

Part 11 — LSA Flooding

Building the LSDB

OK — every link-state router needs an identical copy of every other router's LSA. How does that happen?

The Flooding Process

Link-state uses flooding — a deliberate, reliable propagation of LSAs through the network so every router gets every LSA.

The six steps of LSA flooding:

  1. Generated — A router's link comes up (or changes state). The router generates an LSA describing the change, with the cost of the link and its neighbors.

  2. Flood — The router sends the LSA out all interfaces to all directly connected neighbors.

  3. Forward — Each neighbor receives the LSA. It checks the sequence number against its LSDB. If the received LSA is newer (higher sequence number), the neighbor updates its LSDB and re-floods the LSA out all interfaces except the one it came in on.

  4. Suppress — If the LSA has the same or older sequence number than what's already in the LSDB, the receiver silently discards it (no need to re-flood — it's old news).

  5. Acknowledge — Each receiver acknowledges receipt so the sender knows the flood was successful. If no acknowledgment is received within a short time, the LSA is retransmitted.

  6. Complete — Eventually, every router in the area has received the LSA. The LSDBs are identical again.

Why This Works

The combination of "flood out all interfaces except where it came in" plus "ignore duplicates by checking sequence numbers" guarantees that the LSA reaches every router quickly and exactly once per router. There's no risk of LSAs circulating forever because the sequence-number check stops duplicates immediately.

This is why link-state converges fast: an LSA can reach every router in a network in milliseconds (limited only by network propagation delays), not minutes (limited by periodic update intervals).

After Flooding, SPF Runs

Once every router has the new LSA, each router updates its LSDB and runs SPF on its own copy. Because the LSDBs are identical, every router computes the same shortest-path tree from its own root. Every router updates its routing table accordingly, and the network is converged.

This whole process — flood, update LSDB, run SPF, install new routes — typically completes in seconds for a topology change. Compare that with distance-vector's minutes.

Part 12 — Dijkstra SPF

Dijkstra's SPF Algorithm in Plain Steps

Each link-state router needs to compute, from its own copy of the LSDB, the shortest path to every other router in the network. The algorithm it uses is Dijkstra's Shortest Path First (SPF).

The Algorithm in Five Steps

  1. Start — Place the local router (yourself) into the "confirmed" set with cost 0. All other routers are in the "candidate" set with cost infinity (unknown).

  2. Examine neighbors of the most recently confirmed router — For each neighbor of the latest router added to the confirmed set, compute the total cost: (the cost from yourself to the most recently confirmed router) + (the link cost from that router to its neighbor). If this is less than the candidate's current best, update it.

  3. Select best candidate — Find the candidate with the lowest cost in the candidate set. Move it from candidates to confirmed.

  4. Repeat — Go back to step 2 with the newly confirmed router. Continue until every router in the LSDB has been moved to confirmed.

  5. Result — A complete shortest-path tree from yourself to every destination. The first hop on each path becomes the routing-table entry for that destination.

Why It Works

The clever part of Dijkstra is in step 3 — "select the lowest-cost candidate." This guarantees that once a router is moved to confirmed, no future iteration could find a shorter path to it. This is because any other path would have to go through some not-yet-confirmed router with a higher current cost, and adding any non-negative edge can only increase the cost.

The algorithm runs in time proportional to roughly O(N log N) where N is the number of routers. On modern hardware, computing SPF on an LSDB with thousands of routers takes milliseconds.

Why Identical LSDBs Are the Key Insight

Here's the most important conceptual point about link-state, worth landing for participants:

All LSDBs in the area are identical. Therefore, every router runs the same algorithm on the same input, and arrives at the same shortest-path tree from its own root. No two routers disagree on the best path.

This is exactly what causes distance-vector loops — routers disagreeing on the topology because each only sees what its neighbors tell it. Link-state eliminates that disagreement by design: identical inputs + identical algorithm = identical conclusions.

Stage V — Comparing the Two

Both families have now been covered in their own terms. This stage compares them directly, criterion by criterion, and gives practical guidance on which one fits which network.

Part 13 — Side by Side

Distance-Vector vs. Link-State Comparison

Now that we've covered both families in detail, here's the full comparison across the key design criteria.

Criterion

Distance-Vector

Link-State

What it stores

Routing table only — no topology view

Full topology map (LSDB)

What it shares

Complete routing table, to neighbors only

LSAs of own links, flooded to entire area

Algorithm

Bellman-Ford (distributed across routers)

Dijkstra SPF (local, run independently by each router)

Convergence speed

Slow — one hop per update interval

Fast — flooding + local SPF, typically seconds

Routing loops

Possible; require multiple prevention mechanisms (split horizon, poison reverse, holddown, triggered updates)

Not possible after convergence; only possible during the brief LSDB sync window

CPU / memory

Low — just the routing table

Higher — LSDB plus SPF computation on every change

Bandwidth (steady state)

Higher — full table sent at each interval

Lower — LSAs sent only on change

Scalability

Poor — works for small networks, breaks down past tens of routers

Good — areas limit LSDB size and SPF scope

Configuration complexity

Simple — enable on interfaces, done

More complex — areas, router IDs, costs, possibly authentication

Typical protocols

RIP v1/v2 (legacy), EIGRP (Cisco's modern hybrid)

OSPF, IS-IS



A Quick Note on EIGRP

You'll notice EIGRP is listed under distance-vector in the comparison, but in practice it's often called a hybrid or advanced distance-vector protocol. EIGRP uses distance-vector's basic approach (each router shares its routing table, not a topology map) but adds many of link-state's benefits — triggered updates, fast convergence (using the DUAL algorithm to pre-compute backup paths), and loop prevention.

For exam-and-conceptual purposes, EIGRP is a distance-vector protocol. For real-world behavior, it sits between traditional DV and LS.

Part 14 — Which to Use When

Protocol Placement

The comparison above gives you the theoretical differences. Here's how those translate into practical decisions about which protocol class fits which network.

Scenario

Recommended

Why

2–3 routers, single site, stable

Distance-vector or even static

Simplicity outweighs convergence speed at this scale. Setting up OSPF for 3 routers is overengineering.

Enterprise campus, 10–200 routers

Link-state (OSPF)

Fast convergence is essential. Area design keeps the LSDB and SPF computation manageable. Loop-free by design.

Large ISP core network

Link-state (IS-IS or OSPF)

IS-IS handles very large topologies elegantly; OSPF is equally common in enterprises that grew large.

Between organizations or ISPs

Path-vector (BGP)

Different domain entirely — policy control and scalability between ASes. Covered in Day 5 Supplement B Part 12.



The Rule of Thumb

For most organizations, anything beyond a handful of routers benefits from OSPF. Distance-vector (especially RIP) is largely obsolete in modern networks — the only place you'll find it is on very small networks, legacy systems, or as a teaching example.

EIGRP remains common in Cisco-heavy networks because it's easy to configure and converges quickly, but it's Cisco-proprietary (though now an open standard via RFC 7868, it's still primarily implemented by Cisco), which limits its use in multi-vendor environments.

OSPF is the default choice for enterprise IGPs, full stop. Day 7 and beyond will likely focus heavily on it for that reason.

Stage VI — Convergence

The word "convergence" has appeared throughout the document. This stage pins it down precisely — the six phases after a failure, what affects convergence speed, and BFD.

Part 15 — Six Phases

Convergence: What It Means and Why It Matters

We've used the word "convergence" throughout this document. Now let's pin it down precisely.

Definition

Convergence is the state where every router in the routing domain has a consistent, accurate, and loop-free view of the network. After a change — a link failure, a new router coming online, a metric adjustment — the network is converged when all routers have updated their routing tables to reflect the new reality and are forwarding traffic correctly.

Convergence isn't binary, exactly — there's a process of converging that takes time. During that process, traffic may be dropped, looped, or sent down suboptimal paths. The faster the network converges, the less impact a topology change has on users.

The Six Phases of Convergence After a Link Failure

When a link in the network fails, the convergence process happens in six distinct phases:

Phase

What Happens

Time Scale

1. Failure detection

The physical link goes down (instantly detectable in most cases) — or the routing-protocol Hello dead-timer expires — or sub-second detection via BFD (see below).

Milliseconds (physical) → tens of seconds (timer-based)

2. Local table update

The router with the failed interface removes affected routes from its routing table and installs any local alternates.

Milliseconds

3. Notify neighbors

DV: send triggered/periodic poison update. LS: flood a new LSA describing the change.

Immediate to seconds (LS) → up to 30 seconds (DV waiting for next interval)

4. Propagation

The information spreads across the network. LS: flooding completes in seconds. DV: one hop per update interval.

Seconds (LS) → minutes (DV)

5. Recalculation

LS: each router runs SPF on its updated LSDB. DV: each router applies Bellman-Ford on the new update. Install new routes.

Milliseconds (SPF) → seconds (Bellman-Ford on large networks)

6. Converged

All tables are consistent. Traffic flows on the new best paths.

Sub-second (LS + BFD) → several minutes (DV)



Why Convergence Time Matters in Practice

A 30-second convergence delay might sound tolerable, but the impact depends heavily on what's running on the network:

  • A file download — 30 seconds of paused traffic, then it resumes. Annoying but recoverable.

  • A VoIP call — 30 seconds of silence. The call is effectively dropped. Customer is on the phone yelling.

  • A trading system — 30 seconds of unprocessed market data. Potentially major financial losses.

  • A video stream — 30 seconds of buffering. Users abandon the stream.

In modern enterprise and service-provider networks, sub-second convergence is the design goal. That's why link-state protocols (OSPF, IS-IS) are preferred over distance-vector, and why mechanisms like BFD exist to detect failures faster than routing-protocol timers can.

A precision note about OSPF convergence times: You'll often see claims like "OSPF converges in under 1 second." This is achievable — but it requires either BFD or aggressive Hello/Dead timer tuning. With OSPF's default timers (Hello 10s, Dead 40s) and no BFD, convergence after a non-physical failure can take 40+ seconds just for failure detection, plus SPF time. Fast convergence in OSPF doesn't happen automatically; it's a design goal that requires deliberate configuration.

Part 16 — Optimization

What Affects Convergence, and How to Optimize It

The six phases above each have their own bottleneck. Optimizing convergence means addressing each one.

Factor

Effect on Convergence

How to Optimize

Failure detection speed

Physical link-down is fastest (microseconds). Hello-timer detection takes 10–40 seconds with defaults. BFD detects in milliseconds.

Use BFD on critical links. Tune Hello and Dead timers if BFD isn't available.

Protocol type

Link-state converges far faster than distance-vector. OSPF with tuned timers converges in well under 1 second.

Use OSPF or IS-IS where convergence speed matters.

Network size

Larger networks take longer for LSAs to flood and longer for SPF to compute.

Design OSPF areas to limit LSDB size. Keep area 0 (the backbone) lean.

SPF throttle timers

Running SPF on every single LSA wastes CPU on unstable networks. Modern OSPF uses throttling — SPF runs briefly after the first change, then waits longer between subsequent runs.

Configure with a short initial delay (e.g., 50ms) and exponential back-off.

Update interval (for DV)

RIP's 30-second update interval is the dominant source of slow DV convergence.

Use triggered updates (modern RIP). Better: use a protocol without periodic updates (EIGRP, OSPF, IS-IS).



What Is BFD?

The slides mention BFD as a key sub-second-detection mechanism, so it's worth explaining.

BFD — Bidirectional Forwarding Detection — is a separate protocol that runs alongside any routing protocol. Its only purpose is to detect link failures very quickly. It does this by sending small "are you alive?" packets between two endpoints at a configurable interval — typically 50 to 300 milliseconds. If the other end stops responding for a few consecutive intervals, BFD declares the path down and signals the routing protocol immediately.

Why use BFD instead of just tuning Hello/Dead timers down? Because:

  • BFD packets are small and lightweight, designed specifically for fast detection.

  • Tuning routing-protocol Hellos very low can destabilize the routing protocol itself.

  • BFD is protocol-agnostic — it works with OSPF, IS-IS, EIGRP, BGP, and even static routes.

In modern enterprise and service-provider networks, BFD is the standard way to achieve sub-second failure detection. Without it, you're limited by routing-protocol timers, which means several seconds at minimum.

Stage VII — Metrics and Decisions

The final stage. Every routing protocol has to decide "best path" — this stage covers how each protocol measures it, and how the router uses Administrative Distance and metric together to make its final routing decision.

Part 17 — Metrics

How Each Protocol Measures Cost

Every dynamic routing protocol has to answer one question: "Of all the paths to a destination, which one is best?" The answer depends on what the protocol measures, and how — that measurement is called the metric.

Lower metric wins. That's universal. But what the metric is differs by protocol, and metrics from different protocols are not comparable with each other.

The Four Protocols' Metrics

Protocol

Metric

How Calculated

Limitation

RIP

Hop count

Each router along the path adds 1. Max valid metric is 15; 16 = unreachable.

Completely ignores bandwidth — a path through 2 slow hops looks "better" than a path through 4 fast ones, even if the 4-hop path is 1000× faster.

OSPF

Cost

Reference bandwidth (default 100 Mbps) divided by the link's actual bandwidth. Costs are summed along the path.

With the default reference bandwidth, all links faster than 100 Mbps have cost 1 — so a 1 Gbps link and a 10 Gbps link look identical. Solution: raise the reference bandwidth (e.g., to 100,000 Mbps for 100 Gbps networks).

EIGRP

Composite

256 × [(10⁷ / minimum bandwidth) + sum of delays], by default. Combines bandwidth and delay. Other K-values (load, reliability, MTU) can be enabled but rarely are.

Complex formula. If K-values are mismatched between neighbors, the adjacency won't form.

IS-IS

Cost

Default 10 per interface (narrow metrics). Wide metrics make it bandwidth-aware.

Narrow metrics ignore bandwidth — configure wide metrics for modern networks.



Why Different Metrics Matter — A Worked Example

Suppose Router A has two paths to Network Z:

  • Path 1: A → B → Z, two hops, each link is 56 kbps (very slow)

  • Path 2: A → C → D → E → Z, four hops, each link is 1 Gbps (fast)

What does each protocol pick?

Protocol

Path 1 Cost

Path 2 Cost

Selected

RIP

2 hops

4 hops

Path 1 (RIP picks the path with fewer hops, ignoring that it's vastly slower)

OSPF (default ref BW 100 Mbps)

~1786 + ~1786 = ~3572

1 + 1 + 1 + 1 = 4

Path 2 (OSPF correctly picks the bandwidth-aware path)



RIP picks the slow path because it can't tell that 56 kbps is much slower than 1 Gbps — hops are all it sees. OSPF picks the fast path because its metric is bandwidth-aware. This is exactly why bandwidth-aware metrics are essential in real networks, and why RIP is unsuitable for any network where bandwidth varies meaningfully.

Part 18 — AD vs Metric

Administrative Distance vs. Metric: The Two-Stage Decision

This is the conceptual capstone of Day 6, and it builds directly on material we've already covered in detail.

Quick Recap (Already Covered in Day 5 Supplement C)

Day 5 Supplement C Part 5 covered Administrative Distance (AD) in detail — including the full AD table (Connected 0, Static 1, EIGRP 90, OSPF 110, IS-IS 115, RIP 120, EIGRP external 170, iBGP 200, unreachable 255). Refer back to that document for the values and the rationale.

What's new in Day 6 is the interaction between AD and metric — how they work together as a two-stage decision.

The Two-Stage Decision

When a router has to choose between routes for the same destination, it doesn't just compare metrics. It runs a two-stage decision:

Stage

Mechanism

Question Answered

1 — Choose the source

Administrative Distance

"Of all the protocols telling me about this network, which one do I trust the most?"

2 — Choose the path

Metric (specific to the chosen protocol)

"Within the routes I get from that trusted protocol, which path is best?"



Stage 1 runs first, and the loser's metric is never even considered. This is the critical insight.

A Concrete Example

Suppose a router learns about the network 10.5.0.0/16 from two sources:

  • OSPF: metric 20 (cost 20)

  • RIP: metric 3 (3 hops)

You might naively look at the metrics and say "RIP wins — 3 is less than 20." But that's wrong, because the metrics aren't comparable.

The router compares AD first:

  • OSPF AD = 110

  • RIP AD = 120

OSPF has lower AD, so OSPF wins. The OSPF route (metric 20) is installed in the routing table. The RIP route's metric of 3 is never even considered.

If RIP had been the only source, RIP would win by default — there'd be nothing else to compare against. But the moment OSPF is also offering the route, OSPF wins because it's trusted more — regardless of metric.

Why Metrics Aren't Comparable Across Protocols

This is worth understanding deeply. RIP's metric is hop count. OSPF's metric is cost based on bandwidth. EIGRP's metric is a composite of bandwidth and delay. These are different units measuring different things.

A RIP metric of 3 means "3 routers in the path." An OSPF metric of 20 means "the sum of bandwidth-based costs of all the links."

There's no meaningful way to compare these numbers. Saying "3 < 20, so RIP wins" is like saying "3 inches < 20 millimeters, so 3 inches wins" — wrong, because you're comparing different units.

That's exactly why AD exists. Since metrics from different protocols aren't comparable, the router needs a separate way to decide between protocols. AD is that mechanism. Within one protocol, metrics are comparable (apples to apples), and the lowest metric wins. That's the second-stage decision.

Equal-Cost Paths: ECMP

There's a third stage worth covering: what if two routes come from the same protocol, with the same metric? They're tied across both stages.

In this case, the router installs both routes and load-balances traffic across them. This is called Equal-Cost Multi-Path (ECMP).

ECMP is automatic in most dynamic protocols — if OSPF sees two paths with the same cost to the same destination, both go into the routing table. The router alternates between them on a per-flow or per-packet basis (depending on the platform). This is how networks gain throughput by adding parallel links.

ECMP is also why "redundant links between the same two routers" actually adds capacity, not just failover. Without ECMP, only one of the parallel links would carry traffic.

Key Takeaway for Participants

"A routing protocol does not replace the routing table — it builds it. The protocol's whole job is to exchange information with other routers and produce the entries that the routing table then uses to forward packets. Routing tables and routing protocols are two different things, working together."
"There are only two ways to design an IGP. Distance-vector routers know just distance and direction, share their whole table with neighbors, and trust what they're told — which is simple but slow and loop-prone. Link-state routers know the full topology, share only their own links (flooded to everyone), and compute everything independently — which is faster, loop-resistant, and scalable, but more complex to operate. Modern enterprise networks use link-state (almost always OSPF) for exactly this reason."
"When multiple routes for the same destination come from different protocols, the router uses Administrative Distance to choose the source first — and the loser's metric is never even considered. Within one protocol's routes, the metric chooses the best path. Across protocols, metrics aren't comparable, which is exactly why AD exists. And when two routes from the same protocol have the same metric, the router uses both at once — that's ECMP."

Document History

Version

Date

Changes

v1.0

Previous

Initial published version. Covers all eight Day 6 lessons mapped to 18 parts: why dynamic routing exists and the AS/IGP/EGP organizational structure, the distance-vector and link-state families with neighbor relationships, distance-vector mechanics including Bellman-Ford propagation and periodic updates, distance-vector failure modes including routing loops, count-to-infinity, and the four loop-prevention mechanisms, link-state concepts including LSAs, the LSDB, LSA flooding, and Dijkstra's SPF algorithm, the side-by-side comparison and protocol placement guidance, convergence in detail with the six-phase process and the factors affecting it including BFD, per-protocol metrics with worked examples, and the two-stage AD-then-metric decision with ECMP for ties.

v1.1

Current

Renamed and restructured for navigation only — no content changes. Renamed from "Day 6 Supplement — Dynamic Routing Protocols" to "Day 6 — Supplement A: Dynamic Routing Protocols" under the new convention. Added Stage I–VII groupings above the existing 18 Parts so the document's arc is visible from the table of contents. Each Part now has a short reference label plus the original long teaching title as a subtitle. Cross-references updated: "Supplement 1 Part 2" → "Day 5 Supplement B Part 12" (for BGP); "Supplement 2 Part 5" → "Day 5 Supplement C Part 5" (for AD table).



Day 6 closes the dynamic routing material. Together with the Day 5 supplements (A, B, and C), this completes the story of how IP routing actually works — from the packet's perspective (Day 5 Supplements A and B), the routing table's perspective (Day 5 Supplement C), and the routing protocol's perspective (this supplement).


Rating
0 0

There are no comments for now.

to be the first to leave a comment.