What I learned about MSTP 802.1s

STP History

STP was invented by Radia Perlman, and in 1990 the IEEE standardized it as 802.1D. In 2001 it was supplemented by 802.1w rSTP (rapid STP), and in 2002 802.1s MSTP was defined. That was well over a decade ago at the time I’m writing this, and we haven’t gone very far from there yet.

However, there are new technologies on the horizon: 802.1aq (Shortest Path Bridging), Cisco FabricPath, Juniper QFabric and TRILL (Radia Perlman is a contributor here too!).

Benefits of MSTP versus other options

The original IEEE STP has a single instance for all VLANs. This means that no L2 traffic engineering is possible.

pvSTP – Cisco’s proprietary Per-VLAN STP – allows L2 traffic engineering. It uses a separate STP instance for each VLAN, which is helpful but introduces scaling issues as the number of VLANs grows.

MSTP traffic engineering

MST permits multiple STP instances, and each instance can control groups of VLANs. This allows L2 traffic engineering without the scaling issues of pvSTP.
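As a rough sketch of what that looks like in Cisco IOS syntax (the Region name EXAMPLE, the revision number and the VLAN ranges are placeholders of mine, not values from a real network), two switches could each be root for one instance so that each group of VLANs prefers a different path:

    ! Switch A: root for instance 1, backup for instance 2
    spanning-tree mode mst
    spanning-tree mst configuration
     name EXAMPLE
     revision 1
     instance 1 vlan 2-1999
     instance 2 vlan 2000-4094
    spanning-tree mst 1 priority 4096
    spanning-tree mst 2 priority 8192

    ! Switch B: same Region settings, priorities reversed
    spanning-tree mode mst
    spanning-tree mst configuration
     name EXAMPLE
     revision 1
     instance 1 vlan 2-1999
     instance 2 vlan 2000-4094
    spanning-tree mst 1 priority 8192
    spanning-tree mst 2 priority 4096

However many VLANs exist, only two instances (plus Instance 0) have to be calculated.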

Disadvantages of MSTP

Probably the most significant drawback is that it is more complex than STP or rSTP – or at least it CAN be more complex as it certainly doesn’t have to be.

With MSTP all switches in a Region must have the same VLAN to instance mapping. If you add a VLAN to an instance, it must be added to all switches in the Region.

If a switch does not have the same VLAN to instance mapping it will leave the Region. This isn’t necessarily awful, but it means that your careful traffic engineering will probably not be working the way you planned – which may be really really bad depending on your network.

Structure of MSTP

MSTP has four structural parts.

  • MST Region – defines what switches participate in this MSTP
  • Region Boundary – this is the edge between two MSTP Regions or between an MSTP Region and another STP based network
  • CIST (Common and Internal Spanning Tree) – Used for interacting with other STP based networks
  • MSTI (Multiple Spanning Tree Instance) – rSTP instances that only exist within an MSTP Region

MSTP Region

This defines the switches that participate in MSTP. Within a region you may have multiple instances, which are just rSTP.

The switches at the edge of the Region mark the boundary of the Region, and as you’ll see later they run CIST for compatibility when facing other networks.

Each switch in the Region must agree on these parameters:

  • Region name (32 bytes)
  • Revision number (2 bytes)
  • A table that associates each VLAN to a particular MST Instance
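
On Cisco IOS, one way to check these three values is the show command below, run on each switch. This is a sketch that assumes IOS syntax; other vendors have their own equivalents. The name, revision and instance to VLAN table it prints should be identical on every member of the Region.

    show spanning-tree mst configuration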

Region Boundary

This is the logical edge of a Region, and is where MSTP presents CIST to any neighbouring networks. CIST can be a simple single STP or simulated pvSTP.

CIST (Common and Internal Spanning Tree)

This is a virtual STP presented to networks external to MSTP. The idea is to allow MSTP to coexist alongside other networks that may be running STP, rSTP, pvSTP or even other MSTP Regions that (for whatever reason) aren’t integrated with this one.

MSTP assumes a simple STP neighbouring network and presents the same, but if a pvSTP network is detected a Cisco switch may simulate pvSTP at the Region boundary.

To the neighbouring network the entire MSTP Region appears as a single switch, which has some interesting effects for L2 traffic engineering.

MSTI (Multiple Spanning Tree Instance)

Within an MSTP Region are simple rSTP instances, and it is these MSTIs that allow an administrator to group VLANs. Two or more instances may be engineered in such a way as to make use of multiple uplink interfaces, and each MSTI retains all the features of rSTP.

Even if you have a thousand VLANs you can put them all into a single MSTI and avoid the resource constraints of some switches.
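
A minimal sketch of that single-instance mapping in Cisco IOS syntax (the Region name and revision are placeholders of mine):

    ! Every VLAN except VLAN 1 shares one rSTP instance
    spanning-tree mode mst
    spanning-tree mst configuration
     name EXAMPLE
     revision 1
     instance 1 vlan 2-4094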

A simple MSTP only configuration

[Figure: MSTP Only]

Recommended Configuration for Interoperating with other STP networks

[Figure: MSTP Integration with pvSTP]

  • Root Bridge is within the MSTP Region
  • Predefine VLAN Mapping for all VLANs – even those that don’t exist yet
  • Use VTPv3 if your VLAN to instance mapping is expected to change a lot. This can handle the configuration propagation as you move things around, which is important because any discrepancy of the VLAN to instance mapping will cause that switch to leave the Region
  • Configuration

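    MSTP Root Switch
    ! Region name, revision and VLAN mapping must match on every member
    spanning-tree mode mst
    spanning-tree mst configuration
     name VIRL
     revision 19
     instance 1 vlan 2-1999
     instance 2 vlan 2000-4094
    ! Lowest priority wins: root for the CIST (instance 0) and for MSTIs 1-2
    spanning-tree mst 0 priority 24576
    spanning-tree mst 1-2 priority 4096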

    MSTP Member Switch
    ! Same Region settings as the root; priorities are left at their defaults
    spanning-tree mode mst
    spanning-tree mst configuration
     name VIRL
     revision 19
     instance 1 vlan 2-1999
     instance 2 vlan 2000-4094

    Alternative Configuration for Interoperating with pvSTP

    [Figure: MSTP Integration with pvSTP]

    This is not the recommended configuration, but with some careful effort it can be made to work.

    • Root Bridge is within the pvSTP network
    • An MSTP switch remains the root bridge for the MSTIs within the Region (but this is not shared outside the Region)
    • Only one of the Region boundary ports can be active (all others are blocked).
    • If the Region is running PVSTP simulation, the pvSTP root must advertise a worse (numerically higher) priority for VLAN 1 than for VLANs 2-4094. If not, MST0 sets the boundary port as Designated rather than Root, causing an inconsistency.

    Configuration

    PVSTP Root Switch
    ! VLAN 1 gets the worse (numerically higher) priority; VLANs 2-4094 get the better one
    spanning-tree mode rapid-pvst
    spanning-tree vlan 1 priority 8192
    spanning-tree vlan 2-4094 priority 4096

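    MSTP Root Switch
    ! Root for MSTIs 1-2 only; instance 0 keeps its default priority so the CIST root stays in the pvSTP network
    spanning-tree mode mst
    spanning-tree mst configuration
     name VIRL
     revision 19
     instance 1 vlan 2-1999
     instance 2 vlan 2000-4094
    spanning-tree mst 1-2 priority 24576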

    Do I need to configure MSTP at all?

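    What happens if I just turn MSTP on for all switches and don’t configure a common Region? It will still work. On platforms where the default Region settings differ per switch (some vendors default the Region name to the switch MAC address), each switch manages its own single-switch MSTP Region, and every interconnection between one switch and another is CIST (single spanning tree), which elects its own root bridge. On platforms where the defaults are identical on every switch (on Cisco IOS: empty name, revision 0, all VLANs in Instance 0), the switches form one Region that runs only Instance 0.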

    That will work but doesn’t deliver the L2 traffic engineering advantages of MSTP. It might make sense if you’re just trying to get loads of VLANs activated and pvSTP is hitting the hardware limits of your switches.

    Common Misconfiguration Avoidance

    • Do not keep VLANs in Instance 0 as it is used to control the CIST (instance facing STP networks outside this Region). VLANs are here by default but it can cause confusing behaviour if you use this so it is recommended to specify an instance >= 1.
    • VLAN 1 has to be in Instance 0 so don’t use VLAN 1 for network traffic.
    • Do not manually prune the trunk VLAN map – that is don’t use “switchport trunk allowed vlan remove 100-999” for your traffic engineering. If you do this in the wrong place you could get a situation where MSTP selects one path but you’ve broken it by removing the VLANs manually. It is recommended to use the MSTP topology to control L2 traffic engineering, and not use manual trunk pruning. Another way to look at this is that switchport trunk maps must match the instance mapping.

    Best Practices

    • Set the root bridge and secondary root bridge
    • Configure edge ports where applicable
    • If you’re interconnecting at L2 to a network you don’t control use BPDU filter
    • Use spanning-tree pathcost method long
    • Keep data VLANs out of Instance 0
    • VLAN 1 has to be in Instance 0 so don’t use VLAN 1 for network traffic
    • Some vendors will permit mapping reserved VLANs, and others will not
    • A mismatch in the VLAN to instance mapping will split the network into two Regions
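
    A rough sketch of several of these practices in Cisco IOS syntax follows. The interface names and the instance range are placeholders of mine, and the exact keywords vary by platform and software version.

    ! On the intended root bridge
    spanning-tree mst 0-2 root primary
    ! On the intended secondary root bridge
    spanning-tree mst 0-2 root secondary
    ! On every switch in the Region
    spanning-tree pathcost method long
    ! Host-facing edge port
    interface GigabitEthernet0/1
     spanning-tree portfast
    ! L2 link to a network outside your control
    interface GigabitEthernet0/24
     spanning-tree bpdufilter enable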

    Useful Links

    Understanding MSTP (cisco.com)

    PVSTP Simulation on MST Switches (cisco.com)

    Why PVSTP VLAN 1 must have a higher (worse) priority than VLANs 2-4094 when the root is outside of the MSTP Region (supportforums.cisco.com)

    INE’s Understanding MSTP (blog.ine.com)

    INE MSTP Tutorial: Inside a Region (blog.internetworkexpert.com)

    INE MSTP Tutorial: Outside a Region (blog.internetworkexpert.com)

All About Spanning Tree Protocol

Spanning Tree Protocol

I discussed STP in a couple of earlier articles, here and here, but I would like to go into a little more detail because I think this is really important.

Spanning Tree Protocol can help us design fault-tolerant networks in two ways: primarily by detecting and disabling port misconfigurations, and secondly by allowing administrators to build failover network links.

STP is a complicated protocol, and it comes with a suite of different applications that can help fine-tune the system. I highly recommend a network engineer studies the Cisco documentation on STP, and then builds a lab environment before deploying.

CIO.com has a fascinating article that describes how critical STP is, and how good network design can help eliminate the worst of these consequences.

STP Root

The first thing to remember about STP is to ensure that the root switch is defined properly. The root switch should be the most central switch on the network (not necessarily the most powerful), and it should be the least disturbed switch — this means the LAN core switches are often the best choice as a STP root.

STP can choose a root switch automatically, but unfortunately this is not usually what you want. The election picks the lowest bridge ID, which is the bridge priority followed by the MAC address. When every switch is left at the default priority, the tie-breaker is the lowest MAC address, which usually belongs to the oldest switch on the network. That in itself wouldn’t be a problem, except that on many networks the oldest switches are pushed to the edge of the network.
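
As a sketch in Cisco IOS syntax (the VLAN range is a placeholder of mine), the root can be pinned to the core switch either with the root primary macro or with an explicit priority, which must be a multiple of 4096:

    ! On the core switch that should be the root
    spanning-tree vlan 1-4094 root primary
    ! Alternative to the macro: set the priority explicitly
    spanning-tree vlan 1-4094 priority 4096
    ! On the backup core switch
    spanning-tree vlan 1-4094 root secondary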

STP Parameter Tuning

STP is designed with an average network in mind. But your network isn’t average — to get the best performance out of STP you will have to modify the parameters that make it work.

Here is a Cisco guide on this — Understanding and Tuning Spanning Tree Protocol Timers. Note Cisco’s caution note: If you make mistakes with tuning STP you risk “LAN meltdown”.

The simplest way to tune the STP timers is to set the STP diameter. The link above has a guide to help calculate the STP diameter of your network — but be wary while setting this parameter. If your network changes (future adds/moves) it may expand beyond the STP diameter that you have set and then bad things happen.

The diagram above duplicates what is in the Cisco document, but it identifies one of the calculations. You need to identify the worst-case scenario for the number of switches a packet has to cross — in this case it is CADBE, and DACBE; STP diameter here is 5.
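
In Cisco IOS the diameter is supplied as part of the root macro on the root switch, which then derives the timers from it. A sketch, assuming VLAN 1 and the diameter of 5 from the example:

    spanning-tree vlan 1 root primary diameter 5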

My advice is to leave these parameters alone unless you have a good reason to make changes. The risk is high, and what is gained is a slightly more rapid recovery from a network failure — the long term consequence is that someone will have to remember that this was done.

Loop Protection

Loops in Layer 2 networks are very, very bad. The Layer 2 header has no time-to-live value, so a looped frame can continue to loop forever. Add in some broadcast traffic and you have a recipe for disaster; Cisco calls it LAN meltdown.

A key element of STP is that it prevents loops: STP is designed to detect and resolve Layer 2 loops. It is not enough to run STP only on the core switches of a network (although it helps); to fully protect a LAN, all switches must be running STP.

Redundant Connections

STP can be used to create failover links within the LAN, allowing the network administrator to design networks with multiple links between switches. Essentially the network administrator intentionally creates a loop, and allows STP to block one of the links. If the primary link were to fail, then STP would re-calculate the topology and would bring up the secondary link.

Host Port Configurations

Running STP on your network has some interesting side effects. It sure is great knowing that Layer 2 loops are automatically detected and mitigated, but sometimes running STP can make life difficult for users.

For example, when a port comes online STP does not trust the interface. That means it will listen to the port to see if any BPDUs come through, but it will not forward any traffic. The process from Blocking to Listening, to Learning and finally to Forwarding state can take up to a minute (with the default timers, Listening and Learning alone take 15 seconds each). Unfortunately, that means no traffic will pass the interface, which can cause some hosts that depend on DHCP for IP address assignment to fail. If you unplug and reconnect the network cable (or disable and enable the interface) the cycle starts again, so that tried and true end-user approach to fixing network faults will not help. Thankfully, most often a user can simply “Repair” the network connection in software, and the OS will make a successful DHCP transaction.

Cisco (and other network vendors) have created a clever solution for this problem. spanning-tree portfast does not turn STP off on a port, but it lets the port skip the Listening and Learning states and go straight to forwarding as soon as the link comes up.
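
A sketch of portfast on a host-facing port in Cisco IOS syntax (the interface name is a placeholder of mine, and some newer software releases use a slightly different keyword):

    interface GigabitEthernet0/5
     switchport mode access
     spanning-tree portfast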

Unfortunately, spanning-tree portfast leaves your network at risk from a user inadvertently connecting a single cable to two ports, creating a network loop that forwards traffic before STP has a chance to react and that can potentially eat up all the CPU processing on the switch. Cisco (and other network vendors) have come up with a solution for this too. spanning-tree portfast bpduguard will automatically disable a port if a single BPDU frame is detected. This is not an intelligent protocol, so do NOT use spanning-tree portfast bpduguard on any switch uplinks as it will definitely shut those down as soon as it receives the first BPDU frame.
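
A sketch of the two usual ways to turn this on in Cisco IOS syntax (the interface name is a placeholder of mine, and the keywords can differ between software versions): globally for every portfast port, or on one interface at a time.

    ! Global: protect every port that has portfast enabled
    spanning-tree portfast bpduguard default
    ! Or per interface
    interface GigabitEthernet0/5
     spanning-tree portfast
     spanning-tree bpduguard enable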

Usually when a port is shut down, the user (or network administrator) will see the problem and resolve it. But if the port remains disabled, it requires an administrator to manually bring the port online. Cisco decided this is too much work for us network administrators, so they defined the errdisable recovery cause bpduguard command, which brings the port back online after 5 minutes (by default). If the loop still exists, it will be detected and the port will go offline again.
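
In Cisco IOS syntax this might look like the sketch below. The interval line is optional and is shown here with the default value of 300 seconds:

    errdisable recovery cause bpduguard
    errdisable recovery interval 300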

Advanced Configurations

All of the below configurations are discussed in detail in the Cisco documentation on STP, but I will review them in brief.

Bpdufilter is an alternative to bpduguard. Like bpduguard it watches for incoming BPDU frames, but in addition it filters outbound BPDU frames (less traffic for hosts, which discard these frames anyway). If a BPDU frame is received on the port (a switching loop has been created), the port loses its portfast status and STP starts to monitor the port. This is the behaviour when bpdufilter is enabled globally for portfast ports; when it is enabled directly on an interface, BPDUs are dropped in both directions and STP is effectively off for that port. There is some risk here in the event of a user looping two ports at their own desk, as the loop would not be detected.
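
A sketch of both forms in Cisco IOS syntax (the interface name is a placeholder of mine). The global form applies to portfast ports and backs off when a BPDU arrives; the interface form filters BPDUs unconditionally, so use it with care:

    ! Global: stop sending BPDUs on portfast ports
    spanning-tree portfast bpdufilter default
    ! Per interface: drop BPDUs in both directions
    interface GigabitEthernet0/5
     spanning-tree bpdufilter enable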

Uplinkfast allows the network administrator to define redundant switch uplinks so they skip the Listening and Learning states in the event of a link failure. This allows the network to converge faster (Cisco says about 5 seconds) than it would have done otherwise.

Backbonefast allows switch backbone interfaces to detect and resolve indirect link failures faster than the normal STP timeouts would allow. Essentially, it allows a switch to detect a network link failure on a neighboring switch, and update its own topology very quickly.
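
Both features are global commands in Cisco IOS when running classic (non-rapid) spanning tree. A sketch: uplinkfast goes on access switches with redundant uplinks, while backbonefast needs to be enabled on every switch in the network to be effective.

    ! Access switch with redundant uplinks
    spanning-tree uplinkfast
    ! Every switch in the network
    spanning-tree backbonefast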

Rootguard prevents a particular port from becoming a root port. This prevents a new switch connected to the edge of the network from becoming a root switch. Recall that the network administrator spends a lot of effort designing the STP topology so the root switch is in a logical location, usually the core.
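
A sketch in Cisco IOS syntax (the interface name is a placeholder of mine), applied on a core or distribution port that faces downstream switches:

    interface GigabitEthernet0/10
     spanning-tree guard root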

Loopguard detects a unidirectional link (usually a wiring fault), where a port stops receiving the BPDUs it expects, and moves the interface to a blocking loop-inconsistent state instead of letting it transition to forwarding. This prevents STP calculation failures in which neighboring switches make incorrect assumptions about the network topology. There is another Layer 2 solution to this problem as well, called UDLD, which may provide more interoperability and flexibility than loopguard; Cisco has documentation on UDLD and loopguard here.
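
A sketch of both options on one uplink in Cisco IOS syntax (the interface name is a placeholder of mine, the UDLD keyword varies by platform, and UDLD must be enabled on both ends of the link to be useful):

    ! Loopguard and UDLD on the same uplink
    interface GigabitEthernet0/24
     spanning-tree guard loop
     udld port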

Rapid STP

Cisco has a white paper describing RSTP; the document is very concise and explains the differences between RSTP and STP quite clearly.

In short, RSTP brings in the features of uplinkfast and backbonefast (which were Cisco proprietary features in STP), updates the algorithm so that topology detection cascades across the network instead of the slow plodding detection of STP, and finally updates the timers so that convergence happens much faster.