What I learned about MSTP 802.1s

STP History

STP was invented by Radia Perlman, and in 1990 the IEEE standardized it as 802.1D. In 2001 it was supplemented by 802.1w rSTP (rapid STP), and in 2002 802.1s MSTP was defined. That was well over a decade ago at the time I’m writing this, and we haven’t gone very far from here yet.

However, there are new technologies on the horizon: 802.1aq (Shortest Path Bridging), Cisco FabricPath, Juniper QFabric and TRILL (Radia Perlman is a contributor here too!)

Benefits of MSTP versus other options

The original IEEE STP has a single instance for all VLANs. This means that no L2 traffic engineering is possible.

pvSTP – Cisco’s proprietary Per-VLAN STP allows L2 traffic engineering. It uses a unique STP instance for each VLAN, which is helpful but introduces scaling issues.

MSTP traffic engineering

MST permits multiple STP instances, and each instance can control groups of VLANs. This allows L2 traffic engineering without the scaling issues of pvSTP.

Disadvantages of MSTP

Probably the most significant drawback is that it is more complex than STP or rSTP – or at least it CAN be more complex as it certainly doesn’t have to be.

With MSTP all switches in a Region must have the same VLAN to instance mapping. If you add a VLAN to an instance, it must be added to all switches in the Region.

If a switch does not have the same VLAN to instance mapping it will leave the Region. This isn’t necessarily awful, but it means that your careful traffic engineering will probably not be working the way you planned – which may be really really bad depending on your network.

Structure of MSTP

MSTP has four structural parts.

  • MST Region – defines what switches participate in this MSTP
  • Region Boundary – this is the edge between two MSTP Regions or between an MSTP Region and another STP based network
  • CIST (Common and Internal Spanning Tree) – Used for interacting with other STP based networks
  • MSTI (Multiple Spanning Tree Instance) – rSTP instances that only exist within an MSTP Region

MSTP Region

This defines the switches that participate in MSTP. Within a region you may have multiple instances, which are just rSTP.

The switches at the edge of the Region mark the boundary of the Region, and as you’ll see later they run CIST for compatibility when facing other networks.

Each switch in the Region must agree on these parameters:

  • Region name (32 bytes)
  • Revision number (2 bytes)
  • A table that associates each VLAN to a particular MST Instance
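
As a rough illustration of the three parameters above, here is a minimal Python sketch of the "do these two switches agree?" check. The function names are my own, and the SHA-256 hash is only a stand-in so that two tables can be compared cheaply; real switches exchange a keyed digest defined by the standard, which this does not reproduce.

```python
import hashlib


def mapping(*ranges):
    # Each range is (instance, first_vlan, last_vlan), ends inclusive.
    return {
        vlan: instance
        for instance, first, last in ranges
        for vlan in range(first, last + 1)
    }


def region_identity(name, revision, vlan_to_instance):
    # VLANs that are not listed stay in instance 0, the default.
    table = b"".join(
        vlan_to_instance.get(vlan, 0).to_bytes(2, "big")
        for vlan in range(1, 4095)
    )
    # Stand-in digest (assumption): any stable hash works for this comparison.
    return (name, revision, hashlib.sha256(table).hexdigest())


root = region_identity("VIRL", 19, mapping((1, 2, 1999), (2, 2000, 4094)))
member = region_identity("VIRL", 19, mapping((1, 2, 1999), (2, 2000, 4094)))
stray = region_identity("VIRL", 19, mapping((1, 2, 1999), (2, 2000, 4093)))

print(root == member)  # identical name, revision and table: same Region
print(root == stray)   # one VLAN mapped differently: not the same Region
```

The point of the sketch is that a single VLAN mapped differently is enough to make the comparison fail, which is why the mapping has to be kept identical on every switch.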

Region Boundary

This is the logical edge of a Region, and is where MSTP presents CIST to any neighbouring networks. CIST can be a simple single STP or simulated pvSTP.

CIST (Common and Internal Spanning Tree)

This is a virtual STP presented to networks external to MSTP. The idea is to allow MSTP to coexist alongside other networks that may be running STP, rSTP, pvSTP or even other MSTP Regions that (for whatever reason) aren’t integrated with this one.

MSTP assumes a simple STP neighbouring network and presents the same, but if a pvSTP network is detected a Cisco switch may simulate pvSTP at the Region boundary.

To the neighbouring network the entire MSTP Region appears as a single switch, which has some interesting effects for L2 traffic engineering.

MSTI (Multiple Spanning Tree Instance)

Within an MSTP Region are simple rSTP instances, and it is these MSTIs that allow an administrator to group VLANs. Two or more instances may be engineered in such a way as to make use of multiple uplink interfaces, and each MSTI retains all the features of rSTP.

Even if you have a thousand VLANs you can put them all into a single MSTI and avoid the resource constraints of some switches.

A simple MSTP only configuration

MSTP Only

Recommended Configuration for Interoperating with other STP networks

MSTP Integration with pvSTP

  • Root Bridge is within the MSTP Region
  • Predefine VLAN Mapping for all VLANs – even those that don’t exist yet
  • Use VTPv3 if your VLAN to instance mapping is expected to change a lot. This can handle the configuration propagation as you move things around, which is important because any discrepancy of the VLAN to instance mapping will cause that switch to leave the Region
  • Configuration

    MSTP Root Switch
    spanning-tree mode mst
    spanning-tree mst configuration
    name VIRL
    revision 19
    instance 1 vlan 2-1999
    instance 2 vlan 2000-4094
    spanning-tree mst 0 priority 24576
    spanning-tree mst 1-2 priority 4096

    MSTP Member Switch
    spanning-tree mode mst
    spanning-tree mst configuration
    name VIRL
    revision 19
    instance 1 vlan 2-1999
    instance 2 vlan 2000-4094

    Alternative Configuration for Interoperating with pvSTP

    MSTP Integration with pvSTP

    This is not the recommended configuration, but with some careful effort it can be made to work.

    • Root Bridge is within the pvSTP network
    • An MSTP switch remains the root bridge for the MSTIs within the Region (but this is not shared outside the Region)
    • Only one of the Region boundary ports can be active (all others are blocked).
    • If the Region is running PVSTP simulation, VLAN 1 on the pvSTP root must have a worse (numerically higher) priority than VLANs 2-4094. If not then MST0 sets the port as Designated and not Root, causing an inconsistency.

    Configuration

    PVSTP Root Switch
    spanning-tree mode rapid-pvst
    spanning-tree vlan 1 priority 8192
    spanning-tree vlan 2-4094 priority 4096

    MSTP Root Switch
    spanning-tree mode mst
    spanning-tree mst configuration
    name VIRL
    revision 19
    instance 1 vlan 2-1999
    instance 2 vlan 2000-4094
    spanning-tree mst 1-2 priority 24576

    Do I need to configure MSTP at all?

    What happens if I just turn MSTP on for all switches without configuring a common Region? It will still work: each switch forms its own single-switch MSTP Region, and every interconnection between one switch and another runs CIST (a single spanning tree), which elects its own root bridge.

    That will work but doesn’t deliver the L2 traffic engineering advantages of MSTP. It might make sense if you’re just trying to get loads of VLANs activated and pvSTP is hitting the hardware limits of your switches.

    Common Misconfiguration Avoidance

    • Do not keep VLANs in Instance 0 as it is used to control the CIST (instance facing STP networks outside this Region). VLANs are here by default but it can cause confusing behaviour if you use this so it is recommended to specify an instance >= 1.
    • VLAN 1 has to be in Instance 0 so don’t use VLAN 1 for network traffic.
    • Do not manually prune the trunk VLAN map – that is don’t use “switchport trunk allowed vlan remove 100-999” for your traffic engineering. If you do this in the wrong place you could get a situation where MSTP selects one path but you’ve broken it by removing the VLANs manually. It is recommended to use the MSTP topology to control L2 traffic engineering, and not use manual trunk pruning. Another way to look at this is that switchport trunk maps must match the instance mapping.

    Best Practices

    • Set the root bridge and secondary root bridge
    • Configure edge ports where applicable
    • If you’re interconnecting at L2 to a network you don’t control use BPDU filter
    • Use spanning-tree pathcost method long
    • Keep data VLANs out of Instance 0
    • VLAN 1 has to be in Instance 0 so don’t use VLAN 1 for network traffic
    • Some vendors will permit mapping reserved VLANs, and others will not
    • A mismatch will split the network into two Regions

    Useful Links

    Understanding MSTP (cisco.com)

    PVSTP Simulation on MST Switches (cisco.com)

    Why PVSTP VLAN 1 must have a higher (worse) priority than VLANs 2-4094 when the root is outside of the MSTP Region (supportforums.cisco.com)

    INE’s Understanding MSTP (blog.ine.com)

    INE MSTP Tutorial: Inside a Region (blog.internetworkexpert.com)

    INE MSTP Tutorial: Outside a Region (blog.internetworkexpert.com)

(mis)Adventures with Spanning Tree

Introduction

I know I’ve gone on and on about STP (Spanning Tree Protocol) but something happened today that reminded me of what Tom Jacoby of IOSecure once said to me. I don’t remember the exact phrase, but the general idea was that STP was prone to code error and administrative error, and as much as even a properly configured STP configuration works, not very many people really understand it.

That said, I’ve put together some nice STP setups in my career. Compared to some of the other modern methods we have it is old tech to use it for fault tolerance, but there is something satisfying about building the configuration. Maybe it is because it is hard and complicated — I’m not even sure why I like it.

What is on my mind today was a client who is growing out of a bunch of 3500XLs into 2950 switches. Even the 2950s aren’t exactly new, but this is my most cost sensitive client so I’m a bit hamstrung on the hardware. The configuration has a 3750 (stack of two switches) for the core that provides routing for the network, and a flat network beyond that is pushed to 19 edge switches (3500XLs of varying code levels, and the newer 2950s) — no trunking just access ports to the edge switches, and everything is running STP with the 3750 stack as the root switch.

Now this isn’t a best practices network by any means, but the customer has a legacy configuration to migrate from and it may very well be with them for some time. Part of my job is to provide the guidance to move towards a scalable, stable and supportable network.

We had some problems setting up LACP on the 3500XLs (they only supported Cisco’s proprietary Etherchannel and in a very feature-poor way), but it was working great on the 2950s. My client was so impressed by the idea of having redundancy for the edge switches, that we created backup links for each edge switch (even on the 3500XLs) and let STP sort out the loops where we weren’t using Etherchannel. Some time passed, and things seemed like they were working fine until out of the blue all the 2950s stopped working.

Problem

The 2950s just dropped link. My client would connect a different port and it would light up, and then drop the link again — not even a light on the 3750 interface. I suspected some security profile issue, but as there wasn’t any of that configured it seemed like a long shot, so we agreed that my client would revert the 2950s to his stock of 3500XLs and figure out the 2950s later. Unfortunately the problems persisted through the next day so we decided to figure out why the 2950s failed.

The 3750 was reporting weird errors in its log:

Jul 4 22:40:31.438 UTC: %SW_MATM-4-MACFLAP_NOTIF: Host 0000.c6ca.9aff in vlan 1 is flapping between port Po19 and port Fa1/0/6

What the heck? MAC address flapping? There must be a loop somewhere. But because the 3750 wasn’t experiencing any actual CPU pain I left this and focused on the 2950s to see if I could get them working and back into production to save the day.
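
When there are a lot of these messages it helps to pull out the two ports involved, since one of them is usually the way into the loop. This is a small Python sketch that parses the log line shown above; the regular expression and the function name are my own and only cover this one message format.

```python
import re

# Matches the MACFLAP_NOTIF message format seen in the 3750 log above.
MACFLAP = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (?P<mac>[0-9a-f.]+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<port_a>\S+) and port (?P<port_b>\S+)"
)


def parse_macflap(line):
    # Returns a dict with mac, vlan, port_a and port_b, or None if no match.
    match = MACFLAP.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["vlan"] = int(fields["vlan"])
    return fields


line = (
    "Jul 4 22:40:31.438 UTC: %SW_MATM-4-MACFLAP_NOTIF: Host 0000.c6ca.9aff "
    "in vlan 1 is flapping between port Po19 and port Fa1/0/6"
)
flap = parse_macflap(line)
print(flap["mac"], flap["port_a"], flap["port_b"])
```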

The 2950 console reported this:

%PM-4-ERR_DISABLE: loopback error detected on Gi4/1, putting Gi4/1 in err-disable state

Another weird error! An err-disable state, and yet we haven’t configured any security policies on the interface. This error message led me to Cisco’s site and an explanation of how the 2950s handle L2 keepalives, which are a default configuration (and so don’t show up in the configuration). Cisco’s interpretation is that the 2950s put out a keepalive to detect physical wiring loops, but it will also trigger in the event of an STP loop anywhere on the network. You can disable the keepalive and get the switches to connect, but that doesn’t make the loop go away so I set to work on finding it.

Resolution

I went through each of the 3500XLs, and discovered two in which STP had not blocked the second link to the 3750. This is how I figured out which ones were the culprits:

  1. show cdp neighbors
  2. show spanning-tree (brief)

In the first switch, CDP showed the 3750 connected on ports 24 and 48, but STP didn’t show anything for port 48, so I had to assume that it was also forwarding even though it should show a FORWARDING state in that case. In the second switch, CDP showed the 3750 connected on ports 24 and 48, but this time STP actually showed both ports in the FORWARDING state.

To resolve the issue, I just killed the second link. As soon as I did that the MAC flap entries in the 3750 log stopped, and the 2950s were able to connect again.

Conclusion

The moral of this story is don’t put all your eggs in the STP basket — especially with very old switches and very old STP code. Maybe a better moral is don’t implement advanced features with old code — the 3500XLs are doing just fine switching but as soon as we asked them to do something out of the ordinary two of them fell on their faces.

All About Spanning Tree Protocol

Spanning Tree Protocol

I discussed STP in a couple of earlier articles, here and here, but I would like to go into a little more detail because I think this is really important.

Spanning Tree Protocol can help us design fault-tolerant networks in two ways; primarily by detecting and disabling port misconfigurations, and secondly by allowing administrators to build failover network links.

STP is a complicated protocol, and it comes with a suite of different applications that can help fine-tune the system. I highly recommend a network engineer studies the Cisco documentation on STP, and then builds a lab environment before deploying.

CIO.com has a fascinating article that describes how critical STP is, and how good network design can help eliminate the worst of these consequences.

STP Root

The first thing to remember about STP is to ensure that the root switch is defined properly. The root switch should be the most central switch on the network (not necessarily the most powerful), and it should be the least disturbed switch — this means the LAN core switches are often the best choice as a STP root.

STP can choose a root switch automatically, but unfortunately this is not usually what you want. The election picks the lowest bridge ID, which is the priority followed by the MAC address. With every switch left at the default priority, that means the switch with the lowest MAC address wins, which is usually the oldest switch on the network. That in itself wouldn’t be a problem, except that on many networks the oldest switches are pushed to the edge of the network.
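
To make the election concrete, here is a minimal Python sketch. The switch names and MAC addresses are made up for illustration; the only rule being shown is that the lowest priority wins and the MAC address breaks ties.

```python
def bridge_id(priority, mac):
    # Lower priority wins; the MAC address only matters when priorities tie.
    return (priority, int(mac.replace(".", ""), 16))


def elect_root(switches):
    # switches maps a name to a (priority, mac) pair.
    return min(switches, key=lambda name: bridge_id(*switches[name]))


# Hypothetical switches, all left at the default priority of 32768.
switches = {
    "core-3750": (32768, "0019.aaaa.0001"),
    "edge-3500xl": (32768, "0001.bbbb.0002"),
    "edge-2950": (32768, "000d.cccc.0003"),
}
print(elect_root(switches))  # the old edge switch with the lowest MAC wins

# Lowering the priority on the core switch moves the root where it belongs.
switches["core-3750"] = (24576, "0019.aaaa.0001")
print(elect_root(switches))
```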

STP Parameter Tuning

STP is designed with an average network in mind. But your network isn’t average — to get the best performance out of STP you will have to modify the parameters that make it work.

Here is a Cisco guide on this — Understanding and Tuning Spanning Tree Protocol Timers. Note Cisco’s caution note: If you make mistakes with tuning STP you risk “LAN meltdown”.

The simplest way to tune the STP timers is to set the STP diameter. The link above has a guide to help calculate the STP diameter of your network — but be wary while setting this parameter. If your network changes (future adds/moves) it may expand beyond the STP diameter that you have set and then bad things happen.

The diagram above duplicates what is in the Cisco document, but it identifies one of the calculations. You need to identify the worst-case scenario for the number of switches a packet has to cross — in this case it is CADBE, and DACBE; STP diameter here is 5.
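
For a feel of what the diameter setting actually changes, the Cisco timer guide derives Max Age and Forward Delay from the Hello time and the diameter. The Python sketch below applies those formulas as I understand them; rounding the Forward Delay up to a whole second is my assumption, so check the result against what your switch reports.

```python
import math


def timers_for_diameter(diameter, hello=2):
    # max_age = (4 * hello) + (2 * diameter) - 2
    max_age = 4 * hello + 2 * diameter - 2
    # forward_delay = ((4 * hello) + (3 * diameter)) / 2, rounded up (assumption)
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)
    return max_age, forward_delay


print(timers_for_diameter(7))  # the default diameter gives the default timers
print(timers_for_diameter(5))  # the diameter from the example above
```

With a diameter of 7 this gives the familiar defaults of 20 and 15 seconds, and a diameter of 5 brings them down to 16 and 12 seconds.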

My advice is to leave these parameters alone unless you have a good reason to make changes. The risk is high, and what is gained is a slightly more rapid recovery from a network failure — the long term consequence is that someone will have to remember that this was done.

Loop Protection

Loops in Layer 2 networks are very, very bad. The layer 2 header has no time-to-live value, so a looped frame can continue to loop forever. Add in some broadcast traffic and you have a recipe for disaster; Cisco calls it LAN meltdown.

A key element of STP is that it prevents loops: STP is designed to detect and resolve Layer 2 loops. It is not enough to run STP only on the core switches of a network (although it helps); to fully protect a LAN all switches must be running STP.

Redundant Connections

STP can be used to create failover links within the LAN, allowing the network administrator to design networks with multiple links between switches. Essentially the network administrator intentionally creates a loop, and allows STP to block one of the links. If the primary link were to fail, then STP would re-calculate the topology and would bring up the secondary link.

Host Port Configurations

Running STP on your network has some interesting side effects. It sure is great knowing that Layer 2 loops are automatically detected and mitigated, but sometimes running STP can make life difficult for users.

For example, when a port comes online STP does not trust the interface. That means it will listen to the port to see if any BPDUs come through, but it will not forward any traffic. The process from Blocking to Listening, to Learning and finally to Forwarding state can take up to a minute. Unfortunately, that means no traffic will pass the interface, which can cause some hosts that depend on DHCP for IP address assignment to fail. If you unplug and reconnect the network cable (or disable and enable the interface) the cycle starts again, so that tried and true end-user approach to fixing network faults will not help. Thankfully most often a user can simply “Repair” the network connection in software, and the OS will make a successful DHCP transaction.

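Cisco (and other network vendors) have created a clever solution for this problem. spanning-tree portfast lets a port skip the Listening and Learning states and go straight to Forwarding as soon as it comes up. STP is still running on the port, so this should only be configured on ports that face hosts.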

Unfortunately, spanning-tree portfast leaves your network at risk from a user inadvertently connecting a single cable to two ports, creating a network loop that can potentially eat up all the CPU processing on the switch. Cisco (and other network vendors) have come up with a solution for this too. spanning-tree portfast bpduguard will automatically disable a port if a single BPDU frame is detected. This is not an intelligent feature, so do NOT use spanning-tree portfast bpduguard on any switch uplinks as it will definitely shut those down as soon as it receives the first BPDU frame.

Usually when a port is shut down, the user (or network administrator) will see the problem and resolve it. But the port remains disabled until an administrator manually brings it back online. Cisco decided this is too much work for us network administrators, so they defined the errdisable recovery cause bpduguard command, which brings the port back online after 5 minutes (by default). If the loop still exists, it will be detected and the port will go offline again.
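
The interaction between bpduguard and errdisable recovery is easier to follow as a tiny state machine. This is a Python sketch of the behaviour described above, not of how any switch implements it; the class and method names are mine, and the 300 second interval is the 5 minute default mentioned in the text.

```python
class EdgePort:
    def __init__(self, recovery_interval=300):
        # With portfast the port forwards as soon as the link comes up.
        self.state = "forwarding"
        self.recovery_interval = recovery_interval
        self.disabled_at = None

    def receive_bpdu(self, now):
        # bpduguard: a single BPDU on an edge port err-disables it.
        if self.state == "forwarding":
            self.state = "err-disabled"
            self.disabled_at = now

    def tick(self, now):
        # errdisable recovery: bring the port back once the interval has passed.
        if (
            self.state == "err-disabled"
            and now - self.disabled_at >= self.recovery_interval
        ):
            self.state = "forwarding"
            self.disabled_at = None


port = EdgePort()
port.receive_bpdu(now=0)    # someone loops a cable: port goes err-disabled
port.tick(now=299)          # too early, still err-disabled
port.tick(now=300)          # recovery timer fires, port forwards again
port.receive_bpdu(now=301)  # the loop is still there, so it is disabled again
print(port.state)
```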

Advanced Configurations

All of the below configurations are discussed in detail in the Cisco documentation on STP, but I will review them in brief.

Bpdufilter is an alternative to bpduguard. Like bpduguard it watches for incoming BPDU frames, but in addition it filters outbound BPDU frames (less traffic for hosts which discard these frames anyway), and if a BPDU frame is received on this port (a switching loop is created) then the port loses its portfast status and STP starts to monitor the port. There is some risk here in the event of a user looping two ports at their own desk, as the loop would not be detected.

Uplinkfast allows the network administrator to define redundant switch uplinks so they skip the Listening and Learning state in the event of a link failure. This allows the network to converge faster (Cisco says about 5 seconds) than it would have done otherwise.

Backbonefast allows switch backbone interfaces to detect and resolve indirect link failures faster than the normal STP timeouts would allow. Essentially, it allows a switch to detect a network link failure on a neighboring switch, and update its own topology very quickly.

Rootguard prevents a particular port from becoming a root port. This prevents a new switch connected to the edge of the network from becoming a root switch. Recall that the network administrator spends a lot of effort designing the STP topology so the root switch is in a logical location, usually the core.

Loopguard detects when a port that should be receiving BPDUs stops receiving them (usually because of a unidirectional link, often a wiring fault) and moves the interface to a loop-inconsistent (blocking) state instead of letting it transition to Forwarding. This prevents STP calculation failures and stops neighboring switches from making incorrect assumptions about the network topology. There is another Layer 2 solution to this problem as well called UDLD, which may provide more interoperability and flexibility than loopguard; Cisco has documentation on UDLD and loopguard here.

Rapid STP

Cisco has a white paper describing RSTP; the document is very concise and explains the differences between RSTP and STP quite clearly.

In short RSTP brings in the features of uplinkfast and backbonefast, (which were Cisco proprietary features in STP), updates the algorithm so that topology detection cascades across the network, instead of the slow plodding detection of STP, and finally it updates the timers so that convergence happens much faster.

High Availability — LAN — STP

The parent article on High Availability.

Switching on a LAN provides some of the most basic network connectivity options, and these are often overlooked. Most switches (Cisco, HP, Dell and others) support these configurations, but one thing I can guarantee is that you will find limitations on pretty much every platform. If you’re after interoperability, do your testing so you can understand these limitations.

Spanning Tree Protocol

I discussed STP in an earlier article, but I would like to go into a little more detail here.

Spanning Tree Protocol can help us design fault-tolerant networks in two ways; primarily by detecting and disabling port misconfigurations, and secondly by allowing administrators to build failover network links.

STP is a complicated protocol, and it comes with a suite of different applications that can help fine-tune the system. I highly recommend a network engineer studies the Cisco documentation on STP, and then builds a lab environment before deploying.

I also recommend Cisco’s STP Problems and Design Considerations document. It just might help identify why things are happening the way they are.

Loop Protection

Loops in Layer 2 networks are very, very bad. The layer 2 header has no time-to-live value, so a looped frame can continue to loop forever. Add in some broadcast traffic and you have a recipe for disaster; Cisco calls it LAN meltdown.

A key element of STP is that it prevents loops: STP is designed to detect and resolve Layer 2 loops. It is not enough to run STP on the core switches of a network (although it helps); to fully protect a LAN all switches must be able to run STP.

Redundant Connections

STP Example

If you configure two switches with two network connections, STP will detect the loop and block one of the ports. There are calculations that help STP decide which interface to block, but that is for a more technical review of STP.

In the example on the left, I’ve configured two switches to use two links between them. As long as the configuration stays simple, this would actually work with an STP capable switch and a dumb switch or even a hub.

STP detects the second link and blocks the port. Which port to block is determined by an algorithm based on interface speeds (the path cost). This is customizable, so you can make sure that STP opens and fails predictably.
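
Here is a simplified Python sketch of that decision for two parallel links between the same pair of switches. The costs are the classic short path costs per interface speed; the interface names are hypothetical, and the tie-breaker is reduced to "lowest neighbor port ID", which leaves out the other comparisons the real algorithm makes.

```python
# Classic (short) STP path costs, keyed by interface speed in Mbps.
SHORT_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}


def pick_forwarding_link(links):
    # links is a list of (name, speed_mbps, neighbor_port_id) tuples.
    # Lowest cost wins; equal costs fall back to the lowest neighbor port ID.
    best = min(links, key=lambda link: (SHORT_COST[link[1]], link[2]))
    return best[0]


# A gigabit uplink beats a Fast Ethernet one, so Fa0/24 would be blocked.
print(pick_forwarding_link([("Fa0/24", 100, 24), ("Gi0/1", 1000, 49)]))

# Two equal-speed links: the tie-breaker decides, so Fa0/24 is blocked again.
print(pick_forwarding_link([("Fa0/23", 100, 23), ("Fa0/24", 100, 24)]))
```

Changing the cost on an interface is what makes the outcome predictable: give the link you want to keep forwarding the lower cost.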

Failure Mode

If something happens to the active interface, STP detects this change and stops passing traffic so it can recalculate the topology. Once it is complete things look as they do on the left.

Here STP detected a failure on the active interface, and opened the secondary connection.

STP doesn’t depend on physical failures to detect network changes. The root switch sends out Hello frames (BPDUs) every Hello interval, and every other switch relays them downstream. If a switch stops hearing them on its root port for the Max Age timer (20 seconds by default, or 3 Hello intervals with rSTP), it recalculates the topology.
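
To put rough numbers on this, here is a Python sketch that assumes the 802.1D default timers (Hello 2 seconds, Max Age 20 seconds, Forward Delay 15 seconds). The function names are mine. Classic STP waits for Max Age before reacting to a failure it cannot see directly, while rSTP gives up on a neighbor after three missed Hellos.

```python
def classic_stp_outage(max_age=20, forward_delay=15, direct_failure=False):
    # A directly connected failure is noticed at once; otherwise wait Max Age.
    detection = 0 if direct_failure else max_age
    # The replacement port then sits in Listening and Learning,
    # each for one Forward Delay, before it forwards traffic.
    return detection + 2 * forward_delay


def rstp_detection(hello=2):
    # rSTP declares a neighbor lost after three missed Hellos.
    return 3 * hello


print(classic_stp_outage())                     # indirect failure, in seconds
print(classic_stp_outage(direct_failure=True))  # local link failure, in seconds
print(rstp_detection())                         # rSTP detection time, in seconds
```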

Advanced Configurations

When designing a network, you must always consider the complexity of your design and the requirements of your client. Sometimes, for a client without a network savvy administrator to maintain the network, relying on STP for redundancy is a bad idea, and there are other options.