What I learned about MSTP 802.1s

STP History

STP was invented by Radia Perlman, and in 1990 802.1D STP was standardized by the IEEE. In 2001 it was supplemented by 802.1w rSTP (rapid STP), which was later folded into 802.1D-2004, and in 2002 802.1s MSTP was defined. That was well over a decade ago at the time I’m writing this, and we haven’t gone very far from here yet.

However, there are new technologies on the horizon: 802.1aq (Shortest Path Bridging), Cisco FabricPath, Juniper QFabric and TRILL (Radia Perlman is a contributor here too!).

Benefits of MSTP versus other options

The original IEEE STP has a single instance for all VLANs. This means that no L2 traffic engineering is possible.

pvSTP (Cisco’s proprietary Per-VLAN STP) allows L2 traffic engineering. It runs a separate STP instance for each VLAN, which is helpful but introduces scaling issues, because every instance costs the switch CPU and memory.

MSTP traffic engineering

MST permits multiple STP instances, and each instance can control groups of VLANs. This allows L2 traffic engineering without the scaling issues of pvSTP.

Disadvantages of MSTP

Probably the most significant drawback is that it is more complex than STP or rSTP, or at least it CAN be more complex; it certainly doesn’t have to be.

With MSTP all switches in a Region must have the same VLAN to instance mapping. If you add a VLAN to an instance, it must be added to all switches in the Region.

If a switch does not have the same VLAN to instance mapping it will leave the Region. This isn’t necessarily awful, but it means that your careful traffic engineering will probably not be working the way you planned – which may be really really bad depending on your network.

Structure of MSTP

MSTP has four structural parts.

  • MST Region – defines what switches participate in this MSTP
  • Region Boundary – this is the edge between two MSTP Regions or between an MSTP Region and another STP based network
  • CIST (Common and Internal Spanning Tree) – Used for interacting with other STP based networks
  • MSTI (Multiple Spanning Tree Instance) – rSTP instances that only exist within a MSTP Region

MSTP Region

This defines the switches that participate in MSTP. Within a region you may have multiple instances, which are just rSTP.

The switches at the edge of the Region mark the boundary of the Region, and as you’ll see later they run CIST for compatibility when facing other networks.

Each switch in the Region must agree on these parameters:

  • Region name (32 bytes)
  • Revision number (2 bytes)
  • A table that associates each VLAN to a particular MST Instance
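To make the "must agree" rule concrete, here is a minimal Python sketch of how two switches could decide whether they share a Region. As I understand 802.1Q, switches don’t exchange the whole VLAN table; they compare the name, the revision and a 16-byte HMAC-MD5 digest of the 4096-entry VLAN-to-instance table. The signature key below is quoted from memory and should be treated as an assumption, the `mapping` and `same_region` helpers are hypothetical, and the values reuse the VIRL / revision 19 example from the configuration in this post.

```python
import hashlib
import hmac

# Key for the MST Configuration Digest as I understand IEEE 802.1Q;
# treat it as an assumption and verify it against the standard.
DIGEST_KEY = bytes.fromhex("13AC06A62E47FD51F95D2BA243CD0346")


def mapping(*ranges):
    """Build {vlan: instance} from (instance, first_vlan, last_vlan) tuples."""
    return {vlan: inst for inst, first, last in ranges for vlan in range(first, last + 1)}


def config_digest(vlan_to_instance):
    """HMAC-MD5 over 4096 big-endian 16-bit instance IDs (VLAN 0..4095).

    VLANs absent from the dict count as instance 0 (the IST).
    """
    table = b"".join(vlan_to_instance.get(vlan, 0).to_bytes(2, "big") for vlan in range(4096))
    return hmac.new(DIGEST_KEY, table, hashlib.md5).digest()


def same_region(a, b):
    """Same Region only if name, revision and VLAN-to-instance digest all match."""
    return (
        a["name"] == b["name"]
        and a["revision"] == b["revision"]
        and config_digest(a["mapping"]) == config_digest(b["mapping"])
    )


root = {"name": "VIRL", "revision": 19, "mapping": mapping((1, 2, 1999), (2, 2000, 4094))}
member = {"name": "VIRL", "revision": 19, "mapping": mapping((1, 2, 1999), (2, 2000, 4094))}
# Hypothetical switch where VLANs 1000-1999 were moved to instance 2 on this box only.
stray = {"name": "VIRL", "revision": 19, "mapping": mapping((1, 2, 999), (2, 1000, 4094))}

print(same_region(root, member))  # True
print(same_region(root, stray))   # False: this switch becomes its own Region
```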

Region Boundary

This is the logical edge of a Region, and is where MSTP presents CIST to any neighbouring networks. CIST can be a simple single STP or simulated pvSTP.

CIST (Common and Internal Spanning Tree)

This is a virtual STP presented to networks external to MSTP. The idea is to allow MSTP to coexist alongside other networks that may be running STP, rSTP, pvSTP or even other MSTP Regions that (for whatever reason) aren’t integrated with this one.

MSTP assumes a simple STP neighbouring network and presents the same, but if a pvSTP network is detected a Cisco switch may simulate pvSTP at the Region boundary.

To the neighbouring network the entire MSTP Region appears as a single switch, which has some interesting effects for L2 traffic engineering.

MSTI (Multiple Spanning Tree Instance)

Within an MSTP Region are simple rSTP instances, and it is these MSTIs that allow an administrator to group VLANs. Two or more instances can be engineered to make use of multiple uplink interfaces, and each MSTI retains all the features of rSTP.

Even if you have a thousand VLANs you can put them all into a single MSTI and avoid the resource constraints of some switches.

A simple MSTP only configuration


Recommended Configuration for Interoperating with other STP networks

MSTP Integration with pvSTP

  • Root Bridge is within the MSTP Region
  • Predefine VLAN Mapping for all VLANs – even those that don’t exist yet
  • Use VTPv3 if your VLAN to instance mapping is expected to change a lot. This can handle the configuration propagation as you move things around, which is important because any discrepancy of the VLAN to instance mapping will cause that switch to leave the Region
  • Configuration

    MSTP Root Switch
    spanning-tree mode mst
    spanning-tree mst configuration
     name VIRL
     revision 19
     instance 1 vlan 2-1999
     instance 2 vlan 2000-4094
     exit
    spanning-tree mst 0 priority 24576
    spanning-tree mst 1-2 priority 4096

    MSTP Member Switch
    spanning-tree mode mst
    spanning-tree mst configuration
     name VIRL
     revision 19
     instance 1 vlan 2-1999
     instance 2 vlan 2000-4094
     exit
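Because any VLAN you forget to map stays in Instance 0, and a mapping mismatch pushes a switch out of the Region, it can help to check a configuration before pasting it. The Python sketch below is a hypothetical helper, not a Cisco tool: it parses `instance N vlan LIST` lines like the ones above and lists the VLANs in 1-4094 that were never moved out of Instance 0. For this configuration only VLAN 1 should remain, which matches the advice to keep data VLANs out of Instance 0.

```python
import re

MEMBER_CONFIG = """\
spanning-tree mode mst
spanning-tree mst configuration
name VIRL
revision 19
instance 1 vlan 2-1999
instance 2 vlan 2000-4094
"""


def parse_vlan_list(text):
    """Expand an IOS-style VLAN list such as '2-1999' or '10,20,30-35'."""
    vlans = set()
    for part in text.split(","):
        first, _, last = part.strip().partition("-")
        vlans.update(range(int(first), int(last or first) + 1))
    return vlans


def instance_map(config):
    """Return {vlan: instance} from 'instance N vlan LIST' lines (any indentation)."""
    result = {}
    for match in re.finditer(r"^\s*instance (\d+) vlan (\S+)\s*$", config, re.M):
        for vlan in parse_vlan_list(match.group(2)):
            result[vlan] = int(match.group(1))
    return result


def left_in_instance_0(config, usable=range(1, 4095)):
    """VLANs in the usable range that were never mapped away from Instance 0."""
    mapped = instance_map(config)
    return sorted(v for v in usable if mapped.get(v, 0) == 0)


print(left_in_instance_0(MEMBER_CONFIG))  # [1]
```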

Alternative Configuration for Interoperating with pvSTP

MSTP Integration with pvSTP

This is not the recommended configuration, but with some careful effort it can be made to work.

  • Root Bridge is within the pvSTP network
  • An MSTP switch remains the root bridge for the MSTIs within the Region (but this is not shared outside the Region)
  • Only one of the Region boundary ports can be active (all others are blocked)
  • If the Region is running PVST simulation, VLAN 1 on the pvSTP root must have a worse (numerically higher) priority value than VLANs 2-4094. If not, MST0 sets the boundary port as Designated rather than Root, causing an inconsistency.


    PVSTP Root Switch
    spanning-tree mode rapid-pvst
    spanning-tree vlan 1 priority 8192
    spanning-tree vlan 2-4094 priority 4096

    MSTP Root Switch
    spanning-tree mode mst
    spanning-tree mst configuration
     name VIRL
     revision 19
     instance 1 vlan 2-1999
     instance 2 vlan 2000-4094
     exit
    spanning-tree mst 1-2 priority 24576

Do I need to configure MSTP at all?

What happens if I just turn MSTP on for all switches and don’t join them into a common Region? It will still work: each switch forms its own single-switch MSTP Region, so every interconnection between switches is a Region boundary running the CIST (a single spanning tree), which elects its own root bridge.

That will work, but it doesn’t deliver the L2 traffic engineering advantages of MSTP. It might make sense if you’re just trying to get loads of VLANs activated and pvSTP is hitting the hardware limits of your switches.

Common Misconfiguration Avoidance

  • Do not keep VLANs in Instance 0, as it is used to control the CIST (the instance facing STP networks outside this Region). All VLANs are placed there by default, but using it for data VLANs can cause confusing behaviour, so map them to an instance of 1 or higher.
  • VLAN 1 has to be in Instance 0, so don’t use VLAN 1 for network traffic.
  • Do not manually prune the trunk VLAN list, that is, don’t use “switchport trunk allowed vlan remove 100-999” for your traffic engineering. If you do this in the wrong place, MSTP may select one path while you’ve broken that path by removing the VLANs manually. Use the MSTP topology to control L2 traffic engineering instead of manual trunk pruning; put another way, the VLANs allowed on each trunk must match the instance mapping.

Best Practices

  • Set the root bridge and secondary root bridge
  • Configure edge ports where applicable
  • If you’re interconnecting at L2 to a network you don’t control, use BPDU filter
  • Use spanning-tree pathcost method long
  • Keep data VLANs out of Instance 0
  • VLAN 1 has to be in Instance 0, so don’t use VLAN 1 for network traffic
  • Some vendors will permit mapping reserved VLANs, and others will not
  • A mismatch will split the network into two regions



Things I know about BGP

Full routes
Currently the IPv4 Internet is at about 515k prefixes and has been growing at about 15% each year for the last few years. That means in 5 years the IPv4 Internet may contain more than 1 million prefixes.

Currently the IPv6 Internet is at about 21k prefixes and has been growing at about 30% each year (on average). That means in 5 years the IPv6 Internet may contain more than 75k prefixes.
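The projections above are plain compound growth. A quick Python check, assuming the growth rates stay constant for the whole five years (they rarely do):

```python
def project(current, annual_growth, years):
    """Compound growth: current * (1 + annual_growth) ** years."""
    return current * (1 + annual_growth) ** years


ipv4 = project(515_000, 0.15, 5)
ipv6 = project(21_000, 0.30, 5)
print(f"IPv4 after 5 years: {ipv4:,.0f}")  # roughly 1.04 million
print(f"IPv6 after 5 years: {ipv6:,.0f}")  # roughly 78 thousand
```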

Any router that receives full routes must hold them in memory, so if you want to accept full routes you have to watch your memory footprint to make sure that your platform can handle it.

Partial routes
Partial routes means accepting only a fraction of the full routing table: some subset of the full routes, or just a default route. Your carrier can filter routes for you or you can filter them yourself.

When you’re using a default route, your carrier can advertise it to you or you can just use a static route, but a carrier-advertised default has the advantage that it disappears if the BGP session goes down.

Even with partial routes you can still control inbound and outbound traffic paths for your prefixes, but the limitation is that your router cannot make best path selection on prefixes that it doesn’t know about. This means that if your upstream carrier has a problem (maybe they lose their own upstream providers?) then your own routes may not reflect this and your traffic may get dropped.

Configuring Partial routes
This configuration shows how to limit learned prefixes to those on your upstream ASN +1. That means you’ll learn routes that are part of your upstream carrier’s ASN, plus any routes of their directly connected neighboring ASNs.

ip as-path access-list 1 permit ^65533_[0-9]*$
router bgp 65534
neighbor filter-list 1 in

sh ip bgp | begin Network
Network Next Hop Metric Weight Path
* 0 65533 65531 65530
*> 0 65532 65530 ?

sh ip bgp | begin Network
Network Next Hop Metric Weight Path
*> 0 65532 65530 ?
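The regular expression `^65533_[0-9]*$` reads as "the path starts with 65533, followed by at most one more ASN". Cisco’s `_` is a special delimiter match that Python’s `re` lacks, so the sketch below approximates it; the translation is my own, not a Cisco API. It shows which AS_PATH strings the filter-list would permit, including why `_` matters for a path such as 655331.

```python
import re

# Cisco's "_" matches a delimiter: start or end of the path, a space, a comma,
# or a brace/parenthesis (used for AS sets and confederations).
CISCO_UNDERSCORE = r"(?:^|$|[ ,{}()])"


def cisco_aspath_regex(pattern):
    """Translate a Cisco AS-path regex into a Python regex (approximation)."""
    return re.compile(pattern.replace("_", CISCO_UNDERSCORE))


def permitted(pattern, as_path):
    """True if the 'permit' line would match this AS_PATH string."""
    return cisco_aspath_regex(pattern).search(as_path) is not None


UPSTREAM_PLUS_ONE = r"^65533_[0-9]*$"
for path in ["65533", "65533 65531", "65533 65531 65530", "65532 65530", "655331"]:
    print(f"{path!r:22} {permitted(UPSTREAM_PLUS_ONE, path)}")
# expected: True, True, False, False, False
```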

Why do we want to manipulate traffic?
Sometimes you may have circuits with a cost difference so load balancing isn’t sensible. Some networks have better peering so your customers are closer over that link. Some networks just have better performance or latency.

Or it might just be as simple as you want to push traffic to another circuit for maintenance – if you need to reload some hardware or if your carrier has a planned outage.

Controlling outbound traffic with local preference
Local preference is a blunt instrument: it simply alters the preference for all prefixes learned from a peer. The default local preference is 100, and the higher local preference wins. This is very useful for setting a backup peer.

route-map rm-bgp-localpref permit 10
set local-preference 500
router bgp 65534
neighbor route-map rm-bgp-localpref in

sh ip bgp | begin Network
Network Next Hop Metric LocPrf Path
* 65533 65531 ?
*> 0 500 65532 65531 ?

Controlling outbound traffic with weight
Weight is a fine-grained tool, as it can be applied per-prefix (with ACLs). The default weight for learned routes is 0, and the higher weight wins. Weight is Cisco-specific and only affects the local router. This is useful for directing particular flows of traffic over particular paths.

ip access-list standard acl-bgp-weight
route-map rm-bgp-weight permit 10
match ip address acl-bgp-weight
set weight 100
route-map rm-bgp-weight permit 20
router bgp 65534
neighbor route-map rm-bgp-weight in

sh ip bgp | begin Network
Network Next Hop LocPrf Weight Path
*> 500 0 65532 65531 65533 ?
* 0 65533 ?
* 500 0 65532 65531 65533 ?
*> 100 65533 ?
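Weight is compared before local preference on Cisco routers, which is why the path with weight 100 wins in the output above even though the other path carries local preference 500. A minimal Python sketch of just those first two tie-breakers (the `Path` class and `best` function are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class Path:
    neighbor_as: int
    weight: int = 0        # Cisco-only, never leaves this router
    local_pref: int = 100  # shared with iBGP peers inside our AS


def best(paths):
    """Pick a path using only the first two Cisco tie-breakers.

    Higher weight wins; if weights tie, higher local preference wins. Real
    best-path selection continues with AS_PATH length, origin, MED and more.
    """
    return max(paths, key=lambda p: (p.weight, p.local_pref))


via_65532 = Path(neighbor_as=65532, local_pref=500)  # from the local-pref route-map
via_65533 = Path(neighbor_as=65533, weight=100)      # prefix matched by acl-bgp-weight
print(best([via_65532, via_65533]).neighbor_as)      # 65533: weight is checked first
```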

Controlling inbound traffic with AS_PATH prepending

A blunt instrument: you make your advertisements look further away on one circuit compared to another. Routers that can see both paths will prefer the shorter one, encouraging traffic to use the shorter path. Even though this can be applied per-prefix, it is still a blunt tool, because sometimes the prepended path is still the best one.

ip prefix-list pfl-bgp-prepend seq 10 permit
route-map rm-bgp-prepend permit 10
match ip address prefix-list pfl-bgp-prepend
set as-path prepend 65534 65534 65534 65534 65534 65534 65534 65534 65534 65534
router bgp 65534
neighbor route-map rm-bgp-prepend out

There are no tools on the local router to show this, so you have to use a BGP looking glass to validate it.
Note that looking glass sites far from you will probably not see your prepends, as BGP routers only share their best path with each other – so your prepended path probably won’t make it to the other side of the planet.
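To see why prepending only works on routers that hear both paths, here is a small Python model. It assumes that IOS adds our ASN once on its own and then the ten prepends from `rm-bgp-prepend` on top, that carriers 65532 and 65533 (borrowed from the outputs above) each add their own ASN, and that every earlier tie-breaker is equal. A remote router that only ever hears the prepended path still uses it, which is the "blunt tool" caveat.

```python
OUR_ASN = 65534


def advertised_path(extra_prepends):
    """AS_PATH our eBGP neighbour receives: our ASN once, plus any prepends."""
    return [OUR_ASN] * (1 + extra_prepends)


def via_carrier(carrier_asn, path):
    """A carrier passes the route on, adding its own ASN to the front."""
    return [carrier_asn] + path


def shortest(paths):
    """Choose the shortest AS_PATH, assuming all earlier tie-breakers are equal."""
    return min(paths, key=len)


plain = via_carrier(65532, advertised_path(0))       # [65532, 65534]
prepended = via_carrier(65533, advertised_path(10))  # 65533 followed by 65534 x 11
print(shortest([plain, prepended])[0])               # 65532: the plain path wins
# A router that only ever heard the prepended path has nothing better to pick.
print(shortest([prepended])[0])                      # 65533
```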

Controlling inbound traffic with MED (multi-exit discriminator)
A fine-grained tool giving very effective per-prefix control, but it only applies when peering to a single AS over multiple circuits. This isn’t a very common configuration.

MED asks a peer to use a particular circuit for some (or all) of your advertised prefixes (the lower MED wins), but some peers filter or reset metrics, so you must work with them to make sure it is supported.

ip access-list standard acl-bgp-med
route-map rm-bgp-med permit 10
match ip address acl-bgp-med
set metric 200
router bgp 65534
neighbor route-map rm-bgp-med out

There are no tools on the local router to show this, so you have to work with your peer to validate this.
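MED only works in the narrow case described above because, by default, routers compare it only between paths from the same neighbouring AS, and the lower value wins. A hypothetical Python model of the peer’s decision, with `circuit-1`/`circuit-2` as made-up circuit names and `rm-bgp-med` setting 200 on the less preferred one:

```python
from dataclasses import dataclass


@dataclass
class Route:
    circuit: str
    from_as: int  # neighbouring AS the route was learned from
    med: int = 0  # IOS treats a missing MED as 0 unless told otherwise


def med_winner(a, b):
    """Return the route MED prefers, or None when MED does not decide.

    By default MED is only compared between routes from the same neighbouring
    AS, and the lower value wins; otherwise later tie-breakers decide.
    """
    if a.from_as != b.from_as or a.med == b.med:
        return None
    return a if a.med < b.med else b


# The peer's view of our prefix over two hypothetical circuits from AS 65534.
primary = Route("circuit-1", from_as=65534)          # no metric set
backup = Route("circuit-2", from_as=65534, med=200)  # rm-bgp-med applied
print(med_winner(primary, backup).circuit)           # circuit-1
print(med_winner(primary, Route("other-isp", from_as=65000)))  # None
```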

Blackhole incoming DDoS using BGP
This is an administrator-triggered mechanism: you use BGP to signal your upstream provider to null route traffic for a particular prefix of yours. This takes the DDoS target offline but saves the rest of the network, and you don’t pay for the bandwidth of the incoming DDoS. This must be arranged with your upstream provider.

ip route Null0 tag 111
route-map rm-bgp-blackhole permit 10
match tag 111
set community 65534:666
router bgp 65534
redistribute static route-map rm-bgp-blackhole
neighbor send-community
neighbor send-community
ip bgp-community new-format

show ip bgp community | begin Network
Network Next Hop Metric LocPrf Weight Path
*> 0 32768 ?
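A community is just a 32-bit number; `ip bgp-community new-format` only changes how IOS displays it (AA:NN instead of one large decimal). The Python sketch below converts between the two forms for the `65534:666` tag used above. For comparison, RFC 7999 later defined the well-known BLACKHOLE community 65535:666, though whatever value your provider honours is a matter of agreement with them.

```python
def community_to_int(text):
    """'AA:NN' (new-format display) -> 32-bit value (old decimal display)."""
    asn, value = (int(x) for x in text.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError(f"community out of range: {text}")
    return (asn << 16) | value


def int_to_community(number):
    """32-bit value -> 'AA:NN'."""
    return f"{number >> 16}:{number & 0xFFFF}"


print(community_to_int("65534:666"))  # 4294836890
print(int_to_community(4294836890))   # 65534:666
```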

Multicast routing on the 3750 with IP Base

So I wanted to use multicast with a 3750 that’s running IP Base. From previous experience I know I can’t just set up ip pim sparse-dense-mode on a VLAN interface when running IP Base; you need IP Services for that.

PIM SDM is supported on a routed interface (using no switchport) but this is a pretty cumbersome way to use multicast. I guess it would be okay if you have a single physical server, and you can give it a dedicated IP subnet but most environments just aren’t designed that way – there’s no way this would work if you wanted your multicast source to be a VM!

Nonetheless, here’s how you’d go about it:
ip routing
ip multicast-routing
interface GigabitEthernet1/0/1
description Dedicated Interface to the Multicast Source
no switchport
ip address
ip pim sparse-dense-mode

Then you configure your server to use an IP in the network, and start the multicast stream.

The documentation stated that the 3750 only supports PIM stub routing, but the Software Advisor Tool says that PIMv2 is indeed supported. I found that pretty confusing so I asked my SE, and he confirmed that PIM stub routing is supported. Now I’m not a multicast expert, but I didn’t even know if this meant it would work!

So the answer was to try. I built a lab with a 3750 and configured it to support multicast and used the only PIM framework available, pim passive like this:
ip routing
ip multicast-routing
interface Vlan10
description Client VLAN
ip pim passive
interface Vlan20
description Server VLAN
ip pim passive

I used iperf to generate and receive the multicast streams. Generate a stream from VLAN 20:
iperf -c -u -T 32 -t 3600 -i 1
Receive the stream on VLAN 10:
iperf -s -u -B -i 1
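What iperf does on each side can be sketched with Python sockets. The receiver’s join is what makes the host send the IGMP membership report that the switch (and later `ip igmp access-group`) acts on, and the sender’s `-T 32` matters because a multicast TTL of 1 would never be routed between VLAN 10 and VLAN 20. The group 239.1.1.1 and port 5001 (iperf 2’s default) are assumptions, since the original addresses weren’t given; treat this as an illustration rather than a replacement for iperf.

```python
import ipaddress
import socket
import struct

GROUP = "239.1.1.1"  # hypothetical group: the original address was not given
PORT = 5001          # iperf 2's default port


def check_group(group):
    """Refuse anything outside 224.0.0.0/4 before using it as a stream address."""
    if not ipaddress.IPv4Address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return group


def membership_request(group, interface="0.0.0.0"):
    """struct ip_mreq: the group address followed by the local interface address."""
    return socket.inet_aton(check_group(group)) + socket.inet_aton(interface)


def open_receiver(group=GROUP, port=PORT):
    """Roughly 'iperf -s -u -B <group>': joining makes the host send an IGMP report."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock


def open_sender(ttl=32):
    """Roughly 'iperf -c <group> -u -T 32': a TTL above 1 lets the 3750 route it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    return sock


# On the server VLAN:  open_sender().sendto(b"test", (GROUP, PORT))
# On the client VLAN:  print(open_receiver().recvfrom(1500))
```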

You can use show ip mroute to review the current multicast senders and receivers, and clear ip mroute * to refresh the table. I found myself doing this a few times, and also restarting the iperf server/client devices to make sure that I was seeing up-to-date information.

And it worked, but then to my dismay I found that anybody in either VLAN could spark up a new multicast stream and share it. This is probably okay because I’ve not heard of multicast being used in a malicious way before, but that could be because it isn’t widely deployed. In the interest of security, I decided that I had better lock this down so only the server VLAN could initiate a multicast stream, and I found that this worked:
ip routing
ip multicast-routing
ip access-list extended acl-allow-igmp-receive
permit igmp any any
ip access-list extended acl-block-igmp-source
deny ip any
permit ip any any
interface Vlan10
description Client VLAN
ip access-group acl-block-igmp-source in
ip pim passive
ip igmp access-group acl-allow-igmp-receive
interface Vlan20
description Server VLAN
ip pim passive

In this case the ACL acl-allow-igmp-receive allows multicast clients to receive ALL multicast streams, and the ACL acl-block-igmp-source blocks those clients from initiating streams of their own. Both ACLs are required to allow the clients to receive but not send multicast.

Cisco CLI Shortcuts

Some of these shortcuts I’ve used for a long time, and a couple of them I learned today. I can’t believe I’ve gone so long without knowing how to go to the beginning of a line. My left arrow key is going to miss me.

CTRL-A will take you to the beginning of a line. Very handy for negating a configuration item like this:
no ip route
CTRL-E will take you to the end of the line. Not as handy but at least you know how to move around now.

The default command will set an interface (and many other configuration items) to their default settings. This is very handy if you have complex interface configurations and you want to start from scratch without removing the configuration line by line.
default interface Fa0/0

Learn the short-hand available on your platform. I didn’t realize the time-savings of this method until I saw a TAC engineer do this, and now I am addicted to it. If you look at this configuration:
configure terminal
interface FastEthernet0/0
switchport trunk encapsulation dot1q
switchport mode trunk
switchport trunk allowed vlan 10,100,2000-2010
switchport nonegotiate
end
copy running-config startup-config

You can do the same thing with a lot less typing, and you don’t have to tab-complete all the time either:
conf t
int fa0/0
sw tr en do
sw mo tr
sw tr al vl 10,100,2000-2010
sw no
end
cop ru st
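The shorthand works because, as far as I understand the IOS parser, each word only needs to be an unambiguous prefix of a keyword that is valid at that position. A toy Python resolver shows the rule; the keyword list is an illustrative subset, not a dump of any particular IOS release.

```python
def resolve(token, keywords):
    """Expand one abbreviated word the way the IOS parser does.

    An exact keyword always wins; otherwise the token must be the prefix of
    exactly one keyword that is valid at this point in the command.
    """
    token = token.lower()
    if token in keywords:
        return token
    matches = [k for k in keywords if k.startswith(token)]
    if len(matches) != 1:
        raise ValueError(f"{token!r} is ambiguous or unknown: {matches}")
    return matches[0]


# Illustrative subset of the keywords accepted after "switchport mode".
MODE_KEYWORDS = ["access", "dot1q-tunnel", "dynamic", "private-vlan", "trunk"]

print(resolve("tr", MODE_KEYWORDS))  # trunk
try:
    resolve("d", MODE_KEYWORDS)
except ValueError as err:
    print(err)  # ambiguous: matches dot1q-tunnel and dynamic
```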

This saves time, and whether you’re designing, testing, configuring or troubleshooting it pays to get more work done in less time.

You can add or remove VLANs from a trunk without having to type the whole line:
interface FastEthernet0/0
switchport trunk allowed vlan remove 100
switchport trunk allowed vlan add 200

On all platforms you can use the | include command to match basic text searches on any show command:
show run | include username
This command easily shows the usernames and privilege levels of those users in the configuration.

This handy command shows only CPU processes that are actually using CPU cycles, and it illustrates the | exclude command to filter out text:
show proc cpu | exclude 0.00

I often want to see configuration that I know starts somewhere in the middle, or bottom of the config. On simple router configs it isn’t a big deal to page through the data, but complex switches like the 6500 series can easily get to be thousands of lines long and paging through all that can get tiring (and boring). Use the | begin command to match text and start showing the configuration there:
show run | begin vty
This will start displaying the configuration at the first line that matches “vty” and now you can review your remote access configuration without having to hit the space bar a few hundred times.

On router platforms you can use the | section command to match entire sections of the configuration:
show run | section router
This will list the full configuration of any dynamic routing protocols you have configured. Try this on a CME router and you’ll really be happy when you can list all the configured ephones and ephone-dns:
show run | section ephone-dn

If you’re testing the bandwidth of a link you might want to get interface statistics:
show interface Gi0/8 | i rate
Queueing strategy: fifo
5 minute input rate 3000 bits/sec, 3 packets/sec
5 minute output rate 4000 bits/sec, 4 packets/sec

But you’ll notice this is a 5 minute statistic, which means you’ll have to load this interface for at least 5 minutes before you get a true reading. For the impatient there is a solution: set the statistics interval to 30 seconds (the minimum):
conf t
interface Gi0/8
load-interval 30
sh int gi 0/8 | i rate
Queueing strategy: fifo
30 second input rate 1000 bits/sec, 1 packets/sec
30 second output rate 1000 bits/sec, 1 packets/sec

That’s better – now you only have to load the interface for 30 seconds to figure out the utilization. As far as data-rate analysis goes it is as close as you need to get most of the time; you’ll have to use different methods to get more granular than this.
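Cisco describes these interface rates as exponentially weighted averages rather than simple 5-minute totals, so even the 5-minute figure only shows about 63% of a sudden change after 5 minutes. The Python model below assumes that load-interval sets the averaging time constant and that the average is updated every 5 seconds (both assumptions, not documented IOS internals), and compares the two intervals after traffic steps from 0 to 1:

```python
import math


def settled_fraction(load_interval, elapsed, sample_every=5):
    """Fraction of a step change shown by an exponentially weighted average
    with time constant `load_interval` after `elapsed` seconds."""
    alpha = 1 - math.exp(-sample_every / load_interval)
    average = 0.0
    for _ in range(int(elapsed // sample_every)):
        average += alpha * (1.0 - average)  # true rate jumped from 0 to 1
    return average


print(round(settled_fraction(300, 300), 2))  # 0.63
print(round(settled_fraction(30, 30), 2))    # 0.63
print(round(settled_fraction(30, 90), 2))    # 0.95
```

With a 30 second load-interval the reading is within about 5% of the real rate after a minute and a half, which is why the shorter interval feels so much more responsive.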

Macros Make Your Life Easier

Macros are sequences of commands stored on the device (router/switch/whatever) that automate common tasks.  The most common application of a macro is setting interface configurations in complex environments.  For example, imagine if you ran a single 6500 chassis with a few hundred interfaces, and your client was always changing the interface designation from these profiles: end user desktop, server, and IP phone.

Sure you could do this manually.  You could even have these various configurations as templates in a text file that you just paste in whenever you need it.  This totally works, and is pretty much how everybody does it.

There is another way, and it is called Macros.  A macro is just a series of configuration commands that are remembered by your device, and that you can apply when you need them.

The Macro Template


This is a macro to represent the default configuration of an interface. The objective here is to be able to use the UNUSED macro whenever an interface is to change its profile.


This SERVER macro puts the server in the right VLAN, and also enables some STP functions that would otherwise have been disabled.  I always run STP facing servers — just in case somebody accidentally creates a switching loop within the server architecture.


This macro handles an interface that faces an IP phone with a desktop attached. In this case we need to assign the voice and access VLANs, enable PoE, enable Auto QoS and enable CDP. CDP lets Cisco phones learn the voice VLAN automatically and negotiate the required power levels, and it lets the switch detect the phone so Auto QoS can prioritize voice traffic.


Lastly the DESKTOP macro assigns the access VLAN, and leaves all other configuration at default.

Using Macros

So now that you’ve done all this work, how does it make your life easier?  Like this:


And that’s it! You’ve just configured 48 interfaces with the right templated configurations. Maybe you need to make sure that interfaces 8 to 16 are configured as desktop ports?


The nice thing about this is that your templates are stored on the device, and not on your laptop/desktop so you can make these changes from anywhere.

Other Uses for Macros

Macros also work in global configuration mode, so you can use them to prevent accidental slip-ups.  I made an earlier posting about blackholing IP addresses with BGP; one of the caveats is that it is possible for my client to accidentally blackhole his entire network.  Naturally this would be the worst possible scenario, even worse than a DDoS.

To accommodate this, I created a macro that takes a single IP address as an input and then writes the appropriate command to blackhole just one IP.


In this case, $IP is a variable that is accepted by the following command:


This takes a single IP and correctly applies the route so that there is no chance of a tired, over caffeinated, stressed out finger accidentally setting the wrong mask and blackholing more IPs than are necessary.
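The same safety net can be sketched off the device. The hypothetical Python helper below does what the $IP macro is meant to do: it accepts exactly one IPv4 address and always renders a /32 route, reusing the `Null0 tag 111` convention from the BGP blackhole configuration earlier. The 203.0.113.10 address is a documentation example.

```python
import ipaddress


def blackhole_route(ip_text, tag=111):
    """Render a Null0 route for exactly one host address.

    IPv4Address() rejects anything with a mask or prefix length, so a typo
    like '203.0.113.0/24' raises ValueError instead of blackholing a subnet.
    """
    host = ipaddress.IPv4Address(ip_text.strip())
    return f"ip route {host} 255.255.255.255 Null0 tag {tag}"


print(blackhole_route("203.0.113.10"))
# ip route 203.0.113.10 255.255.255.255 Null0 tag 111
```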

How To Change The IP Address Or Management VLAN Of A Device Remotely

One of the challenges of working with remote devices is when you have to change the IP address. For example, if you have to change an IP from to you might do this:

The Maverick Approach (when you don’t care about downtime)

Connect to the switch at


Then connect to the switch at


And that’s it! If your initial IP change didn’t work your switch will reload and you’ll be back at, and you can try again.

A Safer Approach

Connect to the switch at


Then connect to the switch at


Then connect to the switch at to make sure it worked, and to remove the staging IP.


We have to go through this contortion of using a third, temporary IP because Cisco does not permit you to have a secondary IP without a primary IP configured.

More Complex Changes

The solution above works if you want to change the IP, but what if you need to do something more complex? What if you need to move the management IP from one VLAN to another? This might happen if you’re in an environment that was using VLAN 1 everywhere, and you’ve decided to enact one of the recommendations in the Cisco Best Practices guide, so now you need to move the management IP from VLAN 1 to VLAN 777.

In this case you can’t just configure in VLAN 777, because that subnet already exists in VLAN 1.  You can’t remove the IP from VLAN 1, because then you’ll lose your connection to the device.
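IOS refuses the second address because two interfaces in the same routing table can’t hold overlapping subnets. A short Python check with hypothetical addresses (the real ones weren’t given) shows the conflict:

```python
import ipaddress

# Hypothetical addressing; the addresses in the original post were not given.
vlan1_svi = ipaddress.ip_interface("192.0.2.10/24")
vlan777_svi = ipaddress.ip_interface("192.0.2.10/24")


def overlaps(a, b):
    """True if two interface addresses sit in overlapping subnets."""
    return a.network.overlaps(b.network)


print(overlaps(vlan1_svi, vlan777_svi))  # True
```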

The solution is to use a script, as below. You’ll note that my script included changing the VLAN of interface FastEthernet 1/0/1 to VLAN 777; this is the interface that my connection is coming through, and because my management IP is going to be on VLAN 777 it is necessary to do this. Make sure you think about what the final configuration will look like after your script completes: you need to be able to connect to this device, or else you’re going to have to reload and start over.


Create this script in a text file, and copy it to your device.  I used tftp.


Then I can confirm the contents are what I think they should be like this:


That looks right, so we can apply the script now.  I’m cautious when I’m working remotely, so I always set a reload timer; this way if something goes really wrong I can always get back to the original configuration.


Now we can start the VLAN change!


When I did this, my SSH session didn’t even drop. All this really does is copy the contents of the file flash:/device-vlan-script.text right into the running configuration, and the device treats the commands just like it would when it is booting up. All I have to do now is cancel the timed reload.
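Since the original script isn’t reproduced here, the following is a hypothetical Python generator for a file of the same shape, using FastEthernet1/0/1 and VLANs 1 and 777 from the text and a made-up 192.0.2.10/24 management address. The order matters: VLAN 777 is created, the port we are connected through moves to it, the address leaves VLAN 1 before it is added to VLAN 777, and the exec commands in the trailing comment follow the copy, verify, `reload in`, apply, `reload cancel` sequence described above.

```python
def vlan_move_script(access_port, old_vlan, new_vlan, address, mask):
    """Build a config file that moves the management IP to a new VLAN.

    The address must leave the old SVI before it is added to the new one,
    because IOS rejects an overlapping subnet on a second interface.
    """
    lines = [
        f"vlan {new_vlan}",
        f"interface {access_port}",
        f" switchport access vlan {new_vlan}",
        f"interface Vlan{old_vlan}",
        " no ip address",
        f"interface Vlan{new_vlan}",
        f" ip address {address} {mask}",
        " no shutdown",
        "end",
    ]
    return "\n".join(lines) + "\n"


script = vlan_move_script("FastEthernet1/0/1", 1, 777, "192.0.2.10", "255.255.255.0")
print(script, end="")

# Exec-mode sequence on the switch (a sketch, not a transcript):
#   copy tftp: flash:/device-vlan-script.text
#   more flash:/device-vlan-script.text
#   reload in 10
#   copy flash:/device-vlan-script.text running-config
#   reload cancel
```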


BGP Blackhole Community

A client of mine is in the server-hosting (bandwidth selling, as he calls it) business, and as such he has a lot of public IP addresses attached to servers that he doesn’t directly manage. These servers are sometimes the target of internet attacks that can eclipse the legitimate traffic of his entire business. My client brought me in to advise on ways to mitigate the risks of this.

A Simple Model

The first time I spoke with this client, he had a very simple network model.

The network provider acted as the default gateway for several subnets, and all networking gear onsite was Layer 2 only. This created some interesting failure modes, particularly in a Denial of Service event. The scenario that brought me on board was when my client disabled the victim of an attack (I think he unplugged that server from the network), but this caused a flood of traffic to all ports on the network. This was a natural consequence of the network design in place at the time: when my client unplugged that host, the switches forgot its MAC address (a 10 minute timer), and switches WILL flood unicast traffic to all ports when they do not know the destination interface for a particular MAC address.

To solve this problem, on the aggregation switch I created a Layer 2 ACL that would drop all traffic for this MAC address.  Once this was in place the traffic would still come across the network link to the aggregation switch, but it would be dropped there and this traffic would not be forwarded to all ports.
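A toy learning switch in Python makes the flooding behaviour easier to see. It is a simplification (no VLANs, no real frames), the MAC names are placeholders, and it uses the 10 minute aging timer mentioned above; the `drop` set stands in for the Layer 2 ACL on the aggregation switch.

```python
class LearningSwitch:
    """Toy transparent bridge: learn source MACs, age them out, flood unknowns."""

    def __init__(self, ports, aging_seconds, clock):
        self.ports = set(ports)
        self.aging = aging_seconds
        self.clock = clock   # callable returning the current time in seconds
        self.table = {}      # mac -> (port, last_seen)
        self.drop = set()    # MACs blocked by an L2 ACL on this switch

    def receive(self, src, dst, in_port):
        """Return the set of ports this frame is forwarded out of."""
        now = self.clock()
        self.table[src] = (in_port, now)
        if dst in self.drop:
            return set()
        entry = self.table.get(dst)
        if entry and now - entry[1] < self.aging:
            return {entry[0]} - {in_port}
        self.table.pop(dst, None)      # aged out or never learned
        return self.ports - {in_port}  # unknown unicast: flood to every other port


now = [0.0]
sw = LearningSwitch(ports=range(1, 49), aging_seconds=600, clock=lambda: now[0])
sw.receive(src="victim", dst="gateway", in_port=10)              # victim learned on port 10
print(sw.receive(src="gateway", dst="victim", in_port=48))       # {10}
now[0] = 601.0                                                   # unplugged; entry ages out
print(len(sw.receive(src="gateway", dst="victim", in_port=48)))  # 47 ports flooded
sw.drop.add("victim")                                            # the L2 ACL fix
print(sw.receive(src="gateway", dst="victim", in_port=48))       # set()
```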

Moving to a Layer 3 Solution

After some deliberation with my client, and discussion of his long-term plans I recommended that he take over the routing for his own network.

In this configuration we installed two 3750 switches, and arranged with the provider to have a small routed network between the 3750s and their network. They set up static routes for my client’s networks, and I just configured a default route back to the provider. To provide additional availability, we stacked the 3750s and set up LACP bundles to each of the Server Switches. I had my client set up a server that monitored all traffic with NTOP; this allowed us to see exactly what IP was being attacked so my client could take steps to resolve the issue.

The steps at this point were to write a Layer 3 ACL to drop traffic for that IP, and contact the network provider to ask them to drop traffic for this IP for 24 hours. The idea here was to minimize the impact of the attack on the rest of the network, because the DDoS attacks were completely saturating my client’s network connection. I considered writing QoS templates to use in the event of an attack, but the variety of incoming attacks and the knowledge level of my client made this unworkable; the problem here wasn’t how to rate-limit the traffic, but how to identify which rate-limiting mechanism to put into place at the right moment.

Unfortunately the frequency and severity of the DDoS attacks have increased. My client’s network provider has stated they are unable to offer any additional active DDoS protection, and in fact the provider was becoming frustrated with handling my client’s requests to block incoming traffic.

Private BGP and Blackhole Communities

This led us to the current solution, which was to enact a private BGP relationship with the provider.  Here we advertise the routes that are active on my client’s network, but my client has the ability to tag particular routes with a community identifier which the provider uses to indicate a route to Null; essentially my client can enforce a blackhole route on the provider’s network without having to call the provider, and the blackhole route is effective almost immediately.  The advantage here is that the provider is not annoyed with my client, and he has much clearer control over the networks that are blocked.


Here you can see that we have specified a BGP AS in the private AS range (64512-65534). With this model, all my client has to do is add a route with this tag, and it will be blocked by the provider, saving his bandwidth costs and saving the rest of his customers from a bad networking experience.

Showing the Status of the Blackhole Community

Once you have blocked a route, you can verify that it was taken up in BGP like this:


(mis)Adventures with Spanning Tree


I know I’ve gone on and on about STP (Spanning Tree Protocol), but something happened today that reminded me of what Tom Jacoby of IOSecure once said to me. I don’t remember the exact phrase, but the general idea was that STP is prone to both code errors and administrative errors, and even though a properly configured STP setup works, not very many people really understand it.

That said, I’ve put together some nice STP setups in my career. Compared to some of the more modern methods we have, using it for fault tolerance is old tech, but there is something satisfying about building the configuration. Maybe it is because it is hard and complicated; I’m not even sure why I like it.

What is on my mind today is a client who is growing out of a bunch of 3500XLs into 2950 switches. Even the 2950s aren’t exactly new, but this is my most cost-sensitive client, so I’m a bit hamstrung on the hardware. The configuration has a 3750 (stack of two switches) for the core that provides routing for the network, and a flat network beyond that is pushed to 19 edge switches (3500XLs of varying code levels, and the newer 2950s). There is no trunking, just access ports to the edge switches, and everything is running STP with the 3750 stack as the root switch.

Now, this isn’t a best-practice network by any means, but the customer has a legacy configuration to migrate from, and it may very well be with them for some time. Part of my job is to provide guidance toward a scalable, stable and supportable network.

We had some problems setting up LACP on the 3500XLs (they only support Cisco’s proprietary EtherChannel, and in a very feature-poor way), but it was working great on the 2950s. My client was so impressed by the idea of redundancy for the edge switches that we created backup links for each edge switch (even the 3500XLs) and let STP sort out the loops wherever we weren’t using EtherChannel. Some time passed and things seemed to be working fine, until out of the blue all the 2950s stopped working.


The 2950s just dropped link. My client would connect a different port and it would light up, then drop the link again, with not even a light on the 3750 interface. I suspected some security profile issue, but since none was configured that seemed like a long shot, so we agreed that my client would swap the 2950s back out for his stock of 3500XLs and we would figure out the 2950s later. Unfortunately the problems persisted into the next day, so we decided to find out why the 2950s had failed.

The 3750 was reporting weird errors in its log:

Jul 4 22:40:31.438 UTC: %SW_MATM-4-MACFLAP_NOTIF: Host 0000.c6ca.9aff in vlan 1 is flapping between port Po19 and port Fa1/0/6

What the heck? MAC address flapping? There must be a loop somewhere. But because the 3750 wasn’t experiencing any actual CPU pain I left this and focused on the 2950s to see if I could get them working and back into production to save the day.
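When the core does log a MAC flap like this, a few standard Catalyst show commands on the 3750 can help narrow down which edge switches are involved. A sketch using the MAC address and ports from the log line above; these were not necessarily part of the original troubleshooting:

```
! Where is the flapping MAC currently learned?
show mac address-table address 0000.c6ca.9aff
! Which member ports make up Po19?
show etherchannel summary
! Which edge switch sits on the other port named in the log?
show cdp neighbors FastEthernet1/0/6 detail
! Which core ports are forwarding in VLAN 1?
show spanning-tree vlan 1
```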

The 2950 console reported this:


Another weird error! An err-disable state, and yet we hadn’t configured any security policies on the interface. This error message led me to Cisco’s site and an explanation of how the 2950s handle L2 keepalives, which are enabled by default (and so don’t show up in the configuration). According to Cisco, the 2950s send out a keepalive to detect physical wiring loops, but it will also trigger in the event of an STP loop anywhere on the network. You can disable the keepalive and get the switches to connect, but that doesn’t make the loop go away, so I set to work on finding it.
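For reference, a sketch of the interface-level workaround, on a hypothetical 2950 port FastEthernet0/1. Disabling the keepalive only hides the symptom; the loop is still there.

```
! Find ports in the err-disabled state
show interfaces status
configure terminal
 interface FastEthernet0/1
  ! Stop sending the loopback-detection keepalive (does not remove the loop)
  no keepalive
  ! Bounce the port to clear the err-disabled state
  shutdown
  no shutdown
 end
```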


I went through each of the 3500XLs, and discovered two in which STP had not blocked the second link to the 3750. This is how I figured out which ones were the culprits:

  1. show cdp neighbors
  2. show spanning-tree (brief)

On the first switch, CDP showed the 3750 connected on ports 24 and 48, but STP didn’t list port 48 at all, so I had to assume it was also forwarding, even though in that case STP should have shown it in the FORWARDING state. On the second switch, CDP again showed the 3750 on ports 24 and 48, and this time STP actually showed both ports in the FORWARDING state.

To resolve the issue, I just killed the second link. As soon as I did that the MAC flap entries in the 3750 log stopped, and the 2950s were able to connect again.
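Killing the link can also be done administratively instead of pulling the cable. A sketch, assuming the redundant uplink is FastEthernet0/48 on a 48-port 3500XL (the interface name is an assumption based on the port numbers above):

```
configure terminal
 interface FastEthernet0/48
  description Redundant uplink to 3750 disabled: STP was not blocking it
  shutdown
 end
! Save so the port stays down after a reload
write memory
```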


The moral of this story: don’t put all your eggs in the STP basket, especially with very old switches and very old STP code. Maybe a better moral is not to implement advanced features on old code. The 3500XLs are doing just fine at switching, but as soon as we asked them to do something out of the ordinary, two of them fell on their faces.

Upgrading The IOS On A Switch (The Right Way)

Since I’ve written a couple of posts describing flash upgrade horror stories, I thought I’d include a description of how to do it the right way.

Selecting an image

First verify your hardware model, using either the web interface or show version at the CLI.

  1. Go to cisco.com and log in with your CCO account.
  2. Click Support from the menu
  3. Click Download Software in the “select a task” window
  4. Choose Switches Software from the list
  5. Choose LAN Switches from the list
  6. Expand the Cisco Catalyst 2950 Series Switches group
  7. Choose the type of 2950 switch, in this example choose Cisco Catalyst 2950G 48 EI Switch
  8. Choose IOS Software from the list
  9. Select the software version you want

Some advice on choosing images:

  • Generally speaking, I’d say take the latest image. If your hardware is relatively old, the newest code is probably bug fixes only and won’t introduce new features (and their bugs). If this were newer hardware, I’d recommend carefully reading the release notes.
  • Never load deferred releases; these have serious stability or performance bugs. If the code you’re running right now is deferred, or isn’t listed at all, then an upgrade to newer code is very important.
  • Choose an image that fits your flash resources. I like the image C2950 EI AND SI IOS CRYPTO AND WEB BASED DEVICE MANAGER because it supports SSH and has a web interface (SDM) built in. But the web files take up flash space, so review your switch’s flash resources to make sure it can handle the larger file.

Prepare the TFTP server with images

You will need the tftp SERVER, not just the client. Check your documentation, but in Linux you usually have to modify /etc/default/tftpd to enable the tftpd service. Solarwinds makes a free win32 tftp server as well.

The tftpd root folder can be anywhere, but usually it is either /tftpboot or /var/lib/tftpboot; in Windows you can easily specify this folder to be wherever you want, but I’d recommend you create a tftpboot folder off the root of your drive. Put the tar files in the tftpd root folder.

It is a little easier to test these issues from a PC, so you can save some troubleshooting time by checking whether a PC TFTP client can download the image. Windows includes a tftp client (on newer versions you may need to enable it as an optional Windows feature), so just log in and download a file to test. If you’re running Linux, you might have to install the tftp client from your distribution’s repos.

Loading the new image

Connect to the CLI, and go into enable mode. You can tell you’re in enable mode when you see the # in the prompt.


Verify the space in the flash with dir. Chances are pretty good there is only room for one image at a time, but if you’re lucky you can fit two images.


If by some happy chance you’ve got more than 16 MB of flash, you probably have room for two images. But that is probably not the case, so you will have to clear out the flash. You can go file by file with delete, but you’re better off just recursively wiping it like this.
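A sketch of the cleanup, assuming a hypothetical old image directory name (check dir for yours). This variant deletes only the old image directory rather than everything in flash, which is slightly safer than a full wipe because config.text and vlan.dat survive.

```
! List flash contents and free space; note the old image directory name
dir flash:
! Remove the old image directory and everything under it, without prompting
! (directory name is hypothetical)
delete /force /recursive flash:c2950-i6q4l2-mz.121-13.EA1c
! Confirm the space was freed
dir flash:
```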


Give yourself some peace of mind and write the current running configuration to flash again so it isn’t lost when you reboot. On the 2950 the startup config is actually stored in flash as config.text, so if your cleanup removed it, this step is what puts it back; don’t skip it.
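A sketch of that step. Because the 2950 keeps its startup configuration in flash as config.text, it is worth confirming the file is there before reloading:

```
! Save the running configuration (equivalent to write memory)
copy running-config startup-config
! config.text should appear in the listing
dir flash:
```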


Now the switch is ready to load the images from your TFTP server. Remember that TFTP is a simple UDP protocol with no integrity checking beyond the UDP checksum, so make sure the link between the TFTP server and the switch is not lossy (like wireless or the Internet); a server on the same IP subnet is a good solution.

You can load a binary IOS image over TFTP with copy tftp flash and then just follow the prompts. The problem is that the binary IOS image doesn’t include the web interface files, and you (probably) don’t have the space to copy the archive of binary plus web files and then extract it locally. So Cisco provides a way to download the archive and extract it on the fly.
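Both options as IOS commands, sketched with a hypothetical TFTP server at 10.1.1.10 and a hypothetical tar file name; use the file you actually downloaded. On the 2950, archive tar /xtract pulls a tar archive and unpacks it straight into flash:

```
! Option 1: binary image only, no web files (prompts for server and file name)
copy tftp flash
! Option 2: image plus web files, downloaded and extracted on the fly
archive tar /xtract tftp://10.1.1.10/c2950-i6k2l2q4-tar.121-22.EA14.tar flash:
```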


And now you watch the magic. Once the system has finished downloading and extracting the images, reload, and if all went well you’ll be running the new code.

NB: If you managed to squeeze two IOS binaries into flash, you will have to specify which one to boot from. This is not an issue if you erased the flash as described earlier.

  1. Figure out exactly which binary you want to boot from; use dir to list the files and look for those that end in “.bin”
  2. Make sure that there isn’t already a boot variable set; sh run | i boot should list any entries. If there are any, remove them. Boot entries are processed in order, so make sure your new image comes first in the config; you can add entries for any other images in flash after your new image.
  3. Set the new boot image:
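A sketch of steps 2 and 3, assuming the tar file was extracted into a hypothetical directory named c2950-i6k2l2q4-mz.121-22.EA14 (take the real path from dir):

```
! Step 2: look for existing boot entries
show running-config | include boot
configure terminal
 ! Clear any existing boot variable
 no boot system
 ! Step 3: point the switch at the new image (path is hypothetical)
 boot system flash:c2950-i6k2l2q4-mz.121-22.EA14/c2950-i6k2l2q4-mz.121-22.EA14.bin
 end
! Confirm the BOOT path before reloading
show boot
```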

Finally, write the new configuration and reload.


Recovering A Bad Flash — Part 2

Today I encountered another flash problem. I was working on an older switch, a WS-C2950G-48-EI running 12.1(13)EA1c. My client had asked me to upgrade the software because he wanted access to the html interface on the switch.

I’m at least an hour away from this client, so I wanted to do the upgrade from my office, but I don’t like running TFTP over the open Internet because it has no integrity checking.

With newer hardware we can upgrade over HTTP. This is fantastic because I can just put the code on a web server here in my office, punch a hole in the firewall and upgrade the switch. Easy as pie.

But this old switch only supports FTP and TFTP. I figured I’d give FTP a whirl; it is a bit more complex than HTTP (okay, a lot more complex), but it should work.
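A sketch of the FTP version, with hypothetical credentials, FTP server 203.0.113.5 and image file name. The ip ftp username and ip ftp password global commands set the defaults that copy uses for FTP URLs:

```
configure terminal
 ! Default FTP credentials for copy commands (hypothetical values)
 ip ftp username upgrade
 ip ftp password s3cret
 end
copy ftp://203.0.113.5/c2950-i6k2l2q4-mz.121-22.EA14.bin flash:
```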


And away you go! You can also do this:
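A sketch of that alternative, putting hypothetical credentials in the URL instead of the global configuration. Note that the password then appears on the command line and in the command history:

```
copy ftp://upgrade:s3cret@203.0.113.5/c2950-i6k2l2q4-mz.121-22.EA14.bin flash:
```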


You can do this too if you’re working with Cisco’s tar files:
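A sketch with hypothetical credentials, server address and tar file name, extracting the archive straight into flash:

```
archive tar /xtract ftp://upgrade:s3cret@203.0.113.5/c2950-i6k2l2q4-tar.121-22.EA14.tar flash:
```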


I initially had a problem with FTP getting through the firewall: the client was throwing a “no such user” error. That didn’t make sense because the username definitely works. I double-checked locally, and remotely from another server, to make sure my credentials worked.

The problem was that FTP uses a data channel (port 20) in addition to the control channel (port 21), which makes the sessions hard for a firewall to track. So I turned on FTP inspection on my office firewall and things seemed to work; at least the debug suggested that the client was able to log in.

You can debug FTP client sessions like this:
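A sketch of a debug session. terminal monitor is only needed when you are connected over Telnet or SSH rather than the console:

```
! Send debug output to this Telnet/SSH session
terminal monitor
! Show FTP client protocol exchanges (USER, PASS, RETR, ABOR and so on)
debug ip ftp
! Turn all debugging off when finished
undebug all
```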


But it still wasn’t working. The sessions would log in and start a download but never finish. The debugs showed that the client sent an ABOR command, which aborts the download.

Interestingly, many clients don’t handle the ABOR command well. The site FTPGuide.com says: The abort command may require “special action” to force recognition from the server. Unfortunately the only special action I had at my immediate disposal was to kill the terminal session.

That was my fatal mistake.

Three FTP sessions later, and I can’t view the flash drive anymore. sh flash and dir both report:


Now I’m in trouble. I had deleted the old image to make space for the new one, and now I can’t write the new image because the flash is unresponsive. I was able to run erase flash and format flash, but while these claimed to work, neither of them made the flash usable again.

Still, I suspected this problem was related to the FTP sessions I had attempted, so I found a new command (new to me):


Ah ha! Clearly this is my problem: these FTP sessions are tying up the flash. If only I could kill the PIDs listed here, but after much searching I found that Cisco doesn’t let us kill individual PIDs. I guess they feel that if a process doesn’t close nicely, something is so wrong with the device that it needs more help than just a kill.

But that doesn’t help me. Then I realized that I could clear the TCP sessions originating from the switch, as long as I knew their source and destination ports. I had my firewall log open and was able to pick out the ports and clear these sessions, but I could also have used show ip sockets to figure it out.
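A sketch of that cleanup. show tcp brief lists the switch’s TCP connections as local and foreign address:port pairs, and clear tcp takes the same four values; the addresses and ports below are hypothetical:

```
! List TCP connections with local and foreign address:port pairs
show tcp brief
! Socket view of the same information
show ip sockets
! Tear down one stuck FTP session (hypothetical switch 192.0.2.20, server 203.0.113.5)
clear tcp local 192.0.2.20 11003 remote 203.0.113.5 21
```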


After clearing a few of these, I was able to access the flash again and load an image back on. This time I asked my client to set up a TFTP server locally and used that.

The moral of the story: don’t use FTP.