Upgrading The IOS On A Switch (The Right Way)

Since I’ve made a couple of postings describing flash upgrade horror stories, I thought I’d include a description of how to do it the right way.

Selecting an image

First verify your hardware: either use the web interface or run show version at the CLI.

  1. Go to cisco.com and log in with your CCO account.
  2. Click Support from the menu
  3. Click Download Software in the “select a task” window
  4. Choose Switches Software from the list
  5. Choose LAN Switches from the list
  6. Expand the Cisco Catalyst 2950 Series Switches group
  7. Choose the type of 2950 switch; in this example, choose Cisco Catalyst 2950G 48 EI Switch
  8. Choose IOS Software from the list
  9. And select the software version you want

Some advice on choosing images:

  • Generally speaking, I’d say take the latest image. If your hardware is relatively old, then the newest code is probably bugfixes only and will not be introducing new features (and their bugs). If this were newer hardware, I’d recommend you carefully read the release notes.
  • Never load the deferred releases — these have serious bugs in stability or performance. If the code you’re running right now is deferred, or isn’t listed at all then I’d say an upgrade to newer code is very important.
  • Choose an image that fits the flash resources you have. I like the image C2950 EI AND SI IOS CRYPTO AND WEB BASED DEVICE MANAGER because it supports ssh, and it has a web interface (SDM) built in. But the web SDM takes up flash space, so review your switch’s flash resources to make sure it can handle the larger file.

Prepare the TFTP server with images

You will need the tftp SERVER, not just the client. Check your documentation, but in Linux you usually have to modify /etc/default/tftpd to enable the tftpd service. SolarWinds makes a free win32 tftp server as well.

The tftpd root folder can be anywhere, but usually it is either /tftpboot or /var/lib/tftpboot; in Windows you can easily specify this folder to be wherever you want, but I’d recommend you create a tftpboot folder off the root of your drive. Put the tar files in the tftpd root folder.
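Before serving the file, it is worth confirming that the download itself is intact. Cisco's download page lists an MD5 checksum next to each image, and the comparison can be sketched in a few lines of Python. This is only a minimal sketch: the function names are my own, and the path and checksum in the usage comment are placeholders, not real values.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, published_md5):
    """Compare the local file with the checksum copied from the download page."""
    return file_md5(path) == published_md5.strip().lower()


# Hypothetical usage (placeholder path and checksum):
# image_matches("/tftpboot/c2950-i6k2l2q4-tar.121-22.EA13.tar", "<md5 from cisco.com>")
```

A mismatch here means the file was damaged before it ever reached the switch, which is much cheaper to find out now than after the flash has been erased.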

It is a little easier to troubleshoot from a PC, so you can save some time by first checking that you can download the image with a PC tftp client. Windows ships with tftp, so just log in and download a file to test. If you’re running Linux then you might have to install the tftp client from your distribution repos.
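If no tftp client is handy, the test download can be sketched directly in Python, since the protocol (RFC 1350) is small: a read request to UDP port 69, then 512-byte DATA blocks that are each acknowledged, ending with a short block. The function names below are my own, and this is a rough test tool under those assumptions rather than a full client: it does not retransmit, so a lost packet simply ends in a socket timeout.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5  # opcodes from RFC 1350
BLOCK_SIZE = 512  # a DATA payload shorter than this ends the transfer


def build_rrq(filename, mode="octet"):
    """Read request: opcode 1, filename, NUL, transfer mode, NUL."""
    return (struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def build_ack(block):
    """Acknowledgement: opcode 4 followed by the block number."""
    return struct.pack("!HH", OP_ACK, block)


def parse_packet(packet):
    """Split a packet into (opcode, block number or error code, payload)."""
    opcode, number = struct.unpack("!HH", packet[:4])
    return opcode, number, packet[4:]


def tftp_get(server, filename, timeout=5.0):
    """Fetch `filename` from `server` and return its bytes (no retransmits)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    received = bytearray()
    expected = 1
    server_tid = None  # the server answers from a new port, not from 69
    try:
        sock.sendto(build_rrq(filename), (server, 69))
        while True:
            packet, peer = sock.recvfrom(4 + BLOCK_SIZE)
            if len(packet) < 4:
                continue  # too short to be a valid TFTP packet
            if server_tid is None:
                server_tid = peer
            elif peer != server_tid:
                continue  # ignore packets from any other sender
            opcode, number, payload = parse_packet(packet)
            if opcode == OP_ERROR:
                text = payload.rstrip(b"\x00").decode("ascii", "replace")
                raise OSError("TFTP error %d: %s" % (number, text))
            if opcode != OP_DATA:
                continue
            if number == expected:
                received += payload
                sock.sendto(build_ack(number), peer)
                if len(payload) < BLOCK_SIZE:
                    return bytes(received)
                expected = (expected + 1) & 0xFFFF
            elif number == (expected - 1) & 0xFFFF:
                sock.sendto(build_ack(number), peer)  # duplicate: re-ACK it
    finally:
        sock.close()
```

Calling tftp_get with your server's address and the tar file name should return the whole file as bytes. If it times out instead, look at the server configuration or a firewall before blaming the switch.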

Loading the new image

Connect to the CLI, and go into enable mode. You can tell you’re in enable mode when you see the # in the prompt.

[code]
User Access Verification
Password:
SW02>en
Password:
SW02#
[/code]

Verify the space in the flash with dir. Chances are pretty good there is only room for one image at a time, but if you’re lucky you can fit two images.

[code]
SW02#dir
Directory of flash:/
2   -rwx 3721946 Mar 07 1993 22:39:02 +00:00 c2950-i6k2l2q4-mz.121-22.EA13.bin
3   -rwx    2316 May 29 2008 20:32:10 +00:00 vlan.dat
4   -rwx     112 Mar 07 1993 22:36:10 +00:00 info
16  -rwx    2804 Mar 01 1993 00:08:25 +00:00 config.old
22  -rwx     333 Mar 07 1993 22:40:48 +00:00 env_vars
334 -rwx      24 Mar 01 1993 21:04:28 +00:00 private-config.text
6   drwx    4416 Mar 07 1993 22:40:04 +00:00 html
19  -rwx     112 Mar 07 1993 22:40:39 +00:00 info.ver
24  -rwx    2068 Mar 01 1993 21:04:28 +00:00 config.text
7741440 bytes total (2138624 bytes free)
[/code]
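If you capture the dir output from a few switches, the "will it fit" question can be answered with a small script. The sketch below assumes the output format of this 2950 (a closing "bytes total (… bytes free)" line, with the file size in the third column). The function names are my own, and the sample is a three-line excerpt of the listing from this switch.

```python
import re

SUMMARY = re.compile(r"(\d+) bytes total \((\d+) bytes free\)")

# Excerpt of a dir listing from a 2950 (most files omitted).
SAMPLE = """\
Directory of flash:/
2   -rwx 3721946 Mar 07 1993 22:39:02 +00:00 c2950-i6k2l2q4-mz.121-22.EA13.bin
24  -rwx    2068 Mar 01 1993 21:04:28 +00:00 config.text
7741440 bytes total (2138624 bytes free)
"""


def flash_summary(dir_output):
    """Return (total_bytes, free_bytes) from the last line of dir output."""
    match = SUMMARY.search(dir_output)
    if match is None:
        raise ValueError("no 'bytes total (... bytes free)' line found")
    return int(match.group(1)), int(match.group(2))


def images(dir_output):
    """List (filename, size) for every .bin file in the listing."""
    found = []
    for line in dir_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[-1].endswith(".bin") and fields[2].isdigit():
            found.append((fields[-1], int(fields[2])))
    return found


def fits(dir_output, new_image_bytes):
    """True if a file of new_image_bytes fits in the currently free space."""
    _, free = flash_summary(dir_output)
    return new_image_bytes <= free


print(flash_summary(SAMPLE))
print(images(SAMPLE))
print(fits(SAMPLE, 3721946))  # a second image the size of the current one
```

With these numbers the answer is no: about 2.1 MB is free and the existing image alone is about 3.7 MB. Keep in mind that the tar archive with the web files needs more room than the .bin by itself.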

If by some happy chance you’ve got more than 16 MB of flash you probably have room for two images. But I think this is not the case, so you will have to erase the flash. You can go file by file with delete, but you’re better off just recursively wiping it like this.

[code]
SW02#delete /recursive flash:
[/code]

Give yourself some peace of mind, and write the current running configuration to flash again to make sure it isn’t lost when you reboot. On the 2950 the startup config is stored as config.text on the same flash filesystem, so the recursive delete removes it along with everything else, and this step is what puts it back. I would do it anyway because it makes me feel better.

[code]
SW02#copy run start
[/code]

Now the switch is ready to load the images from your TFTP server. Remember that TFTP runs over UDP and has no error correction beyond a basic acknowledge-and-retry for each block, so make sure the link between the TFTP server and the switch is not lossy (like wireless or over the internet). A server on the same IP subnet is a good solution.

You can load a binary IOS image by tftp like this: copy tftp flash and then just follow the prompts in the script. But the problem is that the binary IOS image doesn’t have the web interface files, and you (probably) don’t have the space to copy the archived binary+webfiles and then extract it locally. So Cisco has a script that can download the archived binary+webfiles and extract it on the fly.

[code]
SW02#archive tar /xtract tftp://192.168.2.11/c2950-i6k2l2q4-tar.121-22.EA13.tar flash:
[/code]

And now you watch the magic. Once the system is done downloading and extracting the images, you will reload and if all went well, you’ll be running on the new code.

NB: If you managed to squeeze two IOS binaries onto one flash, you will have to specify which one you want to boot from. This is not an issue if you erased the flash as described earlier.

  1. Figure out exactly which binary you want to boot from; use dir to list the files and look for those that end in “.bin”
  2. Make sure that there isn’t already a boot variable set; sh run | i boot should list any entries. If they’re in there, remove them. These boot parameters are processed in order, so you want to make sure your new image is first in the config — you can add the lines for any other images you’ve got on the flash after your new image.
  3. Set the new boot image:
[code]
SW02#conf t
SW02(config)#boot system flash:/c2950-i6k2l2q4-mz.121-22.EA13.bin
SW02(config)#end
[/code]

Finally you write the new configuration and reload.

[code]
SW02#copy run start
SW02#reload
[/code]

Recovering A Bad Flash — Part 2

Today I encountered another flash problem. I was working on an older switch, a WS-C2950G-48-EI running 12.1(13)EA1c. My client had asked me to upgrade the software because he wanted access to the html interface on the switch.

Now I’m at least an hour away from this client so I wanted to do the upgrade from my office, but I don’t like to do TFTP over the greater Internet as there’s no error correction with this.

With newer hardware we can upgrade with HTTP — this is fantastic because I can just load the code onto a webserver here in my office, punch a hole in the firewall and upgrade the hardware. Easy as pie.

But this old switch only supports FTP and TFTP. I figured I’d give FTP a whirl — it is a bit more complex than http (okay a lot more complex) but it should work.

[code]
conf t
ip ftp username paul
ip ftp password somepass
end
copy ftp flash
[/code]

And away you go! You can also do this:

[code]
copy ftp://paul:somepass@serverip/filename.bin flash:
[/code]

You can do this too if you’re working with Cisco’s tar files:

[code]
archive tar /xtract ftp://paul:somepass@serverip/filename.tar flash:
[/code]

I initially had a problem with FTP getting through the firewall: the client was throwing an error, “no such user”. That didn’t make sense because the username definitely works. I double checked locally, and remotely from another server to make sure my credentials worked.

The problem was that FTP uses a separate data channel (port 20 in active mode) in addition to the control channel (port 21), and firewalls can have trouble tracking the two as one session. So I turned on FTP inspection on my office firewall and things seemed to work; at least the debug suggested that the client was able to log in.
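To see why the firewall has to inspect the traffic, it helps to look at how FTP (RFC 959) agrees on the data connection: the address and port travel as text inside the control channel, with the port split into two bytes. Here is a minimal Python sketch of that encoding. The function names are my own and the addresses are documentation examples, not the ones from my setup.

```python
import re


def parse_pasv(reply):
    """Extract (ip, port) from a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a PASV reply: %r" % reply)
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    # The port is sent as two decimal bytes, high byte first.
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2


def format_port(ip, port):
    """Build the PORT command a client sends in active mode."""
    return "PORT %s,%d,%d" % (ip.replace(".", ","), port >> 8, port & 0xFF)


print(parse_pasv("227 Entering Passive Mode (192,0,2,10,195,80)."))
print(format_port("192.0.2.10", 50000))
```

A firewall that only sees "a TCP session to port 21" has no idea that a second connection to port 50000 is about to be opened, or in which direction, unless it reads these lines. That is the job FTP inspection does.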

You can debug FTP client sessions like this:

[code]
debug ip ftp
! term mon redirects debug logs to your session
term mon
[/code]

But it still wasn’t working. The sessions would log in and start a download, but never finish. The debugs showed that the client sent an ABOR command, which aborts the download.

Interestingly, many clients don’t handle the ABOR command well. The site FTPGuide.com says: The abort command may require “special action” to force recognition from the server. Unfortunately the only special action I had at my immediate disposal was to kill the terminal session.

That was my fatal mistake.

Three FTP sessions later, and I can’t view the flash drive anymore. sh flash and dir both report:

[code]
%Error opening flash:/ (No such device)
[/code]

Now I’m in trouble. I’ve deleted the old image to make space for the new one, and now I can’t write the new image because the flash is non-responsive. I was able to erase flash and format flash but while these claimed to work, neither of them helped to make the flash usable again.

Still I suspected this problem was related to the FTP sessions I attempted, so I found a new command (new to me):

[code]
#sh file descriptors
File Descriptors: FD Position Open PID Path
0 0 0001 52 ftp://server/c2950-i6k2l2q4-tar.121-22.EA13.tar
1 0 0001 65 ftp://server/html.tar
2 0 0001 66 ftp://server/foo.txt
[/code]

Ah ha! Clearly this is my problem. These FTP sessions are tying up the flash. If only I could kill the PIDs listed here! But after much searching I find that Cisco doesn’t allow us to kill individual PIDs. I guess they feel that if a PID doesn’t close nicely then something is so wrong with the device that it needs more help than just a kill.

But that doesn’t help me. I realize that I can clear tcp sessions that are originating from the system, if only I know the source and destination ports. I had my firewall log open and was able to pick out the ports and clear these sessions, but I could also have used show ip sockets to figure it out.

[code]
#clear tcp local switchIP 11020 remote serverIP 21
[/code]
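With more than a couple of stuck sessions, typing these by hand gets tedious. The sketch below builds the clear tcp lines from a pasted connection table. Note the assumptions: I did not use show tcp brief at the time, the sample table is made up, and the parser assumes each row reads "TCB local-address.port foreign-address.port state". Check what your IOS version actually prints before trusting it.

```python
# Made-up sample in the assumed "show tcp brief" layout (address.port columns).
SAMPLE = """\
TCB       Local Address           Foreign Address        (state)
80C7A4F0  192.168.1.2.11020       203.0.113.5.21         ESTAB
80C7B118  192.168.1.2.23          192.168.1.50.51234     ESTAB
"""


def clear_commands(table, remote_port=21):
    """Return a 'clear tcp' command for every session to remote_port."""
    commands = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # "192.168.1.2.11020" -> ("192.168.1.2", ".", "11020")
        local_ip, _, local_port = fields[1].rpartition(".")
        remote_ip, _, rport = fields[2].rpartition(".")
        if not (local_port.isdigit() and rport.isdigit()):
            continue  # header row or anything else that is not a session
        if int(rport) == remote_port:
            commands.append("clear tcp local %s %s remote %s %s"
                            % (local_ip, local_port, remote_ip, rport))
    return commands


for command in clear_commands(SAMPLE):
    print(command)
```

Filtering on remote port 21 leaves the telnet session in the second row alone, which matters when that session is the one you are typing into.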

After clearing a few of these I was able to get access to the flash drive and load an image back on. This time I asked my client to set up a TFTP server locally and I used that.

The moral of the story — don’t use FTP.

Recovering A Bad Flash — Part 1

Some people are perfect, they never make mistakes, and they never put themselves in a position where they have to dig themselves out of whatever problem they’ve caused.

I’m not that person. I make mistakes, but I learn from them. I learn how to solve the problems those mistakes cause, and I learn more about the systems themselves. I have to think this is pretty normal — and as my mentor Tom Jacoby always says: “the definition of an expert is someone who has already made all the mistakes!”

So yesterday I made a mistake that made me feel like a junior tech all over again — if only until I knew how I was going to fix it. I was upgrading software on four pre-production routers, and one of these had a flash that was just big enough for one image at a time. Standard procedure here is to delete the old image, load a new one on and reload the system.

This was no problem — I had already done another identical machine and it came up fine… The problem was this was the last router I had to work on so while it was copying I started cleaning up and then I find myself holding the power cables for all the routers in my hands. They’re all offline.

And now the last router will not boot — it just drops into ROMMON. I’ve recovered hardware over console xmodem and that is not nice — it can take hours to transfer a 30 meg image. Thankfully these routers use CF cards, and thankfully I have a USB CF reader. I mounted the card on my laptop, copied the new image over, booted the system and I was off to the races.
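The "hours" figure for xmodem is easy to sanity check. A serial console with the usual 8N1 framing spends 10 bits on the wire for each byte, so a rough lower bound is just size times 10 divided by the line speed. The sketch below assumes a 30 MiB image and ignores xmodem's own per-block overhead, so real transfers are slower still.

```python
def transfer_seconds(size_bytes, baud, bits_per_byte=10):
    """Lower bound on serial transfer time: 8N1 framing costs 10 bits per byte."""
    return size_bytes * bits_per_byte / baud


IMAGE_BYTES = 30 * 1024 * 1024  # "30 meg", taken here as 30 MiB

for baud in (9600, 115200):
    seconds = transfer_seconds(IMAGE_BYTES, baud)
    print("%6d baud: %.1f hours (%.0f minutes)" % (baud, seconds / 3600, seconds / 60))
```

At the default 9600 baud console speed that works out to roughly nine hours, and still about 45 minutes at 115200, which is why the CF card was such a relief.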

So what did I learn? Don’t rush the job. Think about what you’re doing and if you’re in a hurry just put that hurry out of your mind. Small mistakes can cause huge amounts of pain — if these were production routers instead of pre-production it would have been a very difficult day for me.