Spanning-tree portfast

So I’m labbing up on some tech, and one of the labs concerned itself with portfast. Portfast is a legacy spanning-tree improvement made by Cisco that has found its way into Rapid Spanning Tree and MST alike in the form of edge ports. However, in Cisco IOS the portfast term can still be found.

Portfast enables a port to skip the listening and learning phases and go directly to the forwarding phase. This helps a port become operational much faster than without portfast. However, this shorter time to productivity isn’t even its biggest advantage. When a port has been configured for portfast it won’t generate Topology Change Notifications (TCNs) in spanning tree, which would otherwise cause the CAM tables elsewhere in the network to time out faster (or be flushed instantly). In a big network this is a huge advantage.
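In its classic form it is a single interface-level command, typically on an access port facing a host. A minimal sketch (the interface name is just an example):

SW1(config)# interface GigabitEthernet0/2
SW1(config-if)# switchport mode access
SW1(config-if)# spanning-tree portfast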

Portfast introductions aside, like I said I was labbing up on portfast. The lab manual said “configure port G0/0 as a portfast port” and “configure port G0/0 as a trunk port”. Normally I would configure this with the interface command:

 spanning-tree portfast trunk

Within my lab this worked as well; however, I saw the following when using the question mark:

SW1(config-if)#spanning-tree portfast ?
  disable  Disable portfast for this interface
  edge     Enable portfast edge on the interface
  network  Enable portfast network on the interface

The command spanning-tree portfast trunk doesn’t exist according to the context-sensitive help. The command does work, though, as can be verified:

SW1#sh spanning-tree interface g 0/0 portfast 
VLAN0001            disabled
VLAN0002            disabled
VLAN0005            disabled
VLAN0007            disabled
!
!
SW1(config-if)#spanning portfast trunk
%Warning: portfast should only be enabled on ports connected to a single
 host. Connecting hubs, concentrators, switches, bridges, etc... to this
 interface  when portfast is enabled, can cause temporary bridging loops.
 Use with CAUTION

SW1(config-if)#end
!
SW1#sh spanning-tree interface g 0/0 portfast 
VLAN0001            enabled
VLAN0002            enabled
VLAN0005            enabled
VLAN0007            enabled

However, according to the context-sensitive help the actual command should be:

spanning-tree portfast edge trunk

This is in line with the ‘new’ terminology introduced in Rapid Spanning Tree. Don’t confuse the above command with:

spanning-tree portfast network

This will enable Bridge Assurance on the port if Bridge Assurance is configured globally. Bridge Assurance is a topic for another post.
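To put the two variants side by side, here is a minimal sketch in the newer syntax (interface numbers are just examples):

interface GigabitEthernet0/0
 description Trunk towards a single end host, as in the lab scenario
 switchport mode trunk
 spanning-tree portfast edge trunk
!
interface GigabitEthernet0/1
 description Link towards another switch
 switchport mode trunk
 spanning-tree portfast network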

Cisco ACI and Nutanix Foundation Discovery

Nutanix uses IPv6 multicast for its Foundation discovery. When you do this on a flat layer 2 network this is no problem. However, when attempting to do this on ACI you need to enable a specific setting to make it work.

Nutanix itself doesn’t know which setting to enable; if you ask them they only give you instructions for enabling the IPv6 multicast settings on a ‘normal’ Cisco network. For those using ACI this is useless. I can also imagine that when working on a production network you don’t want to experiment with several settings.

For us it worked by enabling an ND policy on the Bridge Domain. For this you need to know which vlan is configured on the Nutanix node itself.

This setting can be found by going to Networks, Bridge Domains and then selecting the correct bridge domain. Here you can go to L3 networking. At the bottom of this page you’ll find the ND policy option. When you select the default policy and submit it, discovery should work.
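For those who prefer the API over the GUI, the same association can be made with a REST call against the bridge domain. Keep in mind that the class and property names below (fvRsBDToNdP, tnNdIfPolName) are written down from memory as an assumption about the object model, so verify them with the API Inspector first; the APIC address, credentials, tenant and bridge domain names are placeholders:

# Log in to obtain an APIC-cookie token (returned in the aaaLogin response)
curl -k -X POST https://<apic>/api/aaaLogin.json \
  -d '{"aaaUser":{"attributes":{"name":"<user>","pwd":"<password>"}}}'

# Associate the default ND interface policy with the bridge domain
# (fvRsBDToNdP / tnNdIfPolName are assumptions, double-check before using in production)
curl -k -b "APIC-cookie=<token>" -X POST \
  https://<apic>/api/mo/uni/tn-<tenant>/BD-<bridge-domain>.json \
  -d '{"fvRsBDToNdP":{"attributes":{"tnNdIfPolName":"default"}}}'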

There might be other solutions for this. Please share them if you know them.

Cisco ACI upgrade from 1.2 to 1.3

So last week I attempted an upgrade of our ACI environment from version 1.2 to version 1.3. I know 2.0 is already available, but it does not offer anything we need at this point, and the upgrade to 1.3 was done because of an annoying bug.

A minor upgrade shouldn’t be a big issue, but apparently it was.

We started the upgrade normally. We uploaded the new software and started the APIC upgrade. Easy. Just follow the upgrade instructions and you’ll be fine. The APICs will all install the new software and reboot when required. ACI is even smart enough to wait for a rebooting controller to come back online and pass all the health checks. You can’t do anything wrong.

 

At least, that’s what we thought. Apparently, after installing the new version we couldn’t reach the APICs using HTTPS anymore. After some troubleshooting we had the following information:

  • Ping doesn’t work
  • SSH does work
  • HTTP(S) doesn’t work

We started looking further. A colleague of mine tried to access the APIC from a server in the same network as the APICs (we use the out-of-band addresses on the APICs). That worked. We were baffled. Because of this behaviour we knew it had to be a policy on the APIC itself. Fortunately, using the server in the same subnet as the APIC we had HTTPS access to ACI again, which made it possible to troubleshoot. However, since we’re both fairly new at this and weren’t the guys who implemented the network, we didn’t know where to look.

Fortunately the supplier did know where to look and helped us fix the problem. It was indeed a policy. I’ll come back to this topic in a bit.

 

Unfortunately this issue was my own fault. It is documented in the release notes for version 1.2(2), which I only glanced over when preparing for the change. The actual text from Cisco is:

When upgrading to the 1.2(2) release, a non-default out-of-band contract applied to the out-of-band node management endpoint group can cause unexpected connectivity issues to the APICs. This is because prior to the 1.2(2) release, the default out-of-band contract that was associated with the out-of-band endpoint group would allow all default port access from any address. In 1.2(2), when a contract is provided on the out-of-band node management endpoint group, the default APIC out-of-band contract source address changes from any source address to only the local subnet that is configured on the out-of-band node management address. Thus, if an incorrectly configured out-of-band contract is present that had no impact in 1.2(1) and prior releases, upgrading to the 1.2(2) release can cause a loss of access to the APICs from the non-local subnets.

These release notes can be found here.

For all of you preparing to do the upgrade from 1.2(1) to a higher version, please remember this one, as it will bite you.

To check whether you will encounter this you can go to Tenants > mgmt > Node Management EPGs > Out of Band EPG – default.

Here you can view whether you use the default contract. In our case a non-default contract was specified here. You can look up this contract at: Tenants > mgmt > Out of Band Contracts > Name of your contract

You need to specify HTTPS access in this contract to be able to reach the APIC.
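Since SSH kept working in our case, you can also inspect these objects from the APIC shell with moquery. The class name for out-of-band contracts below (vzOOBBrCP) is my assumption based on the object model, so double-check it on your own fabric:

# List the out-of-band contracts defined under the mgmt tenant
moquery -c vzOOBBrCP

# List the filter entries; make sure the contract you use contains one permitting tcp/443 (https)
moquery -c vzEntry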

Unfortunately I can’t post any screenshots here as the referenced environment is a production environment which I’m not allowed to show, but if you have any questions or need clarification, please let me know.

OTV terminology

Within OTV there are several terms that might be confusing. This is a list of all the terms I’ve encountered so far, with an explanation; a configuration sketch that ties them together follows the list.

  • OTV edge device: This is the switch that performs all the OTV operations. Layer 2 frames enter the switch and, if they need to be transported over the overlay network, they are encapsulated here and sent out of the join interface. All the OTV configuration is done on these devices (with the possible exception of MTU configuration).
  • OTV internal interface: This is the interface on the edge device that points toward the datacenter network. It is a layer 2 interface and it needs to support all the vlans that need to be extended across the overlay network. To be able to do so it must be a (dot1q) trunk port.
  • OTV join interface: This is the interface on the edge device that points toward the routed network that carries the data toward the other datacenters.
  • OTV overlay interface: A logical interface on the edge device. A large part of the OTV configuration is done on this interface. It ensures the encapsulation of the layer 2 frames into layer 3 packets.
  • Transport network: This is the IP network that carries the encapsulated layer 2 frames across to the other datacenter(s). The only requirement for this network is that it is an IP network. It must also support jumbo frames, or you need to reduce the MTU in your network to ensure the additional overhead of OTV does not cause any problems (OTV does not support fragmentation).
  • Overlay network: The logical network that connects OTV devices.
  • Site: Most often a datacenter, but nothing prevents you from connecting a campus to an OTV network, so it might be a campus as well.
  • Site vlan: The vlan used by OTV edge devices within a site to communicate. This vlan must not be extended across the transport network. All OTV edge devices within a site need to agree on the site vlan. Because the vlan is not extended you could in theory use the same vlan on all sites.
  • Authoritative Edge Device: When using multiple OTV edge devices an AED is required. This is the device responsible for forwarding the data for a specific vlan. The AED role is elected per vlan, so when using two OTV edge devices you can use both actively, but for different vlans. (Only one edge device is allowed to forward traffic for a specific vlan because of the removal of BPDUs; if both devices were to forward traffic this might cause loops or MAC flapping.)
  • Site-ID: An identifier for the site that all Edge devices in that site must agree upon.
  • OTV Shim: An OTV-specific header which is added to the encapsulated data to convey, among other things, the vlan from which the frame originated (and to which it must be restored).
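To make these terms a bit more concrete, below is a minimal multicast-mode configuration sketch for a single OTV edge device on a Nexus 7000. The interface numbers, vlans, site identifier and multicast groups are just examples:

feature otv
otv site-vlan 15
otv site-identifier 0x1
!
interface Ethernet1/1
  description OTV join interface towards the transport network
  ip address 192.0.2.1/30
  ip igmp version 3
  no shutdown
!
interface Ethernet1/2
  description OTV internal interface (dot1q trunk towards the datacenter)
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 100-110
  no shutdown
!
interface Overlay1
  description OTV overlay interface
  otv join-interface Ethernet1/1
  otv control-group 239.1.1.1
  otv data-group 232.1.1.0/28
  otv extend-vlan 100-110
  no shutdown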

I might encounter other OTV specific terms, and if I do I will list them here.

Overlay Transport Virtualization basics

I’ve had the privilege of working with OTV on several occasions the last couple of years and I’ve fallen in love with the technology. The simplicity of the technology is striking, even though the configuration can be complex in some cases.

OTV is designed solely as a datacenter interconnect (DCI) technology, and as such it is very good at it. Even though other solutions are available based on newer technologies like VXLAN (for example ACI), OTV still has its merits (I believe).

The goal of OTV is to extend layer 2 segments across multiple datacenters. In earlier years this was a no-go area for many companies, limiting their (virtual) environments to a single datacenter. This severely limited their options for scalability and robustness. The reason many companies did not want to extend their layer 2 segments across datacenters is that it introduced many risks of catastrophic failure. With an extended vlan between datacenters you risked broadcast storms or spanning-tree misconfigurations bringing down your entire environment. To be honest, I’ve seen this happen on several occasions.

What makes OTV so special is that it enables you to extend your layer 2 segments without these dangers. It is built in such a way that it creates multiple failure domains. It does this by blocking BPDUs at the datacenter edge and using an IP transport network. The layer 2 frames are encapsulated in a GRE-like manner and transported to the other datacenter (only when needed). There they are decapsulated and sent on their way.

OTV has several smart constructs to limit the amount of traffic between two datacenters, for example ARP suppression, in which ARP requests are handled locally when possible. Only if the system that needs to respond to the ARP is in another datacenter and is not yet known to the OTV edge switches will the ARP request be forwarded to the other datacenter(s).
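On a Nexus 7000, ARP suppression is enabled on the overlay interface and the locally answered entries can be inspected afterwards. A minimal sketch, reusing the hypothetical Overlay1 interface from the terminology post (check whether your platform already has suppression enabled by default):

! Enable ARP/ND suppression on the overlay interface
interface Overlay1
  otv suppress-arp-nd

! Afterwards, view the ARP entries the edge device answers locally
show otv arp-nd-cache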

There is a lot to tell about OTV, and I will probably create more posts about it in the future.
