Critical Bug in NSX – OSPF
I would like to share a critical bug in NSX that appears when OSPF is used as the routing protocol.
NSX supports the dynamic routing protocols BGP and OSPF (IS-IS has been removed from the latest versions of NSX).
Dynamic routing protocols can be configured on both the DLR Control VM and the Edge Gateway.
The Edge Gateway can be deployed either in an Active – Active setup or in HA (Active – Standby) mode.
OSPF supports authentication:
- OSPF performs authentication at the area level.
- All routers within the area must use the same authentication type and the same password.
- For MD5 authentication to work, both the receiving and transmitting routers must have the same MD5 key.
Types of Authentication:
- None: No authentication is required. This is the default value.
- Password: The password is sent in clear text over the network.
- MD5: This method uses the MD5 (Message Digest 5) hash algorithm. The password itself does not pass over the network. Instead, an MD5 digest computed from the packet and the shared key is included in the transmitted packet.
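To make the MD5 method concrete, here is a minimal sketch of how the digest is formed and checked, based on the general scheme in RFC 2328 (the key is padded to 16 bytes, appended to the packet, and the MD5 hash of the result is sent). The function names and the sample packet bytes are made up for illustration, and the sketch leaves out the real OSPF header fields, so treat it as a simplified model rather than what NSX actually runs.

```python
import hashlib
import hmac

KEY_LEN = 16  # RFC 2328 uses a 16-byte MD5 key


def ospf_md5_digest(packet: bytes, key: bytes) -> bytes:
    """Hash the packet followed by the zero-padded shared key (hypothetical helper)."""
    padded_key = key.ljust(KEY_LEN, b"\x00")[:KEY_LEN]
    return hashlib.md5(packet + padded_key).digest()


def verify(packet: bytes, digest: bytes, key: bytes) -> bool:
    """The receiver recomputes the digest with its own key and compares."""
    return hmac.compare_digest(ospf_md5_digest(packet, key), digest)


# Illustrative bytes only, not a real OSPF Hello.
hello = b"example-ospf-hello"
sent_digest = ospf_md5_digest(hello, b"s3cret")

same_key_ok = verify(hello, sent_digest, b"s3cret")
wrong_key_ok = verify(hello, sent_digest, b"other")
```

The key never appears in what is sent; only the digest does. A router configured with a different key computes a different digest and drops the packet, which is why both sides must share the same MD5 key.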
When the NSX Edge gateway is configured in HA (Active – Standby) mode with MD5 authentication enabled, the standby edge gateway takes much longer than expected to start forwarding traffic after the active edge gateway fails. Even when graceful restart is enabled, OSPF fails to restart gracefully.
OSPF graceful restart does not work normally when OSPF authentication is set to MD5. The MD5 sequence number should be maintained before and after the graceful restart, but in this case the sequence number gets reset. As a result, the OSPF adjacencies are deleted after the NSX Edge HA failover and are re-established only after the OSPF dead timer expires.
Example: if the OSPF Hello interval is 30 seconds and the Dead interval is 120 seconds, it takes nearly 120 to 130 seconds after the active edge gateway fails for the standby edge gateway to start forwarding traffic.
In a production environment, this is a long downtime with a huge impact.
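The effect of the sequence number reset can be sketched with a small model. It assumes the neighbor follows the RFC 2328 rule of dropping any packet whose sequence number is lower than the last one it stored, and that it forgets the stored value only when its dead timer expires. The function name, the sequence values, and the 5-second offset of the first Hello are hypothetical; this is a model of the mechanism, not NSX code.

```python
from typing import Optional


def first_accepted_hello(hello_interval: int, dead_interval: int,
                         stored_seq: int, new_seq_start: int,
                         first_hello_at: int = 0) -> int:
    """Return the time (seconds after failover) of the first Hello the neighbor accepts.

    Assumes the neighbor received its last valid Hello at t=0, so its dead
    timer expires at t=dead_interval.
    """
    last_seq: Optional[int] = stored_seq  # highest sequence seen from the failed edge
    t, seq = first_hello_at, new_seq_start
    while True:
        if last_seq is not None and t >= dead_interval:
            last_seq = None  # dead timer expired: neighbor state and sequence are dropped
        if last_seq is None or seq >= last_seq:
            return t  # Hello passes the sequence check
        t += hello_interval  # Hello dropped; the edge retries at the next interval
        seq += 1


# Sequence reset to 1 after failover, first Hello sent 5 seconds in.
reset_case = first_accepted_hello(30, 120, stored_seq=5000, new_seq_start=1, first_hello_at=5)

# Sequence preserved across failover (what graceful restart needs).
preserved_case = first_accepted_hello(30, 120, stored_seq=5000, new_seq_start=5001, first_hello_at=5)
```

With these assumed values, the reset case is accepted only at 125 seconds, which falls in the 120 to 130 second window described above, while the preserved case is accepted on the first Hello at 5 seconds.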
This is a known issue in NSX. VMware has published the Issue ID below, and it is not yet resolved even in the latest version.
Issue 1747978: OSPF adjacencies are deleted with MD5 authentication after NSX Edge HA failover
In an NSX for vSphere 6.2.4 environment where the NSX Edge is configured for HA with OSPF graceful restart configured and MD5 is used for authentication, OSPF fails to start gracefully. Adjacencies forms only after the dead timer expires on the OSPF neighbor nodes.
Workaround: None
As an alternative, use Password authentication instead of MD5 authentication.