VXLAN in NSX-V - Troubled Underlay

Greetings, and first a small lyrical digression. I sometimes envy colleagues who work remotely: it is great to be able to work from anywhere in the world with an Internet connection, take holidays whenever you like, and answer for projects and deadlines rather than for sitting in the office from 8 to 17. My position and responsibilities practically rule out any long absence from the data center. On the other hand, interesting cases like the one described below do happen from time to time, and I realize there are few positions that offer this much room for the creative self-expression of one's inner troubleshooter.

A small disclaimer: at the time of writing the case has not been fully resolved, and given the speed of the vendors' responses a complete solution may take a few more months, so I want to share my findings right now. I hope, dear readers, you will forgive me this haste. But enough preamble - what about the case itself?

First, the introduction: there is a company (where I work as a network engineer) that hosts client solutions in a private cloud built on VMware. Most new solutions connect to VXLAN segments controlled by NSX-V - I will not even try to estimate how much time this solution has saved me; in short, a lot. I have even managed to train my colleagues to configure the NSX ESG, so small client solutions are deployed without my involvement. An important note: the control plane uses unicast replication. The hypervisors are redundantly connected by two interfaces to different physical Juniper QFX5100 switches (assembled into a Virtual Chassis), and, for completeness, the teaming policy is Route Based on Originating Virtual Port.
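
To make the setup easier to picture: the per-host VTEPs are ordinary vmkernel adapters, so their addressing and MTU can be sanity-checked straight from the ESXi shell. A minimal sketch, not taken from this case but just the generic commands; the NSX-created VTEP vmknics simply appear in the list alongside management and vMotion:

# List every vmkernel interface with its MTU - the VXLAN overhead must fit.
esxcli network ip interface list

# IPv4 addressing of the same interfaces; the VTEP vmknics show up here.
esxcli network ip interface ipv4 get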

Client solutions are very heterogeneous: from a Windows IIS box with all web server components installed on one machine, to fairly large ones - for example, load-balanced Apache web front ends plus a load-balanced MariaDB Galera cluster plus servers synchronized using GlusterFS. Almost every server needs to be monitored individually, and not all components have public addresses - if you have run into this problem and have a more elegant solution, I would be glad to hear your advice.
My monitoring approach consists in "plugging" the firewall (a Fortigate) into each internal client network (plus SNAT and, of course, strict restrictions on the types of allowed traffic) and monitoring the internal addresses - this way a certain unification and simplification of monitoring is achieved. The monitoring itself is done from a cluster of PRTG servers. The scheme looks roughly like this:

[image: monitoring scheme]
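
For illustration, one such monitoring "leg" on the Fortigate boils down to roughly the following policy. This is only a sketch: the interface and object names are hypothetical, and the service list obviously depends on what is being monitored:

config firewall policy
    edit 0
        set name "mon-clientX"
        set srcintf "port-monitoring"
        set dstintf "clientX-internal"
        set srcaddr "PRTG-probes"
        set dstaddr "clientX-servers"
        set action accept
        set schedule "always"
        set service "PING" "HTTPS"
        set nat enable
    next
end

With "set nat enable" the probes are hidden behind the Fortigate's address inside the client network, which is exactly the SNAT mentioned above.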

While we operated only with VLANs, everything was quite routine and worked predictably, like clockwork. After implementing NSX-V and VXLAN the question arose: can we keep monitoring the old way? At the time, the "quickest" solution was to deploy an NSX ESG and connect it to the VXLAN segments through a trunk interface. "Quickest" in quotes: configuring client networks, SNAT and firewall rules through the GUI does unify management in a single vSphere interface, but in my opinion it is rather cumbersome and, among other things, limits the set of tools available for troubleshooting. Those who have used the NSX ESG as a substitute for a "real" firewall will, I think, agree. Although such a solution would probably be more stable - after all, everything stays within a single vendor.

Another option is to use the NSX DLR in bridge mode between a VLAN and a VXLAN. Here, I think, everything is clear: the benefit of using VXLAN largely evaporates, because you still have to stretch a VLAN to the monitoring installation anyway. By the way, while working through this option I ran into a problem where the DLR bridge did not forward packets to a virtual machine that was on the same host. I know, I know - books and guides on NSX-V explicitly state that a separate cluster should be allocated for NSX Edge, but that is in the books... Anyway, after a couple of months with support we had not solved the problem. In principle, I understood the logic of what was happening: the hypervisor kernel module responsible for VXLAN encapsulation was not engaged when the DLR and the monitored server were on the same host - the traffic does not leave the host and, since the destination is logically connected to the same VXLAN segment, encapsulation is not needed. With support we got as far as the virtual interface vdrPort, which logically aggregates the uplinks and also performs the bridging/encapsulation - a mismatch in the incoming traffic was noticed there, and that is what I took away to work on within that case. But, as I said, I never saw that case through to the end: I was moved to another project, the branch looked like a dead end from the start, and there was no particular desire to pursue it. If I am not mistaken, the problem was observed in NSX versions 6.1.4 and 6.2.
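
For reference, in case someone hits the same wall: the vdrPort can be captured on directly instead of guessing uplinks. A sketch under the assumption that the port ID is looked up first; the ID below is a placeholder, not one from my hosts:

# Find the switch port ID of the vdrPort (look for "vdrPort" in the ClientName column).
net-stats -l

# Capture both directions on that port; 50331665 is just a placeholder port ID.
pktcap-uw --switchport 50331665 --dir 2 -o /tmp/vdrport.pcap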

And here - bingo! Fortinet announces native VXLAN support. Not just point-to-point or VXLAN-over-IPsec, and not software-switch based VLAN-VXLAN bridging - all of that they have been shipping since version 5.4 (and other vendors offer it too) - but real support for a unicast control plane. While implementing the solution I ran into another problem: the monitored servers periodically "disappeared" from monitoring and then reappeared, although the virtual machines themselves stayed alive. The reason, it turned out, was that I had forgotten to enable Ping on the VXLAN interface. During cluster rebalancing the virtual machines were being moved around, and vMotion finishes with a ping announcing the new ESXi host the machine has moved to. My own fault, but this problem once again undermined my trust in the vendor's support - in this case Fortinet's. Not to mention that every VXLAN-related case starts with the question "where is the VLAN-VXLAN softswitch in your configuration?". This time I was advised to change the MTU - for a Ping carrying 32 bytes - and then to "play around" with tcp-send-mss and tcp-receive-mss in the policy - for VXLAN, which is encapsulated in UDP. Phew, sorry - sore subject. In the end I solved this problem on my own.
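
For context, the FortiOS side of such a unicast-mode segment is fairly compact. A hedged sketch of roughly what it looks like: the segment name, parent interface and VNI are taken from the diagnostics shown later in this article, the VTEP list and the overlay address are placeholders, the routed variant (an IP directly on the VXLAN interface) is assumed, and the exact option set may differ between firmware versions:

config system vxlan
    edit "vxlan-LS"
        set interface "dmz"
        set vni 5008
        set dstport 4789
        set remote-ip "<VTEP-1>" "<VTEP-2>"
    next
end
config system interface
    edit "vxlan-LS"
        set ip <overlay-address> <netmask>
        set allowaccess ping
    next
end

The "set allowaccess ping" at the end is exactly the knob I had originally forgotten.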

Having successfully run test traffic through it, we decided to put the solution into production. And in production it turned out that after a day or two everything monitored over VXLAN gradually falls off altogether. Disabling/enabling the interface helped, but only temporarily. Mindful of how slowly support moves, I started troubleshooting on my own - in the end, my company and my network are my responsibility.

Under the spoiler is the course of the troubleshooting. If you are tired of walls of text and bragging, skip ahead to the post-analysis.

Troubleshooting

As usual, I started on the Fortigate itself; on FortiOS 5.6+ the «diagnose debug flow» output is reasonably informative. All the addresses involved are private, from RFC1918 ranges, so I will only refer to the last octets: the monitored server in the VXLAN segment is ...15, the Fortigate's interface in that segment is ...254, and the hypervisors' VTEPs live in a separate underlay network.
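
For completeness, the flow debug I mean is along these lines - a generic FortiOS 5.6+ sequence with a placeholder address; adjust the filter and the trace counter to the real traffic:

diagnose debug flow filter addr <monitored-server-ip>
diagnose debug flow show function-name enable
diagnose debug flow trace start 100
diagnose debug enable

"diagnose debug flow trace stop" and "diagnose debug disable" switch it back off.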

Let me briefly recall how this works in a VXLAN segment. In the overlay, ARP is resolved with the help of the controller and its OVSDB tables; in the underlay, the usual ARP and CAM tables do the job. On the Fortigate, the role of OVSDB is played by the VXLAN FDB. Let's check it:

 fortigate (root) #diag sys vxlan fdb list vxlan-LS
mac=00:50:56:8f:3f:5a state=0x0002 flags=0x00 remote_ip=...47 port=4789 vni=5008 ifindex=7

So, the MAC address of the monitored machine sits behind VTEP ...47. On the ESXi side I confirmed that the machine really does run on the host whose VTEP is ...47, so the FDB entry is correct. Now the CAM/ARP part - how the Fortigate resolves that ESXi VTEP address:

fortigate (root) #get sys arp | grep ...47
...47 0 00:50:56:65:f6:2c dmz
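
The same relationships can be cross-checked from the hypervisor side. A sketch, assuming an NSX-prepared host: the first command is plain ESXi, the second is added by the NSX VIBs and its exact syntax may vary between versions (the VDS name is a placeholder):

# Underlay ARP cache of the vmkernel interfaces, the VTEP vmknic included.
esxcli network ip neighbor list

# MAC table kept by NSX for the logical switch with VNI 5008.
esxcli network vswitch dvs vmware vxlan network mac list --vds-name=<VDS-name> --vxlan-id=5008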

Where to dig next? On the Junipers everything looked clean - the MAC addresses in the VTEP VLAN were where they should be. Remembering the DLR story and the VDR port, I moved to the ESXi host itself and VMware's capture tooling. Let's capture the traffic for the Fortigate's MAC ending in «97:6e» on vmnic1 - the uplink behind which VTEP ...47 lives - in both directions ("--dir 2"):

pktcap-uw --uplink vmnic1 --vni 5008  --mac 90:6c:ac:a9:97:6e --dir 2 -o /tmp/monitor.pcap

[image: capture from vmnic1]

In the capture there are only ARP requests and replies. But where is the ICMP toward ...15? There is none. Then I remembered the teaming policy (Route Based on Originating Virtual Port): each vNIC, including the VTEP vmknic, is pinned to its own uplink, so the traffic I am looking for may well be arriving on the second uplink. Let's repeat the capture on vmnic4:

pktcap-uw --uplink vmnic4 --vni 5008  --mac 90:6c:ac:a9:97:6e --dir 2 -o /tmp/monitor.pcap

[image: capture from vmnic4]

And here the picture is different - the ICMP is there. So the encapsulated traffic from the Fortigate does reach the host, just not on the uplink I expected. Let's look at the "outer" Ethernet headers of the underlay. For some reason the destination MAC does not match the VTEP's IP address: the ARP requests are sent to the correct MAC, while the encapsulated ICMP goes to a wrong one. Compare with the ARP entries on the Fortigate:

fortigate (root) #get sys arp | grep ...47
...47 0 00:50:56:65:f6:2c dmz
fortigate (root) #get sys arp | grep ...42
...42 0 00:50:56:6a:78:86 dmz

Post-analysis

So what do we have in the end: after a virtual machine migrates, the Fortigate sends traffic to the VTEP taken from the (correct) VXLAN FDB, but it uses a wrong destination MAC in the outer Ethernet header, and the receiving hypervisor's interface expectedly discards the traffic. Moreover, in one case out of four that MAC belonged to the original hypervisor, the one the machine had migrated away from.
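
The quickest way to see this with your own eyes is to look at the outer Ethernet headers of the encapsulated packets, either in the saved pktcap-uw capture or live on the Fortigate; a sketch of both, filtered down to the VXLAN UDP port:

# On the ESXi host: print link-level (outer) headers from the saved capture.
tcpdump-uw -e -nn -r /tmp/monitor.pcap udp port 4789

# On the Fortigate: sniff the underlay interface with Ethernet headers (verbosity 6, 10 packets).
diagnose sniffer packet dmz 'udp port 4789' 6 10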

Yesterday I received a letter from Fortinet technical support - they have opened bug 615586 for my case. I do not know whether to rejoice or grieve: on the one hand, the problem is not in my settings; on the other, the fix will only arrive with a firmware update - the next release, at best. What adds fuel to the fire is another bug I discovered last month, though that one was in the vSphere HTML5 GUI. Well, the vendors' QA departments are clearly doing great...

I’ll venture to suggest the following:

1 - a multicast control plane would most likely not be affected by the described problem: the outer destination MAC is derived directly from the IP address of the multicast group to which the VTEP interfaces are subscribed (for example, group 239.1.1.1 always maps to MAC 01:00:5e:01:01:01), so there is simply nothing to go stale.

2 - most likely the problem lies in the Fortigate's session offload to the Network Processor (roughly analogous to CEF): if every packet were handled by the CPU, the tables - which, at least as displayed, contain correct information - would be used. In favor of this assumption: shutting/un-shutting the interface helps, and so does simply waiting a while, more than 5 minutes.
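
If someone wants to poke at this assumption, two things come to mind; both are only a suggestion, not a confirmed workaround, and the option names should be verified against your FortiOS build. First, look at whether the monitoring sessions carry NPU offload flags; second, try disabling hardware offload on the monitoring policy:

diagnose sys session filter dst <monitored-server-ip>
diagnose sys session list

config firewall policy
    edit <monitoring-policy-id>
        set auto-asic-offload disable
    next
end

Offloaded sessions show npu-related lines in the "diagnose sys session list" output; if the symptom disappears with offload disabled, that would support the theory.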

3 - changing the teaming policy, for example to explicit failover, or introducing a LAG will not solve the problem, since the stale MAC of the original hypervisor was observed inside the encapsulated packets themselves.

In this light I can share a blog I discovered recently, where one of the articles argues that stateful firewalls and cached forwarding methods are crutches. Well, I am not experienced enough in IT to put it that strongly, and I do not agree with everything stated in the blog's articles. However, something tells me there is some truth in Ivan's words.

Thank you for your attention! I will be glad to answer questions and to hear constructive criticism.
