I would like to share a basic MPLS configuration example in which I explain how MPLS works and the steps needed to configure it. This example is not about MPLS Layer 3 VPNs; it covers only the underlying logic of how an MPLS tunnel works on its own. I will therefore cover neither VRFs nor MP-BGP, whose VPNv4 address family is used to exchange the customer routes and the MPLS VPN labels.
Let's consider the following topology:
PE routers: R2 and R4
P router: R3
CE routers: R1 and R5
MPLS router functionality is divided into two major parts: The control plane and the data plane.
- At the control plane we have the following protocols in this topology: OSPF, BGP and LDP. OSPF is the core IGP. It provides the reachability needed to build the iBGP session between R2 and R4, and it supplies the core prefixes to which LDP assigns labels. LDP populates the LIB (Label Information Base) and, from it, the LFIB. The LIB sits at the control plane and is the database that stores the prefix/label bindings advertised and received by LDP: each IP prefix is assigned a locally significant label, which is mapped to the outgoing label learned from the next-hop neighbor.
- At the data plane we have the LFIB (Label Forwarding Information Base) and the FIB. The LFIB is used to forward packets based on labels. Label information learned through LDP is installed in both the FIB and the LFIB. When a PE router receives an unlabeled packet, it consults its FIB and sends the packet out labeled according to the FIB entry. When a P router receives a labeled packet, it uses only its LFIB to forward it.
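To make the FIB/LFIB split concrete, here is a minimal Python sketch of the two lookups. It is only a toy model under my own assumptions: the table contents, the labels 100 and 200, the router names and the helper functions are hypothetical, and real routers do longest-prefix matching rather than the exact-match dictionary used here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    dst: str                      # destination, used only by the FIB lookup
    label: Optional[int] = None   # None means the packet is unlabeled

# Hypothetical tables for a generic PE and P router (not the real R2/R3 tables).
PE_FIB = {"remote-loopback": (100, "P1")}   # prefix -> (label to impose, next hop)
P_LFIB = {100: (200, "P2")}                 # in label -> (out label, next hop)

def pe_forward(packet: Packet):
    """Ingress PE: unlabeled packet, FIB lookup on the destination, label imposed."""
    out_label, next_hop = PE_FIB[packet.dst]
    return Packet(packet.dst, out_label), next_hop

def p_forward(packet: Packet):
    """P router: LFIB lookup on the incoming label only; dst is never inspected."""
    out_label, next_hop = P_LFIB[packet.label]
    return Packet(packet.dst, out_label), next_hop

labeled, hop1 = pe_forward(Packet("remote-loopback"))
swapped, hop2 = p_forward(labeled)
print(labeled.label, hop1)   # label imposed by the PE and its next hop
print(swapped.label, hop2)   # label after the swap on the P router
```

Note that `p_forward` never reads `packet.dst`, which is exactly why a P router does not need any knowledge of the destinations behind the PE routers.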
Let's enable MPLS on the core interfaces of R2, R3 and R4. As soon as MPLS is enabled, adjacent routers discover each other through LDP (the default). The output of debug mpls ldp bindings on R2 shows the following:
From the above output we can see that R2 is advertising labels for all the prefixes it has in its routing table. All the connected routes of R2, such as its loopback, are advertised with an encoded label value of 3. This label instructs the upstream routers to remove the label instead of swapping it. The neighbor router displays it as “imp-null” rather than as the value 3 (or 1 for TDP). R2 also receives label information from its LDP neighbor R3. For example, R3 advertises label 17 for prefix 126.96.36.199/32, which is the loopback of R4.
By default, LDP peers advertise a label/prefix binding for all the IGP and connected routes present in the routing table. The output above confirms this: R2 is advertising LDP bindings for networks that it has learned from R3 (for example 188.8.131.52/32 and 10.0.34.0/24).
Let's look at the LFIB of R2 now and see which labels it has learned from R3:
The 10.0.34.0/24 prefix is associated with a Pop tag action, which means that R2 does not label traffic forwarded to this prefix. R3 instructed R2 to do this during the LDP label exchange (see the first output, where R3 tells R2 that 10.0.34.0/24 should be associated with out label imp-null: labels for: 10.0.34.0/24; nh 10.0.23.3, Fa0/1, inlabel 18, outlabel imp-null (from 184.108.40.206:0)). The Pop tag action means that the Penultimate Hop Popping (PHP) mechanism is in use. PHP optimizes MPLS performance by eliminating one LFIB lookup on the egress router. For traffic to the loopback of R4, for example, R4 only does a FIB lookup instead of an LFIB lookup followed by a FIB lookup.
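The way the label advertised by the next-hop neighbor turns into an LFIB action can be summarized in a few lines of Python. This is just a sketch of the logic described above, not how IOS implements it; the function name `lfib_action` is my own, while the value 3 for implicit null and the action names come from the outputs discussed in this post.

```python
from typing import Optional

IMPLICIT_NULL = 3  # reserved label value that LDP uses to signal imp-null

def lfib_action(label_from_next_hop: Optional[int]) -> str:
    """Map the label advertised by the next-hop neighbor to an LFIB action."""
    if label_from_next_hop is None:
        # The neighbor never advertised a binding for this exact prefix.
        return "Untagged"
    if label_from_next_hop == IMPLICIT_NULL:
        # Penultimate Hop Popping: remove the top label before forwarding.
        return "Pop tag"
    return f"swap to {label_from_next_hop}"

print(lfib_action(3))     # 10.0.34.0/24, advertised by R3 as imp-null
print(lfib_action(17))    # loopback of R4, advertised by R3 with label 17
print(lfib_action(None))  # prefix for which no label was received
```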
But why isn't prefix 220.127.116.11/32 also associated with the Pop tag action? This prefix is the loopback interface of R3, so R3 should have advertised a label value of 3 (imp-null) for prefix 18.104.22.168/32. Instead, the action is marked as Untagged. Let's have a look at the LIB of R2 for this prefix:
There are 2 entries, one for the prefix 22.214.171.124/24 and one for the prefix 126.96.36.199/32. The loopback of R3 is actually configured with a /24 mask. LDP therefore advertises a label for 188.8.131.52/24, as this is the only connected route present in the routing table of R3 for loopback 3. The problem occurs when R3 advertises its loopback interface to R2 through OSPF: by default, OSPF treats loopback interfaces as stub hosts and advertises them as /32. We are facing a mismatch in the control plane between OSPF and LDP. The end result is that the action for 184.108.40.206/32 in the LFIB of R2 is marked as Untagged, because R3 never advertised a label for 220.127.116.11/32 in the first place. To solve this issue, the command ip ospf network point-to-point should be issued under the loopback of R3, which results in OSPF advertising the R3 loopback as a /24. Now R2 shows a Pop tag action for 18.104.22.168/24 in its LFIB:
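The mismatch is easier to see when the two prefixes are computed side by side. The Python sketch below uses the standard ipaddress module and a documentation address (192.0.2.3/24) as a stand-in for the loopback of R3; that address and the helper names are assumptions of mine, not values from the lab.

```python
import ipaddress

IMPLICIT_NULL = 3

def ldp_prefix(interface_address: str):
    """LDP binds a label to the connected route, i.e. the configured network."""
    return ipaddress.ip_interface(interface_address).network

def ospf_loopback_prefix(interface_address: str, point_to_point: bool = False):
    """OSPF advertises a loopback as a /32 stub host unless it is point-to-point."""
    iface = ipaddress.ip_interface(interface_address)
    if point_to_point:
        return iface.network
    return ipaddress.ip_network(f"{iface.ip}/32")

loopback = "192.0.2.3/24"  # hypothetical address standing in for the R3 loopback

# Binding that R2 receives from R3 through LDP (exact prefix -> label).
lib_from_r3 = {ldp_prefix(loopback): IMPLICIT_NULL}

default_route = ospf_loopback_prefix(loopback)                   # 192.0.2.3/32
p2p_route = ospf_loopback_prefix(loopback, point_to_point=True)  # 192.0.2.0/24

# The LFIB entry needs a binding for exactly the prefix found in the routing table.
print(default_route, default_route in lib_from_r3)  # no binding, hence Untagged
print(p2p_route, p2p_route in lib_from_r3)          # binding found, hence Pop tag
```

The lookup is an exact match on prefix and mask, which is why a label for the /24 is of no use to a /32 route even though the /24 covers it.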
Now that we have MPLS connectivity, a traceroute to 22.214.171.124 should show that packets to this destination are labeled with label 17.
If we look at a Wireshark capture taken during the traceroute, we can see that the original packet is hidden from the core: R3 only looks at label 17 and not at the original IP packet.
We can also note that MPLS has an ethertype of 0x8847. The bottom-of-stack field S:1 in the label header means that this label is the last label. If it was 0, it would mean that another label follows, which will be the case with Layer 3 MPLS VPNs. The Wireshark output clearly shows that the MPLS label is inserted between the Layer 2 and the Layer 3 headers. This gives MPLS a lot of versatility, as it is designed for use on virtually any media and Layer 2 encapsulation.
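For readers who want to decode the label header by hand, here is a small Python sketch of the 32-bit label stack entry: 20 bits of label, 3 bits of traffic class (formerly EXP), 1 bottom-of-stack bit and 8 bits of TTL. The function names are mine, and the TTL of 255 in the example is an assumed value, not one taken from the capture.

```python
import struct

def build_label_entry(label: int, tc: int, s: int, ttl: int) -> bytes:
    """Pack one MPLS label stack entry into 4 bytes (network byte order)."""
    word = (label << 12) | (tc << 9) | (s << 8) | ttl
    return struct.pack("!I", word)

def parse_label_entry(data: bytes) -> dict:
    """Unpack the first 4 bytes of data as one MPLS label stack entry."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,       # top 20 bits
        "tc": (word >> 9) & 0x7,   # 3 bits, traffic class
        "s": (word >> 8) & 0x1,    # 1 means bottom of stack
        "ttl": word & 0xFF,        # low 8 bits
    }

# Label 17 with S=1 as in the capture; the TTL value is assumed for illustration.
raw = build_label_entry(label=17, tc=0, s=1, ttl=255)
print(raw.hex())
print(parse_label_entry(raw))
```

With a Layer 3 MPLS VPN, the same 4-byte structure would simply appear twice in a row, with S=0 on the first entry and S=1 on the second.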
Now that we have MPLS and IGP connectivity, let's configure the BGP connectivity (see my previous post on BGP over GRE for the configuration details of EBGP):
R1 will EBGP peer with R2 and advertise its loopback 126.96.36.199/24
R2 will iBGP peer with R4
R4 will EBGP peer with R5 and R5 will advertise its loopback 188.8.131.52/24
Now that we have the BGP connectivity established, let's look at the BGP table of R2:
From the above output we can see that R2 has learned the prefix of R1 (loopback 1) through EBGP and the prefix of R5 (loopback 5) through iBGP. Now that we have full connectivity (IGP, MPLS and BGP), we should be able to ping the loopback of R5 from R1 when sourcing the traffic from the loopback of R1, without R3 needing a route to 184.108.40.206/24 or to 220.127.116.11/24. First let's look at the BGP table of R1:
Let's try to ping 18.104.22.168 now:
R1#ping 22.214.171.124 source loopback 1
Success rate is 0 percent (0/5)
The ping is not going through. Although everything looks correct at the control plane level there is a failure in the data plane. So what is happening?
Let's analyze the situation starting from R2. When R2 learns a route from an iBGP neighbor (in this case R4), it looks at the next-hop value and correlates it with the MPLS LFIB. So if we look at the BGP table of R2 again:
This means that for the prefix 126.96.36.199/24, R2 will look in its LFIB for the label corresponding to the next-hop 10.0.34.4:
From the output above we can see that there is no label associated with 10.0.34.0/24, as the action is Pop tag. So when a packet comes in and is going toward 10.0.34.4, R2 does not insert any label and forwards the packet unlabeled out F0/1 towards 10.0.23.3, which is the next-hop (R3). The action is Pop tag because 10.0.34.0 is the directly connected link between R3 and R4.
The issue here is that the final destination (188.8.131.52) is exposed to the P router R3, which has no knowledge of the customer (CE) routes. R3 doesn't know what to do with this unlabeled packet, as it has no route for 184.108.40.206/24. Let's do some debugging on R3 to see what happens when it receives this unlabeled packet. To see in the debug that the packet is dropped by R3, we have to disable CEF switching on the incoming interface, because CEF-switched transit traffic does not show up in the debug. We do this by entering no ip route-cache on the interface of R3 connecting to R2.
From the output above we can see that R3 not only lacks a route to the 220.127.116.11 destination, it also lacks a route back to R1. In the second line of the output, R3 is actually trying to send an ICMP unreachable to R1, but as R3 has no route to R1, it is unable to route it.
The key point is that the BGP next-hop value controls which label is imposed on packets going towards the destination. This means that, in general, the BGP next-hop should point to the loopback interface of the remote PE router. In our example the next-hop points to the connected links instead of the loopbacks of R2 and R4, which results in a traffic black hole in the MPLS network because the label is popped (PHP) one hop too soon.
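The dependency between the BGP next-hop and the imposed label can be sketched in Python as follows. This is a simplified model of the recursion, assuming a longest-prefix match of the next-hop against the label bindings learned from R3; 192.0.2.4/32 is a hypothetical stand-in for the loopback of R4, and the function name is mine. The 10.0.34.0/24 binding (imp-null) and label 17 are the values seen earlier in this post.

```python
import ipaddress
from typing import Optional

IMPLICIT_NULL = 3

# Out labels R2 learned from R3 for the IGP prefixes.
bindings_on_r2 = {
    ipaddress.ip_network("10.0.34.0/24"): IMPLICIT_NULL,  # link between R3 and R4
    ipaddress.ip_network("192.0.2.4/32"): 17,             # hypothetical R4 loopback
}

def label_for_bgp_next_hop(next_hop: str, bindings: dict) -> Optional[int]:
    """Return the label imposed for a BGP route, or None if it goes out unlabeled."""
    addr = ipaddress.ip_address(next_hop)
    matches = [net for net in bindings if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
    label = bindings[best]
    # imp-null means pop: the packet leaves R2 as plain IP.
    return None if label == IMPLICIT_NULL else label

print(label_for_bgp_next_hop("10.0.34.4", bindings_on_r2))  # next-hop on the link
print(label_for_bgp_next_hop("192.0.2.4", bindings_on_r2))  # next-hop on the loopback
```

With the next-hop on the connected link the function returns None, so R3 receives a plain IP packet for a customer prefix it has no route for. With the next-hop on the loopback, the packet carries label 17 all the way through R3.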
The easiest way to fix this issue is to change the iBGP peering between R2 and R4 so that the 2 routers peer with each other using their loopback addresses. Now that the peering has been changed, let's look at the BGP table of R2:
Now the next-hop for 18.104.22.168/24 is the loopback of R4 and no longer the connected link between R4 and R3.
Let's try to ping again from R1 to the loopback of R5:
Let's traceroute to 22.214.171.124:
As always I would like to challenge my readers and ask the following question: How is it possible that R1 can see hop 2? R3 has no routes to R1 as it is a P router.
Thanks for reading and your comments are more than welcome.