BGP over GRE

While working through some BGP labs I came across an interesting topic: BGP over GRE. Automatic tunneling combined with BGP is at the core of MPLS VPNs, so I think it is worth seeing the effect of using manual tunnels along with BGP. Let's have a look at the following topology, which illustrates my example:

The core IGP is EIGRP, while BGP is used between AS 100 and AS 200 to advertise the loopback prefix of each router. Only R1 and R5 run BGP. The peering uses R1's address 100.1.0.1 and R5's address 200.1.0.5. A traceroute from R1 to R5 shows 3 hops, so we have to use either ebgp-multihop or ttl-security: by default EBGP packets are sent with a TTL of 1, because EBGP peers are expected to be directly connected. In this example I will use ttl-security, which allows the same multi-hop peering as ebgp-multihop while adding some protection. Let's build the EBGP session between R1 and R5:
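A minimal sketch of such a session, assuming the AS numbers and peering addresses given above. The hop count of 3 is an assumption taken from the traceroute result, not a value confirmed by the original lab.

```
! R1 (AS 100) - sketch only, hop count assumed from the 3-hop traceroute
router bgp 100
 neighbor 200.1.0.5 remote-as 200
 ! send with TTL 255, accept only packets arriving with TTL >= 252
 neighbor 200.1.0.5 ttl-security hops 3
!
! R5 (AS 200) - mirror image of R1
router bgp 200
 neighbor 100.1.0.1 remote-as 100
 neighbor 100.1.0.1 ttl-security hops 3
```

With ttl-security a router sends its BGP packets with a TTL of 255 and only accepts packets whose TTL is at least 255 minus the configured hop count, which is where the extra protection compared to ebgp-multihop comes from.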

Now we can see that the EBGP session is up between R1 and R5, with R1 acting as the TCP client and R5 as the TCP server. Here is the output of debug ip tcp transactions and debug ip bgp on R1:

Before any connection attempt is made, the BGP peer relation must leave the Idle state, which requires that the router has a route to the neighbor address. In the Active state the BGP process then tries to establish a TCP session towards the peer on port 179.

Once the TCP three-way handshake is complete (SYN from R1, SYN-ACK from R5 and ACK from R1), R1 and R5 send each other a BGP Open message containing the AS number, the BGP router ID, the BGP version, the hold time and optional parameters such as capabilities advertisement (for example the route refresh capability used for soft refresh). A router that has sent its Open message is in the OpenSent state; when it receives the peer's Open message and accepts its parameters, it replies with a keepalive and moves to OpenConfirm. Once the peer's keepalive arrives, the state becomes Established (neighbor coming up).

Now that the EBGP session is up between R1 and R5, let's advertise their respective loopbacks in BGP:
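One way this could be done is with network statements, sketched below. The prefixes are the ones named in this post; the use of network statements (rather than redistribution) and the assumption that each loopback is configured with a /24 mask are mine.

```
! R1 - assumes a loopback in 30.30.30.0/24 exists in the routing table
router bgp 100
 network 30.30.30.0 mask 255.255.255.0
!
! R5 - assumes a loopback in 20.20.20.0/24 exists in the routing table
router bgp 200
 network 20.20.20.0 mask 255.255.255.0
```

A network statement only originates the prefix if a route with exactly that prefix and mask is present in the routing table, hence the /24 assumption.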

If we look at R5 and do a debug ip bgp update then we see the following:

From the above output we can see that R5 installs the prefix advertised by R1 (30.30.30.0/24) in its routing table because the recursive lookup for the next hop succeeded: the routing process found an outgoing interface for 100.1.0.1.

We can also see that R1 sends some BGP attributes with the prefix. Origin, AS-path and next-hop are well-known mandatory attributes and must be present in every update that advertises a prefix. The metric (MED) is shown as well; it is an optional non-transitive attribute, which means it does not have to be present in every update, and a neighbor that does not recognize it simply ignores it and does not pass it on. (The partial bit is only used for optional transitive attributes.)

The same advertisement process is done on R5 and the prefix 20.20.20.0/24 is advertised to R1.

So let's look at the BGP table of R1:

From the output above we can see that R1 has two network entries: the one originated by R1 itself and the one advertised by R5, with a next hop of 200.1.0.5. We can also see that the prefix advertised by R5 comes from AS 200 and is marked as external, which means it was learned via EBGP.

From a BGP point of view everything looks fine, but what happens when, for example, R1 tries to ping the loopback of R5, 20.20.20.20? Let's try:

The packet does not get any further than R2. We can tell because R1 receives an ICMP type 3 (destination unreachable) message from R2 (100.1.0.2). R2 sends this ICMP packet because it has no route to 20.20.20.20, so the routing lookup fails. Here is the output from R2 that illustrates the issue:

So why is that happening? The IGP has no knowledge of the loopbacks of R1 and R5. They are only advertised in BGP, which the core routers do not run, so traffic to these prefixes is black-holed.

So what is the solution in this case? Tunneling! When the packets between the loopbacks are tunneled, the non-BGP devices only see the outer tunnel header, so the unknown addresses 30.30.30.30 and 20.20.20.20 stay hidden from the core network.

So let's configure a tunnel on R1 and R5:

R1

R5
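As a rough sketch, a GRE tunnel between the two peering addresses could look like this on both routers. The tunnel subnet 10.0.15.0/24 and the interface number Tunnel0 are hypothetical values chosen for illustration; only 100.1.0.1 and 200.1.0.5 come from this post.

```
! R1 - tunnel addressing 10.0.15.0/24 is hypothetical
interface Tunnel0
 ip address 10.0.15.1 255.255.255.0
 ! source and destination are the IGP-reachable peering addresses
 tunnel source 100.1.0.1
 tunnel destination 200.1.0.5
!
! R5 - mirror image of R1
interface Tunnel0
 ip address 10.0.15.5 255.255.255.0
 tunnel source 200.1.0.5
 tunnel destination 100.1.0.1
```

No tunnel mode is set because GRE over IP is the default mode for a tunnel interface on IOS. The tunnel endpoints must stay reachable through the IGP and not through the tunnel itself.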

In order to send the traffic destined for the loopbacks into the tunnel, the next hop of each prefix must be changed to the tunnel IP address of the peer that advertises it. In this example let's configure two route-maps on R1.

  • The first route-map will be used in the inbound direction to change the next hop of the prefix advertised by R5 (20.20.20.0/24) to the tunnel IP of R5.
  • The second route-map will be used in the outbound direction to change the next hop of the prefix advertised by R1 (30.30.30.0/24) to the tunnel IP of R1.

Let's apply both route-maps:
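A rough sketch of the two route-maps and how they could be attached to the neighbor on R1. The route-map names FROM-R5 and TO-R5 and the tunnel addresses 10.0.15.1 (R1) and 10.0.15.5 (R5) are hypothetical and only serve as an illustration.

```
! R1 - names and tunnel addresses are hypothetical
! inbound: prefixes learned from R5 get R5's tunnel IP as next hop
route-map FROM-R5 permit 10
 set ip next-hop 10.0.15.5
!
! outbound: prefixes sent to R5 carry R1's tunnel IP as next hop
route-map TO-R5 permit 10
 set ip next-hop 10.0.15.1
!
router bgp 100
 neighbor 200.1.0.5 route-map FROM-R5 in
 neighbor 200.1.0.5 route-map TO-R5 out
```

Neither route-map has a match clause, so each one applies to every prefix exchanged with the neighbor, which is enough here because only one prefix is advertised in each direction.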

After clearing the BGP session, the BGP tables of R5 and R1 are updated with the new next hops:

So now we can actually ping both loopbacks:

R1

That is it! A quick challenge question to my readers: Why is the traceroute showing only one hop?

/Laurent

  1. February 26, 2012 at 06:36

    Reblogged this on The CCIE journey and commented:
    Good stuff here..

  2. February 26, 2012 at 12:32

    Hi Amaseghe,

    Thanks for your interest in my blog;-) I am glad that you found this topic interesting.

    Best of luck with your journey.

    /Laurent

  3. Gustavo
    June 21, 2012 at 17:43

    Hi Laurent!!!

    I have just read your work and I found it magnificent, thank you for that. I would like to know when you are going to write the other post on PIM dense mode, because I have already read the last one on Dense Mode, which was very well explained.

    Thanks for your attention and pardon my bad French kkkkk.

  4. Rob
    June 3, 2013 at 05:23

    Thanks so much for this! I was presented with this scenario at work and this was the answer I was looking for. Thanks again!

  5. October 1, 2013 at 13:11

    Hi Laurent,
    Great post.
    There is an overhead when using a GRE tunnel – 24 bytes I believe. The problem I see is that in the real world traffic is sometimes sent with the DF bit set, which means traffic would be dropped going through the GRE tunnel unless you are able to ‘up’ the end-to-end MTU (which is mostly set to 1500 on the Internet). You could use TCP adjust-mss but that is not ideal. Any thoughts on that?

  6. Amir Aziz
    December 29, 2013 at 10:59

    Hello,

    I went through your config and I really admire the effort and the great info you have provided. However, I managed to achieve the same result with both routers by simply creating a tunnel between the two devices (just as you did) and peering the routers through these tunnel interfaces. Naturally, the link came up and the routers peered perfectly fine, and the next-hop values for the advertised prefixes also changed to the IP address of the tunnel interface.

    The only caveat that I see in this design is that the GRE tunnel may not necessarily be formed over the same underlying IGP path, and traffic forwarding may be different. But…is that really a problem?

  7. Lloyd
    March 9, 2014 at 09:48

    It is treated as directly connected, which is why the traceroute shows only 1 hop. Instead of using a route-map to change the next hop, we can simply route the BGP peering IPs of both R1 and R5 via Tunnel 0, right?

  8. VJ
    May 26, 2014 at 10:23

    Answer to Laurent’s question:

    Why is the traceroute showing only one hop?

    Answer: Because the moment the traffic gets into the tunnel, it will look for the peer tunnel endpoint no matter what physical path it takes, hence one single hop.

  9. February 6, 2015 at 02:26

    It’s seen as one hop because the entire GRE traffic (the traceroute) is tunnelled across the core. All the core sees is traffic between the tunnel endpoints, which it knows, whereas the outer IP header, which is the only header the core looks at, is actually carrying ‘secret’ GRE traffic.
    GRE tunnels are one hop away from their peers.
