Basic QoS part 1 – Traffic Policing and Shaping on Cisco IOS Router

September 19, 2012 Laurent Prat

In this post I will talk about Cisco router QoS, and more particularly about Traffic Shaping and Traffic Policing. I will describe and show how to configure both using the legacy methods as well as the newer MQC-based methods. I will not cover Frame Relay Traffic Shaping or Frame Relay Traffic Policing here; I will try to cover them in another post. From now on I will use TS for Traffic Shaping and TP for Traffic Policing.

To illustrate the different examples in this post I will use the following topology:

IGP: EIGRP AS 10

Platform/IOS: Cisco 2691/12.4(15)T11 Adv IP services.

Side note: Please note that I am using an IOS version older than 12.4(20)T, so I will not talk about Cisco's newer QoS model, which is called HQF (Hierarchical Queueing Framework). I will only be using the pre-HQF CBWFQ (Class-Based Weighted Fair Queuing) model, which is the MQC (Modular Quality of Service Command-Line Interface) behavior used by Cisco routers in releases prior to 12.4(20)T. Note that in the current CCIE R&S LAB v4.0 Cisco is using IOS image 12.4(15)T, which uses CBWFQ as its QoS model. Also, since IOS images from 12.4(20)T onwards use HQF, many features differ from CBWFQ, such as the queuing mechanisms, show outputs, etc.

Addressing: All the IP addresses are configured as shown on the diagram.

We will start by looking at the legacy traffic shaping and policing methods in order to understand the different QoS concepts behind TS and TP. Let's start with TS.

Legacy Generic Traffic Shaping or legacy GTS

Legacy GTS was the first IOS feature to limit outbound traffic rate leaving an interface or a subinterface. With this legacy feature traffic can be classified using an ACL. The CLI command has the following format:

traffic-shape rate bit-rate [burst-size [excess-burst-size ]] [buffer-limit]

Or when used with an ACL the command format is as follows:

traffic-shape group access-list bit-rate [burst-size [excess-burst-size]] [buffer-limit]

Legacy GTS uses WFQ (Weighted Fair Queuing) as the built-in traffic-shaping queue scheduler.

Side Note: Shaping works by delaying outgoing traffic, which is why a software queue is needed to hold the delayed packets. I will try to cover the different QoS queuing mechanisms in another post.

In the command format, the buffer limit defines the maximum size of the WFQ buffer space, that is to say how many packets can be held in the queue. Before looking at the other arguments of both commands, let's dig into a bit of theory.

TS slows traffic down as it moves from the output queue to the transmit ring. The goal of TS is to “format” an outbound packet flow so that it conforms to a traffic contract agreed with the ISP. TS lowers the outgoing average bit rate, resulting in a traffic flow made of uniformly spaced traffic bursts. ISP traffic contracts are usually enforced with TP (Traffic Policing) using ingress policers. The burst values configured with TS should match the burst values used in the ingress policer at the ISP.

But why is TS needed?

One example is when the customer's physical interface rate, the Access Rate (AR), is higher than the rate guaranteed by the ISP. Let's say that a customer has contracted a guaranteed rate of 20 Mbps but the physical link to the ISP is 100 Mbps. Without shaping, the customer will always serialize packets at the AR, and the ISP will police the customer traffic flow at 20 Mbps. The resulting drops can cause poor throughput, for example because TCP falls back to slow start. So the customer should shape to the guaranteed rate offered by the ISP to avoid the mismatch between the customer's physical AR and the ISP's guaranteed rate.

Another case for using TS is when the local site's connection speed is higher than the remote site's, which means that the local site could overwhelm the remote site's connection. For example, that would be the case in a WAN hub-and-spoke design where the combined sending rates of the spoke sites could oversubscribe the hub site.

By configuring TS at the spokes we prevent the hub from being oversubscribed.

In order to slow down the output rate, TS meters the traffic arriving in the output queue and decides whether it exceeds the configured TS rate (the target average rate). When traffic leaves an interface, packets are grouped into bursts separated by periods of silence. These bursts are always sent at AR speed and separated by pauses, as illustrated in the figure below:

Source: INE

Let's take the example of a Fast Ethernet interface. The serialization rate of this interface is 100 000 000 bps, which is 100 000 bits per ms. The router always sends at AR speed. So if we were to send a 50 Mbps traffic flow over this interface, the router would still send at 100 000 bits per ms, but there would be pauses between the bursts, resulting in an average traffic rate of 50 Mbps. For example, over one second the router could send 100 000 bits per ms for 500 ms and then stay silent for 500 ms.

TS delays bursts in the shaping queue when they exceed the desired average rate, called the Committed Information Rate (CIR). To do that, TS uses a token bucket model to determine whether traffic exceeds or conforms to the configured CIR. Every time a packet is about to be dequeued to the transmit ring (hardware queue or Tx ring), TS compares the size of the packet to the current number of tokens in the bucket. If the size of the packet is less than or equal to the amount of credit in the token bucket, the packet conforms and is sent; otherwise the packet exceeds and is delayed in the shaper queue, which in this case (legacy GTS) is WFQ.

The amount of tokens that can be present in the token bucket is Bc (Committed Burst), expressed here in Bytes, and the token bucket is refilled with Bc every Tc (Committed Time interval), expressed in milliseconds. So the size of the token bucket is based on the CIR and Tc. If the CIR is 512 Kbps and Tc=10 ms, the size of the token bucket will be 5120 bits, or 640 Bytes. So every 10 ms the token bucket will be refilled with 640 Bytes. If the Tc value were larger, the token bucket size would also be larger. If there are still some tokens in the bucket after a burst has been transmitted, the tokens added at the next Tc interval that do not fit will spill over the bucket and will not be stored, as illustrated by the following figure:

Source: INE

Every time a packet conforms to the average rate per Tc interval, it is dequeued, a number of Bytes equal to the packet size is taken from the token bucket, and the packet is sent out the interface at AR. If there are not enough tokens, the packet is queued in the shaper queue. So every Tc the shaper takes up to Bc Bytes from the shaper queue and sends them at AR speed out the interface, resulting in packet bursts of almost the same size separated by Tc intervals.
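The single-bucket behavior described above can be sketched in a few lines of Python. This is only a simplified model, not how IOS implements it: the function name `shape_bursts` and the example values (20 backlogged packets of 160 Bytes, Bc of 640 Bytes) are assumptions chosen for illustration.

```python
from collections import deque

def shape_bursts(packet_sizes, bc_bytes, num_intervals):
    """Return the number of Bytes released in each Tc interval.

    Simplified model: the bucket is refilled with Bc at every Tc,
    tokens beyond Bc spill over, and a packet is released only if
    its size fits in the tokens currently available.
    """
    queue = deque(packet_sizes)   # the shaper queue (backlog)
    tokens = 0
    sent_per_interval = []
    for _ in range(num_intervals):
        tokens = min(tokens + bc_bytes, bc_bytes)  # refill, excess spills over
        sent = 0
        while queue and queue[0] <= tokens:
            size = queue.popleft()
            tokens -= size
            sent += size
        sent_per_interval.append(sent)
    return sent_per_interval

# 20 backlogged packets of 160 Bytes, Bc = 640 Bytes per Tc
print(shape_bursts([160] * 20, 640, 6))  # [640, 640, 640, 640, 640, 0]
```

Each interval releases exactly one Bc worth of traffic until the backlog is gone, which is the uniform burst pattern described above. Note that in this strict model a packet larger than Bc would never be released.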

Side note: The Tc value defines how often the shaper runs

The Tc value is configured indirectly by changing the Bc and CIR values based on the formula:

Bc=CIR*Tc/1000
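As a quick check of this formula, here is a small Python sketch (the helper names are mine and purely illustrative) that computes Bc from CIR and Tc, and Tc from CIR and Bc, using the values that appear in this post.

```python
def committed_burst_bits(cir_bps, tc_ms):
    """Bc in bits, from Bc = CIR * Tc / 1000."""
    return cir_bps * tc_ms / 1000

def interval_ms(cir_bps, bc_bits):
    """Tc in ms, from the same formula solved for Tc."""
    return bc_bits * 1000 / cir_bps

bc = committed_burst_bits(512_000, 10)
print(bc, bc / 8)                    # 5120.0 bits, 640.0 Bytes
print(interval_ms(128_000, 1280))    # 10.0 ms
print(interval_ms(128_000, 2560))    # 20.0 ms
```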

As we have seen before, if there are still tokens present in the token bucket when it is refilled at Tc interval, the new added tokens will spill over when the Max token bucket size is reached. That means that no more than Bc Bytes can be sent every Tc which can result in the shaper achieving less than the desired average rate over a second. Imagine that the scheduler has no traffic to send during a certain amount of time and then suddenly it has to send more than Bc Bytes in a Tc interval.

With the single token bucket model it is not possible to send more than Bc per Tc interval. To resolve this issue, TS uses what is known as a dual leaky token bucket, with the first bucket holding the Committed Burst (Bc) and the second bucket holding the Excess Burst (Be).

The Be bucket is only filled if the Bc bucket has not been completely emptied in the previous interval. So if Bc is 10 bits and only 8 bits were sent, the leftover credit (2 bits) is moved to the Be bucket before Bc is refilled, and in the next Tc the scheduler can dequeue up to Bc+Be bits, allowing it to burst up to AR if MaxBe is used.

The Bc bucket size is defined by the CIR and Tc. The Be bucket size is defined by the AR of the physical link since the packets are always serialized at this speed.

So the MaxBe size is defined by the following formula: MaxBe=(AR-CIR)*Tc/1000

That means that when the shaper is sending Bc+MaxBe per Tc interval it is transmitting at the AR.
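To make this concrete, the following Python sketch applies the MaxBe formula to the earlier example of a 100 Mbps link with a 20 Mbps contract. The Tc of 10 ms and the function name are assumptions for illustration.

```python
def max_be_bits(ar_bps, cir_bps, tc_ms):
    """MaxBe in bits, from MaxBe = (AR - CIR) * Tc / 1000."""
    return (ar_bps - cir_bps) * tc_ms / 1000

ar, cir, tc = 100_000_000, 20_000_000, 10
bc = cir * tc / 1000              # 200000.0 bits
be = max_be_bits(ar, cir, tc)     # 800000.0 bits
# Sending Bc + MaxBe in every Tc is the same as sending at AR
print((bc + be) * 1000 / tc)      # 100000000.0 bps
```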

Side note: As Be is populated due to a lack of Bc being used, the average sending rate over a second still never exceeds the CIR. By default if Be is not specified it is equal to Bc.

Alright! Enough with the theory. Let´s configure R1 with traffic shaping so only traffic going to R2´s loopback is shaped to 128 Kbps:

From the above configuration output, R1 is configured to shape at 128 Kbps, but only for traffic going to the loopback of R2. Also note that the Bc value is 1280 bits, or 160 Bytes, resulting in the shaper running every 10 ms. So every 10 ms the shaper takes 160 Bytes and sends them out the interface if the sender is trying to send more than 128 Kbps of traffic.

CIR=128 Kbps, so we send 128000 bits in 1000 ms, which results in Bc=[10 (ms)*128000 (bits)]/1000=1280 bits sent in 10 ms. This result is obtained by applying the formula Bc=CIR*Tc/1000.

In order for the shaper configured on R1 to kick in, we will generate ping traffic from R4 and R5 with 160-Byte packets. Now let's look at the traffic shape queue on R1:

We can clearly see that the shaper is active and that both flows from R4 and R5 are present in the shaping queue: both are delayed in the WFQ queuing system because together they reach the shaping rate limit of 128 Kbps. As legacy traffic shaping uses WFQ as its queuing method, each flow has been assigned a weight based on the IP precedence present in the IP L3 header. In this case the IP precedence is 0 because the pings are sent with the default marking.

Each flow has 1 packet in the shaping queue, which results in a total of 2 packets in the WFQ queue, and the limit is 1204 packets, which we set in the traffic-shape command before.

Side note: Note the size of the packets, which is 174 Bytes although we are sending ICMP packets of 160 Bytes. That is because WFQ takes into account the Ethernet header overhead, which is 14 Bytes (it would be 18 Bytes if a VLAN tag were added).

In order to verify that we really shape at 128 Kbps, we configure R2 to meter the traffic coming in its F0/0 interface destined to its loopback with a basic policy-map configured as follows:

So let´s check which rate we are achieving on R2:

So the achieved rate is 116 Kbps, which is noticeably less than the shaped CIR of 128 Kbps. One explanation is that MQC doesn't take the L2 overhead into consideration, whereas WFQ does. So to achieve a more precise shaping rate we could lower the size of the ping packets by a little more than 14 Bytes, so that there are enough Bc Bytes in the token bucket every 10 ms when the shaper runs. So let's ping with an ICMP packet size of 144 from R5 and check the result:
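The L2 overhead explanation can be checked with a rough back-of-the-envelope estimate. The Python sketch below assumes that the shaper counts 174 Bytes per packet while the meter on R2 counts only the 160 L3 Bytes; the function name is hypothetical and the model ignores any other effect.

```python
def l3_rate_bps(shaped_bps, l3_bytes, l2_overhead_bytes=14):
    """Rate seen by a meter that ignores the L2 header, when the
    shaper counts the L2 header against the shaped rate."""
    return shaped_bps * l3_bytes / (l3_bytes + l2_overhead_bytes)

# 160-Byte pings shaped to 128 Kbps with 14 Bytes of Ethernet header
print(round(l3_rate_bps(128_000, 160) / 1000, 1))  # 117.7 (Kbps)
```

This estimate is in the same range as the 116 Kbps measured on R2, which supports the overhead explanation without proving it.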

So the rate of 128 Kbps is almost achieved. We could double the Bc to achieve an even more precise rate. For example, instead of 160 Bytes we could use a Bc of 320 Bytes, which implies that the shaper will run every 20 ms.

Alright! Let´s see now how we use policing in order to do admission control and rate-limiting.

Legacy CAR for admission control and rate-limiting

Legacy CAR (Committed Access Rate) is generally used to rate limit incoming traffic, but also to do admission control (packet remarking). Unlike legacy GTS, CAR does not buffer excess traffic, so it doesn't use any queuing mechanism. Instead, the exceeding traffic can be dropped or remarked, for example to a lower IP precedence value.

CAR is usually used inbound at the ISP to enforce the CIR sold to the customer. CAR works in a similar way to traffic shaping, but there are some key differences.

The goal of TP (Traffic Policing) is to meter the average speed by using a sliding “averaging time interval” Tc. This Tc is not the same as the one used in TS with legacy GTS, which dictates how often the shaper has to run. With policing (legacy CAR in this case), the Tc is used as a window sliding along the packet stream to check whether the amount of traffic already received during the current Tc plus the size of the new packet is less than or equal to the Bc (Committed Burst). So the larger the Tc, the more averaging is performed over the input rate. The key point about TP is that it compares the average metered rate with the configured CIR. The averaging period is defined by the formula Tc=Bc/CIR. The following figure illustrates the sliding window method used by TP:

Source: INE

CIR for traffic policing is enforced by the fact that during each Tc interval, the amount of conforming traffic is no more than Bc.

Side note: TS produces uniform packet bursts separated by Tc intervals, while TP uses Tc to meter the amount of conforming traffic and remarks or drops exceeding traffic (more than Bc).

As in shaping, the depth of the token bucket is Bc Bytes, so when a new packet of size X arrives, CAR checks whether it can borrow X Bytes from the bucket. If it can, the packet conforms and X Bytes are subtracted from the bucket; if not, the packet exceeds.
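The conform/exceed decision can be sketched as a small Python function. This is a simplified model and not the actual CAR implementation: it assumes the bucket starts full and is refilled continuously at the CIR between packet arrivals, and the name `police` and the test values are mine.

```python
def police(arrivals, cir_bps, bc_bytes):
    """Classify packets as 'conform' or 'exceed'.

    arrivals: list of (time_in_seconds, size_in_bytes), in time order.
    Assumption: bucket starts full and refills at CIR, capped at Bc.
    """
    tokens = float(bc_bytes)
    last = 0.0
    results = []
    for t, size in arrivals:
        # credit earned since the previous packet, capped at Bc
        tokens = min(bc_bytes, tokens + (t - last) * cir_bps / 8)
        last = t
        if size <= tokens:
            tokens -= size
            results.append("conform")
        else:
            results.append("exceed")
    return results

# Six 1000-Byte packets arriving at once against Bc = 4000 Bytes
print(police([(0.0, 1000)] * 6, 256_000, 4000))
# Same packets evenly spaced at exactly 256 Kbps (one every 31.25 ms)
print(police([(i * 0.03125, 1000) for i in range(6)], 256_000, 4000))
```

The back-to-back burst has its last two packets exceed once the 4000 Bytes of credit are used, while the evenly spaced flow conforms entirely, which is why pre-shaped traffic behaves well behind a policer.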

Side note: In order to get good results with CAR, the burst size of the source traffic should be a bit smaller than the burst size configured with CAR; otherwise metering may be incorrect and too many packets will conform or exceed. That is why, in an ISP/customer environment, the customer should use shaping with CIR and Bc values in accordance with the service contracted with the ISP. So the customer's shaping Bc and CIR should match or be less than the ISP's policing Bc and CIR.

Legacy CAR for admission control

So let's configure legacy CAR on R2 (considered the ISP router in this scenario) for admission control with a CIR of 256 Kbps (the rate guaranteed by the ISP). Conforming traffic going to the loopback interface of R6 is transmitted and marked with an IP precedence of 6, while exceeding traffic is remarked with an IP precedence of 0.

The command format for configuring legacy CAR is the following:

rate-limit {input | output} [access-group [rate-limit] acl-index | dscp dscp-value | qos-group qos-group-number] bps burst-normal burst-max conform-action conform-action exceed-action exceed-action

So let's configure the following on R2:

We use a burst size of 4000 Bytes (Bc, that is 32000 bits), which results in the following Tc. From Bc=CIR*Tc/1000:

Tc=(Bc/CIR)*1000=(32000/256000)*1000=125 ms.
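The same calculation can be done for any policer burst size. In this Python sketch (the function name is an illustrative assumption) the burst is given in Bytes, as in the rate-limit command, and converted to bits before dividing by the CIR.

```python
def averaging_interval_ms(bc_bytes, cir_bps):
    """Policer averaging interval Tc in ms, from Tc = Bc / CIR."""
    return bc_bytes * 8 * 1000 / cir_bps

print(averaging_interval_ms(4000, 256_000))   # 125.0
print(averaging_interval_ms(5000, 256_000))   # 156.25
```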

That means that the policer configured on R2 will measure the average rate received over a period of 125 ms and compare the result to the amount of Bc Bytes present in the Bc bucket. If the amount of Bytes measured over this interval is less than or equal to 4000 Bytes, the traffic flow conforms and all conforming packets will be remarked with an IP precedence of 6; otherwise the traffic flow exceeds and all the exceeding packets will be remarked with an IP precedence of 0.

In order to test this scenario, let's configure R1 to shape traffic destined to the loopback of R6 with a CIR of 256 Kbps and a burst of 4000 Bytes, which corresponds to the shaper running every 125 ms and effectively matches the ISP policer values.

Now let´s send a flood of ICMP traffic from R5 to the loopback of R6:

Let´s verify the traffic rate achieved on R2:

So we are effectively achieving the desired rate (256 Kbps). Now let´s check the policer statistic on R2:

So most of the packets are conforming. A few packets are exceeding, and this is due to small variations in inter-packet gaps. To resolve this we could use a higher Bc value on R2. Let's check the received traffic on R6 and see if the admission control has been done correctly on R2:

So most of the packets are conforming to an average rate of 252 Kbps. Also note that the conforming packets are marked with an IP precedence value of 6.

Let's try to increase the burst value on R2, which will automatically increase the sliding window Tc for better averaging of the input packet rate. Let's try a Bc of 5000 Bytes instead, resulting in a Tc of about 156 ms.

So now, not a single packet exceeds which we can also confirm on R6 as all the packets received are now marked with IP precedence 6:

On the contrary, if we were to configure a lower burst size on R2, the average rate monitored inbound would not be sufficiently accurate and we would see many packets exceeding. So in order to achieve correct metering, the CAR burst size should be greater than the network burst.

Let's now talk about using legacy CAR for rate limiting.

Legacy CAR for Rate Limiting

CAR can be used for rate limiting by setting the exceed action to drop. The challenge with CAR is to find the right burst value (Bc) when the incoming traffic is not already shaped. Cisco recommends a Bc of 1.5 seconds' worth of traffic at the CIR (Bc=1.5*CIR, converted to Bytes) and then testing performance with a growing window. When we can predict the flow pattern, for example a uniform traffic pattern (say, always 2 packets separated by 1 Tc) such as the one produced by TS as we have seen before, it is easy to find the right Bc value and the average metered rate will be realistic. With bursty traffic, however (packets sent in batches and spaced by silent intervals), it is hard to know how long (which Tc) the router has to average for: should it be 200 ms, 500 ms? The recommended Bc=1.5*CIR works well if you have a very large RTT, like 1.5 sec; in real scenarios you may have to find the right Bc value empirically.

Side note: Actually a higher Bc will allow for unstable traffic rate while a smaller Bc is good for stable flows

Moreover, CAR does not work well with TCP and may introduce TCP “global synchronization” problems, where all TCP flows get synchronized and restart with a low window size at the same time, ending up with a lower effective bandwidth. To reduce this issue, CAR uses the Be value (which has nothing to do with the Be value we have seen so far) to achieve a random drop on TCP flows, similar to RED (Random Early Detection) congestion avoidance behavior. If Be equals Bc, Be is disabled. Cisco recommends Be=2*Bc.
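The two recommendations can be turned into numbers with a short Python sketch. It assumes the usual reading of the guideline, namely a normal burst of 1.5 seconds of traffic at the CIR expressed in Bytes and an extended burst of twice that value; the function name is hypothetical.

```python
def recommended_car_bursts(cir_bps):
    """Return (normal burst, extended burst) in Bytes.

    Assumed guideline: Bc = CIR / 8 * 1.5 seconds, Be = 2 * Bc.
    """
    bc_bytes = cir_bps / 8 * 1.5
    be_bytes = 2 * bc_bytes
    return bc_bytes, be_bytes

print(recommended_car_bursts(256_000))  # (48000.0, 96000.0)
```

These are only starting values; as noted above, the right burst for unshaped traffic usually has to be tuned empirically.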

So let's simulate some HTTP traffic between R6 (HTTP server) and R5 (HTTP client), with R2 configured to rate-limit to 256 Kbps with a Bc of 1.5*CIR=48 000 Bytes, which results in an averaging interval of 1500 ms.

So we can clearly see that many packets are marked as exceeding due to the burstiness of TCP and therefore the unpredictability of the burst pattern.

Side note: In order for TCP to work well with CAR, TCP traffic should be pre-shaped before it is sent out; then the CAR policer configured on the other end will meter the correct average rate based on the uniform TCP bursts achieved thanks to TS. This avoids ending up with bad performance and lower effective bandwidth utilization.

Alright, now that we have seen the legacy methods for both TS and TP, let's look at the new methods, which combine MQC (Modular QoS CLI) with GTS for TS and with CAR for TP.

MQC Class-Based Generic Traffic Shaping

In pre-HQF IOS the queuing mechanism of MQC TS is still WFQ, which means that each flow in the shaper queue is assigned a weight based on the IP precedence value located in the layer 3 IP header. But now the classification is based on class-maps, whereas before it was based on ACLs. So the classification is much more powerful and flexible, as you can now use NBAR, for example, to define the shaped traffic.

So let´s configure R1 to shape HTTP traffic going to R6´s loopback to 256Kbps. Other traffic like ICMP for example going to R6´s loopback should be shaped to 512Kbps. Let´s configure the following MQC class-based GTS on R1:

Now let´s generate HTTP traffic from R5 to R6´s loopback and ICMP traffic from R4 to R6´s loopback:

Let´s check the shaping queues on R1 (one WFQ queue per MQC class defined):

Side note: With the HQF QoS system the output above will not be available, as HQF uses a FIFO queuing mechanism for shaping.

The available bandwidth for each WFQ queue corresponds to the configured shaping rate. We can see that, as with legacy GTS, MQC-based GTS uses the full packet size including the L2 overhead to limit the average rate. So for ping packets, for example, 14 Bytes of Ethernet header are added to the original packet size of 1000 Bytes for the rate limiting calculation.

And let´s verify the achieved rate on R2:

So R2 is considering a bigger HTTP packet size (60 Bytes) when measuring the HTTP traffic, and I don't yet know why. We saw on R1 that the measured HTTP packet size is 54 Bytes, that is to say 40 Bytes of payload + 14 Bytes of Ethernet header. That is also why we see a higher average rate on R2 (284 Kbps) than the shaping rate configured on R1 (256 Kbps) for HTTP traffic, as R2 measures the average rate based on a packet size of 60 Bytes and not 54 Bytes. As for ICMP, the desired shaping rate is almost achieved (511 Kbps vs 512 Kbps).
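The packet size mismatch accounts for the measured rate almost exactly, as this small Python check suggests. It simply scales the shaped rate by the ratio of the two packet sizes; the function name is illustrative, and the cause of the 60-Byte size itself is not modeled (one possibility is Ethernet minimum-frame padding, but that is an assumption).

```python
def metered_rate_bps(shaped_bps, shaper_pkt_bytes, meter_pkt_bytes):
    """Rate seen by a meter that counts a different size per packet
    than the shaper does, for the same packets per second."""
    return shaped_bps * meter_pkt_bytes / shaper_pkt_bytes

# R1 shapes 54-Byte packets to 256 Kbps, R2 counts them as 60 Bytes
print(round(metered_rate_bps(256_000, 54, 60) / 1000))  # 284 (Kbps)
```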

Side Note: CBWFQ uses WFQ as the shaping queue mechanism while HQF uses a FIFO queue.

Side Note 2: Also note that in HQF the class-default no longer runs WFQ by default but FIFO instead. Also, the class-default has a 1 % minimum bandwidth guarantee and can use the remaining bandwidth left unused by the user-defined classes. That means that traffic in the class-default will not starve under HQF as it could with CBWFQ, where the weight allocated to class-default traffic was much higher than the weight assigned to traffic in user-defined classes.

Alright, now let´s talk about the new method for TP in combination with MQC.

MQC Single-Rate Three-Color Policer

A Single Rate Three Color Policer or “srTCM” is the RFC-based implementation of the metering process.

Compared to CAR, the Be value is now used to represent credit accumulated during periods of inactivity, and it matches TS perfectly, as srTCM now uses 2 token buckets. The second one represents Be. So if there are still credits in the Bc token bucket, these extra credits will go into the Be bucket, thus achieving more precise metering.

Three colors means that entering traffic is averaged over Tc and compared to the configured CIR and marked as follows with the following colors: Green (conform), Yellow (Exceed), Red (Violate).
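The three-color decision can be sketched in Python along the lines of the color-blind srTCM algorithm from RFC 2697. This is a simplified model rather than the IOS implementation: both buckets start full, new credit fills the Bc bucket first and only the overflow goes to Be, and the name `srtcm` and the example values are assumptions.

```python
def srtcm(arrivals, cir_bps, bc_bytes, be_bytes):
    """Color packets green (conform), yellow (exceed) or red (violate).

    arrivals: list of (time_in_seconds, size_in_bytes), in time order.
    """
    tc, te = float(bc_bytes), float(be_bytes)   # both buckets start full
    last = 0.0
    colors = []
    for t, size in arrivals:
        credit = (t - last) * cir_bps / 8
        last = t
        to_bc = min(credit, bc_bytes - tc)           # fill Bc first
        tc += to_bc
        te = min(be_bytes, te + (credit - to_bc))    # overflow spills into Be
        if size <= tc:
            tc -= size
            colors.append("green")
        elif size <= te:
            te -= size
            colors.append("yellow")
        else:
            colors.append("red")
    return colors

# Five 1000-Byte packets arriving at once, Bc = Be = 2000 Bytes
print(srtcm([(0.0, 1000)] * 5, 512_000, 2000, 2000))
# ['green', 'green', 'yellow', 'yellow', 'red']
```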

To illustrate this method let´s configure the following on R2:

R2 will police ICMP and HTTP traffic to the agreed CIR, which is 256 Kbps for HTTP and 512 Kbps for ICMP. Any conforming ICMP traffic (average rate measured over Tc<=Bc) will be transmitted and remarked with an IP precedence of 4, while any conforming HTTP traffic will be remarked with an IP precedence of 6. Any exceeding ICMP traffic will be dropped, and any exceeding HTTP traffic (average rate measured over Tc>Bc) will be remarked with IP precedence 1.

R1 will shape as we did before with the same CIR values and a Bc for each class of 16000 bits to match the policer values of R2.

R6 will measure the different average rate achieved for ICMP and HTTP. So let´s generate ICMP traffic from R4 and HTTP traffic from R5 as before and let´s check the policer on R2:

So ICMP traffic is not exceeding at all and HTTP traffic is exceeding as we saw in the previous example. Let´s check the result on R6:

Let´s now talk about the last policer method for CB policing:

MQC Dual-Rate Three-Color Policer

Dual rate means that the policer meters the traffic using two token buckets at the same time, and each token bucket is refilled independently. One bucket represents Bc and the other Be. trTCM (two-rate three-color marker) is generally used by ISPs for oversubscription scenarios.

Dual rate allows sustained excess bursting, while with single rate excess bursting is allowed only as long as there are Be tokens in the Be bucket (so only if Bc has spilled over into Be). With dual rate, the Be bucket does not rely on spillage from filling the Bc bucket.

The first rate represents the CIR and the second rate represents the PIR (Peak Information Rate), where the customer is allowed to burst above the CIR up to the PIR but with no traffic guarantee. The configuration is almost the same as before, but you have to add the keyword pir after the CIR. So for example if we wanted to police with a peak rate of 1024 Kbps and a CIR of 512 Kbps, we would use the following command:

Exceeding traffic means that the average metered rate is above the CIR but below the PIR (above Bc but below Be), while violating traffic means that the average rate is above the PIR (Bc+Be).
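A dual-rate decision can be sketched in Python following the color-blind trTCM algorithm from RFC 2698. Again this is a simplified model and not the IOS code: the two buckets start full and are refilled independently at the CIR and at the PIR, and the name `trtcm` and the burst sizes are assumptions.

```python
def trtcm(arrivals, cir_bps, pir_bps, cbs_bytes, pbs_bytes):
    """Color packets green (conform), yellow (exceed) or red (violate).

    arrivals: list of (time_in_seconds, size_in_bytes), in time order.
    """
    tc, tp = float(cbs_bytes), float(pbs_bytes)  # both buckets start full
    last = 0.0
    colors = []
    for t, size in arrivals:
        elapsed = t - last
        last = t
        # each bucket is refilled independently at its own rate
        tc = min(cbs_bytes, tc + elapsed * cir_bps / 8)
        tp = min(pbs_bytes, tp + elapsed * pir_bps / 8)
        if size > tp:
            colors.append("red")       # above the PIR
        elif size > tc:
            tp -= size
            colors.append("yellow")    # above the CIR, within the PIR
        else:
            tp -= size
            tc -= size
            colors.append("green")
    return colors

# Five 1000-Byte packets at once, CIR 512 Kbps, PIR 1024 Kbps
print(trtcm([(0.0, 1000)] * 5, 512_000, 1_024_000, 2000, 4000))
# ['green', 'green', 'yellow', 'yellow', 'red']
```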

With the single rate policer it was only possible to meter with one Tc as there is one token bucket while here two Tc average intervals are used to meter both rates (CIR and PIR).

This Dual-Rate policer fits perfectly with MQC peak shaping.

MQC Peak Shaping

Peak shaping uses two token buckets (Bc and Be) and allows sending Bc+Be in each Tc interval. It is generally used in oversubscription scenarios and fits perfectly with the dual rate policer. Configuration-wise, the following command format is used:

shape peak <CIR> <Bc> <Be>, where Be=Bc if you omit Be, which gives a default PIR of 2*CIR.
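The relation between the peak rate and the burst values can be written as PIR=CIR*(1+Be/Bc), since Bc+Be is sent in each Tc instead of Bc alone. The Python sketch below applies it; the function name and the Bc of 8000 bits are only illustrative.

```python
def peak_rate_bps(cir_bps, bc_bits, be_bits=None):
    """Peak rate achieved when sending Bc + Be per Tc.

    If Be is not given it is taken equal to Bc.
    """
    if be_bits is None:
        be_bits = bc_bits
    return cir_bps * (1 + be_bits / bc_bits)

print(peak_rate_bps(512_000, 8000))        # 1024000.0, that is 2 * CIR
print(peak_rate_bps(512_000, 8000, 4000))  # 768000.0
```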

So I will stop here regarding QoS TP and TS on Cisco router.

Thanks for reading.

/Laurent

Comments (14)

Daniel

February 8, 2013 at 17:58

Great post in helping me understand the concept, better explained than those QOS books.
- Laurent Prat
  
  May 30, 2013 at 12:04
  
  Thanks a lot Daniel!
  /Laurent
Steve

April 10, 2013 at 04:37

This post is amazing, saved me hours and $$$
- Laurent Prat
  
  May 30, 2013 at 12:04
  
  Thanks Steve!
  /Laurent
Jey

May 30, 2013 at 11:37

Very Good article, explaining the differences. Great help !!
- Laurent Prat
  
  May 30, 2013 at 12:03
  
  Thanks!
Khalid

September 6, 2013 at 22:25

Laurent,

Thanks alot for this article which really felt like getting more knowledgeable on what’s going on with routers.

1 question please. I wanna know really on how to reduce my latency which can make me pc running smooth and feel faster in playing online game for example Tera Europe. i am playing from middleast country and the game server located at Germany.
above that, i am using fiber optic connection 10mb d/l; 1mb u/l ( with cisco router).
so is there a way to route out my latency to make it lower?.
Many thanks in advance,
Khalid
david

September 30, 2013 at 10:02

great post laurent, i don’t know how should i thank you for these valuable info, went through them all
- Laurent Prat
  
  October 7, 2013 at 15:04
  
  Thanks, glad it helped.
  
  Regards,
  Laurent
DuyLinh

November 16, 2013 at 15:12

HI
i have a question for you . suppose shaping occur. a packet with 1000 byte (8000 bit) come into router but router have only 2000 bit for Bc and 2000 bit for Be. what happen with router ? Please help to understand about it. Thank you
prashantm108

March 19, 2015 at 09:29

thanks for your post
Vadim

July 14, 2015 at 11:00

Hi,
could somebody tell me Why Legasy Shaping, Legacy CAR is called Legacy?
Tunrayo

August 26, 2015 at 12:03

Nice post, but I have a question – just to confirm my understanding….

To go back to description of AR say for a Fast ethernet port (100Mbps). If you apply TS as you describe with a CIR of 128 Kbps and Tc=10ms. This means you are sending 1280 bits every 10ms.

The question is, will this 1280 bits be sent/serialized at 100Mbps? Meaning the duration of the transmission will be 12.8 micro seconds.

So every assuming there is a constant flow of 128kbps, this means every 10ms there will be a pulse of data with duration 12.8 micro seconds.
Arun

January 15, 2016 at 06:10

I have a question. Does TC function same both in Shaping & policing .