
Constant Bit Rate Streaming - Practical Implementations

Constant Bit Rate (CBR) streaming is the transmission of data at a fixed rate (measured in bits per second). CBR is an important concept in data compression and in network applications such as streaming media. Of particular importance for real-time media is a network's ability to sustain a data rate that allows for high quality voice and video. Network quality can vary widely, and a poor network can introduce packet loss and congestion that degrade voice and video.

Firebind employs UDP CBR streaming to simulate a variety of real-time traffic such as voice and video (e.g. VOIP, MPEG4). There are various approaches to generating a CBR stream, including bucket algorithms and packet scheduling (as used in NS3 [1]). A few of these approaches are examined here. An example implementation of each approach (used for obtaining the results below) is available on GitHub [2].

Send and Compare

A simple approach is to send data and compare the instantaneous rate to the target rate: if the instantaneous rate is lower than the target, keep sending. This is the way the popular iperf measurement tool works [3]. Here is the basic pseudocode:

  start_time_nanos = now()
  byte_count = 0
  while (not done) {
    rate = byte_count * 8 / ((now() - start_time_nanos) / 10^9)
    if (rate < target_rate) {
      byte_count += send(datagrams)
    }
  }

Pseudocode for a Send and Compare approach (used in iperf3)

This approach does not make good use of the CPU: many iterations of the loop will not send data, especially at lower data rates, so the busy loop can pin a single core at full utilization for the duration of the test. Additionally, the rate calculation does not account for any UDP or IP header byte counts.
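
To make this concrete, here is a minimal runnable Java sketch of the Send and Compare loop. It is an illustration of the approach rather than iperf's actual implementation, and the receiver address 192.0.2.10:5000, target rate, and 10 second duration are placeholder values:

  import java.net.InetSocketAddress;
  import java.nio.ByteBuffer;
  import java.nio.channels.DatagramChannel;

  // Sketch of the Send and Compare approach: keep sending while the
  // rate measured since the start is below the target.
  public class SendAndCompare {
    public static void main(String[] args) throws Exception {
      final long targetRateBps = 1_000_000L;         // target rate in bits per second
      final int payloadSize = 200;                   // UDP payload bytes per datagram
      final long durationNanos = 10_000_000_000L;    // run for 10 seconds

      // Hypothetical receiver address, for illustration only
      InetSocketAddress receiver = new InetSocketAddress("192.0.2.10", 5000);
      try (DatagramChannel channel = DatagramChannel.open()) {
        channel.connect(receiver);
        ByteBuffer payload = ByteBuffer.allocate(payloadSize);

        long startNanos = System.nanoTime();
        long byteCount = 0;
        while (System.nanoTime() - startNanos < durationNanos) {
          long elapsedNanos = Math.max(1, System.nanoTime() - startNanos);
          // bits sent so far divided by elapsed seconds
          double rateBps = byteCount * 8.0 * 1_000_000_000.0 / elapsedNanos;
          if (rateBps < targetRateBps) {
            payload.clear();
            byteCount += channel.write(payload);  // payload bytes only; UDP/IP headers not counted
          }
          // else: spin until the measured rate falls below the target again
        }
      }
    }
  }

Java sketch of the Send and Compare loop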

| Rate (bps) | Datagram Size (bytes) | Datagrams/cycle | Result (bps) | Accuracy |
|---:|---:|---:|---:|---:|
| 10,000 | 200 | 1 | 10,161 | 98.39% |
| 1,000,000 | 200 | 71 | 1,011,406 | 98.86% |
| 100,000,000 | 200 | 71 | 100,015,791 | 99.98% |
| 1,000,000,000 | 200 | 71 | 726,879,468 | 72.69% |
| 1,000,000,000 | 1200 | 13 | 938,451,573 | 93.85% |
| 10,000,000,000 | 1200 | 13 | 938,829,083 | 9.39% |

Results for Send and Compare CBR approach

Some notes about these results:

  • These were obtained by taking the middle value of 3 runs of the CBR transmitter each of which had an overall duration of 10 seconds.
  • The result column is the rate observed at the receiver. The actual transmit rate as measured by the transmitter is almost always higher (especially for UDP).
  • The accuracy column indicates how closely the rate observed at the receiver matches the target rate.
  • The maximum efficiency of a Gigabit Ethernet link is around 94% [4]. Note that the receiver-side rate measurement for 1200 byte datagrams approaches this value (a back-of-envelope calculation follows this list).
  • Results for 1 Gbps with 200 byte datagrams are less accurate than for 1200 byte datagrams, likely because 1200 bytes results in better utilization of the MTU for my setup.
  • Results for 10 Gbps are understandable given the Gigabit link is only capable of 1 Gbps.
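
As a rough cross-check of that ceiling (my own numbers, not taken from the reference): assuming 20 bytes of IP header, 8 bytes of UDP header, and the usual 38 bytes of Ethernet framing overhead per frame (header, FCS, preamble, and inter-frame gap), a 1200 byte payload occupies about 1266 bytes on the wire, so

$$efficiency \approx {1200 \over 1200 + 8 + 20 + 38} \approx 94.8\%$$

which is roughly in line with the receive-side results of about 938-939 Mbps for the 1200 byte runs.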

Send and Sleep

Another approach is Send and Sleep, where we send a calculated number of datagrams and then pause (sleep) just long enough that the combined duration of the send and sleep operations equals the period of one cycle at the target data rate. The algorithm, pseudocode, and results are presented below.

This approach requires calculating a period associated with a data rate, given a specific (possibly variable) amount of data sent each cycle. Mathematically the period is the reciprocal of the frequency, but simply inverting the data rate would give the period for sending a single bit per cycle, which is neither practical nor possible (the UDP header alone is 8 bytes). Moreover, a longer period is desirable for this approach so that timer resolution has less of an effect on the pause (sleep) timing.

Given a desired data rate, datagram payload size, and datagrams per cycle, we can calculate a period (cycle time) as follows:

$$bits\ per\ cycle = {8 * datagrams\ per\ cycle * datagram\ payload\ size\ in\ bytes}$$

$$period = {bits\ per\ cycle \over data\ rate\ in\ bps}$$

An example calculation for 10k bps, 1 datagram per cycle, and a 200 byte datagram gives 1600 bits per cycle (8*1*200) and a period of 0.16 seconds (1600/10000). A practical improvement would be to include the IP and UDP headers (20 and 8 bytes) in the calculation, giving 1824 bits per cycle and a period of about 0.18 seconds.
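
A small Java helper for this calculation might look like the sketch below; the 28 byte allowance for IPv4 plus UDP headers is my assumption, matching the discussion above:

  // Period in seconds for one send cycle at the given data rate.
  // Optionally accounts for 20 bytes of IP and 8 bytes of UDP header per datagram.
  static double periodSeconds(long dataRateBps, int payloadBytes,
                              int datagramsPerCycle, boolean includeHeaders) {
    int bytesPerDatagram = payloadBytes + (includeHeaders ? 28 : 0);
    long bitsPerCycle = 8L * datagramsPerCycle * bytesPerDatagram;
    return (double) bitsPerCycle / dataRateBps;
  }

  // periodSeconds(10_000, 200, 1, false) -> 0.16
  // periodSeconds(10_000, 200, 1, true)  -> 0.1824 (about 0.18)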

So for each cycle perform the sending of datagrams and then pause (sleep) for the remaining time in the cycle. The pseudocode looks like this:

  datagrams_per_cycle = 1 
  bits_per_cycle = payload_size * datagrams_per_cycle * 8
  period = bits_per_cycle / data_rate
  while (not done) {
    send_start_time = now()
    send(datagrams)
    send_duration = now() - send_start_time
    sleep(period - send_duration)
  }

Pseudocode for send and sleep approach

The inputs for this approach are:

  • Data Rate - Rate to perform CBR stream, in bits per second (bps).
  • Payload Size - Size of UDP datagram payload in bytes. Typical values are below the MTU and chosen based on the source of the traffic. Some typical VOIP payloads average around 200 bytes.
  • Datagrams Per Cycle - The number of datagrams to send each period.

Not shown above are possible inputs that would bound the CBR streaming activity. An overall duration or maximum byte count could be employed to stop transmission. In practice a loop like this would be fed by a queue or buffer with content ready to be transmitted (e.g. from a packet generator or input device driver).
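
A minimal runnable Java sketch of such a loop is shown below. It is my own illustration built on the pseudocode above (not the GitHub implementation), and the receiver address 192.0.2.10:5000, rate, burst size, and fixed 10 second duration are placeholder values:

  import java.net.InetSocketAddress;
  import java.nio.ByteBuffer;
  import java.nio.channels.DatagramChannel;

  // Sketch of the Send and Sleep approach: send a burst of datagrams each
  // cycle, then sleep for whatever remains of the calculated period.
  public class SendAndSleep {
    public static void main(String[] args) throws Exception {
      final long dataRateBps = 1_000_000L;         // target rate in bits per second
      final int payloadSize = 200;                 // UDP payload bytes per datagram
      final int datagramsPerCycle = 71;            // burst size per period
      final long durationNanos = 10_000_000_000L;  // stop after 10 seconds

      long bitsPerCycle = 8L * payloadSize * datagramsPerCycle;
      long periodNanos = bitsPerCycle * 1_000_000_000L / dataRateBps;

      // Hypothetical receiver address, for illustration only
      InetSocketAddress receiver = new InetSocketAddress("192.0.2.10", 5000);
      try (DatagramChannel channel = DatagramChannel.open()) {
        channel.connect(receiver);
        ByteBuffer payload = ByteBuffer.allocate(payloadSize);

        long startNanos = System.nanoTime();
        while (System.nanoTime() - startNanos < durationNanos) {
          long sendStart = System.nanoTime();
          for (int i = 0; i < datagramsPerCycle; i++) {
            payload.clear();
            channel.write(payload);
          }
          long sendDuration = System.nanoTime() - sendStart;
          long sleepNanos = periodNanos - sendDuration;
          if (sleepNanos > 0) {
            Thread.sleep(sleepNanos / 1_000_000L, (int) (sleepNanos % 1_000_000L));
          }
        }
      }
    }
  }

Java sketch of the Send and Sleep loop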

The Datagrams Per Cycle and Payload Size inputs drive the period sizing by determining the bits to send per cycle. A small bit count keeps the period small for slow data rates (in the 10k bps range). For example, a 10k bps data rate with 16K bytes worth of datagrams per cycle results in a roughly 13 second period (8 * 16K bytes / 10k), which is awkward if you only want to run the entire CBR transmission for 10 seconds. So depending upon the traffic pattern a maximum period value may be in order. The average period should be below a second for VOIP applications, although cases with limited computing power may require larger periods and longer overall testing times.

For this example and the corresponding test results I have used a calculation with a maximum period of 250 ms. Increasing this value to fill the socket send buffer yields only mildly better results at higher data rates. To size each cycle's burst for a 16K byte socket buffer (a popular default on Linux) you can do this:

$$datagrams\ per\ cycle = {16384 \over (payload\ size + 20 + 8)}$$

This is only an estimate: the 20 and 8 above represent the IP and UDP header sizes, and the IP header in particular can vary. The example code on GitHub uses a more sophisticated calculation that puts a ceiling on the period for small data rates.
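
A sketch of that sizing logic in Java (my own simplification, not the exact GitHub calculation), assuming a 16K byte socket buffer and a 250 ms period ceiling:

  // Burst sizing: fill a 16K byte socket send buffer (counting 20 bytes of IP
  // and 8 bytes of UDP header per datagram), but never let the resulting
  // period exceed 250 ms.
  static int datagramsPerCycle(long dataRateBps, int payloadBytes) {
    final int socketBufferBytes = 16 * 1024;   // popular Linux default
    final double maxPeriodSeconds = 0.250;     // period ceiling for slow rates

    int fromBuffer = socketBufferBytes / (payloadBytes + 20 + 8);
    // Largest burst whose period still fits under the ceiling
    int fromPeriod = (int) (maxPeriodSeconds * dataRateBps / (8.0 * payloadBytes));
    return Math.max(1, Math.min(fromBuffer, fromPeriod));
  }

For the rate and payload combinations used in the results tables, this sketch happens to yield the same Datagrams/cycle values (1, 71, and 13).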

The values derived from these inputs are:

  • Bits Per Cycle - The number of bits to be sent per cycle.
  • Period - The average duration a cycle should take to achieve the target data rate.

As mentioned, an example implementation for UDP is available on GitHub [2]. I used the example implementation along with a laboratory test rig composed of two 64-bit Ubuntu 14.04.4 LTS machines (kernel version 3.13.0-86), one acting as the transmitter and the other as the receiver. Both have Gigabit network interfaces connected by a ZyXEL GS1100 Gigabit switch. The receiver was configured with a UDP listener that measures the incoming data rate (the receiver is included in the example implementation). No special configuration of these components was employed.

Given this, I have tested a few data rates and recorded the performance:

| Rate (bps) | Datagram Size (bytes) | Datagrams/cycle | Result (bps) | Accuracy |
|---:|---:|---:|---:|---:|
| 10,000 | 200 | 1 | 10,161 | 98.39% |
| 1,000,000 | 200 | 71 | 1,011,399 | 98.86% |
| 100,000,000 | 200 | 71 | 99,934,183 | 99.93% |
| 1,000,000,000 | 200 | 71 | 726,826,893 | 72.68% |
| 1,000,000,000 | 1200 | 13 | 939,121,731 | 93.91% |
| 10,000,000,000 | 1200 | 13 | 939,231,796 | 9.39% |

Results for the Send and Sleep CBR Approach

Some notes about these results:

  • The results closely match the Send and Compare approach; no improvement is evident.

  • The Datagrams Per Cycle number was calculated using a formula involving a maximum period constraint.

  • The maximum efficiency of a Gigabit Ethernet link is around 94% [4]. Note that the receiver-side rate measurement for 1200 byte datagrams is approaching this value.

  • Results for 1 Gbps with 200 byte datagrams are less accurate than for 1200 byte datagrams, likely because 1200 bytes results in better utilization of the MTU for my setup. Results for 10 Gbps are understandable given the Gigabit link is only capable of 1 Gbps.

This is of course a simplification of the algorithm. Depending upon the sleep mechanism, this approach could result in better CPU utilization than the busy loop in Send and Compare. For a practical implementation there are some things to consider:

  • Timestamp Resolution - A timestamp resolution fine enough for larger data rates. For example, a 100M bps rate with 16K bytes worth of datagrams per cycle results in a 1.3 ms period (8*16K/100M), so a timer resolution below a millisecond is required. Rates in the area of 100 Gbps will require nanosecond resolution.

  • Sleep Function - How accurate is the sleep function, from low nanosecond times up to seconds? (A probe sketch follows this list.)

  • Datagrams Per Cycle - How to calculate the amount of data to send per cycle.

  • Buffer Effects - Sizing application and socket buffers to optimize transmission at the target data rate.
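
On the sleep function point, one way to gauge accuracy on a given platform is to request short pauses and measure what you actually get. A quick probe in Java (using LockSupport.parkNanos here; Thread.sleep could be substituted) might look like this:

  import java.util.concurrent.locks.LockSupport;

  // Rough probe of sleep accuracy: request nanosecond-scale pauses and
  // measure how long each pause actually took.
  public class SleepProbe {
    public static void main(String[] args) {
      long[] requestedNanos = {10_000L, 100_000L, 1_000_000L, 10_000_000L};
      for (long requested : requestedNanos) {
        long start = System.nanoTime();
        LockSupport.parkNanos(requested);   // may overshoot, depending on OS timer resolution
        long actual = System.nanoTime() - start;
        System.out.printf("requested %,d ns, actual %,d ns, overshoot %,d ns%n",
            requested, actual, actual - requested);
      }
    }
  }

If the overshoot is large relative to the period, a hybrid approach (sleep for most of the period, then busy-wait on the timestamp for the remainder) is one option.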

Other Thoughts

Some things to consider here are:

  • There are several buffers that a CBR stream will encounter on its way to the destination. There is not much control over how these affect your data rate, as many of them are off-board, located on ISP and backbone network gear.

  • Socket buffer sizing can sometimes be adjusted by the application or through OS settings. Be sure to consider both socket buffers: the send buffer on the source machine and the receive buffer on the target machine. Typical defaults are in the neighborhood of 16K bytes, so the examples here are based on that. (A sketch follows this list.)

  • Datagram size and how many datagrams are sent at once (application buffering) can influence a buffer's decision to flush now or later.

  • Memory models for any virtualized machine or layer can impose their own buffering (e.g. Java’s non-direct ByteBuffer or other machine and software virtualization layers).

  • Real-life traffic varies in datagram payload size, so implementing a variable payload size may better simulate actual traffic patterns.

  • Other factors on the network include QoS mechanisms such as class of service, differentiated services (DSCP), and various router configurations.
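
For the socket buffer point above, the application-side adjustment in Java looks roughly like this (256K bytes is an arbitrary example value; the OS may clamp the request, e.g. via net.core.wmem_max on Linux):

  import java.net.StandardSocketOptions;
  import java.nio.channels.DatagramChannel;

  // Sketch: request larger send/receive socket buffers and read back
  // what the OS actually granted.
  public class SocketBufferSizing {
    public static void main(String[] args) throws Exception {
      try (DatagramChannel channel = DatagramChannel.open()) {
        channel.setOption(StandardSocketOptions.SO_SNDBUF, 256 * 1024);
        channel.setOption(StandardSocketOptions.SO_RCVBUF, 256 * 1024);
        System.out.println("SO_SNDBUF = " + channel.getOption(StandardSocketOptions.SO_SNDBUF));
        System.out.println("SO_RCVBUF = " + channel.getOption(StandardSocketOptions.SO_RCVBUF));
      }
    }
  }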

Conclusion

Firebind employs a similar mechanism for its synthetic voice testing. When combined with Firebind's Protocol Script technology, the CBR stream can carry any payload, such as G.711 encoded voice, resulting in network traffic indistinguishable from a real VOIP call. As part of a continuous monitoring program the Firebind system performs packet counting, rate recording, and error handling to accurately measure packet loss, latency, jitter, and other quality metrics.

References

  1. Network Simulator 3 (NS-3) - https://www.nsnam.org/ns-3-26/ (see onoff-application.cc, OnOffApplication::ScheduleNextTx(), which schedules the next packet on a size-divided-by-rate basis).

  2. Example implementation of these approaches on GitHub - https://github.com/scancmdr/constant-bitrate-transmitter

  3. iperf3: A TCP, UDP, and SCTP network bandwidth measurement tool - https://github.com/esnet/iperf (see iperf_check_throttle() in iperf_api.c).

  4. Rickard Nobel, "Actual throughput on Gigabit Ethernet" - http://rickardnobel.se/actual-throughput-on-gigabit-ethernet/