Wednesday 26 February 2014

Traffic control for multimedia devices


Multimedia devices traditionally don't manage the network bandwidth required by applications. This causes a problem when users try to watch a streaming video or listen to a web radio seamlessly while other applications are downloading other content in the background. The background downloads can use too much bandwidth for the streaming video or web radio to keep up and users notice unnecessary interruptions in the playback.

I have been working on an approach to improve this using traffic control on Linux. This work was sponsored by Collabora.

What is traffic control


Traffic control is a technique to control network traffic in order to optimise or guarantee performance, low latency, and/or bandwidth. This includes deciding which packets to accept at what rate on an input interface and determining which packets to transmit in what order and at what rate on an output interface.

On Linux, applications can send the traffic control configuration to the kernel using a Netlink socket with the NETLINK_ROUTE protocol. By default, traffic control on Linux consists of a single queue which collects entering packets and dequeues them as quickly as the underlying device can accept them. The tc tool (from the iproute2 package) and the more recent "nl-*" tools (part of libnl) are different implementations, but both can be used to configure traffic control. Libnl has incomplete support for traffic control but is in active development and progressing quickly.
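
To give a rough idea of what travels over that Netlink socket, here is a minimal sketch that packs a request asking the kernel to list the configured qdiscs. It talks to the kernel with a raw socket purely for illustration; tools such as tc or libnl wrap this for you. The function names are made up for this example, and the numeric constants are the values from the Linux headers.

```python
import struct

# Constants from <linux/netlink.h> and <linux/rtnetlink.h>.
NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300   # NLM_F_ROOT | NLM_F_MATCH
RTM_GETQDISC = 38
AF_UNSPEC = 0


def build_getqdisc_request(seq: int = 1) -> bytes:
    """Pack an RTM_GETQDISC dump request: a nlmsghdr followed by a tcmsg."""
    # struct tcmsg: family (u8), padding (u8, u16), ifindex (s32),
    # handle, parent, info (u32 each) = 20 bytes.
    tcmsg = struct.pack("=BBHiIII", AF_UNSPEC, 0, 0, 0, 0, 0, 0)
    # struct nlmsghdr: len (u32), type (u16), flags (u16), seq (u32), pid (u32).
    length = 16 + len(tcmsg)
    header = struct.pack("=IHHII", length, RTM_GETQDISC,
                         NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + tcmsg


def dump_qdiscs_raw() -> bytes:
    """Send the request and return the first reply datagram (Linux only)."""
    import socket
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                       socket.NETLINK_ROUTE) as sock:
        sock.bind((0, 0))  # let the kernel pick our Netlink port id
        sock.send(build_getqdisc_request())
        return sock.recv(65535)
```

Reading the qdisc list should not need special privileges; changing the configuration does. Parsing the reply (a sequence of RTM_NEWQDISC messages with attributes) is left out to keep the sketch short.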

Difficulty of shaping ingress traffic


Traffic control and shaping come in two forms: the control of packets being received by the system (ingress) and the control of packets being sent out by the system (egress). Shaping outgoing traffic is reasonably straightforward, as the system is in direct control of the traffic sent out through its interfaces. Shaping incoming traffic is, however, much harder, as the decision on which packets to send over the medium is made by the sending side and can't be directly controlled by the system itself.

However, for multimedia devices, control over incoming traffic is far more important than control over outgoing traffic. Our use-case is ensuring glitch-free playback of a media stream (e.g. internet radio). In such a case, essentially, a minimal amount of incoming bandwidth needs to be reserved for the media stream.

For shaping (or rather influencing or policing) incoming traffic, the only practical approach is to put a fake bottleneck in place on the local system and rely on TCP congestion control to adjust its rate to match the intended rate as enforced by this bottleneck. With such a system it's possible to, for example, implement a policy where traffic that is not important for the current media stream (background traffic) can be limited, leaving the remaining available bandwidth for the more critical streams.

On Linux, ingress traffic control ("ingress qdisc" on the graph) happens before the Netfilter subsystem:
Netfilter-packet-flow
By Jengelh (Own work, Origin SVG PNG) [CC-BY-SA-3.0], via Wikimedia Commons



Difficulty of shaping on mobile networks


To complicate matters further, mobile systems are connected wirelessly to the internet and tend to move around, so it's not possible to know the total amount of available bandwidth at any specific time: it's constantly changing. This means that a simple strategy of capping background traffic at a static limit can't work.

The implemented solution


To cope with this dynamic nature, a traffic control daemon (tcmmd) has been implemented which can dynamically update the kernel configuration to match the current needs of the playback applications and adapt to the current network conditions. Furthermore, to address the issues mentioned above, the implementation uses the following strategy:

  • Split the traffic streams into critical traffic and background traffic. Police the incoming traffic by limiting the bandwidth available to background traffic with the goal of leaving enough bandwidth available for critical streams.
  • Instead of having a static configuration, let applications (e.g. a media player) indicate when the current traffic rate is too low for their purposes. This means the daemon doesn't have to actively measure the traffic rate, and it also allows the daemon to cope more naturally with streams that don't have a constant bitrate.
Communication between the traffic control daemon and the applications is done via D-Bus. The D-Bus interface allows applications to register critical streams by passing the standard 5-tuple (source IP and port, destination IP and port, and protocol), which uniquely identifies a stream, and to indicate when the bandwidth of a particular stream is too low.
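
As an illustration of how a registered 5-tuple can end up as a classifier rule, the sketch below turns one into the selector part of a tc u32 filter. The FiveTuple class and the u32_match function are hypothetical names for this example; they are not tcmmd's D-Bus API or its internal code. Only IPv4 is handled.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FiveTuple:
    """One stream, as seen in the headers of its incoming packets."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int = 6  # IPPROTO_TCP


def u32_match(stream: FiveTuple) -> List[str]:
    """Return the u32 selector arguments matching exactly this stream."""
    for ip in (stream.src_ip, stream.dst_ip):
        ipaddress.IPv4Address(ip)  # raises ValueError for anything but IPv4
    for port in (stream.src_port, stream.dst_port):
        if not 0 < port < 65536:
            raise ValueError(f"invalid port: {port}")
    return [
        "u32",
        "match", "ip", "protocol", str(stream.protocol), "0xff",
        "match", "ip", "src", f"{stream.src_ip}/32",
        "match", "ip", "sport", str(stream.src_port), "0xffff",
        "match", "ip", "dst", f"{stream.dst_ip}/32",
        "match", "ip", "dport", str(stream.dst_port), "0xffff",
    ]
```

The returned list would be appended to a "tc filter add dev ... parent ... protocol ip" command together with the flowid of the class reserved for critical streams.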

To allow the daemon to effectively control the incoming traffic, a so-called Intermediate Functional Block device (ifb0) is used as a virtual network device that provides an artificial bottleneck. This is done by transparently redirecting the incoming traffic from the physical network device through the virtual network device and shaping the traffic as it leaves the virtual device again. The reason for the redirection is to allow the kernel's egress traffic control to be used on incoming traffic. This results in the example setup shown below (with eth0 being a physical interface and ifb0 the accompanying virtual interface).
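
A setup of this kind can be sketched as a list of ip and tc commands. The helper below only builds the command strings (a dry run); actually running them needs CAP_NET_ADMIN. The class ids, the rates and the function name are assumptions chosen for the example, and the htb/sfq layout follows the description I give in the comments at the end of this post rather than tcmmd's exact rules.

```python
from typing import List


def ifb_setup_commands(phys: str = "eth0", ifb: str = "ifb0",
                       stream_kbit: int = 2000,
                       background_kbit: int = 100) -> List[str]:
    """Return the commands for an ifb-based ingress bottleneck (dry run)."""
    return [
        # Create the virtual device and bring it up.
        f"ip link add {ifb} type ifb",
        f"ip link set dev {ifb} up",
        # Attach the ingress qdisc and redirect every IPv4 packet to the ifb.
        f"tc qdisc add dev {phys} handle ffff: ingress",
        f"tc filter add dev {phys} parent ffff: protocol ip u32 "
        f"match u32 0 0 action mirred egress redirect dev {ifb}",
        # Shape on the egress side of the ifb; unclassified traffic
        # falls into the background class 1:20.
        f"tc qdisc add dev {ifb} root handle 1: htb default 20",
        f"tc class add dev {ifb} parent 1: classid 1:10 htb rate {stream_kbit}kbit",
        f"tc class add dev {ifb} parent 1: classid 1:20 htb rate {background_kbit}kbit",
        f"tc qdisc add dev {ifb} parent 1:10 handle 10: sfq",
        f"tc qdisc add dev {ifb} parent 1:20 handle 20: sfq",
    ]


if __name__ == "__main__":
    for command in ifb_setup_commands():
        print(command)
```

A per-stream u32 filter on the ifb device with "flowid 1:10" would then steer each registered stream into the critical class, and changing the background limit amounts to a "tc class change" on 1:20.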




To demonstrate the functionality described above, a simple demonstration media application using GStreamer (tcdemo) has been written that communicates with the traffic control daemon in the manner described.

Testing, the set-up


The traffic control feature in tcdemo can be enabled or disabled on the command line. This allowed me to compare the behaviour in both cases. 

On my left, I have a web server serving both the files for a video stream and the files for background downloads. On my right, I have a multimedia device rendering a video stream while downloading other files from the same web server.

Traffic control is only useful when the available bandwidth is limited. In order to have meaningful tests, I simulated a low bandwidth with the following commands on the web server:
tc qdisc add dev wlan0 root handle 1: cbq avpkt 1000 bandwidth 10Mbit
tc class add dev wlan0 parent 1: classid 1:1 cbq rate 3Mbit allot 1500 prio 3 bounded isolated
tc filter add dev wlan0 parent 1: protocol ip u32 match ip protocol 6 0xff match ip sport 80 0xffff flowid 1:1

Only the traffic from port 80/http was limited. It is important to note that the background traffic and the stream traffic were both going through the same bottleneck.

Tcdemo was playing a video file streamed over HTTP while 8 wget processes were downloading the same file continuously. The 9 connections were competing for the limited bandwidth. Without traffic control, tcdemo would not have got enough bandwidth.
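
A back-of-the-envelope calculation shows why. Assuming, as a simplification, that TCP ends up sharing the bottleneck equally between the connections, each one gets roughly a ninth of the 3Mbit limit configured above:

```python
def fair_share_kbit(link_kbit: float, connections: int) -> float:
    """Per-connection rate if the bottleneck is shared equally (idealised)."""
    if connections < 1:
        raise ValueError("need at least one connection")
    return link_kbit / connections


if __name__ == "__main__":
    # 3 Mbit/s shared by 1 stream + 8 background downloads.
    print(f"{fair_share_kbit(3000, 9):.0f} kbit/s per connection")
```

Any stream that needs more than that share to play in real time would stall. Real TCP flows are not perfectly fair, so this is only an order of magnitude.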

The following graph shows what happened with traffic control. The video streaming is composed of several phases:
  1. tcdemo opened the HTTP connection and its GStreamer pipeline started downloading. At the same time, tcmmd was notified there was a new stream connection and it restricted any potential background traffic to a very low limit. As long as the initial GStreamer queue was buffering, the background traffic limit did not change.
  2. The GStreamer queue became full at t=4s and the video started to be played on the screen. The daemon increased the limit on the background traffic exponentially and the stream bandwidth got reduced as a consequence.
  3. Despite the stream bandwidth degrading slowly, GStreamer managed to keep its queue over 75% full until t=25s. When the queue is more than 75% full, GStreamer does not report its fill level, because tcdemo set that threshold with the low-percent property on GstQueue2 (the graph shows 100% in this case).
  4. At t=30s, the GStreamer queue was less than 70% full and that threshold triggered tcmmd to restrict the background traffic to its minimum.
  5. The stream could use most of the bandwidth and the GStreamer queue became full quickly at t=31s. The background traffic could start its exponential growth again.
traffic control stats
Thanks to traffic control, the GStreamer queue never got empty in my test.
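
The phases above can be modelled with a very small state machine. This is a sketch of the policy as described in this post, not tcmmd's actual code: the class name, the doubling factor and the limits are assumptions made for the example.

```python
class BackgroundLimiter:
    """Toy model: hold the background limit at its minimum while the
    stream is buffering, then let it grow exponentially."""

    def __init__(self, min_kbit: int = 10, max_kbit: int = 100_000,
                 factor: int = 2) -> None:
        self.min_kbit = min_kbit
        self.max_kbit = max_kbit
        self.factor = factor
        self.limit_kbit = min_kbit   # start fully restricted (phase 1)
        self.growing = False

    def on_buffer_low(self) -> None:
        """The player reports that its queue fell below the threshold."""
        self.limit_kbit = self.min_kbit
        self.growing = False

    def on_buffer_full(self) -> None:
        """The player reports that its queue is full again."""
        self.growing = True

    def tick(self) -> int:
        """Called at regular intervals; returns the limit to apply."""
        if self.growing:
            self.limit_kbit = min(self.max_kbit,
                                  int(self.limit_kbit * self.factor))
        return self.limit_kbit
```

For example, with min_kbit=10 and factor=2, three ticks after on_buffer_full() give limits of 20, 40 and 80 kbit/s, and a call to on_buffer_low() drops the limit back to 10.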

Get the sources


git clone git://git.collabora.co.uk/git/user/alban/tcmmd
git clone https://github.com/alban/tcmmd

FAQ


Q: Do I need any privileges to run this?
A: No privileges are required for tcdemo, the GStreamer application. But tcmmd needs CAP_NET_ADMIN to change the TC rules.

Q: The 5-tuple contains the TCP source port. How does the application know that number?
A: The application can either call bind(2) before connect(2) to choose a TCP source port, or call getsockname(2) after connect(2) to retrieve the TCP source port assigned automatically by the kernel. The former allows the traffic control rules to be installed before the call to connect(2) triggers the emission and reception of the first packets on the network. The latter means the first few packets will be exchanged without being shaped by the traffic control. Tcdemo implements the latter to avoid more invasive changes in the souphttpsrc GStreamer element and libsoup. See bgo#721807.
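
Both options can be sketched with plain sockets. The helper names are made up for this example, and the tuple is given from the point of view of the incoming packets (the remote server is the source); which orientation a given daemon expects is an assumption to check against its interface.

```python
import socket
from typing import Tuple

# (source IP, source port, destination IP, destination port, protocol)
Stream = Tuple[str, int, str, int, int]


def incoming_tuple(sock: socket.socket) -> Stream:
    """5-tuple of the packets this connected TCP socket will receive."""
    local_ip, local_port = sock.getsockname()[:2]
    remote_ip, remote_port = sock.getpeername()[:2]
    return (remote_ip, remote_port, local_ip, local_port, socket.IPPROTO_TCP)


def connect_then_query(host: str, port: int) -> Tuple[socket.socket, Stream]:
    """Latter option: connect first, then ask the kernel which port it chose."""
    sock = socket.create_connection((host, port))
    return sock, incoming_tuple(sock)


def bind_then_connect(host: str, port: int, local_port: int) -> socket.socket:
    """Former option: choose the local port before any packet is sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("", local_port))
    # The rules for this port could be installed here, before connect().
    sock.connect((host, port))
    return sock
```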

Q: What happens if an application forgets to unregister a 5-tuple when the video stream finishes?
A: That would be bad manners from the application. The current traffic control settings would remain. And if the application notified tcmmd that its buffer was empty and then forgot to report any further changes, the background traffic would stay severely throttled. However, if the application just terminates or crashes, tcmmd would notice it immediately on D-Bus and the traffic control rules would be removed.

Q: Does tcmmd remove its traffic control rules when terminated?
A: It depends on how it is terminated. Tcmmd removes its traffic control rules on SIGINT and SIGTERM. But the rules remain in other cases (SIGSEGV, SIGKILL, etc.). If that is a problem after a crash, tcmmd's initialisation removes any previous rules, so you can start tcmmd and interrupt it with Ctrl-C.

Q: Instead of using the 5-tuple, why not use setsockopt-SO_MARK?
A: First, SO_MARK requires CAP_NET_ADMIN, which is not something that a media player should have. It could be worked around by fd-passing the socket to a more privileged daemon to call setsockopt-SO_MARK, but it's not elegant. More importantly, tcmmd's goal is not to shape the egress traffic but the ingress traffic. The shaping of incoming packets is performed very early in the Linux network stack: it happens before Netfilter, and before the packet is associated with a socket. So we can't check the SO_MARK of a socket to shape incoming packets.

Q: Instead of using the 5-tuple, why not use cgroups?
A: The granularity of cgroups is only per-process. So the traffic control would not be able to distinguish between different HTTP connections in a web browser, some used to render a video stream and others used for background downloads. And for the same reason as setsockopt-SO_MARK, it would not work for shaping ingress traffic: we would not be able to link the packet to any process or cgroup.

Q: Instead of sending the 5-tuple to tcmmd, why not set the IP type-of-service (TOS) on outgoing packets with setsockopt-SO_PRIORITY to avoid changes in the application and have an iptables target to feed that information about connections back to the ingress traffic control?
A: It could be possible if the bandwidth was fixed, but on mobile networks, the application needs to be changed anyway to give feedback when the queue in the GStreamer pipeline gets emptied.

Q: Why not play with the TCP windows instead of shaping the ingress traffic?
A: As far as I know, Linux does not have the infrastructure for that. The TCP windows to manipulate would not be from the GStreamer application but from all other connections, so it can't be done from userspace.

Q: Does tcdemo require any new feature in GStreamer?
A: Yes, souphttpsrc needs this patch: bgo#721807

Q: Does tcmmd require any new feature in the Linux kernel?
A: No.

Q: Does tcmmd work on several network interfaces (e.g. eth0 + wlan0)?
A: No, at the moment tcmmd only supports one interface and it has to be started after the interface is up. Patches welcome!

Q: Tcmmd uses both libnl and /sbin/tc via system() calls. Why?
A: My goal is to use libnl and avoid spawning processes to call /sbin/tc. I just didn't have time to finish this. It will involve checking that libnl has the right features. Some needed features, such as u32 action support, were implemented recently in the latest version.

Q: How did you get the graphs?
A: I used tcmmd's --save-stats option and the script tests/plot-tcmmd-log.sh.

Q: Why is there such frequent Netlink communication between tcmmd and the kernel?
A: One part of this is to gather regular statistics in order to generate graphs if the option --save-stats was used. The other part is for implementing the exponential progression of the bandwidth allocated to the background traffic: at regular intervals, tcmmd changes the rate assigned to a qdisc. It could be avoided by implementing a specialised qdisc in the kernel for our use case. That would require more thought about how to design the API for the new qdisc.

Q: Does it work with IPv6?
A: No. The architecture is not specific to IPv4 but it is just not implemented yet for IPv6. Tcmmd would need to generate new TC rules because the IP headers are different between IPv4 and IPv6.


Thanks Sjoerd for the architecture diagram and proof-reading.

3 comments:

  1. I agree with you that shaping inbound traffic is harder. But have you tried fq_codel as an underlying qdisc in your tool?

    1. I haven't tried fq_codel. But I'm not sure where it would be useful.

      At the moment, it uses htb and u32 filters to classify the traffic in two classes (multimedia traffic and background traffic) and apply the desired bitrate on each class. Then, it uses sfq on each leaf qdisc.

      I could replace sfq by fq_codel in the qdisc leaves, and keep htb and u32 for the rest of the tree, but since this is ingress traffic we are shaping, we are not limited by the speed of the network hardware but only by how fast the CPU can redirect packets between ifb0 and wlan0. So I am not sure sfq versus fq_codel would make a difference: I would expect their hash tables to remain mostly empty.

  2. Update: this has been presented at the Linux Plumbers 2014 conference.
    http://www.linuxplumbersconf.org/2014/ocw/sessions/1923
    Slides: http://goo.gl/OPNJmD
