Multimedia devices traditionally don't manage the network bandwidth required by applications. This causes a problem when users try to watch a streaming video or listen to web radio seamlessly while other applications are downloading content in the background. The background downloads can consume so much bandwidth that the streaming video or web radio cannot keep up, and users notice unnecessary interruptions in the playback.
I have been working on an approach to improve this using traffic control on Linux. This work was sponsored by Collabora.
What is traffic control?
Traffic control is a technique to control network traffic in order to optimise or guarantee performance, low latency, and/or bandwidth. This includes deciding which packets to accept at what rate on an input interface and determining which packets to transmit, in what order and at what rate, on an output interface.
On Linux, applications can send the traffic control configuration to the kernel using a Netlink socket with the NETLINK_ROUTE protocol. By default, traffic control on Linux consists of a single queue which collects entering packets and dequeues them as quickly as the underlying device can accept them. The tc tool (from the iproute2 package) and the more recent "nl-*" tools (part of libnl) are different implementations, but both can be used to configure traffic control. Libnl's support for traffic control is still incomplete, but it is in active development and progressing quickly.
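For example, the current configuration can be inspected with tc; the interface name here (eth0) is just an example:

tc qdisc show dev eth0
# On an unconfigured interface this typically prints the default queue, e.g.:
# qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1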
Difficulty of shaping ingress traffic
Traffic control and shaping comes in two forms: the control of packets being received by the system (ingress) and the control of packets being sent out by the system (egress). Shaping outgoing traffic is reasonably straightforward, as the system is in direct control of the traffic sent out through its interfaces. Shaping incoming traffic is much harder, however, as the decision on which packets to send over the medium is made by the sending side and can't be directly controlled by the system itself.
However, for multimedia devices, control over incoming traffic is far more important than control over outgoing traffic. Our use case is ensuring glitch-free playback of a media stream (e.g. internet radio). In such a case, essentially, a minimal amount of incoming bandwidth needs to be reserved for the media stream.
For shaping (or rather influencing or policing) incoming traffic, the only practical approach is to put a fake bottleneck in place on the local system and rely on TCP congestion control to adjust the senders' rates to the rate enforced by this bottleneck. With such a system it's possible, for example, to implement a policy where traffic that is not important for the current media stream (background traffic) is limited, leaving the remaining available bandwidth to the more critical streams.
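To illustrate the principle (this is a static example, not tcmmd's actual mechanism, which is described below), tc can install a simple policer on incoming traffic; eth0 and the 1mbit limit are arbitrary examples:

# Attach the special ingress qdisc to the physical interface
tc qdisc add dev eth0 handle ffff: ingress
# Drop incoming TCP packets from source port 80 beyond 1mbit/s;
# TCP congestion control on the sender will adapt to this artificial limit
tc filter add dev eth0 parent ffff: protocol ip u32 \
    match ip protocol 6 0xff match ip sport 80 0xffff \
    police rate 1mbit burst 10k drop flowid :1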
On Linux, ingress traffic control ("ingress qdisc" on the graph) happens before the Netfilter subsystem:
By Jengelh (Own work, Origin SVG PNG) [CC-BY-SA-3.0], via Wikimedia Commons
Difficulty of shaping on mobile networks
However, to complicate matters further, mobile systems are connected wirelessly to the internet and tend to move around, so it's not possible to know the total amount of available bandwidth at any specific time: it is constantly changing. This means a simple strategy of capping background traffic at a static limit simply can't work.
The implemented solution
To cope with this dynamic nature, a traffic control daemon (tcmmd) has been implemented which dynamically updates the kernel configuration to match the current needs of the playback applications and adapt to the current network conditions. To address the issues mentioned above, the implementation uses the following strategy:
- Split the traffic streams into critical traffic and background traffic. Police the incoming traffic by limiting the bandwidth available to background traffic with the goal of leaving enough bandwidth available for critical streams.
- Instead of having a static configuration, let applications (e.g. a media player) indicate when the current traffic rate is too low for their purposes. This means the daemon doesn't have to actively measure the traffic rate, and it allows the daemon to cope more naturally with streams that don't have a constant bitrate (a sketch of the resulting rule updates follows below).
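Assuming an htb class dedicated to background traffic (the class IDs below are illustrative, not tcmmd's actual ones, and the class is assumed to have been added earlier), the rule updates tcmmd sends over Netlink are equivalent to commands like:

# Throttle the background class while the player reports its buffer is low...
tc class change dev ifb0 parent 1: classid 1:2 htb rate 32kbit ceil 32kbit
# ...then raise the limit exponentially once the player's queue is full again
tc class change dev ifb0 parent 1: classid 1:2 htb rate 64kbit ceil 64kbit
tc class change dev ifb0 parent 1: classid 1:2 htb rate 128kbit ceil 128kbit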
Communication between the traffic control daemon and the applications is done via D-Bus. The D-Bus interface allows applications to register critical streams by passing the standard 5-tuple (source IP and port, destination IP and port, and protocol) which uniquely identifies a stream, and to indicate when a particular stream's bandwidth is too low.
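As a sketch, such a registration could look like the following gdbus call; the bus name, object path and method here are hypothetical, not tcmmd's actual D-Bus interface:

# Hypothetical D-Bus names, for illustration only
gdbus call --system --dest org.tcmmd \
    --object-path /org/tcmmd \
    --method org.tcmmd.TrafficControl.RegisterStream \
    "192.168.0.2" 43210 "203.0.113.5" 80 "tcp"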
To allow the daemon to effectively control the incoming traffic, a so-called Intermediate Functional Block device (ifb0) is used as a virtual network device providing an artificial bottleneck. This is done by transparently redirecting the incoming traffic from the physical network device through the virtual network device and shaping the traffic as it leaves the virtual device again. The reason for the traffic redirection is to allow the kernel's egress traffic control to be used effectively on incoming traffic. This results in the example setup shown below (with eth0 being a physical interface and ifb0 the accompanying virtual interface).
To demonstrate the functionality described above, a simple demonstration media application using GStreamer (tcdemo) has been written that communicates with the traffic control daemon in the manner described.
Testing: the set-up
The
traffic control feature in tcdemo can be enabled or disabled on the
command line. This allowed me to compare the behaviour in both cases.
On my left, I have a web server serving both the files for a video stream and the files for background downloads. On my right, I have a multimedia device rendering a video stream while downloading other files from the same web server.
Traffic control is only useful when the available bandwidth is limited. In order to have meaningful tests, I simulated low bandwidth with the following commands on the web server:
tc qdisc add dev wlan0 root handle 1: cbq avpkt 1000 bandwidth 10Mbit
tc class add dev wlan0 parent 1: classid 1:1 cbq rate 3Mbit allot 1500 prio 3 bounded isolated
tc filter add dev wlan0 parent 1: protocol ip u32 match ip protocol 6 0xff match ip sport 80 0xffff flowid 1:1
Only the traffic from port 80/http was limited. It is important to note that the background traffic and the stream traffic were both going through the same bottleneck.
Tcdemo was playing a video file streamed over http while 8 wgets were downloading the same file continuously. The 9 connections were competing for the limited bandwidth. Without traffic control, tcdemo would not have got enough bandwidth.
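For reference, the background load can be reproduced with a loop like this (the URL is illustrative):

# Start 8 endless downloads in the background
for i in $(seq 8); do
  ( while true; do wget -q -O /dev/null http://webserver/video.bin; done ) &
done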
The following graph shows what happened with traffic control. The video streaming is composed of several phases:
- tcdemo opened the HTTP connection and its GStreamer pipeline started downloading. At the same time, tcmmd was notified there was a new stream connection and it restricted any potential background traffic to a very low limit. As long as the initial GStreamer queue was buffering, the background traffic limit did not change.
- The GStreamer queue became full at t=4s and the video started to be played on the screen. The daemon increased the limit on the background traffic exponentially and the stream bandwidth got reduced as a consequence.
- Despite the stream bandwidth degrading slowly, GStreamer managed to keep its queue over 75% full until t=25s. When the queue is more than 75% full, GStreamer does not report the exact level, because tcdemo chose that threshold with the low-percent property on GstQueue2 (the graph shows 100% in this case).
- At t=30s, the GStreamer queue was less than 70% full and that threshold triggered tcmmd to restrict the background traffic to its minimum.
- The stream could use most of the bandwidth and the GStreamer queue became full quickly at t=31s. The background traffic could start its exponential growth again.
Get the sources
git clone git://git.collabora.co.uk/git/user/alban/tcmmd
git clone https://github.com/alban/tcmmd
FAQ
Q: Do I need any privileges to run this?
A: No privileges required for tcdemo, the GStreamer application. But tcmmd needs CAP_NET_ADMIN to change the TC rules.
Q: The 5-tuple contains the TCP source port. How does the application know that number?
A: The application can either call bind(2) before connect(2) to choose a TCP source port, or call getsockname(2) after connect(2) to retrieve the TCP source port assigned automatically by the kernel. The former allows the application to install the traffic control rules before the call to connect(2) triggers the emission and reception of the first packets on the network. The latter means the first few packets will be exchanged without being shaped by the traffic control. Tcdemo implements the latter, to avoid more invasive changes in the souphttpsrc GStreamer element and libsoup. See bgo#721807.
Q: What happens if an application forgets to unregister a 5-tuple when the video stream finishes?
A: That would be bad manners on the application's part. The current traffic control settings would remain in place. And if the application had notified tcmmd that its buffer was empty and then forgot to send any further updates, the background traffic would remain severely throttled. However, if the application just terminates or crashes, tcmmd notices it immediately on D-Bus and the traffic control rules are removed.
Q: Does tcmmd remove its traffic control rules when terminated?
A: It depends on how it is terminated. Tcmmd removes its traffic control rules on SIGINT and SIGTERM, but the rules remain in other cases (SIGSEGV, SIGKILL, etc.). If stale rules are a problem after a crash, tcmmd's initialisation properly removes any previous rules, so you can clean up by starting tcmmd again and interrupting it with ctrl-c.
Q: Instead of using the 5-tuple, why not use setsockopt(SO_MARK)?
A: First, SO_MARK requires CAP_NET_ADMIN, which is not something that a media player should have. This could be worked around by fd-passing the socket to a more privileged daemon which calls setsockopt(SO_MARK), but it's not elegant. More importantly, tcmmd's goal is not to shape the egress traffic but the ingress traffic. The shaping of incoming packets is performed very early in the Linux network stack: it happens before Netfilter, and before the packet is associated with a socket. So we can't check the SO_MARK of a socket to shape incoming packets.
Q: Instead of using the 5-tuple, why not use cgroups?
A: The granularity of cgroups is only per-process. So the traffic control would not be able to distinguish between different HTTP connections in a web browser, some used to render a video stream and others used for background downloads. And for the same reason as with setsockopt(SO_MARK), it would not work for shaping ingress traffic: we would not be able to link the packet to any process or cgroup.
Q: Instead of sending the 5-tuple to tcmmd, why not set the IP type-of-service (TOS) on outgoing packets with setsockopt(SO_PRIORITY) to avoid changes in the application, and have an iptables target feed that information about connections back to the ingress traffic control?
A: It could be possible if the bandwidth were fixed, but on mobile networks, the application needs to be changed anyway to give feedback when the queue in the GStreamer pipeline gets emptied.
Q: Why not play with the TCP windows instead of shaping the ingress traffic?
A: As far as I know, Linux does not have the infrastructure for that. The TCP windows to manipulate would not be those of the GStreamer application but those of all the other connections, so it can't be done from userspace.
Q: Does tcdemo require any new feature in GStreamer?
A: Yes, souphttpsrc needs this patch: bgo#721807
Q: Does tcmmd require any new feature in the Linux kernel?
A: No.
Q: Does tcmmd work on several network interfaces (e.g. eth0 + wlan0)?
A: No, at the moment tcmmd only supports one interface, and it has to be started after the interface is up. Patches welcome!
Q: Tcmmd uses both libnl and /sbin/tc via system() calls. Why?
A: My goal is to use libnl and avoid spawning processes to call /sbin/tc; I just didn't have time to finish this. It will involve checking that libnl has the right features. Some needed features, such as u32 action support, were only implemented recently, in the latest version.
Q: How did you get the graphs?
A: I used tcmmd's --save-stats option and the script tests/plot-tcmmd-log.sh.
Q: Why is there such frequent Netlink communication between tcmmd and the kernel?
A: Part of it is to gather regular statistics in order to generate graphs when the --save-stats option is used. The other part implements the exponential progression of the bandwidth allocated to the background traffic: at regular intervals, tcmmd changes the rate assigned to a qdisc. This could be avoided by implementing a specialised qdisc in the kernel for our use case, but it would require more thought on how to design the API for that new qdisc.
Q: Does it work with IPv6?
A: No. The architecture is not specific to IPv4, but it is just not implemented for IPv6 yet. Tcmmd would need to generate new TC rules because the IP headers differ between IPv4 and IPv6.
Thanks Sjoerd for the architecture diagram and proof-reading.
I agree with you that shaping inbound traffic is harder. But have you tried fq_codel as an underlying qdisc in your tool?
I haven't tried fq_codel, but I'm not sure where it would be useful.
At the moment, it uses htb and u32 filters to classify the traffic in two classes (multimedia traffic and background traffic) and apply the desired bitrate to each class. Then, it uses sfq on each leaf qdisc.
I could replace sfq with fq_codel in the qdisc leaves and keep htb and u32 for the rest of the tree, but since this is ingress traffic we are shaping, we are not limited by the speed of the network hardware but only by how fast the CPU can redirect packets between ifb0 and wlan0. So I am not sure sfq versus fq_codel would make a difference: I would expect their hash tables to remain mostly empty.
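For the record, trying it would be a one-line change per leaf, something like the following (the class ID is illustrative):

# Swap the leaf qdisc of one htb class from sfq to fq_codel
tc qdisc replace dev ifb0 parent 1:1 fq_codel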
Update: this has been presented at the Linux Plumbers 2014 conference.
http://www.linuxplumbersconf.org/2014/ocw/sessions/1923
Slides: http://goo.gl/OPNJmD