POLLING(4) DragonFly Kernel Interfaces Manual POLLING(4)
NAME
polling -- network device driver polling support
SYNOPSIS
options IFPOLL_ENABLE
DESCRIPTION
Device polling (polling for brevity) refers to a technique that lets the
operating system periodically poll devices, instead of relying on the
devices to generate interrupts when they need attention. This might seem
inefficient and counterintuitive, but when done properly, polling gives
the operating system more control over when and how to handle devices,
with a number of advantages in terms of system responsiveness and
performance.
In particular, polling reduces the context switch overhead incurred when
servicing interrupts, and gives more control over the scheduling of a CPU
between various tasks (user processes, software interrupts, device
handling), which ultimately reduces the chance of livelock in the system.
Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt when-
ever they need attention. This in turn causes a context switch and the
execution of an interrupt handler which performs whatever processing is
needed by the device. The duration of the interrupt handler is poten-
tially unbounded unless the device driver has been programmed with real-
time concerns in mind (which is generally not the case for DragonFly
drivers). Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to complete other
work, either in the kernel or in userland.
Device polling disables device interrupts and instead polls devices on
clock interrupts. This way, the context switch overhead is removed.
Furthermore, the operating system can accurately control how much time it
spends handling device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
Enabling polling also changes the way software network interrupts are
scheduled, so there is never a risk of livelock caused by packets not
being processed to completion.
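The bounded-work idea described above can be illustrated with a toy model.
This is only a sketch, not the kernel's implementation: the function
poll_tick(), the interface names if0 and if1, and the parameter values are
all hypothetical, and the chunked round-robin order is loosely modeled on
the burst and each_burst descriptions under MIB Variables.

```python
from collections import deque


def poll_tick(queues, burst, each_burst):
    """Drain up to `burst` packets from each queue during one tick.

    Work is taken in chunks of at most `each_burst` packets, visiting the
    queues round-robin, so one busy interface cannot monopolize the tick.
    Returns the (interface, packet) pairs in the order they were handled.
    """
    remaining = {name: burst for name in queues}  # per-interface budget
    handled = []
    progress = True
    while progress:
        progress = False
        for name, q in queues.items():
            chunk = min(each_burst, remaining[name], len(q))
            for _ in range(chunk):
                handled.append((name, q.popleft()))
            remaining[name] -= chunk
            progress = progress or chunk > 0
    return handled


# Hypothetical example: one busy and one quiet interface.
queues = {
    "if0": deque(range(10)),  # 10 packets pending
    "if1": deque(range(3)),   # 3 packets pending
}
order = poll_tick(queues, burst=6, each_burst=2)
print(len(order), "packets handled;", len(queues["if0"]), "left on if0")
```

In this model the busy interface is cut off once its budget of 6 packets
is used up; its remaining 4 packets wait for the next tick, which is what
keeps the work done per tick bounded regardless of load.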
Enabling polling
Currently only network interface drivers support the polling feature. It
is turned on and off with the ifconfig(8) command. An interface does
not have to be ``up'' in order to turn on its polling feature.
Loader Tunables
The following tunables can be set from loader.conf(5) (X is the CPU num-
ber):
net.ifpoll.burst_max
Default value for net.ifpoll.X.rx.burst_max sysctl nodes.
net.ifpoll.each_burst
Default value for net.ifpoll.X.rx.each_burst sysctl nodes.
net.ifpoll.user_frac
Default value for net.ifpoll.X.rx.user_frac sysctl nodes.
net.ifpoll.pollhz
Default value for net.ifpoll.X.pollhz sysctl nodes.
net.ifpoll.status_frac
Default value for net.ifpoll.0.status_frac sysctl node.
net.ifpoll.tx_frac
Default value for net.ifpoll.X.tx_frac sysctl nodes.
MIB Variables
The operation of polling is controlled by the following per-CPU sysctl(8)
MIB variables (X is the CPU number):
net.ifpoll.X.pollhz
The polling frequency, whose range is 1 to 30000. Default is
6000.
net.ifpoll.X.rx.user_frac
When polling is enabled, and provided that there is some work to
do, up to this percentage of the CPU cycles is reserved for
userland tasks, the remaining fraction being available for polling
processing. Default is 50.
net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick. This number is dynamically adjusted by the ker-
nel, according to the programmed user_frac, burst_max, CPU speed,
and system load.
net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
polling. This prevents a large burst from a single interface from
saturating the IP interrupt queue. Default is 50.
net.ifpoll.X.rx.burst_max
Upper bound for net.ifpoll.X.rx.burst. Note that when polling is
enabled, each interface can receive at most (pollhz * burst_max)
packets per second unless there are spare CPU cycles available
for polling in the idle loop. This number should be tuned to
match the expected load. Default is 250, which is adequate for a
1000Mbit network with pollhz=6000.
net.ifpoll.X.rx.handlers
How many active devices have registered for packet reception
polling.
net.ifpoll.X.tx_frac
Controls how often (every tx_frac / pollhz seconds) the
transmission queue is checked for packet transmission done events.
Increasing this value reduces the time spent checking for packet
transmission done events and thus reduces bus load, but it also
increases the chance that the transmission queue becomes saturated.
Default is 1.
net.ifpoll.X.tx.handlers
How many active devices have registered for packet transmission
polling.
net.ifpoll.0.status_frac
Controls how often (every status_frac / pollhz seconds) the sta-
tus registers of the device are checked for error conditions and
the like. Increasing this value reduces the load on the bus, but
also delays the error detection. Default is 120.
net.ifpoll.0.status.handlers
How many active devices have registered for status polling.
net.ifpoll.X.rx.short_ticks
net.ifpoll.X.rx.lost_polls
net.ifpoll.X.rx.pending_polls
net.ifpoll.X.rx.residual_burst
net.ifpoll.X.rx.phase
net.ifpoll.X.rx.suspect
net.ifpoll.X.rx.stalled
net.ifpoll.X.tx.short_ticks
net.ifpoll.X.tx.lost_polls
net.ifpoll.X.tx.pending_polls
net.ifpoll.X.tx.residual_burst
net.ifpoll.X.tx.phase
net.ifpoll.X.tx.suspect
net.ifpoll.X.tx.stalled
Debugging variables.
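The formulas quoted above, (pollhz * burst_max) packets per second and
(frac / pollhz) seconds between checks, can be worked through with the
documented defaults. This is a minimal arithmetic sketch; the function
names are hypothetical, and the rate is only the ceiling stated above, which
may be exceeded when spare CPU cycles are available in the idle loop.

```python
def rx_rate_ceiling(pollhz, burst_max):
    """Upper bound on packets received per second per interface."""
    return pollhz * burst_max


def check_interval(frac, pollhz):
    """Seconds between checks that run every `frac` polls."""
    return frac / pollhz


# Defaults documented in this manual page.
POLLHZ = 6000
BURST_MAX = 250
TX_FRAC = 1
STATUS_FRAC = 120

ceiling = rx_rate_ceiling(POLLHZ, BURST_MAX)
tx_interval = check_interval(TX_FRAC, POLLHZ)
status_interval = check_interval(STATUS_FRAC, POLLHZ)

print(f"rx ceiling: {ceiling} packets/s per interface")
print(f"tx done check every {tx_interval * 1e6:.1f} us")
print(f"status check every {status_interval * 1e3:.1f} ms")
```

With the defaults this gives a ceiling of 1500000 packets per second, a
transmission done check roughly every 167 microseconds, and a status check
every 20 milliseconds. The ceiling is close to the rate of about 1.49
million minimum-size frames per second that a 1000Mbit Ethernet link can
carry, which is consistent with the default burst_max being described as
adequate for such a network.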
SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers. As
of this writing, the bce(4), bge(4), bnx(4), dc(4), em(4), emx(4),
fwe(4), fxp(4), igb(4), jme(4), nfe(4), nge(4), re(4), rl(4), sis(4),
stge(4), vge(4), vr(4), and xl(4) devices are supported, with others in
the works. The emx(4), igb(4), and jme(4) drivers support polling based
on multiple reception queues. The modifications are rather
straightforward, consisting of extracting the inner part of the interrupt
service routine and writing a callback function, *_npoll(), which is
invoked to probe the device for events and process them. (See the
conditionally compiled sections of the devices mentioned above for more
details.)
In order to reduce the latency in processing packets, it is advisable to
set the sysctl(8) variable net.ifpoll.X.pollhz to at least 1000.
HISTORY
Device polling first appeared in FreeBSD 4.6. It was rewritten in
DragonFly 1.3.
AUTHORS
The device polling code was rewritten by Matt Dillon based on the
original code by Luigi Rizzo <luigi@iet.unipi.it>. Sepherosa Ziehau made
the polling frequency settable at runtime, added per-CPU polling, and
added multiple reception queue polling support.
DragonFly 3.5 November 16, 2012 DragonFly 3.5