POLLING(4) DragonFly Kernel Interfaces Manual POLLING(4)
polling -- network device driver polling support
Network device polling (polling for brevity) refers to a technique that
lets the operating system periodically poll network devices, instead of
relying on the network devices to generate interrupts when they need
attention. This might seem inefficient and counterintuitive, but when
done properly, polling gives the operating system more control over when
and how to handle network devices, with a number of advantages in terms
of system responsiveness and performance.
In particular, polling reduces the context switch overhead incurred when
servicing interrupts, and gives more control over the scheduling of a CPU
among various tasks (user processes, software interrupts, device
handling), which ultimately reduces the chance of livelock in the system.
Principles of Operation
In the normal, interrupt-based mode, network devices generate an inter-
rupt whenever they need attention. This in turn causes a context switch
and the execution of an interrupt handler which performs whatever pro-
cessing is needed by the network device. The duration of the interrupt
handler is potentially unbounded unless the network device driver has
been programmed with real-time concerns in mind (which is generally not
the case for DragonFly drivers). Furthermore, under heavy traffic load,
the system might be persistently processing interrupts without being able
to complete other work, either in the kernel or in userland.
Network device polling disables the devices' interrupts and instead polls
the devices on clock interrupts. This way, the context switch overhead is
removed. Furthermore, the operating system can precisely control how much
work is spent handling network device events, and thus prevent livelock
by reserving some amount of CPU for other tasks.
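The effect of bounding the work done per clock tick can be illustrated
with a toy simulation. This is a minimal sketch, not DragonFly kernel
code: the ring size, the arrival rate, and all function names are
assumptions chosen for illustration. It models a receive ring that is
serviced once per tick, either by draining everything that is pending
(interrupt style) or by handling at most a fixed budget (polling style).

```python
from collections import deque

RING_SIZE = 512  # assumed receive ring capacity (hypothetical value)


def arrive(ring, n):
    """Add up to n packets to the ring; return the number dropped on overflow."""
    room = RING_SIZE - len(ring)
    accepted = min(n, room)
    ring.extend(range(accepted))
    return n - accepted


def service_unbounded(ring):
    """Interrupt-style handler: process everything that is pending."""
    handled = len(ring)
    ring.clear()
    return handled


def service_polled(ring, budget):
    """Polling-style handler: process at most `budget` packets per tick."""
    handled = min(len(ring), budget)
    for _ in range(handled):
        ring.popleft()
    return handled


def simulate(ticks, arrivals_per_tick, budget=None):
    """Return (largest number of packets handled in one tick, total dropped)."""
    ring = deque()
    worst = dropped = 0
    for _ in range(ticks):
        dropped += arrive(ring, arrivals_per_tick)
        if budget is None:
            handled = service_unbounded(ring)
        else:
            handled = service_polled(ring, budget)
        worst = max(worst, handled)
    return worst, dropped


print(simulate(10, 400))              # unbounded handler
print(simulate(10, 400, budget=250))  # budgeted polling
```

In this model the unbounded handler's per-tick work follows the offered
load, while the polled handler never exceeds its budget; under sustained
overload the excess packets are dropped at the ring instead of consuming
CPU time.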
Enabling polling also changes the way software network interrupts are
scheduled, so there is never a risk of livelock caused by packets not
being processed to completion.
Polling is turned on and off with the ifconfig(8) command. An interface
does not have to be ``up'' for its polling feature to be turned on.
The following tunables can be set from loader.conf(5) (X is the CPU
number):
Default value for net.ifpoll.X.rx.burst_max sysctl nodes.
Default value for net.ifpoll.X.rx.each_burst sysctl nodes.
Default value for net.ifpoll.X.rx.user_frac sysctl nodes.
Default value for net.ifpoll.X.pollhz sysctl nodes.
Default value for net.ifpoll.0.status_frac sysctl node.
Default value for net.ifpoll.X.tx_frac sysctl nodes.
The operation of polling is controlled by the following per-CPU sysctl(8)
MIB variables (X is the CPU number):
net.ifpoll.X.pollhz
The polling frequency, whose range is 1 to 30000. Default is
net.ifpoll.X.rx.user_frac
When polling is enabled, and provided that there is some work to
do, up to this percent of the CPU cycles is reserved for userland
tasks, the remaining fraction being available for polling
processing. Default is 50.
net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick. This number is dynamically adjusted by the
kernel, according to the programmed user_frac, burst_max, CPU speed,
and system load.
net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
polling. This prevents a large burst from a single interface from
saturating the IP interrupt queue. Default is 50.
net.ifpoll.X.rx.burst_max
Upper bound for net.ifpoll.X.rx.burst. Note that when polling is
enabled, each interface can receive at most (pollhz * burst_max)
packets per second unless there are spare CPU cycles available
for polling in the idle loop. This number should be tuned to
match the expected load. Default is 250, which is adequate for a
1000Mbit network with pollhz=6000.
How many active network devices have registered for packet reception.
net.ifpoll.X.tx_frac
Controls how often (every tx_frac / pollhz seconds) the
transmission queue is checked for packet transmission done events.
Increasing this value reduces the time spent checking for packet
transmission done events, and thus reduces bus load, but it also
increases the chance of the transmission queue becoming saturated.
Default is 1.
How many active network devices have registered for packet transmission.
net.ifpoll.0.status_frac
Controls how often (every status_frac / pollhz seconds) the
status registers of the network device are checked for error
conditions and the like. Increasing this value reduces the load on
the bus, but also delays error detection. Default is 120.
How many active network devices have registered for status
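The relationships among these variables can be made concrete with a
little arithmetic. The following is a sketch under stated assumptions: it
uses pollhz=6000 (the value mentioned alongside the burst_max default),
the defaults burst_max=250, user_frac=50, tx_frac=1, and
status_frac=120, and it treats user_frac as a simple split of each tick.
The gigabit line-rate figure assumes minimum-size Ethernet frames (64
bytes plus 20 bytes of preamble and inter-frame gap), which this page
does not state. The function names are illustrative only.

```python
def max_polled_pps(pollhz, burst_max):
    """Per-interface packets-per-second ceiling, ignoring idle-loop polling."""
    return pollhz * burst_max


def check_interval(frac, pollhz):
    """Seconds between checks governed by a *_frac variable: frac / pollhz."""
    return frac / pollhz


def polling_share_per_tick(pollhz, user_frac):
    """Seconds of each tick left for polling if user_frac percent of the
    tick goes to userland (a simplification of the real scheduler)."""
    tick = 1.0 / pollhz
    return tick * (100 - user_frac) / 100


def ethernet_line_rate_pps(bits_per_second, frame_bytes=64, overhead_bytes=20):
    """Packets per second at line rate for a given frame size (assumption:
    20 bytes of preamble plus inter-frame gap per frame)."""
    return bits_per_second / ((frame_bytes + overhead_bytes) * 8)


pollhz = 6000
print("polled ceiling (pps):     ", max_polled_pps(pollhz, 250))
print("1000Mbit line rate (pps): ", round(ethernet_line_rate_pps(10**9)))
print("tx check interval (s):    ", check_interval(1, pollhz))
print("status check interval (s):", check_interval(120, pollhz))
print("polling time per tick (s):", polling_share_per_tick(pollhz, 50))
```

With these values the polled ceiling of 1,500,000 packets per second sits
just above the roughly 1,488,000 packets per second that a saturated
1000Mbit link can deliver with minimum-size frames, which is consistent
with the note that burst_max=250 is adequate for such a network.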
Network device polling requires explicit modifications to the network
device drivers. As of this writing, the bce(4), bge(4), bnx(4), dc(4),
em(4), emx(4), fwe(4), fxp(4), igb(4), jme(4), mxge(4), nfe(4), nge(4),
re(4), rl(4), sis(4), stge(4), vge(4), vr(4), and xl(4) devices are
supported, with others in the works. The bce(4), bnx(4), emx(4), igb(4),
jme(4), and mxge(4) drivers support polling on multiple reception queues.
The bce(4), bnx(4), certain types of emx(4), and igb(4) support polling
on multiple transmission queues. The modifications are rather
straightforward: the inner part of the interrupt service routine is
extracted into a callback function, *_npoll(), which is invoked to probe
the network device for events and process them. (See the conditionally
compiled sections of the network device drivers mentioned above for more
details.)
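The shape of that driver change can be sketched in a few lines. The model
below is a user-space illustration, not the DragonFly driver interface:
the ToyNic class, its method names, and the poll_tick() helper are all
hypothetical. It shows the inner receive loop being shared between an
interrupt path and a budgeted polling callback, and a poller that visits
the registered devices round-robin in chunks of each_burst, up to burst
packets per device per tick.

```python
from collections import deque


class ToyNic:
    """Hypothetical device model; not a DragonFly driver."""

    def __init__(self, name, pending):
        self.name = name
        self.rx_ring = deque(range(pending))
        self.delivered = []

    def _rx_events(self, limit):
        """Inner part shared by both paths: handle up to `limit` packets."""
        count = 0
        while self.rx_ring and count < limit:
            self.delivered.append(self.rx_ring.popleft())
            count += 1
        return count

    def intr(self):
        """Interrupt path: handles everything that is pending."""
        return self._rx_events(limit=len(self.rx_ring))

    def npoll(self, budget):
        """Polling callback: probe the device, handle at most `budget` packets."""
        return self._rx_events(limit=budget)


def poll_tick(devices, burst, each_burst):
    """One tick: round-robin over devices in chunks of each_burst packets,
    stopping at `burst` packets per device or when no device has work."""
    handled = {dev.name: 0 for dev in devices}
    order = []
    progress = True
    while progress:
        progress = False
        for dev in devices:
            remaining = burst - handled[dev.name]
            if remaining <= 0:
                continue
            n = dev.npoll(min(each_burst, remaining))
            if n:
                handled[dev.name] += n
                order.append((dev.name, n))
                progress = True
    return handled, order


nics = [ToyNic("nic0", 120), ToyNic("nic1", 30)]
handled, order = poll_tick(nics, burst=100, each_burst=50)
print(handled)
print(order)
```

Because the busy device is served in each_burst-sized chunks, the lightly
loaded device gets its turn before the busy one has used up its whole
burst.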
In order to reduce the latency in processing packets, it is advisable to
set the sysctl(8) variable net.ifpoll.X.pollhz to at least 1000.
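The reasoning behind this recommendation is simple arithmetic: a packet
that arrives just after a poll has to wait for the next tick, so the
added latency is bounded by roughly one polling period. The sketch below
assumes that polling happens only on ticks (it ignores idle-loop polling
and the time spent processing packets); the function name is
illustrative.

```python
def worst_case_poll_delay(pollhz):
    """Longest wait, in seconds, before the next tick polls the device."""
    return 1.0 / pollhz


for hz in (100, 1000, 6000):
    delay_ms = worst_case_poll_delay(hz) * 1e3
    print(f"pollhz={hz}: up to {delay_ms:.3f} ms before the next poll")
```

At pollhz=1000 the bound is one millisecond; lower frequencies make the
wait proportionally longer.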
Network device polling first appeared in FreeBSD 4.6. It was rewritten
in DragonFly 1.3.
The network device polling code was rewritten by Matt Dillon based on the
original code by Luigi Rizzo. Sepherosa Ziehau made the polling frequency
settable at runtime, added per-CPU polling, and added multiple reception
and transmission queue polling support.
DragonFly 3.7 May 23, 2013 DragonFly 3.7