DragonFly BSD
DragonFly users List (threaded) for 2009-10

Analysis on make parallelism for buildworld

From: "Simon 'corecode' Schubert" <corecode@xxxxxxxxxxxx>
Date: Tue, 20 Oct 2009 18:11:21 +0200


The question of which make parallelism level to use comes up repeatedly. However, the answer is usually driven by anecdotal evidence rather than empirical data. To this end, I ran a small benchmark to add one data point. I have no idea about confidence intervals, so somebody will have to chime in here.

Experimental setup

Machine: Dell Precision T3400
CPU: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz (2826.24-MHz 686-class CPU)
Memory: avail memory = 2063409152 (2015048K bytes)
HDD: da0: <SATA Hitachi HDP72505 GM4O> Fixed Direct Access SCSI-4 device (via AHCI)
filesystem: HAMMER v2
/usr/src: v2.5.1-77-gd894b0e
/usr/obj: flags nohistory, nullfs mount

executed command: make -j $j_level buildworld buildkernel

make levels used: 1-10
repetitions: 5
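
A minimal sketch of how such a run could be scripted, for anyone who wants to repeat the measurement. This is not the script used for the numbers in this mail; the function names, the run order, and the `cwd="/usr/src"` default are assumptions, and only wall-clock time is recorded (not user/sys).

```python
import subprocess
import time


def build_command(j_level):
    # Same command as in the setup, with the -j level filled in.
    return ["make", "-j", str(j_level), "buildworld", "buildkernel"]


def run_benchmark(levels=range(1, 11), repetitions=5, runner=None, cwd="/usr/src"):
    """Return {j_level: [wall-clock seconds per repetition]}.

    `runner` is a hypothetical hook so the loop can be tried without
    actually building; by default it runs the real make command.
    """
    if runner is None:
        def runner(cmd):
            subprocess.run(cmd, cwd=cwd, check=True,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    results = {}
    for j in levels:
        results[j] = []
        for _ in range(repetitions):
            start = time.monotonic()
            runner(build_command(j))
            results[j].append(time.monotonic() - start)
    return results
```

Calling `run_benchmark()` with the defaults would perform 50 full builds (levels 1-10, 5 repetitions each), so passing a dummy `runner` first is a cheap way to check the loop.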

There were no other tasks performed during the tests, although Xorg, windowmaker, terminals, xmms, firefox and thunderbird were running (idling). Standard background jobs were not disabled.

Discussion ==========

The plot shows the median build time as a line, and the error bars show the min/max build times. The max spike at -j4 is probably due to one run coinciding with the 3am hammer cleanup.
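
The per-level summary used for the plot (median as the line, min/max as the error bars) can be sketched as follows. The helper names are made up for illustration, and the example values in the usage note are placeholders, not measured build times.

```python
import statistics


def summarize(times):
    # Median for the line, min/max for the error bars.
    return {
        "median": statistics.median(times),
        "min": min(times),
        "max": max(times),
    }


def errorbar_offsets(summary):
    # Distances below and above the median, the form most plotting
    # tools expect for asymmetric error bars.
    return (summary["median"] - summary["min"],
            summary["max"] - summary["median"])
```

With five placeholder repetitions such as `[100, 101, 99, 100, 250]`, the single slow run only moves the max; the median stays at 100. That is why one run colliding with a background job shows up as a spike in the upper error bar but not in the line.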

We can see a monotonic drop in total run time from -j1 to -j5. After that, the run time plateaus. User and sys times increase over the same range and also plateau beyond -j5. This shows that increased parallelism in make adds slightly to the total overhead (sys+user), while total run time is significantly reduced. Beyond -j ncpu+1 we cannot see any improvement in run time.

A -j 2 build does not offer a significant benefit over -j 1, which is not intuitive and might need further investigation.

The -j 5 build achieves a 42% reduction in build time relative to the -j 1 baseline.

Compared to the -j 4 (i.e. -j ncpu) build, the -j 5 (i.e. -j ncpu+1) build reduces run time by an additional 5.4%. This shows that not all CPU cores can be kept busy if there is only a parallelism level of ncpu.
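
To relate the two percentages, here is a small worked calculation. It uses only the rounded figures quoted above (42% and 5.4%), so the derived values are approximate; the variable names are just for illustration.

```python
# Rounded figures quoted in the text.
reduction_j5_vs_j1 = 0.42    # -j5 run time is 42% below -j1
reduction_j5_vs_j4 = 0.054   # -j5 run time is 5.4% below -j4

# A 42% reduction in run time corresponds to this speedup factor.
speedup_j5 = 1 / (1 - reduction_j5_vs_j1)           # about 1.72x

# -j4 run time as a fraction of -j1, derived from the two figures:
# t5 = 0.58 * t1 and t5 = 0.946 * t4, hence t4 = (0.58 / 0.946) * t1.
j4_fraction_of_j1 = (1 - reduction_j5_vs_j1) / (1 - reduction_j5_vs_j4)
reduction_j4_vs_j1 = 1 - j4_fraction_of_j1          # about 38.7%

print(f"-j5 speedup over -j1: {speedup_j5:.2f}x")
print(f"-j4 reduction vs -j1: {reduction_j4_vs_j1:.1%}")
```

So on four cores the best case is roughly a 1.7x speedup, well short of 4x, and the step from -j4 to -j5 accounts for a few percentage points of the total reduction.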

Conclusion ==========

I advise running builds at -j ncpu+1 on 4-cpu systems. Until we have numbers for 2-cpu and UP systems, we cannot provide conclusive advice; however, I would try -j3 for those two cases.
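
As a rough sketch, the advice above could be turned into a helper like this. The function name is hypothetical, and only the 4-cpu case is backed by the measurements here: the -j3 value for 1 and 2 cpus is the tentative guess from the text, and applying ncpu+1 to any other cpu count is an untested extrapolation.

```python
import os


def suggested_jobs(ncpu=None):
    """Suggest a make -j level following the advice in this mail."""
    if ncpu is None:
        # os.cpu_count() may return None if the count is unknown.
        ncpu = os.cpu_count() or 1
    if ncpu <= 2:
        # Tentative: no measurements yet for UP and 2-cpu systems.
        return 3
    # Measured for ncpu == 4 only; other values are an extrapolation.
    return ncpu + 1


print(f"make -j {suggested_jobs()} buildworld buildkernel")
```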

cheers
  simon

[Attachment: PNG image of the build-time plot]
