DragonFly On-Line Manual Pages
PARALLELCPU(1) User Contributed Perl Documentation PARALLELCPU(1)
NAME
PDL::ParallelCPU - Parallel Processor MultiThreading Support in PDL
(Experimental)
DESCRIPTION
PDL has support (currently experimental) for splitting up numerical
processing between multiple parallel processor threads (or pthreads)
using the set_autopthread_targ and set_autopthread_size functions.
This can improve processing performance (by greater than 2-4X in most
cases) by taking advantage of multi-core and/or multi-processor
machines.
SYNOPSIS
use PDL;
# Set target of 4 parallel pthreads to create, with a lower limit of
# 5Meg elements for splitting processing into parallel pthreads.
set_autopthread_targ(4);
set_autopthread_size(5);
$a = zeroes(5000,5000); # Create 25Meg element array
$b = $a + 5; # Processing will be split up into multiple pthreads
# Get the actual number of pthreads for the last
# processing operation.
$actualPthreads = get_autopthread_actual();
Terminology
The use of the term threading can be confusing with PDL, because it can
refer to PDL threading, as defined in the PDL::Threading docs, or to
processor multi-threading.
To reduce confusion with the existing PDL threading terminology, this
document uses pthreading to refer to processor multi-threading, which
is the use of multiple processor threads to split up numerical
processing into parallel operations.
Functions that control PDL PThreads
This is a brief listing and description of the PDL pthreading
functions, see the PDL::Core docs for detailed information.
set_autopthread_targ
Set the target number of processor-threads (pthreads) for multi-
threaded processing. Setting auto_pthread_targ to 0 means that no
pthreading will occur.
See PDL::Core for details.
set_autopthread_size
Set the minimum size (in Meg-elements or 2**20 elements) of the
largest PDL involved in a function where auto-pthreading will be
performed. For small PDLs, it probably isn't worth starting
multiple pthreads, so this function is used to define a minimum
threshold where auto-pthreading won't be attempted.
See PDL::Core for details.
get_autopthread_actual
Get the actual number of pthreads executed for the last pdl
processing function.
See PDL::get_autopthread_actual for details.
Global Control of PDL PThreading using Environment Variables
PDL PThreading can be globally turned on, without modifying existing
code by setting environment variables PDL_AUTOPTHREAD_TARG and
PDL_AUTOPTHREAD_SIZE before running a PDL script. These environment
variables are checked when PDL starts up and calls to
set_autopthread_targ and set_autopthread_size functions made with the
environment variable's values.
For example, if the environment var PDL_AUTOPTHREAD_TARG is set to 3,
and PDL_AUTOPTHREAD_SIZE is set to 10, then any pdl script will run as
if the following lines were at the top of the file:
set_autopthread_targ(3);
set_autopthread_size(10);
How It Works
The auto-pthreading process works by analyzing threaded array
dimensions in PDL operations and splitting up processing based on the
thread dimension sizes and desired number of pthreads (i.e. the pthread
target or pthread_targ). The offsets and increments that PDL uses to
step thru the data in memory are modified for each pthread so each one
sees a different set of data when performing processing.
Example
$a = sequence(20,4,3); # Small 3-D Array, size 20,4,3
# Setup auto-pthreading:
set_autopthread_targ(2); # Target of 2 pthreads
set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded
# This will be split up into 2 pthreads
$c = maximum($a);
For the above example, the maximum function has a signature of "(a(n);
[o]c())", which means that the first dimension of $a (size 20) is a
Core dimension of the maximum function. The other dimensions of $a
(size 4,3) are threaded dimensions (i.e. will be threaded-over in the
maximum function.
The auto-pthreading algorithm examines the threaded dims of size (4,3)
and picks the 4 dimension, since it is evenly divisible by the
autopthread_targ of 2. The processing of the maximum function is then
split into two pthreads on the size-4 dimension, with dim indexes 0,2
processed by one pthread
and dim indexes 1,3 processed by the other pthread.
Limitations
Must have POSIX Threads Enabled
Auto-PThreading only works if your PDL installation was compiled with
POSIX threads enabled. This is normally the case if you are running on
linux, or other unix variants.
Non-Threadsafe Code
Not all the libraries that PDL intefaces to are thread-safe, i.e. they
aren't written to operate in a multi-threaded environment without
crashing or causing side-effects. Some examples in the PDL core is the
fft function and the pnmout functions.
To operate properly with these types of functions, the PPCode flag
NoPthread has been introduced to indicate a function as not being
pthread-safe. See PDL::PP docs for details.
Size of PDL Dimensions and PThread Target
Due to the way a PDL is split-up for operation using multiple pthreads,
the size of a dimension must be evenly divisible by the pthread target.
For example, if a PDL has threaded dimension sizes of (4,3,3) and the
auto_pthread_targ has been set to 2, then the first threaded dimension
(size 4) will be picked to be split up into two pthreads of size 2 and
2. However, if the threaded dimension sizes are (3,3,3) and the
auto_pthread_targ is still 2, then pthreading won't occur, because no
threaded dimensions are divisible by 2.
The algorithm that picks the actual number of pthreads has some smarts
(but could probably be improved) to adjust down from the
auto_pthread_targ to get a number of pthreads that can evenly divide
one of the threaded dimensions. For example, if a PDL has threaded
dimension sizes of (9,2,2) and the auto_pthread_targ is 4, the
algorithm will see that no dimension is divisible by 4, then adjust
down the target to 3, resulting in splitting up the first threaded
dimension (size 9) into 3 pthreads.
Speed improvement might be less than you expect.
If you have a 8 core machine and call auto_pthread_targ with 8 to
generate 8 parallel pthreads, you probably won't get a 8X improvement
in speed, due to memory bandwidth issues. Even though you have 8
separate CPUs crunching away on data, you will have (for most common
machine architectures) common RAM that now becomes your bottleneck. For
simple calculations (e.g simple additions) you can run into a
performance limit at about
4 pthreads. For more complex calculations the limit will be higher.
COPYRIGHT
Copyright 2011 John Cerney. You can distribute and/or modify this
document under the same terms as the current Perl license.
See: http://dev.perl.org/licenses/
perl v5.20.2 2015-05-24 PARALLELCPU(1)