StarPU Handbook
6. Compilation Configuration

The behavior of the StarPU library and tools can be tuned with the following configure options.

6.1 Common Configuration

--enable-debug

Enable debugging messages.

--enable-spinlock-check

Enable checking that spinlocks are taken and released properly.

--enable-fast

Disable assertion checks, which saves computation time.

--enable-verbose

Increase the verbosity of the debugging messages. This can be disabled at runtime by setting the environment variable STARPU_SILENT to any value. --enable-verbose=extra increases the verbosity even further.

$ STARPU_SILENT=1 ./vector_scal

--enable-coverage

Enable flags for the coverage tool gcov.

--enable-quick-check

Specify that tests and examples should be run on a smaller data set, i.e. allowing a faster execution time.

--enable-long-check

Enable some exhaustive checks which take a really long time.

--enable-new-check

Enable new test cases which are known to fail.

--with-hwloc

Specify that hwloc should be used by StarPU. hwloc is located with the tool pkg-config.

--with-hwloc=prefix

Specify that hwloc should be used by StarPU. hwloc is looked for in the directory specified by prefix.

--without-hwloc

Specify that hwloc should not be used by StarPU.
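
As a sketch of how one might choose between these three forms, the following script prints a configure command without running it. The prefix /opt/hwloc is a hypothetical location, and falling back to --without-hwloc is only one possible policy, not something the handbook prescribes.

```shell
#!/bin/sh
# Hypothetical install prefix, used only if pkg-config cannot find hwloc.
HWLOC_PREFIX="${HWLOC_PREFIX:-/opt/hwloc}"

if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists hwloc; then
    # hwloc is visible to pkg-config: the plain form is enough.
    hwloc_flag="--with-hwloc"
elif [ -d "$HWLOC_PREFIX" ]; then
    # Otherwise point configure at the installation prefix.
    hwloc_flag="--with-hwloc=$HWLOC_PREFIX"
else
    # No hwloc found: build StarPU without it.
    hwloc_flag="--without-hwloc"
fi

# Print the command; run it from the StarPU source tree once it looks right.
echo "./configure $hwloc_flag"
```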

--disable-build-doc

Disable the creation of the documentation. This should be done on a machine which does not have the tools doxygen and latex (plus the packages latex-xcolor and texlive-latex-extra).

--enable-build-doc-pdf

By default, only the HTML documentation is generated. Use this option to also enable the generation of the PDF documentation. This requires a machine which has the tools doxygen and latex (plus the packages latex-xcolor and texlive-latex-extra).

--enable-icc

Enable the compilation of specific ICC examples. StarPU itself will not be compiled with ICC unless specified with CC=icc.

--disable-icc

Disable the usage of the ICC compiler. Otherwise, when an ICC compiler is found, some specific ICC examples are compiled as explained above.

--with-check-flags

Specify flags which will be given to the C, CXX and Fortran compilers when valid.

Additionally, the script configure recognizes many variables, which can be listed by typing ./configure --help. For example, ./configure NVCCFLAGS="-arch sm_20" adds a flag for the compilation of CUDA kernels, and NVCC_CC=gcc-5 changes the C++ compiler used by nvcc.
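
The two variables mentioned above can be combined on one command line. The sketch below only assembles and prints that command, so the quoting can be checked before running it from the StarPU source tree. The shell variable names NVCCFLAGS_VALUE and NVCC_CC_VALUE are illustrative and not part of StarPU.

```shell
#!/bin/sh
# Values taken from the example above; adapt them to the local toolchain.
NVCCFLAGS_VALUE="-arch sm_20"
NVCC_CC_VALUE="gcc-5"

# The inner quotes keep the multi-word NVCCFLAGS value as a single argument.
cmd="./configure NVCCFLAGS=\"$NVCCFLAGS_VALUE\" NVCC_CC=$NVCC_CC_VALUE"
echo "$cmd"
```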

6.2 Configuring Workers

--enable-data-locality-enforce

Enable data locality enforcement when picking up a worker to execute a task. This mechanism is disabled by default.

--enable-blocking-drivers

By default, StarPU keeps CPU workers awake permanently, for better reactivity. This option makes StarPU actually put CPU workers to sleep when there are not enough tasks to compute.

--enable-worker-callbacks

If blocking drivers are enabled, enable callbacks to notify an external resource manager about workers going to sleep and waking up.

--enable-maxcpus=count

Use at most count CPU cores. This information is then available as the macro STARPU_MAXCPUS.

The default value is auto, which lets StarPU automatically detect the number of CPUs on the build machine. The default should not be used if the machine that runs StarPU has more CPUs than the build machine.

--enable-maxnumanodes=count

Use at most count NUMA nodes. This information is then available as the macro STARPU_MAXNUMANODES.

The default value is auto, which lets StarPU automatically detect the number of NUMA nodes on the build machine. The default should not be used if the machine that runs StarPU has more NUMA nodes than the build machine.
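
The caveat about building on a smaller machine can be sketched as follows. The script compares a hypothetical target core count (TARGET_CPUS, an assumption for this example) with the core count of the build machine and only passes an explicit limit when the default would be too small. It prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical core count of the largest machine that will run the binaries.
TARGET_CPUS=128
# Core count of the build machine (falls back to 1 if getconf is unavailable).
BUILD_CPUS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

if [ "$TARGET_CPUS" -gt "$BUILD_CPUS" ]; then
    # The default (auto) would size STARPU_MAXCPUS for the build machine only.
    cpu_flag="--enable-maxcpus=$TARGET_CPUS"
else
    # The build machine is at least as large: keep the default.
    cpu_flag=""
fi

echo "./configure $cpu_flag"
```

The same pattern would apply to --enable-maxnumanodes with the NUMA node count of the target machine.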

--disable-cpu

Disable the use of the CPUs of the machine. Only GPUs and other devices will be used.

--enable-maxcudadev=count

Use at most count CUDA devices. This information is then available as the macro STARPU_MAXCUDADEVS.

--disable-cuda

Disable the use of CUDA, even if a valid CUDA installation was detected.

--with-cuda-dir=prefix

Search for CUDA under prefix, which should notably contain the file include/cuda.h.

--with-cuda-include-dir=dir

Search for CUDA headers under dir, which should notably contain the file cuda.h. This defaults to /include appended to the value given to --with-cuda-dir.

--with-cuda-lib-dir=dir

Search for CUDA libraries under dir, which should notably contain the CUDA shared libraries, e.g. libcuda.so. This defaults to /lib appended to the value given to --with-cuda-dir.
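
The relation between the three CUDA location options can be illustrated with a short sketch. The prefix /usr/local/cuda and the lib64 layout are assumptions made for this example. The script only prints the two command lines.

```shell
#!/bin/sh
# Hypothetical CUDA installation prefix.
CUDA_PREFIX="/usr/local/cuda"

# With only the prefix, headers are searched in $CUDA_PREFIX/include
# and libraries in $CUDA_PREFIX/lib.
cmd_default="./configure --with-cuda-dir=$CUDA_PREFIX"

# Assumed layout where the libraries live in lib64 instead of lib:
# override the library directory explicitly.
cmd_lib64="./configure --with-cuda-dir=$CUDA_PREFIX --with-cuda-lib-dir=$CUDA_PREFIX/lib64"

echo "$cmd_default"
echo "$cmd_lib64"
```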

--disable-cuda-memcpy-peer

Explicitly disable peer transfers when using CUDA 4.0.

--enable-maxopencldev=count

Use at most count OpenCL devices. This information is then available as the macro STARPU_MAXOPENCLDEVS.

--disable-opencl

Disable the use of OpenCL, even if the SDK is detected.

--with-opencl-dir=prefix

Search for an OpenCL implementation under prefix, which should notably contain include/CL/cl.h (or include/OpenCL/cl.h on Mac OS).

--with-opencl-include-dir=dir

Search for OpenCL headers under dir, which should notably contain CL/cl.h (or OpenCL/cl.h on Mac OS). This defaults to /include appended to the value given to --with-opencl-dir.

--with-opencl-lib-dir=dir

Search for an OpenCL library under dir, which should notably contain the OpenCL shared libraries, e.g. libOpenCL.so. This defaults to /lib appended to the value given to --with-opencl-dir.

--enable-opencl-simulator

Treat the provided OpenCL implementation as a simulator, i.e. use the kernel duration returned by OpenCL profiling information as wallclock time instead of the actually measured real time. This requires SimGrid support.

--enable-maximplementations=count

Allow for at most count codelet implementations for the same target device. This information is then available as the macro STARPU_MAXIMPLEMENTATIONS.

--enable-max-sched-ctxs=count

Allow for at most count scheduling contexts. This information is then available as the macro STARPU_NMAX_SCHED_CTXS.

--disable-asynchronous-copy

Disable asynchronous copies between CPU and GPU devices. The AMD implementation of OpenCL is known to fail when copying data asynchronously. When using this implementation, it is therefore necessary to disable asynchronous data transfers.

--disable-asynchronous-cuda-copy

Disable asynchronous copies between CPU and CUDA devices.

--disable-asynchronous-opencl-copy

Disable asynchronous copies between CPU and OpenCL devices. The AMD implementation of OpenCL is known to fail when copying data asynchronously. When using this implementation, it is therefore necessary to disable asynchronous data transfers.

--disable-asynchronous-hip-copy

Disable asynchronous copies between CPU and HIP devices.

--disable-asynchronous-mpi-master-slave-copy

Disable asynchronous copies between CPU and MPI Slave devices.

--disable-asynchronous-tcpip-master-slave-copy

Disable asynchronous copies between CPU and TCP/IP Slave devices.

--disable-asynchronous-fpga-copy

Disable asynchronous copies between CPU and Maxeler FPGA devices.

--enable-maxnodes=count

Use at most count memory nodes. This information is then available as the macro STARPU_MAXNODES. Reducing it considerably reduces the memory used by StarPU data structures.

--with-max-fpga=dir

Enable the Maxeler FPGA driver support, and optionally specify the location of the Maxeler FPGA library.

--disable-asynchronous-max-fpga-copy

Disable asynchronous copies between CPU and Maxeler FPGA devices.

6.3 Extension Configuration

--enable-starpupy

Enable the StarPU Python Interface (Python Interface).

--enable-python-multi-interpreter

Enable the use of multiple interpreters in the StarPU Python Interface (Multiple Interpreters).

--disable-mpi

Disable the build of libstarpumpi. By default, it is enabled when MPI is found.

--enable-mpi

Enable the build of libstarpumpi. This is necessary when using SimGrid+MPI.

--with-mpicc=path

Use the compiler mpicc at path, for StarPU-MPI (MPI Support).
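
As an illustration of how the MPI options fit together, the sketch below looks for an mpicc wrapper in PATH and prints a matching configure command. Disabling MPI when no wrapper is found is an assumption of this example, not a rule from the handbook.

```shell
#!/bin/sh
# Look for an MPI compiler wrapper in PATH; empty if none is found.
MPICC_PATH=$(command -v mpicc 2>/dev/null || true)

if [ -n "$MPICC_PATH" ]; then
    # Build libstarpumpi with the wrapper that was found.
    mpi_flags="--enable-mpi --with-mpicc=$MPICC_PATH"
else
    # No wrapper available: skip the build of libstarpumpi.
    mpi_flags="--disable-mpi"
fi

echo "./configure $mpi_flags"
```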

--enable-mpi-pedantic-isend

Before performing any MPI communication, StarPU-MPI waits for the data to be available in the main memory of the node submitting the request. For send communications, data is acquired with the mode STARPU_R. When the pedantic mode is enabled, data is instead acquired with the mode STARPU_RW, which ensures that at most one MPI_Isend call accesses the data at a time and that StarPU tasks do not read from it during the communication.

--enable-mpi-master-slave

Enable the MPI Master-Slave support. By default, it is disabled.

--enable-mpi-verbose

Increase the verbosity of the MPI debugging messages. This can be disabled at runtime by setting the environment variable STARPU_SILENT to any value. --enable-mpi-verbose=extra increases the verbosity even further.

$ STARPU_SILENT=1 mpirun -np 2 ./insert_task

--enable-mpi-ft

Enable the MPI checkpoint mechanism. See MPI Fault Tolerance Support.

--enable-mpi-ft-stats

Enable the statistics for the MPI checkpoint mechanism. See MPI Fault Tolerance Support.

--enable-tcpip-master-slave

Enable the TCP/IP Master-Slave support (TCP/IP Support). By default, it is disabled.

--enable-nmad

Enable the NewMadeleine implementation for StarPU-MPI. See Using the NewMadeleine communication library for more details.

--disable-fortran

Disable the Fortran extension. By default, it is enabled when a Fortran compiler is found.

--disable-socl

Disable the SOCL extension (SOCL OpenCL Extensions). By default, it is enabled when an OpenCL implementation is found.

--enable-openmp

Enable OpenMP support (The StarPU OpenMP Runtime Support (SORS)).

--enable-openmp-llvm

Enable LLVM OpenMP support (Example: An OpenMP LLVM Support).

--enable-bubble

Enable hierarchical DAGs support (Hierarchical DAGS).

--enable-parallel-worker

Enable parallel worker support (Creating Parallel Workers On A Machine).

--enable-eclipse-plugin

Enable the StarPU Eclipse Plugin. See StarPU Eclipse Plugin to know how to install Eclipse.

6.4 Advanced Configuration

--enable-perf-debug

Enable performance debugging through gprof.

--enable-model-debug

Enable performance model debugging.

--enable-fxt-lock

Enable additional trace events which describe the behaviour of locks. This is however extremely heavy and should only be enabled when debugging the internals of StarPU.

--enable-maxbuffers

Define the maximum number of buffers that tasks can take as parameters. This information is then available as the macro STARPU_NMAXBUFS.

--enable-fxt-max-files=count

Use at most count FxT files from MPI nodes for generating traces. This information is then available as the macro STARPU_FXT_MAX_FILES. It is used by the FxT tools when considering multi-node traces. The default value is 64.

--enable-allocation-cache

Enable the use of a data allocation cache to avoid the cost of data allocation with CUDA. Still experimental.

--enable-opengl-render

Enable the use of OpenGL for the rendering of some examples.

--enable-blas-lib=prefix

Specify the BLAS library to be used by some of the examples. Available libraries:

  • none [default]: no BLAS library is used
  • atlas: use ATLAS library
  • goto: use GotoBLAS library
  • openblas: use OpenBLAS library
  • mkl: use MKL library (you may need to set specific CFLAGS and LDFLAGS with --with-mkl-cflags and --with-mkl-ldflags)
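
For the mkl choice, a command line might look like the sketch below. The MKL paths and the use of -lmkl_rt are assumptions made for illustration. The appropriate flags depend on the local MKL installation. The script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical MKL location and flags; adapt them to the local installation.
MKL_CFLAGS="-I/opt/intel/mkl/include"
MKL_LDFLAGS="-L/opt/intel/mkl/lib/intel64 -lmkl_rt"

# The inner quotes keep each multi-word value as a single configure argument.
cmd="./configure --enable-blas-lib=mkl --with-mkl-cflags=\"$MKL_CFLAGS\" --with-mkl-ldflags=\"$MKL_LDFLAGS\""
echo "$cmd"
```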

--enable-leveldb

Enable linking with LevelDB if available.

--enable-hdf5

Enable building HDF5 support.

--with-hdf5-include-dir=path

Specify the directory which contains the header file hdf5.h.

--with-hdf5-lib-dir=path

Specify the directory which contains the HDF5 library.
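
The three HDF5 options are typically given together when HDF5 is not in a standard location. In the sketch below, both directories are hypothetical and should be replaced by the ones of the local HDF5 installation. The command is printed, not executed.

```shell
#!/bin/sh
# Hypothetical locations of hdf5.h and of the HDF5 library.
HDF5_INCLUDE="/usr/include/hdf5/serial"
HDF5_LIB="/usr/lib/x86_64-linux-gnu/hdf5/serial"

cmd="./configure --enable-hdf5 --with-hdf5-include-dir=$HDF5_INCLUDE --with-hdf5-lib-dir=$HDF5_LIB"
echo "$cmd"
```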

--disable-starpufft

Disable the build of libstarpufft, even if fftw or cuFFT is available.

--enable-starpufft-examples

Enable the compilation and the execution of the libstarpufft examples. By default, they are neither compiled nor checked.

--with-fxt=prefix

Search for FxT under prefix. FxT (http://savannah.nongnu.org/projects/fkt) is used to generate traces of scheduling events, which can then be rendered using ViTE (Off-line Performance Feedback). prefix should notably contain include/fxt/fxt.h.

--with-perf-model-dir=dir

Store performance models under dir, instead of the current user's home directory.

--with-goto-dir=prefix

Search for GotoBLAS under prefix, which should notably contain libgoto.so or libgoto2.so.

--with-atlas-dir=prefix

Search for ATLAS under prefix, which should notably contain include/cblas.h.

--with-mkl-cflags=cflags

Use cflags to compile code that uses the MKL library.

--with-mkl-ldflags=ldflags

Use ldflags when linking code that uses the MKL library. Note that the MKL website (http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/) provides a script to determine the linking flags.

--disable-glpk

Disable the use of libglpk for computing area bounds.

--disable-build-tests

Disable the build of tests.

--disable-build-examples

Disable the build of examples.

--enable-sc-hypervisor

Enable the Scheduling Context Hypervisor plugin (Scheduling Context Hypervisor). By default, it is disabled.

--enable-memory-stats

Enable memory statistics (Memory Feedback).

--enable-simgrid

Enable simulation of execution in SimGrid, to allow easy experimentation with various numbers of cores and GPUs, or amount of memory, etc. Experimental.

The path to SimGrid can be specified through the SIMGRID_CFLAGS and SIMGRID_LIBS environment variables, for instance:

export SIMGRID_CFLAGS="-I/usr/local/simgrid/include"
export SIMGRID_LIBS="-L/usr/local/simgrid/lib -lsimgrid"

--with-simgrid-dir

Similar to the option --enable-simgrid, but also specifies the location of the SimGrid library.

--with-simgrid-include-dir

Similar to the option --enable-simgrid, but also specifies the location of the SimGrid include directory.

--with-simgrid-lib-dir

Similar to the option --enable-simgrid, but also specifies the location of the SimGrid lib directory.
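
As a sketch, the SimGrid location can be passed either as one prefix or as separate include and lib directories. The prefix /usr/local/simgrid matches the environment variable example above. The option=value form is assumed here by analogy with the other --with-...-dir options. The script prints both variants without running them.

```shell
#!/bin/sh
# Installation prefix, as in the SIMGRID_CFLAGS / SIMGRID_LIBS example.
SIMGRID_PREFIX="/usr/local/simgrid"

# Variant 1: a single prefix.
cmd_prefix="./configure --with-simgrid-dir=$SIMGRID_PREFIX"

# Variant 2: include and lib directories given separately.
cmd_split="./configure --with-simgrid-include-dir=$SIMGRID_PREFIX/include --with-simgrid-lib-dir=$SIMGRID_PREFIX/lib"

echo "$cmd_prefix"
echo "$cmd_split"
```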

--with-smpirun=path

Use the smpirun executable at path.

--enable-simgrid-mc

Enable the Model Checker in simulation of execution in SimGrid, to allow exploring various execution paths.

--enable-calibration-heuristic

Set the maximum authorized percentage of deviation for the history-based calibrator of StarPU. A valid value lies in [0..100]. The default value is 10. Experimental.

--enable-mlr

Enable multiple linear regression models (see Performance Model Example).

--enable-mlr-system-blas

Make multiple linear regression models use the system-provided BLAS for dgels (see Performance Model Example).