Compiled Integrands for Speed; GPUs¶
As discussed in the section on Faster Integrands,
it is possible to create very fast integrands using
tools like the numba
JIT compiler to replace
Python code for an integrand with optimized machine code.
In this
section we discuss several other tools that can do
the same thing.
Cython and Pythran¶
Cython is a compiled hybrid of Python and C. The Cython version of the (lbatch) ridge integrand (see Faster Integrands) is:
# file: cython_ridge.pyx
import numpy as np
import vegas
# use exp from C
from libc.math cimport exp
@vegas.lbatchintegrand
def ridge(double[:, ::1] xbatch):
cdef int i # labels integration point
cdef int j # labels point on diagonal
cdef int d # labels direction
cdef int dim = xbatch.shape[1]
cdef int nbatch = xbatch.shape[0]
cdef int N = 1000
cdef double norm = (100. / np.pi) ** (dim / 2.) / N
cdef double dx2
cdef double[::1] x
cdef double[::1] ans = np.zeros(nbatch, float)
for i in range(nbatch):
# integrand for integration point x
x = xbatch[i]
for j in range(N):
x0 = 0.25 + 0.5 * j / (N - 1)
dx2 = (x[0] - x0) ** 2
for d in range(1, dim):
dx2 += (x[d] - x0) ** 2
ans[i] += exp(-100. * dx2)
ans[i] *= norm
return ans
We put this in a separate file called, say,
cython_ridge.pyx
, and compile it using
cythonize -i cython_ridge.pyx
This creates a compiled Python module (.so file) which can be imported and used with vegas:
import vegas
from cython_ridge import ridge
integ = vegas.Integrator(4 * [[0, 1]])
integ(ridge, nitn=10, neval=2e5)
result = integ(ridge, nitn=10, neval=2e5, adapt=False)
print('result = %s Q = %.2f' % (result, result.Q))
This code returns the following:
result = 1.00008(29) Q = 0.55
It runs about as fast as the numba
version
described in Faster Integrands (15 sec on a 2024 laptop).
Cython is much more flexible than numba
but the
code is typically more complicated. Another option,
more similar to numba
, is the Pythran compiler,
which compiles a subset of Python into optimized C++
code that is then compiled into a Python module (.so file).
To use Pythran, we write the integrand in low-level
Python and place it in a separate file called, say,
pythran_ridge.py
:
# in file pythran_ridge.py
import numpy as np
#pythran export ridge(float[:,:])
def ridge(xbatch):
" Ridge of N Gaussians distributed along part of the diagonal. "
dim = xbatch.shape[1]
N = 1000
norm = (100. / np.pi) ** (dim / 2.) / N
ans = np.zeros(len(xbatch), dtype=float)
for i in range(len(xbatch)):
# iterate over each integration point x in xbatch
x = xbatch[i]
for j in range(N):
x0 = 0.25 + 0.5 * j / (N - 1)
dx2 = (x[0] - x0) ** 2
for d in range(1, dim):
dx2 += (x[d] - x0) ** 2
ans[i] += np.exp(-100. * dx2)
ans[i] *= norm
return ans
The #pythran
line tells the compiler what to export
from the module.
This file is compiled by Pythran and converted into
an optimized Python module pythran_ridge
using:
pythran -DUSE_XSIMD -fopenmp pythran_ridge.py
The main code is then:
import vegas
from pythran_ridge import ridge
ridge = vegas.lbatchintegrand(ridge)
integ = vegas.Integrator(4 * [[0, 1]])
integ(ridge, nitn=10, neval=2e5)
result = integ(ridge, nitn=10, neval=2e5, adapt=False)
print('result = %s Q = %.2f' % (result, result.Q))
This runs as fast as the numba
and Cython versions.
GPUs¶
Another option for speeding up expensive integrands is to evaluate them using
GPUs. Vectorized numpy
code that is not too complicated is readily adapted
for GPUs using Python modules such
the jax
module (or mlx
for Apple hardware).
The following integrand ridge(x)
uses the jax
module to compile
and run the ridge
integrand (from the previous sections) on a GPU:
import jax
jnp = jax.numpy
@jax.jit
def _jridge(x):
N = 1000
x0 = jnp.linspace(0.25, 0.75, N)
dx2 = 0.0
for xd in x:
dx2 += (xd[None,:] - x0[:, None]) ** 2
return jnp.sum(jnp.exp(-100. * dx2), axis=0) * (100. / jnp.pi) ** (len(x) / 2.) / N
@vegas.rbatchintegrand
def ridge(x):
return _jridge(jnp.array(x))
Code for the mlx
module is almost identical:
import mlx.core as mx
@mx.compile
def _mridge(x):
N = 1000
x0 = mx.linspace(0.25, 0.75, N)
dx2 = 0.0
for xd in x:
dx2 += (xd[None,:] - x0[:, None]) ** 2
return mx.sum(mx.exp(-100. * dx2), axis=0) * (100. / mx.pi) ** (len(x) / 2.) / N
@vegas.rbatchintegrand
def ridge(x):
return _mridge(mx.array(x))
These integrands run 20 times faster than the
original (vectorized) batch integrand
(1 sec versus 20 sec using the built-in GPU on a 2024 laptop).
The speedup is substantial because this integrand is quite
costly to evaluate; the original batch integrand is just as fast as the GPU
versions when N=1
(instead of N=1000
).
C and Fortran¶
Older implementations of the vegas
algorithm
have been used extensively in C and Fortran codes. The Python
implementation described here uses a more powerful algorithm.
It is relatively straightforward to combine this version with integrands
coded in C or Fortran. Such integrands are usually substantially
faster than integrands coded directly in Python; they are similar in
speed to optimized Cython code.
There are
many ways to access C and Fortran integrands from Python. Here we
review a few of the options.
cffi
for C¶
The simplest way to access an integrand coded in C is to use the
cffi
module in Python. To illustrate, consider the following
integrand, written in C and stored in file cfcn.c
:
// file cfcn.c
#include <math.h>
double fcn(double x[], int dim)
{
int i;
double xsq = 0.0;
for(i=0; i<dim; i++)
xsq += x[i] * x[i] ;
return exp(-100. * sqrt(xsq)) * pow(100.,dim);
}
Running the following Python code creates (in file cfcn_cffi.c
) and
compiles code for a Python module called cfcn_cffi
that provides
access to fcn(x,dim)
:
# file cfcn_cffi-builder.py
from cffi import FFI
ffibuilder = FFI()
# specify functions, etc to be made available to Python
ffibuilder.cdef('double fcn(double x[], int dim);')
# specify code needed to build the Python module
ffibuilder.set_source(
module_name='cfcn_cffi',
source='double fcn(double x[], int dim);',
sources=['cfcn.c'], # other sources -- file containing fcn(x, dim)
libraries=['m'], # may need to specify the math library (-lm)
)
if __name__ == '__main__':
# create C code for module and compile it
ffibuilder.compile(verbose=True)
We integrate the function in this module by wrapping it in a
Python function (here f(x)
), which is integrated using vegas
:
import vegas
from cfcn_cffi import ffi, lib
def f(x):
_x = ffi.cast('double*', x.ctypes.data) # pointer to x's data
return lib.fcn(_x, 4)
def main():
integ = vegas.Integrator(4 * [[0., 1.]])
print(integ(f, neval=1e6, nitn=10).summary())
print(integ(f, neval=1e6, nitn=10).summary())
if __name__ == '__main__':
main()
The output shows 10 iterations that are used to adapt vegas
to the
integrand, and then an additional 10 iterations to generate the
final result:
itn integral wgt average chi2/dof Q
-------------------------------------------------------
1 4.1(2.1) 4.1(2.1) 0.00 1.00
2 7.403(42) 7.401(42) 2.52 0.11
3 7.366(27) 7.376(23) 1.51 0.22
4 7.4041(73) 7.4014(70) 1.48 0.22
5 7.4046(36) 7.4039(32) 1.15 0.33
6 7.4003(23) 7.4015(19) 1.09 0.37
7 7.4036(20) 7.4025(14) 1.01 0.42
8 7.4017(16) 7.4022(10) 0.89 0.52
9 7.4010(14) 7.40174(83) 0.84 0.57
10 7.4017(13) 7.40174(70) 0.74 0.67
itn integral wgt average chi2/dof Q
-------------------------------------------------------
1 7.4016(12) 7.4016(12) 0.00 1.00
2 7.4030(11) 7.40239(81) 0.79 0.37
3 7.4020(10) 7.40224(63) 0.44 0.64
4 7.40249(92) 7.40232(52) 0.31 0.82
5 7.40258(86) 7.40239(44) 0.25 0.91
6 7.40093(81) 7.40205(39) 0.70 0.62
7 7.40228(76) 7.40210(35) 0.60 0.73
8 7.40276(72) 7.40222(31) 0.61 0.75
9 7.40181(71) 7.40216(29) 0.57 0.80
10 7.40178(70) 7.40210(27) 0.53 0.85
The final estimate for the integral is 7.40210(27)
(about 10,000
times more accurate than the first iteration).
This code can be made substantially faster by converting the
integrand into a batch integrand. We do this by adding a batch
version of the integrand function to the cfcn_cffi
module:
# file cfcn_cffi-builder.py
from cffi import FFI
ffibuilder = FFI()
# specify functions, etc made available to Python
ffibuilder.cdef("""
void batch_fcn(double ans[], double x[], int n, int dim);
""")
# specify code needed to build the module
ffibuilder.set_source(
module_name='cfcn_cffi',
source="""
// code for module
double fcn(double x[], int dim);
void batch_fcn(double ans[], double x[], int n, int dim)
{
int i;
for(i=0; i<n; i++)
ans[i] = fcn(&x[i * dim], dim);
}
""",
sources=['cfcn.c'], # other sources -- file containing fcn(x, dim)
libraries=['m'], # may need to specify the math library (-lm)
)
if __name__ == '__main__':
# create C code for module and compile it
ffibuilder.compile(verbose=True)
The resulting module is then used by the following integration code:
import vegas
import numpy as np
from cfcn_cffi import ffi, lib
@vegas.batchintegrand
def batch_f(x):
n, dim = x.shape
ans = np.empty(n, float)
_x = ffi.cast('double*', x.ctypes.data)
_ans = ffi.cast('double*', ans.ctypes.data)
lib.batch_fcn(_ans, _x, n, dim)
return ans
def main():
integ = vegas.Integrator(4 * [[0., 1.]])
print(integ(batch_f, neval=1e6, nitn=10).summary())
print(integ(batch_f, neval=1e6, nitn=10).summary())
if __name__ == '__main__':
main()
Running this code gives identical results to those above, but in about 1/10 the time.
Obviously cffi
can also be used to access C++ functions by
creating C wrappers for those functions.
Another option for accessing C code is
the ctypes
module, but it is more complicated to use and typically
gives slower code.
cffi
for Fortran¶
Module cffi
can be used for Fortran integrands by calling the
Fortran function from C code. Consider a Fortran implementation of
the integrand discussed above, stored in file ffcn.f
:
c file ffcn.f
c
function fcn(x, dim)
integer i, dim
real*8 x(dim), xsq, fcn
xsq = 0.0
do i=1,dim
xsq = xsq + x(i) ** 2
end do
fcn = exp(-100. * sqrt(xsq)) * 100. ** dim
return
end
This file is compiled into ffcn.o
using something like
gfortran -c ffcn.c -o ffcn.o
The cffi
build script (batch version) from the previous section can be
used here with only three modifications: 1) the Fortran function must be
compiled separately and so is included in a list of object files
(extra_objects
); 2) the name of the Fortran function may have an extra
underscore (or other modification, depending on the compilers used; on UNIX
systems nm ffcn.o
lists the names in the file); and 3) the Fortran
function’s arguments are passed by address and so are pointers in C. The
modified script is:
# file ffcn_cffi-builder.py
from cffi import FFI
ffibuilder = FFI()
# specify functions, etc needed by Python
ffibuilder.cdef("""
void batch_fcn(double ans[], double x[], int n, int dim);
""")
# specify code needed to build the module
ffibuilder.set_source(
module_name='ffcn_cffi',
source="""
// code for module
double fcn_(double* x, int* dim); // Fortran function in ffcn.o
void batch_fcn(double ans[], double x[], int n, int dim)
{
int i;
for(i=0; i<n; i++)
ans[i] = fcn_(&x[i * dim], &dim);
}
""",
extra_objects=['ffcn.o'], # compiled Fortran
libraries=['m'], # may need to specify the math library (-lm)
)
if __name__ == "__main__":
# create C code for module and compile it
ffibuilder.compile(verbose=True)
The new Python module is used exactly
the same way as for C code, with module ffcn_cffi
replacing
module cfcn_cffi
(see the previous section):
import vegas
import numpy as np
from ffcn_cffi import ffi, lib
@vegas.batchintegrand
def batch_f(x):
n, dim = x.shape
ans = np.empty(n, float)
_x = ffi.cast('double*', x.ctypes.data)
_ans = ffi.cast('double*', ans.ctypes.data)
lib.batch_fcn(_ans, _x, n, dim)
return ans
def main():
integ = vegas.Integrator(4 * [[0., 1.]])
print(integ(batch_f, neval=1e6, nitn=10).summary())
print(integ(batch_f, neval=1e6, nitn=10).summary())
if __name__ == '__main__':
import numpy as np
np.random.seed(12)
This code runs about as fast as the corresponding C code in the previous section.
Cython for C (or C++ or Fortran)¶
A more flexible interface to a C integrand can be
created using Cython. To increase efficiency,
we use Cython code in file cfcn.pyx
to convert the original
function (in cfcn.c
) into a batch integral (again):
# file cfcn.pyx
import numpy as np
import vegas
cdef extern double fcn (double[] x, int n)
@vegas.batchintegrand
def batch_f(double[:, ::1] x):
cdef double[:] ans
cdef int i, dim=x.shape[1]
ans = np.empty(x.shape[0], type(x[0,0]))
for i in range(x.shape[0]):
ans[i] = fcn(&x[i, 0], dim)
return ans
We also have to tell Cython how to construct the cfcn
Python
module since that module needs to include compiled code
from cfcn.c
. This is done with a .pyxbld file:
# file cfcn.pyxbld
import numpy as np
def make_ext(modname, pyxfilename):
from distutils.extension import Extension
return Extension(name = modname,
sources=[pyxfilename, 'cfcn.c'],
libraries=[],
include_dirs=[np.get_include()],
)
def make_setup_args():
return dict()
Finally the integral is evaluated using the Python code
import vegas
# compile cfcn, if needed, at import
import pyximport
pyximport.install(inplace=True)
import cfcn
def main():
integ = vegas.Integrator(4 *[[0,1]])
print(integ(cfcn.batch_f, neval=1e4, nitn=10).summary())
print(integ(cfcn.batch_f, neval=1e4, nitn=10).summary())
if __name__ == '__main__':
main()
where, again, pyximport
guarantees that the cfcn
module
is compiled the first time the code is run.
This implementation is as fast as the other batch implementations, above.
Cython also works with C++. It can also work with Fortran code, analogously
to cffi
.
f2py
for Fortran¶
The f2py
package, which is distributed with numpy
,
makes it relatively easy to compile Fortran
code directly into Python modules. Using the same Fortran code as
above (in ffcn.f
), the code is compiled into a Python module using
f2py -m ffcn -c ffcn.f
and the resulting module provides access to the integrand from Python:
import vegas
import ffcn
def main():
integ = vegas.Integrator(4 *[[0,1]])
print(integ(ffcn.fcn, neval=1e4, nitn=10).summary())
print(integ(ffcn.fcn, neval=1e4, nitn=10).summary())
if __name__ == '__main__':
main()
Again you can make the code somewhat faster by converting the integrand
into a batch integrand inside the Fortran module. Adding the following
function to the end of file ffcn.f
above :
c part 2 of file ffcn.f --- batch form of integrand
subroutine batch_fcn(ans, x, dim, nbatch)
integer dim, nbatch, i, j
real*8 x(nbatch, dim), xi(dim), ans(nbatch), fcn
cf2py intent(out) ans
do i=1,nbatch
do j=1,dim
xi(j) = x(i, j)
end do
ans(i) = fcn(xi, dim)
end do
end
results in a second Python function ffcn.batch_fcn(x)
that takes the
integration points x[i,d]
as input and returns an array of
integrand values ans[i]
. (The second Fortran comment tells f2py
that array ans
should be returned by the correponding Python
function; f2py
also has the function automatically deduce dim
and
nbatch
from the shape of x
.)
The correponding Python script for doing the integral
is then:
import vegas
import ffcn_f2py
import numpy as np
def main():
integ = vegas.Integrator(4 *[[0,1]])
batch_fcn = vegas.batchintegrand(ffcn_f2py.batch_fcn)
print(integ(batch_fcn, neval=1e6, nitn=10).summary())
print(integ(batch_fcn, neval=1e6, nitn=10).summary())
if __name__ == '__main__':
main()
This runs roughly twice as fast as the original, and about the same speed as the batch versions of the C code, above.