# Speed comparison among numpy, cython, numba and tensorflow 2.0

Recently I have been working on speeding up some code in pymatgen for finding the atomic neighbors within a cutoff radius. I was searching online and found that `cython` is a rather powerful tool for accelerating python loops, so I decided to give it a try.

A common comparison for `cython` is `numba`, and I have heard many good things about it. A less common competitor is the recently released `tensorflow 2.0`. In fact, back in the `tensorflow 1.x` era, I did some simple comparisons and found that it was in fact faster than `numpy`. The new `tensorflow 2.0` is claimed to be up to 3x faster than `tensorflow 1.x`, which makes me wonder how much faster `tensorflow 2.0` would be on some simple computing tasks.

### Function decorator to record time

I like to do simple things myself so that I know exactly what happens in the code, so I am writing my own timeit decorator instead of using the `timeit` package.

```
from time import time
import functools


def timeit(n=10):
    """
    Decorator to run function n times and print out the total time elapsed.
    """
    def dec(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            t0 = time()
            for i in range(n):
                func(*args, **kwargs)
            print("%s iterated %d times\nTime elapsed %.3fs\n" % (
                func.__name__, n, time() - t0))
        return wrapped
    return dec
```
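
For example, here is the decorator applied to a toy function of my own (the decorator is repeated so the snippet is self-contained); note that `functools.wraps` keeps the wrapped function's original name, which is what the `print` relies on:

```python
import functools
from time import time


def timeit(n=10):
    """Run the function n times and print the total elapsed time."""
    def dec(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            t0 = time()
            for _ in range(n):
                func(*args, **kwargs)
            print("%s iterated %d times\nTime elapsed %.3fs\n" % (
                func.__name__, n, time() - t0))
        return wrapped
    return dec


calls = []


@timeit(n=5)
def add(a, b):
    calls.append(a + b)  # record each call so we can see the body ran 5 times
    return a + b


add(2, 3)  # runs the body 5 times and prints the total time
```

One caveat of this simple design: `wrapped` discards the function's return value, which is fine for pure timing runs.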

### Computing functions using different methods

Here I am computing

\[matrix[i, j] = i^2 + j^2\]

for a matrix of size `[m, n]`
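
For a concrete picture of the target output, here is the formula written as a naive pure-Python double loop for a small 3x3 case (my own reference snippet, not one of the benchmarked versions):

```python
# matrix[i, j] = i**2 + j**2, as an explicit double loop (3x3 case)
matrix = [[i**2 + j**2 for j in range(3)] for i in range(3)]
print(matrix)  # [[0, 1, 4], [1, 2, 5], [4, 5, 8]]
```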

```
# import numba, tensorflow and numpy, load cython
import numba
import tensorflow as tf
import numpy as np
%load_ext cython
```

```
@tf.function
def compute_tf(m, n):
    print('Tracing ', m, n)
    x1 = tf.range(0, m, 1) ** 2
    x2 = tf.range(0, n, 1) ** 2
    return x1[:, None] + x2[None, :]


compute_tf(tf.constant(1), tf.constant(1))  # trace once
```

```
Tracing Tensor("m:0", shape=(), dtype=int32) Tensor("n:0", shape=(), dtype=int32)
<tf.Tensor: id=261, shape=(1, 1), dtype=int32, numpy=array([[0]], dtype=int32)>
```

I used the `tf.function` decorator to define the graph, and avoided repeatedly retracing it by passing `tf.constant` inputs and performing an initial trace. You will see that subsequent calls to this function do not invoke `print`; the graph is only traced once.
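
To illustrate why the `tf.constant` inputs matter, here is a small counter sketch of mine: the Python body (and hence the counter increment) only runs while tracing, and plain Python scalars force one new trace per distinct value, while tensors of the same dtype and shape share one trace:

```python
import tensorflow as tf

n_traces = 0


@tf.function
def add_one(x):
    global n_traces
    n_traces += 1  # executes only during tracing, not on graph execution
    return x + 1


add_one(tf.constant(1))
add_one(tf.constant(2))  # same dtype/shape -> reuses the first trace
add_one(3)
add_one(4)               # Python scalars -> one new trace per value
print(n_traces)  # 3
```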

```
def compute_numpy(m, n):
    x1 = np.linspace(0., m-1, m) ** 2
    x2 = np.linspace(0., n-1, n) ** 2
    return x1[:, None] + x2[None, :]
```
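
The `x1[:, None] + x2[None, :]` trick relies on numpy broadcasting: inserting a size-1 axis into each vector lets the addition expand to a full matrix. A small standalone illustration:

```python
import numpy as np

x1 = np.array([0., 1., 4.])      # i**2 for i = 0, 1, 2
x2 = np.array([0., 1., 4., 9.])  # j**2 for j = 0, 1, 2, 3
out = x1[:, None] + x2[None, :]  # shapes (3, 1) + (1, 4) broadcast to (3, 4)
print(out.shape)  # (3, 4)
```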

```
@numba.njit
def compute_numba(m, n):
    x = np.empty((m, n))
    for i in range(m):
        for j in range(n):
            x[i, j] = i**2 + j**2
    return x


compute_numba(1, 1)  # JIT compile first
```

```
@numba.njit(parallel=True)
def compute_numba_parallel(m, n):
    x = np.empty((m, n))
    for i in numba.prange(m):
        for j in numba.prange(n):
            x[i, j] = i**2 + j**2
    return x


compute_numba_parallel(1, 1)  # JIT compile first
```

```
array([[0.]])
```

The `numpy` and `numba` implementations look almost the same as plain Python. `numba` is really handy in terms of turning on parallel computation: just pass `parallel=True` and switch `range` to `prange`.
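
Since the `numba` kernel is just the explicit double loop while `numpy` uses broadcasting, it is worth a quick sanity check that the two formulations agree. Here is a pure-`numpy` version of that check (the loop reference is redefined in plain Python so the snippet stands alone without `numba`):

```python
import numpy as np


def compute_numpy(m, n):
    x1 = np.linspace(0., m - 1, m) ** 2
    x2 = np.linspace(0., n - 1, n) ** 2
    return x1[:, None] + x2[None, :]


def compute_loop(m, n):
    # same kernel the numba version compiles: matrix[i, j] = i**2 + j**2
    x = np.empty((m, n))
    for i in range(m):
        for j in range(n):
            x[i, j] = i**2 + j**2
    return x


assert np.allclose(compute_numpy(5, 7), compute_loop(5, 7))
```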

```
%%cython
cimport cython
import numpy as np
cimport numpy as np


@cython.boundscheck(False)
@cython.wraparound(False)
def compute_cython(int m, int n):
    cdef long [:, ::1] x = np.empty((m, n), dtype=int)
    cdef int i, j
    for i in range(m):
        for j in range(n):
            x[i, j] = i*i + j*j
    return x
```

`cython` needs more work. I am delegating the memory management to `numpy` here and using a `memoryview` for `x`; basically it is like writing `C`. Note that `cython` can also turn on parallel computation like `numba`, by using `cython.parallel.prange`. However, that requires `openmp`, which does not ship with the `clang` compiler on macOS, so I am not testing the parallel version here.
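
For reference, an untested sketch of what that parallel version would look like on a machine with an OpenMP-capable compiler such as `gcc` (the `--compile-args`/`--link-args` flags belong to the IPython `%%cython` magic; `prange` requires releasing the GIL):

```
%%cython --compile-args=-fopenmp --link-args=-fopenmp
cimport cython
from cython.parallel import prange
import numpy as np


@cython.boundscheck(False)
@cython.wraparound(False)
def compute_cython_parallel(int m, int n):
    cdef long [:, ::1] x = np.empty((m, n), dtype=int)
    cdef int i, j
    for i in prange(m, nogil=True):  # outer loop split across threads
        for j in range(n):
            x[i, j] = i*i + j*j
    return x
```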

### Results

```
m = 2000
n = 10000
n_loop = 10
timeit(n=n_loop)(compute_numpy)(m, n)
timeit(n=n_loop)(compute_numba)(m, n)
timeit(n=n_loop)(compute_numba_parallel)(m, n)
timeit(n=n_loop)(compute_cython)(m, n)
timeit(n=n_loop)(compute_tf)(tf.constant(m), tf.constant(n))
```

```
compute_numpy iterated 10 times
Time elapsed 0.971s
compute_numba iterated 10 times
Time elapsed 1.110s
compute_numba_parallel iterated 10 times
Time elapsed 0.651s
compute_cython iterated 10 times
Time elapsed 1.098s
compute_tf iterated 10 times
Time elapsed 0.190s
```
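
Converting those totals into rough speedups relative to `numpy` (numbers taken from the run above):

```python
# total times from the run above (seconds for 10 iterations)
timings = {
    "numpy": 0.971,
    "numba": 1.110,
    "numba_parallel": 0.651,
    "cython": 1.098,
    "tf": 0.190,
}
speedup_vs_numpy = {k: timings["numpy"] / t for k, t in timings.items()}
print({k: round(v, 2) for k, v in speedup_vs_numpy.items()})
# tf is ~5.1x numpy, numba_parallel ~1.5x; plain numba and cython are slightly slower
```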

### Conclusion

`Tensorflow 2.0` is the clear winner here, about 5x faster than `numpy` on this task. Amazing.
