Python numpy array vs list

Your first example could be sped up. Python loops and access to individual items in a NumPy array are slow. Use vectorized operations instead:

import numpy as np
x = np.arange(1000000).cumsum()

You can store unbounded Python integers in a NumPy array by using dtype=object:

a = np.array([0], dtype=object)
a[0] += 1232234234234324353453453

In this case, arithmetic operations are slower than they would be with fixed-size C integers.
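
To make the trade-off concrete, here is a minimal sketch contrasting a fixed-size integer array with an object array. The names `fixed`, `boxed`, `wrapped` and `exact` are only illustrative; the point is that int64 arithmetic on arrays wraps around silently when the result does not fit, while object arrays keep exact Python integers.

```python
import numpy as np

# Fixed-size 64-bit integers: a result that does not fit wraps around silently.
fixed = np.array([2**62], dtype=np.int64)
wrapped = fixed * 4            # 2**64 does not fit in int64

# dtype=object stores ordinary Python ints, which grow as needed.
boxed = np.array([2**62], dtype=object)
exact = boxed * 4              # exact, but each operation goes through Python objects

print(wrapped[0])
print(exact[0])
```

So an object array is mainly worth it when you really need integers larger than 64 bits; otherwise a fixed-size dtype is the faster choice.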


Do array.array or numpy.array offer a significant performance boost over typical arrays?

I tried to test this a bit with the following code:

import timeit, math, array
from functools import partial
import numpy as np

# from the question
def calc1(x):
    for i in range(1,len(x)):
        x[i] = x[i-1] + 1

# a floating point operation
def calc2(x):
    for i in range(0,len(x)):
        x[i] = math.sin(i)

L = int(1e5)

# np
print('np 1: {:.5f} s'.format(timeit.timeit(partial(calc1, np.array([0] * L)), number=20)))
print('np 2: {:.5f} s'.format(timeit.timeit(partial(calc2, np.array([0] * L)), number=20)))

# np but with vectorized form
vfunc = np.vectorize(math.sin)
print('np 2 vectorized: {:.5f} s'.format(timeit.timeit(partial(vfunc, np.arange(0, L)), number=20)))

# with list
print('list 1: {:.5f} s'.format(timeit.timeit(partial(calc1, [0] * L), number=20)))
print('list 2: {:.5f} s'.format(timeit.timeit(partial(calc2, [0] * L), number=20)))

# with array
print('array 1: {:.5f} s'.format(timeit.timeit(partial(calc1, array.array("f", [0] * L)), number=20)))
print('array 2: {:.5f} s'.format(timeit.timeit(partial(calc2, array.array("f", [0] * L)), number=20)))

The results show that the list version runs fastest here (Python 3.3, NumPy 1.8):

np 1: 2.14277 s
np 2: 0.77008 s
np 2 vectorized: 0.44117 s
list 1: 0.29795 s
list 2: 0.66529 s
array 1: 0.66134 s
array 2: 0.88299 s

This seems counterintuitive: there doesn't appear to be any advantage in using numpy or array over list for these simple examples.
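
Two details of the benchmark may explain part of this, so take the following as a hedged sketch rather than a re-run of the timings. First, np.array([0] * L) has an integer dtype, so the values of math.sin(i) are truncated when stored. Second, np.vectorize is documented as a convenience wrapper that is essentially a for loop, whereas np.sin is a real ufunc that runs in compiled code. The names `ints`, `looped`, `native` and `counted` are illustrative only.

```python
import math
import numpy as np

L = 1000  # smaller than in the benchmark, just for illustration

# np.array([0] * L) has an integer dtype, so the fractional part of
# math.sin(i) is dropped when the value is stored.
ints = np.array([0] * L)
ints[1] = math.sin(1)          # stored as 0, not 0.84...

# np.vectorize calls math.sin once per element from Python.
looped = np.vectorize(math.sin)(np.arange(L))

# np.sin is a ufunc and processes the whole array in compiled code.
native = np.sin(np.arange(L))

# calc1 applied to an array of zeros produces 0, 1, 2, ..., i.e. np.arange(L).
counted = np.arange(L)

print(np.allclose(looped, native))
```

Timing np.sin(np.arange(L)) and np.arange(L) with timeit in the same way as above would be the fairer comparison against the list versions.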


You first need to understand the difference between arrays and lists.

An array is a contiguous block of memory consisting of elements of some type (e.g. integers).

You cannot change the size of an array once it is created.
It therefore follows that each integer element in an array has a fixed size, e.g. 4 bytes.

On the other hand, a list is merely an "array" of addresses (which also have a fixed size).

But then each element holds the address of something else in memory, which is the actual integer that you want to work with. Of course, the size of this integer is irrelevant to the size of the array. Thus you can always create a new (bigger) integer and "replace" the old one without affecting the size of the array, which merely holds the address of an integer.

Of course, this convenience of a list comes at a cost: performing arithmetic on the integers now requires a memory access to the list, plus a memory access to the integer itself, plus the time it takes to allocate more memory (if needed), plus the time required to delete the old integer (if needed). So yes, it can be slower, and you have to be careful about what you do with each integer inside a list.
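
The difference described above can be observed directly. This is a small sketch assuming CPython (sys.getsizeof reports CPython object sizes) and a NumPy int32 array standing in for "an array of 4-byte integers"; the variable names are illustrative.

```python
import sys
import numpy as np

# A NumPy array: every element occupies the same fixed number of bytes.
arr = np.zeros(5, dtype=np.int32)
item_bytes = arr.itemsize      # 4 bytes per element
total_bytes = arr.nbytes       # 5 elements * 4 bytes

# A value that does not fit into 4 bytes cannot be stored.
try:
    arr[0] = 2**100
    overflowed = False
except OverflowError:
    overflowed = True

# A list: each slot holds a reference, so the integer it points to can
# be replaced by a bigger object without changing the list itself.
lst = [1, 2, 3]
small = sys.getsizeof(lst[0])
lst[0] = 2**100
big = sys.getsizeof(lst[0])

print(item_bytes, total_bytes, overflowed, small, big)
```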


For most uses, lists are sufficient. Sometimes, however, working with NumPy arrays is more convenient. For example:

a=[1,2,3,4,5,6,7,8,9,10]

b=[5,8,9]

Consider the list 'a'. If you want to access its elements at the discrete indices given in list 'b', writing

a[b]

will not work, because a list cannot be indexed with another list (it raises a TypeError).

But when 'a' is a NumPy array, you can simply write

a[b]

to get the output array([ 6,  9, 10]).
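
Put together as a runnable sketch (the names `picked_list` and `picked_array` are just for illustration), including the list-comprehension equivalent you would need with a plain list:

```python
import numpy as np

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [5, 8, 9]

# Indexing a list with a list is not supported.
try:
    a[b]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# With plain lists you need an explicit loop or comprehension.
picked_list = [a[i] for i in b]

# With a NumPy array, a list of indices selects those elements directly.
picked_array = np.array(a)[b]

print(picked_list, picked_array)
```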