Converting numpy arrays of arrays into one whole numpy array

Perhaps late to the party, but I believe the most efficient approach is:

np.array(arr.tolist())

To give some idea of how it would work:

import numpy as np


N, M, K = 4, 3, 2
arr = np.empty((N,), dtype=object)
for i in range(N):
    arr[i] = np.full((M, K), i)


print(arr)
# [array([[0, 0],
#        [0, 0],
#        [0, 0]])
#  array([[1, 1],
#        [1, 1],
#        [1, 1]])
#  array([[2, 2],
#        [2, 2],
#        [2, 2]])
#  array([[3, 3],
#        [3, 3],
#        [3, 3]])]


new_arr = np.array(arr.tolist())
print(new_arr)
# [[[0 0]
#   [0 0]
#   [0 0]]

#  [[1 1]
#   [1 1]
#   [1 1]]

#  [[2 2]
#   [2 2]
#   [2 2]]

#  [[3 3]
#   [3 3]
#   [3 3]]]

...and the timings for this small example (N, M, K = 4, 3, 2):

%timeit np.array(arr.tolist())
# 100000 loops, best of 3: 2.48 µs per loop
%timeit np.concatenate(arr).reshape(N, M, K)
# 100000 loops, best of 3: 3.28 µs per loop
%timeit np.array([x for x in arr])
# 100000 loops, best of 3: 3.32 µs per loop
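The numbers above come from a tiny 4x3x2 input, where fixed call overhead dominates, so the ranking may not hold for larger data. A rough sketch for re-running the same three candidates outside IPython with the standard `timeit` module follows; the sizes below are hypothetical and not from the original measurements, and no timings are quoted because they depend on the machine and the NumPy version.

```python
import timeit

import numpy as np

# Hypothetical, larger sizes than in the example above; adjust to your data.
N, M, K = 500, 30, 20
arr = np.empty((N,), dtype=object)
for i in range(N):
    arr[i] = np.full((M, K), i)

# The same three approaches that were timed above.
candidates = {
    "np.array(arr.tolist())": lambda: np.array(arr.tolist()),
    "np.concatenate(arr).reshape(N, M, K)": lambda: np.concatenate(arr).reshape(N, M, K),
    "np.array([x for x in arr])": lambda: np.array([x for x in arr]),
}

reference = candidates["np.array(arr.tolist())"]()
all_equal = True
for label, func in candidates.items():
    # Check that every approach builds the same (N, M, K) array.
    all_equal = all_equal and np.array_equal(func(), reference)
    # Best of 3 repeats, 20 calls each, reported per call.
    seconds = min(timeit.repeat(func, number=20, repeat=3)) / 20
    print(f"{label}: {seconds * 1e6:.1f} µs per call")
```

It is worth running this with shapes close to your real data before picking one approach on speed alone.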

np.concatenate should do the trick:

Make an object array of arrays:

In [23]: arr=np.empty((4,),dtype=object)
In [24]: for i in range(4):arr[i]=np.ones((2,2),int)*i
In [25]: arr
Out[25]: 
array([array([[0, 0],
       [0, 0]]), array([[1, 1],
       [1, 1]]),
       array([[2, 2],
       [2, 2]]), array([[3, 3],
       [3, 3]])], dtype=object)

In [28]: np.concatenate(arr)
Out[28]: 
array([[0, 0],
       [0, 0],
       [1, 1],
       [1, 1],
       [2, 2],
       [2, 2],
       [3, 3],
       [3, 3]])

Or with a reshape:

In [26]: np.concatenate(arr).reshape(4,2,2)
Out[26]: 
array([[[0, 0],
        [0, 0]],

       [[1, 1],
        [1, 1]],

       [[2, 2],
        [2, 2]],

       [[3, 3],
        [3, 3]]])
In [27]: _.shape
Out[27]: (4, 2, 2)

concatenate effectively treats its input as a list of arrays, so it works whether the input is an object array, a list of arrays, or a 3d array.
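To illustrate that point, here is a small sketch that feeds the same four 2x2 pieces to np.concatenate as a list, as an object array, and as a regular 3d array. The variable names are made up for this example.

```python
import numpy as np

# Four 2x2 integer arrays, like the ones in the session above.
pieces = [np.ones((2, 2), int) * i for i in range(4)]

# The same pieces in three different containers.
as_list = pieces
as_object_array = np.empty((4,), dtype=object)
for i, piece in enumerate(pieces):
    as_object_array[i] = piece
as_3d_array = np.array(pieces)  # shape (4, 2, 2)

# concatenate iterates over the first axis in every case.
results = [np.concatenate(x) for x in (as_list, as_object_array, as_3d_array)]
for result in results:
    print(result.shape)  # (8, 2) each time
```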

This can't be done with a reshape alone. arr is an array of pointers, each pointing to an array located elsewhere in memory. To get a single 3d array, all of the pieces have to be copied into one buffer. That's what concatenate does: it allocates one large empty array and copies each piece into it, and it does so in compiled code.
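A quick way to see both claims, as a sketch rather than anything definitive: reshape on the object array fails because it only has 4 elements (the references), and the concatenated result does not share memory with the original pieces.

```python
import numpy as np

arr = np.empty((4,), dtype=object)
for i in range(4):
    arr[i] = np.ones((2, 2), int) * i

# The object array holds just 4 references, so it cannot become (4, 2, 2).
try:
    arr.reshape(4, 2, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True

# concatenate copies every piece into one new buffer.
combined = np.concatenate(arr).reshape(4, 2, 2)
shares_any = any(np.shares_memory(combined, piece) for piece in arr)
print(shares_any)  # False

# Writing to the combined array leaves the original pieces untouched.
combined[0, 0, 0] = 99
print(arr[0][0, 0])  # 0
```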


np.array does not change it, because arr is already a 1d object array:

In [37]: np.array(arr).shape
Out[37]: (4,)

but treating arr as a list of arrays does work (though it is slower than the concatenate version, because np.array analyses its inputs more).

In [38]: np.array([x for x in arr]).shape
Out[38]: (4, 2, 2)

I had the same issue extracting a column from a Pandas DataFrame containing an array in each row:

joined["ground truth"].values
# outputs
array([array([0, 0, 0, 0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0, 0, 0, 0]),
       array([0, 0, 0, 0, 0, 0, 0, 0]), ...,
       array([0, 0, 0, 0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0, 0, 0, 0]),
       array([0, 0, 0, 0, 0, 0, 0, 0])], dtype=object)

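np.concatenate didn't help because each element is 1d, so it joined them end to end into one flat array (same as np.hstack). Instead, I needed to stack them vertically with np.vstack, which turns each element into one row:

np.vstack(joined["ground truth"].values)
# outputs
array([[0, 0, 0, ..., 0, 0, 0],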
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])
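For anyone who wants to reproduce the difference without a DataFrame, here is a minimal sketch. The object array `values` is a hypothetical stand-in for `joined["ground truth"].values`, built by hand with 5 rows of 8 zeros; the real column may have different sizes.

```python
import numpy as np

# Hypothetical stand-in for joined["ground truth"].values:
# an object array whose elements are 1d arrays of equal length.
values = np.empty((5,), dtype=object)
for i in range(5):
    values[i] = np.zeros(8, dtype=int)

# 1d pieces are joined end to end, giving one flat array.
flat = np.concatenate(values)
print(flat.shape)  # (40,)

# vstack promotes each 1d piece to a row before stacking.
stacked = np.vstack(values)
print(stacked.shape)  # (5, 8)
```

With 1d elements of equal length, np.array(values.tolist()) from the first answer should give the same 2d result as np.vstack.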