How do I stack vectors of different lengths in NumPy?

In general, putting together arrays of different lengths is ambiguous, because the alignment of the data might matter. pandas offers more advanced tools for this, e.g. merging Series into DataFrames with index alignment.
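As a minimal sketch of the pandas route (assuming pandas is installed; the column labels "a" and "b" are just illustrative), you can wrap each array in a Series and concatenate along axis=1. pandas aligns the values by index label and fills the gaps with NaN:

```python
import numpy as np
import pandas as pd

a = np.ones(3)
b = np.ones(2)

# Each Series gets a default index 0..len-1; concatenating along axis=1
# aligns on that index, so positions missing from b become NaN.
df = pd.concat([pd.Series(a), pd.Series(b)], axis=1, keys=["a", "b"])
print(df)

# Convert back to a plain NumPy array if needed (NaN-padded, shape (3, 2)).
outarr = df.to_numpy()
```

Because alignment is by index label, giving the Series explicit indexes lets you control which values end up in the same row.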

If you just want to populate columns starting from the first element, what I usually do is build a matrix and fill its columns. Of course, you need to fill the empty slots in the matrix with a null value (here, np.nan).

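import numpy as np

a = np.ones((3,))
b = np.ones((2,))
arraylist = [a, b]

# all-NaN array: one row per element of the longest array, one column per array
outarr = np.full((max(len(ps) for ps in arraylist), len(arraylist)), np.nan)
for i, c in enumerate(arraylist):  # populate columns, starting from the first row
    outarr[:len(c), i] = c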

In [108]: outarr
Out[108]: 
array([[  1.,   1.],
       [  1.,   1.],
       [  1.,  nan]])
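The same column layout can also be built without the Python loop. Here is a sketch (the helper name pad_to_columns is hypothetical) that builds a boolean mask of valid positions and writes all the values in a single assignment. The NaN padding then works with NumPy's NaN-aware reductions such as np.nanmean:

```python
import numpy as np

def pad_to_columns(arrays, fill=np.nan):
    """Place each 1-D array in its own column, top-aligned, padding with `fill`."""
    lengths = np.array([len(x) for x in arrays])
    # mask[r, i] is True where column i has a real value at row r
    mask = np.arange(lengths.max())[:, None] < lengths
    out = np.full(mask.shape, fill, dtype=float)
    # Assigning through the transposed view walks column by column,
    # so the concatenated values land in the right columns.
    out.T[mask.T] = np.concatenate(arrays)
    return out

a = np.ones(3)
b = 2 * np.ones(2)
outarr = pad_to_columns([a, b])
print(outarr)
print(np.nanmean(outarr, axis=0))  # column means that skip the NaN padding
```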

Short answer: you can't. NumPy does not support jagged arrays natively.

Long answer:

>>> import numpy as np
>>> from numpy import ma
>>> a = np.ones((3,))
>>> b = np.ones((2,))
>>> c = np.array([a, b], dtype=object)
>>> c
array([array([1., 1., 1.]), array([1., 1.])], dtype=object)

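gives an object array whose elements are references to a and b, so it may not behave as you expect. Its shape is (2,), and vectorized operations act on those references rather than on the numbers. For example, c.sum() tries to compute a + b and fails because shapes (3,) and (2,) don't broadcast, and reshape rearranges the outer container, not the data. Treat it much as you'd treat the ordinary Python list [a, b]: iterate over it to perform operations instead of using vectorized idioms. (NumPy 1.24 and later raise a ValueError for ragged input like this unless you pass dtype=object explicitly.)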
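To illustrate the "treat it like a list" advice, here is a minimal sketch. It shows that c.sum() fails while a per-row loop works. The sketch assumes the arrays really do differ in length. If they happened to be equal, np.array(..., dtype=object) would build a 2-D object array instead.

```python
import numpy as np

a = np.ones(3)
b = np.ones(2)
# dtype=object is required in recent NumPy; ragged input otherwise raises ValueError.
c = np.array([a, b], dtype=object)

print(c.shape)  # just two slots holding references to a and b

# Reductions combine the *elements* (a + b), which cannot broadcast here.
try:
    c.sum()
except ValueError as err:
    print("c.sum() failed:", err)

# Treat c like a list: loop over the rows instead.
row_sums = np.array([row.sum() for row in c])
print(row_sums)
```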

Several workarounds exist. The easiest is to pad a and b to a common length and mark the padded positions as invalid, using either NaN or a masked array. For example, here's b as a masked array (np.resize repeats b's values to reach the new length, and the mask hides the extra entry):

>>> ma.array(np.resize(b, a.shape[0]), mask=[False, False, True])
masked_array(data = [1.0 1.0 --],
             mask = [False False  True],
       fill_value = 1e+20)

This can be stacked with a as follows:

>>> ma.vstack([a, ma.array(np.resize(b, a.shape[0]), mask=[False, False, True])])
masked_array(data =
 [[1.0 1.0 1.0]
 [1.0 1.0 --]],
             mask =
 [[False False False]
 [False False  True]],
       fill_value = 1e+20)
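The same idea generalizes to any number of arrays. The sketch below uses a hypothetical helper, stack_masked, that stacks the arrays as rows and masks every position past each row's end. The 0 stored under the mask is an arbitrary placeholder. Masked reductions then skip the padding automatically:

```python
import numpy as np
from numpy import ma

def stack_masked(arrays):
    """Stack 1-D arrays as rows of a masked array; positions past each row's end are masked."""
    lengths = np.array([len(x) for x in arrays])
    width = lengths.max()
    data = np.zeros((len(arrays), width))  # placeholder values under the mask
    # mask[i, j] is True where row i has no value at column j
    mask = np.arange(width) >= lengths[:, None]
    # Row-major boolean assignment fills row 0 first, then row 1, and so on.
    data[~mask] = np.concatenate(arrays)
    return ma.array(data, mask=mask)

a = np.ones(3)
b = 2 * np.ones(2)
m = stack_masked([a, b])
print(m)
print(m.mean(axis=1))   # per-row means, ignoring masked entries
print(m.count(axis=1))  # number of valid entries per row
```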

(For some purposes, scipy.sparse may also be interesting.)
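scipy.sparse's CSR format stores its rows as one flat data array plus an indptr array of row boundaries. The same layout works for ragged data in plain NumPy. The sketch below is that idea without scipy.sparse itself. The names data and offsets are illustrative, and it assumes no row is empty.

```python
import numpy as np

arrays = [np.ones(3), 2 * np.ones(2)]

# Flat storage: all values back to back, plus the row boundaries.
data = np.concatenate(arrays)
offsets = np.concatenate([[0], np.cumsum([len(x) for x in arrays])])

# Row i lives in data[offsets[i]:offsets[i + 1]].
second_row = data[offsets[1]:offsets[2]]

# Per-row sums in one vectorized call. Caveat: np.add.reduceat does not
# return 0 for an empty row, so this assumes every row is non-empty.
row_sums = np.add.reduceat(data, offsets[:-1])
print(row_sums)
```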
