numpy, named columns

I know this is an old question, but a more recently available option would be to try using pandas. The DataFrame type is designed for structured data like this, where columns are named and can be of different types.


NumPy structured arrays have named columns:

import numpy as np
    
a = range(100)
A = np.array(list(zip(*[iter(a)] * 2)), dtype=[('C1', 'int32'),('C2', 'int64')])
print(A.dtype)
[('C1', '<i4'), ('C2', '<i8')]

You can access the columns by name like this:

print(A['C1'])
# [ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48
#  50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98]

Note that using np.array with zip causes NumPy to build an array from a temporary list of tuples. Python lists of tuples use a lot more memory than equivalent NumPy arrays. So if your array is very large you may not want to use zip.

Instead, given a NumPy array A, you could use ravel() to make A a 1D array, and then use view to turn it into a structured array, and then use astype to convert the columns to the desired type:

a = range(100)
A = np.array(a).reshape( len(a)//2, 2)
A = A.ravel().view([('col1','i8'),('col2','i8'),]).astype([('col1','i4'),('col2','i8'),])
print(A[:5])
# array([(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)], 
#       dtype=[('col1', '<i4'), ('col2', '<i8')])

print(A.dtype)
# dtype([('col1', '<i4'), ('col2', '<i8')])

Tags:

Python

Numpy