Creating a structured array from a list

The np.array() function accepts a list of lists as input. So if you want to create a 2 x 2 matrix, for example, this is what you need to do:

import numpy as np
X = np.array([[1,2], [3,4]])

Details of how np.array handles various inputs are buried in compiled code. As the many questions about creating object dtype arrays show, it can be complicated and confusing. The basic model is to create a multidimensional numeric array from a nested list:

np.array([[1,2,3],[4,5,6]])
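As a quick check of that basic model, here is a minimal sketch showing that each level of list nesting becomes one array dimension. The variable name X is only illustrative.

```python
import numpy as np

# Two inner lists of three numbers each: nesting depth 2 gives a 2-d array.
X = np.array([[1, 2, 3], [4, 5, 6]])

print(X.shape)  # (2, 3)
print(X.ndim)   # 2
```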

In implementing structured arrays, developers adopted the tuple as a way of distinguishing a record from just another nested dimension. That is evident in the display of a structured array.

It is also a requirement when defining a structured array, though the list-of-tuples requirement is somewhat buried in the documentation. Passing a list of lists (here alist is [[1],[2],[3]]) with a structured dtype fails for me:

In [382]: dt=np.dtype([('y',int)])
In [383]: np.array(alist,dt)

TypeError: a bytes-like object is required, not 'int'

This is the error message in my version, 1.12.0. It appears to be different in yours.

As you note, a list comprehension can convert the nested list into a list of tuples.

In [384]: np.array([tuple(i) for i in alist],dt)
Out[384]: 
array([(1,), (2,), (3,)], 
      dtype=[('y', '<i4')])

In answering SO questions that's the approach I use most often. Either that or iteratively set fields of a preallocated array (usually there are a lot more records than fields, so that loop is not expensive).
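The second approach, filling a preallocated array one field at a time, might look like the sketch below. It assumes alist is the nested list [[1],[2],[3]] and dt is the single-field dtype used above; the helper name columns is hypothetical and only there for illustration.

```python
import numpy as np

alist = [[1], [2], [3]]            # assumed input: one inner list per record
dt = np.dtype([('y', int)])

# Preallocate one record per row of the input.
arr = np.zeros(len(alist), dtype=dt)

# Transpose the rows into per-field columns, then assign each field in turn.
# The loop runs once per field, not once per record.
columns = list(zip(*alist))
for name, column in zip(dt.names, columns):
    arr[name] = column

print(arr.shape)   # (3,)
print(arr['y'])    # [1 2 3]
```

With more fields in dt, the same loop applies as long as each inner list has one value per field.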

It looks like wrapping the array in a structured array call is equivalent to an astype call:

In [385]: np.array(np.array(alist),dt)
Out[385]: 
array([[(1,)],
       [(2,)],
       [(3,)]], 
      dtype=[('y', '<i4')])
In [386]: np.array(alist).astype(dt)
Out[386]: 
array([[(1,)],
       [(2,)],
       [(3,)]], 
      dtype=[('y', '<i4')])

But note the change in the number of dimensions. The list of tuples created a (3,) array. The astype converted a (3,1) numeric array into a (3,1) structured array.
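The difference in shape can be checked directly. The sketch below again assumes alist = [[1],[2],[3]]. The last step uses numpy.lib.recfunctions.unstructured_to_structured, which I believe is available from NumPy 1.16 on; it is not part of the session above and is shown only as one possible way to turn the last axis into fields.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

alist = [[1], [2], [3]]            # assumed input
dt = np.dtype([('y', int)])

# Tuples mark the records, so the result has one dimension.
from_tuples = np.array([tuple(row) for row in alist], dtype=dt)

# astype keeps the (3, 1) shape of the numeric array.
from_astype = np.array(alist).astype(dt)

print(from_tuples.shape)   # (3,)
print(from_astype.shape)   # (3, 1)

# Assumption: NumPy 1.16 or later. The last axis is consumed as the fields.
converted = rfn.unstructured_to_structured(np.array(alist), dt)
print(converted.shape)     # (3,)
```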

Part of what the tuples tell np.array is where to put the division between array dimensions and records. It interprets

[(1,), (2,), (3,)]

as

[record, record, record]

whereas automatic translation of [[1],[2],[3]] might produce

[[record],[record],[record]]

When the dtype is numeric (non-structured), it ignores the distinction between list and tuple:

In [388]: np.array([tuple(i) for i in alist],int)
Out[388]: 
array([[1],
       [2],
       [3]])

But when the dtype is compound, developers have chosen to use the tuple layer as significant information.
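A small side-by-side sketch of that contrast, assuming the same single-field dtype as above. The names rows_as_lists and rows_as_tuples are hypothetical.

```python
import numpy as np

rows_as_lists = [[1], [2], [3]]
rows_as_tuples = [(1,), (2,), (3,)]

# Numeric dtype: lists and tuples are both treated as nested sequences.
a = np.array(rows_as_lists, dtype=int)
b = np.array(rows_as_tuples, dtype=int)
print(a.shape, b.shape)        # (3, 1) (3, 1)
print(np.array_equal(a, b))    # True

# Structured dtype: each tuple is read as one record.
dt = np.dtype([('y', int)])
c = np.array(rows_as_tuples, dtype=dt)
print(c.shape)                 # (3,)
```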


Consider a more complex structured dtype:

In [389]: dt1=np.dtype([('y',int,(2,))])
In [390]: np.ones((3,), dt1)
Out[390]: 
array([([1, 1],), ([1, 1],), ([1, 1],)], 
      dtype=[('y', '<i4', (2,))])
In [391]: np.array([([1,2],),([3,4],)])
Out[391]: 
array([[[1, 2]],

       [[3, 4]]])
In [392]: np.array([([1,2],),([3,4],)], dtype=dt1)
Out[392]: 
array([([1, 2],), ([3, 4],)], 
      dtype=[('y', '<i4', (2,))])

The display (and input) has lists within tuples within a list. And that's just the start:

In [393]: dt1=np.dtype([('x',dt,(2,))])
In [394]: dt1
Out[394]: dtype([('x', [('y', '<i4')], (2,))])
In [395]: np.ones((2,),dt1)
Out[395]: 
array([([(1,), (1,)],), ([(1,), (1,)],)], 
      dtype=[('x', [('y', '<i4')], (2,))])
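One way to see how the layers stack up is to index by field name and look at the shapes. The sketch below rebuilds the same nested dtype and fills it through field views rather than through a nested literal. The fill values are made up for illustration.

```python
import numpy as np

dt = np.dtype([('y', int)])
dt1 = np.dtype([('x', dt, (2,))])   # field 'x' holds two records of type dt

arr = np.zeros((2,), dtype=dt1)

# Each field lookup peels off one layer of the nesting.
print(arr.shape)             # (2,)   two outer records
print(arr['x'].shape)        # (2, 2) subarray dimension is appended
print(arr['x']['y'].shape)   # (2, 2) plain integer view of the same data

# Assigning through the field views writes into the original array.
arr['x']['y'] = [[1, 2], [3, 4]]
print(arr[1]['x']['y'])      # [3 4]
```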
