Axes labels within numpy arrays

numpy arrays are the abstract objects that you can use to build labeled tables and plots. pandas pushes the table and data series angle, matplotlib the plotting angle. For large-scale data storage, such as the output generated by supercomputer models, there are systems like netCDF and HDF5.

You might want to look at how HDF5 handles dimension scales, and how h5py gives you access to them in numpy.

http://docs.h5py.org/en/latest/high/dims.html

Datasets are multidimensional arrays. HDF5 provides support for labeling the dimensions and associating one or more “dimension scales” with each dimension. A dimension scale is simply another HDF5 dataset.
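As a rough sketch of what that looks like through h5py (the file name, dataset names and labels here are made up for illustration, and the file lives only in memory):

```python
import numpy as np
import h5py

# In-memory HDF5 file: driver="core" with backing_store=False writes nothing to disk.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    data = f.create_dataset("data", data=np.ones((4, 3)))
    x = f.create_dataset("x", data=np.linspace(0.0, 1.0, 3))

    # Label the two dimensions of the dataset.
    data.dims[0].label = "y"
    data.dims[1].label = "x"

    # Turn the "x" dataset into a dimension scale and attach it to axis 1.
    x.make_scale("x coordinate")
    data.dims[1].attach_scale(x)

    # Read the labels and the attached scale back as plain Python / numpy objects.
    labels = [d.label for d in data.dims]
    scale_values = data.dims[1][0][:]
```

Note that `make_scale` assumes a reasonably recent h5py; the labels and scales are stored in the file, but once you read the values into numpy you are back to a plain unlabeled array.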

Creating an array from axes is a common numpy task. np.arange and np.linspace create 1d arrays; np.meshgrid, np.mgrid and np.ogrid create 2d (or higher-dimensional) arrays, which in turn are used to calculate values on a grid. Note that meshgrid lets you choose indexing='ij' or indexing='xy', reflecting two conventions: matrix rows/columns versus a plot's horizontal/vertical axes.

 X, Y = np.meshgrid(x,y)
 z = my_function(X,Y)

but plotting functions can take various forms of input:

 plot(x, y, z)   # 2 1d arrays and a 2d
 scatter(X,Y,Z)  # 3 2d arrays
 scatter(XYZ)    # 1 Nx3 array

So while there is a connection between the generating arrays and the dependent one, it is a higher level of organization, one that your code has to maintain, not something that numpy does for you.
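To make that concrete, here is a minimal sketch of the two meshgrid conventions and of the bookkeeping your own code ends up doing. The function `my_function` and the `grid` dictionary are stand-ins for illustration, not anything numpy provides:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 4)   # 4 points along the horizontal axis
y = np.linspace(0.0, 2.0, 3)   # 3 points along the vertical axis

# 'xy' (the default): result shape is (len(y), len(x)), as plotting code expects.
X, Y = np.meshgrid(x, y, indexing="xy")

# 'ij': result shape is (len(x), len(y)), matching row/column matrix indexing.
Xi, Yi = np.meshgrid(x, y, indexing="ij")


def my_function(a, b):
    # Hypothetical function evaluated on the grid.
    return a + 10 * b


z = my_function(X, Y)

# numpy does not record that z was computed from x and y;
# keeping them together is up to your code, e.g. in a dict.
grid = {"x": x, "y": y, "z": z}
```

Nothing ties `grid["z"]` to its axes except the convention you chose, which is exactly the gap that the labeled-array libraries fill.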

A comment mentioned structured arrays. They can replace the columns of a 2d array with named fields (and, by extension, do the same in higher dimensions), but they are most useful when working with heterogeneous data loaded from CSV files. The fields are more like the columns of an SQL table than the y coordinate of a plot.
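For reference, a small structured-array sketch (the field names and records are invented for illustration):

```python
import numpy as np

# Each record has three named fields with different types.
dt = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f8")])
people = np.array([("Ann", 31, 58.5), ("Bob", 45, 80.0)], dtype=dt)

ages = people["age"]                            # select a "column" by field name
first = people[0]                               # a single record
heavy = people[people["weight"] > 60]["name"]   # filter rows, then pick a field
```

The names label fields within each record; they do not label positions along an axis of the array.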


What you may be looking for is xarray.


From its documentation:

xarray: N-D labeled arrays and datasets in Python

xarray (formerly xray) is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, which allows for a more intuitive, more concise, and less error-prone developer experience. The package includes a large and growing library of domain-agnostic functions for advanced analytics and visualization with these data structures.

Xarray was inspired by and borrows heavily from pandas, the popular data analysis package focused on labelled tabular data. It is particularly tailored to working with netCDF files, which were the source of xarray’s data model, and integrates tightly with dask for parallel computing.
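A minimal sketch of what labeled dimensions and coordinates look like in practice (the dimension names, city names and values are made up for illustration):

```python
import numpy as np
import xarray as xr

# A 2 x 3 array with named dimensions and coordinate labels on both axes.
temps = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("city", "day"),
    coords={"city": ["Oslo", "Rome"], "day": [1, 2, 3]},
    name="temperature",
)

# Select by label rather than by integer position.
rome_day2 = temps.sel(city="Rome", day=2)

# Reduce over a dimension by name instead of by axis number.
mean_per_city = temps.mean(dim="day")
```

The underlying data is still a numpy array (available as `temps.values`), with the labels carried alongside it.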


Here are the two options I could see working:

  1. Some insane 5D Numpy array with loads of repeating elements and sub elements. (which will absolutely not help at all for one of the primary problems you are trying to solve, namely ease of indexing.)
  2. Or Multi-Indexing and Pivot-Tables from Pandas

Pandas Docs - General introductory documentation for Pandas, if you're new to the library. The links below go to the hierarchical-indexing and pivot-table sections.

Pandas Docs - Multi/Hierarchical Indexing

Pandas Docs - Pivot Tables, Stacking and Unstacking

An article with a worked example using LoTR script data

Without some example data to show exactly what you're working with, I can only copy and paste example code from these links.

True, this is Pandas, which you said wouldn't meet your needs. But what you pushed back against was the limitation of having labels on only one axis. Hierarchical indexing is Pandas' answer to this problem, and pivot tables give you easy (if initially obscure) methods for reshaping your data into the arrangement you need for a given purpose.

Accessing elements and subgroups of the data is also very easy, which was one of your main requirements.

As far as I am aware, it also maintains the high numerical performance typical of Pandas.
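Since there is no example data to go on, here is a small sketch with invented labels and values, just to show labels on both axes with hierarchical indexing:

```python
import numpy as np
import pandas as pd

# Two-level labels on the rows and on the columns (all names are illustrative).
rows = pd.MultiIndex.from_product(
    [["2019", "2020"], ["Q1", "Q2"]], names=["year", "quarter"]
)
cols = pd.MultiIndex.from_product(
    [["north", "south"], ["sales", "costs"]], names=["region", "metric"]
)
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# A single element, addressed by labels on both axes.
one_cell = df.loc[("2020", "Q1"), ("south", "sales")]

# A subgroup: every quarter of one year.
year_2019 = df.loc["2019"]

# A cross-section: only the "sales" columns, across all regions.
sales_only = df.xs("sales", axis=1, level="metric")

# Reshape: move the "region" column level into the row index.
long_form = df.stack("region")
```

Whether this layout fits depends on your real data, but it shows the kind of indexing and reshaping the linked sections describe.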
