Saving in a file an array or DataFrame together with other information

There are many options. I will discuss only HDF5, because I have experience using this format.

Advantages: Portable (can be read outside of Python), native compression, out-of-memory capabilities, metadata support.

Disadvantages: Reliance on a single low-level C API, everything lives in a single file so corruption can affect all of your data, and deleting data does not shrink the file automatically.

In my experience, for performance and portability, it is better to avoid PyTables / HDFStore for storing numeric data. You can instead use the intuitive interface provided by h5py.

Store an array

import h5py, numpy as np

arr = np.random.randint(0, 10, (1000, 1000))

# libver='latest' can help performance, but older HDF5 versions may not be able to read the file
f = h5py.File('file.h5', 'w', libver='latest')

dset = f.create_dataset('array', shape=(1000, 1000), data=arr, chunks=(100, 100),
                        compression='gzip', compression_opts=9)
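As a minimal sketch of the same write, here it is with a `with` block so the file is flushed and closed even if something fails, followed by a read-back to check the settings. The temporary path is only a placeholder for this example.

```python
import os
import tempfile

import h5py
import numpy as np

arr = np.random.randint(0, 10, (1000, 1000))
path = os.path.join(tempfile.mkdtemp(), 'file.h5')  # placeholder location

# The with-block closes the file on exit, including on exceptions.
with h5py.File(path, 'w') as f:
    f.create_dataset('array', data=arr, chunks=(100, 100),
                     compression='gzip', compression_opts=9)

# Reopen read-only and inspect how the dataset was stored.
with h5py.File(path, 'r') as f:
    dset = f['array']
    info = (dset.shape, dset.chunks, dset.compression, dset.compression_opts)
    roundtrip = dset[()]  # load the whole dataset into a NumPy array
```

Passing `data=arr` is enough for h5py to infer the shape and dtype, so the explicit `shape` argument is optional here.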

Compression & chunking

There are many compression choices, e.g. blosc and lzf are good choices for compression and decompression performance respectively. Note that gzip is native to HDF5; other compression filters may not ship by default with your HDF5 installation, so anyone reading the file needs the same filter available.
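To make the filter choice concrete, here is a small sketch that stores the same array twice, once with gzip and once with lzf, and checks which filter each dataset reports. It assumes a standard h5py install, where the lzf filter is bundled with h5py itself; the dataset names and temporary path are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

arr = np.random.randint(0, 10, (1000, 1000))
path = os.path.join(tempfile.mkdtemp(), 'filters.h5')  # placeholder location

with h5py.File(path, 'w') as f:
    # gzip: available in every HDF5 build; levels run from 0 to 9
    f.create_dataset('gzip', data=arr, chunks=True,
                     compression='gzip', compression_opts=4)
    # lzf: bundled with h5py, takes no options
    f.create_dataset('lzf', data=arr, chunks=True, compression='lzf')

with h5py.File(path, 'r') as f:
    filters = {name: f[name].compression for name in f}
    same = np.array_equal(f['gzip'][()], f['lzf'][()])
```

Both filters are lossless, so the two datasets read back identically; only file size and speed differ.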

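Chunking is another option: the dataset is stored in fixed-size blocks, and when the chunk shape matches how you read data out-of-memory (for example, whole rows versus small tiles), it can significantly improve performance.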
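As an illustration of aligning chunks with the read pattern, the sketch below assumes you mostly read blocks of complete rows. The chunk shape `(10, 1000)` and the dataset names are choices made for this example, not recommendations.

```python
import os
import tempfile

import h5py
import numpy as np

data = np.arange(1_000_000, dtype='int32').reshape(1000, 1000)
path = os.path.join(tempfile.mkdtemp(), 'chunks.h5')  # placeholder location

with h5py.File(path, 'w') as f:
    # Each chunk holds 10 complete rows, matching row-wise reads.
    f.create_dataset('by_rows', data=data, chunks=(10, 1000),
                     compression='gzip')
    # chunks=True lets h5py pick a chunk shape automatically.
    f.create_dataset('auto', data=data, chunks=True, compression='gzip')

with h5py.File(path, 'r') as f:
    row_chunks = f['by_rows'].chunks
    auto_chunks = f['auto'].chunks
    # Rows 20..29 sit in exactly one chunk, so only that chunk is
    # read and decompressed.
    rows = f['by_rows'][20:30, :]
```

If you read columns instead, a column-oriented chunk shape would be the analogous choice; a mismatch means every read touches many chunks.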

Add some attributes

dset.attrs['Description'] = 'Some text snippet'
dset.attrs['RowIndexArray'] = np.arange(1000)
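Reading the attributes back works the same way, since `attrs` behaves like a dictionary. A minimal sketch, using a small placeholder dataset and a temporary path:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'attrs.h5')  # placeholder location

with h5py.File(path, 'w') as f:
    dset = f.create_dataset('array', data=np.zeros((1000, 10)))
    dset.attrs['Description'] = 'Some text snippet'
    dset.attrs['RowIndexArray'] = np.arange(1000)

with h5py.File(path, 'r') as f:
    # Copy all attributes into a plain dict before the file closes.
    attrs = dict(f['array'].attrs)

description = attrs['Description']
row_index = attrs['RowIndexArray']
```

Attributes are intended for small pieces of metadata; larger arrays are usually better stored as datasets of their own.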

Store a dictionary

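d = {'my_key': np.arange(10)}  # example dict; any mapping of keys to array-like values works

for k, v in d.items():
    f.create_dataset('dictgroup/'+str(k), data=v)  # intermediate groups are created automatically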
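For completeness, here is a sketch of the round trip: writing a dictionary of arrays into a group and rebuilding a dictionary from it. The dictionary contents are invented for the example. Note that keys come back as strings, because HDF5 object names are strings.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical example data, including one non-string key.
d = {'a': np.arange(5), 'b': np.linspace(0.0, 1.0, 3), 42: np.ones((2, 2))}
path = os.path.join(tempfile.mkdtemp(), 'dict.h5')  # placeholder location

with h5py.File(path, 'w') as f:
    for k, v in d.items():
        f.create_dataset('dictgroup/' + str(k), data=v)

with h5py.File(path, 'r') as f:
    # ds[()] reads each dataset fully into memory.
    restored = {name: ds[()] for name, ds in f['dictgroup'].items()}
```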

Out-of-memory access

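dictionary = f['dictgroup']
res = dictionary['my_key']  # a Dataset handle; data is read only when you slice it, e.g. res[:10] or res[()]

f.close()  # close the file once you are done with it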
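The point of the snippet above is that indexing the file gives you a handle rather than the data. A small sketch of what that looks like in practice, with a made-up dataset under the same `dictgroup/my_key` name:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'lazy.h5')  # placeholder location

with h5py.File(path, 'w') as f:
    f.create_dataset('dictgroup/my_key', data=np.arange(100_000))

with h5py.File(path, 'r') as f:
    res = f['dictgroup']['my_key']
    is_handle = isinstance(res, h5py.Dataset)  # not a NumPy array yet
    total = res.shape[0]      # metadata is available without reading the data
    head = res[:10]           # reads only the first 10 elements
    every_1000th = res[::1000]  # strided reads are supported too
```

The slices are ordinary NumPy arrays and stay valid after the file is closed, whereas the `res` handle does not.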

There is no substitute for reading the documentation for h5py, which exposes most of the HDF5 C API, but the above should show that the format offers a significant amount of flexibility.