Get year, month or day from numpy datetime64

This is how I do it.

import numpy as np

def dt2cal(dt):
    """
    Convert array of datetime64 to a calendar array of year, month, day, hour,
    minute, seconds, microsecond with these quantites indexed on the last axis.

    Parameters
    ----------
    dt : datetime64 array (...)
        numpy.ndarray of datetimes of arbitrary shape

    Returns
    -------
    cal : uint32 array (..., 7)
        calendar array with last axis representing year, month, day, hour,
        minute, second, microsecond
    """

    # allocate output 
    out = np.empty(dt.shape + (7,), dtype="u4")
    # decompose calendar floors
    Y, M, D, h, m, s = [dt.astype(f"M8[{x}]") for x in "YMDhms"]
    out[..., 0] = Y + 1970 # Gregorian Year
    out[..., 1] = (M - Y) + 1 # month
    out[..., 2] = (D - M) + 1 # dat
    out[..., 3] = (dt - D).astype("m8[h]") # hour
    out[..., 4] = (dt - h).astype("m8[m]") # minute
    out[..., 5] = (dt - m).astype("m8[s]") # second
    out[..., 6] = (dt - s).astype("m8[us]") # microsecond
    return out

It's vectorized across arbitrary input dimensions, it's fast, its intuitive, it works on numpy v1.15.4, it doesn't use pandas.

I really wish numpy supported this functionality, it's required all the time in application development. I always get super nervous when I have to roll my own stuff like this, I always feel like I'm missing an edge case.


There should be an easier way to do this, but, depending on what you're trying to do, the best route might be to convert to a regular Python datetime object:

datetime64Obj = np.datetime64('2002-07-04T02:55:41-0700')
print datetime64Obj.astype(object).year
# 2002
print datetime64Obj.astype(object).day
# 4

Based on comments below, this seems to only work in Python 2.7.x and Python 3.6+


As datetime is not stable in numpy I would use pandas for this:

In [52]: import pandas as pd

In [53]: dates = pd.DatetimeIndex(['2010-10-17', '2011-05-13', "2012-01-15"])

In [54]: dates.year
Out[54]: array([2010, 2011, 2012], dtype=int32)

Pandas uses numpy datetime internally, but seems to avoid the shortages, that numpy has up to now.


I find the following tricks give between 2x and 4x speed increase versus the pandas method described in this answer (i.e. pd.DatetimeIndex(dates).year etc.). The speed of [dt.year for dt in dates.astype(object)] I find to be similar to the pandas method. Also these tricks can be applied directly to ndarrays of any shape (2D, 3D etc.)

dates = np.arange(np.datetime64('2000-01-01'), np.datetime64('2010-01-01'))
years = dates.astype('datetime64[Y]').astype(int) + 1970
months = dates.astype('datetime64[M]').astype(int) % 12 + 1
days = dates - dates.astype('datetime64[M]') + 1