shape vs len for numpy array

From the source code, it looks like shape basically uses len(): https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py

@property
def shape(self) -> Tuple[int, int]:
    return len(self.index), len(self.columns)

def __len__(self) -> int:
    return len(self.index)

Calling shape will attempt to run both dim calcs. So maybe df.shape[0] + df.shape[1] is slower than len(df.index) + len(df.columns). Still, performance-wise, the difference should be negligible except for a giant giant 2D dataframe.

So in line with the previous answers, df.shape is good if you need both dimensions, for a single dimension, len() seems more appropriate conceptually.

Looking at property vs method answers, it all points to usability and readability of code. So again, in your case, I would say if you want information about the whole dataframe just to check or for example to pass the shape tuple to a function, use shape. For a single column, including index (i.e. the rows of a df), use len().

I wouldn't worry about performance here - any differences should only be very marginal.

I'd say the more pythonic alternative is probably the one which matches your needs more closely:

a.shape may contain more information than len(a) since it contains the size along all axes whereas len only returns the size along the first axis:

>>> a = np.array([[1,2,3,4], [1,2,3,4]])
>>> len(a)
2
>>> a.shape
(2L, 4L)

If you actually happen to work with one-dimensional arrays only, than I'd personally favour using len(a) in case you explicitly need the array's size.

shape vs len for numpy array

Tags:

Python

Numpy

Related

Recent Posts