Python libraries to calculate human readable filesize from bytes?

This isn't really hard to implement yourself:

suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
def humansize(nbytes):
    i = 0
    while nbytes >= 1024 and i < len(suffixes)-1:
        nbytes /= 1024.
        i += 1
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])

Examples:

>>> humansize(131)
'131 B'
>>> humansize(1049)
'1.02 KB'
>>> humansize(58812)
'57.43 KB'
>>> humansize(68819826)
'65.63 MB'
>>> humansize(39756861649)
'37.03 GB'
>>> humansize(18754875155724)
'17.06 TB'
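If you want the suffix to match the base being divided by, the same loop can take the base as a parameter. This is only a sketch, not part of the function above: the name humansize_base and the base argument are made up for illustration. It prints IEC suffixes (KiB, MiB, ...) for base 1024 and SI suffixes (kB, MB, ...) for base 1000.

```python
IEC_SUFFIXES = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']  # powers of 1024
SI_SUFFIXES = ['B', 'kB', 'MB', 'GB', 'TB', 'PB']        # powers of 1000

def humansize_base(nbytes, base=1024):
    # Hypothetical helper: same loop as humansize(), with a selectable base.
    if base not in (1000, 1024):
        raise ValueError('base must be 1000 or 1024')
    suffixes = IEC_SUFFIXES if base == 1024 else SI_SUFFIXES
    value = float(nbytes)
    i = 0
    # Divide until the value fits below one unit step or suffixes run out.
    while value >= base and i < len(suffixes) - 1:
        value /= base
        i += 1
    # Two decimals, then drop trailing zeros and a dangling decimal point.
    f = ('%.2f' % value).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])
```

With this, humansize_base(4026) gives '3.93 KiB' and humansize_base(4026, base=1000) gives '4.03 kB'.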

This is not necessarily faster than @nneonneo's solution; it's just a bit cooler, if I may say so :)

import math

suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']

def human_size(nbytes):
  human = nbytes
  rank = 0
  if nbytes != 0:
    rank = int(math.log2(nbytes) / 10)
    rank = min(rank, len(suffixes) - 1)
    human = nbytes / (1024.0 ** rank)
  f = ('%.2f' % human).rstrip('0').rstrip('.')
  return '%s %s' % (f, suffixes[rank])

This works because the integer part of the base-2 logarithm of a number is one less than its number of binary digits, and every factor of 1024 adds ten binary digits. (Dividing log10 by 3 instead would pick the rank in steps of 1000 while still dividing by powers of 1024, so 1000 bytes would come out as '0.98 KB'.) The rest is pretty much straightforward.
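For whole-number byte counts, the digit-counting idea also works in base 2 without any floating point. A small sketch, assuming nbytes is a non-negative int (the name rank_of is invented here): int.bit_length() returns the number of binary digits, and each step up the suffix list is worth ten of them.

```python
suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']

def rank_of(nbytes):
    # Hypothetical helper; assumes nbytes is a non-negative int.
    if nbytes == 0:
        return 0
    # bit_length() - 1 equals floor(log2(nbytes)); ten bits per factor of 1024.
    rank = (nbytes.bit_length() - 1) // 10
    # Clamp so very large values still map to the last suffix.
    return min(rank, len(suffixes) - 1)
```

The returned rank can be used as the index into suffixes and as the exponent of 1024, exactly as in human_size.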


Disclaimer: I wrote the package I'm about to describe

The bitmath module supports the functionality you've described. It also addresses the comment made by @filmore that, semantically, we should be using NIST unit prefixes (not SI), that is, MiB instead of MB. Rounding is now supported as well.

You originally asked about:

print size(4026, system=alternative)

In bitmath, the default prefix-unit system is NIST (1024-based), so, assuming you were referring to 4026 bytes, the equivalent solution in bitmath would look like either of the following:

In [1]: import bitmath

In [2]: print bitmath.Byte(bytes=4026).best_prefix()
3.931640625KiB

In [3]: human_prefix = bitmath.Byte(bytes=4026).best_prefix()

In [4]: print human_prefix.format("{value:.2f} {unit}")
3.93 KiB

I currently have an open task to allow the user to select a preferred prefix-unit system when using the best_prefix method.

Update (2014-07-16): The latest package has been uploaded to PyPI, and it includes several new features (the full feature list is on the GitHub page).


I used to reinvent the wheel every time I wrote a little script or ipynb or whatever. It got tiresome, so I wrote the datasize Python module. I'm posting this here because I just updated it, and wow, have the Python versions moved up!

It is a DataSize class that subclasses int, so arithmetic just works. However, arithmetic returns a plain int, because I use it with Pandas and some NumPy, and I didn't want to slow things down when there is Python<-->C++ translation for matrix math libraries.

You can construct a DataSize object from a string with either SI or NIST suffixes, in either bits or bytes, and even weird word lengths if you need to work with data for embedded tech that uses those. The DataSize object has an intuitive format() code syntax for human-readable representation. Internally, the value is just an integer count of 8-bit bytes.

For example:

>>> from datasize import DataSize
>>> 'My new {:GB} SSD really only stores {:.2GiB} of data.'.format(DataSize('750GB'),DataSize(DataSize('750GB') * 0.8))
'My new 750GB SSD really only stores 558.79GiB of data.'
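To illustrate the design described above (an int subclass whose arithmetic falls back to plain int, plus a format code for display), here is a minimal sketch. It is not the datasize implementation: the class name ByteCount, the UNITS table, and the tiny format grammar (an optional precision followed by a unit name) are all assumptions made for this example.

```python
# Assumed unit table for the sketch: bytes per unit, SI and NIST/IEC.
UNITS = {
    'B': 1,
    'kB': 1000, 'MB': 1000**2, 'GB': 1000**3, 'TB': 1000**4,
    'KiB': 1024, 'MiB': 1024**2, 'GiB': 1024**3, 'TiB': 1024**4,
}

class ByteCount(int):
    """Hypothetical int subclass holding a count of 8-bit bytes."""

    def __format__(self, spec):
        # Split a spec such as '.2GiB' into precision '.2' and unit 'GiB'.
        split = 0
        while split < len(spec) and spec[split] in '.0123456789':
            split += 1
        precision, unit = spec[:split], spec[split:]
        if unit not in UNITS:
            # Empty or unknown unit: fall back to ordinary int formatting.
            return int.__format__(int(self), spec)
        value = int(self) / UNITS[unit]
        if precision:
            return format(value, precision + 'f') + unit
        # No precision given: print whole numbers without a decimal point.
        text = str(int(value)) if value.is_integer() else str(value)
        return text + unit

size = ByteCount(750 * 1000**3)
usable = ByteCount(size * 0.8)   # size * 0.8 is a plain float; re-wrap it
print('{:GB} nominal, {:.2GiB} usable'.format(size, usable))
```

Because no arithmetic methods are overridden, size + 1 is an ordinary int, which is the trade-off the answer describes: results have to be wrapped again if you want the formatting back.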