how to determine the filename of content downloaded with HTTP in Python?

The rfc6266 library appears to do exactly what you need. It can parse raw headers, requests responses, and urllib2 responses. It's on PyPI.

Some examples:

>>> import rfc6266, requests
>>> rfc6266.parse_headers('''Attachment; filename=example.html''').filename_unsafe
'example.html'
>>> rfc6266.parse_headers('''INLINE; FILENAME= "an example.html"''').filename_unsafe
'an example.html'
>>> rfc6266.parse_headers(
    '''attachment; '''
    '''filename*= UTF-8''%e2%82%ac%20rates''').filename_unsafe
'€ rates'
>>> rfc6266.parse_headers(
    '''attachment; '''
    '''filename="EURO rates"; '''
    '''filename*=utf-8''%e2%82%ac%20rates''').filename_unsafe
'€ rates'
>>> r = requests.get('http://example.com/€ rates')
>>> rfc6266.parse_requests_response(r).filename_unsafe
'€ rates'

As a note, though: this library does not like nonstandard whitespace in the header.


if you don't really need the result in utf-8

def getFilename(s):
  fname = re.findall("filename\*?=([^;]+)", s, flags=re.IGNORECASE)
  print fname[0].strip().strip('"')

but if utf-8 is a must

def getFilename(s):
    fname = re.findall("filename\*=([^;]+)", s, flags=re.IGNORECASE)
    if not fname:
        fname = re.findall("filename=([^;]+)", s, flags=re.IGNORECASE)
    if "utf-8''" in fname[0].lower():
        fname = re.sub("utf-8''", '', fname[0], flags=re.IGNORECASE)
        fname = urllib.unquote(fname).decode('utf8')
    else:
        fname = fname[0]
    # clean space and double quotes
    print fname.strip().strip('"')

# example
getFilename('Attachment; filename=example.html')
getFilename('INLINE; FILENAME= "an example.html"')

getFilename("attachment;filename*= UTF-8''%e2%82%ac%20rates")
getFilename("attachment; filename=\"EURO rates\";filename*=utf-8''%e2%82%ac%20rates")

getFilename("attachment;filename=\"_____ _____ ___ __ ____ _____ Hekayt Bent.2017.mp3\";filename*=UTF-8''%D8%A7%D8%BA%D9%86%D9%8A%D9%87%20%D8%AD%D9%83%D8%A7%D9%8A%D8%A9%20%D8%A8%D9%86%D8%AA%20%D9%84%D9%80%20%D9%85%D8%AD%D9%85%D8%AF%20%D8%B4%D8%AD%D8%A7%D8%AA%D8%A9%20Hekayt%20Bent.2017.mp3")

result

example.html
an example.html
€ rates
€ rates
اغنيه حكاية بنت لـ محمد شحاتة Hekayt Bent.2017.mp3