How to iterate over files in a directory in Python

This tutorial shows several ways to iterate over the files in a given directory and perform an action on each of them using Python.

1. Using os.listdir()

This method returns a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.

Example: print the paths of all files with a .jpg or .png extension in the C:\Users\admin directory.

import os

directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    # str.endswith() accepts a tuple of suffixes
    if filename.endswith((".jpg", ".png")):
        print(os.path.join(directory, filename))
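Two details of os.listdir() are easy to miss: it returns the names of subdirectories as well as files, and str.endswith() is case-sensitive. The sketch below is one possible way to handle both. The helper name list_images and the constant IMAGE_EXTENSIONS are made up for this illustration, and the demo runs against a temporary directory rather than C:\Users\admin so that it works on any machine.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".png")  # hypothetical constant for this sketch


def list_images(directory):
    """Return sorted full paths of regular files ending in IMAGE_EXTENSIONS."""
    matches = []
    for filename in os.listdir(directory):
        full_path = os.path.join(directory, filename)
        # Lower-case the name so "PHOTO.JPG" also matches, and skip
        # directories whose names happen to end in ".jpg" or ".png".
        if filename.lower().endswith(IMAGE_EXTENSIONS) and os.path.isfile(full_path):
            matches.append(full_path)
    # os.listdir() order is arbitrary, so sort for a stable result
    return sorted(matches)


# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "B.PNG", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "folder.png"))  # a directory, not an image
    found = [os.path.basename(p) for p in list_images(tmp)]

print(found)
```

The directory named folder.png and the file notes.txt are both left out of the result.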

2. Using os.scandir()

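Since Python 3.5, os.scandir() makes this easier: it yields os.DirEntry objects that expose each entry's name, path, and file type, so no separate os.path call is needed. This example does the same thing as above but uses os.scandir() instead of os.listdir().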

import os

directory = r'C:\Users\admin'
for entry in os.scandir(directory):
    # entry.name is the bare filename; entry.path includes the directory
    if entry.is_file() and entry.name.endswith((".jpg", ".png")):
        print(entry.path)
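As a rough sketch of what else a DirEntry offers, the variant below wraps os.scandir() in a with statement (supported since Python 3.6) so the underlying iterator is closed promptly, and reads each file's size through entry.stat(). The function name scan_images and its extensions parameter are assumptions made for this example, and the demo uses a temporary directory instead of C:\Users\admin.

```python
import os
import tempfile


def scan_images(directory, extensions=(".jpg", ".png")):
    """Yield (name, size_in_bytes) for matching files directly under directory."""
    # The with statement closes the scandir iterator when the loop ends
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.lower().endswith(extensions):
                yield entry.name, entry.stat().st_size


# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "photo.jpg"), "wb") as f:
        f.write(b"12345")  # 5 bytes
    with open(os.path.join(tmp, "readme.txt"), "wb") as f:
        f.write(b"x")
    results = sorted(scan_images(tmp))

print(results)
```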

Both the os.listdir() and os.scandir() approaches only list the files and directories immediately under the given directory. To list files and folders recursively, consider one of the methods below.

3. Using os.walk()

os.walk() visits the given directory and every subdirectory beneath it, yielding the directory path, its subdirectory names, and its file names at each step. The example below is the same as above, except that it recursively prints all images under the C:\Users\admin directory.

import os

for subdir, dirs, files in os.walk(r'C:\Users\admin'):
    for filename in files:
        # os.path.join inserts the correct separator for the platform
        filepath = os.path.join(subdir, filename)

        if filepath.endswith((".jpg", ".png")):
            print(filepath)
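When walking a large tree, it can help to skip some subdirectories entirely. In its default top-down mode, os.walk() lets the caller edit the dirs list in place to stop it from descending into the removed names. The sketch below illustrates that idea. The function name walk_images and the default skip_dirs values are hypothetical, and the demo builds a small temporary tree instead of reading C:\Users\admin.

```python
import os
import tempfile


def walk_images(root, skip_dirs=(".git", "__pycache__")):
    """Yield paths of .jpg/.png files under root, ignoring skip_dirs."""
    for subdir, dirs, files in os.walk(root):
        # Assigning to dirs[:] edits the list in place, which tells
        # os.walk() not to descend into the directories removed here.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for filename in files:
            if filename.lower().endswith((".jpg", ".png")):
                yield os.path.join(subdir, filename)


# Demo on a throwaway tree: a.png, sub/b.jpg and .git/c.png
with tempfile.TemporaryDirectory() as tmp:
    for rel_dir in ("sub", ".git"):
        os.mkdir(os.path.join(tmp, rel_dir))
    for rel in ("a.png", os.path.join("sub", "b.jpg"), os.path.join(".git", "c.png")):
        open(os.path.join(tmp, rel), "w").close()
    found = sorted(os.path.relpath(p, tmp) for p in walk_images(tmp))

print(found)
```

The image inside .git is never reported because that directory is pruned before os.walk() enters it.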

4. Using glob module

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.

Let's consider an example that lists all png and pdf files in the C:\Users\admin directory.

import glob

# Print png images in folder C:\Users\admin\
for filepath in glob.iglob(r'C:\Users\admin\*.png'):
    print(filepath)

# Print pdf files in folder C:\Users\admin\
for filepath in glob.iglob(r'C:\Users\admin\*.pdf'):
    print(filepath)

By default, glob.iglob only lists files immediately under the given directory. To recursively list all files in nested folders, put the ** wildcard in the pattern and set the recursive parameter to True. Both are required: with recursive=True, ** matches zero or more directory levels, but a pattern without ** is not searched recursively.

import glob

# Recursively print png images in folder C:\Users\admin\
for filepath in glob.iglob(r'C:\Users\admin\**\*.png', recursive=True):
    print(filepath)

# Recursively print pdf files in folder C:\Users\admin\
for filepath in glob.iglob(r'C:\Users\admin\**\*.pdf', recursive=True):
    print(filepath)

You can use either glob.iglob or glob.glob. The difference is that glob.iglob returns an iterator that yields the matching paths one at a time, while glob.glob returns them all in a list.
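The following sketch compares the two functions on a small temporary tree (used here in place of C:\Users\admin so the code runs anywhere). It also shows what happens to a ** pattern when recursive=True is left out. The variable names are chosen for this illustration only.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build tmp/top.png and tmp/nested/deep.png
    os.mkdir(os.path.join(tmp, "nested"))
    for rel in ("top.png", os.path.join("nested", "deep.png")):
        open(os.path.join(tmp, rel), "w").close()

    # glob.escape() protects any special characters in the directory name
    pattern = os.path.join(glob.escape(tmp), "**", "*.png")

    as_list = glob.glob(pattern, recursive=True)       # a fully built list
    as_iterator = glob.iglob(pattern, recursive=True)  # a lazy iterator

    list_names = sorted(os.path.basename(p) for p in as_list)
    iter_names = sorted(os.path.basename(p) for p in as_iterator)

    # Without recursive=True, "**" behaves like a single "*" and
    # matches exactly one directory level, so top.png is missed.
    flat_names = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(type(as_list).__name__, list_names)
print(iter_names)
print(flat_names)
```

Both calls find the same files. The iterator form may be preferable when a directory tree is large and the paths are processed one at a time.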

5. Iterate recursively using Path class from pathlib module

The code below does the same as the example above, recursively listing and printing the png images in a folder, but it uses pathlib.Path.

from pathlib import Path

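# Use a raw string: in a normal string literal, '\U' starts a
# Unicode escape and causes a syntax error
paths = Path(r'C:\Users\admin').glob('**/*.png')
for path in paths:
    # path is a Path object, not a string
    path_in_str = str(path)
    # Do something with the path
    print(path_in_str)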
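A glob pattern matches one extension at a time. One possible way to match several extensions with pathlib is to walk everything with Path.rglob("*") and filter on each path's suffix attribute, as in the sketch below. The function name find_by_suffix and its suffixes parameter are assumptions for this example, and the demo again uses a temporary directory rather than C:\Users\admin.

```python
import tempfile
from pathlib import Path


def find_by_suffix(root, suffixes=(".png", ".jpg")):
    """Yield Path objects under root whose suffix is in suffixes."""
    # rglob("*") is equivalent to glob("**/*"): it visits root recursively
    for path in Path(root).rglob("*"):
        # path.suffix is the last extension, including the leading dot
        if path.is_file() and path.suffix.lower() in suffixes:
            yield path


# Demo on a throwaway tree: a.png, sub/b.JPG and sub/c.txt
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for rel in ("a.png", "sub/b.JPG", "sub/c.txt"):
        (root / rel).touch()
    found = sorted(p.relative_to(root).as_posix() for p in find_by_suffix(root))

print(found)
```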
