How do I search sub-folders and sub-sub-folders in Google Drive?

Sharing a Python solution to the excellent Alternative 3 by @pinoyyid, above, in case it's useful to anyone. I'm not a developer so it's probably hopelessly un-pythonic... but it works, only makes 2 API calls, and is pretty quick.

  1. Get a master list of all the folders in a drive.
  2. Test whether the folder-to-search is a parent (ie. it has subfolders).
  3. Iterate through subfolders of the folder-to-search testing whether they too are parents.
  4. Build a Google Drive file query with one '<folder-id>' in parents segment per subfolder found.

Interestingly, Google Drive seems to have a hard limit of 599 '<folder-id>' in parents segments per query, so if your folder-to-search has more subfolders than this, you need to chunk the list.

FOLDER_TO_SEARCH = '123456789'  # ID of folder to search
DRIVE_ID = '654321'  # ID of shared drive in which it lives
MAX_PARENTS = 500  # Limit set safely below Google max of 599 parents per query.


def get_all_folders_in_drive():
    """
    Return a dictionary of all the folder IDs in a drive mapped to their parent folder IDs (or to the
    drive itself if a top-level folder). That is, flatten the entire folder structure.
    """
    folders_in_drive_dict = {}
    page_token = None
    max_allowed_page_size = 1000
    just_folders = "trashed = false and mimeType = 'application/vnd.google-apps.folder'"
    while True:
        results = drive_api_ref.files().list(
            pageSize=max_allowed_page_size,
            fields="nextPageToken, files(id, name, mimeType, parents)",
            includeItemsFromAllDrives=True, supportsAllDrives=True,
            corpora='drive',
            driveId=DRIVE_ID,
            pageToken=page_token,
            q=just_folders).execute()
        folders = results.get('files', [])
        page_token = results.get('nextPageToken', None)
        for folder in folders:
            folders_in_drive_dict[folder['id']] = folder['parents'][0]
        if page_token is None:
            break
    return folders_in_drive_dict


def get_subfolders_of_folder(folder_to_search, all_folders):
    """
    Yield subfolders of the folder-to-search, and then subsubfolders etc. Must be called by an iterator.
    :param all_folders: The dictionary returned by :meth:`get_all_folders_in-drive`.
    """
    temp_list = [k for k, v in all_folders.items() if v == folder_to_search]  # Get all subfolders
    for sub_folder in temp_list:  # For each subfolder...
        yield sub_folder  # Return it
        yield from get_subfolders_of_folder(sub_folder, all_folders)  # Get subsubfolders etc


def get_relevant_files(self, relevant_folders):
    """
    Get files under the folder-to-search and all its subfolders.
    """
    relevant_files = {}
    chunked_relevant_folders_list = [relevant_folders[i:i + MAX_PARENTS] for i in
                                     range(0, len(relevant_folders), MAX_PARENTS)]
    for folder_list in chunked_relevant_folders_list:
        query_term = ' in parents or '.join('"{0}"'.format(f) for f in folder_list) + ' in parents'
        relevant_files.update(get_all_files_in_folders(query_term))
    return relevant_files


def get_all_files_in_folders(self, parent_folders):
    """
    Return a dictionary of file IDs mapped to file names for the specified parent folders.
    """
    files_under_folder_dict = {}
    page_token = None
    max_allowed_page_size = 1000
    just_files = f"mimeType != 'application/vnd.google-apps.folder' and trashed = false and ({parent_folders})"
    while True:
        results = drive_api_ref.files().list(
            pageSize=max_allowed_page_size,
            fields="nextPageToken, files(id, name, mimeType, parents)",
            includeItemsFromAllDrives=True, supportsAllDrives=True,
            corpora='drive',
            driveId=DRIVE_ID,
            pageToken=page_token,
            q=just_files).execute()
        files = results.get('files', [])
        page_token = results.get('nextPageToken', None)
        for file in files:
            files_under_folder_dict[file['id']] = file['name']
        if page_token is None:
            break
    return files_under_folder_dict


if __name__ == "__main__":
    all_folders_dict = get_all_folders_in_drive()  # Flatten folder structure
    relevant_folders_list = [FOLDER_TO_SEARCH]  # Start with the folder-to-archive
    for folder in get_subfolders_of_folder(FOLDER_TO_SEARCH, all_folders_dict):
        relevant_folders_list.append(folder)  # Recursively search for subfolders
    relevant_files_dict = get_relevant_files(relevant_folders_list)  # Get the files

Sharing a javascript solution using recursion to build an array of folders, starting with the first level folder and moving down the hierarchy. This array is composed by recursively cycling through the parent Id's of the file in question.

The extract below makes 3 separate queries to the gapi:

  1. get the root folder id
  2. get a list of folders
  3. get a list of files

the code iterates through the list of files, then creating an array of folder names.

const { google } = require('googleapis')
const gOAuth =  require('./googleOAuth')

// resolve the promises for getting G files and folders
const getGFilePaths = async () => {
  //update to use Promise.All()
  let gRootFolder = await getGfiles().then(result => {return result[2][0]['parents'][0]})
  let gFolders = await getGfiles().then(result => {return result[1]})
  let gFiles = await getGfiles().then(result => {return result[0]})
  // create the path files and create a new key with array of folder paths, returning an array of files with their folder paths
  return pathFiles = gFiles
                      .filter((file) => {return file.hasOwnProperty('parents')})
                      .map((file) => ({...file, path: makePathArray(gFolders, file['parents'][0], gRootFolder)}))
}

// recursive function to build an array of the file paths top -> bottom
let makePathArray = (folders, fileParent, rootFolder) => {
  if(fileParent === rootFolder){return []}
  else {
    let filteredFolders = folders.filter((f) => {return f.id === fileParent})
    if(filteredFolders.length >= 1 && filteredFolders[0].hasOwnProperty('parents')) {
      let path = makePathArray(folders, filteredFolders[0]['parents'][0])
      path.push(filteredFolders[0]['name'])
      return path
    }
    else {return []}
  }
}

// get meta-data list of files from gDrive, with query parameters
const getGfiles = () => {
  try {
    let getRootFolder = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
    fields: 'files(name, parents)', 
    q: "'root' in parents and trashed = false and mimeType = 'application/vnd.google-apps.folder'"})
  
    let getFolders = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
    fields: 'files(id,name,parents), nextPageToken', 
    q: "trashed = false and mimeType = 'application/vnd.google-apps.folder'"})
  
    let getFiles = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
    fields: 'files(id,name,parents, mimeType, fullFileExtension, webContentLink, exportLinks, modifiedTime), nextPageToken', 
    q: "trashed = false and mimeType != 'application/vnd.google-apps.folder'"})
  
    return Promise.all([getFiles, getFolders, getRootFolder])
  }
  catch(error) {
    return `Error in retriving a file reponse from Google Drive: ${error}`
  }
}

// make call out gDrive to get meta-data files. Code adds all files in a single array which are returned in pages
const getGdriveList = async (params) => {
  const gKeys = await gOAuth.get()
  const drive = google.drive({version: 'v3', auth: gKeys})
  let list = []
  let nextPgToken
  do {
    let res = await drive.files.list(params)
    list.push(...res.data.files)
    nextPgToken = res.data.nextPageToken
    params.pageToken = nextPgToken
  }
  while (nextPgToken)
  return list
}

EDIT: April 2020 Google have announced that multi-parent files is being disabled from September 2020. This alters the narrative below and means option 2 is no longer an option. It might be possible to implement Option 2 using shortcuts. I will update this answer further as I test the new restrictions/features We are all used to the idea of folders (aka directories) in Windows/nix etc. In the real world, a folder is a container, into which documents are placed. It is also possible to place smaller folders inside bigger folders. Thus the big folder can be thought of as containing all of the documents inside its smaller children folders.

However, in Google Drive, a Folder is NOT a container, so much so that in the first release of Google Drive, they weren't even called Folders, they were called Collections. A Folder is simply a File with (a) no contents, and (b) a special mime-type (application/vnd.google-apps.folder). The way Folders are used is exactly the same way that tags (aka labels) are used. The best way to understand this is to consider GMail. If you look at the top of an open mail item, you see two icons. A folder with the tooltip "Move to" and a label with the tooltip "Labels". Click on either of these and the same dialogue box appears and is all about labels. Your labels are listed down the left hand side, in a tree display that looks a lot like folders. Importantly, a mail item can have multiple labels, or you could say, a mail item can be in multiple folders. Google Drive's Folders work in exactly the same way that GMail labels work.

Having established that a Folder is simply a label, there is nothing stopping you from organising your labels in a hierarchy that resembles a folder tree, in fact this is the most common way of doing so.

It should now be clear that a file (let's call it MyFile) in folderA2b is NOT a child or grandchild of folderA. It is simply a file with a label (confusingly called a Parent) of "folderA2b". OK, so how DO I get all the files "under" folderA?

Alternative 1. Recursion

The temptation would be to list the children of folderA, for any children that are folders, recursively list their children, rinse, repeat. In a very small number of cases, this might be the best approach, but for most, it has the following problems:-

  • It is woefully time consuming to do a server round trip for each sub folder. This does of course depend on the size of your tree, so if you can guarantee that your tree size is small, it could be OK.

Alternative 2. The common parent

This works best if all of the files are being created by your app (ie. you are using drive.file scope). As well as the folder hierarchy above, create a dummy parent folder called say "MyAppCommonParent". As you create each file as a child of its particular Folder, you also make it a child of MyAppCommonParent. This becomes a lot more intuitive if you remember to think of Folders as labels. You can now easily retrieve all descdendants by simply querying MyAppCommonParent in parents.

Alternative 3. Folders first

Start by getting all folders. Yep, all of them. Once you have them all in memory, you can crawl through their parents properties and build your tree structure and list of Folder IDs. You can then do a single files.list?q='folderA' in parents or 'folderA1' in parents or 'folderA1a' in parents.... Using this technique you can get everything in two http calls.

The pseudo code for option 3 is a bit like...

// get all folders from Drive files.list?q=mimetype=application/vnd.google-apps.folder and trashed=false&fields=parents,name // store in a Map, keyed by ID // find the entry for folderA and note the ID // find any entries where the ID is in the parents, note their IDs // for each such entry, repeat recursively // use all of the IDs noted above to construct a ... // files.list?q='folderA-ID' in parents or 'folderA1-ID' in parents or 'folderA1a-ID' in parents...

Alternative 2 is the most effificient, but only works if you have control of file creation. Alternative 3 is generally more efficient than Alternative 1, but there may be certain small tree sizes where 1 is best.