Finding the oldest commit in a GitHub repository via the API

Using the GraphQL API, there is a workaround for getting the oldest commit (initial commit) in a specific branch.

First, get the latest commit and return the totalCount and the endCursor:

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

It returns something like this for totalCount and the pageInfo object:

"totalCount": 931886,
"pageInfo": {
  "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 0"
}

I haven't found any documentation on the cursor string format (b961f8dc8976c091180839f4483d67b7c2ca2578 0), but I've tested it with some other repositories with more than 1000 commits, and it always seems to be formatted like:

<static hash> <incremented_number>

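So you would just subtract 2 from totalCount (if totalCount is > 1) to get the oldest commit (or initial commit, if you prefer). The index appears to be 0-based (the latest commit has index 0) and the query returns the commit after the cursor, so passing the cursor for index totalCount - 2 returns the commit at index totalCount - 1, the last one in the history: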

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1, after: "b961f8dc8976c091180839f4483d67b7c2ca2578 931884") {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

which gives the following output (the initial commit by Linus Torvalds):

{
  "data": {
    "repository": {
      "ref": {
        "target": {
          "history": {
            "nodes": [
              {
                "message": "Linux-2.6.12-rc2\n\nInitial git repository build. I'm not bothering with the full history,\neven though we have it. We can create a separate \"historical\" git\narchive of that later if we want to, and in the meantime it's about\n3.2GB when imported into git - space that would just make the early\ngit days unnecessarily complicated, when we don't have a lot of good\ninfrastructure for it.\n\nLet it rip!",
                "committedDate": "2005-04-16T22:20:36Z",
                "authoredDate": "2005-04-16T22:20:36Z",
                "oid": "1da177e4c3f41524e886b7f1b8a0c1fc7321cac2",
                "author": {
                  "email": "[email protected]",
                  "name": "Linus Torvalds"
                }
              }
            ],
            "totalCount": 931886,
            "pageInfo": {
              "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 931885"
            }
          }
        }
      }
    }
  }
}
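The cursor arithmetic described above can be isolated in a small helper. This is a sketch that relies on the same undocumented assumption as the answer: a cursor has the form "<static hash> <index>" with a 0-based index, and after: "<hash> k" returns the commit at index k + 1. The function name oldest_commit_cursor is hypothetical.

```python
from typing import Optional


def oldest_commit_cursor(end_cursor: str, total_count: int) -> Optional[str]:
    """Build the `after` cursor that should return the oldest commit.

    Assumes the undocumented format "<static hash> <index>", where <index>
    is the 0-based position of the last returned commit. after="<hash> k"
    returns the commit at position k + 1, so the oldest commit (position
    total_count - 1) needs k = total_count - 2.
    Returns None when there is a single commit (it is already the oldest).
    """
    if total_count <= 1:
        return None
    static_hash, _index = end_cursor.rsplit(" ", 1)
    return f"{static_hash} {total_count - 2}"


print(oldest_commit_cursor("b961f8dc8976c091180839f4483d67b7c2ca2578 0", 931886))
# b961f8dc8976c091180839f4483d67b7c2ca2578 931884
```

With the numbers from the first response, this reproduces the cursor used in the second query.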

A simple Python implementation that gets the first commit using this method:

import requests

token = "YOUR_TOKEN"

name = "linux"
owner = "torvalds"
branch = "master"

# The cursor is passed as a GraphQL variable ($cursor) instead of being
# interpolated into the query string; it is null on the first request.
query = """
query ($name: String!, $owner: String!, $branch: String!, $cursor: String) {
  repository(name: $name, owner: $owner) {
    ref(qualifiedName: $branch) {
      target {
        ... on Commit {
          history(first: 1, after: $cursor) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
"""

def get_history(cursor=None):
    """Return the history connection holding the one commit after `cursor`."""
    r = requests.post(
        "https://api.github.com/graphql",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "query": query,
            "variables": {
                "name": name,
                "owner": owner,
                "branch": branch,
                "cursor": cursor,
            },
        },
    )
    r.raise_for_status()
    return r.json()["data"]["repository"]["ref"]["target"]["history"]

# First request: no cursor, returns the latest commit and totalCount
history = get_history()
total_count = history["totalCount"]
if total_count > 1:
    # Keep the static hash and replace the index with totalCount - 2
    static_hash, _ = history["pageInfo"]["endCursor"].rsplit(" ", 1)
    history = get_history(f"{static_hash} {total_count - 2}")
print("got oldest commit (initial commit)")
print(history["nodes"][0])

You can find a JavaScript example in this post.


This can be done in as few as two requests if the statistics are already cached on GitHub's side, depending on your precision requirements.

First, check whether there are in fact commits before the creation time by doing a GET on /repos/:owner/:repo/commits with the until parameter set to the creation time (as suggested by VonC's answer) and the per_page parameter set to 1, so that at most one commit is returned.
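A minimal sketch of that check, assuming you already have the repository's creation time as an ISO 8601 string (for example from the created_at field of GET /repos/:owner/:repo). The function names and the sample timestamp are illustrative, and the HTTP call itself is left out.

```python
import json
from urllib.parse import urlencode


def commits_before_url(owner: str, repo: str, created_at: str) -> str:
    """URL for GET /repos/{owner}/{repo}/commits returning at most one
    commit made before `created_at` (ISO 8601, e.g. 2011-01-26T19:01:12Z)."""
    query = urlencode({"until": created_at, "per_page": 1})
    return f"https://api.github.com/repos/{owner}/{repo}/commits?{query}"


def has_commits_before(response_body: str) -> bool:
    """The commits endpoint returns a JSON array; a non-empty array means
    at least one commit predates the creation time."""
    return len(json.loads(response_body)) > 0


print(commits_before_url("octocat", "Hello-World", "2011-01-26T19:01:12Z"))
# https://api.github.com/repos/octocat/Hello-World/commits?until=2011-01-26T19%3A01%3A12Z&per_page=1
```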

If there are commits before the creation time, call the contributors statistics endpoint (/repos/:owner/:repo/stats/contributors). The response has a weeks list per contributor, where w is the start of the week as a Unix timestamp; the oldest w value there is the week of the oldest commit.
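Extracting that week from an already-parsed statistics response could look like the sketch below. It follows the answer's claim that the smallest w is the oldest commit's week; as a precaution it only considers weeks with at least one commit (c > 0). The sample data is made up to show the response shape.

```python
from typing import Optional


def oldest_week_start(stats: list) -> Optional[int]:
    """Smallest week start `w` (Unix timestamp) with at least one commit,
    across all contributors in a /stats/contributors response."""
    starts = [
        week["w"]
        for contributor in stats
        for week in contributor.get("weeks", [])
        if week.get("c", 0) > 0
    ]
    return min(starts) if starts else None


# Made-up response: one entry per contributor, each with a "weeks" list of
# {"w": week start, "a": additions, "d": deletions, "c": commits}
sample = [
    {"total": 3, "weeks": [{"w": 1113091200, "a": 10, "d": 0, "c": 1},
                           {"w": 1113696000, "a": 5, "d": 2, "c": 2}]},
    {"total": 1, "weeks": [{"w": 1112486400, "a": 0, "d": 0, "c": 0},
                           {"w": 1113696000, "a": 1, "d": 0, "c": 1}]},
]
print(oldest_week_start(sample))  # 1113091200
```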

If you need a precise timestamp, you can then use the commits listing endpoint again, with since set to the oldest week value and until set to 7 days after it.
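Converting a week start into since/until values is plain date arithmetic; a sketch, assuming the endpoint expects ISO 8601 timestamps in the YYYY-MM-DDTHH:MM:SSZ form. The function name week_window is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # format used for `since` / `until`


def week_window(week_start: int) -> Tuple[str, str]:
    """Return (since, until) strings covering the 7 days that start at
    `week_start`, a Unix timestamp such as a stats `w` value."""
    since = datetime.fromtimestamp(week_start, tz=timezone.utc)
    until = since + timedelta(days=7)
    return since.strftime(ISO_FORMAT), until.strftime(ISO_FORMAT)


print(week_window(1113091200))
# ('2005-04-10T00:00:00Z', '2005-04-17T00:00:00Z')
```

Since the commit list is returned newest first, the oldest commit in that window should be the last entry (on the last page, if the week has more commits than fit on one page).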

Note that the statistics endpoint may return a 202 status, indicating that the statistics are not yet available (GitHub is still computing them), in which case you need to retry after a few seconds.
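A retry loop for the 202 case might look like this sketch. The fetch callable is a hypothetical abstraction returning (status, parsed JSON), so the loop can be exercised without network access. github_get is one possible fetch built on the standard library; the attempt count and delay are arbitrary choices.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Tuple

Fetch = Callable[[], Tuple[int, Any]]


def github_get(url: str, token: str) -> Fetch:
    """Build a callable that GETs `url` and returns (status, parsed JSON)."""
    def fetch() -> Tuple[int, Any]:
        req = urllib.request.Request(url, headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        })
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            return resp.status, (json.loads(body) if body else None)
    return fetch


def fetch_stats_with_retry(fetch: Fetch, attempts: int = 5, delay: float = 3.0) -> Any:
    """Call `fetch` until it returns 200; a 202 means the statistics are
    still being computed, so wait `delay` seconds and try again."""
    for attempt in range(attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        if attempt < attempts - 1:
            time.sleep(delay)
    raise TimeoutError(f"statistics not ready after {attempts} attempts")
```

For example: fetch_stats_with_retry(github_get("https://api.github.com/repos/OWNER/REPO/stats/contributors", "YOUR_TOKEN")), where OWNER, REPO and the token are placeholders.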


One suggestion would be to list the commits of a repo (see the GitHub API v3 section), using the until parameter set to the repo's creation date (plus one day, for instance):

GET /repos/:owner/:repo/commits

That way, you would only list commits made up to around the time the repo was created, excluding all commits made after its creation.
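Computing that until value from the repository's created_at field could be sketched as follows, assuming created_at is an ISO 8601 UTC string like the one returned by GET /repos/:owner/:repo. The one-day margin mirrors the suggestion above; the function name and sample timestamp are illustrative.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def commits_until_creation_url(owner: str, repo: str, created_at: str,
                               extra_days: int = 1) -> str:
    """URL listing commits made before the repo's creation time plus
    `extra_days` (created_at as in the repository's created_at field)."""
    created = datetime.strptime(created_at, ISO_FORMAT)
    until = (created + timedelta(days=extra_days)).strftime(ISO_FORMAT)
    return (f"https://api.github.com/repos/{owner}/{repo}/commits?"
            + urlencode({"until": until}))


print(commits_until_creation_url("octocat", "Hello-World", "2011-01-26T19:01:12Z"))
# https://api.github.com/repos/octocat/Hello-World/commits?until=2011-01-27T19%3A01%3A12Z
```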