What is the `git restore` command and what is the difference between `git restore` and `git reset`?

I have presented git restore (which is still marked as "experimental") in "How to reset all files from working directory but not from staging area?", with the recent Git 2.23 (August 2019).

It helps separate git checkout into two commands:

  • one for files (git restore), which can cover git reset cases.
  • one for branches (git switch, as seen in "Confused by git checkout"), which deals only with branches, not files.

As the "reset, restore and revert" documentation states:

There are three commands with similar names: git reset, git restore and git revert.

  • git-revert is about making a new commit that reverts the changes made by other commits.
  • git-restore is about restoring files in the working tree from either the index or another commit.
    This command does not update your branch.
    The command can also be used to restore files in the index from another commit.
  • git-reset is about updating your branch, moving the tip in order to add or remove commits from the branch. This operation changes the commit history.
    git reset can also be used to restore the index, overlapping with git restore.

So:

To restore a file in the index to match the version in HEAD (this is the same as using git-reset)

git restore --staged hello.c

or you can restore both the index and the working tree (this is the same as using git-checkout)

git restore --source=HEAD --staged --worktree hello.c

or the short form which is more practical but less readable:

git restore -s@ -SW hello.c
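A minimal sketch of these two examples, run in a throwaway repository created with mktemp. It assumes a POSIX shell and Git 2.23 or later; the file name hello.c and its contents are purely illustrative. It stages a change, unstages it with --staged, then stages it again and restores both copies with the short form:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'v1\n' > hello.c
git add hello.c
git commit -q -m "v1"

# Change and stage the file, then unstage it: index <- HEAD, working tree untouched.
printf 'v2\n' > hello.c
git add hello.c
git restore --staged hello.c
unstaged_index=$(git show :hello.c)    # ":hello.c" names the index copy
unstaged_worktree=$(cat hello.c)

# Stage it again, then restore both index and working tree from HEAD (short form).
git add hello.c
git restore -s@ -SW hello.c
full_index=$(git show :hello.c)
full_worktree=$(cat hello.c)

echo "--staged: index=$unstaged_index worktree=$unstaged_worktree"
echo "-s@ -SW:  index=$full_index worktree=$full_worktree"
```

After --staged, the index should be back at v1 while the working tree keeps v2; after -s@ -SW, both should hold v1.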

With Git 2.25.1 (Feb. 2020), a bug has been corrected: "git restore --staged" did not correctly update the cache-tree structure, resulting in bogus trees being written afterwards.

See discussion.

See commit e701bab (08 Jan 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 09e393d, 22 Jan 2020)

restore: invalidate cache-tree when removing entries with --staged

Reported-by: Torsten Krah
Signed-off-by: Jeff King

When "git restore --staged " removes a path that's in the index, it marks the entry with CE_REMOVE, but we don't do anything to invalidate the cache-tree.
In the non-staged case, we end up in checkout_worktree(), which calls remove_marked_cache_entries(). That actually drops the entries from the index, as well as invalidating the cache-tree and untracked-cache.

But with --staged, we never call checkout_worktree(), and the CE_REMOVE entries remain. Interestingly, they are dropped when we write out the index, but that means the resulting index is inconsistent: its cache-tree will not match the actual entries, and running "git commit" immediately after will create the wrong tree.

We can solve this by calling remove_marked_cache_entries() ourselves before writing out the index. Note that we can't just hoist it out of checkout_worktree(); that function needs to iterate over the CE_REMOVE entries (to drop their matching worktree files) before removing them.

One curiosity about the test: without this patch, it actually triggers a BUG() when running git-restore:

BUG: cache-tree.c:810: new1 with flags 0x4420000 should not be in cache-tree

But in the original problem report, which used a similar recipe, git restore actually creates the bogus index (and the commit is created with the wrong tree). I'm not sure why the test here behaves differently than my out-of-suite reproduction, but what's here should catch either symptom (and the fix corrects both cases).
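A sketch of the scenario this commit message describes, loosely modeled on its recipe (it is not the actual test from Git's test suite; the file names new1 and new2 are illustrative, and a POSIX shell is assumed). The index is given a valid cache-tree that includes new1, new1 is unstaged with --staged, and a commit is made. On Git 2.25.1 or later, the resulting commit should contain new2 but not new1; the commit message reports that unfixed versions could instead hit the BUG() or record the wrong tree:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
printf 'base\n' > base.txt
git add base.txt
git commit -q -m "base"

# Commit two new files, then undo that commit with --soft so the index
# still holds both files along with a fully valid cache-tree.
: > new1
: > new2
git add new1 new2
git commit -q -m "both"
git reset -q --soft HEAD^

# Unstage new1: it is not in HEAD, so --staged removes it from the index.
git restore --staged new1
git commit -q -m "just new2"

# new2 must be in the new commit's tree; new1 must not be.
if git cat-file -e HEAD:new2 2>/dev/null; then has_new2=yes; else has_new2=no; fi
if git cat-file -e HEAD:new1 2>/dev/null; then has_new1=yes; else has_new1=no; fi
echo "new2 committed: $has_new2, new1 committed: $has_new1"
```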


With Git 2.27 (Q2 2020), "git restore --staged --worktree" now defaults to taking the contents from "HEAD", instead of erroring out.

See commit 088018e (05 May 2020) by Eric Sunshine (sunshineco).
(Merged by Junio C Hamano -- gitster -- in commit 4c2941a, 08 May 2020)

restore: default to HEAD when combining --staged and --worktree

Signed-off-by: Eric Sunshine
Reviewed-by: Taylor Blau

By default, files are restored from the index for --worktree, and from HEAD for --staged.

When --worktree and --staged are combined, --source must be specified to disambiguate the restore source, thus making it cumbersome to restore a file in both the worktree and the index.

(Due to an oversight, the --source requirement, though documented, is not actually enforced.)

However, HEAD is also a reasonable default for --worktree when combined with --staged, so make it the default anytime --staged is used (whether combined with --worktree or not).

So now, this works:

git restore --staged --worktree hello.c
git restore -SW hello.c
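A quick check of the new default, assuming Git 2.27 or later and a POSIX shell (hello.c and its contents are illustrative). The file is staged at one version and then modified again in the working tree; git restore -SW, with no --source, should bring both copies back to HEAD:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
printf 'v1\n' > hello.c
git add hello.c
git commit -q -m "v1"

printf 'v2\n' > hello.c
git add hello.c            # staged change
printf 'v3\n' > hello.c    # further, unstaged change

git restore -SW hello.c    # no --source given: defaults to HEAD
index_copy=$(git show :hello.c)
worktree_copy=$(cat hello.c)
# git diff --quiet exits with 0 when there are no differences.
if git diff --quiet HEAD -- hello.c; then matches_head=yes; else matches_head=no; fi
echo "index=$index_copy worktree=$worktree_copy matches HEAD: $matches_head"
```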

For your 1st question "What is git-restore?":

git-restore is a tool to revert uncommitted changes. Uncommitted changes are: a) changes in your working copy, or b) content in your index (a.k.a. staging area).

This command was introduced in Git 2.23 (together with git-switch) to separate multiple concerns previously combined in git-checkout.

git-restore can be used in three different modes, depending on whether you want to revert work in the working copy, in the index, or both.

git restore [--worktree] <file> overwrites <file> in your working copy with the contents in your index (*). In other words, it reverts your changes in the working copy. Whether you specify --worktree or not does not matter because it is implied if you don't say otherwise.

git restore --staged <file> overwrites <file> in your index with its version in the current HEAD of the local repository. In other words, it unstages previously staged content. In that respect, it is indeed equivalent to the old git reset HEAD <file>.

To overwrite both the working copy and the index with the current HEAD, use git restore --staged --worktree --source HEAD <file>. This version does both: it reverts your working copy to HEAD and unstages previously staged work.
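A sketch of the three modes in a throwaway repository, assuming a POSIX shell and Git 2.23 or later (file.txt and its contents are illustrative). The point to notice is mode 1: without --staged, the working copy is restored from the index, not from HEAD, so a staged version survives:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
printf 'v1\n' > file.txt
git add file.txt
git commit -q -m "v1"

printf 'v2\n' > file.txt
git add file.txt            # index now holds v2
printf 'v3\n' > file.txt    # working copy now holds v3

# Mode 1: working copy only, restored from the index (v2, not v1).
git restore file.txt
mode1_worktree=$(cat file.txt)

# Mode 2: index only, restored from HEAD; the working copy keeps v2.
git restore --staged file.txt
mode2_index=$(git show :file.txt)
mode2_worktree=$(cat file.txt)

# Mode 3: both, from HEAD.
git add file.txt
git restore --staged --worktree --source HEAD file.txt
mode3_index=$(git show :file.txt)
mode3_worktree=$(cat file.txt)

echo "mode 1: worktree=$mode1_worktree"
echo "mode 2: index=$mode2_index worktree=$mode2_worktree"
echo "mode 3: index=$mode3_index worktree=$mode3_worktree"
```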

For your 2nd question "What's the difference between git-restore and git-reset?":

There are overlaps between these two commands, and differences.

Both can be used to modify your working copy and/or the staging area. However, only git-reset can modify your repository. In this sense, git-restore seems the safer option if you only want to revert local work.
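To make the "only git-reset can modify your repository" point concrete, here is a sketch in a throwaway repository (POSIX shell and Git 2.23 or later assumed; f.txt and the commit messages are illustrative). Restoring a file from the previous commit leaves the branch where it was, while git reset --hard moves it:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
printf 'v1\n' > f.txt
git add f.txt
git commit -q -m "one"
printf 'v2\n' > f.txt
git commit -q -a -m "two"
first=$(git rev-parse HEAD~1)
second=$(git rev-parse HEAD)

# git restore: copy f.txt from HEAD~1 into the index and working tree.
git restore --source HEAD~1 --staged --worktree f.txt
tip_after_restore=$(git rev-parse HEAD)    # still the second commit
content_after_restore=$(cat f.txt)         # v1

# git reset --hard: move the branch back one commit (and reset index and worktree).
git reset -q --hard HEAD~1
tip_after_reset=$(git rev-parse HEAD)      # now the first commit
```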

There are more differences, which I can't enumerate here.

(*) A file that you have not added to the index is still regarded as being in the index, but in its "clean" state from the current HEAD revision.


To add to VonC's answer, and to bring all the relevant commands into the picture, here they are in alphabetical order:

  • git checkout
  • git reset
  • git restore
  • git switch

I'll throw in one more, the misnamed git revert, as well.

From an end-user perspective

All you need are git checkout, git reset, and git revert. These commands have been in Git all along.

But git checkout has, in effect, two modes of operation. One mode is "safe": it won't accidentally destroy any unsaved work. The other mode is "unsafe": if you use it, and it tells Git to wipe out some unsaved file, Git assumes that (a) you knew it meant that and (b) you really did mean to wipe out your unsaved file, so Git immediately wipes out your unsaved file.

This is not very friendly, so the Git folks finally—after years of users griping—split git checkout into two new commands. This leads us to:

From a historical perspective

git restore is new, having first come into existence in August 2019, in Git 2.23. git reset is very old, having been in Git all along, dating back to before 2005. Both commands have the ability to destroy unsaved work.

The git switch command is also new, introduced along with git restore in Git 2.23. It implements the "safe half" of git checkout; git restore implements the "unsafe half".

When would you use which command?

This is the most complicated part, and to really understand it, we need to know the following items:

  • Git is really all about commits. Commits get stored in the Git repository. The git push and git fetch commands transfer commits (whole commits, as an all-or-nothing deal¹) to the other Git. You either have all of a commit, or you don't have it. Other commands, such as git merge or git rebase, all work with local commits. The pull command runs fetch (to get commits) followed by a second command to work with the commits once they're local.

  • New commits add to the repository. You almost never remove a commit from the repository. Only one of the five commands listed here (checkout, reset, restore, revert, and switch) is capable of removing commits.²

  • Each commit is numbered by its hash ID, which is unique to that one particular commit. It's actually computed from what's in the commit, which is how Git makes these numbers work across all Gits everywhere. This means that what is in the commit is frozen for all time: if you change anything, what you get is a new commit with a new number, and the old commit is still there, with its same old number.

  • Each commit stores two things: a snapshot, and metadata. The metadata include the hash ID(s) of some previous commit(s). This makes commits form backwards-looking chains.

  • A branch name holds the hash ID of one commit. This makes the branch name find that commit, which in turn means two things:

    • that particular commit is the tip commit of that branch; and
    • all commits leading up to and including that tip commit are on that branch.
  • We're also going to talk about Git's index in a moment, and your working tree. They're separate from these but worth mentioning early, especially since the index has three names: Git sometimes calls it the index, sometimes calls it the staging area, and sometimes—rarely these days—calls it the cache. All three names refer to the same thing.
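A sketch for poking at these ideas in a throwaway repository (POSIX shell assumed; a.txt and the commit messages are illustrative). The branch name stores exactly one hash ID, and each commit's metadata records its parent's hash ID, which is what makes the backwards-looking chain:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
printf 'hello\n' > a.txt
git add a.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)
printf 'world\n' >> a.txt
git commit -q -a -m "second"
second=$(git rev-parse HEAD)

branch=$(git symbolic-ref --short HEAD)            # name of the current branch
branch_tip=$(git rev-parse "refs/heads/$branch")   # the one hash ID the name holds

# A commit object is a snapshot (its "tree" line) plus metadata (parent, author, message).
git cat-file -p HEAD
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "branch $branch -> $branch_tip, whose parent is $parent"
```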

Everything up through the branch name is, I think, best understood via pictures (at least for most people). If we draw a series of commits, with newer commits towards the right, using o for each commit and omitting some commits for space or whatever, we get something like this:

        o--o---o   <-- feature-top
       /        \
o--o--o--o--...--o---o--o   <-- main
    \               /
     o--o--...--o--o   <-- feature-hull

which, as you can see, is a boat repository. There are three branches. The mainline branch holds every commit, including all the commits on the top row and bottom (hull) row. The feature-top branch holds the top three commits and also the three commits along the main line to the left, but not any of the commits on the bottom row. All the connectors between commits are—well, should be but I don't have a good enough font—one-way arrows, pointing left, or down-and-left, or up-and-left.

These "arrows", or one way connections from commit to commit, are technically arcs, or one-way edges, in a directed graph. This directed graph is one without cycles, making it a Directed Acyclic Graph or DAG, which has a bunch of properties that are useful to Git.

If you're just using Git to store files inside commits, all you really care about are the round o nodes or vertices (again two words for the same thing), each of which acts to store your files, but you should at least be vaguely aware of how they are arranged. It matters, especially because of merges. Merge commits are those with two outgoing arcs, pointing backwards to two of what Git calls parent commits. The child commit is the one "later": just as human parents are always older than their children, Git parent commits are older than their children.

We need one more thing, though: Where do new commits come from? We noted that what's in a commit—both the snapshot, holding all the files, and the metadata, holding the rest of the information Git keeps about a commit—is all read-only. Your files are not only frozen, they're also transformed, and the transformed data are then de-duplicated, so that even though every commit has a full snapshot of every file, the repository itself stays relatively slim. But this means that the files in a commit can only be read by Git, and nothing—not even Git itself—can write to them. They get saved once, and are de-duplicated from then on. The commits act as archives, almost like tar or rar or winzip or whatever.

To work with a Git repository, then, we have to have Git extract the files. This takes the files out of some commit, turning those special archive-formatted things into regular, usable files. Note that Git may well be able to store files that your computer literally can't store: a classic example is a file named aux.h, for some C program, on a Windows machine. We won't go into all the details, but it is theoretically possible to still get work done with this repository, which was probably built on a Linux system, even if you're on a Windows system where you can't work with the aux.h file directly.

Anyway, assuming there are no nasty little surprises like aux.h, you would just run git checkout or git switch to get some commit out of Git. This will fill in your working tree, populating it from the files stored in the tip commit of some branch. The tip commit is, again, the last commit on that branch, as found by the branch name. Your git checkout or git switch selected that commit to be the current commit, by selecting that branch name to be the current branch. You now have all the files from that commit, in an area where you can see them and work on them: your working tree.

Note that the files in your working tree are not actually in Git itself. They were just extracted from Git. This matters a lot, because when git checkout extracts the files from Git, it actually puts each file in two places. One of those places is the ordinary everyday file you see and work on / with. The other place Git puts each file is into Git's index.

As I mentioned a moment ago, the index has three names: index, staging area, and cache. All refer to the same thing: the place Git sticks these "copies" of each file. Each one is actually pre-de-duplicated, so the word "copy" is slightly wrong, but—unlike much of the rest of its innards—Git actually does a really good job of hiding the de-duplication aspect. Unless you start getting into internal commands like git ls-files and git update-index, you don't need to know about this part, and can just think of the index as holding a copy of the file, ready to go into the next commit.

What this all means for you as someone just using Git is that the index / staging-area acts as your proposed next commit. When you run git commit, Git is going to package up these copies of the file as the ones to be archived in the snapshot. The copies you have in your working tree are yours; the index / staging-area copies are Git's, ready to go. So, if you change your copies and want the changed copy to be what goes in the next snapshot, you must tell Git: Update the Git copy, in the Git index / staging-area. You do this with git add.³ The git add command means make the proposed-next-commit copy match the working-tree copy. It's the add command that does the updating: this is when Git compresses and de-duplicates the file and makes it ready for archiving, not at git commit time.⁴
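A sketch of the index acting as the proposed next commit, in a throwaway repository (POSIX shell assumed; notes.txt is illustrative). Before git add, git diff (working tree vs. index) reports the file; afterwards git diff --cached (index vs. HEAD) does, and the blob for the new content already exists in the object store before any commit is made:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
printf 'one\n' > notes.txt
git add notes.txt
git commit -q -m "one"

printf 'two\n' > notes.txt
index_before=$(git show :notes.txt)        # Git's copy: still "one"
unstaged=$(git diff --name-only)           # working tree differs from index

git add notes.txt                          # update Git's copy
index_after=$(git show :notes.txt)         # now "two"
staged=$(git diff --cached --name-only)    # index differs from HEAD

# The index entry points at a blob that git add has already written.
blob=$(git ls-files -s notes.txt | awk '{print $2}')
if git cat-file -e "$blob"; then blob_stored=yes; else blob_stored=no; fi
if [ "$blob" = "$(git hash-object notes.txt)" ]; then same_as_worktree=yes; else same_as_worktree=no; fi
```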

Then, assuming you have some series of commits ending with the one with hash-N:

[hash1] <-[hash2] ... <-[hashN]   <--branch

you run git commit, give it any metadata it needs (a commit log message), and you get an (N+1)th commit:

[hash1] <-[hash2] ... <-[hashN] <-[hashN+1]   <--branch

Git automatically updates the branch name to point to the new commit, which has therefore been added to the branch.
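A sketch of that step in a throwaway repository (POSIX shell assumed; file names and messages are illustrative). After git commit, the branch name holds the new hash ID, and the new commit's parent is the old tip:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
printf 'a\n' > a.txt
git add a.txt
git commit -q -m "commit N"
branch=$(git symbolic-ref --short HEAD)
old_tip=$(git rev-parse "refs/heads/$branch")

printf 'b\n' > b.txt
git add b.txt
git commit -q -m "commit N+1"
new_tip=$(git rev-parse "refs/heads/$branch")    # the name now holds the new hash ID
new_parent=$(git rev-parse "$new_tip^")          # and the new commit points back to the old tip
count=$(git rev-list --count "$branch")
echo "old tip: $old_tip"
echo "new tip: $new_tip (parent $new_parent, $count commits on $branch)"
```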

Let's look at each of the various commands now:

  • git checkout: this is a large and complicated command.

    We already saw this one, or at least, half of this one. We used it to pick out a branch name, and therefore a particular commit. This kind of checkout first looks at our current commit, index, and working tree. It makes sure that we have committed all our modified files, or—this part gets a bit complicated—that if we haven't committed all our modified files, switching to that other branch is "safe". If it's not safe, git checkout tells you that you can't switch due to having modified files. If it is safe, git checkout will switch; if you didn't mean to switch, you can just switch back. (See also Checkout another branch when there are uncommitted changes on the current branch)

    But git checkout has an unsafe half. Suppose you have modified some file in your working tree, such as README.md or aux.h or whatever. You now look back at what you changed and think: No, that's a bad idea. I should get rid of this change. I'd like the file back exactly as it was before.

    To get this—to wipe out your changes to, say, README.md—you can run:

    git checkout -- README.md
    

    The -- part here is optional. It's a good idea to use it, because it tells Git that the part that comes after -- is a file name, not a branch name.

    Suppose you have a branch named hello and a file named hello. What does:

    git checkout hello
    

    mean? Are we asking Git to clobber the file hello to remove the changes we made, or are we asking Git to check out the branch hello? To make this unambiguous, you have to write either:

    git checkout -- hello        (clobber the file)
    

    or:

    git checkout hello --        (get the branch)
    

    This case, where there are branches and files or directories with the same name, is a particularly insidious one. It has bitten real users. It's why git switch exists now. The git switch command never means clobber my files. It only means do the safe kind of git checkout.

    (The git checkout command has been smartened up too, so that if you have the new commands and you run the "bad" kind of ambiguous git checkout, Git will just complain at you and do nothing at all. Either use the smarter split-up commands, or add the -- at the right place to pick which kind of operation you want.)

    More precisely, this kind of git checkout, ideally spelled git checkout -- paths, is a request for Git to copy files from Git's index to your working tree. This means clobber my files. You can also run git checkout tree-ish -- paths, where you add a commit hash ID⁵ to the command. This tells Git to copy the files from that commit, first to Git's index, and then on to your working tree. This, too, means clobber my files: the difference is where Git gets the copies of the files it's extracting.

    If you ran git add on some file and thus copied it into Git's index, you need git checkout HEAD -- file to get it back from the current commit. The copy that's in Git's index is the one you git add-ed. So these two forms of git checkout, with a commit hash ID (or the name HEAD), the optional --, and the file name, are the unsafe clobber my files forms.

  • git reset: this is also a large and complicated command.

    There are, depending on how you count, up to about five or six different forms of git reset. We'll concentrate on a smaller subset here.

    • git reset [ --hard | --mixed | --soft ] [ commit ]

      Here, we're asking Git to do several things. First, if we give a commit argument, such as HEAD or HEAD~3 or some such, we've picked a particular commit that Git should reset to. This is the kind of command that will remove commits by ejecting them off the end of the branch. Of all the commands listed here, this is the only one that removes any commits. One other command—git commit --amend—has the effect of ejecting the last commit while putting on a new replacement, but that one is limited to ejecting one commit.

      Let's show this as a drawing. Suppose we have:

      ...--E--F--G--H   <-- branch
      

      That is, this branch, named branch, ends with four commits whose hash IDs we'll call E, F, G, and H in that order. The name branch currently stores the hash ID of the last of these commits, H. If we use git reset --hard HEAD~3, we're telling Git to eject the last three commits. The result is:

             F--G--H   ???
            /
      ...--E   <-- branch
      

      The name branch now selects commit E, not commit H. If we did not write down (on paper, on a whiteboard, in a file) the hash IDs of the last three commits, they've just become somewhat hard to find. Git does give a way to find them again, for a while, but mostly they just seem to be gone.

      The HEAD~3 part of this command is how we chose to drop the last three commits. It's part of a whole sub-topic in Git, documented in the gitrevisions manual, on ways to name specific commits. The reset command just needs the hash ID of an actual commit, or anything equivalent, and HEAD~3 means go back three first-parent steps, which in this case gets us from commit H back to commit E.

      The --hard part of the git reset is how we tell Git what to do with (a) its index and (b) our working tree files. We have three choices here:

      • --soft tells Git: leave both alone. Git will move the branch name without touching the index or our working tree. If you run git commit now, whatever is (still) in the index is what goes into the new commit. If the index matches the snapshot in commit H, this gets you a new commit whose snapshot is H, but whose parent is E, as if commits F through H had all been collapsed into a single new commit. People usually call this squashing.

      • --mixed tells Git: reset your index, but leave my working tree alone. Git will move the branch name, then replace every file that is in the index with the one from the newly selected commit. But Git will leave all your working tree files alone. This means that as far as Git is concerned, you can start git adding files to make a new commit. Your new commit won't match H unless you git add everything, so this means you could, for instance, build a new intermediate commit, sort of like E+F or something, if you wanted.

      • --hard tells Git: reset your index and my working tree. Git will move the branch name, replace all the files in its index, and replace all the files in your working tree, all as one big thing. It's now as if you never made those three commits at all. You no longer have the files from F, or G, or H: you have the files from commit E.

      Note that if you leave out the commit part of this kind of (hard/soft/mixed) reset, Git will use HEAD. Since HEAD names the current commit (as selected by the current branch name), this leaves the branch name itself unchanged: it still selects the same commit as before. So this is only useful with --mixed or --hard, because git reset --soft, with no commit hash ID, means don't move the branch name, don't change Git's index, and don't touch my working tree. Those are the three things this kind of git reset can do—move the branch name, change what's in Git's index, and change what's in your working tree—and you just ruled all three out. Git is OK with doing nothing, but why bother?

    • git reset [ tree-ish ] -- path

      This is the other kind of git reset we'll care about here. It's a bit like a mixed reset, in that it means clobber some of the index copies of files, but here you specify which files to clobber. It's also a bit unlike a mixed reset, because this kind of git reset will never move the branch name.

      Instead, you pick which files you want copied from somewhere. The somewhere is the tree-ish you give; if you don't give one, the somewhere is HEAD, i.e., the current commit. This can only restore files in the proposed next commit to the form they have in some existing commit. By defaulting to the current existing commit, this kind of git reset -- path has the effect of undoing a git add -- path.⁶

      There are several other forms of git reset. To see what they all mean, consult the documentation.

  • git restore: this got split off from git checkout.

    Basically, this does the same thing as the various forms of git checkout and git reset that clobber files (in your working tree and/or in Git's index). It's smarter than the old git checkout-and-clobber-my-work variant, in that you get to choose where the files come from and where they go, all in the one command line.

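    To do what you used to do with git checkout -- file, you just run git restore --worktree -- file, or even just git restore -- file, since --worktree is the default here: both copy the file from Git's index to your working tree. To do what you used to do with git checkout HEAD -- file, run git restore --staged --worktree -- file. (You can leave out the -- part, as with git checkout, in most cases, but it's just generally wise to get in the habit of using it. Like git add, this command is designed such that only files named -whatever are actually problematic.)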

    To do what you used to do with git reset -- file, you just run git restore --staged -- file, which copies the file from HEAD into Git's index and leaves your working tree alone.

    Note that you can copy a file from some existing commit to Git's index without touching your working tree copy of that file: git restore --source commit --staged -- file does that. You can't do that at all with the old git checkout, but you can do that with the old git reset, as git reset commit -- file. This overlap exists because git restore is new, and this kind of restore makes sense; probably, ideally, we should always use git restore here, instead of using the old git reset way of doing things, but Git tries to maintain backwards compatibility.

  • git switch: this just does the "safe half" of git checkout. That's really all you need to know. With git switch (without --force), Git won't overwrite your unsaved work, even if you make a typo or whatever. The old git checkout command could overwrite unsaved work: if your typo turns a branch name into a file name, for instance, well, oops.

  • git revert (I added this for completeness): this makes a new commit. The point of the new commit is to back out what someone did in some existing commit. You therefore need to name the existing commit that revert should back out. This command probably should have been named git backout.

    If you back out the most recent commit, this does revert to the second-most-recent snapshot:

      ...--G--H   <-- branch
    

    becomes:

      ...--G--H--Ħ   <-- branch
    

    where commit Ħ (H-bar) "undoes" commit H and therefore leaves us with the same files as commit G. But we don't have to undo the most recent commit. We could take:

      ...--E--F--G--H   <-- branch
    

    and add a commit Ǝ that undoes E to get:

      ...--E--F--G--H--Ǝ   <-- branch
    

    which may not match the source snapshot of any previous commit!
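A sketch comparing the old file-clobbering forms with their git restore counterparts, in a throwaway repository (POSIX shell and Git 2.23 or later assumed; README.md and its contents are illustrative, and the prepare and state helpers are hypothetical). Before each command, prepare stages v2 and then leaves an unstaged v3 in the working tree, and state prints the index and working-tree contents, so you can see where each command copies from and to:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
printf 'v1\n' > README.md
git add README.md
git commit -q -m "v1"

# Hypothetical helper: index holds v2, working tree holds v3.
prepare() {
    printf 'v2\n' > README.md
    git add README.md
    printf 'v3\n' > README.md
}
# Hypothetical helper: prints "index/worktree" contents of README.md.
state() {
    printf '%s/%s' "$(git show :README.md)" "$(cat README.md)"
}

# index -> working tree
prepare; git checkout -- README.md; old_checkout=$(state)
prepare; git restore -- README.md;  new_checkout=$(state)

# HEAD -> index and working tree
prepare; git checkout HEAD -- README.md; old_checkout_head=$(state)
prepare; git restore --source=HEAD --staged --worktree -- README.md; new_checkout_head=$(state)

# HEAD -> index only
prepare; git reset -q -- README.md;        old_reset=$(state)
prepare; git restore --staged -- README.md; new_reset=$(state)

echo "checkout -- file:      $old_checkout   restore -- file:            $new_checkout"
echo "checkout HEAD -- file: $old_checkout_head   restore --staged --worktree: $new_checkout_head"
echo "reset -- file:         $old_reset   restore --staged -- file:   $new_reset"
```

Each pair should report the same index/worktree contents, which is the sense in which the new command replaces the old ones.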
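A sketch of the branch-moving forms of git reset, and of git revert, in a throwaway repository whose four commits are labeled E, F, G and H to match the drawings (POSIX shell assumed; f.txt simply holds each commit's label, and the state helper is hypothetical). After each reset, the sketch returns to H by hash ID before trying the next mode:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
for label in E F G H; do
    printf '%s\n' "$label" > f.txt
    git add f.txt
    git commit -q -m "$label"
done
commit_H=$(git rev-parse HEAD)

# Hypothetical helper: prints "<tip commit label> <index copy> <worktree copy>".
state() {
    printf '%s %s %s' "$(git log -1 --format=%s)" "$(git show :f.txt)" "$(cat f.txt)"
}

git reset -q --soft HEAD~3;  soft=$(state);  git reset -q --hard "$commit_H"
git reset -q --mixed HEAD~3; mixed=$(state); git reset -q --hard "$commit_H"
git reset -q --hard HEAD~3;  hard=$(state);  git reset -q --hard "$commit_H"

# git revert: add a new commit that backs out H, leaving G's content.
git revert --no-edit HEAD > /dev/null
reverted_content=$(cat f.txt)
commits=$(git rev-list --count HEAD)

echo "--soft:  $soft"
echo "--mixed: $mixed"
echo "--hard:  $hard"
echo "after revert: f.txt=$reverted_content, $commits commits"
```

Each reset line reports the tip label first: all three move the branch to E, but only --mixed and --hard touch the index, and only --hard touches the working tree. The revert adds a fifth commit rather than removing any.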


¹ Git is, slowly, growing a facility to "partly get" a commit so that you can deal with huge repositories with huge commits without having to wait for the entire commit all at once, for instance. Right now that's not something ordinary users will ever see, and when it does come to regular users, it's meant as an add-on to the basic "all or nothing" mode of a commit. It will turn this from "you either have a commit, or not" to "you have a commit (either all of it, or part of it with the promise to deliver the rest soon) or not; if you have part of a commit, you can work with the part, but that's all".

² Even then, a "removed" commit is not gone yet: you can get it back. This answer won't cover how to do that, though. Also, git commit --amend is a special case, which we will mention, but not really cover properly here.

³ To remove the file from both your working tree and Git's index, you can use git rm. If you remove the file from your working tree, then run git add on that file name, Git will "add" the removal, so that works too.

⁴ If you use git commit -a, Git will, at that time, run git add on all the files. This is done in a tricky way that can break some poorly-written pre-commit hooks. I recommend learning the two-step process, in part because of those poorly-written hooks (though I'd try to avoid or fix them if possible), and in part just because if you try to avoid learning about Git's index like the authors of those poorly-written hooks did, Git is going to give you more trouble later.

⁵ The reason this is a tree-ish and not a commit-ish is that you can use anything that specifies some existing internal Git tree object. Every commit's saved snapshot is a suitable tree, though, and a commit is what you'd normally put here.

⁶ As with all these other Git commands, you can use the -- between the add command and the paths to add. It's actually a good habit to get into, as this means that you can add a path named -u, if you have such a path: git add -- -u means add the file named -u, but git add -u doesn't mean that at all. Of course, files whose names match option sequences are less common and less surprising than files whose names match branch names: it's really easy to have a dev branch and a set of files named dev/whatever. Since file paths will match using directories, for add, checkout, reset, and restore, these can get mixed up. The add command doesn't take a branch name though, so it's safer in that respect.