How to determine if Git handles a file as binary or as text?

git grep -I --name-only --untracked -e . -- ascii.dat binary.dat ...

will return the names of files that git interprets as text files.

The trick here is in these two git grep parameters:

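  • -I: Don’t match the pattern in binary files.
  • -e .: A regular expression that matches any single character, so any file with at least one non-empty line matches. (Empty files therefore never show up in the output.)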

You can also use wildcards, e.g.:

git grep -I --name-only --untracked -e . -- *.ps1
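If you want a yes/no answer for a single file instead of a list, the same command can be wrapped in a small function. This is only a sketch: git_sees_text is a made-up name, and it assumes you run it inside a repository and that the file is not excluded by .gitignore. Note that an empty file lands in the "not text" bucket here, because the pattern . needs at least one character to match.

```shell
# Hypothetical helper (not part of git): succeeds if git grep treats the file as text.
# Assumes the current directory is inside a git work tree.
git_sees_text() {
    # -I skips binary files, -q prints nothing and exits 0 as soon as one line matches.
    git grep -I -q --untracked -e . -- "$1"
}

# Usage:
#   git_sees_text ascii.dat && echo text || echo "binary or empty"
```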

# files considered binary (or containing a bare CR)
git ls-files --eol | grep -E '^(i/-text)'

# files without any line-ending characters (including empty files); unlikely to be true binary files
git ls-files --eol | grep -E '^(i/none)'

#                                                        via experimentation
#                                                      ------------------------
#    "-text"        binary (or with bare CR) file     : not    auto-normalized
#    "none"         text file without any EOL         : not    auto-normalized
#    "lf"           text file with LF                 : is     auto-normalized when gitattributes text=auto
#    "crlf"         text file with CRLF               : is     auto-normalized when gitattributes text=auto
#    "mixed"        text file with mixed line endings : is     auto-normalized when gitattributes text=auto
#                   (LF or CRLF, but not bare CR)

Source: https://git-scm.com/docs/git-ls-files#Documentation/git-ls-files.txt---eol https://github.com/git/git/commit/a7630bd4274a0dff7cff8b92de3d3f064e321359
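To get all of the states from the table in one pass, the index column of the --eol output can be mapped to a label. This is a rough sketch only: classify_eol and its labels are names I made up, and it relies on the path being separated from the info columns by a TAB, as in the git ls-files documentation.

```shell
# Hypothetical helper: reads `git ls-files --eol` output on stdin and prints
# "<label><TAB><path>" per file, based on the index ("i/...") column only.
classify_eol() {
    awk -F '\t' '{
        split($1, info, " ")            # info[1] is the "i/<eolinfo>" field
        state = substr(info[1], 3)      # strip the leading "i/"
        if (state == "-text")       label = "binary-or-bare-CR"
        else if (state == "none")   label = "no-eol"
        else if (state == "lf" || state == "crlf" || state == "mixed") label = "text"
        else                        label = "unknown"
        print label "\t" $2
    }'
}

# Usage:
#   git ls-files --eol | classify_eol
```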

By the way, be careful when setting the text attribute explicitly in .gitattributes, e.g. *.abc text. In that case every file matching *.abc is normalized, even if it is binary (any CRLF found inside the binary data is converted to LF). This differs from the text=auto behaviour, which only normalizes files that Git detects as text.
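To see which of these cases applies to a given path, git check-attr can be asked how the text attribute resolves. A small sketch, with text_attr being a name I picked for illustration; it has to run inside a repository, and the path does not need to exist:

```shell
# Hypothetical helper: prints how the "text" attribute resolves for a path.
# Possible results: "set" (always normalized), "unset" (never), "auto", "unspecified".
text_attr() {
    # check-attr prints "<path>: text: <value>"; keep only the value.
    git check-attr text -- "$1" | sed 's/.*: text: //'
}

# Usage:
#   text_attr some-file.abc
```

A result of set on a file type that can hold binary data is the risky case described above.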


builtin_diff() [1] calls diff_filespec_is_binary() which calls buffer_is_binary() which checks for any occurrence of a zero byte (NUL “character”) in the first 8000 bytes (or the entire length if shorter).
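The NUL-byte check is easy to approximate without git at all. The following is only my own approximation of that heuristic, not git's code: looks_binary is a made-up name, and it assumes a head that supports -c (GNU and BSD both do). It also ignores attributes such as binary or -diff, which git consults before falling back to the content check.

```shell
# Hypothetical helper: returns 0 ("binary") if a NUL byte occurs in the
# first 8000 bytes of the file, mimicking the heuristic described above.
looks_binary() {
    [ -f "$1" ] || return 2
    # Size of the inspected prefix, with and without NUL bytes.
    total=$(head -c 8000 < "$1" | wc -c)
    kept=$(head -c 8000 < "$1" | tr -d '\000' | wc -c)
    # $((...)) strips the padding some wc implementations add.
    [ "$((total))" -ne "$((kept))" ]
}

# Usage:
#   looks_binary file-to-test && echo binary || echo "not binary"
```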

I do not see that this “is it binary?” test is explicitly exposed in any command though.

git merge-file directly uses buffer_is_binary(), so you may be able to make use of it:

git merge-file /dev/null /dev/null file-to-test

It seems to produce an error message like error: Cannot merge binary files: file-to-test and to yield an exit status of 255 when given a binary file. I am not sure I would want to rely on this behavior, though.

Maybe git diff --numstat would be more reliable:

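isBinary() {
    # For binary files, --numstat prints "-<TAB>-<TAB>" instead of line counts.
    p=$(printf '%s\t-\t' -)
    # Compare the file against /dev/null; --no-index also works outside a repository.
    t=$(git diff --no-index --numstat /dev/null "$1")
    # Succeed (binary) only if the output starts with that prefix.
    case "$t" in "$p"*) return 0 ;; esac
    return 1
}
isBinary file-to-test && echo binary || echo not binary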

For binary files, the --numstat output should start with - TAB - TAB, so we just test for that.


[1] builtin_diff() has strings like Binary files %s and %s differ that should be familiar.


I don't like this answer, but you can parse the output of git-diff-tree to see if it is binary. For example:

git diff-tree -p 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- MegaCli 
diff --git a/megaraid/MegaCli b/megaraid/MegaCli
new file mode 100755
index 0000000..7f0e997
Binary files /dev/null and b/megaraid/MegaCli differ

as opposed to:

git diff-tree -p 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- megamgr
diff --git a/megaraid/megamgr b/megaraid/megamgr
new file mode 100755
index 0000000..50fd8a1
--- /dev/null
+++ b/megaraid/megamgr
@@ -0,0 +1,78 @@
+#!/bin/sh
[…]

Oh, and BTW, 4b825d… is a magic SHA that represents the empty tree: it is the hash of a tree object with no entries, and git knows about it even if that object was never written to the repository.
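If you would rather not hard-code that hash, it can be computed by hashing empty input as a tree object. A small sketch, where empty_tree is just a name I chose and some/path is a placeholder:

```shell
# Hypothetical helper: prints the ID of the empty tree.
# Hashing zero bytes as a tree object yields the well-known empty-tree hash.
empty_tree() {
    git hash-object -t tree /dev/null
}

# Usage (some/path is a placeholder):
#   git diff-tree -p "$(empty_tree)" HEAD -- some/path
```

With the default SHA-1 object format this prints 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the same value used in the commands above.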
