Why are true and false so large?

In the past, /bin/true and /bin/false in the shell were actually scripts.

For instance, in a PDP/11 Unix System 7:

$ ls -la /bin/true /bin/false
-rwxr-xr-x 1 bin         7 Jun  8  1979 /bin/false
-rwxr-xr-x 1 bin         0 Jun  8  1979 /bin/true
$
$ cat /bin/false
exit 1
$
$ cat /bin/true
$  

Nowadays, at least in bash, the trueand false commands are implemented as shell built-in commands. Thus no executable binary files are invoked by default, both when using the false and true directives in the bash command line and inside shell scripts.

From the bashsource, builtins/mkbuiltins.c:

char *posix_builtins[] =
    {
      "alias", "bg", "cd", "command", "**false**", "fc", "fg", "getopts", "jobs",
      "kill", "newgrp", "pwd", "read", "**true**", "umask", "unalias", "wait",
      (char *)NULL
    };

Also per @meuh comments:

$ command -V true false
true is a shell builtin
false is a shell builtin

So it can be said with a high degree of certainty the trueand false executable files exist mainly for being called from other programs.

From now on, the answer will focus on the /bin/true binary from the coreutilspackage in Debian 9 / 64 bits. (/usr/bin/true running RedHat. RedHat and Debian use both the coreutils package, analysed the compiled version of the latter having it more at hand).

As it can be seen in the source file false.c, /bin/false is compiled with (almost) the same source code as /bin/true, just returning EXIT_FAILURE (1) instead, so this answer can be applied for both binaries.

#define EXIT_STATUS EXIT_FAILURE
#include "true.c"

As it also can be confirmed by both executables having the same size:

$ ls -l /bin/true /bin/false
-rwxr-xr-x 1 root root 31464 Feb 22  2017 /bin/false
-rwxr-xr-x 1 root root 31464 Feb 22  2017 /bin/true

Alas, the direct question to the answer why are true and false so large? could be, because there are not anymore so pressing reasons to care about their top performance. They are not essential to bash performance, not being used anymore by bash (scripting).

Similar comments apply to their size, 26KB for the kind of hardware we have nowadays is insignificant. Space is not at premium for the typical server/desktop anymore, and they do not even bother anymore to use the same binary for false and true, as it is just deployed twice in distributions using coreutils.

Focusing, however, in the real spirit of the question, why something that should be so simple and small, gets so large?

The real distribution of the sections of /bin/true is as these charts shows; the main code+data amounts to roughly 3KB out of a 26KB binary, which amounts to 12% of the size of /bin/true.

The true utility got indeed more cruft code over the years, most notably the standard support for --version and --help.

However, that it is not the (only) main justification for it being so big, but rather, while being dynamically linked (using shared libs), also having part of a generic library commonly used by coreutils binaries linked as a static library. The metada for building an elf executable file also amounts for a significant part of the binary, being it a relatively small file by today´s standards.

The rest of the answer is for explaining how we got to build the following charts detailing the composition of the /bin/true executable binary file and how we arrived to that conclusion.

bintrue bintrue2

As @Maks says, the binary was compiled from C; as per my comment also, it is also confirmed it is from coreutils. We are pointing directly to the author(s) git https://github.com/wertarbyte/coreutils/blob/master/src/true.c, instead of the gnu git as @Maks (same sources, different repositories - this repository was selected as it has the full source of the coreutils libraries)

We can see the various building blocks of the /bin/truebinary here (Debian 9 - 64 bits from coreutils):

$ file /bin/true
/bin/true: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=9ae82394864538fa7b23b7f87b259ea2a20889c4, stripped

$ size /bin/true
    text       data     bss     dec     hex filename
   24583       1160     416   26159    662f true

Of those:

  • text (usually code) is around 24KB
  • data (initialised variables, mostly strings) are around 1KB
  • bss (uninitialized data) 0.5KB

Of the 24KB, around 1KB is for fixing up the 58 external functions.

That still leaves around roughly 23KB for rest of the code. We will show down bellow that the actual main file - main()+usage() code is around 1KB compiled, and explain what the other 22KB are used for.

Drilling further down the binary with readelf -S true, we can see that while the binary is 26159 bytes, the actual compiled code is 13017 bytes, and the rest is assorted data/initialisation code.

However, true.c is not the whole story and 13KB seems pretty much excessive if it were only that file; we can see functions called in main() that are not listed in the external functions seen in the elf with objdump -T true ; functions that are present at:

  • https://github.com/coreutils/gnulib/blob/master/lib/progname.c
  • https://github.com/coreutils/gnulib/blob/master/lib/closeout.c
  • https://github.com/coreutils/gnulib/blob/master/lib/version-etc.c

Those extra functions not linked externally in main() are:

  • set_program_name()
  • close_stdout()
  • version_etc()

So my first suspicion was partly correct, whilst the library is using dynamic libraries, the /bin/true binary is big *because it has some static libraries included with it* (but that is not the only cause).

Compiling C code is not usually that inefficient for having such space unaccounted for, hence my initial suspicion something was amiss.

The extra space, almost 90% of the size of the binary, is indeed extra libraries/elf metadata.

While using Hopper for disassembling/decompiling the binary to understand where functions are, it can be seen the compiled binary code of true.c/usage() function is actually 833 bytes, and of the true.c/main() function is 225 bytes, which is roughly slightly less than 1KB. The logic for version functions, which is buried in the static libraries, is around 1KB.

The actual compiled main()+usage()+version()+strings+vars are only using up around 3KB to 3.5KB.

It is indeed ironic, such small and humble utilities have became bigger in size for the reasons explained above.

related question: Understanding what a Linux binary is doing

true.c main() with the offending function calls:

int
main (int argc, char **argv)
{
  /* Recognize --help or --version only if it's the only command-line
     argument.  */
  if (argc == 2)
    {
      initialize_main (&argc, &argv);
      set_program_name (argv[0]);           <-----------
      setlocale (LC_ALL, "");
      bindtextdomain (PACKAGE, LOCALEDIR);
      textdomain (PACKAGE);

      atexit (close_stdout);             <-----

      if (STREQ (argv[1], "--help"))
        usage (EXIT_STATUS);

      if (STREQ (argv[1], "--version"))
        version_etc (stdout, PROGRAM_NAME, PACKAGE_NAME, Version,  AUTHORS,  <------
                     (char *) NULL);
    }

  exit (EXIT_STATUS);
}

The decimal size of the various sections of the binary:

$ size -A -t true 
true  :
section               size      addr
.interp                 28       568
.note.ABI-tag           32       596
.note.gnu.build-id      36       628
.gnu.hash               60       664
.dynsym               1416       728
.dynstr                676      2144
.gnu.version           118      2820
.gnu.version_r          96      2944
.rela.dyn              624      3040
.rela.plt             1104      3664
.init                   23      4768
.plt                   752      4800
.plt.got                 8      5552
.text                13017      5568
.fini                    9     18588
.rodata               3104     18624
.eh_frame_hdr          572     21728
.eh_frame             2908     22304
.init_array              8   2125160
.fini_array              8   2125168
.jcr                     8   2125176
.data.rel.ro            88   2125184
.dynamic               480   2125272
.got                    48   2125752
.got.plt               392   2125824
.data                  128   2126240
.bss                   416   2126368
.gnu_debuglink          52         0
Total                26211

Output of readelf -S true

$ readelf -S true
There are 30 section headers, starting at offset 0x7368:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000000238  00000238
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000000254  00000254
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000000274  00000274
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .gnu.hash         GNU_HASH         0000000000000298  00000298
       000000000000003c  0000000000000000   A       5     0     8
  [ 5] .dynsym           DYNSYM           00000000000002d8  000002d8
       0000000000000588  0000000000000018   A       6     1     8
  [ 6] .dynstr           STRTAB           0000000000000860  00000860
       00000000000002a4  0000000000000000   A       0     0     1
  [ 7] .gnu.version      VERSYM           0000000000000b04  00000b04
       0000000000000076  0000000000000002   A       5     0     2
  [ 8] .gnu.version_r    VERNEED          0000000000000b80  00000b80
       0000000000000060  0000000000000000   A       6     1     8
  [ 9] .rela.dyn         RELA             0000000000000be0  00000be0
       0000000000000270  0000000000000018   A       5     0     8
  [10] .rela.plt         RELA             0000000000000e50  00000e50
       0000000000000450  0000000000000018  AI       5    25     8
  [11] .init             PROGBITS         00000000000012a0  000012a0
       0000000000000017  0000000000000000  AX       0     0     4
  [12] .plt              PROGBITS         00000000000012c0  000012c0
       00000000000002f0  0000000000000010  AX       0     0     16
  [13] .plt.got          PROGBITS         00000000000015b0  000015b0
       0000000000000008  0000000000000000  AX       0     0     8
  [14] .text             PROGBITS         00000000000015c0  000015c0
       00000000000032d9  0000000000000000  AX       0     0     16
  [15] .fini             PROGBITS         000000000000489c  0000489c
       0000000000000009  0000000000000000  AX       0     0     4
  [16] .rodata           PROGBITS         00000000000048c0  000048c0
       0000000000000c20  0000000000000000   A       0     0     32
  [17] .eh_frame_hdr     PROGBITS         00000000000054e0  000054e0
       000000000000023c  0000000000000000   A       0     0     4
  [18] .eh_frame         PROGBITS         0000000000005720  00005720
       0000000000000b5c  0000000000000000   A       0     0     8
  [19] .init_array       INIT_ARRAY       0000000000206d68  00006d68
       0000000000000008  0000000000000008  WA       0     0     8
  [20] .fini_array       FINI_ARRAY       0000000000206d70  00006d70
       0000000000000008  0000000000000008  WA       0     0     8
  [21] .jcr              PROGBITS         0000000000206d78  00006d78
       0000000000000008  0000000000000000  WA       0     0     8
  [22] .data.rel.ro      PROGBITS         0000000000206d80  00006d80
       0000000000000058  0000000000000000  WA       0     0     32
  [23] .dynamic          DYNAMIC          0000000000206dd8  00006dd8
       00000000000001e0  0000000000000010  WA       6     0     8
  [24] .got              PROGBITS         0000000000206fb8  00006fb8
       0000000000000030  0000000000000008  WA       0     0     8
  [25] .got.plt          PROGBITS         0000000000207000  00007000
       0000000000000188  0000000000000008  WA       0     0     8
  [26] .data             PROGBITS         00000000002071a0  000071a0
       0000000000000080  0000000000000000  WA       0     0     32
  [27] .bss              NOBITS           0000000000207220  00007220
       00000000000001a0  0000000000000000  WA       0     0     32
  [28] .gnu_debuglink    PROGBITS         0000000000000000  00007220
       0000000000000034  0000000000000000           0     0     1
  [29] .shstrtab         STRTAB           0000000000000000  00007254
       000000000000010f  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

Output of objdump -T true (external functions dynamically linked on run-time)

$ objdump -T true

true:     file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 __uflow
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 getenv
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 free
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 abort
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 __errno_location
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 strncmp
0000000000000000  w   D  *UND*  0000000000000000              _ITM_deregisterTMCloneTable
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 _exit
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 __fpending
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 textdomain
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 fclose
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 bindtextdomain
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 dcgettext
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 __ctype_get_mb_cur_max
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 strlen
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.4   __stack_chk_fail
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 mbrtowc
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 strrchr
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 lseek
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 memset
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 fscanf
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 close
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 __libc_start_main
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 memcmp
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 fputs_unlocked
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 calloc
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 strcmp
0000000000000000  w   D  *UND*  0000000000000000              __gmon_start__
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.14  memcpy
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 fileno
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 malloc
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 fflush
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 nl_langinfo
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 ungetc
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 __freading
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 realloc
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 fdopen
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 setlocale
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.3.4 __printf_chk
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 error
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 open
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 fseeko
0000000000000000  w   D  *UND*  0000000000000000              _Jv_RegisterClasses
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 __cxa_atexit
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 exit
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 fwrite
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.3.4 __fprintf_chk
0000000000000000  w   D  *UND*  0000000000000000              _ITM_registerTMCloneTable
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 mbsinit
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 iswprint
0000000000000000  w   DF *UND*  0000000000000000  GLIBC_2.2.5 __cxa_finalize
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.3   __ctype_b_loc
0000000000207228 g    DO .bss   0000000000000008  GLIBC_2.2.5 stdout
0000000000207220 g    DO .bss   0000000000000008  GLIBC_2.2.5 __progname
0000000000207230  w   DO .bss   0000000000000008  GLIBC_2.2.5 program_invocation_name
0000000000207230 g    DO .bss   0000000000000008  GLIBC_2.2.5 __progname_full
0000000000207220  w   DO .bss   0000000000000008  GLIBC_2.2.5 program_invocation_short_name
0000000000207240 g    DO .bss   0000000000000008  GLIBC_2.2.5 stderr

The implementation probably comes from GNU coreutils. These binaries are compiled from C; no particular effort has been made to make them smaller than they are by default.

You could try to compile the trivial implementation of true yourself, and you'll notice it's already few KB in size. For example, on my system:

$ echo 'int main() { return 0; }' | gcc -xc - -o true
$ wc -c true
8136 true

Of course, your binaries are even bigger. That's because they also support command line arguments. Try running /usr/bin/true --help or /usr/bin/true --version.

In addition to the string data, the binary includes logic to parse command line flags, etc. That adds up to about 20 KB of code, apparently.

For reference, you can find the source code here: http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/true.c


Stripping them down to core functionality and writing in assembler yields far smaller binaries.

Original true/false binaries are written in C, which by its nature pulls in various library + symbol references. If you run readelf -a /bin/true this is quite noticeable.

352 bytes for a stripped ELF static executable (with room to save a couple bytes by optimizing the asm for code-size).

$ more true.asm false.asm
::::::::::::::
true.asm
::::::::::::::
global _start
_start:
 mov ebx,0
 mov eax,1     ; SYS_exit from asm/unistd_32.h
 int 0x80      ; The 32-bit ABI is supported in 64-bit code, in kernels compiled with IA-32 emulation
::::::::::::::
false.asm
::::::::::::::
global _start
_start:
 mov ebx,1
 mov eax,1
 int 0x80
$ nasm -f elf64 true.asm && ld -s -o true true.o     # -s means strip
$ nasm -f elf64 false.asm && ld -s -o false false.o
$ ll true false
-rwxrwxr-x. 1 steve steve 352 Jan 25 16:03 false
-rwxrwxr-x. 1 steve steve 352 Jan 25 16:03 true
$ ./true ; echo $?
0
$ ./false ; echo $?
1
$

Or, with a bit of a nasty/ingenious approach (kudos to stalkr), create your own ELF headers, getting it down to 132 127 bytes. We're entering Code Golf territory here.

$ cat true2.asm
BITS 64
  org 0x400000   ; _start is at 0x400080 as usual, but the ELF headers come first

ehdr:           ; Elf64_Ehdr
  db 0x7f, "ELF", 2, 1, 1, 0 ; e_ident
  times 8 db 0
  dw  2         ; e_type
  dw  0x3e      ; e_machine
  dd  1         ; e_version
  dq  _start    ; e_entry
  dq  phdr - $$ ; e_phoff
  dq  0         ; e_shoff
  dd  0         ; e_flags
  dw  ehdrsize  ; e_ehsize
  dw  phdrsize  ; e_phentsize
  dw  1         ; e_phnum
  dw  0         ; e_shentsize
  dw  0         ; e_shnum
  dw  0         ; e_shstrndx
  ehdrsize  equ  $ - ehdr

phdr:           ; Elf64_Phdr
  dd  1         ; p_type
  dd  5         ; p_flags
  dq  0         ; p_offset
  dq  $$        ; p_vaddr
  dq  $$        ; p_paddr
  dq  filesize  ; p_filesz
  dq  filesize  ; p_memsz
  dq  0x1000    ; p_align
  phdrsize  equ  $ - phdr

_start:
  xor  edi,edi         ; int status = 0
      ; or  mov dil,1  for false: high bytes are ignored.
  lea  eax, [rdi+60]   ; rax = 60 = SYS_exit, using a 3-byte instruction: base+disp8 addressing mode
  syscall              ; native 64-bit system call, works without CONFIG_IA32_EMULATION

; less-golfed version:
;      mov  edi, 1    ; for false
;      mov  eax,252   ; SYS_exit_group from asm/unistd_64.h
;      syscall

filesize  equ  $ - $$      ; used earlier in some ELF header fields

$ nasm -f bin -o true2 true2.asm
$ ll true2
-rw-r--r-- 1 peter peter 127 Jan 28 20:08 true2
$ chmod +x true2 ; ./true2 ; echo $?
0
$