Android - How Magisk works?

Most of your question is covered in the Magisk documentation. I will quote one of my previous answers to a different question, with some unnecessary details :)

PREREQUISITES:

To understand how Magisk works in depth, one needs a basic understanding of:

  • Discretionary Access Control (DAC)
    • User identifiers ([ESR]UID), set-user-ID
    • Linux Capabilities (process and file) which provide a fine-grained control over superuser permissions
  • Mandatory Access Control (MAC)
    • SELinux on Android
  • Mount namespaces, Android's usage of namespaces for Storage Permissions
  • Bind mount
  • Android boot process, partitions and filesystems
  • Android init (the very first process started by the kernel) and its services
    • *.rc files
  • Structure of boot partition (kernel + DTB + ramdisk), Device Tree Blobs, DM-Verity (Android Verified Boot), Full Disk Encryption / File Based Encryption (FDE/FBE) etc.

WHAT IS ROOT?

Gaining root privileges means running a process (usually a shell) with UID zero (0) and all Linux capabilities, so that the privileged process can bypass all kernel permission checks.
Superuser privileges are usually gained by executing a binary which has either:

  • set-user-ID-root (SUID) bit set on it

    This is how su and sudo work on Linux in traditional UNIX DAC. Non-privileged users execute these binaries to get root rights.

  • Or file capabilities (setgid,setuid+ep) set on it

    This is the less commonly used method.

In both cases, the calling process must have all capabilities in its bounding set (one of the five capability sets a process can have) to have real root privileges.
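To make "UID 0 plus all capabilities" a bit more concrete, here is a minimal sketch that reads the Uid and Cap* fields in the format the kernel exposes in /proc/<pid>/status. The sample text and the FULL_MASK value (38 capability bits) are assumptions for illustration only; the real number of capabilities depends on the kernel version.

```python
# Names of the five capability sets as they appear in /proc/<pid>/status.
CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")

# Assumed for illustration: a kernel that defines 38 capabilities.
FULL_MASK = (1 << 38) - 1

# Made-up sample in the format of /proc/<pid>/status for a root shell.
SAMPLE_ROOT_STATUS = (
    "Name:\tsh\n"
    "Uid:\t0\t0\t0\t0\n"
    "CapInh:\t0000000000000000\n"
    "CapPrm:\t0000003fffffffff\n"
    "CapEff:\t0000003fffffffff\n"
    "CapBnd:\t0000003fffffffff\n"
    "CapAmb:\t0000000000000000\n"
)

def parse_status(text):
    """Extract the four UIDs and the capability bitmasks from status text."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Uid":
            # Four numbers: real, effective, saved set, filesystem UID.
            info["uids"] = tuple(int(v) for v in value.split())
        elif key in CAP_FIELDS:
            info[key] = int(value, 16)  # hexadecimal bitmask
    return info

def looks_like_full_root(info, full_mask=FULL_MASK):
    """True if the effective UID is 0 and both the effective and the
    bounding set contain every bit of full_mask."""
    return (
        info["uids"][1] == 0
        and (info["CapEff"] & full_mask) == full_mask
        and (info["CapBnd"] & full_mask) == full_mask
    )

print(looks_like_full_root(parse_status(SAMPLE_ROOT_STATUS)))
```

On a device, the same function could be fed the content of /proc/self/status from a shell to compare an ordinary app process with a root shell.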

HOW ANDROID RESTRICTS ROOT ACCESS?

Before Android 4.3, an app could simply execute a set-user-ID-root su binary to elevate its permissions to those of the root user. However, a number of Security Enhancements in Android 4.3 broke this behavior:

  • Android switched to file capabilities instead of relying on set-user-ID binaries, which are a common source of security vulnerabilities. A more secure mechanism, ambient capabilities, was also introduced in Android Oreo.
  • System daemons and services can use file capabilities to gain process capabilities (see under Transformation of capabilities during execve), but apps can't, because application code is executed by zygote with the process control (prctl) attribute PR_SET_NO_NEW_PRIVS, which makes the kernel ignore set-user-ID as well as file capabilities. SUID is also ignored for all apps because /system and /data are mounted with the nosuid option.
  • The UID can be switched only if the calling process has the SETUID/SETGID capability in its bounding set. But Android apps run with all capabilities already dropped in all sets, using the process control (prctl) attribute PR_CAPBSET_DROP.
  • Starting with Oreo, apps' ability to change UID/GID has been further restricted by blocking certain syscalls with seccomp filters.
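Several of the restrictions in the list above can be observed from inside a process. The following is a rough sketch that parses text in the format of /proc/mounts (to see the nosuid option) and of /proc/<pid>/status (to see the NoNewPrivs and Seccomp fields). Both samples are made up for illustration, and the NoNewPrivs field is only present on reasonably recent kernels.

```python
# Made-up samples in the format of /proc/mounts and /proc/<pid>/status.
SAMPLE_MOUNTS = """\
rootfs / rootfs ro,seclabel 0 0
/dev/block/dm-0 /system ext4 ro,seclabel,nosuid,relatime 0 0
/dev/block/dm-1 /data ext4 rw,seclabel,nosuid,nodev,noatime 0 0
"""

SAMPLE_STATUS = "Name:\tcom.example.app\nNoNewPrivs:\t1\nSeccomp:\t2\n"

def mount_flags(mounts_text):
    """Map each mount point to the set of its mount options."""
    flags = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # fields: device, mount point, fs type, comma-separated options
            flags[fields[1]] = set(fields[3].split(","))
    return flags

def restrictions(status_text):
    """Pick the NoNewPrivs and Seccomp fields out of status text.

    Seccomp: 0 = disabled, 1 = strict mode, 2 = filter mode.
    """
    wanted = {"NoNewPrivs", "Seccomp"}
    out = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() in wanted:
            out[key.strip()] = int(value.strip())
    return out

for mount_point, opts in mount_flags(SAMPLE_MOUNTS).items():
    print(mount_point, "nosuid" in opts)
print(restrictions(SAMPLE_STATUS))
```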

Since standalone su binaries stopped working with the release of Jelly Bean, a transition was made to su daemon mode. This daemon is launched during boot and handles all superuser requests made by applications when they execute the special su binary (1). To launch the su daemon on boot, install-recovery.sh (located under /system/bin/ or /system/etc/) was used. This script is executed by a pre-installed init service, flash_recovery, which updates recovery after an OTA installation and is useless for adventurers.

The next major challenge came when SELinux was set to strictly enforcing with the release of Android 5.0. The flash_recovery service was put in a restricted SELinux context, u:r:install_recovery:s0, which ended its unrestricted access to the system. Even UID 0 was limited to a very small set of tasks on the device. So the only viable option was to start a new service with an unrestricted SUPER CONTEXT by patching the SELinux policy. That's what was done (temporarily for Lollipop (2, 3) and then permanently for Marshmallow) and that's what Magisk does.

HOW MAGISK WORKS?

Flashing Magisk usually requires a device with an unlocked bootloader, so that boot.img can be dynamically modified from a custom recovery (4) or a pre-modified boot.img (5) can be flashed/booted, e.g. from fastboot.
As a side note, it's possible to start Magisk on a running ROM if you somehow get root privileges through an exploit in the OS (6). However, most such security vulnerabilities have been fixed over time (7).
Also, due to some vulnerabilities at the SoC level (such as Qualcomm's EDL mode), a locked bootloader can be hacked to load a modified boot / recovery image, breaking the Chain of Trust. However, these are only exceptions.

Once the device boots from the patched boot.img, a fully privileged Magisk daemon (with UID 0, full capabilities and an unrestricted SELinux context) runs from the very start of the boot process. When an app needs root access, it executes Magisk's (/sbin/)su binary (world-accessible under both DAC and MAC), which doesn't change UID/GID on its own but just connects to the daemon through a UNIX socket (8) and asks it to provide the requesting app with a root shell with all capabilities. To interact with the user to grant/deny su requests from apps, the daemon is hooked up to the Magisk Manager app, which can display user interface prompts. The daemon builds a database (/data/adb/magisk.db) of granted/denied permissions for future use.
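The division of labour between the su stub and the daemon can be modelled with a toy example. This is not Magisk's actual protocol: the message format, the POLICY table and the function names are all hypothetical, and a socket pair stands in for the daemon's listening socket. A real daemon would also not trust a UID supplied by the client; on Linux it can ask the kernel for the peer's credentials (SO_PEERCRED) instead.

```python
import socket

# Hypothetical stand-in for the daemon's database of earlier decisions:
# requesting UID -> whether the user granted root.
POLICY = {10123: True, 10456: False}

def client_request(sock, uid, command):
    """The su stub in this model: forward the request, do nothing privileged."""
    sock.sendall(f"{uid} {command}\n".encode())

def daemon_handle(sock, policy):
    """The daemon in this model: look up the caller and answer."""
    uid_text, _, command = sock.recv(4096).decode().strip().partition(" ")
    if policy.get(int(uid_text), False):
        reply = f"GRANTED {command}"
    else:
        reply = "DENIED"  # unknown callers are denied by default
    sock.sendall(reply.encode() + b"\n")

def ask(uid, command, policy=POLICY):
    """Run one request/response round trip over a UNIX socket pair."""
    client, daemon = socket.socketpair()  # AF_UNIX stream sockets
    try:
        client_request(client, uid, command)
        daemon_handle(daemon, policy)
        return client.recv(4096).decode().strip()
    finally:
        client.close()
        daemon.close()

print(ask(10123, "id"))
print(ask(10456, "id"))
```

The point of the design is that the binary the app executes never needs any privilege of its own; all privileged work happens in a process that was already running as root before any app started.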

Booting Process:
The Android kernel starts init with SELinux in permissive mode on boot (with a few exceptions). init loads /sepolicy (or the split policy) before starting any services/daemons/processes, sets SELinux to enforcing and then switches to its own context. From then on, even init isn't allowed by the policy to revert to permissive mode (9, 10), nor can the policy be modified, even by the root user (11). Therefore Magisk replaces the /init file with a custom init which patches the SELinux policy rules with the SUPER CONTEXT (u:r:magisk:s0) and defines the service that launches the Magisk daemon with this context. Then the original init is executed to continue the boot process (12).
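For readers unfamiliar with labels such as u:r:magisk:s0, here is a small sketch that splits an SELinux context into its fields and interprets the content of /sys/fs/selinux/enforce. The helper names and the Context tuple are made up for illustration; on a device, the label of the current process can be read from /proc/self/attr/current.

```python
from collections import namedtuple

# user:role:type:level, e.g. u:r:magisk:s0
Context = namedtuple("Context", "user role setype level")

def parse_context(label):
    """Split an SELinux label into its four fields."""
    # The kernel may terminate the label with a newline or a NUL byte.
    label = label.strip(" \t\r\n\x00")
    # The level can itself contain colons (e.g. s0:c512,c768),
    # so split at most three times.
    user, role, setype, level = label.split(":", 3)
    return Context(user, role, setype, level)

def enforcing(enforce_text):
    """Interpret /sys/fs/selinux/enforce: '1' enforcing, '0' permissive."""
    return enforce_text.strip() == "1"

print(parse_context("u:r:magisk:s0"))
print(parse_context("u:r:untrusted_app:s0:c512,c768"))
print(enforcing("1\n"))
```

The third field (the type, called a domain for processes) is the part the policy rules are written against, which is why a dedicated type is needed for the daemon.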

Systemless Working:
Since the init file is built into boot.img, modifying boot.img is unavoidable, and modifying /system becomes unnecessary. That's where the term systemless was coined (13, 14). The main concern was to make OTAs easier: re-flashing the boot image (and recovery) is less hassle than re-flashing system. A block-based OTA on a modified /system partition will fail because block-based updates enable the use of dm-verity to cryptographically verify the system partition, and any modification breaks that verification.

System-as-root:
On newer devices using system-as-root, the kernel doesn't load the ramdisk from boot but from system. So [system.img]/init needs to be replaced with Magisk's init. Magisk also modifies /init.rc and places its own files in /root and /sbin. This means system.img would have to be modified, but Magisk's approach is not to touch the system partition.

On A/B devices, during a normal boot the bootloader passes the skip_initramfs option in the kernel cmdline, because boot.img contains the ramdisk for recovery. So Magisk patches the kernel binary to always ignore skip_initramfs, i.e. to always boot into recovery, and places the Magisk init binary in the recovery ramdisk inside boot.img. When the kernel boots to recovery, Magisk init checks the cmdline: if there's no skip_initramfs, i.e. the user intentionally booted to recovery, it simply executes the recovery init. Otherwise Magisk init mounts system.img at /system_root, copies the contents of the ramdisk to / after cleaning everything that previously existed there, adds/modifies files in rootfs /, bind-mounts /system_root/system to /system, and finally executes [/system]/init (15, 16).
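The branch that Magisk init takes on such A/B devices can be summarised as a small decision function. This only restates the steps described above as data; the function name is hypothetical and nothing here touches a real device.

```python
def magisk_init_plan(kernel_cmdline):
    """Return the steps described above for a given kernel cmdline."""
    params = kernel_cmdline.split()
    if "skip_initramfs" not in params:
        # The user deliberately booted to recovery.
        return ["exec recovery init"]
    # Normal boot: the bootloader asked to skip the ramdisk, but the
    # patched kernel loaded it anyway, so Magisk init sets up the system.
    return [
        "mount system.img at /system_root",
        "copy ramdisk contents to / (cleaning what was there)",
        "add/modify files in rootfs /",
        "bind-mount /system_root/system at /system",
        "exec /system/init",
    ]

for step in magisk_init_plan("androidboot.slot_suffix=_a skip_initramfs"):
    print(step)
```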

However, things have changed again with Android Q: now /system is mounted at /, but the files to be added/modified, like /init, /init.rc and /sbin, are overlaid with bind mounts (17).

On non-A/B system-as-root devices, Magisk needs to be installed to the recovery ramdisk in order to retain the systemless approach, because boot.img contains no ramdisk (18).

Modules:
An additional benefit of the systemless approach is the use of Magisk Modules. If you want to place some binaries under /system/*bin/, or modify some configuration files (like hosts or dnsmasq.conf) or some libraries / framework files (such as those required by mods like XPOSED) in /system or /vendor, you can do that without actually touching the partition by making use of Magic Mount (based on bind mounts). Magisk supports adding as well as removing files by overlaying them.
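The effect of overlaying module files on top of /system can be pictured with a toy model in which a file tree is just a dictionary from path to content. This is only a model of the resulting view, not of how Magic Mount is implemented; the REMOVE marker and the paths are hypothetical.

```python
REMOVE = object()  # hypothetical marker meaning "hide this path"

def overlay(base, *modules):
    """Return the file view after layering modules over base.

    base and every module map path -> content. Later modules win.
    The inputs are left unmodified, just as the real partition is.
    """
    view = dict(base)
    for module in modules:
        for path, content in module.items():
            if content is REMOVE:
                view.pop(path, None)
            else:
                view[path] = content
    return view

system = {
    "/system/etc/hosts": "127.0.0.1 localhost",
    "/system/app/Bloat.apk": "<apk>",
}
hosts_module = {"/system/etc/hosts": "127.0.0.1 localhost\n0.0.0.0 ads.example"}
debloat_module = {"/system/app/Bloat.apk": REMOVE}

view = overlay(system, hosts_module, debloat_module)
print(sorted(view))
print(system["/system/etc/hosts"])  # the underlying "partition" is unchanged
```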

MagiskHide: (19)
Another challenge was to hide the presence of Magisk so that apps can't tell that the device is rooted. Many apps don't like rooted devices and may stop working. Google was one of the most affected parties, so they introduced SafetyNet as a part of Play Protect, which runs as a GMS (Play Services) process and tells apps (including their own Google Pay), and hence their developers, whether the device is currently in a non-tampered state (20).

Rooting is one of many possible tampered states, others being un-Verified Boot, unlocked bootloader, CTS non-certification, custom ROM, debuggable build, permissive SELinux, ADB turned on, some bad properties, presence of Lucky Patcher, Xposed etc. Magisk uses some tricks to make sure that most of these tests always pass, though apps can make use of other Android APIs or read some files directly. Some modules provide additional obfuscation.

Other than hiding its presence from Google's SafetyNet, Magisk also lets users hide root (the su binary and any other Magisk-related files) from any app, again making use of bind mounts and mount namespaces. For this, zygote has to be continuously watched for newly forked apps' VMs.

However, it's a tough task to really hide a rooted device from apps, as new techniques keep evolving to detect Magisk's presence, mainly from /proc or other filesystems. So a number of quirks are needed to properly hide modifications from detection. Magisk tries to remove all traces of its presence during the boot process (21).
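To illustrate why /proc matters here, the sketch below scans text in the format of /proc/self/mountinfo for entries that look like leftovers of systemless modifications. It shows the kind of check a detecting app might do and, by implication, what has to be unmounted in a hidden app's mount namespace. The sample lines and the search strings are made up; real detection and hiding are considerably more involved.

```python
# Made-up sample in the format of /proc/self/mountinfo.
SAMPLE_MOUNTINFO = """\
21 20 253:0 / / ro,relatime shared:1 - ext4 /dev/block/dm-0 ro
95 21 0:19 / /sbin rw,relatime shared:5 - tmpfs magisk rw
97 21 253:3 /adb/modules/hosts/system/etc/hosts /system/etc/hosts ro,relatime shared:7 - ext4 /dev/block/dm-3 rw
"""

def suspicious_mounts(mountinfo_text, needles=("magisk", "/adb/modules")):
    """Return (mount point, source) pairs whose root, mount point or
    source contains one of the search strings."""
    hits = []
    for line in mountinfo_text.splitlines():
        # A " - " separator splits the per-mount fields from the
        # filesystem type, source and superblock options.
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        pre, post = left.split(), right.split()
        if len(pre) < 5 or len(post) < 2:
            continue
        root, mount_point, source = pre[3], pre[4], post[1]
        fields = (root, mount_point, source)
        if any(n in field for n in needles for field in fields):
            hits.append((mount_point, source))
    return hits

for hit in suspicious_mounts(SAMPLE_MOUNTINFO):
    print(hit)
```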


Magisk also supports:

  • Disabling dm-verity and /data encryption by modifying fstab (in ramdisk, /vendor or DTB). See How to disable dm-verity on Android?
  • Changing read-only properties using the resetprop tool, modifying boot.img using magiskboot, and modifying SELinux policy using magiskpolicy.
  • Executing boot scripts using an init.d-like mechanism (22).

That's a brief description of Magisk's currently offered features (AFAIK).
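As a rough illustration of the first item in the list above, disabling dm-verity or forced encryption through fstab comes down to editing the fs_mgr flags column of an entry. The sketch below is a simplified model and not Magisk's code: which flags to drop or rename (verify, avb, forceencrypt to encryptable) is an assumption chosen for the example, and the sample entries are made up.

```python
def relax_fstab_line(line,
                     drop=("verify", "avb"),
                     rename=(("forceencrypt", "encryptable"),)):
    """Return an fstab entry with selected fs_mgr flags dropped or renamed.

    Expected columns: source, mount point, type, mount flags, fs_mgr flags.
    Column spacing is normalised to single spaces in the result.
    """
    fields = line.split()
    if line.lstrip().startswith("#") or len(fields) < 5:
        return line  # comment or not a five-column entry: leave untouched
    renames = dict(rename)
    kept = []
    for flag in fields[4].split(","):
        # Flags may carry a value, e.g. avb=vbmeta or forceencrypt=footer.
        name, sep, value = flag.partition("=")
        if name in drop:
            continue
        kept.append(renames.get(name, name) + sep + value)
    fields[4] = ",".join(kept) or "defaults"
    return " ".join(fields)

print(relax_fstab_line(
    "/dev/block/by-name/system /system ext4 ro wait,verify"))
print(relax_fstab_line(
    "/dev/block/by-name/userdata /data ext4 noatime,nosuid wait,check,forceencrypt=footer"))
```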


FURTHER READING:

  • How does SuperSU provide root privilege?
  • How to manually root a phone?
  • Android Partitions and Filesystems
  • Android Boot Process
  • What special privileges “/system/xbin/su” does have w.r.t. root access?
  • What sepolicy context will allow any other context to access it?

Magisk provides root access by providing a working "root" binary mounted at /sbin/magisk. Any application that runs this binary triggers Magisk to grant it root access, which is in turn managed and maintained by the Magisk Manager application.

The /boot partition is a separate partition that stores data required to boot the system. It handles the initialization of some very low-level mechanisms, like the Linux kernel, device drivers, file systems etc., before the upper-layer Android OS is brought up. It is separated in such a way that Linux-level stuff is stored in it, while Android-level stuff (SystemUI, Settings etc.) is stored in the /system partition. Modifying /boot does not count as modifying /system, and /system is what DM-verity and AVB usually check.

Magisk patches and integrates itself into the /boot partition, so it doesn't touch the system partition at all. It uses a technique called a "bind mount" to change the content of system files that other programs see, without actually modifying the underlying filesystem of the system partition (so the "real" files are left intact).

Tags:

Rooting

Magisk