How to "jail" a process without being root?

More similar Qs with more answers worth attention:

  • https://stackoverflow.com/q/3859710/94687
  • https://stackoverflow.com/q/4410447/94687
  • https://stackoverflow.com/q/4249063/94687
  • https://stackoverflow.com/q/1019707/94687

NOTE: Some of the answers there point to specific solutions not yet mentioned here.

Actually, there are quite a few jailing tools with different implementation, but many of them are either not secure by design (like fakeroot, LD_PRELOAD-based), or not complete (like fakeroot-ng, ptrace-based), or would require root (chroot, or plash mentioned at fakechroot warning label).

These are just examples; I thought of listing them all side-by-side, with indication of these 2 features ("can be trusted?", "requires root to set up?"), perhaps at Operating-system-level virtualization Implementations.

In general, the answers there cover the full described range of possibilities and even more:

virtual machines/OS

  • (the answer mentioning virtual machines/OS)

kernel extension (like SELinux)

  • (mentioned in comments here),

chroot

Chroot-based helpers (which however must be setUID root, because chroot requires root; or perhaps chroot could work in an isolated namespace--see below):

[to tell a little more about them!]

Known chroot-based isolation tools:

  • hasher with its hsh-run and hsh-shell commands. (Hasher was designed for building software in a safe and repeatable manner.)
  • schroot mentioned in another answer
  • ...

ptrace

Another trustworthy isolation solution (besides a seccomp-based one) would be the complete syscall-interception through ptrace, as explained in the manpage for fakeroot-ng:

Unlike previous implementations, fakeroot-ng uses a technology that leaves the traced process no choice regarding whether it will use fakeroot-ng's "services" or not. Compiling a program statically, directly calling the kernel and manipulating ones own address space are all techniques that can be trivially used to bypass LD_PRELOAD based control over a process, and do not apply to fakeroot-ng. It is, theoretically, possible to mold fakeroot-ng in such a way as to have total control over the traced process.

While it is theoretically possible, it has not been done. Fakeroot-ng does assume certain "nicely behaved" assumptions about the process being traced, and a process that break those assumptions may be able to, if not totally escape then at least circumvent some of the "fake" environment imposed on it by fakeroot-ng. As such, you are strongly warned against using fakeroot-ng as a security tool. Bug reports that claim that a process can deliberatly (as opposed to inadvertly) escape fake‐ root-ng's control will either be closed as "not a bug" or marked as low priority.

It is possible that this policy be rethought in the future. For the time being, however, you have been warned.

Still, as you can read it, fakeroot-ng itself is not designed for this purpose.

(BTW, I wonder why they have chosen to use the seccomp-based approach for Chromium rather than a ptrace-based...)

Of the tools not mentioned above, I have noted Geordi for myself, because I liked that the controlling program is written in Haskell.

Known ptrace-based isolation tools:

  • Geordi
  • proot
  • fakeroot-ng
  • ... (see also How to achieve the effect of chroot in userspace in Linux (without being root)?)

seccomp

One known way to achieve isolation is through the seccomp sandboxing approach used in Google Chromium. But this approach supposes that you write a helper which would process some (the allowed ones) of the "intercepted" file access and other syscalls; and also, of course, make effort to "intercept" the syscalls and redirect them to the helper (perhaps, it would even mean such a thing as replacing the intercepted syscalls in the code of the controlled process; so, it doesn't sound to be quite simple; if you are interested, you'd better read the details rather than just my answer).

More related info (from Wikipedia):

  • http://en.wikipedia.org/wiki/Seccomp
  • http://code.google.com/p/seccompsandbox/wiki/overview
  • LWN article: Google's Chromium sandbox, Jake Edge, August 2009
  • seccomp-nurse, a sandboxing framework based on seccomp.

(The last item seems to be interesting if one is looking for a general seccomp-based solution outside of Chromium. There is also a blog post worth reading from the author of "seccomp-nurse": SECCOMP as a Sandboxing solution ?.)

The illustration of this approach from the "seccomp-nurse" project:

                      enter image description here

A "flexible" seccomp possible in the future of Linux?

There used to appear in 2009 also suggestions to patch the Linux kernel so that there is more flexibility to the seccomp mode--so that "many of the acrobatics that we currently need could be avoided". ("Acrobatics" refers to the complications of writing a helper that has to execute many possibly innocent syscalls on behalf of the jailed process and of substituting the possibly innocent syscalls in the jailed process.) An LWN article wrote to this point:

One suggestion that came out was to add a new "mode" to seccomp. The API was designed with the idea that different applications might have different security requirements; it includes a "mode" value which specifies the restrictions that should be put in place. Only the original mode has ever been implemented, but others can certainly be added. Creating a new mode which allowed the initiating process to specify which system calls would be allowed would make the facility more useful for situations like the Chrome sandbox.

Adam Langley (also of Google) has posted a patch which does just that. The new "mode 2" implementation accepts a bitmask describing which system calls are accessible. If one of those is prctl(), then the sandboxed code can further restrict its own system calls (but it cannot restore access to system calls which have been denied). All told, it looks like a reasonable solution which could make life easier for sandbox developers.

That said, this code may never be merged because the discussion has since moved on to other possibilities.

This "flexible seccomp" would bring the possibilities of Linux closer to providing the desired feature in the OS, without the need to write helpers that complicated.

(A blog posting with basically the same content as this answer: http://geofft.mit.edu/blog/sipb/33.)

namespaces (unshare)

Isolating through namespaces (unshare-based solutions) -- not mentioned here -- e.g., unsharing mount-points (combined with FUSE?) could perhaps be a part of a working solution for you wanting to confine filesystem accesses of your untrusted processes.

More on namespaces, now, as their implementation has been completed (this isolation technique is also known under the nme "Linux Containers", or "LXC", isn't it?..):

"One of the overall goals of namespaces is to support the implementation of containers, a tool for lightweight virtualization (as well as other purposes)".

It's even possible to create a new user namespace, so that "a process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace. This means that the process has full root privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace".

For real working commands to do this, see the answers at:

  • Is there a linux vfs tool that allows bind a directory in different location (like mount --bind) in user space?
  • Simulate chroot with unshare

and special user-space programming/compiling

But well, of course, the desired "jail" guarantees are implementable by programming in user-space (without additional support for this feature from the OS; maybe that's why this feature hasn't been included in the first place in the design of OSes); with more or less complications.

The mentioned ptrace- or seccomp-based sandboxing can be seen as some variants of implementing the guarantees by writing a sandbox-helper that would control your other processes, which would be treated as "black boxes", arbitrary Unix programs.

Another approach could be to use programming techniques that can care about the effects that must be disallowed. (It must be you who writes the programs then; they are not black boxes anymore.) To mention one, using a pure programming language (which would force you to program without side-effects) like Haskell will simply make all the effects of the program explicit, so the programmer can easily make sure there will be no disallowed effects.

I guess, there are sandboxing facilities available for those programming in some other language, e.g., Java.

  • Cf. "Sandboxed Haskell" project proposal.

  • NaCl--not mentioned here--belongs to this group, doesn't it?


Some pages accumulating info on this topic were also pointed at in the answers there:

  • page on Google Chrome's sandboxing methods for Linux
  • sandboxing.org group

This is a fundamental limitation of the unix permission model: only root can delegate.

You don't need to be root to run a virtual machine (not true of all VM technologies), but this is a heavyweight solution.

User-mode Linux is a relatively lightweight Linux-on-Linux virtualization solution. It's not that easy to set up; you'll need to populate a root partition (in a directory) with at least the minimum needed to boot (a few files in /etc, /sbin/init and its dependencies, a login program, a shell and utilities).