Linux Container primitives

Xin Cheng
Oct 29, 2021

Recently I had to block traffic from a rootless container launched by podman to a specific IP, while still allowing the same traffic from the host. A plain firewall or iptables rule on the host cannot solve this, because podman uses slirp4netns to set up the container's network namespace, so traffic from the container is largely indistinguishable from traffic from the host. Podman also has nothing like a Kubernetes network policy. In the end, the working solution came from understanding how namespaces work in Linux and in Linux containers.

The most important enablers for Linux containers are namespaces and cgroups. There are already plenty of articles about them, so I will just capture the high-level use cases and build some quick intuition.

Namespace

  1. Namespaces are used for process isolation
  2. There are 7 main namespaces: cgroup, ipc, mnt, net, pid, user, uts (kernels 5.6 and later also add a time namespace)
  3. A process can share zero or more namespaces with other processes
  4. Linux uses the Virtual File System to give userspace programs a filesystem interface to the kernel, e.g. /proc holds a lot of per-process information. /proc/<pid>/ns has one entry per namespace: each entry (e.g. /proc/$$/ns/uts) is a magic symlink to a kernel object identified by an inode number, which you can read with 'readlink /proc/<pid>/ns/net'
  5. There are a few primitives for manipulating namespaces, e.g. clone (create a child process in a new namespace), unshare (run a program with some namespaces unshared from its parent), setns (join an existing namespace)
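To make point 4 concrete, here is a minimal Python sketch that reads the /proc/<pid>/ns symlinks and compares two processes. The helper names (parse_ns_link, namespaces_of, shared_namespaces) are made up for illustration, and the sketch assumes a Linux host where the link targets look like 'net:[4026531992]'.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "net:[4026531992]".
NS_LINK = re.compile(r"^(?P<kind>\w+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (namespace type, inode number)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def namespaces_of(pid="self"):
    """Map each entry in /proc/<pid>/ns to the inode number of its namespace."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry: skip it
        result[name] = inode
    return result


def shared_namespaces(pid_a, pid_b):
    """Names of the namespaces that two processes have in common."""
    a, b = namespaces_of(pid_a), namespaces_of(pid_b)
    return sorted(name for name in a if name in b and a[name] == b[name])


if __name__ == "__main__":
    for name, inode in namespaces_of().items():
        print(f"{name:20} {inode}")
```

Two processes are in the same namespace exactly when the inode numbers match, so comparing the output for a host shell and a container process shows which namespaces the container really has to itself. Reading another user's /proc/<pid>/ns may require extra privileges.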

cgroups

cgroups control resource allocation to processes. Service providers can use them to back an SLA to customers (e.g. 50% to customer A, 50% to customer B). The most commonly controlled resources are CPU, memory, and block I/O (blkio).
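As a rough illustration of the 50% case, the sketch below assumes cgroup v2, where the CPU controller accepts a hard cap through a cpu.max file containing "<quota> <period>" in microseconds, or "max <period>" for no limit. The helper names are hypothetical, and nothing here writes to /sys/fs/cgroup; it only builds and parses the values.

```python
def cpu_max_value(fraction, period_us=100_000):
    """Build a cgroup v2 cpu.max value capping a group at `fraction` of one CPU."""
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    quota_us = int(round(fraction * period_us))
    if quota_us < 1:
        raise ValueError("fraction is too small for this period")
    return f"{quota_us} {period_us}"


def parse_cpu_max(text):
    """Return the CPU fraction encoded in a cpu.max value, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


if __name__ == "__main__":
    # Hypothetical customers, each capped at half of one CPU.
    for customer in ("customer-a", "customer-b"):
        print(customer, "->", cpu_max_value(0.5))
```

Note that cpu.max is a hard ceiling. If the goal is to split CPU proportionally only when both customers are busy, the cpu.weight file of the same controller is the closer fit.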

Capabilities

Capabilities grant granular permissions on specific “privileged” tasks to unprivileged processes.

Process capabilities: tied to the process's user or inherited from its parent process

File capabilities: another level of permission, encoded in the extended attributes of the binary file that the process runs. If the process has the proper capabilities but the binary it is trying to run does not, the operation may still be denied because a required capability is missing.
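To see which capabilities a process actually holds, one option is to decode the CapEff bitmask that Linux exposes in /proc/<pid>/status. The sketch below is a minimal example with hypothetical helper names; its lookup table covers only a handful of well-known capability numbers, and any other bit is reported as cap_<n>.

```python
# A few well-known capability bit numbers (not the full list).
CAP_NAMES = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    5: "CAP_KILL",
    6: "CAP_SETGID",
    7: "CAP_SETUID",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    21: "CAP_SYS_ADMIN",
}


def decode_caps(hex_mask):
    """Turn a hex bitmask such as a CapEff value into capability names."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            names.append(CAP_NAMES.get(bit, f"cap_{bit}"))
        bit += 1
    return names


def effective_caps(pid="self"):
    """Read the effective capability set of a process from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return []


if __name__ == "__main__":
    print(effective_caps())
```

An unprivileged shell typically shows an empty effective set, while a root shell on the host shows nearly everything, which makes the difference between the two easy to spot.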

Next time, we will talk about how Docker uses these Linux primitives.
