How to properly set up zram and swap

swapon has a -p switch which sets the priority. I can set up:

swapon -p 32767 /dev/zram0
swapon -p 0 /dev/my-lvm-volume/swap

Or in /etc/fstab:

/dev/zram0              none swap sw,pri=32767 0 0
/dev/my-lvm-volume/swap none swap sw,pri=0     0 0
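
To confirm that the priorities took effect, something like the following sketch may help. It assumes the usual /proc/swaps layout (a header row, device in column 1, priority in column 5); the function name swaps_by_priority is made up for illustration.

```shell
# List swap areas, highest priority first.
# Reads /proc/swaps by default; another file can be passed (handy for testing).
swaps_by_priority() {
    swaps_file="${1:-/proc/swaps}"
    # Skip the header row; print "priority device" and sort numerically, descending.
    awk 'NR > 1 { print $5, $1 }' "$swaps_file" | sort -rn
}

# Usage:
#   swaps_by_priority
```

With the settings above, /dev/zram0 should be listed first, since the kernel uses higher-priority swap areas before lower-priority ones.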

EDIT: For a complete solution, a udev rule like this may be helpful:

KERNEL=="zram0", ACTION=="add", ATTR{disksize}="1073741824", RUN="/sbin/mkswap /$root/$name"

Sidenote: because of per-CPU locking, it is important to have as many zram swap devices as CPUs (modprobe zram num_devices=n) instead of a single big one. RTFM!


For some reason there seems to be a lot of misinterpretation of https://www.kernel.org/doc/Documentation/blockdev/zram.txt

It clearly states:

2) Set max number of compression streams
Regardless the value passed to this attribute, ZRAM will always allocate multiple compression streams - one per online CPUs - thus allowing several concurrent compression operations. The number of allocated compression streams goes down when some of the CPUs become offline. There is no single-compression-stream mode anymore, unless you are running a UP system or has only 1 CPU online.

To find out how many streams are currently available:

cat /sys/block/zram0/max_comp_streams

But there is a common, persistent urban myth that max streams is 1.

It's plainly not true.

The two OSes where zram has proven effective, Chrome OS and Android, use a single device. They also tweak page-cluster:

page-cluster controls the number of pages up to which consecutive pages are read in from swap in a single attempt. This is the swap counterpart to page cache readahead.
The mentioned consecutivity is not in terms of virtual/physical addresses, but consecutive on swap space – that means they were swapped out together.

It is a logarithmic value – setting it to zero means "1 page", setting it to 1 means "2 pages", setting it to 2 means "4 pages", etc. Zero disables swap readahead completely.

The default value is three (eight pages at a time). There may be some small benefits in tuning this to a different value if your workload is swap-intensive.

Lower values mean lower latencies for initial faults, but at the same time extra faults and I/O delays for following faults if they would have been part of that consecutive pages readahead would have brought in.

                — from the kernel documentation for /proc/sys/vm/*

So use echo "0" > /proc/sys/vm/page-cluster to force single-page reads (no swap readahead).
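
As a quick illustration of the logarithmic scale quoted above, here is a minimal sketch. The helper name page_cluster_pages and the sysctl.d file name are assumptions for the example, not anything from the kernel docs.

```shell
# Number of pages read per swap-in attempt for a given page-cluster value (2^n).
page_cluster_pages() {
    echo $(( 1 << $1 ))
}

# Usage:
#   page_cluster_pages 3    # the kernel default
#   page_cluster_pages 0    # readahead disabled
#
# Applying the setting needs root. At runtime:
#   sysctl -w vm.page-cluster=0
# To persist it across reboots (file name is hypothetical):
#   echo 'vm.page-cluster = 0' > /etc/sysctl.d/99-zram.conf
```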

Much of this seems to originate from zram_config, the Debian/Ubuntu package, which for some reason bears very little relation to the kernel documentation for zram and has bred a series of Chinese whispers that could, in essence, be completely wrong.

With disk-backed swap, do you create a swap device for each core? That may answer your question. To back this up: Google's Chrome OS and Android both successfully use single devices, together with the page-cluster setting above, because zram does not behave like a disk and so latency can be improved by turning readahead off.

Also, what matters to a sysadmin: actual memory usage or virtual memory usage? Most examples create the device via disksize and totally ignore mem_limit. disksize = uncompressed virtual size. mem_limit = limit on the actual memory footprint.

This makes the choice of disksize confusing: it is a virtual maximum size that depends on the compression ratio of the chosen comp_algorithm, and an unused device still costs about 0.1% of the disk size in overhead. In practice it is a guesstimate of mem_limit * (roughly 2 to 4), depending on whether you are being frugal or optimistic.
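
A rough sketch of that guesstimate, assuming you start from the memory footprint you are willing to spend and an expected compression ratio. Both helper names are invented for the example, and the ratio is a guess you supply, not something zram reports up front.

```shell
# disksize (bytes) for a given mem_limit (bytes) and an assumed compression ratio.
zram_disksize_for() {
    limit_bytes=$1
    ratio=$2
    echo $(( limit_bytes * ratio ))
}

# Approximate idle overhead in bytes: about 0.1% of disksize.
zram_idle_overhead() {
    echo $(( $1 / 1000 ))
}

# Usage, for a 512 MiB footprint and a cautious 2:1 ratio:
#   zram_disksize_for 536870912 2
# Then, as root, on a freshly created device:
#   echo 1073741824 > /sys/block/zram0/disksize
#   echo 536870912  > /sys/block/zram0/mem_limit
```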

zram_config doesn't even check whether another service is already using zram and simply overwrites it, whereas a simple check of the zram sys class, like the one below, avoids that.

createZramSwaps () {
        # Expects MEM_FACTOR (percent of RAM), BIG_CORES (number of devices),
        # COMP_ALG_SWAP and SWAP_PRI to be set by the caller.
        # free reports KiB, so multiply by 1024 to get the per-device size in bytes.
        totalmem=$(free | awk '/^Mem:/{print $2}')
        mem=$(( totalmem * MEM_FACTOR / 100 / BIG_CORES * 1024 ))

        ZRAM_SYS_DIR='/sys/class/zram-control'
        for i in $(seq "${BIG_CORES}"); do
                if [ "${i}" -eq 1 ] && [ ! -d "${ZRAM_SYS_DIR}" ]; then
                        # zram not loaded yet: loading the module creates zram0
                        modprobe zram
                        RAM_DEV='0'
                else
                        # zram already present: ask for a fresh device rather than overwrite one
                        RAM_DEV=$(cat "${ZRAM_SYS_DIR}/hot_add")
                fi
                # The algorithm must be set before disksize
                echo "${COMP_ALG_SWAP}" > "/sys/block/zram${RAM_DEV}/comp_algorithm"
                echo "${mem}" > "/sys/block/zram${RAM_DEV}/disksize"
                mkswap "/dev/zram${RAM_DEV}"
                swapon -p "${SWAP_PRI}" "/dev/zram${RAM_DEV}"
        done
}
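
The function reads its settings from variables defined elsewhere in the script. A possible set of values is sketched below. All of them are illustrative assumptions, and zram_dev_bytes is a hypothetical helper that only repeats the function's size arithmetic so the result can be checked without root.

```shell
# Illustrative settings only; adjust to the machine.
MEM_FACTOR=40        # percent of total RAM to expose as zram disksize
BIG_CORES=1          # number of zram devices to create
COMP_ALG_SWAP=lz4    # must be listed in /sys/block/zram0/comp_algorithm
SWAP_PRI=32767       # highest swap priority, so zram is used first

# Same arithmetic as createZramSwaps, for a given total memory in KiB.
zram_dev_bytes() {
    echo $(( $1 * MEM_FACTOR / 100 / BIG_CORES * 1024 ))
}

# Usage:
#   zram_dev_bytes 4000000     # per-device disksize in bytes
# As root, after defining createZramSwaps:
#   createZramSwaps
```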