Restrict size of buffer cache in Linux

If you do not want an absolute limit but just want to pressure the kernel to flush out the buffers faster, you should look at vm.vfs_cache_pressure.

This variable controls the tendency of the kernel to reclaim the memory used for caching VFS objects (directories and inodes), versus pagecache and swap. Increasing this value increases the rate at which VFS caches are reclaimed. [1]

It ranges from 0 to 200; move it towards 200 for higher pressure. The default is 100. You can also analyze your memory usage with the slabtop command; in your case, the dentry and *_inode_cache values are likely high.
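For example, you can inspect and raise the value with sysctl (the value 150 here is only an illustration, not a recommendation):

sysctl vm.vfs_cache_pressure
sudo sysctl -w vm.vfs_cache_pressure=150

To make it persistent across reboots, add vm.vfs_cache_pressure=150 to /etc/sysctl.conf.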

If you want an absolute limit, you should look at cgroups. Place the Ceph OSD server within a cgroup and limit the maximum memory it can use by setting the memory.limit_in_bytes parameter for the cgroup. [2]

memory.memsw.limit_in_bytes sets the maximum amount for the sum of memory and swap usage. If no units are specified, the value is interpreted as bytes. However, it is possible to use suffixes to represent larger units: k or K for kilobytes, m or M for megabytes, and g or G for gigabytes.
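A minimal sketch with the cgroup v1 memory controller (the mount point /sys/fs/cgroup/memory is typical on modern distributions; RHEL 6 mounts the controllers under /cgroup instead, and the group name cephosd and the 4G limit are only examples):

sudo mkdir /sys/fs/cgroup/memory/cephosd
echo 4G | sudo tee /sys/fs/cgroup/memory/cephosd/memory.limit_in_bytes
echo <pid-of-ceph-osd> | sudo tee /sys/fs/cgroup/memory/cephosd/tasks

The tasks file accepts one PID per write, so repeat the last line for each OSD process you want placed under the limit.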

References:

[1] - GlusterFS Linux Kernel Tuning

[2] - RHEL 6 Resource Management Guide


I don't know about a percentage limit, but you can set up a timed job so the caches are dropped every x minutes.

First, clear the current caches from a terminal:

sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
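The value written controls what gets dropped (per the kernel's sysctl documentation):

echo 1 > /proc/sys/vm/drop_caches   (frees pagecache only)
echo 2 > /proc/sys/vm/drop_caches   (frees dentries and inodes)
echo 3 > /proc/sys/vm/drop_caches   (frees pagecache, dentries and inodes)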

To make it a cron job, press Alt-F2, type gksudo gedit /etc/crontab, then add this line near the bottom.

 */15 *    * * *   root    sync && echo 3 > /proc/sys/vm/drop_caches

This drops the caches every 15 minutes. You can set it to 1 or 5 minutes if you really want to by changing the first field to * or */5 instead of */15.

To see your free RAM, excepting cache:

free -m | sed -n -e '3p' | grep -Po "\d+$"
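Note that this assumes the older free output, where the third line is "-/+ buffers/cache". On newer procps-ng versions that line is gone, and the "available" column on the Mem: line is the closest equivalent:

free -m | awk '/^Mem:/ {print $7}'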

I think your hunch at the very end of your question is on the right track. I'd suspect either (a) NUMA-aware memory allocation migrating pages between CPUs, or (b), more likely, the defrag code of transparent hugepages trying to find contiguous, aligned regions.

Hugepages and transparent hugepages have been credited with marked performance improvements on certain workloads, and blamed for consuming enormous amounts of CPU time on others without providing much benefit.

It'd help to know which kernel you're running, the contents of /proc/meminfo (or at least the HugePages_* values), and, if possible, more of the VTune profiler call graph referencing pageblock_pfn_to_page().
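For the meminfo part, something like this is enough:

grep -i huge /proc/meminfo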

Also, if you'll indulge my guess, try disabling hugepage defrag with:

echo 'never' >/sys/kernel/mm/transparent_hugepage/defrag

(depending on your kernel, it may be this path instead:)

echo 'never' > /sys/kernel/mm/redhat_transparent_hugepage/defrag
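You can check the current setting first; the active value is shown in brackets:

cat /sys/kernel/mm/transparent_hugepage/defrag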

Lastly, is this app that's using many tens of gigs of RAM something you wrote? What language is it written in?

Since you used the term "faulting in memory pages," I'm guessing you're familiar enough with operating system design and virtual memory. I struggle to envision a situation or application that would be faulting so aggressively without reading in lots of I/O, which almost always comes from the very buffer cache you're trying to limit.

(If you're curious, check out mmap(2) flags like MAP_ANONYMOUS and MAP_POPULATE, and mincore(2), which can be used to see which virtual pages actually have a mapped physical page.)

Good Luck!