How is page size determined in virtual address space?

You can find out a system's default page size by querying its configuration via the getconf command:

$ getconf PAGE_SIZE
4096

or

$ getconf PAGESIZE
4096

NOTE: The value is reported in bytes, so 4096 means 4096 bytes, or 4kB.

For a given kernel build, the page size is fixed at compile time in the kernel's source:

Example

$ more /usr/src/kernels/3.13.9-100.fc19.x86_64/include/asm-generic/page.h
...
...
/* PAGE_SHIFT determines the page size */

#define PAGE_SHIFT  12
#ifdef __ASSEMBLY__
#define PAGE_SIZE   (1 << PAGE_SHIFT)
#else
#define PAGE_SIZE   (1UL << PAGE_SHIFT)
#endif
#define PAGE_MASK   (~(PAGE_SIZE-1))

How does shifting give you 4096?

Shifting a value left by one bit multiplies it by 2. Shifting 1 left by PAGE_SHIFT bits (1 << PAGE_SHIFT, with PAGE_SHIFT set to 12) therefore doubles it twelve times, which gives 2^12 = 4096.

$ echo "2^12" | bc
4096
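The same arithmetic can be sketched in C, along with what PAGE_MASK is good for. This is only an illustration modelled on the header quoted above; the function names are made up for this example and are not kernel APIs.

```c
#include <assert.h>
#include <limits.h>

/* Page size in bytes for a given shift: 1 << shift, i.e. 2^shift. */
unsigned long page_size_from_shift(unsigned int shift)
{
    /* Shifting by the full width of the type (or more) is undefined in C. */
    assert(shift < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << shift;
}

/* Same construction as PAGE_MASK in the header: clear the low `shift` bits. */
unsigned long page_mask_from_shift(unsigned int shift)
{
    return ~(page_size_from_shift(shift) - 1);
}

/* Start address of the page that contains addr. */
unsigned long page_base(unsigned long addr, unsigned int shift)
{
    return addr & page_mask_from_shift(shift);
}

/* Byte offset of addr inside its page. */
unsigned long page_offset(unsigned long addr, unsigned int shift)
{
    return addr & ~page_mask_from_shift(shift);
}
```

With a shift of 12, page_size_from_shift returns 4096, and the address 0x12345 splits into a page base of 0x12000 and an offset of 0x345.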

The hardware (specifically, the MMU, which is part of the CPU) determines what page sizes are possible. There is no relation to the processor register size and only an indirect relation to the address space size (in that the MMU determines both).

Almost all architectures support a 4kB page size. Some architectures support larger pages (and a few also support smaller pages), but 4kB is a very widespread default.

Linux supports two page sizes:

  • Normal-sized pages, which I believe are 4kB by default on all architectures, though some architectures allow other values, e.g. 16kB on ARM64 or 8kB, 16kB or 64kB on IA64. These correspond to the deepest level of descriptors on the MMU (what Linux calls PTE).
  • Huge pages, if compiled in (CONFIG_HUGETLB_PAGE is necessary, and CONFIG_HUGETLBFS as well for most uses). This usually corresponds to the second-deepest level of MMU descriptors (what Linux calls PMD), though I don't know if this holds on all architectures.
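If you want to check the huge page size on a running system, one possible approach is to read it from /proc/meminfo. The sketch below assumes the kernel exposes a line of the form "Hugepagesize:  2048 kB" there, which is not something the text above states; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one /proc/meminfo line. Returns the huge page size in kB,
 * or -1 if this is not the "Hugepagesize:" line.
 * Assumption: the line looks like "Hugepagesize:       2048 kB". */
long parse_hugepagesize_kb(const char *line)
{
    long kb;
    assert(line != NULL);
    if (sscanf(line, "Hugepagesize: %ld kB", &kb) == 1)
        return kb;
    return -1;
}

/* Scan /proc/meminfo. Returns the huge page size in kB, or -1 if the file
 * cannot be read or has no such line (e.g. huge pages not compiled in). */
long read_hugepagesize_kb(void)
{
    char line[256];
    long kb = -1;
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    while (kb < 0 && fgets(line, sizeof line, f) != NULL)
        kb = parse_hugepagesize_kb(line);
    fclose(f);
    return kb;
}
```

A return value of -1 from read_hugepagesize_kb would suggest that huge page support is not available on that kernel.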

The page size is a compromise between memory wasted in partially used pages, memory used by page tables, and speed.

  • A larger page size means more waste when a page is partially used, so the system runs out of memory sooner.
  • A deeper MMU descriptor level means more kernel memory for page tables.
  • A deeper MMU descriptor level means more time spent in page table traversal.
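To put rough numbers on the first two points, here is a small sketch of the arithmetic. The helper names are made up, and the 2MB huge page size used in the note below is an assumption (a common value, but not one given above).

```c
#include <assert.h>
#include <stddef.h>

/* Pages needed to hold `bytes`, rounding up to whole pages.
 * This is also the number of lowest-level descriptors the mapping needs,
 * since each page takes one descriptor. */
size_t pages_needed(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return bytes / page_size + (bytes % page_size != 0);
}

/* Bytes allocated but left unused in the last, partially filled page. */
size_t wasted_bytes(size_t bytes, size_t page_size)
{
    return pages_needed(bytes, page_size) * page_size - bytes;
}
```

For a 10000-byte region, 4kB pages need 3 pages and waste 2288 bytes, while a single 2MB page wastes 2087152 bytes. In the other direction, mapping 1GB takes 262144 descriptors with 4kB pages but only 512 with 2MB pages.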

The gains of larger page sizes are tiny for most applications, whereas the cost is substantial. This is why most systems use only normal-sized pages.

You can query the (normal) page size on your system with the getconf utility or the C function sysconf.

$ getconf PAGE_SIZE
4096
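The C equivalent might look like the following minimal sketch. sysconf and _SC_PAGESIZE come from POSIX <unistd.h>; the wrapper name query_page_size is just something I picked for the example.

```c
#include <assert.h>
#include <unistd.h>

/* Return the normal page size in bytes, or -1 if sysconf fails. */
long query_page_size(void)
{
    long size = sysconf(_SC_PAGESIZE);
    /* Sanity check: a page size is a power of two (1 << PAGE_SHIFT). */
    assert(size <= 0 || (size & (size - 1)) == 0);
    return size;
}
```

Calling query_page_size() from a program should report the same value as getconf PAGE_SIZE on the same machine.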

Using huge pages requires mounting the hugetlbfs filesystem and mmapping files there.
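As a rough sketch of the mmap step, assuming hugetlbfs is already mounted somewhere: the function name is hypothetical, the path you pass depends entirely on where your system mounts hugetlbfs, and the length should be a multiple of the huge page size.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `length` bytes of a file that lives on a hugetlbfs mount.
 * `path` is an assumption: a file under your hugetlbfs mount point.
 * Returns the mapping, or NULL if opening or mapping fails. */
void *map_hugetlbfs_file(const char *path, size_t length)
{
    assert(path != NULL && length > 0);

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    void *addr = mmap(NULL, length, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    return addr == MAP_FAILED ? NULL : addr;
}
```

When you are done with the memory, release it with munmap(addr, length). If the path is not on a hugetlbfs mount, the call may still succeed but you get normal-sized pages.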