Is a "segmentation fault" a system error or program bug?

(tl;dr: It's almost certainly a bug in your program or a library it uses.)

A segmentation fault indicates that a memory access was not legal. That is, the CPU raises a page fault because the requested page either isn't resident or has permissions that don't allow the requested access.

After that, the kernel checks to see whether it simply doesn't know anything about this page, whether it's just not in memory yet and it should put it there, or whether it needs to perform some special handling (for example, copy-on-write pages are read-only, and this valid page fault may indicate we should copy it and update the permissions). See Wikipedia for minor vs. major (e.g. demand paging) vs. invalid page faults.

Getting a segmentation fault indicates the invalid case: the kernel has no remedial action to perform, because the process either doesn't logically have that page of its virtual address space mapped, or has it mapped without the permissions the access needs (for example, a write to a read-only page that isn't copy-on-write). As such, this almost certainly indicates a bug in either the program or one of its underlying libraries -- for example, attempting to read or write into memory which is not valid for the process. If the address had happened to be valid, it could have caused stack corruption or scribbled over other data, but reading or writing an unmapped page is caught by hardware.
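To make the invalid case concrete, here is a minimal sketch in C, assuming Linux and a POSIX environment. The helper name `signal_from_access` is made up for illustration. It maps one page with a chosen protection, touches it in a child process, and reports which signal (if any) killed the child. `PROT_NONE` stands in for "an address this process may not touch".

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from any library): map one page with protection
 * `prot`, then read (do_write == 0) or write (do_write != 0) its first byte
 * in a child process. Returns the number of the signal that killed the child,
 * 0 if the access succeeded, or -1 if the setup itself failed. */
int signal_from_access(int prot, int do_write)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    volatile char *p = mmap(NULL, (size_t)page, prot,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap((void *)p, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        /* Child: perform the access. If it is illegal, the kernel delivers
         * SIGSEGV and the child never reaches _exit(). */
        if (do_write)
            p[0] = 1;
        else
            (void)p[0]; /* volatile read, so it is not optimised away */
        _exit(0);
    }

    /* Parent: collect the child's fate. */
    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap((void *)p, (size_t)page);
    if (waited != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux I would expect `signal_from_access(PROT_NONE, 0)` and `signal_from_access(PROT_READ, 1)` to both return `SIGSEGV`, while a write to a `PROT_READ | PROT_WRITE` page returns 0. Other systems may report some permission faults as `SIGBUS` instead.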

The reason why it works with your larger dataset and not your smaller dataset is entirely specific to that program. It's probably a bug in that program's logic which is only tripped by the smaller dataset for some reason. For example, your dataset may have a field representing the total number of entries; if that field wasn't updated when the dataset was cut down, your program may blindly read into unallocated memory unless it does other sanity checks.
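As a rough illustration of that kind of sanity check, here is a sketch in C. The `struct dataset` layout and the `sum_entries` function are invented for this example and are not taken from the program in question. The point is only that the claimed count is compared against what is really in the buffer before anything is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of a dataset: the header's `count` says how
 * many entries should follow, `available` is how many are really there. */
struct dataset {
    size_t count;            /* entries the header claims */
    const int32_t *entries;  /* start of the entry array */
    size_t available;        /* entries actually present in the buffer */
};

/* Sum all entries. Returns 0 on success and stores the sum in *out.
 * Returns -1 without reading anything if the header claims more entries
 * than the buffer holds, which is the stale-count bug described above. */
int sum_entries(const struct dataset *d, int64_t *out)
{
    if (d == NULL || out == NULL)
        return -1;
    if (d->count > d->available)
        return -1; /* stale or corrupt count: refuse instead of over-reading */
    if (d->count > 0 && d->entries == NULL)
        return -1;

    int64_t total = 0;
    for (size_t i = 0; i < d->count; i++)
        total += d->entries[i];
    *out = total;
    return 0;
}
```

Without the `count > available` check, a header that still says 10 entries on top of a buffer holding 3 would make the loop walk past the end of the array. That is undefined behaviour: it may read garbage silently, or it may segfault if it reaches an unmapped page.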

It's several orders of magnitude less likely than simply being a software bug, but a segmentation fault may also be an indicator of hardware issues, like faulty memory, a faulty CPU, or your hardware tripping over errata (as an example, see here).

Getting segfaults due to failing hardware often results in sometimes-works behaviour, although a bad bit in physical RAM might get mapped the same way in repeated runs of a program if you don't run anything else in between. You can mostly rule out this possibility by booting memtest86+ to check for failing RAM, and using software like Prime95 to stress-test your CPU (including the FP math FMA execution units).


You can run the program in a debugger like gdb and get the backtrace at the time of the segmentation fault, which will likely indicate the culprit:

% gdb --args ./foo --bar --baz
(gdb) r   # run the program
[...wait for segfault...]
(gdb) bt  # get the backtrace for the current thread

A segmentation fault occurs when a program accesses a memory location that it isn't allowed to access. Often, this is due to dereferencing a null pointer or accessing memory outside the bounds of an allocation.

If the full dataset works but a subset does not:

  • Check whether the program gracefully handles a dataset that lacks a feature. (Maybe you allocate an array based on the features present in the dataset, but then assume a length based on the known list of features from the full dataset?)
  • Is any group empty, and does that cause an issue? More generally, look for any off-by-one error that would only manifest when an array is empty.
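The empty-group case can be sketched in C like this. `group_max` is a made-up helper, not something from your program. It shows the guard that is easy to forget when every group in the full dataset happens to be non-empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: find the largest value in a group.
 * Returns 0 and stores the result in *out, or -1 if the group is empty.
 *
 * Without the n == 0 check, values[0] would already be an out-of-bounds
 * read for an empty group. A loop written with `n - 1` would be worse:
 * with an unsigned size_t, 0 - 1 wraps around to SIZE_MAX. */
int group_max(const double *values, size_t n, double *out)
{
    if (values == NULL || n == 0 || out == NULL)
        return -1;

    double best = values[0];
    for (size_t i = 1; i < n; i++) {
        if (values[i] > best)
            best = values[i];
    }
    *out = best;
    return 0;
}
```

A caller then has to decide what an empty group means (skip it, report it, use a default) instead of crashing on it.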

It can be caused by either. Most often, it's a software bug, as described by Chris, but some hardware issues (especially bad memory and a bad power supply) can lead to segfaults as well. A bad value is read from memory, which leads to executing a corrupted instruction, reading through a corrupted pointer, using a corrupted page table, etc., all of which can lead to a segfault.

The difference, though, is that hardware-based segfaults are very much random, caused by one-in-several-million bit flipping events (if the system is more unstable than that, it doesn't even get to the point of booting up). Segfaults caused by software bugs, on the other hand, can be completely repeatable.
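One rough way to check repeatability is to run the same command several times and count how many runs die from SIGSEGV. The sketch below assumes a POSIX system, and `count_segfaults` is a name invented here. It does not retry `waitpid` on `EINTR`, which a more careful tool would do.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with arguments argv `runs` times and
 * return how many of those runs were killed by SIGSEGV.
 * Returns -1 if fork() or waitpid() fails. */
int count_segfaults(char *const argv[], int runs)
{
    int crashes = 0;
    for (int i = 0; i < runs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127); /* only reached if exec itself failed */
        }
        int status = 0;
        if (waitpid(pid, &status, 0) != pid)
            return -1;
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
            crashes++;
    }
    return crashes;
}
```

If the same input crashes on every run, a software bug is by far the likelier explanation. A crash on only some runs is weaker evidence: it fits flaky hardware, but software bugs that depend on timing, uninitialised memory, or address-space layout randomisation can also be intermittent.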