Is C++ considered a Von Neumann programming language?

TL:DR: The C++ abstract machine is a type of PRAM (Parallel Random Access Machine).

From the Von Neumann Languages Wikipedia article you linked:

Many widely used programming languages such as C, C++ and Java have ceased to be strictly von Neumann by adding support for parallel processing, in the form of threads.

Cease describes a transition from being to not-being. So yes, before C++11 added threads, C++ was strictly a Von Neumann language according to Wikipedia. (And after it's still basically a VN language; having multiple threads sharing the same address-space doesn't fundamentally change how C++ works.)

The interesting parts of being a Von Neumann architecture in this context:

  • Having addressable RAM at all, allowing efficient access (modulo cache / paging) to any object at any time
  • Storing the program in RAM: function pointers are possible and efficient, without requiring an interpreter
  • Having a program counter that steps through instructions in the stored program: The natural model is an imperative programming language that does one thing at a time. This is so fundamental that it's easy to forget it's not the only model! (vs. an FPGA or ASIC or something where all gates potentially do something in parallel every clock cycle. Or a MIMD GPU where a computational "kernel" you write is run over all the data potentially in parallel, without implicit sequencing of what order each element is processed in. Or Computational RAM: put ALUs in the memory chips to bypass the Von Neumann bottleneck)

IDK why the wiki article mentions self-modifying code, though; like most languages, ISO C++ doesn't standardize that and is fully compatible with ahead-of-time compilation for a split-bus / split-address-space Harvard architecture. (No eval or anything else that would require an interpreter or JIT.) Or on a normal CPU (Von Neumann), strict W^X memory protection and never using mprotect to change page permissions from writeable to executable.

Of course most real C++ implementations do provide well-defined ways to write machine-code into a buffer and cast to a function pointer, as extensions. (e.g. GNU C/C++'s __builtin___clear_cache(start, end) is named for I-cache sync, but defined in terms of making it safe to call data as a function wrt. dead-store elimination optimizations as well, so it's possible for code to break without it even on x86 which has coherent I-caches.) So implementations can extend ISO C++ to take advantage of this feature of Von Neumann architectures; ISO C++ is intentionally limited in scope to allow for differences between OSes and stuff like that.

Note that being Von Neumann does not strictly imply supporting indirect addressing modes. Some early CPUs didn't, and self-modifying code (to rewrite an address hard-coded in an instruction) was necessary to implement things that we now use indirection for.

Also note that John Von Neumann was a really famous guy, with his name attached to a lot of fundamental things. Some of the connotations of Von Neumann architecture (as opposed to Harvard) aren't really relevant in all contexts. e.g. the "Von Neumann language" term doesn't so much care about Von Neumann vs. Harvard; It cares about stored-program with a program counter vs. something like Cellular Automata or a Turing machine (with a real tape). Getting extra bandwidth by using a separate bus (or just split caches) to fetch instructions (Harvard) is just a performance optimization, not a fundamental change.

What is an abstract machine model / model of computation anyway?

First of all, there are some models of computation that are weaker than Turing machines, like Finite State Machines. There are also non-sequential models of computation, for example Cellular Automata (Conway's Game of Life), where multiple things happen in parallel at each "step".

The Turing machine is the most widely-known (and mathematically simple) sequential abstract machine that is as "strong" as we know how to make. Without any kind of absolute memory addressing, just relative movement on the tape, it naturally provides infinite storage. This is important, and makes all other kinds of abstract machines very unlike real CPUs in some ways. Remember, these models of computation are used for theoretical computer science, not engineering. Problems like finite amounts of memory or performance aren't relevant to what's computable in theory, only in practice.

If you can compute something on a Turing machine, you can compute it on any other Turing-complete model of computation (by definition), perhaps with a much simpler program or perhaps not. Turing machines aren't very nice to program, or at least very different from assembly language for any real CPU. Most notably, the memory isn't random-access. And they can't easily model parallel computing / algorithms. (If you want to prove things about an algorithm in the abstract, having an implementation of it for an abstract machine of some sort is probably a good thing.)

It's also potentially interesting to prove what features an abstract machine needs to have in order to be Turing complete, so that's another motivation for developing more of them.

There are many others that are equivalent in terms of computability. The RAM machine model is most like real-world CPUs that have an array of memory. But being a simple abstract machine, it doesn't bother with registers. In fact, just to make things more confusing, it calls its memory cells an array of registers. A RAM machine supports indirect addressing, so the correct analogy to real world CPUs is definitely to memory, not CPU registers. (And there are an unbounded number of registers, each of unbounded size. Addresses keep going forever and every "register" needs to able to hold a pointer.) A RAM machine can be Harvard: program stored in a separate finite-state portion of the machine. Think of it like a machine with memory-indirect addressing modes so you can keep "variables" in known locations, and use some of them as pointers to unbounded-size data structures.

The program for an abstract RAM machine looks like assembly language, with load/add/jnz and whatever other selection of instructions you want it to have. The operands can be immediates or register numbers (what normal people would call absolute addresses). Or if the model has an accumulator, then you have a load/store machine with an accumulator a lot more like a real CPU.

If you ever wondered why a "3-address" machine like MIPS was called that instead of 3-operand, it's probably 1. because the instruction encoding needs room / I-fetch bandwidth through the Von Neumann bottleneck for 3 explicit operand locations (register number) and 2. because in a RAM abstract machine, operands are memory addresses = register numbers.

C++ can't be Turing complete: pointers have a finite size.

Of course, C++ has huge differences from a CS abstract machine model: C++ requires every type to have a compile-time-constant finite sizeof, so C++ can't be Turing-complete if you include the infinite-storage requirement. Everything in Is C actually Turing-complete? on cs.SE applies to C++, too: the requirement that types have a fixed width is a showstopper for infinite storage. See also

So Computer Science abstract machines are silly, what about the C++ Abstract machine?

They of course have their purposes, but there's a lot more interesting stuff we can say about C++ and what kind of machine it assumes if we get a bit less abstract and also talk about what a machine can do efficiently. Once we talk about finite machine machines and performance, these differences become relevant.

First, to run C++ at all, and second, to run without huge and/or unacceptable performance overheads. (e.g. the HW will need to support pointers fairly directly, probably not with self-modifying code that stores the pointer value into every load/store instruction that uses it. And that wouldn't work in C++11 where threading is part of the language: the same code can be operating on 2 different pointers at once.)

We can look in more detail at the model of computation assumed by the ISO C++ standard, which describes how the language works in terms of what happens on the Abstract Machine. Real implementations are required to run code on real hardware that runs "as-if" the abstract machine were executing C++ source, reproducing any/all observable behaviour (observable by other parts of the program without invoking UB).

C/C++ has memory and pointers, so it's pretty definitely a type of RAM machine.

Or these days, a Parallel random-access machine, adding shared memory to the RAM model, and giving each thread its own program counter. Given that std::atomic<> release-sequences make all previous operations visible to other threads, the "establishing a happens-before relationship" model of synchronization is based around coherent shared memory. Emulating it on top of something that required manual triggering of syncing / flushing would be horrible for performance. (Very clever optimizations may prove when that can be delayed so not every release-store has to suffer, but seq-cst will probably be horrible. seq-cst has to establish a global order of operations that all threads agree on; that's hard unless a store becomes visible to all other threads at the same time.)

But note that in C++, actual simultaneous access is UB unless you do it with atomic<T>. This allows the optimizer to freely use CPU registers for locals, temporaries, and even globals without exposing registers as a language feature. UB allows optimization in general; that's why modern C/C++ implementations are not portable assembly language.

The historical register keyword in C/C++ means a variable can't have its address taken, so even a non-optimizing compiler can keep it in a CPU register, not memory. We're talking about CPU registers, not the computer science RAM Machine "register = addressable memory location". (Like rax..rsp/r8..r15 on x86, or r0..r31 on MIPS). Modern compilers do escape analysis and naturally keep locals in registers normally, unless they have to spill them. Other types of CPU registers are possible, e.g. a register-stack like x87 FP registers. Anyway, the register keyword existed to optimize for this type of machine. But it doesn't rule out running on a machine with no registers, only memory-memory instructions.

C++ is designed to run well on a Von Neumann machine with CPU registers, but the C++ abstract machine (that the standard uses to define the language) doesn't allow execution of data as code, or say anything about registers. Each C++ thread does have its own execution context, though, and that models PRAM threads/cores each having their own program counter and callstack (or whatever an implementation uses for automatic storage, and for figuring out where to return.) In a real machine with CPU registers, they're private to each thread.

All real-world CPUs are Random Access Machines, and have CPU registers separate from addressable / indexable RAM. Even CPUs that can only compute with a single accumulator register typically have at least one pointer or index register that at least allows some limited array indexing. At least all CPUs that work well as C compiler targets.

Without registers, every machine instruction encoding would need absolute memory addresses for all operands. (Maybe like a 6502 where the "zero page", the low 256 bytes of memory, was special, and there are addressing modes that use a word from the zero page as the index or pointer, to allow 16-bit pointers without any 16-bit architectural registers. Or something like that.) See Why do C to Z80 compilers produce poor code? on RetroComputing.SE for some interesting stuff about real-world 8-bit CPUs where a fully compliant C implementation (supporting recursion and reentrancy) is quite expensive to implement. A lot of of the slowness is that 6502 / Z80 systems were too small to host an optimizing compiler. But even a hypothetical modern optimizing cross-compiler (like a gcc or LLVM back-end) would have a hard time with some things. See also a recent answer on What is an unused memory address? for a nice explanation of 6502's zero-page indexed addressing mode: 16-bit pointer from an absolute 8-bit address in memory + 8-bit register.

A machine without indirect addressing at all couldn't easily support array indexing, linked lists, and definitely not pointer variables as first-class objects. (Not efficiently anyway)

What's efficient on real machines -> what idioms are natural

Most of C's early history was on PDP-11, which is a normal mem + register machine where any register can work as a pointer. Automatic storage maps to registers, or to space on the callstack when they need to be spilled. Memory is a flat array of bytes (or chunks of char), no segmentation.

Array indexing is just defined in terms of pointer arithmetic instead of being its own thing perhaps because PDP-11 could do that efficiently: any register can hold an address and be dereferenced. (vs. some machines with only a couple special registers of pointer width, and the rest narrower. That was common on an 8-bit machine, but early 16-bit machines like PDP-11 had little enough RAM that one 16-bit register was enough for an address).

See Dennis Ritchie's article The Development of the C Language for more history; C grew out of B on PDP-7 Unix. (The first Unix was written in PDP-7 asm). I don't know much about PDP-7, but apparently BCPL and B also use pointers that are just integers, and arrays are based on pointer-arithmetic.

PDP-7 is an 18-bit word-addressable ISA. That's probably why B has no char type. But its registers are wide enough to hold pointers so it does naturally support B and C's pointer model (that pointers aren't really special, you can copy them around and deref them, and you can take the address of anything). So flat memory model, no "special" area of memory like you find on segmented machines or some 8-bit micros with a zero page.

Things like C99 VLAs (and unlimited size local variables) and unlimited reentrancy and recursion imply a callstack or other allocation mechanism for function local-variable context (aka stack frames on a normal machine that uses a stack pointer.)

I think trying to pin C++ (or most other languages) to a single architecture model is difficult at best. Let's consider C++ 98/03. As the question says, they fit with the Von Neumann model. Oh, but wait--they also fit about equally well (if not better) with Harvard architecture.

For that matter, Harvard Architecture is really more a family of models than a single model. In particular, a CPU is usually seen as using a Harvard Architecture if it has separate caches for code and data--even if it's something like an x86, where the hardware does its best to hide that split from the code (e.g., you can write self-modifying code, and after you've modified the code, what you execute will be the new code--though there can be a substantial penalty, because the instruction cache isn't optimized to deal with modifications).

But "Harvard Architecture" can also be used to describe things like some DSPs, which have two (or three) entirely separate memory buses connected to physically separate memory:

enter image description here

The language rules to accommodate this are actually fairly subtle--to the point that unless you were looking for them, it would be easy to miss them entirely. For example, C and C++ define a pointer to a function as a separate thing from a pointer to data. They're also pretty careful to avoid giving any guarantees about things like addresses being comparable except under fairly limited circumstances (e.g., in C++ you're not guaranteed anything about comparing the address of a function to the address of data).

Since the C++11 standard, however, that's changed a bit. While the core language retains the basic character of having some stream of instructions that are executed in a specified order, the library adds the ability to create multiple threads that can execute in parallel. These are allowed to communicate via shared memory, but you have to use an atomic variable or a memory fence to guarantee any degree of success. That allows implementation on machines anywhere from extremely tightly coupled, to fairly loosely coupled, where (for example) communication that looks like shared memory may actually involve sending data over something like a network connection, with a signal sent to tell the far end when a transmission is complete.

So, again, the specification of the language isn't really tied to what would normally be seen as a single architecture at the hardware level. Rather the contrary, while it probably works better for what would normally be thought of as fairly tightly coupled machines, I believe it could be realized on fairly loosely coupled machines such as a cluster of entirely separate, disparate machines. You'd typically need (or at least want) to change how you wrote your code, but at least in theory you could write portable C++ code that ran on either.

C++ is a specification written in English in a standard. See n3337 -late draft of C++11.

As Jerry Coffin and Peter Cordes are explaining, the official model is a parallel random machine.

But you generally code in C++ by using a compiler and running your program (unless you code embedded systems) under some operating system (e.g. Windows or Linux; read also this). Many OSes provide dynamic loading facilities (e.g. dlopen(3) on Linux) and most computers could have C++ compilers.

Then you practically could generate C++ code at runtime, fork a compilation of that generated C++ code as a plugin, then dlopen that generated plugin. And on Linux you can do that many times (e.g. have dozen of thousands of such generated plugins, see my bismon and manydl.c programs).

You could also find several JIT-compiling C++ libraries, such as libgccjit or LLVM.

Practically speaking, C++ programs can generate code at runtime then use it (even if that is outside of the C++ standard). And that is characteristic of Von Neumann machines.