Evaluated vs. unevaluated Association

Summary

The confusion we observe here is largely due to ambiguous use of the symbol Association as an expression head. On the one hand, Association can be used as a constructor function to build an association object. On the other hand, it serves as the symbolic head of a constructed association object. The difference between these two uses is normally hard to spot since the FullForm of a constructor expression is visually indistinguishable from the synthetic full-form of a constructed association object. The two uses have different semantics, as observed in the question.

Many atomic types in the Wolfram Language suffer from this same ambiguity.

Discussion (current as of V12)

Notwithstanding the ideal in Wolfram Language that everything is an expression, the basic head-with-elements composite expression is not always a good representation for some data types. There might be efficiency issues or the representation might include details that are too distracting for the user to see.

The way to deal with such issues is to introduce new optimized types of expression to represent the challenging data types. These optimized types are usually atomic, but some go so far as to fully simulate composite expressions (e.g. packed arrays). These custom objects might be built into the kernel (e.g. associations or images) or they may be implemented in high-level WL code (e.g. datasets). Either way, the internal subparts of these optimized types are generally not observable to the usual part access and pattern-matching facilities in the language. Not observable, that is, unless the developer of the feature provided purpose-built functions to simulate such access.

Associations make use of this kind of optimization. The optimized object is a handle to a kernel-provided hash-trie implementation that offers both memory and speed advantages over an equivalent but unoptimized high-level expression. ByteArray and Image are just two of many other examples of similarly motivated optimizations.

The constructor expression for an association is composite, but the produced object is atomic:

AtomQ[Unevaluated@<| 1 -> 2 |>]
(* False *)

AtomQ[<| 1 -> 2 |>]
(* True *)

The implementation of association provides a synthetic FullForm for these atoms:

<| 1 -> 2 |> // FullForm
(* Association[Rule[1, 2]] *)

... but the synthetic Part implementation does not match that synthetic FullForm:

Part[<| 1 -> 2 |>, 1]
(* 2, but if the full form were true then it should be 1 -> 2 *)

There are good practical reasons for this mismatch, but they can lull one into thinking that an Association atom is just a normal composite expression.

Associations are not unique with respect to such anomalies. Almost all atomic optimizations of expressions involve mismatches of this kind. What is more, the language does not enforce consistency -- it is up to the developer of each optimization to decide how fully to simulate basic expression behaviour.

Here are some things to watch out for:

  • It is not possible to tell if an expression is atomic just by looking at its input form or even its full form.
  • It is not possible to tell if an expression is optimized just by looking at its head. Even though some constructor functions return an object with a different head (e.g. Interpolation -> InterpolationFunction), most do not (e.g. Association -> Association). The design choice of using the same head for distinct expression types is an interesting one but will not be taken up in detail here.
  • A strong clue that an expression has been optimized is that part access or pattern-matching gives surprising results. For associations, the initial implementation did not support pattern-matching at all. A simulation was added in later releases, but a scan of pattern-matching vs. association questions on this site will show that the simulation is not perfect. Graph objects provide subelement access through purpose-built property functions and do not support the regular part and matching mechanisms.
  • The input syntax and display forms of an optimized type may not be symmetric. As examples, try examining the input forms of Dataset[{1}] or Image[{{1}}] or ByteArray[{1}].
  • The display forms of optimized expressions sometimes leave out critical information required to reconstitute the object's state (e.g. association examples from the question or copy-paste-evaluating a packed array's input form).
  • The display forms of the unoptimized version of an optimized expression will generally not use short input syntax. For example, HoldForm[Complex[1, 2] // InputForm] or similar expressions using Rational or Association.
  • Atomic expressions can be written in high-level WL code using SetNoEntry. Such expressions are opaque to most forms of pattern-matching (but not all). Some built-in functionality uses this (e.g. Dataset).
  • When investigating the transition from unoptimized to optimized expression forms, beware that any evaluation will disturb the observations. We must write expressions like AtomQ[Unevaluated@...]. TreeForm in particular is known to have evaluation leaks that require doubling up constructions such as Unevaluated to see the actual structure (e.g. TreeForm[Unevaluated@Unevaluated@<|1 -> 2|>]).

Associations are atomic (AtomQ).

Except for the most fundamental atomic types (such as Integer, String, Symbol, Real, etc.), most atoms have a representation as a compound expression¹, which will normally immediately evaluate to a true atom. This is the difference between a and b: b contains a real association, while a contains its compound representation, which is not actually an association but would evaluate to one as soon as the Hold is removed. I wrote about this in more detail in two answers to this question, and showed how to obtain the compound representation of an atom.

The reason why most atoms have a compound representation is so that they can be serialized (e.g. saved to an .m file, Compressed, sent through MathLink, stored in a notebook, etc.) without each serialization method having to support each atomic type individually.
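As a rough illustration of the serialization point, the sketch below round-trips an association through its InputForm text. The variable names are only for illustration, and ToString with InputForm stands in for any of the serialization routes mentioned above. The text is read back under Hold so that the compound representation can be inspected before it evaluates.

```wolfram
assoc = <|1 -> 2|>;                      (* a true, atomic association *)

text = ToString[assoc, InputForm];       (* plain-text serialization *)

(* Parse the text but keep the result unevaluated inside Hold. *)
parsed = ToExpression[text, InputForm, Hold];

(* Inside Hold we have the compound representation, not the atom. *)
compoundIsAtomic = Replace[parsed, Hold[x_] :> AtomQ[Unevaluated[x]]]   (* False *)

(* Releasing the hold lets the constructor run and rebuilds the atom. *)
restored = ReleaseHold[parsed];
AtomQ[restored]                          (* True *)
restored === assoc                       (* True *)
```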

For most such atoms, there is a function to test their type. For associations, this is AssociationQ. The patterns _Association and _?AssociationQ are not equivalent. The first one will match any compound expression with the head Association. The second one will only match true associations. There is also GraphQ, ImageQ, MeshRegionQ, etc.
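A minimal sketch of that difference, using Unevaluated to hand the constructor expression to the test before it can evaluate (the variable names are only for illustration):

```wolfram
(* A true association: the constructor has already evaluated. *)
assoc = <|1 -> 2|>;

realMatchesHead = MatchQ[assoc, _Association]      (* True *)
realMatchesTest = MatchQ[assoc, _?AssociationQ]    (* True *)

(* The constructor expression, passed along unevaluated. *)
compoundMatchesHead =
  MatchQ[Unevaluated[Association[1 -> 2]], _Association]   (* True: only the head is checked *)
compoundIsAssociation =
  AssociationQ[Unevaluated[Association[1 -> 2]]]           (* False: not a true association *)
```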


¹ Here I use the term "compound expression" to refer to an expression that has a head and multiple arguments in the form head[arg1, arg2, ...] accessible in the standard way, i.e. it is not an atom. Not to be confused with CompoundExpression (;).


Not a full answer, but too long for a comment.

Please keep in mind that an Association is quite a complex data structure. Associations are atomic, thus they behave quite differently from purely tree-based Mathematica expressions. This is why the constituents of an Association cannot be accessed with Part in the way we are used to. (This also has to do with the fact that Part is overloaded for objects with head Association.)

The true data structure lives more on the "C-side" of Mathematica; it is not implemented in the top-level language (as far as I know). A "true" Association consists of a trie along with various routines for accessing and modifying it. Think of it as a C++ class whose routines have been linked to Mathematica symbols. All that we can see of a true Association on the Mathematica side is basically what the developers want us to see. They tried their best to keep it as intuitive as possible. But of course it is impossible to completely hide the fact that Associations are not ordinary Mathematica expressions.

Hold prevents this structure from being built, so Hold[<|1 -> 2|>] will evaluate to a real Association only when Hold is removed. Until then, Hold[<|1 -> 2|>] is merely a tree-based expression and is displayed as such by FullForm[Hold[<|1 -> 2|>]]. In contrast, the Evaluate in Hold[Evaluate[<|1 -> 2|>]] causes the Association to be generated, and from that point on <|1 -> 2|> is atomic.
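The two held forms can be compared directly. This is only a sketch: heldAtomQ is a hypothetical helper written for this illustration, not a built-in. It wraps the held content in Unevaluated so that AtomQ sees it exactly as it is stored.

```wolfram
held  = Hold[<|1 -> 2|>];             (* constructor never evaluated *)
built = Hold[Evaluate[<|1 -> 2|>]];   (* constructor evaluated before being held *)

(* Hypothetical helper: test the content of a Hold without letting it evaluate. *)
heldAtomQ[Hold[x_]] := AtomQ[Unevaluated[x]]

heldAtomQ[held]    (* False: still a tree-based expression *)
heldAtomQ[built]   (* True: already an atomic association *)
```

Both held and built display the same way under FullForm, which is why a test like this is needed to tell them apart.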