Why Zariski topology?

To appreciate the Zariski topology it helps to have a fairly broad view about what a topological space is. Topological spaces in full generality are, confusingly, not very topological in the naive sense! As discussed in this math.SE question, I think it is better to think of point-set topology as being about semidecidable properties (which are the open sets). The familiar kind of topology induced by a metric is about the specific property of being close in a metric sense, but other kinds of topologies are about different kinds of properties.

The Zariski topology is about the property of non-vanishing of polynomials. The semidecidable properties here are the properties "at least one polynomial in this set is nonzero here." Intuitively speaking, the reason this is semidecidable is that you can compute the value of a polynomial at a point to finite precision. Once an approximation is farther from zero than its error bound, the true value cannot be zero. If the value really is zero, no finite computation will ever confirm that.
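A minimal Python sketch of this semidecision procedure follows. It rests on a few assumptions: a point is modeled as a function `approx(n)` that returns a rational within $2^{-n}$ of it, and a polynomial is a list of coefficients, lowest degree first. All names here are hypothetical. Interval arithmetic gives a rigorous enclosure of $p(x)$, so the procedure can only ever answer "nonzero", and it does so as soon as an enclosure misses $0$. At a root it never answers, which here shows up as giving up after a fixed budget.

```python
from fractions import Fraction

def iv_add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def iv_mul(x, y):
    ps = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(ps), max(ps))

def poly_enclosure(coeffs, box):
    """An interval containing p(t) for every t in `box`; coeffs are lowest degree first."""
    acc = (coeffs[-1], coeffs[-1])
    for c in reversed(coeffs[:-1]):           # Horner's rule in interval arithmetic
        acc = iv_add(iv_mul(acc, box), (c, c))
    return acc

def sqrt2_approx(n):
    """A rational q with |q - sqrt(2)| <= 2**-n, by bisection (hypothetical example point)."""
    lo, hi = Fraction(1), Fraction(2)         # invariant: lo**2 <= 2 < hi**2
    for _ in range(n):
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo

def certify_nonzero(coeffs, approx, max_steps=60):
    """Semidecide p(x) != 0: True once an enclosure excludes 0, None if the budget runs out."""
    for n in range(max_steps):
        q, eps = approx(n), Fraction(1, 2 ** n)
        lo, hi = poly_enclosure(coeffs, (q - eps, q + eps))
        if lo > 0 or hi < 0:
            return True
    return None   # a genuine semidecision procedure would simply keep looping
```

For example, `certify_nonzero([-3, 0, 1], sqrt2_approx)` certifies that $x^2-3$ does not vanish at $\sqrt 2$ after a couple of refinements. For $x^2-2$ every enclosure contains $0$, so no certificate is ever produced.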

The fact that the Zariski topology isn't Hausdorff isn't a weird property of the Zariski topology; it tells you something important about how vanishing of polynomials behaves, namely that the behavior of a polynomial on a few points can tell you a lot about its behavior at seemingly far-away points. This is intrinsic to the nature of algebraic geometry and pretending that the Zariski topology doesn't exist won't make it go away.

Okay, so what can you actually do with it? Here are a couple of things:

  • If two polynomials agree on a Zariski-dense subset, then they agree identically. This is a surprisingly useful way to prove polynomial identities; for example, it can famously be used to prove the Cayley-Hamilton theorem.
  • Moving to the Zariski topology on schemes allows the use of generic points. I am not familiar with examples of this technique in use though.
  • Serre famously made use of the Zariski topology to introduce sheaf cohomology to algebraic geometry, which was (as I understand it) a crucial innovation.
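The density argument itself works over an infinite field. A finite cousin of the same principle can be run on a computer: a polynomial in $n$ variables that has degree at most $d$ in each variable and vanishes on a grid $S^n$ with $|S|\ge d+1$ is identically zero. The entries of $p_A(A)=A^2-\operatorname{tr}(A)A+\det(A)I$ for a $2\times 2$ matrix $A$ have degree at most $2$ in each entry. The Python sketch below (function names hypothetical) therefore proves the $2\times 2$ Cayley-Hamilton identity in $\mathbb Z[a,b,c,d]$, and hence over every commutative ring, by checking the $81$ integer matrices with entries in $\{0,1,2\}$. This is a swapped-in grid argument, not the Zariski-density proof.

```python
from itertools import product

def residual(a, b, c, d, include_det=True):
    """Entries of A^2 - tr(A)*A + det(A)*I for A = [[a, b], [c, d]] (det term optional, for a control)."""
    A = ((a, b), (c, d))
    tr, det = a + d, a * d - b * c
    A2 = tuple(tuple(sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)) for i in range(2))
    return tuple(
        tuple(A2[i][j] - tr * A[i][j] + (det if include_det and i == j else 0) for j in range(2))
        for i in range(2)
    )

def vanishes_on_grid(entries, grid=(0, 1, 2)):
    # Each entry has degree <= 2 in each of the 4 variables and |grid| = 3 > 2,
    # so vanishing on grid^4 forces every entry to be the zero polynomial.
    return all(v == 0 for pt in product(grid, repeat=4) for row in entries(*pt) for v in row)
```

As a sanity check, dropping the $\det(A)I$ term gives an "identity" that the same grid immediately refutes (already at $A=I$).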

To really appreciate the Zariski topology it helps to generalize it to arbitrary commutative rings. An important motivational example: if $X$ is a compact Hausdorff space and $C(X)$ is the ring of continuous functions $X \to \mathbb{R}$, then the maximal spectrum of $C(X)$ not only can be identified with $X$, but has the same topology! (This is an exercise in Atiyah-Macdonald.)
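Here is a sketch of the dictionary behind that exercise, with standard facts stated without proof. Each point gives a maximal ideal, and compactness shows that every maximal ideal arises this way. Zariski-closed sets match intersections of zero sets $Z(f)$, and by Urysohn's lemma these are exactly the closed subsets of a compact Hausdorff space.

```latex
\begin{aligned}
&X \xrightarrow{\ \sim\ } \operatorname{maxSpec} C(X), \qquad x \longmapsto \mathfrak m_x = \{ f \in C(X) : f(x) = 0 \},\\
&V(I) = \{ \mathfrak m_x : I \subseteq \mathfrak m_x \} \;\longleftrightarrow\; \bigcap_{f \in I} Z(f),
\qquad Z(f) = \{ x \in X : f(x) = 0 \}.
\end{aligned}
```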

The rings one gets in this way are precisely the self-adjoint parts of unital commutative C*-algebras, by the commutative Gelfand-Naimark theorem, and in fact you get a (contravariant) equivalence of categories. Moreover, by the Serre-Swan theorem, the category of real vector bundles on $X$ is naturally equivalent to the category of finitely generated projective modules over $C(X)$.

It helps to think about this example like a physicist. Think of $X$ as the set of possible states of some physical system and the elements of $C(X)$ as observations one can make about the system. The value of a function at a point is the result of the observation in a fixed state. The Zariski topology here captures all semidecidable properties that you can decide using the observations in $C(X)$. For example, if one of the functions in $C(X)$ is called "temperature," there is a corresponding semidecidable property "the temperature of the system is strictly between $0$ and $100$ degrees," which you can confirm by computing the temperature to finite precision. An inclusive bound would not be semidecidable, because a temperature of exactly $100$ degrees can never be confirmed to finite precision.

(What if $X$ is not compact? Then if you work with the ring $C_b(X)$ of bounded continuous functions on $X$, there are consistent sets of possible values of the observables which do not arise from an actual state of your system; they are points in the Stone-Čech compactification $\beta X$ instead.)

Here's another example that I like: let $B$ be a Boolean ring, which is a ring satisfying $b^2 = b$ for all $b \in B$. Then every element $b$ of $B$ can be identified with a subset of the maximal spectrum of $B$, namely the set of maximal ideals not containing $b$. This idea can be used to

  • prove Stone's representation theorem for Boolean algebras,
  • deduce the existence of ultrafilters from the existence of maximal ideals in rings, and
  • prove the compactness theorem in propositional logic (without proving the completeness theorem)!

For a discussion, see my blog post Boolean rings, ultrafilters, and Stone's representation theorem.
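For a finite Boolean ring, this identification can be checked by brute force. The Python sketch below is an illustration under stated assumptions, not the argument from the blog post. It takes $B$ to be the subsets of a $3$-element set, encoded as bitmasks with XOR as addition and AND as multiplication, and finds the maximal ideals by exhaustive search. It then sends each element $b$ to the set of maximal ideals not containing it (the helper names are hypothetical).

```python
from itertools import combinations

K = 3                   # B = subsets of {0, 1, 2}, encoded as bitmasks 0..7 (illustrative choice)
B = range(2 ** K)       # addition is XOR, multiplication is AND, so b*b == b for every b

def subsets(elements):
    """All subsets of a finite collection, as frozensets."""
    elements = list(elements)
    for r in range(len(elements) + 1):
        for combo in combinations(elements, r):
            yield frozenset(combo)

def is_ideal(S):
    return (0 in S
            and all((a ^ b) in S for a in S for b in S)
            and all((a & r) in S for a in S for r in B))

ideals = [S for S in subsets(B) if is_ideal(S)]
proper = [S for S in ideals if (2 ** K - 1) not in S]        # proper <=> does not contain 1
maximal = [M for M in proper if not any(M < J for J in proper)]

def stone(b):
    """Send b to the set of maximal ideals that do not contain b."""
    return frozenset(M for M in maximal if b not in M)
```

The map `stone` turns XOR into symmetric difference and AND into intersection. It hits every subset of the three-point maximal spectrum exactly once, which is the finite case of Stone's representation theorem.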


I would just like to mention a pleasant feature of the Zariski topology which is, to my knowledge, never addressed in algebraic geometry books (counterexample, anybody?).

The Zariski topology is never Hausdorff in positive dimension, but apart from that, in the affine case, it enjoys an algebraic version of normality ($T_4$).
This means that for an affine variety (or an affine scheme) $X$, given two disjoint closed subsets $C,D\subset X$ there exists a regular function $f\in \mathcal O(X)$ with $f(c)=1$ for all $c\in C$ and $f(d)=0$ for all $d\in D$.
More remarkably still, you can take arbitrary regular functions $g\in \mathcal O(C)$, $h \in \mathcal O(D)$ and interpolate them by some $f\in \mathcal O(X)$ such that $f|_C=g$ and $f|_D=h$.

This is due to the fact that in algebraic geometry you define the functions first, the polynomials (or one of their quotient rings), and then you deduce from them a topology.
In classical topology (as in calculus or analysis) you define the topological space first (through a metric, say) and then you investigate the continuous functions on these spaces.
And so it can happen (in contrast to algebraic geometry) that you don't have enough functions to separate disjoint closed subsets from each other.
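In the simplest affine scheme, $X=\operatorname{Spec}\mathbb Z$, the interpolation statement above is the Chinese remainder theorem. The closed sets $C=V(m)$ and $D=V(n)$ are disjoint exactly when $\gcd(m,n)=1$. Giving them the closed-subscheme structures, we have $\mathcal O(C)=\mathbb Z/m$ and $\mathcal O(D)=\mathbb Z/n$. A minimal Python sketch follows, with a hypothetical function name and assuming Python 3.8+ for modular inverses via `pow`. The separating function is the special case $g=1$, $h=0$.

```python
from math import gcd

def interpolate(g, m, h, n):
    """Return f in Z with f = g (mod m) and f = h (mod n), assuming V(m) and V(n) are disjoint."""
    if gcd(m, n) != 1:
        raise ValueError("V(m) and V(n) meet: gcd(m, n) != 1")
    v = pow(n, -1, m)   # v*n = 1 (mod m), so g*v*n restricts to g on V(m) and to 0 on V(n)
    u = pow(m, -1, n)   # u*m = 1 (mod n), so h*u*m restricts to h on V(n) and to 0 on V(m)
    return (g * v * n + h * u * m) % (m * n)
```

For instance, `interpolate(3, 4, 2, 9)` returns `11`, which is $3$ modulo $4$ and $2$ modulo $9$.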

In the same vein (equivalently, really) let me mention that affine algebraic varieties (or affine schemes) satisfy the Tietze extension property: every regular function on a closed subset $C\subset X$ extends to a regular function on $X$.
In the language of schemes this is the absolute triviality that, for $C=V(I)$, the restriction map $\mathcal O(X)=A \to \mathcal O(C)=A/I$ is surjective!
And it is a triviality because it is built into the foundations of algebraic geometry: the Zariski topology is constructed out of the functions (and Grothendieck's genius was to force every element of any commutative ring to be a function!).


I'd like to give my personal perspective, which I believe is a more elementary version of Zhen Lin's. What I have been able to explain to myself is a) why the Zariski topology is natural to consider when talking about vanishing sets, b) how the Zariski topology on sets of prime ideals of a ring $R$ suggests that locally ringed spaces are good general objects for geometric considerations, c) why the Zariski topology on the set $\DeclareMathOperator{\Spec}{Spec}\Spec R$ of all prime ideals gives us affine schemes, and d) when it is OK to use the Zariski topology only on the set $\DeclareMathOperator{\maxSpec}{maxSpec}\maxSpec R$ of maximal ideals. The last point explains in particular why, over an algebraically closed field $\Bbbk$, we can think of $\mathbb A^n_\Bbbk$ as the $n$-tuples $(a_1,\dots,a_n)$ of elements of $\Bbbk$, identified with the maximal ideals $\left<x_1-a_1,\dots,x_n-a_n\right>$ of $\Bbbk[x_1,\dots, x_n]$.


Vanishing Sets and the Zariski Topology

Imagine that we have a ring $R$ and a set (space) $X$, and that we think of $R$ as "functions" on $X$. By this we mean that for every $x\in X$ there is a set $R_x$ of "values at $x$", so that we can think of $x$ as a (surjective) evaluation function $x\colon R\to R_x$ given by $f\mapsto f(x)$. If we try to axiomatize the notion "$f\in R$ vanishes at $x\in X$", we arrive at:

  1. $f\in R$ and $g\in R$ have the same value at $x$ if and only if $f-g$ vanishes at $x$;
  2. if $f$ vanishes at $x$, then $f\cdot g$ also vanishes at $x$ for every $g\in R$.

These are enough to ensure that every $x\colon R\to R_x$ induces a ring structure on $R_x$ such that the set of functions $f$ vanishing at $x$ is precisely the ideal $\ker x\subset R$. Requiring that the constant function $1$ vanishes nowhere ensures that these ideals are proper, i.e. that no $R_x$ is the zero ring.

It is not difficult to show that, given a set of points $S\subset X$, the set of functions in $R$ vanishing on $S$ is an ideal $I(S)$ of $R$. In fact it is the intersection of the ideals $\ker x$ associated to the points $x\in S$, i.e. $I(S)=\bigcap\{\ker x\colon x\in S\}$. Similarly, given any set of functions $J\subset R$, the vanishing set $V(J)=\{x\in X\colon f(x)=0 \text{ for all } f\in J\}$ is the set of points $x$ whose associated ideal $\ker x$ contains $J$, i.e. $V(J)=\{x\in X\colon J \subset \ker x\}$.

Since the idea of algebraic geometry is to describe geometric objects as zero loci of functions, that is, as vanishing sets, we care about the following easy-to-check properties of the operator $V$:

  1. $V(I)=V(\left<I\right>)$, so from now on we'll only consider ideals of $R$ as our sets of functions $I$, $J$, etc.
  2. $V(0)=X$
  3. $I\subset J$ implies $V(J)\subset V(I)$
  4. $V(\sum_\lambda I_\lambda)=\bigcap_\lambda V(I_\lambda)$
  5. $V(I)\cup V(J)\subset V(I\cap J)$

The last statement is NOT an equality in general. Indeed, if $\ker x$ is not a prime ideal, choose $f,g\not\in\ker x$ with $fg\in\ker x$. Then $x\not\in V(f)\cup V(g)$, but $x\in V(fg)=V((f)(g))$, so $V(f)\cup V(g)$ is not the vanishing set $V(fg)$ one would hope it to be. Concretely, for $R=\mathbb Z$ and a point $x$ with $\ker x=6\mathbb Z$, we have $x\not\in V(2)\cup V(3)$ but $x\in V((2)\cap(3))=V(6)$. This is bad, since it means that the vanishing sets in this general context are not necessarily closed under finite unions, which makes it extremely difficult to decompose them effectively into smaller pieces. Pretty much the only easily verifiable condition guaranteeing closure under finite unions is to require that all the associated ideals $\ker x$ are prime (so that the $R_x$ are integral domains). In that case the vanishing sets $V(I)$ satisfy the axioms for the closed sets of a topology, which I call the Zariski topology induced by $R$ on $X$.

Note that $X$ can be thought of as a multiset of prime ideals of $R$.
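The failure of equality is easy to see in a toy model. The Python sketch below takes $R=\mathbb Z$ and, as an illustrative assumption, represents each point $x$ by the positive generator $n$ of $\ker x=n\mathbb Z$. Every ideal of $\mathbb Z$ is principal, with $(a)\cap(b)=(\operatorname{lcm}(a,b))$ and $(a)+(b)=(\gcd(a,b))$. With a point whose kernel is $6\mathbb Z$, the union $V(2)\cup V(3)$ is strictly smaller than $V((2)\cap(3))$. With only prime kernels the two agree, and sums of ideals still go to intersections.

```python
from math import gcd

def lcm(a, b):
    return 0 if a == 0 or b == 0 else abs(a * b) // gcd(a, b)

def V(a, points):
    """Vanishing set of the ideal (a): points x, given by n with ker x = nZ, such that a lies in nZ."""
    return {n for n in points if a % n == 0}

def I(S):
    """Generator of I(S), the intersection of the ideals nZ for n in S (I of the empty set is Z)."""
    g = 1
    for n in S:
        g = lcm(g, n)
    return g

bad_points = {2, 3, 6}        # the point "6" has a non-prime kernel 6Z
prime_points = {2, 3, 5, 7}   # only prime kernels
```

Here `V(2, bad_points) | V(3, bad_points)` is `{2, 3}`, while `V(lcm(2, 3), bad_points)` is `{2, 3, 6}`.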


Vanishing Sets and Locally Ringed Spaces

We want to do more: we want to study the sheaf of vanishing sets on $X$. Of course, this makes no sense as I've stated it, since sheaves are defined relative to a topology (roughly, if something is a local phenomenon, then it forms a sheaf), and we have not specified a topology on $X$. Observe, however, that being a closed set is a local property in any topology. That is, if $\{U_i\}$ is an open cover of $X$ and each $S\cap U_i$ is closed in $U_i$, then $S$ is closed in $X$. It follows that, under the Zariski topology on $X$ induced by $R$, vanishing sets form a sheaf.

But if vanishing sets form a sheaf, and each vanishing set is given by ''functions'' in $R$ on $X$, we had better make ''functions'' on $X$ into a sheaf as well. There is essentially one reasonable way to do this, namely by appropriately restricting functions in $R$ to open subsets $U$.

First, a simplification. Every vanishing set is an intersection of hypersurfaces, i.e. of vanishing sets of single ''functions'': if $I=\sum_\alpha (f_\alpha)$, then $V(I)=\bigcap_{\alpha}V(f_\alpha)$. It is therefore completely useless to have ''functions'' $f\in R$ that do not vanish at any point $x$, since they provide extra ideals which say nothing about the points of the space. We should demand that any $f$ that vanishes nowhere be a unit of $R$. To achieve this we may replace $R$ with its localization $S^{-1}R$ at the multiplicative system $S=\{f\in R\colon f(x)\neq0 \text{ for all } x\in X\}$ (the system is multiplicative since the $R_x$ are integral domains). This leaves the vanishing sets exactly the same, while giving us a slightly simpler ring to encode them (the fewer ideals, the better).

Having said this, suppose that we have an open set $U\subset X$. Whatever ring $R_U$ we associate to $U$, we want its vanishing sets to be the closed sets of $U$. We should also have a restriction map $\DeclareMathOperator{\res}{res}\res_{X,U}\colon R\to R_U$ telling us how to restrict ''functions'' on $X$ to ''functions'' on $U$. This map should be a ring homomorphism if we have any sense of decency. Moreover, taking preimages under it has to send the ideal of $R_U$ vanishing at $S\subset U$ to the ideal of $R$ vanishing at $S$. Furthermore, given the above convention, if $f\in R$ does not vanish at any point of $U$, then it should be sent to a unit. Hence, $R_U$ will necessarily admit a homomorphism from the localization $S^{-1}R$, where $S=\{f\in R\colon f(x)\neq0 \text{ for all } x\in U\}$. Thus, we can define what I call the ''structure presheaf'' $\mathscr F_X$ by $\mathscr F_X(U)=S^{-1}R$ for this $S$. The restriction map $\res_{U,V}$ from functions on $U$ to functions on $V\subset U$ is given by localizing $\mathscr F_X(U)$ at $\{f\in \mathscr F_X(U)\colon f(x)\neq0 \text{ for all } x\in V\}$.

A key property of this presheaf is that its stalks are local rings and that they encode vanishing. Since $f$ vanishes at a point $x$ if and only if $f\in\ker x$, it is not hard to see that the stalk $\mathscr F_{X,x}$ at $x$ is the localization of $R=\mathscr F_X(X)$ at the prime ideal $\ker x$. Hence $f$ vanishes at $x$ if and only if $f$ is a non-unit in the stalk $\mathscr F_{X,x}$. Consequently, the sheafification $\mathscr O_X$ of $\mathscr F_X$ is precisely what I call the ''structure sheaf'' of $X$. Remember that $X$ is a multiset of prime ideals, not the set of all prime ideals, which is the usual context for the structure sheaf. This sheaf is quite elusive, but it has two useful properties. First, $(X,\mathscr O_X)$ is a locally ringed space (its stalks are local rings). Second, vanishing sets can be read off from the stalks $\mathscr O_{X,x}$: $f\in\mathscr O_X(U)$ vanishes at a point $x\in U$ if $f$ localizes to a non-unit in $\mathscr O_{X,x}$. Hence the study of vanishing sets becomes a special case of the study of locally ringed spaces!
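Take the model $R=\mathbb Z$, $X=\operatorname{Spec}\mathbb Z$. The stalk at the point with $\ker x=p\mathbb Z$ is $\mathbb Z_{(p)}$, the rationals whose reduced denominator is prime to $p$. The small Python check below (names hypothetical) confirms that an integer vanishes at $x$ exactly when it is a non-unit of that stalk.

```python
from fractions import Fraction

def in_stalk(q, p):
    """q lies in Z_(p) when its reduced denominator is not divisible by p."""
    return Fraction(q).denominator % p != 0

def is_unit_in_stalk(q, p):
    q = Fraction(q)
    # a/b (reduced, p does not divide b) has inverse b/a in Z_(p) exactly when p does not divide a
    return in_stalk(q, p) and q.numerator % p != 0

def vanishes_at(f, p):
    """The integer f vanishes at the point pZ when its image in Z/p is zero."""
    return f % p == 0
```

The same test also sorts genuine fractions: $3/4$ is a unit in $\mathbb Z_{(5)}$, $5/4$ is not, and $1/5$ does not lie in $\mathbb Z_{(5)}$ at all.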


Affine Schemes -- the most basic locally ringed spaces

Why is the structure sheaf $\mathscr O_X$ (the one from above, for $X$ a set with a ring $R$ of ''functions'' on it) elusive? Because $\mathscr O_X(U)$ is not necessarily $\mathscr F_X(U)$, the localization of $R$ at the set of functions that vanish nowhere on $U$. In fact, even the ring of global sections $\mathscr O_X(X)$ is not necessarily $R=\mathscr F_X(X)$, which means that it's actually really hard to compute $\mathscr O_X$. In particular, the path to affine schemes begins with trying to compute $\mathscr O_X$.

One easy explicit description of $\mathscr O_X$ comes from setting $X_f=X\setminus V(f)$ for any $f\in R$. The $X_f$ form a basis of open sets for $X$ in the Zariski topology, since $X_{fg}=X_f\cap X_g$ (because every $\ker x$ is prime) and $X\setminus V(I)=\bigcup_{f\in I}X_f$. More generally, $\bigcup_\alpha X_{f_\alpha}=X\setminus V(\sum_\alpha(f_\alpha))$.
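A quick Python check of these basic-open identities follows. It works on a finite set of points of $\operatorname{Spec}\mathbb Z$ with prime kernels $p\mathbb Z$, an illustrative truncation, and the names are hypothetical. The last line shows why primality matters: adding a point with kernel $6\mathbb Z$ breaks $X_{fg}=X_f\cap X_g$.

```python
from math import gcd
from functools import reduce

PRIMES = [2, 3, 5, 7, 11, 13]   # finitely many points x, each with prime kernel pZ (illustrative)

def basic_open(f, points=PRIMES):
    """X_f: the points where f does not vanish."""
    return {p for p in points if f % p != 0}

def vanishing_set(a, points=PRIMES):
    """V(a): the points where a vanishes."""
    return {p for p in points if a % p == 0}

def union_of_basic_opens(fs, points=PRIMES):
    return set().union(*(basic_open(f, points) for f in fs))

# With a non-prime kernel 6Z, the product rule fails: 6 vanishes there, but 2 and 3 do not.
bad = [2, 3, 6]
```

For the generators $6, 10, 15$ the ideal they generate is $(1)$, so their basic opens cover every point. For $4, 6$ the ideal is $(2)$, and the union misses exactly the point $2\mathbb Z$.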

We know that $\mathscr F_X(X_f)=R_f$, where $R_f=S^{-1}R$ for $S=\{g\in R\colon g(x)\neq0 \text{ for all } x\in X_f\}$. This $S$ contains the powers $f,f^2,\dots$, and $R_f$ agrees with the localization at $\{1,f,f^2,\dots\}$ when the condition below holds. Hence, the structure sheaf $\mathscr O_X$ is fully determined by the $R_f$ according to the rule $\mathscr O_X(U)=\varprojlim_{X_f\subset U}R_f$. This is still really hard to compute unless a certain miracle occurs, which is the following. If $V(f)\subset V(J)$ implies $J\subset\DeclareMathOperator{\rad}{rad} f$, then the $X_f$ satisfy what Eisenbud and Harris call the $\mathscr B$-sheaf axioms, which imply that $\mathscr O_X(X_f)=\mathscr F_X(X_f)=R_f$.

Why doesn't $V(f)\subset V(J)$ always imply $J\subset\rad f$? Well, we certainly have $I(V(J))\subset I(V(f))$ and $J\subset I(V(J))=\bigcap\{\ker x\colon J\subset\ker x\}$. But $I(V(f))=\bigcap\{\ker x\colon f\in\ker x\}$ could be strictly bigger than $\rad f=\bigcap\{\mathfrak p\subset R\colon f\in\mathfrak p\}$, as not every prime ideal of $R$ is necessarily $\ker x$ for some $x\in X$. So require that every prime ideal $\mathfrak p\in\Spec R$ correspond to some $x\in X$, and remove unnecessary duplicates (two points $x$ and $y$ are not distinguishable by vanishing sets if $\ker x=\ker y$). We obtain that $X=\Spec R$ is a simple sufficient condition for the structure sheaf $\mathscr O_X$ to be mostly computable (we would know that $\mathscr O_X(X_f)=R_f$).


$\maxSpec$

So far I have explained (to the best of my ability) why the Zariski topology on $\Spec R$ is natural. This does not answer the question posed if we think of $\mathbb A^n_\Bbbk$ as the set $\maxSpec\Bbbk[x_1,\dots,x_n]$ of maximal ideals rather than the set $\Spec \Bbbk[x_1,\dots,x_n]$ of prime ideals, which is done quite frequently. The reason for doing this is the Jacobson property: a ring is Jacobson if every prime ideal is the intersection of the maximal ideals containing it. It should be clear from the above that for such rings $R$ we also have that $V(f)\subset V(J)$ implies $J\subset\rad f$. This is because the intersection of the prime ideals containing $f$ is the same as the intersection of the maximal ideals containing $f$ whenever $R$ is Jacobson. Hence, the structure sheaf for $X=\maxSpec R$ satisfies $\mathscr O_X(X_f)=R_f$ when $R$ is Jacobson.

So when is $R$ Jacobson? Well, as can be read in Eisenbud's Commutative Algebra with a View Toward Algebraic Geometry, fields are certainly Jacobson and $\mathbb Z$ is Jacobson. By the most general version of the Nullstellensatz, if $R$ is a Jacobson ring then so is $R[x]$. So in particular, $\Bbbk[x_1,\dots,x_n]$ is Jacobson, which is why we can do algebraic geometry on $\mathbb A^n_\Bbbk$ using $\maxSpec$ instead of $\Spec$. For example, for $\Bbbk$ algebraically closed this lets us identify $\mathbb A^n_\Bbbk$ with the $n$-tuples of elements of $\Bbbk$.
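To see that the Jacobson hypothesis does real work, consider the local ring $\mathbb Z_{(p)}$ (rationals with denominator prime to $p$), a standard example of a non-Jacobson ring. Its prime ideal $(0)$ is not the intersection of the maximal ideals containing it, since the only maximal ideal is $p\mathbb Z_{(p)}$. On $\operatorname{maxSpec}$ the condition from above and the formula $\mathscr O_X(X_f)=R_f$ then both fail already for $f=p$. Here $R_p$ is written $R[p^{-1}]$ to avoid confusion with localization at a prime.

```latex
\begin{aligned}
&R=\mathbb Z_{(p)},\qquad X=\operatorname{maxSpec}R=\{\,p\mathbb Z_{(p)}\,\},\qquad
\bigcap_{\mathfrak m\supseteq(0)}\mathfrak m = p\mathbb Z_{(p)}\neq(0),\\
&V(0)=X=V(p)\quad\text{but}\quad p\notin\operatorname{rad}(0)=(0),\\
&X_p=X\setminus V(p)=\varnothing,\qquad \mathscr O_X(X_p)=0\;\neq\;\mathbb Q=R[p^{-1}].
\end{aligned}
```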