Why is the upper Riemann integral the infimum of all upper sums?

Your question does have some ambiguity. From the wording of your question and comments it appears that you want to know:

Does the limit of upper sums (with respect to partitions getting finer and finer) equal the infimum of all upper sums?


First of all note that when we are dealing with limits of things dependent on a partition of an interval then there are two ways in which the limit operation can be defined:

1) Limit via refinement of a partition: Let $P = \{x_{0}, x_{1}, x_{2},\ldots, x_{n} \}$ be a partition of $[a, b]$ where $$a =x_{0} < x_{1} < x_{2} < \cdots < x_{n} = b$$ A partition $P'$ of $[a, b]$ is said to be a refinement of $P$ (or finer than $P$) if $P \subseteq P'$.

Let $\mathcal{P}[a, b]$ denote the collection of all partitions of $[a, b]$ and let $F:\mathcal{P}[a, b] \to \mathbb{R}$ be a function. A number $L$ is said to be the limit of $F$ (via refinement) if for every $\epsilon > 0$ there is a partition $P_{\epsilon}\in \mathcal{P}[a, b]$ such that $|F(P) - L| < \epsilon$ for all $P \in \mathcal{P}[a, b]$ with $P_{\epsilon} \subseteq P$.

2) Limit as norm of partition tends to $0$: If $P = \{a = x_{0}, x_{1}, x_{2}, \ldots, x_{n} = b\}$ is a partition of $[a, b]$ then the norm $||P||$ of the partition $P$ is defined as $||P|| = \max_{i = 1}^{n}(x_{i} - x_{i - 1})$.

Let $\mathcal{P}[a, b]$ denote the collection of all partitions of $[a, b]$ and let $F: \mathcal{P}[a, b] \to \mathbb{R}$ be a function. A number $L$ is said to be the limit of $F$ as the norm of the partition tends to $0$ if for every $\epsilon > 0$ there is a $\delta > 0$ such that $|F(P) - L| < \epsilon$ for all $P\in \mathcal{P}[a, b]$ with $||P|| < \delta$. This is written as $\lim_{||P|| \to 0}F(P) = L$.

Note that for a given function $F:\mathcal{P}[a, b] \to \mathbb{R}$ the limiting behavior of $F$ can be different according to these two definitions given above. In fact if $F(P) \to L$ as $||P||\to 0$ then $F(P) \to L$ via refinement but the converse may not hold in general.

Let us establish that if $F(P) \to L$ as $||P||\to 0$ then $F(P) \to L$ via refinement. Let $\epsilon>0$ be arbitrary and let $\delta>0$ be such that $|F(P) -L|<\epsilon$ whenever $||P||<\delta$. Let us now choose any specific partition $P_{\epsilon} $ with $||P_{\epsilon} ||<\delta$. If $P_{\epsilon} \subseteq P$ then $$||P||\leq ||P_{\epsilon} ||<\delta\tag{A} $$ and hence by our assumption $|F(P) - L|<\epsilon $. Therefore it follows that $F(P) \to L$ via refinement also.

Notice that the argument here crucially hinges on inequality $(\text{A}) $. Starting with an $\epsilon>0$ we first found a $\delta>0$ via the given assumption $\lim_{||P||\to 0}F(P)=L$. The process of finding a suitable partition $P_{\epsilon} $ crucially depends on the implication $$P, Q\in\mathcal{P} [a, b], P\subseteq Q\implies ||Q||\leq||P||$$ which leads to inequality $(\text{A}) $ above. If the reverse implication $$P, Q \in \mathcal{P} [a, b], ||Q||\leq||P||\implies P\subseteq Q $$ were true then one could give a similar argument to the one in the last paragraph to prove that if $F(P) \to L$ via refinement then $F(P) \to L$ as $||P||\to 0$: we would just need to set $\delta=||P_{\epsilon} ||$. But this is not the case, since a partition of small norm need not contain the points of a given partition.
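To see concretely that the converse can fail, here is a small Python sketch. The function `F` and the marked point `C` are hypothetical choices made only for illustration (they are not part of the discussion above): $F(P) = 1$ if $P$ contains the point $1/2$ and $F(P) = 0$ otherwise, on $[0, 1]$. Every refinement of $\{0, 1/2, 1\}$ contains $1/2$, so $F(P) \to 1$ via refinement, yet partitions of arbitrarily small norm produce both values.

```python
from fractions import Fraction

C = Fraction(1, 2)  # hypothetical marked point in (0, 1)

def norm(partition):
    """Largest subinterval length of a sorted partition."""
    return max(b - a for a, b in zip(partition, partition[1:]))

def F(partition):
    """Hypothetical example: 1 if the partition contains C, else 0."""
    return 1 if C in partition else 0

def uniform(n):
    """Uniform partition of [0, 1] into n subintervals (exact arithmetic)."""
    return [Fraction(i, n) for i in range(n + 1)]

# Uniform partitions with an even number of subintervals contain 1/2,
# those with an odd number do not, although both have norm 1/n.
for n in (10, 11, 1000, 1001):
    P = uniform(n)
    print(n, norm(P), F(P))
```

Since $F$ takes both values $0$ and $1$ on partitions with norm below any given $\delta$, the limit $\lim_{||P|| \to 0}F(P)$ does not exist for this $F$.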


Now let $f$ be a function defined and bounded on $[a, b]$ and let $P = \{x_{0}, x_{1}, x_{2}, \ldots, x_{n}\}$ be a partition of $[a, b]$. Let $M_{k} = \sup\,\{f(x), x \in [x_{k - 1}, x_{k}]\}$ and let $\mathcal{P}[a, b]$ denote the collection of all partitions of $[a, b]$. We define the upper sum function $S:\mathcal{P}[a, b] \to \mathbb{R}$ by $$S(P) = \sum_{k = 1}^{n}M_{k}(x_{k} - x_{k - 1}).$$ It is easy to prove that if $m = \inf\,\{f(x), x \in [a, b]\}$ then $S(P) \geq m(b - a)$ for all $P \in \mathcal{P}[a, b]$ and further if $P, P' \in \mathcal{P}[a, b]$ are such that $P \subseteq P'$ then $S(P') \leq S(P)$. It follows that $J = \inf\,\{S(P), P \in \mathcal{P}[a, b]\}$ exists.
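As a quick sanity check of these two properties, here is a minimal Python sketch. It assumes the example function $f(x) = x^{2}$ on $[0, 1]$ (my choice, not taken from the question); because this $f$ is increasing, the supremum $M_{k}$ is simply $f(x_{k})$, which is what makes the upper sum computable here. For a general bounded $f$ the supremum cannot be read off an endpoint like this.

```python
from fractions import Fraction

def f(x):
    # Assumed example function: increasing on [0, 1], so the supremum
    # on [x_{k-1}, x_k] is attained at the right endpoint x_k.
    return x * x

def upper_sum(partition):
    """S(P) for the increasing function f above, using M_k = f(x_k)."""
    return sum(f(b) * (b - a) for a, b in zip(partition, partition[1:]))

P = [Fraction(0), Fraction(1, 2), Fraction(1)]
P_fine = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]

m = f(Fraction(0))  # infimum of f on [0, 1]
print(upper_sum(P), upper_sum(P_fine))  # 5/8 15/32
```

Here `P_fine` is a refinement of `P`, and the upper sum drops from $5/8$ to $15/32$ while staying above $m(b - a) = 0$.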

Your question can now be worded more concretely into one of the following two forms:

Does $S(P) \to J$ via refinement?

or

Does $\lim_{||P|| \to 0}S(P) = J$?

The answer to the first question is obviously "yes" and you should be able to prove this using the definition of limit via refinement given above.


The answer to the second question is also "yes" but it is harder to prove. We first prove the result for a non-negative function $f$. Let $\epsilon > 0$ be given. Since $J = \inf\,\{S(P), P \in \mathcal{P}[a, b]\}$, there is a partition $P_{\epsilon} \in \mathcal{P}[a, b]$ such that $$J \leq S(P_{\epsilon}) < J + \frac{\epsilon}{2}\tag{1}$$ Let $P_{\epsilon} = \{x_{0}', x_{1}', x_{2}', \ldots, x_{N}'\}$ and let $M = \sup\,\{f(x), x \in [a, b]\} + 1$. Let $\delta = \epsilon / (2MN)$ and consider a partition $P = \{x_{0}, x_{1}, x_{2}, \ldots, x_{n}\}$ with $||P|| < \delta$.

We can write $$S(P) = \sum_{k = 1}^{n}M_{k}(x_{k} - x_{k - 1}) = S_{1} + S_{2}\tag{2}$$ where $S_{1}$ is the sum over those indices $k$ for which the open interval $(x_{k - 1}, x_{k})$ contains no point of $P_{\epsilon}$, and $S_{2}$ is the sum over the remaining indices $k$. For each term of $S_{1}$ the interval $[x_{k - 1}, x_{k}]$ lies wholly in one of the intervals $[x_{j - 1}', x_{j}']$ made by $P_{\epsilon}$, so $M_{k}$ is at most the supremum of $f$ on $[x_{j - 1}', x_{j}']$. Since the intervals of $P$ lying in a given $[x_{j - 1}', x_{j}']$ have total length at most $x_{j}' - x_{j - 1}'$ and $f$ is non-negative, it follows that $S_{1} \leq S(P_{\epsilon})$. For $S_{2}$, only the points $x_{1}', \ldots, x_{N - 1}'$ can lie in an open interval $(x_{k - 1}, x_{k})$, and each lies in at most one, so there are at most $N - 1$ such indices $k$. Each term of $S_{2}$ is less than $M\delta$ and hence $S_{2} < MN\delta = \epsilon / 2$. It follows that $$J \leq S(P) = S_{1} + S_{2} < S(P_{\epsilon}) + \frac{\epsilon}{2} < J + \epsilon\tag{3}$$ for all $P \in \mathcal{P}[a, b]$ with $||P|| < \delta$. It follows that $S(P) \to J$ as $||P|| \to 0$.

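Extension to a general function $f$ can be achieved by writing $f(x) = g(x) + m$ where $m = \inf\,\{f(x), x \in [a, b]\}$ and noting that $g$ is non-negative. The upper sums of $f$ and $g$ over any partition differ by the constant $m(b - a)$, so the result for $g$ carries over to $f$.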
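The choice $\delta = \epsilon / (2MN)$ in the proof can be checked numerically on an example. The sketch below is only an illustration under assumptions of my own: $f(x) = x^{2}$ on $[0, 1]$ (non-negative and increasing, so $M_{k} = f(x_{k})$ and $J = 1/3$), $\epsilon = 1/10$, and a uniform $P_{\epsilon}$ with $N = 11$ subintervals. The helper `random_partition` is hypothetical; it builds partitions with irregular gaps, all smaller than $\delta$. A handful of random samples is of course no substitute for the proof.

```python
import random
from fractions import Fraction

def f(x):
    return x * x  # assumed example: non-negative and increasing on [0, 1]

def upper_sum(partition):
    """S(P) for the increasing f above, using M_k = f(x_k)."""
    return sum(f(b) * (b - a) for a, b in zip(partition, partition[1:]))

def norm(partition):
    return max(b - a for a, b in zip(partition, partition[1:]))

J = Fraction(1, 3)       # upper integral of x^2 on [0, 1]
eps = Fraction(1, 10)
N = 11                   # P_eps is uniform with N subintervals
P_eps = [Fraction(i, N) for i in range(N + 1)]
M = f(Fraction(1)) + 1   # sup f + 1, as in the proof
delta = eps / (2 * M * N)

def random_partition(delta, rng):
    """Hypothetical helper: partition of [0, 1] with random gaps below delta."""
    points = [Fraction(0)]
    while True:
        step = delta * Fraction(rng.randint(50, 99), 100)
        if points[-1] + step >= 1:
            points.append(Fraction(1))
            return points
        points.append(points[-1] + step)

rng = random.Random(0)
samples = [random_partition(delta, rng) for _ in range(5)]
for P in samples:
    # Each line should show norm(P) < delta and J <= S(P) < J + eps.
    print(float(norm(P)), float(upper_sum(P) - J))
```

With these choices $S(P_{\epsilon}) < J + \epsilon/2$ holds and $\delta = 1/440$, and every sampled partition satisfies $J \leq S(P) < J + \epsilon$, in line with inequality $(3)$.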

Another interesting example showing the difference between two limit definitions is given in this answer.


Note: The limit of a Riemann sum is based on the two definitions given above but there is a slight complication. A Riemann sum depends not only on a partition but also on the choice of tags corresponding to that partition. Formally one can view a Riemann sum not as a function from $\mathcal{P} [a, b] $ to $\mathbb{R} $ but rather as a relation from $\mathcal{P} [a, b] $ to $\mathbb {R} $ such that it relates every partition of $[a, b] $ to one or more real numbers.
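The dependence on tags is easy to see in a small Python sketch. The function $f(x) = x^{2}$ and the helper `riemann_sum` are assumptions made for illustration only: the same partition of $[0, 1]$ yields different Riemann sums for left-endpoint and right-endpoint tags.

```python
from fractions import Fraction

def f(x):
    return x * x  # assumed example function on [0, 1]

def riemann_sum(partition, tags):
    """Sum of f(t_k) * (x_k - x_{k-1}) with each tag t_k in [x_{k-1}, x_k]."""
    intervals = list(zip(partition, partition[1:]))
    assert all(a <= t <= b for (a, b), t in zip(intervals, tags))
    return sum(f(t) * (b - a) for (a, b), t in zip(intervals, tags))

P = [Fraction(i, 4) for i in range(5)]
left = riemann_sum(P, P[:-1])   # tags at left endpoints
right = riemann_sum(P, P[1:])   # tags at right endpoints
print(left, right)  # 7/32 15/32
```

So the single partition `P` is related to (at least) the two numbers $7/32$ and $15/32$, which is why a Riemann sum is a relation rather than a function on $\mathcal{P}[a, b]$.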


You seem to have a fundamental misunderstanding of this topic. We have a bounded function. We define the upper integral. No question that it exists. We define the lower integral. Again, no question that it exists. We then define what it means for a bounded function to be Riemann integrable (RI): the upper integral equals the lower integral. Plenty of questions about when this happens. The theory of the Riemann integral is all about when we are lucky enough to have $f$ RI, and about the value of the integral when it exists. For example, there is the theorem that if $f$ is continuous on $[a,b],$ then $f$ is RI on $[a,b].$ There is the fundamental theorem of calculus (FTC). A beautiful result of Lebesgue gives a necessary and sufficient condition: a bounded $f$ is RI iff $f$ is continuous a.e. All of these results go back to the definition.