How can I "see" that calculus works for multidimensional problems?

For the most general case, think about a mixing board.

Courtesy "I G", cc-by 2.0. Flickr:

Each input argument to the function is represented by a slider with an associated piece of a real number line along one side, just like in the picture. If you are thinking of a function which can accept arbitrary real number inputs, the slider will have to be infinitely long, which is not possible in real life but is in the imaginary, ideal world of mathematics. This mixing board also has a dial on it, which displays the number corresponding to the function's output.

The partial derivative of the function with respect to one of its input arguments corresponds to how sensitive the readout on the dial is if you wiggle the slider representing that argument just a little bit around wherever it's currently set, while leaving all the other sliders alone - that is, how much more or less dramatic the changes in what is shown are compared to the size of your wiggle. If you wiggle a slider by, say, 0.0001, and the value on the dial changes by 0.0002, the partial derivative with respect to that variable at the given setting is (likely only approximately) 2. If the value changes in the opposite sense, i.e. goes down when you move the slider up, the derivative is negative.
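The slider-wiggling picture can be tried out numerically. The following is only a minimal sketch: the function `f` and the helper `partial` are hypothetical names made up for illustration, and the wiggle size is an arbitrary small number.

```python
def f(x, y):
    # Hypothetical "mixing board" with two sliders; the return value is the dial.
    return x * x * y + 3.0 * y


def partial(func, args, index, wiggle=1e-6):
    """Estimate a partial derivative by wiggling one slider up and down."""
    up = list(args)
    down = list(args)
    up[index] += wiggle      # nudge only the chosen slider up...
    down[index] -= wiggle    # ...and down, leaving the others alone
    # Change on the dial divided by the total size of the wiggle.
    return (func(*up) - func(*down)) / (2.0 * wiggle)


setting = (1.0, 2.0)             # current slider positions
df_dx = partial(f, setting, 0)   # exact value: 2*x*y = 4
df_dy = partial(f, setting, 1)   # exact value: x**2 + 3 = 4
print(df_dx, df_dy)
```

Wiggling in both directions (a central difference) is a design choice made here because it tends to be more accurate than a one-sided wiggle of the same size.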

The gradient, then, is the ordered list of signed proportions by which you have to "wiggle" all the sliders so as to achieve the strongest possible, but still small, positive wiggle in the value on the dial. This is a vector, because you can think of vectors as ordered lists of quantities that we can add elementwise and multiply elementwise by a single number.

And of course, when I say "small" here I mean "ideally small" - i.e. "just on the cusp of being zero", which you can make formally rigorous in a number of ways, such as by using limits.
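To check the claim that the gradient points along the strongest small increase on the dial, one can compare many directions by brute force. This is a rough sketch under assumed choices: the function `f`, the helper `gradient`, the step length, and the one-degree resolution are all made up for illustration.

```python
import math


def f(x, y):
    # Hypothetical dial readout for two sliders (chosen only for illustration).
    return -(x - 1.0) ** 2 - 2.0 * (y + 0.5) ** 2


def gradient(func, point, wiggle=1e-6):
    """Central-difference estimate of each partial derivative, in order."""
    grad = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += wiggle
        down[i] -= wiggle
        grad.append((func(*up) - func(*down)) / (2.0 * wiggle))
    return grad


point = (0.0, 0.0)
grad = gradient(f, point)  # exact gradient at this point is (2, -2)

# Take a small step of fixed length in each whole-degree direction and
# record which direction raises the dial the most.
step = 1e-4
best_angle = max(
    range(360),
    key=lambda deg: f(point[0] + step * math.cos(math.radians(deg)),
                      point[1] + step * math.sin(math.radians(deg))),
)

# Direction of the estimated gradient, as an angle in [0, 360).
grad_angle = math.degrees(math.atan2(grad[1], grad[0])) % 360.0
print(best_angle, grad_angle)
```

The best direction found by brute force and the direction of the estimated gradient should agree to within the one-degree resolution of the search.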

Well, in one variable you need to solve $f'(x)=0$ and check whether the solution is a maximum or a minimum. This equation is the first-order (stationarity) condition, and it also holds in several variables, in the form: $$\nabla f(x)=0$$ where $\nabla f(x)=(\frac{\partial f(x)}{\partial x_{1}},\dots,\frac{\partial f(x)}{\partial x_{n}})$. Here, too, you should check whether the point is a minimum or not, and this is done by examining the Hessian of $f$. That is, if $\bar{x}$ is such that $\nabla f(\bar{x})=0$, then $\bar{x}$ is a (local) minimum if $Hf(\bar{x})$ has only positive eigenvalues, where $Hf$ is the matrix of second-order partial derivatives of $f$. Notice that we are assuming $\bar{x}$ belongs to the interior of the domain, exactly as for functions of one variable.
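As a small worked example (the function is chosen only for illustration and is not from the question), take $f(x,y)=x^2+xy+y^2$. Then

$$\nabla f(x,y)=(2x+y,\; x+2y),$$

which vanishes only at $\bar{x}=(0,0)$. The Hessian is the same at every point:

$$Hf=\begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix},\qquad \det(Hf-\lambda I)=(2-\lambda)^2-1=0 \;\Longrightarrow\; \lambda=1 \text{ or } \lambda=3.$$

Both eigenvalues are positive, so $(0,0)$ is a minimum. This agrees with completing the square, $f(x,y)=(x+\tfrac{y}{2})^2+\tfrac{3}{4}y^2\ge 0$.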

Consider a single-variable function $f(x)$ and suppose it has a maximum at a point $x_o$. We can locate it by noting that the derivative changes sign there: it is positive just to the left of $x_o$ and negative just to the right. So if we take a point $x>x_o$, then

$$ \left.\frac{df}{dx}\right|_{x>x_o} = \text{something negative}$$

Now, near the maximum the magnitude of the derivative shrinks as you get closer to $x_o$, so suppose you take a 'step' along the $x$ axis scaled by the derivative:

$$ \left.\frac{df}{dx}\right|_{x > x_o} \Delta x$$

Then you will end up walking toward the maximum: with $\Delta x>0$ the step is negative, so it moves you left, back toward $x_o$. Now let's say you are at a point $x<x_o$; then the first derivative is positive, the step moves you right, and you still end up walking toward the maximum. Moral of the story? If you walk around the input set keeping your steps scaled by the function's derivative, then you'll eventually hit a global maximum (or, stepping against the derivative, a minimum). [Edit: It may also turn out that you get stuck at a local maximum/minimum instead of the global one :(]
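The walk described above can be sketched in a few lines. This is only an illustration under assumed choices: the function `f`, its derivative `df`, the helper `climb`, the step size, and the number of iterations are all made up here.

```python
def f(x):
    # Hypothetical function with a single maximum at x_o = 3.
    return -(x - 3.0) ** 2 + 5.0


def df(x):
    # Its derivative: positive for x < 3, negative for x > 3.
    return -2.0 * (x - 3.0)


def climb(x, step=0.1, iterations=200):
    """Repeatedly take a step scaled by the derivative at the current point."""
    for _ in range(iterations):
        x = x + step * df(x)
    return x


from_left = climb(-4.0)    # starts with x < x_o, walks right
from_right = climb(10.0)   # starts with x > x_o, walks left
print(from_left, from_right)
```

Both starting points end up close to $x_o = 3$. Note that the step size matters: with a step that is too large, the walk can overshoot the maximum instead of settling on it.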

Now consider a multivariable function $f(x,y)$. By the logic above, if it has a local max at a point $(x_o,y_o)$ and you take an $x>x_o$ (holding $y=y_o$), then

$$ \frac{\partial f}{\partial x} = \text{something negative}$$

A similar argument to the single-variable case can be applied here, and likewise for $y$. Ultimately this leads us to the idea that the vector given as:

$$ \nabla f = \left\langle \frac{\partial f}{\partial x} , \frac{\partial f}{\partial y} \right\rangle$$

tells us how to move in the input plane so as to increase our function as quickly as possible.

So, say you are at a point $\langle x_o,y_o\rangle$; then the point where you should move next to increase the function is:

$$ \langle x,y\rangle = \langle x_o,y_o\rangle + \left\langle \left.\frac{\partial f}{\partial x}\right|_{(x_o,y_o)} \Delta x,\ \left.\frac{\partial f}{\partial y}\right|_{(x_o,y_o)} \Delta y\right\rangle$$

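To see where you should move after that, repeat the same step from the new point:

$$ \langle x',y'\rangle = \langle x,y\rangle + \left\langle \left.\frac{\partial f}{\partial x}\right|_{(x,y)} \Delta x,\ \left.\frac{\partial f}{\partial y}\right|_{(x,y)} \Delta y\right\rangle$$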

And using that gradient vector as a compass to move, you'll finally reach some kind of extremum point in the input plane.
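Here is a minimal sketch of that compass-following walk in two variables. The function `f`, its gradient `grad_f`, the helper `ascend`, and the single shared step size (playing the role of $\Delta x = \Delta y$) are assumptions made only for illustration.

```python
def f(x, y):
    # Hypothetical function with a single maximum at (1, -2).
    return -(x - 1.0) ** 2 - 2.0 * (y + 2.0) ** 2


def grad_f(x, y):
    # The two partial derivatives of f, collected into the gradient vector.
    return (-2.0 * (x - 1.0), -4.0 * (y + 2.0))


def ascend(x, y, step=0.1, iterations=200):
    """Move repeatedly in the direction of the gradient, scaled by `step`."""
    for _ in range(iterations):
        gx, gy = grad_f(x, y)
        x, y = x + step * gx, y + step * gy
    return x, y


x_end, y_end = ascend(5.0, 5.0)
print(x_end, y_end)
```

Starting from $(5, 5)$, the walk settles near $(1, -2)$, where both partial derivatives vanish. For a function with several peaks, the same procedure may stop at whichever local maximum is uphill from the starting point.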