Why do we use gradient descent in linear regression?

The example you gave is one-dimensional, which is rarely the case in machine learning, where you usually have multiple input features. In the multivariate case, the closed-form approach requires solving the normal equations, i.e. inverting the matrix X^T X, which can be computationally expensive, and the matrix itself may be ill-conditioned, making the result numerically unstable.
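To make this concrete, here is a minimal NumPy sketch of the normal-equations solution; the synthetic data and variable names are my own, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # 1000 samples, 5 features
w_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=1000)   # targets with a little noise

# Closed-form solution via the normal equations: (X^T X) w = X^T y.
# Solving the linear system is preferable to forming an explicit inverse,
# but X^T X can still be ill-conditioned (e.g. with highly correlated
# features), in which case the solution is numerically unreliable.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to w_true
```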

Usually the problem is formulated as a least squares problem, which is slightly easier. There are standard least squares solvers that could be used instead of gradient descent (and often are). However, if the number of data points is very high, a standard least squares solver might be too expensive, while (stochastic) gradient descent can give you a solution that is just as good in terms of test-set error as a more precise one, with a run time that is orders of magnitude smaller (see this great chapter by Léon Bottou).
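As a rough illustration of that trade-off, the sketch below compares an exact solver with a few epochs of SGD. It assumes a recent scikit-learn (>= 1.2 for `penalty=None`); the solver choice, hyperparameters, and data are mine, not from the answer.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
n, d = 100_000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Exact least squares via a standard (SVD-based) solver.
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

# Stochastic gradient descent: a handful of passes over the data,
# trading a little precision per step for a much lower cost per step.
sgd = SGDRegressor(loss="squared_error", penalty=None, max_iter=5, tol=None)
sgd.fit(X, y)

# The coefficients agree closely, though not to machine precision;
# for predictive (test-set) error the difference is usually negligible.
print(np.max(np.abs(w_exact - sgd.coef_)))
```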

If your problem is small enough that it can be efficiently solved by an off-the-shelf least squares solver, you should probably not use gradient descent.