Implementing gradient descent for multiple variables in Octave using "sum"

The general rule of thumb is as follows: if you encounter something of the form

SUM_i f(x_i, y_i, ...) g(a_i, b_i, ...)

then you can easily vectorize it (which is what the code above does) as

f(x, y, ...)' * g(a, b, ...)

This is just the ordinary dot product, which in mathematics (in a finite-dimensional Euclidean space) looks like

<A, B> = SUM_i A_i B_i = A'B
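As a quick sanity check of this identity, here is a minimal Octave sketch. The vectors `f_vals` and `g_vals` are made-up illustration data (they stand for the values f(x_i, ...) and g(a_i, ...)), not anything from the question:

```octave
% Hypothetical data: f_vals(i) plays the role of f(x_i, ...),
% g_vals(i) plays the role of g(a_i, ...).
f_vals = [1; 2; 3];
g_vals = [4; 5; 6];

% Explicit loop: SUM_i f_i * g_i
loop_sum = 0;
for i = 1:numel(f_vals)
  loop_sum = loop_sum + f_vals(i) * g_vals(i);
end

% Vectorized form: row vector times column vector
vec_sum = f_vals' * g_vals;

% The same sum written with the built-in sum()
sum_form = sum(f_vals .* g_vals);

disp([loop_sum, vec_sum, sum_form])   % all three are 32
```

All three forms compute the same number; the `'` version is usually the shortest to write.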

thus

(X * theta - y)' * X

is just

<X * theta - y, X> = <H_theta(X) - y, X> = SUM_i (H_theta(X_i) - y_i) X_i

As you can see, this works both ways, since it is just the mathematical definition of the dot product.
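To make this concrete for gradient descent, here is a small sketch comparing the explicit sum with the vectorized expression. The data `X`, `y`, the starting `theta`, and the learning rate `alpha` are assumptions chosen only for illustration, and the update line assumes the usual linear-regression rule with a 1/m factor:

```octave
% Hypothetical data: m = 3 examples, first column of X is the intercept term.
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0; 0];
m = size(X, 1);

err = X * theta - y;            % m x 1 vector of H_theta(X_i) - y_i

% Explicit sum: SUM_i (H_theta(X_i) - y_i) * X_i, one row of X at a time
grad_loop = zeros(1, size(X, 2));
for i = 1:m
  grad_loop = grad_loop + err(i) * X(i, :);
end

% Vectorized: one dot product per column of X, giving a 1 x n row vector
grad_vec = err' * X;

disp(grad_loop)                 % -6  -14
disp(grad_vec)                  % -6  -14

% One gradient descent step (alpha is an assumed learning rate)
alpha = 0.1;
theta = theta - (alpha / m) * grad_vec';
```

Note that `err' * X` is a row vector, so it is transposed before being subtracted from the column vector `theta`; writing `X' * err` instead gives the column vector directly.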