Vectorizing a gradient descent algorithm

Your vectorization is correct. I tried both versions of your code and they gave me the same theta. Just remember not to use the already-updated theta within a single iteration of your second implementation — all parameters must be updated simultaneously from the old values.

This also works, though it is less compact than your second implementation:

Error = X * theta - y;            % m x 1 vector of residuals
for i = 1:2                       % one pass per feature column
    S(i) = sum(Error .* X(:,i));  % summed error term for feature i
end

theta = theta - alpha * (1/m) * S';   % S is 1 x 2, so transpose it

For the vectorized version, try the following (two steps, to make the simultaneous update explicit):

 gradient = (alpha/m) * X' * (X*theta - y);   % n x 1 gradient step for all parameters
 theta = theta - gradient;                    % simultaneous update
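If it helps to check the equivalence outside Octave, here is a rough NumPy translation of both updates on a small made-up dataset (the data, alpha, and variable names are just placeholders for illustration, not from the question):

```python
import numpy as np

# Hypothetical tiny dataset: m examples, intercept column plus one feature.
m = 5
X = np.column_stack([np.ones(m), np.arange(1.0, m + 1)])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
theta = np.zeros(2)
alpha = 0.01

# Loop version: accumulate the summed error term per feature column.
error = X @ theta - y
S = np.array([np.sum(error * X[:, i]) for i in range(2)])
theta_loop = theta - alpha * (1 / m) * S

# Vectorized version: X' * (X*theta - y) computes the same sums in one shot.
gradient = (alpha / m) * X.T @ (X @ theta - y)
theta_vec = theta - gradient

print(np.allclose(theta_loop, theta_vec))  # the two updates agree
```

Both forms compute the same gradient; the matrix product simply folds the per-feature loop into one operation.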