How is Grad defined for arrays, particularly in non-Cartesian coordinates?

Indeed, Grad does compute the covariant derivative. This can be seen from the following example given in the documentation:

In a curvilinear coordinate system, a vector with constant components may have a nonzero gradient:

Grad[{1, 1, 1}, {r, θ, ϕ}, "Spherical"]
(* {{0, -1/r, -1/r},
    {0, 1/r, -Cot[θ]/r},
    {0, 0, Csc[θ] (Cos[θ] + Sin[θ])/r}} *)
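
Conversely, a field that actually is constant, such as the Cartesian unit vector $\hat{x}$, has vanishing gradient even though its spherical components vary from point to point. A quick check, using the built-in TransformedField:

xhat = TransformedField["Cartesian" -> "Spherical", {1, 0, 0}, {x, y, z} -> {r, θ, ϕ}]
(* {Cos[ϕ] Sin[θ], Cos[θ] Cos[ϕ], -Sin[ϕ]} *)

Grad[xhat, {r, θ, ϕ}, "Spherical"] // Simplify
(* {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}} *)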

Note that the result (and input) is always to be understood with respect to a particular basis. In the example above, it is my understanding that the input {1, 1, 1} represents the vector

$$ \mathbf{v} = v^r \mathbf{e}_{r} + v^\theta \mathbf{e}_{\theta} + v^\phi \mathbf{e}_{\phi} $$

with $v^i = 1$ for $i = r, \theta, \phi$, where the $\mathbf{e}_i$ are orthonormal. (Note that it is commonplace in differential geometry to work with unnormalized coordinate bases instead.)
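
The reason a constant component array can have a nonzero gradient is that the frame itself rotates from point to point. Expanding $\mathbf{v}$ in Cartesian components makes this explicit (the frame matrix below is the standard orthonormal spherical frame, written out by hand):

(* rows: Cartesian components of e_r, e_θ, e_ϕ *)
frame = {{Sin[θ] Cos[ϕ], Sin[θ] Sin[ϕ], Cos[θ]},
         {Cos[θ] Cos[ϕ], Cos[θ] Sin[ϕ], -Sin[θ]},
         {-Sin[ϕ], Cos[ϕ], 0}};
{1, 1, 1}.frame
(* {Cos[θ] Cos[ϕ] + Cos[ϕ] Sin[θ] - Sin[ϕ], Cos[ϕ] + Cos[θ] Sin[ϕ] + Sin[θ] Sin[ϕ], Cos[θ] - Sin[θ]} *)

The result is manifestly position-dependent, so the covariant derivative cannot vanish.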

As for higher-rank tensors, I think Mathematica treats all components as contravariant (upper indices); e.g., an array of rank two is understood as

$$ \mathrm{Grad}_k A^{ij} = \partial_k A^{ij} + \Gamma^{i}_{kl} A^{lj} + \Gamma^{j}_{kl} A^{il}$$ where $\Gamma^{i}_{jk}$ are the coefficients of the Christoffel connection (with respect to the chosen basis).
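
If that is right, Grad should obey the Leibniz rule in each slot. A quick consistency check (u and v are arbitrary symbolic vector fields; the Transpose accounts for Mathematica appending the new derivative slot last):

u = Array[p[#][r, θ, ϕ] &, 3];
v = Array[q[#][r, θ, ϕ] &, 3];
Grad[TensorProduct[u, v], {r, θ, ϕ}, "Spherical"] ==
   Transpose[TensorProduct[Grad[u, {r, θ, ϕ}, "Spherical"], v], {1, 3, 2}] +
    TensorProduct[u, Grad[v, {r, θ, ϕ}, "Spherical"]] // Simplify
(* True *)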

Update: An example

Note furthermore that the point of the covariant derivative is to obtain an object which is a tensor (and therefore transforms as a tensor). As an example, consider the following:

vecGradBuiltin = Grad[{Subscript[f, 1][r, θ], Subscript[f, 2][r, θ]}, {r, θ}, "Polar"]

which is the covariant derivative of the vector with components $(f_1, f_2)$ with respect to the orthonormal basis. With respect to the coordinate basis $(\partial_r, \partial_\theta)$ the same vector has components $(f_1, \frac{1}{r} f_2)$, since the two bases are related by

$$ \left( \begin{matrix} \mathbf{e}_r\\ \mathbf{e}_\theta \end{matrix} \right) = \left( \begin{matrix} 1 & 0 \\ 0 & \frac{1}{r} \end{matrix} \right) \left( \begin{matrix} \partial_r\\ \partial_\theta \end{matrix} \right) $$

and, of course, the same vector can be expanded in two different bases:

$$ \mathbf{v} = v^a \mathbf{e}_a = v'{}^a \mathbf{e}'_a $$

Using the code from (224280) to compute the Christoffel symbols, a possible implementation of the vector gradient is:

ChristoffelSymbol[g_, xx_] := Block[{n = Length[xx], ig = Inverse[g], res},
  (* Γ^i_{jk} = (1/2) g^{is} (∂_j g_{sk} + ∂_k g_{js} - ∂_s g_{jk}) *)
  res = Table[
    (1/2) Sum[
      ig[[i, s]] (-D[g[[j, k]], xx[[s]]] + D[g[[j, s]], xx[[k]]] + D[g[[s, k]], xx[[j]]]),
      {s, 1, n}],
    {i, 1, n}, {j, 1, n}, {k, 1, n}];
  Simplify[res]]
vectorGrad[vec_, g_, coord_] := 
 With[{n = Length[coord], Γ = ChristoffelSymbol[g, coord]},
  (* (∇v)^b_a = ∂_a v^b + Γ^b_{ac} v^c; first slot contravariant, second covariant *)
  Table[
   D[vec[[b]], coord[[a]]] + Sum[Γ[[b, a, c]] vec[[c]], {c, 1, n}],
   {b, 1, n}, {a, 1, n}]]
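
As a quick sanity check, for the polar metric ChristoffelSymbol reproduces the familiar symbols $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$:

ChristoffelSymbol[DiagonalMatrix[{1, r^2}], {r, θ}]
(* {{{0, 0}, {0, -r}}, {{0, 1/r}, {1/r, 0}}} *)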

With this you can compute the vector gradient in the coordinate basis:

g = DiagonalMatrix[{1, r^2}];              (* metric in the coordinate basis *)
coord = {r, θ};
j = DiagonalMatrix[{1, 1/r}];              (* basis-change matrix from above *)
vec = Array[Subscript[f, #][r, θ] &, 2];   (* components w.r.t. the orthonormal basis *)
vecGradHomebrew = vectorGrad[j.vec, g, coord]

Now you have the components of the vector gradient both with respect to the orthonormal basis (vecGradBuiltin) and with respect to the coordinate basis (vecGradHomebrew).

As mentioned before, the crucial point is that the vector gradient is a tensor, so its components transform as such. Since we know how the two bases are related, one can verify this explicitly. (Note that the vector gradient has one contravariant and one covariant index; the way it is defined here, the first index is contravariant and the second covariant.)

Inverse[j].vecGradHomebrew.j == vecGradBuiltin // Simplify
(* True *)
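
As a further consistency check, the trace of the vector gradient contracts the contravariant index with the covariant one and is therefore a basis-independent scalar, namely the divergence:

Tr[vecGradHomebrew] == Div[vec, coord, "Polar"] // Simplify
(* True *)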

Update: Spherical example

It is straightforward to do this in three dimensions, e.g. with the spherical coordinate system:

g = DiagonalMatrix[{1, r^2, r^2 Sin[θ]^2}];
coord = {r, θ, ϕ};
j = DiagonalMatrix[{1, 1/r, 1/(r Sin[θ])}];
vec = Array[Subscript[f, #][r, θ, ϕ] &, 3];
vecGradHomebrew = vectorGrad[j.vec, g, coord]
Inverse[j].vecGradHomebrew.j == Grad[vec, coord, "Spherical"] // Simplify
(* True *)
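
The rank-2 formula stated at the beginning of this answer can be implemented along the same lines. The following tensorGrad is a sketch (it is not part of the code from (224280)); the final check should return True if Grad indeed treats both slots of a rank-2 array as contravariant:

tensorGrad[t_, g_, coord_] := 
 With[{n = Length[coord], Γ = ChristoffelSymbol[g, coord]},
  (* ∇_k A^{ij} = ∂_k A^{ij} + Γ^i_{kl} A^{lj} + Γ^j_{kl} A^{il} *)
  Table[
   D[t[[i, j]], coord[[k]]] + 
    Sum[Γ[[i, k, l]] t[[l, j]] + Γ[[j, k, l]] t[[i, l]], {l, 1, n}],
   {i, 1, n}, {j, 1, n}, {k, 1, n}]]

tens = Array[a[#1, #2][r, θ, ϕ] &, {3, 3}];         (* components w.r.t. the orthonormal basis *)
tensGradHomebrew = tensorGrad[j.tens.j, g, coord];  (* one factor of j per contravariant slot *)
(* transform back: Inverse[j] on the two contravariant slots, j on the covariant one *)
Table[Sum[Inverse[j][[i, a1]] Inverse[j][[k, b1]] j[[m, c1]] tensGradHomebrew[[a1, b1, c1]],
    {a1, 3}, {b1, 3}, {c1, 3}], {i, 3}, {k, 3}, {m, 3}] ==
  Grad[tens, coord, "Spherical"] // Simplify
(* True *)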

Natas' answer is almost correct, and I gave it an upvote. However, technically what Grad computes is the raised covariant derivative $\nabla^b T^{cd\ldots} = g^{ba}\left(\partial_a T^{cd\ldots} + \Gamma^{c}_{ae} T^{ed\ldots} + \cdots\right)$. The beauty of orthonormal bases, and the reason they're the only ones exposed in System` functionality, is that components are independent of raising and lowering (in Euclidean-signature metrics). However, if we dig into the lower-level package and use the coordinate basis instead of the orthonormal one, you can see the difference:

Grad[
    SymbolicTensors`Tensor[
        {fr[r,θ],fθ[r,θ]},
        {SymbolicTensors`TangentBasis[{r,θ}]}
    ],
    {r,θ},
    "Polar"
]


(* SymbolicTensors`Tensor[
       {
           {Derivative[1, 0][fr][r, θ], ((-r)*fθ[r, θ] + Derivative[0, 1][fr][r, θ])/r^2},
           {fθ[r, θ]/r + Derivative[1, 0][fθ][r, θ], (fr[r, θ]/r + Derivative[0, 1][fθ][r, θ])/r^2}
       },
       {SymbolicTensors`TangentBasis[{r, θ}], SymbolicTensors`TangentBasis[{r, θ}]}
   ] *)

If Grad were truly the (unraised) covariant derivative, the new index would be of type SymbolicTensors`CotangentBasis[{r, θ}] instead.
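
To recover the unraised covariant derivative from this output, one can lower the new index with the coordinate-basis metric $g = \operatorname{diag}(1, r^2)$. A minimal sketch (the array below is the tensor data from the output above, copied by hand):

(* raised[[b, a]] = ∇^a v^b, read off from the Tensor output above *)
raised = {{D[fr[r, θ], r], (-r fθ[r, θ] + D[fr[r, θ], θ])/r^2},
          {fθ[r, θ]/r + D[fθ[r, θ], r], (fr[r, θ]/r + D[fθ[r, θ], θ])/r^2}};
g = DiagonalMatrix[{1, r^2}];
(* lower the derivative slot: ∇_a v^b = g_{ac} ∇^c v^b *)
Simplify[raised.g]
(* {{Derivative[1, 0][fr][r, θ], -r fθ[r, θ] + Derivative[0, 1][fr][r, θ]},
    {fθ[r, θ]/r + Derivative[1, 0][fθ][r, θ], fr[r, θ]/r + Derivative[0, 1][fθ][r, θ]}} *)

This agrees with Natas' vectorGrad[{fr[r, θ], fθ[r, θ]}, g, coord], i.e. with the (1,1)-tensor $\nabla_a v^b$ in the coordinate basis.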