Multivariate Bernstein polynomials for approximation of derivatives.

There are indeed several references; in any case, it is a consequence of the univariate case. Let me explain the way I see it (for which I do not have a reference).

For the univariate case, consider the difference operator $D_n:C^0([0,1])\to C^0([0,1])$ defined as $$D_n f(x):= \frac{f\left(\big(1-\frac{1}{n}\big)x + \frac{1}{n} \right) - f\left(\big(1-\frac{1}{n}\big)x \right)}{\frac{1}{n}}\, . $$ So this is just the usual discrete difference quotient of $f$, except that we first apply the affine contraction $[0,1]\ni x\mapsto \big(1-\frac{1}{n}\big)x\in \big[0, 1- \frac{1}{n}\big]$, so that the translation by $\frac{1}{n}$ is well defined on $C^0([0,1])$. By the mean value theorem, for any $f\in C^1([0,1])$ and $x\in[0,1]$ we have $D_nf(x)=f'(\xi)$ for some $\xi$ with $|x-\xi| < \frac{1}{n}$, so $D_nf\to f'$ uniformly (precisely, if $\omega$ is a modulus of continuity of $f'$, then $\|D _ nf -f'\| _\infty\le\omega(\frac{1}{n})$).

The reason why this approximation of the derivative $Df:=f'$ is relevant to the Bernstein operators is that, as is easy to check, $$D B_n = B_{n-1}D_n\, $$ (indeed $\big(1-\frac{1}{n}\big)\frac{k}{n-1}=\frac{k}{n}$, so both sides equal $\sum_{k=0}^{n-1} n\big(f(\frac{k+1}{n})-f(\frac{k}{n})\big)\binom{n-1}{k}x^k(1-x)^{n-1-k}$). This implies that for any $f\in C^1([0,1])$ the Bernstein polynomials of $f$ converge to $f$ in $C^1$: since $B_{n-1}$ is positive with $B_{n-1}1=1$, it has norm $1$ on $C^0([0,1])$, hence $$\| (B _ n f)'- f' \| _ \infty = \|B _ {n-1} (D _ n f-f') + B _ {n-1}f' - f'\| _ \infty \le \|D_ n f-f' \| _ \infty +\|B _{n-1}f' - f'\| _ \infty=o(1)\, .$$ More generally, iterating the identity gives $D^rB_n=B_{n-r}D_{n-r+1}\cdots D_n$, and $B_n$ converges strongly to the identity on $C^r([0,1])$.
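To make the identity $DB_n=B_{n-1}D_n$ concrete, here is a small numerical sketch, not part of the argument above. The helper names `bernstein`, `bernstein_derivative`, `D` and `c1_error` are hypothetical. It computes $(B_nf)'$ independently by differentiating each basis polynomial, and uses exact rational arithmetic (`fractions.Fraction`) so that both sides can be compared with `==` at sample points. That is a spot check, of course, not a proof. It also prints the grid estimate of $\|(B_nf)'-f'\|_\infty$ for the sample choice $f=\sin$.

```python
from fractions import Fraction
from math import comb, cos, sin


def bernstein(f, n, x):
    """Evaluate the univariate Bernstein polynomial B_n f at x (nodes k/n kept exact)."""
    return sum(f(Fraction(k, n)) * comb(n, k) * x**k * (1 - x) ** (n - k)
               for k in range(n + 1))


def bernstein_derivative(f, n, x):
    """Evaluate (B_n f)'(x) by differentiating each basis term x^k (1-x)^(n-k) directly."""
    total = 0
    for k in range(n + 1):
        d_basis = 0
        if k > 0:
            d_basis += k * x ** (k - 1) * (1 - x) ** (n - k)
        if k < n:
            d_basis -= (n - k) * x**k * (1 - x) ** (n - k - 1)
        total += f(Fraction(k, n)) * comb(n, k) * d_basis
    return total


def D(f, n):
    """Return D_n f, i.e. x -> (f((1 - 1/n) x + 1/n) - f((1 - 1/n) x)) / (1/n)."""
    h = Fraction(1, n)

    def Dn_f(x):
        y = (1 - h) * x  # affine contraction of [0, 1] onto [0, 1 - 1/n]
        return (f(y + h) - f(y)) / h

    return Dn_f


def c1_error(n, grid=200):
    """Max of |(B_n sin)' - cos| over a uniform grid of [0, 1] (a proxy for the sup norm)."""
    return max(abs(bernstein_derivative(sin, n, j / grid) - cos(j / grid))
               for j in range(grid + 1))


if __name__ == "__main__":
    def f(t):
        return 1 / (1 + t * t)  # rational, so Fraction inputs give exact Fraction outputs

    n = 6
    # D B_n = B_{n-1} D_n is an exact identity, so both sides compare equal as Fractions.
    print(all(bernstein_derivative(f, n, Fraction(j, 10)) == bernstein(D(f, n), n - 1, Fraction(j, 10))
              for j in range(11)))
    # The C^1 error for this smooth sample function should shrink as n grows.
    for n in (10, 20, 40, 80):
        print(n, c1_error(n))
```

The rational test function is only there to keep the comparison exact. Any $f$ satisfies the identity, but with floating-point values the two sides would agree only up to rounding.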

The analogous statement for partial derivatives of functions on $[0,1]^n$ follows along the same lines. It is only convenient to consider, more generally than the polynomial you wrote, the multivariate Bernstein polynomials $$B_m f(x):= \sum _ {{0\le k_i \le m_i}\atop 1\le i\le n}f\Big(\frac{k _ 1}{m _ 1},\dots,\frac{k_n}{m _ n}\Big) \prod_{1\le i\le n}\binom{m_i}{k_i} x_i ^ {k _ i}(1-x _ i)^{m _ i - k _ i}\, , $$ where now $m:=(m_1,\dots,m_n)$ is a multi-index (the one you wrote corresponds to a constant multi-index $m_1=\dots=m_n$; the following computation is good motivation to also consider Bernstein polynomials with a different discretization for each variable). As observed in the link, this may be thought of as a (commuting) composition of the univariate Bernstein operators $B_{m_i}$, each acting on $f$ as a function of the $i$-th variable. Correspondingly, for any multi-index $\alpha\in \mathbb{N}^n$ and $m\ge \alpha$ (componentwise) $$\partial ^\alpha B _ m = B _ { m - \alpha } \partial^\alpha _m \, ,$$ where $\partial^\alpha _m $ is the partial difference operator defined as the (commuting) composition of the operators $D^{\alpha_i}_{m_i}:=D_{m_i-\alpha_i+1}\circ\cdots\circ D_{m_i}$ (the iterate produced by the univariate identity), each acting on $f$ as a function of the $i$-th variable. As a consequence, for any $f\in C^r\big([0,1]^n\big)$ and $|\alpha|\le r$ we have $\partial^\alpha B_m f\to \partial^\alpha f$ uniformly as $\min_{1\le i \le n} m _ i \to\infty$.
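The multivariate identity $\partial^\alpha B_m=B_{m-\alpha}\partial^\alpha_m$ can be spot-checked the same way. The following is a sketch under a few assumptions of mine. The helper names (`basis`, `derive`, `d_bernstein`, `D_partial`, `partial_difference`) are hypothetical. In each variable, $\partial^\alpha_m$ is read as applying $D_{m_i}$ first, then $D_{m_i-1}$, and so on down to $D_{m_i-\alpha_i+1}$, which is what iterating $DB_n=B_{n-1}D_n$ produces. The example also requires every entry of $m-\alpha$ to be at least $1$, to avoid the degenerate node $0/0$. The left side differentiates the tensor-product basis polynomials exactly, using coefficient lists of `Fraction`s. The right side only evaluates $f$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def poly_mul(p, q):
    """Product of two polynomials stored as coefficient lists, lowest degree first."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def basis(m, k):
    """Coefficients of the Bernstein basis polynomial comb(m, k) x^k (1 - x)^(m - k)."""
    p = [Fraction(comb(m, k))]
    for _ in range(k):
        p = poly_mul(p, [Fraction(0), Fraction(1)])  # times x
    for _ in range(m - k):
        p = poly_mul(p, [Fraction(1), Fraction(-1)])  # times (1 - x)
    return p


def derive(p, order):
    """Coefficients of the order-th derivative of the polynomial p."""
    for _ in range(order):
        p = [i * c for i, c in enumerate(p)][1:] or [Fraction(0)]
    return p


def evaluate(p, x):
    return sum(c * x**i for i, c in enumerate(p))


def d_bernstein(f, m, alpha, x):
    """Evaluate the mixed partial ∂^alpha (B_m f) at the point x; assumes every m_i >= 1."""
    total = Fraction(0)
    for k in product(*(range(mi + 1) for mi in m)):
        node = tuple(Fraction(ki, mi) for ki, mi in zip(k, m))
        weight = Fraction(1)
        for mi, ki, ai, xi in zip(m, k, alpha, x):
            weight *= evaluate(derive(basis(mi, ki), ai), xi)  # tensor-product basis
        total += f(node) * weight
    return total


def D_partial(f, n, i):
    """The univariate D_n applied to f as a function of its i-th variable only."""
    h = Fraction(1, n)

    def g(x):
        lo = list(x)
        lo[i] = (1 - h) * x[i]  # contraction in the i-th variable
        hi = list(lo)
        hi[i] = lo[i] + h
        return (f(tuple(hi)) - f(tuple(lo))) / h

    return g


def partial_difference(f, m, alpha):
    """In variable i, apply D_{m_i}, then D_{m_i - 1}, ..., down to D_{m_i - alpha_i + 1}."""
    g = f
    for i, (mi, ai) in enumerate(zip(m, alpha)):
        for j in range(ai):
            g = D_partial(g, mi - j, i)
    return g


if __name__ == "__main__":
    def f(x):
        return 1 / (1 + x[0] + 2 * x[1] ** 2)  # sample rational function of two variables

    m, alpha = (4, 3), (2, 1)
    m_minus_alpha = tuple(mi - ai for mi, ai in zip(m, alpha))
    x = (Fraction(1, 3), Fraction(5, 7))
    lhs = d_bernstein(f, m, alpha, x)
    rhs = d_bernstein(partial_difference(f, m, alpha), m_minus_alpha, (0, 0), x)
    print(lhs == rhs)
```

Because the operators in different variables act independently, the check also illustrates why the multi-index $m$ may have unequal entries. Each coordinate carries its own discretization $m_i$ and its own chain of difference operators.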