Proving Grammar Correctness

You have to prove two things:

  1. For each $w\in\{a,b\}^*$, if $G\stackrel{*}\Rightarrow w$, then $|w|_a=|w|_b$.

  2. For each $w\in\{a,b\}^*$, if $|w|_a=|w|_b$, then $G\stackrel{*}\Rightarrow w$.

Sometimes a proof by induction is easier if you prove something stronger than the desired result. I would prove (1) by proving the following stronger

Claim: For $u\in\{S,A,B,a,b\}^*$ let $\alpha(u)=|u|_a+|u|_A$ and $\beta(u)=|u|_b+|u|_B$. Let $$S=u_0 \Rightarrow u_1\Rightarrow u_2\Rightarrow\dots\Rightarrow u_n$$ be a derivation in $G$, where $u_k\in\{S,A,B,a,b\}^*$ for $k=0,\dots,n$. Then $\alpha(u_k)=\beta(u_k)$ for $k=0,\dots,n$. In particular, if $u_n\in\{a,b\}^*$, then $|w|_A=|w|_B=0$, so $$|u_n|_a=\alpha(u_n)=\beta(u_n)=|u_n|_b\;,$$ and therefore $u_n\in L$.

I would prove this by induction on $n$, the length of the derivation:

  1. The only derivation of length $1$ is the one that you already have, $S\Rightarrow\epsilon$, and $\alpha(S)=\beta(S)=\alpha(\epsilon)=\beta(\epsilon)=0$, so we get the induction started just fine.

  2. Now suppose that the Claim is true for all derivations of length at most $n$, and let $$S=u_0 \Rightarrow u_1\Rightarrow u_2\Rightarrow\dots\Rightarrow u_n\Rightarrow u_{n+1}$$ be a derivation of length $n+1$. Then $S=u_0 \Rightarrow u_1\Rightarrow u_2\Rightarrow\dots\Rightarrow u_n$ is a derivation of length $n$, so by the induction hypothesis $\alpha(u_k)=\beta(u_k)$ for $k=0,\dots,n$. To complete the induction step you must show that $\alpha(u_{n+1})=\beta(u_{n+1})$. You do this by investigating how each of the productions of $G$ changes $\alpha(u_n)$ and $\beta(u_n)$.
    Suppose, for instance, that the derivation $u_n\Rightarrow u_{n+1}$ uses the production $A\to bAA$.
    Then $|u_{n+1}|_a=|u_n|_a$, $|u_{n+1}|_A=|u_n|_A+1$, $|u_{n+1}|_b=|u_n|_b+1$, and $|u_{n+1}|_B=|u_n|_B$, so $\alpha(u_{n+1})=\alpha(u_n)+1=\beta(u_n)+1=\beta(u_{n+1})$, as desired.
    I'll leave the other six productions to you.

Added: The proof of (2) - that every word in L has a derivation - is by induction on the length of $w$, which of course must be even if $|w|_a=|w|_b$. You got this started fine: if $|w|=0$, then $w=\epsilon$, and $G\stackrel{*}\Rightarrow\epsilon$.

Now assume that if $|w|<2n$ and $|w|_a=|w|_b$, then $G\stackrel{*}\Rightarrow w$; that's your induction hypothesis. Let $w\in\{a,b\}$ be such that $|w|=2n$ and $|w|_a=|w|_b=n$; you need to prove that $G\stackrel{*}\Rightarrow w$.

There are two cases. The simpler one is when $w$ is the concatenation of two shorter words belonging to $L$; in that case we can simply 'paste together' the derivations of the shorter words. Here's a way to make that precise.

Let $w=s_1s_2\dots s_{2n}$, where each $s_k\in\{a,b\}$. For $k=1,\dots,2n$ let $$m(k)=|s_1\dots s_k|_a-|s_1\dots s_k|_b\;,$$ the excess of $a$'s over $b$'s in the first $k$ letters of $w$. Suppose that $m(k)=0$ for some $k<2n$; then $k$ is even, and the words $u=s_1\dots s_k$ and $v=s_{k+1}\dots s_{2n}$ belong to $L$. (Why?) By the induction hypothesis, therefore, there are derivations $$S=u_0 \Rightarrow u_1\Rightarrow \dots\Rightarrow u_{p-1}\Rightarrow u_p=u\tag{1}$$ and $$S=v_0 \Rightarrow v_1\Rightarrow\dots\Rightarrow v_{q-1}\Rightarrow v_q=v\;.\tag{2}$$ The only terminal production is $S\to\epsilon$, so it must be the case that $u_{p-1}=xSy$, where $x,y\in\{a,b\}^*$ are such that $xy=u$. If $x=u$ and $y=\epsilon$, we can simply paste $(1)$ and $(2)$ together: $$S=u_0 \Rightarrow u_1\Rightarrow \dots\Rightarrow u_{p-1}=uS=uv_0\Rightarrow uv_1\Rightarrow\dots\Rightarrow uv_{q-1}\Rightarrow uv_q=uv=w\;,$$ and therefore $G\stackrel{*}\Rightarrow w$, as desired.

If $y\ne\epsilon$, I claim that the derivation of $w$ can be rearranged so that $u_{p-1}=uS$, and then we can paste the rearranged derivation together with $(2)$ to see that $G\stackrel{*}\Rightarrow w$. Every production except $S\to\epsilon$ leaves a non-terminal symbol on the righthand end of the word, so there must be some $k<p$ such that $u_k=xS$ and $u_{k+1}=x$ for some $x\in\{S,A,B,a,b\}^*$. Then $x=u_{k+1}\stackrel{*}\Rightarrow w$ is a legitimate derivation in $G$, so $S\stackrel{*}\Rightarrow xS\stackrel{*}\Rightarrow wS\Rightarrow w$ is a legitimate derivation of $w$ whose final step allows us to paste it together with $(2)$.

Note that $m(k)$ and $m(k+1)$ always differ by exactly $1$, so if $m(k)$ ever changes sign as $k$ runs from $1$ to $2n$, there must be some $k<2n$ such that $m(k)=0$. Now suppose that $w$ is not the concatenation of two shorter words of $L$; that means that $m(k)\ne 0$ for $1\le k<2n$, so $m(k)$ does not change sign as $k$ runs from $1$ to $2n$. If $s_1=a$, then $m(1)=1$, and $m(k)>0$ for $1\le k<2n$: every proper initial segment of $w$ has more $a$'s than $b$'s. But we know that $m(2n)=0$, since $w\in L$, so it must be the case that $m(2n-1)=1$, and $s_{2n}=b$. Similarly, if $s_1=b$, then $m(1)=-1$, $m(k)<0$ for $1\le k<2n$, every proper initial segment of $w$ has more $b$'s than $a$'s, and $s_{2n}=a$.

In the first case $w=aub$ for some $u\in\{a,b\}^*$, and in the second case $w=bua$; in either case it's clear that $u\in L$. (Why?) Moreover, $|u|<2n$, so by the induction hypothesis $G\stackrel{*}\Rightarrow u$. Can you see how to modify the derivation of $u$ to get a derivation of $w$, and thereby complete the induction step?


To show that each word that can be produced by the grammar $G$ contains an equal number of 'a' and 'b' prove by induction that at every step the number of 'a' plus the number of 'A' equals the number of 'b' plus the number of 'B'. Note that the initial word $S$ has this property and that it is preserved by any of the production rules. Since the resulting word cannot contain the non-terminal symbols 'A' and 'B' it must contain an equal number of 'a' and 'b'.

For the reverse inclusion, let $w$ be a word that contains an equal number of 'a' and 'b' and use induction by the word length to show that $w$ can be produced by $G$. If $w$ is the empty word, just use the rule $S \rightarrow \varepsilon$. Otherwise, $w$ must contain either $ab$ or $ba$ as a substring, w.l.o.g. let $w = uabv$ for some words $u$ and $v$. Then $uv$ contains an equal number of 'a' and 'b', hence it can be produced in $G$ by the induction hypothesis. In a left derivation of $uv$, there must be some intermediate word $uXy$ where $X$ is a non-terminal symbol, i.e. $X \in \{S,A,B\}$. By the given rules, it is possible to derive $abX$ from $X$ (for example $A \Rightarrow aS \Rightarrow abA$, similarly for $B$ and $S$). So you can derive $uabXy$ in $G$. Then go on as in the derivation of $uv$ to get a derivation of $w = uabv$.