What is a Context Free Grammar?

A context free grammar is a grammar which satisfies certain properties. In computer science, grammars describe languages; specifically, they describe formal languages.

A formal language is just a set (mathematical term for a collection of objects) of strings (sequences of symbols... very similar to the programming usage of the word "string"). A simple example of a formal language is the set of all binary strings of length three, {000, 001, 010, 011, 100, 101, 110, 111}.

Grammars work by defining transformations you can make to construct a string in the language described by a grammar. Grammars will say how to transform a start symbol (usually S) into some string of symbols. A grammar for the language given before is:

S -> BBB
B -> 0
B -> 1

The way to interpret this is to say that S can be replaced by BBB, and B can be replaced by 0, and B can be replaced by 1. So to construct the string 010 we can do S -> BBB -> 0BB -> 01B -> 010.

A context-free grammar is simply a grammar where the thing that you're replacing (left of the arrow) is a single "non-terminal" symbol. A non-terminal symbol is any symbol you use in the grammar that can't appear in your final strings. In the grammar above, "S" and "B" are non-terminal symbols, and "0" and "1" are "terminal" symbols. Grammars like

S -> AB
AB -> 1
A -> AA
B -> 0

Are not context free since they contain rules like AB -> 1 that have more than one non-terminal symbol on the left.


Language Theory is related to Theory of Computation. Which is the more philosophical side of Computer Science, about deciding which programs are possible, or which will ever be possible to write, and what type of problems is it impossible to write an algorithm to solve.

A regular expression is a way of describing a regular language. A regular language is a language which can be decided by a deterministic finite automaton.

You should read the article on Finite State Machines: http://en.wikipedia.org/wiki/Finite_state_machine

And Regular languages: http://en.wikipedia.org/wiki/Regular_language

All Regular Languages are Context Free Languages, but there are Context Free Languages that are not regular. A Context Free Language is the set of all strings accept by a Context Free Grammer or a Pushdown Automata which is a Finite State Machine with a single stack: http://en.wikipedia.org/wiki/Pushdown_automaton#PDA_and_Context-free_Languages

There are more complicated languages that require a Turing Machine (Any possible program you can write on your computer) to decide if a string is in the language or not.

Language theory is also very related to the P vs. NP problem, and some other interesting stuff.

My Introduction to Computer Science third year text book was pretty good at explaining this stuff: Introduction to the Theory of Computation. By Michael Sipser. But, it cost me like $160 to buy it new and it's not very big. Maybe you can find a used copy or find a copy at a library or something it might help you.

EDIT:

The limitations of Regular Expressions and higher language classes have been researched a ton over the past 50 years or so. You might be interested in the pumping lemma for regular languages. It is a means of proving that a certain language is not regular:

http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages

If a language isn't regular it may be Context Free, which means it could be described by a Context Free Grammer, or it may be even in a higher language class, you could prove it's not Context Free by the pumping lemma for Context Free languages which is similar to the one for regular expressions.

A language can even be undecidable, which means even a Turing machine (may program your computer can run) can't be programmed to decide if a string should be accepted as in the language or rejected.

I think the part you're most interested in is the Finite State Machines (Both Deterministic and Deterministic) to see what languages a Regular Expression can decide, and the pumping lemma to prove which languages are not regular.

Basically a language isn't regular if it needs some sort of memory or ability to count. The language of matching parenthesis is not regular for example because the machine needs to remember if it has opened a parenthesis to know if it has to close one.

The language of all strings using the letters a and b that contain at least three b's is a regular language: abababa

The language of all strings using the letters a and b that contain more b's than a's is not regular.

Also you should not that all finite language are regular, for example:

The language of all strings less than 50 characters long using the letters a and b that contain more b's than a's is regular, since it is finite we know it could be described as (b|abb|bab|bba|aabbb|ababb|...) ect until all the possible combinations are listed.