Equivalence between PDAs and CFGs

In this section we show that PDAs and CFGs are equivalent in expressive power: a language is recognized by some PDA if and only if it is generated by some CFG.

CFG to PDA

Given a context-free grammar, we can construct a PDA that recognizes the language generated by the grammar.

Construction

The idea is fairly straightforward: the PDA uses its stack to carry out a derivation of the input string, replacing non-terminals according to the production rules and matching terminals against the input.

Here is the basic picture:

Pushdown Automaton for a CFG

The “loops” around \(q_\textrm{loop}\) correspond to the production rules in the grammar. Each loop is a shorthand for a sequence of states. For instance if we have a production rule like \(S\to aTb\), this would produce the following “loop”:

Loop for rule \(S\to aTb\)

We usually skip the intermediate states in the process and just write this path as a self-loop with transition \(\epsilon,S\to aTb\).
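
Spelled out, the transition \(\epsilon,S\to aTb\) says: pop \(S\) and push the string \(aTb\) so that \(a\) ends up on top. This means pushing the symbols in reverse order: first replace \(S\) by \(b\), then push \(T\), then push \(a\). With two (hypothetical) intermediate states \(q_1\) and \(q_2\), the loop above is the path

\[
q_\textrm{loop}\ \xrightarrow{\ \epsilon,\,S\to b\ }\ q_1\ \xrightarrow{\ \epsilon,\,\epsilon\to T\ }\ q_2\ \xrightarrow{\ \epsilon,\,\epsilon\to a\ }\ q_\textrm{loop}.
\]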

The graph also contains self-loops on \(q_\textrm{loop}\) with transitions \(t,t\to\epsilon\), for each terminal symbol \(t\). These self-loops allow us to advance the input every time we get a terminal at the top of the stack.

This completes the construction of the PDA.
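
The whole construction can be summarized in a few lines of code. Here is a minimal sketch in Python; the function name, the state names, and the tuple representation of transitions are illustrative choices of ours, and pushing a whole string in one transition is the same shorthand for a chain of single pushes as above.

    def cfg_to_pda_transitions(rules, terminals, start_symbol="S"):
        """Transitions of the PDA for a CFG given as {non-terminal: [right-hand sides]}.

        Each transition is ((state, read, pop), (state, push)); pushing a string
        means pushing its symbols so that the leftmost one ends up on top.
        """
        transitions = []
        # Initial step: push the start symbol and the bottom-of-stack marker $.
        transitions.append((("q_start", "", ""), ("q_loop", start_symbol + "$")))
        # One self-loop per production rule: replace the non-terminal on top of
        # the stack by the right-hand side of the rule.
        for lhs, rhss in rules.items():
            for rhs in rhss:
                transitions.append((("q_loop", "", lhs), ("q_loop", rhs)))
        # One self-loop per terminal: match it against the input and pop it.
        for t in terminals:
            transitions.append((("q_loop", t, t), ("q_loop", "")))
        # Final step: pop $ and move to the accept state.
        transitions.append((("q_loop", "", "$"), ("q_accept", "")))
        return transitions

    # Example: the grammar S -> aSa | bSb | epsilon used in the next section.
    for tr in cfg_to_pda_transitions({"S": ["aSa", "bSb", ""]}, terminals="ab"):
        print(tr)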

Proof of Construction

We now want to show that this PDA recognizes the same language as the CFG. The key idea is that a computation of the PDA corresponds exactly to tracing a leftmost derivation in the grammar.

An example will make this clearer. Consider the grammar:

S -> aSa | bSb | epsilon

In addition to the standard four states, the PDA for this grammar contains:

- 2 states that establish the loop \(\epsilon,S\to aSa\),
- 2 states that establish the loop \(\epsilon,S\to bSb\),
- a self-loop for \(\epsilon,S\to \epsilon\),
- the two self-loops \(a,a\to\epsilon\) and \(b,b\to\epsilon\).

Now consider the string abbaabba. Here is how the automaton would process it:

Input       Stack     Step from previous
---------   --------  ----------------------
abbaabba    S$        (moved to loop state)
abbaabba    aSa$      eps,S -> aSa
bbaabba     Sa$       a,a   -> eps
bbaabba     bSba$     eps,S -> bSb
baabba      Sba$      b,b   -> eps
baabba      bSbba$    eps,S -> bSb
aabba       Sbba$     b,b   -> eps
aabba       aSabba$   eps,S -> aSa
abba        Sabba$    a,a   -> eps
abba        abba$     eps,S -> eps
bba         bba$      a,a   -> eps
ba          ba$       b,b   -> eps
a           a$        b,b   -> eps
eps         $         a,a   -> eps
eps         eps       eps,$ -> eps   (moving to accept state)
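
Reading off the \(\epsilon\)-steps of this computation from top to bottom gives exactly a leftmost derivation of the string in the grammar:

\[
S \Rightarrow aSa \Rightarrow abSba \Rightarrow abbSbba \Rightarrow abbaSabba \Rightarrow abbaabba.
\]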

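If you want to experiment with this, here is a small sketch in Python that simulates the constructed PDA by trying the available transitions recursively; the representation of the stack as a string (first character on top) is an illustrative choice of ours.

    # Simulation of the PDA built from S -> aSa | bSb | epsilon.
    RULES = {"S": ["aSa", "bSb", ""]}
    TERMINALS = set("ab")

    def accepts(inp, stack="S$"):
        """Can the PDA consume all of `inp`, starting at q_loop with stack S$?"""
        top, rest = stack[0], stack[1:]
        if top == "$":                # eps, $ -> eps : move to the accept state
            return inp == ""
        if top in TERMINALS:          # t, t -> eps : match the input against the top
            return inp.startswith(top) and accepts(inp[1:], rest)
        # Top is the non-terminal S: try each rule eps, S -> rhs nondeterministically.
        # The search terminates for this grammar because every rule either removes S
        # or puts a terminal on top, which must then match the input.
        return any(accepts(inp, rhs + rest) for rhs in RULES[top])

    print(accepts("abbaabba"))   # True
    print(accepts("abbaabb"))    # False: not an even-length palindrome
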
PDA to CFG

For the converse direction, we want to show that given a PDA we can produce a CFG that generates the same language. In order to achieve this we first make some simplifying assumptions about the PDA:

- it has a single accept state,
- it empties its stack before accepting,
- every transition either pushes a symbol onto the stack or pops one off, but not both (and not neither).

None of these assumptions loses generality: any PDA can be modified to satisfy them without changing the language it recognizes.

Now we build a CFG for such an automaton.

Construction

For every pair of states \(p\), \(q\) of the PDA we introduce a non-terminal \(A_{p,q}\), intended to generate exactly those strings that can bring the PDA from state \(p\) with an empty stack to state \(q\) with an empty stack. The production rules come in three kinds:

- \(A_{p,q}\to aA_{r,s}b\), whenever there is a transition from \(p\) to \(r\) that reads \(a\) (possibly \(\epsilon\)) and pushes some stack symbol \(t\), and a transition from \(s\) to \(q\) that reads \(b\) (possibly \(\epsilon\)) and pops that same symbol \(t\),
- \(A_{p,q}\to A_{p,r}A_{r,q}\), for every triple of states \(p\), \(r\), \(q\),
- \(A_{p,p}\to\epsilon\), for every state \(p\).

The start symbol of the grammar is the non-terminal corresponding to the pair (start state, accept state).

Before we proceed to prove that this construction works, we will look at an example, namely an automaton that recognizes the language consisting of a number of \(x\)’s followed by no more than an equal number of \(y\)’s. In order to accommodate our conditions above, some states need to be added:

A simple pushdown automaton

We added a “dummy stack symbol” \(\#\) to accommodate the cases where we had to represent a transition that did not change the stack: such a transition is split into one step that pushes \(\#\) and one that immediately pops it again.
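
To make the construction concrete, here is a minimal sketch in Python of the rule-generation step. It assumes the PDA is given as a list of transitions \((p, a, \textrm{push}, \textrm{pop}, q)\), where \(a\) is an input symbol or the empty string for an \(\epsilon\)-move and, by our assumptions, exactly one of push/pop is a stack symbol while the other is empty; this representation and the function name are choices of ours.

    def pda_to_cfg_rules(states, transitions):
        """Productions of the CFG for a PDA satisfying our simplifying assumptions."""
        rules = []                    # pairs (non-terminal, right-hand side)

        def A(p, q):                  # the non-terminal A_{p,q}
            return f"A[{p},{q}]"

        # A_{p,q} -> a A_{r,s} b : a push of t going p -> r, matched with a pop
        # of the same t going s -> q.
        for (p, a, push, _, r) in transitions:
            if push:
                for (s, b, _, pop, q) in transitions:
                    if pop == push:
                        rules.append((A(p, q), " ".join(filter(None, [a, A(r, s), b]))))
        # A_{p,q} -> A_{p,r} A_{r,q} : the stack empties at an intermediate state r.
        for p in states:
            for r in states:
                for q in states:
                    rules.append((A(p, q), A(p, r) + " " + A(r, q)))
        # A_{p,p} -> epsilon.
        for p in states:
            rules.append((A(p, p), ""))
        return rules

Applied to the automaton in the figure, this produces every non-terminal \(A_{p,q}\); as noted below, only a few of them end up mattering.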

The corresponding CFG will in theory have up to \(30\) non-terminals, but many of them are irrelevant: they either cannot produce any string of terminals or cannot be reached from the start symbol. We present only the relevant ones here.

So to sum up, writing \(S=A_{1,3}\) and \(T=A_{1,2}\) for the non-terminals that matter, the grammar boils down to the following (in reality it contains many more extraneous elements that do not lead to anything):

S' -> S$
S  -> xSy | T
T  -> xT | epsilon

Check that this grammar will produce the language we expect.
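
Here is one mechanical way to carry out that check, a small sketch in Python; it assumes the end marker \(\$\) is treated as an ordinary symbol of the generated strings (as the rule S' -> S$ suggests), and the length bound is arbitrary.

    # Enumerate every terminal string the grammar can derive, up to a length
    # bound, and compare with the expected strings x^m y^n $ with n <= m.
    RULES = {"S": ["xSy", "T"], "T": ["xT", ""]}   # S' -> S$ is applied once at the start

    def derivable(max_len):
        """All terminal strings of length <= max_len derivable from S'."""
        results, frontier, seen = set(), ["S$"], {"S$"}
        while frontier:
            form = frontier.pop()
            nt = next((c for c in form if c in RULES), None)
            if nt is None:                      # no non-terminal left
                results.add(form)
                continue
            i = form.index(nt)
            for rhs in RULES[nt]:
                new = form[:i] + rhs + form[i + 1:]
                # Terminals are never erased, so prune forms that are already too long.
                if len(new.replace("S", "").replace("T", "")) <= max_len and new not in seen:
                    seen.add(new)
                    frontier.append(new)
        return results

    expected = {"x" * m + "y" * n + "$" for m in range(8) for n in range(m + 1) if m + n + 1 <= 7}
    assert derivable(7) == expected
    print(sorted(derivable(7), key=len))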

Proof

Now we sketch the proof that the CFG produced in this way generates the same language that the automaton recognizes.

The key claim we need to prove is the following:

\(A_{p,q}\) generates \(x\) if and only if \(x\) can bring the PDA from state \(p\) with empty stack to state \(q\) with empty stack.

Before we prove this, let us make an important observation:

If a string \(x\) can bring the PDA from state \(p\) with empty stack to state \(q\) with empty stack, then it can also bring the PDA from state \(p\) with stack symbol \(t\) at the top to state \(q\) with stack symbol \(t\) at the top, without ever popping that symbol on the way. It will simply follow the exact same steps.

We now prove the claim; it has two directions.

We start with the “if” part: if a string \(x\) can bring the PDA from state \(p\) with empty stack to state \(q\) with empty stack, then \(A_{p,q}\) generates \(x\).

Intuitively the idea of this proof is as follows. The PDA starts and ends with an empty stack as it moves from \(p\) to \(q\). If it also has an empty stack at some intermediate point, say when it is at some state \(r\), then we use the second kind of rule, \(A_{p,q}\to A_{p,r}A_{r,q}\), to break the problem up into two subproblems. Otherwise, the symbol that is pushed onto the stack as we take our first step from \(p\) is not going to be popped until we take our last step into \(q\); this is exactly the situation described by the first kind of rule, \(A_{p,q}\to aA_{r,s}b\).
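
Schematically, arguing by induction on the number of steps in the computation, the two cases give

\[
x = y\,z:\qquad A_{p,q}\ \Rightarrow\ A_{p,r}A_{r,q}\ \Rightarrow^*\ y\,z = x,
\]
\[
x = a\,y\,b:\qquad A_{p,q}\ \Rightarrow\ a\,A_{r,s}\,b\ \Rightarrow^*\ a\,y\,b = x,
\]

where in the first case the stack is empty as the machine passes through the intermediate state \(r\) (and the induction hypothesis gives \(A_{p,r}\Rightarrow^* y\) and \(A_{r,q}\Rightarrow^* z\)), and in the second case the first step reads \(a\) and pushes some symbol \(t\) while the last step reads \(b\) and pops that same \(t\) (and the induction hypothesis gives \(A_{r,s}\Rightarrow^* y\)).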

Now we consider the opposite direction: if \(A_{p,q}\) generates \(x\), then \(x\) can bring the PDA from state \(p\) with empty stack to state \(q\) with empty stack. This is proved by induction on the length of the derivation, looking at the first rule used: a rule \(A_{p,p}\to\epsilon\) corresponds to the machine staying where it is, a rule \(A_{p,q}\to A_{p,r}A_{r,q}\) corresponds to running the two computations given by the induction hypothesis one after the other, and a rule \(A_{p,q}\to aA_{r,s}b\) corresponds to pushing a symbol \(t\) on the first step, running the computation for \(A_{r,s}\) on top of it (this is where the observation above is used), and popping \(t\) on the last step.

This completes our proof of the correspondence between the two approaches to context-free languages.