Decidable Languages

Reading

Section 4.1

Practice problems (page 182): 4.1, 4.2, 4.3, 4.4 (two ways), 4.9, 4.10, 4.11, 4.12 (careful), 4.13, 4.16

Challenge: 4.14, 4.23, 4.24, 4.27

Decidable Languages

We will now discuss a number of decidable problems involving objects that we have largely encountered before. But before we do so, we need to discuss the inputs to our Turing Machines.

String representations of objects

We need a way to turn the objects we are interested in working with into some strings, which can then be:

  1. considered as elements of a language \(L\)
  2. considered as the input to a Turing Machine

The key observation is that there may be many different representations for a given type of object; what is important is that we fix one, and that it is well defined and fully specified.

An encoding or string representation of a class of objects is an association of a string to each object of this class, so that:

  1. different objects correspond to different strings, and
  2. there is an algorithmic way to tell whether a given string corresponds to an object of the class, and if so to recover that object from it.

The string corresponding to an object \(G\) under such an encoding will be denoted by \(\langle G\rangle\).

All encodings for the same object would be fundamentally interchangeable. What is important is that we fix one encoding.

We briefly discussed a number of such encodings for graphs.

Of particular interest for us will be encodings of the kinds of objects we have encountered so far. For instance, imagine encoding a DFA as a string that lists its states, its alphabet, its transition function, its start state, and its accept states.

So given a DFA we can convert it to a string of this form. Conversely, given an arbitrary string, we can determine whether it has the form specified above, i.e. we can write a Turing Machine that performs each of the necessary checks.

Each of these checks would be tedious to write in a TM, but perfectly doable.
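
To make this concrete, here is a small Python sketch of one possible string encoding of a DFA. The exact format below is made up for illustration; it is not a specific encoding fixed in lecture.

```python
# A made-up string encoding <A> of a DFA: five parenthesized sections listing
# the states, the alphabet, the transitions, the start state, and the accept states.

def encode_dfa(states, alphabet, delta, start, accepts):
    """Flatten a DFA into a single string such as
    (q0,q1)(a,b)(q0,a,q1;q0,b,q0;q1,a,q1;q1,b,q0)(q0)(q1)."""
    transitions = ";".join(f"{q},{s},{delta[(q, s)]}" for q in states for s in alphabet)
    return (f"({','.join(states)})"
            f"({','.join(alphabet)})"
            f"({transitions})"
            f"({start})"
            f"({','.join(accepts)})")

# Example: a DFA over {a, b} that accepts exactly the strings ending in 'a'.
delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q1", ("q1", "b"): "q0"}
print(encode_dfa(["q0", "q1"], ["a", "b"], delta, "q0", ["q1"]))
```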

Exercise: I purposefully omitted something from the checks that need to happen. What is it?

Exercise: We can imagine a similar process for an NFA. What would change?

We can now consider some languages related to these DFAs. We can for instance imagine the language:

\[L = \{\langle A\rangle\mid A\textrm{ is a DFA}\}\]

Essentially, the checks we described above tell us that this language \(L\) is decidable: given an arbitrary string, we can tell whether it has the form specified above.

Here is a more interesting language:

\[A_\textrm{DFA} = \left\{\langle B,w\rangle\mid B\textrm{ is a DFA that accepts the input string }w \right\}\]

So this is the language of all pairs of a DFA and an input string, such that the DFA accepts the string.

Asking that this language be decidable is the same thing as asking: Can we write an algorithm that given a DFA and an input string can decide if the DFA accepts the string or not? We will call this the acceptance problem for DFAs.

The answer is obviously yes: we run the DFA on the string and see whether it ends up in an accept state.

But let us be a bit more precise in our use of the term “run”. On input \(\langle B,w\rangle\), our Turing Machine will do the following:

  1. Check that the input has the proper form, namely the encoding of a DFA \(B\) followed by an input string \(w\), and reject if it does not.
  2. Copy \(w\) onto a second tape, and keep track of \(B\)'s current state, starting with the start state.
  3. Move along tape 2 one symbol at a time, each time updating the current state according to \(B\)'s transition function. When the end of \(w\) is reached, accept if the current state is one of \(B\)'s accept states, and reject otherwise.

In the future we will abbreviate this even more to simply say that we “simulate B on input \(w\)”.

It is clear by step 3 above that this Turing Machine is a decider: it will terminate once the head on tape 2 reaches the end of the string \(w\). So we have:

The language \(A_{\textrm{DFA}}\) is Turing-decidable. In other words, the problem of determining if a DFA accepts a certain string is a decidable problem.
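
As a sanity check on this description, here is a minimal Python sketch of the simulation. It works directly with a decoded DFA rather than with the string \(\langle B,w\rangle\); the names are illustrative.

```python
# Simulate a DFA B on an input string w: keep track of the current state,
# consume w one symbol at a time, and accept exactly when we stop in an accept state.

def dfa_accepts(delta, start, accepts, w):
    """Return True iff the DFA given by (delta, start, accepts) accepts w.
    This always terminates: the loop runs exactly len(w) times."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]   # one transition per input symbol
    return state in accepts

# The DFA from the earlier encoding example (strings over {a, b} ending in 'a'):
delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q1", ("q1", "b"): "q0"}
print(dfa_accepts(delta, "q0", {"q1"}, "abb"))    # False: does not end in 'a'
print(dfa_accepts(delta, "q0", {"q1"}, "abba"))   # True: ends in 'a'
```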

Exercise: What about the language \(A_{\textrm{NFA}}\) that does the same thing but for an NFA?

We could perhaps try to simulate the NFA's computation directly, but there is a simpler way: convert the NFA to a DFA. So the construction might go something like this: on input \(\langle N,w\rangle\), where \(N\) is an NFA and \(w\) is a string, first convert \(N\) into an equivalent DFA \(C\) via the subset construction, then run the decider for \(A_{\textrm{DFA}}\) from above on input \(\langle C,w\rangle\), and answer whatever it answers.

This is an important idea: we reduce our problem to a previously solved problem. We use previously constructed TMs as “subroutines”, or, if you prefer, as helper functions which we can call on to do part of the work, as in the sketch below.
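
Here is a hedged Python sketch of that reduction, reusing the dfa_accepts sketch from above as a subroutine. For brevity it assumes an NFA without \(\varepsilon\)-transitions; the representation is again illustrative.

```python
from itertools import chain

# Decide A_NFA by reduction: convert the NFA into an equivalent DFA with the
# subset construction, then call the A_DFA decider (dfa_accepts) as a subroutine.

def nfa_to_dfa(nfa_delta, start, accepts, alphabet):
    """nfa_delta maps (state, symbol) -> set of states.  The resulting DFA's
    states are frozensets of NFA states."""
    dfa_start = frozenset({start})
    dfa_delta, todo, seen = {}, [dfa_start], {dfa_start}
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(chain.from_iterable(nfa_delta.get((q, a), ()) for q in S))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_accept_states = {S for S in seen if S & set(accepts)}
    return dfa_delta, dfa_start, dfa_accept_states

def nfa_accepts(nfa_delta, start, accepts, alphabet, w):
    """Decide whether the NFA accepts w by converting it to a DFA first."""
    d_delta, d_start, d_accepts = nfa_to_dfa(nfa_delta, start, accepts, alphabet)
    return dfa_accepts(d_delta, d_start, d_accepts, w)   # reuse the DFA decider

# Example: an NFA over {a, b} that guesses when it reads the last symbol 'a'.
nfa_delta = {("p", "a"): {"p", "q"}, ("p", "b"): {"p"}}
print(nfa_accepts(nfa_delta, "p", {"q"}, ["a", "b"], "ba"))   # True
print(nfa_accepts(nfa_delta, "p", {"q"}, ["a", "b"], "ab"))   # False
```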

Exercise: Consider the language: \[E_{\textrm{DFA}} = \left\{\langle A\rangle\mid A\textrm{ is a DFA and }L(A)\neq\emptyset\right\}\] In other words, the language consists of (the encodings of) all DFAs that accept at least one string. Is this language decidable? Can you describe the Turing Machine that decides it? We will call this the emptiness problem for DFAs.

Exercise: Consider the language: \[EQ_\textrm{DFA} = \left\{\langle A,B\rangle\mid A,B\textrm{ are DFAs and }L(A)=L(B)\right\}\] In other words, the language consists of all pairs of DFAs that recognize the same language. Is this language decidable? We will call this the equivalence problem for DFAs.

Hint: Try to use the answer to the previous problem.

Decidable Problems concerning CFLs

Let us now consider problems related to context-free languages, and specifically context-free grammars.

Exercise: How would we represent a CFG as a string?

We can ponder the same three questions as before: the acceptance problem (does a given CFG generate a given string?), the emptiness problem (does a given CFG generate any string at all?), and the equivalence problem (do two given CFGs generate the same language?).

We will discover that the answer to the first two questions is yes, but the answer to the third is no:

There is no TM that can decide, given two CFGs, whether they generate the same language or not.

Exercise:

  1. Look back at your proof of why the equivalence problem for regular languages/DFAs was decidable. What goes wrong when we try to apply the same method to CFGs/CFLs?
  2. It is easy to see that the language \(L=\left\{\langle A,B\rangle\mid A,B\textrm{ are CFGs such that }L(A)\neq L(B) \right\}\) is Turing-recognizable. How would that go?
  3. Is the language \(L\) above the complement of the language \(\textrm{EQ}_{\textrm{CFG}}\)?
  4. Is the complement of \(\textrm{EQ}_{\textrm{CFG}}\) recognizable?
  5. Is it true that if a language \(L\) and its complement are both Turing-recognizable, then \(L\) is decidable?

We now proceed to discuss the other two problems:

The acceptance problem for CFGs is decidable.

So, given a CFG and a string \(w\), we need to decide if the CFG can generate that string.

A first idea is to try all derivations. But there are in general infinitely many possible derivations, so we need a way to know when to stop; otherwise what we have is a recognizer, not a decider.

The idea comes from Chomsky Normal Form:

If a grammar is in Chomsky Normal Form, and we have a nonempty string \(w\) of length \(n\) produced by the grammar, then any derivation of \(w\) contains exactly \(2n-1\) steps.

This follows from the rules of Chomsky Normal Form: producing a string of length \(n\) takes \(n-1\) applications of rules of the form \(A\to BC\), which build up a sentential form with \(n\) variables, followed by \(n\) applications of rules of the form \(A\to a\), which turn each variable into a terminal, for a total of \(2n-1\) steps. Think of some small values of \(n\), for instance \(S\Rightarrow AB\Rightarrow aB\Rightarrow ab\) for \(n=2\), and it will become clear.

So now this gives us a way to deal with the acceptance problem: given \(\langle G,w\rangle\), first convert \(G\) into an equivalent grammar in Chomsky Normal Form, then list all derivations that take exactly \(2n-1\) steps, where \(n\) is the length of \(w\); there are only finitely many such derivations, so we can check each one and accept exactly when one of them produces \(w\). (If \(w\) is the empty string, we instead just check whether the grammar has the rule \(S\to\varepsilon\).)
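
Here is a minimal Python sketch of this brute-force decider, assuming the grammar has already been converted to Chomsky Normal Form and that terminals are single characters; the grammar representation and names are made up for illustration.

```python
# Brute-force A_CFG decider for a grammar already in Chomsky Normal Form.
# We explore leftmost derivations only (every derivable string has one), and we
# stop after 2*len(w) - 1 steps, which is the exact length of any derivation of w.

def cnf_generates(rules, start, w):
    """rules maps each variable to a list of bodies; a body is a tuple such as
    ("B", "C") for A -> BC, ("a",) for A -> a, or () for the rule S -> epsilon."""
    n = len(w)
    if n == 0:                            # only S -> epsilon can produce ""
        return () in rules.get(start, [])
    max_steps = 2 * n - 1                 # exact derivation length when |w| = n >= 1

    def expand(form, steps):
        if steps > max_steps or len(form) > n:
            return False                  # no derivation of w can continue from here
        i = next((k for k, s in enumerate(form) if s in rules), None)
        if i is None:                     # no variables left: did we produce w?
            return "".join(form) == w
        prefix = "".join(form[:i])        # terminals to the left of the variable
        if not w.startswith(prefix):      # in a leftmost derivation they must match w
            return False
        return any(expand(form[:i] + tuple(body) + form[i + 1:], steps + 1)
                   for body in rules[form[i]])

    return expand((start,), 0)

# Example: a CNF grammar for { a^k b^k : k >= 1 }.
rules = {"S": [("A", "X"), ("A", "B")], "X": [("S", "B")],
         "A": [("a",)], "B": [("b",)]}
print(cnf_generates(rules, "S", "aabb"))   # True
print(cnf_generates(rules, "S", "aab"))    # False
```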

This has profound implications for computing: the valid syntax of a programming language is typically specified by a CFG. In order to run a program, which is essentially just a string over some alphabet, we first need to parse it according to that CFG. The fact that the acceptance problem is decidable tells us that this is indeed doable, although the method described above would likely be far too slow to be of practical use.

Before moving on, let us consider an important consequence of this result:

Every CFL is decidable.

This essentially follows from what we already did: suppose that \(G\) is a CFG for the language \(L\). Then, given a string \(w\), we run the acceptance algorithm for CFGs on the pair \(\langle G, w\rangle\), and we return the result of that algorithm. Note that the grammar \(G\) is fixed and built into this decider; only the string \(w\) is the input.

This enables us to build a hierarchy of languages: The collection of all regular languages is contained within the collection of CFLs which then is contained within the collection of Turing-decidable languages, which in turn is contained in the collection of Turing-recognizable languages. All these inclusions are strict.

Now we move on to the other decidable problem:

The emptiness problem for CFGs is decidable.

Essentially the question boils down to this: can the start variable generate a string of terminals by means of the production rules?

We will actually discuss a more general problem:

There is an algorithm that, given a CFG and a nonterminal in it, determines whether that nonterminal can generate a string of terminals via a derivation in the grammar.

Think about how this algorithm would go before reading on.

Here is how the Turing Machine might go:
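
One natural approach is a marking procedure: mark a variable as “generating” once it has some rule whose entire right-hand side consists of terminals and of variables that are already marked, and keep going until nothing new gets marked; the given nonterminal can generate a string of terminals exactly when it ends up marked. Here is a minimal Python sketch of this idea; the grammar representation and names are made up for illustration.

```python
# Find every variable of a CFG that can generate a string of terminals: mark a
# variable once some rule rewrites it entirely into terminals and already-marked
# variables, and repeat until no new variable gets marked.

def generating_variables(rules):
    """rules maps each variable to a list of bodies (tuples of symbols);
    a symbol counts as a terminal unless it is itself a key of rules."""
    generating = set()
    changed = True
    while changed:                        # at most one new mark per pass, so this halts
        changed = False
        for var, bodies in rules.items():
            if var in generating:
                continue
            for body in bodies:
                if all(s not in rules or s in generating for s in body):
                    generating.add(var)   # every symbol in this body already generates
                    changed = True
                    break
    return generating

def can_generate_terminal_string(rules, variable):
    """Decide whether the given nonterminal derives some string of terminals."""
    return variable in generating_variables(rules)

# Example: S, A, B generate strings of terminals; U does not (it only feeds itself).
rules = {"S": [("A", "B")], "A": [("a",)], "B": [("b",), ("U",)], "U": [("U", "a")]}
print(can_generate_terminal_string(rules, "S"))   # True
print(can_generate_terminal_string(rules, "U"))   # False
```

Running this on the start variable answers the emptiness question: the language of the grammar is nonempty exactly when the start variable is generating.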