Practice problems (page 83): 1.20
In this section we will discuss the formal definition of “languages”, give a number of examples, and discuss some standard constructions.
A language in a given alphabet is simply a set of strings in that alphabet.
So a language is nothing more than a separation of the strings in that alphabet in two groups: Those that “belong” to this language and those that don’t. Let’s think of some examples:
It is worth pausing here for a moment, and thinking about how you could specify the kinds of languages we are describing above in say OCAML. What kind of structure could we use? What would be its advantages and disadvantages?
After a bit of thought you’ll probably arrive at two different descriptions, both of which really end up describing ways of describing a “set”:
So basically:
We can “specify” a language by providing either:
- A way to tell, for a given string, if it belongs in the language or not, or
- A way to generate strings from the language, eventually all the strings from the language.
Here is a question worth thinking about for a moment, or perhaps longer:
Given descriptions for two languages, via either of the two methods above, can we tell if those descriptions actually correspond to the same language? In other words, can we tell when two languages are “equal”?
There are a number of standard operations on languages, that allow us to create new languages from existing ones:
If \(A\) and \(B\) are two languages on the same alphabet, thought of as sets, then their union \(A\cup B\) consists of all strings that belong to at least one of the two languages. so:
\[A\cup B = \left\{x \mid x \in A \textrm{ or }x\in B \right\}\]
If \(A\) and \(B\) are two languages on the same alphabet, thought of as sets, then their intersection \(A\cup B\) consists of all strings, if any, that belong to both languages:
\[A\cap B = \left\{x \mid x \in A \textrm{ and }x\in B \right\}\]
If \(A\) is a language, its complement, usually denoted \(A'\) or \(A^c\), consists of all strings in the alphabet that are not in the language:
\[A' = \left\{x \mid x \not\in A\right\}\]
If \(A\) and \(B\) are languages, their concatenation consists of all strings formed by concatenating together a string from \(A\) and a string from \(B\):
\[AB = \left\{ab \mid a \in A \textrm{ and }b\in B \right\}\]
This is an operation related to languages, that you do not often encounter in sets in general. The Kleene star of a language, denoted by \(A^*\), is formed by taking sequences of strings from the language:
\[A = \left\{a_1a_2\cdots a_n \mid a_i \in A,\, n \geq 0 \right\}\]
Note in particular that the empty string is always part of the Kleene Star of any language.
Here is a challenge problem. Some parts of it are easier than others and will be part of your homework.
Challenge
Given one of the language descriptions above (as a predicate or as a stream), can you describe how to obtain descriptions for each of the language constructs just described? For example, could you instruct a computer, if they are given predicates for languages \(A\) and \(B\), to create predicates for Union/Intersection/Complement/Concatenation?
In a certain sense, talking about languages amounts to talking about computation. For instance, saying that I have implemented a way to compute, given a number, whether that number is prime or not is exactly the same as saying that I have written a program that can “recognize” the language:
\[A = \left\{x \mid\textrm{ the number represented by }x\textrm{ is prime } \right\}\]
Pretty much most problems in computing can be rephrased in such a manner. Therefore being able to recognize languages, i.e. being able in a mechanistic way to determine if a string is in a language or not, is the pivotal question we will be attempting to answer in this course.