% discussion of section 4.2.8 (exercises for section 4.2) in CPTT (the "dragon
% book")
%

\documentclass[draft]{article}
\usepackage{amsmath,amssymb,tikz}
\usetikzlibrary{automata,positioning}

\begin{document}

\title{Discussion of Selected Exercises: \\
  Section 4.2.8 of Compilers: Principles, Techniques and Tools \\
  \vspace{1em}
  \large{Topic: Context-Free Grammars}}
\author{Mike Gerwitz}
\date{\today}
\maketitle

\def\exercise#1 #2\par{
  \goodbreak
  \vspace{0.5em plus 0.5em}
  \noindent
  \llap{\bf Exercise #1 }%
  {\sl#2}\par
  \vspace{0.5em plus 0.5em}
  \goodbreak
}
\def\exend{$\blacksquare$}

\def\set#1{\left\{#1\right\}}
\def\nt#1{{\ifmmode#1\else$#1$\fi}}
\def\nts#1{\;\nt#1\;}
\def\prod{\rightarrow}
\def\punion{\;|\;}
\def\emptystr{\ifmmode\epsilon\else$\emptystr$\fi}

\def\mspace#1{\ifmmode\;#1\;\else$#1$\fi}
\def\derivop{\displaystyle\mathop{\Rightarrow}}
\def\deriv{{\mspace\derivop}} % extra grouping to solve issue in mmode w/ align
\def\lmderiv{\mspace{\deriv\limits_{lm}}}
\def\derivz{\mspace{\derivop^{\kern -0.25em*}}}
\def\derivp{\mspace{\derivop^{\kern -0.25em+}}}
\def\derivlm{\mspace{\derivop_{lm}}}
\def\derivrm{\mspace{\derivop_{rm}}}
\def\derivlmz{\mspace{\derivop^{\kern -0.25em*}_{lm}}}

\let\eqrefold\eqref
\def\eqref#1{\eqrefold{e:#1}}
\def\gref#1{grammar~\eqref{#1}}
\def\Gref#1{Grammar~\eqref{#1}}
\def\fref#1{Figure~\ref{f:#1}}

\def\prooftext#1 #2\par{
  \goodbreak
  \vspace{1ex plus 0.5ex}
  \noindent
  \llap{#1 }%
  #2\par
}
\def\proof{\prooftext {\bf\small\uppercase{Proof}} }
\def\basis{\prooftext {\sc Basis} }
\def\ind{\prooftext {\sc Induction} }
\def\contra{\prooftext {\sc Contradiction} }
\def\foorp{$\square$\vspace{1ex plus 1ex}}

\begin{abstract}
This paper contains the author's answers and proofs for selected exercises from
Section~4.2.8 of the ``dragon book''---Compilers: Principles, Techniques and
Tools (hereinafter ``CPTT'').
This book, while an excellent resource, can be challenging for self-study as it does not provide a means of verifying one's answers outside of a classroom setting (unless the reader has confidence in his/her proofs). This paper is intended for two audiences: (a) those reading the book and looking for clarification and discussion on the exercises and (b) those who are curious on the topic of context-free grammars that do not possess the text. The selected exercises are those that the author felt would be most useful for discussion and, as such, are expected to be challenging to the reader. Less challenging portions of exercises may be discussed to segue into the more challenging portions. \end{abstract} \section{Context-Free Grammars} The focus of this discussion (and of Section 4.2 in CPTT) is on context-free grammars (or simply ``grammars''). \section{Convention and Notation} The following notational conventions are used throughout this paper. In most cases, they have been borrowed from the text. For grammars, capital symbols are used to represent non-terminals. The $\nt{S}$ symbol is used to denote the starting non-terminal. The symbol $\prod$~is used to separate the non-terminal from its production body, whereas $\deriv$~indicates a single step in a derivation. Leftmost and rightmost derivations are denoted $\derivlm$ and~$\derivrm$ respectively. $\derivz$ means ``derives in zero or more steps'', whereas $\derivp$ means ``derives in one or more steps''. The symbol $\punion$~separates multiple productions for a single non-terminal. Any time punctuation is placed at the end of a grammar or derivation, it should be read as part of the surrounding paragraph, \emph{not} as part of the production or derivation. For example, in the grammar $$ \nt{S} \prod 0\nts{S}1 \punion \emptystr, $$ \noindent the trailing comma is not part of the construction. Furthermore, whitespace is not significant and may be discarded. \emptystr~is the empty string. 
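
As a brief illustration of this notation (a sketch added for clarity; the string
0011 is chosen here only as an example and does not come from the text), the
example grammar above has the leftmost derivation
$$
  \nt{S} \derivlm 0\nts{S}1 \derivlm 0\;0\nts{S}1\;1
    \derivlm 0\;0\;\emptystr\;1\;1.
$$
\noindent
Because this derivation takes three steps, it may be abbreviated as either
$\nt{S} \derivp 0011$ or $\nt{S} \derivz 0011$. By contrast,
$\nt{S} \derivz \nt{S}$ holds in zero steps, whereas $\nt{S} \derivp \nt{S}$
does not, since no production of this grammar yields \nt{S}~alone.
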
``The text'' refers to CPTT, whereas ``this paper'' refers to the paper you are
currently reading.

\section{Exercise 4.2.3---Grammar Design}
This exercise requests that the reader design grammars for a series of language
descriptions a--f; we will discuss each of them. Although the text does not
request it, proofs will be provided for each, as they are useful to demonstrate
correctness and are excellent practice in discipline.

\exercise 4.2.3a The set of all strings of 0's and 1's such that every 0 is immediately followed by at least one 1.

The grammar for this exercise is fairly trivial, but will serve as a useful
introduction to the formalities of this paper. First, let us consider a grammar
that demonstrates such a property. Our alphabet is $\Sigma = \set{0,1}$. The
only restriction on the sentences of our grammar is that each 0 must be followed
by a 1---this therefore means that we can have any number of adjacent 1's, but
it is not possible to have adjacent 0's. Considering that our alphabet~$\Sigma$
has only two characters, this grammar is fairly simple:
\begin{equation}\label{e:z1}
  \nt{S} \prod 1\nt{S} \punion 01\nt{S} \punion \emptystr.
\end{equation}

As an example, let us consider some of the sentences that we wish this grammar
to derive. In particular, consider the derivation of the string $01011$:
\begin{equation}
  \nt{S} \deriv 01\;\nt{S} \deriv 01\;01\;\nt{S} \deriv 01\;01\;1\nt{S}
    \deriv 01\;01\;1\;\emptystr \derivz 01\;01\;1.
\end{equation}
Notice also that a string of 1's---such as $1111$---is also derivable given our
grammar:
\begin{equation}\label{e:z1-1s}
  \nt{S} \deriv 1\;\nt{S} \deriv 1\;1\;\nt{S} \deriv 1\;1\;1\;\nt{S}
    \deriv 1\;1\;1\;1\;\nt{S} \deriv 1\;1\;1\;1\;\emptystr \derivz 1\;1\;1\;1,
\end{equation}
\noindent
as is the empty string $\emptystr$ in one step:
\begin{equation}
  \nt{S} \deriv \emptystr.
\end{equation}

To prove that \gref{z1} is correct, we must prove two independent statements:
\begin{enumerate}
  \item The \emph{only} strings derivable from \gref{z1} are those of 0's and
    1's such that every 0 is immediately followed by at least one 1;
  \item The grammar derives all such strings.
\end{enumerate}
We will prove these statements in order. For the first statement, we must show
that, at any given step $n$ of \gref{z1}, the only derivable strings contain a
1 after each and every 0 (or that the string contains no 0's). For the second
statement, we must show that any string containing 0's and 1's such that every
0 is followed by at least one 1 is derivable from our grammar. Grammar proofs
are discussed in Section 4.2.6 of the text.

\proof The only strings derivable from~$\nt{S}$ are those of 0's and~1's such that every 0~is immediately followed by at least one~1. We shall perform this proof inductively on the number of steps~$n$ in a given derivation.

\basis The basis is $n=1$. In one step, our grammar may produce one of three strings: a string beginning with a~1 (the first production of~$\nt{S}$), a string beginning with a~0 followed by a~1 (the second production of~$\nt{S}$) and the empty string~\emptystr\ (the final production of~$\nt{S}$). The empty string~\emptystr\ has no~0's and so follows the rules of the language. The same is true for the string~1 produced by the first production. The third and final string of terminals that can be generated when~$n=1$ is~01. This string does contain a~0, but that~0 is immediately followed by a~1, and therefore it also satisfies our requirement.

\ind We shall now assume that all derivations of fewer than $n$~steps result
in either a sentence containing no~0's or a sentence in which every~0 is
followed by one or more~1's. A derivation of $n>1$ steps must have the form
\begin{equation}\label{e:z1-ind}
  \nt{S} \deriv x\;\nt{S} \derivz xy.
\end{equation}
\noindent
Since $x$~is derived in fewer than $n$~steps then, by our inductive hypothesis,
$x$~must contain~0's only if each is followed by a~1; the same is true of~$y$.
Additionally, according to \gref{z1}, $y$~must be derived using one of the
productions
\begin{align*}
  \nt{S} &\prod 1\nt{S} \\
  \nt{S} &\prod 01\nt{S} \\
  \nt{S} &\prod \emptystr.
\end{align*}
\noindent
Each of these productions has already been discussed in our basis; therefore,
$y$~cannot contain a~0 followed by another~0. Additionally, it is required that
adjacent~1's be permitted after a~0, which is possible by the first production
(as demonstrated in \eqref{z1-1s}). As such, every~0 in $xy$~must be followed
by one or more~1's and our hypothesis has been proved. \foorp

To ensure a thorough understanding of the above proof, it is worth mentioning
why \eqref{z1-ind}~used both the \deriv\ and~\derivz\ derivation symbols. Our
basis applies when $n=1$; the inductive hypothesis applies otherwise (when
$n>1$). As such, we must have \emph{at least} one application of a production
in~\eqref{z1-ind}.

Now that we have proved that we may only derive sentences from \gref{z1} in
which every~0 is followed by one or more~1's, we must now show that the grammar
may be used to derive all such possible strings.

\proof Any string~$s$ of length~$l$ consisting of~1's and~0's such that any~0 is followed by at least one~1 is derivable from~$\nt{S}$.

\basis A string of length~$0$ ($l=0$) must be~\emptystr, which is derivable from~$\nt{S}$ in one step.

\ind Assume that any such string of a length less than $l$ is derivable
from~\nt{S}. A string~$s$ of length~$l$ must have the form~$xy,
y\in\set{1,01,\emptystr}$---that is, we can consider $s$ to be the
concatenation of $y$~with a previously derived string. Since the length of
$x$~is clearly less than~$l$, it must be derivable from~\nt{S} by our inductive
hypothesis.
Furthermore, $xy$~must have a derivation of the form
\begin{equation}\label{e:z1-deriv-1}
  \nt{S} \derivp x\;\nt{S} \deriv x\;y,
\end{equation}
\noindent
thereby proving that $s$~is derivable from~\nt{S}. \foorp

The derivation~\eqref{z1-deriv-1} may seem to be too abstract to be useful;
since this is our first proof, it is worth clarifying why it does in fact
complete the proof. We first showed that any string of the language of 0's and
1's that we have been studying can be described as the concatenation of a
smaller such string with 1, 01 or~\emptystr\ (which completes the string). This
string, as we stated, has the form~$xy$. Therefore, we must show that
\nt{S}~supports concatenation---\eqref{z1-deriv-1} demonstrates this with~$x$
fairly abstractly, since it does not matter what exactly $x$~is. From the
productions of~\nt{S} in \gref{z1}, it is understood that $x$ can be any string
of terminals (that is---any derivation) leading up to that point in the
derivation~\eqref{z1-deriv-1}. We must now show that the remaining part
of~$xy$---that is, $y$---is derivable. The only non-terminal remaining
after~$x$ is~\nt{S}. We have defined $y$~to be any string of terminals in the
set $\set{1,01,\emptystr}$. Clearly, each of these strings is derivable
from~\nt{S}. Therefore, we can replace~\nt{S} in~\eqref{z1-deriv-1} with~$y$,
indicating that this is a valid derivation given our definition of~$y$; it is
up to the reader of the proof to make this connection. Note that, while the
domain of $y$~happens to correspond to every production of~\nt{S}, this is not
necessary for the proof---that is the subject of the first proof.

Before we put this exercise to rest (indeed, we completed the exercise
requirement in the first paragraph following the exercise definition), it is
also worth noting that the language of this grammar may also be accepted by a
finite automaton (and consequently described by a regular expression); this is
demonstrated by \fref{z1-regex}.
It should be noted that this is not the case with all of the exercises that
follow. \exend

\begin{figure}
  \center
  \begin{tikzpicture}
    \node[state,initial] (a) {$a$};
    \node[state] (b) [right=of a] {$b$};
    \node[state,accepting] (c) [right=of b] {$c$};
    \path[->]
      (a) edge [loop below] node {1} ()
          edge [bend right, below] node {\emptystr} (c)
          edge [above] node {$0$} (b)
      (b) edge [above] node {$1$} (c)
      (c) edge [bend right, above] node {\emptystr} (a)
      ;
  \end{tikzpicture}
  \caption{An NFA corresponding to the extended regular expression
    $\left(0^?1^+\right)^*$ describing \gref{z1}.}
  \label{f:z1-regex}
\end{figure}

The above example was fairly simple, yet resulted in a relatively lengthy
discourse far past what was required by the text; the reader can expect such a
discussion to continue for all examples that follow.

\exercise 4.2.3b The set of all strings of 0's and 1's that are palindromes; that is, the string reads the same backward as forward.

As the exercise stated, a {\sl palindrome} is a string that reads the same in
both directions; let us consider some examples before attempting to construct a
grammar. The following strings are all palindromes, one per line:\footnote{An
example of an English palindrome is ``Mr.~Owl ate my metal worm'' (discarding
punctuation and capitalization).}
\begin{equation}\label{e:palex}
  \begin{tabular}{rcl}
    1 &00 &1 \\
    1100 &11 &0011 \\
    010 &1 &010 \\
    & 0 &
  \end{tabular}
\end{equation}

The above palindromes have been laid out so that their symmetry is apparent. At
first glance, one can imagine constructing a palindrome out of pairs of
characters, like the second row of~\eqref{palex}:
\begin{equation}\label{e:palex-2}
  \begin{tabular}{rcl}
    & 11 & \\
    1 & 11 & 1 \\
    11 & 00 & 11 \\
    110 & 00 & 011 \\
    1100 & 11 & 0011
  \end{tabular}
\end{equation}
\noindent
In this case, each palindrome would always have an even number of characters.
However, it is important to note the bottom two palindromes of \eqref{palex},
which have an \emph{odd} number of characters:
\begin{equation}\label{e:palex-3}
  \begin{tabular}{rcl}
    & 00 & \\
    0 & 11 & 0 \\
    01 & 00 & 10 \\
    010 & 1 & 010
  \end{tabular}
\end{equation}

Given this evaluation and the understanding that $2n$~is always even for any
non-negative integer~$n$, it would be accurate to recursively construct a
palindrome from the edges inward in pairs. Once we reach the center, we may end
with~\emptystr\ if we wish to have an even ($2n$) number of characters, or
otherwise may add a single character to create a palindrome containing an odd
($2n+1$) number of characters.
\begin{equation}\label{e:palindrome}
  \begin{aligned}
    \nt{S} &\prod 0\nts{S}0 \punion 1\nts{S}1 \punion \nt{M} \\
    \nt{M} &\prod 0 \punion 1 \punion \emptystr
  \end{aligned}
\end{equation}

In \gref{palindrome} above, we define our start non-terminal~\nt{S} with
productions for the outer pairs. The non-terminal~\nt{M} represents the
acceptable inner (``middle'') characters, which determine whether the length of
the palindrome is even (if \emptystr~is used) or odd (0 or~1). We will leave
demonstrations of such derivations to the proof.

To prove that grammar~\nt{S} is the proper grammar for all palindromes, we must
again prove two things: that language $L(\nt{S})$ can produce only palindromes
of~0's and~1's and that all such palindromes can be derived from~\nt{S}. The
difference between these two descriptions may be subtle for such a simple
grammar, but the distinction is important to ensure that $L(\nt{S})$ represents
\emph{nothing more and nothing less} than a language that may be used for such
palindromes. As before, the proofs will be inductive---the first proof on the
number of steps~$n$ of a derivation of~\nt{S} and the second on the length~$l$
of the palindrome~$s$. Our alphabet~$\Sigma$ is once again~$\set{0,1}$.

\proof The only strings derivable from grammar~\nt{S} are palindromes consisting of 0's and~1's.
\basis The basis is $n=2$, which is the fewest number of steps from which a
string may be derived from~\nt{S}.\footnote{A derivation of $n=1$ step cannot
result in a string consisting only of terminals, as it would result in
$0S0$,~$1S1$ or~$M$.} Such a derivation must be of the form
$$
  \nt{S} \deriv M \deriv x,
$$
\noindent
where $x$~is 0,~1, or~\emptystr. In the latter case, the derived string is
clearly a palindrome of length zero. In the case of 0 or~1, the length of the
string is one, which must be a palindrome.

\ind Now assume that every string derived in fewer than $n$~steps is a
palindrome. A derivation of $n>2$ steps must be of the form
$$
  \nt{S} \deriv x\nts{S}x \derivz x\;y\;x,
$$
\noindent
where $x\in\Sigma$. That is, the character~$x$ appears on both the left and
right of~$y$. Since the derivation of~$y$ from~\nt{S} takes fewer than
$n$~steps---specifically, $n-1$ steps---$y$~must be a palindrome by our
inductive hypothesis. Because $x$~is added to both the beginning and end
of~$y$, any string derived in $n$~steps must be a palindrome. \foorp

Let us further demonstrate the above proof by deriving~\eqref{palex-2}
from~\nt{S}:\footnote{The dots were added so as not to confuse the reader as to
what was going on; the symbol~\derivp\ is sufficient and therefore the dots
will be omitted in the future.}
\begin{equation}
  \nt{S} \deriv 1\nts{S}1 \deriv 1\;1\nts{S}1\;1 \deriv \cdots
    \derivp 1\;1\;0\;0\;1\;\emptystr\;1\;0\;0\;1\;1
\end{equation}
\noindent
and additionally \eqref{palex-3}:
\begin{equation}
  \nt{S} \deriv 0\nts{S}0 \deriv 0\;1\nts{S}1\;0 \deriv 0\;1\;0\nts{S}0\;1\;0
    \derivp 0\;1\;0\;1\;0\;1\;0.
\end{equation}
\noindent
The induction step works by recognizing the basis as the middle of the string
(nonterminal~\nt{M} in \gref{palindrome})---\emptystr~for palindromes of an
even length and the $\left\lceil l/2 \right\rceil^{\text{th}}$ character for
those of an odd length~$l$ (1 in the case of the latter derivation). Call this
string~$b$. We know that $b$~is a palindrome, as explained in the proof above.
For our inductive step, we recognize that, for each step~$n$, we add two
characters---one to the beginning and one to the end---to the result of
step~$n-1$. As such, since the derivation of~$n-1$ steps must be a palindrome,
the derivation in~$n$ steps must also be---it is not possible to derive
anything but a palindrome from~\nt{M} and \nt{S}~maintains this designation.

For completeness, we must now show that all possible palindromes of the
alphabet~$\Sigma$ can be derived from~\nt{S}.

\proof Every palindrome consisting of~0's and~1's is derivable from~\nt{S}.

\basis If the string~$s$ is of length~$l\leq1$, then it must be \emptystr,~0 or~1, all of which are palindromes derivable by~\nt{M}.

\ind Observe that any palindrome of length~$l>1$ must contain the same
character at positions~$1$ and~$l$.\footnote{1-indexed for notational
convenience.} Assume that each palindrome with a length less than~$l$ is
derivable from~\nt{S}. Since $s$~is a palindrome, it must have the form
$xyx, x\in\Sigma$, where $y$~is also a palindrome. Since $y$~has a length
$l-2<l$, it is derivable from~\nt{S} by our inductive hypothesis, and so
$s$~has a derivation of the form
$\nt{S} \deriv x\nts{S}x \derivz x\;y\;x$. \foorp

\begin{figure}
  \center
  \begin{tikzpicture}
    % The node declaration below is reconstructed; the original line (and the
    % text that preceded this figure) did not survive in the source.
    \node[state,initial,accepting] (a) {$a$};
    \path[->]
      (a) edge [loop below] node {1} ()
          edge [loop above] node {0} ()
      ;
  \end{tikzpicture}
  \caption{The minimum-state DFA for the regular expression
    $\left(0|1\right)^*$.}
  \label{f:pal-a}
\end{figure}

Consider that the only way for a finite automaton to maintain a history of
states is to have a state to represent each unique history. However, to accept
a string of any length, we would need an automaton containing a potentially
infinite number of states, which is not finite (and therefore not a finite
automaton). Therefore, it is not possible to represent the history of every
possible palindrome using a finite set of states. Given this, it must stand
that a finite automaton must at some point contain a state that transitions to
a previous or current state, such as the NFA in \fref{pal-a2}.
Since the history of the string is ``stored'' purely in the possible states
leading up to the current state, this transition~$t$ equates to a loss of
``memory'', without which the right-hand portion of the palindrome cannot be
properly matched. Furthermore, since each position~$n$ may contain any
character in~$\Sigma$, and since the transition~$t$ can only yield a set of
future states with a limited (finite) precision, each of these future states
must be redundant. Since each NFA can be represented by an equivalent DFA and
each DFA for some language has a single common minimum-state DFA, any portion
of a finite automaton that can accept a palindrome of any length must be
equivalent to \fref{pal-a} (such as state~$x$ in \fref{pal-a2}). We are
therefore left to conclude that no finite automaton can accept a palindrome of
arbitrary length without accepting every string that is a combination of each
character in $\Sigma$. \foorp

\begin{figure}
  \center
  \begin{tikzpicture}
    \node[state,initial] (a) {$1$};
    \node[state] (b) [right=of a] {$2$};
    \node[state] (x) [right=of b] {$x$};
    \node[state] (y) [right=of x] {$n-1$};
    \node[state,accepting] (z) [right=of y] {$n$};
    \path[->]
      (a) edge [above] node {$\alpha$} (b)
          edge [below, bend right=45] node {$\kern-0.7em\emptystr$} (z)
      (b) edge [above] node {$\beta$} (x)
          edge [below, bend right=65] node {$\emptystr$} (y)
      (x) edge [loop above] node {$\beta$} ()
          edge [loop below] node {$\alpha$} ()
          edge [above] node {$\beta$} (y)
      (y) edge [above] node {$\alpha$} (z)
      ;
  \end{tikzpicture}
  \caption{An NFA with a finite set of states must at some point transition to
    a previous or identical state in order to accept input of any length.
    $\Sigma=\set{\alpha,\beta}$.}
  \label{f:pal-a2}
\end{figure}

To provide further clarification---any finite automaton that transitions to a
\emph{previous} state, since it loses a portion of its history, can no longer
accurately determine the states leading up to the final state.
That is, consider the string 10101 and consider that the first three characters of this string can be represented by the states $\set{a,b,a}$. At this point, we can no longer be certain of what the string may end with, because we have lost any sense of nesting/recursion. Therefore, the states leading to the final state are forced to accept any character in $\Sigma$ and therefore must be equivalent to the minimum-state DFA of \fref{pal-a}. As was mentioned by the text, ``finite automata cannot count''. \fref{pal-a2} gets around such an issue by transitioning only to current or future states, which permits a \emph{finite} amount of nesting (placing the aforementioned minimum-state DFA~$x$ in the middle). However, note a glaring issue---this automaton does not accept~$\beta$ in the first character position. If it did, then we would need a second set of states in order to maintain such a history and know that we should also \emph{end} with $\beta$~instead of~$\alpha$. The number of states would therefore grow very quickly with the level of nesting and the size of~$\Sigma$ (such a consideration is left to the reader). We have exhaustively proved that \gref{palindrome} is the correct answer for this exercise. \exend \exercise 4.2.3c The set of all strings of 0's and 1's with an equal number of 0's and 1's. To understand how to approach this problem, we shall consider a number of strings that are derivable from this language. An obvious case is~\emptystr, which contains zero~0's and zero~1's. Some additional examples are shown in \fref{eq-ex} along with their lengths (denoted by~$l$). \begin{figure}[h] \center \begin{tabular}{r|cccccc} $s$ & \emptystr & 10 & 01 & 1010 & 1001 & 011100 \\ \hline $l$ & 0 & 2 & 2 & 4 & 4 & 6 \end{tabular} \caption{Examples of strings with an equal number of 0's and 1's.} \label{f:eq-ex} \end{figure} These examples demonstrate a number of important properties. 
In particular, the length~$l$ of the string~$s$ is always even, with the number of 0's and~1's $n=l/2$. Additionally, the characters of the alphabet~$\Sigma$ may appear in any order in the string. Therefore, we do not have the luxury of a simple, nested, recursive implementation as we did with the palindrome exercise (at least not exclusively). Let us construct the grammar iteratively, beginning with the simplest case of~\emptystr. \begin{equation}\label{e:eq-1} \nt{S} \prod \emptystr \end{equation} \noindent The second case---10---is also fairly easy to fit into~$\nt{S}$: \begin{equation}\label{e:eq-2} \nt{S} \prod 10 \punion \emptystr \end{equation} The third case demonstrates an important case regarding our strings: They may begin with either a~0 or a~1 and they may also \emph{end} with either character (more generally, they may begin or end with any character in~$\Sigma$). However, we cannot simply adjust our grammar to accept either character in both positions---$\nt{S}$ must assure that, any time we include a~0 in a production, we also include a~1 (and vice versa). So far, this is guaranteed by~$\nt{S}$ in \gref{eq-2}; to keep on this path, we must add 01 as yet another special case. \begin{equation}\label{e:eq-3} \nt{S} \prod 01 \punion 10 \punion \emptystr \end{equation} \goodbreak The fourth case---1010---introduces the need to handle strings of an arbitrary length. To do this, we must determine at what point we should recurse on~$\nt{S}$. Looking at the example, we could derive 1010 as two nested applications of~$\nt{S}$ if we recurse between the two terminals. \begin{equation}\label{e:eq-a} \nt{S} \deriv 1\nts{S}0 \deriv 1\;0\nts{S}1\;0 \deriv 1\;0\;\emptystr\;1\;0 \derivz 1\;01\;0 \end{equation} \noindent Of course, one could also adopt an alternate perspective by considering the string to be the production of two adjacent non-terminals. 
\begin{equation}\label{e:eq-b}
  \nt{S} \deriv \nt{S}\;\nt{S} \derivlm 10\;\nt{S} \derivlm 10\;10
\end{equation}
\noindent
Unfortunately, with this information alone, we cannot be certain which of these
productions---if such a choice even matters---should be used in our grammar.
Perhaps we can gain further insight from the remaining examples.

The next example---1001---can be derived in a manner similar to \eqref{eq-b},
but not \eqref{eq-a}; in particular, \gref{eq-3} has no production for the
string 00, and so we cannot construct the string from the outside in. Given
that, we can be certain that an adjacent non-terminal production is needed and
so we will add the production used in \eqref{eq-b} to our grammar.
\begin{equation}\label{e:eq-4}
  \nt{S} \prod 01 \punion 10 \punion \nt{S}\;\nt{S} \punion \emptystr
\end{equation}

However, the aforementioned predicament---the absence of a production that can
yield only 00---raises the question of whether or not we can truly derive any
string of equal 1's and 0's from the above grammar. Our final example
challenges this. 011100 cannot possibly be represented by~$\nt{S}$ in
\gref{eq-4} because this grammar constructs the string from left to right (or
right to left) in pairs of~0's and~1's. Therefore, the only way to have
adjacent~1's or adjacent~0's is to alternate the productions, which makes it
impossible to have more than two adjacent identical characters. Given this, it
seems that both \eqref{eq-b} \emph{and} \eqref{eq-a} are necessary; the
following derivation demonstrates this fact (neither can individually be used
to derive the string 011100).
\begin{equation}
  \nt{S} \deriv \nt{S}\;\nt{S} \derivlm 01\;\nt{S} \derivlm 01\;1\nts{S}0
    \derivlm 01\;1\;1\nts{S}0\;0 \derivlm 01\;1\;1\;\emptystr\;0\;0
    \derivlmz 01\;1\;10\;0
\end{equation}
\noindent
We thus arrive at \gref{eq-5} below.
\begin{equation}\label{e:eq-5}
  \nt{S} \prod 0\nts{S}1 \punion 1\nts{S}0 \punion \nt{S}\;\nt{S}
    \punion \emptystr
\end{equation}

An astute reader may at this point notice that we have created an ambiguity in
our grammar: recall~\eqref{eq-a} and~\eqref{eq-b}, which had two possible
derivations for the same string; both of these derivations are now possible in
our grammar. The text defines an ambiguous grammar to be a grammar that
contains more than one leftmost or more than one rightmost derivation for the
same sentence. This is a particularly interesting example of ambiguity, in
particular because we cannot resolve it. Let us consider why.

\proof Grammar~$\nt{S}$ cannot be disambiguated. We will prove this fact by contradiction.

\contra Firstly, recognize that~$\nt{S}$ is ambiguous because there exists some
sentence~$s$ that has both of the following derivations in $n>1$ steps, where
$a\ne b$:
\begin{align*}
  \nt{S} &\deriv a\nts{S}b \derivp a\;x\;b; \\
  \nt{S} &\deriv \nt{S}\;\nt{S} \deriv a\nts{S}b\;\nt{S}
    \derivp a\;b\;\nt{S} \derivp a\;b\;x.
\end{align*}
Suppose to the contrary that there is some way to disambiguate~$x$. There must
then be some terminal $c\in\Sigma$ in~$x$ that may be used to perform the
disambiguation and such a disambiguation would imply a difference in the
semantics of~$x$ between the two derivations. However, $x=x$ and so both
derivations hold exactly the same meaning---balanced strings. Furthermore, the
productions for producing balanced strings require each character
in~$\Sigma$; $c$ therefore must not exist. \foorp

Fortunately, this ambiguity is not an issue for our grammar because the
multiple derivations are semantically equivalent---we are not arriving at any
different result within the context of this exercise.
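
To make the ambiguity concrete, the following sketch spells out two distinct
leftmost derivations of the sentence 1010 under \gref{eq-5}. The step-by-step
expansion is an illustration added here, though it mirrors \eqref{eq-a}
and~\eqref{eq-b}:
\begin{gather*}
  \nt{S} \derivlm 1\nts{S}0 \derivlm 1\;0\nts{S}1\;0
    \derivlm 1\;0\;\emptystr\;1\;0, \\
  \nt{S} \derivlm \nt{S}\;\nt{S} \derivlm 1\nts{S}0\;\nt{S}
    \derivlm 1\;\emptystr\;0\;\nt{S} \derivlm 1\;0\;1\nts{S}0
    \derivlm 1\;0\;1\;\emptystr\;0.
\end{gather*}
\noindent
Both derivations replace only the leftmost non-terminal at every step and both
yield 1010, yet the first treats the sentence as one nested pair while the
second treats it as two concatenated pairs.
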
The sentence 1010 of \fref{eq-ex} demonstrates this concept: it does not matter
whether we consider the sentence to be a single balanced string or the
concatenation of two balanced strings; we arrive at the same result regardless
with no harm done.\footnote{Of course, one valid argument is that a more
concise and unambiguous grammar will reduce problems during parsing. However,
the parser (like Lex, as described by the text) can give precedence to the
productions that appear earlier in the grammar to resolve this issue.}

While the discussion thus far is likely to convince the reader that \gref{eq-5}
is correct, we shall conclude with a formal proof of this fact. A proof that
the grammar cannot be represented by any finite automaton shall be omitted, in
particular because the productions of $\nt{S}$ have a structure very similar to
the palindrome \gref{palindrome}.

\proof Only sentences composed of balanced~1's and~0's may be derived from~$\nt{S}$.

\basis The basis is $n=1$. The only sentence that may be derived in 1 step is~\emptystr, which is clearly balanced (containing zero~0's and zero~1's).

\ind Assume that any sentence derived in fewer than~$n$ steps is balanced. Now
recognize that any sentence derived in $n>1$ steps must make use of one of the
following productions of $\nt{S}$:
\begin{align*}
  \nt{S} &\prod 0\nts{S}1; \\
  \nt{S} &\prod 1\nts{S}0; \\
  \nt{S} &\prod \nt{S}\;\nt{S}.
\end{align*}
\noindent
Therefore, a sentence derived using one of the first two productions has the
form $0x1$ or $1x0$. Since $x$~is derivable from~$\nt{S}$ in fewer than~$n$
steps, it is balanced by our inductive hypothesis, and adding one~0 and one~1
to it keeps the sentence balanced. The last remaining production yields the
form~$xy$, where $x$ and~$y$ are both derivable from~\nt{S} in fewer than~$n$
steps and thus must be balanced.
Furthermore, since the productions of~$\nt{S}$ produce only 0,~1, or~\emptystr, $\nt{S}$~has the alphabet $\Sigma=\set{0,1}$ and, consequently, may derive no sentence except for those containing balanced~0's and~1's. \foorp Having proved that only sentences of balanced~0's and~1's are derivable from~$\nt{S}$, we must now prove that $\nt{S}$~can derive \emph{all} such strings (that is, all such strings are sentences of $\nt{S}$). Such a proof is interesting because our grammar is more sophisticated than the previous examples. \proof All strings of balanced~0's and~1's are sentences of~$\nt{S}$. \basis The basis is a string of length $l=0$, which contains zero~0's and zero~1's. This string must be~\emptystr, which is derivable from~$\nt{S}$. \ind First, recognize that all balanced strings must have a length $l=2k$---that is, $l$~is always even (as emphasized in \fref{eq-ex}) and contains $k$~0's and $k$~1's. Assume that all strings less than length~$2k$ are derivable from~$\nt{S}$. Consider any balanced string~$s$ of length~$2k$. We can consider $s$~to have the form~$yz$---that is, the concatenation of two balanced strings~$y$ and~$z$, both of which in turn have the form $axb, a\neq b$ where $x$ itself must be balanced (since $a\neq b$); alternatively, either $y$ or~$z$ may be~\emptystr, which therefore implies that the form~$yz$ accepts any balanced string where the first and last characters are not the same. We must now show that all such strings can be represented by the form~$yz$. First, recognize that $y=axb$ may have either the form $0x1$ or $1x0$; the form $yz$ then permits up to two adjacent identical characters in $\Sigma$; any additional adjacent identical characters may be derived by $x$. Consider $x=\emptystr$; then, clearly $axb$ is balanced and can be concatenated to form a larger balanced string. 
If $x\neq\emptystr$ but $x_1=b$,\footnote{$x_n$ denotes the $n^{\text{th}}$ character of~$x$.} then we can instead consider an alternative interpretation $y'=ax_1$ and $x'=x_2\cdots x_nb$, and then let $y=y'x'$ (instead of $axb$). We are then left with the case where $x_1=a$. Such a case allows for an arbitrarily deep nesting of adjacent identical characters and therefore $axb$ can be represented by the regular expression $a^+b^+$. It is therefore clear that the form $yz$ is able to describe any string of balanced characters in the alphabet $\Sigma=\set{0,1}$. Such a form must have the derivation $$ \nt{S} \deriv \nt{S}\;\nt{S} \derivlmz xy. $$ \noindent Since this is a leftmost derivation, $y$~is either a balanced string or \emptystr. In the former case, it is obvious that both $x$ and~$y$ are of a length less than~$2k$ and are therefore derivable from~\nt{S} by our inductive hypothesis. Otherwise, $y=\emptystr$ and the length of $x$~is precisely~$2k$ and we must consider the form $axb$; $x$~is clearly of a length of less than~$2k$ and is therefore balanced by our inductive hypothesis. Furthermore, it must have a derivation of the form $$ \nt{S} \deriv a\nts{S}b \derivz a\;x\;b, $$ \noindent thereby proving that $axb$ is derivable from~\nt{S}. \foorp This proof was considerably more involved than our previous ones and is an excellent segue into proving more sophisticated grammars. Of course, the reader can surely see the challenges that might arise from attempting to prove much more complicated grammars. \exend \end{document}