Initial commit of section 4.2.8 discussion of CPTT

This is already becoming much longer than I had originally anticipated and is taking up much of the little reading time that I have. That said, this is an excellent way to practice these concepts (rather than hacking, as I would for practicing a programming language). Therefore, I'm not sure if I will go into as many examples as I had originally thought (that is, past 4.2.3f); we shall see. I hope that readers will find this information useful; I have certainly enjoyed writing it.
2013-05-15 21:59:38 -04:00 · 2013-05-15 21:59:38 -04:00 · 363452fe85
commit 363452fe85
2 changed files with 844 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,3 @@
+*.pdf
+*.log
+*.aux
--- a/s428.tex
+++ b/s428.tex
@@ -0,0 +1,841 @@
+% discussion of section 4.2.8 (exercises for section 4.2) in CPTT (the "dragon
+% book")
+%
+
+\documentclass[draft]{article}
+\usepackage{amsmath,amssymb,tikz}
+\usetikzlibrary{automata,positioning}
+
+\begin{document}
+\title{Discussion of Selected Exercises: \\
+Section 4.2.8 of Compilers: Principles, Techniques and Tools \\
+\vspace{1em}
+\large{Topic: Context-Free Grammars}}
+
+\author{Mike Gerwitz}
+\date{\today}
+
+\maketitle
+
+\def\exercise#1 #2\par{
+  \goodbreak
+  \vspace{0.5em plus 0.5em}
+  \noindent
+  \llap{\bf Exercise #1 }%
+  {\sl#2}\par
+  \vspace{0.5em plus 0.5em}
+  \goodbreak
+}
+\def\exend{$\blacksquare$}
+
+\def\set#1{\left\{#1\right\}}
+
+\def\nt#1{{\ifmmode#1\else$#1$\fi}}
+\def\nts#1{\;\nt#1\;}
+\def\prod{\rightarrow}
+\def\punion{\;|\;}
+\def\emptystr{\ifmmode\epsilon\else$\emptystr$\fi}
+
+\def\mspace#1{\ifmmode\;#1\;\else$#1$\fi}
+
+\def\derivop{\displaystyle\mathop{\Rightarrow}}
+\def\deriv{{\mspace\derivop}} % extra grouping to solve issue in mmode w/ align
+\def\lmderiv{\mspace{\deriv\limits_{lm}}}
+\def\derivz{\mspace{\derivop^{\kern -0.25em*}}}
+\def\derivp{\mspace{\derivop^{\kern -0.25em+}}}
+\def\derivlm{\mspace{\derivop_{lm}}}
+\def\derivrm{\mspace{\derivop_{rm}}}
+\def\derivlmz{\mspace{\derivop^{\kern -0.25em*}_{lm}}}
+
+\let\eqrefold\eqref
+\def\eqref#1{\eqrefold{e:#1}}
+\def\gref#1{grammar~\eqref{#1}}
+\def\Gref#1{Grammar~\eqref{#1}}
+\def\fref#1{Figure~\ref{f:#1}}
+
+\def\prooftext#1 #2\par{
+  \goodbreak
+  \vspace{1ex plus 0.5ex}
+  \noindent
+  \llap{#1 }%
+  #2\par
+}
+\def\proof{\prooftext {\bf\small\uppercase{Proof}} }
+\def\basis{\prooftext {\sc Basis} }
+\def\ind{\prooftext {\sc Induction} }
+\def\contra{\prooftext {\sc Contradiction} }
+\def\foorp{$\square$\vspace{1ex plus 1ex}}
+
+
+\begin{abstract}
+This paper contains the author's answers and proofs for selected exercises from
+Section~4.2.8 of the ``dragon book''---Compilers: Principles, Techniques and
+Tools (hereinafter ``CPTT''). This book, while an excellent resource, can be
+challenging for self-study as it does not provide a means of verifying one's
+answers outside of a classroom setting (unless the reader has confidence in
+his/her proofs). This paper is intended for two audiences: (a) those reading the
+book and looking for clarification and discussion on the exercises and (b) those
+who are curious about context-free grammars but do not possess the
+text. The selected exercises are those that the author felt would be most useful
+for discussion and, as such, are expected to be challenging to the reader. Less
+challenging portions of exercises may be discussed to segue into the more
+challenging portions.
+\end{abstract}
+
+
+\section{Context-Free Grammars}
+The focus of this discussion (and of Section 4.2 in CPTT) is on context-free
+grammars (or simply ``grammars'').
+
+\section{Convention and Notation}
+The following notational conventions are used throughout this paper. In most
+cases, they have been borrowed from the text.
+
+For grammars, capital symbols are used to represent non-terminals. The $\nt{S}$
+symbol is used to denote the starting non-terminal. The symbol $\prod$~is used
+to separate the non-terminal from its production body, whereas
+$\deriv$~indicates a single step in a derivation. Leftmost and rightmost
+derivations are denoted $\derivlm$ and~$\derivrm$ respectively. $\derivz$ means
+``derives in zero or more steps'', whereas $\derivp$ means ``derives in one or
+more steps''.  The symbol $\punion$~separates multiple productions for a single
+non-terminal. Any time punctuation is placed at the end of a grammar or
+derivation, it should be read as part of the surrounding paragraph, \emph{not}
+as part of the production or derivation. For example, in the grammar
+$$
+  \nt{S} \prod 0\nts{S}1 \punion \emptystr,
+$$
+\noindent
+the trailing comma is not part of the construction. Furthermore, whitespace is
+not significant and may be discarded. \emptystr~is the empty string.
+
+``The text'' refers to CPTT, whereas ``this paper'' refers to the paper you are
+currently reading.
+
+
+\section{Exercise 4.2.3---Grammar Design}
+This exercise requests that the reader design grammars for a series of language
+descriptions a--f; we will discuss each of them. Although the text does not
+request it, proofs will be provided for each, as they are useful to demonstrate
+correctness and an excellent practice in discipline.
+
+\exercise 4.2.3a The set of all strings of 0's and 1's such that every 0 is
+immediately followed by at least one 1.
+
+The grammar for this exercise is fairly trivial, but will serve as a useful
+introduction to the formalities of this paper. First, let us consider a grammar
+that demonstrates such a property. Our alphabet is $\Sigma = \set{0,1}$. The
+only restriction on the sentences of our grammar is that each 0 must be followed
+by a 1---this therefore means that we can have any number of adjacent 1's, but
+it is not possible to have adjacent 0's. Considering that our alphabet~$\Sigma$
+has only two characters, this grammar is fairly simple:
+
+\begin{equation}\label{e:z1}
+  \nt{S} \prod 1\nt{S} \punion 01\nt{S} \punion \emptystr.
+\end{equation}
+
+As an example, let us consider some of the sentences that we may wish to be
+derived by this grammar. In particular, consider derivation of the string
+$01011$:
+
+\begin{equation}
+  \nt{S} \deriv 01\;\nt{S}
+    \deriv 01\;01\;\nt{S}
+    \deriv 01\;01\;1\nt{S}
+    \deriv 01\;01\;1\;\emptystr
+    \derivz 01\;01\;1.
+\end{equation}
+
+Notice also that a string of 1's---such as $1111$---is also derivable given our
+grammar:
+
+\begin{equation}\label{e:z1-1s}
+  \nt{S} \deriv 1\;\nt{S}
+    \deriv 1\;1\;\nt{S}
+    \deriv 1\;1\;1\;\nt{S}
+    \deriv 1\;1\;1\;1\;\nt{S}
+    \deriv 1\;1\;1\;1\;\emptystr
+    \derivz 1\;1\;1\;1,
+\end{equation}
+
+\noindent
+as is the empty string $\emptystr$ in one step:
+
+\begin{equation}
+  \nt{S} \deriv \emptystr.
+\end{equation}
+
+To prove that grammar \eqref{z1} is correct, we must prove two independent
+statements:
+
+\begin{enumerate}
+  \item The \emph{only} strings derivable from \gref{z1} are those of 0's and
+  1's such that every 0 is immediately followed by at least one 1;
+
+  \item The grammar accepts all such strings.
+\end{enumerate}
+
+We will prove these statements in order. For the first statement, we must
+show that, at any given step~$n$ of a derivation in \gref{z1}, the only derivable strings
+contain a 1 after each and every 0 (or that the string contains no 0's). For the
+second statement, we must show that any string containing 0's and 1's such that
+every 0 is followed by at least one 1 is derivable from our grammar. Grammar
+proofs are discussed in Section 4.2.6 of the text.
+
+\proof The only strings derivable from~$\nt{S}$ are those of 0's and~1's such
+that every 0~is immediately followed by at least one~1. We shall perform this
+proof inductively on the number of steps~$n$ in a given derivation.
+
+\basis The basis is $n=1$. In one step, our grammar may produce one of three
+strings: A string beginning with a~1 (the first production of~$\nt{S}$), a
+string beginning with a~0 followed by a~1 (the second production of~$\nt{S}$)
+and the empty string~\emptystr\ (the final production of~$\nt{S}$).
+
+The empty string~\emptystr\ has no~0's and so follows the rules of the language.
+The same is true for any string beginning with a~1. The third and final string
+that can be generated when~$n=1$ is~01. This string does contain a~0, but that
+0~is immediately followed by a~1, and so it also satisfies our requirement.
+
+\ind We shall now assume that all derivations of fewer than $n$~steps result in
+either a sentence containing no~0's or a sentence that contains 0's~followed by
+one or more~1's. Such a derivation must have the form
+
+\begin{equation}\label{e:z1-ind}
+  \nt{S} \deriv xS \derivz xy.
+\end{equation}
+
+\noindent
+Since $x$~is derived in fewer than $n$~steps then, by our inductive hypothesis,
+$x$~may contain a~0 only if it is immediately followed by a~1; the same is true of~$y$.
+
+Additionally, according to \gref{z1}, $y$~must begin with an application of one of the productions
+
+\begin{align*}
+  \nt{S} &\prod 1\nt{S} \\
+  \nt{S} &\prod 01\nt{S} \\
+  \nt{S} &\prod \emptystr.
+\end{align*}
+\noindent
+Each of these productions has already been discussed in our basis; therefore,
+$y$~cannot contain a~0 followed by another~0. Additionally, it is required that
+adjacent~1's be permitted after a~0, which is possible by the first production
+(as demonstrated in \eqref{z1-1s}). As such, $xy$~must contain only~0's
+followed by one or more~1's and our hypothesis has been proved. \foorp
+
+To ensure a thorough understanding of the above proof, it is worth mentioning
+why \eqref{z1-ind}~used both the \deriv\ and~\derivz\ derivation symbols. Our
+basis applies when $n=1$; the inductive hypothesis applies otherwise (when
+$n>1$). As such, we must have \emph{at least} one production in~\eqref{z1-ind}.
+
+Now that we have proved that we may only derive sentences from \gref{z1} that
+contain~0's followed by one or more~1's, we must now show that the grammar may
+be used to derive all such possible strings.
+
+\proof Any string~$s$ of length~$l$ consisting of~1's and~0's such that any~0 is
+followed by at least one~1 is derivable from~$\nt{S}$.
+
+\basis A string of length~$0$ ($l=0$) must be~\emptystr, which is derivable
+from~$\nt{S}$ in one step.
+
+\ind Assume that any string $s$ of a length less than $l$ is derivable
+from~\nt{S}. Such a string must have the form~$xy,
+y\in\set{1,01,\emptystr}$---that is, we can consider $s$ to be the concatenation
+of $y$~with a previously derived string. Since the length of $x$~is clearly less
+than~$l$, it must be derivable from~\nt{S} by our inductive hypothesis.
+Furthermore, $xy$~must have a derivation of the form
+
+\begin{equation}\label{e:z1-deriv-1}
+  \nt{S} \derivz x\;\nt{S} \derivp x\;y,
+\end{equation}
+\noindent
+thereby proving that $s$~is derivable from~\nt{S}. \foorp
+
+The derivation~\eqref{z1-deriv-1} may seem to be too abstract to be useful;
+since this is our first proof, it is worth clarifying why it does in fact
+complete the proof. We first showed that any string of the language of 0's and
+1's that we have been studying can be described as the concatenation of a
+smaller such string with 1, 01 or~\emptystr\ (which completes the string). This
+string, as we stated, has the form~$xy$. Therefore, we must show that
+\nt{S}~supports concatenation---\eqref{z1-deriv-1} demonstrates this with~$x$
+fairly abstractly, since it does not matter what exactly $x$~is. From the
+productions of~\nt{S} in \gref{z1}, it is understood that $x$ can be any string
+of terminals (that is---any derivation) leading up to that point in the
+derivation~\eqref{z1-deriv-1}.
+
+We must now show that the remaining part of~$xy$---that is, $y$---is derivable.
+The only non-terminal remaining after~$x$ is~\nt{S}. We have defined $y$~to be
+any string of terminals in the set $\set{1,01,\emptystr}$. Clearly, each of
+these strings is derivable from~\nt{S}. Therefore, we can replace~\nt{S}
+in~\eqref{z1-deriv-1} with~$y$, indicating that this is a valid derivation given
+our definition of~$y$; it is up to the reader of the proof to make this
+connection. Note that, while the domain of $y$~happens to be every production
+of~\nt{S}, this is not necessary for the proof---that is the subject of the
+first proof.
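+
+As a concrete instance of~\eqref{z1-deriv-1} (an illustration chosen here, not
+an example from the text), take $s=01011$, which splits as $x=0101$ and $y=1$.
+Each step below applies a single production of \gref{z1}:
+$$
+  \nt{S} \deriv 01\;\nt{S}
+    \deriv 01\;01\;\nt{S}
+    \deriv 01\;01\;1\;\nt{S}
+    \deriv 01\;01\;1.
+$$
+\noindent
+The first two steps produce $x\;\nt{S}$; the remaining two derive $y$ from the
+trailing~\nt{S}.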
+
+Before we put this exercise to rest (indeed, we completed the exercise
+requirement in the first paragraph following the exercise definition), it is
+also worth noting that the language of this grammar may be accepted by a finite
+automaton (and consequently described by a regular expression); this is demonstrated by
+\fref{z1-regex}. It should be noted that this is not the case with all of the
+exercises that follow.
+\exend
+
+\begin{figure}
+  \centering
+  \begin{tikzpicture}
+    \node[state,initial] (a) {$a$};
+    \node[state] (b) [right=of a] {$b$};
+    \node[state,accepting] (c) [right=of b] {$c$};
+
+    \path[->]
+      (a) edge [loop below] node {1} ()
+          edge [bend right, below] node {\emptystr} (c)
+          edge [above] node {$0$} (b)
+      (b) edge [above] node {$1$} (c)
+      (c) edge [bend right, above] node {\emptystr} (a)
+    ;
+  \end{tikzpicture}
+
+  \caption{An NFA corresponding to the extended regular expression
+  $\left(0^?1^+\right)^*$ describing \gref{z1}.}
+  \label{f:z1-regex}
+\end{figure}
+
+The above example was fairly simple, yet resulted in a relatively lengthy
+discourse far past what was required by the text; the reader can expect such a
+discussion to continue for all examples that follow.
+
+
+\exercise 4.2.3b The set of all strings of 0's and 1's that are
+palindromes; that is, the string reads the same backward as forward.
+
+As the exercise stated, a {\sl palindrome} is a string that reads the same in
+both directions; let us consider some examples before attempting to construct a
+grammar. The following list of strings are all palindromes, one per
+line:\footnote{An example of an English palindrome is ``Mr.~Owl ate my metal
+worm'' (discarding punctuation and capitalization.)}
+
+\begin{equation}\label{e:palex}
+  \begin{tabular}{rcl}
+    1 &00 &1 \\
+    1100 &11 &0011 \\
+    010 &1 &010 \\
+    & 0 &
+  \end{tabular}
+\end{equation}
+
+The above palindromes have been laid out so that their symmetry is apparent. At
+first glance, one can imagine constructing a palindrome out of pairs of
+characters, like the second row of~\eqref{palex}:
+
+\begin{equation}\label{e:palex-2}
+  \begin{tabular}{rcl}
+    & 11 & \\
+    1 & 11 & 1 \\
+    11 & 00 & 11 \\
+    110 & 00 & 011 \\
+    1100 & 11 & 0011
+  \end{tabular}
+\end{equation}
+
+\noindent
+In this case, each palindrome would always have an even number of characters.
+However, it is important to note the bottom two palindromes of \eqref{palex},
+which have an \emph{odd} number of characters:
+
+\begin{equation}\label{e:palex-3}
+  \begin{tabular}{rcl}
+    & 00 & \\
+    0 & 11 & 0 \\
+    01 & 00 & 10 \\
+    010 & 1 & 010
+  \end{tabular}
+\end{equation}
+
+Given this evaluation and the understanding that $2n$~is always even for any
+non-negative integer~$n$, it would be accurate to recursively construct a palindrome
+from the edges inward in pairs. Once we reach the center, we may end
+with~\emptystr\ if we wish to have an even ($2n$) number of characters, or
+otherwise may add a single character to create a palindrome containing an odd
+($2n+1$) number of characters.
+
+\begin{equation}\label{e:palindrome}
+  \begin{aligned}
+    \nt{S} &\prod 0\nts{S}0 \punion 1\nts{S}1 \punion M \\
+    \nt{M} &\prod 0 \punion 1 \punion \emptystr
+  \end{aligned}
+\end{equation}
+
+In \gref{palindrome} above, we define our start non-terminal~\nt{S} with
+productions for the outer pairs. The non-terminal~\nt{M} represents the
+acceptable inner (``middle'') characters, which determine if the length of the
+palindrome is even (if \emptystr~is used) or odd (0 or~1). We will leave
+demonstrations of such derivations to the proof.
+
+To prove that grammar~\nt{S} is the proper grammar for all palindromes, we must
+again prove two things: That language $L(\nt{S})$ can produce only palindromes
+of~0's and~1's and that all such palindromes can be derived from~\nt{S}. The
+difference between these two descriptions may be subtle for such a simple
+grammar, but the distinction is important to ensure that $L(\nt{S})$ represents
+\emph{nothing more and nothing less} than a language that may be used for such
+palindromes.
+
+As before, the proofs will be inductive---the first proof on the number of
+steps~$n$ of a derivation of~\nt{S} and the second on the length~$l$ of the
+palindrome~$s$. Our alphabet~$\Sigma$ is once again~$\set{0,1}$.
+
+\proof The only strings derivable from grammar~\nt{S} are palindromes consisting
+of 0's and~1's.
+
+\basis The basis is $n=2$, which is the smallest number of steps in which a
+string may be derived from~\nt{S}.\footnote{A derivation of $n=1$ step cannot
+result in a string consisting only of terminals, as it would result in
+$0S0$,~$1S1$ or~$M$.} Such a derivation must be of the form
+$$
+  \nt{S} \deriv M \deriv x,
+$$
+\noindent
+where $x$~is 0,~1, or~\emptystr. In the latter case, the derived string is
+clearly a palindrome of length zero. In the case of 0 or~1, the length of the
+string is one, which must be a palindrome.
+
+\ind Now assume that every string derived in fewer than $n$~steps is a
+palindrome. Such a derivation must be of the form
+$$
+  \nt{S} \deriv x\nts{S}x \derivz x\;y\;x.
+$$
+\noindent
+That is, the string~$x$ appears on both the left and right of~$y$. Since the
+derivation of~$y$ from~\nt{S} takes fewer than $n$~steps---specifically, $n-1$
+steps---$y$~must be a palindrome by our inductive hypothesis. Because $x$~is
+added to both the beginning and end of~$y$, then any string derived in $n$~steps
+must be a palindrome. \foorp
+
+Let us further demonstrate the above proof by deriving~\eqref{palex-2}
+from~\nt{S}:\footnote{The dots were added so as not to confuse the reader as to
+what was going on; the symbol~\derivp\ is sufficient and therefore the dots will
+be omitted in the future.}
+
+\begin{equation}
+  \nt{S}
+  \deriv 1\nts{S}1
+  \deriv 1\;1\nts{S}1\;1
+  \deriv \cdots
+  \derivp 1\;1\;0\;0\;1\;\emptystr\;1\;0\;0\;1\;1
+\end{equation}
+
+\noindent
+and additionally \eqref{palex-3}:
+
+\begin{equation}
+  \nt{S}
+  \deriv 0\nts{S}0
+  \deriv 0\;1\nts{S}1\;0
+  \deriv 0\;1\;0\nts{S}0\;1\;0
+  \derivp 0\;1\;0\;1\;0\;1\;0.
+\end{equation}
+
+\noindent
+The induction step works by recognizing the basis as the middle of the string
+(nonterminal~\nt{M} in \gref{palindrome})---\emptystr~for palindromes of an
+even length and the $\left\lceil l/2 \right\rceil^{th}$ character for those of
+an odd length~$l$ (1 in the case of the latter derivation). Call this string~$b$. We
+know that $b$~is a palindrome, as explained in the proof above. For our
+inductive step, we recognize that, for each step~$n$, we add two
+characters---one to the beginning and one to the end---to the result of
+step~$n-1$. As such, since the derivation of~$n-1$ steps must be a palindrome,
+the derivation in~$n$ steps must also be---it is not possible to derive anything
+but a palindrome from~\nt{M} and \nt{S}~maintains this designation.
+
+For completeness, we must now show that all possible palindromes of the
+alphabet~$\Sigma$ can be derived from~\nt{S}.
+
+\proof Every palindrome consisting of~0's and~1's is derivable from~\nt{S}.
+
+\basis If the string~$s$ is of length~$l\leq1$, then it must be \emptystr,~0 or~1,
+all of which are palindromes derivable by~\nt{M}.
+
+\ind Observe that any palindrome of length~$l>1$ must contain the same
+character at positions~$1$ and~$l$.\footnote{1-indexed for notational
+convenience.} Assume that each string with a length less than~$l$ is derivable
+from~\nt{S}. Since $s$~is a palindrome, then it must have the form $xyx,
+x\in\Sigma$, where $y$~is also a palindrome. Since $y$~has a length $l-2<l$,
+then it must be derivable from~\nt{S} by the inductive hypothesis. The
+palindrome~$s$ must therefore have a derivation of the form
+$$
+    \nt{S} \deriv x\nts{S}x \derivz x\;y\;x,
+$$
+\noindent
+which thus proves that~$s$ is derivable from~\nt{S}. \foorp
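+
+For instance (an example picked here for illustration), the palindrome
+$s=0110$ has $x=0$ and $y=11$, and \gref{palindrome} derives it as
+$$
+  \nt{S} \deriv 0\nts{S}0
+    \deriv 0\;1\nts{S}1\;0
+    \deriv 0\;1\nts{M}1\;0
+    \deriv 0\;1\;\emptystr\;1\;0,
+$$
+\noindent
+where the steps after the first are exactly a derivation of $y=11$ from the
+inner~\nt{S}.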
+
+It is also worth noting that, unlike the language of the first exercise, the
+language of palindromes cannot be recognized by a finite automaton (and
+therefore cannot be described by a regular expression). Let us prove this assertion.
+
+\proof \nt{S}~cannot be represented by any finite automaton. Specifically, a
+finite automaton representing~\nt{S} may accept all strings that are
+palindromes of the alphabet~$\Sigma$, but such an automaton must also accept
+strings that are not palindromes. We shall prove this statement by
+contradiction.
+
+\contra Given the alphabet~$\Sigma$, a palindrome may contain any character
+from~$\Sigma$ at any arbitrary position~$n$ and may be of length~$l\geq0$. As
+such, we must be able to represent this automaton by the regular expression
+$\left(0|1\right)^*$, whose corresponding minimum-state DFA is shown in
+\fref{pal-a}. However, it is also necessary that characters $c_n$
+and~$c_{l-n+1}$ be the same symbol in~$\Sigma$---a requirement that
+the minimum-state DFA of \fref{pal-a} cannot guarantee.
+
+\begin{figure}
+  \centering
+  \begin{tikzpicture}
+    \node[state,initial,accepting] (a) {$a$};
+
+    \path[->]
+      (a) edge [loop below] node {1} ()
+          edge [loop above] node {0} ()
+    ;
+  \end{tikzpicture}
+
+  \caption{The minimum-state DFA for the regular expression
+  $\left(0|1\right)^*$.}
+  \label{f:pal-a}
+\end{figure}
+
+Consider that the only way for a finite automaton to maintain a history of states
+is to have a state to represent each unique history. However, to accept a string
+of any length, we would need an automaton containing a potentially infinite
+number of states, which is not finite (and therefore not a finite automaton).
+Therefore, it is not possible to represent the history of every possible
+palindrome using a finite set of states.
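+
+One standard way to make this counting argument concrete (a sketch supplied
+here; it is not part of the argument in the text) is to compare the prefixes
+$0^i1$ and~$0^j1$ for $i\ne j$. Appending~$0^i$ to each gives
+$$
+  0^i\,1\,0^i \quad\hbox{(a palindrome)}
+  \qquad\hbox{and}\qquad
+  0^j\,1\,0^i \quad\hbox{(not a palindrome, since $i\ne j$)}.
+$$
+\noindent
+A deterministic automaton that reached the same state after reading both
+prefixes would have to accept both strings or reject both. Every value of~$i$
+therefore needs its own state, and there are infinitely many values of~$i$.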
+
+Given this, it must stand that a finite automaton must at some point contain a
+state that transitions to a previous or current state, such as the NFA in
+\fref{pal-a2}. Since the history of the string is ``stored'' purely in the
+possible states leading up to the current state, this transition~$t$ equates to a
+loss of ``memory'', without which the right-hand portion of the palindrome cannot
+be properly matched. Furthermore, since each position~$n$ may contain any
+character in~$\Sigma$, and since the transition~$t$ can only yield a set of
+future states with a limited (finite) precision, each of these future states
+must be redundant. Since each NFA can be represented by an equivalent DFA and
+each DFA for some language has a single common minimum-state DFA, any portion of
+a finite automaton that can accept a palindrome of any length must be equivalent
+to \fref{pal-a} (such as state~$x$ in \fref{pal-a2}). We are therefore left to
+conclude that no finite automaton can accept a palindrome of arbitrary length
+without accepting every string that is a combination of each character in
+$\Sigma$. \foorp
+
+\begin{figure}
+  \centering
+  \begin{tikzpicture}
+    \node[state,initial] (a) {$1$};
+    \node[state] (b) [right=of a] {$2$};
+    \node[state] (x) [right=of b] {$x$};
+    \node[state] (y) [right=of x] {$n-1$};
+    \node[state,accepting] (z) [right=of y] {$n$};
+
+    \path[->]
+      (a) edge [above] node {$\alpha$} (b)
+          edge [below, bend right=45] node {$\kern-0.7em\emptystr$} (z)
+      (b) edge [above] node {$\beta$} (x)
+          edge [below, bend right=65] node {$\emptystr$} (y)
+      (x) edge [loop above] node {$\beta$} ()
+          edge [loop below] node {$\alpha$} ()
+          edge [above] node {$\beta$} (y)
+      (y) edge [above] node {$\alpha$} (z)
+    ;
+  \end{tikzpicture}
+
+  \caption{An NFA with a finite set of states must at some point transition to a
+  previous or identical state in order to accept input of any length.
+  $\Sigma=\set{\alpha,\beta}$.}
+  \label{f:pal-a2}
+\end{figure}
+
+To provide further clarification---any finite automaton that transitions to a
+\emph{previous} state, since it loses a portion of its history, can no longer
+accurately determine the states leading up to the final state. That is, consider
+the string 10101 and consider that the first three characters of this string can
+be represented by the states $\set{a,b,a}$. At this point, we can no longer be
+certain of what the string may end with, because we have lost any sense of
+nesting/recursion. Therefore, the states leading to the final state are forced
+to accept any character in $\Sigma$ and therefore must be equivalent to the
+minimum-state DFA of \fref{pal-a}. As was mentioned by the text, ``finite
+automata cannot count''.
+
+\fref{pal-a2} gets around such an issue by transitioning only to current or
+future states, which permits a \emph{finite} amount of nesting (placing the
+aforementioned minimum-state DFA~$x$ in the middle). However, note a glaring
+issue---this automaton does not accept~$\beta$ in the first character position.
+If it did, then we would need a second set of states in order to maintain such a
+history and know that we should also \emph{end} with $\beta$~instead
+of~$\alpha$. The number of states would therefore grow very quickly with the
+level of nesting and the size of~$\Sigma$ (such a consideration is left to the
+reader).
+
+We have exhaustively proved that \gref{palindrome} is the correct answer for
+this exercise. \exend
+
+
+\exercise 4.2.3c The set of all strings of 0's and 1's with an equal number of
+0's and 1's.
+
+To understand how to approach this problem, we shall consider a number of
+strings that belong to this language. An obvious case is~\emptystr,
+which contains zero~0's and zero~1's. Some additional examples are shown in
+\fref{eq-ex} along with their lengths (denoted by~$l$).
+
+\begin{figure}[h]
+  \centering
+  \begin{tabular}{r|cccccc}
+    $s$ & \emptystr & 10 & 01 & 1010 & 1001 & 011100 \\
+    \hline
+    $l$ & 0 & 2 & 2 & 4 & 4 & 6
+  \end{tabular}
+
+  \caption{Examples of strings with an equal number of 0's and 1's.}
+  \label{f:eq-ex}
+\end{figure}
+
+These examples demonstrate a number of important properties. In particular, the
+length~$l$ of the string~$s$ is always even, with the number of 0's and~1's
+$n=l/2$. Additionally, the characters of the alphabet~$\Sigma$ may appear in any
+order in the string. Therefore, we do not have the luxury of a simple, nested,
+recursive implementation as we did with the palindrome exercise (at least not
+exclusively).
+
+Let us construct the grammar iteratively, beginning with the simplest case
+of~\emptystr.
+
+\begin{equation}\label{e:eq-1}
+  \nt{S} \prod \emptystr
+\end{equation}
+
+\noindent
+The second case---10---is also fairly easy to fit into~$\nt{S}$:
+
+\begin{equation}\label{e:eq-2}
+  \nt{S} \prod 10 \punion \emptystr
+\end{equation}
+
+The third case demonstrates an important property of our strings: They may
+begin with either a~0 or a~1 and they may also \emph{end} with either character
+(more generally, they may begin or end with any character in~$\Sigma$). However,
+we cannot simply adjust our grammar to accept either character in both
+positions---$\nt{S}$ must assure that, any time we include a~0 in a production,
+we also include a~1 (and vice versa). So far, this is guaranteed by~$\nt{S}$ in
+\gref{eq-2}; to keep on this path, we must add 01 as yet another special case.
+
+\begin{equation}\label{e:eq-3}
+  \nt{S} \prod 01 \punion 10 \punion \emptystr
+\end{equation}
+
+\goodbreak
+The fourth case---1010---introduces the need to handle strings of an arbitrary
+length. To do this, we must determine at what point we should recurse
+on~$\nt{S}$. Looking at the example, we could derive 1010 as two nested
+applications of~$\nt{S}$ if we recurse between the two terminals.
+
+\begin{equation}\label{e:eq-a}
+  \nt{S}
+    \deriv 1\nts{S}0
+    \deriv 1\;0\nts{S}1\;0
+    \deriv 1\;0\;\emptystr\;1\;0
+    \derivz 1\;01\;0
+\end{equation}
+
+\noindent
+Of course, one could also adopt an alternate perspective by considering the
+string to be the production of two adjacent non-terminals.
+
+\begin{equation}\label{e:eq-b}
+  \nt{S} \deriv \nt{S}\;\nt{S}
+    \derivlm 10\;\nt{S}
+    \derivlm 10\;10
+\end{equation}
+
+\noindent
+Unfortunately, with this information alone, we cannot be certain which of these
+productions---if such a choice even matters---should be used in our grammar.
+Perhaps we can gain further insight from the remaining examples.
+
+The next example---1001---can be derived in a manner similar to \eqref{eq-b},
+but not \eqref{eq-a}; in particular, \gref{eq-3} has no production for the
+string 00, and so we cannot construct the string from the outside in. Given
+that, we can be certain that an adjacent non-terminal production is needed and
+so we will add the production used in \eqref{eq-b} to our grammar.
+
+\begin{equation}\label{e:eq-4}
+  \nt{S} \prod 01 \punion 10 \punion \nt{S}\;\nt{S} \punion \emptystr
+\end{equation}
+
+However, the aforementioned predicament---the absence of a production that can
+yield only 00---raises the question of whether or not we can truly derive any
+string of equal 1's and 0's from the above grammar. Our final example challenges
+this. 011100 cannot possibly be represented by~$\nt{S}$ in \gref{eq-4} because
+this grammar constructs the string from left-to-right (or right-to-left) in
+pairs of~0's and~1's. Therefore, the only way to have adjacent~1's or
+adjacent~0's is to alternate the productions, which makes it impossible to have
+more than two adjacent identical characters.
+
+Given this, it seems that both \eqref{eq-b} \emph{and} \eqref{eq-a} are
+necessary; the following derivation demonstrates this fact (neither can
+individually be used to derive the string 011100).
+
+\begin{equation}
+  \nt{S} \deriv \nt{S}\;\nt{S}
+    \derivlm 01\;\nt{S}
+    \derivlm 01\;1\nts{S}0
+    \derivlm 01\;1\;1\nts{S}0\;0
+    \derivlm 01\;1\;1\;\emptystr\;0\;0
+    \derivlmz 01\;1\;10\;0
+\end{equation}
+
+\noindent
+We thus arrive at \gref{eq-5} below.
+
+\begin{equation}\label{e:eq-5}
+  \nt{S} \prod 0\nts{S}1
+    \punion 1\nts{S}0
+    \punion \nt{S}\;\nt{S}
+    \punion \emptystr
+\end{equation}
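+
+As a quick check (a worked example added for illustration), the string 1001,
+which \gref{eq-3} could not build from the outside in, has the following
+leftmost derivation in \gref{eq-5}:
+$$
+  \nt{S} \derivlm \nt{S}\;\nt{S}
+    \derivlm 1\nts{S}0\;\nt{S}
+    \derivlm 1\;0\;\nt{S}
+    \derivlm 1\;0\;0\nts{S}1
+    \derivlm 1\;0\;0\;1.
+$$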
+
+An astute reader may at this point notice that we have created an ambiguity in
+our grammar: Recall~\eqref{eq-a} and~\eqref{eq-b}, which had two possible
+derivations for the same string; both of these derivations are now possible in
+our grammar. The text defines an ambiguous grammar to be a grammar that contains
+more than one leftmost or more than one rightmost derivation for the same
+sentence. This is a particularly interesting example of ambiguity, in particular
+because we cannot resolve it. Let us consider why.
+
+\proof Grammar~$\nt{S}$ cannot be disambiguated. We will prove this fact by
+contradiction.
+
+\contra Firstly, recognize that~$\nt{S}$ is ambiguous because there exists some
+sentence~$s$ that has both of the following derivations in $n>1$ steps, where
+$a\ne b$:
+
+\begin{align*}
+  \nt{S} &\deriv a\nts{S}b \derivp a\;x\;b;
+  \\
+  \nt{S} &\deriv \nt{S}\;\nt{S}
+    \deriv a\nts{S}b\;\nt{S}
+    \derivp a\;b\;\nt{S}
+    \derivp a\;b\;x.
+\end{align*}
+
+Suppose to the contrary that there is some way to disambiguate~$x$. There must
+then be some terminal $c\in\Sigma$ in~$x$ that may be used to perform the
+disambiguation and such a disambiguation would imply a difference in the
+semantics of~$x$ between the two derivations. However, $x=x$ and so both
+derivations hold exactly the same meaning---balanced strings. Furthermore, the
+productions for producing balanced strings require each character in~$\Sigma$;
+$c$ therefore must not exist. \foorp
+
+Fortunately, this ambiguity is not an issue for our grammar because the multiple
+derivations are semantically equivalent---we are not arriving at any different
+result within the context of this exercise. The sentence 1010 of \fref{eq-ex}
+demonstrates this concept: It does not matter whether we consider the sentence
+to be a single balanced string or the concatenation of two balanced strings; we
+arrive at the same result regardless with no harm done.\footnote{Of course, one
+valid argument is that a more concise and unambiguous grammar will reduce
+problems during parsing. However, the parser (like Lex, as described by the
+text) can give precedence to the productions that appear earlier in the grammar
+to resolve this issue.}
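+
+To make this concrete, the following is a small worked example (not taken from
+the text) showing two distinct leftmost derivations of the sentence 1010 under
+\gref{eq-5}; at every step, the leftmost nonterminal is the one replaced.
+
+\begin{align*}
+  \nt{S} &\deriv 1\nts{S}0
+    \deriv 1\;0\nts{S}1\;0
+    \deriv 1\;0\;1\;0;
+  \\
+  \nt{S} &\deriv \nt{S}\;\nt{S}
+    \deriv 1\nts{S}0\;\nt{S}
+    \deriv 1\;0\;\nt{S}
+    \deriv 1\;0\;1\nts{S}0
+    \deriv 1\;0\;1\;0.
+\end{align*}
+
+\noindent
+The first derivation treats 1010 as a single balanced string of the form $1x0$
+with $x=01$, whereas the second treats it as the concatenation of the balanced
+strings 10 and~10; both yield the same sentence.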
+
+While the discussion thus far is likely to convince the reader that \gref{eq-5}
+is correct, we shall conclude with a formal proof of this fact.  A proof that
+the grammar cannot be represented by any finite automaton shall be omitted, in
+particular because the productions of $\nt{S}$ have a structure very similar to
+the palindrome \gref{palindrome}.
+
+\proof Only sentences composed of balanced~0's and~1's may be derived
+from~$\nt{S}$.
+
+\basis The basis is $n=1$. The only sentence that may be derived in 1 step
+is~\emptystr, which is clearly balanced (containing zero~0's and zero~1's).
+
+\ind Assume that any sentence derived in fewer than~$n$ steps is balanced. Now
+recognize that any sentence derived in $n>1$ steps must make use of one of the
+following productions of $\nt{S}$:
+
+\begin{align*}
+  \nt{S} &\prod 0\nts{S}1; \\
+  \nt{S} &\prod 1\nts{S}0; \\
+  \nt{S} &\prod \nt{S}\;\nt{S}.
+\end{align*}
+
+\noindent
+Therefore, any sentence derived using one of the first two productions has the
+form $0x1$ or $1x0$, each of which adds exactly one~0 and one~1 to~$x$. Since
+$x$~is derivable from~$\nt{S}$ in fewer than~$n$ steps, it is balanced by our
+inductive hypothesis, and so $0x1$ and $1x0$ must be balanced as well. The last
+remaining production yields the form~$xy$, where $x$ and~$y$ are both derivable
+from~\nt{S} in fewer than~$n$ steps and thus must be balanced; their
+concatenation is therefore balanced. Furthermore, since the
+productions of~$\nt{S}$ produce only 0,~1, or~\emptystr, $\nt{S}$~has the
+alphabet $\Sigma=\set{0,1}$ and, consequently, may derive no sentence except for
+those containing balanced~0's and~1's. \foorp
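+
+The inductive step can also be summarized as a counting argument. As a sketch,
+let $\#_0(w)$ and $\#_1(w)$ denote the number of~0's and~1's in a string~$w$
+(this notation is introduced here for illustration and does not appear in the
+text). If $\#_0(x)=\#_1(x)$ and $\#_0(y)=\#_1(y)$, then each production
+preserves the balance:
+
+\begin{align*}
+  \#_0(0x1) &= \#_0(x)+1 = \#_1(x)+1 = \#_1(0x1); \\
+  \#_0(1x0) &= \#_0(x)+1 = \#_1(x)+1 = \#_1(1x0); \\
+  \#_0(xy)  &= \#_0(x)+\#_0(y) = \#_1(x)+\#_1(y) = \#_1(xy).
+\end{align*}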
+
+Having proved that only sentences of balanced~0's and~1's are derivable
+from~$\nt{S}$, we must now prove that $\nt{S}$~can derive \emph{all} such
+strings (that is, all such strings are sentences of $\nt{S}$). Such a proof is
+interesting because our grammar is more sophisticated than the previous
+examples.
+
+\proof All strings of balanced~0's and~1's are sentences of~$\nt{S}$.
+
+\basis The basis is a string of length $l=0$, which contains zero~0's and
+zero~1's. This string must be~\emptystr, which is derivable from~$\nt{S}$.
+
+\ind First, recognize that all balanced strings must have a length $l=2k$---that
+is, $l$~is always even (as emphasized in \fref{eq-ex}) and the string contains
+$k$~0's and $k$~1's. Assume that all balanced strings of length less than~$2k$
+are derivable from~$\nt{S}$.
+
+Consider any balanced string~$s$ of length~$2k$. We can consider $s$~to have the
+form~$yz$---that is, the concatenation of two balanced strings~$y$ and~$z$, both
+of which in turn have the form $axb, a\neq b$ where $x$ itself must be balanced
+(since $a\neq b$); alternatively, either $y$ or~$z$ may be~\emptystr, which
+therefore implies that the form~$yz$ accepts any balanced string where the first
+and last characters are not the same.
+
+We must now show that all such strings can be represented by the form~$yz$.
+First, recognize that $y=axb$ may have either the form $0x1$ or $1x0$; the form
+$yz$ then permits up to two adjacent identical characters in $\Sigma$; any
+additional adjacent identical characters may be derived by $x$. Consider
+$x=\emptystr$; then, clearly $axb$ is balanced and can be concatenated to form a
+larger balanced string. If $x\neq\emptystr$ but $x_1=b$,\footnote{$x_n$ denotes
+the $n^{\text{th}}$ character of~$x$.} then we can instead consider an
+alternative interpretation $y'=ax_1$ and $x'=x_2\cdots x_nb$, and then let
+$y=y'x'$ (instead of $axb$).
+
+We are then left with the case where $x_1=a$. Such a case allows for an
+arbitrarily deep nesting of adjacent identical characters and therefore $axb$
+can be represented by the regular expression $a^+b^+$. It is therefore clear
+that the form $yz$ is able to describe any string of balanced characters in the
+alphabet $\Sigma=\set{0,1}$. Such a form must have the derivation
+
+$$
+  \nt{S} \deriv \nt{S}\;\nt{S} \derivlmz xy.
+$$
+
+\noindent
+Since this is a leftmost derivation, $y$~is either a balanced string or
+\emptystr. In the former case, it is obvious that both $x$ and~$y$ are of a
+length less than~$2k$ and are therefore derivable from~\nt{S} by our inductive
+hypothesis. Otherwise, $y=\emptystr$ and the length of $x$~is precisely~$2k$, so
+we must consider the form $axb$; here the inner~$x$ is balanced, has length
+$2k-2<2k$, and is therefore derivable from~\nt{S} by our inductive hypothesis.
+Furthermore, $axb$ must have
+a derivation of the form
+$$
+  \nt{S} \deriv a\nts{S}b \derivz a\;x\;b,
+$$
+\noindent
+thereby proving that $axb$ is derivable from~\nt{S}. \foorp
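+
+As an illustration of the two cases (the strings below are chosen here as
+examples and are not from the text), consider first $s=0110$, whose first and
+last characters are the same. It splits into the shorter balanced strings
+$y=01$ and $z=10$, giving
+
+$$
+  \nt{S} \deriv \nt{S}\;\nt{S}
+    \deriv 0\nts{S}1\;\nt{S}
+    \deriv 0\;1\;\nt{S}
+    \deriv 0\;1\;1\nts{S}0
+    \deriv 0\;1\;1\;0.
+$$
+
+\noindent
+Now consider $s=001011$, which has the form $axb$ with $a=0$, $b=1$, and the
+shorter balanced string $x=0101$:
+
+$$
+  \nt{S} \deriv 0\nts{S}1
+    \derivz 0\;0\;1\;0\;1\;1.
+$$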
+
+This proof was considerably more involved than our previous ones and is an
+excellent segue into proving more sophisticated grammars. Of course, the reader
+can surely see the challenges that might arise from attempting to prove much
+more complicated grammars. \exend
+
+\end{document}