% Discussion of Selected Exercises From Compilers: Principles, Techniques, and Tools


%


% Copyright (C) 2013, 2018 Mike Gerwitz


%


% Licensed under a Creative Commons Attribution-ShareAlike 4.0


% International License.


%


% Discussion of section 4.2.8 (exercises for section 4.2) in CPTT (the


% "dragon book")


%%




\documentclass[draft]{article}


\usepackage{amsmath,amssymb,tikz}


\usetikzlibrary{automata,positioning}




\begin{document}


\title{Discussion of Selected Exercises: \\


Section 4.2.8 of Compilers: Principles, Techniques and Tools \\


\vspace{1em}


\large{Topic: Context-Free Grammars}}




\author{20130515}


\date{\today}




\maketitle




\def\exercise#1 #2\par{


\goodbreak


\vspace{0.5em plus 0.5em}


\noindent


\llap{\bf Exercise #1 }%


{\sl#2}\par


\vspace{0.5em plus 0.5em}


\goodbreak


}


\def\exend{$\blacksquare$}




\def\set#1{\left\{#1\right\}}




\def\nt#1{{\ifmmode#1\else$#1$\fi}}


\def\nts#1{\;\nt#1\;}


\def\prod{\rightarrow}


\def\punion{\;|\;}


\def\emptystr{\ifmmode\epsilon\else$\emptystr$\fi}




\def\mspace#1{\ifmmode\;#1\;\else$#1$\fi}




\def\derivop{\displaystyle\mathop{\Rightarrow}}


\def\deriv{{\mspace\derivop}} % extra grouping to solve issue in mmode w/ align


\def\lmderiv{\mspace{\deriv\limits_{lm}}}


\def\derivz{\mspace{\derivop^{\kern 0.25em*}}}


\def\derivp{\mspace{\derivop^{\kern 0.25em+}}}


\def\derivlm{\mspace{\derivop_{lm}}}


\def\derivrm{\mspace{\derivop_{rm}}}


\def\derivlmz{\mspace{\derivop^{\kern 0.25em*}_{lm}}}




\let\eqrefold\eqref


\def\eqref#1{\eqrefold{e:#1}}


\def\gref#1{grammar~\eqref{#1}}


\def\Gref#1{Grammar~\eqref{#1}}


\def\fref#1{Figure~\ref{f:#1}}




\def\prooftext#1 #2\par{


\goodbreak


\vspace{1ex plus 0.5ex}


\noindent


\llap{#1 }%


#2\par


}


\def\proof{\prooftext {\bf\small\uppercase{Proof}} }


\def\basis{\prooftext {\sc Basis} }


\def\ind{\prooftext {\sc Induction} }


\def\contra{\prooftext {\sc Contradiction} }


\def\foorp{$\square$\vspace{1ex plus 1ex}}






\begin{abstract}


\input{abstract}


\end{abstract}






\section{Context-Free Grammars}
The focus of this discussion (and of Section 4.2 in CPTT) is on context-free
grammars (or simply ``grammars'').




\section{Convention and Notation}


The following notational conventions are used throughout this paper. In most


cases, they have been borrowed from the text.




For grammars, capital symbols are used to represent nonterminals. The $\nt{S}$


symbol is used to denote the starting nonterminal. The symbol $\prod$~is used


to separate the nonterminal from its production body, whereas


$\deriv$~indicates a single step in a derivation. Leftmost and rightmost


derivations are denoted $\derivlm$ and~$\derivrm$ respectively. $\derivz$ means


``derives in zero or more steps'', whereas $\derivp$ means ``derives in one or


more steps''. The symbol $\punion$~separates multiple productions for a single


nonterminal. Any time punctuation is placed at the end of a grammar or


derivation, it should be read as part of the surrounding paragraph, \emph{not}


as part of the production or derivation. For example, in the grammar


$$


\nt{S} \prod 0\nts{S}1 \punion \emptystr,


$$


\noindent


the trailing comma is not part of the construction. Furthermore, whitespace is


not significant and may be discarded. \emptystr~is the empty string.




``The text'' refers to CPTT, whereas ``this paper'' refers to the paper you are


currently reading.






\section{Exercise 4.2.3: Grammar Design}
This exercise asks the reader to design grammars for a series of language
descriptions, a through~f; we will discuss each of them. Although the text does
not request it, a proof will be provided for each, as proofs are useful for
demonstrating correctness and are excellent practice in discipline.




\exercise 4.2.3a The set of all strings of 0's and 1's such that every 0 is


immediately followed by at least one 1.




The grammar for this exercise is fairly trivial, but will serve as a useful


introduction to the formalities of this paper. First, let us consider a grammar


that demonstrates such a property. Our alphabet is $\Sigma = \set{0,1}$. The


only restriction on the sentences of our grammar is that each 0 must be followed


by a 1; this means that we can have any number of adjacent 1's, but


it is not possible to have adjacent 0's. Considering that our alphabet~$\Sigma$


has only two characters, this grammar is fairly simple:




\begin{equation}\label{e:z1}


\nt{S} \prod 1\nt{S} \punion 01\nt{S} \punion \emptystr.


\end{equation}




As an example, let us consider some of the sentences that we may wish to be


derived by this grammar. In particular, consider derivation of the string


$01011$:




\begin{equation}


\nt{S} \deriv 01\;\nt{S}


\deriv 01\;01\;\nt{S}


\deriv 01\;01\;1\nt{S}


\deriv 01\;01\;1\;\emptystr


\derivz 01\;01\;1.


\end{equation}




Notice that a string of 1's, such as $1111$, is also derivable given our


grammar:




\begin{equation}\label{e:z11s}


\nt{S} \deriv 1\;\nt{S}


\deriv 1\;1\;\nt{S}


\deriv 1\;1\;1\;\nt{S}


\deriv 1\;1\;1\;1\;\nt{S}


\deriv 1\;1\;1\;1\;\emptystr


\derivz 1\;1\;1\;1,


\end{equation}




\noindent


as is the empty string $\emptystr$ in one step:




\begin{equation}


\nt{S} \deriv \emptystr.


\end{equation}




To prove that grammar \eqref{z1} is correct, we must prove two independent


statements:




\begin{enumerate}


\item The \emph{only} strings derivable from \gref{z1} are those of 0's and


1's such that every 0 is immediately followed by at least one 1;




\item The grammar accepts all such strings.


\end{enumerate}




We will prove these statements in order. For the first statement, we must


show that, at any given step $n$ of \gref{z1}, the only derivable strings


contain a 1 after each and every 0 (or that the string contains no 0's). For the


second statement, we must show that any string containing 0's and 1's such that


every 0 is followed by at least one 1 is derivable from our grammar. Grammar


proofs are discussed in Section 4.2.6 of the text.




\proof The only strings derivable from~$\nt{S}$ are those of 0's and~1's such


that every 0~is immediately followed by at least one~1. We shall perform this


proof inductively on the number of steps~$n$ in a given derivation.




\basis The basis is $n=1$. In one step, our grammar may produce one of three


strings: A string beginning with a~1 (the first production of~$\nt{S}$), a


string beginning with a~0 followed by a~1 (the second production of~$\nt{S}$)


and the empty string~\emptystr\ (the final production of~$\nt{S}$).




The empty string~\emptystr\ has no~0's and so follows the rules of the language.


The same is true for any string beginning with a~1. The third and final string
that can be generated when~$n=1$ is~01. This string does contain a~0, but that
0~is immediately followed by a~1, and so it also satisfies our requirement.




\ind We shall now assume that all derivations of fewer than $n$~steps result in


either a sentence containing no~0's or a sentence that contains 0's~followed by


one or more~1's. Such a derivation must have the form




\begin{equation}\label{e:z1ind}


\nt{S} \deriv xS \derivz xy.


\end{equation}




\noindent


Since $x$~is derived in fewer than $n$~steps then, by our inductive hypothesis,
every~0 in~$x$ must be followed by a~1; the same is true of~$y$.

Additionally, according to \gref{z1}, $y$~must be derived using one of the
productions




\begin{align*}


\nt{S} &\prod 1\nt{S} \\


\nt{S} &\prod 01\nt{S} \\


\nt{S} &\prod \emptystr.


\end{align*}


\noindent


Each of these productions has already been discussed in our basis; therefore,
$y$~cannot contain a~0 followed by another~0. Additionally, it is required that
adjacent~1's be permitted after a~0, which is possible by the first production
(as demonstrated in \eqref{z11s}). As such, every~0 in~$xy$ must be
followed by one or more~1's and our hypothesis has been proved. \foorp




To ensure a thorough understanding of the above proof, it is worth mentioning


why \eqref{z1ind}~used both the \deriv\ and~\derivz\ derivation symbols. Our


basis applies when $n=1$; the inductive hypothesis applies otherwise (when


$n>1$). As such, we must have \emph{at least} one production in~\eqref{z1ind}.
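
As a concrete instance of~\eqref{z1ind}, offered only as an illustration, take
the string $01011$ derived earlier, which requires $n=4$ steps. Splitting the
derivation after its first production gives $x=01$ and $y=011$:
\begin{gather*}
  \nt{S} \deriv \underbrace{01}_{x}\,\nt{S}
         \derivz \underbrace{01}_{x}\;\underbrace{011}_{y}, \\
  \text{where}\quad
  \nt{S} \deriv 01\;\nt{S} \deriv 01\;1\;\nt{S} \deriv 01\;1.
\end{gather*}
\noindent
The derivation of~$y$ takes three steps, which is fewer than~$n$, so the
inductive hypothesis applies to it.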




Now that we have proved that we may only derive sentences from \gref{z1} that


contain~0's followed by one or more~1's, we must now show that the grammar may


be used to derive all such possible strings.




\proof Any string~$s$ of length~$l$ consisting of~1's and~0's such that any~0 is


followed by at least one~1 is derivable from~$\nt{S}$.




\basis A string of length~$0$ ($l=0$) must be~\emptystr, which is derivable


from~$\nt{S}$ in one step.




\ind Assume that any string $s$ of a length less than $l$ is derivable
from~\nt{S}. Such a string must have the form~$xy,
y\in\set{1,01,\emptystr}$; that is, we can consider $s$ to be the concatenation
of $y$~with a previously derived string. Since the length of $x$~is clearly less
than~$l$, it must be derivable from~\nt{S} by our inductive hypothesis.


Furthermore, $xy$~must have a derivation of the form




\begin{equation}\label{e:z1deriv1}


\nt{S} \derivp x\;\nt{S} \deriv x\;y,


\end{equation}


\noindent


thereby proving that $s$~is derivable from~\nt{S}. \foorp




The derivation~\eqref{z1deriv1} may seem to be too abstract to be useful;


since this is our first proof, it is worth clarifying why it does in fact


complete the proof. We first showed that any string of the language of 0's and


1's that we have been studying can be described as the concatenation of a


smaller such string with 1, 01 or~\emptystr\ (which completes the string). This
string, as we stated, has the form~$xy$. Therefore, we must show that
\nt{S}~supports concatenation; \eqref{z1deriv1} demonstrates this with~$x$
fairly abstractly, since it does not matter what exactly $x$~is. From the
productions of~\nt{S} in \gref{z1}, it is understood that $x$ can be any string
of terminals (that is, any derivation) leading up to that point in the


derivation~\eqref{z1deriv1}.




We must now show that the remaining part of~$xy$ (that is, $y$) is derivable.
The only nonterminal remaining after~$x$ is~\nt{S}. We have defined $y$~to be
any string of terminals in the set $\set{1,01,\emptystr}$. Clearly, each of
these strings is derivable from~\nt{S}. Therefore, we can replace~\nt{S}
in~\eqref{z1deriv1} with~$y$, indicating that this is a valid derivation given
our definition of~$y$; it is up to the reader of the proof to make this
connection. Note that, while the domain of $y$~happens to be every production
of~\nt{S}, this is not necessary for the proof; that is the subject of the
first proof.
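
To make the decomposition concrete (an illustrative sketch, not part of the
proof above), observe that the string $01011$ splits into the blocks $01$, $01$
and~$1$, each of which is contributed by one application of a production
of~\nt{S}:
\begin{equation*}
  01011 \;=\;
  \underbrace{01}_{\nt{S}\prod 01\nt{S}}\;
  \underbrace{01}_{\nt{S}\prod 01\nt{S}}\;
  \underbrace{1}_{\nt{S}\prod 1\nt{S}}\,.
\end{equation*}
\noindent
Three blocks require three such steps, and one final application of
$\nt{S}\prod\emptystr$ removes the remaining nonterminal, for four steps in
total. This agrees with the derivation of $01011$ shown earlier.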




Before we put this exercise to rest (indeed, we completed the exercise
requirement in the first paragraph following the exercise definition), it is
worth noting that the language of this grammar may also be accepted by a finite
automaton (and consequently described by a regular expression); this is
demonstrated by \fref{z1regex}. It should be noted that this is not the case
with all of the exercises that follow.


\exend




\begin{figure}


\center


\begin{tikzpicture}


\node[state,initial] (a) {$a$};


\node[state] (b) [right=of a] {$b$};


\node[state,accepting] (c) [right=of b] {$c$};




\path[->]


(a) edge [loop below] node {1} ()


edge [bend right, below] node {\emptystr} (c)


edge [above] node {$0$} (b)


(b) edge [above] node {$1$} (c)


(c) edge [bend right, above] node {\emptystr} (a)


;


\end{tikzpicture}




\caption{An NFA corresponding to the extended regular expression


$\left(0^?1^+\right)^*$ describing \gref{z1}.}


\label{f:z1regex}


\end{figure}
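
As an informal check of \fref{z1regex} (the trace below is an illustration and
assumes the state names shown in the figure), the string $011$ is accepted by
the path
\begin{equation*}
  a \xrightarrow{\;0\;} b \xrightarrow{\;1\;} c
    \xrightarrow{\;\emptystr\;} a \xrightarrow{\;1\;} a
    \xrightarrow{\;\emptystr\;} c,
\end{equation*}
\noindent
which ends in the accepting state~$c$. The string $10$, on the other hand, can
only reach state~$b$ via $a \xrightarrow{1} a \xrightarrow{0} b$; since $b$~is
not accepting and has no outgoing \emptystr-transition, the string is rejected.
This is the expected outcome, because its 0~is not followed by a~1.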




The above example was fairly simple, yet resulted in a relatively lengthy


discourse far past what was required by the text; the reader can expect such a


discussion to continue for all examples that follow.






\exercise 4.2.3b The set of all strings of 0's and 1's that are


palindromes; that is, the string reads the same backward as forward.




As the exercise stated, a {\sl palindrome} is a string that reads the same in


both directions; let us consider some examples before attempting to construct a


grammar. The following strings are all palindromes, one per
line:\footnote{An example of an English palindrome is ``Mr.~Owl ate my metal
worm'' (discarding punctuation and capitalization).}




\begin{equation}\label{e:palex}


\begin{tabular}{rcl}


1 &00 &1 \\


1100 &11 &0011 \\


010 &1 &010 \\


& 0 &


\end{tabular}


\end{equation}




The above palindromes have been laid out so that their symmetry is apparent. At


first glance, one can imagine constructing a palindrome out of pairs of


characters, like the second row of~\eqref{palex}:




\begin{equation}\label{e:palex2}


\begin{tabular}{rcl}


& 11 & \\


1 & 11 & 1 \\


11 & 00 & 11 \\


110 & 00 & 011 \\


1100 & 11 & 0011


\end{tabular}


\end{equation}




\noindent


In this case, each palindrome would always have an even number of characters.


However, it is important to note the bottom two palindromes of \eqref{palex},


which have an \emph{odd} number of characters:




\begin{equation}\label{e:palex3}


\begin{tabular}{rcl}


& 00 & \\


0 & 11 & 0 \\


01 & 00 & 10 \\


010 & 1 & 010


\end{tabular}


\end{equation}




Given this evaluation and the understanding that $2n$~is always even for any
non-negative integer~$n$, it would be accurate to recursively construct a palindrome


from the edges inward in pairs. Once we reach the center, we may end


with~\emptystr\ if we wish to have an even ($2n$) number of characters, or


otherwise may add a single character to create a palindrome containing an odd


($2n+1$) number of characters.




\begin{equation}\label{e:palindrome}


\begin{aligned}


\nt{S} &\prod 0\nts{S}0 \punion 1\nts{S}1 \punion M \\


\nt{M} &\prod 0 \punion 1 \punion \emptystr


\end{aligned}


\end{equation}




In \gref{palindrome} above, we define our start nonterminal~\nt{S} with
productions for the outer pairs. The nonterminal~\nt{M} represents the
acceptable inner (``middle'') characters, which determine whether the length of
the palindrome is even (if \emptystr~is used) or odd (0 or~1). We will leave


demonstrations of such derivations to the proof.




To prove that grammar~\nt{S} is the proper grammar for all palindromes, we must


again prove two things: That language $L(\nt{S})$ can produce only palindromes


of~0's and~1's and that all such palindromes can be derived from~\nt{S}. The


difference between these two descriptions may be subtle for such a simple


grammar, but the distinction is important to ensure that $L(\nt{S})$ represents


\emph{nothing more and nothing less} than a language that may be used for such


palindromes.




As before, the proofs will be inductive: the first proof on the number of


steps~$n$ of a derivation of~\nt{S} and the second on the length~$l$ of the


palindrome~$s$. Our alphabet~$\Sigma$ is once again~$\set{0,1}$.




\proof The only strings derivable from grammar~\nt{S} are palindromes consisting


of 0's and~1's.




\basis The basis is $n=2$, which is the fewest number of steps in which a
string of terminals may be derived from~\nt{S}.\footnote{A derivation of $n=1$
step cannot result in a string consisting only of terminals, as it would result
in $0S0$,~$1S1$ or~$M$.} Such a derivation must be of the form


$$


\nt{S} \deriv M \deriv x,


$$


\noindent


where $x$~is 0,~1, or~\emptystr. In the last case, the derived string is
clearly a palindrome of length zero. In the case of 0 or~1, the length of the
string is one, and every string of length one is a palindrome.




\ind Now assume that every string derived in fewer than $n$~steps is a


palindrome. Such a derivation must be of the form


$$


\nt{S} \deriv x\nts{S}x \derivz x\;y\;x.


$$


\noindent


That is, the string~$x$ appears on both the left and right of~$y$. Since the
derivation of~$y$ from~\nt{S} takes fewer than $n$~steps (specifically, $n-1$
steps), $y$~must be a palindrome by our inductive hypothesis. Because $x$~is
added to both the beginning and end of~$y$, any string derived in $n$~steps
must be a palindrome. \foorp




Let us further demonstrate the above proof by deriving~\eqref{palex2}


from~\nt{S}:\footnote{The dots were added so as not to confuse the reader as to


what was going on; the symbol~\derivp\ is sufficient and therefore the dots will


be omitted in the future.}




\begin{equation}


\nt{S}


\deriv 1\nts{S}1


\deriv 1\;1\nts{S}1\;1


\deriv \cdots


\derivp 1\;1\;0\;0\;1\;\emptystr\;1\;0\;0\;1\;1


\end{equation}




\noindent


and additionally \eqref{palex3}:




\begin{equation}


\nt{S}


\deriv 0\nts{S}0


\deriv 0\;1\nts{S}1\;0


\deriv 0\;1\;0\nts{S}0\;1\;0


\deriv 0\;1\;0\;1\;0\;1\;0.


\end{equation}




\noindent


The induction step works by recognizing the basis as the middle of the string
(nonterminal~\nt{M} in \gref{palindrome}): \emptystr~for palindromes of an
even length and the $\left\lceil l/2 \right\rceil^{th}$ character for those of
an odd length~$l$ (1 in the case of the latter derivation). Call this
string~$b$. We know that $b$~is a palindrome, as explained in the proof above.
For our inductive step, we recognize that, for each step~$n$, we add two
characters (one to the beginning and one to the end) to the result of
step~$n-1$. As such, since the derivation of~$n-1$ steps must be a palindrome,
the derivation in~$n$ steps must also be; it is not possible to derive anything
but a palindrome from~\nt{M} and \nt{S}~maintains this designation.




For completeness, we must now show that all possible palindromes of the


alphabet~$\Sigma$ can be derived from~\nt{S}.




\proof Every palindrome consisting of~0's and~1's is derivable from~\nt{S}.




\basis If the string~$s$ is of length~$l\leq1$, then it must be \emptystr,~0 or~1,


all of which are palindromes derivable by~\nt{M}.




\ind Observe that any palindrome of length~$l>1$ must contain the same


character at positions~$1$ and~$l$.\footnote{1-indexed for notational
convenience.} Assume that each palindrome with a length less than~$l$ is
derivable from~\nt{S}. Since $s$~is a palindrome, then it must have the form
$xyx, x\in\Sigma$, where $y$~is also a palindrome. Since $y$~has a length
$l-2<l$,


then it must be derivable from~\nt{S} by the inductive hypothesis. The


palindrome~$s$ must therefore have a derivation of the form


$$


\nt{S} \deriv x\nts{S}x \derivz x\;y\;x,


$$


\noindent


which thus proves that~$s$ is derivable from~\nt{S}. \foorp
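
As an illustration of the inductive step (a worked example, not part of the
proof), the palindrome $0110110$ of length $l=7$ peels apart from the outside
in:
\begin{align*}
  0110110 &= 0\cdot 11011\cdot 0, \\
  11011   &= 1\cdot 101\cdot 1, \\
  101     &= 1\cdot 0\cdot 1.
\end{align*}
\noindent
Each line corresponds to one application of a production of the form
$\nt{S}\prod x\nt{S}x$, and the remaining middle character is produced by
$\nt{S}\deriv\nt{M}\deriv 0$:
\begin{equation*}
  \nt{S} \deriv 0\nts{S}0
         \deriv 0\;1\nts{S}1\;0
         \deriv 0\;1\;1\nts{S}1\;1\;0
         \deriv 0\;1\;1\nts{M}1\;1\;0
         \deriv 0\;1\;1\;0\;1\;1\;0.
\end{equation*}
\noindent
The derivation uses $\lfloor l/2\rfloor+2=5$ steps: one per outer pair, plus
two for the middle.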




It is also worth noting that, unlike the first exercise, we cannot recognize
the language of palindromes with a finite automaton (and therefore cannot
describe it with a regular expression). Let us prove this assertion.




\proof \nt{S}~cannot be represented by any finite automaton. Specifically, a


finite automaton representing~\nt{S} may accept all strings that are


palindromes of the alphabet~$\Sigma$, but such an automaton must also accept


strings that are not palindromes. We shall prove this statement by


contradiction.




\contra Given the alphabet~$\Sigma$, a palindrome may contain any character


from~$\Sigma$ at any arbitrary position~$n$ and may be of length~$l\geq0$. As


such, we must be able to represent this automaton by the regular expression


$\left(0|1\right)^*$, whose corresponding minimum-state DFA is shown in
\fref{pala}. However, it is also necessary that characters $c_n$
and~$c_{l-n+1}$ be the same symbol in~$\Sigma$, a requirement that the
minimum-state DFA of \fref{pala} cannot guarantee.




\begin{figure}


\center


\begin{tikzpicture}


\node[state,initial,accepting] (a) {$a$};




\path[->]


(a) edge [loop below] node {1} ()


edge [loop above] node {0} ()


;


\end{tikzpicture}




\caption{The minimum-state DFA for the regular expression
$\left(0|1\right)^*$.}


\label{f:pala}


\end{figure}




Consider that the only way for a finite automaton to maintain a history of states


is to have a state to represent each unique history. However, to accept a string


of any length, we would need an automaton containing a potentially infinite


number of states, which is not finite (and therefore not a finite automaton).


Therefore, it is not possible to represent the history of every possible


palindrome using a finite set of states.




Given this, it must stand that a finite automaton must at some point contain a


state that transitions to a previous or current state, such as the NFA in


\fref{pala2}. Since the history of the string is ``stored'' purely in the


possible states leading up to the current state, this transition~$t$ equates to a


loss of ``memory'', without which the right-hand portion of the palindrome cannot


be properly matched. Furthermore, since each position~$n$ may contain any


character in~$\Sigma$, and since the transition~$t$ can only yield a set of


future states with a limited (finite) precision, each of these future states


must be redundant. Since each NFA can be represented by an equivalent DFA and


each DFA for some language has a unique minimum-state DFA, any portion of


a finite automaton that can accept a palindrome of any length must be equivalent


to \fref{pala} (such as state~$x$ in \fref{pala2}). We are therefore left to


conclude that no finite automaton can accept a palindrome of arbitrary length


without accepting every string that is a combination of each character in


$\Sigma$. \foorp
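
A more conventional route to the same conclusion, sketched here as a supplement
rather than as the argument used above, is to exhibit infinitely many prefixes
that no finite automaton can afford to confuse. For integers $i\neq j$,
consider the prefixes $0^i1$ and~$0^j1$ and append the suffix~$0^i$ to each:
\begin{equation*}
  0^i\,1\,0^i \in L(\nt{S}),
  \qquad
  0^j\,1\,0^i \notin L(\nt{S})
  \quad (i\neq j).
\end{equation*}
\noindent
A DFA that reached the same state after reading $0^i1$ and~$0^j1$ would be
forced to either accept both of these strings or reject both. The prefixes
$0^01$, $0^11$, $0^21$, \dots\ must therefore all lead to distinct states, which
is impossible with a finite number of states. (This is the distinguishability
argument usually associated with the Myhill--Nerode theorem.)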




\begin{figure}


\center


\begin{tikzpicture}


\node[state,initial] (a) {$1$};


\node[state] (b) [right=of a] {$2$};


\node[state] (x) [right=of b] {$x$};


\node[state] (y) [right=of x] {$n-1$};


\node[state,accepting] (z) [right=of y] {$n$};




\path[->]


(a) edge [above] node {$\alpha$} (b)


edge [below, bend right=45] node {$\kern0.7em\emptystr$} (z)


(b) edge [above] node {$\beta$} (x)


edge [below, bend right=65] node {$\emptystr$} (y)


(x) edge [loop above] node {$\beta$} ()


edge [loop below] node {$\alpha$} ()


edge [above] node {$\beta$} (y)


(y) edge [above] node {$\alpha$} (z)


;


\end{tikzpicture}




\caption{An NFA with a finite set of states must at some point transition to a


previous or identical state in order to accept input of any length.


$\Sigma=\set{\alpha,\beta}$.}


\label{f:pala2}


\end{figure}




To clarify further: any finite automaton that transitions to a
\emph{previous} state, since it loses a portion of its history, can no longer


accurately determine the states leading up to the final state. That is, consider


the string 10101 and consider that the first three characters of this string can


be represented by the states $\set{a,b,a}$. At this point, we can no longer be


certain of what the string may end with, because we have lost any sense of


nesting/recursion. Therefore, the states leading to the final state are forced


to accept any character in $\Sigma$ and therefore must be equivalent to the


minimum-state DFA of \fref{pala}. As was mentioned by the text, ``finite


automata cannot count''.




\fref{pala2} gets around such an issue by transitioning only to current or


future states, which permits a \emph{finite} amount of nesting (placing the


aforementioned minimum-state DFA~$x$ in the middle). However, note a glaring
issue: this automaton does not accept~$\beta$ in the first character position.


If it did, then we would need a second set of states in order to maintain such a


history and know that we should also \emph{end} with $\beta$~instead


of~$\alpha$. The number of states would therefore grow very quickly with the


level of nesting and the size of~$\Sigma$ (such a consideration is left to the


reader).




We have exhaustively proved that \gref{palindrome} is the correct answer for


this exercise. \exend






\exercise 4.2.3c The set of all strings of 0's and 1's with an equal number of


0's and 1's.




To understand how to approach this problem, we shall consider a number of


strings that are derivable from this language. An obvious case is~\emptystr,


which contains zero~0's and zero~1's. Some additional examples are shown in


\fref{eqex} along with their lengths (denoted by~$l$).




\begin{figure}[h]


\center


\begin{tabular}{rcccccc}


$s$ & \emptystr & 10 & 01 & 1010 & 1001 & 011100 \\


\hline


$l$ & 0 & 2 & 2 & 4 & 4 & 6


\end{tabular}




\caption{Examples of strings with an equal number of 0's and 1's.}


\label{f:eqex}


\end{figure}




These examples demonstrate a number of important properties. In particular, the


length~$l$ of the string~$s$ is always even, with the number of 0's and~1's


$n=l/2$. Additionally, the characters of the alphabet~$\Sigma$ may appear in any


order in the string. Therefore, we do not have the luxury of a simple, nested,


recursive implementation as we did with the palindrome exercise (at least not


exclusively).




Let us construct the grammar iteratively, beginning with the simplest case


of~\emptystr.




\begin{equation}\label{e:eq1}


\nt{S} \prod \emptystr


\end{equation}




\noindent


The second case, 10, is also fairly easy to fit into~$\nt{S}$:




\begin{equation}\label{e:eq2}


\nt{S} \prod 10 \punion \emptystr


\end{equation}




The third case demonstrates an important property of our strings: they may


begin with either a~0 or a~1 and they may also \emph{end} with either character


(more generally, they may begin or end with any character in~$\Sigma$). However,


we cannot simply adjust our grammar to accept either character in both


positions; $\nt{S}$ must ensure that, any time we include a~0 in a production,


we also include a~1 (and vice versa). So far, this is guaranteed by~$\nt{S}$ in


\gref{eq2}; to keep on this path, we must add 01 as yet another special case.




\begin{equation}\label{e:eq3}


\nt{S} \prod 01 \punion 10 \punion \emptystr


\end{equation}




\goodbreak


The fourth case, 1010, introduces the need to handle strings of an arbitrary


length. To do this, we must determine at what point we should recurse


on~$\nt{S}$. Looking at the example, we could derive 1010 as two nested


applications of~$\nt{S}$ if we recurse between the two terminals.




\begin{equation}\label{e:eqa}


\nt{S}


\deriv 1\nts{S}0


\deriv 1\;0\nts{S}1\;0


\deriv 1\;0\;\emptystr\;1\;0


\derivz 1\;01\;0


\end{equation}




\noindent


Of course, one could also adopt an alternate perspective by considering the


string to be the production of two adjacent nonterminals.




\begin{equation}\label{e:eqb}


\nt{S} \deriv \nt{S}\;\nt{S}


\derivlm 10\;\nt{S}


\derivlm 10\;10


\end{equation}




\noindent


Unfortunately, with this information alone, we cannot be certain which of these


productions (if such a choice even matters) should be used in our grammar.


Perhaps we can gain further insight from the remaining examples.




The next example, 1001, can be derived in a manner similar to \eqref{eqb},


but not \eqref{eqa}; in particular, \gref{eq3} has no production for the


string 00, and so we cannot construct the string from the outside in. Given


that, we can be certain that an adjacent nonterminal production is needed and


so we will add the production used in \eqref{eqb} to our grammar.




\begin{equation}\label{e:eq4}


\nt{S} \prod 01 \punion 10 \punion \nt{S}\;\nt{S} \punion \emptystr


\end{equation}




However, the aforementioned predicament (the absence of a production that can
yield only 00) raises the question of whether or not we can truly derive any
string of equal 1's and 0's from the above grammar. Our final example challenges
this. 011100 cannot possibly be represented by~$\nt{S}$ in \gref{eq4} because
this grammar constructs the string from left-to-right (or right-to-left) in


pairs of~0's and~1's. Therefore, the only way to have adjacent~1's or adjacent~


0's is to alternate the productions, which makes it impossible to have more than


two adjacent identical characters.
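
This can be checked directly (an illustrative argument that assumes nothing
beyond \gref{eq4}). The production $\nt{S}\prod\nt{S}\;\nt{S}$ only
concatenates, and the remaining productions yield 01, 10 or~\emptystr, so every
sentence of \gref{eq4} is a sequence of the pairs 01 and~10. Cutting the
candidate string into consecutive pairs gives
\begin{equation*}
  011100 \;=\;
  \underbrace{01}_{\text{permitted}}\;
  \underbrace{11}_{\text{not permitted}}\;
  \underbrace{00}_{\text{not permitted}},
\end{equation*}
\noindent
and so 011100 is not derivable, even though it contains three~0's and
three~1's.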




Given this, it seems that both \eqref{eqb} \emph{and} \eqref{eqa} are


necessary; the following derivation demonstrates this fact (neither can


individually be used to derive the string 011100).




\begin{equation}


\nt{S} \deriv \nt{S}\;\nt{S}


\derivlm 01\;\nt{S}


\derivlm 01\;1\nts{S}0


\derivlm 01\;1\;1\nts{S}0\;0


\derivlm 01\;1\;1\;\emptystr\;0\;0


\derivlmz 01\;1\;10\;0


\end{equation}




\noindent


We thus arrive at \gref{eq5} below.




\begin{equation}\label{e:eq5}


\nt{S} \prod 0\nts{S}1


\punion 1\nts{S}0


\punion \nt{S}\;\nt{S}


\punion \emptystr


\end{equation}




An astute reader may at this point notice that we have created an ambiguity in


our grammar: Recall~\eqref{eqa} and~\eqref{eqb}, which had two possible


derivations for the same string; both of these derivations are now possible in


our grammar. The text defines an ambiguous grammar to be a grammar that contains


more than one leftmost or more than one rightmost derivation for the same


sentence. This is a particularly interesting example of ambiguity, in particular


because we cannot resolve it. Let us consider why.




\proof Grammar~$\nt{S}$ cannot be disambiguated. We will prove this fact by


contradiction.




\contra Firstly, recognize that~$\nt{S}$ is ambiguous because there exists some


sentence~$s$ that has both of the following derivations in $n>1$ steps, where


$a\ne b$:




\begin{align*}


\nt{S} &\deriv a\nts{S}b \derivp a\;x\;b;


\\


\nt{S} &\deriv \nt{S}\;\nt{S}


\deriv a\nts{S}b\;\nt{S}


\derivp a\;b\;\nt{S}


\derivp a\;b\;x.


\end{align*}




Suppose to the contrary that there is some way to disambiguate~$x$. There must


then be some terminal $c\in\Sigma$ in~$x$ that may be used to perform the


disambiguation and such a disambiguation would imply a difference in the


semantics of~$x$ between the two derivations. However, $x=x$ and so both


derivations hold exactly the same meaning: balanced strings. Furthermore, the
productions for producing balanced strings require each character in~$\Sigma$;


$c$ therefore must not exist. \foorp
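
For a concrete instance of the ambiguity itself (an illustration that uses only
the productions of \gref{eq5}), the sentence $1010$ has at least two distinct
leftmost derivations:
\begin{align*}
  \nt{S} &\derivlm 1\nts{S}0
          \derivlm 1\;0\nts{S}1\;0
          \derivlm 1\;0\;1\;0; \\
  \nt{S} &\derivlm \nt{S}\;\nt{S}
          \derivlm 1\nts{S}0\;\nt{S}
          \derivlm 1\;0\;\nt{S}
          \derivlm 1\;0\;1\nts{S}0
          \derivlm 1\;0\;1\;0.
\end{align*}
\noindent
The first derivation treats the sentence as one balanced string nested inside
another; the second treats it as the concatenation of two balanced strings.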




Fortunately, this ambiguity is not an issue for our grammar because the multiple


derivations are semantically equivalent; we are not arriving at any different


result within the context of this exercise. The sentence 1010 of \fref{eqex}


demonstrates this concept: It does not matter whether we consider the sentence


to be a single balanced string or the concatenation of two balanced strings; we


arrive at the same result regardless with no harm done.\footnote{Of course, one


valid argument is that a more concise and unambiguous grammar will reduce


problems during parsing. However, the parser (like Lex, as described by the


text) can give precedence to the productions that appear earlier in the grammar


to resolve this issue.}




While the discussion thus far is likely to convince the reader that \gref{eq5}


is correct, we shall conclude with a formal proof of this fact. A proof that


the grammar cannot be represented by any finite automaton shall be omitted, in


particular because the productions of $\nt{S}$ have a structure very similar to


the palindrome \gref{palindrome}.




\proof Only sentences composed of balanced~1's and~0's may be derived


from~$\nt{S}$.




\basis The basis is $n=1$. The only sentence that may be derived in 1 step


is~\emptystr, which is clearly balanced (containing zero~0's and zero~1's).




\ind Assume that any sentence derived in fewer than~$n$ steps is balanced. Now


recognize that any sentence derived in $n>1$ steps must make use of one of the


following productions of $\nt{S}$:




\begin{align*}


\nt{S} &\prod 0\nts{S}1; \\


\nt{S} &\prod 1\nts{S}0; \\


\nt{S} &\prod \nt{S}\;\nt{S}.


\end{align*}




\noindent


Therefore, a sentence derived using either of the first two productions has the
form $0\nt{x}1$ or $1\nt{x}0$, each of which adds exactly one~0 and one~1
to~$x$. Since $x$~is derivable from~$\nt{S}$ in fewer than~$n$ steps, then by
our inductive hypothesis, $x$~must be balanced, and so must the resulting
sentence. The last


remaining production has the form~$xy$, both of which are derivable from~\nt{S}


in fewer than~$n$ steps and thus must be balanced. Furthermore, since the


productions of~$\nt{S}$ produce only 0,~1, or~\emptystr, $\nt{S}$~has the


alphabet $\Sigma=\set{0,1}$ and, consequently, may derive no sentence except for


those containing balanced~0's and~1's. \foorp
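\goodbreak
As a small sanity check of the induction (a sketch, not part of the proof
itself, and assuming that $\nt{S}\prod\emptystr$ is the only production without
a nonterminal on its right-hand side), the shortest derivations can be
enumerated by hand. The only sentences derivable in exactly two steps are
$$
  \nt{S} \deriv 0\nts{S}1 \deriv 01
  \qquad\hbox{and}\qquad
  \nt{S} \deriv 1\nts{S}0 \deriv 10,
$$
\noindent
each of which wraps the one-step sentence~\emptystr\ in one~0 and one~1. The
production $\nt{S}\prod\nt{S}\;\nt{S}$ needs at least three steps to reach a
sentence,
$$
  \nt{S} \deriv \nt{S}\;\nt{S} \deriv \nt{S} \deriv \emptystr,
$$
\noindent
and here both halves are~\emptystr, each derivable in fewer than three steps
and each balanced, so their concatenation is balanced as well.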




Having proved that only sentences of balanced~0's and~1's are derivable


from~$\nt{S}$, we must now prove that $\nt{S}$~can derive \emph{all} such


strings (that is, all such strings are sentences of $\nt{S}$). Such a proof is


interesting because our grammar is more sophisticated than the previous


examples.




\proof All strings of balanced~0's and~1's are sentences of~$\nt{S}$.




\basis The basis is a string of length $l=0$, which contains zero~0's and
zero~1's. This string must be~\emptystr, which is derivable from~$\nt{S}$.




\ind First, recognize that all balanced strings must have a length $l=2k$; that
is, $l$~is always even (as emphasized in \fref{eqex}), and the string contains
$k$~0's and $k$~1's. Assume that all balanced strings of length less than~$2k$
are derivable from~$\nt{S}$.




Consider any balanced string~$s$ of length~$2k$. We can consider $s$~to have the
form~$yz$; that is, the concatenation of two balanced strings~$y$ and~$z$, both
of which in turn have the form $axb$, $a\neq b$, where $x$ itself must be
balanced (since $a\neq b$); alternatively, either $y$ or~$z$ may be~\emptystr,
which therefore implies that the form~$yz$ accepts any balanced string where the
first and last characters are not the same.




We must now show that all such strings can be represented by the form~$yz$.


First, recognize that $y=axb$ may have either the form $0x1$ or $1x0$; the form


$yz$ then permits up to two adjacent identical characters in $\Sigma$; any


additional adjacent identical characters may be derived by $x$. Consider


$x=\emptystr$; then, clearly $axb$ is balanced and can be concatenated to form a


larger balanced string. If $x\neq\emptystr$ but $x_1=b$,\footnote{$x_n$ denotes


the $n^{\text{th}}$ character of~$x$.} then we can instead consider an


alternative interpretation $y'=ax_1$ and $x'=x_2\cdots x_nb$, and then let


$y=y'x'$ (instead of $axb$).




We are then left with the case where $x_1=a$. Such a case allows for an


arbitrarily deep nesting of adjacent identical characters and therefore $axb$


can be represented by the regular expression $a^+b^+$. It is therefore clear


that the form $yz$ is able to describe any string of balanced characters in the


alphabet $\Sigma=\set{0,1}$. Such a form must have the derivation




$$
  \nt{S} \deriv \nt{S}\;\nt{S} \derivlmz yz.
$$
\noindent
Since this is a leftmost derivation, $z$~is either a balanced string or
\emptystr. In the former case (with $y$ non-empty as well), both $y$ and~$z$ are
of a length less than~$2k$ and are therefore derivable from~\nt{S} by our
inductive hypothesis. Otherwise, $z=\emptystr$ and the length of $y$~is
precisely~$2k$, and we must consider the form $y=axb$; $x$~is balanced and
clearly of a length less than~$2k$, and is therefore derivable from~\nt{S} by
our inductive hypothesis. Furthermore, $axb$ must have a derivation of the form
$$
  \nt{S} \deriv a\nts{S}b \derivz a\;x\;b,
$$
\noindent
thereby proving that $axb$ is derivable from~\nt{S}. \foorp
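\goodbreak
As an illustration of the two cases (the strings below are chosen here only as
examples), consider first the balanced string 0110 of length $2k=4$. It splits
into the two non-empty balanced strings 01 and~10, each of length less
than~4, and so it may be derived by way of $\nt{S}\prod\nt{S}\;\nt{S}$:
$$
  \nt{S} \deriv \nt{S}\;\nt{S} \deriv 0\nts{S}1\nts{S} \deriv 01\nts{S}
         \deriv 011\nts{S}0 \deriv 0110.
$$
\noindent
By contrast, 0011 has no non-empty balanced proper prefix, so it cannot be split
in this way. Its first and last characters differ, giving the form $axb$ with
$a=0$, $x=01$, and $b=1$, where $x$~is balanced and of length less than~4:
$$
  \nt{S} \deriv 0\nts{S}1 \deriv 00\nts{S}11 \deriv 0011.
$$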




This proof was considerably more involved than our previous ones and is an
excellent segue into proofs about more sophisticated grammars. Of course, the
reader can surely see the challenges that might arise from attempting such
proofs for much more complicated grammars. \exend






\section{License}


This work is licensed under the Creative Commons Attribution-ShareAlike 4.0
International License: you are free to use, share, and modify it to suit
your needs, provided that you give proper attribution and license derivative
works under similar terms. For more information, see:

{\tt https://creativecommons.org/licenses/by-sa/4.0/}.




\end{document}


