% Discussion of Selected Exercises From Compilers: Principles, Techniques, and Tools
%
% Copyright (C) 2013, 2018 Mike Gerwitz
%
% Licensed under a Creative Commons Attribution-ShareAlike 4.0
% International License.
%
% Discussion of section 4.2.8 (exercises for section 4.2) in CPTT (the
% "dragon book")
%%







\documentclass[draft]{article}




\usepackage{amsmath,amssymb,tikz}




\usetikzlibrary{automata,positioning}








\begin{document}




\title{Discussion of Selected Exercises: \\
  Section 4.2.8 of Compilers: Principles, Techniques, and Tools \\
  \vspace{1em}
  \large{Topic: Context-Free Grammars}}








\author{Mike Gerwitz}




\date{\today}








\maketitle








\def\exercise#1 #2\par{
  \goodbreak
  \vspace{0.5em plus 0.5em}
  \noindent
  \llap{\bf Exercise #1 }%
  {\sl#2}\par
  \vspace{0.5em plus 0.5em}
  \goodbreak
}




\def\exend{$\blacksquare$}








\def\set#1{\left\{#1\right\}}








\def\nt#1{{\ifmmode#1\else$#1$\fi}}




\def\nts#1{\;\nt#1\;}




\def\prod{\rightarrow}




\def\punion{\;|\;}




\def\emptystr{\ifmmode\epsilon\else$\emptystr$\fi}








\def\mspace#1{\ifmmode\;#1\;\else$#1$\fi}








\def\derivop{\displaystyle\mathop{\Rightarrow}}




\def\deriv{{\mspace\derivop}} % extra grouping to solve issue in mmode w/ align




\def\lmderiv{\mspace{\deriv\limits_{lm}}}




\def\derivz{\mspace{\derivop^{\kern 0.25em*}}}




\def\derivp{\mspace{\derivop^{\kern 0.25em+}}}




\def\derivlm{\mspace{\derivop_{lm}}}




\def\derivrm{\mspace{\derivop_{rm}}}




\def\derivlmz{\mspace{\derivop^{\kern 0.25em*}_{lm}}}








\let\eqrefold\eqref




\def\eqref#1{\eqrefold{e:#1}}




\def\gref#1{grammar~\eqref{#1}}




\def\Gref#1{Grammar~\eqref{#1}}




\def\fref#1{Figure~\ref{f:#1}}








\def\prooftext#1 #2\par{
  \goodbreak
  \vspace{1ex plus 0.5ex}
  \noindent
  \llap{#1 }%
  #2\par
}




\def\proof{\prooftext {\bf\small\uppercase{Proof}} }




\def\basis{\prooftext {\sc Basis} }




\def\ind{\prooftext {\sc Induction} }




\def\contra{\prooftext {\sc Contradiction} }




\def\foorp{$\square$\vspace{1ex plus 1ex}}












\begin{abstract}
\input{abstract}
\end{abstract}












\section{Context-Free Grammars}
The focus of this discussion (and of Section 4.2 in CPTT) is on context-free
grammars (or simply ``grammars'').








\section{Convention and Notation}




The following notational conventions are used throughout this paper. In most
cases, they have been borrowed from the text.








For grammars, capital letters are used to represent nonterminals. The $\nt{S}$
symbol is used to denote the starting nonterminal. The symbol $\prod$~is used
to separate the head of a production from its body, whereas
$\deriv$~indicates a single step in a derivation. Leftmost and rightmost
derivations are denoted $\derivlm$ and~$\derivrm$ respectively. $\derivz$ means
``derives in zero or more steps'', whereas $\derivp$ means ``derives in one or
more steps''. The symbol $\punion$~separates multiple productions for a single
nonterminal. Any time punctuation is placed at the end of a grammar or
derivation, it should be read as part of the surrounding paragraph, \emph{not}
as part of the production or derivation. For example, in the grammar
$$
  \nt{S} \prod 0\nts{S}1 \punion \emptystr,
$$
\noindent
the trailing comma is not part of the construction. Furthermore, whitespace is
not significant and may be discarded. \emptystr~is the empty string.








``The text'' refers to CPTT, whereas ``this paper'' refers to the paper you are
currently reading.












\section{Exercise 4.2.3: Grammar Design}
This exercise asks the reader to design grammars for a series of language
descriptions a--f; we will discuss each of them. Although the text does not
request it, proofs will be provided for each, as they are useful to demonstrate
correctness and are excellent practice in discipline.








\exercise 4.2.3a The set of all strings of 0's and 1's such that every 0 is
immediately followed by at least one 1.








The grammar for this exercise is fairly trivial, but will serve as a useful
introduction to the formalities of this paper. First, let us consider a grammar
that demonstrates such a property. Our alphabet is $\Sigma = \set{0,1}$. The
only restriction on the sentences of our grammar is that each 0 must be followed
by a~1; this means that we can have any number of adjacent 1's, but
it is not possible to have adjacent 0's or to end a sentence with a~0.
Considering that our alphabet~$\Sigma$ has only two characters, this grammar is
fairly simple:

\begin{equation}\label{e:z1}
  \nt{S} \prod 1\nt{S} \punion 01\nt{S} \punion \emptystr.
\end{equation}








As an example, let us consider some of the sentences that we may wish to be
derived by this grammar. In particular, consider the derivation of the string
$01011$:

\begin{equation}
  \nt{S} \deriv 01\;\nt{S}
         \deriv 01\;01\;\nt{S}
         \deriv 01\;01\;1\nt{S}
         \deriv 01\;01\;1\;\emptystr
         \derivz 01\;01\;1.
\end{equation}








Notice also that a string of 1's, such as $1111$, is also derivable given our
grammar:

\begin{equation}\label{e:z11s}
  \nt{S} \deriv 1\;\nt{S}
         \deriv 1\;1\;\nt{S}
         \deriv 1\;1\;1\;\nt{S}
         \deriv 1\;1\;1\;1\;\nt{S}
         \deriv 1\;1\;1\;1\;\emptystr
         \derivz 1\;1\;1\;1,
\end{equation}

\noindent
as is the empty string $\emptystr$ in one step:

\begin{equation}
  \nt{S} \deriv \emptystr.
\end{equation}








To prove that \gref{z1} is correct, we must prove two independent
statements:

\begin{enumerate}
  \item The \emph{only} strings derivable from \gref{z1} are those of 0's and
    1's such that every 0 is immediately followed by at least one 1;

  \item The grammar derives all such strings.
\end{enumerate}








We will prove these statements in order. For the first statement, we must
show that, after any number of steps~$n$ in a derivation from \gref{z1}, the
only derivable strings contain a 1 after each and every 0 (or contain no 0's at
all). For the second statement, we must show that any string containing 0's and
1's such that every 0 is followed by at least one 1 is derivable from our
grammar. Grammar proofs are discussed in Section 4.2.6 of the text.








\proof The only strings derivable from~$\nt{S}$ are those of 0's and~1's such
that every 0~is immediately followed by at least one~1. We shall perform this
proof inductively on the number of steps~$n$ in a given derivation.








\basis The basis is $n=1$. In one step, our grammar may produce one of three
sentential forms: a string beginning with a~1 (the first production
of~$\nt{S}$), a string beginning with a~0 followed by a~1 (the second
production of~$\nt{S}$), and the empty string~\emptystr\ (the final production
of~$\nt{S}$).

The empty string~\emptystr\ has no~0's and so follows the rules of the language.
The same is true of the terminal~1 introduced by the first production. The
third and final string of terminals that can be introduced when~$n=1$ is~01.
This string does contain a~0, but that~0 is immediately followed by a~1, and
it therefore also satisfies our requirement.








\ind We shall now assume that all derivations of fewer than $n$~steps result in
either a sentence containing no~0's or a sentence in which every~0 is
immediately followed by one or more~1's. A derivation of $n>1$~steps must have
the form

\begin{equation}\label{e:z1ind}
  \nt{S} \deriv x\nt{S} \derivz xy.
\end{equation}

\noindent
The first step fixes $x$~as either 1 or~01, so $x$~contains a~0 only if it is
followed by a~1. Since $y$~is derived from~$\nt{S}$ in fewer than $n$~steps
then, by our inductive hypothesis, the same is true of~$y$.








Additionally, according to \gref{z1}, $y$~must be derived using the productions

\begin{align*}
  \nt{S} &\prod 1\nt{S} \\
  \nt{S} &\prod 01\nt{S} \\
  \nt{S} &\prod \emptystr.
\end{align*}
\noindent
Each of these productions has already been discussed in our basis; therefore,
$y$~cannot contain a~0 followed by another~0. Additionally, it is required that
adjacent~1's be permitted after a~0, which is possible by the first production
(as demonstrated in~\eqref{z11s}). As such, every~0 in~$xy$ is immediately
followed by one or more~1's and our hypothesis has been proved. \foorp








To ensure a thorough understanding of the above proof, it is worth mentioning
why \eqref{z1ind}~used both the \deriv\ and~\derivz\ derivation symbols. Our
basis applies when $n=1$; the inductive hypothesis applies otherwise (when
$n>1$). As such, we must have \emph{at least} one step in~\eqref{z1ind}: the
first step is written with~\deriv, and the remaining $n-1$~steps are covered
by~\derivz.
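
As a concrete instance of this form (a sketch only, with the string 011 chosen
here purely for illustration), take $n=3$, $x=01$ and $y=1$. The first step
introduces~$x$, and the remaining two steps derive~$y$ from~$\nt{S}$:

\begin{align*}
  \nt{S} &\deriv 01\;\nt{S}  && \text{($x=01$, second production)} \\
         &\deriv 01\;1\nt{S} && \text{(first production)} \\
         &\deriv 01\;1       && \text{(third production, so $y=1$).}
\end{align*}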








Now that we have proved that we may only derive sentences from \gref{z1} in
which every~0 is followed by one or more~1's, we must show that the grammar may
be used to derive all such possible strings.








\proof Any string~$s$ of length~$l$ consisting of~1's and~0's such that any~0 is
followed by at least one~1 is derivable from~$\nt{S}$.








\basis A string of length~$0$ ($l=0$) must be~\emptystr, which is derivable
from~$\nt{S}$ in one step.








\ind Assume that any such string of a length less than $l$ is derivable
from~\nt{S}. A string~$s$ of length~$l$ must have the form~$xy,
y\in\set{1,01,\emptystr}$; that is, we can consider $s$ to be the concatenation
of $y$~with a previously derived string. Since the length of $x$~is less
than~$l$ whenever $y$~is not~\emptystr, it must be derivable from~\nt{S} by our
inductive hypothesis. Furthermore, $xy$~must have a derivation of the form

\begin{equation}\label{e:z1deriv1}
  \nt{S} \derivz x\;\nt{S} \derivp x\;y,
\end{equation}
\noindent
thereby proving that $s$~is derivable from~\nt{S}. \foorp








The derivation~\eqref{z1deriv1} may seem to be too abstract to be useful;
since this is our first proof, it is worth clarifying why it does in fact
complete the proof. We first showed that any string of the language of 0's and
1's that we have been studying can be described as the concatenation of a
smaller such string with 1, 01 or~\emptystr\ (which completes the string). This
string, as we stated, has the form~$xy$. Therefore, we must show that
\nt{S}~supports concatenation; \eqref{z1deriv1} demonstrates this with~$x$
fairly abstractly, since it does not matter what exactly $x$~is. From the
productions of~\nt{S} in \gref{z1}, it is understood that $x$ can be any string
of terminals (that is, any derivation) leading up to that point in the
derivation~\eqref{z1deriv1}.








We must now show that the remaining part of~$xy$ (that is, $y$) is derivable.
The only nonterminal remaining after~$x$ is~\nt{S}. We have defined $y$~to be
any string of terminals in the set $\set{1,01,\emptystr}$. Clearly, each of
these strings is derivable from~\nt{S}. Therefore, we can replace~\nt{S}
in~\eqref{z1deriv1} with~$y$, indicating that this is a valid derivation given
our definition of~$y$; it is up to the reader of the proof to make this
connection. Note that, while the domain of $y$~happens to be every production
of~\nt{S}, this is not necessary for the proof; that is the subject of the
first proof.
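
To see the decomposition at work (a sketch only; the string 1011 is chosen here
purely for illustration), let $x=101$ and $y=1$. The prefix~$x$ is built up
first, leaving~\nt{S} at the right, and that remaining~\nt{S} then
derives~$y$:

\begin{align*}
  \nt{S} &\deriv 1\nt{S} \deriv 1\;01\nt{S}
    && \text{(the prefix $x=101$ followed by $\nt{S}$)} \\
         &\deriv 1\;01\;1\nt{S} \deriv 1\;01\;1
    && \text{(the trailing $\nt{S}$ derives $y=1$).}
\end{align*}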








Before we put this exercise to rest (indeed, we completed the exercise
requirement in the first paragraph following the exercise definition), it is
also worth noting that the language of this grammar may also be accepted by a
finite automaton (and consequently described by a regular expression); this is
demonstrated by \fref{z1regex}. It should be noted that this is not the case
with all of the exercises that follow.
\exend








\begin{figure}
  \center
  \begin{tikzpicture}
    \node[state,initial]   (a)              {$a$};
    \node[state]           (b) [right=of a] {$b$};
    \node[state,accepting] (c) [right=of b] {$c$};

    \path[->]
      (a) edge [loop below]        node {1}         ()
          edge [bend right, below] node {\emptystr} (c)
          edge [above]             node {$0$}       (b)
      (b) edge [above]             node {$1$}       (c)
      (c) edge [bend right, above] node {\emptystr} (a)
    ;
  \end{tikzpicture}

  \caption{An NFA corresponding to the extended regular expression
    $\left(0^?1^+\right)^*$ describing \gref{z1}.}
  \label{f:z1regex}
\end{figure}








The above example was fairly simple, yet resulted in a relatively lengthy
discourse far past what was required by the text; the reader can expect such a
discussion to continue for all examples that follow.












\exercise 4.2.3b The set of all strings of 0's and 1's that are
palindromes; that is, the string reads the same backward as forward.








As the exercise stated, a {\sl palindrome} is a string that reads the same in
both directions; let us consider some examples before attempting to construct a
grammar. The following strings are all palindromes, one per
line:\footnote{An example of an English palindrome is ``Mr.~Owl ate my metal
worm'' (discarding punctuation and capitalization).}








\begin{equation}\label{e:palex}
  \begin{tabular}{rcl}
       1 & 00 & 1    \\
    1100 & 11 & 0011 \\
     010 &  1 & 010  \\
         &  0 &
  \end{tabular}
\end{equation}








The above palindromes have been laid out so that their symmetry is apparent. At
first glance, one can imagine constructing a palindrome out of pairs of
characters, like the second row of~\eqref{palex}:








\begin{equation}\label{e:palex2}
  \begin{tabular}{rcl}
         & 11 &      \\
       1 & 11 & 1    \\
      11 & 00 & 11   \\
     110 & 00 & 011  \\
    1100 & 11 & 0011
  \end{tabular}
\end{equation}








\noindent
In this case, each palindrome would always have an even number of characters.
However, it is important to note the bottom two palindromes of \eqref{palex},
which have an \emph{odd} number of characters:








\begin{equation}\label{e:palex3}
  \begin{tabular}{rcl}
        & 00 &     \\
      0 & 11 & 0   \\
     01 & 00 & 10  \\
    010 &  1 & 010
  \end{tabular}
\end{equation}








Given this evaluation and the understanding that $2n$~is always even for any
nonnegative integer~$n$, it would be accurate to recursively construct a palindrome
from the edges inward in pairs. Once we reach the center, we may end
with~\emptystr\ if we wish to have an even ($2n$) number of characters, or
otherwise may add a single character to create a palindrome containing an odd
($2n+1$) number of characters.








\begin{equation}\label{e:palindrome}
  \begin{aligned}
    \nt{S} &\prod 0\nts{S}0 \punion 1\nts{S}1 \punion \nt{M} \\
    \nt{M} &\prod 0 \punion 1 \punion \emptystr
  \end{aligned}
\end{equation}








In \gref{palindrome} above, we define our start nonterminal~\nt{S} with
productions for the outer pairs. The nonterminal~\nt{M} represents the
acceptable inner (``middle'') characters, which determine whether the length of the
palindrome is even (if \emptystr~is used) or odd (0 or~1). We will leave
demonstrations of such derivations to the proof.








To prove that grammar~\nt{S} is the proper grammar for all palindromes, we must
again prove two things: that the language $L(\nt{S})$ contains only palindromes
of~0's and~1's, and that all such palindromes can be derived from~\nt{S}. The
difference between these two descriptions may be subtle for such a simple
grammar, but the distinction is important to ensure that $L(\nt{S})$ represents
\emph{nothing more and nothing less} than the language of such
palindromes.








As before, the proofs will be inductive: the first proof on the number of
steps~$n$ of a derivation of~\nt{S} and the second on the length~$l$ of the
palindrome~$s$. Our alphabet~$\Sigma$ is once again~$\set{0,1}$.








\proof The only strings derivable from grammar~\nt{S} are palindromes consisting
of 0's and~1's.








\basis The basis is $n=2$, which is the fewest number of steps in which a
string of terminals may be derived from~\nt{S}.\footnote{$n=1$ steps cannot
result in a string consisting only of terminals, as it would result in
$0S0$,~$1S1$ or~$M$.} Such a derivation must be of the form
$$
  \nt{S} \deriv \nt{M} \deriv x,
$$
\noindent
where $x$~is 0,~1, or~\emptystr. In the last case, the derived string is
clearly a palindrome of length zero. In the case of 0 or~1, the length of the
string is one, which must be a palindrome.








\ind Now assume that every string derived in fewer than $n$~steps is a
palindrome. A derivation of $n>2$~steps must be of the form
$$
  \nt{S} \deriv x\nts{S}x \derivz x\;y\;x.
$$
\noindent
That is, the character~$x\in\Sigma$ appears on both the left and right of~$y$.
Since the derivation of~$y$ from~\nt{S} takes fewer than $n$~steps
(specifically, $n-1$ steps), $y$~must be a palindrome by our inductive
hypothesis. Because $x$~is added to both the beginning and end of~$y$, then any
string derived in $n$~steps must be a palindrome. \foorp








Let us further demonstrate the above proof by deriving~\eqref{palex2}
from~\nt{S}:\footnote{The dots were added so as not to confuse the reader as to
what was going on; the symbol~\derivp\ is sufficient and therefore the dots will
be omitted in the future.}








\begin{equation}
  \nt{S}
    \deriv 1\nts{S}1
    \deriv 1\;1\nts{S}1\;1
    \deriv \cdots
    \derivp 1\;1\;0\;0\;1\;\emptystr\;1\;0\;0\;1\;1
\end{equation}








\noindent
and additionally \eqref{palex3}:

\begin{equation}
  \nt{S}
    \deriv 0\nts{S}0
    \deriv 0\;1\nts{S}1\;0
    \deriv 0\;1\;0\nts{S}0\;1\;0
    \deriv 0\;1\;0\nts{M}0\;1\;0
    \deriv 0\;1\;0\;1\;0\;1\;0.
\end{equation}








\noindent
The induction step works by recognizing the basis as the middle of the string
(nonterminal~\nt{M} in \gref{palindrome}): \emptystr~for palindromes of an
even length and the $\left\lceil l/2 \right\rceil^{th}$ character for those of
an odd length~$l$ (1 in the case of the latter derivation). Call this
string~$b$. We know that $b$~is a palindrome, as explained in the proof above.
For our inductive step, we recognize that, for each step~$n$, we add two
characters (one to the beginning and one to the end) to the result of
step~$n-1$. As such, since the derivation of~$n-1$ steps must be a palindrome,
the derivation in~$n$ steps must also be; it is not possible to derive anything
but a palindrome from~\nt{M} and \nt{S}~maintains this designation.








For completeness, we must now show that all possible palindromes of the
alphabet~$\Sigma$ can be derived from~\nt{S}.








\proof Every palindrome consisting of~0's and~1's is derivable from~\nt{S}.








\basis If the string~$s$ is of length~$l\leq1$, then it must be \emptystr,~0 or~1,
all of which are palindromes derivable from~\nt{S} by way of~\nt{M}.








\ind Observe that any palindrome of length~$l>1$ must contain the same
character at positions~$1$ and~$l$.\footnote{1-indexed for notational
convenience.} Assume that each palindrome with a length less than~$l$ is
derivable from~\nt{S}. Since $s$~is a palindrome, then it must have the form
$xyx, x\in\Sigma$, where $y$~is also a palindrome. Since $y$~has a length
$l-2<l$, then it must be derivable from~\nt{S} by the inductive hypothesis. The
palindrome~$s$ must therefore have a derivation of the form
$$
  \nt{S} \deriv x\nts{S}x \derivz x\;y\;x,
$$
\noindent
which thus proves that~$s$ is derivable from~\nt{S}. \foorp
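
To make the decomposition concrete (a sketch only, with the palindrome 0110
chosen here purely for illustration), let $x=0$ and $y=11$. The inner
palindrome~$y$ has length~2 and is itself derivable from~\nt{S}, so the whole
string is derived as

\begin{align*}
  \nt{S} &\deriv 0\nts{S}0       && \text{(outer pair, $x=0$)} \\
         &\deriv 0\;1\nts{S}1\;0 && \text{(first step in deriving $y=11$)} \\
         &\deriv 0\;1\nts{M}1\;0 \\
         &\deriv 0\;1\;1\;0      && \text{($\nt{M} \prod \emptystr$).}
\end{align*}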








It is also worth noting that, unlike the first exercise, we cannot recognize
palindromes with a finite automaton (and therefore cannot describe them with a
regular expression). Let us prove this assertion.








\proof \nt{S}~cannot be represented by any finite automaton. Specifically, a
finite automaton representing~\nt{S} may accept all strings that are
palindromes of the alphabet~$\Sigma$, but such an automaton must also accept
strings that are not palindromes. We shall prove this statement by
contradiction.








\contra Given the alphabet~$\Sigma$, a palindrome may contain any character
from~$\Sigma$ at any arbitrary position~$n$ and may be of length~$l\geq0$. As
such, we must be able to represent this automaton by the regular expression
$\left(0|1\right)^*$, whose corresponding minimum-state DFA is shown in
\fref{pala}. However, it is also necessary that characters $c_n$
and~$c_{l-n+1}$ be the same symbol in~$\Sigma$, a requirement that the
minimum-state DFA of \fref{pala} cannot guarantee.








\begin{figure}
  \center
  \begin{tikzpicture}
    \node[state,initial,accepting] (a) {$a$};

    \path[->]
      (a) edge [loop below] node {1} ()
          edge [loop above] node {0} ()
    ;
  \end{tikzpicture}

  \caption{The minimum-state DFA for the regular expression
    $\left(0|1\right)^*$.}
  \label{f:pala}
\end{figure}








Consider that the only way for a finite automaton to maintain a history of states
is to have a state to represent each unique history. However, to accept a string
of any length, we would need an automaton containing a potentially infinite
number of states, which is not finite (and therefore not a finite automaton).
Therefore, it is not possible to represent the history of every possible
palindrome using a finite set of states.








Given this, it must stand that a finite automaton must at some point contain a
state that transitions to a previous or current state, such as the NFA in
\fref{pala2}. Since the history of the string is ``stored'' purely in the
possible states leading up to the current state, this transition~$t$ equates to a
loss of ``memory'', without which the right-hand portion of the palindrome cannot
be properly matched. Furthermore, since each position~$n$ may contain any
character in~$\Sigma$, and since the transition~$t$ can only yield a set of
future states with a limited (finite) precision, each of these future states
must be redundant. Since each NFA can be represented by an equivalent DFA and
each DFA for some grammar has a single common minimum-state DFA, any portion of
a finite automaton that can accept a palindrome of any length must be equivalent
to \fref{pala} (such as state~$x$ in \fref{pala2}). We are therefore left to
conclude that no finite automaton can accept a palindrome of arbitrary length
without accepting every string that is a combination of each character in
$\Sigma$. \foorp
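
The same conclusion can be reached by a more direct counting argument, sketched
here as a supplement rather than as part of the proof above; the strings~$u_i$
and the suffixes used are chosen purely for illustration. For each $i\geq0$,
let $u_i = 0^i1$. For any $i\neq j$,
$$
  u_i\,0^i = 0^i\,1\,0^i \hbox{ is a palindrome, whereas }
  u_j\,0^i = 0^j\,1\,0^i \hbox{ is not.}
$$
\noindent
A DFA that reached the same state after reading $u_i$ and~$u_j$ would
necessarily treat both strings identically once the suffix~$0^i$ is appended,
accepting both or rejecting both. Every~$u_i$ must therefore lead to a distinct
state, which requires infinitely many states.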








\begin{figure}
  \center
  \begin{tikzpicture}
    \node[state,initial]   (a)              {$1$};
    \node[state]           (b) [right=of a] {$2$};
    \node[state]           (x) [right=of b] {$x$};
    \node[state]           (y) [right=of x] {$n-1$};
    \node[state,accepting] (z) [right=of y] {$n$};

    \path[->]
      (a) edge [above]                node {$\alpha$}              (b)
          edge [below, bend right=45] node {$\kern0.7em\emptystr$} (z)
      (b) edge [above]                node {$\beta$}               (x)
          edge [below, bend right=65] node {$\emptystr$}           (y)
      (x) edge [loop above]           node {$\beta$}               ()
          edge [loop below]           node {$\alpha$}              ()
          edge [above]                node {$\beta$}               (y)
      (y) edge [above]                node {$\alpha$}              (z)
    ;
  \end{tikzpicture}

  \caption{An NFA with a finite set of states must at some point transition to a
    previous or identical state in order to accept input of any length.
    $\Sigma=\set{\alpha,\beta}$.}
  \label{f:pala2}
\end{figure}








To provide further clarification: any finite automaton that transitions to a
\emph{previous} state, since it loses a portion of its history, can no longer
accurately determine the states leading up to the final state. That is, consider
the string 10101 and consider that the first three characters of this string can
be represented by the states $\set{a,b,a}$. At this point, we can no longer be
certain of what the string may end with, because we have lost any sense of
nesting/recursion. Therefore, the states leading to the final state are forced
to accept any character in $\Sigma$ and therefore must be equivalent to the
minimum-state DFA of \fref{pala}. As was mentioned by the text, ``finite
automata cannot count''.








\fref{pala2} gets around such an issue by transitioning only to current or
future states, which permits a \emph{finite} amount of nesting (placing the
aforementioned minimum-state DFA~$x$ in the middle). However, note a glaring
issue: this automaton does not accept~$\beta$ in the first character position.
If it did, then we would need a second set of states in order to maintain such a
history and know that we should also \emph{end} with $\beta$~instead
of~$\alpha$. The number of states would therefore grow very quickly with the
level of nesting and the size of~$\Sigma$ (such a consideration is left to the
reader).








We have exhaustively proved that \gref{palindrome} is the correct answer for
this exercise. \exend












\exercise 4.2.3c The set of all strings of 0's and 1's with an equal number of
0's and 1's.








To understand how to approach this problem, we shall consider a number of
strings that belong to this language. An obvious case is~\emptystr,
which contains zero~0's and zero~1's. Some additional examples are shown in
\fref{eqex} along with their lengths (denoted by~$l$).








\begin{figure}[h]
  \center
  \begin{tabular}{rcccccc}
    $s$ & \emptystr & 10 & 01 & 1010 & 1001 & 011100 \\
    \hline
    $l$ & 0 & 2 & 2 & 4 & 4 & 6
  \end{tabular}

  \caption{Examples of strings with an equal number of 0's and 1's.}
  \label{f:eqex}
\end{figure}








These examples demonstrate a number of important properties. In particular, the
length~$l$ of the string~$s$ is always even, with the number of 0's and the
number of~1's each equal to $n=l/2$. Additionally, the characters of the
alphabet~$\Sigma$ may appear in any order in the string. Therefore, we do not
have the luxury of a simple, nested, recursive implementation as we did with the
palindrome exercise (at least not exclusively).








Let us construct the grammar iteratively, beginning with the simplest case
of~\emptystr.

\begin{equation}\label{e:eq1}
  \nt{S} \prod \emptystr
\end{equation}








\noindent
The second case, 10, is also fairly easy to fit into~$\nt{S}$:

\begin{equation}\label{e:eq2}
  \nt{S} \prod 10 \punion \emptystr
\end{equation}








The third case demonstrates an important point regarding our strings: They may
begin with either a~0 or a~1 and they may also \emph{end} with either character
(more generally, they may begin or end with any character in~$\Sigma$). However,
we cannot simply adjust our grammar to accept either character in both
positions; $\nt{S}$ must ensure that, any time we include a~0 in a production,
we also include a~1 (and vice versa). So far, this is guaranteed by~$\nt{S}$ in
\gref{eq2}; to keep on this path, we must add 01 as yet another special case.

\begin{equation}\label{e:eq3}
  \nt{S} \prod 01 \punion 10 \punion \emptystr
\end{equation}








\goodbreak




The fourth case, 1010, introduces the need to handle strings of an arbitrary
length. To do this, we must determine at what point we should recurse
on~$\nt{S}$. Looking at the example, we could derive 1010 as two nested
applications of~$\nt{S}$ if we recurse between the two terminals.

\begin{equation}\label{e:eqa}
  \nt{S}
    \deriv 1\nts{S}0
    \deriv 1\;0\nts{S}1\;0
    \deriv 1\;0\;\emptystr\;1\;0
    \derivz 1\;01\;0
\end{equation}








\noindent
Of course, one could also adopt an alternate perspective by considering the
string to be the production of two adjacent nonterminals.

\begin{equation}\label{e:eqb}
  \nt{S} \deriv \nt{S}\;\nt{S}
         \derivlm 10\;\nt{S}
         \derivlm 10\;10
\end{equation}








\noindent
Unfortunately, with this information alone, we cannot be certain which of these
productions (if such a choice even matters) should be used in our grammar.
Perhaps we can gain further insight from the remaining examples.








The next example, 1001, can be derived in a manner similar to \eqref{eqb},
but not \eqref{eqa}; in particular, \gref{eq3} has no production for the
string 00, and so we cannot construct the string from the outside in. Given
that, we can be certain that an adjacent nonterminal production is needed and
so we will add the production used in \eqref{eqb} to our grammar.

\begin{equation}\label{e:eq4}
  \nt{S} \prod 01 \punion 10 \punion \nt{S}\;\nt{S} \punion \emptystr
\end{equation}








However, the aforementioned predicament (the absence of a production that can
yield only 00) raises the question of whether or not we can truly derive any
string of equal 1's and 0's from the above grammar. Our final example challenges
this. 011100 cannot possibly be represented by~$\nt{S}$ in \gref{eq4} because
this grammar constructs the string from left-to-right (or right-to-left) in
pairs of~0's and~1's. Therefore, the only way to have adjacent~1's or
adjacent~0's is to alternate the productions, which makes it impossible to have
more than two adjacent identical characters.








Given this, it seems that both \eqref{eqb} \emph{and} \eqref{eqa} are
necessary; the following derivation demonstrates this fact (neither can
individually be used to derive the string 011100).

\begin{equation}
  \nt{S} \deriv \nt{S}\;\nt{S}
         \derivlm 01\;\nt{S}
         \derivlm 01\;1\nts{S}0
         \derivlm 01\;1\;1\nts{S}0\;0
         \derivlm 01\;1\;1\;\emptystr\;0\;0
         \derivlmz 01\;1\;10\;0
\end{equation}








\noindent
We thus arrive at \gref{eq5} below.

\begin{equation}\label{e:eq5}
  \nt{S} \prod 0\nts{S}1
         \punion 1\nts{S}0
         \punion \nt{S}\;\nt{S}
         \punion \emptystr
\end{equation}
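
As a quick check of these productions (a sketch only; the string 1001 is taken
from \fref{eqex}), the example that could not be constructed from the outside
in is derivable by combining the adjacent and nested productions:

\begin{align*}
  \nt{S} &\deriv \nt{S}\;\nt{S} \\
         &{\derivlm} 1\nts{S}0\;\nt{S} \\
         &{\derivlm} 1\;0\;\nt{S}      && \text{($\nt{S} \prod \emptystr$)} \\
         &{\derivlm} 1\;0\;0\nts{S}1 \\
         &{\derivlm} 1\;0\;0\;1        && \text{($\nt{S} \prod \emptystr$).}
\end{align*}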








An astute reader may at this point notice that we have created an ambiguity in
our grammar: Recall~\eqref{eqa} and~\eqref{eqb}, which had two possible
derivations for the same string; both of these derivations are now possible in
our grammar. The text defines an ambiguous grammar to be a grammar that produces
more than one leftmost or more than one rightmost derivation for the same
sentence. This is a particularly interesting example of ambiguity, in particular
because we cannot resolve it without changing the productions. Let us consider
why.








\proof Grammar~$\nt{S}$, with its productions as given, cannot be disambiguated.
We will prove this fact by contradiction.








\contra Firstly, recognize that~$\nt{S}$ is ambiguous because there exists some
sentence~$s$ that has both of the following derivations in $n>1$ steps, where
$a\ne b$:

\begin{align*}
  \nt{S} &\deriv a\nts{S}b \derivp a\;x\;b;
  \\
  \nt{S} &\deriv \nt{S}\;\nt{S}
          \deriv a\nts{S}b\;\nt{S}
          \derivp a\;b\;\nt{S}
          \derivp a\;b\;x.
\end{align*}








Suppose to the contrary that there is some way to disambiguate~$x$. There must
then be some terminal $c\in\Sigma$ in~$x$ that may be used to perform the
disambiguation, and such a disambiguation would imply a difference in the
semantics of~$x$ between the two derivations. However, $x=x$ and so both
derivations hold exactly the same meaning: balanced strings. Furthermore, the
productions for producing balanced strings require each character in~$\Sigma$;
$c$ therefore must not exist. \foorp








Fortunately, this ambiguity is not an issue for our grammar because the multiple




derivations are semantically equivalentwe are not arriving at any different




result within the context of this exercise. The sentence 1010 of \fref{eqex}




demonstrates this concept: It does not matter whether we consider the sentence




to be a single balanced string or the concatenation of two balanced strings; we




arrive at the same result regardless with no harm done.\footnote{Of course, one




valid argument is that a more concise and unambiguous grammar will reduce




problems during parsing. However, the parser (like Lex, as described by the




text) can give precedence to the productions that appear earlier in the grammar




to resolve this issue.}
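
To make the two readings of 1010 concrete, the following derivations in
\gref{eq5} are offered as an illustrative sketch; the choice of production at
each step is ours rather than taken from the text. The first treats the sentence
as a single balanced string, and the second treats it as the concatenation of
the balanced strings 10 and~10. In both, each step rewrites the leftmost
nonterminal:

\begin{align*}
  \nt{S} &\deriv 1\nts{S}0
    \deriv 1\;0\nts{S}1\;0
    \deriv 1\;0\;1\;0;
    \\
  \nt{S} &\deriv \nt{S}\;\nt{S}
    \deriv 1\nts{S}0\;\nt{S}
    \deriv 1\;0\;\nt{S}
    \deriv 1\;0\;1\nts{S}0
    \deriv 1\;0\;1\;0.
\end{align*}

\noindent
Both derivations yield the same sentence, and under either reading that
sentence is balanced.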








While the discussion thus far is likely to convince the reader that \gref{eq5}




is correct, we shall conclude with a formal proof of this fact. A proof that




the grammar cannot be represented by any finite automaton shall be omitted, in




particular because the productions of $\nt{S}$ have a structure very similar to




the palindrome \gref{palindrome}.








\proof Only sentences composed of balanced~0's and~1's may be derived




from~$\nt{S}$.








\basis The basis is $n=1$. The only sentence that may be derived in 1 step




is~\emptystr, which is clearly balanced (containing zero~0's and zero~1's).








\ind Assume that any sentence derived in fewer than~$n$ steps is balanced. Now




recognize that any sentence derived in $n>1$ steps must make use of one of the




following productions of $\nt{S}$:








\begin{align*}




\nt{S} &\prod 0\nts{S}1; \\




\nt{S} &\prod 1\nts{S}0; \\




\nt{S} &\prod \nt{S}\;\nt{S}.




\end{align*}








\noindent




Therefore, any sentence other than~\emptystr\ has one of the forms $0x1$,
$1x0$, or~$xy$. In the first two forms, $x$~is derivable from~$\nt{S}$ in fewer
than~$n$ steps and is thus balanced by our inductive hypothesis; the production
adds exactly one~0 and one~1, so the resulting sentence is also balanced. In the
last form, both $x$ and~$y$ are derivable from~\nt{S}
in fewer than~$n$ steps and thus must be balanced, and the concatenation of two
balanced strings is itself balanced. Furthermore, since the




productions of~$\nt{S}$ produce only 0,~1, or~\emptystr, $\nt{S}$~has the




alphabet $\Sigma=\set{0,1}$ and, consequently, may derive no sentence except for




those containing balanced~0's and~1's. \foorp
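
The counting behind the inductive step can be written out explicitly. As a
sketch, let $\#_0(w)$ and $\#_1(w)$ denote the number of 0's and 1's in a
string~$w$ (notation introduced here for illustration; it does not appear in the
text), and suppose that $x$ and~$y$ are balanced, so that $\#_0(x)=\#_1(x)$ and
$\#_0(y)=\#_1(y)$. Then

\begin{align*}
  \#_0(0x1) &= \#_0(x)+1 = \#_1(x)+1 = \#_1(0x1); \\
  \#_0(1x0) &= \#_0(x)+1 = \#_1(x)+1 = \#_1(1x0); \\
  \#_0(xy) &= \#_0(x)+\#_0(y) = \#_1(x)+\#_1(y) = \#_1(xy).
\end{align*}

\noindent
Each production of~$\nt{S}$ that introduces terminals therefore preserves
balance.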








Having proved that only sentences of balanced~0's and~1's are derivable




from~$\nt{S}$, we must now prove that $\nt{S}$~can derive \emph{all} such




strings (that is, all such strings are sentences of $\nt{S}$). Such a proof is




interesting because our grammar is more sophisticated than the previous




examples.








\proof All strings of balanced~0's and~1's are sentences of~$\nt{S}$.








\basis The basis is a string of length $l=0$, which contains zero~0's and




zero~1's. This string must be~\emptystr, which is derivable from~$\nt{S}$.








\ind First, recognize that all balanced strings must have a length $l=2k$; that
is, $l$~is always even (as emphasized in \fref{eqex}) and the string contains
$k$~0's and $k$~1's. Assume that all balanced strings of length less than~$2k$
are derivable




from~$\nt{S}$.








Consider any balanced string~$s$ of length~$2k$. We can consider $s$~to have the




form~$yz$; that is, the concatenation of two balanced strings~$y$ and~$z$, both




of which in turn have the form $axb, a\neq b$ where $x$ itself must be balanced




(since $a\neq b$); alternatively, either $y$ or~$z$ may be~\emptystr, which




therefore implies that the form~$yz$ accepts any balanced string where the first




and last characters are not the same.








We must now show that all such strings can be represented by the form~$yz$.




First, recognize that $y=axb$ may have either the form $0x1$ or $1x0$; the form




$yz$ then permits up to two adjacent identical characters in $\Sigma$; any




additional adjacent identical characters may be derived by $x$. Consider




$x=\emptystr$; then, clearly $axb$ is balanced and can be concatenated to form a




larger balanced string. If $x\neq\emptystr$ but $x_1=b$,\footnote{$x_n$ denotes




the $n^{\text{th}}$ character of~$x$.} then we can instead consider an




alternative interpretation $y'=ax_1$ and $x'=x_2\cdots x_nb$, and then let




$y=y'x'$ (instead of $axb$).








We are then left with the case where $x_1=a$. Such a case allows for an




arbitrarily deep nesting of adjacent identical characters and therefore $axb$




can be represented by the regular expression $a^+b^+$. It is therefore clear




that the form $yz$ is able to describe any string of balanced characters in the




alphabet $\Sigma=\set{0,1}$. Such a form must have the derivation








$$
  \nt{S} \deriv \nt{S}\;\nt{S} \derivlmz xy.
$$








\noindent




Since this is a leftmost derivation, $y$~is either a balanced string or




\emptystr. In the former case, it is obvious that both $x$ and~$y$ are of a




length less than~$2k$ and are therefore derivable from~\nt{S} by our inductive




hypothesis. Otherwise, $y=\emptystr$, the remaining string has length
precisely~$2k$, and we must consider the form $axb$; here $x$~is balanced
(since $a\neq b$), is of a length less than~$2k$, and is therefore derivable
from~\nt{S} by our inductive hypothesis. Furthermore, $axb$ must have




a derivation of the form




$$
  \nt{S} \deriv a\nts{S}b \derivz a\;x\;b,
$$




\noindent




thereby proving that $axb$ is derivable from~\nt{S}. \foorp
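
The case in which the first and last characters of~$s$ are the same deserves
one more step, which can be sketched as follows (the function~$d$ is an
assumption introduced here for illustration and is not part of the text). For a
balanced string $s=s_1\cdots s_{2k}$, let $d(i)$ be the number of~0's minus the
number of~1's among $s_1\cdots s_i$, so that $d(0)=d(2k)=0$ and $d$~changes by
exactly one at each step. If $s_1=s_{2k}=0$, then

$$
  d(1) = 1 \quad\text{and}\quad d(2k-1) = d(2k)-1 = -1,
$$

\noindent
so $d(i)=0$ for some $1<i<2k-1$, and $s$~splits into the two non-empty balanced
strings $y=s_1\cdots s_i$ and $z=s_{i+1}\cdots s_{2k}$; the case
$s_1=s_{2k}=1$ is symmetric. For example, 0110 has $d=1,0,-1,0$ and splits as
$01\cdot10$:

$$
  \nt{S} \deriv \nt{S}\;\nt{S}
    \deriv 0\nts{S}1\;\nt{S}
    \deriv 0\;1\;\nt{S}
    \deriv 0\;1\;1\nts{S}0
    \deriv 0\;1\;1\;0.
$$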








This proof was considerably more involved than our previous ones and is an




excellent segue into proving more sophisticated grammars. Of course, the reader




can surely see the challenges that might arise from attempting to prove much




more complicated grammars. \exend












\section{License}




This work is licensed under the Creative Commons Attribution-ShareAlike 4.0
International License; you are free to use, share, and modify it to suit




your needs, provided that you give proper attribution and license derivative




works under similar terms. For more information, see:








\tt{https://creativecommons.org/licenses/by-sa/4.0/}.








\end{document}




