# Environment access and mutation using regular expressions # # Copyright (C) 2018 Mike Gerwitz # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . # # This script, when applied recursively, implements a system for # accessing and manipulating an environment using only a series of regular # expressions (in the formal sense). This poses some interesting challenges # and builds off of previous example scripts (which are noted below). In # particular, since regular expressions are context-free, we cannot use # for backreferences, and so much of the logic here is spent performing # comparisons (see `cmp.sed'). (Note that sed does support backreferences; # we do not use them here.) # # Consider this example: # # foo=quick # bar=fox # baz=lazy # # foo <- the *foo # bar <- brown *bar # verb <- jumped # subj <- the *baz dog # full <- *foo *bar *verb over *subj # # Recursively evaluating this until completion will yield the output # # @0 # foo=the quick # bar=brown fox # baz=lazy # verb=jumped # subj=the lazy dog # full=the quick brown fox jumped over the lazy dog # # along with a couple trailing newlines. # # To accomplish this task, we define a series of states that drive each # regular expression. The state is represented at the beginning of the # string, prefixed by a `@'. Some states take arguments so that they can # act like functions with continuations, allowing them to be re-used in # multiple contexts. To transition to a new state, a regular expression # replaces the state at the beginning of the string. Iteration is # accomplished by creating cycles in this state machine. We can visualize # the system roughly as this NFA, where the accepting states are enclosed in # parenthesis: # # ,-----,--------------------------------------------------, # v v \ # @A -> @E -> @PRECMP -> @CMP -> { @*, @_, @=, @! } -> @CLEAN -` # \ # `-> { (@0), (@1) } # # This use of state prefixes in each regular expression makes everything # quite verbose. We could have accomplished the same thing with # considerably less verbosity by using sed's block support, but that # distracts from the fact that all of this can be done using only # context-free regular expressions. So we don't. # # Another consequence of using state prefixes is that the match prohibits # multiple replacements within the same state on the same invocation of # sed. For example, @CLEAN's expressions do not use the `/g' flag because # doing so would still only match the beginning of the pattern space once; # so rather than being able to clean each env line in one shot, we have to # clean one line at a time. This could also be mitigated by using sed's # block support (but again, we won't). Consequently, each pass does very # little---this script must be run many times to produce the final # result. However, it does have the nice benefit that you can observe the # program processing step-by-step. # # This is intentionally esoteric; please do not ever use this methodology in # a production system---it is simply the wrong thing to do (unless you're # working under such heavy constraints that you have no choice, in which # case it may be better to choose a different system). I want to say that I # had fun making this, but the inability to abstract away some fundamental # verbosity gets quite frustrating. (Attempts at further abstraction would # have distracted too much from the implementation in this somewhat small # example.) I attempted to mitigate some of this frustration for you (the # reader) by adding comments below the expressions to loosely document the # groups, but as you may know (unless your implementation supports e.g. /x): # writing them is often considerably easier than reading them. Especially # the BRE flavor. # # To run: # # $ ./animate -c env-dyn.sed input-file # # See also `env-fixed.sed', which presents a much simpler implementation if # the environment has a fixed set of variables. ## # Read all lines into the pattern space. :a; N; $!ba # Exit code 1 means we finished successfully, whereas 2 means we finished in # error. /^@0/q1; /^@1/q2 # If no state is found at the beginning of the string, then this is our # first pass: initialize, beginning in state @A (assignment). Always end in # a newline to simplify regular expressions for assignments. s/^[^@]/@A\n&/ s/[^\n]$/&\n/ ## # State @A - Validate and Normalize Assignment ## # If there are no assignments remaining, transition @A->@0 to quit # successfully the next iteration. s/^@A\(.*\n\n\)[ \n]*$/@0\1/ # Check that the next assignment is well-formed. If so, transition @A->@E # and normalize the form by removing whitespace around the assignment # operator. s/^@A\(.*\n\n[a-z]\+\) *<- */@E\1<-/ # ( 1 ) # If we're still in State @A, then the assignment must have been # syntatically invalid. Transition to state @1 (to exit in error next # iteration). s/^@A/@1/ ## # State @E - Expand Assignment # # Expands variable references in the next assignment. Note how this uses # @PRECMP for both the expansions and for transitioning to the actual # assignment to the environment. ## # If a reference is found, transition @E->@PRECMP with two arguments # representing the reference replacement (@*) and unknown reference (@_) # states. s/^@E\(.*\n\n[^\n]\+\*\([a-z]\+\)\)/@PRECMP \2 @* @_\1/ # ( 1 ^ ( 2 ) ) # ref name # If there are no more references (that is, no more `*'s in the assignment # value), then transition @E->@PRECMP with two arguments representing the # assignment (@=) and append (@!) states. s/^@E\(.*\n\n\([a-z]\+\)<-[^\n*]\+\n\)/@PRECMP \2 @= @!\1/ # ( 1 ( 2 ) ^ ) # var name ## # State @PRECMP(S) - Prepare for Comparison # # This state takes a single argument S that is the name of the variable to # compare against. This allows it to set up comparisons for multiple uses. ## # Set up env var line for comparison by prefix it with ``~S'' (where S is # the argument to @PRECMP), followed by a duplicate of the env var # name. This duplicate is necessary since the name will be modified during # @CMP as part of the comparison process. s/^\(@PRECMP \([a-z]\+\).*\n\)\([a-z]\+\)=/\1~\2 \3 \3=/ # ( 1 ( 2 ) ) ( 3 ) # cmp name env var # Transition @PRECMP->@CMP if all environment variables have a comparison # string before them. s/^@PRECMP [a-z]\+\([^\n]\+\n\(~[^\n]*\n\)*\n\)/@CMP\1/ # ( 1 ( 2 ) ) # env vars ## # State @CMP(C A) # # The comparison state is responsible for performing the actual comparison # between the values set up by @PRECMP. It takes two arguments C and A # that represent the states to transition to for the consequent (if a match # is successful) or alternative (if no match is found), respectfully. This # allows @CMP to be used in multiple contexts, like a function call. ## # At this point, all values to be compared will be of the form ``~x y''. # Compare the first character of x and y, replacing with an exclamation # point at the beginning of the line in the case of a non-match. # # For more information on this strategy, see `cmp.sed'. We limit ourselves # here to [a-z]. s/^\(@CMP.*\n\)~a[^ ]* [^a][^ ]* /\1!/ s/^\(@CMP.*\n\)~b[^ ]* [^b][^ ]* /\1!/ s/^\(@CMP.*\n\)~c[^ ]* [^c][^ ]* /\1!/ s/^\(@CMP.*\n\)~d[^ ]* [^d][^ ]* /\1!/ s/^\(@CMP.*\n\)~e[^ ]* [^e][^ ]* /\1!/ s/^\(@CMP.*\n\)~f[^ ]* [^f][^ ]* /\1!/ s/^\(@CMP.*\n\)~g[^ ]* [^g][^ ]* /\1!/ s/^\(@CMP.*\n\)~h[^ ]* [^h][^ ]* /\1!/ s/^\(@CMP.*\n\)~i[^ ]* [^i][^ ]* /\1!/ s/^\(@CMP.*\n\)~j[^ ]* [^j][^ ]* /\1!/ s/^\(@CMP.*\n\)~k[^ ]* [^k][^ ]* /\1!/ s/^\(@CMP.*\n\)~l[^ ]* [^l][^ ]* /\1!/ s/^\(@CMP.*\n\)~m[^ ]* [^m][^ ]* /\1!/ s/^\(@CMP.*\n\)~n[^ ]* [^n][^ ]* /\1!/ s/^\(@CMP.*\n\)~o[^ ]* [^o][^ ]* /\1!/ s/^\(@CMP.*\n\)~p[^ ]* [^p][^ ]* /\1!/ s/^\(@CMP.*\n\)~q[^ ]* [^q][^ ]* /\1!/ s/^\(@CMP.*\n\)~r[^ ]* [^r][^ ]* /\1!/ s/^\(@CMP.*\n\)~s[^ ]* [^s][^ ]* /\1!/ s/^\(@CMP.*\n\)~t[^ ]* [^t][^ ]* /\1!/ s/^\(@CMP.*\n\)~u[^ ]* [^u][^ ]* /\1!/ s/^\(@CMP.*\n\)~v[^ ]* [^v][^ ]* /\1!/ s/^\(@CMP.*\n\)~w[^ ]* [^w][^ ]* /\1!/ s/^\(@CMP.*\n\)~x[^ ]* [^x][^ ]* /\1!/ s/^\(@CMP.*\n\)~y[^ ]* [^y][^ ]* /\1!/ s/^\(@CMP.*\n\)~z[^ ]* [^z][^ ]* /\1!/ # If we have not made a negative comparison replacement above, then the # characters match; discard them to prepare for the next iteration. s/^\(@CMP.*\n~\).\([^ ]*\) ./\1\2 / # If we have ``~ '', then that means all characters have been compared and # a match has been found. Transition @CMP->C and remove the two trailing # spaces. If all are non-matches (all env lines prefixed with `!'), # transition @CMP->A. Otherwise, we're not done yet, so do nothing. s/^@CMP \(@[^ ]\+\)[^\n]*\(.*\n~\) /\1\2/ # ( 1 = C ) A ( 2 ) s/^@CMP [^ ]\+ \(@[^ ]\+\)\(\(\n![^\n]\+\)*\n\n\)/\1\2/ # C ( 1 = A ) ( ( 3 ) 2 ) ## # State @CLEAN(S) # # This state cleans up after @CMP to restore environments to their original # state. It takes a single argument S to transition to after cleanup. ## # Strip any non-letter prefix and extra comparison words off of env # vars. Note that we do not check for the empty line separator---a prefix # check is enough since the assignment form must begin with a letter. s/^\(@CLEAN.*\n\)[!~]\([a-z]\+ \)*\([a-z]\+=\)/\1\3/ # ( 1 ) ( 2 ) ( 3 ) # state + args extra words var name= # If all such prefixes have been stripped, then we're done with # cleanup. Transition @CLEAN->S. s/^@CLEAN \(@[^ \n]\+\)\(\(\n[a-z][^\n]*\)*\n\n\)/\1\2/ # ( 1 ) ( ( 3 ) 2 ) # next state entire environment ## # State @* - Replace Variable Reference # # Replace a variable reference with the value of that variable from the # environment. ## # Replace the last reference on the assignment line (which happens to be the # reference that we matched on previously) with the value of the matched # variable from the environment (prefixed by `~'). Then transition # @*->@CLEAN with a single argument to return to @E for further expansion. s/^@\*\(.*\n~[^=]\+=\([^\n]\+\).*\n\n[^\n]\+\)\*[a-z]\+/@CLEAN @E\1\2/ # ( 1 ( 2 ) ) ^ # var value ## # State @_ - Unknown Reference # # Replace unknown references with a string of the form "[UNKNOWN REF: name]". ## # If we're in this state, then the last reference on the assignment line is # unknown. Replace it, then transition @_->@E for further expansion. s/^@_\(.*\n\n[^\n]\+\)\*\([a-z]\+\)/@CLEAN @E\1[UNKNOWN REF: \2]/ # ( 1 ) ^ ( 2 ) # ref name ## # State @= (Assignment) # # Replace env var value with the text of the assignment and then transition # @=->@CLEAN, providing a single state argument to transition to state @A # after cleanup is complete. ## s/^@=\(.*\n\)~\([^=]*\)=[^\n]*\(.*\n\n\)[^<]*<-\([^\n]*\)\n/@CLEAN @A\1\2=\4\3/ # ( 1 ) ( 2 ) ( 3 ) ( 4 ) # var new val ## # State @! (Non-Match Append) # # This state indicates that a match for an environment variable was not # found and that it should be appended to the environment. ## # Reformat the ``var<-val'' form as ``var=val'' and add it to the bottom of # the environment. The assignment line is removed. Then, transition # @!->@CLEAN with a single argument to transition back to @A after cleanup # is complete. s/^@!\(.*\)\n\n\([a-z]\+\)<-\([^\n]\+\)\n/@CLEAN @A\1\n\2=\3\n\n/ # ( 1 ) ( 2 = var) ( 3 = val)