325 lines
12 KiB
Sed
325 lines
12 KiB
Sed
# Environment access and mutation using regular expressions
|
||
#
|
||
# Copyright (C) 2018 Mike Gerwitz
|
||
#
|
||
# This program is free software: you can redistribute it and/or modify
|
||
# it under the terms of the GNU General Public License as published by
|
||
# the Free Software Foundation, either version 3 of the License, or
|
||
# (at your option) any later version.
|
||
#
|
||
# This program is distributed in the hope that it will be useful,
|
||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||
# GNU General Public License for more details.
|
||
#
|
||
# You should have received a copy of the GNU General Public License
|
||
# along with this program. If not, see <http://www.gnu.org/licenses/>.
|
||
#
|
||
# This script, when applied recursively, implements a system for
|
||
# accessing and manipulating an environment using only a series of regular
|
||
# expressions (in the formal sense). This poses some interesting challenges
|
||
# and builds off of previous example scripts (which are noted below). In
|
||
# particular, since regular expressions are context-free, we cannot use
|
||
# for backreferences, and so much of the logic here is spent performing
|
||
# comparisons (see `cmp.sed'). (Note that sed does support backreferences;
|
||
# we do not use them here.)
|
||
#
|
||
# Consider this example:
|
||
#
|
||
# foo=quick
|
||
# bar=fox
|
||
# baz=lazy
|
||
#
|
||
# foo <- the *foo
|
||
# bar <- brown *bar
|
||
# verb <- jumped
|
||
# subj <- the *baz dog
|
||
# full <- *foo *bar *verb over *subj
|
||
#
|
||
# Recursively evaluating this until completion will yield the output
|
||
#
|
||
# @0
|
||
# foo=the quick
|
||
# bar=brown fox
|
||
# baz=lazy
|
||
# verb=jumped
|
||
# subj=the lazy dog
|
||
# full=the quick brown fox jumped over the lazy dog
|
||
#
|
||
# along with a couple trailing newlines.
|
||
#
|
||
# To accomplish this task, we define a series of states that drive each
|
||
# regular expression. The state is represented at the beginning of the
|
||
# string, prefixed by a `@'. Some states take arguments so that they can
|
||
# act like functions with continuations, allowing them to be re-used in
|
||
# multiple contexts. To transition to a new state, a regular expression
|
||
# replaces the state at the beginning of the string. Iteration is
|
||
# accomplished by creating cycles in this state machine. We can visualize
|
||
# the system roughly as this NFA, where the accepting states are enclosed in
|
||
# parenthesis:
|
||
#
|
||
# ,-----,--------------------------------------------------,
|
||
# v v \
|
||
# @A -> @E -> @PRECMP -> @CMP -> { @*, @_, @=, @! } -> @CLEAN -`
|
||
# \
|
||
# `-> { (@0), (@1) }
|
||
#
|
||
# This use of state prefixes in each regular expression makes everything
|
||
# quite verbose. We could have accomplished the same thing with
|
||
# considerably less verbosity by using sed's block support, but that
|
||
# distracts from the fact that all of this can be done using only
|
||
# context-free regular expressions. So we don't.
|
||
#
|
||
# Another consequence of using state prefixes is that the match prohibits
|
||
# multiple replacements within the same state on the same invocation of
|
||
# sed. For example, @CLEAN's expressions do not use the `/g' flag because
|
||
# doing so would still only match the beginning of the pattern space once;
|
||
# so rather than being able to clean each env line in one shot, we have to
|
||
# clean one line at a time. This could also be mitigated by using sed's
|
||
# block support (but again, we won't). Consequently, each pass does very
|
||
# little---this script must be run many times to produce the final
|
||
# result. However, it does have the nice benefit that you can observe the
|
||
# program processing step-by-step.
|
||
#
|
||
# This is intentionally esoteric; please do not ever use this methodology in
|
||
# a production system---it is simply the wrong thing to do (unless you're
|
||
# working under such heavy constraints that you have no choice, in which
|
||
# case it may be better to choose a different system). I want to say that I
|
||
# had fun making this, but the inability to abstract away some fundamental
|
||
# verbosity gets quite frustrating. (Attempts at further abstraction would
|
||
# have distracted too much from the implementation in this somewhat small
|
||
# example.) I attempted to mitigate some of this frustration for you (the
|
||
# reader) by adding comments below the expressions to loosely document the
|
||
# groups, but as you may know (unless your implementation supports e.g. /x):
|
||
# writing them is often considerably easier than reading them. Especially
|
||
# the BRE flavor.
|
||
#
|
||
# To run:
|
||
#
|
||
# $ ./animate -c env-dyn.sed input-file
|
||
#
|
||
# See also `env-fixed.sed', which presents a much simpler implementation if
|
||
# the environment has a fixed set of variables.
|
||
##
|
||
|
||
|
||
# Read all lines into the pattern space.
|
||
:a; N; $!ba
|
||
|
||
# Exit code 1 means we finished successfully, whereas 2 means we finished in
|
||
# error.
|
||
/^@0/q1; /^@1/q2
|
||
|
||
# If no state is found at the beginning of the string, then this is our
|
||
# first pass: initialize, beginning in state @A (assignment). Always end in
|
||
# a newline to simplify regular expressions for assignments.
|
||
s/^[^@]/@A\n&/
|
||
s/[^\n]$/&\n/
|
||
|
||
|
||
|
||
##
|
||
# State @A - Validate and Normalize Assignment
|
||
##
|
||
|
||
# If there are no assignments remaining, transition @A->@0 to quit
|
||
# successfully the next iteration.
|
||
s/^@A\(.*\n\n\)[ \n]*$/@0\1/
|
||
|
||
# Check that the next assignment is well-formed. If so, transition @A->@E
|
||
# and normalize the form by removing whitespace around the assignment
|
||
# operator.
|
||
s/^@A\(.*\n\n[a-z]\+\) *<- */@E\1<-/
|
||
# ( 1 )
|
||
|
||
# If we're still in State @A, then the assignment must have been
|
||
# syntatically invalid. Transition to state @1 (to exit in error next
|
||
# iteration).
|
||
s/^@A/@1/
|
||
|
||
|
||
|
||
##
|
||
# State @E - Expand Assignment
|
||
#
|
||
# Expands variable references in the next assignment. Note how this uses
|
||
# @PRECMP for both the expansions and for transitioning to the actual
|
||
# assignment to the environment.
|
||
##
|
||
|
||
# If a reference is found, transition @E->@PRECMP with two arguments
|
||
# representing the reference replacement (@*) and unknown reference (@_)
|
||
# states.
|
||
s/^@E\(.*\n\n[^\n]\+\*\([a-z]\+\)\)/@PRECMP \2 @* @_\1/
|
||
# ( 1 ^ ( 2 ) )
|
||
# ref name
|
||
|
||
# If there are no more references (that is, no more `*'s in the assignment
|
||
# value), then transition @E->@PRECMP with two arguments representing the
|
||
# assignment (@=) and append (@!) states.
|
||
s/^@E\(.*\n\n\([a-z]\+\)<-[^\n*]\+\n\)/@PRECMP \2 @= @!\1/
|
||
# ( 1 ( 2 ) ^ )
|
||
# var name
|
||
|
||
|
||
|
||
##
|
||
# State @PRECMP(S) - Prepare for Comparison
|
||
#
|
||
# This state takes a single argument S that is the name of the variable to
|
||
# compare against. This allows it to set up comparisons for multiple uses.
|
||
##
|
||
|
||
# Set up env var line for comparison by prefix it with ``~S'' (where S is
|
||
# the argument to @PRECMP), followed by a duplicate of the env var
|
||
# name. This duplicate is necessary since the name will be modified during
|
||
# @CMP as part of the comparison process.
|
||
s/^\(@PRECMP \([a-z]\+\).*\n\)\([a-z]\+\)=/\1~\2 \3 \3=/
|
||
# ( 1 ( 2 ) ) ( 3 )
|
||
# cmp name env var
|
||
|
||
# Transition @PRECMP->@CMP if all environment variables have a comparison
|
||
# string before them.
|
||
s/^@PRECMP [a-z]\+\([^\n]\+\n\(~[^\n]*\n\)*\n\)/@CMP\1/
|
||
# ( 1 ( 2 ) )
|
||
# env vars
|
||
|
||
|
||
|
||
##
|
||
# State @CMP(C A)
|
||
#
|
||
# The comparison state is responsible for performing the actual comparison
|
||
# between the values set up by @PRECMP. It takes two arguments C and A
|
||
# that represent the states to transition to for the consequent (if a match
|
||
# is successful) or alternative (if no match is found), respectfully. This
|
||
# allows @CMP to be used in multiple contexts, like a function call.
|
||
##
|
||
|
||
# At this point, all values to be compared will be of the form ``~x y''.
|
||
# Compare the first character of x and y, replacing with an exclamation
|
||
# point at the beginning of the line in the case of a non-match.
|
||
#
|
||
# For more information on this strategy, see `cmp.sed'. We limit ourselves
|
||
# here to [a-z].
|
||
s/^\(@CMP.*\n\)~a[^ ]* [^a][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~b[^ ]* [^b][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~c[^ ]* [^c][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~d[^ ]* [^d][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~e[^ ]* [^e][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~f[^ ]* [^f][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~g[^ ]* [^g][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~h[^ ]* [^h][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~i[^ ]* [^i][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~j[^ ]* [^j][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~k[^ ]* [^k][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~l[^ ]* [^l][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~m[^ ]* [^m][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~n[^ ]* [^n][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~o[^ ]* [^o][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~p[^ ]* [^p][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~q[^ ]* [^q][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~r[^ ]* [^r][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~s[^ ]* [^s][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~t[^ ]* [^t][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~u[^ ]* [^u][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~v[^ ]* [^v][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~w[^ ]* [^w][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~x[^ ]* [^x][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~y[^ ]* [^y][^ ]* /\1!/
|
||
s/^\(@CMP.*\n\)~z[^ ]* [^z][^ ]* /\1!/
|
||
|
||
# If we have not made a negative comparison replacement above, then the
|
||
# characters match; discard them to prepare for the next iteration.
|
||
s/^\(@CMP.*\n~\).\([^ ]*\) ./\1\2 /
|
||
|
||
# If we have ``~ '', then that means all characters have been compared and
|
||
# a match has been found. Transition @CMP->C and remove the two trailing
|
||
# spaces. If all are non-matches (all env lines prefixed with `!'),
|
||
# transition @CMP->A. Otherwise, we're not done yet, so do nothing.
|
||
s/^@CMP \(@[^ ]\+\)[^\n]*\(.*\n~\) /\1\2/
|
||
# ( 1 = C ) A ( 2 )
|
||
s/^@CMP [^ ]\+ \(@[^ ]\+\)\(\(\n![^\n]\+\)*\n\n\)/\1\2/
|
||
# C ( 1 = A ) ( ( 3 ) 2 )
|
||
|
||
|
||
|
||
##
|
||
# State @CLEAN(S)
|
||
#
|
||
# This state cleans up after @CMP to restore environments to their original
|
||
# state. It takes a single argument S to transition to after cleanup.
|
||
##
|
||
|
||
# Strip any non-letter prefix and extra comparison words off of env
|
||
# vars. Note that we do not check for the empty line separator---a prefix
|
||
# check is enough since the assignment form must begin with a letter.
|
||
s/^\(@CLEAN.*\n\)[!~]\([a-z]\+ \)*\([a-z]\+=\)/\1\3/
|
||
# ( 1 ) ( 2 ) ( 3 )
|
||
# state + args extra words var name=
|
||
|
||
# If all such prefixes have been stripped, then we're done with
|
||
# cleanup. Transition @CLEAN->S.
|
||
s/^@CLEAN \(@[^ \n]\+\)\(\(\n[a-z][^\n]*\)*\n\n\)/\1\2/
|
||
# ( 1 ) ( ( 3 ) 2 )
|
||
# next state entire environment
|
||
|
||
|
||
|
||
##
|
||
# State @* - Replace Variable Reference
|
||
#
|
||
# Replace a variable reference with the value of that variable from the
|
||
# environment.
|
||
##
|
||
|
||
# Replace the last reference on the assignment line (which happens to be the
|
||
# reference that we matched on previously) with the value of the matched
|
||
# variable from the environment (prefixed by `~'). Then transition
|
||
# @*->@CLEAN with a single argument to return to @E for further expansion.
|
||
s/^@\*\(.*\n~[^=]\+=\([^\n]\+\).*\n\n[^\n]\+\)\*[a-z]\+/@CLEAN @E\1\2/
|
||
# ( 1 ( 2 ) ) ^
|
||
# var value
|
||
|
||
|
||
|
||
##
|
||
# State @_ - Unknown Reference
|
||
#
|
||
# Replace unknown references with a string of the form "[UNKNOWN REF: name]".
|
||
##
|
||
|
||
# If we're in this state, then the last reference on the assignment line is
|
||
# unknown. Replace it, then transition @_->@E for further expansion.
|
||
s/^@_\(.*\n\n[^\n]\+\)\*\([a-z]\+\)/@CLEAN @E\1[UNKNOWN REF: \2]/
|
||
# ( 1 ) ^ ( 2 )
|
||
# ref name
|
||
|
||
|
||
##
|
||
# State @= (Assignment)
|
||
#
|
||
# Replace env var value with the text of the assignment and then transition
|
||
# @=->@CLEAN, providing a single state argument to transition to state @A
|
||
# after cleanup is complete.
|
||
##
|
||
|
||
s/^@=\(.*\n\)~\([^=]*\)=[^\n]*\(.*\n\n\)[^<]*<-\([^\n]*\)\n/@CLEAN @A\1\2=\4\3/
|
||
# ( 1 ) ( 2 ) ( 3 ) ( 4 )
|
||
# var new val
|
||
|
||
|
||
##
|
||
# State @! (Non-Match Append)
|
||
#
|
||
# This state indicates that a match for an environment variable was not
|
||
# found and that it should be appended to the environment.
|
||
##
|
||
|
||
# Reformat the ``var<-val'' form as ``var=val'' and add it to the bottom of
|
||
# the environment. The assignment line is removed. Then, transition
|
||
# @!->@CLEAN with a single argument to transition back to @A after cleanup
|
||
# is complete.
|
||
s/^@!\(.*\)\n\n\([a-z]\+\)<-\([^\n]\+\)\n/@CLEAN @A\1\n\2=\3\n\n/
|
||
# ( 1 ) ( 2 = var) ( 3 = val)
|