tamer: asg::air: Begin to introduce explicit scope testing

There's a lot of documentation on this in the commit itself, but this stems from

  a) frustration with trying to understand how the system needs to operate with all of the objects involved; and
  b) recognizing that if I'm having difficulty, then others reading the system later on (including myself) and possibly looking to improve upon it are going to have a whole lot of trouble.

Identifier scope is something I've been mulling over for years, and more formally for the past couple of months. This finally begins to formalize that, out of frustration with package imports. But it will be a weight lifted off of me as well, with issues of scope always looming.

This demonstrates a declarative means of testing for scope by scanning the entire graph in tests to determine where an identifier has been scoped. Since no such scoping has been implemented yet, the tests demonstrate how they will look, but otherwise just test for current behavior. There is more existing behavior to check, and further there will be _references_ to check, as they'll also leave a trail of scope indexing behind as part of the resolution process.

See the documentation introduced by this commit for more information on that part of this commit.

Introducing the graph scanning, with the ASG's static assurances, required more lowering of dynamic types into the static types required by the API. This was itself a confusing challenge that, while not all that bad in retrospect, was something that I initially had some trouble with. The documentation includes clarifying remarks that hopefully make it all understandable.

DEV-13162
2023-05-12 12:41:51 -04:00 · 2023-05-12 12:41:51 -04:00 · 9fb2169a06
parent 00ff660008
commit 9fb2169a06
6 changed files with 478 additions and 2 deletions
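
The commit message describes a declarative way of testing scope: scan every environment and report where an identifier has been indexed, then compare that listing against an expected one. The following is a minimal, standalone sketch of that idea only. It does not use TAMER's `Asg` or `assert_scope`; the names `Kind`, `Env`, `scope_of`, and `sample_envs` are hypothetical, and a plain `u32` id stands in for a span.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the kind of container that owns an environment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Root,
    Pkg,
    Tpl,
}

/// Toy environment: its container kind, an id standing in for a span,
/// and the set of names indexed in it.
struct Env {
    kind: Kind,
    id: u32,
    index: HashSet<&'static str>,
}

/// Scan every environment (assumed to be listed in tree pre-order) and
/// report those in which `name` has been indexed.
fn scope_of(envs: &[Env], name: &str) -> Vec<(Kind, u32)> {
    envs.iter()
        .filter(|env| env.index.contains(name))
        .map(|env| (env.kind, env.id))
        .collect()
}

/// Small fixture: a root, one package, and one template in that package.
fn sample_envs() -> Vec<Env> {
    vec![
        Env { kind: Kind::Root, id: 0, index: HashSet::new() },
        Env { kind: Kind::Pkg, id: 1, index: HashSet::from(["foo"]) },
        Env { kind: Kind::Tpl, id: 2, index: HashSet::from(["foo", "bar"]) },
    ]
}
```

A test written against this sketch compares `scope_of(&envs, "foo")` with an expected list of `(Kind, id)` pairs. An environment that fails to index the name simply drops out of the listing, so the comparison fails in an easily read way.
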
--- a/tamer/src/asg/air.rs
+++ b/tamer/src/asg/air.rs
@ -523,6 +523,60 @@ impl AirAggregateCtx {
    }
 }

+/// Property of identifier scope within a given environment.
+///
+/// An _environment_ is the collection of identifiers associated with a
+///   container object.
+/// Environments stack,
+///   such that an environment inherits the identifiers of its parent.
+///
+/// The _scope_ of an identifier is defined by what environments can "see"
+///   that identifier.
+/// For the purposes of TAME's analysis,
+///   we care only about the global environment and shadowing.
+///
+/// The relationship between identifier scope and environment can be
+///   visualized as a two-dimensional table with the environments forming
+///   layers along the x-axis,
+///     and scopes slicing those layers along the y-axis.
+///
+/// TODO: Example visualization.
+#[cfg(test)]
+#[derive(Debug, PartialEq)]
+enum EnvScopeKind {
+    /// Identifiers are pooled without any defined hierarchy.
+    ///
+    /// An identifier that is part of a pool must be unique.
+    /// Since there is no hierarchy,
+    ///   the system should not suggest that shadowing is not permitted and
+    ///   should instead emphasize that such an identifier must be unique
+    ///   globally.
+    ///
+    /// This should be used only at the root.
+    /// An identifier's scope can be further refined to provide more useful
+    ///   diagnostic messages by descending into the package in which it is
+    ///   defined and evaluating scope relative to the package.
+    _Pool,
+
+    /// Identifier in this environment is a shadow of a deeper environment.
+    ///
+    /// An identifier is said to cast a shadow on environments higher in its
+    ///   hierarchy.
+    /// Since shadowing is not permitted in TAME,
+    ///   this can be used to present useful diagnostic information to the
+    ///   user.
+    ///
+    /// A shadow can be used to check for identifier conflicts,
+    ///   but it cannot be used for lookup;
+    ///     this environment should be filtered out of this identifier's
+    ///     scope.
+    _Shadow,
+
+    /// This environment owns the identifier or is an environment descended
+    ///   from one that does.
+    Visible,
+}
+
 impl AsMut<AirAggregateCtx> for AirAggregateCtx {
    fn as_mut(&mut self) -> &mut AirAggregateCtx {
        self
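
The `Shadow` and `Visible` descriptions in the `EnvScopeKind` documentation can be read as a classification over a tree of environments. The sketch below is one possible reading, not TAMER's implementation, since the commit states that this scoping is not yet implemented. `ScopeKind`, `EnvTree`, `classify`, and `sample_tree` are hypothetical names, and the `_Pool` variant is deliberately left out.

```rust
/// Mirrors two of the `EnvScopeKind` variants; `_Pool` is omitted here.
#[derive(Debug, PartialEq, Eq)]
enum ScopeKind {
    Shadow,
    Visible,
}

/// Toy environment tree stored as parent pointers.
struct EnvTree {
    parent: Vec<Option<usize>>,
}

impl EnvTree {
    /// Whether `anc` is `env` itself or one of its ancestors.
    fn is_self_or_ancestor(&self, anc: usize, env: usize) -> bool {
        let mut cur = Some(env);
        while let Some(i) = cur {
            if i == anc {
                return true;
            }
            cur = self.parent[i];
        }
        false
    }

    /// Classify how `env` relates to an identifier owned by `owner`.
    fn classify(&self, owner: usize, env: usize) -> Option<ScopeKind> {
        if self.is_self_or_ancestor(owner, env) {
            // `env` owns the identifier or descends from the owner.
            Some(ScopeKind::Visible)
        } else if self.is_self_or_ancestor(env, owner) {
            // `env` is a proper ancestor of the owner: usable for conflict
            // checks, but not for lookup.
            Some(ScopeKind::Shadow)
        } else {
            // Unrelated branch of the tree: not in scope at all.
            None
        }
    }
}

/// 0: package; 1: template in 0; 2: template nested in 1; 3: sibling of 1.
fn sample_tree() -> EnvTree {
    EnvTree { parent: vec![None, Some(0), Some(1), Some(0)] }
}
```

With an identifier owned by environment 1, environments 1 and 2 see it as `Visible`, environment 0 holds only a `Shadow`, and the sibling environment 3 does not see it.
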
--- a/tamer/src/asg/air/test.rs
+++ b/tamer/src/asg/air/test.rs
@ -35,6 +35,8 @@ type Sut = AirAggregate;
 use Air::*;
 use Parsed::Incomplete;

+mod scope;
+
 #[test]
 fn ident_decl() {
    let id = SPair("foo".into(), S2);
@ -642,6 +644,16 @@ where
    sut.finalize().unwrap().into_context()
 }

+/// [`asg_from_toks`] without creating a package automatically.
+pub fn asg_from_toks_raw<I: IntoIterator<Item = Air>>(toks: I) -> Asg
+where
+    I::IntoIter: Debug,
+{
+    let mut sut = Sut::parse(toks.into_iter());
+    assert!(sut.all(|x| x.is_ok()));
+    sut.finalize().unwrap().into_context()
+}
+
 fn root_lookup(asg: &Asg, name: SPair) -> Option<ObjectIndex<Ident>> {
    asg.lookup(asg.root(S1), name)
 }
--- a/tamer/src/asg/air/test/scope.rs
+++ b/tamer/src/asg/air/test/scope.rs
@ -0,0 +1,296 @@
+// Scope tests for ASG IR
+//
+//  Copyright (C) 2014-2023 Ryan Specialty, LLC.
+//
+//  This file is part of TAME.
+//
+//  This program is free software: you can redistribute it and/or modify
+//  it under the terms of the GNU General Public License as published by
+//  the Free Software Foundation, either version 3 of the License, or
+//  (at your option) any later version.
+//
+//  This program is distributed in the hope that it will be useful,
+//  but WITHOUT ANY WARRANTY; without even the implied warranty of
+//  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+//  GNU General Public License for more details.
+//
+//  You should have received a copy of the GNU General Public License
+//  along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+//! Scoping tests.
+//!
+//! These tests verify that identifiers are scoped to the proper
+//!   environments.
+//! These may duplicate portions of other tests,
+//!   but having concrete examples all in one place helps to develop, debug,
+//!   and understand a system that can be quite confusing in the abstract.
+//!
+//! These tests _do not_ assert that identifiers are properly assigned to
+//!   their corresponding definitions;
+//!     those tests exist elsewhere.
+//!
+//! The core abstraction for these tests is [`assert_scope`],
+//!   which allows for declarative assertions of identifier scope against
+//!   the graph;
+//!     it is key to creating tests that can be both easily created and
+//!     easily understood.
+//!
+//! If You Are Here Due To A Test Failure
+//! =====================================
+//! If there are failing tests in parent or sibling modules,
+//!   check those first;
+//!     these tests are potentially fragile given that they test only a
+//!     subset of behavior without first asserting against other system
+//!     invariants,
+//!       as described above.
+
+use super::*;
+use crate::{
+    asg::{
+        graph::object::{self, ObjectTy},
+        visit::{tree_reconstruction, TreeWalkRel},
+        ExprOp,
+    },
+    fmt::{DisplayWrapper, TtQuote},
+    span::UNKNOWN_SPAN,
+};
+use std::iter::once;
+
+use EnvScopeKind::*;
+use ObjectTy::*;
+
+const S0: Span = UNKNOWN_SPAN;
+
+fn m(a: Span, b: Span) -> Span {
+    a.merge(b).unwrap()
+}
+
+#[test]
+fn pkg_child_definition() {
+    let pkg_name = SPair("/pkg".into(), S1);
+    let name = SPair("foo".into(), S3);
+
+    #[rustfmt::skip]
+    let toks = vec![
+        // ENV: 0 global
+        PkgStart(S1, pkg_name),
+          // ENV: 1 pkg
+          ExprStart(ExprOp::Sum, S2),
+            // ENV: 1 pkg
+            BindIdent(name),
+          ExprEnd(S4),
+        PkgEnd(S5),
+    ];
+
+    let asg = asg_from_toks_raw(toks);
+
+    #[rustfmt::skip]
+    assert_scope(&asg, name, [
+        // The identifier is not local,
+        //   and so its scope should extend into the global environment.
+        // TODO: (Root, S0, Pool),
+
+        // Expr does not introduce a new environment,
+        //   and so the innermost environment in which we should be able to
+        //   find the identifier is the Pkg.
+        (Pkg, m(S1, S5), Visible)
+    ]);
+}
+
+#[test]
+fn pkg_nested_expr_definition() {
+    let pkg_name = SPair("/pkg".into(), S1);
+    let outer = SPair("outer".into(), S3);
+    let inner = SPair("inner".into(), S5);
+
+    #[rustfmt::skip]
+    let toks = vec![
+        // ENV: 0 global
+        PkgStart(S1, pkg_name),
+          // ENV: 1 pkg
+          ExprStart(ExprOp::Sum, S2),
+            // ENV: 1 pkg
+            BindIdent(outer),
+
+            ExprStart(ExprOp::Sum, S4),
+              // ENV: 1 pkg
+              BindIdent(inner),
+            ExprEnd(S6),
+          ExprEnd(S7),
+        PkgEnd(S8),
+    ];
+
+    let asg = asg_from_toks_raw(toks);
+
+    #[rustfmt::skip]
+    assert_scope(&asg, inner, [
+        // The identifier is not local,
+        //   and so its scope should extend into the global environment.
+        // TODO: (Root, S0, Pool),
+
+        // Expr does not introduce a new environment,
+        //   and so, just as with the outer expression,
+        //   the inner is scoped to a package environment.
+        (Pkg, m(S1, S8), Visible)
+    ]);
+}
+
+#[test]
+fn pkg_tpl_definition() {
+    let pkg_name = SPair("/pkg".into(), S1);
+
+    let tpl_outer = SPair("_tpl-outer_".into(), S3);
+    let meta_outer = SPair("@param_outer@".into(), S5);
+    let expr_outer = SPair("exprOuter".into(), S8);
+
+    let tpl_inner = SPair("_tpl-inner_".into(), S11);
+    let meta_inner = SPair("@param_inner@".into(), S13);
+    let expr_inner = SPair("exprInner".into(), S16);
+
+    #[rustfmt::skip]
+    let toks = vec![
+        // ENV: 0 global
+        PkgStart(S1, pkg_name),
+          // ENV: 1 pkg
+          TplStart(S2),
+            // ENV: 2 tpl
+            BindIdent(tpl_outer),
+
+            TplMetaStart(S4),
+              BindIdent(meta_outer),
+            TplMetaEnd(S6),
+
+            ExprStart(ExprOp::Sum, S7),
+              BindIdent(expr_outer),
+            ExprEnd(S9),
+
+            TplStart(S10),
+              // ENV: 3 tpl
+              BindIdent(tpl_inner),
+
+              TplMetaStart(S12),
+                BindIdent(meta_inner),
+              TplMetaEnd(S14),
+
+              ExprStart(ExprOp::Sum, S15),
+                BindIdent(expr_inner),
+              ExprEnd(S17),
+            TplEnd(S18),
+          TplEnd(S19),
+        PkgEnd(S20),
+    ];
+
+    let asg = asg_from_toks_raw(toks);
+
+    #[rustfmt::skip]
+    assert_scope(&asg, tpl_outer, [
+        // TODO: (Root, S0, Pool),
+        (Pkg, m(S1, S20), Visible)
+    ]);
+    #[rustfmt::skip]
+    assert_scope(&asg, meta_outer, [
+        // TODO: (Tpl, m(S2, S19), Visible)
+    ]);
+    #[rustfmt::skip]
+    assert_scope(&asg, expr_outer, [
+        (Tpl, m(S2, S19), Visible)
+    ]);
+
+    #[rustfmt::skip]
+    assert_scope(&asg, tpl_inner, [
+        (Tpl, m(S2, S19), Visible)
+    ]);
+    #[rustfmt::skip]
+    assert_scope(&asg, meta_inner, [
+        // TODO: (Tpl, m(S10, S18), Visible)
+    ]);
+    #[rustfmt::skip]
+    assert_scope(&asg, expr_inner, [
+        (Tpl, m(S10, S18), Visible)
+    ]);
+}
+
+///// Tests end above this line, plumbing begins below /////
+
+/// Assert that the scope of the identifier named `name` is that of the
+///   provided environment list `expected`.
+///
+/// This will search the graph for all environments in which `name` has been
+///   indexed,
+///     gather information about those environments,
+///     and compare them against `expected`.
+/// The environment listing is expected to be in ontological order,
+///   as by [`tree_reconstruction`].
+///
+/// This function is essential to providing easily understood,
+///   declarative scope test definitions,
+///   which make it easy to form and prove hypotheses about the behavior of
+///   TAMER's scoping system.
+fn assert_scope(
+    asg: &Asg,
+    name: SPair,
+    expected: impl IntoIterator<Item = (ObjectTy, Span, EnvScopeKind)>,
+) {
+    // We are interested only in identifiers for scoping,
+    //   not the objects that they point to.
+    // We will use the span of the identifier that we locate via index
+    //   lookups to determine whether we've found the right one.
+    // This relies on the tests using unique spans for each object on the
+    //   graph,
+    //     which is standard convention for TAMER's tests.
+    let expected_span = name.span();
+
+    // We use what was most convenient at the time of writing to gather
+    //   environments representing the scope of `name`.
+    // This is not the most efficient,
+    //   but our test graphs are quite small,
+    //   and so that won't matter.
+    //
+    // This works because the traversal will visit
+    //   objects following the graph's ontology,
+    //     which will produce a tree in the expected order.
+    // We filter on index lookup,
+    //   which discards the portions of the tree
+    //     (the graph)
+    //     that we are not interested in.
+    //
+    // This also means that a failure to extend the scope of `name` to a
+    //   particular environment will cause it to be omitted from this
+    //   iterator,
+    //     but that is okay;
+    //       it'll result in a test failure that should be easy enough to
+    //       understand.
+    let given_without_root =
+        tree_reconstruction(asg).filter_map(|TreeWalkRel(dynrel, _)| {
+            dynrel.target_oi_rel_to_dyn::<object::Ident>().map(|oi_to| {
+                (
+                    dynrel.target_ty(),
+                    dynrel.target().resolve(asg).span(),
+                    asg.lookup(oi_to, name),
+                )
+            })
+        });
+
+    // `tree_reconstruction` omits root,
+    //   so we'll have to add it ourselves.
+    let oi_root = asg.root(name);
+    let given = once((Root, S0, asg.lookup(oi_root, name)))
+        .chain(given_without_root)
+        .filter_map(|(ty, span, ooi)| ooi.map(|oi| (ty, span, oi.resolve(asg))))
+        .inspect(|(ty, span, ident)| assert_eq!(
+            expected_span,
+            ident.span(),
+            "expected {wname} span {expected_span} at {ty}:{span}, but found {given}",
+            wname = TtQuote::wrap(name),
+            given = ident.span(),
+        ))
+        // TODO
+        .map(|(ty, span, _)| (ty, span, EnvScopeKind::Visible));
+
+    // Collection allows us to see the entire expected and given lists on
+    //   assertion failure.
+    assert_eq!(
+        expected.into_iter().collect::<Vec<_>>(),
+        given.collect::<Vec<_>>(),
+    )
+}
--- a/tamer/src/asg/graph.rs
+++ b/tamer/src/asg/graph.rs
@ -90,7 +90,7 @@ pub struct Asg {
    /// Directed graph on which objects are stored.
    graph: DiGraph<Node, AsgEdge, Ix>,

-    /// Edge cache of [`SymbolId`][crate::sym::SymbolId] to
+    /// Environment cache of [`SymbolId`][crate::sym::SymbolId] to
    ///   [`ObjectIndex`]es.
    ///
    /// This maps a `(SymbolId, NodeIndex)` pair to a node on the graph for
--- a/tamer/src/asg/graph/object.rs
+++ b/tamer/src/asg/graph/object.rs
@ -210,6 +210,31 @@ macro_rules! object_gen {
            $($kind,)+
        }

+        impl ObjectTy {
+            /// Assume that the provided [`ObjectIndex`] is of the type
+            ///   associated with `self`,
+            ///     and then determine whether that object can be related to
+            ///     another object of a given type `OB`.
+            ///
+            /// This method should be kept private;
+            ///   it is memory safe,
+            ///     but incorrect assumptions will violate graph object
+            ///     invariants and cause panics when the [`ObjectIndexTo`]
+            ///     is later resolved.
+            fn assuming_oi_maybe_rel_to_dyn<OB: ObjectRelatable>(
+                &self,
+                oi: ObjectIndex<Object>,
+            ) -> Option<ObjectIndexTo<OB>> {
+                match self {
+                    $(
+                        Self::$kind => {
+                            $kind::oi_rel_to_dyn(oi.must_narrow_into())
+                        }
+                    )+
+                }
+            }
+        }
+
        impl<T: ObjectInner> From<&Object<T>> for ObjectTy {
            fn from(obj: &Object<T>) -> Self {
                match obj {
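
The lowering of dynamic types into static ones that the commit message mentions follows a general pattern: match on a runtime type tag, construct an index carrying the corresponding static type, then re-type it to the caller's requested type only if the two agree at runtime. The following standalone sketch shows that pattern under simplifying assumptions. `Tag`, `Tagged`, `IndexTo`, and `index_to_dyn` are hypothetical and far simpler than TAMER's `ObjectTy`, `ObjectRelatable`, and `ObjectIndexTo`.

```rust
use std::marker::PhantomData;

/// Runtime type tag (hypothetical stand-in for a dynamic object type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    Ident,
    Expr,
}

/// Associates a static type with its runtime tag.
trait Tagged {
    const TAG: Tag;
}

struct Ident;
struct Expr;

impl Tagged for Ident {
    const TAG: Tag = Tag::Ident;
}
impl Tagged for Expr {
    const TAG: Tag = Tag::Expr;
}

/// Index whose target type exists only at compile time.
struct IndexTo<T> {
    raw: usize,
    _target: PhantomData<T>,
}

impl<OB: Tagged> IndexTo<OB> {
    fn new(raw: usize) -> Self {
        IndexTo { raw, _target: PhantomData }
    }

    /// Re-type this index as `IndexTo<OC>` only when `OB` and `OC` carry
    /// the same runtime tag; otherwise yield `None`.
    fn reflexivity<OC: Tagged>(self) -> Option<IndexTo<OC>> {
        (OB::TAG == OC::TAG)
            .then_some(IndexTo { raw: self.raw, _target: PhantomData })
    }
}

/// Dispatch on a runtime tag to obtain a statically typed index,
/// succeeding only if the tag matches the requested type `OC`.
fn index_to_dyn<OC: Tagged>(tag: Tag, raw: usize) -> Option<IndexTo<OC>> {
    match tag {
        Tag::Ident => IndexTo::<Ident>::new(raw).reflexivity(),
        Tag::Expr => IndexTo::<Expr>::new(raw).reflexivity(),
    }
}
```

Each `match` arm produces a differently typed index, so the arms can only share the return type `Option<IndexTo<OC>>` by going through the runtime comparison. No unchecked cast is involved, and a mismatch yields `None` rather than a wrongly typed index.
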
--- a/tamer/src/asg/graph/object/rel.rs
+++ b/tamer/src/asg/graph/object/rel.rs
@ -131,6 +131,23 @@ macro_rules! object_rel {
                    _ => None,
                }
            }
+
+            fn oi_rel_to_dyn<OB: ObjectRelatable>(
+                #[allow(unused_variables)] // for empty Rel
+                oi: ObjectIndex<Self>,
+            ) -> Option<$crate::asg::graph::object::ObjectIndexTo<OB>> {
+                #[allow(unused_imports)]
+                use $crate::asg::graph::object::ObjectIndexTo;
+
+                match OB::rel_ty() {
+                    $(
+                        ObjectRelTy::$kind => {
+                            ObjectIndexTo::<$kind>::from(oi).reflexivity()
+                        },
+                    )*
+                    _ => None,
+                }
+            }
        }

        $(
@ -291,6 +308,37 @@ impl<S> DynObjectRel<S, ObjectIndex<Object>> {
        self.map(|(soi, toi)| (soi, toi.resolve(asg).pair_oi(toi)))
    }

+    /// Retrieve the target [`ObjectIndex`] as an [`ObjectIndexTo<OB>`](ObjectIndexTo),
+    ///   if the object can be related to objects of type `OB`.
+    ///
+    /// This method may be confusing in that it represents another
+    ///   _possible_ relation on top of the relation represented by
+    ///   [`Self`].
+    /// That is:
+    ///
+    /// ```text
+    ///   OA -> OB [ -> OC]
+    ///   '______'
+    ///     Self
+    /// ```
+    ///
+    /// If this method returns [`Some`],
+    ///   that means that the target of this relation `OB` is an object
+    ///   _that is capable of being related to_ an object of type `OC`.
+    pub fn target_oi_rel_to_dyn<OC: ObjectRelatable>(
+        &self,
+    ) -> Option<ObjectIndexTo<OC>> {
+        // TODO: A newtype ought to couple these in a way that we don't have
+        //   to trust this assumption!
+        // This requires assuming that the target is of the target type,
+        //   which _should_ certainly be the case if originating from the graph,
+        //   but if it's not,
+        //     then later resolving the `ObjectIndex` with a mismatched type
+        //     will result in a panic.
+        self.target_ty()
+            .assuming_oi_maybe_rel_to_dyn(*self.target())
+    }
+
    /// Dynamically determine whether this edge represents a cross edge.
    ///
    /// This function is intended for _dynamic_ edge types,
@ -461,7 +509,7 @@ pub trait ObjectRelatable: ObjectKind {
    /// Represent a relation to another [`ObjectKind`] that cannot be
    ///   statically known and must be handled at runtime.
    ///
-    /// A value of [`None`] means that the provided [`DynObjectRel`] is not
+    /// A value of [`None`] means that the provided [`ObjectRelTy`] is not
    ///   valid for [`Self`].
    /// If the caller is utilizing edge data that is already present on the graph,
    ///   then this means that the system is not properly upholding edge
@ -476,6 +524,21 @@ pub trait ObjectRelatable: ObjectKind {
        ty: ObjectRelTy,
        oi: ObjectIndex<Object>,
    ) -> Option<Self::Rel>;
+
+    /// Cast the provided [`ObjectIndex`] into an [`ObjectIndexTo`] if it is
+    ///   able to be related to the provided `OB`.
+    ///
+    /// This is intended to be used in a dynamic context,
+    ///   where the caller is not aware statically of the [`ObjectKind`]s
+    ///   involved.
+    /// If it is _required_ that an object be relatable,
+    ///   use [`ObjectRelTo`] to statically verify that assertion.
+    ///
+    /// If the type `OB` is not a valid target of a relation from this type,
+    ///   [`None`] will be returned.
+    fn oi_rel_to_dyn<OB: ObjectRelatable>(
+        oi: ObjectIndex<Self>,
+    ) -> Option<ObjectIndexTo<OB>>;
 }

 impl<O: ObjectKind + ObjectRelatable> ObjectIndex<O> {
@ -961,6 +1024,32 @@ mod private {
        pub fn span(&self) -> Span {
            (*self).into()
        }
+
+        /// Assert a reflexive relationship between `OB` and `OC`.
+        ///
+        /// The types `OB` and `OC` are equivalent (and therefore reflexive)
+        ///   iff they have matching `ObjectRelTy`s.
+        ///
+        /// The sole purpose of this method is to satisfy Rust's type system
+        ///   in dynamic situations where the type system is not able to
+        ///   understand what we're doing,
+        ///     where the type `OC` is more general than the type `OB`.
+        /// This method is always safe;
+        ///   it will return [`None`] if the two types differ in runtime
+        ///   value.
+        ///
+        /// While "reflexive" is a property of binary relations in mathematics,
+        ///   the term "reflexivity" originates from the Coq tactic.
+        ///
+        /// For an example of where this is needed,
+        ///   see [`ObjectRelatable::oi_rel_to_dyn`].
+        pub fn reflexivity<OC: ObjectRelatable>(
+            self,
+        ) -> Option<ObjectIndexTo<OC>> {
+            let Self(parts, _) = self;
+            (OB::rel_ty() == OC::rel_ty())
+                .then_some(ObjectIndexTo(parts, PhantomData::default()))
+        }
    }

    // Ignore metadata that should always be consistent with the underlying