67 Commits (28b83ad6a3d7ef3c60ad2b8895b031758c8ee3d1)

Author SHA1 Message Date
Mike Gerwitz 28b83ad6a3 tamer: asg::graph::AsgObjectMut: Allow objects to assert ownership over relationships
There's a lot to say about this; it's been a bit of a struggle figuring out
what I wanted to do here.

First: this allows objects to use `AsgObjectMut` to control whether an edge
is permitted to be added, or to cache information about an edge that is
about to be added.  But no object does that yet; it just uses the default
trait implementation, and so this _does not change any current
behavior_.  It also is approximately equivalent cycle-count-wise, according
to Valgrind (within ~100 cycles out of hundreds of millions on large package
tests).

Adding edges to the graph is still infallible _after having received
permission_ from an `ObjectIndexRelTo`, but the object is free to reject the
edge with an `AsgError`.
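The permission flow described above can be sketched as follows. This is a minimal illustration, not TAMER's code: `AsgError`, the `pre_add_edge` hook, the `Tpl` body limit, and the `usize` node ids are all hypothetical simplifications. What it shows is the shape: a trait method with a permissive default is consulted before an edge is committed, so the commit itself stays infallible.

```rust
/// Hypothetical stand-in for TAMER's `AsgError`.
#[derive(Debug, PartialEq)]
pub enum AsgError {
    EdgeRejected { from: usize, to: usize },
}

/// Sketch of a per-object hook consulted before an outgoing edge is added.
/// The default permits every edge, so objects that do not override it
/// behave exactly as before.
pub trait AsgObjectMut {
    fn pre_add_edge(&mut self, _from: usize, _to: usize) -> Result<(), AsgError> {
        Ok(())
    }
}

/// Uses the default (permissive) implementation.
pub struct Ident;
impl AsgObjectMut for Ident {}

/// Hypothetical object that caches its body as edges are added and
/// rejects edges beyond an arbitrary limit.
pub struct Tpl {
    pub body: Vec<usize>,
    pub max_body: usize,
}

impl AsgObjectMut for Tpl {
    fn pre_add_edge(&mut self, from: usize, to: usize) -> Result<(), AsgError> {
        if self.body.len() >= self.max_body {
            return Err(AsgError::EdgeRejected { from, to });
        }
        self.body.push(to); // cache the edge before it is committed
        Ok(())
    }
}

pub struct Graph {
    pub edges: Vec<(usize, usize)>,
}

impl Graph {
    /// Not `pub`, and infallible: only reached after permission is granted.
    fn add_edge_unchecked(&mut self, from: usize, to: usize) {
        self.edges.push((from, to));
    }

    /// Fallible from the caller's view: the source object may say no.
    pub fn add_edge<O: AsgObjectMut>(
        &mut self,
        src: &mut O,
        from: usize,
        to: usize,
    ) -> Result<(), AsgError> {
        src.pre_add_edge(from, to)?;
        self.add_edge_unchecked(from, to);
        Ok(())
    }
}
```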

As an example of where this will be useful: the template system needs to
keep track of what is in the body of a template as it is defined.  But the
`TplAirAggregate` parser is sidelined while expressions in the body are
parsed, and edges are added to a dynamic source using
`ObjectIndexRelTo`.  Consequently, we cannot rely on a static API to cache
information; we have to be able to react dynamically.  This will allow `Tpl`
objects to know any time edges are added and, therefore, determine their
shape as the graph is being built, rather than having to traverse the tree
after encountering a close.

(I _could_ change this, but `ObjectIndexRelTo` removes a significant amount
of complexity for the caller, so I'd rather not.)

I did explore other options.  I rejected the first one, then rejected this
one, then rejected the first one again before returning back to this one
after having previously sidelined the entire thing, because of the above
example.  The core point is: I need confidence that the graph isn't being
changed in ways that I forgot about, and because of the complexity of the
system and the heavy refactoring that I do, I need the compiler's help;
otherwise I risk introducing subtle bugs as objects get out of sync with the
actual state of the graph.

(I wish the graph supported these things directly, but that's a project well
outside the scope of my TAMER work.  So I have to make do, as I have been
all this time, by layering atop of Petgraph.)

(...I'm beginning to ramble.)

(...beginning?)

Anyway: my other rejected idea was to provide attestation via the
`ObjectIndex` APIs to force callers to go through those APIs to add an edge
to the graph; it would use sealed objects that are inaccessible to any
modules other than the objects, and assert that the caller is able to
provide a zero-sized object of that sealed type.

The problem with this is...exactly what was mentioned above:
`ObjectIndexRelTo` is dynamic.  We don't always know the source object type
statically, and so we cannot make those static assertions.
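The rejected attestation idea can be sketched with a zero-sized "witness" type whose private field means only its own module can construct it. The module, type, and function names below are hypothetical. The witness type has to be named statically in `Graph::add_edge`'s signature, and that is exactly what a dynamic `ObjectIndexRelTo` source cannot provide.

```rust
pub mod ident {
    /// Zero-sized proof that the caller went through this module's API.
    /// The private `()` field prevents construction outside `ident`.
    pub struct EdgeWitness(());

    /// The only way for other modules to obtain an edge from `ident`.
    pub fn add_dep(graph: &mut super::Graph, from: usize, to: usize) {
        graph.add_edge(EdgeWitness(()), from, to);
    }
}

pub struct Graph {
    pub edges: Vec<(usize, usize)>,
}

impl Graph {
    /// Callers must present a witness, which statically proves that the
    /// edge was requested through the `ident` module.
    pub fn add_edge(&mut self, _witness: ident::EdgeWitness, from: usize, to: usize) {
        self.edges.push((from, to));
    }
}
```

Outside of `ident`, writing `ident::EdgeWitness(())` fails to compile because the tuple field is private, and the witness costs nothing at runtime.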

I could have tried the same tricks to store attestation at some other time,
but what a confusing mess it would be.

And so here we are.

Most of this work is cleaning up the callers---adding edges is now fallible,
from the `ObjectIndex` API standpoint, and so AIR needed to be set up to
handle those failures.  There _aren't_ any failures yet, but again, since
things are dynamic, they could appear at any moment.  Furthermore, since
ref/def is commutative (things can be defined and referenced in any order),
there could be surprise errors on edge additions in places that might not
otherwise expect it in the future.  We're now ready for that, and I'll be
able to e.g. traverse incoming edges on a `Missing->Transparent` definition
to notify dependents.

This project is going to be the end of me.  As interesting as it is.

I can see why Rust just chose to require macro definitions _before_ use.  So
much less work.

DEV-13163
2023-07-24 16:41:32 -04:00
Mike Gerwitz e414782def tamer: asg::graph: Encapsulate edge additions
AIR is no longer able to explicitly add edges without going through an
object-specific `ObjectIndex` API.  `Asg::add_edge` was already private, but
`ObjectIndex::add_edge_{to,from}` was not.

The problem is that I want to augment the graph with other invariants, such
as caches.  I'd normally have this built into the graph system itself, but I
don't have the time for the engineering effort to extend or replace
Petgraph, so I'm going to build atop of it.

To have confidence in any sort of caching, I need assurances that the graph
can't change out from underneath an object.  This gets _close_ to
accomplishing that, but I'm still uncomfortable:

  - We're one `pub` addition away from breaking these invariants; and
  - Other `Object` types can still manipulate one another's edges.

So this is a first step that at least proves encapsulation within
`asg::graph`, but ideally we'd have the system enforce, statically, that
`Object`s own their _outgoing_ edges, and no other `Object` is able to
manipulate them.  This would ensure that any accidental future changes, or
bugs, will cause compilation failures rather than e.g. allowing caches to
get out of sync with the graph.

DEV-13163
2023-07-21 10:21:57 -04:00
Mike Gerwitz 19a5ec1e0f tamer: asg: Reduce Debug output of `Asg` and `AirAggregateCtx`
The ASG had its output reduced previously but I had apparently stashed it; I
found it while trying to clean up after so many failed or partial attempts
and the various scoping changes.

The most fundamental issue is that there's too much information: it's very
difficult to interrogate so I seldom look at it, and it slows down Parser
trace output to the point where it's useless on even one of our smallest
systems, generating 1.5GiB of output for a graph of ~10k
objects (via tameld).

DEV-13162
2023-05-23 16:15:38 -04:00
Mike Gerwitz e940fc5aa0 tamer: asg: Move index from Asg to AirAggregateCtx
This finally removes the awkward index from the ASG.  This will need much
more documentation and a better organized abstraction, but in the meantime,
previous commits dive into some of the rationale.

In essence: it only really makes sense to have indexing on the ASG itself if
it is used to cache queries or other expensive operations.  But that is not
what we were using it for---it was used for caching _lexical_ properties,
which are useful only during parsing for the sake of forming relationships
on the graph.  Once those relationships have formed, different types of
indexes will be useful in different lowering, optimization, or querying
contexts.

This formalizes that, and in doing so, ensures that the index will always
be accurate relative to the content of the ASG.  Once the index becomes
separated from it---through the `AirAggregateCtx::finish` operation---then
it is discarded and the ASG exposed.
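A rough sketch of this ownership arrangement, under simplifying assumptions: identifiers are plain strings, nodes are `usize`s, and `lookup_or_create` is a hypothetical helper rather than TAMER's API. Because the context mutates the graph and the index together, the two cannot drift apart, and `finish` consumes the context so that only the ASG survives.

```rust
use std::collections::HashMap;

/// Stand-in for the graph: resolved data only, no notion of scope.
#[derive(Debug, Default)]
pub struct Asg {
    pub nodes: Vec<String>,
}

/// Stand-in for the parser context, which owns the lexical index.
#[derive(Default)]
pub struct AirAggregateCtx {
    asg: Asg,
    index: HashMap<String, usize>,
}

impl AirAggregateCtx {
    /// Look up an identifier, creating and indexing it if missing.
    /// Graph and index are updated together, so they stay consistent.
    pub fn lookup_or_create(&mut self, name: &str) -> usize {
        if let Some(&oi) = self.index.get(name) {
            return oi;
        }
        let oi = self.asg.nodes.len();
        self.asg.nodes.push(name.to_string());
        self.index.insert(name.to_string(), oi);
        oi
    }

    /// Parsing is done: the index is dropped and only the ASG is exposed.
    pub fn finish(self) -> Asg {
        self.asg
    }
}
```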

This is also important because the index is incomplete---it contains only
the information necessary for the parser to carry out its task.

This change was a long time coming, and has reduced ASG to its essence.

DEV-13162
2023-05-19 13:38:17 -04:00
Mike Gerwitz 94bbc2d725 tamer: asg::air: Root AirIdent operations using AirAggregateCtx
This is the culmination of a great deal of work over the past few
weeks.  Indeed, this change has been prototyped a number of different ways
and has lived in a stash of mine, in one form or another, for a few weeks.

This is not done just yet---I have to finish moving the index out of Asg,
and then clean up a little bit more---but this is a significant
simplification of the system.  It was very difficult to reason about prior
approaches, and this finally moves toward doing something that I wasn't sure
if I'd be able to do successfully: formalize scope using AirAggregate's
stack and encapsulate indexing as something that is _supplemental_ to the
graph, rather than an integral component of it.

This _does not yet_ index the AirIdent operation on the package itself
because the active state is not part of the stack; that is one of the
remaining changes I still have stashed.  It will be needed shortly for
package imports.

This rationale will have to appear in docs, which I intend to write soon,
but: this means that `Asg` contains _resolved_ data and itself has no
concept of scope.  The state of the ASG immediately after parsing _can_ be
used to derive what the scope _must_ be (and indeed that's what
`asg::air::test::scope::derive_scopes_from_asg` does), but once we start
performing optimizations, that will no longer be true in all cases.

This means that lexical scope is a property of parsing, which, well, seems
kind of obvious from its name.  But the awkwardness was that, if we consider
scope to be purely a parse-time thing---used only to construct the
relationships on the graph and then be discarded---then how do we query for
information on the graph?  We'd have to walk the graph in search of an
identifier, which is slow.

But when do we need to do such a thing?  For tests, it doesn't matter if
it's a little bit slow, and the graphs aren't all that large.  And for
operations like template expansion and optimizations, if they need access to
a particular index, then we'll be sure to generate or provide the
appropriate one.  If we need a central database of identifiers for tooling
in the future, we'll create one then.  No general-purpose identifier lookup
_is_ actually needed.

And with that, `Asg::lookup_or_missing` is removed.  It has been around
since the beginning of the ASG, when the linker was just a prototype, so
it's the end of TAMER's early era as I was trying to discover exactly what I
wanted the ASG to represent.

DEV-13162
2023-05-17 12:23:36 -04:00
Mike Gerwitz 33f34bf244 tamer: asg: Initial identifier scoping
Okay, this is finally distilling into something fairly simple and
reasonable, but I'm not quite there yet.

In particular, the responsibility is simply split between `Asg` (as the owner of
the index) and `AirAggregateCtx` (as the owner of the stack frames from
which environments and scope are derived).  This was inevitable and I was
waiting for it, but now I have a good idea of how to clean it up and
proceed.

This also doesn't index in root yet (`active_rooting_oi` is still `None` for
`Root`), and I think I may remove `Pool` and just make it `Visible` at that
point, since it won't be going any further anyway.  I don't think the
distinction is meaningful and will just complicate implementations.

The tests also need some more cleanup---the assertions ideally would live in
independent tests, and the assertion failure is in a function call rather
than the test (function) itself, so locating the line number requires a Rust
backtrace (unless you look at the failure data).

So I suppose this is more of a mental synchronization point than
anything.  Nothing's broken, though.

DEV-13162
2023-05-16 14:58:21 -04:00
Mike Gerwitz 9fb2169a06 tamer: asg::air: Begin to introduce explicit scope testing
There's a lot of documentation on this in the commit itself, but this stems
from

  a) frustration with trying to understand how the system needs to operate
     with all of the objects involved; and
  b) recognizing that if I'm having difficulty, then others reading the
     system later on (including myself) and possibly looking to improve upon
     it are going to have a whole lot of trouble.

Identifier scope is something I've been mulling over for years, and more
formally for the past couple of months.  This finally begins to formalize
that, out of frustration with package imports.  But it will be a weight
lifted off of me as well, with issues of scope always looming.

This demonstrates a declarative means of testing for scope by scanning the
entire graph in tests to determine where an identifier has been
scoped.  Since no such scoping has been implemented yet, the tests
demonstrate how they will look, but otherwise just test for current
behavior.  There is more existing behavior to check, and further there will
be _references_ to check, as they'll also leave a trail of scope indexing
behind as part of the resolution process.

See the documentation introduced by this commit for more information on
that part of this commit.

Introducing the graph scanning, with the ASG's static assurances, required
more lowering of dynamic types into the static types required by the
API.  This was itself a confusing challenge that, while not all that bad in
retrospect, was something that I initially had some trouble with.  The
documentation includes clarifying remarks that hopefully make it all
understandable.

DEV-13162
2023-05-12 14:07:29 -04:00
Mike Gerwitz 7cfe6a6f8d tamer: asg::graph: Index Root->Pkg with canonical names
The previous commit introduced canonical names, and this uses them to index.

The next step will be to utilize those names to look up packages on
definition rather than creating a new package node, so that references to
yet-to-be-defined (or yet-to-be-imported) packages can be resolved on the
graph.
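The planned lookup-on-definition behavior might look roughly like this. `Pkg::Missing`, `Root`, and `find_or_missing` are hypothetical names; the sketch only illustrates that a reference creates a placeholder keyed by canonical name, and a later definition fills in that same node instead of allocating a new one.

```rust
use std::collections::HashMap;

/// Hypothetical package state: referenced but not yet defined, or defined.
#[derive(Debug, PartialEq)]
pub enum Pkg {
    Missing,
    Defined { path: String },
}

#[derive(Default)]
pub struct Root {
    pub pkgs: Vec<Pkg>,
    /// Canonical name -> position in `pkgs`.
    pub index: HashMap<String, usize>,
}

impl Root {
    /// Return the node for `canonical`, creating a `Missing` placeholder
    /// if this is the first time the name has been seen.
    fn find_or_missing(&mut self, canonical: &str) -> usize {
        let pkgs = &mut self.pkgs;
        *self.index.entry(canonical.to_string()).or_insert_with(|| {
            pkgs.push(Pkg::Missing);
            pkgs.len() - 1
        })
    }

    pub fn reference(&mut self, canonical: &str) -> usize {
        self.find_or_missing(canonical)
    }

    /// Defining reuses any placeholder created by an earlier reference.
    pub fn define(&mut self, canonical: &str) -> usize {
        let oi = self.find_or_missing(canonical);
        self.pkgs[oi] = Pkg::Defined { path: canonical.to_string() };
        oi
    }
}
```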

DEV-13162
2023-05-02 16:15:07 -04:00
Mike Gerwitz 77ada079e1 tamer: asg::graph::Asg.graph: Finally encapsulate
With the previous commit using a visitor implemented within the `asg`
module, we can now finally encapsulate the graph.  This is a wonderfully
liberating, long-awaited change, since I have been fighting with the lack of
encapsulation for some time; it has made certain changes challenging and has
made the system more difficult to reason about.  It also made it impossible
to assert that invariants were _actually_ properly enforced, if things could
just peer into and modify the graph directly, out from underneath the API
that provides those assurances.

This also removes our dependency on Petgraph outside of the `asg`
module.  There are no plans to migrate away from it currently; we'll see how
the graph continues to evolve over time and what redundancies are introduced
with our data structures.  It may render Petgraph unnecessary.

Interestingly, because my DFS implementation is so similar to Petgraph's,
the emitted ordering is _identical_ between this commit and the previous.

DEV-13162
2023-04-28 15:36:07 -04:00
Mike Gerwitz e3094e0bad tamer: asg::graph::visit::topo: Introduce topological sort
This is an initial implementation that does not yet produce errors on
cycles.  Documentation is not yet complete.

The implementation is fairly basic, and similar to Petgraph's DFS.

A terminology note: the DFS will be ontology-aware (or at least aware of
edge metadata) to avoid traversing edges that would introduce cycles in
situations where they are permitted, which effectively performs a
topological sort on an implicitly _filtered_ graph.
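A minimal sketch of a DFS-based sort over a plain adjacency list. The `follow` predicate is a hypothetical stand-in for ontology-aware edge filtering, and, as in this commit, cycles are not reported. The output is DFS postorder: every node appears after the nodes it points to, which for edges from a dependent to its dependencies means dependencies first.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Mark {
    New,
    Visiting,
    Done,
}

/// Sketch of a DFS-based topological sort over `adj` (node -> targets),
/// starting from `roots` and skipping edges for which `follow` is false.
/// An edge back to a `Visiting` node (a cycle) is silently ignored.
pub fn topo_sort<F>(adj: &[Vec<usize>], roots: &[usize], follow: F) -> Vec<usize>
where
    F: Fn(usize, usize) -> bool,
{
    let mut mark = vec![Mark::New; adj.len()];
    let mut out = Vec::with_capacity(adj.len());

    for &root in roots {
        if mark[root] != Mark::New {
            continue;
        }
        mark[root] = Mark::Visiting;
        // Explicit stack of (node, index of next child to examine).
        let mut stack = vec![(root, 0usize)];

        while let Some(top) = stack.last_mut() {
            let node = top.0;
            if top.1 < adj[node].len() {
                let child = adj[node][top.1];
                top.1 += 1;
                if mark[child] == Mark::New && follow(node, child) {
                    mark[child] = Mark::Visiting;
                    stack.push((child, 0));
                }
            } else {
                // All children handled; emit in postorder.
                mark[node] = Mark::Done;
                out.push(node);
                stack.pop();
            }
        }
    }
    out
}
```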

This will end up replacing ld::xmle::lower::sort.

DEV-13162
2023-04-26 09:51:45 -04:00
Mike Gerwitz 42aa5bd407 tamer: asg::graph: Root->Ident {tree=>cross} edge
tameld isn't yet adding edges to Idents from their associated Pkg (see
previous commit), but this formalizes how the ontology will interpret such a
relationship.  The idea is that Idents are always owned by Pkgs, but they
may be optionally explicitly rooted, which will be used by a particular type
of DFS walk that is about to be written, which can ignore Root->Pkg and
focus instead on cross edges to Idents.

Though it's not lost on me that, now that I'll be introducing a DFS for the
linker, the terms "cross" and "tree" edge become ambiguous; I used to
call them "ontological X edge", but I had fallen out of that habit.  Perhaps
I need to reintroduce that rigor.

DEV-13162
2023-04-24 09:44:02 -04:00
Mike Gerwitz 48d9bca3b7 tamer: obj::xmlo: Add Pkg nodes for identifiers
This modifies the xmlo reader, xmlo->AIR lowering, and AIR->ASG to introduce
a package for identifiers.  It does not yet, however, add edges from the
package to the identifier.

Once edges are added, the DFS will change in undesirable ways, which will
require a new implementation.  That is desirable anyway, to decouple from
Petgraph, and will then allow restoring the prior single-pass sort+cycle
check.

That will also encapsulate visiting behavior within the `asg::graph` module
and, in turn, allow encapsulating `Asg.graph` finally.

DEV-13162
2023-04-21 16:24:11 -04:00
Mike Gerwitz 6f68292df5 tamer: asg::graph::{index_identifier=>index}: Generalize
This may now index _any_ type of object, in preparation for indexing package
import paths.  In practice, this only makes sense (at least currently) for
`Pkg` and `Ident`.

This generalization also applies to `Asg::lookup_or_missing`.

DEV-13162
2023-04-20 16:46:30 -04:00
Mike Gerwitz f183600c3a tamer: asg: Move Ident-specific methods off of Asg
Historically, the ASG was better described as a "dependency graph",
containing only identifiers (which are simply called "symbols" in the
XSLT-based compiler).  Consequently, it was appropriate for the graph to
have operations specific to identifiers.  (Indeed, that's the only type of
object the graph supported.)

Much has changed since then.  This cleans things up, and makes parenting
identifiers to root an _explicit_ operation.  This will make it easier to
move forward with handling of scope, and importing identifiers into
packages, and removing `Source`, and so on.

DEV-13162
2023-04-19 12:40:35 -04:00
Mike Gerwitz 778e90c81d tamer: asg::air: Index package identifiers on `Pkg` rather than `Root`
I've been torturing myself trying to figure out how I want to generalize
indexing, lookups, and value numbering in a way that is appropriate for this
project (that is, not over-engineered relative to my needs).

Before I can do much of anything, though, I need to stop having indexing
only as a `Root` thing (previously it wasn't even tied to `Root`).  This
makes that change for tamec, but temporarily removes scoping concerns until
I can add more specific types of indexing.

Not only does this allow cleaning up some `Ident`-specific stuff from `Asg`,
but the cleanup also helps to show that portions of the system aren't still
using Root-based globals.

The linker (`tameld`) still uses the old `global` methods for now; those
will eventually go away, but this needs to change to unify both tamec and
tameld once we get to imports as part of the compiler.

DEV-13162
2023-04-19 12:40:34 -04:00
Mike Gerwitz a738a05461 tamer: asg::graph::object::rel: Hash impls for ObjectIndexTo{,Tree}
All ObjectIndex-like objects hash using only the underlying identifier,
which ultimately boils down to a `NodeIndex` (petgraph), which is just a
u32.  And so in that sense, the only purpose we have for hashing it is to
(a) reduce the space required to store mappings, and (b) compose with other
`Hash`es.
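A sketch of hashing by identifier only, assuming a simplified `ObjectIndex` that carries a span alongside its node index (that field layout is an assumption, as is the `Ident` marker). `PartialEq` is implemented the same way so that `Eq` and `Hash` agree, and writing the impls by hand avoids the `O: Hash` bound that `#[derive(Hash)]` would add for the phantom kind parameter.

```rust
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;

/// Stand-in for petgraph's `NodeIndex` (a `u32` by default).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeIndex(u32);

/// Hypothetical object kind marker.
pub struct Ident;

/// Typed index carrying extra data (a span) that must not affect identity.
pub struct ObjectIndex<O> {
    index: NodeIndex,
    span: (u32, u32),
    _kind: PhantomData<O>,
}

impl<O> ObjectIndex<O> {
    pub fn new(index: u32, span: (u32, u32)) -> Self {
        Self { index: NodeIndex(index), span, _kind: PhantomData }
    }

    pub fn span(&self) -> (u32, u32) {
        self.span
    }
}

// `Eq` and `Hash` must agree: both consider only the node identifier.
impl<O> PartialEq for ObjectIndex<O> {
    fn eq(&self, other: &Self) -> bool {
        self.index == other.index
    }
}

impl<O> Eq for ObjectIndex<O> {}

impl<O> Hash for ObjectIndex<O> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.index.0.hash(state); // just the underlying u32
    }
}
```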

DEV-13708
2023-04-05 15:46:42 -04:00
Mike Gerwitz 02dba0d63a tamer: asg::graph::Asg: Index by (SymbolId, NodeIndex) pair
The prior commit begins to explain the end goal of being able to index
identifiers outside of the global environment.

This change continues to index things as before, but introduces a new key
based on the pair of the symbol id together with a node that is _part of_
its target environment.  The only environment utilized at the moment (in this
commit) is that of the root node (which is the global scope), in both
indexing and lookup.  Future commits will extend this, and contain more
information about and rationale for the implementation.

The new general index methods are restricted to `pub(super)` until an
abstraction can be put in place that is responsible for environment
indexing; that's a responsibility that is currently handled by
`AirAggregateCtx` for tamec, and the linker has no scoping
requirements since all of that has already been dealt with.
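A minimal sketch of an index keyed by the `(SymbolId, NodeIndex)` pair, where the `NodeIndex` names the environment. Both newtypes and the `Index` API are hypothetical stand-ins. Under this scheme the global environment is just the set of entries keyed by the root node, and the same symbol can map to different objects in different environments.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SymbolId(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeIndex(u32);

/// Environment-aware index: (name, environment node) -> object node.
#[derive(Default)]
pub struct Index {
    map: HashMap<(SymbolId, NodeIndex), NodeIndex>,
}

impl Index {
    /// Returns the previously indexed object, if any, like `HashMap::insert`.
    pub fn index(&mut self, env: NodeIndex, name: SymbolId, oi: NodeIndex) -> Option<NodeIndex> {
        self.map.insert((name, env), oi)
    }

    pub fn lookup(&self, env: NodeIndex, name: SymbolId) -> Option<NodeIndex> {
        self.map.get(&(name, env)).copied()
    }
}
```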

DEV-13708
2023-04-03 16:14:30 -04:00
Mike Gerwitz 5b0a4561a2 Revert "Revert "tamer: asg::graph::index: Use FxHashMap in place of Vec""
This reverts commit 1b7eac337cd5909c01ede3a5b3fba577898d5961.

This is a revert of the previous revert, just so that I (and you) have
references to prior rationale.

This was previously reverted because it wasn't worth doing, but now we have
a situation where we need to begin implementing lexical scoping rules for
nested containers (packages and templates).  In particular, as you'll see in
the commits that follow, we need to be able to look up an identifier that
may have been created as Missing at one level of scope (certain types of
blocks), but then define it at another level.

Or, even more simply at this point, since I'm not yet doing anything
sophisticated with scope: we're only indexing in the global environment, and
we need to be able to index elsewhere too.

The next commit will go into more information, but suffice it to say for now
that indexing is going to get more complicated than a SymbolId.

Sticking with FxHash for now; we don't need a stable hash now.

DEV-13708
2023-04-03 15:15:54 -04:00
Mike Gerwitz e3d60750a9 tamer: asg::air: Errors for rooting_ci() TODOs
This eliminates the TODOs that existed when looking for an OI for rooting an
identifier.

The change to `rooting_ci` is ridiculous, but I want to get other things
done before I jump down the rabbit hole of generalizing that (indexing local
identifiers).  Though I have an approach in mind.

DEV-13708
2023-03-31 13:57:11 -04:00
Mike Gerwitz 2ae33a1dfa tamer: asg::graph::object: ObjectIndexTo and ObjectIndexRelTo
The graph's ontology is defined in the direction of the edge: from OA
to OB.  This is enforced by the type system to ensure that no code path is
able to generate an invalid graph.

But that also makes it very difficult to work with a generic source to a
specific target.

This introduces an `ObjectIndexRelTo` trait that says whether `Self` is able
to be related to some `ObjectKind` `OB`, implements it for `ObjectIndex
where ObjectRelTo<OB>`, and introduces a new semi-opaque type
`ObjectIndexTo` that allows for the source `ObjectIndex` to be generic.

This then redefines some existing graph primitives in terms of
`ObjectIndexRelTo`, in particular creating edges, so that `ObjectIndex` can
be used as today, and the new `ObjectIndexTo` can be used in the same way
with the same API, without violating the graph ontology.
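The relationship between these types can be sketched in a self-contained way. Everything here (the marker types, `ObjectKind::TY`, the permitted edges, and the tuple-based edge list) is a hypothetical simplification of the design described above, not TAMER's actual definitions.

```rust
use std::marker::PhantomData;

/// Runtime tag for each object kind.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ObjectRelTy { Pkg, Ident, Expr }

/// Static marker types for object kinds.
pub struct Pkg;
pub struct Ident;
pub struct Expr;

pub trait ObjectKind { const TY: ObjectRelTy; }
impl ObjectKind for Pkg { const TY: ObjectRelTy = ObjectRelTy::Pkg; }
impl ObjectKind for Ident { const TY: ObjectRelTy = ObjectRelTy::Ident; }
impl ObjectKind for Expr { const TY: ObjectRelTy = ObjectRelTy::Expr; }

/// `Self -> OB` is permitted by this (hypothetical) ontology.
pub trait ObjectRelTo<OB: ObjectKind>: ObjectKind {}
impl ObjectRelTo<Ident> for Pkg {}
impl ObjectRelTo<Expr> for Pkg {}
impl ObjectRelTo<Expr> for Ident {}
impl ObjectRelTo<Expr> for Expr {}

/// Statically typed index to an object of kind `O`.
pub struct ObjectIndex<O> { pub id: usize, _kind: PhantomData<O> }

impl<O> ObjectIndex<O> {
    pub fn new(id: usize) -> Self { Self { id, _kind: PhantomData } }
}

/// Anything that may serve as the source of an edge to `OB`.
pub trait ObjectIndexRelTo<OB: ObjectKind> {
    fn src_ty(&self) -> ObjectRelTy;
    fn src_id(&self) -> usize;
}

impl<O: ObjectRelTo<OB>, OB: ObjectKind> ObjectIndexRelTo<OB> for ObjectIndex<O> {
    fn src_ty(&self) -> ObjectRelTy { O::TY }
    fn src_id(&self) -> usize { self.id }
}

/// Semi-opaque: the source kind is erased to a runtime tag, but values can
/// only be built from sources statically related to `OB`.
pub struct ObjectIndexTo<OB> { ty: ObjectRelTy, id: usize, _target: PhantomData<OB> }

impl<OB: ObjectKind> ObjectIndexTo<OB> {
    pub fn from_rel<S: ObjectIndexRelTo<OB>>(src: &S) -> Self {
        Self { ty: src.src_ty(), id: src.src_id(), _target: PhantomData }
    }
}

impl<OB: ObjectKind> ObjectIndexRelTo<OB> for ObjectIndexTo<OB> {
    fn src_ty(&self) -> ObjectRelTy { self.ty }
    fn src_id(&self) -> usize { self.id }
}

/// One edge-creation primitive for both kinds of source.
pub fn add_edge<OB: ObjectKind>(
    edges: &mut Vec<(ObjectRelTy, usize, ObjectRelTy, usize)>,
    from: &impl ObjectIndexRelTo<OB>,
    to: &ObjectIndex<OB>,
) {
    edges.push((from.src_ty(), from.src_id(), OB::TY, to.id));
}
```

With these impls, an `Expr -> Pkg` edge such as `add_edge(&mut edges, &ObjectIndex::<Expr>::new(2), &ObjectIndex::<Pkg>::new(0))` is rejected at compile time, because `Expr` does not implement `ObjectRelTo<Pkg>`.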

This will be used by `AirAggregate` to create dynamic targets for rooting
and splicing/expansion.

DEV-13708
2023-03-29 12:58:35 -04:00
Mike Gerwitz 9c0e20e58c tamer: asg: Shorthand and long-form template arguments
This applies to template application only; there's still some work to do for
template parameters in definitions (well, for deriving them in `xmli` at
least).  And, as you can see, there's still a lot of TODO items here.

I ended up backtracking on tree edges to Meta, and even on cross edges to
Meta, because it complicated xmli derivation with no benefit right now;
maybe a cross edge will be re-added in the future, but I need to move on and
see where this takes me.

But, it works.

DEV-13708
2023-03-29 12:58:35 -04:00
Mike Gerwitz fcd25d581c tamer: asg::air::expr: Do not cache (globally) identifiers created with StoreDangling
I'm not happy with this implementation.  The linear search is undesirable,
but not too bad (and maybe wouldn't even be worth caching, if this were the
whole story), but we _also_ need to prevent duplicate identifiers.  We are
not going to want to perform a linear search of a linked list (effectively)
every time we add an identifier to check for uniqueness, so I think the
caching is going to have to be generalized very shortly anyway.

As it stands now, a duplicate identifier would cause an error at expansion
time.  That's not what we want, but it's not terrible, because you can have
that same problem in normal circumstances without local conflicts.

But this'll be used for metavariables as well, where we absolutely _do_ want
to fail at template definition time.

DEV-13708
2023-03-29 12:58:35 -04:00
Mike Gerwitz 1c7df894ea tamer: asg::graph: *lookup{=>_global}*
Identifier lookups, as done using the graph methods today, look up from a
cache representing the global environment.

Templates must not contribute to this environment until expansion.  Further,
metavariables will not be present in this environment.  To avoid confusion
and help obviate accidental contributions to this environment, the methods
have been renamed.  This will also allow for the creation of more general
methods down the line.

DEV-13708
2023-03-29 12:58:35 -04:00
Mike Gerwitz a5b03e8790 tamer: Embed ASG ontology visualization in rustdoc-generated docs
There, in-your-face and not hidden in some tools directory.

DEV-13708
2023-03-10 14:28:00 -05:00
Mike Gerwitz 3587d032c3 tamer: asg::graph::object::rel::DynObjectRel: Store source data
This is generic over the source, just as the target, defaulting just the
same to `ObjectIndex`.

This allows us to use only the edge information provided rather than having
to perform another lookup on the graph and then assert that we found the
correct edge.  In this case, we're dealing with an `Ident->Expr` edge, of
which there is only one, but in other cases, there may be many such edges,
and it wouldn't be possible to know _which_ was referred to without also
keeping context of the previous edge in the walk.

So, in addition to avoiding more indirection and being more immune to logic
bugs, this also allows us to avoid states in `AsgTreeToXirf` for the purpose
of tracking previous edges in the current path.  And it means that the tree
walk can seed further traversals in conjunction with it, if that is so
needed for deriving sources.
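A sketch of a relationship that is generic over both ends, with both type parameters defaulting to an index type. `ObjectIndex`, `ObjectRelTy`, and the `map` method are simplified, hypothetical stand-ins; `map` plays the role of resolving indexes to objects (as `resolve_oi_pairs` is described as doing) while keeping the edge's own source information instead of looking it up again.

```rust
/// Stand-in for an untyped object index.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ObjectIndex(pub usize);

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ObjectRelTy {
    Ident,
    Expr,
}

/// An edge as observed during a walk.  Source and target are each
/// generic, and both default to `ObjectIndex`.
#[derive(Debug, PartialEq)]
pub struct DynObjectRel<S = ObjectIndex, T = ObjectIndex> {
    src: (ObjectRelTy, S),
    target: (ObjectRelTy, T),
}

impl<S, T> DynObjectRel<S, T> {
    pub fn new(src_ty: ObjectRelTy, src: S, target_ty: ObjectRelTy, target: T) -> Self {
        Self { src: (src_ty, src), target: (target_ty, target) }
    }

    /// Resolve both ends (e.g. index -> object) without another graph
    /// lookup to rediscover which edge was walked.
    pub fn map<S2, T2>(
        self,
        fs: impl FnOnce(S) -> S2,
        ft: impl FnOnce(T) -> T2,
    ) -> DynObjectRel<S2, T2> {
        DynObjectRel {
            src: (self.src.0, fs(self.src.1)),
            target: (self.target.0, ft(self.target.1)),
        }
    }

    pub fn source(&self) -> &S { &self.src.1 }
    pub fn target(&self) -> &T { &self.target.1 }
    pub fn target_ty(&self) -> ObjectRelTy { self.target.0 }
}
```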

More cleanup will be needed, but this does well to set us up for moving
forward; I was too uncomfortable with having to do the separate
lookup.  This is also a more intuitive API.

But it does have the awkward effect that now I don't need the pair---I just
need the `Object`---but I'm not going to remove it because I suspect I may
need it in the future.  We'll see.

The TODO references the fact that I'm using a convenient `resolve_oi_pairs`
instead of resolving only the target first and then the source only in the
code path that needs it.  I'll want to verify that Rust will properly
optimize to avoid the source resolution in branches that do not need it.

DEV-13708
2023-03-10 14:27:58 -05:00
Mike Gerwitz ee9128fbe0 tamer: asg::graph::{object::xir=>xmli}: Rename module
This better reflects what is being done and makes it easier for someone to
find.

DEV-13708
2023-03-10 14:27:58 -05:00
Mike Gerwitz 7f3ce44481 tamer: asg::graph: Formalize dynamic relationships (edges)
The `TreePreOrderDfs` iterator needed to expose additional edge context to
the caller (specifically, the `Span`).  This was getting a bit messy, so
this consolidates everything into a new `DynObjectRel`, which also
emphasizes that it is in need of narrowing.

Packing everything up like that also allows us to return more information to
the caller without complicating the API, since the caller does not need to
be concerned with all of those values individually.

Depth is kept separate, since that is a property of the traversal and is not
stored on the graph.  (Rather, it _is_ a property of the graph, but it's not
calculated until traversal.  But, depth will also vary for a given node
because of cross edges, and so we cannot store any concrete depth on the
graph for a given node.  Not even a canonical one, because once we start
doing inlining and common subexpression elimination, there will be shared
edges that are _not_ cross edges (the node is conceptually part of _both_
trees).  Okay, enough of this rambling parenthetical.)
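Why depth cannot live on the graph is easy to see in a small sketch of a tree pre-order walk. The adjacency-list representation, `EdgeKind`, `Rel`, and `tree_preorder` are assumptions for illustration. In the example, the same node is reached at depth 2 through tree edges and at depth 1 through a cross edge, so depth is returned alongside each edge rather than stored.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum EdgeKind {
    Tree,
    Cross,
}

/// An edge as yielded by the walk (depth is kept separate).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rel {
    pub from: usize,
    pub to: usize,
    pub kind: EdgeKind,
}

/// Tree pre-order walk over `adj[node] = [(child, kind)]`.
/// Cross edges are reported but never descended into.
pub fn tree_preorder(adj: &[Vec<(usize, EdgeKind)>], root: usize) -> Vec<(Rel, usize)> {
    let mut out = Vec::new();
    // Children are pushed in reverse so they are popped in order.
    let mut stack: Vec<(Rel, usize)> = adj[root]
        .iter()
        .rev()
        .map(|&(to, kind)| (Rel { from: root, to, kind }, 1))
        .collect();

    while let Some((rel, depth)) = stack.pop() {
        out.push((rel, depth));
        if rel.kind == EdgeKind::Tree {
            stack.extend(
                adj[rel.to]
                    .iter()
                    .rev()
                    .map(|&(to, kind)| (Rel { from: rel.to, to, kind }, depth + 1)),
            );
        }
    }
    out
}
```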

DEV-13708
2023-03-10 14:27:57 -05:00
Mike Gerwitz e6f736298b tamer: asg::graph::visit::tree_reconstruction: New graph traversal
This begins to introduce a graph traversal useful for a source
reconstruction from the current state of the ASG.  The idea is, after
having parsed and ingested the source through the lowering pipeline, to
re-output it to (a) prove that we have parsed correctly and (b) allow
progressively moving things from the XSLT-based compiler into TAMER.

There's quite a bit of documentation here; see that for more
information.  Generalizing this in an appropriate way took some time, but I
think this makes sense (that work began with the introduction of cross edges
in terms of the tree described by the graph's ontology).  But I do need to
come up with an illustration to include in the documentation.

DEV-13708
2023-03-10 14:27:57 -05:00
Mike Gerwitz 2d3b27ac01 tamer: asg: Root package definition
This causes a package definition to be rooted (so that it can be easily
accessed for a graph walk).  This keeps consistent with the new
`ObjectIndex`-based API by introducing a unit `Root` `ObjectKind` and the
boilerplate that goes with it.

This boilerplate, now glaringly obvious, will be refactored at some point,
since its repetition is onerous and distracting.

DEV-13159
2023-02-01 10:34:17 -05:00
Mike Gerwitz f753a23bad tamer: asg: Introduce edge from Package to Ident
Included in this diff are the corresponding changes to the graph to support
the change.  Adding the edge was easy, but we also need a way to get the
package for an identifier.  The easiest way to do that is to modify the edge
weight to include not just the target node type, but also the source.

DEV-13159
2023-02-01 10:34:17 -05:00
Mike Gerwitz 2f08985111 tamer: asg::graph::object::new_rel_dyn: Use Option
Rather than panicking at this level, let's panic at the caller, simplifying
impls and keeping them total.

This can't occur now, but an upcoming change introducing a package type will
allow for such a thing.
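The "total impl, panicking caller" split might look like the following sketch. The `PkgRel` narrowing and the set of permitted `Pkg` targets are a hypothetical ontology chosen for illustration, not TAMER's actual rules.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ObjectRelTy {
    Root,
    Pkg,
    Ident,
    Expr,
}

/// Hypothetical narrowed relationship for edges out of a `Pkg`.
#[derive(Debug, PartialEq)]
pub enum PkgRel {
    Ident(usize),
    Expr(usize),
}

impl PkgRel {
    /// Total: returns `None` instead of panicking for a target kind
    /// that this ontology does not permit from `Pkg`.
    pub fn new_rel_dyn(ty: ObjectRelTy, oi: usize) -> Option<Self> {
        match ty {
            ObjectRelTy::Ident => Some(PkgRel::Ident(oi)),
            ObjectRelTy::Expr => Some(PkgRel::Expr(oi)),
            ObjectRelTy::Root | ObjectRelTy::Pkg => None,
        }
    }
}

/// The caller knows the edge came from the graph, so it decides that a
/// `None` here can only mean a bug.
pub fn narrow_pkg_edge(ty: ObjectRelTy, oi: usize) -> PkgRel {
    PkgRel::new_rel_dyn(ty, oi).unwrap_or_else(|| panic!("invalid Pkg->{ty:?} edge"))
}
```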

DEV-13159
2023-02-01 10:34:16 -05:00
Mike Gerwitz e6abd996b7 tamer: asg::graph::Asg: Non-exhaustive Debug impl
This hides information that's taking up a lot of space in the parser traces
and is not useful information.  In particular, the `index` contains a lot of
empty space due to pre-interned symbols.

The index was going to be converted into a HashMap, but that was reverted
because the tradeoff did not make sense, and so this problem remains; see
the previous commit for more information.
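A hand-written, non-exhaustive `Debug` impl can be sketched with the standard library's `DebugStruct::finish_non_exhaustive`, which appends `..` to signal omitted fields. The `Asg` shape here (a node list plus an index) is a hypothetical simplification.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical, heavily simplified graph.
pub struct Asg {
    nodes: Vec<String>,
    // Never printed: a sparse index like this is what bloats trace output.
    #[allow(dead_code)]
    index: HashMap<u32, usize>,
}

impl Asg {
    pub fn new(nodes: Vec<String>) -> Self {
        Self { nodes, index: HashMap::new() }
    }
}

impl fmt::Debug for Asg {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print a summary instead of every field.
        f.debug_struct("Asg")
            .field("node_count", &self.nodes.len())
            .finish_non_exhaustive()
    }
}
```

For example, `format!("{:?}", Asg::new(vec!["a".into(), "b".into()]))` yields `Asg { node_count: 2, .. }`.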

DEV-13159
2023-02-01 10:34:16 -05:00
Mike Gerwitz d066bb370f Revert "tamer: asg::graph::index: Use FxHashMap in place of Vec"
This reverts commit 1b7eac337cd5909c01ede3a5b3fba577898d5961.

I don't actually think this ends up being worth it in the end.  Sure, the
implementation is simpler at a glance, but it is more complex at runtime,
adding more cycles for little benefit.

There are ~220 pre-interned symbols at the time of writing, so ~880 bytes (4
bytes per symbol) are potentially wasted if _none_ of the pre-interned
symbols end up serving as identifiers in the graph.  The reality is that
some of them _will_, but using HashMap also introduces overhead, so in
practice, the savings are much smaller.  On a fairly small package, it was <100
bytes memory saving in `tamec`.  For `tameld`, it actually uses _more_
memory, especially on larger packages, because there are 10s of thousands of
symbols involved.  And we're incurring a rehashing cost on resize, unlike
this original plain `Vec` implementation.
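The arithmetic above (about 220 unused slots at 4 bytes each, or roughly 880 bytes) corresponds to a sparse `Vec` indexed directly by symbol id. In this sketch, `Option<NonZeroU32>` is an assumed slot type, chosen because the niche optimization makes an empty slot cost the same 4 bytes as a `u32`; `VecIndex` and its methods are hypothetical.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Assumed node id type; `Option<NodeIdx>` is 4 bytes thanks to the niche.
pub type NodeIdx = NonZeroU32;

/// Vec-based index: slot `n` holds the node for symbol id `n`.
pub struct VecIndex {
    slots: Vec<Option<NodeIdx>>,
}

impl VecIndex {
    pub fn new() -> Self {
        VecIndex { slots: Vec::new() }
    }

    /// O(1) insert; grows the vector (with empty slots) as needed.
    pub fn insert(&mut self, sym: usize, node: NodeIdx) {
        if sym >= self.slots.len() {
            self.slots.resize(sym + 1, None);
        }
        self.slots[sym] = Some(node);
    }

    /// O(1) lookup with no hashing.
    pub fn get(&self, sym: usize) -> Option<NodeIdx> {
        self.slots.get(sym).copied().flatten()
    }

    /// Bytes spent on empty slots below `n` (e.g. unused pre-interned symbols).
    pub fn wasted_bytes_below(&self, n: usize) -> usize {
        self.slots.iter().take(n).filter(|s| s.is_none()).count()
            * size_of::<Option<NodeIdx>>()
    }
}
```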

So, I'm leaving this in the history to reference in the future or return to
it if others ask; maybe it'll be worth it in the future.
2023-02-01 10:34:16 -05:00
Mike Gerwitz 417df548cf tamer: asg::graph::index: Use FxHashMap in place of Vec
This was originally written before there were a bunch of preinterned
symbols.  Now the index vector is very sparse.

This simplifies things a bit.  If this ends up manifesting as a bottleneck
in the future, we can revisit the implementation.  While this does result in
more cycles, it's negligible relative to the total cycle count.
2023-02-01 10:34:16 -05:00
Mike Gerwitz 055ff4a9d9 tamer: Remove graphml target
This was originally created to populate Neo4J for querying, but it has not
been utilized.  It's become a maintenance burden as I try to change the API
of and encapsulate the graph, which is important for upholding its
invariants.

This feature, or one like it, will return in the future.  I have other
related plans; we'll see if they materialize.

The graph can't be encapsulated fully just yet because of the linker; those
commits will come in the following days.

DEV-13597
2023-01-26 14:45:17 -05:00
Mike Gerwitz 8735c2fca3 tamer: asg::graph: Static- and runtime-enforced multi-kind edge ontology
This allows for edges to be multiple types, and gives us two important
benefits:

  (a) Compiler-verified correctness to ensure that we don't generate graphs
      that do not adhere to the ontology; and
  (b) Runtime verification of types, so that bugs are still memory safe.
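
As an illustration of the two layers in (a) and (b), here is a minimal sketch, not the patch's actual code.  A hypothetical `RelTo` marker trait is implemented only for permitted (source, target) pairs, so forbidden edges fail to compile, while a simplified stand-in for `ObjectRelTy` stored on each edge lets lookups narrow targets safely at runtime.  The object kinds and permitted pairs are made up for the example.

```rust
use std::marker::PhantomData;

/// Simplified stand-in for a runtime object-kind tag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ObjectRelTy {
    Pkg,
    Ident,
    Expr,
}

/// Hypothetical object kinds (zero-sized markers for this sketch).
pub struct Pkg;
pub struct Ident;
pub struct Expr;

/// Links each static kind to its runtime tag.
pub trait ObjectKind {
    const TY: ObjectRelTy;
}
impl ObjectKind for Pkg {
    const TY: ObjectRelTy = ObjectRelTy::Pkg;
}
impl ObjectKind for Ident {
    const TY: ObjectRelTy = ObjectRelTy::Ident;
}
impl ObjectKind for Expr {
    const TY: ObjectRelTy = ObjectRelTy::Expr;
}

/// Compile-time ontology: `OA: RelTo<OB>` exists only for permitted edges.
pub trait RelTo<OB: ObjectKind>: ObjectKind {}
impl RelTo<Ident> for Pkg {}
impl RelTo<Expr> for Ident {}
impl RelTo<Expr> for Expr {}

/// Index whose phantom type parameter records the object kind.
pub struct ObjectIndex<O> {
    pub raw: usize,
    _kind: PhantomData<O>,
}

impl<O> ObjectIndex<O> {
    pub fn new(raw: usize) -> Self {
        Self { raw, _kind: PhantomData }
    }
}

/// Edges store the target's runtime tag alongside the raw indices.
#[derive(Default)]
pub struct Graph {
    edges: Vec<(usize, usize, ObjectRelTy)>,
}

impl Graph {
    /// Only compiles when `OA: RelTo<OB>`, i.e. the edge is in the ontology.
    pub fn add_edge<OA: RelTo<OB>, OB: ObjectKind>(
        &mut self,
        from: &ObjectIndex<OA>,
        to: &ObjectIndex<OB>,
    ) {
        self.edges.push((from.raw, to.raw, OB::TY));
    }

    /// Runtime check: yield only targets of `from` whose tag matches `OB`,
    /// so a wrong assumption about an edge's type is filtered out rather
    /// than misinterpreted.
    pub fn targets<OB: ObjectKind>(&self, from: usize) -> Vec<ObjectIndex<OB>> {
        self.edges
            .iter()
            .filter(|&&(f, _, ty)| f == from && ty == OB::TY)
            .map(|&(_, t, _)| ObjectIndex::new(t))
            .collect()
    }
}
```

A call such as `g.add_edge(&pkg, &expr)` is rejected at compile time in this sketch, because no `impl RelTo<Expr> for Pkg` exists.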

There is a lot more information in the documentation within the patch.

This took a lot of iterating to get something that was tolerable.  There's
quite a bit of boilerplate here, and maybe that'll be abstracted away better
in the future as the graph grows.

In particular, it was challenging to determine how I wanted to actually go
about narrowing and looking up edges.  Initially I had hoped to represent
the subsets as `ObjectKind`s as well so that you could use them anywhere
`ObjectKind` was expected, but that proved to be far too difficult because I
cannot return a reference to a subset of `Object` (the value would be owned
on generation).  And while in a language like C maybe I'd pad structures and
cast between them safely, since they _do_ overlap, I can't confidently do
that here since Rust's discriminant and layout are not under my control.

I tried playing around with `std::mem::Discriminant` as well, but
`discriminant` (the function) requires a _value_, meaning I couldn't get the
discriminant of a static `Object` variant without some dummy value; it wasn't
worth it over `ObjectRelTy`.  We further can't assign values to enum
variants unless they hold no data.  Rust a decade from now may be different,
and it will be interesting to look back on this struggle.
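
The obstacle with `std::mem::discriminant` shows up in a few lines: it takes a reference to an enum _value_, so naming the discriminant of a data-carrying variant requires building a throwaway instance, whereas a separate data-free tag enum can be named directly.  The `Object` enum below is a toy, and the tag enum is a simplified, hypothetical stand-in for `ObjectRelTy`.

```rust
use std::mem::{discriminant, Discriminant};

/// Hypothetical, much-simplified object enum whose variants carry data.
pub enum Object {
    Ident(String),
    Expr(u32),
}

/// Hypothetical data-free tag, standing in for the `ObjectRelTy` approach.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ObjectRelTy {
    Ident,
    Expr,
}

impl Object {
    /// Map a value to its tag; the tag itself needs no value to be named.
    pub fn rel_ty(&self) -> ObjectRelTy {
        match self {
            Object::Ident(_) => ObjectRelTy::Ident,
            Object::Expr(_) => ObjectRelTy::Expr,
        }
    }
}

/// `discriminant` needs a value, so naming the `Expr` variant's
/// discriminant requires constructing a dummy `Object::Expr`.
pub fn expr_discriminant() -> Discriminant<Object> {
    discriminant(&Object::Expr(0))
}
```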

DEV-13597
2023-01-26 14:45:14 -05:00
Mike Gerwitz 954b5a2795 Copyright year and name update
Ryan Specialty Group (RSG) rebranded to Ryan Specialty after its IPO.
2023-01-20 23:37:30 -05:00
Mike Gerwitz 1be0f2fe70 tamer: asg::object: Move into graph module
The ASG delegates certain operations to Objects so that they may enforce
their own invariants and ontology.  It is therefore important that only
objects have access to certain methods on `Asg`, otherwise those invariants
could be circumvented.

It should be noted that the nesting of this module is such that AIR should
_not_ have privileged access to the ASG---it too must utilize objects to
ensure those invariants are enforced in a single place.

DEV-13597
2023-01-20 23:37:30 -05:00
Mike Gerwitz c9746230ef tamer: asg::graph::test: Extract into own file
DEV-13597
2023-01-20 23:37:29 -05:00
Mike Gerwitz 4e3a81d7f5 tamer: asg: Bind transparent ident
This provides the initial implementation allowing an identifier to be
defined (bound to an object and made transparent).

I'm not yet entirely sure whether I'll stick with the "transparent" and
"opaque" terminology when there's also "declare" and "define", but a
`Missing` state is a type of declaration and so the distinction does still
seem to be important.

There is still work to be done on `ObjectIndex::<Ident>::bind_definition`,
which will follow.  I'm going to be balancing work to provide type-level
guarantees, since I don't have the time to go as far as I'd like.

DEV-13597
2023-01-20 23:37:29 -05:00
Mike Gerwitz 378fe3db66 tamer: asg::Asg::lookup: SymbolId=>SPair
This seems to have been an oversight from when I recently introduced SPairs
to ASG; I noticed it while working on another change and receiving back a
`DUMMY_SPAN`.

DEV-13597
2023-01-20 23:37:29 -05:00
Mike Gerwitz a9e65300fb tamer: diagnose::panic: Require thunk or static ref for diagnostic data
Some investigation into the disassembly of TAMER's binaries showed that Rust
was not able to conditionalize `expect`-like expressions as I was hoping due
to eager evaluation language semantics in combination with the use of
`format!`.

This solves the problem for the diagnostic system by creating types that
prevent this situation from occurring statically, without the need for a
lint.
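
The eager-evaluation problem is the familiar one with `Option::expect(&format!(...))`: the message is built before `expect` ever checks the value.  A minimal sketch of the thunk approach, using a hypothetical `ExpectLazy` extension trait (not the actual `diagnose::panic` API), defers `format!` until the failure path.  A `&'static str` message, the title's other option, needs no thunk since it costs nothing to pass.

```rust
/// Hypothetical extension trait: the panic message is a thunk, so any
/// `format!` (and the `Display` calls it makes) runs only on failure.
pub trait ExpectLazy<T> {
    fn expect_with<F: FnOnce() -> String>(self, msg: F) -> T;
}

impl<T> ExpectLazy<T> for Option<T> {
    fn expect_with<F: FnOnce() -> String>(self, msg: F) -> T {
        match self {
            Some(value) => value,
            // `msg()` is evaluated here and nowhere else.
            None => panic!("{}", msg()),
        }
    }
}

/// For contrast: the argument to `expect` is evaluated before the call,
/// so this formats a string even when `opt` is `Some`.
pub fn eager(opt: Option<u32>, name: &str) -> u32 {
    opt.expect(&format!("missing value for {name}"))
}
```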
2023-01-20 23:37:29 -05:00
Mike Gerwitz e6640c0019 tamer: Integrate clippy
This invokes clippy as part of `make check` now, which I had previously
avoided doing (I'll elaborate on that below).

This commit represents the changes needed to resolve all the warnings
presented by clippy.  Many changes have been made where I find the lints to
be useful and agreeable, but there are a number of lints, rationalized in
`src/lib.rs`, where I found the lints to be disagreeable.  I have provided
rationale, primarily for those wondering why I desire to deviate from the
default lints, though it does feel backward to rationalize why certain lints
ought to be applied (the reverse should be true).

With that said, this did catch some legitimate issues, and it was also
helpful in getting some older code up-to-date with new language additions
that perhaps I used in new code but hadn't gone back and updated old code
for.  My goal was to get clippy working without errors so that, in the
future, when others get into TAMER and are still getting used to Rust,
clippy is able to help guide them in the right direction.

One of the reasons I went without clippy for so long (though I admittedly
forgot I wasn't using it for a period of time) was because there were a
number of suggestions that I found disagreeable, and I didn't take the time
to go through them and determine what I wanted to follow.  Furthermore, it
was hard to make that judgment when I was new to the language and lacked
the necessary experience to do so.

One thing I would like to comment further on is the use of `format!` with
`expect`, which is also what the diagnostic system convenience methods
do (which clippy does not cover).  Because of all the work I've done trying
to understand Rust and looking at disassemblies and seeing what it
optimizes, I falsely assumed that Rust would convert such things into
conditionals in my otherwise-pure code...but apparently that's not the case,
when `format!` is involved.

I noticed that, after making the suggested fix with `get_ident`, Rust
proceeded to then inline it into each call site and then apply further
optimizations.  It was also previously invoking the thread lock (for the
interner) unconditionally and invoking the `Display` implementation.  That
is not at all what I intended, despite knowing the eager semantics of
function calls in Rust.

Anyway, possibly more to come on that, I'm just tired of typing and need to
move on.  I'll be returning to investigate further diagnostic messages soon.
2023-01-20 23:37:29 -05:00
Mike Gerwitz f1cf35f499 tamer: asg: Add expression edges
This introduces a number of abstractions, whose concepts are not fully
documented yet since I want to see how it evolves in practice first.

This introduces the concept of edge ontology (similar to a schema) using the
type system.  Even though we are not able to determine what the graph will
look like statically---since that's determined by data fed to us at
runtime---we _can_ ensure that the code _producing_ the graph from those
data will produce a graph that adheres to its ontology.

Because of the typed `ObjectIndex`, we're also able to implement operations
that are specific to the type of object that we're operating on.  Though,
since the type is not (yet?) stored on the edge itself, it is possible to
walk the graph without looking at node weights (the `ObjectContainer`) and
therefore never hit the panics that would flag invalid type assumptions,
which is bad, but I don't think that'll happen in practice, since we'll want
to be resolving nodes at some point.  But I'll address that more in the future.
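
To illustrate how a typed `ObjectIndex` permits operations specific to one kind of object, here is a minimal sketch using inherent `impl` blocks on concrete instantiations of a generic index.  The `Expr` and `Ident` markers, `Graph`, `push_subexpr`, and `bind_to` are hypothetical names for this example only.

```rust
use std::marker::PhantomData;

/// Hypothetical object-kind markers.
pub struct Expr;
pub struct Ident;

/// Hypothetical typed index: `O` records which kind of object it refers to.
pub struct ObjectIndex<O> {
    raw: usize,
    _kind: PhantomData<O>,
}

impl<O> ObjectIndex<O> {
    pub fn new(raw: usize) -> Self {
        Self { raw, _kind: PhantomData }
    }
}

/// Minimal edge list standing in for the graph.
#[derive(Debug, Default)]
pub struct Graph {
    pub edges: Vec<(usize, usize)>,
}

/// Operations available only on expression indices.
impl ObjectIndex<Expr> {
    /// Hypothetical: add `child` as a subexpression of `self`.
    pub fn push_subexpr(&self, g: &mut Graph, child: &ObjectIndex<Expr>) {
        g.edges.push((self.raw, child.raw));
    }
}

/// Operations available only on identifier indices.
impl ObjectIndex<Ident> {
    /// Hypothetical: bind this identifier to its defining expression.
    pub fn bind_to(&self, g: &mut Graph, def: &ObjectIndex<Expr>) {
        g.edges.push((self.raw, def.raw));
    }
}
```

Calling `push_subexpr` on an `ObjectIndex<Ident>` is a compile error in this sketch, since that method exists only on `ObjectIndex<Expr>`.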

Another thing to note is that walking edges is only done in tests right now,
and so there's no filtering or anything; once there are nodes (if there are
nodes) that allow for different outgoing edge types, we'll almost certainly
want filtering as well, rather than panicking.  We'll also want to be able to
query for any object type, but filter only to what's permitted by the
ontology.

DEV-13160
2023-01-20 23:37:29 -05:00
Mike Gerwitz 5e13c93a8f tamer: asg: New ObjectContainer for Node type
Working with the graph can be confusing with all of the layers
involved.  This begins to provide a better layer of abstraction that can
encapsulate the concept and enforce invariants.

Since I'm better able to enforce invariants now, this also removes the span
from the diagnostic message, since the invariant is now always enforced with
certainty.  I'm not removing the runtime panic, though; we can revisit that
if future profiling shows that it makes a negative impact.

DEV-13160
2023-01-20 23:37:29 -05:00
Mike Gerwitz 8786ee74fa tamer: asg::air: Expression building error cases
This addresses the two outstanding `todo!` match arms representing errors in
lowering expressions into the graph.  As noted in the comments, these errors
are unlikely to be hit when using TAME in the traditional way, since
e.g. XIR and NIR are going to catch the equivalent problems within their own
contexts (unbalanced tags and a valid expression grammar respectively).

_But_, the IR does need to stand on its own, and I further hope that some
tooling may be able to interact more directly with AIR in the future.

DEV-13160
2023-01-20 23:37:29 -05:00
Mike Gerwitz 40c941d348 tamer: asg::air::AirAggregate: Initial impl of nested exprs
This introduces a number of concepts together, again to demonstrate that
they were derived.

This introduces support for nested expressions, extending the previous
work.  It also supports error recovery for dangling expressions.

The parser states are a mess; there is a lot of duplicate code here that
needs refactoring, but I wanted to commit this first at a known-good state
so that the diff will demonstrate the need for the change that will
follow; the opportunities for abstraction are plainly visible.

The immutable stack introduced here could be generalized, if needed, in the
future.
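
A minimal sketch of a by-value ("immutable") stack of the sort described here, assuming a `Vec` backing store: each operation consumes the stack and returns a new one, so a parser state can carry it without `&mut` borrows.  Moving the stack copies only the `Vec`'s pointer, capacity, and length, never its elements.  `Stack` is a hypothetical name, not the type in the patch.

```rust
/// Hypothetical by-value stack: operations take `self` and return the
/// updated stack rather than mutating through a reference.
#[derive(Debug, Default, PartialEq)]
pub struct Stack<T>(Vec<T>);

impl<T> Stack<T> {
    /// Push `item`, yielding the new stack.
    pub fn push(self, item: T) -> Self {
        let Stack(mut items) = self;
        items.push(item);
        Stack(items)
    }

    /// Pop the top item (if any), yielding the remaining stack with it.
    pub fn pop(self) -> (Self, Option<T>) {
        let Stack(mut items) = self;
        let top = items.pop();
        (Stack(items), top)
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }
}
```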

Another important note is that Rust optimizes away the `memcpy`s for the
stack that was introduced here.  The initial Parser Context was introduced
because of `ArrayVec` inhibiting that elision, but Vec never had that
problem.  In the future, I may choose to go back and remove ArrayVec, but I
had wanted to keep memory allocation out of the picture as much as possible
to make the disassembly and call graph easier to reason about and to have
confidence that optimizations were being performed as intended.

With that said---it _should_ be eliding in tamec, since we're not doing
anything meaningful yet with the graph.  It does also elide in tameld, but
it's possible that Rust recognizes that those code paths are never taken
because tameld does nothing with expressions.  So I'll have to monitor this
as I progress and adjust accordingly; it's possible a future commit will
call BS on everything I just said.

Of course, the counter-point to that is that Rust is optimizing them away
anyway, but Vec _does_ still require allocation; I was hoping to keep such
allocation at the fringes.  But another counter-point is that it _still_ is
allocated at the fringe, when the context is initialized for the parser as
part of the lowering pipeline.  But I didn't know how that would all come
together back then.

...alright, enough rambling.

DEV-13160
2023-01-20 23:37:29 -05:00
Mike Gerwitz edbfc87a54 tamer: f::Functor: New trait
This commit is purposefully coupled with changes that utilize it to
demonstrate that the need for this abstraction has been _derived_, not
forced; TAMER doesn't aim to be functional for the sake of it, since
idiomatic Rust achieves many of its benefits without the formalisms.

But, the formalisms do occasionally help, and this is one such
example.  There is other existing code that can be refactored to take
advantage of this style as well.

I do _not_ wish to pull an existing functional dependency into TAMER; I want
to keep these abstractions light, and eliminate them as necessary, as Rust
continues to integrate new features into its core.  I also want to be able
to modify the abstractions to suit our particular needs.  (This is _not_ a
general recommendation; it's particular to TAMER and to my experience.)

This implementation of `Functor` is one such example.  While it is modeled
after Haskell in that it provides `fmap`, the primitive here is instead
`map`, with `fmap` derived from it, since `map` allows for better use of
Rust idioms.  Furthermore, it's polymorphic over _trait_ type parameters,
not method type parameters, allowing for separate trait impls for different
container types, which can in turn be inferred by Rust and allow for some
very concise
mapping; this is particularly important for TAMER because of the disciplined
use of newtypes.

For example, `foo.overwrite(span)` and `foo.overwrite(name)` are both
self-documenting, and better alternatives than, say, `foo.map_span(|_| span)`
and `foo.map_symbol(|_| name)`; the latter are perfectly clear in what they
do, but lack a layer of abstraction, and are verbose.  But the
clarity of the _new_ form does rely on either good naming conventions of
arguments, or explicit type annotations using turbofish notation if
necessary.
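
A minimal sketch of the trait-parameter design described above, under stated assumptions: `Span`, `Name`, and `Foo` are hypothetical newtypes, `map` is kept monomorphic (`T -> T`) for brevity, and the derived `fmap` is omitted.  Because `T` is a parameter of the trait rather than of the method, `Foo` can implement `Functor<Span>` and `Functor<Name>` separately, and `foo.overwrite(x)` selects the impl from the type of `x`.

```rust
/// Hypothetical newtypes, mirroring the disciplined use of newtypes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Span(pub u32);
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Name(pub &'static str);

/// Hypothetical pair holding both kinds of value.
#[derive(Debug, PartialEq)]
pub struct Foo {
    pub name: Name,
    pub span: Span,
}

/// `map` is the primitive; `overwrite` is derived from it.
pub trait Functor<T>: Sized {
    fn map<F: FnOnce(T) -> T>(self, f: F) -> Self;

    /// Replace the `T` component outright.
    fn overwrite(self, value: T) -> Self {
        <Self as Functor<T>>::map(self, move |_| value)
    }
}

impl Functor<Span> for Foo {
    fn map<F: FnOnce(Span) -> Span>(self, f: F) -> Self {
        Foo { span: f(self.span), ..self }
    }
}

impl Functor<Name> for Foo {
    fn map<F: FnOnce(Name) -> Name>(self, f: F) -> Self {
        Foo { name: f(self.name), ..self }
    }
}
```

When the argument alone does not pin down `T`, as with a bare closure passed to `map`, the impl can be named explicitly, e.g. `<Foo as Functor<Span>>::map(foo, |s| Span(s.0 + 1))`.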

This will be implemented on core Rust types as appropriate and as
possible.  At the time of writing, we do not yet have trait specialization,
and there are too many soundness issues for me to be comfortable enabling it,
so that limits what we can do with something like, say, a generic `Result`,
while also allowing for specialized implementations based on newtypes.

DEV-13160
2023-01-20 23:37:27 -05:00
Mike Gerwitz 0863536149 tamer: asg::Asg::get: Narrow object type
This uses `ObjectIndex` to automatically narrow the type to what is
expected.

Given that `ObjectIndex` is supposed to mean that there must be an object
with that index, perhaps the next step is to remove the `Option` from `get`
as well.

DEV-13160
2022-12-22 16:32:21 -05:00
Mike Gerwitz 6e90867212 tamer: asg::object::Object{Ref=>Index}: Associate object type
This makes the system a bit more ergonomic and introduces additional type
safety by associating the narrowed object type with the
`ObjectIndex` (previously `ObjectRef`).  Not only does this allow us to
explicitly state the type of object wherever those indices are stored, but
it also allows the API to automatically narrow to that type when operating
on it again without the caller having to worry about it.
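
As a minimal sketch of this narrowing: `Object`, `Ident`, `Expr`, `Narrow`, and `Asg` below are simplified, hypothetical stand-ins.  The index's phantom type parameter selects which variant `get` expects, so callers receive an `&Ident` or `&Expr` directly instead of matching on `Object` themselves.  This sketch keeps `Option` in `get`'s return type because its hypothetical `from_raw` escape hatch can still produce a mistyped index.

```rust
use std::marker::PhantomData;

/// Hypothetical object payloads.
#[derive(Debug, PartialEq)]
pub struct Ident(pub &'static str);
#[derive(Debug, PartialEq)]
pub struct Expr(pub u32);

/// Hypothetical sum type stored in each graph node.
pub enum Object {
    Ident(Ident),
    Expr(Expr),
}

impl From<Ident> for Object {
    fn from(x: Ident) -> Self {
        Object::Ident(x)
    }
}
impl From<Expr> for Object {
    fn from(x: Expr) -> Self {
        Object::Expr(x)
    }
}

/// Borrow the payload if `obj` is the variant for `Self`.
pub trait Narrow {
    fn narrow(obj: &Object) -> Option<&Self>;
}
impl Narrow for Ident {
    fn narrow(obj: &Object) -> Option<&Self> {
        match obj {
            Object::Ident(x) => Some(x),
            _ => None,
        }
    }
}
impl Narrow for Expr {
    fn narrow(obj: &Object) -> Option<&Self> {
        match obj {
            Object::Expr(x) => Some(x),
            _ => None,
        }
    }
}

/// Index carrying the object type it refers to.
pub struct ObjectIndex<O> {
    raw: usize,
    _kind: PhantomData<O>,
}

impl<O> ObjectIndex<O> {
    /// Hypothetical escape hatch (e.g. for deserialization); the type is
    /// not checked here, which is why `get` below still returns `Option`.
    pub fn from_raw(raw: usize) -> Self {
        Self { raw, _kind: PhantomData }
    }
}

#[derive(Default)]
pub struct Asg {
    nodes: Vec<Object>,
}

impl Asg {
    /// Store `obj` and return an index typed to its kind.
    pub fn add<O: Into<Object>>(&mut self, obj: O) -> ObjectIndex<O> {
        self.nodes.push(obj.into());
        ObjectIndex::from_raw(self.nodes.len() - 1)
    }

    /// Narrow to `O` automatically, as dictated by the index's type.
    pub fn get<O: Narrow>(&self, idx: &ObjectIndex<O>) -> Option<&O> {
        self.nodes.get(idx.raw).and_then(O::narrow)
    }
}
```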

DEV-13160
2022-12-22 15:18:08 -05:00