Commit Graph

79 Commits (1cf54887565cd46dccc97b98ee0b9696235e1a52)

Author SHA1 Message Date
Mike Gerwitz 34b64fd619 tamer: asg::air: AIR as a sum IR
This introduces a new macro `sum_ir!` to help with a long-standing problem
of not being able to easily narrow types in Rust without a whole lot of
boilerplate.  This patch includes a bit of documentation, so see that for
more information.

This was not a welcome change---I jumped down this rabbit hole trying to
decompose `AirAggregate` so that I can share portions of parsing with the
current parser and a template parser.  I can now proceed with that.

This is not the only implementation that I had tried.  I previously inverted
the approach, as I've been doing manually for some time: manually create
types to hold the sets of variants, and then create a sum type to hold those
types.  That works, but it resulted in a mess for systems that have to use
the IR, since now you have two enums to contend with.  I didn't find that to
be appropriate, because we shouldn't complicate the external API for
implementation details.

The enum for IRs is supposed to be like a bytecode---a list of operations
that can be performed with the IR.  They can be grouped if it makes sense
for a public API, but in my case, I only wanted subsets for the sake of
delegating responsibilities to smaller subsystems, while retaining the
context that `match` provides via its exhaustiveness checking but does not
expose as something concrete (which is deeply frustrating!).

Anyway, here we are; this'll be refined over time, hopefully, and
portions of it can be generalized for removing boilerplate from other IRs.

Another thing to note is that this syntax is really a compromise---I had to
move on, and I was spending too much time trying to get creative with
`macro_rules!`.  It isn't the best, and it doesn't seem very Rust-like in
some places and is therefore not necessarily all that intuitive.  This can
be refined further in the future.  But the end result, all things
considered, isn't too bad.

DEV-13708
2023-03-10 14:27:58 -05:00
Mike Gerwitz d42a46d2b8 tamer: NIR->xmli template definition setup
This sets the stage for template parsing, and finally decides how we're
going to represent templates on the ASG.  This is going to start simple,
since my original plans for improving how templates are
handled (conceptually) is going to have to wait.

This is the last difficult object type to figure out, with respect to graph
representation and derivation, so I wanted to get it out of the way.

DEV-13708
2023-03-10 14:27:58 -05:00
Mike Gerwitz 08278bc867 tamer: asg::air::Air::{ExprIdent=>BindIdent}: Rename
I wasn't initially sure whether I'd want separate tokens for different types
of identifying operations, but now that I see that it is clear from the
current state of the parser, there's no need.

This matches the name of the token in NIR.

DEV-13708
2023-03-10 14:27:58 -05:00
Mike Gerwitz 4afc8c22e6 tamer: asg::air: Merge Pkg closing span
The `Pkg` span will now properly reflect the entire definition of the
package including the opening and closing tags.

This was found while I was working on a graph traversal.

DEV-13597
2023-03-10 14:27:57 -05:00
Mike Gerwitz 39e98210be tamer: asg::graph::object::ident::ObjectIndex::<Ident>::bind_definition: Replace ident span
I noticed this while working on a graph traversal.  The unit test used the
same span for both the reference _and_ the binding, so I didn't notice. -_-

The problem with this, though, is that we do not have a separate span
representing the source location of the identifier reference.  The reason is
that we decided to re-use an existing node rather than creating another one,
which would add another inconvenient layer of indirection (and complexity).

So, I may have to add (optional?) spans to edges.

DEV-13708
2023-03-10 14:27:57 -05:00
Mike Gerwitz 2d3b27ac01 tamer: asg: Root package definition
This causes a package definition to be rooted (so that it can be easily
accessed for a graph walk).  This keeps consistent with the new
`ObjectIndex`-based API by introducing a unit `Root` `ObjectKind` and the
boilerplate that goes with it.

This boilerplate, now glaringly obvious, will be refactored at some point,
since its repetition is onerous and distracting.

DEV-13159
2023-02-01 10:34:17 -05:00
Mike Gerwitz f753a23bad tamer: asg: Introduce edge from Package to Ident
Included in this diff are the corresponding changes to the graph to support
the change.  Adding the edge was easy, but we also need a way to get the
package for an identifier.  The easiest way to do that is to modify the edge
weight to include not just the target node type, but also the source.

DEV-13159
2023-02-01 10:34:17 -05:00
Mike Gerwitz 39d093525c tamer: nir, asg: Introduce package to ASG
This does not yet create edges from identifiers to the package; just getting
this introduced was quite a bit of work, so I want to get this committed.

Note that this also includes a change to NIR so that `Close` contains the
entity so that we can pattern-match for AIR transformations rather than
retaining yet another stack with checks that are already going to be done by
AIR.  This makes NIR stand less on its own from a self-validation point, but
that's okay, given that it's the language that the user entered and,
conceptually, they could enter invalid NIR the same as they enter invalid
XML (e.g. from a REPL).

In _practice_, of course, NIR is lowered from XML and the schema is enforced
during that lowering and so the validation does exist as part of that
parsing.

These concessions speak more to the verbosity of the language (Rust) than
anything.

DEV-13159
2023-02-01 10:34:16 -05:00
Mike Gerwitz 39ebb74583 tamer: asg: Expression identifier references
This adds support for identifier references, adding `Ident` as a valid edge
type for `Expr`.

There is nothing in the system yet to enforce ontology through levels of
indirection; that will come later on.

I'm testing these changes with a very minimal NIR parse, which I'll commit
shortly.

DEV-13597
2023-01-26 14:45:17 -05:00
Mike Gerwitz ee30600f67 tamer: asg::air::Air: {*Expr=>Expr*}
Makes grouping and code completion easier when they're prefixed.

DEV-13597
2023-01-23 11:48:28 -05:00
Mike Gerwitz 954b5a2795 Copyright year and name update
Ryan Specialty Group (RSG) rebranded to Ryan Specialty after its IPO.
2023-01-20 23:37:30 -05:00
Mike Gerwitz 4e3a81d7f5 tamer: asg: Bind transparent ident
This provides the initial implementation allowing an identifier to be
defined (bound to an object and made transparent).

I'm not yet entirely sure whether I'll stick with the "transparent" and
"opaque" terminology when there's also "declare" and "define", but a
`Missing` state is a type of declaration and so the distinction does still
seem to be important.

There is still work to be done on `ObjectIndex::<Ident>::bind_definition`,
which will follow.  I'm going to be balancing work to provide type-level
guarantees, since I don't have the time to go as far as I'd like.

DEV-13597
2023-01-20 23:37:29 -05:00
Mike Gerwitz f1cf35f499 tamer: asg: Add expression edges
This introduces a number of abstractions, whose concepts are not fully
documented yet since I want to see how it evolves in practice first.

This introduces the concept of edge ontology (similar to a schema) using the
type system.  Even though we are not able to determine what the graph will
look like statically---since that's determined by data fed to us at
runtime---we _can_ ensure that the code _producing_ the graph from those
data will produce a graph that adheres to its ontology.

Because of the typed `ObjectIndex`, we're also able to implement operations
that are specific to the type of object that we're operating on.  Though,
since the type is not (yet?) stored on the edge itself, it is possible to
walk the graph without looking at node weights (the `ObjectContainer`) and
therefore avoid panics for invalid type assumptions, which is bad, but I
don't think that'll happen in practice, since we'll want to be resolving
nodes at some point.  But I'll addres that more in the future.

Another thing to note is that walking edges is only done in tests right now,
and so there's no filtering or anything; once there are nodes (if there are
nodes) that allow for different outgoing edge types, we'll almost certainly
want filtering as well, rather than panicing.  We'll also want to be able to
query for any object type, but filter only to what's permitted by the
ontology.

DEV-13160
2023-01-20 23:37:29 -05:00
Mike Gerwitz 8786ee74fa tamer: asg::air: Expression building error cases
This addresses the two outstanding `todo!` match arms representing errors in
lowering expressions into the graph.  As noted in the comments, these errors
are unlikely to be hit when using TAME in the traditional way, since
e.g. XIR and NIR are going to catch the equivalent problems within their own
contexts (unbalanced tags and a valid expression grammar respectively).

_But_, the IR does need to stand on its own, and I further hope that some
tooling maybe can interact more directly with AIR in the future.

DEV-13160
2023-01-20 23:37:29 -05:00
Mike Gerwitz dc3cd8bbc8 tamer: asg::air::AirAggregate: Reduce duplication
This refactors the previous commit a bit to remove the significant amount of
duplication, as planned.

DEV-7145
2023-01-20 23:37:29 -05:00
Mike Gerwitz 40c941d348 tamer: asg::air::AirAggregate: Initial impl of nested exprs
This introduces a number of concepts together, again to demonstrate that
they were derived.

This introduces support for nested expressions, extending the previous
work.  It also supports error recovery for dangling expressions.

The parser states are a mess; there is a lot of duplicate code here that
needs refactoring, but I wanted to commit this first at a known-good state
so that the diff will demonstrate the need for the change that will
follow; the opportunities for abstraction are plainly visible.

The immutable stack introduced here could be generalized, if needed, in the
future.

Another important note is that Rust optimizes away the `memcpy`s for the
stack that was introduced here.  The initial Parser Context was introduced
because of `ArrayVec` inhibiting that elision, but Vec never had that
problem.  In the future, I may choose to go back and remove ArrayVec, but I
had wanted to keep memory allocation out of the picture as much as possible
to make the disassembly and call graph easier to reason about and to have
confidence that optimizations were being performed as intended.

With that said---it _should_ be eliding in tamec, since we're not doing
anything meaningful yet with the graph.  It does also elide in tameld, but
it's possible that Rust recognizes that those code paths are never taken
because tameld does nothing with expressions.  So I'll have to monitor this
as I progress and adjust accordingly; it's possible a future commit will
call BS on everything I just said.

Of course, the counter-point to that is that Rust is optimizing them away
anyway, but Vec _does_ still require allocation; I was hoping to keep such
allocation at the fringes.  But another counter-point is that it _still_ is
allocated at the fringe, when the context is initialized for the parser as
part of the lowering pipeline.  But I didn't know how that would all come
together back then.

...alright, enough rambling.

DEV-13160
2023-01-20 23:37:29 -05:00
Mike Gerwitz 4b9b173e30 tamer: asg::air::Air::span: Provide spans
Not that they're loaded from object files yet, but this will at least work
once they are.

DEV-13160
2023-01-20 23:37:29 -05:00
Mike Gerwitz edbfc87a54 tamer: f::Functor: New trait
This commit is purposefully coupled with changes that utilize it to
demonstrate that the need for this abstraction has been _derived_, not
forced; TAMER doesn't aim to be functional for the sake of it, since
idiomatic Rust achieves many of its benefits without the formalisms.

But, the formalisms do occasionally help, and this is one such
example.  There is other existing code that can be refactored to take
advantage of this style as well.

I do _not_ wish to pull an existing functional dependency into TAMER; I want
to keep these abstractions light, and eliminate them as necessary, as Rust
continues to integrate new features into its core.  I also want to be able
to modify the abstractions to suit our particular needs.  (This is _not_ a
general recommendation; it's particular to TAMER and to my experience.)

This implementation of `Functor` is one such example.  While it is modeled
after Haskell in that it provides `fmap`, the primitive here is instead
`map`, with `fmap` derived from it, since `map` allows for better use of
Rust idioms.  Furthermore, it's polymorphic over _trait_ type parameters,
not method, allowing for separate trait impls for different container types,
which can in turn be inferred by Rust and allow for some very concise
mapping; this is particularly important for TAMER because of the disciplined
use of newtypes.

For example, `foo.overwrite(span)` and `foo.overwrite(name)` are both
self-documenting, and better alternatives than, say, `foo.map_span(|_|
span)` and `foo.map_symbol(|_| name)`; the latter are perfectly clear in
what they do, but lack a layer of abstraction, and are verbose.  But the
clarity of the _new_ form does rely on either good naming conventions of
arguments, or explicit type annotations using turbofish notation if
necessary.

This will be implemented on core Rust types as appropriate and as
possible.  At the time of writing, we do not yet have trait specialization,
and there's too many soundness issues for me to be comfortable enabling it,
so that limits that we can do with something like, say, a generic `Result`,
while also allowing for specialized implementations based on newtypes.

DEV-13160
2023-01-20 23:37:27 -05:00
Mike Gerwitz 6e90867212 tamer: asg::object::Object{Ref=>Index}: Associate object type
This makes the system a bit more ergonomic and introduces additional type
safety by associating the narrowed object type with the
`ObjectIndex` (previously `ObjectRef`).  Not only does this allow us to
explicitly state the type of object wherever those indices are stored, but
it also allows the API to automatically narrow to that type when operating
on it again without the caller having to worry about it.

DEV-13160
2022-12-22 15:18:08 -05:00
Mike Gerwitz 646633883f tamer: Initial concept for AIR/ASG Expr
This begins to place expressions on the graph---something that I've been
thinking about for a couple of years now, so it's interesting to finally be
doing it.

This is going to evolve; I want to get some things committed so that it's
clear how I'm moving forward.  The ASG makes things a bit awkward for a
number of reasons:

  1. I'm dealing with older code where I had a different model of doing
       things;
  2. It's mutable, rather than the mostly-functional lowering pipeline;
  3. We're dealing with an aggregate ever-evolving blob of data (the graph)
       rather than a stream of tokens; and
  4. We don't have as many type guarantees.

I've shown with the lowering pipeline that I'm able to take a mutable
reference and convert it into something that's both functional and
performant, where I remove it from its container (an `Option`), create a new
version of it, and place it back.  Rust is able to optimize away the memcpys
and such and just directly manipulate the underlying value, which is often a
register with all of the inlining.

_But_ this is a different scenario now.  The lowering pipeline has a narrow
context.  The graph has to keep hitting memory.  So we'll see how this
goes.  But it's most important to get this working and measure how it
performs; I'm not trying to prematurely optimize.  My attempts right now are
for the way that I wish to develop.

Speaking to #4 above, it also sucks that I'm not able to type the
relationships between nodes on the graph.  Rather, it's not that I _can't_,
but a project to created a typed graph library is beyond the scope of this
work and would take far too much time.  I'll leave that to a personal,
non-work project.  Instead, I'm going to have to narrow the type any time
the graph is accessed.  And while that sucks, I'm going to do my best to
encapsulate those details to make it as seamless as possible API-wise.  The
performance hit of performing the narrowing I'm hoping will be very small
relative to all the business logic going on (a single cache miss is bound to
be far more expensive than many narrowings which are just integer
comparisons and branching)...but we'll see.  Introducing branching sucks,
but branch prediction is pretty damn good in modern CPUs.

DEV-13160
2022-12-22 14:33:28 -05:00
Mike Gerwitz 0b2e563cdb tamer: asg: Associate spans with identifiers and introduce diagnostics
This ASG implementation is a refactored form of original code from the
proof-of-concept linker, which was well before the span and diagnostic
implementations, and well before I knew for certain how I was going to solve
that problem.

This was quite the pain in the ass, but introduces spans to the AIR tokens
and graph so that we always have useful diagnostic information.  With that
said, there are some important things to note:

  1. Linker spans will originate from the `xmlo` files until we persist
     spans to those object files during `tamec`'s compilation.  But it's
     better than nothing.
  2. Some additional refactoring is still needed for consistency, e.g. use
     of `SPair`.
  3. This is just a preliminary introduction.  More refactoring will come as
     tamec is continued.

DEV-13041
2022-12-16 14:44:38 -05:00
Mike Gerwitz 56d1ecf0a3 tamer: Air{Token=>}
Consistency with `Nir` et al.

DEV-13430
2022-12-13 14:36:38 -05:00
Mike Gerwitz be41d056bb tamer: nir::air: Lower to Air::TODO
This actually passes data to the next parser, whereas before we were
stopping short.

DEV-13160
2022-12-13 14:28:16 -05:00
Mike Gerwitz d55b3add77 tamer: asg::air::test: Extract into own file
Just minor preparatory work.

DEV-13160
2022-12-13 13:57:04 -05:00
Mike Gerwitz 2087672c47 tamer: parse::parser::finalize: Introduce FinalizedParser
This newtype allows a caller to prove (using types) that a parser of a given
type (`ParseState`) has been finalized.

This will be used by the lowering pipeline to ensure that all parsers in the
pipeline end up getting finalized (as you can see from a TODO added in the
code, one of them is missing).  The lack of such a type was an oversight
during the (rather stressed) development of the parsing system, and I
shouldn't need to resort to unit tests to verify that parsers have been
finalized.

DEV-13158
2022-10-26 12:44:19 -04:00
Mike Gerwitz ed8a2ce28a tamer: xir::parse::ele: Superstate not to accept early EOF
This was accepting an early EOF when the active child `ParseState` was in an
accepting state, because it was not ensuring that anything on the stack was
also accepting.

Ideally, there should be nothing on the stack, and hopefully in the future
that's what happens.  But with how things are today, it's important that, if
anything is on the stack, it is accepting.

Since `is_accepting` on the superstate is only called during finalization,
and because the check terminates early, and because the stack practically
speaking will only have a couple things on it max (unless we're in tail
position in a deeply nested tree, without TCO [yet]), this shouldn't be an
expensive check.

Implementing this did require that we expose `Context` to `is_accepting`,
which I had hoped to avoid having to do, but here we are.

DEV-7145
2022-08-12 00:47:15 -04:00
Mike Gerwitz e73c223a55 tamer: parser::Parser: cfg(test) tracing
This produces useful parse traces that are output as part of a failing test
case.  The parser generator macros can be a bit confusing to deal with when
things go wrong, so this helps to clarify matters.

This is _not_ intended to be machine-readable, but it does show that it
would be possible to generate machine-readable output to visualize the
entire lowering pipeline.  Perhaps something for the future.

I left these inline in Parser::feed_tok because they help to elucidate what
is going on, just by reading what the trace would output---that is, it helps
to make the method more self-documenting, albeit a tad bit more
verbose.  But with that said, it should probably be extracted at some point;
I don't want this to set a precedent where composition is feasible.

Here's an example from test cases:

  [Parser::feed_tok] (input IR: XIRF)
  |  ==> Parser before tok is parsing attributes for `package`.
  |   |  Attrs_(SutAttrsState_ { ___ctx: (QName(None, LocalPart(NCName(SymbolId(46 "package")))), OpenSpan(Span { len: 0, offset: 0, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10)), ___done: false })
  |
  |  ==> XIRF tok: `<unexpected>`
  |   |  Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))
  |
  |  ==> Parser after tok is expecting opening tag `<classify>`.
  |   |  ChildA(Expecting_)
  |   |  Lookahead: Some(Lookahead(Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))))
  = note: this trace was output as a debugging aid because `cfg(test)`.

  [Parser::feed_tok] (input IR: XIRF)
  |  ==> Parser before tok is expecting opening tag `<classify>`.
  |   |  ChildA(Expecting_)
  |
  |  ==> XIRF tok: `<unexpected>`
  |   |  Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))
  |
  |  ==> Parser after tok is attempting to recover by ignoring element with unexpected name `unexpected` (expected `classify`).
  |   |  ChildA(RecoverEleIgnore_(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)))
  |   |  Lookahead: None
  = note: this trace was output as a debugging aid because `cfg(test)`.

DEV-7145
2022-07-19 14:44:18 -04:00
Mike Gerwitz 91b55999e2 tamer: asg::air::{AirState=>AirAggregate}: Rename
Like the previous commit, this emphasizes what is happening.

DEV-7145
2022-06-02 13:26:46 -04:00
Mike Gerwitz b084e23497 tamer: Refactor asg_builder into obj::xmlo::lower and asg::air
This finally uses `parse` all the way up to aggregation into the ASG, as can
be seen by the mess in `poc`.  This will be further simplified---I just need
to get this committed so that I can mentally get it off my plate.  I've been
separating this commit into smaller commits, but there's a point where it's
just not worth the effort anymore.  I don't like making large changes such
as this one.

There is still work to do here.  First, it's worth re-mentioning that
`poc` means "proof-of-concept", and represents things that still need a
proper home/abstraction.

Secondly, `poc` is retrieving the context of two parsers---`LowerContext`
and `Asg`.  The latter is desirable, since it's the final aggregation point,
but the former needs to be eliminated; in particular, packages need to be
worked into the ASG so that `found` can be removed.

Recursively loading `xmlo` files still happens in `poc`, but the compiler
will need this as well.  Once packages are on the ASG, along with their
state, that responsibility can be generalized as well.

That will then simplify lowering even further, to the point where hopefully
everything has the same shape (once final aggregation has an abstraction),
after which we can then create a final abstraction to concisely stitch
everything together.  Right now, Rust isn't able to infer `S` for
`Lower<S, LS>`, which is unfortunate, but we'll be able to help it along
with a more explicit abstraction.

DEV-11864
2022-05-27 13:51:29 -04:00