These delegation methods have been a pain in my ass for quite some time, and
their lack of generalization makes the introduction of new delegation
methods (in the general sense, not necessarily trait methods) very tedious
and prone to inconsistencies.
I'm going to progressively refactor them in separate commits so it's clear
what I'm doing, primarily for future me to reference if need be.
DEV-13156
This begins to introduce more primitive operations to `TransitionResult` and
its components so that I can actually work with them without having to write
a bunch of concrete, boilerplate implementations. This is demonstrated in
part by `EchoState` (which is nearly all boilerplate, but whose correctness
should be verifiable at a glance), which will be used going forward as a
basis for default implementations for parsers (e.g. expansion delegation).
DEV-13156
This has evolved into a more robust and independent concept, but it is still
a utility in the sense that it's utilizing existing parsing framework
features and making them more convenient.
DEV-13156
These traits serve to abstract away some of the type-level details and
clearly state what the end result is (something stitchable with a parent).
I'm admittedly battling myself on this concept a bit. The proper layer of
abstraction is the concept of expansion, which is an abstraction that is
likely to be maintained all the way through, but we strip the abstraction
for the sake of delegation. Maybe the better option is to provide a
different method of delegation and avoid the stripping altogether, and avoid the
awkward interaction with the dead state.
The awkwardness comes from the fact that delegating right now is so rigid
and defined in terms of a method on state rather than a mapping between
`TransitionResult`s. But I really need to move on... ;_;
The original design was trying to generalize this such that composition at
the attribute parser level (for NIR) would be able to just accept any
stitchable parser with the convention that the dead state is the replacement
token. But that is the wrong layer of abstraction, which not only makes it
confusing, but is asking for trouble when someone inevitably violates that
contract.
With all of that said, `StitchableExpansionState` _is_ a delegation. It
could just as easily be a function (`is_accepting` always delegates too), so
perhaps that should just be generalized as reifying delegation as a
`ParseState`.
DEV-13156
This parser really just allows me to continue developing the NIR
interpolation system using `Expansion` terminology, and avoid having to use
dead states in tests. This allows for the appropriate level of abstraction
to be used in isolation, and then only be stripped when stitching is
necessary.
Future commits will show how this is actually integrated and may introduce
additional abstraction to help.
DEV-13156
This is a shift in approach.
My original idea was to try to keep NIR parsing the way it was, since it's
already hard enough to reason about with the `ele_parse!` parser-generator
macro mess. The idea was to produce an IR that would explicitly be denoted
as "maybe sugared", and have a desugaring operation as part of the lowering
pipeline that would perform interpolation and lower the symbol into a plain
version.
The problems with that are:
1. The use of the type was going to introduce a lot of mapping for all the
NIR token variants there are going to be; and
2. _The types weren't even utilized for interpolation._
Instead, if we interpolated _as attributes are encountered_ while parsing
NIR, then we'd be able to expand directly into that NIR token stream and
handle _all_ symbols in a generic way, without any mapping beyond the
definition of NIR's grammar using `ele_parse!`.
This is a step in that direction---it removes `NirSymbolTy` and introduces a
generic abstraction for the concept of expansion, which will be utilized
soon by the attribute parser to allow replacing `TryFrom` with something
akin to `ParseFrom`, or something like that, which is able to produce a
token stream before finally yielding the value of the attribute (which will
be either the original symbol or the replacement metavariable, in the case
of interpolation).
(Note that interpolation isn't yet finished---errors still need to be
implemented. But I want a working vertical slice first.)
DEV-13156
This was a substantial change. Design and rationale are documented on
`AttrFieldSum` and related as part of this change, so please review the diff
for more information there.
If you're a Ryan employee, DEV-13209 gives plenty of profiling information,
including raw data and visualizations from kcachegrind. For everyone else:
you're able to easily produce your own from this commit and the previous by
comparing the `__memcpy_avx_unaligned_erms` calls. The reduction is
significant in this commit (~90%), and the number of Parsers invoking it has
been reduced. Rust has been able to optimize more aggressively, and
compound some of those optimizations, with the smaller `NirParseState`
width.
It is also worth noting that `malloc` calls do not change at all between
these two changes, so when we refer to memory, we're referring to
pre-allocated memory on the stack, as TAMER was designed to utilize.
DEV-13209
This is a diagnostic replacement for `unreachable!`.
Eventually TAMER'll have build-time checks to enforce the use of these over
alternatives; I need to survey the old instances on a case-by-case basis to
see what diagnostic information can be reasonably presented in that context.
DEV-13209
The spans were previously not being calculated relative to the offset of the
original symbol span. Tests were passing because all of those spans began
at offset 0.
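A minimal sketch of the corrected calculation follows. The `Span` type and
its `slice` method are hypothetical stand-ins for illustration and are not
TAMER's actual API; the point is only that the derived offset must be added
to the offset of the original symbol span.

```rust
/// Hypothetical span: a byte offset into a source file and a length.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub offset: usize,
    pub len: usize,
}

impl Span {
    /// Derive a span for a portion of the symbol covered by `self`.
    ///
    /// `rel_offset` is relative to the start of the symbol, so the
    /// resulting offset must be relative to `self.offset`, not to 0.
    pub fn slice(self, rel_offset: usize, len: usize) -> Option<Span> {
        // Reject slices that would extend beyond the original span.
        if rel_offset.checked_add(len)? > self.len {
            return None;
        }

        Some(Span {
            offset: self.offset + rel_offset,
            len,
        })
    }
}
```

A test using a symbol span at offset 0 cannot distinguish `rel_offset` from
`self.offset + rel_offset`, which is why a nonzero starting offset is needed
to catch this class of bug.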
DEV-13156
This demonstrates how desugaring of interpolated strings will work, testing
one of the happy paths. The remaining work to be done is largely
refactoring; handling some other cases; and errors. Each of those items is
marked with a `todo!`.
I'm pleased with how this is turning out, and I'm excited to see diagnostic
reporting within the specification string using the derived spans once I get
a bit further along; this robust system is going to be much more helpful to
developers than the existing system in XSLT.
This also eliminates the ~50% performance degradation mentioned in a recent
commit by eliminating the SugaredNirSymbol enum and replacing it with a
newtype; this is a much better approach, though it doesn't change that I do
need to eventually address the excessive `memcpy`s on hot code paths.
DEV-13156
Not sure why I didn't add a prelude sooner, considering all the import
boilerplate. This will evolve as needed and I'll go back and replace other
imports when I'm not in the middle of something.
DEV-13156
Add initial descriptions and consolidate some of the types. There'll be
more to come; this is just to get `Display` derives working for types
that'll be using it. I'd like to see where this description manifests
itself before I decide how user-friendly I'd like it to be.
DEV-13156
This mirror is only a `Todo` variant at the moment, but my hope had been to
try to creatively nest or use generics to simplify the conversion between
the two flavors without a lot of boilerplate. But it doesn't seem like I'm
going to be successful, and may have to resort to macros to remove
boilerplate.
But I need to stop fighting with myself and move on. Though I would still
like to keep the types purely compile-time via const generics if possible,
since they're not needed in memory (or disk) until we get to templates;
they're otherwise static relative to a NIR token variant.
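To illustrate what I mean by keeping the types purely compile-time, here is
a rough sketch. Every name in it (`SymbolId` as a plain `u32` wrapper,
`TypedSymbol`, the `TY_*` constants) is an assumption made for the example
and not what TAMER defines; tags are plain `u8`s only because that works as
a const generic argument on stable Rust.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for an interned symbol identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SymbolId(pub u32);

/// Hypothetical type tags usable as const generic arguments.
pub const TY_STRING: u8 = 0;
pub const TY_NUM_LITERAL: u8 = 1;

/// A symbol whose type tag exists only in the type system.
///
/// `repr(transparent)` guarantees the same layout as `SymbolId`,
/// so the tag occupies no memory.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TypedSymbol<const TY: u8>(pub SymbolId);

impl<const TY: u8> TypedSymbol<TY> {
    /// Recover the tag from the type rather than from stored data.
    pub const fn ty(&self) -> u8 {
        TY
    }
}

/// Whether the tagged symbol is exactly as wide as the bare symbol.
pub fn tag_is_free() -> bool {
    size_of::<TypedSymbol<TY_STRING>>() == size_of::<SymbolId>()
}
```

The tag would only need to be reified into a runtime value at the point
where it has to be stored, which for this system is templates.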
DEV-13209
This simply detects whether a value will need to be further parsed for
interpolation; it does not yet perform the parsing itself, which will happen
during desugaring.
This introduces a performance regression, for an interesting reason. I
found that introducing a single new variant to `SugaredNir` (with a
`(SymbolId, Span)` pair), was causing the width of the `NirParseState` type
to increase just enough to cause Rust to be unable to optimize away a
significant number of memcpys related to `Parser` moves, and consequently
reducing performance by nearly 50% for `tamec`. Yikes.
I suspected this would be a problem, and indeed have tried in all other
cases to avoid aggregation until the ASG---the problem is that I had wanted
to aggregate attributes for NIR so that the IR could actually make some
progress toward simplifying the stream (and therefore working with the
data), and be able to validate against a grammar defined in a single
place. The problem is that the `NirParseState` type contains a sum type for
every attribute parser, and is therefore as wide as the largest one. That
is what Rust is having trouble optimizing memcpy away for.
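The width problem can be seen in isolation with a small sketch. The types
below are hypothetical and far simpler than the generated parser states;
they only show that a sum type is at least as wide as its widest variant,
regardless of which variant is active.

```rust
use std::mem::size_of;

/// Hypothetical narrow attribute parser state.
pub struct SmallAttrState(pub u32);

/// Hypothetical wide attribute parser state (128 bytes of fields).
pub struct LargeAttrState(pub [u64; 16]);

/// Sum type over the states, loosely analogous to a superstate that
/// holds one of many child parser states.
pub enum SumState {
    Small(SmallAttrState),
    Large(LargeAttrState),
}

/// Number of bytes occupied by any `SumState` value, which is also
/// the number of bytes a non-elided move of it has to copy.
pub fn sum_width() -> usize {
    size_of::<SumState>()
}
```

Even when the value is `SumState::Small`, a move that is not optimized away
copies the full width, so a single wide variant penalizes every state.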
Indeed, reducing the number of attributes improves the situation
drastically. However, it doesn't make it go away entirely.
If you look at a callgrind profile for `tameld` (or a disassembly), you'll
notice that I put quite a bit of effort into ensuring that the hot code path
for the lowering pipeline contains _no_ memcpys for the parsers. But that
is not the case with `tamec`---I had to move on. But I do still have the
same escape hatch that I introduced for `tameld`, which is the mutable
`Context`.
It seems that may be the solution there too, but I want to get a bit further
along first to see how these data end up propagating before I go through
that somewhat significant effort.
DEV-13156
Various parts of the system have to be converted to use `diagnostic_panic!`,
which makes it very clear that this is a bug in TAMER that should be
reported. I just happened to see this one near code I was about to touch.
DEV-13156
This introduces the concept of sugared NIR and provides the boilerplate for
a desugaring pass. The earlier commits dealing with cleaning up the
lowering pipeline were to support this work, in particular to ensure that
reporting and recovery properly applied to this lowering operation without
adding a ton more boilerplate.
DEV-13158
I'm struggling to go much further yet without sorting out some other things
first with regards to mutable `Context` and, in particular, the ASG.
I'm going to pause on refactoring the lowering pipeline---it's been improved
significantly with the recent work---and I will continue in the next few
weeks.
DEV-13158
Lowering errors in tamec end up utilizing recovery and reporting, so there
is a distinction between recoverable and unrecoverable errors.
tameld aborts on the first error, since recovery is not currently
supported (we'll want to add it, since tameld should output e.g. lists of
unresolved externs).
Note that tamec does not yet handle `FinalizeError` like tameld because it
uses `Lower::lower`, which does not yet finalize (though it does in practice
when it reaches the end of the stream and auto-finalizes, but that is
widened into a `ParseError`).
DEV-13158
This helps to clarify the situations under which these errors can occur, and
the generality also helps to show why the inner types are as they
are (e.g. use of `String`).
But more importantly, this allows for an error type in `finalize` that is
detached from the `ParseState`, which will be able to be utilized in the
lowering pipeline as a more general error distinguishable from other
lowering errors. At the moment I'm maintaining BC, but a following commit
will demonstrate the use case to introduce recoverable vs. non-recoverable
errors.
DEV-13158
This newtype allows a caller to prove (using types) that a parser of a given
type (`ParseState`) has been finalized.
This will be used by the lowering pipeline to ensure that all parsers in the
pipeline end up getting finalized (as you can see from a TODO added in the
code, one of them is missing). The lack of such a type was an oversight
during the (rather stressed) development of the parsing system, and I
shouldn't need to resort to unit tests to verify that parsers have been
finalized.
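The general shape of the idea is sketched below. The names `Parser`,
`Finalized`, `FinalizeError`, and `Done` are simplified assumptions for the
example, and the `ParseState` trait is pared down to a single method; this
is not the definition added by this commit.

```rust
pub mod finalize {
    /// Pared-down, hypothetical parser state.
    pub trait ParseState: Sized {
        fn is_accepting(&self) -> bool;
    }

    /// Hypothetical error for finalizing in a non-accepting state.
    #[derive(Debug, PartialEq)]
    pub struct FinalizeError;

    pub struct Parser<S: ParseState> {
        state: S,
    }

    /// Proof that a parser of state `S` was finalized.
    ///
    /// The field is private to this module, so outside of it the only
    /// way to obtain a value is through `Parser::finalize`.
    pub struct Finalized<S: ParseState>(S);

    impl<S: ParseState> Parser<S> {
        pub fn new(state: S) -> Self {
            Self { state }
        }

        /// Consume the parser, yielding proof only if it is accepting.
        pub fn finalize(self) -> Result<Finalized<S>, FinalizeError> {
            if self.state.is_accepting() {
                Ok(Finalized(self.state))
            } else {
                Err(FinalizeError)
            }
        }
    }

    /// Example state whose acceptance is decided by a flag.
    pub struct Done(pub bool);

    impl ParseState for Done {
        fn is_accepting(&self) -> bool {
            self.0
        }
    }
}
```

A pipeline function can then demand a `Finalized<S>` for each of its
parsers in its signature, so forgetting to finalize one becomes a compile
error rather than something a unit test has to catch.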
DEV-13158
This reverts commit 85ec626fcd804eb2fac3fd6f0339182554f72cfd.
This revert had to be modified to work alongside other changes. Interior
mutability is fortunately no longer needed after the previous commit which
allows reporting to occur in a single place in the lowering pipeline (at the
terminal parser).
DEV-13158
The term "terminal parser" isn't formalized yet in the system, but is meant
to refer to the innermost parser that is responsible for pulling tokens
through the lowering pipeline.
This approach is more of what one would expect when dealing with
`Result`-like monads---we are effectively chaining the inner operation while
propagating errors to short-circuit lowering and let the caller decide
whether recovery ought to be permitted with diagnostic messages. This will
become more clear as it is further refactored.
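Roughly, the shape is as follows. This is a toy sketch with made-up
lowering operations (`lower_first`, `lower_second`) and a made-up
`LowerError`, not the pipeline itself: errors from any stage propagate to
the terminal loop, which is the one place where the caller's policy is
consulted.

```rust
/// Hypothetical error sum for the two toy lowering operations.
#[derive(Debug, PartialEq)]
pub enum LowerError {
    First(String),
    Second(String),
}

/// First toy lowering operation: token to signed integer.
pub fn lower_first(tok: &str) -> Result<i64, LowerError> {
    tok.parse::<i64>()
        .map_err(|_| LowerError::First(format!("not an integer: {}", tok)))
}

/// Second toy lowering operation: reject negative values.
pub fn lower_second(n: i64) -> Result<u64, LowerError> {
    if n < 0 {
        Err(LowerError::Second(format!("negative: {}", n)))
    } else {
        Ok(n as u64)
    }
}

/// Terminal: pulls tokens through the chained operations.
///
/// `on_err` returns `Ok(())` to recover and continue, or `Err` to
/// abort lowering with that error.
pub fn terminal<'a>(
    toks: impl Iterator<Item = &'a str>,
    mut on_err: impl FnMut(LowerError) -> Result<(), LowerError>,
) -> Result<Vec<u64>, LowerError> {
    let mut out = Vec::new();

    for tok in toks {
        match lower_first(tok).and_then(lower_second) {
            Ok(v) => out.push(v),
            Err(e) => on_err(e)?,
        }
    }

    Ok(out)
}
```

A caller that wants recovery records the error and returns `Ok(())`; one
that wants to abort on the first error returns `Err(e)` from the callback.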
This also means that the previous changes for introducing interior
mutability for a shared mutable `Reporter` can be reverted, which is great,
since that approach was antithetical to how the streaming pipeline
operates (and introduces awkward mutable state into an
otherwise-mostly-immutable system).
DEV-13158
This extracts error tracking into the Reporter itself, which is already
shared between lowering operations. This can then be used to display the
number of errors.
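As a rough sketch of the idea (the trait methods and `VecReporter` here are
assumptions for illustration, not the actual `Reporter` interface), the
count lives alongside the reporting itself:

```rust
use std::fmt::Display;

/// Hypothetical reporter interface that tracks what it has reported.
pub trait Reporter {
    fn report(&mut self, err: &dyn Display);
    fn error_count(&self) -> usize;

    fn has_errors(&self) -> bool {
        self.error_count() > 0
    }
}

/// Example reporter that buffers rendered lines instead of writing
/// them to stderr.
pub struct VecReporter {
    lines: Vec<String>,
    err_n: usize,
}

impl VecReporter {
    pub fn new() -> Self {
        Self { lines: Vec::new(), err_n: 0 }
    }

    pub fn lines(&self) -> &[String] {
        &self.lines
    }

    /// Summary line using the "error(s)" wording for now.
    pub fn summary(&self) -> String {
        format!("{} error(s)", self.err_n)
    }
}

impl Reporter for VecReporter {
    fn report(&mut self, err: &dyn Display) {
        self.lines.push(format!("error: {}", err));
        self.err_n += 1;
    }

    fn error_count(&self) -> usize {
        self.err_n
    }
}
```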
A new formatter (in tamer::fmt) will be added to handle the singular/plural
conversion in place of "error(s)" in the future; I have more important
things to work on right now.
DEV-13158
Previously these errors would immediately abort.
This results in some duplicate code, but it's beginning to derive a common
implementation. Check out the commits that follow; this is really an
intermediate refactoring state.
DEV-13158
Another baby step. The small commits are intended to allow comprehension of
what changes when looking at the diffs.
This also removes a comment stating that errors do not fail compilation,
since they most certainly do.
DEV-13158
This begins refactoring the lowering pipeline to begin to obviate
abstraction boundaries. The lowering pipeline is the backbone of the
system, and so it needs to become clear and self-documenting, which will
take a little bit of work.
DEV-13158
This always annoys me when I add a dependency and I don't know where I ought
to put it.
Anyway, I was originally going to add the `regex` crate, but with further
planning, I may not end up having use for it. Nonetheless, at least this is
consistent.
Just preparing to actually define NIR itself. The _grammar_ has been
represented (derived from our internal systems, using them as a test case),
but the IR itself has not yet received a definition.
DEV-7145
This is a quick-and-dirty change. The lowering pipeline needs a proper
abstraction, but I'm about to be on vacation at the end of the week and
would like to get NIR->AIR lowering started before I consider that
abstraction further, so this will do for now.
NIR parsing has been tested in production without failing for over a week.
DEV-7145
This was originally the "normalized" IR, but that's not possible to do
without template expansion, which is going to happen at a later point. So,
this is just "NIR", pronounced "near", which is an IR that is "near" to the
source code. You can define it as "Near IR" if you want, but it's just a
homonym with a not-quite-defined acronym to me.
DEV-7145
A type alias was added for BC before errors were hoisted out in a previous
commit, but it is unnecessary because of the associated type on
`ParseState`.
This also corrects the long-existing issue of using generated identifiers in
tests.
DEV-7145
This moves `paste::paste!` up a line and reduces a level of indentation,
since it's so squished. Aside from docblock reformatting, there are no
other changes.
DEV-7145
This slims out the macro even further. It does result in an
awkwardly-placed `PhantomData` because I don't want to add another variant
that isn't actually used (since they represent states).
DEV-7145
This is in preparation for hoisting out the common states, as was done with
the Sum NT in a previous commit.
I also think that organizing states in this way is more clear. The previous
embedding of the variants named after the NTs themselves was because the
parser was storing the child state within it, before the introduction of the
superstate trampoline.
DEV-7145
Everything except for one state was already accounted for. We can now have
confidence that the parser will never panic due to state transitions (beyond
legitimate error conditions).
There are some `unreachable!`s to contend with still.
DEV-7145
This is the same as the previous commits, but for non-sum NTs.
This also extracts errors into a separate module, which I had hoped to do in
a separate commit, but it's not worth separating them. My _original_ reason
for doing so was debugging (I'll get into that below), but I had wanted to
trim down `ele.rs` anyway, since that mess is large and a lot to grok.
My debugging was trying to figure out why Rust was failing to derive
`PartialEq` on `NtError` because of `AttrParseError`. As it turns out,
`AttrParseError::InvalidValue` was failing, thus the introduction of the
`PartialEq` trait bound on `AttrParseState::ValueError`. Figuring this out
required implementing `PartialEq` myself without `derive` (well, using LSP,
which did all the work for me).
I'm not sure why this was not failing previously, which is a bit of a
concern, though perhaps in the context of the macro-expanded code, Rust was
able to properly resolve the types.
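A reduced sketch of the situation is below. Only the names
`AttrParseState`, `ValueError`, `AttrParseError`, and `InvalidValue` come
from the real code; the variant shapes, the `UnexpectedAttr` variant, and
`ExampleState` are assumptions made to keep the example small.

```rust
pub trait AttrParseState {
    /// Without the `PartialEq` bound here, errors that contain a
    /// `ValueError` cannot be compared.
    type ValueError: PartialEq;
}

pub enum AttrParseError<S: AttrParseState> {
    /// Hypothetical variant carrying an attribute name.
    UnexpectedAttr(String),
    /// Error produced while parsing an attribute's value.
    InvalidValue(S::ValueError),
}

/// Manual implementation, bounded only by `S: AttrParseState`;
/// `S` itself does not need to be `PartialEq`.
impl<S: AttrParseState> PartialEq for AttrParseError<S> {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (Self::UnexpectedAttr(a), Self::UnexpectedAttr(b)) => a == b,
            (Self::InvalidValue(a), Self::InvalidValue(b)) => a == b,
            _ => false,
        }
    }
}

/// Example state whose value errors are plain strings.
pub struct ExampleState;

impl AttrParseState for ExampleState {
    type ValueError = String;
}
```

Writing the impl by hand in this way makes the compiler point directly at
the `a == b` comparison on `ValueError` when the bound is missing, which is
more direct than what the derive's failure reported.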
DEV-7145
The `ele_parse!` macro is a monstrosity, and expands into many different
identifiers. The hope is that chipping away at things like this will not
only make the template easier to understand by framing portions of the
problem in terms of more traditional Rust code, but will also hopefully
reduce compile times by reducing the amount of code that is expanded by the
macro.
DEV-7145
This introduces an order-only prerequisite `bootstrap-if-necessary` for the
generation of `suppliers.mk`. Projects utilizing TAME as a dependency may
include a `bootstrap.mk` that overrides this target to trigger any
bootstrapping scripts that may be necessary due to toolchain updates.
DEV-7145