This is one of many changes that have been lingering that I need to start to
break apart in an attempt to commit the confusing and disappointing
conclusion to this package loading madness.
More information to come.
DEV-13162
...this has apparently been consuming errors for some time. This would
cause the parser to enter an invalid state in some cases and terminate.
This would _not_ permit an invalid link, as the graph would not be correct,
but it was masking the actual error.
This part of linker is in dire need of tests. This also ought to be
replaced with tamec's approach of reporting all errors.
DEV-13162
These have been a pain in the ass since TAMER began.
It seemed like a good idea at the time to have static code generated in this
way, but the lack of explicit dependencies just makes this a mess and works
against the operating theory of the system.
Furthermore, the _same_ static fragments were generated for each and every
map package.
There is still a post-link step (standalones) handled in XSLT; the
previously-static code has been moved there. This will eventually be
integrated into tameld itself, once TAMER has facilities for JS generation.
(This was discovered while trying to parent identifiers to packages.)
DEV-13162
With the previous commit using a visitor implemented within the `asg`
module, we can now finally encapsulate the graph. This is a wonderfully
liberating, long-awaited change, since I have been fighting with the lack of
encapsulation for some time; it has made certain changes challenging and has
made the system more difficult to reason about. It also made it impossible
to assert that invariants were _actually_ properly enforced, if things could
just peer into and modify the graph directly, out from underneath the API
that provides those assurances.
This also removes our dependency on Petgraph outside of the `asg`
module. There are no plans to migrate away from it currently; we'll see how
the graph continues to evolve over time and what redundancies are introduced
with our data structures. It may render petgraph unnecessary.
Interestingly, because my DFS implementation is so similar to Petgraph's,
the emitted ordering is _identical_ between this commit and the previous.
DEV-13162
This integrates the new topological sort, replacing the previous
implementation in the linker.
This will now allow encapsulating the graph, finally, and ensures that
future changes can be fully maintained within the `asg` module.
More cleanup will come over time.
DEV-13162
This modifies the xmlo reader, xmlo->AIR lowering, and AIR->ASG to introduce
a package for identifiers. It does not yet, however, add edges from the
package to the identifier.
Once edges are added, the DFS will change in undesirable ways, which will
require a new implementation. This is desirable to decouple from Petgraph
anyway, and then will be able to restore the prior single-pass sort+cycle
check.
That will also encapsulate visiting behavior within the `asg::graph` module
and, in turn, allow encapsulating `Asg.graph` finally.
DEV-13162
Historically, the ASG was better described as a "dependency graph",
containing only identifiers (which are simply called "symbols" in the
XSLT-based compiler). Consequently, it was appropriate for the graph to
have operations specific to identifiers. (Indeed, that's the only type of
object the graph supported.)
Much has changed since then. This cleans things up, and makes parenting
identifiers to root an _explicit_ operation. This will make it easier to
move forward with handling of scope, and importing identifiers into
packages, and removing `Source`, and so on.
DEV-13162
Identifier lookups, as done using the graph methods today, look up from a
cache representing the global environment.
Templates must not contribute to this environment until expansion. Further,
metavariables will not be present in this environment. To avoid confusion
and help obviate accidental contributions to this environment, the methods
have been renamed. This will also allow for the creation of more general
methods down the line.
DEV-13708
Also known as metavariables or template parameters.
This is a bit of a tortured excursion, trying to figure out how I want to
best represent this. I have a number of pages of hand-written notes that
I'd like to distill over time, but the rendered graph ontology (via
`asg-ontviz`) demonstrates the broad idea.
`AirTpl::TplApply` highlights some remaining questions. What I had _wanted_
to do is to separate the concepts of application and expansion, and support
partial application and such. But it's going to be too much work for now,
when it isn't needed---partial application can be worked around by simply
creating new templates and duplicating params, as we do today, although that
sucks and is a maintenance issue. But I'd rather address that head-on in
the future.
So it's looking like Option B is going to be the approach for now, with
templates being closed (as in, no free metavariables) and expanded at the
same time. This simplifies the parser and error conditions significantly
and makes it easier to utilize anonymous templates, since it'll still be the
active context.
My intent is to get at least the graph construction sorted out---not the
actual expansion and binding yet---enough that I can use templates to
represent parts of NIR that do not have proper graph representations or
desugaring yet, so that I can spit them back out again in the `xmli` file
and incrementally handle them. That was an option I had considered some
months ago, but didn't want to entertain it at the time because I wasn't
sure what doing so would look like; while it was an attractive approach
since it pushes existing primitives into the template system (something I've
wanted to do for years), I didn't want to potentially tank performance or
compromise the design for it after I had spent so much effort on all of this
so far.
But my efforts have yielded a system that significantly exceeds my initial
performance expectations, with a decent abstractions, and so this seems
viable.
DEV-13708
There's quite a bit of boilerplate here that'll eventually need factoring
out. But it's also clear that it is somewhat onerous to add new object
types.
Note that a good chunk of this burden is _intentional_, via exhaustiveness
checks---adding a new type of object is an exceptional occurrence (well, in
principle, but we haven't added them all yet, so it'll be more common
initially), and we'd rather be safe to ensure that everything is properly
considering how that new type of object interacts with it.
Let's not confuse coupling with safety---the latter causes a burden because
of the former, not because of itself; it provides a service to us.
But, nonetheless, we'll want to reduce this burden somewhat since there are
a number more to add.
DEV-13708
Technically, an "acceptor" in the context of state machines is actually a
state machine; the terminology here is more describing the configuration of
the state machine (`XirToXirf`) as an acceptor.
This change comes with significant documentation of the rationale and why
this is important; see that for more information.
This change is necessary so that we can enforce finalization on all parsers
in the lowering pipeline, which is not currently being done. If we were to
do that now, then `tameld` would fail because it halts parsing of the tokens
stream at the end of the `xmlo` header.
This is also quite the type soup, but I'm not going to refine this further
right now, since my focus is elsewhere (XMLI lowering).
DEV-13708
The reader previously yielded a `ParsedResult`, presumably to simplify
lowering operations. But the reader is not a `ParseState`, and does not
otherwise use the parsing API, so this was an inappropriate and confusing
coupling.
This resolves that, introducing a new `lowerable` which will translate an
iterator into something that can be placed in a lowering pipeline.
See the previous commit for more information.
DEV-13708
The token type was previously hard-coded to `UnknownToken`, since the use
case was the beginning of the lowering pipeline at the start of the program,
where there was no token type because the first parser (`XirReader`,
currently) is responsible for producing the first token type.
But when we're lowering from the graph (so, the other side of the lowering
pipeline), we _do_ have token types to deal with.
This also emphasizes the inappropriate coupling of `<XirReader as
Iterator>::Item` with `ParsedResult`; I'd like to follow the same approach
that I'm about to introduce with `tamec`, so see a future commit.
DEV-13708
This causes a package definition to be rooted (so that it can be easily
accessed for a graph walk). This keeps consistent with the new
`ObjectIndex`-based API by introducing a unit `Root` `ObjectKind` and the
boilerplate that goes with it.
This boilerplate, now glaringly obvious, will be refactored at some point,
since its repetition is onerous and distracting.
DEV-13159
This does not yet create edges from identifiers to the package; just getting
this introduced was quite a bit of work, so I want to get this committed.
Note that this also includes a change to NIR so that `Close` contains the
entity so that we can pattern-match for AIR transformations rather than
retaining yet another stack with checks that are already going to be done by
AIR. This makes NIR stand less on its own from a self-validation point, but
that's okay, given that it's the language that the user entered and,
conceptually, they could enter invalid NIR the same as they enter invalid
XML (e.g. from a REPL).
In _practice_, of course, NIR is lowered from XML and the schema is enforced
during that lowering and so the validation does exist as part of that
parsing.
These concessions speak more to the verbosity of the language (Rust) than
anything.
DEV-13159
This was originally created to populate Neo4J for querying, but it has not
been utilized. It's become a maintenance burden as I try to change the API
of and encapsulate the graph, which is important for upholding its
invariants.
This feature, or one like it, will return in the future. I have other
related plans; we'll see if they materialize.
The graph can't be encapsulated fully just yet because of the linker; those
commits will come in the following days.
DEV-13597
This seems to have been an oversight from when I recently introduced SPairs
to ASG; I noticed it while working on another change and receiving back a
`DUMMY_SPAN`.
DEV-13597
`Ident` is now `Opaque`, but the new `Transparent` state isn't actually used
yet in any transitions; that'll come next.
The original (now "opaque") identifiers were added for the linker, which
does not need (at present) the associated expressions, since they've already
been compiled. In the future I'd like to do LTO (link-time optimization),
and then the graph will need more information.
DEV-13160
This invokes clippy as part of `make check` now, which I had previously
avoided doing (I'll elaborate on that below).
This commit represents the changes needed to resolve all the warnings
presented by clippy. Many changes have been made where I find the lints to
be useful and agreeable, but there are a number of lints, rationalized in
`src/lib.rs`, where I found the lints to be disagreeable. I have provided
rationale, primarily for those wondering why I desire to deviate from the
default lints, though it does feel backward to rationalize why certain lints
ought to be applied (the reverse should be true).
With that said, this did catch some legitimage issues, and it was also
helpful in getting some older code up-to-date with new language additions
that perhaps I used in new code but hadn't gone back and updated old code
for. My goal was to get clippy working without errors so that, in the
future, when others get into TAMER and are still getting used to Rust,
clippy is able to help guide them in the right direction.
One of the reasons I went without clippy for so long (though I admittedly
forgot I wasn't using it for a period of time) was because there were a
number of suggestions that I found disagreeable, and I didn't take the time
to go through them and determine what I wanted to follow. Furthermore, it
was hard to make that judgment when I was new to the language and lacked
the necessary experience to do so.
One thing I would like to comment further on is the use of `format!` with
`expect`, which is also what the diagnostic system convenience methods
do (which clippy does not cover). Because of all the work I've done trying
to understand Rust and looking at disassemblies and seeing what it
optimizes, I falsely assumed that Rust would convert such things into
conditionals in my otherwise-pure code...but apparently that's not the case,
when `format!` is involved.
I noticed that, after making the suggested fix with `get_ident`, Rust
proceeded to then inline it into each call site and then apply further
optimizations. It was also previously invoking the thread lock (for the
interner) unconditionally and invoking the `Display` implementation. That
is not at all what I intended for, despite knowing the eager semantics of
function calls in Rust.
Anyway, possibly more to come on that, I'm just tired of typing and need to
move on. I'll be returning to investigate further diagnostic messages soon.
Working with the graph can be confusing with all of the layers
involved. This begins to provide a better layer of abstraction that can
encapsulate the concept and enforce invariants.
Since I'm better able to enforce invariants now, this also removes the span
from the diagnostic message, since the invariant is now always enforced with
certainty. I'm not removing the runtime panic, though; we can revisit that
if future profiling shows that it makes a negative impact.
DEV-13160
This uses `ObjectIndex` to automatically narrow the type to what is
expected.
Given that `ObjectIndex` is supposed to mean that there must be an object
with that index, perhaps the next step is to remove the `Option` from `get`
as well.
DEV-13160
This makes the system a bit more ergonomic and introduces additional type
safety by associating the narrowed object type with the
`ObjectIndex` (previously `ObjectRef`). Not only does this allow us to
explicitly state the type of object wherever those indices are stored, but
it also allows the API to automatically narrow to that type when operating
on it again without the caller having to worry about it.
DEV-13160
This begins to place expressions on the graph---something that I've been
thinking about for a couple of years now, so it's interesting to finally be
doing it.
This is going to evolve; I want to get some things committed so that it's
clear how I'm moving forward. The ASG makes things a bit awkward for a
number of reasons:
1. I'm dealing with older code where I had a different model of doing
things;
2. It's mutable, rather than the mostly-functional lowering pipeline;
3. We're dealing with an aggregate ever-evolving blob of data (the graph)
rather than a stream of tokens; and
4. We don't have as many type guarantees.
I've shown with the lowering pipeline that I'm able to take a mutable
reference and convert it into something that's both functional and
performant, where I remove it from its container (an `Option`), create a new
version of it, and place it back. Rust is able to optimize away the memcpys
and such and just directly manipulate the underlying value, which is often a
register with all of the inlining.
_But_ this is a different scenario now. The lowering pipeline has a narrow
context. The graph has to keep hitting memory. So we'll see how this
goes. But it's most important to get this working and measure how it
performs; I'm not trying to prematurely optimize. My attempts right now are
for the way that I wish to develop.
Speaking to #4 above, it also sucks that I'm not able to type the
relationships between nodes on the graph. Rather, it's not that I _can't_,
but a project to created a typed graph library is beyond the scope of this
work and would take far too much time. I'll leave that to a personal,
non-work project. Instead, I'm going to have to narrow the type any time
the graph is accessed. And while that sucks, I'm going to do my best to
encapsulate those details to make it as seamless as possible API-wise. The
performance hit of performing the narrowing I'm hoping will be very small
relative to all the business logic going on (a single cache miss is bound to
be far more expensive than many narrowings which are just integer
comparisons and branching)...but we'll see. Introducing branching sucks,
but branch prediction is pretty damn good in modern CPUs.
DEV-13160
This moves the special handling of circular dependencies out of
`poc.rs`---and to be clear, everything needs to be moved out of there---and
into the source of the error. The diagnostic system did not exist at the
time.
This is one example of how easy it will be to create robust diagnostics once
we have the spans on the graph. Once the spans resolve to the proper source
locations rather than the `xmlo` file, it'll Just Work.
It is worth noting, though, that this detection and error will ultimately
need to be moved so that it can occur when performing other operation on the
graph during compilation, such as type inference and unification. I don't
expect to go out of my way to detect cycles, though, since the linker will.
DEV-13430
This ASG implementation is a refactored form of original code from the
proof-of-concept linker, which was well before the span and diagnostic
implementations, and well before I knew for certain how I was going to solve
that problem.
This was quite the pain in the ass, but introduces spans to the AIR tokens
and graph so that we always have useful diagnostic information. With that
said, there are some important things to note:
1. Linker spans will originate from the `xmlo` files until we persist
spans to those object files during `tamec`'s compilation. But it's
better than nothing.
2. Some additional refactoring is still needed for consistency, e.g. use
of `SPair`.
3. This is just a preliminary introduction. More refactoring will come as
tamec is continued.
DEV-13041
Lowering errors in tamec end up utilizing recovery and reporting, so there
is a distinction between recoverable and unrecoverable errors.
tameld aborts on the first error, since recovery is not currently
supported (we'll want to add it, since tameld should output e.g. lists of
unresolved externs).
Note that tamec does not yet handle `FinalizeError` like tameld because it
uses `Lower::lower`, which does not yet finalize (though it does in practice
when it reaches the end of the stream and auto-finalizes, but that is
widened into a `ParseError`).
DEV-13158
The term "terminal parser" isn't formalized yet in the system, but is meant
to refer to the innermost parser that is responsible for pulling tokens
through the lowering pipeline.
This approach is more of what one would expect when dealing with
`Result`-like monads---we are effectively chaining the inner operation while
propagating errors to short-circuit lowering and let the caller decide
whether recovery ought to be permitted with diagnostic messages. This will
become more clear as it is further refactored.
This also means that the previous changes for introducing interior
mutability for a shared mutable `Reporter` can be reverted, which is great,
since that approach was antithetical to how the streaming pipeline
operates (and introduces awkward mutable state into an
otherwise-mostly-immutable system).
DEV-13158
Since we'll never be reading past the header, this is all that is needed.
If in the future this is violated, XIRF will cause a nice diagnostic error
displaying precisely what opening tag caused the increased level of nesting,
which will aid in debugging and allow us to determine if it ought to be
increased. Here's an example, if I set the max to `3`:
error: maximum XML element nesting depth of `3` exceeded
--> /home/.../foo.xmlo:261:10
|
261 | <preproc:sym-ref name=":_vproduct:vector_a"/>
| ^^^^^^^^^^^^^^^^ error: this opening tag increases the level of nesting past the limit of 3
Of course, the longer-term goal is to do away with `xmlo` entirely.
This had no (perceivable via `/usr/bin/time -v`, at least) impact on memory
or CPU time.
DEV-7145
This teaches XIRF to optionally refine Text into RefinedText, which
determines whether the given SymbolId represents entirely whitespace.
This is something I've been putting off for some time, but now that I'm
parsing source language for NIR, it is necessary, in that we can only permit
whitespace Text nodes in certain contexts.
The idea is to capture the most common whitespace as preinterned
symbols. Note that this heuristic ought to be determined from scanning a
codebase, which I haven't done yet; this is just an initial list.
The fallback is to look up the string associated with the SymbolId and
perform a linear scan, aborting on the first non-whitespace character. This
combination of checks should be sufficiently performant for now considering
that this is only being run on source files, which really are not all that
large. (They become large when template-expanded.) I'll optimize further
if I notice it show up during profiling.
This also frees XIR itself from being concerned by Whitespace. Initially I
had used quick-xml's whitespace trimming, but it messed up my span
calculations, and those were a pain in the ass to implement to begin with,
since I had to resort to pointer arithmetic. I'd rather avoid tweaking it.
tameld will not check for whitespace, since it's not important---xmlo files,
if malformed, are the fault of the compiler; we can ignore text nodes except
in the context of code fragments, where they are never whitespace (unless
that's also a compiler bug).
Onward and yonward.
DEV-7145
This isn't conceptally all that significant of a change, but there was a lot
of modify to get it working. I would generally separate this into a commit
for the implementation and another commit for the integration, but I decided
to keep things together.
This serves a role similar to AttrSpan---this allows deriving a span
representing the element name from a span representing the entire XIR
token. This will provide more useful context for errors---including the tag
delimiter(s) means that we care about the fact that an element is in that
position (as opposed to some other type of node) within the context of an
error. However, if we are expecting an element but take issue with the
element name itself, we want to place emphasis on that instead.
This also starts to consider the issue of span contexts---a blob of detached
data that is `Span` is useful for error context, but it's not useful for
manipulation or deriving additional information. For that, we need to
encode additional context, and this is an attempt at that.
I am interested in the concept of providing Spans that are guaranteed to
actually make sense---that are instantiated and manipulated with APIs that
ensure consistency. But such a thing buys us very little, practically
speaking, over what I have now for TAMER, and so I don't expect to actually
implement that for this project; I'll leave that for a personal
project. TAMER's already take a lot of my personal interests and it can
cause me a lot of grief sometimes (with regards to letting my aspirations
cause me more work).
DEV-7145
This provides much more clarity as to what is going on. Further, it's less
ambiguous, since I'm about to introduce a new type of xmlo lowering into XIR
for writing the actual xmlo files.
DEV-7145
This allows `XmlXirReader` to be used in a `Lower` operation, just as
everything else, bringing me one step closer to a pipeline that can be
concisely represented; this is finally beginning to unify in a clear way,
though it is still a bit of a mess.
This causes `XmlXirReader` to _act_ like a `parse::Parser` in that it yields
a `ParsedResult`, but it does not use `parse::Parser` itself; that was the
_original_ plan: convert it into a `ParseState` where `XmlXirReader` became
a context, and force `Parser` to yield by feeding it a stream of tokens with
`repeat`, but that ended up performing poorly relative to this change. I
did some investigation, which I might write about in the future, but for
now, this solution works just fine.
DEV-7145
This also modifies `poc` such that `Lower` is invoked as an associated
function rather than a method to emphasize the pattern that is forming, so
that it can be later abstracted away.
DEV-11864
The `while_ok` can just be implied with a lowering operation, and that
reduces the name complexity so that we can maybe introduce even more
specialized methods without resulting in a huge sentence as a name.
DEV-11864
This finally uses `parse` all the way up to aggregation into the ASG, as can
be seen by the mess in `poc`. This will be further simplified---I just need
to get this committed so that I can mentally get it off my plate. I've been
separating this commit into smaller commits, but there's a point where it's
just not worth the effort anymore. I don't like making large changes such
as this one.
There is still work to do here. First, it's worth re-mentioning that
`poc` means "proof-of-concept", and represents things that still need a
proper home/abstraction.
Secondly, `poc` is retrieving the context of two parsers---`LowerContext`
and `Asg`. The latter is desirable, since it's the final aggregation point,
but the former needs to be eliminated; in particular, packages need to be
worked into the ASG so that `found` can be removed.
Recursively loading `xmlo` files still happens in `poc`, but the compiler
will need this as well. Once packages are on the ASG, along with their
state, that responsibility can be generalized as well.
That will then simplify lowering even further, to the point where hopefully
everything has the same shape (once final aggregation has an abstraction),
after which we can then create a final abstraction to concisely stitch
everything together. Right now, Rust isn't able to infer `S` for
`Lower<S, LS>`, which is unfortunate, but we'll be able to help it along
with a more explicit abstraction.
DEV-11864
The `*_iter_while_ok` functions now compose like monads, flattening `Result`
at each step and drastically simplifying handling of error types. This also
removes the bunch of `?`s at the end of the expression, and allows me to use
`?` within the callback itself.
I had originally not used `Result` as the return type of the callback
because I was not entirely sure how I was going to use them, but it's now
clear that I _always_ use `Result` as the return type, and so there's no use
in trying to be too accommodating; it can always change in the future.
This is desirable not just for cleanup, but because trying to refactor
`asg_builder` into a pair of `Parser`s is really messy to chain without
flattening, especially given some state that has to leak temporarily to the
caller. More on that in a future commit.
DEV-11864
This was always the intent, but I didn't have a higher-level object
yet. This removes all the awkwardness that existed with working the root
in as an identifier.
DEV-11864
This wraps `Ident` in a new `Object` variant and modifies `Asg` so that its
nodes are of type `Object`.
This unfortunately requires runtime type checking. Whether or not that's
worth alleviating in the future depends on a lot of different things, since
it'll require my own graph implementation, and I have to focus on other
things right now. Maybe it'll be worth it in the future.
Note that this also gets rid of some doc examples that simply aren't worth
maintaining as the API evolves.
DEV-11864
A previous commit mentioned that there's not a place for `Dim`, and
duplicated it between `asg` and `xmlo`. Well, `Dtype` is also needed in
both, and so here's a home for now.
`Dtype` has always been an inappropriate detail for the system and will one
day be removed entirely in favor of higher-level types; the machine
representation is up to the compiler to decide.
DEV-11864
This matches xmlo::Dim, and could be the same thing, if we can find a home
for it in the future; it's not worth creating such a home right now when I'm
not yet sure what else ought to live there; the duplication may be fine.
The conversion from xmlo needs to be moved, and `Dim` is going to be used
for more than just identifiers (expressions will have type inference
performed).
DEV-11864
Previously, since the graph contained only identifiers, discovered roots
were stored in a separate vector and exposed to the caller. This not only
leaked details, but added complexity; this was left over from the
refactoring of the proof-of-concept linker some time ago.
This moves the root management into the ASG itself, mostly, with one item
being left over for now in the asg_builder (eligibility classifications).
There are two roots that were added automatically:
- __yield
- __worksheet
The former has been removed and is now expected to be explicitly mapped in
the return map, which is now enforced with an extern in `core/base`. This
is still special, in the sense that it is explicitly referenced by the
generated code, but there's nothing inherently special about it and I'll
continue to generalize it into oblivion in the future, such that the final
yield is just a convention.
`__worksheet` is the only symbol of type `IdentKind::Worksheet`, and so that
was generalized just as the meta and map entries were.
The goal in the future will be to have this more under the control of the
source language, and to consolodate individual roots under packages, so that
the _actual_ roots are few.
As far as the actual ASG goes: this introduces a single root node that is
used as the sole reference for reachability analysis and topological
sorting. The edges of that root node replace the vector that was removed.
DEV-11864
In the actual implementation (outside of tests), this is always looking up
before adding the symbol. This will simplify the API, while still retaining
errors, since the identifier will fail the state transition if the
identifier did not exist before attempting to set a fragment. So while this
is slower in microbenchmarks, this has no effect on real-world performance.
Further, I'm refactoring toward a streaming ASG aggregation, which is a lot
easier if we do not need to perform lookups in a separate step from the
ASG's primitives.
DEV-11864