I'm spending a lot of time considering how the future system will work,
which is complicating the needs of the system now, which is to re-output the
source XML so that we can selectively start to replace things.
So I'm going to punt on this.
I was also planning out how that edge reassignment out to work, along with
traits to try to enforce it, and that is also complicated, so I may wind up
wanting to leave them in the end, or handling this
differently. Specifically, I'll want to know how `value-of` expressions are
going to work on the graph first, since its target is going to be dynamic
and therefore not knowable at compile-time. (Rather, I know how I want to
make them work, but I want to observe that working in practice first.)
DEV-13597
There is extensive rationale in the documentation for this new macro. I'm
utilizing it to provide a more clear and friendly message for incomplete
ident resolution so that I can move on and return to those situations later.
It's worth noting that:
- Externs _will_ need to be handled in the near-term;
- Opaque and IdentFragment almost certainly won't be bound to a definition
until I introduce LTO, which is quite a ways off; and
- They may use the same mechanism and so may be able to be handled at the
same time anyway.
DEV-13597
The ASG delegates certain operations to Objects so that they may enforce
their own invariants and ontology. It is therefore important that only
objects have access to certain methods on `Asg`, otherwise those invariants
could be circumvented.
It should be noted that the nesting of this module is such that AIR should
_not_ have privileged access to the ASG---it too must utilize objects to
ensure those invariants are enforced in a single place.
DEV-13597
Starting to re-organize things to match my mental model of the new system;
the ASG abstraction has changed quite a bit since the early days.
This isn't quite enough, though; see next commit.
DEV-13597
This provides the initial implementation allowing an identifier to be
defined (bound to an object and made transparent).
I'm not yet entirely sure whether I'll stick with the "transparent" and
"opaque" terminology when there's also "declare" and "define", but a
`Missing` state is a type of declaration and so the distinction does still
seem to be important.
There is still work to be done on `ObjectIndex::<Ident>::bind_definition`,
which will follow. I'm going to be balancing work to provide type-level
guarantees, since I don't have the time to go as far as I'd like.
DEV-13597
This seems to have been an oversight from when I recently introduced SPairs
to ASG; I noticed it while working on another change and receiving back a
`DUMMY_SPAN`.
DEV-13597
`Ident` is now `Opaque`, but the new `Transparent` state isn't actually used
yet in any transitions; that'll come next.
The original (now "opaque") identifiers were added for the linker, which
does not need (at present) the associated expressions, since they've already
been compiled. In the future I'd like to do LTO (link-time optimization),
and then the graph will need more information.
DEV-13160
Some investigation into the disassembly of TAMER's binaries showed that Rust
was not able to conditionalize `expect`-like expressions as I was hoping due
to eager evaluation language semantics in combination with the use of
`format!`.
This solves the problem for the diagnostic system be creating types that
prevent this situation from occurring statically, without the need for a
lint.
This invokes clippy as part of `make check` now, which I had previously
avoided doing (I'll elaborate on that below).
This commit represents the changes needed to resolve all the warnings
presented by clippy. Many changes have been made where I find the lints to
be useful and agreeable, but there are a number of lints, rationalized in
`src/lib.rs`, where I found the lints to be disagreeable. I have provided
rationale, primarily for those wondering why I desire to deviate from the
default lints, though it does feel backward to rationalize why certain lints
ought to be applied (the reverse should be true).
With that said, this did catch some legitimage issues, and it was also
helpful in getting some older code up-to-date with new language additions
that perhaps I used in new code but hadn't gone back and updated old code
for. My goal was to get clippy working without errors so that, in the
future, when others get into TAMER and are still getting used to Rust,
clippy is able to help guide them in the right direction.
One of the reasons I went without clippy for so long (though I admittedly
forgot I wasn't using it for a period of time) was because there were a
number of suggestions that I found disagreeable, and I didn't take the time
to go through them and determine what I wanted to follow. Furthermore, it
was hard to make that judgment when I was new to the language and lacked
the necessary experience to do so.
One thing I would like to comment further on is the use of `format!` with
`expect`, which is also what the diagnostic system convenience methods
do (which clippy does not cover). Because of all the work I've done trying
to understand Rust and looking at disassemblies and seeing what it
optimizes, I falsely assumed that Rust would convert such things into
conditionals in my otherwise-pure code...but apparently that's not the case,
when `format!` is involved.
I noticed that, after making the suggested fix with `get_ident`, Rust
proceeded to then inline it into each call site and then apply further
optimizations. It was also previously invoking the thread lock (for the
interner) unconditionally and invoking the `Display` implementation. That
is not at all what I intended for, despite knowing the eager semantics of
function calls in Rust.
Anyway, possibly more to come on that, I'm just tired of typing and need to
move on. I'll be returning to investigate further diagnostic messages soon.
This introduces a number of abstractions, whose concepts are not fully
documented yet since I want to see how it evolves in practice first.
This introduces the concept of edge ontology (similar to a schema) using the
type system. Even though we are not able to determine what the graph will
look like statically---since that's determined by data fed to us at
runtime---we _can_ ensure that the code _producing_ the graph from those
data will produce a graph that adheres to its ontology.
Because of the typed `ObjectIndex`, we're also able to implement operations
that are specific to the type of object that we're operating on. Though,
since the type is not (yet?) stored on the edge itself, it is possible to
walk the graph without looking at node weights (the `ObjectContainer`) and
therefore avoid panics for invalid type assumptions, which is bad, but I
don't think that'll happen in practice, since we'll want to be resolving
nodes at some point. But I'll addres that more in the future.
Another thing to note is that walking edges is only done in tests right now,
and so there's no filtering or anything; once there are nodes (if there are
nodes) that allow for different outgoing edge types, we'll almost certainly
want filtering as well, rather than panicing. We'll also want to be able to
query for any object type, but filter only to what's permitted by the
ontology.
DEV-13160
Working with the graph can be confusing with all of the layers
involved. This begins to provide a better layer of abstraction that can
encapsulate the concept and enforce invariants.
Since I'm better able to enforce invariants now, this also removes the span
from the diagnostic message, since the invariant is now always enforced with
certainty. I'm not removing the runtime panic, though; we can revisit that
if future profiling shows that it makes a negative impact.
DEV-13160
This addresses the two outstanding `todo!` match arms representing errors in
lowering expressions into the graph. As noted in the comments, these errors
are unlikely to be hit when using TAME in the traditional way, since
e.g. XIR and NIR are going to catch the equivalent problems within their own
contexts (unbalanced tags and a valid expression grammar respectively).
_But_, the IR does need to stand on its own, and I further hope that some
tooling maybe can interact more directly with AIR in the future.
DEV-13160
This introduces a number of concepts together, again to demonstrate that
they were derived.
This introduces support for nested expressions, extending the previous
work. It also supports error recovery for dangling expressions.
The parser states are a mess; there is a lot of duplicate code here that
needs refactoring, but I wanted to commit this first at a known-good state
so that the diff will demonstrate the need for the change that will
follow; the opportunities for abstraction are plainly visible.
The immutable stack introduced here could be generalized, if needed, in the
future.
Another important note is that Rust optimizes away the `memcpy`s for the
stack that was introduced here. The initial Parser Context was introduced
because of `ArrayVec` inhibiting that elision, but Vec never had that
problem. In the future, I may choose to go back and remove ArrayVec, but I
had wanted to keep memory allocation out of the picture as much as possible
to make the disassembly and call graph easier to reason about and to have
confidence that optimizations were being performed as intended.
With that said---it _should_ be eliding in tamec, since we're not doing
anything meaningful yet with the graph. It does also elide in tameld, but
it's possible that Rust recognizes that those code paths are never taken
because tameld does nothing with expressions. So I'll have to monitor this
as I progress and adjust accordingly; it's possible a future commit will
call BS on everything I just said.
Of course, the counter-point to that is that Rust is optimizing them away
anyway, but Vec _does_ still require allocation; I was hoping to keep such
allocation at the fringes. But another counter-point is that it _still_ is
allocated at the fringe, when the context is initialized for the parser as
part of the lowering pipeline. But I didn't know how that would all come
together back then.
...alright, enough rambling.
DEV-13160
I had wanted to implement expression operations in terms of user-defined
functions (where primitives are just marked as intrinsic), and would still
like to, but I need to get this thing working, so I'll just include a note
for now.
Yes, TAMER's formalisms are inspired by APL, if that hasn't been documented
anywhere yet.
DEV-13160
This commit is purposefully coupled with changes that utilize it to
demonstrate that the need for this abstraction has been _derived_, not
forced; TAMER doesn't aim to be functional for the sake of it, since
idiomatic Rust achieves many of its benefits without the formalisms.
But, the formalisms do occasionally help, and this is one such
example. There is other existing code that can be refactored to take
advantage of this style as well.
I do _not_ wish to pull an existing functional dependency into TAMER; I want
to keep these abstractions light, and eliminate them as necessary, as Rust
continues to integrate new features into its core. I also want to be able
to modify the abstractions to suit our particular needs. (This is _not_ a
general recommendation; it's particular to TAMER and to my experience.)
This implementation of `Functor` is one such example. While it is modeled
after Haskell in that it provides `fmap`, the primitive here is instead
`map`, with `fmap` derived from it, since `map` allows for better use of
Rust idioms. Furthermore, it's polymorphic over _trait_ type parameters,
not method, allowing for separate trait impls for different container types,
which can in turn be inferred by Rust and allow for some very concise
mapping; this is particularly important for TAMER because of the disciplined
use of newtypes.
For example, `foo.overwrite(span)` and `foo.overwrite(name)` are both
self-documenting, and better alternatives than, say, `foo.map_span(|_|
span)` and `foo.map_symbol(|_| name)`; the latter are perfectly clear in
what they do, but lack a layer of abstraction, and are verbose. But the
clarity of the _new_ form does rely on either good naming conventions of
arguments, or explicit type annotations using turbofish notation if
necessary.
This will be implemented on core Rust types as appropriate and as
possible. At the time of writing, we do not yet have trait specialization,
and there's too many soundness issues for me to be comfortable enabling it,
so that limits that we can do with something like, say, a generic `Result`,
while also allowing for specialized implementations based on newtypes.
DEV-13160
This uses `ObjectIndex` to automatically narrow the type to what is
expected.
Given that `ObjectIndex` is supposed to mean that there must be an object
with that index, perhaps the next step is to remove the `Option` from `get`
as well.
DEV-13160
This makes the system a bit more ergonomic and introduces additional type
safety by associating the narrowed object type with the
`ObjectIndex` (previously `ObjectRef`). Not only does this allow us to
explicitly state the type of object wherever those indices are stored, but
it also allows the API to automatically narrow to that type when operating
on it again without the caller having to worry about it.
DEV-13160
This begins to place expressions on the graph---something that I've been
thinking about for a couple of years now, so it's interesting to finally be
doing it.
This is going to evolve; I want to get some things committed so that it's
clear how I'm moving forward. The ASG makes things a bit awkward for a
number of reasons:
1. I'm dealing with older code where I had a different model of doing
things;
2. It's mutable, rather than the mostly-functional lowering pipeline;
3. We're dealing with an aggregate ever-evolving blob of data (the graph)
rather than a stream of tokens; and
4. We don't have as many type guarantees.
I've shown with the lowering pipeline that I'm able to take a mutable
reference and convert it into something that's both functional and
performant, where I remove it from its container (an `Option`), create a new
version of it, and place it back. Rust is able to optimize away the memcpys
and such and just directly manipulate the underlying value, which is often a
register with all of the inlining.
_But_ this is a different scenario now. The lowering pipeline has a narrow
context. The graph has to keep hitting memory. So we'll see how this
goes. But it's most important to get this working and measure how it
performs; I'm not trying to prematurely optimize. My attempts right now are
for the way that I wish to develop.
Speaking to #4 above, it also sucks that I'm not able to type the
relationships between nodes on the graph. Rather, it's not that I _can't_,
but a project to created a typed graph library is beyond the scope of this
work and would take far too much time. I'll leave that to a personal,
non-work project. Instead, I'm going to have to narrow the type any time
the graph is accessed. And while that sucks, I'm going to do my best to
encapsulate those details to make it as seamless as possible API-wise. The
performance hit of performing the narrowing I'm hoping will be very small
relative to all the business logic going on (a single cache miss is bound to
be far more expensive than many narrowings which are just integer
comparisons and branching)...but we'll see. Introducing branching sucks,
but branch prediction is pretty damn good in modern CPUs.
DEV-13160
This ASG implementation is a refactored form of original code from the
proof-of-concept linker, which was well before the span and diagnostic
implementations, and well before I knew for certain how I was going to solve
that problem.
This was quite the pain in the ass, but introduces spans to the AIR tokens
and graph so that we always have useful diagnostic information. With that
said, there are some important things to note:
1. Linker spans will originate from the `xmlo` files until we persist
spans to those object files during `tamec`'s compilation. But it's
better than nothing.
2. Some additional refactoring is still needed for consistency, e.g. use
of `SPair`.
3. This is just a preliminary introduction. More refactoring will come as
tamec is continued.
DEV-13041
This newtype allows a caller to prove (using types) that a parser of a given
type (`ParseState`) has been finalized.
This will be used by the lowering pipeline to ensure that all parsers in the
pipeline end up getting finalized (as you can see from a TODO added in the
code, one of them is missing). The lack of such a type was an oversight
during the (rather stressed) development of the parsing system, and I
shouldn't need to resort to unit tests to verify that parsers have been
finalized.
DEV-13158
This was accepting an early EOF when the active child `ParseState` was in an
accepting state, because it was not ensuring that anything on the stack was
also accepting.
Ideally, there should be nothing on the stack, and hopefully in the future
that's what happens. But with how things are today, it's important that, if
anything is on the stack, it is accepting.
Since `is_accepting` on the superstate is only called during finalization,
and because the check terminates early, and because the stack practically
speaking will only have a couple things on it max (unless we're in tail
position in a deeply nested tree, without TCO [yet]), this shouldn't be an
expensive check.
Implementing this did require that we expose `Context` to `is_accepting`,
which I had hoped to avoid having to do, but here we are.
DEV-7145
This produces useful parse traces that are output as part of a failing test
case. The parser generator macros can be a bit confusing to deal with when
things go wrong, so this helps to clarify matters.
This is _not_ intended to be machine-readable, but it does show that it
would be possible to generate machine-readable output to visualize the
entire lowering pipeline. Perhaps something for the future.
I left these inline in Parser::feed_tok because they help to elucidate what
is going on, just by reading what the trace would output---that is, it helps
to make the method more self-documenting, albeit a tad bit more
verbose. But with that said, it should probably be extracted at some point;
I don't want this to set a precedent where composition is feasible.
Here's an example from test cases:
[Parser::feed_tok] (input IR: XIRF)
| ==> Parser before tok is parsing attributes for `package`.
| | Attrs_(SutAttrsState_ { ___ctx: (QName(None, LocalPart(NCName(SymbolId(46 "package")))), OpenSpan(Span { len: 0, offset: 0, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10)), ___done: false })
|
| ==> XIRF tok: `<unexpected>`
| | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))
|
| ==> Parser after tok is expecting opening tag `<classify>`.
| | ChildA(Expecting_)
| | Lookahead: Some(Lookahead(Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))))
= note: this trace was output as a debugging aid because `cfg(test)`.
[Parser::feed_tok] (input IR: XIRF)
| ==> Parser before tok is expecting opening tag `<classify>`.
| | ChildA(Expecting_)
|
| ==> XIRF tok: `<unexpected>`
| | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))
|
| ==> Parser after tok is attempting to recover by ignoring element with unexpected name `unexpected` (expected `classify`).
| | ChildA(RecoverEleIgnore_(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)))
| | Lookahead: None
= note: this trace was output as a debugging aid because `cfg(test)`.
DEV-7145
This finally uses `parse` all the way up to aggregation into the ASG, as can
be seen by the mess in `poc`. This will be further simplified---I just need
to get this committed so that I can mentally get it off my plate. I've been
separating this commit into smaller commits, but there's a point where it's
just not worth the effort anymore. I don't like making large changes such
as this one.
There is still work to do here. First, it's worth re-mentioning that
`poc` means "proof-of-concept", and represents things that still need a
proper home/abstraction.
Secondly, `poc` is retrieving the context of two parsers---`LowerContext`
and `Asg`. The latter is desirable, since it's the final aggregation point,
but the former needs to be eliminated; in particular, packages need to be
worked into the ASG so that `found` can be removed.
Recursively loading `xmlo` files still happens in `poc`, but the compiler
will need this as well. Once packages are on the ASG, along with their
state, that responsibility can be generalized as well.
That will then simplify lowering even further, to the point where hopefully
everything has the same shape (once final aggregation has an abstraction),
after which we can then create a final abstraction to concisely stitch
everything together. Right now, Rust isn't able to infer `S` for
`Lower<S, LS>`, which is unfortunate, but we'll be able to help it along
with a more explicit abstraction.
DEV-11864
This was always the intent, but I didn't have a higher-level object
yet. This removes all the awkwardness that existed with working the root
in as an identifier.
DEV-11864
This wraps `Ident` in a new `Object` variant and modifies `Asg` so that its
nodes are of type `Object`.
This unfortunately requires runtime type checking. Whether or not that's
worth alleviating in the future depends on a lot of different things, since
it'll require my own graph implementation, and I have to focus on other
things right now. Maybe it'll be worth it in the future.
Note that this also gets rid of some doc examples that simply aren't worth
maintaining as the API evolves.
DEV-11864
A previous commit mentioned that there's not a place for `Dim`, and
duplicated it between `asg` and `xmlo`. Well, `Dtype` is also needed in
both, and so here's a home for now.
`Dtype` has always been an inappropriate detail for the system and will one
day be removed entirely in favor of higher-level types; the machine
representation is up to the compiler to decide.
DEV-11864
asg_builder is about to be replaced, but in the process of simplifying the
destination IR (the ASG), I'm moving things into the proper place. This
never belonged here---it belongs with the actual lowering operation.
Previously, this was not reasoned about in terms of a lowering operation,
and was written when I was first introducing myself to Rust and trying to
get a proof-of-concept linker working.
DEV-11864
This matches xmlo::Dim, and could be the same thing, if we can find a home
for it in the future; it's not worth creating such a home right now when I'm
not yet sure what else ought to live there; the duplication may be fine.
The conversion from xmlo needs to be moved, and `Dim` is going to be used
for more than just identifiers (expressions will have type inference
performed).
DEV-11864
Previously, since the graph contained only identifiers, discovered roots
were stored in a separate vector and exposed to the caller. This not only
leaked details, but added complexity; this was left over from the
refactoring of the proof-of-concept linker some time ago.
This moves the root management into the ASG itself, mostly, with one item
being left over for now in the asg_builder (eligibility classifications).
There are two roots that were added automatically:
- __yield
- __worksheet
The former has been removed and is now expected to be explicitly mapped in
the return map, which is now enforced with an extern in `core/base`. This
is still special, in the sense that it is explicitly referenced by the
generated code, but there's nothing inherently special about it and I'll
continue to generalize it into oblivion in the future, such that the final
yield is just a convention.
`__worksheet` is the only symbol of type `IdentKind::Worksheet`, and so that
was generalized just as the meta and map entries were.
The goal in the future will be to have this more under the control of the
source language, and to consolodate individual roots under packages, so that
the _actual_ roots are few.
As far as the actual ASG goes: this introduces a single root node that is
used as the sole reference for reachability analysis and topological
sorting. The edges of that root node replace the vector that was removed.
DEV-11864
In the actual implementation (outside of tests), this is always looking up
before adding the symbol. This will simplify the API, while still retaining
errors, since the identifier will fail the state transition if the
identifier did not exist before attempting to set a fragment. So while this
is slower in microbenchmarks, this has no effect on real-world performance.
Further, I'm refactoring toward a streaming ASG aggregation, which is a lot
easier if we do not need to perform lookups in a separate step from the
ASG's primitives.
DEV-11864
These traits are no longer necessary now that I'm using concrete types; they
just add unnecessary noise and confusion as I attempt to further refactor.
Don't abstract prematurely.
DEV-11864
This removes the generic on the Asg (which was formerly BaseAsg),
hard-coding `IdentObject`, which will further evolve. This makes the IR an
actual concrete IR rather than an abstract data structure.
These tests bring me back a bit, since they were written as I was still
becoming familiar with Rust.
DEV-11864
This is the beginning of an incremental refactoring to remove generics, to
simplify the ASG. When I initially wrote the linker, I wasn't sure what
direction I was going in, but I was also negatively influenced by more
traditional approaches to both design and unit testing.
If we're going to call the ASG an IR, then it needs to be one---if the core
of the IR is generic, then it's more like an abstract data structure than
anything. We can abstract around the IR to slice it up into components that
are a little easier to reason about and understand how responsibilities are
segregated.
DEV-11864
RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no
"Group"), but I'm not sure if the legal name has been changed yet or not, so
I'll wait on that.
This replaces u8 and will be used for the new XmloReader.
Previously I wasn't sure what direction TAMER was going to go in with
regards to dimensionality, but I do not expect that higher dimensions will
be supported, and if they are, they'd very likely compile down to lower ones
and create an illusion of higher-dimensionality.
Whatever the future holds, it's not used today, and I'd rather these types
be correct.
ASG needs changing too, but one step at a time.
DEV-10863
This is simply not worth it; the size is not going to be the bottleneck (at
least any time soon) and the generic not only pollutes all the things that
will use ASG in the near future, but is also incompatible with the SymbolId
default that is used everywhere; if we have to force it to 32 bits anyway,
then we may as well just default it right off the bat.
I thought that this seemed like a good idea at the time, and saving bits is
certainly tempting, but it was premature.
See the previous commit. There is no sense in some common "IR" namespace,
since those IRs should live close to whatever system whose data they
represent.
In the case of these, they are general IRs that can apply to many different
parts of the system. If that proves to be a false statement, they'll be
moved.
DEV-10863