This begins to introduce a graph traversal useful for a source
reconstruction from the current state of the ASG. The idea is to, after
having parsed and ingested the source through the lowering pipeline, to
re-output it to (a) prove that we have parsed correctly and (b) allow
progressively moving things from the XSLT-based compiler into TAMER.
There's quite a bit of documentation here; see that for more
information. Generalizing this in an appropriate way took some time, but I
think this makes sense (that work began with the introduction of cross edges
in terms of the tree described by the graph's ontology). But I do need to
come up with an illustration to include in the documentation.
DEV-13708
The `Pkg` span will now properly reflect the entire definition of the
package including the opening and closing tags.
This was found while I was working on a graph traversal.
DEV-13597
I noticed this while working on a graph traversal. The unit test used the
same span for both the reference _and_ the binding, so I didn't notice. -_-
The problem with this, though, is that we do not have a separate span
representing the source location of the identifier reference. The reason is
that we decided to re-use an existing node rather than creating another one,
which would add another inconvenient layer of indirection (and complexity).
So, I may have to add (optional?) spans to edges.
DEV-13708
This introduces the concept of ontological cross edges.
The term "cross edge" is most often seen in the context of graph traversals,
e.g. the trees formed by a depth-first search. This, however, refers to the
trees that are inherent in the ontology of the graph.
For example, an `ExprRef` will produce a cross edge to the referenced
`Ident`, that that is a different tree than the current expression. (Well,
I suppose technically it _could_ be a back edge, but then that'd be a cycle
which would fail the process once we get to preventing it. So let's ignore
that for now.)
DEV-13708
This was done so we can use t:param template with the generated
enum, but not have to provide the value in the YML test. Without
a NONE enum as 0, the default value of 0 in YML test will have
a domain violation.
This causes a package definition to be rooted (so that it can be easily
accessed for a graph walk). This keeps consistent with the new
`ObjectIndex`-based API by introducing a unit `Root` `ObjectKind` and the
boilerplate that goes with it.
This boilerplate, now glaringly obvious, will be refactored at some point,
since its repetition is onerous and distracting.
DEV-13159
Included in this diff are the corresponding changes to the graph to support
the change. Adding the edge was easy, but we also need a way to get the
package for an identifier. The easiest way to do that is to modify the edge
weight to include not just the target node type, but also the source.
DEV-13159
This does not yet create edges from identifiers to the package; just getting
this introduced was quite a bit of work, so I want to get this committed.
Note that this also includes a change to NIR so that `Close` contains the
entity so that we can pattern-match for AIR transformations rather than
retaining yet another stack with checks that are already going to be done by
AIR. This makes NIR stand less on its own from a self-validation point, but
that's okay, given that it's the language that the user entered and,
conceptually, they could enter invalid NIR the same as they enter invalid
XML (e.g. from a REPL).
In _practice_, of course, NIR is lowered from XML and the schema is enforced
during that lowering and so the validation does exist as part of that
parsing.
These concessions speak more to the verbosity of the language (Rust) than
anything.
DEV-13159
Rather than panicing at this level, let's panic at the caller, simplifying
impls and keeping them total.
This can't occur now, but an upcoming change introducing a package type will
allow for such a thing.
DEV-13159
This hides information that's taking up a lot of space in the parser traces
and is not useful information. In particular, the `index` contains a lot of
empty space due to pre-interned symbols.
The index was going to be converted into a HashMap, but that was reverted
because the tradeoff did not make sense, and so this problem remains; see
the previous commit for more information.
DEV-13159
This reverts commit 1b7eac337cd5909c01ede3a5b3fba577898d5961.
I don't actually think this ends up being worth it in the end. Sure, the
implementation is simpler at a glance, but it is more complex at runtime,
adding more cycles for little benefit.
There are ~220 pre-interned symbols at the time of writing, so ~880 bytes (4
bytes per symbol) are potentially wasted if _none_ of the pre-interned
symbols end up serving as identifiers in the graph. The reality is that
some of them _will_ but, but using HashMap also introduces overhead, so in
practice, the savings is much less. On a fairly small package, it was <100
bytes memory saving in `tamec`. For `tameld`, it actually uses _more_
memory, especially on larger packages, because there are 10s of thousands of
symbols involved. And we're incurring a rehashing cost on resize, unlike
this original plain `Vec` implementation.
So, I'm leaving this in the history to reference in the future or return to
it if others ask; maybe it'll be worth it in the future.
This was originally written before there were a bunch of preinterned
symbols. Now the index vector is very sparse.
This simplifies things a bit. If this ends up manifesting as a bottleneck
in the future, we can revisit the implementation. While this does result in
more cycles, it's neglibable relative to the total cycle count.
This commit is what I've been sitting on for testing some of the recent
changes; it is a very basic demonstration of lowering all the way down
from source XML files into the ASG. This can be run on real files to
observe, beyond unit tests, how the system reacts.
Once this outputs data from the graph, we'll finally have tamec end-to-end
and can just keep filling the gaps.
I'm hoping to roll the desugaring process into NirToAir rather than having a
separate process as originally planned a couple of months back.
This also introduces the `wip-nir-to-air` feature flag. Currently,
interpolation will cause a `Nir::BindIdent` to be emitted in blocks that
aren't yet emitting NIR, and so results in an invalid parse.
DEV-13159
This adds support for identifier references, adding `Ident` as a valid edge
type for `Expr`.
There is nothing in the system yet to enforce ontology through levels of
indirection; that will come later on.
I'm testing these changes with a very minimal NIR parse, which I'll commit
shortly.
DEV-13597
This was originally created to populate Neo4J for querying, but it has not
been utilized. It's become a maintenance burden as I try to change the API
of and encapsulate the graph, which is important for upholding its
invariants.
This feature, or one like it, will return in the future. I have other
related plans; we'll see if they materialize.
The graph can't be encapsulated fully just yet because of the linker; those
commits will come in the following days.
DEV-13597
This allows for edges to be multiple types, and gives us two important
benefits:
(a) Compiler-verified correctness to ensure that we don't generate graphs
that do not adhere to the ontology; and
(b) Runtime verification of types, so that bugs are still memory safe.
There is a lot more information in the documentation within the patch.
This took a lot of iterating to get something that was tolerable. There's
quite a bit of boilerplate here, and maybe that'll be abstracted away better
in the future as the graph grows.
In particular, it was challenging to determine how I wanted to actually go
about narrowing and looking up edges. Initially I had hoped to represent
the subsets as `ObjectKind`s as well so that you could use them anywhere
`ObjectKind` was expected, but that proved to be far too difficult because I
cannot return a reference to a subset of `Object` (the value would be owned
on generation). And while in a language like C maybe I'd pad structures and
cast between them safely, since they _do_ overlap, I can't confidently do
that here since Rust's discriminant and layout are not under my control.
I tried playing around with `std::mem::Discriminant` as well, but
`discriminant` (the function) requires a _value_, meaning I couldn't get the
discriminant of a static `Object` variant without some dummy value; wasn't
worth it over `ObjectRelTy.` We further can't assign values to enum
variants unless they hold no data. Rust a decade from now may be different
and will be interesting to look back on this struggle.
DEV-13597
We only need a reference to the inner object, for which `AsRef` is the
proper and idiomatic solution.
There is a lot of boilerplate here that I hope to reduce in the future.
DEV-13597
ObjectRelTo is sufficient and, while I originally thought it was useful to
have it read left-to-right, it just ends up being a cognitive burden.
DEV-13597
I'm spending a lot of time considering how the future system will work,
which is complicating the needs of the system now, which is to re-output the
source XML so that we can selectively start to replace things.
So I'm going to punt on this.
I was also planning out how that edge reassignment out to work, along with
traits to try to enforce it, and that is also complicated, so I may wind up
wanting to leave them in the end, or handling this
differently. Specifically, I'll want to know how `value-of` expressions are
going to work on the graph first, since its target is going to be dynamic
and therefore not knowable at compile-time. (Rather, I know how I want to
make them work, but I want to observe that working in practice first.)
DEV-13597
There is extensive rationale in the documentation for this new macro. I'm
utilizing it to provide a more clear and friendly message for incomplete
ident resolution so that I can move on and return to those situations later.
It's worth noting that:
- Externs _will_ need to be handled in the near-term;
- Opaque and IdentFragment almost certainly won't be bound to a definition
until I introduce LTO, which is quite a ways off; and
- They may use the same mechanism and so may be able to be handled at the
same time anyway.
DEV-13597
The ASG delegates certain operations to Objects so that they may enforce
their own invariants and ontology. It is therefore important that only
objects have access to certain methods on `Asg`, otherwise those invariants
could be circumvented.
It should be noted that the nesting of this module is such that AIR should
_not_ have privileged access to the ASG---it too must utilize objects to
ensure those invariants are enforced in a single place.
DEV-13597
Starting to re-organize things to match my mental model of the new system;
the ASG abstraction has changed quite a bit since the early days.
This isn't quite enough, though; see next commit.
DEV-13597
This provides the initial implementation allowing an identifier to be
defined (bound to an object and made transparent).
I'm not yet entirely sure whether I'll stick with the "transparent" and
"opaque" terminology when there's also "declare" and "define", but a
`Missing` state is a type of declaration and so the distinction does still
seem to be important.
There is still work to be done on `ObjectIndex::<Ident>::bind_definition`,
which will follow. I'm going to be balancing work to provide type-level
guarantees, since I don't have the time to go as far as I'd like.
DEV-13597
This seems to have been an oversight from when I recently introduced SPairs
to ASG; I noticed it while working on another change and receiving back a
`DUMMY_SPAN`.
DEV-13597
`Ident` is now `Opaque`, but the new `Transparent` state isn't actually used
yet in any transitions; that'll come next.
The original (now "opaque") identifiers were added for the linker, which
does not need (at present) the associated expressions, since they've already
been compiled. In the future I'd like to do LTO (link-time optimization),
and then the graph will need more information.
DEV-13160
Some investigation into the disassembly of TAMER's binaries showed that Rust
was not able to conditionalize `expect`-like expressions as I was hoping due
to eager evaluation language semantics in combination with the use of
`format!`.
This solves the problem for the diagnostic system be creating types that
prevent this situation from occurring statically, without the need for a
lint.
This invokes clippy as part of `make check` now, which I had previously
avoided doing (I'll elaborate on that below).
This commit represents the changes needed to resolve all the warnings
presented by clippy. Many changes have been made where I find the lints to
be useful and agreeable, but there are a number of lints, rationalized in
`src/lib.rs`, where I found the lints to be disagreeable. I have provided
rationale, primarily for those wondering why I desire to deviate from the
default lints, though it does feel backward to rationalize why certain lints
ought to be applied (the reverse should be true).
With that said, this did catch some legitimage issues, and it was also
helpful in getting some older code up-to-date with new language additions
that perhaps I used in new code but hadn't gone back and updated old code
for. My goal was to get clippy working without errors so that, in the
future, when others get into TAMER and are still getting used to Rust,
clippy is able to help guide them in the right direction.
One of the reasons I went without clippy for so long (though I admittedly
forgot I wasn't using it for a period of time) was because there were a
number of suggestions that I found disagreeable, and I didn't take the time
to go through them and determine what I wanted to follow. Furthermore, it
was hard to make that judgment when I was new to the language and lacked
the necessary experience to do so.
One thing I would like to comment further on is the use of `format!` with
`expect`, which is also what the diagnostic system convenience methods
do (which clippy does not cover). Because of all the work I've done trying
to understand Rust and looking at disassemblies and seeing what it
optimizes, I falsely assumed that Rust would convert such things into
conditionals in my otherwise-pure code...but apparently that's not the case,
when `format!` is involved.
I noticed that, after making the suggested fix with `get_ident`, Rust
proceeded to then inline it into each call site and then apply further
optimizations. It was also previously invoking the thread lock (for the
interner) unconditionally and invoking the `Display` implementation. That
is not at all what I intended for, despite knowing the eager semantics of
function calls in Rust.
Anyway, possibly more to come on that, I'm just tired of typing and need to
move on. I'll be returning to investigate further diagnostic messages soon.
This introduces a number of abstractions, whose concepts are not fully
documented yet since I want to see how it evolves in practice first.
This introduces the concept of edge ontology (similar to a schema) using the
type system. Even though we are not able to determine what the graph will
look like statically---since that's determined by data fed to us at
runtime---we _can_ ensure that the code _producing_ the graph from those
data will produce a graph that adheres to its ontology.
Because of the typed `ObjectIndex`, we're also able to implement operations
that are specific to the type of object that we're operating on. Though,
since the type is not (yet?) stored on the edge itself, it is possible to
walk the graph without looking at node weights (the `ObjectContainer`) and
therefore avoid panics for invalid type assumptions, which is bad, but I
don't think that'll happen in practice, since we'll want to be resolving
nodes at some point. But I'll addres that more in the future.
Another thing to note is that walking edges is only done in tests right now,
and so there's no filtering or anything; once there are nodes (if there are
nodes) that allow for different outgoing edge types, we'll almost certainly
want filtering as well, rather than panicing. We'll also want to be able to
query for any object type, but filter only to what's permitted by the
ontology.
DEV-13160
Working with the graph can be confusing with all of the layers
involved. This begins to provide a better layer of abstraction that can
encapsulate the concept and enforce invariants.
Since I'm better able to enforce invariants now, this also removes the span
from the diagnostic message, since the invariant is now always enforced with
certainty. I'm not removing the runtime panic, though; we can revisit that
if future profiling shows that it makes a negative impact.
DEV-13160
This addresses the two outstanding `todo!` match arms representing errors in
lowering expressions into the graph. As noted in the comments, these errors
are unlikely to be hit when using TAME in the traditional way, since
e.g. XIR and NIR are going to catch the equivalent problems within their own
contexts (unbalanced tags and a valid expression grammar respectively).
_But_, the IR does need to stand on its own, and I further hope that some
tooling maybe can interact more directly with AIR in the future.
DEV-13160
This introduces a number of concepts together, again to demonstrate that
they were derived.
This introduces support for nested expressions, extending the previous
work. It also supports error recovery for dangling expressions.
The parser states are a mess; there is a lot of duplicate code here that
needs refactoring, but I wanted to commit this first at a known-good state
so that the diff will demonstrate the need for the change that will
follow; the opportunities for abstraction are plainly visible.
The immutable stack introduced here could be generalized, if needed, in the
future.
Another important note is that Rust optimizes away the `memcpy`s for the
stack that was introduced here. The initial Parser Context was introduced
because of `ArrayVec` inhibiting that elision, but Vec never had that
problem. In the future, I may choose to go back and remove ArrayVec, but I
had wanted to keep memory allocation out of the picture as much as possible
to make the disassembly and call graph easier to reason about and to have
confidence that optimizations were being performed as intended.
With that said---it _should_ be eliding in tamec, since we're not doing
anything meaningful yet with the graph. It does also elide in tameld, but
it's possible that Rust recognizes that those code paths are never taken
because tameld does nothing with expressions. So I'll have to monitor this
as I progress and adjust accordingly; it's possible a future commit will
call BS on everything I just said.
Of course, the counter-point to that is that Rust is optimizing them away
anyway, but Vec _does_ still require allocation; I was hoping to keep such
allocation at the fringes. But another counter-point is that it _still_ is
allocated at the fringe, when the context is initialized for the parser as
part of the lowering pipeline. But I didn't know how that would all come
together back then.
...alright, enough rambling.
DEV-13160
I had wanted to implement expression operations in terms of user-defined
functions (where primitives are just marked as intrinsic), and would still
like to, but I need to get this thing working, so I'll just include a note
for now.
Yes, TAMER's formalisms are inspired by APL, if that hasn't been documented
anywhere yet.
DEV-13160
This commit is purposefully coupled with changes that utilize it to
demonstrate that the need for this abstraction has been _derived_, not
forced; TAMER doesn't aim to be functional for the sake of it, since
idiomatic Rust achieves many of its benefits without the formalisms.
But, the formalisms do occasionally help, and this is one such
example. There is other existing code that can be refactored to take
advantage of this style as well.
I do _not_ wish to pull an existing functional dependency into TAMER; I want
to keep these abstractions light, and eliminate them as necessary, as Rust
continues to integrate new features into its core. I also want to be able
to modify the abstractions to suit our particular needs. (This is _not_ a
general recommendation; it's particular to TAMER and to my experience.)
This implementation of `Functor` is one such example. While it is modeled
after Haskell in that it provides `fmap`, the primitive here is instead
`map`, with `fmap` derived from it, since `map` allows for better use of
Rust idioms. Furthermore, it's polymorphic over _trait_ type parameters,
not method, allowing for separate trait impls for different container types,
which can in turn be inferred by Rust and allow for some very concise
mapping; this is particularly important for TAMER because of the disciplined
use of newtypes.
For example, `foo.overwrite(span)` and `foo.overwrite(name)` are both
self-documenting, and better alternatives than, say, `foo.map_span(|_|
span)` and `foo.map_symbol(|_| name)`; the latter are perfectly clear in
what they do, but lack a layer of abstraction, and are verbose. But the
clarity of the _new_ form does rely on either good naming conventions of
arguments, or explicit type annotations using turbofish notation if
necessary.
This will be implemented on core Rust types as appropriate and as
possible. At the time of writing, we do not yet have trait specialization,
and there's too many soundness issues for me to be comfortable enabling it,
so that limits that we can do with something like, say, a generic `Result`,
while also allowing for specialized implementations based on newtypes.
DEV-13160
Admittedly, there are _my_ debugging conventions. But I'm also the only one
working on this project right now.
I want to keep various things around without cluttering untracked file
output, because finding new files can be annoying in all the output.
Really, with a C background, I should have known that `write` may not write
all bytes, and I'm pretty sure I was aware, so I'm not sure how that slipped
my mind for every call. But it's not a great default, and I do feel like
`write_all` should be the deafult behavior, despite the syscall and C
library name.
It shouldn't take clippy to warn about something so significant.
This uses `ObjectIndex` to automatically narrow the type to what is
expected.
Given that `ObjectIndex` is supposed to mean that there must be an object
with that index, perhaps the next step is to remove the `Option` from `get`
as well.
DEV-13160
This makes the system a bit more ergonomic and introduces additional type
safety by associating the narrowed object type with the
`ObjectIndex` (previously `ObjectRef`). Not only does this allow us to
explicitly state the type of object wherever those indices are stored, but
it also allows the API to automatically narrow to that type when operating
on it again without the caller having to worry about it.
DEV-13160
This begins to place expressions on the graph---something that I've been
thinking about for a couple of years now, so it's interesting to finally be
doing it.
This is going to evolve; I want to get some things committed so that it's
clear how I'm moving forward. The ASG makes things a bit awkward for a
number of reasons:
1. I'm dealing with older code where I had a different model of doing
things;
2. It's mutable, rather than the mostly-functional lowering pipeline;
3. We're dealing with an aggregate ever-evolving blob of data (the graph)
rather than a stream of tokens; and
4. We don't have as many type guarantees.
I've shown with the lowering pipeline that I'm able to take a mutable
reference and convert it into something that's both functional and
performant, where I remove it from its container (an `Option`), create a new
version of it, and place it back. Rust is able to optimize away the memcpys
and such and just directly manipulate the underlying value, which is often a
register with all of the inlining.
_But_ this is a different scenario now. The lowering pipeline has a narrow
context. The graph has to keep hitting memory. So we'll see how this
goes. But it's most important to get this working and measure how it
performs; I'm not trying to prematurely optimize. My attempts right now are
for the way that I wish to develop.
Speaking to #4 above, it also sucks that I'm not able to type the
relationships between nodes on the graph. Rather, it's not that I _can't_,
but a project to created a typed graph library is beyond the scope of this
work and would take far too much time. I'll leave that to a personal,
non-work project. Instead, I'm going to have to narrow the type any time
the graph is accessed. And while that sucks, I'm going to do my best to
encapsulate those details to make it as seamless as possible API-wise. The
performance hit of performing the narrowing I'm hoping will be very small
relative to all the business logic going on (a single cache miss is bound to
be far more expensive than many narrowings which are just integer
comparisons and branching)...but we'll see. Introducing branching sucks,
but branch prediction is pretty damn good in modern CPUs.
DEV-13160
This will be used for expression start and end spans to merge into a span
that represents the entirety of the expression; see future commits for its
use.
Though, this has been generalized further than that to ensure that it makes
sense in any use case, to avoid potential pitfalls.
DEV-13160
This adds a line of padding between the last line of a source marking and
the first line of a footer, making it easier to read. This also matches the
behavior of Rust's error message.
This is something I intended to do previously, but didn't have the
time. Not that I do now, but now that we'll be showing some more robust
diagnostics to users, it ought to look decent.
DEV-13430