This has always been a lowering operation, but it was not phrased in terms
of it, which made the process a bit more confusing to understand.
The implementation hasn't changed, but this is an incremental refactoring
and so exposes BaseAsg and its `graph` field temporarily.
DEV-10859
Sections, as written, are specific to xmle files.
I think the intent originally was to have this be more generic, but that
doesn't really make sense.
By explicitly coupling it with `xmle` files, that will allow us to turn this
into a proper lowering operation with its own validations that will allow
`xmle::xir` to do its job without having to validate anything itself.
This outputs enough information to be a little bit useful in the event of an
error. In the future, we'll want to provide a (likely non-Display)
implementation that provides line number and source file context with
the problem characters indicated, like Rust.
This is a significant departure from my original plans---this makes it
_easy_ to display symbol values, despite me not wanting that to occur unless
absolutely necessary.
The reality is, based on the design of the system, they will only occur in
these situations:
1. Writing to files;
2. Displaying errors;
3. Tests; or
4. People not following the design of the system.
The fourth one is the most risky as people begin to contribute in the
future, but the reality is that those can be fixed as they are encountered,
since if they're not showing up in a profiler, then they must not be causing
much of a problem.
This removes `SymbolStr` in favor of, simply, `&'static str`.
The abstraction provided no additional safety since the slice was trivially
extracted (and commonly, in practice), and was inconvenient to work with.
This is part of a process of relaxing lookups so that symbols can be
conveniently displayed in errors; rather than trying to prevent the
developer from doing something bad, we'll just rely on conventions, hope
that it doesn't happen, and if it does, address it either at that time or
when it shows up in the profiler.
The docs still need to be improved, but they can be touched as we go.
This concludes the initial development of XIR. That was much more involved
that I had originally intended, but the result is good.
DEV-10561
This generalizes it a bit and provides tests, which was always the intent;
the existing code was POC to determine if this could be done without
performance degradation (see that commit for more information).
The intent is to support the composition and decomposition of spans such
that (A, B) is as documented here. This only performs the trivial case for
the sake of providing a convenient API when the developer would otherwise
just type (S, S).
This is intended to represent the sections written to the final xmle file,
and there was unnecessary complexity in separating everything.
By reducing this IR further, we can begin to constrain its types to
eliminate some of the runtime panics and error checking we have/had in the
writer.
The new writer has reached parity of the old, with the exception of some
edge case explicit error handling that should never occur (which will be
added), and cleanup/docs.
Removing this flag now allows me to perform that cleanup without having to
worry about updating the now-old implementation.
I ran `tameld` with the new writer against our production system with
numerous programs and a significant number of test cases, and diff'd the old
and new xmle files, and everything looks good.
This is a significant milestone, in the sense that it is the culmination of
the past month or so of work to prove that an Iterator-based XIR will be
viable for the system.
This barely had any impact on the performance from the previous commit
reporting the profiling. This performs at least as well as the quick-xml
based writer. In isolated benchmarks, it performs better, but in the real
world, the linker spends most of its time reading xmlo files, and so minor
differences in writing do not have a significant overall impact.
With that said, a lot of cleanup and documentation is still needed. That is
the subject of the upcoming commits, before this writer can finalized.
The previous iterators had to be used in a certain order because they mixed
concerns, out of concern for performance. This attempts to chain even more
iterators to see how it may perform.
To be clear: this will be cleaned up. This was just an experiment.
Here were profiles on the average of 50 runs of linking our largest program:
Baseline, pre-XIR (with fragments removed from output) 0.8082
XIR writer, pre-ElemWrap, no #[inline] 0.7844s
XIR writer, ElemWrap, no #[inline] 0.7918s
XIR writer, ElemWrap, inlines in obj::xmle::xir 0.7892s
XIR writer, ElemWrap, inlines in obj::xmle::xir and ir::asg::section 0.7858s
XIR writer, ElemWrap, inline in only ir::asg::section 0.781s
Pre-ElemWrap, inlines in ir::asg::section 0.7772s
These profiles are difficult, because they hit the filesystem so much. I
write to /dev/null, but it reads 100s of xmlo files from disk.
It's clear that the impact is fairly modest and within a margin of error; as
such, I will continue down the path of writing code that's easier to grok
and maintain, since not doing so would be a micro-optimization relative to
the concerns of the rest of the system at this point.
But the purpose of all of this work was to determine whether an
iterator-based XIR would be viable. It seems to be competitive. I'll
finish up the writer reimplementation and move on.
This contains some awkward coupling for opening and closing tags to reduce
the complexity of the `Iterator` types that must be manually
specified. That may be addressed shortly.
This was creating a heap-allocated `Vec` for each map symbol despite not
actually needing it. We do have multiple froms for return map values.
But by the time we may want this type of thing, we'll have a different IR
for it anyway.
See the docs for a much deeper discussion. In summary: traits do not
support static methods, and this is the workaround, which relies on unstable
nightly constant function features.
This implementation is tested using `qname_const!`, and will be utilized
with a new static type in a following commit.
This is to support two things:
1. Early switch to 2021 Edition, which is stable Oct 21; and
2. To make use of unstable const features.
The rationale is that switching to nightly does not really have any
significant downside for us, given that TAMER is used only by us and
the only risk is that unstable features may change a bit, which can be
mitigated with certain precautions.
The rationale for each unstable feature will be documented as they are used,
including documentation on what would be required to remove it and what
functionality would be lost / need to change in doing so.
This is far from fully documented; it's just a start. I'll document fully
once the implementation is done, to ensure I don't waste time documenting
things that may change.
These are getting large and messy.
And I now notice that I never completed the header test after
prototyping. Shame on me.
Also, errata from the previous commit message: the diffs are identical
_except for attribute escaping_ that is unnecessary; we're outputting data
read directly from existing XML files (output by Saxon), so characters are
already escaped as needed.
DEV-10561
The `l:dep` section of the `xmle` file, after formatting (since XIR writes
without newlines and indentation), is now identical to the existing xmle
writer. I can now move on to the other sections.
Note that the attribute movement in this commit is simply to get the diff to
properly align. Once the current xmle writer is removed, I'll organize them
a bit more sensibly.
`obj::xmle::xir` also needs documentation, now that it's shown to be viable.
The new xmle writer was having to intern before write, which did not make
sense.
This continues with consistently using symbols throughout the system, and
is a smaller size than `String` as a bonus.
`IdentKind` needs to be written to `xmle` files and displayed in error
messages. String slices were used when quick-xml was used for writing,
which will be going away with the new writer.
This has been a long time coming, and has been repeatedly stashed as other
parts of the system have evolved to support it. The introduction of the XIR
tree was to write tests for this (which are sloppy atm).
This currently writes out the `xmle` header and _most_ of the `l:dep`
section; it's missing the object-type-specific attributes. There is,
relatively speaking, not much more work to do here.
The feature flag `wip-xir-xmle-writer` was introduced to toggle this system
in place of `XmleWriter`. Initial benchmarks show that it will be
competitive with the quick-xml-based writer, but remember that is not the
goal: the purpose of this is to test XIR in a production system before we
continue to implement it for a frontend, and to refactor so that we do not
have multiple implementations writing XML files (once we echo the source XML
files).
I'm excited to get this done with so that I can move on. This has been
rather exhausting.
The 16-bit interner at present will be used only for span contexts. In the
future, this interner may become specialized specifically for that, but for
now let's just re-use what we already have so that I can move on.
DEV-10733
I want to make it clear in the assertion that the problem could be caused by
duplicate strings. We do not sort by string, because in part we may in the
future want to group certain symbols together in some arbitrary way so we
can compare ranges (using the markers).
If that doesn't end up happening, it may be better to just sort by string
to obviate the problem.
It's really awkward not having them caps, when not only are constants
expected to be, but also that we cannot maintain consistency between the
string and the identifier name in even the simplest of cases.
(We could use `r#`, but that's too cumbersome.)
`StaticSymbolId` was created before the more specific types, which render it
unnecessary. If we need a generic type, it can be re-introduced, but using
`static_symbol_newtypes!`.
This is the interner that is intended to be used with the majority of the
system; the 16-bit interner is left around for the moment, but will likely
later become specialized.
This had the writing on the wall all the same as the `'i` interner lifetime
that came before it. It was too much of a maintenance burden trying to
accommodate both 16-bit and 32-bit symbols generically.
There is a situation where we do still want 16-bit symbols---the
`Span`. Therefore, I have left generic support for symbol sizes, as well as
the different global interners, but `SymbolId` now defaults to 32-bit, as
does `Asg`. Further, the size parameter has been removed from the rest of
the code, with the exception of `Span`.
This cleans things up quite a bit, and is much nicer to work with. If we
want 16-bit symbols in the future for packing to increase CPU cache
performance, we can handle that situation then in that specific case; it's a
premature optimization that's not at all worth the effort here.
We'll see how the syntax evolves over time. It's not ideal to have to
specify the type, rather than having the compiler infer it, but I don't much
feel like getting into my first procedural macro right now, so we'll stick
with this approach for the time being.
This will set the stage to be able to safely e.g. create QNames statically
at compile-time and would allow us to make any attempts to bypass it
unsafe.
Previously, we were allocating only u32 versions of `SymbolId` for the
statically allocated symbols. This introduces a new symbol type with a very
small datatype (8 bits) that is able to cast into any `SymbolId`. This is
explained in the docs.
We'll be taking this typing further in future commits so that static symbols
are better-suited for compile-time guarantees for static newtype
construction.
DEV-10710
This is the beginning of static symbols, which is becoming increasing
necessary as it's quite a pain to have to deal with interning static strings
any place they're used.
It's _more_ of a pain to do that in conjunction with newtypes (e.g. `QName`,
`AttValue`, etc) that make use of `SymbolId`; this will allow us to
construct _those_ statically as well, and additional work to support that
will be coming up.
DEV-10701
These were using GiB of memory, which is ...unnecessary.
I reduced the iteration count significantly, but it was still wasting a lot
of time and memory and needed `with_capacity` to reduce the number of copies
after reallocation.
It is not typical that a buffer would contain this much information.
This broke when I removed `SelfClose`. I used to run
`make all fmt check bench` before every push, but they take a while to run,
in part because it uses nightly and has to recompile too.
But it looks like I need to be more diligent again.
This is exactly was I said I was _not_ going to do in the previous commit,
but apparently hacking late at night had me forget the whole reason that
XIRT is being introduced now---unit tests. I'll be emitting a XIR stream
and I need to parse it for convenience in the tests.
So, here's a good start. Next will be some generalizations that are useful
for the tests as well. This is pretty bare, but accomplishes the task.
See docs for more info.
The `tree` module is getting more difficult to navigate. The tests still
remain where they were, since a bunch of concerns are mixed together. Any
tests specific only to this module will be added here.
This is implemented only for the writer, since its use case is to be able to
concatenate strings without copying during writing.
It doesn't really make sense to support this in XIR Tree, since a reader
should never produce this. But if we ever run into this (e.g. due to some
internal processing pipeline), we'll address it then; XIR Tree might have to
do copying, then, but should probably wait until encountering all fragments
before interning. That'd be a distraction right now.
This commit will make more sense once the broader context is committed, but
it's needed for lowering from `Sections` into a XIR stream.
This will also change once we pre-allocate symbols, like rustc, when the
interner is initialized.
This is my first use of the `paste` crate, which is used to generate
identifiers. So this is partly an experiment, and it seems much better than
having to write a proc macro, at least at this point in time. If this code
stays around, it'll probably be generalized further and used elsewhere, but
I'd prefer not to go this route long-term.
This moves some logic into `ElementStack` (which would be part of `Stack` if
variants were their own types), rather than peering so deeply into its
data.
This correctly retains and restores the parent stack after processing an
attribute for a child element.
This does increase the size of [`Stack`] a bit, but we can evaluate whether
it's too large at a later time. It's currently 832 bits with `Ix=u32`,
which is large, but the question is whether it matters; we'll see as we
begin to use it.
This moves most of the parsing logic into `Stack`, which rightfully owns the
stack manipulation and state transitions. `ParserState` becomes exactly
what it says it is---a management of the persistent state of the parser, and
is also responsible for digesting tokens and dispatching their data to the
proper event.
This approach has a number of benefits over the old design: it's
self-documenting, making the intent clear; and it is easier to reason about
the subset of states (for both humans and Rusts) than a large match of
transitions.
This contains a number of TODO items that will be addressed shortly. It
also obviated that the previous commit was incomplete---it doesn't persist
`pstack` for attributes on child elements! That'll be fixed too.
This modifies the tree parser to handle child elements. It's mostly
proof-of-concept code; the next commit will clean it up a bit so that it's
largely self-documenting.
This removes `SelfClose` and merges it with `Close` by making the first
parameter an `Option`. This isn't really ideal, but it really simplifies
pattern matching, especially for the next commit. I'll have more details
there.
The primary motivation was lack of stabalization for binding after `@` in
matches, e.g. `Foo(name, ele) | ele @ Element { name, .. }`. It looks like
it's ready, though; maybe next Rust release?
https://github.com/rust-lang/rust/issues/65490
I don't know if I'll revert this change after then. This seems plenty
clear, albeit more verbose.
This introduces parser errors, but does not yet support error recovery; that
problem will be discussed in a commit in the near future, after the writer
is sorted out a bit more.
DEV-10561
The idea, previously, was that parsing could begin at attributes selectively
and be parsed independently. But that's really awkward with `Tree`, since
it effectively allows orphan attributes as children of an
`Element`. Nonsense.
Instead, if we truly only want an attribute list, we can offer a function to
create a parser with an empty `Stack::BuddingElement` that can accumulate
them.
Previously, `parser_from` was a simple wrapper around `parse`; now, this
provides a more convenient API where `next` will yield the next parsed
object.
See docs for much more information and rationale.
These traits are intended to eliminate boilerplate, primarily in tests, in
situations where from/into is not expected to fail.
Given that TAMER must only panic for internal compiler errors, this should
not often be used outside of test cases. Further, there may be better
options in the future (e.g. QNames could be statically compiled rather than
trying to convert at runtime, in this case).
This begins to introduce the XIR tree. I was originally going to wait on
this until after implementing the xmle writer in terms of XIR, but writing
unit tests is too much of a pain on the stream, so now is as good of a time
as any.
This has very limited support so far; it'll be added to as time goes on.
These groups happen to correspond with the sections of the xmle file, which
suggests again that this lives in the wrong place. But I should really have
my focus elsewhere right now, so I don't know if I'll go any further right
now. I guess we'll see as the writer is reimplemented.
`SectionsIter` was introduced to remove that responsibility from xmle
writer, since that's currently being reimplemented using XIR.
The existing iterator has been renamed SectionIter{ator=>} for a more
idiomatic name for iterator structs, and now has a static type rather than
relying on dynamic dispatch. The author of that code wasn't sure how to
handle it otherwise. (Which is understandable, since we were both still
getting acquainted with Rust.) There's no notable change in performance in
my benchmarking.
This abstraction is a bit awkward, in that it's named for object file
sections, but they aren't. Further, it's coupled with the ASG via
`SortableAsg` and perhaps should be generalized into a sorting routine that
takes a function for sorting, so that `Sections` can be moved into xmle's
packages.
The return value has no meaningful side-effects at all; the write operation
failing isn't worth pointing out, since it has to be used regardless.
The normal `write` does have useful side-effects, of course.
This change was primarily intended to clean up unit tests. Since it
allocates and returns a new buffer, I do not expect this to have much use
within TAMER itself in the near future. Maybe in later tooling.
If this is abused, person from the future: add `#[cfg(test)]` to its
definition.
I decided not to do this in a previous commit because I had documented
"NodeStream" elsewhere, so I'd like it to be in the Git history to
understand its evolution.
This never was a "Node" stream beyond the initial concept phase, because it
represents tokens that aren't themselves nodes. It is intended to generate
XML nodes, but may need to accommodate non-nodes (e.g. XML declarations) in
the future.
The name originated from `Node`, which was a tree-based IR that was
initially conceived, but removed because it's not yet needed. What we need
is a streaming IR for xmle writing, and then for reading and echoing back
out XML for the new frontend.
This is a working streaming IR for XML. I want to get this committed before
I go further cleaning it up and integrating it into the xmle writer.
This is lacking detailed documentation, and the names of things may end up
changing.
Initial benchmarks do show that it has a ~2x performance improvement over
quick-xml when dealing with two attributes on a node, and I suspect that
improvement will increase with the number of attributes. We will see how it
compares in real-world benchmarks once the linker has been modified to use
it.
The goal isn't to _avoid_ quick-xml---it'll be used in the future for things
like escaping that would be a huge waste to implement ourselves. It just so
happened that quick-xml was not beneficial for these changes; indeed, its
own writer is fairly simple for the portions that were implemented here, so
there's no use in fighting with its API, particularly around attributes and
our need to explicitly control whitespace (with the intent of handling code
formatters in the future).
To put this into perspective: the reason this work is being done isn't to
refactor the linker, or to speed it up, but to generalize XML writing and
provide a suitable IR for use in the compiler. The first step of the
frontend is to essentially echo the XML token stream back out so we can
incrementally parse it and do something useful, to incrementally rewrite the
compiler in Rust.
This adds benchmarking for the memchr crate. It is used primarily by
quick-xml at the moment, but the question is whether to rely on it for
certain operations for XIR.
The benchmarking on an Intel Xeon system shows that memchr and Rust's
contains() perform very similarly on small inputs, matching against a single
character, and so Rust's built-in should be preferred in that case so that
we're using APIs that are familiar to most people.
When larger inputs are compared against, there's a greater benefit (a little
under ~2x).
When comparing against two characters, they are again very close. But look
at when we compare two characters against _multiple_ inputs:
running 24 tests
test large_str:1️⃣:memchr_early_match ... bench: 4,938 ns/iter (+/- 124)
test large_str:1️⃣:memchr_late_match ... bench: 81,807 ns/iter (+/- 1,153)
test large_str:1️⃣:memchr_non_match ... bench: 82,074 ns/iter (+/- 1,062)
test large_str:1️⃣:rust_contains_one_byte_early_match ... bench: 9,425 ns/iter (+/- 167)
test large_str:1️⃣:rust_contains_one_byte_late_match ... bench: 123,685 ns/iter (+/- 3,728)
test large_str:1️⃣:rust_contains_one_byte_non_match ... bench: 123,117 ns/iter (+/- 2,200)
test large_str:1️⃣:rust_contains_one_char_early_match ... bench: 9,561 ns/iter (+/- 507)
test large_str:1️⃣:rust_contains_one_char_late_match ... bench: 123,929 ns/iter (+/- 2,377)
test large_str:1️⃣:rust_contains_one_char_non_match ... bench: 122,989 ns/iter (+/- 2,788)
test large_str:2️⃣:memchr2_early_match ... bench: 5,704 ns/iter (+/- 91)
test large_str:2️⃣:memchr2_late_match ... bench: 89,194 ns/iter (+/- 8,546)
test large_str:2️⃣:memchr2_non_match ... bench: 85,649 ns/iter (+/- 3,879)
test large_str:2️⃣:rust_contains_two_char_early_match ... bench: 66,785 ns/iter (+/- 3,385)
test large_str:2️⃣:rust_contains_two_char_late_match ... bench: 2,148,064 ns/iter (+/- 21,812)
test large_str:2️⃣:rust_contains_two_char_non_match ... bench: 2,322,082 ns/iter (+/- 22,947)
test small_str:1️⃣:memchr_mid_match ... bench: 4,737 ns/iter (+/- 842)
test small_str:1️⃣:memchr_non_match ... bench: 5,160 ns/iter (+/- 62)
test small_str:1️⃣:rust_contains_one_byte_non_match ... bench: 3,930 ns/iter (+/- 35)
test small_str:1️⃣:rust_contains_one_char_mid_match ... bench: 3,677 ns/iter (+/- 618)
test small_str:1️⃣:rust_contains_one_char_non_match ... bench: 5,415 ns/iter (+/- 221)
test small_str:2️⃣:memchr2_mid_match ... bench: 5,488 ns/iter (+/- 888)
test small_str:2️⃣:memchr2_non_match ... bench: 6,788 ns/iter (+/- 134)
test small_str:2️⃣:rust_contains_two_char_mid_match ... bench: 6,203 ns/iter (+/- 170)
test small_str:2️⃣:rust_contains_two_char_non_match ... bench: 7,853 ns/iter (+/- 713)
Yikes.
With that said, we won't be comparing against such large inputs
short-term. The larger strings (fragments) are copied verbatim, and not
compared against---but they _were_ prior to the previous commit that stopped
unencoding and re-encoding.
So: Rust built-ins for inputs that are expected to be small.
Fragments' text were unescaped on reading, producing an owned String and
spending time parsing the text to unescape. We were then copying that into
an internement pool (so, copying twice, effectively).
Further, we were then _re-escaping_ on write.
This was all wasteful, since we do not do any manipulation of the fragment
before outputting to the xmle file; we know that Saxon produced properly
escaped XML to begin with, and can trust to propagate it.
This also introduces a new global `clone_uninterned_utf8_unchecked` method.
In profiling this change, I tested (a) before this change, (b) after writing
without escaping, and (c) after both reading escaped and writing without
escaping.
(a) (b) (c)
sec mem (B) sec B sec B
0:00.95 47896 -> 0:00.91 47988 -> 0:00.87 48288
0:00.40 30176 -> 0:00.37 25656 -> 0:00.36 25788
0:00.39 45672 -> 0:00.37 45756 -> 0:00.35 34952
0:00.39 20716 -> 0:00.38 19604 -> 0:00.36 19956
0:00.33 16836 -> 0:00.32 16988 -> 0:00.31 16892
0:00.23 15268 -> 0:00.23 15236 -> 0:00.22 15312
0:00.44 20780 -> 0:00.44 20048 -> 0:00.41 20148
0:00.54 44516 -> 0:00.50 36964 -> 0:00.49 36728
0:00.62 55976 -> 0:00.57 46204 -> 0:00.54 41468
0:00.31 28016 -> 0:00.30 27308 -> 0:00.28 23844
0:00.23 15388 -> 0:00.22 15316 -> 0:00.21 15304
0:00.05 4888 -> 0:00.05 4760 -> 0:00.05 4948
0:00.41 19756 -> 0:00.41 19852 -> 0:00.40 19992
0:00.47 20828 -> 0:00.46 20844 -> 0:00.44 20968
0:00.27 18152 -> 0:00.26 18184 -> 0:00.25 18312
Interestingly, the peak memory usage increases very slightly between the
second and third steps (though decreases from the first), likely because the
raw (encoded) is larger than the unencoded text (e.g. `>` takes more
space than `>`).
Fragments were previously represented by `String` to avoid the cost of
interning (hashing and copying). This change modifies it to use uninterned
symbols, which does still have a copy overhead but it does not hash.
Initial tests shows a small performance decrease of about 15% and a small
memory increase of similar proportion. However, once I realized that I was
not clearing buffers from quick_xml events and implemented that change in a
previous commit, this change ended up being approximately on par with
`String`, despite the copying of some pretty large fragments.
YMMV, though, and perhaps on less powerful systems time may increase
slightly.
The upcoming XIR (XML IR) was originally going to support both owned strings
and symbols, but now we'll just use uninterned symbols; I can't rationalize
complicating the API at this time when it will provide an almost
imperceivable performance benefit. If ever that changes in the future,
that change will be entertained.
The end result is that the fate of a fragment's underlying memory is
determined by whatever is processing the data, _not_ by the API itself---the
API was previously forcing use of a String, whereas now it's up to the
caller to determine whether we want comparable interns. For fragments,
that's not likely ever to be the case, especially considering that the
representation will change so drastically in the future.
This clears the buffers used by quick_xml, which was apparently forgotten
during initial development (I think I expected it to re-use the previously
allocated space automatically).
This has significant effects in some cases. For example, one of our UI
builds drops from ~9KiB to ~5KiB peak memory usage. Other builds for larger
suppliers are only slightly effected because of some of their massive
fragments.
This adds support for uninterned symbols. This came about as I was creating
Xir (not yet committed) where I had to decide if I wanted `SymbolId` for all
values, even though some values (e.g. large text blocks like compiled code
fragments for xmle files) will never be compared, and so would be wastefull
hashed.
Previous IRs used `String`, but that was clumsy; see documentation in this
commit for rationale.
This is an initial implementation optimized for expected use
cases. Hopefully that pans out and doesn't come back to bite me.
Regarding the context: it only allows for interned paths atm, which are
strings (and so much be valid UTF-8, which is fine for us, but sucks for
something more general-purpose). I'll be curious if the context needs
extension later on, or if different contexts will be stored in IRs (e.g. to
store a template application site as well as the location of the expansion
within the template body).
SymboldIds must only be constructed by interners, otherwise we lose
confidence in the type.
This offers an associated function to construct raw SymbolIds from integers
for testing purposes.
This is a major change, and I apologize for it all being in one commit. I
had wanted to break it up, but doing so would have required a significant
amount of temporary work that was not worth doing while I'm the only one
working on this project at the moment.
This accomplishes a number of important things, now that I'm preparing to
write the first compiler frontend for TAMER:
1. `Symbol` has been removed; `SymbolId` is used in its place.
2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer.
3. Using symbols no longer requires dereferencing.
4. **Lifetimes no longer pollute the entire system! (`'i`)**
5. Two global interners are offered to produce `SymbolStr` with `'static`
lifetimes, simplfiying lifetime management and borrowing where strings
are still needed.
6. A nice API is provided for interning and lookups (e.g. "foo".intern())
which makes this look like a core feature of Rust.
Unfortunately, making this change required modifications to...virtually
everything. And that serves to emphasize why this change was needed:
_everything_ used symbols, and so there's no use in not providing globals.
I implemented this in a way that still provides for loose coupling through
Rust's trait system. Indeed, Rustc offers a global interner, and I decided
not to go that route initially because it wasn't clear to me that such a
thing was desirable. It didn't become apparent to me, in fact, until the
recent commit where I introduced `SymbolIndexSize` and saw how many things
had to be touched; the linker evolved so rapidly as I was trying to learn
Rust that I lost track of how bad it got.
Further, this shows how the design of the internment system was a bit
naive---I assumed certain requirements that never panned out. In
particular, everything using symbols stored `&'i Symbol<'i>`---that is, a
reference (usize) to an object containing an index (32-bit) and a string
slice (128-bit). So it was a reference to a pretty large value, which was
allocated in the arena alongside the interned string itself.
But, that was assuming that something would need both the symbol index _and_
a readily available string. That's not the case. In fact, it's pretty
clear that interning happens at the beginning of execution, that `SymbolId`
is all that's needed during processing (unless an error occurs; more on that
below); and it's not until _the very end_ that we need to retrieve interned
strings from the pool to write either to a file or to display to the
user. It was horribly wasteful!
So `SymbolId` solves the lifetime issue in itself for most systems, but it
still requires that an interner be available for anything that needs to
create or resolve symbols, which, as it turns out, is still a lot of
things. Therefore, I decided to implement them as thread-local static
variables, which is very similar to what Rustc does itself (Rustc's are
scoped). TAMER does not use threads, so the resulting `'static` lifetime
should be just fine for now. Eventually I'd like to implement `!Send` and
`!Sync`, though, to prevent references from escaping the thread (as noted in
the patch); I can't do that yet, since the feature has not yet been
stabalized.
In the end, this leaves us with a system that's much easier to use and
maintain; hopefully easier for newcomers to get into without having to deal
with so many complex lifetimes; and a nice API that makes it a pleasure to
work with symbols.
Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I
end up regretting that down the line, but it exists for an important reason:
the `Span` and other structures that'll be introduced need to pack a lot of
data into 64 bits so they can be freely copied around to keep lifetimes
simple without wreaking havoc in other ways, but a 32-bit symbol size needed
by the linker is too large for that. (Actually, the linker doesn't yet need
32 bits for our systems, but it's going to in the somewhat near future
unless we optimize away a bunch of symbols...but I'd really rather not have
the linker hit a limit that requires a lot of code changes to resolve).
Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid
that for now. Most systems can just use on of the `PkgSymbolId` or
`ProgSymbolId` type aliases and not have to worry about it. Systems that
are actually shared between the compiler and the linker do, though, but it's
not like we don't already have a bunch of trait bounds.
Of course, as we implement link-time optimizations (LTO) in the future, it's
possible most things will need the size and I'll grow frustrated with that
and possibly revisit this. We shall see.
Anyway, this was exhausting...and...onward to the first frontend!
Oh boy. What a mess of a change.
This demonstrates some significant issues we have with Symbol. I had
originally modelled the system a bit after Rustc's, but deviated in certain
regards:
1. This has a confurable base type to enable better packing without bit
twiddling and potentially unsafe tricks I'd rather avoid unless
necessary; and
2. The lifetime is not static, and there is no global, singleton interner;
and
3. I pass around references to a Symbol rather than passing around an
index into an interner.
For #3---this is done because there's no singleton interner and therefore
resolving a symbol requires a direct reference to an available interner. It
also wasn't clear to me (and still isn't, in fact) whether more than one
interner may be used for different contexts.
But, that doesn't preclude removing lifetimes and just passing around
indexes; in fact, I plan to do this in the frontend where the parser and
such will have direct interner access and can therefore just look up based
on a symbol index. We could reserve references for situations where
exposing an interner would be undesirable.
Anyway, more to come...
As mentioned in the previous commit, this flips the types such that the base
type if the primitive and the associated type is the `NonZero*` type; this
is much more natural, concise, and allows Rust to infer the proper type in
most every situation.
The next step will be to stop defaulting the index type for SymbolIndex and
related, since we are about to care very much what size it is (compiler
vs. linker).
This was previously a NonZeroU32, but it was intended to support NonZeroU16
as well for packages, so that we can fit symbols into smaller spaces. In
particular, the upcoming Span wants to fit within 8 bytes, and so requires a
smaller SymbolIndex type.
I'm unhappy with this current implementation, and so comments are unfinished
and there are a couple ignores for dead code warnings. I want to flip the
`SupportedSymbolIndex` trait so that users can specify the primitive rather
than the NonZero* type, which is really awkward-looking and verbose,
especially if you have to do `SymbolIndex::<NonZeroU32>::from_int` or
something. It also prevents (at least in the cases I've observed) Rust from
inferring the proper type for you based on the argument you provide.
So, the goal will be `SymbolIndex::<u32>::from_int(n)`, for example.
The first step in the process is to emit the raw XML events that can then be
immediately output again to echo the results into another file. This will
then allow us to begin parsing the input incrementally, and begin to morph
the output into a real `xmlo` file.
This introduces the beginnings of frontends for TAMER, gated behind a
`wip-features` flag.
This will be introduced in stages:
1. Replace the existing copy with a parser-based copy (echo back out the
tokens), when the flag is on.
2. Begin to parse portions of the source, augmenting the output xmlo (xmli
at the moment). The XSLT-based compiler will be modified to skip
compilation steps as necessary.
As portions of the compilation are implemented in TAMER, they'll be placed
behind their own feature flags and stabalized, which will incrementally
remove the compilation steps from the XSLT-based system. The result should
be substantial incremental performance improvements.
Short-term, the priorities are for loading identifiers into an IR
are (though the order may change):
1. Echo
2. Imports
3. Extern declarations.
4. Simple identifiers (e.g. param, const, template, etc).
5. Classifications.
6. Documentation expressions.
7. Calculation expressions.
8. Template applications.
9. Template definitions.
10. Inline templates.
After each of those are done, the resulting xmlo (xmli) will have fully
reconstructed the source document from the IR produced during parsing.
This was incorrect to begin with---it does not make sense that an input
mapping should depend upon the identifier that it maps to, in the sense that
we make use of these dependencies. If we add weak symbol references in the
future, then this can be reintroduced.
By removing this, we free tameld from having to perform the check itself.
.rev-xmlo bumped to force rebuilding of object files since the linker now
expects that no such dependencies will exist within them.
This is something that changed when the TAMER POC was initially created, as
I was learning Rust. I don't recall the original reason why this was moved,
but it could have been moved back long ago.
In our systems, constants can hold tables (as matrices) with tens or
hundreds of thousands of rows, and there are a number of them in certain
projects. As an example, the YAML-based test cases for one of our systems
went from ~2m30s to ~45s after this change was made. Much of the cost
savings comes from saving GC.
A previous commit used a rustdoc tool lint, but that support wasn't added
until 1.52.0 (2021-05-06).
Note that this represents the minimum _required_ version to build TAMER; you
can use a later version.
This checks explicitly for unresolved objects while sorting and provides an
explicit error for them. For example, this will catch externs that have no
concrete resolution.
This previously fell all the way through to the unreachable! block. The old
POC implementation was catching unresolved objects, albeit with a debug
error.
This will be used for the next commit, but this change has been isolated
both because it distracts from the implementation change in the next commit,
and because it cleans up the code by removing the need for a type parameter
on `AsgError`.
Note that the sort test cases now use `unwrap` instead of having
`{,Sortable}AsgError` support one or the other---this is because that does
not currently happen in practice, and there is not supposed to be a
hierarchy; they are siblings (though perhaps their name may imply otherwise).
The only reason this function was a method of `BaseAsg` was because of
`self.graph`, which is accessible within the scope of this
module. `check_cycles` is logically associated with `SortableAsg`, and so
should exist alongside it (though it can't exist as an associated function
of that trait).
We want to be able to build a representation of the dependency graph so
we can easily inspect it.
We do not want to make GraphML by default. It is better to use a tool.
We use "petgraph-graphml".
This was originally omitted because there wasn't a use case for it. Now
that we're adding context to errors, however, an owned value is highly
desirable.
This adds almost no measurable overhead to the internment system in
benchmarks (largely within the margin of error).
This is a union (sum type) of three other errors types, plus errors specific
to this builder.
This commit does a good job demonstrating the boilerplate, as well as a need
for additional context (in the case of `IdentKindError`), that we'll want to
work on abstracting away.
The `Debug` bound is inconvenient and requires propagation to any types that
use it. Further, it's really awkward having `Display` depend on `Debug`; if
we want to render a useful display here, we can write one.
To be clear: IndexType implements Debug.
For now, this is pretty-printed by another part of the code, which we don't
want to implement in `Display` because it requires looking things up from
the graph.
This flips the API from using XmloWriter as the context to using Asg and
consuming anything that can produce XmloResults. This not only makes more
sense, but avoids having to create a trait for XmloReader, and simplifies
the trait bounds we have to concern ourselves with.
This just tidies things up a little bit before I get into some further
refactoring. I wrote the original code when I was just learning Rust not
too long ago, so it's interesting to see how my understanding has changed
over that relatively short period of time.
This abstracts away the canonicalizer and solves the problem whereby
canonicalization was not being performed prior to recording whether a path
has been visited. This ensures that multiple relative paths to the same
file will be properly recognized as visited.
This will be entirely replaced in an upcoming commit. See that for
details. I don't feel like dealing with the conflicts for rearranging and
squashing these commits.
This also includes an implementation to visit paths only once. Note that it
does not yet canonicalize the path before visiting, so relative paths to the
same file can slip through, and relative paths to _different_ files could be
erroneously considered to have been visited.
This will be fixed in an upcoming commit.
This serves as a constructor for the time being, decoupling from POC. We
may do something better once we have a better idea of how the various
abstractions around this will evolve.
Add a stub executable that will eventually become a full-featured TAME
compiler. The first implementation will only copy the source file to an
intermediary file that will be compiled by the XSLT compiler.
This is an awkward system that I'd like to remove at some point. It adds
complexity. For the meantime, overrides have been arbitrarily restricted to
a single override (no override-override). But it's needed being until we
rework maps and can handle the illusion of overrides using the template
system.
Benchmark performance for this method is still substantially slower. And
oddly, this nearly doubled the speed of the other two calls (granted, at
that speed, it doesn't matter).
All of these refactoring commits to arrive at this one final change: the
ability to store the source location for externs so that we can report on
what package is expecting an identifier to be defined.
Phew. Goodnight.
This undoes work I did earlier today...but now we'll be able to support a
Source on an extern.
There is duplicate code between `BaseAsg::declare{,_extern}` that will be
resolved in an upcoming commit. Upcoming commits will also simplify
terminology and clean up methods on ObjectState.
There is some duplication here with `declare` that will be cleared up in a
following commit. Reintroducing this method is necessary so that Source can
be used to represent the source location of the extern itself; it's
currently None to indicate an extern in `declare`.
This is the first step in a more incremental refactoring that previous
commits to undo the optional Source in `ObjectState::ident`. This provides
an explicit transition to an extern, with the intent of requiring an initial
missing state. This will simplify logic on the ASG.
Note that the Source provided to this new method is not yet used. That too
will come in a following commit and will represent the source of the defined
extern rather than the concrete identifier.
This properly verifies extern types, and cleans up Asg's API a little so
that externs aren't handled much differently than other declarations.
With that said, after making src optional, I realized that we will indeed
want source information for externs themselves so we can direct the user to
what package is expecting that symbol (as the old linker does). So this
approach will not work, and I'll have to undo some of those changes.
This is essential to clarify what exactly the different object types
represent with the new generic abstractions. For example, we will have
expressions as an object type.
There's a lot here to make the object stored on the `Asg` generic. This
introduces `ObjectState` for state transitions and `ObjectData` for pure
data retrieval. This will allow not only for mocking, but will be useful to
enforce compile-time restrictions on the type of objects expected by the
linker vs. the compiler (e.g. the linker will not have expressions).
This commit intentionally leaves the corresponding tests in their original
location to prove that the functionality has not changed; they'll be moved
in a future commit.
This also leaves the names as "Object" to reduce the number the cognative
overhead of this commit. It will be renamed to something like "IdentObject"
in the near future to clarify the intent of the current object type and to
open the way for expressions and a type that marries both of them in the
future.
Once all of this is done, we'll finally be able to make changes to the
compatibility logic in state transitions to implement extern compatibility
checks during resolution.
DEV-7087
The next commit will generalize this further. This moves logic out of
BaseAsg so that we can implement more sophisticated transitions for
compatability checks.
The logic is still tested as part of BaseAsg; the next commit will change
that as it's generalized further.
* tamer/src/ir/asg/base.rs: Extract object transitions.
* tamer/src/ir/asg/graph.rs (AsgError)[IncompatibleIdent]: New variant.
(From<TransitionError> for AsgError): Basic type translation.
* tamer/src/ir/asg/object.rs (TransitionResult): New type.
(impl Object): Transition methods.
(TransitionError): New enum.
This variant is unnecessary, as it was used only by the indexer to represent
the absence of a node, for which was can simply use `None` in the containing
`Option`.
* tamer/Cargo.toml: Add `lazy_static`.
* tamer/Cargo.lock: Update.
* tamer/src/ir/asg/base.rs (with_capacity): Use `None` in place of
`Some(Object::Empty)`.
* tamer/src/ir/asg/object.rs: Adjust state machine graphic.
(Empty): Remove variant.
(Missing): Remove reference to variance.
* tamer/src/lib.rs: Import `lazy_static` for test builds.
* tamer/obj/xmle/writer/writer.rs (Section::iter): Remove `Object::Empty`
from documentation.
(test::): Remove references to `Object::Missing`. `lazy_static!` used
here.
* tamer/obj/xmle/writer/xmle.rs (test::write_section_catch_missing): Replace
reference to `Object::Missing`.
This still isn't comprehensive. Further, it won't be able to be, because
we'd have to rely on Petgraph implementation details: there are potentially
many acceptable orderings for a given graph.
Create a trait that sorts a graph into `Sections` that can then be used
as an IR. The `BaseAsg` should implement the trait using what was
originally in the POC.
If we cannot set a fragment, we need to display the error to the user.
We are currently ignoring "___head", "___tail", and objects that are
both virtual and overridden. Those will be corrected in with future
changes.
We want to add an option to set the output file to the linker so we do
not need to redirect output to awk any longer.
This also adds integration tests for tameld.
This begins to introduce the ASG, backed by Petgraph. The API will continue
to evolve, and Petgraph will likely be encapsulated so that our
implementation can vary independently from it (or even remove it in the
future).
This introduces the reader for xmlo files produced by the XSLT-based
compiler. It is an initial implementation but is not complete; see future
commits.
One of the benefits of storing a reference to the interned string on the
symbol itself is that we get to get its underlying value essentially for
free.
This ordering will simplify streaming processing of xmlo files in
TAMER. Specifically, we know that symbols will have been declared by the
time dependencies are added to the graph (and so we should only be creating
edges to existing nodes); and we can halt reading as soon as the closing
fragments tag is encountered, avoiding parsing the entirety of these massive
XML files.
On one particularly large program, this cuts time down from ~0.333s to
~0.300 in the POC linker.
Contrary to what I said previously, this replaces the previous
implementation with an arena-backed internment system. The motivation for
this change was investigating how Rustc performed its string interning, and
why they chose to associate integer identifiers with symbols.
The intent was originally to use Rustc's arena allocator directly, but that
create pulled in far too many dependencies and depended on nightly
Rust. Bumpalo provides a very similar implementation to Rustc's
DroplessArena, so I went with that instead.
Rustc also relies on a global, singleton interner. I do not do that
here. Instead, the returned Symbol carries a lifetime of the underlying
arena, as well as a pointer to the interned string.
Now that this is put to rest, it's time to move on.
For strings of any notable length, Fx Hash outperforms FNV. Rustc also
moved to this hash function and noticed performance
improvements. Fortunately, as was accounted for in the design, this was a
trivial switch.
Here are some benchmarks to back up that claim:
test hash_set::fnv::with_all_new_1000 ... bench: 133,096 ns/iter (+/- 1,430)
test hash_set::fnv::with_all_new_1000_with_capacity ... bench: 82,591 ns/iter (+/- 592)
test hash_set::fnv::with_all_new_rc_str_1000_baseline ... bench: 162,073 ns/iter (+/- 1,277)
test hash_set::fnv::with_one_new_1000 ... bench: 37,334 ns/iter (+/- 256)
test hash_set::fnv::with_one_new_rc_str_1000_baseline ... bench: 18,263 ns/iter (+/- 261)
test hash_set::fx::with_all_new_1000 ... bench: 85,217 ns/iter (+/- 1,111)
test hash_set::fx::with_all_new_1000_with_capacity ... bench: 59,383 ns/iter (+/- 752)
test hash_set::fx::with_all_new_rc_str_1000_baseline ... bench: 98,802 ns/iter (+/- 1,117)
test hash_set::fx::with_one_new_1000 ... bench: 42,484 ns/iter (+/- 1,239)
test hash_set::fx::with_one_new_rc_str_1000_baseline ... bench: 15,000 ns/iter (+/- 233)
test hash_set::with_all_new_1000 ... bench: 137,645 ns/iter (+/- 1,186)
test hash_set::with_all_new_rc_str_1000_baseline ... bench: 163,129 ns/iter (+/- 1,725)
test hash_set::with_one_new_1000 ... bench: 59,051 ns/iter (+/- 1,202)
test hash_set::with_one_new_rc_str_1000_baseline ... bench: 37,986 ns/iter (+/- 771)
This will be used for generating the common tests between HashSet and
HashMap implementations.
This is my first macro in Rust. There does not seem to be a way to
concatenate identifiers (!), so I'm placing them within modules
instead. That ended up working out just fine, since then I can use a type
to provide the SUT.
This is missing two key things that I'll add shortly: a HashMap-based one
for use in the ASG for node mapping, and an entry-based system for
manipulations.
This has been a nice start for exploring various aspects of Rust
development, as well as conventions that I'd like to implement. In
particular:
- Robust documentation intended to guide people through learning the
necessary material about the compiler, as well as related work to
rationalize design decisions;
- Benchmarks;
- TDD;
- And just getting used to Rust in general.
I've beat this one to death, so I'll commit this and make smaller changes
going forward to show how easily it can evolve.
(This module was originally named `intern` but this commit and those that
follow rewrote it to `sym`.)
This is enabled by default in nightly, and is not available at all in
stable. Considering the PITA that it will be to go back and rewrite docs to
use the new format, and how important of a feature this is, we will just
make use of it now.
Given that developers should be doing TDD and therefore running this target
frequently, this has the effect of providing immediate feedback when
formatting is needed and outputting a diff. Developers will then quickly
understand what changes need to be made to avoid future issues (and can run
`cargo fmt` to fix it), at which point they'll rarely ever encounter
formatting errors.
The original purpose was to ensure pipelines fail when the formatter has not
been run.
This makes use of Petgraph for representing the dependency graph and uses a
separate data structure for both string interning and indexing by symbol
name.
This is garbage code. Do not use it. It is intentionally throwaway.
While I've researched Rust, I haven't actually _used_ it for a project, so
this is a combination of me exploring various ways of accomplishing the
problem and forcing myself to learn certain aspects of the language.
I'll likely be using petgraph, and this also currently lacks symbol
abstractions. This commit also performs far too much heap allocation
copying strings around. But it _does_ perform the topological sort.
Since this only stores the symbol name, it lacks enough information about
the symbol to perform a proper linking.