Oh boy. What a mess of a change.
This demonstrates some significant issues we have with Symbol. I had
originally modelled the system a bit after Rustc's, but deviated in certain
regards:
1. This has a configurable base type to enable better packing without bit
twiddling and potentially unsafe tricks I'd rather avoid unless
necessary; and
2. The lifetime is not static, and there is no global, singleton interner;
and
3. I pass around references to a Symbol rather than passing around an
index into an interner.
For #3---this is done because there's no singleton interner and therefore
resolving a symbol requires a direct reference to an available interner. It
also wasn't clear to me (and still isn't, in fact) whether more than one
interner may be used for different contexts.
But, that doesn't preclude removing lifetimes and just passing around
indexes; in fact, I plan to do this in the frontend where the parser and
such will have direct interner access and can therefore just look up based
on a symbol index. We could reserve references for situations where
exposing an interner would be undesirable.
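
As a minimal sketch of the index-based approach described above (not the
actual implementation), the following assumes a hypothetical non-global
`Interner` that hands out small `Copy` indexes with no lifetime. The names
`Interner`, `intern`, and `resolve` are illustrative only.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;
use std::num::NonZeroU32;

/// Hypothetical index-based symbol: a small `Copy` handle with no lifetime.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SymbolIndex(NonZeroU32);

/// Hypothetical non-global interner; callers must hold a reference to it
/// in order to resolve an index back into a string.
#[derive(Default)]
pub struct Interner {
    map: HashMap<String, SymbolIndex>,
    strings: Vec<String>,
}

impl Interner {
    pub fn new() -> Self {
        Self::default()
    }

    /// Intern `s`, returning the existing index if it was seen before.
    pub fn intern(&mut self, s: &str) -> SymbolIndex {
        if let Some(&idx) = self.map.get(s) {
            return idx;
        }
        self.strings.push(s.to_owned());
        // Indexes start at 1 so that 0 stays free as the `NonZero` niche.
        let n = u32::try_from(self.strings.len()).expect("too many symbols for u32");
        let idx = SymbolIndex(NonZeroU32::new(n).expect("length is at least 1 after push"));
        self.map.insert(s.to_owned(), idx);
        idx
    }

    /// Resolve an index; `None` if it is out of range for this interner.
    pub fn resolve(&self, idx: SymbolIndex) -> Option<&str> {
        self.strings.get(idx.0.get() as usize - 1).map(String::as_str)
    }
}
```

With this shape, anything holding the interner (such as a parser) can pass
indexes around freely and call `resolve` only when the string is needed.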
Anyway, more to come...
As mentioned in the previous commit, this flips the types such that the base
type is the primitive and the associated type is the `NonZero*` type; this
is much more natural, concise, and allows Rust to infer the proper type in
most every situation.
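
A rough sketch of what the flipped arrangement could look like follows. Only
the names `SupportedSymbolIndex`, `SymbolIndex`, and `from_int` come from
these messages; the associated type name, the `to_non_zero` helper, and the
`Option` return type are assumptions made for illustration.

```rust
use std::num::{NonZeroU16, NonZeroU32};

/// Implemented on the primitive; the `NonZero*` counterpart is the
/// associated type. `NonZero` and `to_non_zero` are hypothetical names.
pub trait SupportedSymbolIndex: Copy {
    type NonZero: Copy + PartialEq + std::fmt::Debug;

    /// Returns `None` for zero, which is reserved as the niche.
    fn to_non_zero(self) -> Option<Self::NonZero>;
}

impl SupportedSymbolIndex for u16 {
    type NonZero = NonZeroU16;

    fn to_non_zero(self) -> Option<NonZeroU16> {
        NonZeroU16::new(self)
    }
}

impl SupportedSymbolIndex for u32 {
    type NonZero = NonZeroU32;

    fn to_non_zero(self) -> Option<NonZeroU32> {
        NonZeroU32::new(self)
    }
}

/// Parameterized by the primitive, but stores the `NonZero*` value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SymbolIndex<Ix: SupportedSymbolIndex>(Ix::NonZero);

impl<Ix: SupportedSymbolIndex> SymbolIndex<Ix> {
    /// Assumption: this sketch returns `None` for zero rather than panicking.
    pub fn from_int(n: Ix) -> Option<Self> {
        n.to_non_zero().map(SymbolIndex)
    }
}
```

Because the type parameter is the primitive, a call such as
`SymbolIndex::from_int(5u32)` lets Rust infer `Ix = u32` from the argument.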
The next step will be to stop defaulting the index type for SymbolIndex and
related, since we are about to care very much what size it is (compiler
vs. linker).
This was previously a NonZeroU32, but it was intended to support NonZeroU16
as well for packages, so that we can fit symbols into smaller spaces. In
particular, the upcoming Span wants to fit within 8 bytes, and so requires a
smaller SymbolIndex type.
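
To illustrate why the index width matters for an 8-byte span, here is a
sketch with an entirely hypothetical field layout (the real `Span` fields
are not described here). It only shows how a 16-bit index keeps the struct
within 8 bytes while a 32-bit one does not.

```rust
use std::mem::size_of;
use std::num::{NonZeroU16, NonZeroU32};

/// Hypothetical layout: 4 + 2 + 2 bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub offset: u32,
    pub len: u16,
    pub context: NonZeroU16,
}

/// Same hypothetical layout with a 32-bit index: 4 + 2 + 4 bytes, which
/// is padded up to a multiple of the 4-byte alignment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WideSpan {
    pub offset: u32,
    pub len: u16,
    pub context: NonZeroU32,
}

// Compile-time checks: the narrow span fits in 8 bytes, and the `NonZero`
// niche means wrapping it in `Option` costs nothing extra.
const _: () = assert!(size_of::<Span>() == 8);
const _: () = assert!(size_of::<Option<Span>>() == 8);
```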
I'm unhappy with this current implementation, and so comments are unfinished
and there are a couple ignores for dead code warnings. I want to flip the
`SupportedSymbolIndex` trait so that users can specify the primitive rather
than the NonZero* type, which is really awkward-looking and verbose,
especially if you have to do `SymbolIndex::<NonZeroU32>::from_int` or
something. It also prevents (at least in the cases I've observed) Rust from
inferring the proper type for you based on the argument you provide.
So, the goal will be `SymbolIndex::<u32>::from_int(n)`, for example.
The first step in the process is to emit the raw XML events that can then be
immediately output again to echo the results into another file. This will
then allow us to begin parsing the input incrementally, and begin to morph
the output into a real `xmlo` file.
This introduces the beginnings of frontends for TAMER, gated behind a
`wip-features` flag.
This will be introduced in stages:
1. Replace the existing copy with a parser-based copy (echo back out the
tokens), when the flag is on.
2. Begin to parse portions of the source, augmenting the output xmlo (xmli
at the moment). The XSLT-based compiler will be modified to skip
compilation steps as necessary.
As portions of the compilation are implemented in TAMER, they'll be placed
behind their own feature flags and stabilized, which will incrementally
remove the compilation steps from the XSLT-based system. The result should
be substantial incremental performance improvements.
Short-term, the priorities for loading identifiers into an IR
are (though the order may change):
1. Echo
2. Imports
3. Extern declarations.
4. Simple identifiers (e.g. param, const, template, etc).
5. Classifications.
6. Documentation expressions.
7. Calculation expressions.
8. Template applications.
9. Template definitions.
10. Inline templates.
After each of those is done, the resulting xmlo (xmli) will have fully
reconstructed the source document from the IR produced during parsing.
This was incorrect to begin with---it does not make sense that an input
mapping should depend upon the identifier that it maps to, in the sense that
we make use of these dependencies. If we add weak symbol references in the
future, then this can be reintroduced.
By removing this, we free tameld from having to perform the check itself.
.rev-xmlo bumped to force rebuilding of object files since the linker now
expects that no such dependencies will exist within them.
This is something that changed when the TAMER POC was initially created, as
I was learning Rust. I don't recall the original reason why this was moved,
but it could have been moved back long ago.
In our systems, constants can hold tables (as matrices) with tens or
hundreds of thousands of rows, and there are a number of them in certain
projects. As an example, the YAML-based test cases for one of our systems
went from ~2m30s to ~45s after this change was made. Much of the savings
comes from reduced GC work.
This can occur in generated code (e.g. from proguic if a question-based
predicate inherits a predicate already specified). This commit does not
change anything that's emitted; it merely allows proceeding.
TAMER can be smarter about this; I don't want to invest more time into
generalizing deduplication of predicates.
There was a bug whereby TRUE matches would keep whatever value was being
matched on, even if it was not a boolean. That was an oversight from the
proof-of-concept code, and this fixes it; that's why this is behind a flag!
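
The difference can be sketched as follows. This is only an illustration of
the behavior described above, not the compiler's emitted code; it assumes
that `TRUE` is the numeric constant 1 and that a match against it is an
equality test. Both function names are hypothetical.

```rust
/// Assumption for this sketch: `TRUE` is the numeric constant 1.
const TRUE: f64 = 1.0;

/// Behavior described as buggy: the matched value passes through
/// unchanged, so a non-boolean input yields a non-boolean result.
pub fn match_true_legacy(value: f64) -> f64 {
    value
}

/// Fixed behavior: the result is always boolean (0 or 1).
pub fn match_true(value: f64) -> f64 {
    if value == TRUE {
        1.0
    } else {
        0.0
    }
}
```

For an input of 5, the legacy form would yield 5 while the fixed form
yields 0, which is the kind of difference that justifies the flag.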
This also adjusts the class aliasing optimization so that it doesn't check
for a `TRUE` symbol name, which was a bad idea to begin with.
This change also ends up expanding `lv:match[@value="TRUE"]` into the long
form, where it didn't previously; this will result in slightly larger xmlo
files in some cases, but it's nothing significant, and it does not impact
compilation times.
This is a nearly-10-year-old bug that was introduced when the Summary Page
was modified to use the then-new symbol table. The compiler previously
concatenated all packages into a single XML tree and processed that, so no
package resolution was necessary here before.
A long time ago (about a decade), package names were required, but they are
now generated by the compiler relative to the root path. The name here was
incorrect, which was generating an incorrect path for the linked symbols,
which was causing problems with the Summary Page.
See RELEASES.md for a list of changes.
This was a significant effort that began about six months ago, but was
paused at a number of points. Rather than risking further pauses from
interruptions, the new classification system has been gated behind a
package-level feature flag, since it causes BC breaks in certain buggy
situations.
Since this flag was introduced late, there is the potential that it causes
bugs when new optimizations are mixed with the old system.
This largely reintroduces the legacy classification system, but there are a
number of things that are not affected by the flag. For example:
1. Alias classifications are still optimized when the flag is off;
2. Classifications without predicates emit slightly different code than
before, though their functionality has not changed;
3. There's been a lot of refactoring and minor optimizations that are
unaffected by the flag;
4. lv:match/@pattern will now emit a warning; and
5. Cleaning and casting of input data is not gated.
This allows us to incrementally migrate to the new system where behavior may
be different, but this is admittedly a bit dangerous: the new system was
aggressively tested and reasoned about as a whole, so mixing the legacy
system back in may combine with it in unexpected ways.
This is another significant milestone.
The next logical step with classification optimization is to inline all of
those intermediate classifications generated from any and all blocks, since
there are so many of them. This means having the parent classification
absorb all dependencies; not outputting dependencies for the classification;
not compiling the assignments for those classifications; and inlining them
at the match site. They’re used only once, since they’re generated for each
individual block.
We need to keep the actual classification generation around (and just inline
them) for now, probably until TAMER, because we depend upon their symbol for
determining their dimensionality, which we need for the optimization work we
just did---we must inline them into the proper group (matrix, vector, or
scalar).
The optimization work done up to this point had inlining in mind---only a
little bit of work was needed to make sure that every classification can
simply be stripped of its assignment and be a valid expression that can be
inlined in place of the original reference.
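
The core idea of replacing a reference with the referenced body can be
sketched on a toy expression tree. Everything here (the `Expr` type, the
`inline` function, the `generated` map) is hypothetical; the sketch ignores
dimensionality grouping and assumes the generated classifications do not
reference each other cyclically.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A leaf predicate, identified here only by name.
    Match(String),
    /// A reference to another classification by name.
    Ref(String),
    All(Vec<Expr>),
    Any(Vec<Expr>),
}

/// Replace every reference to a name found in `generated` with that
/// classification's body, recursively. Other references are kept as-is.
pub fn inline(expr: &Expr, generated: &HashMap<String, Expr>) -> Expr {
    match expr {
        Expr::Ref(name) => match generated.get(name) {
            Some(body) => inline(body, generated),
            None => expr.clone(),
        },
        Expr::All(xs) => Expr::All(xs.iter().map(|x| inline(x, generated)).collect()),
        Expr::Any(xs) => Expr::Any(xs.iter().map(|x| inline(x, generated)).collect()),
        Expr::Match(_) => expr.clone(),
    }
}
```

Since each generated classification is referenced once, inlining it does
not duplicate work; the intermediate symbol simply stops being needed at
the match site.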
The result of that was predictably significant for the `ui/package` program
that I've been testing with:
- 4,514 classifications were inlined;
- The file size dropped to 7.5MiB (from 8.2MiB previously---remember that
we started at 16MiB); and
- GC ticks were cut in half, from 67->31.
Unfortunately, this optimization added nearly 1m of time to the compilation
of that program. Speaking from the future: the UI build optimizations in
liza-proguic were introduced to offset this difference (and provide a net
gain in performance).
This converts disjunctive classifications into conjunctive and places an
<any> within it.
This ends up handling all the generated qwhen classifications from proguic,
which were probably converted into <any> by a previous optimization pass.
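
The rewrite itself is small. As a sketch on a hypothetical data model (the
`Classification` and `Pred` types and the `disjunctive` field are
assumptions, not the compiler's representation):

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Pred {
    Match(String),
    Any(Vec<Pred>),
    All(Vec<Pred>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Classification {
    /// `true` if the top-level predicates are OR'd together.
    pub disjunctive: bool,
    pub preds: Vec<Pred>,
}

/// Turn a disjunctive classification into a conjunctive one whose only
/// predicate is an `Any` holding the original predicates.
pub fn to_conjunctive(c: Classification) -> Classification {
    if !c.disjunctive {
        return c;
    }
    Classification {
        disjunctive: false,
        preds: vec![Pred::Any(c.preds)],
    }
}
```

After this step, later passes only need to handle the conjunctive form.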
The UI program I've been using to test these compiler optimizations has
decreased in size down to 8.2MiB since the beginning of this branch; we
started at ~16MiB.
See comments. This is meant to help mitigate the damage done by one of our
code generation systems. The benefit is significant, allowing the code
generator to remain simple. By placing this optimization within the
compiler, hand-written and template-generated code also benefit.
Rather than extracting every any/all into their own classifications,
eliminate them (and replace them with their body) if they contain only one
predicate. This is most likely to happen after template expansion, and
there were an alarming number of them in our system.
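
A sketch of this elimination on a hypothetical predicate tree follows (the
`Pred` type and function names are illustrative, not taken from the
compiler). A group with exactly one child is replaced by that child, working
from the innermost groups outward.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Pred {
    Match(String),
    Any(Vec<Pred>),
    All(Vec<Pred>),
}

/// Replace any `Any`/`All` that contains a single predicate with that
/// predicate; groups with zero or several predicates are kept.
pub fn collapse(pred: Pred) -> Pred {
    match pred {
        Pred::Any(xs) => collapse_group(xs, Pred::Any),
        Pred::All(xs) => collapse_group(xs, Pred::All),
        leaf => leaf,
    }
}

fn collapse_group(xs: Vec<Pred>, rebuild: fn(Vec<Pred>) -> Pred) -> Pred {
    // Collapse children first so that nested single-element groups
    // disappear in one pass.
    let mut xs: Vec<Pred> = xs.into_iter().map(collapse).collect();
    if xs.len() == 1 {
        xs.pop().expect("length checked above")
    } else {
        rebuild(xs)
    }
}
```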
Stripping them out of one of our programs saved ~0.2MiB of output, and
removed many intermediate classifications. It removed ~1,075 lines, which
should correspond closely to the actual number of classifications.
Discovering this required stripping the template barriers, which was done in
a previous commit.
Unfortunately, the performance improvement from this wasn't significant,
largely because of the nondeterminism of GC, which can easily mask the
gains. But a new line `v8::internal::FixedArray::set(int,
v8::internal::Object)` appeared in the profiler output, making me wonder
whether the JIT is starting to understand more interesting properties of the
system.
`mprotect` and `v8::internal::heap_internals::GenerationalBarrier` also
appeared, which are related to GC.