This largely reintroduces the legacy classification system, but there are a
number of things that are not affected by the flag. For example:
1. Alias classifications are still optimized when the flag is off;
2. Classifications without predicates emit slightly different code than
before, though their functionality has not changed;
3. There's been a lot of refactoring and minor optimizations that are
unaffected by the flag;
4. lv:match/@pattern will now emit a warning; and
5. Cleaning and casting of input data is not gated.
This allows us to incrementally migrate to the new system where behavior may
be different, but this is admittedly a bit dangerous in that the new system
was aggressively tested and reasoned about, so reintroducing the legacy
system may combine in unexpected ways.
This is another significant milestone.
The next logical step with classification optimization is to inline all of
those intermediate classifications generated from any and all blocks, since
there are so many of them. This means having the parent classification
absorb all dependencies; not output dependencies for the classification; not
compile the assignments for those classifications; and to inline them at the
match site. They’re used only once, since they’re generated for each
individual block.
We need to keep the actual classification generation around (and just inline
them) for now, probably until TAMER, because we depend upon their symbol for
determining their dimensionality, which we need for the optimization work we
just did---we must inline them into the proper group (matrix, vector, or
scalar).
The optimization work done up to this point had inlining in mind---only a
little bit of work was needed to make sure that every classification can
simply be stripped of its assignment and be a valid expression that can be
inlined in place of the original reference.
The result of that was predictably significant for the `ui/package` program
that I've been testing with:
- 4,514 classifications were inlined;
- The file size dropped to 7.5MiB (from 8.2MiB previously---remember that
we started at 16MiB); and
- GC ticks were cut in half, from 67->31.
Unfortunately, this optimization added nearly 1m of time to the compilation
of that program. Speaking from the future: the UI build optimizations in
liza-proguic were introduced to offset this difference (and provide a net
gain in performance).
This convets disjunctive classifications into conjunctive and places an
<any> within it.
This ends up handling all the generated qwhen classifications from proguic,
which were probably converted into <any> by a previous optimization pass.
The UI program I've been using to test these compiler optimizations has
decreased in size down from 8.2MiB since the beginning of this branch; we
started at ~16MiB.
See comments. This is meant to help mitigate the damage done by one of our
code generation systems. The benefit is significant, allowing the code
generator to remain simple. By placing this optimization within the
compiler, hand-written and template-generated code also benefit.
Rather than extracting every any/all into their own classifications,
eliminate them (and replace them with their body) if they contain only one
predicate. This is most likely to happen after template expansion, and
there were an alarming number of them in our system.
Stripping them out of one of our programs saved ~0.2MiB of output, and
removed many intermediate classifications. It removed ~1,075 lines, which
should correspond closely to the actual number of classifications.
Discovering this required stripping the template barriers, which was done in
a previous commit.
Unfortunately, the performance improvement from this wasn't significantly,
largely because of the nondeterminisim of GC, which can easily mask the
gains. But a new line `v8::internal::FixedArray::set(int,
v8::internal::Object)` appeared in the profiler output, making me wonder
whether the JIT is starting to understand more interesting properties of the
system.
`mprotect` and `v8::internal::heap_internals::GenerationalBarrier` also
appeared, which are related to GC.
!!!
(Message from the future: this ends up being reintroduced and the new
classification system being placed behind a feature toggle. But it will be
eliminated eventually.)
This is a major milestone for class optimization---the old anyValue-based
system is no longer in use; the classification system has been wholly
rewritten.
The ticks in the sampling profiler are now where they should be, open to
further optimization with a much more solid foundation.
[JavaScript]:
ticks total nonlib name
5 0.6% 3.0% LazyCompile: *vu [...]/ui/package.strip.js:25191:16
5 0.6% 3.0% LazyCompile: *M [...]/ui/package.strip.js:25267:15
3 0.4% 1.8% LazyCompile: *vmu [...]/ui/package.strip.js:25144:17
3 0.4% 1.8% LazyCompile: *ve [...]/ui/package.strip.js:25204:16
2 0.2% 1.2% LazyCompile: *precision [...]/ui/package.strip.js:25137:23
2 0.2% 1.2% LazyCompile: *me [...]/ui/package.strip.js:25178:16
2 0.2% 1.2% LazyCompile: *cmatch [...]/ui/package.strip.js:25495:20
2 0.2% 1.2% LazyCompile: *ceq [...]/ui/package.strip.js:25273:17
1 0.1% 0.6% LazyCompile: *init_defaults [...]/ui/package.strip.js:25624:27
1 0.1% 0.6% LazyCompile: *MM [...]/ui/package.strip.js:25268:16
1 0.1% 0.6% LazyCompile: *E [...]/ui/package.strip.js:25239:15
1 0.1% 0.6% LazyCompile: *<anonymous> [...]/ui/package.strip.js:25184:13
1 0.1% 0.6% LazyCompile: *<anonymous> [...]/ui/package.strip.js:25171:13
Much better than the 102 ticks that anyValue was taking some time ago!
A lot of time used to be spent compiling functions as well, a lot of which
was removed by previous commits, bringing us to:
[C++]:
ticks total nonlib name
50 5.9% 30.5% node::contextify::ContextifyContext::CompileFunction(v8::FunctionCallbackInfo<v8::Value> const&)
20 2.4% 12.2% write
9 1.1% 5.5% node::native_module::NativeModuleEnv::CompileFunction(v8::FunctionCallbackInfo<v8::Value> const&)
6 0.7% 3.7% __pthread_cond_timedwait
4 0.5% 2.4% mmap
All of this work has simplified the output enough that it's obviated a slew
of other optimizations that can be done in future work, though a lot of that
may wait for TAMER, since performing them in XSLT will be difficult and not
performant; the compiler is slow enough as it is.
This shaves ~1m off of the total build time for our largest system. Output
is impressively slow.
Around this point in time, we have the following profile from V8's sampling
profiler:
[JavaScript]:
ticks total nonlib name
36 2.8% 10.7% LazyCompile: *anyValue [...]/ui/package.strip.new.js:31020:22
3 0.2% 0.9% LazyCompile: *m1v1u [...]/ui/package.strip.new.js:30941:19
2 0.2% 0.6% LazyCompile: *precision [...]/ui/package.strip.new.js:30934:23
1 0.1% 0.3% LazyCompile: *vu [...]/ui/package.strip.new.js:30964:16
1 0.1% 0.3% LazyCompile: *init_defaults [...]/ui/package.strip.new.js:31341:27
This allows us to easily see their shape looking at the compiled code. See
the previous commit for more of an explanation and examples. And future
commits.
This allows us to analyze the compiler runlog and determine the frequency of
certain shapes to prioritize optimization efforts.
This is a proof-of-concept. It also contains arrow functions, which do not
exist in ES5.
The notation m#v#s# refers to matrix, vector, and scalar counts of a
classification. This optimization therefore focuses on classifications with
a single vector and a single matrix.
I'd like to note that this commit message was written in retrospect, months
later, after I returned to these proof-of-concept commits to finalize
them. I'll try my best to have things make sense in a historical context
based on my notes.
The choice to focus on m1v1 was based on taking survey of the shape of
classifications in our largest rating system. m1v*, and specifically m1v1,
was the largest by far, followed by v1s1. Here's an example program used
for a UI:
$ grep -h 'internal: [svm][0-9]\+[svm][0-9]\+ ' run*.log > result
$ cut -d' ' -f2 result | sort | uniq -c | sort -rn
10056 m1v1
1788 m1v2
473 v1s1
18 v2s1
13 v1s5
8 v1s3
7 v1s2
4 v2s5
2 v4s4
2 v4s2
2 v2s8
2 v2s6
2 v1s9
2 v1s4
1 v7s7
1 v6s2
1 v5s7
1 v5s5
1 v5s4
1 v5s2
1 v4s9
1 v4s7
1 v4s3
1 v3s9
1 v3s7
1 v3s5
1 v3s2
1 v3s1
1 v33s21
1 v2s60
1 v2s4
1 v2s3
1 v2s2
1 v28s1
1 v23s8
1 v22s9
1 v1s8
1 v1s6
1 v18s24
1 v15s14
1 v14s6
1 v14s5
1 v13s7
1 v13s6
1 v12s6
1 v11s1
1 m76v7
1 m3v1
1 m1v3
1 m1374v1
The excessively large ones (like the last one) are aggregate classifications
that are generated by a template. But note the first count.
Here's another example, one of the raters:
8812 m1v1
311 v1s1
17 v2s1
14 v1s5
4 v2s5
4 v1s6
4 v11s10
3 v3s1
3 v1s8
2 v5s14
2 v4s7
2 v3s9
2 v3s5
2 v2s4
2 v1s9
2 v1s4
2 v1s2
1 v8s7
1 v7s7
1 v7s15
1 v6s4
1 v6s2
1 v6s10
1 v5s8
1 v5s7
1 v5s4
1 v5s2
1 v53s9
1 v4s9
1 v4s4
1 v4s3
1 v4s2
1 v4s11
1 v3s8
1 v3s7
1 v3s20
1 v3s2
1 v3s19
1 v3s15
1 v2s8
1 v2s60
1 v2s6
1 v2s2
1 v2s12
1 v29s20
1 v28s1
1 v23s8
1 v1s3
1 v15s23
1 v13s6
1 v13s20
1 v12s6
1 v12s10
1 v11s1
1 m1v2
1 m1s1
Given these examples, m1v1 is an easy first choice for this commit.
The general pattern for this commit and those that follow is to match on a
specific shape of classification that we're optimizing for, falling back to
the old anyValue-based system for all other cases, with the intent of
eventually removing it.
This has long been a curse, and I don't know why I didn't resolve it sooner.
This makes explicit some of the odd things that this is doing, to maintain
the previous behavior. Changing that behavior would be ideal, but ought to
be done separately and put behind a feature flag.
This reverts commit e2d9467633bb75d79dbc8fe9f8971bfa412ea59f.
BUT: it does cause more data to be returned, perhaps unnecessarily. See if
that may offset the slight increase in GC cost.
Further, we may end up getting rid of some of these generated values; check
after we do some class optimizations.
This was a waste of time; it actually reduces performance slightly and increased
GC, unintuitively enough.
Leaving commit here and reverting to keep it for reference.
When the Summary Page was _first written_ (the first part of TAME), it was
compiled in the browser---development consisted of refreshing the page,
which was familiar to how we wrote PHP at the time. No compile process.
In that situation, we couldn't have the XSLT stylesheet failing to
translate. But of course those days are long since gone, and this must be a
compile-time error.
It shouldn't ever get to this point, granted.
Single-predicate classifications matching on TRUE can be optimized into
aliases. These sometimes occur in hand-written code, but can also be
generated by templates.
The classification system rewrite removed the debug value collection that
previously existed. It didn't make a whole lot of sense anyway, given that
that compiler rearranges matches.
This falls back to showing the value of the @on, which should be good
enough, and is honestly better than what we had before.
This represents the old cmatch system (which is in use today, but the
classification system has since been rewritten, though it has not yet been
merged). It was my attempt over a decade ago to reason about how this
system ought to work.
I think it's fair to say that this is absolute insanity and that the new
formulation is significantly better.
The intent originally was to try to keep developers to a reasonable name
length, but generated identifiers can easily exceed this, and we further do
not support namespacing.
This can be handled at a template level instead for enforcing naming
conventions.
This had gotten quite out of date from the actual rater.xsd, which existed
outside of this repository, that is used during our build process. That was
an unintended artifact from moving files around.
That file has been removed and symlinked to this one.
First thing to note: this belong in liza-proguic, not here. But it's here
right now, so for now I'm making the change. The relationship between TAME
and proguic is awkward and will hopefully be improved upon in the near
future.
As for this actual change: step-level fragments will be concatenated such
that the imports will appear at the step level rather than the root.
I did say it was _experimental_ guided TRO.
This waits to perform the actual argument reassignment until after
processing the expressions associated with the new arguments, since they
will otherwise be replaced when their original values are still needed.
This change simply prevents failure in such situations, (e.g. on invalidated
fields in Liza). We'll worry about proper errors and correctness, which
ought to be compile-time, in TAMER.
The MathJax CDN stopped working in April 2017. I updated it to the
recommended CDN with the last version from April 2017 to ensure it works
like it used to work before the CDN stopped.
I added the checksum to ensure the content of the script.
This problem manifested when the name of the attempted classification is the
same name as another object. For example, if we have `t:match-class
name="foo"`, and `foo` is a param instead of a class, then `@yields` will
fail, and it'd fall back to matching on the param.
This is absolutely not what we want.
The error message in this context is ugly, but it does work.
Example:
!!! Unknown match @on (/lv:package/lv:classify/match): `error: unable to
determine @yields for class `scheduled_ai' (has the class been imported?)'
is unknown for classification --vis-scheduled-ai-type
This implements TCO in the XSLT compiler by requiring a human to manually
indicate when a recursive call is in tail position. This was somewhat
urgently needed to resolve stack exhaustion on large rate tables.
TAMER will do this properly by determining itself whether a call is in tail
position. Until then, this will serve as a test for this type of feature.
Replacing the existing macros with templates will allow us to now have
to deal with macros in the new compiler.
The `indexNameType` pattern needed to change to allow for variables. I
also had to remove the prefix for the `gentle-no` option of `rate`.
Create a "yield" and add backwards compatibility for the macro of the
same name. This is one of 2 macros that need to be replaced so we do not
have to worry about them with the new compiler.
All systems should be using the provided Makefile, so this shouldn't be
invoked anymore. The new linker is still considered a proof-of-concept, but
bugs have been encountered in the old one that are not worth investing the
time into fixing.
The new linker has been used in production for nearly a couple months and is
functioning properly.
This ordering will simplify streaming processing of xmlo files in
TAMER. Specifically, we know that symbols will have been declared by the
time dependencies are added to the graph (and so we should only be creating
edges to existing nodes); and we can halt reading as soon as the closing
fragments tag is encountered, avoiding parsing the entirety of these massive
XML files.
On one particularly large program, this cuts time down from ~0.333s to
~0.300 in the POC linker.
This not only reduces file size, but also has a significant performance
benefit for the UI, which is almost entirely classifications. A run for one
of our systems was reduced from 1m30s to 11s from this change.
This was used to provide additional information on the stack for debugging
the compiled code. Since this is very rarely needed, and is only needed by
someone debugging the compiler, it can be manually enabled if desired.
This also wraps it so that it'll be stripped if it is included.