This shaves ~1m off of the total build time for our largest system. Output
is impressively slow.
Around this point in time, we have the following profile from V8's sampling
profiler:
[JavaScript]:
ticks total nonlib name
36 2.8% 10.7% LazyCompile: *anyValue [...]/ui/package.strip.new.js:31020:22
3 0.2% 0.9% LazyCompile: *m1v1u [...]/ui/package.strip.new.js:30941:19
2 0.2% 0.6% LazyCompile: *precision [...]/ui/package.strip.new.js:30934:23
1 0.1% 0.3% LazyCompile: *vu [...]/ui/package.strip.new.js:30964:16
1 0.1% 0.3% LazyCompile: *init_defaults [...]/ui/package.strip.new.js:31341:27
This allows us to easily see their shape looking at the compiled code. See
the previous commit for more of an explanation and examples. And future
commits.
This allows us to analyze the compiler runlog and determine the frequency of
certain shapes to prioritize optimization efforts.
This is a proof-of-concept. It also contains arrow functions, which do not
exist in ES5.
The notation m#v#s# refers to matrix, vector, and scalar counts of a
classification. This optimization therefore focuses on classifications with
a single vector and a single matrix.
I'd like to note that this commit message was written in retrospect, months
later, after I returned to these proof-of-concept commits to finalize
them. I'll try my best to have things make sense in a historical context
based on my notes.
The choice to focus on m1v1 was based on taking survey of the shape of
classifications in our largest rating system. m1v*, and specifically m1v1,
was the largest by far, followed by v1s1. Here's an example program used
for a UI:
$ grep -h 'internal: [svm][0-9]\+[svm][0-9]\+ ' run*.log > result
$ cut -d' ' -f2 result | sort | uniq -c | sort -rn
10056 m1v1
1788 m1v2
473 v1s1
18 v2s1
13 v1s5
8 v1s3
7 v1s2
4 v2s5
2 v4s4
2 v4s2
2 v2s8
2 v2s6
2 v1s9
2 v1s4
1 v7s7
1 v6s2
1 v5s7
1 v5s5
1 v5s4
1 v5s2
1 v4s9
1 v4s7
1 v4s3
1 v3s9
1 v3s7
1 v3s5
1 v3s2
1 v3s1
1 v33s21
1 v2s60
1 v2s4
1 v2s3
1 v2s2
1 v28s1
1 v23s8
1 v22s9
1 v1s8
1 v1s6
1 v18s24
1 v15s14
1 v14s6
1 v14s5
1 v13s7
1 v13s6
1 v12s6
1 v11s1
1 m76v7
1 m3v1
1 m1v3
1 m1374v1
The excessively large ones (like the last one) are aggregate classifications
that are generated by a template. But note the first count.
Here's another example, one of the raters:
8812 m1v1
311 v1s1
17 v2s1
14 v1s5
4 v2s5
4 v1s6
4 v11s10
3 v3s1
3 v1s8
2 v5s14
2 v4s7
2 v3s9
2 v3s5
2 v2s4
2 v1s9
2 v1s4
2 v1s2
1 v8s7
1 v7s7
1 v7s15
1 v6s4
1 v6s2
1 v6s10
1 v5s8
1 v5s7
1 v5s4
1 v5s2
1 v53s9
1 v4s9
1 v4s4
1 v4s3
1 v4s2
1 v4s11
1 v3s8
1 v3s7
1 v3s20
1 v3s2
1 v3s19
1 v3s15
1 v2s8
1 v2s60
1 v2s6
1 v2s2
1 v2s12
1 v29s20
1 v28s1
1 v23s8
1 v1s3
1 v15s23
1 v13s6
1 v13s20
1 v12s6
1 v12s10
1 v11s1
1 m1v2
1 m1s1
Given these examples, m1v1 is an easy first choice for this commit.
The general pattern for this commit and those that follow is to match on a
specific shape of classification that we're optimizing for, falling back to
the old anyValue-based system for all other cases, with the intent of
eventually removing it.
This has long been a curse, and I don't know why I didn't resolve it sooner.
This makes explicit some of the odd things that this is doing, to maintain
the previous behavior. Changing that behavior would be ideal, but ought to
be done separately and put behind a feature flag.
This reverts commit e2d9467633bb75d79dbc8fe9f8971bfa412ea59f.
BUT: it does cause more data to be returned, perhaps unnecessarily. See if
that may offset the slight increase in GC cost.
Further, we may end up getting rid of some of these generated values; check
after we do some class optimizations.
This was a waste of time; it actually reduces performance slightly and increased
GC, unintuitively enough.
Leaving commit here and reverting to keep it for reference.
When the Summary Page was _first written_ (the first part of TAME), it was
compiled in the browser---development consisted of refreshing the page,
which was familiar to how we wrote PHP at the time. No compile process.
In that situation, we couldn't have the XSLT stylesheet failing to
translate. But of course those days are long since gone, and this must be a
compile-time error.
It shouldn't ever get to this point, granted.
Single-predicate classifications matching on TRUE can be optimized into
aliases. These sometimes occur in hand-written code, but can also be
generated by templates.
A previous commit used a rustdoc tool lint, but that support wasn't added
until 1.52.0 (2021-05-06).
Note that this represents the minimum _required_ version to build TAMER; you
can use a later version.
This template prepares for the introduction of the new classification
system, which is a full rewrite that is both more performant and more
correct in its behavior. Unfortunately, the corrections will cause problems
with old code that may be relying on certain cases, particularly where
undefined values are implicitly treated as zero.
Consequently, the legacy and new systems will exist side-by-side, able to be
toggled on as desired so people can verify that behavior is correct before
we switch it on by default. This template allows switching on the system
for an entire package (if it's placed at the toplevel), or portions of a
package, though the latter should only be used in exceptional circumstances.
See the test cases in commits to follow for more information.
This package is not used today. See RELEASES.md for more information; This
is a dangerous package that never should have existed.
This also fixes the test suite.
The classification system rewrite removed the debug value collection that
previously existed. It didn't make a whole lot of sense anyway, given that
that compiler rearranges matches.
This falls back to showing the value of the @on, which should be good
enough, and is honestly better than what we had before.