Same concept as previous commits: rather than iterating over the symbol
table and scanning the tree for the matching node, iterate over the document
and look up from a symbol map: O(n²) => O(n).
This gives a respectable performance boost to compilation of certain
packages (best improving packages with many classifications or rate blocks).
* src/current/compiler/fragments.xsl (@xmlns:xs, @xmlns:map): New namespace
declarations.
(preproc:compile-fragments): Generate `preproc:fragment' nodes and match
on document rather than symbols.
[lv:package]: Generate map and tunnel it.
* src/current/compiler/js.xsl (compile)[lv:classify, lv:match]: Use
symtable-map.
(compile-class-condition)[lv:rate]: Likewise.
(compile-cmatch)[lv:rate]: Likewise.
This is the first step to improving the map. Note that this duplicates the
symbol table generation code that's used in a few other places
now---that'll be cleaned up in future commits once I have a better idea of
all the places this will be used and try to move it to a higher level.
* src/current/compiler/validate.xsl (@xmlns:xs, @xmlns:map): New namespace
definitions.
(lvv:validate)[lv:package]: Generate symbol table map. Tunnel to
templates.
[c:apply[@name], lv:classify[@as]//lv:match, lv:match[@value]]
[c:*[@name or @of], c:apply/c:arg[@name], lv:rate/lv:class]: Use it.
This only saves 1--2s on a 30s run, but I want to move into this direction,
so it'll simplify future refactoring if I just add it. Small changes like
these will accumulate, too.
* src/current/compiler/linker.xsl (l:orig-package, l:root-symtable-map): New
variables.
(l:resov-extern): Use it.
This now uses year ranges, which I'll update annually.
This also renames "R-T Specialty" to "Ryan Specialty Group". The latter is
the parent company of the former. I was originally employed under the
former when LoVullo Associates was purchased, by I now work for the parent
company.
This is a significant performance improvement for dependency
generation (which is responsible for building the dependency graph for a
package).
The previous algorithm ran in O(n²) time: it would iterate over the given
symbol table, and for _each_ symbol, do a linear scan of the entire document
to search for the corresponding source block. This resulted in explosive
depgen time for larger packages.
This makes the algorithm run in O(n) by:
- Using an XSLT 3 map for the symbol table for O(1) lookups; and
- Iterating over the _document_ a single time rather than the symbol
table, referencing the symbol table as needed (in O(1) time).
There are other parts of the system that can benefit from these same
improvements. This is important, since we need to be able to handle many
thousands of symbols efficiently.
* src/current/compiler/linker.xsl (l:depgen-sym): Recognize smybol `no-deps'
property, permitting missing dependencies. This allows us to avoid
creating nonsense nodes just to satisfy the linker, while still allowing
the linker to perform essential checks to defend against compiler bugs.
* src/current/compiler/map.xsl (lvmc:stub-symtable): Set @no-deps on
`___head' and `___tail' symbols.
(lvmc:mapsym): Set `no-deps' as appropriate on map symbols.
(preproc:depgen)[lvm:map[@from]]: Generate `preproc:sym-dep' node, which
is now expected by the depgen process.
(preproc:depgen)[lvm:map[*]]: Likewise.
(preproc:depgen)[*[@lvmc:type='retmap']//lvmm:map[@from]]: Remove
unnecessary template.
(preproc:symtable)[lvm:map[@value]]: Pass `no-deps' to `lvmc:mapsym'.
* src/current/include/depgen.xsl (preproc:depgen)[preproc:symtable]: Create
and use XSLT 3 map in place of `preproc:symtable' tree. This allows for
constant-time lookups. Provide to templates via tunnelling. Use it in
place of exiting tree references. Process source tree rather than
iterating over symbol table.
(preproc:depgen)[lv:rate, c:sum[@generates], c:product[@generates],
lv:classify, lv:function/lv:param, lv:function, lv:typedef]: Produce
`preproc:sym-dep' nodes (which was previously done while iterating
over the symbol table).
(preproc:depgen)[preproc:sym]: Remove all such processing, since we no
longer iterate over the symbol table.
(preproc:depgen)[c:value-of]: Use symtable map.
(preproc:depgen-match): Likewise.
(preproc:depgen)[lv:union]: Modify to handle changes to lv:typedef
template.
(preproc:depgen)[text()]: Remove and replace with `node()'.
* src/current/include/preproc/package.xsl (preproc:resolv-syms): Remove
logging of symbol resolution. This has a slight performace impact since
there is a lot of output.
* src/current/include/preproc/symtable.xsl
(lv:function/lv:param, c:let/c:Values/c:value): Set `no-deps'.
* src/symtable/symbols.xsl: Add documentation of `no-deps'.
(preproc:symtable)[lv:meta]: Set `no-deps'.
The term "set" is all wrong---it is actally intended to be a vector, and can
absolutely have duplicate elements (and often does).
* src/current/calc.xsd (vector): Add, recommending in place of `set'.
* src/current/compiler/js-calc.xsl (compile-calc)[c:set|c:vector]:
Add `c:vector' and provide deprecation notice for `c:set'.
* src/current/include/calc-display.xsl (c:set|c:vector): Likewise.
A better option is to pre-process all inputs, but I need a quick
fix to my stupidity. 0||""==="".
* src/current/compiler/map.xsl (lvmc:compile)[lvm:map//lvm:from[*]]: Correct oval default.
I need to revert this for now because it breaks YAML test cases. The proper
fix is a more expressive type system with dependent types that would allow
it to know the proper number of indexes to initialize relative to other
inputs. I wanted to implement this anyway to help catch iteration-related
bugs.
I'm tabling this for now, though, since I have other things that I need to
work on.
This reverts commit 4406cbe553.
This is an assumption that's existed since the Summary Page was first
devised---that all vectors have at least one value. This is because the
bucket (originating from Liza) always has at least one value in its vectors.
Of course, we still have a problem in that the Summary Page initializes
everything to have a single value by default, and that's still the
case. But this will at least allow for things _outside_ the Summary Page to
provide an empty array. I'll have to address the Summary Page separately,
and that's going to be difficult, since we don't really want to change the
behavior across the board.
* src/current/compiler/js.xsl (set_defaults): Default max index to 0 if
`length' is unavailable, rather than 1.
The previous length check existed as a really bad array check (before
Array.isArray was a thing). This has been broken since Nov 2012.
The problem manifests itself when you want an empty array. We then have:
[] => [[]] => [DEFAULT_VALUE]
* src/current/compiler/map.xsl (lvmc:compile)[lvm:map//lvm:from[*]]: Use
`Array.isArray' in place of length check.
* src/current/compiler/map.xsl:
(lvmc:gen-input-default): Add argument.
[dim]: New param, defaulting to `$sym/@dim'.
(lvmc:compile)[lvm:map//lvm:from[*]]: Provide appropriate dimension value
to `set_defaults'. Provide compile-time error if nesting of `from'
nodes exceeds what is appropriate for the symbol dimensions.
This was a bit of a nasty one. Fortunately, this was only used as a
validation, so the code that the compiler produced was still correct.
The problem was that a version of Saxon sometime between 9.5 and 9.8 added
an optimization to eliminate conditionals with no body. Consequently, the
kluge to force the variable to be evaluated was optimized away,
`lvmc:get-symbol' was never called, and no error was ever produced.
This would be best refactored, but that's not something I have time to take
up at the moment priority-wise. This should be future-proof since this
would never be a noop.
* src/current/compiler/map.xsl (lvmc:compile)[lvm:map//lvm:from[*]]: Force
evaluation of `$sym' by ensuring that the condition will not be a noop.
This is a long-standing bug, apparently. The location of this code makes it
difficult to test directly (that is in dire need of correcting), but
fortunately we have a number of tests in systems that use TAME that
indirectly test this.
The problem manifested when a matrix was already in the store, but then a
scalar or vector predicate was considered. Without making the branch that
was modified here, it modified store such that it would always yield a
vector.
* src/current/compiler/js.xsl (anyValue): Consider store dimension when
recursing.
This has a significant performance impact: processing time is cut in about
half and memory usage is reduced by more than 50%. For example, a
package that previously took 30s and 2.1GiB of memory to link now takes
14s and less than 900MiB of memory.
I had tried to perform this optimization a couple years ago but was
thwarted (I think) by the classifier markers. The previous commit did away
with those. I'm encouraged by the gains from the low-hanging fruit.
* src/current/compiler/linker.xsl
(l:process-empty, l:stack-empty): Convert from l:pstack and
l:sym-stack (respectively) to empty preproc:sym sequences.
(l:depgen-process-sym)[preproc:sym]: Append to sequence rather than
outputting new l:sym-stack tree.
Update all annotations and uses accordingly.
This is something that I thought would be useful back in the day when TAME
was in its infancy, but it is not important. Rather than having the linker
spend time trying to figure out what symbols belong in the classifier---and
rather than keeping that complexity around---this simplifies things by
making the existing `classify' method simply perform _all_ calculations, and
then yield only the classification portion of the result.
This isn't a problem in practice because, if we only desire the use of a
classifier, then we create a "supplier" that only uses classifications and
has no other dependencies. The end result is, as far as we care, the same.
* src/current/compiler/js.xsl (compiler:entry-rater)[lv:package]: Initialize
`classes' rather than invoking classifier
(compiler:entry-classifier)[lv:package]: Invoke all calculations and
return only classes to provide equivalent behavior.
(compiler:exit-classifier): Post-process classifications from calculation
results, iterating through classmap.
(compiler:classifier-yields-map)[lv:package]: Output all classifications
that are not generated. This differs slightly from the original
implementation in that it includes all non-generated classes rather than
just classes that have a non-generated `@yields'; this distinction is
important since `compiler:exit-classifier' is now using it to produce a
classification result set that doesn't contain all the generated
stuff (since it didn't before, and shouldn't now).
* src/current/compiler/linker.xsl: Update copyright year.
(l:resolv-deps)[preproc:sym[@l:mark-inclass]]: Remove template.
(l:resolv-deps)[preproc:sym...@l:mark-inclass...]: Remove template.
(l:depgen-sym): Set type of result to `element(preproc:sym)', since
`l:mark-inclass' is no longer produced.
[inclass, needs-class-mark]: Remove variables and all instances where
they are used.
(l:dep-aug)[inclass]: Remove param. Stop producing `@inclass' attribute.
(l:link-classifier)[lv:package]: Do not process any dependencies. This
can be removed entirely in the future since it now only produces static
code, which we can perhaps combine with a different block.
(l:link-rater)[lv:package]: Remove mention of `inclass' for dependencies;
all dependencies will now be compiled into this block.
This includes a SHA256 implementation which is _not_ intended for secure
cryptographic operations; see src/js/sha256.js header for more information.
* src/current/compiler/js.xsl (compiler:static): Echo src/js/sha256.js.
[map_method_uppercase, map_method_hash]: New functions.
* src/current/link.xsl: Include dslc-base.xsl.
* src/js/sha256.js: New file.
* src/current/compiler/map.xsl
(lvmc:get-method-func, lvmc:value-ref, lvmc:transformation-wrap): New
functions, partyl extracted from below.
(lvmc:compile)[lvm:map//lvm:from]: Use `lvmc:value-ref'.
[lvm:map//lvm:from/lvm:translate]: Add `[@key]' to match.
[lvm:map//lvm:transform]: New match. Ignore node entirely.
(lvmc:concat-compile): Propagate symtable to `lvmc:compile'.
This is all really confusing because this doesn't use the same import
specification as packages; maps got stuck in a partial transition. So,
let's provide some helpful errors rather than silently failing.
* src/current/compiler/map.xsl (preproc:symtable)[lvm:import]:
Error if missing `@path'. Provide more information if `@package' was
provided to help clarify.
* src/current/compiler/js.xsl (compiler:js-number): New function to
remove leading zeroes.
(compile)[lv:const]: Use it.
* src/current/compiler/js-calc.xsl (compile-calc)[c:const]: Use it.
* src/current/compiler/js.xsl (compile-class-condtion)[lv:rate]: Do not
consider @no's in predicate generation when `@preproc:gentle-no' is set.
* src/current/include/preproc/macros.xsl (preproc:macros)[lv:rate-each]: Set
`@preproc:gentle-no' on generated `lv:rate', since the generator handles
`@no' itself.
In order for the cmatch algorithm to work properly, predicates must be
re-ordered on @dim descending.
* src/current/compiler/js.xsl (compile)[lv:classify]: Order all different
dimensions, not just scalars.
Rather than producing a syntax error, provide a useful warning and simply
yield 0.
* src/current/compiler/js.xsl (compile)[lv:rate]: Warn and yield 0 when no
calculation is provided in the body.
We've never done this before (thus this bug lasting so many years), but only
because it doesn't really make much sense in practice; this was caught when
writing test cases.
* src/current/compiler/js.xsl (compile)[lv:match]: Compile `consts' instead
of `args' when referencing a constant.
Cruft left around from the symbol table refactoring long ago.
* src/current/compiler/js.xsl
(compile)[preproc:rate]: Remove template.
[preproc:class]: Remove template.
(compile-rates)[lv:package]: Remove template.
This allows the result of a rate block to be a matrix; there was previously
no way for a named value to be assigned a matrix unless it was a parameter.
This is a bit of a kluge---the compiler won't discover the proper type
information and won't perform the proper safeguards.
* src/current/calc.xsd (sum)[@dim]: Add attribute.
* src/current/compiler/js-calc.xsl: Add xs namespace.
(compile-calc): Do not perform casting when @dim > 1.
* src/current/include/preproc/expand.xsl
(lv:rate-each): Include @dim in c:sum expansion.
* src/current/include/preproc/macros.xsl:
(c:*/@generates): Use @dim to determine symbol dimensions.
* src/current/include/preproc/expand.xsl: Parse @dim aliases (e.g. "vector",
"matrix").
This is important to include all terminating classifications, which
include assertions. This is essential now that @keep support has been
removed; this essentially does the same thing, but in a more
sane/strict manner.
* src/current/compiler/linker.xsl (l:depgen)[preproc:symtable]:
Include package-level eligibility class in initial dependency list.
This will hopefully provide a performance boost, and is a cheap
optimization to make. The information it collected was pretty useless
in practice.
* src/current/compiler/js-calc.xsl (compile) [c:*]:
Do not encase calculations with function ancestors in debug
collection functions.
These used to be automatically added via @keep.
* src/current/compiler/linker.xsl (l:depgen): Include meta symbols.
* src/symtable/symbols.xsl (lv:meta): @pollute instead of @keep.
In particular, I want it to handle absolute import paths. It does
this for the return map; I forgot to add it here.
* src/current/compiler/map.xsl (lvmc:compile) [lvm:program-map]:
Preprocess nodes.
And everything else.
This is a big (important) change; it addresses one of the greatest
pains of the system.
Keeps were added during the DSL rewrite (to support symbols and such)
to work around the issue that there was no symbol-driven map; it
allowed symbols to persist disjoint from the `__yield' dependency
graph so that they could be mapped back and used by external systems.
The problem with that is that it's both messy (coupling the concept of
external dependencies with the actual code) and difficult to work
with. It had a huge performance impact on the linker for two reasons:
- Checking whether a package had already been seen and importing the
keeps on first visit was expensive because of tree searching and
manipulation; and
- _every_ keep was imported and processed by the linker, even if it
wouldn't end up being used by a particular program.
The later especially had huge performance impacts on the entire
system.
The entire dependency graph is now map-driven, with the exception of
the implicit `__yield' (which will eventually be moved into the map as
well and the magic `lv:yield' removed in favor of a template).
Performance-wise: our largest program ("dwelling") has many thousands
of symbols and the largest package imported the majority of them, many
of them unneeded, as the result of @keep subgraphs. Compilation of
the largest package within that (for the UI) took about a minute and a
half and ate up ~6GiB of RAM, for what really is a trivial task of
resolving externs, some basic symbol processing, a topological sort,
and ordering code fragments.
After this change, it takes ~15s and less than 2GiB of RAM. Still a
lot---and more improvements can be made---but much, much better.
@keep and friends was left in rater.xsd so that nothing breaks while
code is cleaned up; it'll be removed in the future.
* src/current/compiler/linker.xsl: Remove @keep support.
* src/current/dot/attr-keep.xsl: Remove now-unneeded template.
* src/current/dot/defnode.xsl: Remove @keep and related.
* src/current/include/preproc/eligclass.xsl: Remove @keep and related.
* src/current/include/preproc/expand.xsl: Remove @keep and related.
* src/current/include/preproc/macros.xsl: Remove @keep and related.
* src/current/include/preproc/symtable.xsl: Remove @keep and related.
* src/current/rater.xsd: Add TODO to remove @keep and friends.
This is a backwards-incompatible change that, like the input map,
requires the use of symbols in the return map. This will allow us to
forego the use of @keep and will have the return map be the authority
of what gets linked (all of its dependencies).
* src/current/compiler/map.xsl: Add symbol support to return-map.
This allows for the proper importing of symbols into the package
generated by the map compiler, which in turn allows for processing
their default values.
`set_defaults' wasn't in scope of maps.
* src/current/compiler/js.xsl (compiler:exit-rater lv:package):
Remove static output.
* src/current/compiler/linker.xsl (l:link-deps lv:package):
Link static after all other blocks, at highest scope within the
compiled module.
(Copyright headers will be added in the next commit; these are the
original files, unaltered in any way.)
The internal project name at LoVullo is simply "Calc DSL". This
liberates the entire thing. If anything was missed, I'll be added
later.
To continue building at LoVullo with this move, symlinks are used for
the transition; this is the exact code that is used in production.
There is a lot here---over 25,000 lines. Much of it is in disarray from
the environment surrounding its development, but it does work well for
what it was intended to do.
(LoVullo folks: fork point is 65723a0 in calcdsl.git.)