All systems should be using the provided Makefile, so this shouldn't be
invoked anymore. The new linker is still considered a proof-of-concept, but
bugs have been encountered in the old one that are not worth investing the
time into fixing.
The new linker has been used in production for nearly two months and is
functioning properly.
This only saves 1--2s on a 30s run, but I want to move in this direction,
so it'll simplify future refactoring if I just add it. Small changes like
these will accumulate, too.
* src/current/compiler/linker.xsl (l:orig-package, l:root-symtable-map): New
variables.
(l:resolv-extern): Use them.
This now uses year ranges, which I'll update annually.
This also renames "R-T Specialty" to "Ryan Specialty Group". The latter is
the parent company of the former. I was originally employed under the
former when LoVullo Associates was purchased, but I now work for the parent
company.
This is a significant performance improvement for dependency
generation (which is responsible for building the dependency graph for a
package).
The previous algorithm ran in O(n²) time: it would iterate over the given
symbol table, and for _each_ symbol, do a linear scan of the entire document
to search for the corresponding source block. This resulted in explosive
depgen time for larger packages.
This makes the algorithm run in O(n) by:
- Using an XSLT 3 map for the symbol table for O(1) lookups; and
- Iterating over the _document_ a single time rather than the symbol
table, referencing the symbol table as needed (in O(1) time).
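The difference between the two strategies can be sketched outside of XSLT.
The following is a minimal illustration in Python, not the actual
implementation: the function names, the dictionary-based symbol table, and
the `name'/`refs' fields are all assumptions made for the example.

```python
# Hypothetical miniature of the two depgen strategies; names and data
# shapes are illustrative only and do not mirror the XSLT implementation.

def depgen_quadratic(symtable, document):
    """For each symbol, linearly scan the document for its source block."""
    deps = {}
    for sym in symtable:                      # n symbols...
        for block in document:                # ...times a scan of n blocks
            if block["name"] == sym["name"]:
                deps[sym["name"]] = list(block["refs"])
                break
    return deps


def depgen_linear(symtable, document):
    """Build a lookup map once, then walk the document a single time."""
    by_name = {sym["name"]: sym for sym in symtable}   # O(1) lookups
    deps = {}
    for block in document:                    # single pass over the document
        if block["name"] in by_name:
            deps[block["name"]] = list(block["refs"])
    return deps


symtable = [{"name": "premium"}, {"name": "base"}, {"name": "factor"}]
document = [
    {"name": "base", "refs": []},
    {"name": "factor", "refs": []},
    {"name": "premium", "refs": ["base", "factor"]},
]
```

Both functions produce the same result for this input; only the amount of
work differs as the number of symbols grows.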
There are other parts of the system that can benefit from these same
improvements. This is important, since we need to be able to handle many
thousands of symbols efficiently.
* src/current/compiler/linker.xsl (l:depgen-sym): Recognize symbol `no-deps'
property, permitting missing dependencies. This allows us to avoid
creating nonsense nodes just to satisfy the linker, while still allowing
the linker to perform essential checks to defend against compiler bugs.
* src/current/compiler/map.xsl (lvmc:stub-symtable): Set @no-deps on
`___head' and `___tail' symbols.
(lvmc:mapsym): Set `no-deps' as appropriate on map symbols.
(preproc:depgen)[lvm:map[@from]]: Generate `preproc:sym-dep' node, which
is now expected by the depgen process.
(preproc:depgen)[lvm:map[*]]: Likewise.
(preproc:depgen)[*[@lvmc:type='retmap']//lvmm:map[@from]]: Remove
unnecessary template.
(preproc:symtable)[lvm:map[@value]]: Pass `no-deps' to `lvmc:mapsym'.
* src/current/include/depgen.xsl (preproc:depgen)[preproc:symtable]: Create
and use XSLT 3 map in place of `preproc:symtable' tree. This allows for
constant-time lookups. Provide to templates via tunnelling. Use it in
place of existing tree references. Process source tree rather than
iterating over symbol table.
(preproc:depgen)[lv:rate, c:sum[@generates], c:product[@generates],
lv:classify, lv:function/lv:param, lv:function, lv:typedef]: Produce
`preproc:sym-dep' nodes (which was previously done while iterating
over the symbol table).
(preproc:depgen)[preproc:sym]: Remove all such processing, since we no
longer iterate over the symbol table.
(preproc:depgen)[c:value-of]: Use symtable map.
(preproc:depgen-match): Likewise.
(preproc:depgen)[lv:union]: Modify to handle changes to lv:typedef
template.
(preproc:depgen)[text()]: Remove and replace with `node()'.
* src/current/include/preproc/package.xsl (preproc:resolv-syms): Remove
logging of symbol resolution. This has a slight performance impact since
there is a lot of output.
* src/current/include/preproc/symtable.xsl
(lv:function/lv:param, c:let/c:values/c:value): Set `no-deps'.
* src/symtable/symbols.xsl: Add documentation of `no-deps'.
(preproc:symtable)[lv:meta]: Set `no-deps'.
This has a significant performance impact: processing time is cut in about
half and memory usage is reduced by more than 50%. For example, a
package that previously took 30s and 2.1GiB of memory to link now takes
14s and less than 900MiB of memory.
I had tried to perform this optimization a couple years ago but was
thwarted (I think) by the classifier markers. The previous commit did away
with those. I'm encouraged by the gains from the low-hanging fruit.
* src/current/compiler/linker.xsl
(l:process-empty, l:stack-empty): Convert from l:pstack and
l:sym-stack (respectively) to empty preproc:sym sequences.
(l:depgen-process-sym)[preproc:sym]: Append to sequence rather than
outputting new l:sym-stack tree.
Update all annotations and uses accordingly.
This is something that I thought would be useful back in the day when TAME
was in its infancy, but it is not important. Rather than having the linker
spend time trying to figure out what symbols belong in the classifier---and
rather than keeping that complexity around---this simplifies things by
making the existing `classify' method simply perform _all_ calculations, and
then yield only the classification portion of the result.
This isn't a problem in practice because, if we only desire the use of a
classifier, then we create a "supplier" that only uses classifications and
has no other dependencies. The end result is, as far as we care, the same.
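The simplified arrangement can be pictured roughly as follows. This is only
a sketch in Python rather than the generated JavaScript; `rate', `classify',
and the shape of the result are hypothetical stand-ins for illustration.

```python
# Hypothetical sketch: the classifier entry point runs everything and
# returns only the classification portion of the result.

def rate(args):
    """Perform all calculations, classifications included."""
    classes = {"is_dwelling": args["type"] == "dwelling"}
    premium = 100 if classes["is_dwelling"] else 0
    return {"classes": classes, "vars": {"premium": premium}}


def classify(args):
    """Run the full calculation, then yield only the classifications."""
    return rate(args)["classes"]
```

A caller that wants only classifications pays for the other calculations as
well, which is why a supplier with no other dependencies keeps that cost
negligible.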
* src/current/compiler/js.xsl (compiler:entry-rater)[lv:package]: Initialize
`classes' rather than invoking classifier
(compiler:entry-classifier)[lv:package]: Invoke all calculations and
return only classes to provide equivalent behavior.
(compiler:exit-classifier): Post-process classifications from calculation
results, iterating through classmap.
(compiler:classifier-yields-map)[lv:package]: Output all classifications
that are not generated. This differs slightly from the original
implementation in that it includes all non-generated classes rather than
just classes that have a non-generated `@yields'; this distinction is
important since `compiler:exit-classifier' is now using it to produce a
classification result set that doesn't contain all the generated
stuff (since it didn't before, and shouldn't now).
* src/current/compiler/linker.xsl: Update copyright year.
(l:resolv-deps)[preproc:sym[@l:mark-inclass]]: Remove template.
(l:resolv-deps)[preproc:sym...@l:mark-inclass...]: Remove template.
(l:depgen-sym): Set type of result to `element(preproc:sym)', since
`l:mark-inclass' is no longer produced.
[inclass, needs-class-mark]: Remove variables and all instances where
they are used.
(l:dep-aug)[inclass]: Remove param. Stop producing `@inclass' attribute.
(l:link-classifier)[lv:package]: Do not process any dependencies. This
can be removed entirely in the future since it now only produces static
code, which we can perhaps combine with a different block.
(l:link-rater)[lv:package]: Remove mention of `inclass' for dependencies;
all dependencies will now be compiled into this block.
This is important to include all terminating classifications, which
include assertions. This is essential now that @keep support has been
removed; this essentially does the same thing, but in a more
sane/strict manner.
* src/current/compiler/linker.xsl (l:depgen)[preproc:symtable]:
Include package-level eligibility class in initial dependency list.
These used to be automatically added via @keep.
* src/current/compiler/linker.xsl (l:depgen): Include meta symbols.
* src/symtable/symbols.xsl (lv:meta): @pollute instead of @keep.
And everything else.
This is a big (important) change; it addresses one of the greatest
pains of the system.
Keeps were added during the DSL rewrite (to support symbols and such)
to work around the issue that there was no symbol-driven map; it
allowed symbols to persist disjoint from the `__yield' dependency
graph so that they could be mapped back and used by external systems.
The problem with that is that it's both messy (coupling the concept of
external dependencies with the actual code) and difficult to work
with. It had a huge performance impact on the linker for two reasons:
- Checking whether a package had already been seen and importing the
keeps on first visit was expensive because of tree searching and
manipulation; and
- _every_ keep was imported and processed by the linker, even if it
wouldn't end up being used by a particular program.
The latter especially had huge performance impacts on the entire
system.
The entire dependency graph is now map-driven, with the exception of
the implicit `__yield' (which will eventually be moved into the map as
well and the magic `lv:yield' removed in favor of a template).
Performance-wise: our largest program ("dwelling") has many thousands
of symbols and the largest package imported the majority of them, many
of them unneeded, as the result of @keep subgraphs. Compilation of
the largest package within that (for the UI) took about a minute and a
half and ate up ~6GiB of RAM, for what really is a trivial task of
resolving externs, some basic symbol processing, a topological sort,
and ordering code fragments.
After this change, it takes ~15s and less than 2GiB of RAM. Still a
lot---and more improvements can be made---but much, much better.
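To illustrate why the size of the root set matters, here is a minimal
sketch of a dependency-first ordering in Python. It is not the linker's
algorithm: `link_order', the graph layout, and the symbol names are
assumptions for the example, and cycle detection is omitted.

```python
# Hypothetical sketch: only symbols reachable from the given roots are
# visited, so the choice of roots bounds the work done.

def link_order(graph, roots):
    """Return symbols reachable from roots, dependencies first."""
    order, seen = [], set()

    def visit(sym):
        if sym in seen:
            return
        seen.add(sym)
        for dep in graph.get(sym, ()):
            visit(dep)
        order.append(sym)       # emitted only after all of its dependencies

    for root in roots:
        visit(root)
    return order


graph = {
    "__yield": ["premium"],
    "premium": ["base", "factor"],
    "base": [],
    "factor": [],
    "unused_keep": ["base"],
}
```

With `["__yield"]' as the only root, `unused_keep' is never visited; adding
it to the roots (as importing every keep effectively did) pulls it and its
subgraph into the output whether or not the program needs it.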
@keep and friends were left in rater.xsd so that nothing breaks while
code is cleaned up; it'll be removed in the future.
* src/current/compiler/linker.xsl: Remove @keep support.
* src/current/dot/attr-keep.xsl: Remove now-unneeded template.
* src/current/dot/defnode.xsl: Remove @keep and related.
* src/current/include/preproc/eligclass.xsl: Remove @keep and related.
* src/current/include/preproc/expand.xsl: Remove @keep and related.
* src/current/include/preproc/macros.xsl: Remove @keep and related.
* src/current/include/preproc/symtable.xsl: Remove @keep and related.
* src/current/rater.xsd: Add TODO to remove @keep and friends.
`set_defaults' wasn't in scope of maps.
* src/current/compiler/js.xsl (compiler:exit-rater lv:package):
Remove static output.
* src/current/compiler/linker.xsl (l:link-deps lv:package):
Link static after all other blocks, at highest scope within the
compiled module.
(Copyright headers will be added in the next commit; these are the
original files, unaltered in any way.)
The internal project name at LoVullo is simply "Calc DSL". This
liberates the entire thing. If anything was missed, it'll be added
later.
To continue building at LoVullo with this move, symlinks are used for
the transition; this is the exact code that is used in production.
There is a lot here---over 25,000 lines. Much of it is in disarray from
the environment surrounding its development, but it does work well for
what it was intended to do.
(LoVullo folks: fork point is 65723a0 in calcdsl.git.)