From b2a996c1dfd382b6e59c4862a3a93d278cfb5942 Mon Sep 17 00:00:00 2001
From: Mike Gerwitz
Date: Fri, 6 Oct 2023 16:31:18 -0400
Subject: [PATCH] expand-sequence/expand-group: Retain until hoisting

This is a rather small change for quite a bit of effort in researching
what was going wrong. It's at least seven rabbit holes deep, or maybe
several herds of yaks, depending on your choice of measure and the
current conversion rate.

The problem can be summarized fairly succinctly:
`expand-sequence/expand-group` exists to prevent an expansion repass for
every single child element of the `expand-sequence`, which would be
quadratic. Basically, it restores the usual template expansion process
for that set of children.

But apparently `expand-group` was stripped on the first pass, which
expanded its children inline, which then meant that each of the children
was subject to its own individual pass, defeating the purpose of the
optimization.

As is the nature of quadratic-time processes, that was not noticed until
inputs became especially large, and not only that, but were combined
with nested `expand-sequence`s.

I would say that this never worked the way that I intended it to, but
I'm not certain. I was working quite a bit with TeX at the time, so it's
possible that I modeled it after `\expandafter`. But that's not an
appropriate model for TAME.

TAMER will be removing expand-sequence entirely, since it will have
enough of an understanding of the source system to determine what
requires expansion and what requires ordering (e.g. for symbol table
iteration). I'll also be making changes to simplify the process by
further restricting what type of iteration can take place. But for the
time being, the change was necessary.

In our largest systems, this change cut off ~15m total of build time if
run serially (at `-j1`). After sorting two runtabs for comparison (e.g.
`sort -k4`), you can get the total like so:

  $ paste <( sort -k4 runtab-a ) <( sort -k4 runtab-b ) | grep xmlo\$ \
      | cut -f2,5,6 \
      | awk '{ total += ($1 - $2) } END { print total / 1000 }'

Similarly, this Awk expression will give the time differences:

  $ awk '{ print ($1 - $2)/1000, $5 }'

Further, the previous commit also introduced an `xmle-sym-cmp` tool to
check for differences between xmle symbol tables in an automated way,
irrespective of ordering (since there are many valid topological sorts).
This revealed that the change fixed a bug (likely because of the forced
repass after `expand-group` hoisting) that was causing symbol table
introspection to fail to discover symbols in certain cases, which, in
our case, was resulting in the failure to generate a small number of
aggregate classifications correctly.

The whole repass system is a concerning mess, but it's not worth the
effort to try to redo all of that when that work can be done in TAMER.

DEV-15069
---
 src/current/include/preproc/template.xsl | 18 +++++++++++++++---
 src/preproc/expand/expand-sequence.xsl   | 15 ++++++++++++++-
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/src/current/include/preproc/template.xsl b/src/current/include/preproc/template.xsl
index 295d1cf4..c327a7e8 100644
--- a/src/current/include/preproc/template.xsl
+++ b/src/current/include/preproc/template.xsl
@@ -1523,8 +1523,21 @@
 -->
+
+
+
+
@@ -1564,7 +1577,6 @@
-