expand-sequence/expand-group: Retain until hoisting

This is a rather small change for quite a bit of effort in researching what was going wrong. It's at least seven rabbit holes deep, or maybe several herds of yaks, depending on your choice of measure and the current conversion rate.

The problem can be summarized fairly succinctly: `expand-sequence/expand-group` exists to prevent an expansion repass for every single child element of the `expand-sequence`, which would be quadratic. Basically, it restores the usual template expansion process for that set of children. But apparently `expand-group` was stripped on the first pass, which expanded its children inline, which then meant that each of the children was subject to its own individual pass, defeating the purpose of the optimization.

As is the nature of quadratic-time processes, that was not noticed until inputs became especially large and were also combined with nested `expand-sequence`s.

I would say that this never worked the way that I intended it to, but I'm not certain. I was working quite a bit with TeX at the time, so it's possible that I modeled it after `\expandafter`. But that's not an appropriate model for TAME.

TAMER will be removing expand-sequence entirely, since it will have enough of an understanding of the source system to determine what requires expansion and what requires ordering (e.g. for symbol table iteration). I'll also be making changes to simplify the process by further restricting what type of iteration can take place. But for the time being, the change was necessary.

In our largest systems, this change cut ~15m off of the total build time when run serially (at `-j1`). After sorting two runtabs for comparison (e.g. `sort -k4`), you can get the total like so:

  $ paste <( sort -k4 runtab-a ) <( sort -k4 runtab-b ) | grep xmlo\$ \
      | cut -f2,5,6 \
      | awk '{ total += ($1 - $2) } END { print total / 1000 }'

Similarly, this Awk expression will give the time differences:

  $ awk '{ print ($1 - $2)/1000, $5 }'

Further, the previous commit also introduced an `xmle-sym-cmp` tool to check for differences between xmle symbol tables in an automated way, irrespective of ordering (since there are many valid topological sorts). This revealed that the change fixed a bug (likely because of the forced repass after `expand-group` hoisting) that was causing symbol table introspection to fail to discover symbols in certain cases, which, in our case, was resulting in the failure to generate a small number of aggregate classifications correctly.

The whole repass system is a concerning mess, but it's not worth the effort to try to redo all of that when that work can be done in TAMER.

DEV-15069
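To make the quadratic claim concrete, here is a toy cost model. It is a hypothetical sketch, not TAME's actual repass scheduler: it assumes that each hoist of a node out of an `expand-sequence` triggers exactly one repass, and that every repass walks all nodes in the package. The function name `repass_work` and its arguments are illustrative only.

```sh
#!/usr/bin/env bash
# Toy cost model (hypothetical; not TAME's real scheduler).
# An expand-sequence holds GROUPS expand-groups of K children each.
# Assumptions: every hoist triggers one repass, and every repass walks
# all GROUPS * K nodes in the package.
#   retained: expand-group survives until hoisting, so one hoist per group
#   stripped: expand-group is removed early, so one hoist per child
repass_work() {
  local groups=$1 k=$2 mode=$3
  awk -v g="$groups" -v k="$k" -v mode="$mode" 'BEGIN {
    nodes = g * k                                  # nodes walked per repass
    hoists = (mode == "retained") ? g : g * k      # repasses triggered
    print hoists * nodes                           # total nodes walked
  }'
}

# A single expand-group wrapping n children: linear vs. quadratic work.
for n in 10 100 1000; do
  printf '%s children: retained=%s stripped=%s\n' "$n" \
    "$(repass_work 1 "$n" retained)" "$(repass_work 1 "$n" stripped)"
done
```

For a single group of 1000 children, `retained` yields 1000 units of work while `stripped` yields 1000000, which is the gap the optimization is meant to close under these assumptions.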
2023-10-06 16:31:18 -04:00 · 2023-10-06 16:31:18 -04:00 · b2a996c1df
parent 7692d0d848
commit b2a996c1df
2 changed files with 29 additions and 4 deletions
--- a/src/current/include/preproc/template.xsl
+++ b/src/current/include/preproc/template.xsl
@@ -1523,8 +1523,21 @@
 -->
 <template mode="preproc:macros" priority="5"
              match="lv:expand-group">
-  <!-- strip expand-group -->
+  <!-- the expand-group node will be stripped during hoisting -->
-  <apply-templates mode="preproc:macros" />
+  <copy>
    <apply-templates mode="preproc:macros" />
  </copy>
 </template>
 <!--
  expand-group must be unwrapped when hositing out of an expand-sequence.
 -->
 <template mode="eseq:hoist" priority="5"
          match="lv:expand-group">
  <!-- unwrap children -->
  <sequence select="node()" />
  <preproc:repass need-sym="(lv:expand-group hoist)" />
 </template>
@@ -1564,7 +1577,6 @@
 </function>
 <template mode="preproc:macros" priority="9"
              match="node()[ not( . instance of element() ) ]">
  <sequence select="." />
--- a/src/preproc/expand/expand-sequence.xsl
+++ b/src/preproc/expand/expand-sequence.xsl
@@ -412,6 +412,9 @@
  If no head node exists, the result is the single expansion sequence
  node unchanged.
  Hositing behavior may be configured via the @code{eseq:hoist}
  template mode.
 -->
 <function name="_eseq:hoist" as="node()+">
  <param name="eseq-node" as="element()" />
@@ -419,7 +422,7 @@
  <variable name="head" as="node()?"
            select="$eseq-node/node()[1]" />
-  <sequence select="$head" />
+  <apply-templates mode="eseq:hoist" select="$head" />
  <!-- This @code{for-each} is purely to set the context for
       @code{copy}, since we do not know the sequence element
@@ -432,4 +435,14 @@
  </for-each>
 </function>
 <!--
  Hoist a node out of a sequence.
  The caller may provide templates that alter the behavior of hositing.
 -->
 <template mode="eseq:hoist" priority="1"
          match="node()">
  <sequence select="." />
 </template>
 </stylesheet>