_table-row_: Performance fix: place table in const/text() instead of const/@values
This is an interesting one.

For some context: TAME uses `csvm` files to provide syntactic sugar for large tables of values ("rate tables", as they're often called, since they contain insurance rates and other data). This gets desugared into a `csv`, which in turn is compiled via `csv2xml` into a package. That package uses the `_table-*_` templates to define a table, which is represented as a matrix using `const/@values`. Here's an example of a generated table in a package:

```
<t:create-table name="foo">
  <t:table-rows data="
    1,2,3;
    4,5,6;" />
</t:create-table>
```

Some of the tables are quite large, generating tens of MiB of data in `@data`. This in itself isn't a problem. But when Saxon parses the `@data` attribute, it normalizes the whitespace as mandated by the XML spec, replacing each newline with a space. Therefore, when the template is expanded and the `xmlo` file is produced, the template produces a `const/@values` with a huge amount of data on one line. Then, when another package imports that `xmlo` file via `<import package="..." />`, which is done via `document()` in XSLT, Saxon takes a long time to parse it: 60s on my machine for a ~20MiB line.

This problem does not exist for JS fragments; Saxon doesn't mind large text nodes. So that is the approach taken here: store the table as a text node. The template system doesn't have a way to output text yet, so this takes an approach that minimizes changes as much as possible:

- `param-copy` will expand `with-param/@value` as a text node.
- `const/@values="-"` will cause TAME to use the child text node as the value of `@values`.
- `_table-rows_` is modified to use the above two features.

The reason for using `@values="-"` is so that other parts of the compiler do not have to be modified to recognize the new text convention. Recognizing text children directly would otherwise be awkward, because newlines and other whitespace between elements are also text nodes. The `-` convention comes from command-line programs, where it generally means "read from stdin"; this is safe since `-` is never a valid matrix specification.
This must have been a problem for a very long time, but wasn't all that noticeable until recent performance optimizations, since so many other things around it were also slow.

DEV-15131
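Any conforming XML parser reproduces both the root cause and the workaround, not just Saxon. The sketch below uses Python's standard-library `xml.etree.ElementTree` and is not TAME code. It shows two things: a newline inside an attribute is normalized to a space, while the same data in a text node keeps its newline. It also mimics the `@values="-"` lookup. `const_values` is a hypothetical Python analogue of `compiler:const-values`. `parse_matrix` assumes a simple `,`/`;` row syntax for illustration only; it is not TAME's matrix parser.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# A tiny two-row table, once stored in an attribute and once as child text.
ATTR_DOC = '<const name="foo" values="1,2,3;\n4,5,6;" />'
TEXT_DOC = '<const name="foo" values="-">1,2,3;\n4,5,6;</const>'


def const_values(const: ET.Element) -> Optional[str]:
    """Hypothetical analogue of compiler:const-values: @values="-" means
    "take the value from the child text", otherwise use @values itself."""
    values = const.get("values")
    if values == "-":
        return const.text
    return values


def parse_matrix(spec: str) -> List[List[str]]:
    """Illustrative only (assumed syntax): split 'a,b;c,d;' into rows of
    cells, ignoring surrounding whitespace and the empty trailing row."""
    rows = [row.strip() for row in spec.split(";")]
    return [[cell.strip() for cell in row.split(",")] for row in rows if row]


if __name__ == "__main__":
    from_attr = const_values(ET.fromstring(ATTR_DOC))
    from_text = const_values(ET.fromstring(TEXT_DOC))

    # Attribute-value normalization replaced the newline with a space...
    print(repr(from_attr))  # '1,2,3; 4,5,6;'
    # ...while the text node keeps its newline.
    print(repr(from_text))  # '1,2,3;\n4,5,6;'

    # The rows are the same either way.
    assert parse_matrix(from_attr) == parse_matrix(from_text)
```

Only the attribute copy loses its line breaks. Splitting on `;` yields the same rows from either form, so moving the data into a text node changes how it is stored, not what the table means.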
parent b82294b1bd
commit e20076235e
```diff
@@ -160,7 +160,11 @@
   <const name="{@__tid@}_RATE_TABLE"
          type="float"
          desc="{@__tname@} table; {@__desc@}"
-         values="@data@" />
+         values="-">
+    <!-- `@values="-"` above tells TAME to read the value from the
+         child text node -->
+    <param-copy name="@data@" />
+  </const>
 </if>
 <unless name="@data@">
   <const name="{@__tid@}_RATE_TABLE"
```
```diff
@@ -472,7 +472,7 @@
 <param name="const" as="element( lv:const )" />

 <variable name="values-def" as="xs:string?"
-          select="$const/@values" />
+          select="compiler:const-values( $const )" />

 <choose>
   <when test="$values-def and contains( $values-def, ';' )">
```
```diff
@@ -487,6 +487,21 @@
 </function>


+<function name="compiler:const-values" as="xs:string?">
+  <param name="const" as="element( lv:const )" />
+
+  <!-- @values="-", a convention from command-line programs where '-' means
+       "read from stdin", will take the value from the child text of the
+       constant; this is done because Saxon performs very, very poorly on
+       huge single-line attributes (e.g. 60s for ~20MiB single-line
+       attribute) -->
+  <sequence select="if ( $const/@values = '-' ) then
+                        $const/text()
+                      else
+                        $const/@values" />
+</function>
+
+
 <!--
   Produce a sequence of items

```
```diff
@@ -505,7 +520,9 @@

 <when test="$set/@values and $allow-values">
   <sequence select="tokenize(
-                      normalize-space( $set/@values ), ',' )" />
+                      normalize-space(
+                        compiler:const-values( $set ) ),
+                      ',' )" />
 </when>

 <otherwise>
```
```diff
@@ -653,11 +653,13 @@
 <variable name="varname" select="@name" />
 <variable name="param" select="$params[ @name=$varname ]" />

-<variable name="copy" as="node()*">
+<variable name="copy">
   <choose>
-    <!-- TAMER desugared @values@ application convention (see tplshort.rs) -->
-    <when test="$param/@value">
-      <!-- the value is the name of a template to copy the body from -->
+    <!-- TAMER desugared @values@ application convention (see
+         tplshort.rs); this will go away once the template system is fully
+         implemented in TAMER -->
+    <when test="$varname = '@values@' and $param/@value">
+      <!-- the value may be the name of a template to copy the body from -->
       <variable name="dsgr" select="$param/@value" />

       <!-- the template is always positioned as the immeditely-following
```
```diff
@@ -667,6 +669,11 @@
           /*" />
     </when>

+    <!-- non-node value is copied as text -->
+    <when test="$param/@value">
+      <sequence select="string( $param/@value )" />
+    </when>
+
     <!-- old applicatication convention has child nodes within
          `with-param` -->
     <otherwise>
```