From e20076235e27852faa9a3e7ec07bb9ca29c00538 Mon Sep 17 00:00:00 2001
From: Mike Gerwitz
Date: Wed, 18 Oct 2023 10:20:49 -0400
Subject: [PATCH] _table-row_: Performance fix: place table in const/text() instead of const/@values

This is an interesting one.

For some context: TAME uses `csvm` files to provide syntactic sugar for
large tables of values ("rate tables", as they're often called, since they
contain insurance rates and other data).  This gets desugared into a `csv`
which in turn is compiled via `csv2xml` into a package.  That package uses
the `_table-*_` templates to define a table, which is represented as a
matrix using `const/@values`.

Here's an example of a generated table in a package:

```
```

Some of the tables are quite large, generating tens of MiB of data in
`@data`.  This in itself isn't a problem.  But when Saxon parses the `@data`
attribute, it normalizes the whitespace, as mandated by the XML spec, and
removes the newlines.  Therefore, when the template is expanded and the
`xmlo` file is produced, the template produces a `const/@values` with a huge
amount of data on one line.

Then, when another package imports that `xmlo` file via ``, which is done
via `document()` in XSLT, Saxon takes a long time to parse it: 60s on my
machine for a ~20MiB line.

This problem does not exist for JS fragments; Saxon doesn't mind large text
nodes.  So that is the approach that is taken here.

The template system doesn't have a way to output text yet, so this takes an
approach that minimizes changes as much as possible:

  - `param-copy` will expand `with-param/@value` as a text node.
  - `const/@values="-"` will cause TAME to use the child text node as the
    value of `@values`.
  - `_table-rows_` is modified to use the above two features.

The reason for using `@values="-"` is so that other parts of the compiler do
not have to be modified to recognize the new text convention, which is
otherwise awkward because newlines are text nodes.
The `-` convention comes from command line programs, where it generally
means "read from stdin"; this is okay since `-` is never a valid matrix
specification.

This must have been a problem for a very long time, but wasn't all that
noticeable until recent performance optimizations, since so many other
things around it were also slow.

DEV-15131
---
 core/vector/table.xml                    |  6 +++++-
 src/current/compiler/js.xsl              | 21 +++++++++++++++++++--
 src/current/include/preproc/template.xsl | 15 +++++++++++----
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/core/vector/table.xml b/core/vector/table.xml
index fb64a47d..0e73f1aa 100644
--- a/core/vector/table.xml
+++ b/core/vector/table.xml
@@ -160,7 +160,11 @@
+ values="-"> + + + + select="compiler:const-values( $const )" />
@@ -487,6 +487,21 @@
+ + + + + + + + - - + + + + + + +
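
The attribute-versus-text behavior described in the commit message can be
reproduced outside of Saxon.  What follows is a minimal sketch, assuming
Python's standard `xml.etree.ElementTree` parser stands in for Saxon and
using a made-up two-row matrix; the `const` element only mirrors the names
used in the message and is not real TAME output.  It shows that a newline
inside an attribute value is normalized to a space, while the same data
placed in a text node keeps its line breaks.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-row table; the matrix syntax is illustrative only.
ROWS = "1,2,3;\n4,5,6;"


def attr_value(rows: str) -> str:
    """Parse the rows out of an attribute, as with const/@values."""
    doc = f'<const name="t" values="{rows}"/>'
    return ET.fromstring(doc).get("values")


def text_value(rows: str) -> str:
    """Parse the rows out of a child text node, as with values="-"."""
    doc = f'<const name="t" values="-">{rows}</const>'
    return ET.fromstring(doc).text


as_attr = attr_value(ROWS)  # newline replaced by a space: one long line
as_text = text_value(ROWS)  # newline preserved: one row per line

print(repr(as_attr))
print(repr(as_text))
```

With real rate tables the attribute form collapses tens of MiB onto a single
line, which is the case the message reports as slow to re-parse; the text
form avoids that without changing the data itself.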