tame/core
Mike Gerwitz e20076235e _table-row_: Performance fix: place table in const/text() instead of const/@values
This is an interesting one.

For some context: TAME uses `csvm` files to provide syntactic sugar for
large tables of values ("rate tables", as they're often called, since they
contain insurance rates and other data).  This gets desugared into a `csv`
which in turn is compiled via `csv2xml` into a package.  That package uses
the `_table-*_` templates to define a table, which is represented as a
matrix using `const/@values`.

Here's an example of a generated table in a package:

```
  <t:create-table name="foo">
    <t:table-rows data="
      1,2,3;
      4,5,6;" />
  </t:create-table>
```

Some of the tables are quite large, generating tens of MiB of data in
`@data`.  This in itself isn't a problem.  But when Saxon parses the `@data`
attribute, it normalizes the whitespace, as mandated by the XML spec,
replacing each newline with a space.  Therefore, when the template is
expanded and the `xmlo` file is produced, the template produces a
`const/@values` with a huge amount of data on one line.
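
The normalization behavior can be reproduced outside of Saxon.  The following
is a minimal sketch using Python's standard library parser rather than Saxon,
on the assumption that any conforming XML parser treats attribute values the
same way; the tiny `ROWS` table is a stand-in for real rate data:

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a generated table, one row per line.
ROWS = "\n1,2,3;\n4,5,6;\n"

# The same data carried in an attribute and in a text node.
as_attribute = ET.fromstring(f'<const values="{ROWS}" />')
as_text = ET.fromstring(f"<const>{ROWS}</const>")

attr_value = as_attribute.get("values")
text_value = as_text.text

# Attribute-value normalization turns each newline into a space,
# so the whole table ends up on a single line.
print(repr(attr_value))

# Character data keeps its line breaks.
print(repr(text_value))
```

Only the attribute form loses its line structure, which is why moving the
table into a text node avoids the single huge line.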

Then, when another package imports that `xmlo` file via `<import
package="..." />`, which is done via `document()` in XSLT, Saxon takes a
long time to parse it.  60s on my machine for a ~20MiB line.

This problem does not exist for JS fragments; Saxon doesn't mind large text
nodes.  So that is the approach that is taken here.

The template system doesn't have a way to output text yet, so this takes an
approach that minimizes changes as much as possible:

  - `param-copy` will expand `with-param/@value` as a text node.
  - `const/@values="-"` will cause TAME to use the child text node as the
    value of `@values`.
  - `_table-rows_` is modified to use the above two features.

The reason for using `@values="-"` is so that other parts of the compiler do
not have to be modified to recognize the new text convention, which is
otherwise awkward because newlines are text nodes.  The `-` convention is
borrowed from command line programs, where it generally means "read from
stdin"; this is safe since `-` is never a valid matrix specification.
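
To illustrate the convention, here is a rough sketch in Python.  It is not how
TAME implements this (the compiler is written in XSLT); `matrix_source` and
`parse_matrix` are hypothetical helpers, and the matrix syntax (`,` between
values, `;` after each row) is assumed from the example table above:

```python
import xml.etree.ElementTree as ET


def matrix_source(const):
    """Return the raw matrix string for a <const> element (hypothetical)."""
    values = const.get("values", "")
    if values == "-":
        # "-" is never a valid matrix, so it can safely mean
        # "the data is in the child text node instead".
        return const.text or ""
    return values


def parse_matrix(raw):
    """Split 'a,b;c,d;' into rows of floats (assumed matrix syntax)."""
    rows = []
    for row in raw.split(";"):
        if row.strip():  # skip the empty piece after the final ';'
            rows.append([float(cell) for cell in row.split(",")])
    return rows


# Old form: the matrix lives in the attribute.
inline = ET.fromstring('<const name="foo" values="1,2,3;4,5,6;" />')

# New form: the attribute is "-" and the matrix is the text node.
via_text = ET.fromstring(
    '<const name="foo" values="-">\n1,2,3;\n4,5,6;\n</const>'
)

inline_matrix = parse_matrix(matrix_source(inline))
text_matrix = parse_matrix(matrix_source(via_text))
```

Both forms yield the same matrix, so only the one place that reads `@values`
needs to know about the `-` indirection.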

This must have been a problem for a very long time, but wasn't all that
noticeable until recent performance optimizations, since so many other things
around it were also slow.

DEV-15131
2023-10-18 11:43:48 -04:00
numeric Copyright year and name update 2023-01-20 23:37:30 -05:00
test Copyright year and name update 2023-01-20 23:37:30 -05:00
vector _table-row_: Performance fix: place table in const/text() instead of const/@values 2023-10-18 11:43:48 -04:00
.gitignore [DEV-7136] Add xmli files 2020-04-08 08:27:47 -04:00
alias.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
assert.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
base.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
cond.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
configure.ac core build 2018-11-08 11:15:12 -05:00
convention.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
datetime.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
dummy.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
extern.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
insurance.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
map.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
param.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
retry.xml [DEV-7087] core/retry (__retry): dim=0 2020-03-26 09:08:13 -04:00
state.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
symbol.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
tdat.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
tplgen.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
ui.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
when.xml Copyright year and name update 2023-01-20 23:37:30 -05:00