tame/core/vector
Mike Gerwitz e20076235e _table-row_: Performance fix: place table in const/text() instead of const/@values
This is an interesting one.

For some context: TAME uses `csvm` files to provide syntactic sugar for
large tables of values ("rate tables", as they're often called, since they
contain insurance rates and other data).  This gets desugared into a `csv`
which in turn is compiled via `csv2xml` into a package.  That package uses
the `_table-*_` templates to define a table, which is represented as a
matrix using `const/@values`.

Here's an example of a generated table in a package:

```
  <t:create-table name="foo">
    <t:table-rows data="
      1,2,3;
      4,5,6;" />
  </t:create-table>
```

Some of the tables are quite large, generating tens of MiB of data in
`@data`.  This in itself isn't a problem.  But when Saxon parses the `@data`
attribute, it normalizes the whitespace, as mandated by the XML spec,
replacing each newline with a space.  Therefore, when the template is
expanded and the `xmlo` file is produced, the template produces a
`const/@values` with a huge amount of data on one line.
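
The normalization described above can be reproduced with any conforming XML parser.  The following is a minimal sketch using Python's standard library rather than Saxon; the element names `rows` and `const` are illustrative stand-ins, not TAME's actual schema.  It only shows that newlines survive in a text node but not in an attribute value.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a generated table: the same data is stored
# once in an attribute and once in a child text node.
DOC = """<root>
  <rows data="
    1,2,3;
    4,5,6;" />
  <const>
    1,2,3;
    4,5,6;
  </const>
</root>"""

root = ET.fromstring(DOC)

# Attribute-value normalization replaces each newline with a space,
# so the attribute ends up as one long line.
attr_value = root.find("rows").get("data")

# Text nodes are not subject to attribute normalization; newlines remain.
text_value = root.find("const").text

print(repr(attr_value))
print(repr(text_value))
```

With a two-row table the difference is cosmetic; the commit's concern is what happens when the single resulting line is tens of MiB long.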

Then, when another package imports that `xmlo` file via `<import
package="..." />`, which is done via `document()` in XSLT, Saxon takes a
long time to parse it: 60s on my machine for a ~20MiB line.

This problem does not exist for JS fragments; Saxon doesn't mind large text
nodes.  So that is the approach taken here: the table data goes in a text
node.

The template system doesn't have a way to output text yet, so this takes an
approach that minimizes changes as much as possible:

  - `param-copy` will expand `with-param/@value` as a text node.
  - `const/@values="-"` will cause TAME to use the child text node as the
    value of `@values`.
  - `_table-rows_` is modified to use the above two features.

The reason for using `@values="-"` is so that other parts of the compiler do
not have to be modified to recognize the new text convention, which is
otherwise awkward because newlines are text nodes.  The `-` convention comes
from command-line programs, where it generally means "read from stdin"; this
is okay since `-` is never a valid matrix specification.
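
The `-` convention can be sketched as follows.  This is an illustration in Python, not TAME's implementation (which lives in the XSLT compiler); `matrix_spec` and `parse_matrix` are hypothetical helpers, and the `;`-terminated row, `,`-separated column format is assumed from the table example above.

```python
import xml.etree.ElementTree as ET
from typing import List


def matrix_spec(const: ET.Element) -> str:
    """Return the matrix specification for a const element.

    Hypothetical helper: values="-" means "use the child text node",
    mirroring the "-" = stdin convention; any other value is used as-is.
    """
    values = const.get("values", "")
    if values == "-":
        return const.text or ""
    return values


def parse_matrix(spec: str) -> List[List[int]]:
    """Split an assumed `;`-terminated, `,`-separated spec into rows."""
    rows = [row.strip() for row in spec.split(";")]
    return [[int(cell) for cell in row.split(",")] for row in rows if row]


# The same table, once inline in the attribute and once as a text node.
inline = ET.fromstring('<const name="foo" values="1,2,3;4,5,6;" />')
as_text = ET.fromstring('<const name="foo" values="-">\n1,2,3;\n4,5,6;\n</const>')

inline_matrix = parse_matrix(matrix_spec(inline))
text_matrix = parse_matrix(matrix_spec(as_text))
print(inline_matrix == text_matrix)
```

Because only the lookup of the specification changes, everything downstream of it sees the same matrix either way, which is the point of keeping `@values` as the signal.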

This must have been a problem for a very long time, but wasn't all that
noticeable until recent performance optimizations, since so many other things
around it were also slow.

DEV-15131
2023-10-18 11:43:48 -04:00
arithmetic.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
cmatch.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
common.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
convert.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
count.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
define.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
filter.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
fold.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
interpolate.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
length.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
list.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
matrix.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
minmax.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
stub.xml Copyright year and name update 2023-01-20 23:37:30 -05:00
table.xml _table-row_: Performance fix: place table in const/text() instead of const/@values 2023-10-18 11:43:48 -04:00