This is an interesting one.
For some context: TAME uses `csvm` files to provide syntactic sugar for
large tables of values ("rate tables", as they're often called, since they
contain insurance rates and other data). This gets desugared into a `csv`
which in turn is compiled via `csv2xml` into a package. That package uses
the `_table-*_` templates to define a table, which is represented as a
matrix using `const/@values`.
Here's an example of a generated table in a package:
```
<t:create-table name="foo">
<t:table-rows data="
1,2,3;
4,5,6;" />
</t:create-table>
```
Some of the tables are quite large, generating tens of MiB of data in
`@data`. This in itself isn't a problem. But when Saxon parses the `@data`
attribute, it normalizes the whitespace, as mandated by the XML spec, which
replaces the newlines with spaces. Therefore, when the template is expanded
and the `xmlo` file is produced, the template produces a `const/@values`
with a huge amount of data on one line.
Then, when another package imports that `xmlo` file via `<import
package="..." />`, which is done via `document()` in XSLT, Saxon takes a
long time to parse it. 60s on my machine for a ~20MiB line.
This problem does not exist for JS fragments; Saxon doesn't mind large text
nodes. So that is the approach that is taken here.
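The difference between the two cases can be reproduced outside of Saxon.
The following is a minimal sketch in Python using the standard library's
`xml.etree.ElementTree` (not Saxon, so it only illustrates the normalization
rule and not the parse-time cost). The element names `root`, `t` and
`fragment` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same table data once as an attribute and
# once as a text node.  The "\n" escapes below are real newlines in the
# XML source handed to the parser.
doc = ET.fromstring(
    '<root>'
    '<t data="1,2,3;\n4,5,6;" />'
    '<fragment>1,2,3;\n4,5,6;</fragment>'
    '</root>'
)

attr_value = doc.find("t").get("data")
text_value = doc.find("fragment").text

# Attribute-value normalization turns each literal newline into a space,
# so the attribute ends up as one long line.
print(repr(attr_value))

# Text nodes are left alone, so the line structure survives.
print(repr(text_value))
```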
The template system doesn't have a way to output text yet, so this takes an
approach that minimizes changes as much as possible:
- `param-copy` will expand `with-param/@value` as a text node.
- `const/@values="-"` will cause TAME to use the child text node as the
value of `@values`.
- `_table-rows_` is modified to use the above two features.
The reason for using `@values="-"` is so that other parts of the compiler do
not have to be modified to recognize the new text convention, which is
otherwise awkward because newlines are text nodes. The `-` convention comes
from command line programs, where it generally means "read from stdin"; this
is okay since `-` is never a valid matrix specification.
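As a rough sketch of how a consumer might honor the `-` convention described
above (this is not TAME's implementation; `resolve_values` is a hypothetical
helper and the `const` elements are simplified):

```python
import xml.etree.ElementTree as ET

def resolve_values(const):
    """Return the matrix specification for a `const` element.

    Assumption: `-` in `@values` means "use the child text instead";
    any other value is taken as the specification itself.
    """
    values = const.get("values", "")
    if values == "-":
        return const.text or ""
    return values

# Hypothetical inputs: one inline specification, one deferred to text.
inline = ET.fromstring('<const name="a" values="1,2;3,4;" />')
from_text = ET.fromstring('<const name="b" values="-">1,2;\n3,4;</const>')

print(repr(resolve_values(inline)))
print(repr(resolve_values(from_text)))
```

Because only the lookup of the value changes, code that consumes the matrix
specification afterwards does not need to know where it came from.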
This must have been a problem for a very long time, but wasn't all that
noticeable until recent performance optimizations, since so many other things
around it were also slow.
DEV-15131
This removes the deprecated `@const@` argument in favor of shorthand
`@value@` constants, which were introduced long ago precisely to avoid
having to define separate `@const@` parameters for all of these templates.
DEV-7145
"keep" is an old feature that forced the linker to retain symbols that were
unused. This was removed long ago in favor of having all linker roots
defined by the return map.
This also removes an old `@always`, which seems like a typo for
`when="always"` or something...not entirely sure.
DEV-7145
Accumulators were an ancient TAME feature removed long ago during The Great
Refactoring (...okay, that part didn't fit the definition of a "refactor",
but that's technically what that's referring to).
TAMER will not accept it.
DEV-7145
This has been optional for many years and is not actually used by the
current compiler. In the future, TAMER can infer it in situations where it
actually matters.
So, rather than adding support for this in the new parser, let's clean up.
DEV-7145
RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no
"Group"), but I'm not sure if the legal name has been changed yet or not, so
I'll wait on that.
TAMER rejects this, because we shouldn't be using anything but UTF-8. My
use of this encoding is ancient, from over a decade ago, and was apparently
just copied around.
DEV-10936
We were still having issues with this function when taking the positive
branch, when predicates cause many matches within tables. This was causing
us to hit stack limits in certain browsers on the Summary Page.
This converts it to an iterator so that all branches are tail-recursive, and
then enables TCO on them.
I was disappointed to find that there's little performance or memory benefit
when running our test suite.
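As a rough illustration of the general technique (not the actual function,
which is not shown in this log): once every recursive call is in tail
position with the running result carried in an accumulator, the recursion
can be replaced by a loop, which is effectively what TCO does. Python has no
TCO, so this sketch writes the loop by hand. The `count_matches_*` functions
and the predicate are hypothetical.

```python
def count_matches_recursive(rows, pred, i=0, acc=0):
    # Tail-recursive form: the recursive call is the last thing done on
    # every branch, with the running result carried in `acc`.
    if i == len(rows):
        return acc
    if pred(rows[i]):
        return count_matches_recursive(rows, pred, i + 1, acc + 1)
    return count_matches_recursive(rows, pred, i + 1, acc)

def count_matches_loop(rows, pred):
    # The same computation after the tail calls are turned into a loop:
    # each "call" just rebinds the parameters, so the stack never grows.
    i, acc = 0, 0
    while i != len(rows):
        if pred(rows[i]):
            acc += 1
        i += 1
    return acc

def is_even(n):
    return n % 2 == 0

rows = list(range(100_000))
print(count_matches_loop(rows, is_even))

# Without TCO, the recursive form runs out of stack on many matches.
try:
    count_matches_recursive(rows, is_even)
except RecursionError:
    print("recursive form exceeded the stack limit")
```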
This now uses year ranges, which I'll update annually.
This also renames "R-T Specialty" to "Ryan Specialty Group". The latter is
the parent company of the former. I was originally employed under the
former when LoVullo Associates was purchased, but I now work for the parent
company.
These provide a more pleasant abstraction than having to reference CMP_OP_*
constants.
* core/test/core/vector/interpolate.xml: {t:when=>t:where-eq}.
* core/test/core/vector/table.xml: Likewise, but using the other variants
where appropriate given the value of `@op'.
* core/vector/interpolate.xml: Likewise.
* core/vector/table.xml (_when_, _where_): Rename former to latter and
provide deprecation warning.
(_when-lt_, _when-lte_, _when-gt_, _when-gte_): Add abstractions.
* src/current/rater.xsd: Permit template variable as template name.
This is fairly primitive support and it completely sidesteps the bisect
algorithm for now. The next commit will abstract this a little bit further
to make it less awkward to use.
* core/test/core/vector/table.xml: New test cases.
* core/vector/filter.xml (CmpOp): New typedef.
(mfilter): Document that bisecting will not happen unless `CMP_OP_EQ'
is used. Implement that restriction.
[op]: New parameter. Provide it to `mrange'.
(_mfilter, _mrange_cmp): Rename from `_mfilter'. Implement new comparison
check based on `op'.
[op]: New argument.
* core/vector/table.xml (_when_)[@op@]: New param. Add it to the produced
vector.
(_mquery): Unpack op (from `_when_') in call to `mfilter'.
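A minimal sketch of the behavior described above, assuming a matrix whose
rows are sorted on the filtered column: equality takes a bisect path and
every other operator falls back to a linear scan. This is an illustration in
Python rather than the TAME implementation; the string keys stand in for the
`CMP_OP_*' constants, whose actual values are not given here.

```python
import operator
from bisect import bisect_left, bisect_right

# Hypothetical stand-ins for the CMP_OP_* constants.
CMP_OPS = {
    "EQ": operator.eq,
    "LT": operator.lt,
    "LTE": operator.le,
    "GT": operator.gt,
    "GTE": operator.ge,
}

def mfilter(matrix, col, value, op="EQ"):
    """Return the rows of `matrix` whose column `col` compares to `value`.

    Assumes rows are sorted by `col` when op is "EQ", since only that
    case takes the bisect path.
    """
    if op == "EQ":
        # Extracting the keys is linear; a real implementation would
        # bisect on the matrix directly.  Kept simple for the sketch.
        keys = [row[col] for row in matrix]
        return matrix[bisect_left(keys, value):bisect_right(keys, value)]
    cmp = CMP_OPS[op]
    return [row for row in matrix if cmp(row[col], value)]

table = [[1, 10], [2, 20], [2, 21], [3, 30]]
print(mfilter(table, 0, 2))
print(mfilter(table, 0, 2, "GTE"))
print(mfilter(table, 0, 2, "LT"))
```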
Just trying to clean up a little as I go, to start making this easier
to understand.
* core/vector/filter.xml: Use _when-*_ templates and c:recurse.
* core/vector/table.xml: Likewise.
* test/core/suite.xml: Import new fold test package.
* test/core/vector/fold.xml: New test package.
* vector/fold.xml: New package. Adds `_unfold-vector-grouped_'.
This makes it less likely to actually occur in a table lookup;
the previous value worried me.
* vector/filter.xml (TABLE_WHEN_MASK_VALUE): Decrease value.
Products of vectors and matrices respectively. It's surprising that this
was unneeded until now based on the requirements of the projects we have
done thus far; dot products and other features have been sufficient.
* vector/arithmetic.xml (_vproduct_, _mproduct_): New templates.
(_vproduct, _mproduct): New functions.
This is much more general-purpose and is necessary when operating on more
than one list.
* vector/list.xml: Add numeric/common import, exported.
(_cons-until-empty_): Add @index@, incremented at each recursion.
This uses the GNU Octave or MATLAB-style matrix definitions for tables,
which produces a single node instead of a node per field and row, which
results in a significantly smaller tree and drastically improves processing
time.
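For reference, a matrix in this style puts rows between `;` and columns
between `,`, as in `1,2,3;4,5,6;`. The following is an illustrative parser
in Python, not the one TAME uses; `parse_matrix` is a hypothetical name, and
it assumes numeric cells and tolerates whitespace and a trailing `;`.

```python
def parse_matrix(spec):
    """Parse an Octave-style matrix string such as "1,2,3;4,5,6;".

    Rows are separated by `;` and columns by `,`.  Surrounding
    whitespace and empty rows (e.g. after a trailing `;`) are ignored.
    """
    rows = [row.strip() for row in spec.split(";")]
    return [
        [float(cell) for cell in row.split(",")]
        for row in rows
        if row
    ]

print(parse_matrix("1,2,3;\n4,5,6;"))
```

The whole table is carried by one string, which is why it needs only a
single node in the tree.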