This is an interesting one.
For some context: TAME uses `csvm` files to provide syntactic sugar for
large tables of values ("rate tables", as they're often called, since they
contain insurance rates and other data). This gets desugared into a `csv`
which in turn is compiled via `csv2xml` into a package. That package uses
the `_table-*_` templates to define a table, which is represented as a
matrix using `const/@values`.
Here's an example of a generated table in a package:
```
<t:create-table name="foo">
<t:table-rows data="
1,2,3;
4,5,6;" />
</t:create-table>
```
Some of the tables are quite large, generating tens of MiB of data in
`@data`. This in itself isn't a problem. But when Saxon parses the `@data`
attribute, it normalizes the whitespace, as mandated by the XML spec,
replacing each newline with a space. Therefore, when the template is
expanded and the `xmlo` file is produced, the template produces a
`const/@values` with a huge amount of data on one line.
Then, when another package imports that `xmlo` file via `<import
package="..." />`, which is done via `document()` in XSLT, Saxon takes a
long time to parse it: 60s on my machine for a ~20MiB line.
This problem does not exist for JS fragments; Saxon doesn't mind large text
nodes. So that is the approach that is taken here.
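The difference between the two representations can be reproduced with any
conforming XML parser. The following is a minimal sketch using Python's
standard `xml.etree.ElementTree` rather than Saxon, and the `t` element name
is a placeholder; it only shows the whitespace behavior, not the parse-time
cost.

```python
import xml.etree.ElementTree as ET

# The same rows as the example table, once in an attribute and once as text.
ROWS = "\n1,2,3;\n4,5,6;"

as_attribute = ET.fromstring('<t data="' + ROWS + '" />')
as_text = ET.fromstring("<t>" + ROWS + "</t>")

# Attribute-value normalization turns each literal newline into a space,
# so the attribute value ends up on a single line.
print(repr(as_attribute.get("data")))

# Character data in a text node keeps its line breaks.
print(repr(as_text.text))
```

The attribute value comes back with no newlines in it, while the text node
retains one row per line.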
The template system doesn't have a way to output text yet, so this takes an
approach that minimizes changes as much as possible:
- `param-copy` will expand `with-param/@value` as a text node.
- `const/@values="-"` will cause TAME to use the child text node as the
  value of `@values`.
- `_table-rows_` is modified to use the above two features.
The reason for using `@values="-"` is so that other parts of the compiler do
not have to be modified to recognize the new text convention, which is
otherwise awkward because newlines are text nodes. The `-` convention comes
from command line programs, where it generally means "read from stdin"; this
is okay since `-` is never a valid matrix specification.
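The `-` convention can be sketched as a single lookup step. This is an
illustration in Python, not the compiler's XSLT: `resolve_values` is a
hypothetical helper, and the `name` attributes are placeholders.

```python
import xml.etree.ElementTree as ET


def resolve_values(const):
    """Return the matrix specification for a `const` element.

    Hypothetical helper: a value of `-` means "use the child text node",
    much like `-` means stdin for command line programs.
    """
    values = const.get("values", "")
    if values == "-":
        return const.text or ""
    return values


# One constant with inline values, one that defers to its text node.
inline = ET.fromstring('<const name="a" values="1,2,3;4,5,6;" />')
from_text = ET.fromstring('<const name="b" values="-">\n1,2,3;\n4,5,6;\n</const>')

print(repr(resolve_values(inline)))
print(repr(resolve_values(from_text)))
```

Because the check happens in one place, code that only cares about the
resolved specification does not need to know which form was used.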
This must have been a problem for a very long time, but wasn't all that
noticeable until recent performance optimizations, since so many other things
around it were also slow.
DEV-15131
This has been optional for many years and is not actually used by the
current compiler. TAMER can infer it in any future situation where it
actually matters.
So, rather than adding support for this in the new parser, let's clean up.
DEV-7145
RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no
"Group"), but I'm not sure if the legal name has been changed yet or not, so
I'll wait on that.
TAMER rejects this, because we shouldn't be using anything but UTF-8. My
use of this encoding is ancient, dating from over a decade ago, and was
apparently just copied around.
DEV-10936
This now uses year ranges, which I'll update annually.
This also renames "R-T Specialty" to "Ryan Specialty Group". The latter is
the parent company of the former. I was originally employed under the
former when LoVullo Associates was purchased, but I now work for the parent
company.
These provide a more pleasant abstraction than having to reference CMP_OP_*
constants.
* core/test/core/vector/interpolate.xml: {t:when=>t:where-eq}.
* core/test/core/vector/table.xml: Likewise, but using the other variants
where appropriate given the value of `@op'.
* core/vector/interpolate.xml: Likewise.
* core/vector/table.xml (_when_, _where_): Rename former to latter and
provide deprecation warning.
(_when-lt_, _when-lte_, _when-gt_, _when-gte_): Add abstractions.
* src/current/rater.xsd: Permit template variable as template name.
This is fairly primitive support and it completely sidesteps the bisect
algorithm for now. The next commit will abstract this a little bit further
to make it less awkward to use.
* core/test/core/vector/table.xml: New test cases.
* core/vector/filter.xml (CmpOp): New typedef.
(mfilter): Document that bisecting will not happen unless `CMP_OP_EQ'
is used. Implement that restriction.
[op]: New parameter. Provide it to `mrange'.
(_mfilter, _mrange_cmp): Rename from `_mfilter'. Implement new comparison
check based on `op'.
[op]: New argument.
* core/vector/table.xml (_when_)[@op@]: New param. Add it to the produced
vector.
(_mquery): Unpack op (from `_when_') in call to `mfilter'.
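The restriction described above (bisecting only for `CMP_OP_EQ`, a linear
comparison otherwise) can be sketched as follows. This is an illustrative
Python model, not the library's implementation: `filter_rows` is a
hypothetical name, and the numeric values of the constants other than the
name `CMP_OP_EQ` are assumptions.

```python
from bisect import bisect_left, bisect_right
import operator

# Hypothetical stand-ins for the CMP_OP_* constants; the real values are
# not given in this log.
CMP_OP_EQ, CMP_OP_LT, CMP_OP_LTE, CMP_OP_GT, CMP_OP_GTE = range(5)

_COMPARE = {
    CMP_OP_EQ: operator.eq,
    CMP_OP_LT: operator.lt,
    CMP_OP_LTE: operator.le,
    CMP_OP_GT: operator.gt,
    CMP_OP_GTE: operator.ge,
}


def filter_rows(rows, col, op, value):
    """Return the rows whose column `col` compares to `value` under `op`.

    `rows` must be sorted by `col` for the equality fast path.
    """
    if op == CMP_OP_EQ:
        # Binary search for the run of matching keys. Copying the keys is
        # linear; it is done here only to keep the sketch short.
        keys = [row[col] for row in rows]
        return rows[bisect_left(keys, value):bisect_right(keys, value)]
    # Every other operator sidesteps bisecting and scans linearly.
    compare = _COMPARE[op]
    return [row for row in rows if compare(row[col], value)]


rows = [[1, 10], [2, 20], [2, 21], [3, 30]]
print(filter_rows(rows, 0, CMP_OP_EQ, 2))
print(filter_rows(rows, 0, CMP_OP_GTE, 2))
```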
Just trying to clean up a little as I go, to start making this easier
to understand.
* core/vector/filter.xml: Use _when-*_ templates and c:recurse.
* core/vector/table.xml: Likewise.
This uses GNU Octave or MATLAB-style matrix definitions for tables,
producing a single node instead of a node per field and row. That
results in a significantly smaller tree and drastically improves processing
time.
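As a rough illustration of the matrix string format, the following Python
sketch parses a specification shaped like the `t:table-rows` data shown
earlier in this log (rows separated by `;`, fields by `,`). `parse_matrix`
is a hypothetical name, and treating every field as a float is an
assumption.

```python
def parse_matrix(spec):
    """Parse a matrix string such as "1,2,3;4,5,6;" into a list of rows."""
    rows = []
    for row in spec.split(";"):
        row = row.strip()
        if row:  # skip the empty piece left by a trailing `;`
            rows.append([float(field) for field in row.split(",")])
    return rows


print(parse_matrix("1,2,3;\n4,5,6;"))
```

The whole table travels as one string, so the XML tree holds a single node
for it regardless of how many rows and fields it contains.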
Some notes on this: The "Calc DSL" is the name of the DSL before it
became "TAME". This takes the entire core library and squashes its 91
commits into a single one; the reason for this is that those
commits often contain LoVullo-specific details that are either
irrelevant or should not be included.
This library has limited value to the public at the time of this
commit, since TAME has not yet been released (it requires some
additional cleanup and filtering before then). It is also in need of
heavy refactoring and reorganization, since it has accumulated a lot
of cruft, especially since the project in which the Calc DSL was
introduced was rushed (to put it lightly). Forgive the mess.
[LoVullo employees: the commit was extracted from dsl.git 4a3aea9;
full history can be found there. This commit contains some minor
tweaks in addition to squashing. It filters on the :/core/
directory.]