
21 Commits (bbe775d870b434f91970f265f5f7599f04b3e920)

Author SHA1 Message Date
Mike Gerwitz e20076235e _table-row_: Performance fix: place table in const/text() instead of const/@values
This is an interesting one.

For some context: TAME uses `csvm` files to provide syntactic sugar for
large tables of values ("rate tables", as they're often called, since they
contain insurance rates and other data).  This gets desugared into a `csv`
which in turn is compiled via `csv2xml` into a package.  That package uses
the `_table-*_` templates to define a table, which is represented as a
matrix using `const/@values`.

Here's an example of a generated table in a package:

```
  <t:create-table name="foo">
    <t:table-rows data="
      1,2,3;
      4,5,6;" />
  </t:create-table>
```

Some of the tables are quite large, generating tens of MiB of data in
`@data`.  This in itself isn't a problem.  But when Saxon parses the `@data`
attribute, it normalizes the whitespace, as mandated by the XML spec, and
replaces each newline with a space.  Therefore, when the template is expanded
and the `xmlo` file is produced, the resulting `const/@values` holds a huge
amount of data on a single line.
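The newline loss can be reproduced with any conforming XML parser.  This
minimal sketch uses Python's standard `xml.etree.ElementTree` (backed by
expat) as a stand-in for Saxon, assuming both apply the XML spec's
attribute-value normalization in the same way.  The bare `table-rows`
element is illustrative only.

```python
import xml.etree.ElementTree as ET

# The same two-row table, once in an attribute and once as a text node.
# Element and attribute names are illustrative, not TAME's real output.
ATTR_DOC = '<table-rows data="\n  1,2,3;\n  4,5,6;" />'
TEXT_DOC = '<table-rows>\n  1,2,3;\n  4,5,6;</table-rows>'


def attr_value(doc: str) -> str:
    """Return @data as the parser reports it (newlines become spaces)."""
    return ET.fromstring(doc).get("data")


def text_value(doc: str) -> str:
    """Return the element's text node (newlines are preserved)."""
    return ET.fromstring(doc).text


if __name__ == "__main__":
    print(repr(attr_value(ATTR_DOC)))  # everything on one line
    print(repr(text_value(TEXT_DOC)))  # line breaks intact
```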

Then, when another package imports that `xmlo` file via `<import
package="..." />`, which is done via `document()` in XSLT, Saxon takes a
long time to parse it.  60s on my machine for a ~20MiB line.

This problem does not exist for JS fragments; Saxon doesn't mind large text
nodes.  So that is the approach that is taken here.

The template system doesn't have a way to output text yet, so this takes an
approach that minimizes changes as much as possible:

  - `param-copy` will expand `with-param/@value` as a text node.
  - `const/@values="-"` will cause TAME to use the child text node as the
    value of `@values`.
  - `_table-rows_` is modified to use the above two features.

The reason for using `@values="-"` is so that other parts of the compiler do
not have to be modified to recognize the new text convention, which is
otherwise awkward because newlines are text nodes.  The `-` convention comes
from command-line programs, where it generally means "read from stdin"; it
is safe here since `-` is never a valid matrix specification.
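A minimal sketch of the `@values="-"` lookup follows.  It is written in
Python purely for illustration (TAME does this in XSLT), and the helper
name `const_values` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def const_values(const: ET.Element) -> Optional[str]:
    """Return the matrix specification of a `const`-like element.

    Hypothetical helper: `@values="-"` means "read the value from the child
    text node", much like `-` means stdin on the command line.  Any other
    `@values` is returned unchanged; None means there is no `@values`.
    """
    values = const.get("values")
    if values == "-":
        # Text nodes keep their newlines, unlike attribute values.
        return const.text or ""
    return values


if __name__ == "__main__":
    inline = ET.fromstring('<const name="a" values="1,2;3,4;" />')
    piped = ET.fromstring('<const name="b" values="-">\n1,2;\n3,4;\n</const>')
    print(repr(const_values(inline)))
    print(repr(const_values(piped)))
```

Callers that already read `@values` keep working, and only the one sentinel
value changes behavior.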

This must have been a problem for a very long time, but wasn't all that
noticeable until recent performance optimizations, since so many other things
around it were also slow.

DEV-15131
2023-10-18 11:43:48 -04:00
Mike Gerwitz 954b5a2795 Copyright year and name update
Ryan Specialty Group (RSG) rebranded to Ryan Specialty after its IPO.
2023-01-20 23:37:30 -05:00
Mike Gerwitz 9f98cbf9b4 core: Remove `const/@type`
This has been optional for many years and is not actually used by the
current compiler.  TAMER can infer it, in situations where it actually
matters in the future.

So, rather than adding support for this in the new parser, let's clean up.

DEV-7145
2022-08-15 11:57:45 -04:00
Mike Gerwitz 1ad2fb1dc8 Copyright year update 2022
RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no
"Group"), but I'm not sure if the legal name has been changed yet or not, so
I'll wait on that.
2022-05-03 14:14:29 -04:00
Mike Gerwitz 2ea66f4f97 tame: @encoding="{ISO-8859-1=>utf-8}" for all XML-based files
TAMER rejects this, because we shouldn't be using anything but UTF-8.  My
use of this encoding is ancient (over a decade old) and was apparently just
copied around.

DEV-10936
2022-05-02 12:00:42 -04:00
Mike Gerwitz 2e50af1220 Copyright year update 2021 2021-07-22 15:00:15 -04:00
Mike Gerwitz bfea768f89 Copyright year 2020 update 2020-03-06 11:05:18 -05:00
Mike Gerwitz e022a3133d Copyright year simplification and update to Ryan Specialty Group
This now uses year ranges, which I'll update annually.

This also renames "R-T Specialty" to "Ryan Specialty Group".  The latter is
the parent company of the former.  I was originally employed under the
former when LoVullo Associates was purchased, but I now work for the parent
company.
2019-02-07 13:23:09 -05:00
Mike Gerwitz 11109d4361 core: Add _where-*_ query predicate templates
These provide a more pleasant abstraction than having to reference CMP_OP_*
constants.

* core/test/core/vector/interpolate.xml: {t:when=>t:where-eq}.
* core/test/core/vector/table.xml: Likewise, but using the other variants
    where appropriate given the value of `@op'.
* core/vector/interpolate.xml: Likewise.
* core/vector/table.xml (_when_, _where_): Rename former to latter and
    provide deprecation warning.
  (_when-lt_, _when-lte_, _when-gt_, _when-gte_): Add abstractions.
* src/current/rater.xsd: Permit template variable as template name.
2019-02-04 10:22:46 -05:00
Mike Gerwitz 36a3e348b6 core: Add comparison operators for table query predicates
This is fairly primitive support and it completely sidesteps the bisect
algorithm for now.  The next commit will abstract this a little bit further
to make it less awkward to use.

* core/test/core/vector/table.xml: New test cases.
* core/vector/filter.xml (CmpOp): New typedef.
  (mfilter): Document that bisecting will not happen unless `CMP_OP_EQ'
    is used.  Implement that restriction.
    [op]: New parameter.  Provide it to `mrange'.
  (_mfilter, _mrange_cmp): Rename from `_mfilter'.  Implement new comparison
    check based on `op'.
    [op]: New argument.
* core/vector/table.xml (_when_)[@op@]: New param.  Add it to the produced
    vector.
  (_mquery): Unpack op (from `_when_') in call to `mfilter'.
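The idea behind this change can be sketched in Python.  Equality on a
sorted column can bisect to the matching run of rows, while the other
operators fall back to a linear scan.  `filter_rows` and this `CmpOp` enum
are hypothetical stand-ins, not TAME's `mfilter` or its `CmpOp` typedef,
and the sketch assumes the filtered column is sorted in ascending order.

```python
from enum import Enum
from typing import Any, Callable, Dict, List, Sequence


class CmpOp(Enum):
    """Hypothetical stand-in for the CMP_OP_* constants."""
    EQ = "eq"
    LT = "lt"
    LTE = "lte"
    GT = "gt"
    GTE = "gte"


# How each operator compares a row's field against the query value.
_CHECKS: Dict[CmpOp, Callable[[Any, Any], bool]] = {
    CmpOp.EQ: lambda field, value: field == value,
    CmpOp.LT: lambda field, value: field < value,
    CmpOp.LTE: lambda field, value: field <= value,
    CmpOp.GT: lambda field, value: field > value,
    CmpOp.GTE: lambda field, value: field >= value,
}


def _bound(rows: Sequence[Sequence[Any]], col: int, value: Any, strict: bool) -> int:
    """Binary search on a column sorted ascending.

    Returns the first index whose field is >= value (or > value if strict).
    """
    lo, hi = 0, len(rows)
    while lo < hi:
        mid = (lo + hi) // 2
        field = rows[mid][col]
        if field < value or (strict and field == value):
            lo = mid + 1
        else:
            hi = mid
    return lo


def filter_rows(rows: List[List[Any]], col: int, value: Any,
                op: CmpOp = CmpOp.EQ) -> List[List[Any]]:
    """Select rows whose field `col` satisfies `field <op> value`."""
    if op is CmpOp.EQ:
        # Equality: bisect to the contiguous run of matching rows.
        start = _bound(rows, col, value, strict=False)
        end = _bound(rows, col, value, strict=True)
        return list(rows[start:end])
    # Any other operator: no bisect, check every row.
    check = _CHECKS[op]
    return [row for row in rows if check(row[col], value)]
```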
2019-02-04 10:22:46 -05:00
Mike Gerwitz 74f8b56fcc Use some modern shorthands for core/vector/{table,filter}
Just trying to clean up a little as I go to start to make it easier
to understand.

* core/vector/filter.xml: Use _when-*_ templates and c:recurse.
* core/vector/table.xml: Likewise.
2019-02-04 10:22:46 -05:00
Mike Gerwitz 73d691273e core: Replace all occurrences of c:{set=>vector}
The former is deprecated and never made any sense at all.
2019-02-01 16:01:56 -05:00
Mike Gerwitz 98f9b6fadb vector/table: Extract bisect functions into vector/filter
* vector/filter.xml (bisect, foremost, _mask-unless_): Add to package.
* vector/table.xml (bisect, foremost, _mask-unless_): Remove from package.
2018-09-11 09:30:52 -04:00
Mike Gerwitz ec7d1c2a24 vector/table: Extract mfilter and range into vector/filter
* vector/filter.xml: New package.
* vector/table.xml (mfilter, _mfilter, range): Extract into vector/filter.
2018-09-11 09:30:52 -04:00
Mike Gerwitz 1fa833eb47 {L=>}GPL
I don't recall why I licensed under the LGPL initially.
2018-09-11 09:30:52 -04:00
Mike Gerwitz 088a948891 Update all copyrights from LoVullo to R-T Specialty 2018-09-11 09:30:51 -04:00
Arthur Domino 49663c5779 Added missing export in table package for numeric/common 2018-09-11 09:30:50 -04:00
Mike Gerwitz 9805eaf755 Support table data definition via _table-rows_/@data@
This uses GNU Octave- or MATLAB-style matrix definitions for tables, which
produce a single node instead of a node per field and row.  The result is a
significantly smaller tree and drastically improved processing time.
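For illustration, here is a hypothetical Python parser for a matrix
written as `1,2,3; 4,5,6;`.  It assumes the comma/semicolon form only
(full Octave syntax also accepts spaces as field separators), and
`parse_matrix` is not part of TAME.

```python
from typing import List


def parse_matrix(spec: str) -> List[List[float]]:
    """Parse a matrix such as "1,2,3; 4,5,6;" into a list of rows.

    Assumed syntax: rows end with ';' and fields are separated by ','.
    Whitespace (including newlines) around fields is ignored.
    """
    rows: List[List[float]] = []
    for row in spec.split(";"):
        row = row.strip()
        if not row:
            continue  # skip the empty piece after the trailing ';'
        rows.append([float(field) for field in row.split(",")])
    return rows


if __name__ == "__main__":
    print(parse_matrix("\n  1,2,3;\n  4,5,6;"))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```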
2018-09-11 09:30:49 -04:00
Mike Gerwitz a3b6b45df9 LGPL license; copyright notice added to head of each file 2018-09-11 09:30:48 -04:00
Mike Gerwitz fb1416837b Package namespace/imports/decl cleanup 2018-09-11 09:30:48 -04:00
Mike Gerwitz 4ddda94a4c TAME core library extracted from Calc DSL repository
Some notes on this:  The "Calc DSL" is the name of the DSL before it
became "TAME".  This takes the entire core library and squashes its 91
commits into a single one, because those
commits often contain LoVullo-specific details that are either
irrelevant or should not be included.

This library has limited value to the public at the time of this
commit, since TAME has not yet been released (it requires some
additional cleanup and filtering before then).  It is also in need of
heavy refactoring and reorganization, since it has accumulated a lot
of cruft, especially since the project in which the Calc DSL was
introduced was rushed (to put it lightly).  Forgive the mess.

[LoVullo employees: the commit was extracted from dsl.git 4a3aea9;
full history can be found there.  Beyond squashing, this commit contains
some minor tweaks.  It filters on the :/core/
directory.]
2018-09-11 09:30:48 -04:00