employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Mike Gerwitz	abc37ef0cc	depgen: Clean up TeX symbol generation See the `preproc:tex-gen` comment for more information. This retains the existing behavior, cleaning it up quite a bit. This has no impact on performance; it's just continued refactoring to prepare for optimization. DEV-15114	2023-10-19 13:00:39 -04:00
Mike Gerwitz	659a0e71fb	depgen (preproc:symtable): Simplify symbol iteration This just continues to refactor to try to make sense of this code, which has evolved into quite the mess over the years. The two primary goals are to (a) find possible optimizations and (b) make sure how this functions is clear for when it's reimplemented in TAMER. I'm doing this in small commits so that the steps are more obvious. The specific list of attributes is what I found to be output in practice in the `xmlo` files. DEV-15114	2023-10-19 10:56:22 -04:00
Mike Gerwitz	f415e05f31	depgen: Remove lax symbol concept I had to dig through the old repository (prior to extracting into this one) to see why this was introduced. It seems that it was for the linker, and TAMER's linker has no concept of lax symbols, so this is not used. To make matters worse, the code I modified here could not have worked (in depgen) because `$syms/@name` _will always have a value_. Anyway, removing this has no effect on the compiled packages. DEV-15114	2023-10-19 10:56:22 -04:00
Mike Gerwitz	b3f92e0678	depgen: Combine preproc:symtable sym-ref generation cases This does not impact performance, but it makes it less confusing. What a mess this whole thing is. I'll have to incrementally refactor it until it makes sense enough to optimize. For those who don't know, from XSLT 1.0 days: "rtf" means "result tree fragment", before sequences were a thing, and you had to treat generated trees specially. Yeah, old code. DEV-15114	2023-10-19 10:56:22 -04:00
Mike Gerwitz	e20076235e	_table-row_: Performance fix: place table in const/text() instead of const/@values This is an interesting one. For some context: TAME uses `csvm` files to provide syntactic sugar for large tables of values ("rate tables", as they're often called, since they contain insurance rates and other data). This gets desugared into a `csv` which in turn is compiled via `csv2xml` into a package. That package uses the `_table-*_` templates to define a table, which is represented as a matrix using `const/@values`. Here's an example of a generated table in a package:

```
<t:create-table name="foo">
  <t:table-rows data="
    1,2,3;
    4,5,6;" />
</t:create-table>
```

Some of the tables are quite large, generating tens of MiB of data in `@data`. This in itself isn't a problem. But when Saxon parses the `@data` attribute, it normalizes the whitespace, as mandated by the XML spec, replacing the newlines with spaces. Therefore, when the template is expanded and the `xmlo` file is produced, the template produces a `const/@values` with a huge amount of data on one line. Then, when another package imports that `xmlo` file via `<import package="..." />`, which is done via `document()` in XSLT, Saxon takes a long time to parse it: 60s on my machine for a ~20MiB line. This problem does not exist for JS fragments; Saxon doesn't mind large text nodes. So that is the approach that is taken here. The template system doesn't have a way to output text yet, so this takes an approach that minimizes changes as much as possible:

- `param-copy` will expand `with-param/@value` as a text node.
- `const/@values="-"` will cause TAME to use the child text node as the value of `@values`.
- `_table-rows_` is modified to use the above two features.

The reason for using `@values="-"` is so that other parts of the compiler do not have to be modified to recognize the new text convention, which is otherwise awkward because newlines are text nodes. The `-` convention comes from command-line programs, where it generally means "read from stdin"; this is okay since `-` is never a valid matrix specification. This must have been a problem for a very long time, but wasn't all that noticeable until recent performance optimizations, since so many other things around it were also slow. DEV-15131	2023-10-18 11:43:48 -04:00
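To make the `@values="-"` convention concrete, here is a minimal JavaScript sketch of how a consumer might resolve a constant's matrix. It assumes the syntax from the example above (rows terminated by `;`, columns separated by `,`). `parseMatrix` and `resolveValues` are hypothetical helpers, not TAME functions; the real resolution happens in XSLT.

```javascript
// Hypothetical sketch: resolve a `const` node's matrix either from
// `@values` or, when `@values` is "-", from its child text node.

// Parse "1,2,3; 4,5,6;" into [[1, 2, 3], [4, 5, 6]].
// Rows end with ";", columns are separated by ",", and surrounding
// whitespace (including newlines from a text node) is ignored.
function parseMatrix(spec) {
  return spec
    .split(";")
    .map((row) => row.trim())
    .filter((row) => row !== "")
    .map((row) => row.split(",").map((cell) => Number(cell.trim())));
}

// `values` is the attribute value; `text` is the child text content.
function resolveValues(values, text) {
  // "-" is never a valid matrix specification, so it can safely mean
  // "read the value from the text node instead".
  const spec = values === "-" ? text : values;
  return parseMatrix(spec);
}

// Attribute form: normalization has already turned newlines into spaces.
console.log(resolveValues(" 1,2,3;  4,5,6;", ""));
// Text form: newlines survive, so no single enormous line is produced.
console.log(resolveValues("-", "\n  1,2,3;\n  4,5,6;\n"));
```

Both calls yield the same matrix. Only the serialized form differs, which is the point of the change.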
Mike Gerwitz	b82294b1bd	preproc/symtable (preproc:symtable-complete): Do not re-process imported symbols It's embarrassing how much time this saved on builds. This has apparently always been doing a linear scan on the entire symbol table for _every single param in the symbol table_, including those that were imported. This is not only unnecessary, but also has no effect on the end result of the system. This cut build times almost in half, due to the number of symbols in some of our packages. All for unnecessary work. Like most problems with quadratic (or higher polynomial) time complexity, this one didn't show up during initial development and was hard to even profile for, because its effects were so small. Now that our system has grown substantially, it has a massive effect. DEV-15114	2023-10-16 13:45:25 -04:00
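One rough way to see why this mattered is to count the work. The following JavaScript model is hypothetical. Symbols are plain objects, the `src` field marking an imported symbol is an assumption, and each "completion" is modeled as one visit to every symbol in the table. The sizes are illustrative, not measurements.

```javascript
// Hypothetical model: each symbol has a `type`, and imported symbols
// carry a `src` naming the package they came from (assumed field).
function scanCost(symtable, skipImported) {
  let visits = 0;
  for (const sym of symtable) {
    if (sym.type !== "param") continue;
    // The fix: imported params were already completed when their own
    // package was compiled, so re-processing them changes nothing.
    if (skipImported && sym.src !== undefined) continue;
    // Stand-in for an XPath that visits every symbol in the table.
    visits += symtable.length;
  }
  return visits;
}

// Illustrative sizes only: 200 local params, 5,000 imported params.
const table = [
  ...Array.from({ length: 200 }, (_, i) => ({ name: `local${i}`, type: "param" })),
  ...Array.from({ length: 5000 }, (_, i) => ({
    name: `imported${i}`,
    type: "param",
    src: "some/package",
  })),
];

console.log(scanCost(table, false)); // 27040000 (5,200 params x 5,200 symbols)
console.log(scanCost(table, true)); // 1040000 (200 params x 5,200 symbols)
```

Skipping imported symbols does not change the result, only the amount of work. The remaining per-param scan is still linear in the table size.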
Mike Gerwitz	0b04807cfd	preproc/template: Use symtable map instead of preproc:symtable XPaths This is more of the same, utilizing the map I created previously. The results are pretty significant; this commit and the previous one cut ~3.5m of build time (if done serially) off of our largest system. My goal is to get non-parallelizable portions of our build down to the point where they are no longer the bottleneck. This just about does that. DEV-15095	2023-10-12 16:18:27 -04:00
Mike Gerwitz	7ccf0a0cfa	preproc/expand.xsl: Remove check for injected templates This incurs the cost of a symtable lookup via XPath for a feature that has not been used in a long time (I don't even recall it). DEV-15095	2023-10-12 15:56:02 -04:00
Mike Gerwitz	a8ef1b4fd1	Begin to use symtable-map for template/macro passes I wanted to get this committed before I continue because it required changes to the `expand-sequence` system---tunneling params cannot pass through functions, so this accepts a context to pass back to the calling system via the `eseq:expand-node` override. Otherwise, the key change here is the elimination of a preproc:symtable XPath within a `template/@match`, which was a huge performance problem with the preceding commits. This improves build times modestly, but there are more changes that this sets up for, so I'll keep going. DEV-15095	2023-10-12 15:48:35 -04:00
Mike Gerwitz	50e31f4616	current/compiler/js-calc.xsl: Replace all preproc:symtable XPaths with map This uses the already-available symtable-map to avoid expensive XPaths resulting in (what I assume to be) linear scans of the symbol table. This effectively makes the fragment compilation time vanish. This had the effect of shaving ~4.5m total off of our largest system (if I were to do `-j1`), and a couple minutes when run in parallel. DEV-15095	2023-10-12 12:00:45 -04:00
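The general technique behind this and the neighboring commits is to replace a per-lookup scan with an index built once. Here is a minimal JavaScript sketch, assuming a symbol table represented as an array of records with hypothetical field names. In the XSLT compiler the index is presumably an XSLT 3.0 map rather than a JS `Map`.

```javascript
// Hypothetical symbol table: an array of records.
// Field names (name, type, dtype) are illustrative, not TAME's schema.
const symtable = [
  { name: "buildingTiv", type: "param", dtype: "float" },
  { name: "tivPropDivisor", type: "gen", dtype: "float" },
  { name: "propValue", type: "gen", dtype: "float" },
];

// Analogous to an XPath predicate on preproc:symtable: every lookup
// walks the table, so n lookups against n symbols cost O(n^2).
function findLinear(table, name) {
  return table.find((sym) => sym.name === name);
}

// Built once per package; each lookup is then amortized constant time.
function buildSymtableMap(table) {
  const map = new Map();
  for (const sym of table) {
    // Keep the first occurrence, matching Array.prototype.find above.
    if (!map.has(sym.name)) map.set(sym.name, sym);
  }
  return map;
}

const symMap = buildSymtableMap(symtable);
console.log(findLinear(symtable, "propValue") === symMap.get("propValue")); // true
console.log(symMap.get("noSuchSymbol")); // undefined
```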
Mike Gerwitz	b7372fe7cd	current/compiler/js-calc.xsl (compile-calc-value): Drastically reduce matching complexity This takes a bunch of individual templates and combines them into one, while also utilizing the already-available symbol table map in place of using an XPath on `preproc:symtable`. The results are much more drastic than I was expecting. I was exploring this because one of our largest packages was spending most of its time (~5m) in fragment compilation, which was a surprise to me. Prior work I did for runtime optimizations led to optimizations in its parent `js.xsl`, but not in `js-calc.xsl`, which has largely been untouched since it was originally written for XSLT 1 over 10 years ago. Because it was originally written for XSLT 1, it does not take advantage of maps, tunneling variables, or various other options. Further, it was written in a naive way that was convenient (and clear) at the time, and wholly acceptable for smaller inputs. But, as is the case with quadratic-time systems, there are severe growing pains. This change reduced the package compilation time from 5-6m down to 1m15s, and this was just the first attempt at optimizing it. I should have taken a look at this long ago, but my efforts were focused on TAMER, and I didn't want to divert that focus. That was a mistake. Symptoms of this problem were already prevalent ~10 months ago, when the package was taking 3 minutes to compile (so the time has since doubled). This also eliminates `@magic`, which has not been used for a long time (it used to be used for a "constant" that held the current date/time; such a value is now passed into the system like any other input). After making this change, the resulting packages are byte-for-byte identical. I also noticed, though I haven't tried to measure it, that there seem to be fewer multi-core spikes; this is possibly related to Saxon not trying to evaluate expensive `template/match` expressions concurrently anymore.
If true, this will also help with resource contention for parallel builds. DEV-15095	2023-10-12 11:32:22 -04:00
Mike Gerwitz	b2a996c1df	expand-sequence/expand-group: Retain until hoisting This is a rather small change for quite a bit of effort in researching what was going wrong. It's at least seven rabbit holes deep, or maybe several herds of yaks, depending on your choice of measure and the current conversion rate. The problem can be summarized fairly succinctly: `expand-sequence/expand-group` exists to prevent an expansion repass for every single child element of the `expand-sequence`, which would be quadratic. Basically, it restores the usual template expansion process for that set of children. But apparently `expand-group` was stripped on the first pass, which expanded its children inline, which then meant that each of the children was subject to its own individual pass, defeating the purpose of the optimization. As is the nature of quadratic-time processes, that was not noticed until inputs became especially large and, on top of that, were combined with nested `expand-sequence`s. I would say that this never worked the way that I intended it to, but I'm not certain. I was working quite a bit with TeX at the time, so it's possible that I modeled it after `\expandafter`. But that's not an appropriate model for TAME. TAMER will be removing expand-sequence entirely, since it will have enough of an understanding of the source system to determine what requires expansion and what requires ordering (e.g. for symbol table iteration). I'll also be making changes to simplify the process by further restricting what type of iteration can take place. But for the time being, the change was necessary. In our largest systems, this change cut off ~15m total of build time if run serially (at `-j1`). After sorting two runtabs for comparison (e.g. `sort -k4`), you can get the total like so:

    $ paste <( sort -k4 runtab-a ) <( sort -k4 runtab-b ) \
        | grep xmlo\$ \
        | cut -f2,5,6 \
        | awk '{ total += ($1 - $2) } END { print total / 1000 }'

Similarly, this Awk expression will give the time differences:

    $ awk '{ print ($1 - $2)/1000, $5 }'

Further, the previous commit also introduced an `xmle-sym-cmp` tool to check for differences between xmle symbol tables in an automated way, irrespective of ordering (since there are many valid topological sorts). This revealed that the change fixed a bug (likely because of the forced repass after `expand-group` hoisting) that was causing symbol table introspection to fail to discover symbols in certain cases, which, in our case, resulted in a small number of aggregate classifications being generated incorrectly. The whole repass system is a concerning mess, but it's not worth the effort to try to redo all of that when that work can be done in TAMER. DEV-15069	2023-10-10 16:16:39 -04:00
Mike Gerwitz	b0eca41c96	Remove legacy classification system flag This should have been cleaned up long ago; it hasn't been used for a couple of years now. DEV-10806	2023-10-05 11:55:24 -04:00
Mike Gerwitz	5e883e3c4f	expand-sequence: Invoke is-expandable only once per expansion head `is-expandable` is an expensive XPath, and it was being invoked twice per node: once for each complementary `match`. DEV-10806	2023-10-05 10:42:59 -04:00
Mike Gerwitz	3d8c4d1ed0	preproc/domain: Eliminate duplicate domain generation I had long forgotten about this system. It converts typedefs into a more generic domain, but the way in which it does so causes duplicate domains, for two reasons:

- Both `preproc:mkdomain` and the caller (`preproc:expand`) recurse into unions and generate domains; and
- Each `preproc:expand` pass generates domains.

So, for example, if there are two `preproc:expand` passes on a union, then the outer typedef (union) will have domains generated twice (once for each pass), and the inner typedefs will have domains generated four times (twice per pass, once by each recursion, across two passes). This resolves the issue before the next commit makes further changes to move this into a generated header file.	2023-09-28 10:21:52 -04:00
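The counts above can be reproduced with a small hypothetical JavaScript model of the two recursions. The model also shows one way to avoid the duplication, by deduplicating on name. It is an assumption for illustration, and the commit does not describe its fix in this form.

```javascript
// Hypothetical model of the duplicate-domain bug. `generated` records
// every domain emitted, so duplicates show up as repeated names.
function mkdomain(typedef, generated) {
  generated.push(typedef.name);
  // preproc:mkdomain recursed into unions...
  for (const inner of typedef.union ?? []) mkdomain(inner, generated);
}

function expandPass(typedef, generated) {
  mkdomain(typedef, generated);
  // ...and so did its caller, emitting inner domains a second time.
  for (const inner of typedef.union ?? []) mkdomain(inner, generated);
}

function countBy(names) {
  const counts = {};
  for (const name of names) counts[name] = (counts[name] ?? 0) + 1;
  return counts;
}

// One way to avoid duplicates: generate each domain once, by name.
function generateDomainsOnce(typedef, seen = new Set(), out = []) {
  if (seen.has(typedef.name)) return out;
  seen.add(typedef.name);
  out.push(typedef.name);
  for (const inner of typedef.union ?? []) generateDomainsOnce(inner, seen, out);
  return out;
}

const union = { name: "outer", union: [{ name: "innerA" }, { name: "innerB" }] };
const generated = [];
for (let pass = 0; pass < 2; pass++) expandPass(union, generated);

console.log(countBy(generated)); // { outer: 2, innerA: 4, innerB: 4 }
console.log(generateDomainsOnce(union)); // [ 'outer', 'innerA', 'innerB' ]
```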
Mike Gerwitz	4e7d202d2d	Remove __rseed random value from XSLT-based compiler This was used before __pkguniq to generate identifiers. Back then, I seemed to think determinism was a problem and that randomness was desirable for helping to ensure uniqueness between packages. That was a mistake; we _want_ a deterministic system (which is far easier to debug and verify the results of), we just want uniqueness. DEV-14965	2023-09-20 12:38:41 -04:00
Mike Gerwitz	418bd34005	tame: Introduce __pkguniq and preproc:pkg-generate-id to replace generate-id This modifies the XSLT-based compiler to generate ids that are expected to be unique across packages. No such guarantee exists today; `generate-id()` relies on the position of the node within a tree, which could easily be the same across multiple compiler invocations for separate packages. This situation seldom occurs, but has happened with increased frequency lately in a system with >1000 packages. It is more likely to occur in packages that are very similar to one another or where the beginning of the package is similar (such as packages used as configuration for taxes for each individual state). This derives a SHA-256 hash from the canonical package name (well, not canonical according to TAMER, but close: without the leading slash), truncating it to 32 bits. I used the birthday bound to estimate what the size of this value ought to be: sqrt(2^32) = 65536, which is way more packages than the poor XSLT-based compiler is going to handle. If ever it needs to be increased due to conflicts, that is simple enough. DEV-14965	2023-09-20 12:33:34 -04:00
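Here is a minimal Node.js sketch of the scheme, together with the birthday-bound arithmetic. Two details are assumptions: that the kept 32 bits are the first four bytes of the digest, and that they are rendered as hex. `pkgUniq` and `collisionProbability` are hypothetical names.

```javascript
const { createHash } = require("crypto");

// Hypothetical sketch: SHA-256 of the package name without its leading
// slash, truncated to 32 bits (assumed: the first four digest bytes).
function pkgUniq(pkgName) {
  const name = pkgName.replace(/^\//, "");
  const digest = createHash("sha256").update(name, "utf8").digest();
  return digest.readUInt32BE(0).toString(16).padStart(8, "0");
}

// Birthday approximation: probability that at least two of n packages
// share the same value in a space of 2^bits values.
function collisionProbability(n, bits = 32) {
  return 1 - Math.exp(-(n * (n - 1)) / (2 * 2 ** bits));
}

console.log(pkgUniq("/suppliers/foo") === pkgUniq("suppliers/foo")); // true
console.log(collisionProbability(1000).toFixed(6)); // 0.000116
console.log(collisionProbability(65536).toFixed(3)); // 0.393
```

With 1,000 packages the chance of any collision is roughly 0.01%. At 65,536 packages (the sqrt(2^32) figure) it rises to roughly 39%, so that figure marks where collisions become likely rather than a guaranteed-safe limit.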
Mike Gerwitz	672cc54c14	compiler/js.xsl: Derive supplier name from base package name At or around `00492ace01`, I modified packages to output canonical `@name`s, which contain a leading forward slash. Previously, names omitted that slash. I did not believe that this caused any problems. It seems that the XSLT-based `standalones` system utilizes this package name to derive a supplier name, which is supposed to be the filename of the package without any path. Since the package name changed from `suppliers/foo` to `/suppliers/foo`, for example, this was now producing "suppliers/foo" instead of "foo". Of course, it was never a good idea to strip off only the first path component. But, this is how it has been since TAME was originally created well over a decade ago. I did not catch this since I was diff'ing the output of the xmle files, not the final JS files. I had thought that was sufficient, given what I was changing, but I was wrong. DEV-14502	2023-06-08 16:46:18 -04:00
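The difference between the old and new derivation fits in a few lines of JavaScript. This is a hypothetical reconstruction of the string handling described above, not the code in `compiler/js.xsl`.

```javascript
// Old behavior (hypothetical reconstruction): strip everything up to
// and including the first "/". This only worked while names had no
// leading slash and were exactly one directory deep.
function supplierNameOld(pkgName) {
  return pkgName.substring(pkgName.indexOf("/") + 1);
}

// Base name: everything after the last "/", regardless of depth or a
// leading slash. Names without any "/" are returned unchanged.
function supplierName(pkgName) {
  return pkgName.substring(pkgName.lastIndexOf("/") + 1);
}

console.log(supplierNameOld("suppliers/foo")); // foo
console.log(supplierNameOld("/suppliers/foo")); // suppliers/foo
console.log(supplierName("/suppliers/foo")); // foo
```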
Mike Gerwitz	068804b397	tamer: Remove {ret}map:___{head,tail} These have been a pain in the ass since TAMER began. It seemed like a good idea at the time to have static code generated in this way, but the lack of explicit dependencies just makes this a mess and works against the operating theory of the system. Furthermore, the _same_ static fragments were generated for each and every map package. There is still a post-link step (standalones) handled in XSLT; the previously-static code has been moved there. This will eventually be integrated into tameld itself, once TAMER has facilities for JS generation. (This was discovered while trying to parent identifiers to packages.) DEV-13162	2023-04-30 15:06:47 -04:00
Mike Gerwitz	5dd77e7b41	tame: rater.xsd: templateName: Permit multiple leading/trailing underscores This is needed by TAMER's template desugaring. The XSD is superseded by `nir::parse`, but can't go away until TAMER fully supplants the XSLT-based compiler. ...and after all this time, I still never got rid of the duplicate XSD. Or even recall which one is the duplicate. DEV-13708	2023-04-12 14:54:00 -04:00
Mike Gerwitz	2325eb1b2f	tame: preproc/template.xsl: param-copy: Utilize TAMER application convention TAMER desugars shorthand template application bodies (`@values@`) into _the name of a closed template_ whose body should be expanded into place. This change recognizes that convention, and makes use of it. Desugaring is part of `nir::tplshort`. DEV-13708	2023-04-12 14:52:06 -04:00
Mike Gerwitz	954b5a2795	Copyright year and name update Ryan Specialty Group (RSG) rebranded to Ryan Specialty after its IPO.	2023-01-20 23:37:30 -05:00
Brandon Ellis	00f46b0032	[DEV-12990] Add gt, gte, lt, lte operators to if/unless This includes updating Tamer's parser to account for the new operator possibilities.	2022-09-22 11:38:06 -04:00
Mike Gerwitz	5edefde201	Makefile (bin): Target to build only binaries Systems utilizing TAME as a build dependency are not interested in everything else that gets built (tests and docs, primarily). DEV-7145	2022-09-07 09:53:44 -04:00
Corey Vollmer	2901f06318	[DEV-9619] Return sha256 This fixes the implementation of sha256 to be compatible with our system.	2022-07-27 12:55:17 -04:00
Corey Vollmer	f667a1a58e	[DEV-9619] Update sha256 script to handle UTF8 This commit replaces the sha256 script with a newer implementation which supports all UTF8 characters. https://github.com/emn178/js-sha256/blob/master/src/sha256.js Note that this commit breaks the system; the following commit fixes this.	2022-07-22 08:46:35 -04:00
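The commit does not describe the old script's failure mode in detail. As a general illustration of what handling UTF-8 involves, this Node.js sketch encodes a string to UTF-8 bytes before hashing. `utf8Bytes` and `sha256Hex` are hypothetical helpers, and this is not the js-sha256 implementation.

```javascript
const { createHash } = require("crypto");

// Hash functions consume bytes, so text must be encoded first.
// Treating charCodeAt() values as bytes is wrong outside ASCII:
// U+0080..U+00FF come out as Latin-1, and higher code units do not
// fit in a byte at all.
function utf8Bytes(str) {
  return new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
}

function sha256Hex(str) {
  return createHash("sha256").update(utf8Bytes(str)).digest("hex");
}

console.log("é".charCodeAt(0)); // 233 (a single UTF-16 code unit)
console.log(Array.from(utf8Bytes("é"))); // [ 195, 169 ]
console.log("€".charCodeAt(0)); // 8364, too large for one byte
console.log(sha256Hex("abc"));
// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```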
Mike Gerwitz	95229916ca	current/compiler/worksheet: Generate lv:package/@name This is present on all other packages. Rather than complicating TAMER to accommodate a missing name, it's trivial to just add it. This will, unfortunately, invalidate and require rebuilding of all xmlo files, based on the `.rev-xmlo` bump. DEV-11864	2022-05-26 10:20:05 -04:00
Mike Gerwitz	0d999b56cd	src/current/summary.xsl: Correct invalid UTF-8 sequence This broke when encoding was set to UTF-8 on this file.	2022-05-04 11:11:02 -04:00
Mike Gerwitz	2954c591a1	src/current/include/preproc/symtable: Remove extern @dtype check I attempted to resolve an error previously, and I thought I had, but apparently some symbols acquire a @dtype at some point in the process, or lose it. Regardless, I have no interest in debugging or resolving this mess, since it's going away. The linker ensures that externs match, so while this could potentially allow conflicting imports within a package (unlikely, given that extern templates are recommended), it still will not resolve with a conflicting concrete implementation. I'm not worried. DEV-1036	2022-05-04 10:50:14 -04:00
Mike Gerwitz	43c99cb61a	src/current/include/preproc/symtable.xsl: Treat mutual missing extern @dtype as match Extern resolution has apparently been failing for quite some time, resulting in `preproc:error` nodes in the _symbol table_ of return maps. This was caught by the new xmlo parser, which does not ignore nodes it does not care about. The failure was caused by missing `@dtype`---the externs did in fact match, and if they did not, then the linker would have failed. This doesn't modify the map compiler to properly detect these, because this compiler is going away in the hopefully-near future, and the problems will now be caught, though in a very unideal way (as a parse error during xmlo reading). DEV-10936	2022-05-04 09:29:29 -04:00
Mike Gerwitz	602cec5560	src/current/compiler/map.xsl: Omit preproc:from from retmap symbols preproc:sym/preproc:from is used for generating `knownFields` using the _input_ map, so this has no use for return map values; the map still produces edges to its dependencies. The issue is that there are return map entries in some of our systems that are producing multiple `preproc:from`, but I somewhat recently modified the system to support only a single map, to remove dynamic allocation. This resolves that problem. With that said, `knownFields` was created for Liza to know when the classifier ought to be invoked, to save time. Back when it was first introduced ~10y ago, this provided significant savings; however, the structure of our system now is such that nearly every single field invokes the classifier. Furthermore, these details should remain encapsulated; if we wanted to make that determination, we should be provided with a delta, which we could also use to do incremental classification in the future, if there's an ROI there after other improvements have been made. So, eventually, preproc:sym/preproc:from will go away entirely. DEV-10936	2022-05-04 09:26:18 -04:00
Mike Gerwitz	1ad2fb1dc8	Copyright year update 2022 RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no "Group"), but I'm not sure if the legal name has been changed yet or not, so I'll wait on that.	2022-05-03 14:14:29 -04:00
Mike Gerwitz	c4828e7e7a	tame: src/current/compiler/worksheet: Place fragments in header The new xmlo parser was failing on a worksheet xmlo file because fragments were not properly placed within the header. This was a change made when tameld was introduced so that we could stop reading xmlo files early. DEV-10936	2022-05-03 09:11:13 -04:00
Mike Gerwitz	2ea66f4f97	tame: @encoding="{ISO-8859-1=>utf-8}" for all XML-based files TAMER rejects this, because we shouldn't be using anything but UTF-8. My use of this encoding is ancient (from over a decade ago) and was apparently just copied around. DEV-10936	2022-05-02 12:00:42 -04:00
Mike Gerwitz	70d1ad17b8	map: Force param/@default in translation to be numeric The default ought to be numeric, always, but until we have the compiler checking for that, I'm going to leave the casting in place. DEV-10484	2022-03-07 12:22:18 -05:00
Mike Gerwitz	054ad9b4c4	map: Properly apply param/@default for translation fallback This was broken by the previous fix, because I had cast to a numeric value before invoking `set_defaults`, which needs the empty string retained so that it knows whether a default ought to be applied. This also ensures that `set_values` will always return a numeric value when that default is applied. DEV-10484	2022-03-07 11:47:58 -05:00
Mike Gerwitz	501a9441a5	map: Produce 0 instead of NaN for non-numeric string values This has been a problem for...ever, but the old classification system (and calculations) had `||0` for every variable reference, whereas the new one does not; NaNs result in undefined behavior in the new classification system, since those values are not expected to exist. This ought to have automated tests, but it will be rewritten in TAMER. DEV-10484	2022-03-03 13:22:24 -05:00
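The ordering issue across these three commits can be sketched in JavaScript: apply the default while the empty string is still visible, then cast, mapping NaN to 0. `translate` and `toNumber` are hypothetical names. The actual logic lives in the map compiler's `set_defaults`/`set_values`, whose code is not shown here.

```javascript
// Cast to a number, mapping non-numeric strings to 0 instead of NaN so
// that NaN never propagates into calculations or classifications.
function toNumber(value) {
  const n = Number(value);
  return Number.isNaN(n) ? 0 : n;
}

// Decide on the default *before* casting: Number("") is 0, so casting
// first would hide the fact that the value was missing.
function translate(value, defaultValue) {
  if (value === "") return toNumber(defaultValue); // numeric default
  return toNumber(value);
}

console.log(translate("", "5")); // 5
console.log(translate("12", "5")); // 12
console.log(translate("abc", "5")); // 0
console.log(Number("")); // 0, which is why the empty check comes first
```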
Mike Gerwitz	297b88c3c1	x/0=0 with global flag for new classification system This was originally my plan with the new classification system, but it was undone because I had hoped to punt on the somewhat controversial issue. Unfortunately, I see no other way. Here I attempt to summarize the reasons why, many of which are specific to the design decisions of TAME. Keep in mind that TAME is a domain-specific language (DSL) for writing insurance rating systems. It should act intuitively for our use case, while still being mathematically sound. If you still aren't convinced, please see the link at the bottom. Target Language Semantics (ECMAScript) -------------------------------------- First: let's establish what happens today. TAME compiles into ECMAScript, which uses IEEE 754-2008 floating-point arithmetic. Here we have: x/0 = Infinity, x > 0; x/0 = -Infinity, x < 0; 0/0 = NaN, x = 0. This is immediately problematic: TAME's calculations must produce concrete real numbers, always. NaN is not valid in its domain, and Infinity is of no practical use in our computational model (TAME is built for insurance rating systems, and one will never have infinite premium). Put plainly: the behavior is undefined in TAME when any of these values are yielded by an expression. Furthermore, we have _three different possible situations_ depending on whether the numerator is positive, negative, or zero. This makes it more difficult to reason about the behavior of the system, for values we do not want in the first place. We then have these issues in ECMAScript: Infinity * 0 = NaN. -Infinity * 0 = NaN. NaN * 0 = NaN. These are of particular concern because of how predicates work in TAME, which will be discussed further below. But it is also problematic because of how it propagates: once you have NaN, you'll always have NaN, unless you break out of the situation with some control structure that avoids using it in an expression at all. Let's now consider predicates: NaN > 0 = false.
NaN < 0 = false. NaN === 0 = false. NaN === NaN = false. These will be discussed in terms of classification predicates (matches). We also have issues of serialization: JSON.stringify(Infinity) = "null". JSON.stringify(NaN) = "null". This means that these values are difficult to transfer between systems, even if we wanted them. TAME's Predicates ----------------- TAME has a classification system based on first-order logic, where ⊥ is represented by 0 and ⊤ is represented by 1. These classifications are used as predicates to calculations via the @class attribute of a rate block. For example: <rate-each class="property" generates="propValue" index="k"> <c:quotient> <c:value-of name="buildingTiv" index="k" /> <c:value-of name="tivPropDivisor" index="k" /> </c:quotient> </rate-each> As can be observed via the Summary Page, this calculation compiles into the following mathematical expression: ∑ₖ(pₖ(tₖ/dₖ)); that is, the quotient is then multiplied by the value of the `property` classification, which is 0 or 1 for that index. Let's say that tivPropDivisor were defined in this way: <rate-each class="property" generates="tivPropDivisor" index="k"> <!--- ... logic here ... --> </rate-each> It does not matter what the logic here is. Observe that the predicate here is `property` as well, which means that, if this risk is not a property risk, then `tivPropDivisor` will be `0`. Looking back at `propValue`, let's say that we do have a property risk, and that `buildingTiv` is `[100_000, 200_000]` and `tivPropDivisor` is 1000. We then have: 1(100,000 / 1000) + 1(200,000 / 1000) = 300. Consider instead what happens if `property` is 0. Since we have no property locations, we have `[0, 0]` as `buildingTiv` and `tivPropDivisor` is 0. 0(0/0) + 0(0/0) = 0(NaN + NaN) = NaN. This is clearly not what was intended. The predicate is expected to be _strongly_ zero, as if using an Iverson bracket: ((0/0)[0] + (0/0)[0]) = 0.
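The ECMAScript behavior described above can be checked directly in JavaScript, along with the predicated sum ∑ₖ(pₖ(tₖ/dₖ)) using the example values. `propValue` here is a hypothetical stand-in for the compiled expression, not the code TAME actually generates.

```javascript
// Division by zero under IEEE 754: three different outcomes.
console.log(1 / 0, -1 / 0, 0 / 0); // Infinity -Infinity NaN

// NaN propagates, even through multiplication by 0.
console.log(Infinity * 0, -Infinity * 0, NaN * 0); // NaN NaN NaN

// Every comparison involving NaN is false.
console.log(NaN > 0, NaN < 0, NaN === 0, NaN === NaN); // false false false false

// Neither value survives JSON serialization.
console.log(JSON.stringify([Infinity, NaN])); // [null,null]

// Hypothetical stand-in for the compiled form of propValue:
// sum over k of p[k] * (t[k] / d[k]), where p is the `property`
// classification (0 or 1), t is buildingTiv, and d is tivPropDivisor.
function propValue(p, t, d) {
  let sum = 0;
  for (let k = 0; k < p.length; k++) {
    sum += p[k] * (t[k] / d[k]);
  }
  return sum;
}

console.log(propValue([1, 1], [100000, 200000], [1000, 1000])); // 300
console.log(propValue([0, 0], [0, 0], [0, 0])); // NaN, not 0
```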
Of course, one option is to redefine TAME such that we use Iverson's convention in place of summation; however, this is neither necessary nor desirable given that (a) NaN is not valid within the domain of any TAME expression, and (b) summation is elegantly generalized and efficiently computed using vector arithmetic and SIMD functions. That is: there's no use in messing with TAME's computational model for a value that should be impossible to represent. Short-Circuiting Computation ---------------------------- There's another way to look at it, though: that we intended to skip the computation entirely, and so it doesn't matter what the quotient is. If the compiler were smart enough (and maybe one day it will be), it would know that the predicates of `tivPropDivisor` and `propValue` are the same and so there is no circumstance under which we would compute `propValue` and have `tivPropDivisor` be 0. The problem is that short-circuiting is employed as an _optimization_, and is an implementation detail. Mathematically, the expression is unchanged, and is still invalid within TAME's domain. It is unrepresentable, and so this is not an out. But let's pretend that it was defined that way, which would yield this:

$$
\mathit{propValue} = \begin{cases}
  \sum_k p_k (t_k / d_k), & \forall x \in p\,(x = 1); \\
  0, & \text{otherwise.}
\end{cases}
$$

This is the optimization that is employed, but it's still not mathematically correct! What happens if p₀ = 1, but p₁ = 0? Then we have: 1(100,000/1000) + 0(0/0) = 100 + NaN = NaN, but the _intent_ was clearly to have 100 + 0 = 100, and so we return to the original problem once again. Classification Predicates and Intent ------------------------------------ Classifications are used as predicates for equations, but classifications _themselves_ have predicates in the form of _matches_.
Consider, for example, a classification that may be used in an assertion to prevent negative premium from being generated: <t:assert failure="premBuilding must not be negative for any index"> <t:match-gte name="premBuilding" value="#0" /> </t:assert> Simple enough: the system will fail if the premium for a given building is below $0. But what happens if premBuilding is calculated as follows? <rate-each class="property" yields="premBuildingTotal" generates="premBuilding" index="k"> <c:product> <c:value-of name="propValue" index="k" /> <c:value-of name="propRate" index="k" /> </c:product> </rate-each> Alas, if `property` is false for any index, then we know that `propValue` is NaN, and NaN * x = NaN, and so `premBuilding` is NaN. The above assertion will compile the match into the first-order sentence ∀x∈b(x ≥ 0). Unfortunately, NaN is not greater than, less than, equal to, or in any other way comparable to 0, and so _this assertion will trigger_. This causes practical problems with the `_premium_` template, which has an `@allow-zero@` argument to permit zero premium. Consider this real-world case that I found (variables renamed), to avoid a strawman: <t:premium class="loc" round="cent" yields="locInitialTotal" generates="locInitial" index="k" allow-zero="true" desc="..."> <c:value-of name="premAdditional" /> <c:quotient> <c:value-of name="premLoc" index="k" /> <c:value-of name="premTotal" /> </c:quotient> </t:premium> This appears to be responsible for splitting up `premAdditional` relative to the total premium contribution of each location. It explicitly states that it wants to permit a zero value. The intent of this block is clear: a value of 0 is explicitly permitted and _expected_. But if `premTotal` is for whatever reason 0 (whether due to a test case or some unexpected input), then it'll yield a NaN and make the entire expression NaN. Or if `premAdditional` or `premLoc` are tainted by a NaN, the same result will occur. The assertion will trigger.
And, indeed, this is what I'm seeing with test cases against the new classification system. What about Infinity? Is it intuitive that, should `propValue` in the previous example be positive and `propRate` be 0, we would, rather than producing a very small value, produce an infinitely large one? Does that match intuition? Remember, this system is a domain-specific language for _our_ purposes; it is not intended to be used to model infinities. For example, say we had this submission because the premium exceeds our authority to write with some carrier: <t:submit reason="Premium exceeds authority"> <t:match-gt name="premBuilding" value="#100k" /> </t:submit> If we had (100,000 / 0) = ∞, then this submit reason would trigger. Surely that was not intended, since we have `property` as a predicate and `propRate` with the same predicate, implying that the answer we _actually_ want is 0! In that case, what we _probably_ want to apply is something like <rate yields="premFinal"> <t:maxreduce> <c:value-of name="premBuildingTotal" /> <c:value-of name="#500" /> </t:maxreduce> </rate>, in order to apply a minimum premium of $500. But if `premBuildingTotal` is Infinity, then you won't get that; you'll get Infinity, which is of course nonsense. And never mind -Infinity. Why Wasn't This a Problem Before? --------------------------------- So why bring this up now? Why have we survived a decade without this? We haven't, really; these bugs have been hidden. The old classification system covered them up: predicates would implicitly treat missing values as 0 by enclosing them in `(x||0)` in the compiled code. Observe this ECMAScript code: NaN || 0 = 0. Consequently, the old classification system absorbed bad values and treated them implicitly as 0.
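The following JavaScript shows why `(x||0)` hid these values, and why it was also a bug for missing indexes. `old` and `matchesLt5` are hypothetical helpers that model the compiled pattern. They are not TAME's generated code.

```javascript
// The old classification system compiled references as (x || 0).
// `||` yields its right operand when the left is falsy; NaN, undefined,
// "", and 0 are all falsy, while Infinity is truthy.
const old = (x) => x || 0;

console.log(old(NaN)); // 0: NaN silently absorbed
console.log(old(undefined)); // 0: a missing index also becomes 0
console.log(old(Infinity)); // Infinity: passes straight through

// Why that defaulting was a bug: a "less than 5" match against a vector
// with a missing index would match at that index.
const values = [10, 20]; // index 2 is missing
const matchesLt5 = (vec, k) => old(vec[k]) < 5;
console.log(matchesLt5(values, 2)); // true, though no value exists
```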
But that was a bug, and had to be removed; it meant that missing indexes in classifications would trigger predicates that were not intended to be triggered if they matched against 0, or matched against a value less than some number larger than zero. (See `core/test/core/class` for examples.) The new classification system does not perform such defaulting. _But it also does not expect to receive values outside of its valid domain._ Consequently, _NaN and Infinity lead to undefined behavior_, and the current implementation causes the predicate to match (NaN < 0) and therefore fail. The reason for this is that this implementation is intended to convey precisely the computation necessary for the classification system, as formally defined, so that it can be later optimized even further. Checking for values outside the domain not only should not be necessary, but it would prevent such future optimizations.

Furthermore, parameters used to compile into `(param||0)`, to account for missing values or empty strings. This changed somewhat recently with `5a816a4701`, which pre-cast all inputs and allowed relaxing many of those casts, since they were both wasteful and no longer necessary. Given that, for all practical purposes, 0/0=0 in the system until less than a year ago. Infinity, of course, is a different story, since `(Infinity||0) === Infinity`; this one has always been a problem.

Let's Just Fail
---------------

Okay, so we cannot have a valid expression, so let's just fail. We could mean that in two different ways:

  1. Fail at runtime if we divide by 0; or
  2. Fail at compile-time if we _could_ divide by 0.

Both of these have their own challenges. Let's dismiss #2 right off the bat for now, because until we have TAMER, that's not really feasible. We need something today. We will discuss that in the future.

For #1—we cannot just throw an error and halt computation, because if the `canterm` flag passed into the system is `false`, then _computation must proceed and return all results_.
Terminating classifications are checked after returning rather than throwing errors. Since we have to proceed with computation, the computations have to be valid, and so we're left with the same problem again—we cannot have undefined behavior.

One could argue that, okay, we have undefined behavior, but we're going to fail because of the assertion anyway! That's potentially defensible, but it is at the moment undesirable, because we get so many failures. And, relative to the section below, it's not clear to me what benefit we get from that behavior other than making things more difficult for ourselves. Furthermore, such an assertion would have to be defined for every calculation that performs a quotient, and would have to set some intermediate flag in the calculation which would then have to be checked for after the fact. This muddies the generated calculation, which causes problems for optimizations, because it requires peering into state of the calculation that may be hidden or optimized away.

If we decide that calculations must be valid because we cannot fail, and we have to stick with the domain of calculations, then `x/0` must be _something_ within that domain.

x/0=0 Makes Sense With the Current System
-----------------------------------------

Let's take a step back. Consider a developer who is unaware that NaN/Infinity are permitted in the system—they just know that division by zero is a bad thing to do because that's what they learned, and they want to avoid it in their code. Consider that they started with this:

    <rate-each class="property" generates="propValue" index="k">
      <c:quotient>
        <c:value-of name="buildingTiv" index="k" />
        <c:value-of name="tivPropDivisor" index="k" />
      </c:quotient>
    </rate-each>

They have inspected the output of `tivPropDivisor` and see that it is sometimes 0.
They understand that `property` is a predicate for the calculation, and so reasonably think that they could do something like this:

    <classify as="nonzero-tiv-prop-divisor" ...>
      <t:match-ne on="tivPropDivisor" value="#0" />
    </classify>

and then change the `rate-each` to `<rate-each class="property nonzero-tiv-prop-divisor" ...>`. Except that, of course, we know that will have no effect, because a NaN is a NaN. This is not intuitive. So they'd have to do this:

    <rate-each class="property" generates="propValue" index="k">
      <c:cases>
        <c:case>
          <t:when-ne name="tivPropDivisor" value="#0" />
          <c:quotient>
            <c:value-of name="buildingTiv" index="k" />
            <c:value-of name="tivPropDivisor" index="k" />
          </c:quotient>
        </c:case>
        <c:otherwise>
          <c:value-of name="#0" />
        </c:otherwise>
      </c:cases>
    </rate-each>

But for what purpose? What have we gained over simply having x/0=0, which does this for you?

The reason why this is so unintuitive is that 0 is the default case in every other part of the system. If something doesn't match a predicate, the value becomes 0. If a value at an index is not defined, it is implicitly zero. A non-matching predicate is 0. This is exploited for reducing values using summation. So the behavior of the system with regard to 0 is always on the mind of the developer. If we add it in another spot, they would think nothing of it.

It would be nice if it acted as an identity in a monoidal operation, e.g. as 0 for sums but as 1 for products, but that's not how the system works at all today. And indeed such a thing could be introduced using a special template in place of `c:value-of` that copies the predicates of the referenced value and does the right thing.

The _danger_, of course, is that this is _not_ how the system has worked, and so changing the behavior has the risk of breaking something that has relied on undefined behavior for so long.
This is indeed a risk, but I have taken some confidence from the facts that (a) all the test cases for our system pass despite a significant number of x/0=0 cases being triggered due to limited inputs, and (b) these situations are _not correct today_, resulting in `null` in serialized result data because `JSON.stringify([NaN, Infinity]) === "[null,null]"`. Given all of that, predictable incorrect behavior is better than undefined behavior.

So x/0=0 Isn't Bad?
-------------------

No, and it's mathematically sound. This decision isn't unprecedented: Coq, Lean, Agda, and other theorem provers define x/0=0. APL originally defined x/0=1, but later switched to 0. Other languages do their own thing depending on what is right for their particular situation.

Division is normally derived from a × a⁻¹ = 1, a ≠ 0. We're simply not using that definition—when we say "quotient", or use the `/` symbol, we mean a _different_ function (`div`, in the compiled JS), where we have an _additional_ axiom that a / 0 = 0. And, similarly, 0⁻¹ = 0. So we've taken a _normally undefined_ case and given it a definition. No inconsistency arises.

In fact, this makes _sense_ to do, because _this is what we want_. The alternative, as mentioned above, is a lot of boilerplate—checking for 0 any time we want to do division. Complicating the compiler to check for those cases. And so on. It's easier to simply state that, in TAME, quotients have this extra convenient feature whereby you don't have to worry about your denominator being zero because it'll act as though you enclosed it in a case statement, and because of that, all your code continues to operate in an intuitive way.

I really recommend reading this blog post regarding the Lean theorem prover: https://xenaproject.wordpress.com/2020/07/05/division-by-zero-in-type-theory-a-faq/	2022-02-28 16:27:51 -05:00
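The quotient semantics argued for in the x/0=0 commit above can be sketched in a few lines. The commit names the compiled function `div`, but its actual body is not shown here. The implementation below is therefore an assumption. The same applies to `inv`, `guardedQuotient`, and the sample vectors, which are hypothetical names added for illustration. The sketch checks that `div` gives the same result as the `c:cases` guard written out by hand, while raw IEEE 754 division serializes to `null`.

```javascript
// Assumed sketch of quotient with the extra axiom a / 0 = 0.
// The real compiled `div` may differ; this body is illustrative.
function div(a, b) {
  return b === 0 ? 0 : a / b; // -0 === 0, so -0 is covered too
}

// Hypothetical reciprocal from the same definition: 0⁻¹ = div(1, 0) = 0.
function inv(a) {
  return div(1, a);
}

// Hypothetical per-index version of the <c:cases> workaround: guard, then divide.
function guardedQuotient(num, den) {
  return num.map(function (n, k) {
    return den[k] !== 0 ? n / den[k] : 0;
  });
}

var buildingTiv = [500000, 0, 300000];
var tivPropDivisor = [2, 0, 0];

console.log(div(100000, 0), div(0, 0), inv(0), div(10, 4)); // 0 0 0 2.5

// div agrees with the hand-written guard at every index.
console.log(JSON.stringify(guardedQuotient(buildingTiv, tivPropDivisor))); // [250000,0,0]
console.log(JSON.stringify(buildingTiv.map(function (n, k) {
  return div(n, tivPropDivisor[k]);
}))); // [250000,0,0]

// Raw division yields NaN and Infinity, which JSON serializes as null.
console.log(JSON.stringify(buildingTiv.map(function (n, k) {
  return n / tivPropDivisor[k];
}))); // [250000,null,null]
```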
Mike Gerwitz	ce0da76ccf	Improve symbol table processing time preproc:symtable-process-symbols is run on each pass (e.g. during initial processing and after each template expansion) to introduce new symbols into the symbol table from imports and newly discovered symbols. This processing was previously optimized a bit using maps to reduce the cost of symbol table lookups, but the processing was still inefficient, relying on XSLT1-style processing (as originally written) for deduplication. This now uses `for-each-group` and `perform-sort` to offload the expensive computation onto Saxon, which is much more efficient. Symbol table processing has long been a culprit, but I hadn't attempted to optimize further in recent months because of TAMER work. Since TAMER has been on pause for a few months with other things needing my attention, I needed to provide a short-term performance improvement to keep up with increasing build times. DEV-11716	2022-02-22 22:05:07 -05:00
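The commit above moves symbol deduplication onto `for-each-group` and `perform-sort`. The sketch below is only a rough analogy, not TAME's XSLT. It groups symbols by name in one pass and then sorts the survivors. Several details are assumptions: the `{ name, src }` record shape, the first-seen-wins rule, the sort by name, and the helper name `dedupeSymbols`.

```javascript
// Rough analogy only (not TAME's XSLT): deduplicate symbols by name in a
// single grouping pass, then sort. Record shape and policy are assumed.
function dedupeSymbols(symbols) {
  var groups = Object.create(null); // name -> first symbol seen
  var names = [];

  symbols.forEach(function (sym) {
    if (!(sym.name in groups)) {
      groups[sym.name] = sym;
      names.push(sym.name);
    }
  });

  names.sort(); // assumed ordering: lexicographic by name

  return names.map(function (name) {
    return groups[name];
  });
}

var syms = [
  { name: "premTotal", src: "rater/a" },
  { name: "locInitial", src: "rater/b" },
  { name: "premTotal", src: "rater/c" } // duplicate from another import
];

console.log(dedupeSymbols(syms).map(function (s) {
  return s.name + "@" + s.src;
}).join(", ")); // locInitial@rater/b, premTotal@rater/a
```

Grouping by key keeps the work roughly linear in the number of symbols, plus the sort. Pairwise checks against every previously seen symbol, by contrast, grow quadratically.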
Mike Gerwitz	2e50af1220	Copyright year update 2021	2021-07-22 15:00:15 -04:00
Mike Gerwitz	1f24cfdf25	Remove :map: sym-dep generation This was incorrect to begin with---it does not make sense that an input mapping should depend upon the identifier that it maps to, in the sense that we make use of these dependencies. If we add weak symbol references in the future, then this can be reintroduced. By removing this, we free tameld from having to perform the check itself. .rev-xmlo bumped to force rebuilding of object files since the linker now expects that no such dependencies will exist within them.	2021-07-22 14:27:15 -04:00
Mike Gerwitz	53360548da	tame: Ignore duplicate conjunctive predicates in value list optimization error This can occur in generated code (e.g. from proguic if a question-based predicate inherits a predicate already specified). This commit does not change anything that's emitted; it merely allows proceeding. TAMER can be smarter about this; I don't want to invest more time into generalizing deduplication of predicates.	2021-07-19 14:53:25 -04:00
Mike Gerwitz	2ad0d1425a	compiler: Correct handling of TRUE matches There was a bug whereby TRUE matches would keep whatever value was being matched on, even if it was not a boolean. That was an oversight from the proof-of-concept code, and this fixes it; that's why this is behind a flag! This also adjusts the class aliasing optimization so that it doesn't check for a `TRUE` symbol name, which was a bad idea to begin with. This change also ends up expanding `lv:match[@value="TRUE"]` into the long form, where it didn't previously; this will result in slightly larger xmlo files in some cases, but it's nothing significant, and it does not impact compilation times.	2021-07-15 14:55:32 -04:00
Mike Gerwitz	37977a8816	entry-form.xsl: Correctly generate HTML for params with imported types This is a nearly-10-year-old bug that was introduced when the Summary Page was modified to use the then-new symbol table. The compiler previously concatenated all packages into a single XML tree and processed that, so no package resolution was necessary here before.	2021-07-14 09:59:45 -04:00
Mike Gerwitz	513b8d7b86	worksheet.xsl: Allow package name to auto-generate A long time ago (about a decade), package names were required, but they are now generated by the compiler relative to the root path. The name here was incorrect, which was generating an incorrect path for the linked symbols, which was causing problems with the Summary Page.	2021-07-14 09:51:08 -04:00
Mike Gerwitz	f5ba4b013b	summary: Make Summary Page compiler less chatty It produces a lot of output that either results in spam (internal errors) or pollutes the log with unnecessary information.	2021-07-01 13:54:34 -04:00
Mike Gerwitz	d0e3a5622c	Remove class-level notice for new system This was not intentionally committed.	2021-06-24 09:59:00 -04:00
Mike Gerwitz	e9598b7cb5	Correct short runtime var declarations They were not actually defined before being aliased.	2021-06-23 11:44:36 -04:00
Mike Gerwitz	6f2b4090cd	Correct behavior of matrix matching with separate index sets in new system This behavior was largely correct, but was not commutative if the size of the matrices (rows or columns) was smaller than a following match.	2021-06-23 11:44:36 -04:00
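The commit above does not describe how the fix works. The sketch below shows one way to make element-wise matrix matching commutative when operand sizes differ, and it is purely hypothetical. The result is sized by the larger operand, and any absent row or cell is treated as 0, following the system's implicit-zero convention. `matchAnd` and `cell` are illustrative names, not TAME functions.

```javascript
// Hypothetical sketch: AND two 0/1 matrices element-wise, sizing the
// result by the larger operand and treating absent cells as 0.
function cell(m, i, j) {
  return (m[i] && m[i][j]) ? 1 : 0;
}

function matchAnd(a, b) {
  var rows = Math.max(a.length, b.length);
  var result = [];

  for (var i = 0; i < rows; i++) {
    var cols = Math.max((a[i] || []).length, (b[i] || []).length);
    var row = [];

    for (var j = 0; j < cols; j++) {
      row.push(cell(a, i, j) & cell(b, i, j));
    }

    result.push(row);
  }

  return result;
}

var small = [[1, 1]];
var large = [[1, 1, 1], [1, 0, 1]];

// Operand order no longer affects the result.
console.log(JSON.stringify(matchAnd(small, large))); // [[1,1,0],[0,0,0]]
console.log(JSON.stringify(matchAnd(large, small))); // [[1,1,0],[0,0,0]]
```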
Mike Gerwitz	e90ebd226c	Remove arrow functions from classifier runtime We need to support as far back as IE11, unfortunately, which is ES5.	2021-06-23 11:44:36 -04:00
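IE11 supports only ES5, which has no arrow functions. The sketch below is a generic illustration of the kind of rewrite the commit above implies, not the classifier runtime's actual code. Arrow functions also bind `this` lexically, so a rewrite may need to capture `this` explicitly.

```javascript
// ES2015 arrow function (a syntax error in IE11):
//   var doubled = values.map(x => x * 2);

// ES5 equivalent using a function expression:
var values = [1, 2, 3];
var doubled = values.map(function (x) {
  return x * 2;
});

// Arrow functions capture `this` lexically; in ES5 that is commonly
// emulated by saving it in a variable (or with Function.prototype.bind).
function Counter() {
  var self = this;
  self.count = 0;

  self.incrementAll = function (xs) {
    xs.forEach(function () {
      self.count++; // `this` here would not be the Counter
    });
    return self.count;
  };
}

console.log(doubled.join(",")); // 2,4,6
console.log(new Counter().incrementAll([1, 2, 3])); // 3
```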


369 Commits (abc37ef0ccd37fbf4970c81d9c8689b464a94257)