employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Brandon Ellis	00f46b0032	[DEV-12990] Add gt, gte, lt, lte operators to if/unless This includes updating Tamer's parser to account for the new operator possibilities.	2022-09-22 11:38:06 -04:00
Mike Gerwitz	5edefde201	Makefile (bin): Target to build only binaries Systems utilizing TAME as a build dependency are not interested in everything else that gets built (tests and docs, primarily). DEV-7145	2022-09-07 09:53:44 -04:00
Corey Vollmer	2901f06318	[DEV-9619] Return sha256 This fixes the implementation of sha256 to be compatible with our system.	2022-07-27 12:55:17 -04:00
Corey Vollmer	f667a1a58e	[DEV-9619] Update sha256 script to handle UTF8 This commit replaces the sha256 script with a newer implementation which supports all UTF8 characters. https://github.com/emn178/js-sha256/blob/master/src/sha256.js Note that this commit breaks the system; the following commit fixes this.	2022-07-22 08:46:35 -04:00
Mike Gerwitz	95229916ca	current/compiler/worksheet: Generate lv:package/@name This is present on all other packages. Rather than complicating TAMER to accommodate a missing name, it's trivial to just add it. This will, unfortunately, invalidate and require rebuilding of all xmlo files, based on the `.rev-xmlo` bump. DEV-11864	2022-05-26 10:20:05 -04:00
Mike Gerwitz	0d999b56cd	src/current/summary.xsl: Correct invalid UTF-8 sequence This broke when encoding was set to UTF-8 on this file.	2022-05-04 11:11:02 -04:00
Mike Gerwitz	2954c591a1	src/current/include/preproc/symtable: Remove extern @dtype check I attempted to resolve an error previously, and I thought I had, but apparently some symbols acquire a @dtype at some point in the process, or lose it. Regardless, I have no interest in debugging or resolving this mess, since it's going away. The linker ensures that externs match, so while this could potentially allow conflicting imports within a package (unlikely, given that extern templates are recommended), it still will not resolve with a conflicting concrete implementation. I'm not worried. DEV-1036	2022-05-04 10:50:14 -04:00
Mike Gerwitz	43c99cb61a	src/current/include/preproc/symtable.xsl: Treat mutual missing extern @dtype as match Extern resolution has apparently been failing for quite some time, resulting in `preproc:error` nodes in the _symbol table_ of return maps. This was caught by the new xmlo parser, which does not ignore nodes it does not care about. The failure was caused by missing `@dtype`---the externs did in fact match, and if they did not, then the linker would have failed. This doesn't modify the map compiler to properly detect these, because this compiler is going away in the hopefully-near future, and the problems will now be caught, though in a very unideal way (as a parse error during xmlo reading). DEV-10936	2022-05-04 09:29:29 -04:00
Mike Gerwitz	602cec5560	src/current/compiler/map.xsl: Omit preproc:from from retmap symbols preproc:sym/preproc:from is used for generating `knownFields` using the _input_ map, so this has no use for return map values; the map still produces edges to its dependencies. The issue is that there are return map entries in some of our systems that are producing multiple `preproc:from`, but I somewhat-recently modified the system to support only a single map, to remove dynamic allocation. This resolves that problem. With that said, `knownFields` was created for Liza to know when the classifier ought to be invoked, to save time. Back when it was first introduced ~10y ago, this provided significant savings, however the structure of our system now is such that nearly every single field invokes the classifier. Furthermore, these details should remain encapsulated; if we wanted to make that determination, we should be provided with a delta, which we could also use to do incremental classification in the future, if there's an ROI there after other improvements have been made. So, eventually, preproc:sym/preproc:from will go away entirely. DEV-10936	2022-05-04 09:26:18 -04:00
Mike Gerwitz	1ad2fb1dc8	Copyright year update 2022 RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no "Group"), but I'm not sure if the legal name has been changed yet or not, so I'll wait on that.	2022-05-03 14:14:29 -04:00
Mike Gerwitz	c4828e7e7a	tame: src/current/compiler/worksheet: Place fragments in header The new xmlo parser was failing on a worksheet xmlo file because fragments were not properly placed within the header. This was a change made when tameld was introduced so that we could stop reading xmlo files early. DEV-10936	2022-05-03 09:11:13 -04:00
Mike Gerwitz	2ea66f4f97	tame: @encoding="{ISO-8859-1=>utf-8}" for all XML-based files TAMER rejects this, because we shouldn't be using anything but UTF-8. My use of this encoding is ancient, from over a decade ago, that was apparently just copied around. DEV-10936	2022-05-02 12:00:42 -04:00
Mike Gerwitz	70d1ad17b8	map: Force param/@default in translation to be numeric The default ought to be numeric, always, but until we have the compiler checking for that, I'm going to leave the casting in place. DEV-10484	2022-03-07 12:22:18 -05:00
Mike Gerwitz	054ad9b4c4	map: Properly apply param/@default for translation fallback This was broken by the previous fix, because I had cast to a numeric value before invoking `set_defaults`, which needs the empty string retained so that it knows whether a default ought to be applied. This also ensures that `set_values` will always return a numeric value when that default is applied. DEV-10484	2022-03-07 11:47:58 -05:00
Mike Gerwitz	501a9441a5	map: Produce 0 instead of NaN for non-numeric string values This has been a problem for...ever, but the old classification system (and calculations) had `||0` for every variable reference, whereas the new one does not; NaNs result in undefined behavior in the new classification system, since those values are not expected to exist. This ought to have automated tests, but it will be rewritten in TAMER. DEV-10484	2022-03-03 13:22:24 -05:00
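The coercion described in the commit above can be illustrated with a minimal sketch. The helper name `toNumberOrZero` is hypothetical and is not taken from the map compiler; it only shows the general idea of yielding 0 where a plain numeric cast would yield NaN.

```javascript
// Hypothetical helper (an assumption for illustration, not TAME's actual
// map compiler output): cast a value to a number, yielding 0 instead of NaN.
function toNumberOrZero(value) {
    var n = +value;             // unary plus: "abc" becomes NaN, "12.5" becomes 12.5
    return (n === n) ? n : 0;   // NaN is the only value not equal to itself
}

var numeric = toNumberOrZero("12.5");
var nonNumeric = toNumberOrZero("abc");

// For comparison, the old `||0` idiom mentioned in the commit message:
var legacy = (+"abc") || 0;
```

Both approaches map a non-numeric string to 0; the difference described in the commit is only where the defaulting happens (once on input, rather than at every variable reference).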
Mike Gerwitz	297b88c3c1	x/0=0 with global flag for new classification system This was originally my plan with the new classification system, but it was undone because I had hoped to punt on the somewhat controversial issue. Unfortunately, I see no other way. Here I attempt to summarize the reasons why, many of which are specific to the design decisions of TAME. Keep in mind that TAME is a domain-specific language (DSL) for writing insurance rating systems. It should act intuitively for our use case, while still being mathematically sound. If you still aren't convinced, please see the link at the bottom. Target Language Semantics (ECMAScript) -------------------------------------- First: let's establish what happens today. TAME compiles into ECMAScript, which uses IEEE 754-2008 floating-point arithmetic. Here we have: x/0 = Infinity, x > 0; x/0 = -Infinity, x < 0; x/0 = NaN, x = 0. This is immediately problematic: TAME's calculations must produce concrete real numbers, always. NaN is not valid in its domain, and Infinity is of no practical use in our computational model (TAME is built for insurance rating systems, and one will never have infinite premium). Put plainly: the behavior is undefined in TAME when any of these values are yielded by an expression. Furthermore, we have _three different possible situations_ depending on whether the numerator is positive, negative, or zero. This makes it more difficult to reason about the behavior of the system, for values we do not want in the first place. We then have these issues in ECMAScript: Infinity * 0 = NaN. -Infinity * 0 = NaN. NaN * 0 = NaN. These are of particular concern because of how predicates work in TAME, which will be discussed further below. But it is also problematic because of how it propagates: once you have NaN, you'll always have NaN, unless you break out of the situation with some control structure that avoids using it in an expression at all. Let's now consider predicates: NaN > 0 = false. 
NaN < 0 = false. NaN === 0 = false. NaN === NaN = false. These will be discussed in terms of classification predicates (matches). We also have issues of serialization: JSON.stringify(Infinity) = "null". JSON.stringify(NaN) = "null". This means that these values are difficult to transfer between systems, even if we wanted them. TAME's Predicates ----------------- TAME has a classification system based on first-order logic, where ⊥ is represented by 0 and ⊤ is represented by 1. These classifications are used as predicates to calculations via the @class attribute of a rate block. For example: <rate-each class="property" generates="propValue" index="k"> <c:quotient> <c:value-of name="buildingTiv" index="k" /> <c:value-of name="tivPropDivisor" index="k" /> </c:quotient> </rate-each> As can be observed via the Summary Page, this calculation compiles into the following mathematical expression: ∑ₖ(pₖ(tₖ/dₖ)); that is, the quotient is then multiplied by the value of the `property` classification, which is 0 or 1 for that index. Let's say that tivPropDivisor were defined in this way: <rate-each class="property" generates="tivPropDivisor" index="k"> <!--- ... logic here ... --> </rate-each> It does not matter what the logic here is. Observe that the predicate here is `property` as well, which means that, if this risk is not a property risk, then `tivPropDivisor` will be `0`. Looking back at `propValue`, let's say that we do have a property risk, and that `buildingTiv` is `[100_000, 200_000]` and `tivPropDivisor` is 1000. We then have: 1(100,000 / 1000) + 1(200,000 / 1000) = 300. Consider instead what happens if `property` is 0. Since we have no property locations, we have `[0, 0]` as `buildingTiv` and `tivPropDivisor` is 0. 0(0/0) + 0(0/0) = 0(NaN + NaN) = NaN. This is clearly not what was intended. The predicate is expected to be _strongly_ zero, as if using an Iverson bracket: ((0/0)[0] + (0/0)[0]) = 0. 
Of course, one option is to redefine TAME such that we use Iverson's convention in place of summation; however, this is neither necessary nor desirable given that (a) NaN is not valid within the domain of any TAME expression, and (b) summation is elegantly generalized and efficiently computed using vector arithmetic and SIMD functions. That is: there's no use in messing with TAME's computational model for a value that should be impossible to represent. Short-Circuiting Computation ---------------------------- There's another way to look at it, though: that we intended to skip the computation entirely, and so it doesn't matter what the quotient is. If the compiler were smart enough (and maybe one day it will be), it would know that the predicates of `tivPropDivisor` and `propValue` are the same and so there is no circumstance under which we would compute `propValue` and have `tivPropDivisor` be 0. The problem is that short-circuiting is employed as an _optimization_, and is an implementation detail. Mathematically, the expression is unchanged, and is still invalid within TAME's domain. It is unrepresentable, and so this is not an out. But let's pretend that it was defined that way, which would yield this: propValue = { ∑ₖ(pₖ(tₖ/dₖ)), if ∃x∈p(x = 1); 0, otherwise }. This is the optimization that is employed, but it's still not mathematically correct! What happens if p₀ = 1, but p₁ = 0? Then we have: 1(100,000/1000) + 0(0/0) = 100 + NaN = NaN, but the _intent_ was clearly to have 100 + 0 = 100, and so we return to the original problem once again. Classification Predicates and Intent ------------------------------------ Classifications are used as predicates for equations, but classifications _themselves_ have predicates in the form of _matches_. 
Consider, for example, a classification that may be used in an assertion to prevent negative premium from being generated: <t:assert failure="premBuilding must not be negative for any index"> <t:match-gte value="premBuilding" value="#0" /> </t:assert> Simple enough—the system will fail if the premium for a given building is below $0. But what happens if premBuilding is calculated as so? <rate-each class="property" yields="premBuildingTotal" generates="premBuilding" index="k"> <c:product> <c:value-of name="propValue" index="k" /> <c:value-of name="propRate" index="k" /> </c:product> </rate-each> Alas, if `property` is false for any index, then we know that `propValue` is NaN, and NaN * x = NaN, and so `premBuilding` is NaN. The above assertion will compile the match into the first-order sentence ∀x∈b(x > 0). Unfortunately, NaN is not greater than, less than, equal to, or any other sort of thing to 0, and so _this assertion will trigger_. This causes practical problems with the `_premium_` template, which has an `@allow-zero@` argument to permit zero premium. Consider this real-world case that I found (variables renamed), to avoid a strawman: <t:premium class="loc" round="cent" yields="locInitialTotal" generates="locInitial" index="k" allow-zero="true" desc="..."> <c:value-of name="premAdditional" /> <c:quotient> <c:value-of name="premLoc" index="k" /> <c:value-of name="premTotal" /> </c:quotient> </t:premium> This appears to be responsible for splitting up `premAdditional` relative to the total premium contribution of each location. It explicitly states that it wants to permit a zero value. The intent of this block is clear: a value of 0 is explicitly permitted and _expected_. But if `premTotal` is for whatever reason 0—whether it be due to a test case or some unexpected input—then it'll yield a NaN and make the entire expression NaN. Or if `premAdditional` or `premLoc` are tainted by a NaN, the same result will occur. The assertion will trigger. 
And, indeed, this is what I'm seeing with test cases against the new classification system. What about Infinity? Is it intuitive that, should `propValue` in the previous example be positive and `propRate` be 0, that we would, rather than producing a very small value, produce an infinitely large one? Does that match intuition? Remember, this system is a domain-specific language for _our_ purposes; it is not intended to be used to model infinities. For example, say we had this submission because the premium exceeds our authority to write with some carrier: <t:submit reason="Premium exceeds authority"> <t:match-gt name="premBuilding" value="#100k" /> </t:submit> If we had (100,000 / 0) = ∞, then this submit reason would trigger. Surely that was not intended, since we have `property` as a predicate and `propRate` with the same predicate, implying that the answer we _actually_ want is 0! In that case, what we _probably_ want to trigger is something like <rate yields="premFinal"> <t:maxreduce> <c:value-of name="premBuildingTotal" /> <c:value-of name="#500" /> </t:maxreduce> </rate>, in order to apply a minimum premium of $500. But if `premBuildingTotal` is Infinity, then you won't get that; you'll get Infinity, which is of course nonsense. And never mind -Infinity. Why Wasn't This a Problem Before? --------------------------------- So why bring this up now? Why have we survived a decade without this? We haven't, really; these bugs have been hidden. But the old classification system covered them up; predicates would implicitly treat missing values as 0 by enclosing them in `(x||0)` in the compiled code. Observe this ECMAScript code: NaN || 0 = 0. Consequently, the old classification system absorbed bad values and treated them implicitly as 0. 
But that was a bug, and had to be removed; it meant that missing indexes in classifications would trigger predicates that were not intended to be triggered, if they matched against 0, or matched against a value less than some number larger than zero. (See `core/test/core/class` for examples.) The new classification system does not perform such defaulting. _But it also does not expect to receive values outside of its valid domain._ Consequently, _NaN and Infinity lead to undefined behavior_, and the current implementation causes the predicate to match (NaN < 0) and therefore fail. The reason for this is that this implementation is intended to convey precisely the computation necessary for the classification system, as formally defined, so that it can be later optimized even further. Checking for values outside the domain not only should not be necessary, but it would prevent such future optimizations. Furthermore, parameters used to compile into (param||0), to account for missing values or empty strings. This changed somewhat recently with `5a816a4701`, which pre-cast all inputs and allowed relaxing many of those casts since they were both wasteful and no longer necessary. Given that, for all practical purposes, 0/0=0 held in the system less than a year ago. Infinity, of course, is a different story, since (Infinity||0)=Infinity; this one has always been a problem. Let's Just Fail --------------- Okay, so we cannot have a valid expression, so let's just fail. We could mean that in two different ways: 1. Fail at runtime if we divide by 0; or 2. Fail at compile-time if we _could_ divide by 0. Both of these have their own challenges. Let's dismiss #2 right off the bat for now, because until we have TAMER, that's not really feasible. We need something today. We will discuss that in the future. For #1, we cannot just throw an error and halt computation, because if the `canterm` flag passed into the system is `false`, then _computation must proceed and return all results_. 
Terminating classifications are checked after returning rather than throwing errors. Since we have to proceed with computation, the computations have to be valid, and so we're left with the same problem again: we cannot have undefined behavior. One could argue that, okay, we have undefined behavior, but we're going to fail because of the assertion anyway! That's potentially defensible, but it is at the moment undesirable, because we get so many failures. And, relative to the section below, it's not clear to me what benefit we get from that behavior other than making things more difficult for ourselves. Furthermore, such an assertion would have to be defined for every calculation that performs a quotient, and would have to set some intermediate flag in the calculation which would then have to be checked for after the fact. This muddies the generated calculation, which causes problems for optimizations, because it requires peering into the state of the calculation that may be hidden or optimized away. If we decide that calculations must be valid because we cannot fail, and we have to stick with the domain of calculations, then `x/0` must be _something_ within that domain. x/0=0 Makes Sense With the Current System ----------------------------------------- Let's take a step back. Consider a developer who is unaware that NaN/Infinity are permitted in the system; they just know that division by zero is a bad thing to do because that's what they learned, and they want to avoid it in their code. Consider that they started with this: <rate-each class="property" generates="propValue" index="k"> <c:quotient> <c:value-of name="buildingTiv" index="k" /> <c:value-of name="tivPropDivisor" index="k" /> </c:quotient> </rate-each> They have inspected the output of `tivPropDivisor` and see that it is sometimes 0. 
They understand that `property` is a predicate for the calculation, and so reasonably think that they could do something like this: <classify as="nonzero-tiv-prop-divisor" ...> <t:match-ne on="tivPropDivisor" value="#0" /> </classify> and then change the rate-each to <rate-each class="property nonzero-tiv-prop-divisor" ...>. Except that, of course, we know that will have no effect, because a NaN is a NaN. This is not intuitive. So they'd have to do this: <rate-each class="property" generates="propValue" index="k"> <c:cases> <c:case> <t:when-ne name="tivPropDivisor" value="#0" /> <c:quotient> <c:value-of name="buildingTiv" index="k" /> <c:value-of name="tivPropDivisor" index="k" /> </c:quotient> </c:case> <c:otherwise> <c:value-of name="#0" /> </c:otherwise> </c:cases> </rate-each>. But for what purpose? What have we gained over simply having x/0=0, which does this for you? The reason this is so unintuitive is that 0 is the default case in every other part of the system. If something doesn't match a predicate, the value becomes 0. If a value at an index is not defined, it is implicitly zero. A non-matching predicate is 0. This is exploited for reducing values using summation. So the behavior of the system with regard to 0 is always on the mind of the developer. If we add it in another spot, they would think nothing of it. It would be nice if it acted as an identity in a monoidal operation, e.g. as 0 for sums but as 1 for products, but that's not how the system works at all today. And indeed such a thing could be introduced using a special template in place of `c:value-of` that copies the predicates of the referenced value and does the right thing. The _danger_, of course, is that this is _not_ how the system has worked, and so changing the behavior has the risk of breaking something that has relied on undefined behavior for so long. 
This is indeed a risk, but I have taken some confidence in the facts that (a) all the test cases for our system pass despite a significant number of x/0=0 being triggered due to limited inputs, and (b) these situations are _not correct today_, resulting in `null` in serialized result data because `JSON.stringify([NaN, Infinity]) === "[null,null]"`. Given all of that, predictable incorrect behavior is better than undefined behavior. So x/0=0 Isn't Bad? ------------------- No, and it's mathematically sound. This decision isn't unprecedented: Coq, Lean, Agda, and other theorem provers define x/0=0. APL originally defined x/0=1, but later switched to 0. Other languages do their own thing depending on what is right for their particular situation. Division is normally derived from a × a⁻¹ = 1, a ≠ 0. We're simply not using that definition; when we say "quotient", or use the `/` symbol, we mean a _different_ function (`div`, in the compiled JS), where we have an _additional_ axiom that a / 0 = 0. And, similarly, 0⁻¹ = 0. So we've taken a _normally undefined_ case and given it a definition. No inconsistency arises. In fact, this makes _sense_ to do, because _this is what we want_. The alternative, as mentioned above, is a lot of boilerplate: checking for 0 any time we want to do division, complicating the compiler to check for those cases, and so on. It's easier to simply state that, in TAME, quotients have this extra convenient feature whereby you don't have to worry about your denominator being zero because it'll act as though you enclosed it in a case statement, and because of that, all your code continues to operate in an intuitive way. I really recommend reading this blog post regarding the Lean theorem prover: https://xenaproject.wordpress.com/2020/07/05/division-by-zero-in-type-theory-a-faq/	2022-02-28 16:27:51 -05:00
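The argument of the commit above can be checked with a short sketch. The commit message names a `div` function in the compiled JS but does not show it, so the `div` below is an assumed implementation of the stated axiom a / 0 = 0; `ieeeDiv` and `predicatedSum` are hypothetical names introduced only for this illustration. It evaluates ∑ₖ(pₖ(tₖ/dₖ)) for the p₀ = 1, p₁ = 0 case discussed in the message.

```javascript
// Assumed sketch of a quotient with the additional axiom x/0 = 0.
// This is not taken from TAME's runtime; only the name `div` and the
// axiom itself come from the commit message.
function div(x, y) {
    return (y === 0) ? 0 : x / y;
}

// Plain IEEE 754 division, as ECMAScript performs it (hypothetical name).
function ieeeDiv(x, y) {
    return x / y;
}

// Hypothetical helper: sum over k of p[k] * quotient(t[k], d[k]),
// mirroring the predicate-weighted sum described in the message.
function predicatedSum(p, t, d, quotient) {
    var sum = 0;
    for (var k = 0; k < p.length; k++) {
        sum += p[k] * quotient(t[k], d[k]);
    }
    return sum;
}

var p = [1, 0];          // `property` matches only the first index
var t = [100000, 0];     // buildingTiv
var d = [1000, 0];       // tivPropDivisor

// 1 * (100000/1000) + 0 * (0/0): the NaN from 0/0 taints the whole sum.
var withIeee = predicatedSum(p, t, d, ieeeDiv);

// 1 * (100000/1000) + 0 * 0: the non-matching index contributes 0.
var withDiv = predicatedSum(p, t, d, div);
```

Under these assumptions, `withIeee` is NaN while `withDiv` is 100, which is the intended result the message describes. The sketch uses `function` and `var` rather than arrow functions, in keeping with the ES5 constraint noted elsewhere in this log.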
Mike Gerwitz	ce0da76ccf	Improve symbol table processing time preproc:symtable-process-symbols is run on each pass (e.g. during initial processing and after each template expansion) to introduce new symbols into the symbol table from imports and newly discovered symbols. This processing was previously optimized a bit using maps to reduce the cost of symbol table lookups, but the processing was still inefficient, relying on XSLT1-style processing (as originally written) for deduplication. This now uses `for-each-group` and `perform-sort` to offload the expensive computation onto Saxon, which is much more efficient. Symbol table processing has long been a culprit, but I hadn't attempted to optimize further in recent months because of TAMER work. Since TAMER has been on pause for a few months with other things needing my attention, I needed to provide a short-term performance improvement to keep up with increasing build times. DEV-11716	2022-02-22 22:05:07 -05:00
Mike Gerwitz	2e50af1220	Copyright year update 2021	2021-07-22 15:00:15 -04:00
Mike Gerwitz	1f24cfdf25	Remove :map: sym-dep generation This was incorrect to begin with---it does not make sense that an input mapping should depend upon the identifier that it maps to, in the sense that we make use of these dependencies. If we add weak symbol references in the future, then this can be reintroduced. By removing this, we free tameld from having to perform the check itself. .rev-xmlo bumped to force rebuilding of object files since the linker now expects that no such dependencies will exist within them.	2021-07-22 14:27:15 -04:00
Mike Gerwitz	53360548da	tame: Ignore duplicate conjunctive predicates in value list optimization error This can occur in generated code (e.g. from proguic if a question-based predicate inherits a predicate already specified). This commit does not change anything that's emitted; it merely allows proceeding. TAMER can be smarter about this; I don't want to invest more time into generalizing deduplication of predicates.	2021-07-19 14:53:25 -04:00
Mike Gerwitz	2ad0d1425a	compiler: Correct handling of TRUE matches There was a bug whereby TRUE matches would keep whatever value was being matched on, even if it was not a boolean. That was an oversight from the proof-of-concept code, and this fixes it; that's why this is behind a flag! This also adjusts the class aliasing optimization so that it doesn't check for a `TRUE` symbol name, which was a bad idea to begin with. This change also ends up expanding `lv:match[@value="TRUE"]` into the long form, where it didn't previously; this will result in slightly larger xmlo files in some cases, but it's nothing significant, and it does not impact compilation times.	2021-07-15 14:55:32 -04:00
Mike Gerwitz	37977a8816	entry-form.xsl: Correctly generate HTML for params with imported types This is a nearly-10-year-old bug that was introduced when the Summary Page was modified to use the then-new symbol table. The compiler previously concatenated all packages into a single XML tree and processed that, so no package resolution was necessary here before.	2021-07-14 09:59:45 -04:00
Mike Gerwitz	513b8d7b86	worksheet.xsl: Allow package name to auto-generate A long time ago (about a decade), package names were required, but they are now generated by the compiler relative to the root path. The name here was incorrect, which was generating an incorrect path for the linked symbols, which was causing problems with the Summary Page.	2021-07-14 09:51:08 -04:00
Mike Gerwitz	f5ba4b013b	summary: Make Summary Page compiler less chatty It produces a lot of output that either results in spam (internal errors) or pollutes the log with unnecessary information.	2021-07-01 13:54:34 -04:00
Mike Gerwitz	d0e3a5622c	Remove class-level notice for new system This was not intentionally committed.	2021-06-24 09:59:00 -04:00
Mike Gerwitz	e9598b7cb5	Correct short runtime var declarations They were not actually defined before being aliased.	2021-06-23 11:44:36 -04:00
Mike Gerwitz	6f2b4090cd	Correct behavior of matrix matching with separate index sets in new system This behavior was largely correct, but was not commutative if the size of the matrices (rows or columns) was smaller than a following match.	2021-06-23 11:44:36 -04:00
Mike Gerwitz	e90ebd226c	Remove arrow functions from classifier runtime We need to support as far back as IE11, unfortunately, which is ES5.	2021-06-23 11:44:36 -04:00
Mike Gerwitz	934824b2ee	Reintroduce legacy classification system, place new behind flag This largely reintroduces the legacy classification system, but there are a number of things that are not affected by the flag. For example: 1. Alias classifications are still optimized when the flag is off; 2. Classifications without predicates emit slightly different code than before, though their functionality has not changed; 3. There's been a lot of refactoring and minor optimizations that are unaffected by the flag; 4. lv:match/@pattern will now emit a warning; and 5. Cleaning and casting of input data is not gated. This allows us to incrementally migrate to the new system where behavior may be different, but this is admittedly a bit dangerous in that the new system was aggressively tested and reasoned about, so reintroducing the legacy system may combine in unexpected ways.	2021-06-23 11:44:36 -04:00
Mike Gerwitz	7dbb653624	Inline intermediate any/all classifications This is another significant milestone. The next logical step with classification optimization is to inline all of those intermediate classifications generated from any and all blocks, since there are so many of them. This means having the parent classification absorb all dependencies; not output dependencies for the classification; not compile the assignments for those classifications; and to inline them at the match site. They’re used only once, since they’re generated for each individual block. We need to keep the actual classification generation around (and just inline them) for now, probably until TAMER, because we depend upon their symbol for determining their dimensionality, which we need for the optimization work we just did---we must inline them into the proper group (matrix, vector, or scalar). The optimization work done up to this point had inlining in mind---only a little bit of work was needed to make sure that every classification can simply be stripped of its assignment and be a valid expression that can be inlined in place of the original reference. The result of that was predictably significant for the `ui/package` program that I've been testing with: - 4,514 classifications were inlined; - The file size dropped to 7.5MiB (from 8.2MiB previously---remember that we started at 16MiB); and - GC ticks were cut in half, from 67->31. Unfortunately, this optimization added nearly 1m of time to the compilation of that program. Speaking from the future: the UI build optimizations in liza-proguic were introduced to offset this difference (and provide a net gain in performance).	2021-06-23 11:44:36 -04:00
Mike Gerwitz	97caefab1b	Extract classify/@terminate into own template Note that next-match does not cause a return from the template, as odd as it looks.	2021-06-23 11:44:36 -04:00
Mike Gerwitz	1517a03994	Combine all class optimizations into one	2021-06-23 11:44:36 -04:00
Mike Gerwitz	d1dae3e1b1	Explicit types for match raising	2021-06-23 11:44:36 -04:00
Mike Gerwitz	5adf1b7589	Combine all m* optimizations With the recent refactoring, it's clear that these are the same thing.	2021-06-23 11:44:35 -04:00
Mike Gerwitz	a563c3ce62	Remove lv:match checks from class optimization checks We handle all cases now, and have prohibited @pattern, which wasn't handled.	2021-06-23 11:44:35 -04:00
Mike Gerwitz	e3fd9388bb	Abstract function wrapping for class type raising This will let us clean up the implementation a bit more.	2021-06-23 11:44:35 -04:00
Mike Gerwitz	10089659b1	Extract lv:classify compilation into function To support following commits for inlining.	2021-06-23 11:44:35 -04:00
Mike Gerwitz	0cd6d40dd9	compiler: Remove whitespace from vector/matrix constants	2021-06-23 11:44:35 -04:00
Mike Gerwitz	9dbda93b4f	{precision=>p} to reduce byte count	2021-06-23 11:44:35 -04:00
Mike Gerwitz	f14417f32a	Remove unused domains var	2021-06-23 11:44:35 -04:00
Mike Gerwitz	e0907c6db2	compiler: Do not output whitespace between nodes	2021-06-23 11:44:35 -04:00
Mike Gerwitz	4ee050323a	Apply hoisting optimization to classify/@any This converts disjunctive classifications into conjunctive and places an <any> within it. This ends up handling all the generated qwhen classifications from proguic, which were probably converted into <any> by a previous optimization pass. The UI program I've been using to test these compiler optimizations has decreased in size to 8.2MiB since the beginning of this branch; we started at ~16MiB.	2021-06-23 11:44:35 -04:00
Mike Gerwitz	658e55f2fa	Hoist any-all common predicate for binary conjunctive classifications See comments. This is meant to help mitigate the damage done by one of our code generation systems. The benefit is significant, allowing the code generator to remain simple. By placing this optimization within the compiler, hand-written and template-generated code also benefit.	2021-06-23 11:44:35 -04:00
Mike Gerwitz	25d500fec5	Generalized value list optimization Note that this was also broken for vectors and scalars by the commit that expanded non-TRUE @value.	2021-06-23 11:44:35 -04:00
Mike Gerwitz	8e457dab34	Strip single-predicate any/all instead of extracting Rather than extracting every any/all into their own classifications, eliminate them (and replace them with their body) if they contain only one predicate. This is most likely to happen after template expansion, and there were an alarming number of them in our system. Stripping them out of one of our programs saved ~0.2MiB of output, and removed many intermediate classifications. It removed ~1,075 lines, which should correspond closely to the actual number of classifications. Discovering this required stripping the template barriers, which was done in a previous commit. Unfortunately, the performance improvement from this wasn't significant, largely because of the nondeterminism of GC, which can easily mask the gains. But a new line `v8::internal::FixedArray::set(int, v8::internal::Object)` appeared in the profiler output, making me wonder whether the JIT is starting to understand more interesting properties of the system. `mprotect` and `v8::internal::heap_internals::GenerationalBarrier` also appeared, which are related to GC.	2021-06-23 11:44:35 -04:00
Mike Gerwitz	2d519947f7	Strip template barriers from expanded classifications The barriers deeply frustrate static analysis.	2021-06-23 11:44:35 -04:00
Mike Gerwitz	f8b166a42d	Remove lv:join This is a long-forgotten and long-unused feature that has been long-superseded by symbol table introspection in inline-template.	2021-06-23 11:44:35 -04:00
Mike Gerwitz	c191af8d53	Remove anyValue and related code !!! (Message from the future: this ends up being reintroduced and the new classification system being placed behind a feature toggle. But it will be eliminated eventually.)	2021-06-23 11:44:35 -04:00
Mike Gerwitz	8147bec24f	mvs! This is a major milestone for class optimization---the old anyValue-based system is no longer in use; the classification system has been wholly rewritten. The ticks in the sampling profiler are now where they should be, open to further optimization with a much more solid foundation.

[JavaScript]:
  ticks  total  nonlib  name
      5   0.6%    3.0%  LazyCompile: vu [...]/ui/package.strip.js:25191:16
      5   0.6%    3.0%  LazyCompile: M [...]/ui/package.strip.js:25267:15
      3   0.4%    1.8%  LazyCompile: vmu [...]/ui/package.strip.js:25144:17
      3   0.4%    1.8%  LazyCompile: ve [...]/ui/package.strip.js:25204:16
      2   0.2%    1.2%  LazyCompile: precision [...]/ui/package.strip.js:25137:23
      2   0.2%    1.2%  LazyCompile: me [...]/ui/package.strip.js:25178:16
      2   0.2%    1.2%  LazyCompile: cmatch [...]/ui/package.strip.js:25495:20
      2   0.2%    1.2%  LazyCompile: ceq [...]/ui/package.strip.js:25273:17
      1   0.1%    0.6%  LazyCompile: init_defaults [...]/ui/package.strip.js:25624:27
      1   0.1%    0.6%  LazyCompile: MM [...]/ui/package.strip.js:25268:16
      1   0.1%    0.6%  LazyCompile: E [...]/ui/package.strip.js:25239:15
      1   0.1%    0.6%  LazyCompile: <anonymous> [...]/ui/package.strip.js:25184:13
      1   0.1%    0.6%  LazyCompile: <anonymous> [...]/ui/package.strip.js:25171:13

Much better than the 102 ticks that anyValue was taking some time ago!

A lot of time used to be spent compiling functions as well, a lot of which was removed by previous commits, bringing us to:

[C++]:
  ticks  total  nonlib  name
     50   5.9%   30.5%  node::contextify::ContextifyContext::CompileFunction(v8::FunctionCallbackInfo<v8::Value> const&)
     20   2.4%   12.2%  write
      9   1.1%    5.5%  node::native_module::NativeModuleEnv::CompileFunction(v8::FunctionCallbackInfo<v8::Value> const&)
      6   0.7%    3.7%  __pthread_cond_timedwait
      4   0.5%    2.4%  mmap

All of this work has simplified the output enough that it's obviated a slew of other optimizations that can be done in future work, though a lot of that may wait for TAMER, since performing them in XSLT will be difficult and not performant; the compiler is slow enough as it is.	2021-06-23 11:44:35 -04:00
Mike Gerwitz	542ff46b6d	mvs0 optimization ...getting close!	2021-06-23 11:44:35 -04:00

347 Commits (26aaf6efc159002fe028871dfb82f19b5240e075)