There's a lot of change that's likely going to take place with this thing,
but it's a start. The abstract summarizes the purpose of this---to formally
define TAME in terms of algebra, first-order logic, and [ZFC] set theory.
This came about while working on compiler changes and optimizations, since
it's difficult to ensure correctness (and discover further optimizations)
without being able to formally define the language. The focus at the moment
is the classification system rewrite, which can be expressed in terms of
first-order logic and set theory.
This commit contains essentially a POC with some carefully chosen
mathematical foundations (abstractions of which are subject to change) and a
basic representation of a subset of the classification system for scalars.
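As a rough illustration (a sketch only, not the paper's actual
formalism): a scalar classification c with matches m_1, ..., m_n can be
read as a predicate over its inputs,

  c \iff m_1 \land m_2 \land \cdots \land m_n,

where each m_j is of the form x_j = k_j or x_j \in S_j for some input x_j
and constant k_j or set S_j; a disjunctive ("any") classification replaces
\land with \lor.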
%.xml{=>o}: %.csvo rater/core/vector/table.xmlo
That is: we'll only build an object file when we try to build another object
file. The previous rule was causing problems with dependency generation,
because it would trigger compilation too early.
This should have been done many years ago. This will determine whether any
of the dependencies for the included suppliers.mk have changed, and
regenerate it as needed, without the developer having to do so manually
when imports change.
* build-aux/Makefile.am (suppliers.mk): Invoke ant with `-q` to eliminate
"processing" messages for each and every file. This also speeds up
operation slightly.
* build-aux/gen-make: Remove information echos for each file.
These changes will allow for suppliers.mk to be regenerated automatically
without being so invasive.
The original intent was to keep developers to a reasonable name length, but
generated identifiers can easily exceed this, and we further do not support
namespacing.
Naming conventions can instead be enforced at the template level.
This had gotten quite out of date relative to the actual rater.xsd used
during our build process, which existed outside of this repository. That
was an unintended artifact of moving files around.
That file has been removed and symlinked to this one.
Note: this really belongs in liza-proguic, and should be moved in the near
future.
liza-proguic is being modified to generate step-level packages, which are
significantly faster to build than larger ones (XSLT TAME scales
terribly). These changes handle those new dependencies.
One important thing to note with this change is that suppliers.mk now
requires proguic to have run before generation so that those generated
dependencies can be properly examined. This is a quick operation, so that
is not problematic.
This also depends on the .version.xml change that was previously made: when
the timestamp changed every time, we got into an infinite build loop.
First thing to note: this belongs in liza-proguic, not here. But it's here
right now, so for now I'm making the change. The relationship between TAME
and proguic is awkward and will hopefully be improved upon in the near
future.
As for this actual change: step-level fragments will be concatenated such
that the imports will appear at the step level rather than the root.
This will be generated automatically by the Makefile. It's not appropriate
to generate in the configure script, and I do not recall why I did
so---possibly to work around the issue of delayed tab completion when it
needs regeneration?
This removes suppmk-gen in favor of more generic Makefile targets---in this
case, having `%.tdat` depend upon `rater/core/tdat.xml`, even though that's
not quite true (the %.xml file generated from it needs it). But these files
are going away soon; a pending TAME optimization branch removes support for
the underlying pattern primitive entirely; CSVMs should be used instead.
The timestamp of the file will now only be updated if the hash (version)
_actually_ changes. This allows the file to be used as a target dependency
without forcing a rebuild each and every time.
This solves issues of hitting stack limits, particularly in browsers, when
querying matrices that return a large number of rows for one or more
predicates.
We were still having issues with this function when taking the positive
branch, when predicates cause many matches within tables. This was causing
us to hit stack limits in certain browsers on the Summary Page.
This converts it to an iterator so that all branches are tail-recursive, and
then enables TCO on them.
I was disappointed to find that there's little performance or memory benefit
in running our test suite.
I did say it was _experimental_ guided TCO.
This waits to perform the actual argument reassignment until after
processing the expressions associated with the new arguments, since the
original values would otherwise be overwritten while they are still needed.
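To illustrate the hazard (a hypothetical JavaScript sketch, not the
compiler's actual output): if a tail call rebinds ( a, b ) to ( b, a + b ),
assigning in place would overwrite `a` before `a + b` has been evaluated,
so the new values are staged in temporaries first.

    function fib_iter( n, a, b )
    {
        while ( n > 0 )
        {
            // evaluate the new argument values first...
            var new_a = b,
                new_b = a + b;

            // ...and only then perform the reassignment
            a = new_a;
            b = new_b;
            n = n - 1;
        }

        return a;
    }

    // fib_iter( 10, 0, 1 ) === 55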
This change simply prevents failure in such situations (e.g. on invalidated
fields in Liza). We'll worry about proper errors and correctness, which
ought to be compile-time, in TAMER.
The MathJax CDN stopped working in April 2017. I updated it to the
recommended CDN, using the last version from April 2017, so that it works
as it did before the old CDN stopped working.
I added a checksum to verify the integrity of the script's content.
This problem manifested when the name of the attempted classification was
the same as the name of another object. For example, if we have `t:match-class
name="foo"`, and `foo` is a param instead of a class, then `@yields` will
fail, and it'd fall back to matching on the param.
This is absolutely not what we want.
The error message in this context is ugly, but it does work.
Example:
!!! Unknown match @on (/lv:package/lv:classify/match): `error: unable to
determine @yields for class `scheduled_ai' (has the class been imported?)'
is unknown for classification --vis-scheduled-ai-type
This was urgently needed for a project using TAME. Somehow, we've gone all
of these years without encountering a table whose first predicate fails to
filter out enough results to keep us from hitting stack limits.
Each recursive step of mrange before inlining and TCO, at the time of
writing, was adding eight stack frames. This is because each let (and many
other things) compiles into a self-applying function. Since mrange is invoked
once for every single row for a given value, we quickly run out of stack
space.
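To make the frame growth concrete, here is a rough JavaScript sketch of the
pattern (illustrative only; it is not the actual compiled output): a let
binds its value by applying an anonymous function, so nested lets nest
stack frames.

    // two nested "lets", each costing a frame while the body evaluates
    var result = ( function( x )
    {
        return ( function( y )
        {
            return x + y;
        } )( 20 );
    } )( 10 );

    // result === 30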
For example, consider this table:
1, $a, $b
2, $a, $b
2, $b, $c
2, $c, $d
3, $a, $b
If we were to filter the first column on the value 2, it would first bisect
to find the middle row, backtrack to the first, and then move forward to the
last, producing:
2, $a, $b
2, $b, $c
2, $c, $d
This is at least three mrange calls, for a potential total of 8*3=24 stack
frames, depending on implementation details I don't quite recall at the
moment about how the query system works.
We had over 1000 rows after applying the first predicate; the stack was
exhausted before it could even reach the last row.
Tail call optimization (TCO) is the process of turning recursive calls in
tail position into jumps. So, rather than the stack growing on a recursive
call, it stays constant. A common way to accomplish this in stack-based
languages is using a trampoline.
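For reference, a classic trampoline looks roughly like this (a hypothetical
JavaScript sketch; this is not what the TAME compiler generates): tail
calls return a thunk instead of recursing, and a driver loop keeps invoking
thunks until a non-function value is produced.

    function trampoline( result )
    {
        while ( typeof result === "function" )
        {
            result = result();
        }

        return result;
    }

    function sum_thunked( n, acc )
    {
        return ( n > 0 )
            ? function() { return sum_thunked( n - 1, acc + n ); }
            : acc;
    }

    // trampoline( sum_thunked( 100000, 0 ) ) === 5000050000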
In our case, we enclose the entirety of the function in a `do` loop, and
clear a flag indicating that a tail call took place. When we reach a
recursive tail call, we set that flag. Then, instead of invoking the
function again, we _overwrite the original arguments_ with their new
values, and simply return 0. When the function hits the end of the loop, it
will see that the flag is set, and jump back to the beginning of the
function, starting all over with the new values.
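A minimal sketch of that pattern (the function and flag names here are
hypothetical; the generated code differs):

    function sum_to( n, acc )
    {
        var __tco;   // set when a tail call takes place

        do
        {
            // clear the flag on each pass
            __tco = false;

            if ( n > 0 )
            {
                // the tail call sum_to( n - 1, acc + n ) becomes: overwrite
                // the arguments with their new values and set the flag
                acc   = acc + n;
                n     = n - 1;
                __tco = true;
            }
        } while ( __tco );   // loop back ("jump") if a tail call occurred

        return acc;
    }

    // sum_to( 100000, 0 ) === 5000050000, with constant stack usage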
Compiling in this functionality is not difficult. Tracking whether a given
call is in tail position, however, is a bit of a pain given how the XSLT
code is currently written. Given that this is all being replaced with
TAMER, it's difficult to stomach making too many changes to the compiler,
when we can do it properly in the future with TAMER. But we need the
feature now.
As a compromise, I call this implementation "guided" TCO---we rely on a
human to indicate that a call is in tail position by setting an experimental
flag manually. That frees us from having to have the compiler do it, but
does create some nasty problems if the human is wrong. Consequently, this
should only be used in core, and people should not use it unless they know
what they're doing.
Using this feature currently outputs a warning---that way, if there are
problems, people have some idea of where they maybe can look. The warning
will be removed in the future after this has been in production for some
time (granted, our test suite passes).
Once again: TAMER will implement proper tail calls automatically, without
the need for a human to intervene.
For more information on tail calls:
- https://en.wikipedia.org/wiki/Tail_call