This was urgently needed for a project using TAME. Somehow, we've gone
all of these years without encountering a table whose first predicate fails
to filter out enough rows to keep us under the stack limit.
Before inlining and TCO, each recursive step of mrange was, at the time of
writing, adding eight stack frames. This is because each let (and many
other things) compiles into a self-applying function. Since mrange is
invoked once for every single row for a given value, we quickly run out of
stack space.
For example, consider this table:
1, $a, $b
2, $a, $b
2, $b, $c
2, $c, $d
3, $a, $b
If we were to filter the first column on the value 2, it would first bisect
to find the middle row, backtrack to the first, and then move forward to the
last, producing:
2, $a, $b
2, $b, $c
2, $c, $d
This is at least three mrange calls, for a potential total of 8*3=24 stack
frames, depending on implementation details of the query system that I
don't quite recall at the moment.
We had over 1000 rows after applying the first predicate; the stack was
exhausted before it could even reach the last row.
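As a rough illustration of the lookup described above (bisect, backtrack,
move forward), here is a minimal iterative sketch in Rust. It is not the
actual mrange implementation, which is recursive and compiled from XSLT;
the name `matching_range` and the tuple row type are assumptions made for
this example only.

```rust
/// Return the half-open index range `(first, end)` of rows whose first
/// column equals `key`, or `None` if no row matches.
///
/// Hypothetical helper: `rows` must already be sorted by the first column.
fn matching_range(rows: &[(u32, &str, &str)], key: u32) -> Option<(usize, usize)> {
    // Bisect to land on *some* row with a matching first column.
    let mid = rows.binary_search_by_key(&key, |row| row.0).ok()?;

    // Backtrack to the first matching row.
    let mut first = mid;
    while first > 0 && rows[first - 1].0 == key {
        first -= 1;
    }

    // Move forward to one past the last matching row.
    let mut end = mid + 1;
    while end < rows.len() && rows[end].0 == key {
        end += 1;
    }

    Some((first, end))
}
```

For the five-row table above, filtering on 2 yields the range covering the
second through fourth rows. The recursive form performs the forward walk
one call per row, which is where the stack growth comes from.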
Tail call optimization (TCO) is the process of turning recursive calls in
tail position into jumps. So, rather than the stack growing on a recursive
call, it stays constant. A common way to accomplish this in stack-based
languages is using a trampoline.
In our case, we enclose the entirety of the function in a `do` loop, and
clear a flag indicating that a tail call took place. When we reach a
recursive tail call, we set that flag. Then, instead of invoking the
function again, we _overwrite the original arguments_ with their new
values, and simply return 0. When the function hits the end of the loop, it
will see that the flag is set, and jump back to the beginning of the
function, starting all over with the new values.
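The transformation can be sketched as follows. This is only an
illustration written in Rust, not the code the XSLT compiler emits; the
function names and the row-counting task are made up for the example, and
Rust's `loop` stands in for the `do` loop.

```rust
/// Naive recursive form: one stack frame per matching row.
fn count_from_recursive(rows: &[u32], key: u32, i: usize, acc: usize) -> usize {
    if i >= rows.len() || rows[i] != key {
        return acc;
    }
    // Recursive call in tail position.
    count_from_recursive(rows, key, i + 1, acc + 1)
}

/// The same function after the loop-and-flag transformation.
fn count_from_tco(rows: &[u32], key: u32, mut i: usize, mut acc: usize) -> usize {
    loop {
        // Clear the flag indicating that a tail call took place.
        let mut tail_called = false;

        let result = if i >= rows.len() || rows[i] != key {
            acc
        } else {
            // Tail call: overwrite the original arguments with their new
            // values, set the flag, and yield a dummy 0 instead of calling.
            i += 1;
            acc += 1;
            tail_called = true;
            0
        };

        // End of the loop body: jump back to the top if a tail call was
        // requested; otherwise `result` is the real return value.
        if !tail_called {
            return result;
        }
    }
}
```

The second form runs in constant stack space regardless of how many rows
match. It also shows why a wrong annotation is dangerous: if the marked
call were not actually in tail position, the dummy 0 would be used in
whatever computation follows it.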
Compiling in this functionality is not difficult. Tracking whether a given
call is in tail position, however, is a bit of a pain given how the XSLT
code is currently written. Given that this is all being replaced with
TAMER, it's difficult to stomach making too many changes to the compiler,
when we can do it properly in the future with TAMER. But we need the
feature now.
As a compromise, I call this implementation "guided" TCO---we rely on a
human to indicate that a call is in tail position by setting an experimental
flag manually. That frees us from having to have the compiler do it, but
does create some nasty problems if the human is wrong. Consequently, this
should only be used in core, and people should not use it unless they know
what they're doing.
Using this feature currently outputs a warning---that way, if there are
problems, people have some idea of where to look. The warning will be
removed in the future after this has been in production for some time
(granted, our test suite passes).
Once again: TAMER will implement proper tail calls automatically, without
the need for a human to intervene.
For more information on tail calls:
- https://en.wikipedia.org/wiki/Tail_call
This implements TCO in the XSLT compiler by requiring a human to manually
indicate when a recursive call is in tail position. This was somewhat
urgently needed to resolve stack exhaustion on large rate tables.
TAMER will do this properly by determining itself whether a call is in tail
position. Until then, this will serve as a test for this type of feature.
This handles moving to another repository structure (our gigarepo) where
this relative path is no longer valid. The absolute path generated by this
is okay since it's ephemeral and only used for this build invocation.
This checks explicitly for unresolved objects while sorting and provides an
explicit error for them. For example, this will catch externs that have no
concrete resolution.
This previously fell all the way through to the unreachable! block. The old
POC implementation was catching unresolved objects, albeit with a debug
error.
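The shape of the check is roughly the following. This is a simplified
sketch and not the actual sorting code: the `Object` variants, the
`SortError` type, and `sort_objects` are hypothetical stand-ins, and no
real topological sort is performed.

```rust
/// Hypothetical, simplified object kinds for illustration.
#[derive(Debug, PartialEq)]
enum Object {
    /// Referenced but never declared.
    Missing(String),
    /// Declared as an extern with no concrete resolution.
    Extern(String),
    /// Fully resolved identifier.
    Ident(String),
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum SortError {
    Unresolved(String),
}

/// Collect object names in order, failing explicitly on the first
/// unresolved object rather than falling through to `unreachable!`.
fn sort_objects(objects: &[Object]) -> Result<Vec<&str>, SortError> {
    let mut sorted = Vec::new();

    for obj in objects {
        match obj {
            Object::Ident(name) => sorted.push(name.as_str()),
            Object::Missing(name) | Object::Extern(name) => {
                return Err(SortError::Unresolved(name.clone()));
            }
        }
    }

    Ok(sorted)
}
```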
This will be used for the next commit, but this change has been isolated
both because it distracts from the implementation change in the next commit,
and because it cleans up the code by removing the need for a type parameter
on `AsgError`.
Note that the sort test cases now use `unwrap` instead of having
`{,Sortable}AsgError` support one or the other---this is because that does
not currently happen in practice, and there is not supposed to be a
hierarchy; they are siblings (though perhaps their name may imply otherwise).
The only reason this function was a method of `BaseAsg` was because of
`self.graph`, which is accessible within the scope of this
module. `check_cycles` is logically associated with `SortableAsg`, and so
should exist alongside it (though it can't exist as an associated function
of that trait).
Merge branch 'jira-7504'
* jira-7504:
  [DEV-7504] Update RELEASES.md to make it less technical
  [DEV-7504] Add cypher script for post-graph import
  [DEV-7504] Add make target for "graphml"
  [DEV-7504] Add GraphML generation
We want to be able to build a representation of the dependency graph so
we can easily inspect it.
We do not want to generate GraphML by default. It is better to use a tool.
We use "petgraph-graphml".
This was never completed and can eventually be deleted entirely, but I
didn't want to lose this history by having it sit out in a branch. Joe is
working on something better.
This begins providing release notes for changes and provides scripts to
facilitate this:
- tools/mkrelease will update RELEASES.md and run some checks.
- build-aux/release-check is intended for use in pipelines (e.g. see
  .gitlab-ci.yml) to verify that releases were done properly.
This was originally omitted because there wasn't a use case for it. Now
that we're adding context to errors, however, an owned value is highly
desirable.
This adds almost no measurable overhead to the internment system in
benchmarks (largely within the margin of error).
This is a union (sum type) of three other error types, plus errors specific
to this builder.
This commit does a good job demonstrating the boilerplate, as well as a need
for additional context (in the case of `IdentKindError`), that we'll want to
work on abstracting away.
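For a sense of the boilerplate involved, here is a minimal sketch of such
a sum type. All of the names (`ReadError`, `KindError`, `GraphError`,
`BuilderError`, `build`) are hypothetical stand-ins rather than the types
from this commit.

```rust
use std::fmt;

// Three hypothetical lower-level error types.
#[derive(Debug, PartialEq)]
struct ReadError;
#[derive(Debug, PartialEq)]
struct KindError;
#[derive(Debug, PartialEq)]
struct GraphError;

/// Sum of the three errors above, plus one builder-specific error.
#[derive(Debug, PartialEq)]
enum BuilderError {
    Read(ReadError),
    Kind(KindError),
    Graph(GraphError),
    MissingName,
}

// One `From` impl per wrapped error so that `?` converts automatically.
impl From<ReadError> for BuilderError {
    fn from(e: ReadError) -> Self { Self::Read(e) }
}
impl From<KindError> for BuilderError {
    fn from(e: KindError) -> Self { Self::Kind(e) }
}
impl From<GraphError> for BuilderError {
    fn from(e: GraphError) -> Self { Self::Graph(e) }
}

impl fmt::Display for BuilderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Read(_) => write!(f, "read error"),
            Self::Kind(_) => write!(f, "kind error"),
            Self::Graph(_) => write!(f, "graph error"),
            Self::MissingName => write!(f, "missing name"),
        }
    }
}

impl std::error::Error for BuilderError {}

/// Hypothetical lower-level operation that can fail.
fn parse_kind(raw: &str) -> Result<u8, KindError> {
    match raw {
        "rate" => Ok(1),
        _ => Err(KindError),
    }
}

/// Hypothetical builder step mixing wrapped and builder-specific errors.
fn build(name: Option<&str>, raw_kind: &str) -> Result<(String, u8), BuilderError> {
    let name = name.ok_or(BuilderError::MissingName)?;
    let kind = parse_kind(raw_kind)?; // KindError -> BuilderError via From
    Ok((name.to_string(), kind))
}
```

Note that `KindError` here carries no information about which identifier
failed, which is the sort of missing context mentioned above.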
The `Debug` bound is inconvenient and requires propagation to any types that
use it. Further, it's really awkward having `Display` depend on `Debug`; if
we want to render a useful display here, we can write one.
To be clear: IndexType implements Debug.
For now, this is pretty-printed by another part of the code, which we don't
want to implement in `Display` because it requires looking things up from
the graph.
This flips the API from using XmloWriter as the context to using Asg and
consuming anything that can produce XmloResults. This not only makes more
sense, but avoids having to create a trait for XmloReader, and simplifies
the trait bounds we have to concern ourselves with.
This just tidies things up a little bit before I get into some further
refactoring. I wrote the original code when I was just learning Rust not
too long ago, so it's interesting to see how my understanding has changed
over that relatively short period of time.
This abstracts away the canonicalizer and solves the problem whereby
canonicalization was not being performed prior to recording whether a path
has been visited. This ensures that multiple relative paths to the same
file will be properly recognized as visited.
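A minimal sketch of the idea, assuming a visited-set type that is generic
over the canonicalizer. The names `VisitedPaths` and
`lexical_canonicalize` are hypothetical; the latter is a purely lexical
stand-in for something like `std::fs::canonicalize` so that the example
does not touch the filesystem.

```rust
use std::collections::HashSet;
use std::io;
use std::path::{Component, Path, PathBuf};

/// Records visited paths, canonicalizing *before* recording.
struct VisitedPaths<F> {
    canonicalize: F,
    seen: HashSet<PathBuf>,
}

impl<F: Fn(&Path) -> io::Result<PathBuf>> VisitedPaths<F> {
    fn new(canonicalize: F) -> Self {
        Self { canonicalize, seen: HashSet::new() }
    }

    /// Returns `Ok(true)` if this is the first visit to the file that
    /// `path` refers to, and `Ok(false)` if it was already visited.
    fn visit(&mut self, path: &Path) -> io::Result<bool> {
        let canonical = (self.canonicalize)(path)?;
        Ok(self.seen.insert(canonical))
    }
}

/// Stand-in canonicalizer: resolves `.` and `..` lexically, without
/// consulting the filesystem (so symlinks are not handled).
fn lexical_canonicalize(path: &Path) -> io::Result<PathBuf> {
    let mut out = PathBuf::new();

    for part in path.components() {
        match part {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }

    Ok(out)
}
```

With this, `a/b/../c.xmlo` and `./a/c.xmlo` are recorded under the same
key, so the second is recognized as already visited.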