tame/tamer
Mike Gerwitz b1c0783c75 tamer: xir::XirString: WIP implementation (likely going away)
I'm not fond of this implementation, which is why it's not fully
completed.  I wanted to commit this for future reference, and take the
opportunity to explain why I don't like it.

First: this task started as an idea to implement a third variant to
AttrValue and friends that indicates that a value is fixed, in the sense of
a fixed-point function: escaped or unescaped, its value is the same.  This
would allow us to skip wasteful escape/unescape operations.

In doing so, it became obvious that there's no need to leak this information
through the API, and indeed, no part of the system should care.  When we
read XML, it should be unescaped, and when we write, it should be
escaped.  The reason that this didn't quite happen to begin with was an
optimization: I'll be creating an echo writer in place of the current
filesystem-based copy in tamec shortly, and this would allow streaming XIR
directly from the reader to the writer without any unescaping or
re-escaping.

When we unescape, we know the value that it came from, so we could simply
store both symbols---they're 32-bit, so it results in a nicely compressed
64-bit value, so it's essentially cost-free, as long as we accept the
expense of internment.  This is `XirString`.  Then, when we want to escape
or unescape, we first check to see whether a symbol already exists and, if
so, use it.
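
A minimal sketch of the pairing described above, not the committed implementation. The `SymbolId` newtype, the toy `Interner`, the `escape` helper, and the `escaped` method are all hypothetical stand-ins, and the use of `Option` around each symbol is an assumption about how a missing form might be represented.

```rust
use std::collections::HashMap;
use std::num::NonZeroU32;

/// Hypothetical stand-in for a 32-bit interned symbol.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct SymbolId(NonZeroU32);

/// Toy interner for illustration only; ids start at 1.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, SymbolId>,
    strs: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        self.strs.push(s.to_string());
        let id = SymbolId(NonZeroU32::new(self.strs.len() as u32).unwrap());
        self.ids.insert(s.to_string(), id);
        id
    }

    fn lookup(&self, id: SymbolId) -> &str {
        &self.strs[(id.0.get() - 1) as usize]
    }
}

/// Naive XML escaping (assumed helper); `&` is replaced first so that the
/// ampersands introduced by later replacements are not escaped again.
fn escape(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

/// Both forms of one string.  Each `Option<SymbolId>` occupies 32 bits
/// because of the `NonZeroU32` niche, so the pair is 64 bits.
#[derive(Debug, Clone, Copy)]
struct XirString {
    escaped: Option<SymbolId>,
    unescaped: Option<SymbolId>,
}

impl XirString {
    fn from_unescaped(sym: SymbolId) -> Self {
        Self { escaped: None, unescaped: Some(sym) }
    }

    /// Return the escaped symbol, reusing it if it is already known and
    /// otherwise escaping, interning, and remembering the result.
    fn escaped(&mut self, interner: &mut Interner) -> SymbolId {
        if let Some(sym) = self.escaped {
            return sym;
        }
        let raw = self.unescaped.expect("at least one form must be set");
        let text = escape(interner.lookup(raw));
        let sym = interner.intern(&text);
        self.escaped = Some(sym);
        sym
    }
}
```

In this sketch a second call to `escaped` on the same value returns the stored symbol without touching the string data.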

While this works well for echoing streams, it won't work all that well in
practice: the unescaped SymbolId will be taken and the XirString discarded,
since nothing after XIR should be coupled with it.  Then, when we later
construct a XIR stream for writing, XirString will no longer be available
and our previously known escape is lost, so the writer will have to
re-escape.

Further, XirString's generic parameter for the XirStringEscaper uses
phantom, which hints that maybe it's not in the best place.  Indeed,
I've already acknowledged that only a reader unescapes and only a writer
escapes, and that the rest of the system works with normal (unescaped)
values, so only readers and writers should be part of this process.  I also
already acknowledged that XirString would be lost and only the unescaped
SymbolId would be used.

So what's the point of XirString, then, if it won't be a useful optimization
beyond the temporary echo writer?

Instead, we can take the XirStringWriter and implement two caches on that:
mapping SymbolId from escaped->unescaped and vice-versa.  These can be
simple vectors; since SymbolId is a 32-bit value, we will not have much
wasted space for symbols that never get read or written.  We could even
optimize for preinterned symbols using markers, though I'll probably not do
so, and I'll explain why later.
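
A rough sketch of the two vector-backed caches, under stated assumptions rather than as the planned implementation. `CachingEscaper`, the toy `Interner`, the string helpers, and the `escapes_run` counter are hypothetical names introduced for illustration; symbol ids are assumed to start at 1 so that 0 can mark an empty slot.

```rust
use std::collections::HashMap;

/// Hypothetical 32-bit symbol; 0 is reserved to mean "no cache entry".
type SymbolId = u32;

/// Toy interner for illustration only; ids start at 1.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, SymbolId>,
    strs: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        self.strs.push(s.to_string());
        let id = self.strs.len() as SymbolId;
        self.ids.insert(s.to_string(), id);
        id
    }

    fn lookup(&self, id: SymbolId) -> &str {
        &self.strs[(id - 1) as usize]
    }
}

/// Naive escaping; `&` goes first so new ampersands are not re-escaped.
fn escape_str(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;").replace('"', "&quot;")
}

/// Naive unescaping; `&amp;` goes last so `&amp;lt;` becomes `&lt;`.
fn unescape_str(s: &str) -> String {
    s.replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\"").replace("&amp;", "&")
}

fn cache_get(cache: &[SymbolId], key: SymbolId) -> Option<SymbolId> {
    cache.get(key as usize).copied().filter(|&v| v != 0)
}

fn cache_put(cache: &mut Vec<SymbolId>, key: SymbolId, val: SymbolId) {
    let i = key as usize;
    if cache.len() <= i {
        cache.resize(i + 1, 0); // grow on demand, filling with "empty"
    }
    cache[i] = val;
}

/// One encoder shared by reader and writer, holding both caches.
#[derive(Default)]
struct CachingEscaper {
    interner: Interner,
    to_escaped: Vec<SymbolId>,   // index: unescaped id
    to_unescaped: Vec<SymbolId>, // index: escaped id
    escapes_run: usize,          // demo counter: real escape operations
}

impl CachingEscaper {
    /// Reader side: unescape and record the mapping in both directions.
    fn unescape(&mut self, escaped: SymbolId) -> SymbolId {
        if let Some(hit) = cache_get(&self.to_unescaped, escaped) {
            return hit;
        }
        let text = unescape_str(self.interner.lookup(escaped));
        let raw = self.interner.intern(&text);
        cache_put(&mut self.to_unescaped, escaped, raw);
        cache_put(&mut self.to_escaped, raw, escaped);
        raw
    }

    /// Writer side: reuse a known escape, else compute and record it.
    fn escape(&mut self, raw: SymbolId) -> SymbolId {
        if let Some(hit) = cache_get(&self.to_escaped, raw) {
            return hit;
        }
        self.escapes_run += 1;
        let text = escape_str(self.interner.lookup(raw));
        let escaped = self.interner.intern(&text);
        cache_put(&mut self.to_escaped, raw, escaped);
        cache_put(&mut self.to_unescaped, escaped, raw);
        escaped
    }
}
```

Because the reader records both directions, a writer given the same encoder can escape any symbol the reader produced without redoing the string work, and only a bare `SymbolId` crosses the API.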

If we do _that_, we get even _better_ optimizations through caching that
_will_ apply in the general case (so, not just for echo), and we're able to
ditch XirString entirely and simply use a SymbolId.  This makes for a much
more friendly API that isn't leaking implementation details, though it
_does_ put an onus on the caller to pass the encoder to both the reader and
the writer, _if_ it wants to take advantage of a cache.  But that burden is
not significant (and is, again, optional if we don't want it).

So, that'll be the next step.
2021-11-10 12:22:10 -05:00
benches tamer: xir::XirString: WIP implementation (likely going away) 2021-11-10 12:22:10 -05:00
build-aux Copyright year update 2021 2021-07-22 15:00:15 -04:00
src tamer: xir::XirString: WIP implementation (likely going away) 2021-11-10 12:22:10 -05:00
tests Copyright year update 2021 2021-07-22 15:00:15 -04:00
.gitignore TAMER: Initial commit 2019-11-18 14:05:47 -05:00
Cargo.lock tamer: xir::XirString: WIP implementation (likely going away) 2021-11-10 12:22:10 -05:00
Cargo.toml tamer: xir::XirString: WIP implementation (likely going away) 2021-11-10 12:22:10 -05:00
Makefile.am tamer: Makefile.am (bench-build): New target, default for all 2021-10-08 09:27:56 -04:00
README.md Copyright year update 2021 2021-07-22 15:00:15 -04:00
autogen.sh Copyright year update 2021 2021-07-22 15:00:15 -04:00
bootstrap Copyright year update 2021 2021-07-22 15:00:15 -04:00
configure.ac tamer: Switch back to nightly toolchain 2021-10-02 00:58:14 -04:00
rustfmt.toml tamer/rustfmt (max_width): Set to 80 2019-11-27 09:15:15 -05:00

README.md

TAME in Rust (TAMER)

TAME was written to help tame the complexity of developing comparative insurance rating systems. This project aims to tame the complexity and performance issues of TAME itself. TAMER is therefore more tame than TAME.

TAME was originally written in XSLT. For more information about the project, see the parent README.md.

Building

To bootstrap from the source repository, run ./bootstrap.

To configure the build for your system, run ./configure. To build, run make. To run tests, run make check.

You may also invoke cargo directly, which make will do for you using options provided to configure.

Note that the default development build results in terrible runtime performance! See Build Flags below for instructions on how to generate a release binary.

Build Flags

The environment variable CARGO_BUILD_FLAGS can be used to provide additional arguments to cargo build when invoked via make. This can be provided optionally during configure and can be overridden when invoking make. For example:

# release build
$ ./configure && make CARGO_BUILD_FLAGS=--release
$ ./configure CARGO_BUILD_FLAGS=--release && make

# dev build
$ ./configure && make
$ ./configure CARGO_BUILD_FLAGS=--release && make CARGO_BUILD_FLAGS=

Hacking

This section contains advice for those developing TAMER.

Running Tests

Developers should be using test-driven development (TDD). make check will run all necessary tests.

Code Format

Rust provides rustfmt that can automatically format code for you. This project mandates its use and therefore eliminates personal preference in code style (for better or worse).

Formatting checks are run during make check and, on failure, will output the diff that would be applied if you ran make fmt (or make fix); this will run cargo fmt for you (and will use the binaries configured via configure).

Since developers should be doing test-driven development (TDD) and therefore should be running make check frequently, the hope is that frequent feedback on formatting issues will allow developers to quickly adjust their habits to avoid triggering formatting errors at all.

If you want to automatically fix formatting errors and then run tests:

$ make fmt check

Benchmarking

Benchmarks serve two purposes: external integration tests (which are subject to module visibility constraints) and actual benchmarking. To run benchmarks, invoke make bench.

Note that link-time optimizations (LTO) are performed on the binary for benchmarking so that its performance reflects release builds that will be used in production.

The configure script will automatically detect whether the test feature is unstable (as it was as of the time of writing) and, if so, will automatically fall back to invoking nightly (by running cargo +nightly bench).

If you do not have nightly, you can install it via rustup install nightly.