tame/tamer
Mike Gerwitz fc235b7ecc tamer: memchr benches
This adds benchmarking for the memchr crate.  It is used primarily by
quick-xml at the moment, but the question is whether to rely on it for
certain operations for XIR.

The benchmarking on an Intel Xeon system shows that memchr and Rust's
contains() perform very similarly on small inputs, matching against a single
character, and so Rust's built-in should be preferred in that case so that
we're using APIs that are familiar to most people.

When larger inputs are compared against, memchr has a greater benefit
(between about 1.5x and a little under 2x).

When comparing against two characters on small inputs, they are again very
close.  But look at what happens when we compare two characters against
_large_ inputs:

  running 24 tests
  test large_str::one::memchr_early_match                 ... bench:       4,938 ns/iter (+/- 124)
  test large_str::one::memchr_late_match                  ... bench:      81,807 ns/iter (+/- 1,153)
  test large_str::one::memchr_non_match                   ... bench:      82,074 ns/iter (+/- 1,062)
  test large_str::one::rust_contains_one_byte_early_match ... bench:       9,425 ns/iter (+/- 167)
  test large_str::one::rust_contains_one_byte_late_match  ... bench:     123,685 ns/iter (+/- 3,728)
  test large_str::one::rust_contains_one_byte_non_match   ... bench:     123,117 ns/iter (+/- 2,200)
  test large_str::one::rust_contains_one_char_early_match ... bench:       9,561 ns/iter (+/- 507)
  test large_str::one::rust_contains_one_char_late_match  ... bench:     123,929 ns/iter (+/- 2,377)
  test large_str::one::rust_contains_one_char_non_match   ... bench:     122,989 ns/iter (+/- 2,788)
  test large_str::two::memchr2_early_match                ... bench:       5,704 ns/iter (+/- 91)
  test large_str::two::memchr2_late_match                 ... bench:      89,194 ns/iter (+/- 8,546)
  test large_str::two::memchr2_non_match                  ... bench:      85,649 ns/iter (+/- 3,879)
  test large_str::two::rust_contains_two_char_early_match ... bench:      66,785 ns/iter (+/- 3,385)
  test large_str::two::rust_contains_two_char_late_match  ... bench:   2,148,064 ns/iter (+/- 21,812)
  test large_str::two::rust_contains_two_char_non_match   ... bench:   2,322,082 ns/iter (+/- 22,947)
  test small_str::one::memchr_mid_match                   ... bench:       4,737 ns/iter (+/- 842)
  test small_str::one::memchr_non_match                   ... bench:       5,160 ns/iter (+/- 62)
  test small_str::one::rust_contains_one_byte_non_match   ... bench:       3,930 ns/iter (+/- 35)
  test small_str::one::rust_contains_one_char_mid_match   ... bench:       3,677 ns/iter (+/- 618)
  test small_str::one::rust_contains_one_char_non_match   ... bench:       5,415 ns/iter (+/- 221)
  test small_str::two::memchr2_mid_match                  ... bench:       5,488 ns/iter (+/- 888)
  test small_str::two::memchr2_non_match                  ... bench:       6,788 ns/iter (+/- 134)
  test small_str::two::rust_contains_two_char_mid_match   ... bench:       6,203 ns/iter (+/- 170)
  test small_str::two::rust_contains_two_char_non_match   ... bench:       7,853 ns/iter (+/- 713)

Yikes.

With that said, we won't be comparing against such large inputs
short-term.  The larger strings (fragments) are copied verbatim, and not
compared against---but they _were_ prior to the previous commit that stopped
unencoding and re-encoding.

So: Rust built-ins for inputs that are expected to be small.
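
For reference, the built-in forms being compared might look roughly like
the sketch below.  It uses only the standard library; the function names
are hypothetical and the exact calls in the benchmark file are an
assumption, not a copy of it.  find_either_byte is a naive stand-in that
returns the same result as the memchr crate's memchr2 would, not its
implementation.

```rust
// Std-only sketch of the search forms discussed above.
// All function names here are illustrative assumptions.

/// Single byte search using the slice built-in.
fn contains_one_byte(haystack: &str, needle: u8) -> bool {
    haystack.as_bytes().contains(&needle)
}

/// Single `char` search using the `str` built-in.
fn contains_one_char(haystack: &str, needle: char) -> bool {
    haystack.contains(needle)
}

/// Search for either of two `char`s; a `&[char]` slice is a valid pattern.
fn contains_two_char(haystack: &str, a: char, b: char) -> bool {
    haystack.contains(&[a, b][..])
}

/// Naive stand-in for a two-byte search: index of the first byte equal
/// to `a` or `b`, or `None` if neither occurs.
fn find_either_byte(haystack: &[u8], a: u8, b: u8) -> Option<usize> {
    haystack.iter().position(|&x| x == a || x == b)
}

fn main() {
    let small = "foo:bar";

    assert!(contains_one_byte(small, b':'));
    assert!(contains_one_char(small, ':'));
    assert!(contains_two_char(small, '"', ':'));
    assert!(!contains_two_char(small, '"', '<'));
    assert_eq!(find_either_byte(small.as_bytes(), b'"', b':'), Some(3));
    assert_eq!(find_either_byte(small.as_bytes(), b'"', b'<'), None);
}
```

For small inputs like the one in main, the numbers above suggest the
built-in forms are the reasonable default; the two-character case on
large inputs is where a dedicated search would be worth revisiting.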
2021-08-18 14:23:03 -04:00
benches tamer: memchr benches 2021-08-18 14:23:03 -04:00
build-aux Copyright year update 2021 2021-07-22 15:00:15 -04:00
src tamer: tameld: Skip fragment unescaping only to re-escape on write 2021-08-18 11:39:06 -04:00
tests Copyright year update 2021 2021-07-22 15:00:15 -04:00
.gitignore TAMER: Initial commit 2019-11-18 14:05:47 -05:00
Cargo.lock tamer: memchr benches 2021-08-18 14:23:03 -04:00
Cargo.toml tamer: memchr benches 2021-08-18 14:23:03 -04:00
Makefile.am tamer: Makefile.am (all): Binaries and doc 2021-07-23 22:23:10 -04:00
README.md Copyright year update 2021 2021-07-22 15:00:15 -04:00
autogen.sh Copyright year update 2021 2021-07-22 15:00:15 -04:00
bootstrap Copyright year update 2021 2021-07-22 15:00:15 -04:00
configure.ac tamer: configure.ac: Configure-time feature flags (via Cargo) 2021-07-23 10:16:44 -04:00
rustfmt.toml tamer/rustfmt (max_width): Set to 80 2019-11-27 09:15:15 -05:00

README.md

TAME in Rust (TAMER)

TAME was written to help tame the complexity of developing comparative insurance rating systems. This project aims to tame the complexity and performance issues of TAME itself. TAMER is therefore more tame than TAME.

TAME was originally written in XSLT. For more information about the project, see the parent README.md.

Building

To bootstrap from the source repository, run ./bootstrap.

To configure the build for your system, run ./configure. To build, run make. To run tests, run make check.

You may also invoke cargo directly, which make will do for you using options provided to configure.

Note that the default development build results in terrible runtime performance! See Build Flags below for instructions on how to generate a release binary.

Build Flags

The environment variable CARGO_BUILD_FLAGS can be used to provide additional arguments to cargo build when invoked via make. This can be provided optionally during configure and can be overridden when invoking make. For example:

# release build
$ ./configure && make CARGO_BUILD_FLAGS=--release
$ ./configure CARGO_BUILD_FLAGS=--release && make

# dev build
$ ./configure && make
$ ./configure CARGO_BUILD_FLAGS=--release && make CARGO_BUILD_FLAGS=

Hacking

This section contains advice for those developing TAMER.

Running Tests

Developers should be using test-driven development (TDD). make check will run all necessary tests.

Code Format

Rust provides rustfmt that can automatically format code for you. This project mandates its use and therefore eliminates personal preference in code style (for better or worse).

Formatting checks are run during make check and, on failure, will output the diff that would be applied if you ran make fmt (or make fix); this will run cargo fmt for you (and will use the binaries configured via configure).

Since developers should be doing test-driven development (TDD) and therefore should be running make check frequently, the hope is that frequent feedback on formatting issues will allow developers to quickly adjust their habits to avoid triggering formatting errors at all.

If you want to automatically fix formatting errors and then run tests:

$ make fmt check

Benchmarking

Benchmarks serve two purposes: external integration tests (which are subject to module visibility constraints) and actual benchmarking. To run benchmarks, invoke make bench.

Note that link-time optimizations (LTO) are performed on the binary for benchmarking so that its performance reflects release builds that will be used in production.

The configure script will automatically detect whether the test feature is unstable (as it was as of the time of writing) and, if so, will automatically fall back to invoking nightly (by running cargo +nightly bench).

If you do not have nightly, you can install it via rustup install nightly.