This adds benchmarking for the memchr crate. It is used primarily by
quick-xml at the moment, but the question is whether to rely on it for
certain operations for XIR.
The benchmarking on an Intel Xeon system shows that memchr and Rust's
contains() perform very similarly on small inputs, matching against a single
character, and so Rust's built-in should be preferred in that case so that
we're using APIs that are familiar to most people.
When larger inputs are compared against, there's a greater benefit (a little
under ~2x).
When comparing against two characters, they are again very close. But look
at when we compare two characters against _multiple_ inputs:
running 24 tests
test large_str:1️⃣:memchr_early_match ... bench: 4,938 ns/iter (+/- 124)
test large_str:1️⃣:memchr_late_match ... bench: 81,807 ns/iter (+/- 1,153)
test large_str:1️⃣:memchr_non_match ... bench: 82,074 ns/iter (+/- 1,062)
test large_str:1️⃣:rust_contains_one_byte_early_match ... bench: 9,425 ns/iter (+/- 167)
test large_str:1️⃣:rust_contains_one_byte_late_match ... bench: 123,685 ns/iter (+/- 3,728)
test large_str:1️⃣:rust_contains_one_byte_non_match ... bench: 123,117 ns/iter (+/- 2,200)
test large_str:1️⃣:rust_contains_one_char_early_match ... bench: 9,561 ns/iter (+/- 507)
test large_str:1️⃣:rust_contains_one_char_late_match ... bench: 123,929 ns/iter (+/- 2,377)
test large_str:1️⃣:rust_contains_one_char_non_match ... bench: 122,989 ns/iter (+/- 2,788)
test large_str:2️⃣:memchr2_early_match ... bench: 5,704 ns/iter (+/- 91)
test large_str:2️⃣:memchr2_late_match ... bench: 89,194 ns/iter (+/- 8,546)
test large_str:2️⃣:memchr2_non_match ... bench: 85,649 ns/iter (+/- 3,879)
test large_str:2️⃣:rust_contains_two_char_early_match ... bench: 66,785 ns/iter (+/- 3,385)
test large_str:2️⃣:rust_contains_two_char_late_match ... bench: 2,148,064 ns/iter (+/- 21,812)
test large_str:2️⃣:rust_contains_two_char_non_match ... bench: 2,322,082 ns/iter (+/- 22,947)
test small_str:1️⃣:memchr_mid_match ... bench: 4,737 ns/iter (+/- 842)
test small_str:1️⃣:memchr_non_match ... bench: 5,160 ns/iter (+/- 62)
test small_str:1️⃣:rust_contains_one_byte_non_match ... bench: 3,930 ns/iter (+/- 35)
test small_str:1️⃣:rust_contains_one_char_mid_match ... bench: 3,677 ns/iter (+/- 618)
test small_str:1️⃣:rust_contains_one_char_non_match ... bench: 5,415 ns/iter (+/- 221)
test small_str:2️⃣:memchr2_mid_match ... bench: 5,488 ns/iter (+/- 888)
test small_str:2️⃣:memchr2_non_match ... bench: 6,788 ns/iter (+/- 134)
test small_str:2️⃣:rust_contains_two_char_mid_match ... bench: 6,203 ns/iter (+/- 170)
test small_str:2️⃣:rust_contains_two_char_non_match ... bench: 7,853 ns/iter (+/- 713)
Yikes.
With that said, we won't be comparing against such large inputs
short-term. The larger strings (fragments) are copied verbatim, and not
compared against---but they _were_ prior to the previous commit that stopped
unencoding and re-encoding.
So: Rust built-ins for inputs that are expected to be small.
Oh boy. What a mess of a change.
This demonstrates some significant issues we have with Symbol. I had
originally modelled the system a bit after Rustc's, but deviated in certain
regards:
1. This has a confurable base type to enable better packing without bit
twiddling and potentially unsafe tricks I'd rather avoid unless
necessary; and
2. The lifetime is not static, and there is no global, singleton interner;
and
3. I pass around references to a Symbol rather than passing around an
index into an interner.
For #3---this is done because there's no singleton interner and therefore
resolving a symbol requires a direct reference to an available interner. It
also wasn't clear to me (and still isn't, in fact) whether more than one
interner may be used for different contexts.
But, that doesn't preclude removing lifetimes and just passing around
indexes; in fact, I plan to do this in the frontend where the parser and
such will have direct interner access and can therefore just look up based
on a symbol index. We could reserve references for situations where
exposing an interner would be undesirable.
Anyway, more to come...
This introduces the beginnings of frontends for TAMER, gated behind a
`wip-features` flag.
This will be introduced in stages:
1. Replace the existing copy with a parser-based copy (echo back out the
tokens), when the flag is on.
2. Begin to parse portions of the source, augmenting the output xmlo (xmli
at the moment). The XSLT-based compiler will be modified to skip
compilation steps as necessary.
As portions of the compilation are implemented in TAMER, they'll be placed
behind their own feature flags and stabalized, which will incrementally
remove the compilation steps from the XSLT-based system. The result should
be substantial incremental performance improvements.
Short-term, the priorities are for loading identifiers into an IR
are (though the order may change):
1. Echo
2. Imports
3. Extern declarations.
4. Simple identifiers (e.g. param, const, template, etc).
5. Classifications.
6. Documentation expressions.
7. Calculation expressions.
8. Template applications.
9. Template definitions.
10. Inline templates.
After each of those are done, the resulting xmlo (xmli) will have fully
reconstructed the source document from the IR produced during parsing.
We want to be able to build a representation of the dependency graph so
we can easily inspect it.
We do not want to make GraphML by default. It is better to use a tool.
We use "petgraph-graphml".
This variant is unnecessary, as it was used only by the indexer to represent
the absence of a node, for which was can simply use `None` in the containing
`Option`.
* tamer/Cargo.toml: Add `lazy_static`.
* tamer/Cargo.lock: Update.
* tamer/src/ir/asg/base.rs (with_capacity): Use `None` in place of
`Some(Object::Empty)`.
* tamer/src/ir/asg/object.rs: Adjust state machine graphic.
(Empty): Remove variant.
(Missing): Remove reference to variance.
* tamer/src/lib.rs: Import `lazy_static` for test builds.
* tamer/obj/xmle/writer/writer.rs (Section::iter): Remove `Object::Empty`
from documentation.
(test::): Remove references to `Object::Missing`. `lazy_static!` used
here.
* tamer/obj/xmle/writer/xmle.rs (test::write_section_catch_missing): Replace
reference to `Object::Missing`.
We want to add an option to set the output file to the linker so we do
not need to redirect output to awk any longer.
This also adds integration tests for tameld.
Contrary to what I said previously, this replaces the previous
implementation with an arena-backed internment system. The motivation for
this change was investigating how Rustc performed its string interning, and
why they chose to associate integer identifiers with symbols.
The intent was originally to use Rustc's arena allocator directly, but that
create pulled in far too many dependencies and depended on nightly
Rust. Bumpalo provides a very similar implementation to Rustc's
DroplessArena, so I went with that instead.
Rustc also relies on a global, singleton interner. I do not do that
here. Instead, the returned Symbol carries a lifetime of the underlying
arena, as well as a pointer to the interned string.
Now that this is put to rest, it's time to move on.
For strings of any notable length, Fx Hash outperforms FNV. Rustc also
moved to this hash function and noticed performance
improvements. Fortunately, as was accounted for in the design, this was a
trivial switch.
Here are some benchmarks to back up that claim:
test hash_set::fnv::with_all_new_1000 ... bench: 133,096 ns/iter (+/- 1,430)
test hash_set::fnv::with_all_new_1000_with_capacity ... bench: 82,591 ns/iter (+/- 592)
test hash_set::fnv::with_all_new_rc_str_1000_baseline ... bench: 162,073 ns/iter (+/- 1,277)
test hash_set::fnv::with_one_new_1000 ... bench: 37,334 ns/iter (+/- 256)
test hash_set::fnv::with_one_new_rc_str_1000_baseline ... bench: 18,263 ns/iter (+/- 261)
test hash_set::fx::with_all_new_1000 ... bench: 85,217 ns/iter (+/- 1,111)
test hash_set::fx::with_all_new_1000_with_capacity ... bench: 59,383 ns/iter (+/- 752)
test hash_set::fx::with_all_new_rc_str_1000_baseline ... bench: 98,802 ns/iter (+/- 1,117)
test hash_set::fx::with_one_new_1000 ... bench: 42,484 ns/iter (+/- 1,239)
test hash_set::fx::with_one_new_rc_str_1000_baseline ... bench: 15,000 ns/iter (+/- 233)
test hash_set::with_all_new_1000 ... bench: 137,645 ns/iter (+/- 1,186)
test hash_set::with_all_new_rc_str_1000_baseline ... bench: 163,129 ns/iter (+/- 1,725)
test hash_set::with_one_new_1000 ... bench: 59,051 ns/iter (+/- 1,202)
test hash_set::with_one_new_rc_str_1000_baseline ... bench: 37,986 ns/iter (+/- 771)
This is missing two key things that I'll add shortly: a HashMap-based one
for use in the ASG for node mapping, and an entry-based system for
manipulations.
This has been a nice start for exploring various aspects of Rust
development, as well as conventions that I'd like to implement. In
particular:
- Robust documentation intended to guide people through learning the
necessary material about the compiler, as well as related work to
rationalize design decisions;
- Benchmarks;
- TDD;
- And just getting used to Rust in general.
I've beat this one to death, so I'll commit this and make smaller changes
going forward to show how easily it can evolve.
(This module was originally named `intern` but this commit and those that
follow rewrote it to `sym`.)
This makes use of Petgraph for representing the dependency graph and uses a
separate data structure for both string interning and indexing by symbol
name.