2019-11-14 16:43:07 -05:00
[package]
name = "tamer"
version = "0.0.0"
authors = ["Mike Gerwitz <mike.gerwitz@ryansg.com>"]
2019-11-27 13:47:42 -05:00
description="TAME in Rust"
2019-11-14 16:43:07 -05:00
license="GPLv3+"
2021-10-01 09:42:19 -04:00
edition = "2021"
2019-11-14 16:43:07 -05:00

2019-11-27 09:17:27 -05:00
[profile.dev]
# Release-level optimizations. Spending the extra couple of moments at
# compile-time is well worth the huge savings we get at runtime. Note that
# this is still ever so slightly slower than a release build; see other
# profile options for release at
# <https://doc.rust-lang.org/cargo/reference/manifest.html>.
opt-level = 3
2019-11-14 16:43:07 -05:00

2019-12-03 12:17:44 -05:00
[profile.release]
lto = true

2019-12-04 09:57:08 -05:00
[profile.bench]
# We want our benchmarks to be representative of how well TAME will perform
# in a release.
lto = true

2019-11-14 16:43:07 -05:00
[dependencies]
2021-09-28 14:52:31 -04:00
arrayvec = ">= 0.7.1"
2019-12-23 23:26:42 -05:00
bumpalo = ">= 2.6.0"
2022-09-21 14:37:38 -04:00
exitcode = "1.1.2"
2023-04-26 09:49:50 -04:00
fixedbitset = ">= 0.4.1" # also used by petgraph
tamer::sym: FNV => Fx Hash
For strings of any notable length, Fx Hash outperforms FNV. Rustc also
moved to this hash function and noticed performance
improvements. Fortunately, as was accounted for in the design, this was a
trivial switch.
Here are some benchmarks to back up that claim:
test hash_set::fnv::with_all_new_1000 ... bench: 133,096 ns/iter (+/- 1,430)
test hash_set::fnv::with_all_new_1000_with_capacity ... bench: 82,591 ns/iter (+/- 592)
test hash_set::fnv::with_all_new_rc_str_1000_baseline ... bench: 162,073 ns/iter (+/- 1,277)
test hash_set::fnv::with_one_new_1000 ... bench: 37,334 ns/iter (+/- 256)
test hash_set::fnv::with_one_new_rc_str_1000_baseline ... bench: 18,263 ns/iter (+/- 261)
test hash_set::fx::with_all_new_1000 ... bench: 85,217 ns/iter (+/- 1,111)
test hash_set::fx::with_all_new_1000_with_capacity ... bench: 59,383 ns/iter (+/- 752)
test hash_set::fx::with_all_new_rc_str_1000_baseline ... bench: 98,802 ns/iter (+/- 1,117)
test hash_set::fx::with_one_new_1000 ... bench: 42,484 ns/iter (+/- 1,239)
test hash_set::fx::with_one_new_rc_str_1000_baseline ... bench: 15,000 ns/iter (+/- 233)
test hash_set::with_all_new_1000 ... bench: 137,645 ns/iter (+/- 1,186)
test hash_set::with_all_new_rc_str_1000_baseline ... bench: 163,129 ns/iter (+/- 1,725)
test hash_set::with_one_new_1000 ... bench: 59,051 ns/iter (+/- 1,202)
test hash_set::with_one_new_rc_str_1000_baseline ... bench: 37,986 ns/iter (+/- 771)
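One way a design can make such a switch trivial is to route every hash collection through a single type alias for the hasher. The following is a minimal sketch under that assumption, not the code from tamer::sym: `Fnv1a`, `SymHasher`, `SymSet`, and `intern_all` are hypothetical names, and the FNV-1a hasher is hand-rolled here only so that the example needs nothing beyond the standard library. With the fxhash crate, the alias would presumably point at its `FxBuildHasher` instead.

```rust
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hasher};

/// Hand-rolled 64-bit FNV-1a, standing in for a third-party hasher crate.
/// (Hypothetical; written out only to keep the sketch self-contained.)
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        // FNV-1a 64-bit offset basis.
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            // FNV 64-bit prime.
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// The only line that has to change when moving to another hash function.
pub type SymHasher = BuildHasherDefault<Fnv1a>;

/// Hypothetical symbol set; it names the hasher only through the alias.
pub type SymSet<'a> = HashSet<&'a str, SymHasher>;

/// Insert every name into a fresh set, deduplicating along the way.
pub fn intern_all<'a>(names: &[&'a str]) -> SymSet<'a> {
    let mut set: SymSet<'a> = HashSet::default();
    for &name in names {
        set.insert(name);
    }
    set
}
```

Callers such as `intern_all(&["a", "b", "a"])` never mention the hasher, so changing the `SymHasher` alias leaves them untouched.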
2019-12-10 15:32:25 -05:00
fxhash = ">= 0.2.1"
2020-03-04 15:31:20 -05:00
getopts = "0.2"
tamer: memchr benches
This adds benchmarking for the memchr crate. It is used primarily by
quick-xml at the moment, but the question is whether to rely on it for
certain operations for XIR.
The benchmarking on an Intel Xeon system shows that memchr and Rust's
contains() perform very similarly on small inputs, matching against a single
character, and so Rust's built-in should be preferred in that case so that
we're using APIs that are familiar to most people.
With larger inputs, memchr shows a greater benefit (a little
under 2x).
When comparing against two characters, they are again very close. But look
at when we compare two characters against _multiple_ inputs:
running 24 tests
test large_str::one::memchr_early_match ... bench: 4,938 ns/iter (+/- 124)
test large_str::one::memchr_late_match ... bench: 81,807 ns/iter (+/- 1,153)
test large_str::one::memchr_non_match ... bench: 82,074 ns/iter (+/- 1,062)
test large_str::one::rust_contains_one_byte_early_match ... bench: 9,425 ns/iter (+/- 167)
test large_str::one::rust_contains_one_byte_late_match ... bench: 123,685 ns/iter (+/- 3,728)
test large_str::one::rust_contains_one_byte_non_match ... bench: 123,117 ns/iter (+/- 2,200)
test large_str::one::rust_contains_one_char_early_match ... bench: 9,561 ns/iter (+/- 507)
test large_str::one::rust_contains_one_char_late_match ... bench: 123,929 ns/iter (+/- 2,377)
test large_str::one::rust_contains_one_char_non_match ... bench: 122,989 ns/iter (+/- 2,788)
test large_str::two::memchr2_early_match ... bench: 5,704 ns/iter (+/- 91)
test large_str::two::memchr2_late_match ... bench: 89,194 ns/iter (+/- 8,546)
test large_str::two::memchr2_non_match ... bench: 85,649 ns/iter (+/- 3,879)
test large_str::two::rust_contains_two_char_early_match ... bench: 66,785 ns/iter (+/- 3,385)
test large_str::two::rust_contains_two_char_late_match ... bench: 2,148,064 ns/iter (+/- 21,812)
test large_str::two::rust_contains_two_char_non_match ... bench: 2,322,082 ns/iter (+/- 22,947)
test small_str::one::memchr_mid_match ... bench: 4,737 ns/iter (+/- 842)
test small_str::one::memchr_non_match ... bench: 5,160 ns/iter (+/- 62)
test small_str::one::rust_contains_one_byte_non_match ... bench: 3,930 ns/iter (+/- 35)
test small_str::one::rust_contains_one_char_mid_match ... bench: 3,677 ns/iter (+/- 618)
test small_str::one::rust_contains_one_char_non_match ... bench: 5,415 ns/iter (+/- 221)
test small_str::two::memchr2_mid_match ... bench: 5,488 ns/iter (+/- 888)
test small_str::two::memchr2_non_match ... bench: 6,788 ns/iter (+/- 134)
test small_str::two::rust_contains_two_char_mid_match ... bench: 6,203 ns/iter (+/- 170)
test small_str::two::rust_contains_two_char_non_match ... bench: 7,853 ns/iter (+/- 713)
Yikes.
With that said, we won't be comparing against such large inputs
short-term. The larger strings (fragments) are copied verbatim, and not
compared against---but they _were_ prior to the previous commit that stopped
unencoding and re-encoding.
So: Rust built-ins for inputs that are expected to be small.
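For reference, the built-in searches recommended for small inputs might look like the sketch below. It uses only the standard library; the function names and the `<` and `&` needles are hypothetical, and the benchmarks may have expressed the two-character case differently. The last function is a plain byte scan with the same result shape as the memchr crate's `memchr2(needle1, needle2, haystack)`, which also returns an `Option<usize>`.

```rust
/// Single-character search on an input that is expected to be small.
pub fn has_lt(s: &str) -> bool {
    s.contains('<')
}

/// Two-character search, using a slice of chars as the pattern.
pub fn has_lt_or_amp(s: &str) -> bool {
    s.contains(&['<', '&'][..])
}

/// Byte-oriented search returning the index of the first match, if any.
/// This is a naive scan, not an optimized memchr implementation.
pub fn find_lt_or_amp(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == b'<' || b == b'&')
}
```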
2021-08-18 14:18:24 -04:00
memchr = ">= 2.3.4" # quick-xml expects =2.3.4 at the time
2021-09-20 16:46:16 -04:00
paste = ">= 1.0.5"
2022-09-21 14:37:38 -04:00
petgraph = "0.6.0"
quick-xml = ">= 0.23.0-alpha3"
static_assertions = ">= 1.1.0"
tamer: diagnostic: Column resolution
Determining the column number is not as simple as performing byte
arithmetic, because certain characters have different widths. Even if we
only accepted ASCII, control characters aren't visible to the user.
This uses the unicode-width crate as an alternative to POSIX wcwidth, to
determine (hopefully) the number of fixed-width cells that a unicode
character will take up on a terminal. For example, control characters are
zero-width, while an emoji is likely double-width. See test cases for more
information on that.
There is also the unicode-segmentation crate, which can handle extended
grapheme clusters and such, but (a) we'll be outputting the line to the
terminal and (b) there's no guarantee that the user's editor displays
grapheme clusters as a single column. LSP measures in UTF-16,
apparently. I use both Emacs and Vim from a terminal, so unicode-width
applies to me. There's too much variation to try to solve that right now.
The columns can be considered a visual span---this gives us enough
information to draw line annotations, which will happen soon.
Here are some useful links:
- https://hsivonen.fi/string-length/
- https://unicode.org/reports/tr29/
- https://github.com/rust-analyzer/rowan/issues/17
- https://www.reddit.com/r/rust/comments/gpw2ra/how_is_the_rust_compiler_able_to_tell_the_visible/
DEV-10935
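The difference between byte arithmetic and display columns can be sketched as follows. This is an illustration under loud assumptions, not TAMER's implementation: `approx_width` is a hypothetical, deliberately crude stand-in for the unicode-width tables (a handful of double-width ranges, zero for control characters, one for everything else), and `column_span` is a hypothetical helper taking byte offsets that must fall on character boundaries.

```rust
/// Crude approximation of a wcwidth-style lookup. NOT the unicode-width
/// tables; the ranges below are a small illustrative subset.
fn approx_width(c: char) -> usize {
    match c {
        // Control characters occupy no visible cell.
        _ if c.is_control() => 0,
        '\u{4E00}'..='\u{9FFF}'       // CJK unified ideographs
        | '\u{AC00}'..='\u{D7A3}'     // Hangul syllables
        | '\u{FF01}'..='\u{FF60}'     // fullwidth forms
        | '\u{1F300}'..='\u{1F64F}' => 2, // common emoji blocks
        _ => 1,
    }
}

/// Resolve the byte range `start..end` of `line` to a 1-indexed, inclusive
/// span of display columns. Panics if an offset splits a character.
pub fn column_span(line: &str, start: usize, end: usize) -> (usize, usize) {
    let before: usize = line[..start].chars().map(approx_width).sum();
    let within: usize = line[start..end].chars().map(approx_width).sum();
    // A zero-width range still points at a single column.
    (before + 1, before + within.max(1))
}
```

For the line `日本x`, the `x` sits at byte offset 6 but at display column 5 under this approximation, which is why byte offsets alone cannot place a line annotation.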
2022-04-21 14:16:21 -04:00
unicode-width = "0.1.5"
2020-03-04 15:31:20 -05:00
tamer: Initial frontend concept
This introduces the beginnings of frontends for TAMER, gated behind a
`wip-features` flag.
This will be introduced in stages:
1. Replace the existing copy with a parser-based copy (echo back out the
tokens), when the flag is on.
2. Begin to parse portions of the source, augmenting the output xmlo (xmli
at the moment). The XSLT-based compiler will be modified to skip
compilation steps as necessary.
As portions of the compilation are implemented in TAMER, they'll be placed
behind their own feature flags and stabilized, which will incrementally
remove the compilation steps from the XSLT-based system. The result should
be substantial incremental performance improvements.
Short-term, the priorities for loading identifiers into an IR
are (though the order may change):
1. Echo
2. Imports
3. Extern declarations.
4. Simple identifiers (e.g. param, const, template, etc).
5. Classifications.
6. Documentation expressions.
7. Calculation expressions.
8. Template applications.
9. Template definitions.
10. Inline templates.
After each of those is done, the resulting xmlo (xmli) will have fully
reconstructed the source document from the IR produced during parsing.
2021-07-23 22:24:08 -04:00
# Feature flags can be specified using `./configure FEATURES=foo,bar,baz`.
#
# Flags beginning with "wip-" are short-lived flags that exist only during
# development of a particular feature; you should not hard-code them
# anywhere, since the build will break once they are removed. Enabling WIP
# flags should also be expected to cause undesirable behavior in some form
# or another. Once WIP features are finalized, they are enabled by default
# and the flag removed.
[features]
2021-09-28 14:52:31 -04:00

2022-07-21 22:05:21 -04:00
# Cause `Parser` to emit a verbose, human-readable trace to stderr for every
# token. This is not intended to be machine-readable, so please do not
# parse it.
#
# This is enabled automatically for the `test` profile.
parser-trace-stderr = []