employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Mike Gerwitz	6864fbc1cd	tamer: Start of XIR-based xmle writer This has been a long time coming, and has been repeatedly stashed as other parts of the system have evolved to support it. The introduction of the XIR tree was to write tests for this (which are sloppy atm). This currently writes out the `xmle` header and _most_ of the `l:dep` section; it's missing the object-type-specific attributes. There is, relatively speaking, not much more work to do here. The feature flag `wip-xir-xmle-writer` was introduced to toggle this system in place of `XmleWriter`. Initial benchmarks show that it will be competitive with the quick-xml-based writer, but remember that is not the goal: the purpose of this is to test XIR in a production system before we continue to implement it for a frontend, and to refactor so that we do not have multiple implementations writing XML files (once we echo the source XML files). I'm excited to get this done with so that I can move on. This has been rather exhausting.	2021-09-28 14:52:53 -04:00
Mike Gerwitz	3bb6f0cf35	tamer: ir::asg::ident: AsRef impls for SymbolId types This commit will make more sense once the broader context is committed, but it's needed for lowering from `Sections` into a XIR stream. This will also change once we pre-allocate symbols, like rustc, when the interner is initialized. This is my first use of the `paste` crate, which is used to generate identifiers. So this is partly an experiment, and it seems much better than having to write a proc macro, at least at this point in time. If this code stays around, it'll probably be generalized further and used elsewhere, but I'd prefer not to go this route long-term.	2021-09-20 16:50:40 -04:00
Mike Gerwitz	fc235b7ecc	tamer: memchr benches This adds benchmarking for the memchr crate. It is used primarily by quick-xml at the moment, but the question is whether to rely on it for certain operations for XIR. The benchmarking on an Intel Xeon system shows that memchr and Rust's contains() perform very similarly on small inputs, matching against a single character, and so Rust's built-in should be preferred in that case so that we're using APIs that are familiar to most people. When larger inputs are compared against, there's a greater benefit (a little under ~2x). When comparing against two characters, they are again very close. But look at when we compare two characters against _multiple_ inputs: running 24 tests test large_str::one::memchr_early_match ... bench: 4,938 ns/iter (+/- 124) test large_str::one::memchr_late_match ... bench: 81,807 ns/iter (+/- 1,153) test large_str::one::memchr_non_match ... bench: 82,074 ns/iter (+/- 1,062) test large_str::one::rust_contains_one_byte_early_match ... bench: 9,425 ns/iter (+/- 167) test large_str::one::rust_contains_one_byte_late_match ... bench: 123,685 ns/iter (+/- 3,728) test large_str::one::rust_contains_one_byte_non_match ... bench: 123,117 ns/iter (+/- 2,200) test large_str::one::rust_contains_one_char_early_match ... bench: 9,561 ns/iter (+/- 507) test large_str::one::rust_contains_one_char_late_match ... bench: 123,929 ns/iter (+/- 2,377) test large_str::one::rust_contains_one_char_non_match ... bench: 122,989 ns/iter (+/- 2,788) test large_str::two::memchr2_early_match ... bench: 5,704 ns/iter (+/- 91) test large_str::two::memchr2_late_match ... bench: 89,194 ns/iter (+/- 8,546) test large_str::two::memchr2_non_match ... bench: 85,649 ns/iter (+/- 3,879) test large_str::two::rust_contains_two_char_early_match ... bench: 66,785 ns/iter (+/- 3,385) test large_str::two::rust_contains_two_char_late_match ... bench: 2,148,064 ns/iter (+/- 21,812) test large_str::two::rust_contains_two_char_non_match ... bench: 2,322,082 ns/iter (+/- 22,947) test small_str::one::memchr_mid_match ... bench: 4,737 ns/iter (+/- 842) test small_str::one::memchr_non_match ... bench: 5,160 ns/iter (+/- 62) test small_str::one::rust_contains_one_byte_non_match ... bench: 3,930 ns/iter (+/- 35) test small_str::one::rust_contains_one_char_mid_match ... bench: 3,677 ns/iter (+/- 618) test small_str::one::rust_contains_one_char_non_match ... bench: 5,415 ns/iter (+/- 221) test small_str::two::memchr2_mid_match ... bench: 5,488 ns/iter (+/- 888) test small_str::two::memchr2_non_match ... bench: 6,788 ns/iter (+/- 134) test small_str::two::rust_contains_two_char_mid_match ... bench: 6,203 ns/iter (+/- 170) test small_str::two::rust_contains_two_char_non_match ... bench: 7,853 ns/iter (+/- 713) Yikes. With that said, we won't be comparing against such large inputs short-term. The larger strings (fragments) are copied verbatim, and not compared against---but they _were_ prior to the previous commit that stopped unencoding and re-encoding. So: Rust built-ins for inputs that are expected to be small.	2021-08-18 14:23:03 -04:00
Mike Gerwitz	0fc8a1a4df	tamer: Remove default SymbolIndex (et al) index type Oh boy. What a mess of a change. This demonstrates some significant issues we have with Symbol. I had originally modelled the system a bit after Rustc's, but deviated in certain regards: 1. This has a configurable base type to enable better packing without bit twiddling and potentially unsafe tricks I'd rather avoid unless necessary; and 2. The lifetime is not static, and there is no global, singleton interner; and 3. I pass around references to a Symbol rather than passing around an index into an interner. For #3---this is done because there's no singleton interner and therefore resolving a symbol requires a direct reference to an available interner. It also wasn't clear to me (and still isn't, in fact) whether more than one interner may be used for different contexts. But, that doesn't preclude removing lifetimes and just passing around indexes; in fact, I plan to do this in the frontend where the parser and such will have direct interner access and can therefore just look up based on a symbol index. We could reserve references for situations where exposing an interner would be undesirable. Anyway, more to come...	2021-07-29 14:26:40 -04:00
Mike Gerwitz	96ea0302cc	tamer: Cargo.lock: Dependency updates This project has been on pause for over a year.	2021-06-21 12:46:38 -04:00
Joseph Frazer	43d00a8268	[DEV-7504] Add GraphML generation We want to be able to build a representation of the dependency graph so we can easily inspect it. We do not want to make GraphML by default. It is better to use a tool. We use "petgraph-graphml".	2020-05-13 08:04:48 -04:00
Mike Gerwitz	4b643385c8	TAMER: Update Cargo dependencies	2020-04-29 11:33:38 -04:00
Mike Gerwitz	400d5b25a1	ir::asg::Object::Empty: Remove variant This variant is unnecessary, as it was used only by the indexer to represent the absence of a node, for which we can simply use `None` in the containing `Option`. * tamer/Cargo.toml: Add `lazy_static`. * tamer/Cargo.lock: Update. * tamer/src/ir/asg/base.rs (with_capacity): Use `None` in place of `Some(Object::Empty)`. * tamer/src/ir/asg/object.rs: Adjust state machine graphic. (Empty): Remove variant. (Missing): Remove reference to variance. * tamer/src/lib.rs: Import `lazy_static` for test builds. * tamer/obj/xmle/writer/writer.rs (Section::iter): Remove `Object::Empty` from documentation. (test::): Remove references to `Object::Missing`. `lazy_static!` used here. * tamer/obj/xmle/writer/xmle.rs (test::write_section_catch_missing): Replace reference to `Object::Missing`.	2020-03-19 15:42:06 -04:00
Joseph Frazer	e613bd8a8c	[DEV-7081] Add options to tameld We want to add an option to set the output file to the linker so we do not need to redirect output to awk any longer. This also adds integration tests for tameld.	2020-03-06 09:41:55 -05:00
Mike Gerwitz	1f4db84f24	TAMER: Arena-based string interner Contrary to what I said previously, this replaces the previous implementation with an arena-backed internment system. The motivation for this change was investigating how Rustc performed its string interning, and why they chose to associate integer identifiers with symbols. The intent was originally to use Rustc's arena allocator directly, but that crate pulled in far too many dependencies and depended on nightly Rust. Bumpalo provides a very similar implementation to Rustc's DroplessArena, so I went with that instead. Rustc also relies on a global, singleton interner. I do not do that here. Instead, the returned Symbol carries a lifetime of the underlying arena, as well as a pointer to the interned string. Now that this is put to rest, it's time to move on.	2020-02-24 14:56:28 -05:00
Mike Gerwitz	176d099fb6	tamer::sym: FNV => Fx Hash For strings of any notable length, Fx Hash outperforms FNV. Rustc also moved to this hash function and noticed performance improvements. Fortunately, as was accounted for in the design, this was a trivial switch. Here are some benchmarks to back up that claim: test hash_set::fnv::with_all_new_1000 ... bench: 133,096 ns/iter (+/- 1,430) test hash_set::fnv::with_all_new_1000_with_capacity ... bench: 82,591 ns/iter (+/- 592) test hash_set::fnv::with_all_new_rc_str_1000_baseline ... bench: 162,073 ns/iter (+/- 1,277) test hash_set::fnv::with_one_new_1000 ... bench: 37,334 ns/iter (+/- 256) test hash_set::fnv::with_one_new_rc_str_1000_baseline ... bench: 18,263 ns/iter (+/- 261) test hash_set::fx::with_all_new_1000 ... bench: 85,217 ns/iter (+/- 1,111) test hash_set::fx::with_all_new_1000_with_capacity ... bench: 59,383 ns/iter (+/- 752) test hash_set::fx::with_all_new_rc_str_1000_baseline ... bench: 98,802 ns/iter (+/- 1,117) test hash_set::fx::with_one_new_1000 ... bench: 42,484 ns/iter (+/- 1,239) test hash_set::fx::with_one_new_rc_str_1000_baseline ... bench: 15,000 ns/iter (+/- 233) test hash_set::with_all_new_1000 ... bench: 137,645 ns/iter (+/- 1,186) test hash_set::with_all_new_rc_str_1000_baseline ... bench: 163,129 ns/iter (+/- 1,725) test hash_set::with_one_new_1000 ... bench: 59,051 ns/iter (+/- 1,202) test hash_set::with_one_new_rc_str_1000_baseline ... bench: 37,986 ns/iter (+/- 771)	2020-02-24 14:56:28 -05:00
Mike Gerwitz	e4e0089815	TAMER: Initial string interning abstraction This is missing two key things that I'll add shortly: a HashMap-based one for use in the ASG for node mapping, and an entry-based system for manipulations. This has been a nice start for exploring various aspects of Rust development, as well as conventions that I'd like to implement. In particular: - Robust documentation intended to guide people through learning the necessary material about the compiler, as well as related work to rationalize design decisions; - Benchmarks; - TDD; - And just getting used to Rust in general. I've beat this one to death, so I'll commit this and make smaller changes going forward to show how easily it can evolve. (This module was originally named `intern` but this commit and those that follow rewrote it to `sym`.)	2020-02-24 14:56:28 -05:00
Mike Gerwitz	8455a38a1d	Graph-based POC This makes use of Petgraph for representing the dependency graph and uses a separate data structure for both string interning and indexing by symbol name.	2019-12-02 10:05:48 -05:00
Mike Gerwitz	d78d81d721	Cargo.toml: Add petgraph This will be used to represent the dependency graph.	2019-12-02 10:00:53 -05:00
Mike Gerwitz	01e3c33b58	tamer/Cargo.toml: Add quick_xml	2019-11-27 09:16:00 -05:00
Mike Gerwitz	fd1a5837ba	TAMER: Initial commit	2019-11-18 14:05:47 -05:00
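
Commits 1f4db84f24 and 0fc8a1a4df discuss interning strings and the option of passing around an index into an interner instead of a reference to a symbol. The following is a minimal sketch of the index-based idea only, assuming nothing about TAMER's actual code: the names `SymbolId`, `Interner`, `intern`, and `lookup`, and the `u32` index width, are all hypothetical. It uses only the standard library and stores each string twice for simplicity, which an arena-backed interner would avoid.

```rust
use std::collections::HashMap;

/// Hypothetical index type; the `u32` width is an assumption of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SymbolId(u32);

/// Minimal non-global interner that hands out indexes rather than references.
/// Each string is stored twice (map key and table entry) to keep the sketch
/// short; this is not how an arena-backed interner would store them.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, SymbolId>,
    strings: Vec<String>,
}

impl Interner {
    pub fn new() -> Self {
        Self::default()
    }

    /// Return the existing index for `s`, or allocate the next one.
    pub fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = SymbolId(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }

    /// Resolve an index back to its string. This needs access to the
    /// interner that produced the index, since there is no global singleton.
    pub fn lookup(&self, id: SymbolId) -> Option<&str> {
        self.strings.get(id.0 as usize).map(String::as_str)
    }
}
```

Because `SymbolId` is `Copy` and carries no lifetime, it can be stored freely; the trade-off is that every lookup needs the interner at hand, which is the constraint the 0fc8a1a4df message describes.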

16 Commits (c57aa7fb537ef42d78f795bfe86905fd156aaa1f)
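
Commit 176d099fb6 notes that switching from FNV to Fx Hash was trivial because the design accounted for it. One way such a switch can stay trivial in Rust is to make the hasher a type parameter behind a single alias. The sketch below illustrates that under stated assumptions: `Fnv1a`, `SymHasher`, `SymSet`, and `collect_unique` are hypothetical names, and the hasher is a hand-written 64-bit FNV-1a used only so the example needs no external crates. It is not the `fnv` or Fx implementation that the commit benchmarks.

```rust
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a, written out here so the sketch has no dependencies.
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        // Standard FNV-1a 64-bit offset basis.
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            // Standard FNV 64-bit prime.
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical alias: the only line to change when swapping hash functions.
pub type SymHasher = BuildHasherDefault<Fnv1a>;

/// Hypothetical set type parameterized by the alias above.
pub type SymSet<'a> = HashSet<&'a str, SymHasher>;

/// Collect the distinct strings from `inputs` using the configured hasher.
pub fn collect_unique<'a>(inputs: &[&'a str]) -> SymSet<'a> {
    let mut set = SymSet::default();
    for s in inputs {
        set.insert(*s);
    }
    set
}
```

With this layout, replacing `Fnv1a` in the `SymHasher` alias with another type that implements `Hasher + Default` changes the hash function everywhere the alias is used, without touching callers.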