384 Commits (2954c591a1349cbca2c5811d15b363caeef03bd6)

Author SHA1 Message Date
Joseph Frazer 06bc89a9ce [DEV-7134] Pass read event errors up the stack 2020-03-06 14:08:55 -05:00
Joseph Frazer 246a40a047 [DEV-7134] Return error for XmloEvent::SymDecl
We want more than warnings when an XmloEvent::SymDecl symbol has an
unknown "kind".
2020-03-06 13:41:32 -05:00
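A minimal sketch of turning an unknown "kind" into an error instead of a warning. The `SymKind` variants, the `UnknownKind` type, and the accepted strings are assumptions made for illustration; the real set of kinds is not shown in this log.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical subset of symbol kinds (illustrative only).
#[derive(Debug, PartialEq)]
pub enum SymKind {
    Class,
    Rate,
    Func,
}

/// Hypothetical error carrying the unrecognized kind string.
#[derive(Debug, PartialEq)]
pub struct UnknownKind(pub String);

impl fmt::Display for UnknownKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown symbol kind `{}`", self.0)
    }
}

impl std::error::Error for UnknownKind {}

impl FromStr for SymKind {
    type Err = UnknownKind;

    fn from_str(kind: &str) -> Result<Self, Self::Err> {
        match kind {
            "class" => Ok(SymKind::Class),
            "rate" => Ok(SymKind::Rate),
            "func" => Ok(SymKind::Func),
            // Return an error rather than logging a warning and continuing.
            other => Err(UnknownKind(other.to_string())),
        }
    }
}
```

A caller can then write `kind_str.parse::<SymKind>()?` and let the error travel up to wherever it is reported.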
Joseph Frazer 2228a6158a [DEV-7134] Add alias for LoadResult
It looks better and was recommended by Rust's linter.
2020-03-06 12:44:22 -05:00
Joseph Frazer 4810e7a099 [DEV-7134] Remove unwrap so we can bubble up error messages 2020-03-06 12:32:42 -05:00
Joseph Frazer 590245e191 [DEV-7134] Escalate the error from finding the absolute path
We do not want to have a panic here. The error should be displayed
properly.
2020-03-06 12:24:45 -05:00
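The general pattern behind these error-handling commits can be sketched as follows. This is an illustration under assumptions, not the project's code: the alias body and the `absolute_path` helper are hypothetical, and only `std::fs::canonicalize` is a real standard library call.

```rust
use std::error::Error;
use std::fs;
use std::path::PathBuf;

/// Hypothetical alias; the real `LoadResult` definition is not shown here.
pub type LoadResult<T> = Result<T, Box<dyn Error>>;

/// Hypothetical helper that resolves a path without panicking.
pub fn absolute_path(rel: &str) -> LoadResult<PathBuf> {
    // Previously this kind of call might have ended in `.unwrap()`, which
    // panics on failure. `?` instead converts the `io::Error` into
    // `Box<dyn Error>` and returns it to the caller.
    let abs = fs::canonicalize(rel)?;
    Ok(abs)
}
```

The caller decides how to display the failure, so a missing file produces a readable message instead of a panic backtrace.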
Mike Gerwitz bfea768f89 Copyright year 2020 update 2020-03-06 11:05:18 -05:00
Joseph Frazer e613bd8a8c [DEV-7081] Add options to tameld
We want to add an option to set the output file to the linker so we do
not need to redirect output to awk any longer.

This also adds integration tests for tameld.
2020-03-06 09:41:55 -05:00
Joseph Frazer 6ac7641087 [DEV-7083] TAMER: xmle writer
This introduces the writer for xmle files.
2020-03-03 11:21:18 -05:00
Mike Gerwitz c2e6efc0b5 TAMER: Additional crate::ld documentation 2020-03-02 15:54:36 -05:00
Mike Gerwitz b89408e5bb TAMER: Extract quick_xml event-related mocks 2020-02-26 10:49:01 -05:00
Mike Gerwitz 19a6d67dc4 TAMER: Separate static xmle section 2020-02-26 10:49:01 -05:00
Mike Gerwitz 7c60b53de8 TAMER: Virtual symbol override 2020-02-26 10:49:01 -05:00
Mike Gerwitz ab3aec980d TAMER: POC: Use FxHash to remove nondeterminism
The default hasher (SipHash) is randomly seeded, which causes iteration
order to change between runs.
2020-02-26 10:49:00 -05:00
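A sketch of the underlying idea using only the standard library. It does not use FxHash itself; it swaps the randomly seeded `RandomState` for `BuildHasherDefault`, which constructs every hasher through `Default` and therefore without per-map random keys. The `DeterministicMap` alias and `iteration_order` function are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// Map whose hashers are all created via `Default`, so every map hashes
/// a given key identically (hypothetical alias).
pub type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Inserts the names in order and reports the map's iteration order.
pub fn iteration_order(names: &[&str]) -> Vec<String> {
    let mut map: DeterministicMap<String, usize> = DeterministicMap::default();
    for (i, name) in names.iter().enumerate() {
        map.insert(name.to_string(), i);
    }
    // With identical hashing and identical insertions, two maps yield their
    // keys in the same order. With `RandomState`, the order may differ.
    map.keys().cloned().collect()
}
```

Any hasher type that implements `Default` can be plugged in the same way, which is presumably why changing hash functions requires little more than changing a type parameter.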
Mike Gerwitz 645908e258 TAMER: xmle output changes to support Summary Page
Co-Authored-By: Joseph Frazer <joseph.frazer@ryansg.com>
2020-02-26 10:49:00 -05:00
Mike Gerwitz 6939753ca0 TAMER: POC: Output xmle
This is a working proof-of-concept that will be finalized in future commits.
2020-02-26 10:49:00 -05:00
Mike Gerwitz 85a4934db5 TAMER: Symbol source data and metadata 2020-02-26 10:49:00 -05:00
Mike Gerwitz bcc2ab1221 TAMER: Initial abstract semantic graph (ASG)
This begins to introduce the ASG, backed by Petgraph.  The API will continue
to evolve, and Petgraph will likely be encapsulated so that our
implementation can vary independently from it (or even remove it in the
future).
2020-02-26 10:48:59 -05:00
Mike Gerwitz 10b9caa7ad TAMER: Fail on empty fragment ids (and fix underlying problem) 2020-02-25 16:46:28 -05:00
Mike Gerwitz a0893da577 TAMER: xmlo: Add Package event 2020-02-25 16:46:27 -05:00
Mike Gerwitz a8726918f7 TAMER: poc: Use xmlo reader
TODO: More information
2020-02-25 16:46:27 -05:00
Mike Gerwitz a929c8cae4 TAMER: xmlo reader
This introduces the reader for xmlo files produced by the XSLT-based
compiler.  It is an initial implementation but is not complete; see future
commits.
2020-02-25 16:46:25 -05:00
Mike Gerwitz 6aae741162 TAMER (sym::Interner::intern_utf8_unchecked): New function
This removes boilerplate for reading xmlo files.  See next commit.
2020-02-25 16:10:55 -05:00
Mike Gerwitz e8cd378d59 TAMER: Display for Symbol
One of the benefits of storing a reference to the interned string on the
symbol itself is that we can retrieve its underlying value essentially for
free.
2020-02-24 14:56:28 -05:00
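A minimal sketch of why the stored reference makes `Display` cheap. The field names and layout of this `Symbol` are assumptions; the real type is not shown in this log.

```rust
use std::fmt;

/// Hypothetical symbol: an integer id plus a reference to the interned string.
#[derive(Debug, Clone, Copy)]
pub struct Symbol<'i> {
    pub index: u32,
    pub name: &'i str,
}

impl<'i> fmt::Display for Symbol<'i> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // No interner lookup is needed: the string is reachable from the
        // symbol itself.
        f.write_str(self.name)
    }
}
```

If the symbol held only an integer id, formatting it would need access to the interner that issued it.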
Mike Gerwitz ff0c8bb34f Order symtable, sym-dep, fragments
This ordering will simplify streaming processing of xmlo files in
TAMER.  Specifically, we know that symbols will have been declared by the
time dependencies are added to the graph (and so we should only be creating
edges to existing nodes); and we can halt reading as soon as the closing
fragments tag is encountered, avoiding parsing the entirety of these massive
XML files.

On one particularly large program, this cuts time down from ~0.333s to
~0.300s in the POC linker.
2020-02-24 14:56:28 -05:00
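The early halt that this ordering permits can be sketched as follows. The `Event` enum and the `load` function are hypothetical stand-ins for a streaming reader; they are not the real xmlo event types.

```rust
/// Hypothetical event stream in the order symtable, sym-dep, fragments.
#[derive(Debug, PartialEq)]
pub enum Event {
    SymDecl(&'static str),
    SymDep(&'static str, &'static str),
    Fragment(&'static str),
    FragmentsEnd,
    Trailing(&'static str),
}

/// Consumes events until the closing fragments marker, leaving the rest of
/// the stream unread.
pub fn load<I: Iterator<Item = Event>>(events: &mut I) -> Vec<Event> {
    let mut processed = Vec::new();
    for ev in events {
        if ev == Event::FragmentsEnd {
            // Everything the linker needs has been seen; stop pulling events
            // so the remainder of the file is never parsed.
            break;
        }
        processed.push(ev);
    }
    processed
}
```

Because the iterator is borrowed mutably, the caller can observe that the trailing events were never consumed.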
Mike Gerwitz 1f4db84f24 TAMER: Arena-based string interner
Contrary to what I said previously, this replaces the previous
implementation with an arena-backed internment system.  The motivation for
this change was investigating how Rustc performed its string interning, and
why they chose to associate integer identifiers with symbols.

The intent was originally to use Rustc's arena allocator directly, but that
crate pulled in far too many dependencies and depended on nightly
Rust.  Bumpalo provides a very similar implementation to Rustc's
DroplessArena, so I went with that instead.

Rustc also relies on a global, singleton interner.  I do not do that
here.  Instead, the returned Symbol carries a lifetime of the underlying
arena, as well as a pointer to the interned string.

Now that this is put to rest, it's time to move on.
2020-02-24 14:56:28 -05:00
Mike Gerwitz 176d099fb6 tamer::sym: FNV => Fx Hash
For strings of any notable length, Fx Hash outperforms FNV.  Rustc also
moved to this hash function and noticed performance
improvements.  Fortunately, as was accounted for in the design, this was a
trivial switch.

Here are some benchmarks to back up that claim:

test hash_set::fnv::with_all_new_1000                 ... bench:     133,096 ns/iter (+/- 1,430)
test hash_set::fnv::with_all_new_1000_with_capacity   ... bench:      82,591 ns/iter (+/- 592)
test hash_set::fnv::with_all_new_rc_str_1000_baseline ... bench:     162,073 ns/iter (+/- 1,277)
test hash_set::fnv::with_one_new_1000                 ... bench:      37,334 ns/iter (+/- 256)
test hash_set::fnv::with_one_new_rc_str_1000_baseline ... bench:      18,263 ns/iter (+/- 261)
test hash_set::fx::with_all_new_1000                  ... bench:      85,217 ns/iter (+/- 1,111)
test hash_set::fx::with_all_new_1000_with_capacity    ... bench:      59,383 ns/iter (+/- 752)
test hash_set::fx::with_all_new_rc_str_1000_baseline  ... bench:      98,802 ns/iter (+/- 1,117)
test hash_set::fx::with_one_new_1000                  ... bench:      42,484 ns/iter (+/- 1,239)
test hash_set::fx::with_one_new_rc_str_1000_baseline  ... bench:      15,000 ns/iter (+/- 233)
test hash_set::with_all_new_1000                      ... bench:     137,645 ns/iter (+/- 1,186)
test hash_set::with_all_new_rc_str_1000_baseline      ... bench:     163,129 ns/iter (+/- 1,725)
test hash_set::with_one_new_1000                      ... bench:      59,051 ns/iter (+/- 1,202)
test hash_set::with_one_new_rc_str_1000_baseline      ... bench:      37,986 ns/iter (+/- 771)
2020-02-24 14:56:28 -05:00
Mike Gerwitz 541fbffc2e tameld: Move documentation to tamer::ld 2020-02-24 14:56:28 -05:00
Mike Gerwitz f2b24e6505 HashMapInterner: New interner, docs, and benchmarks
This interner will be suitable for providing an index to look up nodes in
the ASG.
2020-02-24 14:56:28 -05:00
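A simplified sketch of a map-backed interner that hands out indexes. It is not the `HashMapInterner` from this commit: the `Interner` and `SymbolId` names are hypothetical, and for brevity each string is stored twice.

```rust
use std::collections::HashMap;

/// Hypothetical integer identifier for an interned string.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SymbolId(pub usize);

/// Hypothetical interner mapping strings to ids and back.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, SymbolId>,
    names: Vec<String>,
}

impl Interner {
    /// Returns the existing id for `name`, or assigns the next free one.
    pub fn intern(&mut self, name: &str) -> SymbolId {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = SymbolId(self.names.len());
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    /// Looks up the string for an id issued by this interner.
    pub fn resolve(&self, id: SymbolId) -> Option<&str> {
        self.names.get(id.0).map(String::as_str)
    }
}
```

An id obtained this way can serve as a key into a separate table of graph nodes, so looking up a node by symbol name never compares strings more than once.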
Mike Gerwitz 9a98644213 TAMER: sym::tests: Generate with macro
This will be used for generating the common tests between HashSet and
HashMap implementations.

This is my first macro in Rust.  There does not seem to be a way to
concatenate identifiers (!), so I'm placing them within modules
instead.  That ended up working out just fine, since then I can use a type
to provide the SUT.
2020-02-24 14:56:28 -05:00
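The module-per-implementation approach described above can be sketched like this. The macro name, the module names, and the use of the standard `HashSet` and `BTreeSet` as subjects under test are all assumptions for illustration. In a real test suite the generated function would carry `#[test]` and assert directly; here it returns a `bool` so it can be called as an ordinary function.

```rust
/// Generates one module per implementation; each module binds the system
/// under test (SUT) to a type alias and shares the same check.
macro_rules! set_tests {
    ($($modname:ident: $sut:ty),* $(,)?) => {
        $(
            pub mod $modname {
                // The type supplied by the caller provides the SUT.
                type Sut = $sut;

                pub fn duplicate_is_stored_once() -> bool {
                    let mut sut = Sut::default();
                    let first = sut.insert(String::from("foo"));
                    let second = sut.insert(String::from("foo"));
                    first && !second && sut.len() == 1
                }
            }
        )*
    };
}

// The module name keeps the generated function names from colliding, so no
// identifier concatenation is needed.
set_tests! {
    hash_set: std::collections::HashSet<String>,
    btree_set: std::collections::BTreeSet<String>,
}
```

Both `hash_set::duplicate_is_stored_once()` and `btree_set::duplicate_is_stored_once()` then exist with identical bodies but different underlying types.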
Mike Gerwitz e4e0089815 TAMER: Initial string interning abstraction
This is missing two key things that I'll add shortly: a HashMap-based one
for use in the ASG for node mapping, and an entry-based system for
manipulations.

This has been a nice start for exploring various aspects of Rust
development, as well as conventions that I'd like to implement.  In
particular:

  - Robust documentation intended to guide people through learning the
    necessary material about the compiler, as well as related work to
    rationalize design decisions;
  - Benchmarks;
  - TDD;
  - And just getting used to Rust in general.

I've beaten this one to death, so I'll commit this and make smaller changes
going forward to show how easily it can evolve.

(This module was originally named `intern` but this commit and those that
follow rewrote it to `sym`.)
2020-02-24 14:56:28 -05:00
Mike Gerwitz 8455a38a1d Graph-based POC
This makes use of Petgraph for representing the dependency graph and uses a
separate data structure for both string interning and indexing by symbol
name.
2019-12-02 10:05:48 -05:00
Mike Gerwitz 8374541965 tamer: Initial basic POC with no XML output
This is garbage code.  Do not use it.  It is intentionally throwaway.

While I've researched Rust, I haven't actually _used_ it for a project, so
this is a combination of me exploring various ways of accomplishing the
problem and forcing myself to learn certain aspects of the language.

I'll likely be using petgraph, and this also currently lacks symbol
abstractions.  This commit also performs far too much heap allocation
copying strings around.  But it _does_ perform the topological sort.

Since this only stores the symbol name, it lacks enough information about
the symbol to perform a proper linking.
2019-12-02 10:00:53 -05:00
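A topological sort over symbol names can be sketched with a depth-first post-order traversal. This is not the code from this commit: `link_order` and `visit` are hypothetical names, symbols are plain string slices, and cycles are not detected.

```rust
use std::collections::{BTreeMap, HashSet};

/// Visits the dependencies of `sym` first, then emits `sym` itself.
fn visit<'a>(
    sym: &'a str,
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
    seen: &mut HashSet<&'a str>,
    out: &mut Vec<&'a str>,
) {
    // Skip symbols that have already been visited.
    if !seen.insert(sym) {
        return;
    }
    if let Some(children) = deps.get(sym) {
        for &dep in children {
            visit(dep, deps, seen, out);
        }
    }
    out.push(sym);
}

/// Returns symbols ordered so that each appears after its dependencies.
pub fn link_order<'a>(
    roots: &[&'a str],
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for &root in roots {
        visit(root, deps, &mut seen, &mut out);
    }
    out
}
```

Borrowing string slices avoids copying symbol names while sorting. A production version would also need to report cyclic dependencies rather than silently ordering them.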
Mike Gerwitz 7412a8934c tameld: Placeholder binary 2019-11-20 10:11:00 -05:00
Mike Gerwitz fd1a5837ba TAMER: Initial commit 2019-11-18 14:05:47 -05:00