Commit Graph

758 Commits (f183600c3ae4620bde0105808971e0030f40685d)

Author SHA1 Message Date
Mike Gerwitz e562d7fcc8 tamer: sym: Begin SymbolIndex base data generalization
This was previously a NonZeroU32, but it was intended to support NonZeroU16
as well for packages, so that we can fit symbols into smaller spaces.  In
particular, the upcoming Span wants to fit within 8 bytes, and so requires a
smaller SymbolIndex type.

I'm unhappy with this current implementation, and so comments are unfinished
and there are a couple ignores for dead code warnings.  I want to flip the
`SupportedSymbolIndex` trait so that users can specify the primitive rather
than the NonZero* type, which is really awkward-looking and verbose,
especially if you have to do `SymbolIndex::<NonZeroU32>::from_int` or
something.  It also prevents (at least in the cases I've observed) Rust from
inferring the proper type for you based on the argument you provide.

So, the goal will be `SymbolIndex::<u32>::from_int(n)`, for example.
2021-07-28 15:21:15 -04:00
Mike Gerwitz ca6ef3ed36 tamer: frontend: Begin basic XML parsing
The first step in the process is to emit the raw XML events that can then be
immediately output again to echo the results into another file.  This will
then allow us to begin parsing the input incrementally, and begin to morph
the output into a real `xmlo` file.
2021-07-27 00:37:13 -04:00
Mike Gerwitz d9dcfe8777 tamer: Introduce tpwrap module to contain quick_xml::Error adapter
This adapter exists to implement PartialEq so that it can be derived on
Error objects.  This is used primarily (well, exclusively atm) for tests.
2021-07-23 23:23:55 -04:00
Mike Gerwitz fb8422d670 tamer: Initial frontend concept
This introduces the beginnings of frontends for TAMER, gated behind a
`wip-features` flag.

This will be introduced in stages:

  1. Replace the existing copy with a parser-based copy (echo back out the
     tokens), when the flag is on.
  2. Begin to parse portions of the source, augmenting the output xmlo (xmli
     at the moment).  The XSLT-based compiler will be modified to skip
     compilation steps as necessary.

As portions of the compilation are implemented in TAMER, they'll be placed
behind their own feature flags and stabalized, which will incrementally
remove the compilation steps from the XSLT-based system.  The result should
be substantial incremental performance improvements.

Short-term, the priorities are for loading identifiers into an IR
are (though the order may change):

  1. Echo
  2. Imports
  3. Extern declarations.
  4. Simple identifiers (e.g. param, const, template, etc).
  5. Classifications.
  6. Documentation expressions.
  7. Calculation expressions.
  8. Template applications.
  9. Template definitions.
  10. Inline templates.

After each of those are done, the resulting xmlo (xmli) will have fully
reconstructed the source document from the IR produced during parsing.
2021-07-23 22:24:08 -04:00
Mike Gerwitz 5aaa1106cb tamer: obj::xmlo::reader::mock: Extract into crate::test::quick_xml
Other mocks exist here, and here it can be re-used for the upcoming XML
frontend.
2021-07-22 15:32:30 -04:00
Mike Gerwitz 2e50af1220 Copyright year update 2021 2021-07-22 15:00:15 -04:00
Mike Gerwitz e5bbd49166 tamer: obj::xmlo::reader: Extract tests separate file
The file's getting a bit large and the tests are rather complex.  Further,
LSP does better on smaller, less complex files.
2021-07-22 14:39:06 -04:00
Mike Gerwitz 1f24cfdf25 Remove :map: sym-dep generation
This was incorrect to begin with---it does not make sense that an input
mapping should depend upon the identifier that it maps to, in the sense that
we make use of these dependencies.  If we add weak symbol references in the
future, then this can be reintroduced.

By removing this, we free tameld from having to perform the check itself.

.rev-xmlo bumped to force rebuilding of object files since the linker now
expects that no such dependencies will exist within them.
2021-07-22 14:27:15 -04:00
Mike Gerwitz 90c6b51fd5 tamer: tameld: Place constants into static section in executable
This is something that changed when the TAMER POC was initially created, as
I was learning Rust.  I don't recall the original reason why this was moved,
but it could have been moved back long ago.

In our systems, constants can hold tables (as matrices) with tens or
hundreds of thousands of rows, and there are a number of them in certain
projects.  As an example, the YAML-based test cases for one of our systems
went from ~2m30s to ~45s after this change was made.  Much of the cost
savings comes from saving GC.
2021-07-21 14:53:15 -04:00
Mike Gerwitz 716556c39f tamer: Rust 1.{42=>48}.0 for stable intra-doc links without nightly 2021-06-21 13:10:00 -04:00
Mike Gerwitz 96ffd5f6e5 [DEV-8000] ir::asg: Error types for unresolved identifiers during sorting
This checks explicitly for unresolved objects while sorting and provides an
explicit error for them.  For example, this will catch externs that have no
concrete resolution.

This previously fell all the way through to the unreachable! block.  The old
POC implementation was catching unresolved objects, albeit with a debug
error.
2020-07-02 01:38:32 -04:00
Mike Gerwitz a2415c8c6f [DEV-8000] ir::asg::base: Replace Symbol::new_dummy
Use symbol_dummy!.
2020-07-01 15:53:56 -04:00
Mike Gerwitz 0d4bbe5e4e [DEV-8000] ir::asg: Introduce SortableAsgError
This will be used for the next commit, but this change has been isolated
both because it distracts from the implementation change in the next commit,
and because it cleans up the code by removing the need for a type parameter
on `AsgError`.

Note that the sort test cases now use `unwrap` instead of having
`{,Sortable}AsgError` support one or the other---this is because that does
not currently happen in practice, and there is not supposed to be a
hierarchy; they are siblings (though perhaps their name may imply otherwise).
2020-07-01 13:42:14 -04:00
Mike Gerwitz f832feb3fa [DEV-8000] ir::asg::base::BaseAsg::check_cycles: Extract into function
The only reason this function was a method of `BaseAsg` was because of
`self.graph`, which is accessible within the scope of this
module.  `check_cycles` is logically associated with `SortableAsg`, and so
should exist alongside it (though it can't exist as an associated function
of that trait).
2020-07-01 11:02:20 -04:00
Joseph Frazer 43d00a8268 [DEV-7504] Add GraphML generation
We want to be able to build a representation of the dependency graph so
we can easily inspect it.

We do not want to make GraphML by default. It is better to use a tool.
We use "petgraph-graphml".
2020-05-13 08:04:48 -04:00
Mike Gerwitz 0127d4b698 TAMER: sym::Interner::index_lookup
This was originally omitted because there wasn't a use case for it.  Now
that we're adding context to errors, however, an owned value is highly
desirable.

This adds almost no measurable overhead to the internment system in
benchmarks (largely within the margin of error).
2020-04-29 11:33:41 -04:00
Mike Gerwitz bcca5f7c49 [DEV-7084] TAMER: AsgBuilder and IR lowering docs 2020-04-28 13:39:55 -04:00
Mike Gerwitz 0f4b2d75f8 [DEV-7084] TAMER: obj::xmlo: Private inner modules 2020-04-28 11:08:05 -04:00
Mike Gerwitz 549e9ca23b [DEV-7084] TAMER: AsgBuilderState:🆕 New constructor 2020-04-28 09:06:25 -04:00
Mike Gerwitz 9893d56775 [DEV-7084] TAMER: Finalize AsgBuilder 2020-04-28 09:06:25 -04:00
Mike Gerwitz 32abc7dce2 [DEV-7084] TAMER: impl PartialEq for XmloError
This cannot be dervied because XmlError does not implement PartialEq,
which is quite the annoyance in tests.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 21a0bdcce1 [DEV-7084] TAMER: AsgBuilderError: Introduce proper error variants
This is a union (sum type) of three other errors types, plus errors specific
to this builder.

This commit does a good job demonstrating the boilerplate, as well as a need
for additional context (in the case of `IdentKindError`), that we'll want to
work on abstracting away.
2020-04-28 09:06:25 -04:00
Mike Gerwitz ef79a763ac [DEV-7084] TAMER: Correct Ix trait bound for AsgError
The `Debug` bound is inconvenient and requires propagation to any types that
use it.  Further, it's really awkward having `Display` depend on `Debug`; if
we want to render a useful display here, we can write one.

To be clear: IndexType implements Debug.

For now, this is pretty-printed by another part of the code, which we don't
want to implement in `Display` because it requires looking things up from
the graph.
2020-04-28 09:06:25 -04:00
Mike Gerwitz cfc13f9016 [DEV-7084] TAMER: ir::asg::IdentKindError: Replace string with enum 2020-04-28 09:06:25 -04:00
Mike Gerwitz 0a9a3214b7 [DEV-7084] TAMER: ir::asg::BaseAsg:🆕 New associated function
Profiling showed that creating an initial capacity of 0 did not have a
notable affect on performance.
2020-04-28 09:06:25 -04:00
Mike Gerwitz ecc2e33ba7 [DEV-7084] TAMER: xmlo::AsgBuilder: Accept XmloResult iterator
This flips the API from using XmloWriter as the context to using Asg and
consuming anything that can produce XmloResults.  This not only makes more
sense, but avoids having to create a trait for XmloReader, and simplifies
the trait bounds we have to concern ourselves with.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 323ea79bf8 [DEV-7084] TAMER: Basic AsgBuilder cleanup
This just tidies things up a little bit before I get into some further
refactoring.  I wrote the original code when I was just learning Rust not
too long ago, so it's interesting to see how my understanding has changed
over that relatively short period of time.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 9220de4769 [DEV-7084] TAMER: Finish encapsulating petgraph
This will allow us to migrate away from Petgraph in the future should we
choose to do so.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 0f423f3b24 [DEV-7084] TAMER: Simplify path canonicalization
This abstracts away the canonicalizer and solves the problem whereby
canonicalization was not being performed prior to recording whether a path
has been visited.  This ensures that multiple relative paths to the same
file will be properly recognized as visited.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 4a7e00c404 [DEV-7084] TAMER: ld::poc: Remove unused fragments arg 2020-04-28 09:06:25 -04:00
Mike Gerwitz c94120335f [DEV-7084] TAMER: ld::poc: Remove unnecessary initial path canonicalization
Less to refactor and test.
2020-04-28 09:06:25 -04:00
Mike Gerwitz da69118592 [DEV-7084] TAMER: AsgBuilderState
This completes the POC extraction for AsgBuilder, but is still POC
code.  The commits that follow will clean it up and provide tests.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 3f46917da9 [DEV-7084] TAMER: AsgBuilder extracted from POC
This extracts the changes nearly verbatim before doing refactoring so that
it's easier to observe what changes have been made.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 7ed0691c45 [DEV-7084] TAMER: fs: impl File for BufReader
This further simplifies the POC linker.
2020-04-28 09:06:25 -04:00
Mike Gerwitz fbfb3c4ba2 [DEV-7084] TAMER: CanonicalFile
This will be entirely replaced in an upcoming commit.  See that for
details.  I don't feel like dealing with the conflicts for rearranging and
squashing these commits.
2020-04-28 09:06:25 -04:00
Mike Gerwitz d97e53a835 [DEV-7084] TAMER: fs: Basic filesystem abstraction
This also includes an implementation to visit paths only once.  Note that it
does not yet canonicalize the path before visiting, so relative paths to the
same file can slip through, and relative paths to _different_ files could be
erroneously considered to have been visited.

This will be fixed in an upcoming commit.
2020-04-28 09:06:19 -04:00
Mike Gerwitz 90ed4e9bd6 [DEV-7084] TAMER: From<B, &I> for XmloReader
This serves as a constructor for the time being, decoupling from POC.  We
may do something better once we have a better idea of how the various
abstractions around this will evolve.
2020-04-20 10:53:51 -04:00
Joseph Frazer 2c587e2d9d [DEV-7147] Add "tamec" executable
Add a stub executable that will eventually become a full-featured TAME
compiler. The first implementation will only copy the source file to an
intermediary file that will be compiled by the XSLT compiler.
2020-04-09 09:46:46 -04:00
Mike Gerwitz 8385b64e1d [DEV-7086] TAMER: Remove WIP linker warning
While it is true that this is still being finalized, the warnings originally
existed because tameld was not feature complete.  It is now.
2020-04-06 10:04:19 -04:00
Mike Gerwitz 68c7636be8 [DEV-7086] TAMER: ir::asg::base::test Add missing set_fragment failure test
Results the last remaining BaseAsg test TODO.
2020-04-06 09:56:13 -04:00
Mike Gerwitz b870480944 [DEV-7086] TAMER: ir::asg::TransitionError::BadFragmentDest tuple=>struct
Consistency.
2020-04-06 09:56:13 -04:00
Mike Gerwitz da5057058d [DEV-7086] TAMER: Disallow IdentObject::resolve redeclarations
Except under well-defined circumstances.
2020-04-06 09:56:12 -04:00
Mike Gerwitz 0868453dab [DEV-7086] Proper handling of identifier overrides
This is an awkward system that I'd like to remove at some point.  It adds
complexity.  For the meantime, overrides have been arbitrarily restricted to
a single override (no override-override).  But it's needed being until we
rework maps and can handle the illusion of overrides using the template
system.
2020-04-06 09:55:54 -04:00
Mike Gerwitz a4657580ca [DEV-7086] TAMER: TransitionError::Incompatible: Remove unused 2020-04-01 15:56:33 -04:00
Mike Gerwitz 0f9acd16cd [DEV-7086] TAMER: BaseAsg::set_fragment: Remove duplicate code
Benchmark performance for this method is still substantially slower.  And
oddly, this nearly doubled the speed of the other two calls (granted, at
that speed, it doesn't matter).
2020-03-31 14:56:34 -04:00
Mike Gerwitz f7ed0dbff3 [DEV-7086] ASG benchmarks 2020-03-31 14:18:26 -04:00
Mike Gerwitz 7c65d729aa TAMER: BaseAsg test: Remove fulfilled stub TODO 2020-03-26 16:16:51 -04:00
Mike Gerwitz 4051debad2 [DEV-7087] TAMER: Add Source to IdentObject::Extern
All of these refactoring commits to arrive at this one final change: the
ability to store the source location for externs so that we can report on
what package is expecting an identifier to be defined.

Phew.  Goodnight.
2020-03-26 09:22:21 -04:00
Mike Gerwitz f44549d730 [DEV-7087] TAMER: Object{State,Data}: API representative of state transitions
The API now enforces beginning at Missing and transitioning through
states.  Methods have been renamed to reflect this.
2020-03-26 09:22:17 -04:00
Mike Gerwitz d3ecd7b228 [DEV-7087] TAMER: BaseAsg: Refactor duplicate declare{,_extern} code 2020-03-26 09:21:50 -04:00
Mike Gerwitz 40eaeb3dc8 [DEV-7087] TAMER: Remote optional Source from ASG and Object
This undoes work I did earlier today...but now we'll be able to support a
Source on an extern.

There is duplicate code between `BaseAsg::declare{,_extern}` that will be
resolved in an upcoming commit.  Upcoming commits will also simplify
terminology and clean up methods on ObjectState.
2020-03-26 09:18:08 -04:00
Mike Gerwitz 7dd8717f2f [DEV-7087] TAMER: Asg: Reintroduce declare_extern
There is some duplication here with `declare` that will be cleared up in a
following commit.  Reintroducing this method is necessary so that Source can
be used to represent the source location of the extern itself; it's
currently None to indicate an extern in `declare`.
2020-03-26 09:15:59 -04:00
Mike Gerwitz 537d9e64af [DEV-7087] TAMER: ObjectState: Introduce extern transition
This is the first step in a more incremental refactoring that previous
commits to undo the optional Source in `ObjectState::ident`.  This provides
an explicit transition to an extern, with the intent of requiring an initial
missing state.  This will simplify logic on the ASG.

Note that the Source provided to this new method is not yet used.  That too
will come in a following commit and will represent the source of the defined
extern rather than the concrete identifier.
2020-03-26 09:14:29 -04:00
Mike Gerwitz d6762ab547 [DEV-7087] TAMER: Type compatability check during extern resolution
This properly verifies extern types, and cleans up Asg's API a little so
that externs aren't handled much differently than other declarations.

With that said, after making src optional, I realized that we will indeed
want source information for externs themselves so we can direct the user to
what package is expecting that symbol (as the old linker does).  So this
approach will not work, and I'll have to undo some of those changes.
2020-03-26 09:14:26 -04:00
Mike Gerwitz 7a972465ea [DEV-7087] TAMER: tameld: Format error output
We will want an option for verbose debug output in the future.
2020-03-26 09:08:13 -04:00
Mike Gerwitz 05d03dc4bb [DEV-7087] Beginning of extern type verification and reporting
This only verifies when externs are defined _before_ they need to be
resolved.  See a future commit for the rest of this.
2020-03-26 09:08:13 -04:00
Mike Gerwitz b35dd4f4dd [DEV-7087] TAMER: AsgError: Wrap TransitionError
See next commit.
2020-03-26 09:08:10 -04:00
Joseph Frazer 6386e096b4 [DEV-7133] Clearly show the cycles in the output 2020-03-26 08:48:43 -04:00
Joseph Frazer 8af93d9339 [DEV-7133] Check for cyclic dependencies
We want the linker to show an error when a cyclic dependency is
encountered.

Co-authored-by: Mike Gerwitz <mike.gerwitz@ryansg.com>
2020-03-26 08:48:43 -04:00
Joseph Frazer 59f194a46a [DEV-7133] Add AsgError::Cycle
We want a special error type when we detect cyclic dependencies.
2020-03-26 08:48:43 -04:00
Mike Gerwitz 7a4f6cf9f2 [DEV-7087] TAMER: symbol_dummy! macro 2020-03-24 14:14:05 -04:00
Mike Gerwitz f969877324 [DEV-7087] TAMER: {=>Ident}Object{,State,Data}
This is essential to clarify what exactly the different object types
represent with the new generic abstractions.  For example, we will have
expressions as an object type.
2020-03-24 09:56:25 -04:00
Mike Gerwitz 5fb68f9b67 TAMER: Make Asg generic over object
There's a lot here to make the object stored on the `Asg` generic.  This
introduces `ObjectState` for state transitions and `ObjectData` for pure
data retrieval.  This will allow not only for mocking, but will be useful to
enforce compile-time restrictions on the type of objects expected by the
linker vs. the compiler (e.g. the linker will not have expressions).

This commit intentionally leaves the corresponding tests in their original
location to prove that the functionality has not changed; they'll be moved
in a future commit.

This also leaves the names as "Object" to reduce the number the cognative
overhead of this commit.  It will be renamed to something like "IdentObject"
in the near future to clarify the intent of the current object type and to
open the way for expressions and a type that marries both of them in the
future.

Once all of this is done, we'll finally be able to make changes to the
compatibility logic in state transitions to implement extern compatibility
checks during resolution.

DEV-7087
2020-03-24 09:56:20 -04:00
Mike Gerwitz f20120787f TAMER: Extract identifier transitions into Object
The next commit will generalize this further.  This moves logic out of
BaseAsg so that we can implement more sophisticated transitions for
compatability checks.

The logic is still tested as part of BaseAsg; the next commit will change
that as it's generalized further.

* tamer/src/ir/asg/base.rs: Extract object transitions.
* tamer/src/ir/asg/graph.rs (AsgError)[IncompatibleIdent]: New variant.
  (From<TransitionError> for AsgError): Basic type translation.
* tamer/src/ir/asg/object.rs (TransitionResult): New type.
  (impl Object): Transition methods.
  (TransitionError): New enum.
2020-03-19 15:42:06 -04:00
Mike Gerwitz 3fe3fc4b84 TAMER: ld/poc: Simplify {get_interner_value=>get_ident} 2020-03-19 15:42:06 -04:00
Mike Gerwitz 400d5b25a1 ir::asg::Object::Empty: Remove variant
This variant is unnecessary, as it was used only by the indexer to represent
the absence of a node, for which was can simply use `None` in the containing
`Option`.

* tamer/Cargo.toml: Add `lazy_static`.
* tamer/Cargo.lock: Update.
* tamer/src/ir/asg/base.rs (with_capacity): Use `None` in place of
    `Some(Object::Empty)`.
* tamer/src/ir/asg/object.rs: Adjust state machine graphic.
  (Empty): Remove variant.
  (Missing): Remove reference to variance.
* tamer/src/lib.rs: Import `lazy_static` for test builds.
* tamer/obj/xmle/writer/writer.rs (Section::iter): Remove `Object::Empty`
    from documentation.
  (test::): Remove references to `Object::Missing`.  `lazy_static!` used
    here.
* tamer/obj/xmle/writer/xmle.rs (test::write_section_catch_missing): Replace
    reference to `Object::Missing`.
2020-03-19 15:42:06 -04:00
Mike Gerwitz 0a135ad707 TAMER: Tidy up graph_sort test
This still isn't comprehensive.  Further, it won't be able to be, because
we'd have to rely on Petgraph implementation details: there are potentially
many acceptable orderings for a given graph.
2020-03-13 11:51:59 -04:00
Joseph Frazer 7e95394076 [DEV-7085] Create `SortableAsg` trait
Create a trait that sorts a graph into `Sections` that can then be used
as an IR. The `BaseAsg` should implement the trait using what was
originally in the POC.
2020-03-13 11:51:59 -04:00
Joseph Frazer bc760387f6 [DEV-7085] Implement `PartialEq` for `Sections`
We want to be able to easily compare `Sections` in tests, so
implementing `PartialEq` (and `Debug`) for both `Sections` and `Section`
is required.
2020-03-13 11:51:59 -04:00
Joseph Frazer 59a0c382af [DEV-7085] Move sections to IR module
We need to use `Sections` in both the writer and the ASG so it needs to
be in a place that makes sense.
2020-03-13 11:51:59 -04:00
Joseph Frazer b5f6a082dd [DEV-7134] Remove unnecessary node replacement
The node was being replaced before we were catching errors properly. Now
that they are propagated, we should not need the replacement.
2020-03-09 11:41:11 -04:00
Joseph Frazer 01e7d3e560 [DEV-7134] Propagate errors from the writer
When an error occurs during the XML writing, they should be shown to the
user.
2020-03-09 08:23:13 -04:00
Joseph Frazer f373a00a80 [DEV-7134] Propagate sorting errors
If a node is found while sorting that is not expected, we should show
the error to the user.
2020-03-09 08:23:13 -04:00
Joseph Frazer 2a5551a04a [DEV-7134] Propagate errors setting fragments
If we cannot set a fragment, we need to display the error to the user.

We are currently ignoring "___head", "___tail", and objects that are
both virtual and overridden. Those will be corrected in with future
changes.
2020-03-09 08:23:13 -04:00
Joseph Frazer 06bc89a9ce [DEV-7134] Pass read event errors up the stack 2020-03-06 14:08:55 -05:00
Joseph Frazer 246a40a047 [DEV-7134] Return error for XmloEvent::SymDecl
We want more than warnings when a XmloEvent::SymDecl symbol has an
unknown "kind".
2020-03-06 13:41:32 -05:00
Joseph Frazer 2228a6158a [DEV-7134] Add alias for LoadResult
It looks better and was recommended by Rust's linter.
2020-03-06 12:44:22 -05:00
Joseph Frazer 4810e7a099 [DEV-7134] Remove unwrap so we can bubble up error messages 2020-03-06 12:32:42 -05:00
Joseph Frazer 590245e191 [DEV-7134] Escalate the error from finding the absolute path
We do not want to have a panic here. The error should be displayed
properly.
2020-03-06 12:24:45 -05:00
Mike Gerwitz bfea768f89 Copyright year 2020 update 2020-03-06 11:05:18 -05:00
Joseph Frazer e613bd8a8c [DEV-7081] Add options to tameld
We want to add an option to set the output file to the linker so we do
not need to redirect output to awk any longer.

This also adds integration tests for tameld.
2020-03-06 09:41:55 -05:00
Joseph Frazer 6ac7641087 [DEV-7083] TAMER: xmle writer
This introduces the writer for xmle files.
2020-03-03 11:21:18 -05:00
Mike Gerwitz c2e6efc0b5 TAMER: Additional crate::ld documentation 2020-03-02 15:54:36 -05:00
Mike Gerwitz b89408e5bb TAMER: Extract quick_xml event-related mocks 2020-02-26 10:49:01 -05:00
Mike Gerwitz 19a6d67dc4 TAMER: Separate static xmle section 2020-02-26 10:49:01 -05:00
Mike Gerwitz 7c60b53de8 TAMER: Virtual symbol override 2020-02-26 10:49:01 -05:00
Mike Gerwitz ab3aec980d TAMER: POC: Use FxHash to remove nondeterminism
The default SipHash is a cryptographic hash and causes ordering to change
between runs.
2020-02-26 10:49:00 -05:00
Mike Gerwitz 645908e258 TAMER: xmle output changes to support Summary Page
Co-Authored-By: Joseph Frazer <joseph.frazer@ryansg.com>
2020-02-26 10:49:00 -05:00
Mike Gerwitz 6939753ca0 TAMER: POC: Output xmle
This is a working proof-of-concept that will be finalized in future commits.
2020-02-26 10:49:00 -05:00
Mike Gerwitz 85a4934db5 TAMER: Symbol source data and metadata 2020-02-26 10:49:00 -05:00
Mike Gerwitz bcc2ab1221 TAMER: Initial abstract semantic graph (ASG)
This begins to introduce the ASG, backed by Petgraph.  The API will continue
to evolve, and Petgraph will likely be encapsulated so that our
implementation can vary independently from it (or even remove it in the
future).
2020-02-26 10:48:59 -05:00
Mike Gerwitz 10b9caa7ad TAMER: Fail on empty fragment ids (and fix underlying problem) 2020-02-25 16:46:28 -05:00
Mike Gerwitz a0893da577 TAMER: xmlo: Add Package event 2020-02-25 16:46:27 -05:00
Mike Gerwitz a8726918f7 TAMER: poc: Use xmlo reader
TODO: More information
2020-02-25 16:46:27 -05:00
Mike Gerwitz a929c8cae4 TAMER: xmlo reader
This introduces the reader for xmlo files produced by the XSLT-based
compiler.  It is an initial implementation but is not complete; see future
commits.
2020-02-25 16:46:25 -05:00
Mike Gerwitz 6aae741162 TAMER (sym::Interner::intern_utf8_unchecked): New function
This removes boilerplate for reading xmlo files.  See next commit.
2020-02-25 16:10:55 -05:00
Mike Gerwitz e8cd378d59 TAMER: Display for Symbol
One of the benefits of storing a reference to the interned string on the
symbol itself is that we get to get its underlying value essentially for
free.
2020-02-24 14:56:28 -05:00
Mike Gerwitz ff0c8bb34f Order symtable, sym-dep, fragments
This ordering will simplify streaming processing of xmlo files in
TAMER.  Specifically, we know that symbols will have been declared by the
time dependencies are added to the graph (and so we should only be creating
edges to existing nodes); and we can halt reading as soon as the closing
fragments tag is encountered, avoiding parsing the entirety of these massive
XML files.

On one particularly large program, this cuts time down from ~0.333s to
~0.300 in the POC linker.
2020-02-24 14:56:28 -05:00
Mike Gerwitz 1f4db84f24 TAMER: Arena-based string interner
Contrary to what I said previously, this replaces the previous
implementation with an arena-backed internment system.  The motivation for
this change was investigating how Rustc performed its string interning, and
why they chose to associate integer identifiers with symbols.

The intent was originally to use Rustc's arena allocator directly, but that
create pulled in far too many dependencies and depended on nightly
Rust.  Bumpalo provides a very similar implementation to Rustc's
DroplessArena, so I went with that instead.

Rustc also relies on a global, singleton interner.  I do not do that
here.  Instead, the returned Symbol carries a lifetime of the underlying
arena, as well as a pointer to the interned string.

Now that this is put to rest, it's time to move on.
2020-02-24 14:56:28 -05:00
Mike Gerwitz 176d099fb6 tamer::sym: FNV => Fx Hash
For strings of any notable length, Fx Hash outperforms FNV.  Rustc also
moved to this hash function and noticed performance
improvements.  Fortunately, as was accounted for in the design, this was a
trivial switch.

Here are some benchmarks to back up that claim:

test hash_set::fnv::with_all_new_1000                 ... bench:     133,096 ns/iter (+/- 1,430)
test hash_set::fnv::with_all_new_1000_with_capacity   ... bench:      82,591 ns/iter (+/- 592)
test hash_set::fnv::with_all_new_rc_str_1000_baseline ... bench:     162,073 ns/iter (+/- 1,277)
test hash_set::fnv::with_one_new_1000                 ... bench:      37,334 ns/iter (+/- 256)
test hash_set::fnv::with_one_new_rc_str_1000_baseline ... bench:      18,263 ns/iter (+/- 261)
test hash_set::fx::with_all_new_1000                  ... bench:      85,217 ns/iter (+/- 1,111)
test hash_set::fx::with_all_new_1000_with_capacity    ... bench:      59,383 ns/iter (+/- 752)
test hash_set::fx::with_all_new_rc_str_1000_baseline  ... bench:      98,802 ns/iter (+/- 1,117)
test hash_set::fx::with_one_new_1000                  ... bench:      42,484 ns/iter (+/- 1,239)
test hash_set::fx::with_one_new_rc_str_1000_baseline  ... bench:      15,000 ns/iter (+/- 233)
test hash_set::with_all_new_1000                      ... bench:     137,645 ns/iter (+/- 1,186)
test hash_set::with_all_new_rc_str_1000_baseline      ... bench:     163,129 ns/iter (+/- 1,725)
test hash_set::with_one_new_1000                      ... bench:      59,051 ns/iter (+/- 1,202)
test hash_set::with_one_new_rc_str_1000_baseline      ... bench:      37,986 ns/iter (+/- 771)
2020-02-24 14:56:28 -05:00
Mike Gerwitz 541fbffc2e tameld: Move documentation to tamer::ld 2020-02-24 14:56:28 -05:00
Mike Gerwitz f2b24e6505 HashMapInterner: New interner, docs, and benchmarks
This interner will be suitable for providing an index to look up nodes in
the ASG.
2020-02-24 14:56:28 -05:00
Mike Gerwitz 9a98644213 TAMER: sym::tests: Generate with macro
This will be used for generating the common tests between HashSet and
HashMap implementations.

This is my first macro in Rust.  There does not seem to be a way to
concatenate identifiers (!), so I'm placing them within modules
instead.  That ended up working out just fine, since then I can use a type
to provide the SUT.
2020-02-24 14:56:28 -05:00
Mike Gerwitz e4e0089815 TAMER: Initial string interning abstraction
This is missing two key things that I'll add shortly: a HashMap-based one
for use in the ASG for node mapping, and an entry-based system for
manipulations.

This has been a nice start for exploring various aspects of Rust
development, as well as conventions that I'd like to implement.  In
particular:

  - Robust documentation intended to guide people through learning the
    necessary material about the compiler, as well as related work to
    rationalize design decisions;
  - Benchmarks;
  - TDD;
  - And just getting used to Rust in general.

I've beat this one to death, so I'll commit this and make smaller changes
going forward to show how easily it can evolve.

(This module was originally named `intern` but this commit and those that
follow rewrote it to `sym`.)
2020-02-24 14:56:28 -05:00
Mike Gerwitz 8455a38a1d Graph-based POC
This makes use of Petgraph for representing the dependency graph and uses a
separate data structure for both string interning and indexing by symbol
name.
2019-12-02 10:05:48 -05:00
Mike Gerwitz 8374541965 tamer: Initial baisc POC with no XML output
This is garbage code.  Do not use it.  It is intentionally throwaway.

While I've researched Rust, I haven't actually _used_ it for a project, so
this is a combination of me exploring various ways of accomplishing the
problem and forcing myself to learn certain aspects of the language.

I'll likely be using petgraph, and this also currently lacks symbol
abstractions.  This commit also performs far too much heap allocation
copying strings around.  But it _does_ perform the topological sort.

Since this only stores the symbol name, it lacks enough information about
the symbol to perform a proper linking.
2019-12-02 10:00:53 -05:00
Mike Gerwitz 7412a8934c tameld: Placeholder binary 2019-11-20 10:11:00 -05:00
Mike Gerwitz fd1a5837ba TAMER: Initial commit 2019-11-18 14:05:47 -05:00