2019-11-27 09:18:17 -05:00
|
|
|
// TAME in Rust (TAMER)
|
|
|
|
//
|
2023-01-17 23:09:25 -05:00
|
|
|
// Copyright (C) 2014-2023 Ryan Specialty, LLC.
|
2020-03-06 11:05:18 -05:00
|
|
|
//
|
|
|
|
// This file is part of TAME.
|
2019-11-27 09:18:17 -05:00
|
|
|
//
|
|
|
|
// This program is free software: you can redistribute it and/or modify
|
|
|
|
// it under the terms of the GNU General Public License as published by
|
|
|
|
// the Free Software Foundation, either version 3 of the License, or
|
|
|
|
// (at your option) any later version.
|
|
|
|
//
|
|
|
|
// This program is distributed in the hope that it will be useful,
|
|
|
|
// but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
// GNU General Public License for more details.
|
|
|
|
//
|
|
|
|
// You should have received a copy of the GNU General Public License
|
|
|
|
// along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
|
2019-12-06 15:03:29 -05:00
|
|
|
//! An incremental rewrite of TAME in Rust.
|
2022-09-19 10:04:40 -04:00
|
|
|
//!
|
|
|
|
//! There are two entry points to this system:
|
|
|
|
//!
|
|
|
|
//! - [`tamec`](../tamec), the TAME compiler; and
|
|
|
|
//! - [`tameld`](../tameld), the TAME linker.
|
2019-12-06 15:03:29 -05:00
|
|
|
|
2021-10-02 00:50:20 -04:00
|
|
|
// Constant functions are still in their infancy as of the time of writing
|
|
|
|
// (October 2021).
|
2022-05-02 11:05:32 -04:00
|
|
|
// This feature is used by [`sym::prefill::st_as_sym`] to provide
|
2021-10-02 00:50:20 -04:00
|
|
|
// polymorphic symbol types despite Rust's lack of support for constant
|
|
|
|
// trait methods.
|
|
|
|
// See that function for more information.
|
|
|
|
#![feature(const_transmute_copy)]
|
2022-03-17 12:20:20 -04:00
|
|
|
// This is used to unwrap const Option results rather than providing
|
|
|
|
// panicking alternatives.
|
|
|
|
#![feature(const_option)]
|
2021-10-28 14:27:33 -04:00
|
|
|
// Trait aliases are convenient for reducing verbosity in situations where
|
|
|
|
// type aliases cannot be used.
|
|
|
|
// To remove this feature if it is not stabilized,
|
|
|
|
// simply replace each alias reference with its definition,
|
|
|
|
// or possibly write a trait with a `Self` bound.
|
|
|
|
#![feature(trait_alias)]
|
2021-10-28 21:21:30 -04:00
|
|
|
// Can be replaced with `assert!(matches!(...))`,
|
|
|
|
// but at a loss of a better error message.
|
|
|
|
#![feature(assert_matches)]
|
2021-10-29 16:34:05 -04:00
|
|
|
// Simplifies creating `Option` default values.
|
|
|
|
// To remove this feature,
|
|
|
|
// this can be done more verbosely in the usual way,
|
|
|
|
// or we can write our own version.
|
|
|
|
#![feature(option_get_or_insert_default)]
|
2022-03-25 09:56:22 -04:00
|
|
|
// For `Try` and `FromResidual`,
|
|
|
|
// allowing us to write our own `?`-compatible types.
|
|
|
|
#![feature(try_trait_v2)]
|
2022-04-04 21:50:47 -04:00
|
|
|
// Used primarily for convenience,
|
|
|
|
// rather than having to create type constructors as type aliases that are
|
|
|
|
// not associated with a trait.
|
|
|
|
// However,
|
|
|
|
// this also allows for the associated type default to be overridden by
|
|
|
|
// the implementer,
|
|
|
|
// in which case this feature's only substitute is a type parameter.
|
|
|
|
#![feature(associated_type_defaults)]
|
2022-04-14 15:52:08 -04:00
|
|
|
// Convenience features that are easily replaced if not stabilized.
|
|
|
|
#![feature(nonzero_ops)]
|
2022-09-15 11:10:43 -04:00
|
|
|
// Enabled for qualified paths in `matches!`.
|
|
|
|
#![feature(more_qualified_paths)]
|
2023-02-07 14:59:36 -05:00
|
|
|
// Collecting iterators into existing objects.
|
|
|
|
// Can be done manually in a more verbose way.
|
|
|
|
#![feature(iter_collect_into)]
|
tamer: src::asg::graph::object::pkg::name: New module
This introduces, but does not yet integrate, `CanonicalName`, which not only
represents canonicalized package names, but handles namespec resolution.
The term "namespec" is motivated by Git's use of *spec (e.g. refspec)
referring to various ways of specifying a particular object. Names look
like paths, and are derived from them, but they _are not paths_. Their
resolution is a purely lexical operation, and they include a number of
restrictions to simplify their clarity and handling. I expect them to
evolve more in the future, and I've had ideas to do so for quite some time.
In particular, resolving packages in this way and then loading them from the
filesystem relative to the project root will ensure that
traversing (conceptually) to a parent directory will not operate
unintuitively with symlinks. The path will always resolve unambiguously.
(With that said, if the symlink is to a shared directory with different
directory structures, that doesn't solve the compilation problem---we'll
have to move object files into a project-specific build directory to handle
that.)
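To illustrate what "purely lexical" resolution means, here is a minimal
sketch. It is _not_ the `CanonicalName` implementation: the function name
`resolve_lexically`, the leading-slash convention for names rooted at the
project, and the rule that a relative namespec is resolved against the
directory containing the base package are all assumptions made for the
example. The point is only that `..` is handled by popping a path segment,
without ever consulting the filesystem, so symlinks cannot influence the
result.

```rust
/// Hypothetical helper (not part of TAMER): resolve `namespec` against the
/// canonical package name `base` using only string operations.
///
/// Assumptions for this sketch: canonical names begin with `/` (the project
/// root), and a relative namespec is interpreted relative to the directory
/// that contains `base`.
fn resolve_lexically(base: &str, namespec: &str) -> Option<String> {
    let mut parts: Vec<&str> = if namespec.starts_with('/') {
        // Already rooted at the project; ignore `base`.
        Vec::new()
    } else {
        let mut p: Vec<&str> =
            base.split('/').filter(|s| !s.is_empty()).collect();
        // Drop the package's own name, leaving its parent directory.
        p.pop();
        p
    };

    for seg in namespec.split('/') {
        match seg {
            // Empty segments and `.` do not change the location.
            "" | "." => {}
            // `..` pops one segment; escaping the project root is an error.
            ".." => {
                parts.pop()?;
            }
            s => parts.push(s),
        }
    }

    Some(format!("/{}", parts.join("/")))
}
```

Because nothing here touches the filesystem, the same inputs always yield
the same name regardless of how directories are linked on disk.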
Span Slicing
------------
Okay, it's worth commenting on the horridity of the path name slicing that
goes on here. Care has been taken to ensure that spans will be able to be
properly sliced in all relevant contexts, and there are plenty of words
devoted to that in the documentation committed here.
But there is a more fundamental problem here that I regret not having solved
earlier, because I don't have the time for it right now: while we do have
SPair, it makes no guarantees that the span associated with the corresponding
SymbolId is actually the span that matches the original source lexeme. In
fact, it's often not.
This is a problem when we want to slice up a symbol in an SPair and produce
a sensible span. If it _is_ a source lexeme with its original span, that's
no problem. But if it's _not_, then the two are not in sync, and slicing up
the span won't produce something that actually makes sense to the user. Or,
worse (or maybe it's not worse?), it may cause a panic if the slicing is out
of bounds.
The solution in the future might be to store explicitly the state of an
SPair, or call it Lexeme, or something, so that we know the conditions under
which slicing is safe. If I ever have time for that in this project.
But the result of the lack of a proper abstraction really shows here: this
is some of the most confusing code in TAMER, and it's really not doing
anything all that complicated. It is disproportionately confusing.
DEV-13162
2023-05-04 12:28:08 -04:00
|
|
|
// Concise and descriptive.
|
|
|
|
// Can be done manually in a more verbose way.
|
|
|
|
#![feature(str_split_remainder)]
|
|
|
|
// Concise and descriptive.
|
|
|
|
// Can be done manually in a more verbose way.
|
|
|
|
#![feature(iter_intersperse)]
|
2022-06-10 16:28:15 -04:00
|
|
|
// Used for const params like `&'static str` in `crate::fmt`.
|
|
|
|
// If this is not stabilized,
|
|
|
|
// then we can do without by changing the abstraction;
|
|
|
|
// this is largely experimentation to see if it's useful.
|
|
|
|
#![allow(incomplete_features)]
|
|
|
|
#![feature(adt_const_params)]
|
tamer: f::Functor: New trait
This commit is purposefully coupled with changes that utilize it to
demonstrate that the need for this abstraction has been _derived_, not
forced; TAMER doesn't aim to be functional for the sake of it, since
idiomatic Rust achieves many of its benefits without the formalisms.
But, the formalisms do occasionally help, and this is one such
example. There is other existing code that can be refactored to take
advantage of this style as well.
I do _not_ wish to pull an existing functional dependency into TAMER; I want
to keep these abstractions light, and eliminate them as necessary, as Rust
continues to integrate new features into its core. I also want to be able
to modify the abstractions to suit our particular needs. (This is _not_ a
general recommendation; it's particular to TAMER and to my experience.)
This implementation of `Functor` is one such example. While it is modeled
after Haskell in that it provides `fmap`, the primitive here is instead
`map`, with `fmap` derived from it, since `map` allows for better use of
Rust idioms. Furthermore, it's polymorphic over _trait_ type parameters,
not method, allowing for separate trait impls for different container types,
which can in turn be inferred by Rust and allow for some very concise
mapping; this is particularly important for TAMER because of the disciplined
use of newtypes.
For example, `foo.overwrite(span)` and `foo.overwrite(name)` are both
self-documenting, and better alternatives than, say, `foo.map_span(|_|
span)` and `foo.map_symbol(|_| name)`; the latter are perfectly clear in
what they do, but lack a layer of abstraction, and are verbose. But the
clarity of the _new_ form does rely on either good naming conventions of
arguments, or explicit type annotations using turbofish notation if
necessary.
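A minimal sketch of the idea follows. It is an approximation for
illustration, not the trait as committed: `Ident`, `Span`, and `Name` are
hypothetical stand-ins for TAMER's newtypes, the trait shape is an
assumption, and the curried `fmap` is omitted for brevity. What it shows is
that, because the trait (rather than the method) carries the type
parameters, a single type can have one impl per inner type, and Rust picks
the impl from the argument type.

```rust
// Hypothetical newtypes standing in for TAMER's own.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span(pub u32);

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Name(pub &'static str);

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Ident {
    pub name: Name,
    pub span: Span,
}

/// Sketch of a functor that is polymorphic over _trait_ type parameters.
pub trait Functor<T, U = T>: Sized {
    type Target;

    /// The primitive: apply `f` to the inner `T`.
    fn map(self, f: impl FnOnce(T) -> U) -> Self::Target;

    /// Derived from `map`: replace the inner value, discarding the old one.
    fn overwrite(self, value: U) -> Self::Target {
        self.map(|_| value)
    }
}

// One impl per inner type; the two do not overlap.
impl Functor<Span> for Ident {
    type Target = Self;

    fn map(self, f: impl FnOnce(Span) -> Span) -> Self::Target {
        Ident { span: f(self.span), ..self }
    }
}

impl Functor<Name> for Ident {
    type Target = Self;

    fn map(self, f: impl FnOnce(Name) -> Name) -> Self::Target {
        Ident { name: f(self.name), ..self }
    }
}
```

With these impls, `foo.overwrite(Span(5))` and `foo.overwrite(Name("bar"))`
each select the matching impl by inference alone; a closure passed to `map`
needs an annotated argument (e.g. `|s: Span| ...`) for the same to happen.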
This will be implemented on core Rust types as appropriate and as
possible. At the time of writing, we do not yet have trait specialization,
and there are too many soundness issues for me to be comfortable enabling it,
so that limits what we can do with something like, say, a generic `Result`,
while also allowing for specialized implementations based on newtypes.
DEV-13160
2023-01-04 12:30:18 -05:00
|
|
|
// Used for traits returning functions,
|
|
|
|
// such as those in `crate::f`.
|
|
|
|
// Our use of this feature is fairly basic;
|
|
|
|
// should it become too complex then we should re-evaluate what we ought
|
|
|
|
// to be doing relative to the status of this feature.
|
|
|
|
#![feature(return_position_impl_trait_in_trait)]
|
2023-03-07 12:41:47 -05:00
|
|
|
// Added for use with `rustfmt::skip`,
|
|
|
|
// so that we can ignore formatting more precisely.
|
|
|
|
#![feature(stmt_expr_attributes)]
|
2023-05-17 10:48:01 -04:00
|
|
|
// Allows using `impl Trait` for associated type bounds instead of having to
|
|
|
|
// extract it into a more verbose `where` clause.
|
|
|
|
// This is not necessary,
|
|
|
|
// and may not even be desirable,
|
|
|
|
// but it's a nice option to have if `impl` would otherwise be used.
|
|
|
|
#![feature(associated_type_bounds)]
|
2021-10-28 14:27:33 -04:00
|
|
|
// We build docs for private items.
|
2021-06-21 13:10:00 -04:00
|
|
|
#![allow(rustdoc::private_intra_doc_links)]
|
2022-08-10 16:33:46 -04:00
|
|
|
// For sym::prefill recursive macro `static_symbols!`.
|
tamer: Introduce NIR (accepting only)
This introduces NIR, but only as an accepting grammar; it doesn't yet emit
the NIR IR, beyond TODOs.
This modifies `tamec` to, while copying XIR, also attempt to lower NIR to
produce parser errors, if any. It does not yet fail compilation, as I just
want to be cautious and observe that everything's working properly for a
little while as people use it, before I potentially break builds.
This is the culmination of months of supporting effort. The NIR grammar is
derived from our existing TAME sources internally, which I use for now as a
test case until I introduce test cases directly into TAMER later on (I'd do
it now, if I hadn't spent so much time on this; I'll start introducing tests
as I begin emitting NIR tokens). This is capable of fully parsing our
largest system with >900 packages, as well as `core`.
`tamec`'s lowering is a mess; that'll be cleaned up in future commits. The
same can be said about `tameld`.
NIR's grammar has some initial documentation, but this will improve over
time as well.
The generated docs still need some improvement, too, especially with
generated identifiers; I just want to get this out here for testing.
DEV-7145
2022-08-29 15:28:03 -04:00
|
|
|
#![recursion_limit = "512"]
|
tamer: Integrate clippy
This invokes clippy as part of `make check` now, which I had previously
avoided doing (I'll elaborate on that below).
This commit represents the changes needed to resolve all the warnings
presented by clippy. Many changes have been made where I find the lints to
be useful and agreeable, but there are a number of lints, rationalized in
`src/lib.rs`, where I found the lints to be disagreeable. I have provided
rationale, primarily for those wondering why I desire to deviate from the
default lints, though it does feel backward to rationalize why certain lints
ought to be applied (the reverse should be true).
With that said, this did catch some legitimate issues, and it was also
helpful in getting some older code up-to-date with new language additions
that perhaps I used in new code but hadn't gone back and updated old code
for. My goal was to get clippy working without errors so that, in the
future, when others get into TAMER and are still getting used to Rust,
clippy is able to help guide them in the right direction.
One of the reasons I went without clippy for so long (though I admittedly
forgot I wasn't using it for a period of time) was because there were a
number of suggestions that I found disagreeable, and I didn't take the time
to go through them and determine what I wanted to follow. Furthermore, it
was hard to make that judgment when I was new to the language and lacked
the necessary experience to do so.
One thing I would like to comment further on is the use of `format!` with
`expect`, which is also what the diagnostic system convenience methods
do (which clippy does not cover). Because of all the work I've done trying
to understand Rust and looking at disassemblies and seeing what it
optimizes, I falsely assumed that Rust would convert such things into
conditionals in my otherwise-pure code...but apparently that's not the case,
when `format!` is involved.
I noticed that, after making the suggested fix with `get_ident`, Rust
proceeded to then inline it into each call site and then apply further
optimizations. It was also previously invoking the thread lock (for the
interner) unconditionally and invoking the `Display` implementation. That
is not at all what I intended for, despite knowing the eager semantics of
function calls in Rust.
Anyway, possibly more to come on that, I'm just tired of typing and need to
move on. I'll be returning to investigate further diagnostic messages soon.
2023-01-12 10:46:48 -05:00
|
|
|
//
|
|
|
|
// Clippy Lints
|
|
|
|
// ============
|
|
|
|
// This section contains rationale for deviating from standard lints.
|
|
|
|
// This reasoning applies to TAMER and may not be appropriate for other
|
|
|
|
// projects,
|
|
|
|
// or even other teams.
|
|
|
|
//
|
|
|
|
// These are presented in no particular order,
|
|
|
|
// but if you do rearrange them,
|
|
|
|
// be mindful of the comments that may reference preceding lints.
|
|
|
|
//
|
|
|
|
// Choosing not to inline format args sometimes adds to the clarity of the
|
|
|
|
// format string by emphasizing structure more concisely.
|
|
|
|
// Use your judgment.
|
|
|
|
#![allow(clippy::uninlined_format_args)]
|
|
|
|
// The rationale for this lint is that it may catch accidental semicolons,
|
|
|
|
// but the type system is plenty sufficient to catch unit types that were
|
|
|
|
// unintended.
|
|
|
|
#![allow(clippy::unit_arg)]
|
|
|
|
// Use your judgment;
|
|
|
|
// a `match` may be more clear within a given context.
|
|
|
|
// Or may simply be personal preference.
|
|
|
|
#![allow(clippy::single_match)]
|
|
|
|
// Same rationale as the previous,
|
|
|
|
// but additionally this clearly scopes pattern bindings to an inner
|
|
|
|
// block,
|
|
|
|
// which is not the case with a sibling `let` binding.
|
|
|
|
// This pattern was originally taken from `rustc` itself.
|
|
|
|
#![allow(clippy::match_single_binding)]
|
|
|
|
// This lint also seems to apply when dereferencing a double reference,
|
|
|
|
// for which the use of `cloned` would be far more confusing.
|
|
|
|
#![allow(clippy::map_clone)]
|
|
|
|
// Perhaps `is_empty` does not make sense for that particular trait/impl?
|
|
|
|
// We don't need a linter to guide these abstractions;
|
|
|
|
// an `is_empty` method will be added if it is needed and actually
|
|
|
|
// utilized.
|
|
|
|
#![allow(clippy::len_without_is_empty)]
|
|
|
|
// This is another case of a linter trying to guide abstractions.
|
|
|
|
// `Default` will be implemented if it both makes sense and is needed,
|
|
|
|
// not needlessly,
|
|
|
|
// as TAMER is not a library and its uses are statically known.
|
|
|
|
// Furthermore,
|
|
|
|
// `Default` is sometimes explicitly omitted to disallow automatic
|
|
|
|
// construction in various contexts.
|
|
|
|
#![allow(clippy::new_without_default)]
|
|
|
|
// When surrounding code uses `write!`,
|
|
|
|
// switching to `writeln!` for the last line adds an inconsistency that
|
|
|
|
// can make the code less clear,
|
|
|
|
// or possibly even introduce bugs by having the reader miss the change
|
|
|
|
// in pattern.
|
|
|
|
// `writeln!` also gives the impression that it's writing a line,
|
|
|
|
// when in actuality it may simply be appending to a partially-written
|
|
|
|
// line,
|
|
|
|
// making it feel like an inappropriate abstraction.
|
|
|
|
// Choose the abstraction that's most appropriate within a given context.
|
|
|
|
#![allow(clippy::write_with_newline)]
|
|
|
|
// Calling this "obfuscation" is hyperbole.
|
|
|
|
// Furthermore,
|
|
|
|
// `if` statements are expanded by `rustfmt` into something with a
|
|
|
|
// significantly larger footprint than this form,
|
|
|
|
// so this lint does _not_ suggest a suitable replacement.
|
|
|
|
#![allow(clippy::obfuscated_if_else)]
|
2023-02-07 14:59:36 -05:00
|
|
|
// Sometimes being explicit about lifetimes,
|
|
|
|
// even if it's unnecessary,
|
|
|
|
// can help a human to understand what bounds are in play,
|
|
|
|
// which are hidden when they're elided.
|
|
|
|
// Sometimes doing such a thing is a bad idea and introduces complexity.
|
|
|
|
// We need to use our judgment.
|
|
|
|
// Further,
|
|
|
|
// Clippy sometimes recommends eliding named bounds which does not
|
|
|
|
// compile,
|
|
|
|
// but then accepts introducing an anonymous lifetime bound (`'_`),
|
|
|
|
// which can be inscrutable if you are not very familiar with Rust's
|
|
|
|
// borrow checker.
|
|
|
|
#![allow(clippy::needless_lifetimes)]
|
2021-06-21 13:10:00 -04:00
|
|
|
|
2020-01-12 22:59:16 -05:00
|
|
|
pub mod global;
|
2020-03-24 14:14:05 -04:00
|
|
|
|
2021-07-29 14:26:40 -04:00
|
|
|
#[macro_use]
|
|
|
|
extern crate static_assertions;
|
2021-08-20 10:09:55 -04:00
|
|
|
|
2022-06-13 11:17:21 -04:00
|
|
|
#[macro_use]
|
|
|
|
pub mod xir;
|
|
|
|
|
2021-11-04 16:12:15 -04:00
|
|
|
pub mod asg;
|
2021-09-08 16:00:14 -04:00
|
|
|
pub mod convert;
|
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
2022-04-13 14:41:54 -04:00
|
|
|
pub mod diagnose;
|
2023-01-04 12:30:18 -05:00
|
|
|
pub mod f;
|
2022-06-10 16:28:15 -04:00
|
|
|
pub mod fmt;
|
2020-04-06 16:13:32 -04:00
|
|
|
pub mod fs;
|
2021-10-28 14:27:33 -04:00
|
|
|
pub mod iter;
|
2019-11-27 09:18:17 -05:00
|
|
|
pub mod ld;
|
2022-08-29 15:28:03 -04:00
|
|
|
pub mod nir;
|
2022-05-19 10:09:49 -04:00
|
|
|
pub mod num;
|
2020-01-09 10:55:55 -05:00
|
|
|
pub mod obj;
|
2022-03-18 16:24:53 -04:00
|
|
|
pub mod parse;
|
2021-08-13 14:59:25 -04:00
|
|
|
pub mod span;
|
tamer: Global interners
This is a major change, and I apologize for it all being in one commit. I
had wanted to break it up, but doing so would have required a significant
amount of temporary work that was not worth doing while I'm the only one
working on this project at the moment.
This accomplishes a number of important things, now that I'm preparing to
write the first compiler frontend for TAMER:
1. `Symbol` has been removed; `SymbolId` is used in its place.
2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer.
3. Using symbols no longer requires dereferencing.
4. **Lifetimes no longer pollute the entire system! (`'i`)**
5. Two global interners are offered to produce `SymbolStr` with `'static`
lifetimes, simplifying lifetime management and borrowing where strings
are still needed.
6. A nice API is provided for interning and lookups (e.g. "foo".intern())
which makes this look like a core feature of Rust.
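As a rough sketch of what items 2, 5, and 6 amount to (and not the actual
implementation), consider the following. `SymbolId` and the `.intern()`
call come from the description above; `Interner`, `GlobalIntern`, and
`lookup_str` are hypothetical names, only a single 32-bit interner is
shown, and strings are leaked to obtain `'static` where a real interner
would more likely allocate from an arena.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// A symbol is just an index; it carries no lifetime and is freely copied.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SymbolId(u32);

/// Hypothetical interner for this sketch.
#[derive(Default)]
struct Interner {
    ids: HashMap<&'static str, SymbolId>,
    strs: Vec<&'static str>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }

        // Simplification: leak the string so that it lives for `'static`.
        let stored: &'static str = Box::leak(s.to_owned().into_boxed_str());
        let id = SymbolId(self.strs.len() as u32);

        self.strs.push(stored);
        self.ids.insert(stored, id);
        id
    }

    fn lookup(&self, id: SymbolId) -> &'static str {
        self.strs[id.0 as usize]
    }
}

thread_local! {
    // One interner per thread, so no interner has to be passed around.
    static INTERNER: RefCell<Interner> = RefCell::new(Interner::default());
}

/// Hypothetical extension trait providing `"foo".intern()`.
pub trait GlobalIntern {
    fn intern(self) -> SymbolId;
}

impl GlobalIntern for &str {
    fn intern(self) -> SymbolId {
        INTERNER.with(|i| i.borrow_mut().intern(self))
    }
}

impl SymbolId {
    /// Resolve the symbol back into its string; needed only at the edges
    /// (output and diagnostics).
    pub fn lookup_str(self) -> &'static str {
        INTERNER.with(|i| i.borrow().lookup(self))
    }
}
```

Interning the same string twice yields the same `SymbolId`, and only code
that writes output or reports errors ever needs `lookup_str`.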
Unfortunately, making this change required modifications to...virtually
everything. And that serves to emphasize why this change was needed:
_everything_ used symbols, and so there's no use in not providing globals.
I implemented this in a way that still provides for loose coupling through
Rust's trait system. Indeed, Rustc offers a global interner, and I decided
not to go that route initially because it wasn't clear to me that such a
thing was desirable. It didn't become apparent to me, in fact, until the
recent commit where I introduced `SymbolIndexSize` and saw how many things
had to be touched; the linker evolved so rapidly as I was trying to learn
Rust that I lost track of how bad it got.
Further, this shows how the design of the internment system was a bit
naive---I assumed certain requirements that never panned out. In
particular, everything using symbols stored `&'i Symbol<'i>`---that is, a
reference (usize) to an object containing an index (32-bit) and a string
slice (128-bit). So it was a reference to a pretty large value, which was
allocated in the arena alongside the interned string itself.
But, that was assuming that something would need both the symbol index _and_
a readily available string. That's not the case. In fact, it's pretty
clear that interning happens at the beginning of execution, that `SymbolId`
is all that's needed during processing (unless an error occurs; more on that
below); and it's not until _the very end_ that we need to retrieve interned
strings from the pool to write either to a file or to display to the
user. It was horribly wasteful!
So `SymbolId` solves the lifetime issue in itself for most systems, but it
still requires that an interner be available for anything that needs to
create or resolve symbols, which, as it turns out, is still a lot of
things. Therefore, I decided to implement them as thread-local static
variables, which is very similar to what Rustc does itself (Rustc's are
scoped). TAMER does not use threads, so the resulting `'static` lifetime
should be just fine for now. Eventually I'd like to implement `!Send` and
`!Sync`, though, to prevent references from escaping the thread (as noted in
the patch); I can't do that yet, since the feature has not yet been
stabilized.
In the end, this leaves us with a system that's much easier to use and
maintain; hopefully easier for newcomers to get into without having to deal
with so many complex lifetimes; and a nice API that makes it a pleasure to
work with symbols.
Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I
end up regretting that down the line, but it exists for an important reason:
the `Span` and other structures that'll be introduced need to pack a lot of
data into 64 bits so they can be freely copied around to keep lifetimes
simple without wreaking havoc in other ways, but a 32-bit symbol size needed
by the linker is too large for that. (Actually, the linker doesn't yet need
32 bits for our systems, but it's going to in the somewhat near future
unless we optimize away a bunch of symbols...but I'd really rather not have
the linker hit a limit that requires a lot of code changes to resolve).
Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid
that for now. Most systems can just use one of the `PkgSymbolId` or
`ProgSymbolId` type aliases and not have to worry about it. Systems that
are actually shared between the compiler and the linker do have to worry
about it, but it's not like we don't already have a bunch of trait bounds.
Of course, as we implement link-time optimizations (LTO) in the future, it's
possible most things will need the size and I'll grow frustrated with that
and possibly revisit this. We shall see.
Anyway, this was exhausting...and...onward to the first frontend!
2021-08-02 23:54:37 -04:00
|
|
|
pub mod sym;
|
2020-02-13 11:50:18 -05:00
|
|
|
|
|
|
|
#[cfg(test)]
|
|
|
|
pub mod test;
|