tame/tamer/src/lib.rs

// TAME in Rust (TAMER)
//
// Copyright (C) 2014-2023 Ryan Specialty, LLC.
//
// This file is part of TAME.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.
//! An incremental rewrite of TAME in Rust.
//!
//! There are two entry points to this system:
//!
//! - [`tamec`](../tamec), the TAME compiler; and
//! - [`tameld`](../tameld), the TAME linker.
// Constant functions are still in their infancy as of the time of writing
// (October 2021).
// This feature is used by [`sym::prefill::st_as_sym`] to provide
// polymorphic symbol types despite Rust's lack of support for constant
// trait methods.
// See that function for more information.
#![feature(const_transmute_copy)]
// This is used to unwrap const Option results rather than providing
// panicking alternatives.
#![feature(const_option)]
// Trait aliases are convenient for reducing verbosity in situations where
// type aliases cannot be used.
// To remove this feature if it is not stabilized,
// simply replace each alias reference with its definition,
// or possibly write a trait with a `Self` bound.
#![feature(trait_alias)]
// Can be replaced with `assert!(matches!(...))`,
// but at a loss of a better error message.
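// For illustration only
// (hypothetical `res`, not taken from TAMER):
// `assert_matches!(res, Ok(_))` reports the `Debug` value of `res` on
// failure,
// whereas `assert!(matches!(res, Ok(_)))` reports only the stringified
// expression.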
#![feature(assert_matches)]
// Simplifies creating `Option` default values.
// To remove this feature,
// this can be done more verbosely in the usual way,
// or we can write our own version.
#![feature(option_get_or_insert_default)]
// For `Try` and `FromResidual`,
// allowing us to write our own `?`-compatible types.
#![feature(try_trait_v2)]
// Used primarily for convenience,
// rather than having to create type constructors as type aliases that are
// not associated with a trait.
// However,
// this also allows for the associated type default to be overridden by
// the implementer,
// in which case this feature's only substitute is a type parameter.
#![feature(associated_type_defaults)]
// Convenience features that are easily replaced if not stabilized.
#![feature(nonzero_ops)]
// Enabled for qualified paths in `matches!`.
#![feature(more_qualified_paths)]
// Collecting iterators into existing objects.
// Can be done manually in a more verbose way.
#![feature(iter_collect_into)]
// Concise and descriptive.
// Can be done manually in a more verbose way.
#![feature(str_split_remainder)]
// Concise and descriptive.
// Can be done manually in a more verbose way.
#![feature(iter_intersperse)]
// Used for const params like `&'static str` in `crate::fmt`.
// If this is not stabilized,
// then we can do without by changing the abstraction;
// this is largely experimentation to see if it's useful.
#![allow(incomplete_features)]
#![feature(adt_const_params)]
// Used for traits returning functions,
// such as those in `crate::f`.
// Our use of this feature is fairly basic;
// should it become too complex then we should re-evaluate what we ought
// to be doing relative to the status of this feature.
#![feature(return_position_impl_trait_in_trait)]
// Added for use with `rustfmt::skip`,
// so that we can ignore formatting more precisely.
#![feature(stmt_expr_attributes)]
// Allows using `impl Trait` for associated type bounds instead of having to
// extract it into a more verbose `where` clause.
// This is not necessary,
// and may not even be desirable,
// but it's a nice option to have if `impl` would otherwise be used.
#![feature(associated_type_bounds)]
// We build docs for private items.
#![allow(rustdoc::private_intra_doc_links)]
// For sym::prefill recursive macro `static_symbols!`.
#![recursion_limit = "512"]
//
// Clippy Lints
// ============
// This section contains rationale for deviating from standard lints.
// This reasoning applies to TAMER and may not be appropriate for other
// projects,
// or even other teams.
//
// These are presented in no particular order,
// but if you do rearrange them,
// be mindful of the comments that may reference preceding lints.
//
// Choosing not to inline format args sometimes adds to the clarity of the
// format string by emphasizing structure more concisely.
// Use your judgment.
#![allow(clippy::uninlined_format_args)]
// The rationale for this lint is that it may catch accidental semicolons,
// but the type system is plenty sufficient to catch unit types that were
// unintended.
#![allow(clippy::unit_arg)]
// Use your judgment;
// a `match` may be more clear within a given context.
// Or may simply be personal preference.
#![allow(clippy::single_match)]
// Same rationale as the previous,
// but additionally this clearly scopes pattern bindings to an inner
// block,
// which is not the case with a sibling `let` binding.
// This pattern was originally taken from `rustc` itself.
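// For illustration only
// (hypothetical names, not taken from TAMER):
// in `match compute() { x => consume(x) }`,
// `x` is in scope only within the arm,
// whereas with `let x = compute(); consume(x);`,
// `x` remains in scope for the rest of the enclosing block.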
#![allow(clippy::match_single_binding)]
// This lint also seems to apply when dereferencing a double reference,
// for which the use of `cloned` would be far more confusing.
#![allow(clippy::map_clone)]
// Perhaps `is_empty` does not make sense for that particular trait/impl?
// We don't need a linter to guide these abstractions;
// an `is_empty` method will be added if it is needed and actually
// utilized.
#![allow(clippy::len_without_is_empty)]
// This is another case of a linter trying to guide abstractions.
// `Default` will be implemented if it both makes sense and is needed,
// not needlessly,
// as TAMER is not a library and its uses are statically known.
// Furthermore,
// `Default` is sometimes explicitly omitted to disallow automatic
// construction in various contexts.
#![allow(clippy::new_without_default)]
// When surrounding code uses `write!`,
// switching to `writeln!` for the last line adds an inconsistency that
// can make the code less clear,
// or possibly even introduce bugs by having the reader miss the change
// in pattern.
// `writeln!` also gives the impression that it's writing a line,
// when in actuality it may simply be appending to a partially-written
// line,
// making it feel like an inappropriate abstraction.
// Choose the abstraction that's most appropriate within a given context.
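// For illustration only
// (hypothetical names, not taken from TAMER):
// the lint fires on `write!(f, "done\n")` and suggests
// `writeln!(f, "done")` in its place.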
#![allow(clippy::write_with_newline)]
// Calling this "obfuscation" is hyperbole.
// Furthermore,
// `if` statements are expanded by `rustfmt` into something with a
// significantly larger footprint than this form,
// so this lint does _not_ suggest a suitable replacement.
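// For illustration only
// (hypothetical names, not taken from TAMER):
// the lint fires on `cond.then_some(a).unwrap_or(b)` and suggests
// `if cond { a } else { b }` in its place.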
#![allow(clippy::obfuscated_if_else)]
// Sometimes being explicit about lifetimes,
// even if it's unnecessary,
// can help a human to understand what bounds are in play,
// which are hidden when they're elided.
// Sometimes doing such a thing is a bad idea and introduces complexity.
// We need to use our judgment.
// Further,
// Clippy sometimes recommends eliding named bounds in a way that does not
// compile,
// but then accepts introducing an anonymous lifetime bound (`'_`),
// which can be inscrutable if you are not very familiar with Rust's
// borrow checker.
#![allow(clippy::needless_lifetimes)]
pub mod global;
#[macro_use]
extern crate static_assertions;
#[macro_use]
pub mod xir;
pub mod asg;
pub mod convert;
pub mod diagnose;
pub mod f;
pub mod fmt;
pub mod fs;
pub mod iter;
pub mod ld;
pub mod nir;
pub mod num;
pub mod obj;
pub mod parse;
pub mod pipeline;
pub mod span;
pub mod sym;
#[cfg(test)]
pub mod test;