2019-11-27 09:18:17 -05:00
|
|
|
// Proof-of-concept TAME linker
|
|
|
|
//
|
2021-07-22 15:00:15 -04:00
|
|
|
// Copyright (C) 2014-2021 Ryan Specialty Group, LLC.
|
2020-03-06 11:05:18 -05:00
|
|
|
//
|
|
|
|
// This file is part of TAME.
|
2019-11-27 09:18:17 -05:00
|
|
|
//
|
|
|
|
// This program is free software: you can redistribute it and/or modify
|
|
|
|
// it under the terms of the GNU General Public License as published by
|
|
|
|
// the Free Software Foundation, either version 3 of the License, or
|
|
|
|
// (at your option) any later version.
|
|
|
|
//
|
|
|
|
// This program is distributed in the hope that it will be useful,
|
|
|
|
// but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
// GNU General Public License for more details.
|
|
|
|
//
|
|
|
|
// You should have received a copy of the GNU General Public License
|
|
|
|
// along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
|
2021-10-11 23:54:24 -04:00
|
|
|
//! **This contains the remaining portions of the proof-of-concept linker.**
|
|
|
|
//! It is feature-complete and just needs final refactoring.
|
2019-11-27 09:18:17 -05:00
|
|
|
|
2021-10-12 09:42:09 -04:00
|
|
|
use super::xmle::{
|
|
|
|
lower::{sort, SortError},
|
|
|
|
xir::lower_iter,
|
2021-10-14 12:37:32 -04:00
|
|
|
XmleSections,
|
2020-04-06 22:07:39 -04:00
|
|
|
};
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
use crate::{
|
|
|
|
asg::{Asg, DefaultAsg, IdentObject},
|
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
2022-04-13 14:41:54 -04:00
|
|
|
diagnose::{AnnotatedSpan, Diagnostic},
|
2021-10-14 12:37:32 -04:00
|
|
|
fs::{
|
|
|
|
Filesystem, FsCanonicalizer, PathFile, VisitOnceFile,
|
|
|
|
VisitOnceFilesystem,
|
|
|
|
},
|
2022-04-11 16:08:50 -04:00
|
|
|
iter::into_iter_while_ok,
|
2021-10-14 12:37:32 -04:00
|
|
|
ld::xmle::Sections,
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
obj::xmlo::{
|
|
|
|
AsgBuilder, AsgBuilderError, AsgBuilderState, XmloError, XmloReader,
|
|
|
|
},
|
|
|
|
parse::ParseError,
|
2022-04-11 16:08:50 -04:00
|
|
|
parse::{ParseState, Parsed},
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
sym::{GlobalSymbolIntern, GlobalSymbolResolve, SymbolId},
|
2022-04-11 16:08:50 -04:00
|
|
|
xir::reader::XmlXirReader,
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
xir::{
|
|
|
|
flat::{self, Object as XirfToken, StateError as XirfError},
|
|
|
|
writer::{Error as XirWriterError, XmlWriter},
|
|
|
|
DefaultEscaper, Error as XirError, Escaper, Token as XirToken,
|
|
|
|
},
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
};
|
2020-04-07 10:50:57 -04:00
|
|
|
use fxhash::FxBuildHasher;
|
2020-04-30 14:33:10 -04:00
|
|
|
use petgraph_graphml::GraphMl;
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
use std::{
|
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
2022-04-13 14:41:54 -04:00
|
|
|
error::Error,
|
|
|
|
fmt::{self, Display},
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
fs,
|
|
|
|
io::{self, BufReader, BufWriter, Write},
|
|
|
|
path::{Path, PathBuf},
|
|
|
|
};
|
2019-11-27 09:18:17 -05:00
|
|
|
|
2022-01-14 10:21:49 -05:00
|
|
|
type LinkerAsg = DefaultAsg<IdentObject>;
|
|
|
|
type LinkerAsgBuilderState = AsgBuilderState<FxBuildHasher>;
|
2020-03-06 12:44:22 -05:00
|
|
|
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
pub fn xmle(package_path: &str, output: &str) -> Result<(), TameldError> {
|
2020-04-06 16:13:32 -04:00
|
|
|
let mut fs = VisitOnceFilesystem::new();
|
2020-01-12 22:59:16 -05:00
|
|
|
let mut depgraph = LinkerAsg::with_capacity(65536, 65536);
|
2021-11-12 16:07:57 -05:00
|
|
|
let escaper = DefaultEscaper::default();
|
2019-11-27 09:18:17 -05:00
|
|
|
|
2020-04-07 10:43:54 -04:00
|
|
|
let state = load_xmlo(
|
2020-04-07 10:49:21 -04:00
|
|
|
package_path,
|
2020-04-06 16:13:32 -04:00
|
|
|
&mut fs,
|
2019-12-01 01:17:37 -05:00
|
|
|
&mut depgraph,
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
&escaper,
|
2020-04-28 02:06:41 -04:00
|
|
|
AsgBuilderState::new(),
|
2020-04-07 10:43:54 -04:00
|
|
|
)?;
|
|
|
|
|
|
|
|
let AsgBuilderState {
|
|
|
|
mut roots,
|
|
|
|
name,
|
|
|
|
relroot,
|
|
|
|
found: _,
|
|
|
|
} = state;
|
2019-12-01 01:17:37 -05:00
|
|
|
|
2020-01-12 22:59:16 -05:00
|
|
|
roots.extend(
|
|
|
|
vec!["___yield", "___worksheet"]
|
|
|
|
.iter()
|
tamer: Global interners
This is a major change, and I apologize for it all being in one commit. I
had wanted to break it up, but doing so would have required a significant
amount of temporary work that was not worth doing while I'm the only one
working on this project at the moment.
This accomplishes a number of important things, now that I'm preparing to
write the first compiler frontend for TAMER:
1. `Symbol` has been removed; `SymbolId` is used in its place.
2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer.
3. Using symbols no longer requires dereferencing.
4. **Lifetimes no longer pollute the entire system! (`'i`)**
5. Two global interners are offered to produce `SymbolStr` with `'static`
lifetimes, simplfiying lifetime management and borrowing where strings
are still needed.
6. A nice API is provided for interning and lookups (e.g. "foo".intern())
which makes this look like a core feature of Rust.
Unfortunately, making this change required modifications to...virtually
everything. And that serves to emphasize why this change was needed:
_everything_ used symbols, and so there's no use in not providing globals.
I implemented this in a way that still provides for loose coupling through
Rust's trait system. Indeed, Rustc offers a global interner, and I decided
not to go that route initially because it wasn't clear to me that such a
thing was desirable. It didn't become apparent to me, in fact, until the
recent commit where I introduced `SymbolIndexSize` and saw how many things
had to be touched; the linker evolved so rapidly as I was trying to learn
Rust that I lost track of how bad it got.
Further, this shows how the design of the internment system was a bit
naive---I assumed certain requirements that never panned out. In
particular, everything using symbols stored `&'i Symbol<'i>`---that is, a
reference (usize) to an object containing an index (32-bit) and a string
slice (128-bit). So it was a reference to a pretty large value, which was
allocated in the arena alongside the interned string itself.
But, that was assuming that something would need both the symbol index _and_
a readily available string. That's not the case. In fact, it's pretty
clear that interning happens at the beginning of execution, that `SymbolId`
is all that's needed during processing (unless an error occurs; more on that
below); and it's not until _the very end_ that we need to retrieve interned
strings from the pool to write either to a file or to display to the
user. It was horribly wasteful!
So `SymbolId` solves the lifetime issue in itself for most systems, but it
still requires that an interner be available for anything that needs to
create or resolve symbols, which, as it turns out, is still a lot of
things. Therefore, I decided to implement them as thread-local static
variables, which is very similar to what Rustc does itself (Rustc's are
scoped). TAMER does not use threads, so the resulting `'static` lifetime
should be just fine for now. Eventually I'd like to implement `!Send` and
`!Sync`, though, to prevent references from escaping the thread (as noted in
the patch); I can't do that yet, since the feature has not yet been
stabalized.
In the end, this leaves us with a system that's much easier to use and
maintain; hopefully easier for newcomers to get into without having to deal
with so many complex lifetimes; and a nice API that makes it a pleasure to
work with symbols.
Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I
end up regretting that down the line, but it exists for an important reason:
the `Span` and other structures that'll be introduced need to pack a lot of
data into 64 bits so they can be freely copied around to keep lifetimes
simple without wreaking havoc in other ways, but a 32-bit symbol size needed
by the linker is too large for that. (Actually, the linker doesn't yet need
32 bits for our systems, but it's going to in the somewhat near future
unless we optimize away a bunch of symbols...but I'd really rather not have
the linker hit a limit that requires a lot of code changes to resolve).
Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid
that for now. Most systems can just use on of the `PkgSymbolId` or
`ProgSymbolId` type aliases and not have to worry about it. Systems that
are actually shared between the compiler and the linker do, though, but it's
not like we don't already have a bunch of trait bounds.
Of course, as we implement link-time optimizations (LTO) in the future, it's
possible most things will need the size and I'll grow frustrated with that
and possibly revisit this. We shall see.
Anyway, this was exhausting...and...onward to the first frontend!
2021-08-02 23:54:37 -04:00
|
|
|
.map(|name| name.intern())
|
2020-01-12 22:59:16 -05:00
|
|
|
.filter_map(|sym| depgraph.lookup(sym)),
|
|
|
|
);
|
|
|
|
|
2021-10-14 12:37:32 -04:00
|
|
|
let sorted = match sort(&depgraph, &roots, Sections::new()) {
|
2020-03-25 10:20:25 -04:00
|
|
|
Ok(sections) => sections,
|
2021-10-12 09:42:09 -04:00
|
|
|
Err(SortError::Cycles(cycles)) => {
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
return Err(TameldError::CycleError(
|
|
|
|
cycles
|
|
|
|
.into_iter()
|
|
|
|
.map(|cycle| {
|
|
|
|
let mut path: Vec<SymbolId> = cycle
|
|
|
|
.into_iter()
|
|
|
|
.map(|obj| depgraph.get(obj).unwrap().name())
|
|
|
|
.collect();
|
|
|
|
|
|
|
|
path.reverse();
|
|
|
|
path.push(path[0].clone());
|
|
|
|
path
|
|
|
|
})
|
|
|
|
.collect(),
|
|
|
|
))
|
2020-03-25 10:20:25 -04:00
|
|
|
}
|
|
|
|
Err(e) => return Err(e.into()),
|
|
|
|
};
|
2019-11-27 09:18:17 -05:00
|
|
|
|
2020-01-14 16:26:36 -05:00
|
|
|
output_xmle(
|
2021-10-14 12:37:32 -04:00
|
|
|
sorted,
|
2020-01-14 16:26:36 -05:00
|
|
|
name.expect("missing root package name"),
|
|
|
|
relroot.expect("missing root package relroot"),
|
2020-03-04 15:31:20 -05:00
|
|
|
output,
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
&escaper,
|
2020-01-14 16:26:36 -05:00
|
|
|
)?;
|
2019-11-27 09:18:17 -05:00
|
|
|
|
|
|
|
Ok(())
|
|
|
|
}
|
|
|
|
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
pub fn graphml(package_path: &str, output: &str) -> Result<(), TameldError> {
|
2020-04-30 14:33:10 -04:00
|
|
|
let mut fs = VisitOnceFilesystem::new();
|
|
|
|
let mut depgraph = LinkerAsg::with_capacity(65536, 65536);
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
let escaper = DefaultEscaper::default();
|
2020-04-30 14:33:10 -04:00
|
|
|
|
|
|
|
let _ = load_xmlo(
|
|
|
|
package_path,
|
|
|
|
&mut fs,
|
|
|
|
&mut depgraph,
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
&escaper,
|
2020-04-30 14:33:10 -04:00
|
|
|
AsgBuilderState::new(),
|
|
|
|
)?;
|
|
|
|
|
|
|
|
// if we move away from petgraph, we will need to abstract this away
|
|
|
|
let g = depgraph.into_inner();
|
|
|
|
let graphml =
|
|
|
|
GraphMl::new(&g)
|
|
|
|
.pretty_print(true)
|
|
|
|
.export_node_weights(Box::new(|node| {
|
|
|
|
let (name, kind, generated) = match node {
|
|
|
|
Some(n) => {
|
|
|
|
let generated = match n.src() {
|
|
|
|
Some(src) => src.generated,
|
|
|
|
None => false,
|
|
|
|
};
|
|
|
|
|
|
|
|
(
|
2021-10-14 14:36:44 -04:00
|
|
|
n.name().lookup_str().into(),
|
2021-09-29 23:18:23 -04:00
|
|
|
n.kind().unwrap().as_sym(),
|
2020-04-30 14:33:10 -04:00
|
|
|
format!("{}", generated),
|
|
|
|
)
|
|
|
|
}
|
|
|
|
None => (
|
|
|
|
String::from("missing"),
|
2021-09-29 23:18:23 -04:00
|
|
|
"missing".into(),
|
2020-04-30 14:33:10 -04:00
|
|
|
format!("{}", false),
|
|
|
|
),
|
|
|
|
};
|
|
|
|
|
|
|
|
vec![
|
|
|
|
("label".into(), name.into()),
|
2021-10-11 12:58:48 -04:00
|
|
|
("kind".into(), kind.lookup_str().into()),
|
2020-04-30 14:33:10 -04:00
|
|
|
("generated".into(), generated.into()),
|
|
|
|
]
|
|
|
|
}));
|
|
|
|
|
|
|
|
fs::write(output, graphml.to_string())?;
|
|
|
|
|
|
|
|
Ok(())
|
|
|
|
}
|
|
|
|
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
fn load_xmlo<'a, P: AsRef<Path>, S: Escaper>(
|
2020-04-07 11:40:19 -04:00
|
|
|
path_str: P,
|
|
|
|
fs: &mut VisitOnceFilesystem<FsCanonicalizer, FxBuildHasher>,
|
tamer: Global interners
This is a major change, and I apologize for it all being in one commit. I
had wanted to break it up, but doing so would have required a significant
amount of temporary work that was not worth doing while I'm the only one
working on this project at the moment.
This accomplishes a number of important things, now that I'm preparing to
write the first compiler frontend for TAMER:
1. `Symbol` has been removed; `SymbolId` is used in its place.
2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer.
3. Using symbols no longer requires dereferencing.
4. **Lifetimes no longer pollute the entire system! (`'i`)**
5. Two global interners are offered to produce `SymbolStr` with `'static`
lifetimes, simplfiying lifetime management and borrowing where strings
are still needed.
6. A nice API is provided for interning and lookups (e.g. "foo".intern())
which makes this look like a core feature of Rust.
Unfortunately, making this change required modifications to...virtually
everything. And that serves to emphasize why this change was needed:
_everything_ used symbols, and so there's no use in not providing globals.
I implemented this in a way that still provides for loose coupling through
Rust's trait system. Indeed, Rustc offers a global interner, and I decided
not to go that route initially because it wasn't clear to me that such a
thing was desirable. It didn't become apparent to me, in fact, until the
recent commit where I introduced `SymbolIndexSize` and saw how many things
had to be touched; the linker evolved so rapidly as I was trying to learn
Rust that I lost track of how bad it got.
Further, this shows how the design of the internment system was a bit
naive---I assumed certain requirements that never panned out. In
particular, everything using symbols stored `&'i Symbol<'i>`---that is, a
reference (usize) to an object containing an index (32-bit) and a string
slice (128-bit). So it was a reference to a pretty large value, which was
allocated in the arena alongside the interned string itself.
But, that was assuming that something would need both the symbol index _and_
a readily available string. That's not the case. In fact, it's pretty
clear that interning happens at the beginning of execution, that `SymbolId`
is all that's needed during processing (unless an error occurs; more on that
below); and it's not until _the very end_ that we need to retrieve interned
strings from the pool to write either to a file or to display to the
user. It was horribly wasteful!
So `SymbolId` solves the lifetime issue in itself for most systems, but it
still requires that an interner be available for anything that needs to
create or resolve symbols, which, as it turns out, is still a lot of
things. Therefore, I decided to implement them as thread-local static
variables, which is very similar to what Rustc does itself (Rustc's are
scoped). TAMER does not use threads, so the resulting `'static` lifetime
should be just fine for now. Eventually I'd like to implement `!Send` and
`!Sync`, though, to prevent references from escaping the thread (as noted in
the patch); I can't do that yet, since the feature has not yet been
stabalized.
In the end, this leaves us with a system that's much easier to use and
maintain; hopefully easier for newcomers to get into without having to deal
with so many complex lifetimes; and a nice API that makes it a pleasure to
work with symbols.
Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I
end up regretting that down the line, but it exists for an important reason:
the `Span` and other structures that'll be introduced need to pack a lot of
data into 64 bits so they can be freely copied around to keep lifetimes
simple without wreaking havoc in other ways, but a 32-bit symbol size needed
by the linker is too large for that. (Actually, the linker doesn't yet need
32 bits for our systems, but it's going to in the somewhat near future
unless we optimize away a bunch of symbols...but I'd really rather not have
the linker hit a limit that requires a lot of code changes to resolve).
Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid
that for now. Most systems can just use on of the `PkgSymbolId` or
`ProgSymbolId` type aliases and not have to worry about it. Systems that
are actually shared between the compiler and the linker do, though, but it's
not like we don't already have a bunch of trait bounds.
Of course, as we implement link-time optimizations (LTO) in the future, it's
possible most things will need the size and I'll grow frustrated with that
and possibly revisit this. We shall see.
Anyway, this was exhausting...and...onward to the first frontend!
2021-08-02 23:54:37 -04:00
|
|
|
depgraph: &mut LinkerAsg,
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
escaper: &S,
|
tamer: Global interners
This is a major change, and I apologize for it all being in one commit. I
had wanted to break it up, but doing so would have required a significant
amount of temporary work that was not worth doing while I'm the only one
working on this project at the moment.
This accomplishes a number of important things, now that I'm preparing to
write the first compiler frontend for TAMER:
1. `Symbol` has been removed; `SymbolId` is used in its place.
2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer.
3. Using symbols no longer requires dereferencing.
4. **Lifetimes no longer pollute the entire system! (`'i`)**
5. Two global interners are offered to produce `SymbolStr` with `'static`
lifetimes, simplfiying lifetime management and borrowing where strings
are still needed.
6. A nice API is provided for interning and lookups (e.g. "foo".intern())
which makes this look like a core feature of Rust.
Unfortunately, making this change required modifications to...virtually
everything. And that serves to emphasize why this change was needed:
_everything_ used symbols, and so there's no use in not providing globals.
I implemented this in a way that still provides for loose coupling through
Rust's trait system. Indeed, Rustc offers a global interner, and I decided
not to go that route initially because it wasn't clear to me that such a
thing was desirable. It didn't become apparent to me, in fact, until the
recent commit where I introduced `SymbolIndexSize` and saw how many things
had to be touched; the linker evolved so rapidly as I was trying to learn
Rust that I lost track of how bad it got.
Further, this shows how the design of the internment system was a bit
naive---I assumed certain requirements that never panned out. In
particular, everything using symbols stored `&'i Symbol<'i>`---that is, a
reference (usize) to an object containing an index (32-bit) and a string
slice (128-bit). So it was a reference to a pretty large value, which was
allocated in the arena alongside the interned string itself.
But, that was assuming that something would need both the symbol index _and_
a readily available string. That's not the case. In fact, it's pretty
clear that interning happens at the beginning of execution, that `SymbolId`
is all that's needed during processing (unless an error occurs; more on that
below); and it's not until _the very end_ that we need to retrieve interned
strings from the pool to write either to a file or to display to the
user. It was horribly wasteful!
So `SymbolId` solves the lifetime issue in itself for most systems, but it
still requires that an interner be available for anything that needs to
create or resolve symbols, which, as it turns out, is still a lot of
things. Therefore, I decided to implement them as thread-local static
variables, which is very similar to what Rustc does itself (Rustc's are
scoped). TAMER does not use threads, so the resulting `'static` lifetime
should be just fine for now. Eventually I'd like to implement `!Send` and
`!Sync`, though, to prevent references from escaping the thread (as noted in
the patch); I can't do that yet, since the feature has not yet been
stabalized.
In the end, this leaves us with a system that's much easier to use and
maintain; hopefully easier for newcomers to get into without having to deal
with so many complex lifetimes; and a nice API that makes it a pleasure to
work with symbols.
Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I
end up regretting that down the line, but it exists for an important reason:
the `Span` and other structures that'll be introduced need to pack a lot of
data into 64 bits so they can be freely copied around to keep lifetimes
simple without wreaking havoc in other ways, but a 32-bit symbol size needed
by the linker is too large for that. (Actually, the linker doesn't yet need
32 bits for our systems, but it's going to in the somewhat near future
unless we optimize away a bunch of symbols...but I'd really rather not have
the linker hit a limit that requires a lot of code changes to resolve).
Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid
that for now. Most systems can just use on of the `PkgSymbolId` or
`ProgSymbolId` type aliases and not have to worry about it. Systems that
are actually shared between the compiler and the linker do, though, but it's
not like we don't already have a bunch of trait bounds.
Of course, as we implement link-time optimizations (LTO) in the future, it's
possible most things will need the size and I'll grow frustrated with that
and possibly revisit this. We shall see.
Anyway, this was exhausting...and...onward to the first frontend!
2021-08-02 23:54:37 -04:00
|
|
|
state: LinkerAsgBuilderState,
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
) -> Result<LinkerAsgBuilderState, TameldError> {
|
2022-04-08 16:16:23 -04:00
|
|
|
let PathFile(path, file, ctx): PathFile<BufReader<fs::File>> =
|
|
|
|
match fs.open(path_str)? {
|
|
|
|
VisitOnceFile::FirstVisit(file) => file,
|
|
|
|
VisitOnceFile::Visited => return Ok(state),
|
|
|
|
};
|
2020-04-06 22:07:39 -04:00
|
|
|
|
2022-04-11 16:08:50 -04:00
|
|
|
// TODO: This entire block is a WIP and will be incrementally
|
|
|
|
// abstracted away.
|
|
|
|
let mut state =
|
|
|
|
into_iter_while_ok(XmlXirReader::new(file, escaper, ctx), |toks| {
|
|
|
|
flat::State::<64>::parse(toks).lower_while_ok::<XmloReader, _>(
|
|
|
|
|xirf| {
|
|
|
|
into_iter_while_ok(xirf, |xmlo_out| {
|
|
|
|
// TODO: Transitionary---we do not want to filter.
|
|
|
|
depgraph.import_xmlo(
|
|
|
|
xmlo_out.filter_map(|parsed| match parsed {
|
|
|
|
Parsed::Incomplete => None,
|
|
|
|
Parsed::Object(obj) => Some(Ok(obj)),
|
|
|
|
}),
|
|
|
|
state,
|
|
|
|
)
|
|
|
|
})
|
2022-04-08 16:16:23 -04:00
|
|
|
},
|
2022-04-11 16:08:50 -04:00
|
|
|
)
|
|
|
|
})????;
|
2020-01-12 22:59:16 -05:00
|
|
|
|
2020-04-06 22:07:39 -04:00
|
|
|
let mut dir: PathBuf = path.clone();
|
2019-11-27 09:18:17 -05:00
|
|
|
dir.pop();
|
|
|
|
|
2020-04-07 10:43:54 -04:00
|
|
|
let found = state.found.take().unwrap_or_default();
|
|
|
|
|
2019-11-27 09:18:17 -05:00
|
|
|
for relpath in found.iter() {
|
|
|
|
let mut path_buf = dir.clone();
|
tamer: Global interners
This is a major change, and I apologize for it all being in one commit. I
had wanted to break it up, but doing so would have required a significant
amount of temporary work that was not worth doing while I'm the only one
working on this project at the moment.
This accomplishes a number of important things, now that I'm preparing to
write the first compiler frontend for TAMER:
1. `Symbol` has been removed; `SymbolId` is used in its place.
2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer.
3. Using symbols no longer requires dereferencing.
4. **Lifetimes no longer pollute the entire system! (`'i`)**
5. Two global interners are offered to produce `SymbolStr` with `'static`
lifetimes, simplfiying lifetime management and borrowing where strings
are still needed.
6. A nice API is provided for interning and lookups (e.g. "foo".intern())
which makes this look like a core feature of Rust.
Unfortunately, making this change required modifications to...virtually
everything. And that serves to emphasize why this change was needed:
_everything_ used symbols, and so there's no use in not providing globals.
I implemented this in a way that still provides for loose coupling through
Rust's trait system. Indeed, Rustc offers a global interner, and I decided
not to go that route initially because it wasn't clear to me that such a
thing was desirable. It didn't become apparent to me, in fact, until the
recent commit where I introduced `SymbolIndexSize` and saw how many things
had to be touched; the linker evolved so rapidly as I was trying to learn
Rust that I lost track of how bad it got.
Further, this shows how the design of the internment system was a bit
naive---I assumed certain requirements that never panned out. In
particular, everything using symbols stored `&'i Symbol<'i>`---that is, a
reference (usize) to an object containing an index (32-bit) and a string
slice (128-bit). So it was a reference to a pretty large value, which was
allocated in the arena alongside the interned string itself.
But, that was assuming that something would need both the symbol index _and_
a readily available string. That's not the case. In fact, it's pretty
clear that interning happens at the beginning of execution, that `SymbolId`
is all that's needed during processing (unless an error occurs; more on that
below); and it's not until _the very end_ that we need to retrieve interned
strings from the pool to write either to a file or to display to the
user. It was horribly wasteful!
So `SymbolId` solves the lifetime issue in itself for most systems, but it
still requires that an interner be available for anything that needs to
create or resolve symbols, which, as it turns out, is still a lot of
things. Therefore, I decided to implement them as thread-local static
variables, which is very similar to what Rustc does itself (Rustc's are
scoped). TAMER does not use threads, so the resulting `'static` lifetime
should be just fine for now. Eventually I'd like to implement `!Send` and
`!Sync`, though, to prevent references from escaping the thread (as noted in
the patch); I can't do that yet, since the feature has not yet been
stabalized.
In the end, this leaves us with a system that's much easier to use and
maintain; hopefully easier for newcomers to get into without having to deal
with so many complex lifetimes; and a nice API that makes it a pleasure to
work with symbols.
Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I
end up regretting that down the line, but it exists for an important reason:
the `Span` and other structures that'll be introduced need to pack a lot of
data into 64 bits so they can be freely copied around to keep lifetimes
simple without wreaking havoc in other ways, but a 32-bit symbol size needed
by the linker is too large for that. (Actually, the linker doesn't yet need
32 bits for our systems, but it's going to in the somewhat near future
unless we optimize away a bunch of symbols...but I'd really rather not have
the linker hit a limit that requires a lot of code changes to resolve).
Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid
that for now. Most systems can just use on of the `PkgSymbolId` or
`ProgSymbolId` type aliases and not have to worry about it. Systems that
are actually shared between the compiler and the linker do, though, but it's
not like we don't already have a bunch of trait bounds.
Of course, as we implement link-time optimizations (LTO) in the future, it's
possible most things will need the size and I'll grow frustrated with that
and possibly revisit this. We shall see.
Anyway, this was exhausting...and...onward to the first frontend!
2021-08-02 23:54:37 -04:00
|
|
|
let str: &str = &relpath.lookup_str();
|
|
|
|
path_buf.push(str);
|
2019-11-27 09:18:17 -05:00
|
|
|
path_buf.set_extension("xmlo");
|
|
|
|
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
state = load_xmlo(path_buf, fs, depgraph, escaper, state)?;
|
2019-11-27 09:18:17 -05:00
|
|
|
}
|
|
|
|
|
2020-04-07 10:43:54 -04:00
|
|
|
Ok(state)
|
2019-11-27 09:18:17 -05:00
|
|
|
}
|
|
|
|
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
fn output_xmle<'a, X: XmleSections<'a>, S: Escaper>(
|
|
|
|
sorted: X,
|
2021-09-23 14:52:53 -04:00
|
|
|
name: SymbolId,
|
2021-10-11 16:00:19 -04:00
|
|
|
relroot: SymbolId,
|
2020-03-04 15:31:20 -05:00
|
|
|
output: &str,
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
escaper: &S,
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
) -> Result<(), TameldError> {
|
2020-03-04 15:31:20 -05:00
|
|
|
let file = fs::File::create(output)?;
|
2021-10-08 21:57:51 -04:00
|
|
|
let mut buf = BufWriter::new(file);
|
2021-09-28 14:52:31 -04:00
|
|
|
|
tamer: xir::escape: Remove XirString in favor of Escaper
This rewrites a good portion of the previous commit.
Rather than explicitly storing whether a given string has been escaped, we
can instead assume that all SymbolIds leaving or entering XIR are unescaped,
because there is no reason for any other part of the system to deal with
such details of XML documents.
Given that, we need only unescape on read and escape on write. This is
customary, so why didn't I do that to begin with?
The previous commit outlines the reason, mainly being an optimization for
the echo writer that is upcoming. However, this solution will end up being
better---it's not implemented yet, but we can have a caching layer, such
that the Escaper records a mapping between escaped and unescaped SymbolIds
to avoid work the next time around. If we share the Escaper between _all_
readers and the writer, the result is that
1. Duplicate strings between source files and object files (many of which
are read by both the linker and compiler) avoid re-unescaping; and
2. Writers can use this cache to avoid re-escaping when we've already seen
the escaped variant of the string during read.
The alternative would be a global cache, like the internment system, but I
did not find that to be appropriate here, since this is far less
fundamental and is much easier to compose.
DEV-11081
2021-11-12 13:59:14 -05:00
|
|
|
lower_iter(sorted, name, relroot).write(
|
|
|
|
&mut buf,
|
|
|
|
Default::default(),
|
|
|
|
escaper,
|
|
|
|
)?;
|
2021-10-08 21:57:51 -04:00
|
|
|
buf.flush()?;
|
2020-01-13 16:41:06 -05:00
|
|
|
|
|
|
|
Ok(())
|
2019-11-27 09:18:17 -05:00
|
|
|
}
|
|
|
|
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
// TODO: This, like everything else here, needs a home.
|
|
|
|
// TODO: Better encapsulation for `*ParseError` types.
|
|
|
|
/// Linker (`tameld`) error.
|
|
|
|
///
|
|
|
|
/// This represents the aggregation of all possible errors that can occur
|
|
|
|
/// during link-time.
|
|
|
|
/// This cannot include panics,
|
|
|
|
/// but efforts have been made to reduce panics to situations that
|
|
|
|
/// represent the equivalent of assertions.
|
|
|
|
#[derive(Debug)]
|
|
|
|
pub enum TameldError {
|
|
|
|
Io(io::Error),
|
|
|
|
SortError(SortError),
|
|
|
|
XirError(XirError),
|
|
|
|
XirfParseError(ParseError<XirToken, XirfError>),
|
|
|
|
XmloParseError(ParseError<XirfToken, XmloError>),
|
|
|
|
AsgBuilderError(AsgBuilderError),
|
|
|
|
XirWriterError(XirWriterError),
|
|
|
|
CycleError(Vec<Vec<SymbolId>>),
|
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
2022-04-13 14:41:54 -04:00
|
|
|
Fmt(fmt::Error),
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
impl From<io::Error> for TameldError {
|
|
|
|
fn from(e: io::Error) -> Self {
|
|
|
|
Self::Io(e)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
impl From<SortError> for TameldError {
|
|
|
|
fn from(e: SortError) -> Self {
|
|
|
|
Self::SortError(e)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
impl From<XirError> for TameldError {
|
|
|
|
fn from(e: XirError) -> Self {
|
|
|
|
Self::XirError(e)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
impl From<ParseError<XirfToken, XmloError>> for TameldError {
|
|
|
|
fn from(e: ParseError<XirfToken, XmloError>) -> Self {
|
|
|
|
Self::XmloParseError(e)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
impl From<ParseError<XirToken, XirfError>> for TameldError {
|
|
|
|
fn from(e: ParseError<XirToken, XirfError>) -> Self {
|
|
|
|
Self::XirfParseError(e)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
impl From<AsgBuilderError> for TameldError {
|
|
|
|
fn from(e: AsgBuilderError) -> Self {
|
|
|
|
Self::AsgBuilderError(e)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
impl From<XirWriterError> for TameldError {
|
|
|
|
fn from(e: XirWriterError) -> Self {
|
|
|
|
Self::XirWriterError(e)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
2022-04-13 14:41:54 -04:00
|
|
|
impl From<fmt::Error> for TameldError {
|
|
|
|
fn from(e: fmt::Error) -> Self {
|
|
|
|
Self::Fmt(e)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
impl Display for TameldError {
|
|
|
|
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
|
|
|
match self {
|
|
|
|
Self::Io(e) => Display::fmt(e, f),
|
|
|
|
Self::SortError(e) => Display::fmt(e, f),
|
|
|
|
Self::XirError(e) => Display::fmt(e, f),
|
|
|
|
Self::XirfParseError(e) => Display::fmt(e, f),
|
|
|
|
Self::XmloParseError(e) => Display::fmt(e, f),
|
|
|
|
Self::AsgBuilderError(e) => Display::fmt(e, f),
|
|
|
|
Self::XirWriterError(e) => Display::fmt(e, f),
|
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
2022-04-13 14:41:54 -04:00
|
|
|
Self::CycleError(cycles) => {
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
for cycle in cycles {
|
|
|
|
writeln!(
|
|
|
|
f,
|
|
|
|
"cycle: {}",
|
|
|
|
cycle
|
|
|
|
.iter()
|
|
|
|
.map(|sym| sym.lookup_str())
|
|
|
|
.collect::<Vec<&str>>()
|
|
|
|
.join(" -> ")
|
|
|
|
)?;
|
|
|
|
}
|
|
|
|
|
|
|
|
Ok(())
|
|
|
|
}
|
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
2022-04-13 14:41:54 -04:00
|
|
|
Self::Fmt(e) => Display::fmt(e, f),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
impl Error for TameldError {
|
|
|
|
fn source(&self) -> Option<&(dyn Error + 'static)> {
|
|
|
|
match self {
|
|
|
|
Self::Io(e) => Some(e),
|
|
|
|
Self::SortError(e) => Some(e),
|
|
|
|
Self::XirError(e) => Some(e),
|
|
|
|
Self::XirfParseError(e) => Some(e),
|
|
|
|
Self::XmloParseError(e) => Some(e),
|
|
|
|
Self::AsgBuilderError(e) => Some(e),
|
|
|
|
Self::XirWriterError(e) => Some(e),
|
|
|
|
Self::CycleError(..) => None,
|
|
|
|
Self::Fmt(e) => Some(e),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
impl Diagnostic for TameldError {
|
|
|
|
fn describe(&self) -> Vec<AnnotatedSpan> {
|
|
|
|
match self {
|
|
|
|
Self::XirError(e) => e.describe(),
|
|
|
|
Self::XirfParseError(e) => e.describe(),
|
|
|
|
Self::XmloParseError(e) => e.describe(),
|
|
|
|
|
|
|
|
// TODO (will fall back to rendering just the error `Display`)
|
|
|
|
_ => vec![],
|
tamer: tameld (TameldError): Error sum type
This aggregates all non-panic errors that can occur during link time, making
`Box<dyn Error>` unnecessary. I've been wanting to do this for a long time,
so it's nice seeing this come together. This is a powerful tool, in that we
know, at compile time, all errors that can occur, and properly report on
them and compose them. This method of error composition ensures that all
errors have a chance to be handled within their context, though it'll take
time to do so in a decent way.
This just maintains compatibility with the dynamic dispatch that was
previous occurring. This work is being done to introduce the initial
diagnostic system, which was really difficult/confusing to do without proper
errors types at the top level, considering the toplevel is responsible for
triggering the diagnostic reporting.
The cycle error is in particular going to be interesting once the system is
in place, especially once it provides spans in the future, since it will
guide the user through the code to understand how the cycle formed.
More to come.
DEV-10935
2022-04-11 15:15:04 -04:00
|
|
|
}
|
|
|
|
}
|
2019-11-27 09:18:17 -05:00
|
|
|
}
|