tamer: Global interners

This is a major change, and I apologize for it all being in one commit.  I
had wanted to break it up, but doing so would have required a significant
amount of temporary work that was not worth doing while I'm the only one
working on this project at the moment.

This accomplishes a number of important things, now that I'm preparing to
write the first compiler frontend for TAMER:

  1. `Symbol` has been removed; `SymbolId` is used in its place.
  2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer.
  3. Using symbols no longer requires dereferencing.
  4. **Lifetimes no longer pollute the entire system! (`'i`)**
  5. Two global interners are offered to produce `SymbolStr` with `'static`
     lifetimes, simplifying lifetime management and borrowing where strings
     are still needed.
  6. A nice API is provided for interning and lookups (e.g. `"foo".intern()`)
     which makes this look like a core feature of Rust.

Unfortunately, making this change required modifications to...virtually
everything.  And that serves to emphasize why this change was needed:
_everything_ used symbols, and so there's no use in not providing globals.

I implemented this in a way that still provides for loose coupling through
Rust's trait system.  Indeed, Rustc offers a global interner, and I decided
not to go that route initially because it wasn't clear to me that such a
thing was desirable.  It didn't become apparent to me, in fact, until the
recent commit where I introduced `SymbolIndexSize` and saw how many things
had to be touched; the linker evolved so rapidly as I was trying to learn
Rust that I lost track of how bad it got.

Further, this shows how the design of the internment system was a bit
naive---I assumed certain requirements that never panned out.  In
particular, everything using symbols stored `&'i Symbol<'i>`---that is, a
reference (usize) to an object containing an index (32-bit) and a string
slice (128-bit).  So it was a reference to a pretty large value, which was
allocated in the arena alongside the interned string itself.
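
To make the size difference concrete, here is a minimal sketch.  `OldSymbol`
and this `SymbolId` are hypothetical stand-ins written for illustration only;
they are not the types from the patch, and the field names are assumptions.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the removed `Symbol<'i>`:
/// an index plus a string slice (a fat pointer: address + length).
#[allow(dead_code)]
pub struct OldSymbol<'i> {
    pub index: u32,
    pub text: &'i str,
}

/// Hypothetical stand-in for `SymbolId`: nothing but the index itself.
#[allow(dead_code)]
#[derive(Clone, Copy)]
pub struct SymbolId<Ix>(pub Ix);

/// Size of what used to be passed around: a reference to the symbol.
pub fn old_handle_size() -> usize {
    size_of::<&OldSymbol<'static>>()
}

/// Size of the arena-allocated object behind that reference.
pub fn old_pointee_size() -> usize {
    size_of::<OldSymbol<'static>>()
}

/// Size of the identifier that replaces both.
pub fn new_size<Ix>() -> usize {
    size_of::<SymbolId<Ix>>()
}
```

Under these assumptions the old handle is pointer-sized and points at an
object holding the index and the slice, whereas the identifier is only as
wide as its index type and can be copied without a dereference.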

But, that was assuming that something would need both the symbol index _and_
a readily available string.  That's not the case.  In fact, it's pretty
clear that interning happens at the beginning of execution, that `SymbolId`
is all that's needed during processing (unless an error occurs; more on that
below); and it's not until _the very end_ that we need to retrieve interned
strings from the pool to write either to a file or to display to the
user.  It was horribly wasteful!

So `SymbolId` solves the lifetime issue in itself for most systems, but it
still requires that an interner be available for anything that needs to
create or resolve symbols, which, as it turns out, is still a lot of
things.  Therefore, I decided to implement them as thread-local static
variables, which is very similar to what Rustc does itself (Rustc's are
scoped).  TAMER does not use threads, so the resulting `'static` lifetime
should be just fine for now.  Eventually I'd like to implement `!Send` and
`!Sync`, though, to prevent references from escaping the thread (as noted in
the patch); I can't do that yet, since the feature has not yet been
stabilized.
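
The thread-local approach can be sketched roughly as follows.  This is not the
TAMER implementation: the traits `InternStr` and `ResolveSym` are hypothetical
names, the identifier is fixed at 32 bits rather than generic, and strings are
leaked with `Box::leak` to obtain `'static` references instead of being
allocated in an arena.  Only the method names `intern` and `lookup_str` mirror
the calls that appear in the patch.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical fixed-width symbol identifier (the real one is generic).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SymbolId(pub u32);

/// Simplified interner: maps strings to ids and ids back to strings.
#[derive(Default)]
struct Interner {
    ids: HashMap<&'static str, SymbolId>,
    strs: Vec<&'static str>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.ids.get(s) {
            return id; // already interned
        }
        // Assumption: leak the copy so it lives for the rest of the program.
        let stored: &'static str = Box::leak(s.to_owned().into_boxed_str());
        // The cast assumes fewer than 2^32 symbols; no overflow check here.
        let id = SymbolId(self.strs.len() as u32);
        self.strs.push(stored);
        self.ids.insert(stored, id);
        id
    }

    fn lookup(&self, id: SymbolId) -> &'static str {
        self.strs[id.0 as usize]
    }
}

thread_local! {
    // One interner per thread; callers never pass it around explicitly.
    static INTERNER: RefCell<Interner> = RefCell::new(Interner::default());
}

/// Hypothetical extension trait providing `"foo".intern()`.
pub trait InternStr {
    fn intern(self) -> SymbolId;
}

impl InternStr for &str {
    fn intern(self) -> SymbolId {
        INTERNER.with(|i| i.borrow_mut().intern(self))
    }
}

/// Hypothetical extension trait providing `id.lookup_str()`.
pub trait ResolveSym {
    fn lookup_str(self) -> &'static str;
}

impl ResolveSym for SymbolId {
    fn lookup_str(self) -> &'static str {
        INTERNER.with(|i| i.borrow().lookup(self))
    }
}
```

With both traits in scope, `"foo".intern()` yields a small `Copy` value that
needs no lifetime parameter, and `lookup_str()` is only called at the edges
where a string is actually written out or shown to the user.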

In the end, this leaves us with a system that's much easier to use and
maintain; hopefully easier for newcomers to get into without having to deal
with so many complex lifetimes; and a nice API that makes it a pleasure to
work with symbols.

Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I
end up regretting that down the line, but it exists for an important reason:
the `Span` and other structures that'll be introduced need to pack a lot of
data into 64 bits so they can be freely copied around to keep lifetimes
simple without wreaking havoc in other ways, but a 32-bit symbol size needed
by the linker is too large for that.  (Actually, the linker doesn't yet need
32 bits for our systems, but it's going to in the somewhat near future
unless we optimize away a bunch of symbols...but I'd really rather not have
the linker hit a limit that requires a lot of code changes to resolve).
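
As a rough illustration of the packing constraint, consider the sketch below.
The `Span` layout shown is purely an assumption made for this example (the
real structure is not part of this commit); the field names, their widths,
and the use of `#[repr(C)]` to fix the layout are all hypothetical.

```rust
/// Hypothetical span layout; field names and widths are assumptions.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span<Ix> {
    /// Byte offset into the source.
    pub offset: u32,
    /// Length of the spanned region in bytes.
    pub len: u16,
    /// Symbol index identifying the context (e.g. a path).
    pub ctx: Ix,
}

/// Size in bytes of a span using the given symbol index type.
pub fn span_bytes<Ix>() -> usize {
    std::mem::size_of::<Span<Ix>>()
}
```

With a 16-bit index this layout occupies exactly 8 bytes, while a 32-bit
index pushes it to 12 bytes once alignment padding is included, which is the
kind of overflow past 64 bits described above.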

Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid
that for now.  Most systems can just use one of the `PkgSymbolId` or
`ProgSymbolId` type aliases and not have to worry about it.  Systems that
are actually shared between the compiler and the linker do, though, but it's
not like we don't already have a bunch of trait bounds.
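
A minimal sketch of how the aliases and the trait bound might divide the
work.  The definitions below are simplified assumptions, not the ones from
the patch: in particular, mapping `PkgSymbolId` to 16 bits and `ProgSymbolId`
to 32 bits is inferred from the description above, and `as_usize`,
`describe_pkg` and `slot_of` are hypothetical helpers.

```rust
use std::fmt::Debug;

/// Simplified stand-in for the index-size trait.
pub trait SymbolIndexSize: Copy + Eq + Debug {
    fn as_usize(self) -> usize;
}

impl SymbolIndexSize for u16 {
    fn as_usize(self) -> usize {
        self as usize
    }
}

impl SymbolIndexSize for u32 {
    fn as_usize(self) -> usize {
        self as usize
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SymbolId<Ix: SymbolIndexSize>(pub Ix);

/// Assumed: package-level (compiler) symbols use the narrow index.
pub type PkgSymbolId = SymbolId<u16>;
/// Assumed: program-level (linker) symbols use the wide index.
pub type ProgSymbolId = SymbolId<u32>;

/// A compiler-only system names the alias and never mentions `Ix`.
pub fn describe_pkg(id: PkgSymbolId) -> String {
    format!("pkg symbol #{}", id.0)
}

/// A system shared by compiler and linker stays generic over the size.
pub fn slot_of<Ix: SymbolIndexSize>(id: SymbolId<Ix>) -> usize {
    id.0.as_usize()
}
```

Only code like `slot_of` carries the extra bound; everything that lives
entirely on one side can use the alias and ignore the index size.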

Of course, as we implement link-time optimizations (LTO) in the future, it's
possible most things will need the size and I'll grow frustrated with that
and possibly revisit this.  We shall see.

Anyway, this was exhausting...and...onward to the first frontend!
Mike Gerwitz 2021-08-02 23:54:37 -04:00
parent 71011f5724
commit 9deb393bfd
20 changed files with 1407 additions and 1383 deletions

@ -37,38 +37,28 @@ mod base {
use tamer::ir::asg::{
Asg, DataType, DefaultAsg, IdentKind, IdentObject, SortableAsg, Source,
};
use tamer::sym::{DefaultInterner, Interner, Symbol, SymbolIndexSize};
use tamer::sym::{GlobalSymbolIntern, SymbolId, SymbolIndexSize};
type Sut<'i> = DefaultAsg<
'i,
IdentObject<'i, global::PkgSymSize>,
global::PkgIdentSize,
>;
type SutProg<'i> = DefaultAsg<
'i,
IdentObject<'i, global::ProgSymSize>,
global::ProgIdentSize,
>;
type Sut =
DefaultAsg<IdentObject<global::PkgSymSize>, global::PkgIdentSize>;
type SutProg<'i> =
DefaultAsg<IdentObject<global::ProgSymSize>, global::ProgIdentSize>;
fn interned_n<'i, Ix: SymbolIndexSize>(
interner: &'i DefaultInterner<'i, Ix>,
n: u16,
) -> Vec<&'i Symbol<'i, Ix>>
fn interned_n<Ix: SymbolIndexSize>(n: u16) -> Vec<SymbolId<Ix>>
where
<Ix as TryFrom<usize>>::Error: Debug,
{
(0..n).map(|i| interner.intern(&i.to_string())).collect()
(0..n).map(|i| i.to_string().intern()).collect()
}
#[bench]
fn declare_1_000(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
bench.iter(|| {
xs.iter()
.map(|i| sut.declare(i, IdentKind::Meta, Source::default()))
.map(|i| sut.declare(*i, IdentKind::Meta, Source::default()))
.for_each(drop);
});
}
@ -76,12 +66,11 @@ mod base {
#[bench]
fn declare_1_000_full_inital_capacity(bench: &mut Bencher) {
let mut sut = Sut::with_capacity(1024, 1024);
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
bench.iter(|| {
xs.iter()
.map(|i| sut.declare(i, IdentKind::Meta, Source::default()))
.map(|i| sut.declare(*i, IdentKind::Meta, Source::default()))
.for_each(drop);
});
}
@ -90,12 +79,11 @@ mod base {
#[bench]
fn declare_1_000_prog_ident_size(bench: &mut Bencher) {
let mut sut = SutProg::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
bench.iter(|| {
xs.iter()
.map(|i| sut.declare(i, IdentKind::Meta, Source::default()))
.map(|i| sut.declare(*i, IdentKind::Meta, Source::default()))
.for_each(drop);
});
}
@ -103,13 +91,12 @@ mod base {
#[bench]
fn declare_extern_1_000(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
bench.iter(|| {
xs.iter()
.map(|i| {
sut.declare_extern(i, IdentKind::Meta, Source::default())
sut.declare_extern(*i, IdentKind::Meta, Source::default())
})
.for_each(drop);
});
@ -118,17 +105,19 @@ mod base {
#[bench]
fn resolve_extern_1_000(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
xs.iter().for_each(|sym| {
let _ = sut.declare_extern(sym, IdentKind::Meta, Source::default());
let _ =
sut.declare_extern(*sym, IdentKind::Meta, Source::default());
});
// Bench only the resolution, not initial declare.
bench.iter(|| {
xs.iter()
.map(|sym| sut.declare(sym, IdentKind::Meta, Source::default()))
.map(|sym| {
sut.declare(*sym, IdentKind::Meta, Source::default())
})
.for_each(drop);
});
}
@ -139,13 +128,12 @@ mod base {
#[bench]
fn set_fragment_1_000_with_new_str(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
let orefs = xs
.iter()
.map(|sym| {
sut.declare(sym, IdentKind::Meta, Source::default())
sut.declare(*sym, IdentKind::Meta, Source::default())
.unwrap()
})
.collect::<Vec<_>>();
@ -162,28 +150,28 @@ mod base {
#[bench]
fn lookup_1_000(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
xs.iter().for_each(|sym| {
let _ = sut.declare(&sym, IdentKind::Meta, Source::default());
let _ = sut.declare(*sym, IdentKind::Meta, Source::default());
});
bench.iter(|| {
xs.iter().map(|sym| sut.lookup(sym).unwrap()).for_each(drop);
xs.iter()
.map(|sym| sut.lookup(*sym).unwrap())
.for_each(drop);
});
}
#[bench]
fn get_1_000(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
let orefs = xs
.iter()
.map(|sym| {
sut.declare(sym, IdentKind::Meta, Source::default())
sut.declare(*sym, IdentKind::Meta, Source::default())
.unwrap()
})
.collect::<Vec<_>>();
@ -201,13 +189,12 @@ mod base {
#[bench]
fn add_dep_1_000_to_single_node(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
let orefs = xs
.iter()
.map(|sym| {
sut.declare(sym, IdentKind::Meta, Source::default())
sut.declare(*sym, IdentKind::Meta, Source::default())
.unwrap()
})
.collect::<Vec<_>>();
@ -227,13 +214,12 @@ mod base {
#[bench]
fn add_dep_1_000_one_edge_per_node(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
let orefs = xs
.iter()
.map(|sym| {
sut.declare(sym, IdentKind::Meta, Source::default())
sut.declare(*sym, IdentKind::Meta, Source::default())
.unwrap()
})
.collect::<Vec<_>>();
@ -250,13 +236,12 @@ mod base {
#[bench]
fn has_dep_1_000_single_node(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
let orefs = xs
.iter()
.map(|sym| {
sut.declare(sym, IdentKind::Meta, Source::default())
sut.declare(*sym, IdentKind::Meta, Source::default())
.unwrap()
})
.collect::<Vec<_>>();
@ -279,13 +264,12 @@ mod base {
#[bench]
fn has_dep_1_000_one_edge_per_node(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
let orefs = xs
.iter()
.map(|sym| {
sut.declare(sym, IdentKind::Meta, Source::default())
sut.declare(*sym, IdentKind::Meta, Source::default())
.unwrap()
})
.collect::<Vec<_>>();
@ -308,13 +292,12 @@ mod base {
#[bench]
fn add_dep_lookup_1_000_missing_one_edge_per_node(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
bench.iter(|| {
xs.iter()
.zip(xs.iter().cycle().skip(1))
.map(|(from, to)| sut.add_dep_lookup(from, to))
.map(|(from, to)| sut.add_dep_lookup(*from, *to))
.for_each(drop);
});
}
@ -322,17 +305,16 @@ mod base {
#[bench]
fn add_dep_lookup_1_000_existing_one_edge_per_node(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
xs.iter().for_each(|sym| {
let _ = sut.declare(sym, IdentKind::Meta, Source::default());
let _ = sut.declare(*sym, IdentKind::Meta, Source::default());
});
bench.iter(|| {
xs.iter()
.zip(xs.iter().cycle().skip(1))
.map(|(from, to)| sut.add_dep_lookup(from, to))
.map(|(from, to)| sut.add_dep_lookup(*from, *to))
.for_each(drop);
});
}
@ -340,14 +322,13 @@ mod base {
#[bench]
fn sort_1_with_1_000_existing_supernode(bench: &mut Bencher) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
let orefs = xs
.iter()
.map(|sym| {
sut.declare(
sym,
*sym,
IdentKind::Rate(DataType::Integer),
Source::default(),
)
@ -372,14 +353,13 @@ mod base {
bench: &mut Bencher,
) {
let mut sut = Sut::new();
let interner = DefaultInterner::new();
let xs = interned_n(&interner, 1_000);
let xs = interned_n(1_000);
let orefs = xs
.iter()
.map(|sym| {
sut.declare(
sym,
*sym,
IdentKind::Rate(DataType::Integer),
Source::default(),
)
@ -414,29 +394,27 @@ mod object {
use tamer::ir::asg::{
IdentKind, IdentObject, IdentObjectData, IdentObjectState, Source,
};
use tamer::sym::{DefaultInterner, Interner};
use tamer::sym::GlobalSymbolIntern;
type Sut<'i> = IdentObject<'i, global::ProgSymSize>;
type Sut = IdentObject<global::ProgSymSize>;
#[bench]
fn declare_1_000(bench: &mut Bencher) {
let interner = DefaultInterner::new();
let sym = interner.intern("sym");
let sym = "sym".intern();
bench.iter(|| {
(0..1000).map(|_| Sut::declare(&sym)).for_each(drop);
(0..1000).map(|_| Sut::declare(sym)).for_each(drop);
});
}
#[bench]
fn resolve_1_000_missing(bench: &mut Bencher) {
let interner = DefaultInterner::new();
let sym = interner.intern("sym");
let sym = "sym".intern();
bench.iter(|| {
(0..1000)
.map(|_| {
Sut::declare(&sym)
Sut::declare(sym)
.resolve(IdentKind::Meta, Source::default())
})
.for_each(drop);
@ -445,13 +423,12 @@ mod object {
#[bench]
fn extern_1_000_missing(bench: &mut Bencher) {
let interner = DefaultInterner::new();
let sym = interner.intern("sym");
let sym = "sym".intern();
bench.iter(|| {
(0..1000)
.map(|_| {
Sut::declare(&sym)
Sut::declare(sym)
.extern_(IdentKind::Meta, Source::default())
})
.for_each(drop);
@ -460,13 +437,12 @@ mod object {
#[bench]
fn resolve_1_000_extern(bench: &mut Bencher) {
let interner = DefaultInterner::new();
let sym = interner.intern("sym");
let sym = "sym".intern();
bench.iter(|| {
(0..1000)
.map(|_| {
Sut::declare(&sym)
Sut::declare(sym)
.extern_(IdentKind::Meta, Source::default())
.unwrap()
.resolve(IdentKind::Meta, Source::default())
@ -477,13 +453,12 @@ mod object {
#[bench]
fn resolve_1_000_override(bench: &mut Bencher) {
let interner = DefaultInterner::new();
let sym = interner.intern("sym");
let sym = "sym".intern();
bench.iter(|| {
(0..1000)
.map(|_| {
Sut::declare(&sym)
Sut::declare(sym)
.resolve(
IdentKind::Meta,
Source {
@ -507,13 +482,12 @@ mod object {
// Override encountered before virtual
#[bench]
fn resolve_1_000_override_virt_after_override(bench: &mut Bencher) {
let interner = DefaultInterner::new();
let sym = interner.intern("sym");
let sym = "sym".intern();
bench.iter(|| {
(0..1000)
.map(|_| {
Sut::declare(&sym)
Sut::declare(sym)
.resolve(
IdentKind::Meta,
Source {
@ -536,13 +510,12 @@ mod object {
#[bench]
fn set_fragment_1_000_resolved_with_new_str(bench: &mut Bencher) {
let interner = DefaultInterner::new();
let sym = interner.intern("sym");
let sym = "sym".intern();
bench.iter(|| {
(0..1000)
.map(|_| {
Sut::declare(&sym)
Sut::declare(sym)
.resolve(IdentKind::Meta, Source::default())
.unwrap()
.set_fragment("".into())
@ -554,11 +527,10 @@ mod object {
// No need to do all of the others, since they're all the same thing.
#[bench]
fn declared_name_1_000(bench: &mut Bencher) {
let interner = DefaultInterner::new();
let sym = interner.intern("sym");
let sym = "sym".intern();
bench.iter(|| {
(0..1000).map(|_| Sut::declare(&sym).name()).for_each(drop);
(0..1000).map(|_| Sut::declare(sym).name()).for_each(drop);
});
}
}

@ -112,10 +112,7 @@ mod interner {
let sut = ArenaInterner::<RandomState, u32>::new();
let strs = gen_strs(1000);
let syms = strs
.iter()
.map(|s| sut.intern(s).index())
.collect::<Vec<_>>();
let syms = strs.iter().map(|s| sut.intern(s)).collect::<Vec<_>>();
bench.iter(|| {
syms.iter().map(|si| sut.index_lookup(*si)).for_each(drop);
@ -188,4 +185,41 @@ mod interner {
});
}
}
// Note that these tests don't drop the global interner in-between.
mod global {
use super::*;
use tamer::sym::GlobalSymbolIntern;
#[bench]
fn with_all_new_1000(bench: &mut Bencher) {
let strs = gen_strs(1000);
bench.iter(|| {
strs.iter()
.map::<ProgSymbolId, _>(|s| s.intern())
.for_each(drop);
});
}
#[bench]
fn with_one_new_1000(bench: &mut Bencher) {
bench.iter(|| {
(0..1000)
.map::<ProgSymbolId, _>(|_| "onenew".intern())
.for_each(drop);
});
}
#[bench]
fn with_one_new_1000_utf8_unchecked(bench: &mut Bencher) {
bench.iter(|| {
(0..1000)
.map::<ProgSymbolId, _>(|_| unsafe {
(b"onenewu8").intern_utf8_unchecked()
})
.for_each(drop);
});
}
}
}

File diff suppressed because it is too large

@ -25,7 +25,7 @@ use super::object::{
UnresolvedError,
};
use super::Sections;
use crate::sym::{Symbol, SymbolIndexSize};
use crate::sym::{SymbolId, SymbolIndexSize};
use petgraph::graph::NodeIndex;
use std::fmt::Debug;
use std::result::Result;
@ -46,10 +46,10 @@ impl<T: petgraph::graph::IndexType> IndexType for T {}
///
/// For more information,
/// see the [module-level documentation][self].
pub trait Asg<'i, O, Ix>
pub trait Asg<O, Ix>
where
Ix: IndexType + SymbolIndexSize,
O: IdentObjectState<'i, Ix, O>,
O: IdentObjectState<Ix, O>,
{
/// Declare a concrete identifier.
///
@ -59,7 +59,7 @@ where
/// Once declared,
/// this information cannot be changed.
///
/// Identifiers are uniquely identified by a [`Symbol`] `name`.
/// Identifiers are uniquely identified by a [`SymbolId`] `name`.
/// If an identifier of the same `name` already exists,
/// then the provided declaration is compared against the existing
/// declaration---should
@ -84,9 +84,9 @@ where
/// and return an [`ObjectRef`] reference.
fn declare(
&mut self,
name: &'i Symbol<'i, Ix>,
name: SymbolId<Ix>,
kind: IdentKind,
src: Source<'i, Ix>,
src: Source<Ix>,
) -> AsgResult<ObjectRef<Ix>>;
/// Declare an abstract identifier.
@ -111,9 +111,9 @@ where
/// compatibility related to extern resolution.
fn declare_extern(
&mut self,
name: &'i Symbol<'i, Ix>,
name: SymbolId<Ix>,
kind: IdentKind,
src: Source<'i, Ix>,
src: Source<Ix>,
) -> AsgResult<ObjectRef<Ix>>;
/// Set the fragment associated with a concrete identifier.
@ -142,7 +142,7 @@ where
/// this method cannot be used to retrieve all possible objects on the
/// graph---for
/// that, see [`Asg::get`].
fn lookup(&self, name: &'i Symbol<'i, Ix>) -> Option<ObjectRef<Ix>>;
fn lookup(&self, name: SymbolId<Ix>) -> Option<ObjectRef<Ix>>;
/// Declare that `dep` is a dependency of `ident`.
///
@ -151,7 +151,7 @@ where
/// The [linker][crate::ld] will ensure this ordering.
///
/// See [`add_dep_lookup`][Asg::add_dep_lookup] if identifiers have to
/// be looked up by [`Symbol`] or if they may not yet have been
/// be looked up by [`SymbolId`] or if they may not yet have been
/// declared.
fn add_dep(&mut self, ident: ObjectRef<Ix>, dep: ObjectRef<Ix>);
@ -173,8 +173,8 @@ where
/// References to both identifiers are returned in argument order.
fn add_dep_lookup(
&mut self,
ident: &'i Symbol<'i, Ix>,
dep: &'i Symbol<'i, Ix>,
ident: SymbolId<Ix>,
dep: SymbolId<Ix>,
) -> (ObjectRef<Ix>, ObjectRef<Ix>);
}
@ -182,9 +182,9 @@ where
///
/// Allow a graph to be partitioned into different [`Sections`] that can be
/// used as an `Intermediate Representation`.
pub trait SortableAsg<'i, O, Ix>
pub trait SortableAsg<O, Ix>
where
O: IdentObjectData<'i, Ix>,
O: IdentObjectData<Ix>,
Ix: IndexType + SymbolIndexSize,
{
/// Sort graph into [`Sections`].
@ -192,7 +192,7 @@ where
/// Sorting will fail if the graph contains unresolved objects,
/// or identifiers whose kind cannot be determined
/// (see [`UnresolvedError`]).
fn sort(
fn sort<'i>(
&'i self,
roots: &[ObjectRef<Ix>],
) -> SortableAsgResult<Sections<'i, O>, Ix>;
@ -246,11 +246,6 @@ pub type AsgEdge = ();
pub type Node<O> = Option<O>;
/// An error from an ASG operation.
///
/// Storing [`Symbol`] would require that this have a lifetime,
/// which is very inconvenient when chaining [`Result`],
/// so this stores only owned values.
/// The caller will know the problem values.
#[derive(Debug, PartialEq)]
pub enum AsgError {
/// An object could not change state in the manner requested.

@ -209,7 +209,7 @@ impl std::fmt::Display for IdentKind {
}
}
impl<'i, Ix> TryFrom<SymAttrs<'i, Ix>> for IdentKind
impl<Ix> TryFrom<SymAttrs<Ix>> for IdentKind
where
Ix: SymbolIndexSize,
{
@ -219,12 +219,12 @@ where
///
/// Certain [`IdentKind`] require that certain attributes be present,
/// otherwise the conversion will fail.
fn try_from(attrs: SymAttrs<'i, Ix>) -> Result<Self, Self::Error> {
fn try_from(attrs: SymAttrs<Ix>) -> Result<Self, Self::Error> {
Self::try_from(&attrs)
}
}
impl<'i, Ix> TryFrom<&SymAttrs<'i, Ix>> for IdentKind
impl<Ix> TryFrom<&SymAttrs<Ix>> for IdentKind
where
Ix: SymbolIndexSize,
{
@ -234,7 +234,7 @@ where
///
/// Certain [`IdentKind`] require that certain attributes be present,
/// otherwise the conversion will fail.
fn try_from(attrs: &SymAttrs<'i, Ix>) -> Result<Self, Self::Error> {
fn try_from(attrs: &SymAttrs<Ix>) -> Result<Self, Self::Error> {
let ty = attrs.ty.as_ref().ok_or(Self::Error::MissingType)?;
macro_rules! ident {
@ -360,7 +360,7 @@ mod test {
use super::*;
use std::convert::TryInto;
type Ix = u8;
type Ix = u16;
#[test]
fn dim_from_u8() {

@ -208,4 +208,4 @@ pub use object::{
pub use section::{Section, SectionIterator, Sections};
/// Default concrete ASG implementation.
pub type DefaultAsg<'i, O, Ix> = base::BaseAsg<O, Ix>;
pub type DefaultAsg<O, Ix> = base::BaseAsg<O, Ix>;

File diff suppressed because it is too large

@ -154,13 +154,9 @@ impl<'a, T> Sections<'a, T> {
mod test {
use super::*;
use crate::ir::asg::IdentObject;
use crate::sym::{Symbol, SymbolId};
use crate::sym::GlobalSymbolIntern;
lazy_static! {
static ref SYM: Symbol<'static, u16> = symbol_dummy!(1, "sym");
}
type Sut<'a, 'i> = Section<'a, IdentObject<'i, u16>>;
type Sut<'a, 'i> = Section<'a, IdentObject<u16>>;
#[test]
fn section_empty() {
@ -174,7 +170,7 @@ mod test {
#[test]
fn section_head() {
let mut section = Sut::new();
let obj = IdentObject::Missing(&SYM);
let obj = IdentObject::Missing("sym".intern());
assert!(section.head.is_empty());
@ -186,7 +182,7 @@ mod test {
#[test]
fn section_body() {
let mut section = Sut::new();
let obj = IdentObject::Missing(&SYM);
let obj = IdentObject::Missing("sym".intern());
assert!(section.body.is_empty());
@ -199,7 +195,7 @@ mod test {
#[test]
fn section_tail() {
let mut section = Sut::new();
let obj = IdentObject::Missing(&SYM);
let obj = IdentObject::Missing("sym".intern());
assert!(section.tail.is_empty());
@ -211,7 +207,7 @@ mod test {
#[test]
fn section_len() {
let mut section = Sut::new();
let obj = IdentObject::Missing(&SYM);
let obj = IdentObject::Missing("sym".intern());
assert_eq!(0, section.len());
section.push_head(&obj);
@ -225,7 +221,7 @@ mod test {
#[test]
fn section_is_empty_head() {
let mut section = Sut::new();
let obj = IdentObject::Missing(&SYM);
let obj = IdentObject::Missing("sym".intern());
assert!(section.is_empty());
section.push_head(&obj);
@ -235,7 +231,7 @@ mod test {
#[test]
fn section_is_empty_body() {
let mut section = Sut::new();
let obj = IdentObject::Missing(&SYM);
let obj = IdentObject::Missing("sym".intern());
assert!(section.is_empty());
section.push_body(&obj);
@ -245,7 +241,7 @@ mod test {
#[test]
fn section_is_empty_tail() {
let mut section = Sut::new();
let obj = IdentObject::Missing(&SYM);
let obj = IdentObject::Missing("sym".intern());
assert!(section.is_empty());
section.push_tail(&obj);
@ -255,7 +251,7 @@ mod test {
#[test]
fn section_iterator() {
let mut section = Sut::new();
let obj = IdentObject::Missing(&SYM);
let obj = IdentObject::Missing("sym".intern());
let expect = vec![&obj, &obj, &obj];
section.push_head(&obj);

@ -27,18 +27,18 @@
//! This IR should be converted into a higher-level IR quickly,
//! especially considering that it will be going away in the future.
use crate::sym::{Symbol, SymbolIndexSize};
use crate::sym::{SymbolId, SymbolIndexSize};
use std::convert::TryFrom;
use std::result::Result;
/// Toplevel package attributes.
#[derive(Debug, PartialEq, Eq)]
pub struct PackageAttrs<'i, Ix: SymbolIndexSize> {
pub struct PackageAttrs<Ix: SymbolIndexSize> {
/// Unique package identifier.
///
/// The package name is derived from the filename relative to the
/// project root during compilation (see `relroot`).
pub name: Option<&'i Symbol<'i, Ix>>,
pub name: Option<SymbolId<Ix>>,
/// Relative path from package to project root.
pub relroot: Option<String>,
@ -57,12 +57,12 @@ pub struct PackageAttrs<'i, Ix: SymbolIndexSize> {
/// met.
/// This symbol is responsible for including each of those invariants as
/// dependencies so that they are included at link-time.
pub elig: Option<&'i Symbol<'i, Ix>>,
pub elig: Option<SymbolId<Ix>>,
}
// The derive macro seems to add an `Ix: Default` bound,
// so we'll implement it manually to avoid that.
impl<'i, Ix: SymbolIndexSize> Default for PackageAttrs<'i, Ix> {
impl<Ix: SymbolIndexSize> Default for PackageAttrs<Ix> {
fn default() -> Self {
Self {
name: Default::default(),
@ -87,13 +87,13 @@ impl<'i, Ix: SymbolIndexSize> Default for PackageAttrs<'i, Ix> {
/// Consequently,
/// valid values should be enforced by Rust's type system.
#[derive(Debug, PartialEq, Eq)]
pub struct SymAttrs<'i, Ix: SymbolIndexSize> {
pub struct SymAttrs<Ix: SymbolIndexSize> {
/// Relative path to the package that defined this symbol.
///
/// Object files store relative paths so that they are somewhat
/// portable—the
/// entire project root should be able to be relocated.
pub src: Option<&'i Symbol<'i, Ix>>,
pub src: Option<SymbolId<Ix>>,
/// Symbol type.
///
@ -135,14 +135,14 @@ pub struct SymAttrs<'i, Ix: SymbolIndexSize> {
/// relative to the project root.
/// _Note that this is problematic if one wants to compile the equivalent
/// of shared libraries._
pub pkg_name: Option<&'i Symbol<'i, Ix>>,
pub pkg_name: Option<SymbolId<Ix>>,
/// The identifier from which this one is derived.
///
/// For example,
/// [`SymType::Cgen`] has a parent [`SymType::Class`] and
/// [`SymType::Gen`] has a parent [`SymType::Rate`].
pub parent: Option<&'i Symbol<'i, Ix>>,
pub parent: Option<SymbolId<Ix>>,
/// Whether this identifier was generated by the compiler.
///
@ -157,7 +157,7 @@ pub struct SymAttrs<'i, Ix: SymbolIndexSize> {
///
/// For [`SymType::Class`],
/// this represents an associated [`SymType::Cgen`].
pub yields: Option<&'i Symbol<'i, Ix>>,
pub yields: Option<SymbolId<Ix>>,
/// User-friendly identifier description.
///
@ -172,7 +172,7 @@ pub struct SymAttrs<'i, Ix: SymbolIndexSize> {
/// - [`SymType::Map`] includes the name of the source field; and
/// - [`SymType::Func`] lists params in order (so that the compiler
/// knows application order).
pub from: Option<Vec<&'i Symbol<'i, Ix>>>,
pub from: Option<Vec<SymbolId<Ix>>>,
/// Whether symbol can be overridden.
///
@ -187,7 +187,7 @@ pub struct SymAttrs<'i, Ix: SymbolIndexSize> {
// The derive macro seems to add an `Ix: Default` bound,
// so we'll implement it manually to avoid that.
impl<'i, Ix: SymbolIndexSize> Default for SymAttrs<'i, Ix> {
impl<Ix: SymbolIndexSize> Default for SymAttrs<Ix> {
fn default() -> Self {
Self {
src: Default::default(),

@ -30,7 +30,8 @@ use crate::ir::asg::{
};
use crate::obj::xmle::writer::XmleWriter;
use crate::obj::xmlo::{AsgBuilder, AsgBuilderState, XmloReader};
use crate::sym::{DefaultInterner, DefaultProgInterner, Interner, Symbol};
use crate::sym::SymbolId;
use crate::sym::{GlobalSymbolIntern, GlobalSymbolResolve};
use fxhash::FxBuildHasher;
use petgraph_graphml::GraphMl;
use std::error::Error;
@ -38,22 +39,20 @@ use std::fs;
use std::io::BufReader;
use std::path::{Path, PathBuf};
type LinkerAsg<'i> =
DefaultAsg<'i, IdentObject<'i, global::ProgSymSize>, global::ProgIdentSize>;
type LinkerAsg =
DefaultAsg<IdentObject<global::ProgSymSize>, global::ProgIdentSize>;
type LinkerAsgBuilderState<'i> =
AsgBuilderState<'i, FxBuildHasher, global::ProgIdentSize>;
type LinkerAsgBuilderState =
AsgBuilderState<FxBuildHasher, global::ProgIdentSize>;
pub fn xmle(package_path: &str, output: &str) -> Result<(), Box<dyn Error>> {
let mut fs = VisitOnceFilesystem::new();
let mut depgraph = LinkerAsg::with_capacity(65536, 65536);
let interner = DefaultInterner::new();
let state = load_xmlo(
package_path,
&mut fs,
&mut depgraph,
&interner,
AsgBuilderState::new(),
)?;
@ -67,7 +66,7 @@ pub fn xmle(package_path: &str, output: &str) -> Result<(), Box<dyn Error>> {
roots.extend(
vec!["___yield", "___worksheet"]
.iter()
.map(|name| interner.intern(name))
.map(|name| name.intern())
.filter_map(|sym| depgraph.lookup(sym)),
);
@ -82,7 +81,12 @@ pub fn xmle(package_path: &str, output: &str) -> Result<(), Box<dyn Error>> {
.map(|obj| {
format!(
"{}",
depgraph.get(obj).unwrap().name().unwrap()
depgraph
.get(obj)
.unwrap()
.name()
.unwrap()
.lookup_str(),
)
})
.collect();
@ -100,7 +104,6 @@ pub fn xmle(package_path: &str, output: &str) -> Result<(), Box<dyn Error>> {
output_xmle(
&depgraph,
&interner,
&mut sorted,
name.expect("missing root package name"),
relroot.expect("missing root package relroot"),
@ -113,13 +116,11 @@ pub fn xmle(package_path: &str, output: &str) -> Result<(), Box<dyn Error>> {
pub fn graphml(package_path: &str, output: &str) -> Result<(), Box<dyn Error>> {
let mut fs = VisitOnceFilesystem::new();
let mut depgraph = LinkerAsg::with_capacity(65536, 65536);
let interner = DefaultProgInterner::new();
let _ = load_xmlo(
package_path,
&mut fs,
&mut depgraph,
&interner,
AsgBuilderState::new(),
)?;
@ -139,7 +140,7 @@ pub fn graphml(package_path: &str, output: &str) -> Result<(), Box<dyn Error>> {
};
(
format!("{}", n),
format!("{}", n.name().unwrap().lookup_str()),
n.kind().unwrap().as_ref(),
format!("{}", generated),
)
@ -163,13 +164,12 @@ pub fn graphml(package_path: &str, output: &str) -> Result<(), Box<dyn Error>> {
Ok(())
}
fn load_xmlo<'a, 'i, I: Interner<'i, global::ProgSymSize>, P: AsRef<Path>>(
fn load_xmlo<'a, P: AsRef<Path>>(
path_str: P,
fs: &mut VisitOnceFilesystem<FsCanonicalizer, FxBuildHasher>,
depgraph: &mut LinkerAsg<'i>,
interner: &'i I,
state: LinkerAsgBuilderState<'i>,
) -> Result<LinkerAsgBuilderState<'i>, Box<dyn Error>> {
depgraph: &mut LinkerAsg,
state: LinkerAsgBuilderState,
) -> Result<LinkerAsgBuilderState, Box<dyn Error>> {
let cfile: PathFile<BufReader<fs::File>> = match fs.open(path_str)? {
VisitOnceFile::FirstVisit(file) => file,
VisitOnceFile::Visited => return Ok(state),
@ -177,7 +177,7 @@ fn load_xmlo<'a, 'i, I: Interner<'i, global::ProgSymSize>, P: AsRef<Path>>(
let (path, file) = cfile.into();
let xmlo: XmloReader<'_, _, _, _> = (file, interner).into();
let xmlo: XmloReader<_, _> = file.into();
let mut state = depgraph.import_xmlo(xmlo, state)?;
@ -188,51 +188,49 @@ fn load_xmlo<'a, 'i, I: Interner<'i, global::ProgSymSize>, P: AsRef<Path>>(
for relpath in found.iter() {
let mut path_buf = dir.clone();
path_buf.push(relpath);
let str: &str = &relpath.lookup_str();
path_buf.push(str);
path_buf.set_extension("xmlo");
state = load_xmlo(path_buf, fs, depgraph, interner, state)?;
state = load_xmlo(path_buf, fs, depgraph, state)?;
}
Ok(state)
}
fn get_ident<'a, 'i>(
depgraph: &'a LinkerAsg<'i>,
name: &'i Symbol<'i, global::ProgSymSize>,
) -> Result<&'a IdentObject<'i, global::ProgSymSize>, String> {
fn get_ident<'a>(
depgraph: &'a LinkerAsg,
name: SymbolId<global::ProgSymSize>,
) -> Result<&'a IdentObject<global::ProgSymSize>, String> {
depgraph
.lookup(name)
.and_then(|id| depgraph.get(id))
.ok_or(format!("missing identifier: {}", name))
.ok_or(format!("missing identifier: {}", name.lookup_str()))
}
fn output_xmle<'a, 'i, I: Interner<'i, global::ProgSymSize>>(
depgraph: &'a LinkerAsg<'i>,
interner: &'i I,
sorted: &mut Sections<'a, IdentObject<'i, global::ProgSymSize>>,
name: &'i Symbol<'i, global::ProgSymSize>,
fn output_xmle<'a>(
depgraph: &'a LinkerAsg,
sorted: &mut Sections<'a, IdentObject<global::ProgSymSize>>,
name: SymbolId<global::ProgSymSize>,
relroot: String,
output: &str,
) -> Result<(), Box<dyn Error>> {
if !sorted.map.is_empty() {
sorted
.map
.push_head(get_ident(depgraph, interner.intern(":map:___head"))?);
.push_head(get_ident(depgraph, ":map:___head".intern())?);
sorted
.map
.push_tail(get_ident(depgraph, interner.intern(":map:___tail"))?);
.push_tail(get_ident(depgraph, ":map:___tail".intern())?);
}
if !sorted.retmap.is_empty() {
sorted.retmap.push_head(get_ident(
depgraph,
interner.intern(":retmap:___head"),
)?);
sorted.retmap.push_tail(get_ident(
depgraph,
interner.intern(":retmap:___tail"),
)?);
sorted
.retmap
.push_head(get_ident(depgraph, ":retmap:___head".intern())?);
sorted
.retmap
.push_tail(get_ident(depgraph, ":retmap:___tail".intern())?);
}
let file = fs::File::create(output)?;


@ -24,14 +24,9 @@
pub mod global;
#[macro_use]
extern crate lazy_static;
#[macro_use]
extern crate static_assertions;
#[macro_use]
pub mod sym;
#[cfg(feature = "wip-frontends")]
pub mod frontend;
@ -39,6 +34,7 @@ pub mod fs;
pub mod ir;
pub mod ld;
pub mod obj;
pub mod sym;
pub mod tpwrap;
#[cfg(test)]


@ -28,18 +28,15 @@
//! The example below is incomplete, but shows the general usage.
//!
//! ```
//! use tamer::obj::xmle::writer::XmleWriter;
//! use tamer::ir::asg::{IdentObject, Sections};
//! use tamer::sym::{DefaultProgInterner, Interner, Symbol};
//! use tamer::obj::xmle::writer::XmleWriter;
//! use tamer::sym::GlobalSymbolIntern;
//! use std::io::Cursor;
//!
//! let interner = DefaultProgInterner::new();
//! let name = interner.intern(&String::from("foo"));
//!
//! let sections = Sections::<IdentObject<_>>::new();
//! let writer = Cursor::new(Vec::new());
//! let mut xmle_writer = XmleWriter::new(writer);
//! xmle_writer.write(&sections, name, &String::from(""));
//! xmle_writer.write(&sections, "foo".intern(), &String::from(""));
//! ```
mod writer;


@ -18,7 +18,7 @@
// along with this program. If not, see <http://www.gnu.org/licenses/>.
use crate::ir::asg::Sections;
use crate::sym::ProgSymbol as Symbol;
use crate::sym::ProgSymbolId;
use quick_xml::Error as XmlError;
use std::io::{Error as IoError, Write};
use std::result;
@ -33,7 +33,7 @@ pub trait Writer<W: Write> {
fn write<T>(
&mut self,
sections: &Sections<T>,
name: Symbol,
name: ProgSymbolId,
relroot: &str,
) -> Result<()>
where


@ -18,10 +18,11 @@
// along with this program. If not, see <http://www.gnu.org/licenses/>.
use super::writer::{Result, WriterError};
use crate::global;
use crate::ir::asg::{
IdentKind, IdentObject, IdentObjectData, SectionIterator, Sections,
};
use crate::sym::{Symbol, SymbolIndexSize};
use crate::sym::{GlobalSymbolResolve, ProgSymbolId, SymbolId};
use fxhash::FxHashSet;
#[cfg(test)]
use mock::MockXmlWriter as XmlWriter;
@ -35,6 +36,8 @@ pub struct XmleWriter<W: Write> {
writer: XmlWriter<W>,
}
type Ix = global::ProgSymSize;
impl<W: Write> XmleWriter<W> {
/// Create a new instance of `XmleWriter`
/// ```
@ -71,28 +74,26 @@ impl<W: Write> XmleWriter<W> {
///
/// ```
/// use std::io::Cursor;
/// use tamer::obj::xmle::writer::XmleWriter;
/// use tamer::ir::asg::{Sections, IdentObject};
/// use tamer::sym::{Symbol, SymbolId, DefaultProgInterner, Interner};
/// use tamer::obj::xmle::writer::XmleWriter;
/// use tamer::sym::GlobalSymbolIntern;
///
/// let writer = Cursor::new(Vec::new());
/// let mut xmle_writer = XmleWriter::new(writer);
/// let sections = Sections::<IdentObject<_>>::new();
/// let a = "foo";
/// let interner = DefaultProgInterner::new();
/// let name = interner.intern(&a);
/// let name = "foo".intern();
/// xmle_writer.write(
/// &sections,
/// &name,
/// name,
/// &String::from(""),
/// );
/// let buf = xmle_writer.into_inner().into_inner();
/// assert!(!buf.is_empty(), "something was written to the buffer");
/// ```
pub fn write<'i, Ix: SymbolIndexSize, T: IdentObjectData<'i, Ix>>(
pub fn write<T: IdentObjectData<Ix>>(
&mut self,
sections: &Sections<T>,
name: &Symbol<'i, Ix>,
name: SymbolId<Ix>,
relroot: &str,
) -> Result {
self.write_start_package(name, &relroot)?
@ -152,19 +153,21 @@ impl<W: Write> XmleWriter<W> {
///
/// The `package` element's opening tag needs attributes, so it cannot use
/// `write_start_tag` directly.
fn write_start_package<Ix: SymbolIndexSize>(
fn write_start_package(
&mut self,
name: &Symbol<Ix>,
name: SymbolId<Ix>,
relroot: &str,
) -> Result<&mut XmleWriter<W>> {
let name_str = name.lookup_str();
let root =
BytesStart::owned_name(b"package".to_vec()).with_attributes(vec![
("xmlns", "http://www.lovullo.com/rater"),
("xmlns:preproc", "http://www.lovullo.com/rater/preproc"),
("xmlns:l", "http://www.lovullo.com/rater/linker"),
("title", &name), // TODO
("title", &name_str), // TODO
("program", "true"),
("name", &name),
("name", &name_str),
("__rootpath", &relroot),
]);
@ -193,7 +196,7 @@ impl<W: Write> XmleWriter<W> {
///
/// All the [`Sections`] found need to be written out using the `writer`
/// object.
fn write_sections<'i, Ix: SymbolIndexSize, T: IdentObjectData<'i, Ix>>(
fn write_sections<T: IdentObjectData<Ix>>(
&mut self,
sections: &Sections<T>,
relroot: &str,
@ -217,8 +220,6 @@ impl<W: Write> XmleWriter<W> {
match ident {
IdentObject::Ident(sym, kind, src)
| IdentObject::IdentFragment(sym, kind, src, _) => {
let name: &str = sym;
// this'll be formalized more sanely
let mut attrs = match kind {
IdentKind::Cgen(dim) => {
@ -269,6 +270,7 @@ impl<W: Write> XmleWriter<W> {
IdentKind::Worksheet => vec![("type", "worksheet")],
};
let name = &sym.lookup_str();
attrs.push(("name", name));
if src.generated {
@ -277,13 +279,17 @@ impl<W: Write> XmleWriter<W> {
let srcpath: String;
if let Some(pkg_name) = src.pkg_name {
srcpath = format!("{}{}", relroot, pkg_name);
srcpath = format!("{}{}", relroot, pkg_name.lookup_str());
attrs.push(("src", &srcpath));
}
if let Some(parent) = src.parent {
let parent_str = src.parent.map(|sym| sym.lookup_str());
if let Some(ref parent) = parent_str {
attrs.push(("parent", parent));
}
if let Some(yields) = src.yields {
let yields_str = src.yields.map(|sym| sym.lookup_str());
if let Some(ref yields) = yields_str {
attrs.push(("yields", yields));
}
if let Some(desc) = &src.desc {
@ -309,11 +315,11 @@ impl<W: Write> XmleWriter<W> {
///
/// If a `map` object has a `from` attribute in its source, we need to
/// write them using the `writer`'s `write_event`.
fn write_froms<'i, Ix: SymbolIndexSize, T: IdentObjectData<'i, Ix>>(
fn write_froms<T: IdentObjectData<Ix>>(
&mut self,
sections: &Sections<T>,
) -> Result<&mut XmleWriter<W>> {
let mut map_froms: FxHashSet<&str> = Default::default();
let mut map_froms: FxHashSet<ProgSymbolId> = Default::default();
let map_iter = sections.map.iter();
@ -322,13 +328,13 @@ impl<W: Write> XmleWriter<W> {
if let Some(froms) = &src.from {
for from in froms {
map_froms.insert(from);
map_froms.insert(*from);
}
}
}
for from in map_froms {
let name: &str = from;
let name: &str = &from.lookup_str();
self.writer.write_event(Event::Empty(
BytesStart::borrowed_name(b"l:from")
@ -343,7 +349,7 @@ impl<W: Write> XmleWriter<W> {
///
/// Iterates through the parts of a `Section` and writes them using the
/// `writer`'s `write_event`.
fn write_section<'i, Ix: SymbolIndexSize, T: IdentObjectData<'i, Ix>>(
fn write_section<T: IdentObjectData<Ix>>(
&mut self,
idents: SectionIterator<T>,
) -> Result<&mut XmleWriter<W>> {
@ -422,7 +428,7 @@ mod test {
use super::*;
use crate::ir::asg::{Dim, Section, Source};
use crate::ir::legacyir::SymAttrs;
use crate::sym::{Symbol, SymbolId};
use crate::sym::GlobalSymbolIntern;
use std::str;
type Sut<W> = XmleWriter<W>;
@ -457,9 +463,7 @@ mod test {
_ => panic!("did not match expected event"),
}));
let sym = symbol_dummy!(1u8, "sym");
sut.write_start_package(&sym, &String::from(""))?;
sut.write_start_package("".intern(), &String::from(""))?;
Ok(())
}
@ -512,9 +516,8 @@ mod test {
_ => panic!("did not trigger event"),
}));
let sym = symbol_dummy!(1u8, "sym");
let obj = IdentObject::IdentFragment(
&sym,
"sym".intern(),
IdentKind::Meta,
Source::default(),
String::from(""),
@ -534,9 +537,8 @@ mod test {
panic!("callback should not have been called");
}));
let sym = symbol_dummy!(1u8, "sym");
let obj = IdentObject::Ident(
&sym,
"sym".intern(),
IdentKind::Cgen(Dim::default()),
Source::default(),
);
@ -555,8 +557,7 @@ mod test {
panic!("callback should not have been called");
}));
let sym = symbol_dummy!(1u8, "sym");
let obj = IdentObject::Missing(&sym);
let obj = IdentObject::Missing("missing".intern());
let mut section = Section::new();
section.push_body(&obj);
@ -594,9 +595,11 @@ mod test {
_ => panic!("unexpected event"),
}));
let sym = symbol_dummy!(1u8, "random_symbol");
let object =
IdentObject::Ident(&sym, IdentKind::Worksheet, Source::default());
let object = IdentObject::Ident(
"random_symbol".intern(),
IdentKind::Worksheet,
Source::default(),
);
let mut sections = Sections::new();
sections.map.push_body(&object);
sut.write_sections(&sections, &String::from(""))?;
@ -649,25 +652,25 @@ mod test {
_ => panic!("unexpected event"),
}));
let nsym = symbol_dummy!(1u8, "name");
let ssym = symbol_dummy!(2u8, "src");
let psym = symbol_dummy!(3u8, "parent");
let ysym = symbol_dummy!(4u8, "yields");
let fsym = symbol_dummy!(5u8, "from");
let nsym = "name".intern();
let ssym = "src".intern();
let psym = "parent".intern();
let ysym = "yields".intern();
let fsym = "from".intern();
let attrs = SymAttrs {
pkg_name: Some(&nsym),
src: Some(&ssym),
pkg_name: Some(nsym),
src: Some(ssym),
generated: true,
parent: Some(&psym),
yields: Some(&ysym),
parent: Some(psym),
yields: Some(ysym),
desc: Some("sym desc".to_string()),
from: Some(vec![&fsym]),
from: Some(vec![fsym]),
virtual_: true,
..Default::default()
};
let object =
IdentObject::Ident(&nsym, IdentKind::Worksheet, attrs.into());
IdentObject::Ident(nsym, IdentKind::Worksheet, attrs.into());
let mut sections = Sections::new();
sections.map.push_body(&object);
sut.write_sections(&sections, &String::from("root"))?;
@ -696,12 +699,12 @@ mod test {
_ => panic!("unexpected event"),
}));
let sym = symbol_dummy!(1u8, "source symbol");
let symb = symbol_dummy!(2u8, "dest symbol");
let sym = "source symbol".intern();
let symb = "dest symbol".intern();
let mut src = Source::default();
src.from = Some(vec![&symb]);
let object = IdentObject::Ident(&sym, IdentKind::Worksheet, src);
src.from = Some(vec![symb]);
let object = IdentObject::Ident(sym, IdentKind::Worksheet, src);
let mut sections = Sections::new();
sections.map.push_body(&object);
sut.write_froms(&sections)?;
@ -717,10 +720,10 @@ mod test {
_ => panic!("unexpected write"),
}));
let sym = symbol_dummy!(1u8, "random_symbol");
let sym = "random_symbol".intern();
let object =
IdentObject::Ident(&sym, IdentKind::Worksheet, Source::default());
IdentObject::Ident(sym, IdentKind::Worksheet, Source::default());
let mut sections = Sections::new();
sections.map.push_body(&object);
sut.write_froms(&sections)?;


@ -42,7 +42,7 @@
//! use tamer::global;
//! use tamer::ir::asg::{DefaultAsg, IdentObject};
//! use tamer::obj::xmlo::{AsgBuilder, AsgBuilderState, XmloReader};
//! use tamer::sym::{DefaultInterner, Interner};
//! use tamer::sym::GlobalSymbolIntern;
//! use fxhash::FxBuildHasher;
//! use std::io::BufReader;
//!
@ -54,16 +54,15 @@
//! </preproc:fragments>
//! </package>"#;
//!
//! let interner = DefaultInterner::new();
//! let xmlo = XmloReader::new(src_xmlo, &interner);
//! let mut asg = DefaultAsg::<'_, IdentObject<_>, global::ProgIdentSize>::new();
//! let xmlo = XmloReader::new(src_xmlo);
//! let mut asg = DefaultAsg::<IdentObject<_>, global::ProgIdentSize>::new();
//!
//! let state = asg.import_xmlo(xmlo, AsgBuilderState::<'_, FxBuildHasher, _>::new());
//! let state = asg.import_xmlo(xmlo, AsgBuilderState::<FxBuildHasher, _>::new());
//!
//! // Use `state.found` to recursively load dependencies.
//! let AsgBuilderState { found, .. } = state.expect("unexpected failure");
//! assert_eq!(
//! vec![&"dep/package"],
//! vec![&"dep/package".intern()],
//! found.unwrap().iter().collect::<Vec<_>>(),
//! );
//! ```
@ -73,15 +72,15 @@ use crate::ir::asg::{
Asg, AsgError, IdentKind, IdentKindError, IdentObjectState, IndexType,
ObjectRef, Source,
};
use crate::sym::{Symbol, SymbolIndexSize};
use crate::sym::{GlobalSymbolResolve, SymbolId, SymbolIndexSize, SymbolStr};
use std::collections::HashSet;
use std::convert::TryInto;
use std::error::Error;
use std::fmt::Display;
use std::hash::BuildHasher;
pub type Result<'i, S, Ix> =
std::result::Result<AsgBuilderState<'i, S, Ix>, AsgBuilderError>;
pub type Result<S, Ix> =
std::result::Result<AsgBuilderState<S, Ix>, AsgBuilderError>;
/// Builder state between imports.
///
@ -103,7 +102,7 @@ pub type Result<'i, S, Ix> =
/// This is used by the linker to only include dependencies that are
/// actually used by a particular program.
#[derive(Debug, Default)]
pub struct AsgBuilderState<'i, S, Ix>
pub struct AsgBuilderState<S, Ix>
where
S: BuildHasher,
Ix: IndexType + SymbolIndexSize,
@ -123,12 +122,12 @@ where
///
/// See [`AsgBuilder::import_xmlo`] for behavior when this value is
/// [`None`].
pub found: Option<HashSet<&'i str, S>>,
pub found: Option<HashSet<SymbolId<Ix>, S>>,
/// Program name once discovered.
///
/// This will be set by the first package encountered.
pub name: Option<&'i Symbol<'i, Ix>>,
pub name: Option<SymbolId<Ix>>,
/// Relative path to project root once discovered.
///
@ -136,7 +135,7 @@ where
pub relroot: Option<String>,
}
impl<'i, S, Ix> AsgBuilderState<'i, S, Ix>
impl<S, Ix> AsgBuilderState<S, Ix>
where
S: BuildHasher + Default,
Ix: IndexType + SymbolIndexSize,
@ -161,9 +160,9 @@ where
/// For more information on what data are processed,
/// see [`AsgBuilderState`].
/// See the [module-level documentation](self) for example usage.
pub trait AsgBuilder<'i, O, S, Ix>
pub trait AsgBuilder<O, S, Ix>
where
O: IdentObjectState<'i, Ix, O>,
O: IdentObjectState<Ix, O>,
S: BuildHasher,
Ix: IndexType + SymbolIndexSize,
{
@ -179,23 +178,23 @@ where
/// Its initial value can be provided as [`Default::default`].
fn import_xmlo(
&mut self,
xmlo: impl Iterator<Item = XmloResult<XmloEvent<'i, Ix>>>,
state: AsgBuilderState<'i, S, Ix>,
) -> Result<'i, S, Ix>;
xmlo: impl Iterator<Item = XmloResult<XmloEvent<Ix>>>,
state: AsgBuilderState<S, Ix>,
) -> Result<S, Ix>;
}
impl<'i, O, S, Ix, G> AsgBuilder<'i, O, S, Ix> for G
impl<O, S, Ix, G> AsgBuilder<O, S, Ix> for G
where
O: IdentObjectState<'i, Ix, O>,
O: IdentObjectState<Ix, O>,
S: BuildHasher + Default,
Ix: IndexType + SymbolIndexSize,
G: Asg<'i, O, Ix>,
G: Asg<O, Ix>,
{
fn import_xmlo(
&mut self,
mut xmlo: impl Iterator<Item = XmloResult<XmloEvent<'i, Ix>>>,
mut state: AsgBuilderState<'i, S, Ix>,
) -> Result<'i, S, Ix> {
mut xmlo: impl Iterator<Item = XmloResult<XmloEvent<Ix>>>,
mut state: AsgBuilderState<S, Ix>,
) -> Result<S, Ix> {
let mut elig = None;
let first = state.is_first();
let found = state.found.get_or_insert(Default::default());
@ -224,7 +223,7 @@ where
let extern_ = attrs.extern_;
let kindval = (&attrs).try_into()?;
let mut src: Source<'i, Ix> = attrs.into();
let mut src: Source<Ix> = attrs.into();
// Existing convention is to omit @src of local package
// (in this case, the program being linked)
@ -253,7 +252,7 @@ where
XmloEvent::Fragment(sym, text) => {
let frag = self.lookup(sym).ok_or(
AsgBuilderError::MissingFragmentIdent(sym.to_string()),
AsgBuilderError::MissingFragmentIdent(sym.lookup_str()),
)?;
self.set_fragment(frag, text)?;
@ -269,11 +268,11 @@ where
}
if let Some(elig_sym) = elig {
state
.roots
.push(self.lookup(elig_sym).ok_or(
AsgBuilderError::BadEligRef(elig_sym.to_string()),
)?);
state.roots.push(
self.lookup(elig_sym).ok_or(AsgBuilderError::BadEligRef(
elig_sym.lookup_str(),
))?,
);
}
Ok(state)
@ -293,13 +292,13 @@ pub enum AsgBuilderError {
AsgError(AsgError),
/// Fragment encountered for an unknown identifier.
MissingFragmentIdent(String),
MissingFragmentIdent(SymbolStr<'static>),
/// Eligibility classification references unknown identifier.
///
/// This is generated by the compiler and so should never happen.
/// (That's not to say that it won't, but it shouldn't.)
BadEligRef(String),
BadEligRef(SymbolStr<'static>),
}
impl Display for AsgBuilderError {
@ -359,22 +358,22 @@ mod test {
use super::*;
use crate::ir::asg::{DefaultAsg, FragmentText, IdentObject};
use crate::ir::legacyir::{PackageAttrs, SymAttrs, SymType};
use crate::sym::SymbolId;
use crate::sym::GlobalSymbolIntern;
use std::collections::hash_map::RandomState;
type SutIx = u8;
type Sut<'i> = DefaultAsg<'i, IdentObject<'i, SutIx>, SutIx>;
type SutState<'i> = AsgBuilderState<'i, RandomState, SutIx>;
type SutIx = u16;
type Sut<'i> = DefaultAsg<IdentObject<SutIx>, SutIx>;
type SutState<'i> = AsgBuilderState<RandomState, SutIx>;
#[test]
fn gets_data_from_package_event() {
let mut sut = Sut::new();
let name = symbol_dummy!(1, "name");
let name = "name".intern();
let relroot = "some/path".to_string();
let evs = vec![Ok(XmloEvent::Package(PackageAttrs {
name: Some(&name),
name: Some(name),
relroot: Some(relroot.clone()),
..Default::default()
}))];
@ -383,7 +382,7 @@ mod test {
.import_xmlo(evs.into_iter(), SutState::new())
.expect("parsing of proper PackageAttrs must succeed");
assert_eq!(Some(&name), state.name);
assert_eq!(Some(name), state.name);
assert_eq!(Some(relroot), state.relroot);
}
@ -403,15 +402,15 @@ mod test {
#[test]
fn adds_elig_as_root() {
let mut sut = Sut::new();
let elig_sym = symbol_dummy!(1, "elig");
let elig_sym = "elig".intern();
// The symbol must be on the graph, or it'll fail.
let elig_node = sut
.declare(&elig_sym, IdentKind::Meta, Default::default())
.declare(elig_sym, IdentKind::Meta, Default::default())
.unwrap();
let evs = vec![Ok(XmloEvent::Package(PackageAttrs {
elig: Some(&elig_sym),
elig: Some(elig_sym),
..Default::default()
}))];
@ -424,20 +423,20 @@ mod test {
fn adds_sym_deps() {
let mut sut = Sut::new();
let sym_from = symbol_dummy!(1, "from");
let sym_to1 = symbol_dummy!(2, "to1");
let sym_to2 = symbol_dummy!(3, "to2");
let sym_from = "from".intern();
let sym_to1 = "to1".intern();
let sym_to2 = "to2".intern();
let evs =
vec![Ok(XmloEvent::SymDeps(&sym_from, vec![&sym_to1, &sym_to2]))];
vec![Ok(XmloEvent::SymDeps(sym_from, vec![sym_to1, sym_to2]))];
let _ = sut
.import_xmlo(evs.into_iter(), SutState::new())
.expect("unexpected failure");
let node_from = sut.lookup(&sym_from).expect("from node not added");
let node_to1 = sut.lookup(&sym_to1).expect("to1 node not added");
let node_to2 = sut.lookup(&sym_to2).expect("to2 node not added");
let node_from = sut.lookup(sym_from).expect("from node not added");
let node_to1 = sut.lookup(sym_to1).expect("to1 node not added");
let node_to2 = sut.lookup(sym_to2).expect("to2 node not added");
assert!(sut.has_dep(node_from, node_to1));
assert!(sut.has_dep(node_from, node_to2));
@ -447,22 +446,22 @@ mod test {
fn sym_decl_with_src_not_added_and_populates_found() {
let mut sut = Sut::new();
let sym = symbol_dummy!(1, "sym");
let src_a = symbol_dummy!(2, "src_a");
let src_b = symbol_dummy!(3, "src_b");
let sym = "sym".intern();
let src_a = "src_a".intern();
let src_b = "src_b".intern();
let evs = vec![
Ok(XmloEvent::SymDecl(
&sym,
sym,
SymAttrs {
src: Some(&src_a),
src: Some(src_a),
..Default::default()
},
)),
Ok(XmloEvent::SymDecl(
&sym,
sym,
SymAttrs {
src: Some(&src_b),
src: Some(src_b),
..Default::default()
},
)),
@ -481,30 +480,30 @@ mod test {
// to change (we're using RandomState).
founds.sort();
assert_eq!(vec![&src_a as &str, &src_b as &str], founds);
assert_eq!(vec![src_a, src_b], founds);
// Symbols with `src` set are external and should not be added to
// the graph.
assert!(sut.lookup(&sym).is_none());
assert!(sut.lookup(sym).is_none());
}
#[test]
fn sym_decl_added_to_graph() {
let mut sut = Sut::new();
let sym_extern = symbol_dummy!(1, "sym_extern");
let sym_non_extern = symbol_dummy!(2, "sym_non_extern");
let sym_map = symbol_dummy!(3, "sym_map");
let sym_retmap = symbol_dummy!(4, "sym_retmap");
let pkg_name = symbol_dummy!(5, "pkg name");
let sym_extern = "sym_extern".intern();
let sym_non_extern = "sym_non_extern".intern();
let sym_map = "sym_map".intern();
let sym_retmap = "sym_retmap".intern();
let pkg_name = "pkg name".intern();
let evs = vec![
// Note that externs should not be recognized as roots even if
// their type would be.
Ok(XmloEvent::SymDecl(
&sym_extern,
sym_extern,
SymAttrs {
pkg_name: Some(&pkg_name),
pkg_name: Some(pkg_name),
extern_: true,
ty: Some(SymType::Meta),
..Default::default()
@ -512,25 +511,25 @@ mod test {
)),
// These three will be roots
Ok(XmloEvent::SymDecl(
&sym_non_extern,
sym_non_extern,
SymAttrs {
pkg_name: Some(&pkg_name),
pkg_name: Some(pkg_name),
ty: Some(SymType::Meta),
..Default::default()
},
)),
Ok(XmloEvent::SymDecl(
&sym_map,
sym_map,
SymAttrs {
pkg_name: Some(&pkg_name),
pkg_name: Some(pkg_name),
ty: Some(SymType::Map),
..Default::default()
},
)),
Ok(XmloEvent::SymDecl(
&sym_retmap,
sym_retmap,
SymAttrs {
pkg_name: Some(&pkg_name),
pkg_name: Some(pkg_name),
ty: Some(SymType::RetMap),
..Default::default()
},
@ -544,9 +543,9 @@ mod test {
assert_eq!(
vec![
sut.lookup(&sym_non_extern).unwrap(),
sut.lookup(&sym_map).unwrap(),
sut.lookup(&sym_retmap).unwrap(),
sut.lookup(sym_non_extern).unwrap(),
sut.lookup(sym_map).unwrap(),
sut.lookup(sym_retmap).unwrap(),
],
state.roots
);
@ -556,50 +555,50 @@ mod test {
assert_eq!(
&IdentObject::Extern(
&sym_extern,
sym_extern,
IdentKind::Meta,
Source {
pkg_name: None,
..Default::default()
},
),
sut.get(sut.lookup(&sym_extern).unwrap()).unwrap(),
sut.get(sut.lookup(sym_extern).unwrap()).unwrap(),
);
assert_eq!(
&IdentObject::Ident(
&sym_non_extern,
sym_non_extern,
IdentKind::Meta,
Source {
pkg_name: None,
..Default::default()
},
),
sut.get(sut.lookup(&sym_non_extern).unwrap()).unwrap(),
sut.get(sut.lookup(sym_non_extern).unwrap()).unwrap(),
);
assert_eq!(
&IdentObject::Ident(
&sym_map,
sym_map,
IdentKind::Map,
Source {
pkg_name: None,
..Default::default()
},
),
sut.get(sut.lookup(&sym_map).unwrap()).unwrap(),
sut.get(sut.lookup(sym_map).unwrap()).unwrap(),
);
assert_eq!(
&IdentObject::Ident(
&sym_retmap,
sym_retmap,
IdentKind::RetMap,
Source {
pkg_name: None,
..Default::default()
},
),
sut.get(sut.lookup(&sym_retmap).unwrap()).unwrap(),
sut.get(sut.lookup(sym_retmap).unwrap()).unwrap(),
);
}
@ -608,20 +607,20 @@ mod test {
fn sym_decl_pkg_name_retained_if_not_first() {
let mut sut = Sut::new();
let sym = symbol_dummy!(1, "sym");
let pkg_name = symbol_dummy!(2, "pkg name");
let sym = "sym".intern();
let pkg_name = "pkg name".intern();
// This is all that's needed to not consider this to be the first
// package, so that pkg_name is retained below.
let state = AsgBuilderState::<'_, RandomState, SutIx> {
name: Some(&pkg_name),
let state = AsgBuilderState::<RandomState, SutIx> {
name: Some(pkg_name),
..Default::default()
};
let evs = vec![Ok(XmloEvent::SymDecl(
&sym,
sym,
SymAttrs {
pkg_name: Some(&pkg_name),
pkg_name: Some(pkg_name),
ty: Some(SymType::Meta),
..Default::default()
},
@ -632,14 +631,14 @@ mod test {
assert_eq!(
// `pkg_name` retained
&IdentObject::Ident(
&sym,
sym,
IdentKind::Meta,
Source {
pkg_name: Some(&pkg_name),
pkg_name: Some(pkg_name),
..Default::default()
},
),
sut.get(sut.lookup(&sym).unwrap()).unwrap(),
sut.get(sut.lookup(sym).unwrap()).unwrap(),
);
}
@ -647,10 +646,10 @@ mod test {
fn ident_kind_conversion_error_propagates() {
let mut sut = Sut::new();
let sym = symbol_dummy!(1, "sym");
let sym = "sym".intern();
let bad_attrs = SymAttrs::default();
let evs = vec![Ok(XmloEvent::SymDecl(&sym, bad_attrs))];
let evs = vec![Ok(XmloEvent::SymDecl(sym, bad_attrs))];
let result = sut
.import_xmlo(evs.into_iter(), SutState::new())
@ -663,11 +662,11 @@ mod test {
fn declare_extern_error_propagates() {
let mut sut = Sut::new();
let sym = symbol_dummy!(1, "sym");
let sym = "sym".intern();
let evs = vec![
Ok(XmloEvent::SymDecl(
&sym,
sym,
SymAttrs {
extern_: true,
ty: Some(SymType::Meta),
@ -676,7 +675,7 @@ mod test {
)),
// Incompatible
Ok(XmloEvent::SymDecl(
&sym,
sym,
SymAttrs {
extern_: true,
ty: Some(SymType::Map),
@ -696,11 +695,11 @@ mod test {
fn declare_error_propagates() {
let mut sut = Sut::new();
let sym = symbol_dummy!(1, "sym");
let sym = "sym".intern();
let evs = vec![
Ok(XmloEvent::SymDecl(
&sym,
sym,
SymAttrs {
ty: Some(SymType::Meta),
..Default::default()
@ -708,7 +707,7 @@ mod test {
)),
// Redeclare
Ok(XmloEvent::SymDecl(
&sym,
sym,
SymAttrs {
ty: Some(SymType::Meta),
..Default::default()
@ -727,29 +726,29 @@ mod test {
fn sets_fragment() {
let mut sut = Sut::new();
let sym = symbol_dummy!(1, "sym");
let sym = "sym".intern();
let frag = FragmentText::from("foo");
let evs = vec![
Ok(XmloEvent::SymDecl(
&sym,
sym,
SymAttrs {
ty: Some(SymType::Meta),
..Default::default()
},
)),
Ok(XmloEvent::Fragment(&sym, frag.clone())),
Ok(XmloEvent::Fragment(sym, frag.clone())),
];
let _ = sut.import_xmlo(evs.into_iter(), SutState::new()).unwrap();
let node = sut
.lookup(&sym)
.lookup(sym)
.expect("ident/fragment was not added to graph");
assert_eq!(
Some(&IdentObject::IdentFragment(
&sym,
sym,
IdentKind::Meta,
Default::default(),
frag
@ -762,17 +761,17 @@ mod test {
fn error_missing_ident_for_fragment() {
let mut sut = Sut::new();
let sym = symbol_dummy!(1, "sym");
let sym = "sym".intern();
// Note: missing `SymDecl`.
let evs = vec![Ok(XmloEvent::Fragment(&sym, "foo".into()))];
let evs = vec![Ok(XmloEvent::Fragment(sym, "foo".into()))];
let result = sut
.import_xmlo(evs.into_iter(), SutState::new())
.expect_err("expected error for fragment without ident");
assert_eq!(
AsgBuilderError::MissingFragmentIdent(sym.to_string()),
AsgBuilderError::MissingFragmentIdent(sym.lookup_str()),
result,
);
}
@ -781,12 +780,12 @@ mod test {
fn fragment_error_propagates() {
let mut sut = Sut::new();
let sym = symbol_dummy!(1, "sym");
let sym = "sym".intern();
let frag = FragmentText::from("foo");
let evs = vec![
Ok(XmloEvent::SymDecl(
&sym,
sym,
SymAttrs {
// Invalid fragment destination
extern_: true,
@ -794,7 +793,7 @@ mod test {
..Default::default()
},
)),
Ok(XmloEvent::Fragment(&sym, frag.clone())),
Ok(XmloEvent::Fragment(sym, frag.clone())),
];
let result = sut
@ -807,7 +806,7 @@ mod test {
));
let node = sut
.lookup(&sym)
.lookup(sym)
.expect("ident/fragment was not added to graph");
// The identifier should not have been modified on failure.
@ -821,14 +820,14 @@ mod test {
fn stops_at_eoh() {
let mut sut = Sut::new();
let pkg_name = symbol_dummy!(1, "pkg name");
let pkg_name = "pkg name".intern();
let evs = vec![
// Stop here.
Ok(XmloEvent::Eoh),
// Shouldn't make it to this one.
Ok(XmloEvent::Package(PackageAttrs {
name: Some(&pkg_name),
name: Some(pkg_name),
..Default::default()
})),
];


@ -25,14 +25,14 @@
//! types of nodes present in the file.
//!
//! _Note that a "symbol" in the `xmlo` sense differs slightly from
//! [`Symbol`];_
//! [`SymbolId`];_
//! the former is more akin to an identifier.
//!
//! For more information on `xmlo` files,
//! see the [parent crate][super].
//!
//! This reader will be used by both the compiler and linker,
//! and so its [`Symbol`] type is generalized.
//! and so its [`SymbolId`] type is generalized.
//!
//!
//! How To Use
@ -44,8 +44,6 @@
//! There is minor overhead incurred from parsing if the emitted events are
//! not used,
//! but it is quite minimal.
//! The only lifetime one has to worry about is the lifetime of the
//! [`Interner`] used to produce symbols.
//!
//! The next [`XmloEvent`] is retrieved using [`XmloReader::read_event`].
//! _You should stop reading at [`XmloEvent::Eoh`];_
@ -53,9 +51,10 @@
//!
//! ```
//! # fn main() -> Result<(), Box<dyn std::error::Error>> {
//! use tamer::obj::xmlo::{XmloEvent, XmloReader};
//! use tamer::global;
//! use tamer::ir::legacyir::SymType;
//! use tamer::sym::{DefaultPkgInterner, Interner};
//! use tamer::obj::xmlo::{XmloEvent, XmloReader};
//! use tamer::sym::GlobalSymbolIntern;
//!
//! let xmlo = br#"<package name="foo">
//! <preproc:symtable>
@ -79,8 +78,7 @@
//! </preproc:fragments>
//! </package>"#;
//!
//! let interner = DefaultPkgInterner::new();
//! let mut reader = XmloReader::new(xmlo as &[u8], &interner);
//! let mut reader = XmloReader::<_, global::PkgSymSize>::new(xmlo as &[u8]);
//!
//! let mut pkgname = None;
//! let mut syms = Vec::new();
@ -99,24 +97,24 @@
//! }
//! }
//!
//! assert_eq!(Some(interner.intern("foo")), pkgname);
//! assert_eq!(Some("foo".intern()), pkgname);
//!
//! assert_eq!(
//! vec![
//! (interner.intern("syma"), Some(SymType::Class)),
//! (interner.intern("symb"), Some(SymType::Cgen)),
//! ("syma".intern(), Some(SymType::Class)),
//! ("symb".intern(), Some(SymType::Cgen)),
//! ],
//! syms
//! );
//!
//! assert_eq!(
//! vec![
//! (interner.intern("syma"), vec![
//! interner.intern("depa-1"),
//! interner.intern("depa-2"),
//! ("syma".intern(), vec![
//! "depa-1".intern(),
//! "depa-2".intern(),
//! ]),
//! (interner.intern("symb"), vec![
//! interner.intern("depb-1"),
//! ("symb".intern(), vec![
//! "depb-1".intern(),
//! ]),
//! ],
//! deps
@ -124,8 +122,8 @@
//!
//! assert_eq!(
//! vec![
//! (interner.intern("syma"), "syma text".into()),
//! (interner.intern("symb"), "symb text".into()),
//! ("syma".intern(), "syma text".into()),
//! ("symb".intern(), "symb text".into()),
//! ],
//! fragments
//! );
@ -135,7 +133,10 @@
//! ```
use crate::ir::legacyir::{PackageAttrs, SymAttrs, SymType};
use crate::sym::{Interner, Symbol, SymbolIndexSize};
use crate::sym::{
GlobalSymbolInternUnchecked, GlobalSymbolResolve, SymbolId,
SymbolIndexSize, SymbolStr,
};
#[cfg(test)]
use crate::test::quick_xml::MockBytesStart as BytesStart;
#[cfg(test)]
@ -164,7 +165,7 @@ pub type XmloResult<T> = Result<T, XmloError>;
/// Wrapper around [`quick_xml::Reader`] for reading and parsing `xmlo`
/// object files.
///
/// This reader performs interning (see [`Interner`]) for data that is
/// This reader performs interning (see [crate::sym]) for data that is
/// expected to be duplicated or compared.
/// Other data are converted into more concise representations where
/// possible,
@ -175,10 +176,9 @@ pub type XmloResult<T> = Result<T, XmloError>;
///
/// See [module-level documentation](self) for more information and
/// examples.
pub struct XmloReader<'i, B, I, Ix>
pub struct XmloReader<B, Ix>
where
B: BufRead,
I: Interner<'i, Ix>,
Ix: SymbolIndexSize,
{
/// Source `xmlo` reader.
@ -193,9 +193,6 @@ where
/// TODO: Is this worth removing? If not, remove this TODO.
sub_buffer: Vec<u8>,
/// String internment system.
interner: &'i I,
/// Whether the root has been validated.
///
/// This is used to ensure that we provide an error early on if we try
@ -206,17 +203,16 @@ where
///
/// This is known after processing the root `package` element,
/// provided that it's a proper root node.
pkg_name: Option<&'i Symbol<'i, Ix>>,
pkg_name: Option<SymbolId<Ix>>,
}
impl<'i, B, I, Ix> XmloReader<'i, B, I, Ix>
impl<B, Ix> XmloReader<B, Ix>
where
B: BufRead,
I: Interner<'i, Ix>,
Ix: SymbolIndexSize,
{
/// Construct a new reader.
pub fn new(reader: B, interner: &'i I) -> Self {
pub fn new(reader: B) -> Self {
let mut reader = XmlReader::from_reader(reader);
// xmlo files are compiler output and should be trusted
@ -227,7 +223,6 @@ where
// TODO: option to accept buffer
buffer: Vec::new(),
sub_buffer: Vec::new(),
interner,
seen_root: false,
pkg_name: None,
}
@ -254,7 +249,7 @@ where
/// See private methods for more information.
///
/// TODO: Augment failures with context
pub fn read_event<'a>(&mut self) -> XmloResult<XmloEvent<'i, Ix>> {
pub fn read_event<'a>(&mut self) -> XmloResult<XmloEvent<Ix>> {
let event = self.reader.read_event(&mut self.buffer)?;
// Ensure that the first encountered node is something we expect
@ -277,12 +272,12 @@ where
match event {
XmlEvent::Empty(ele) if ele.name() == b"preproc:sym" => {
Self::process_sym(&self.pkg_name, &ele, self.interner)
Self::process_sym(&self.pkg_name, &ele)
}
XmlEvent::Start(ele) => match ele.name() {
b"package" | b"lv:package" => {
let attrs = Self::process_package(&ele, self.interner)?;
let attrs = Self::process_package(&ele)?;
self.pkg_name = attrs.name;
@ -291,14 +286,12 @@ where
b"preproc:sym-dep" => Self::process_dep(
&ele,
self.interner,
&mut self.reader,
&mut self.sub_buffer,
),
b"preproc:fragment" => Self::process_fragment(
&ele,
self.interner,
&mut self.reader,
&mut self.sub_buffer,
),
@ -308,15 +301,13 @@ where
// source field information which we want to keep. (We
// don't care about `retmap` for our purposes.)
b"preproc:sym" => {
let mut event =
Self::process_sym(&self.pkg_name, &ele, self.interner)?;
let mut event = Self::process_sym(&self.pkg_name, &ele)?;
match &mut event {
XmloEvent::SymDecl(_, attrs)
if attrs.ty == Some(SymType::Map) =>
{
attrs.from = Some(Self::process_map_from(
self.interner,
&mut self.reader,
&mut self.sub_buffer,
)?);
@ -354,19 +345,17 @@ where
/// parsed.
fn process_package<'a>(
ele: &'a BytesStart<'a>,
interner: &'i I,
) -> XmloResult<PackageAttrs<'i, Ix>> {
) -> XmloResult<PackageAttrs<Ix>> {
let mut program = false;
let mut elig: Option<&'i Symbol<'i, Ix>> = None;
let mut name: Option<&'i Symbol<'i, Ix>> = None;
let mut elig: Option<SymbolId<Ix>> = None;
let mut name: Option<SymbolId<Ix>> = None;
let mut relroot: Option<String> = None;
for attr in ele.attributes().with_checks(false).filter_map(Result::ok) {
match attr.key {
b"name" => {
name = Some(unsafe {
interner.intern_utf8_unchecked(&attr.value)
});
name =
Some(unsafe { (&attr.value).intern_utf8_unchecked() });
}
b"__rootpath" => {
@ -380,9 +369,8 @@ where
}
b"preproc:elig-class-yields" => {
elig = Some(unsafe {
interner.intern_utf8_unchecked(&attr.value)
});
elig =
Some(unsafe { (&attr.value).intern_utf8_unchecked() });
}
_ => (),
@ -412,25 +400,21 @@ where
/// ======
/// - [`XmloError::UnassociatedSym`] if missing `preproc:sym/@name`.
fn process_sym<'a>(
pkg_name: &Option<&'i Symbol<'i, Ix>>,
pkg_name: &Option<SymbolId<Ix>>,
ele: &'a BytesStart<'a>,
interner: &'i I,
) -> XmloResult<XmloEvent<'i, Ix>> {
let mut name: Option<&'i Symbol<'i, Ix>> = None;
) -> XmloResult<XmloEvent<Ix>> {
let mut name: Option<SymbolId<Ix>> = None;
let mut sym_attrs = SymAttrs::default();
for attr in ele.attributes().with_checks(false).filter_map(Result::ok) {
match attr.key {
b"name" => {
name = Some(unsafe {
interner.intern_utf8_unchecked(&attr.value)
});
name = Some(unsafe { attr.value.intern_utf8_unchecked() });
}
b"src" => {
sym_attrs.src = Some(unsafe {
interner.intern_utf8_unchecked(&attr.value)
});
sym_attrs.src =
Some(unsafe { attr.value.intern_utf8_unchecked() });
}
b"type" => {
@ -464,15 +448,13 @@ where
}
b"parent" => {
sym_attrs.parent = Some(unsafe {
interner.intern_utf8_unchecked(&attr.value)
});
sym_attrs.parent =
Some(unsafe { attr.value.intern_utf8_unchecked() });
}
b"yields" => {
sym_attrs.yields = Some(unsafe {
interner.intern_utf8_unchecked(&attr.value)
});
sym_attrs.yields =
Some(unsafe { attr.value.intern_utf8_unchecked() });
}
b"desc" => {
@ -513,10 +495,9 @@ where
/// data (e.g. elements) are encountered.
/// - [`XmloError::XmlError`] on XML parsing failure.
fn process_map_from<'a>(
interner: &'i I,
reader: &mut XmlReader<B>,
buffer: &mut Vec<u8>,
) -> XmloResult<Vec<&'i Symbol<'i, Ix>>> {
) -> XmloResult<Vec<SymbolId<Ix>>> {
let mut froms = Vec::new();
loop {
@ -533,8 +514,7 @@ where
)),
|attr| {
Ok(unsafe {
interner
.intern_utf8_unchecked(&attr.value)
attr.value.intern_utf8_unchecked()
})
},
)?,
@ -566,7 +546,7 @@ where
/// ```
///
/// This function will read any number of `preproc:sym-ref` nodes and
/// produce a single [`XmloEvent::SymDeps`] containing a [`Symbol`]
/// produce a single [`XmloEvent::SymDeps`] containing a [`SymbolId`]
/// for `preproc:sym-dep/@name` and for each `preproc:sym-ref/@name`.
///
/// Errors
@ -577,17 +557,16 @@ where
/// - [`XmloError::XmlError`] on XML parsing failure.
fn process_dep<'a>(
ele: &'a BytesStart<'a>,
interner: &'i I,
reader: &mut XmlReader<B>,
buffer: &mut Vec<u8>,
) -> XmloResult<XmloEvent<'i, Ix>> {
) -> XmloResult<XmloEvent<Ix>> {
let name = ele
.attributes()
.with_checks(false)
.filter_map(Result::ok)
.find(|attr| attr.key == b"name")
.map_or(Err(XmloError::UnassociatedSymDep), |attr| {
Ok(unsafe { interner.intern_utf8_unchecked(&attr.value) })
Ok(unsafe { attr.value.intern_utf8_unchecked() })
})?;
let mut deps = Vec::new();
@ -609,7 +588,7 @@ where
)),
|attr| {
Ok(unsafe {
interner.intern_utf8_unchecked(&attr.value)
attr.value.intern_utf8_unchecked()
})
},
)?,
@ -625,7 +604,7 @@ where
_ => return Err(XmloError::MalformedSymRef(format!(
"preproc:sym-dep must contain only preproc:sym-ref children for `{}`",
name,
name.lookup_str(),
)))
}
}
@ -647,10 +626,9 @@ where
/// - [`XmloError::XmlError`] for XML parsing errors.
fn process_fragment<'a>(
ele: &'a BytesStart<'a>,
interner: &'i I,
reader: &mut XmlReader<B>,
buffer: &mut Vec<u8>,
) -> XmloResult<XmloEvent<'i, Ix>> {
) -> XmloResult<XmloEvent<Ix>> {
let mut src_attrs = ele.attributes();
let mut filtered = src_attrs.with_checks(false).filter_map(Result::ok);
@ -658,7 +636,7 @@ where
.find(|attr| attr.key == b"id")
.filter(|attr| &*attr.value != b"")
.map_or(Err(XmloError::UnassociatedFragment), |attr| {
Ok(unsafe { interner.intern_utf8_unchecked(&attr.value) })
Ok(unsafe { attr.value.intern_utf8_unchecked() })
})?;
let text =
@ -666,7 +644,7 @@ where
.read_text(ele.name(), buffer)
.map_err(|err| match err {
InnerXmlError::TextNotFound => {
XmloError::MissingFragmentText(id.to_string())
XmloError::MissingFragmentText(id.lookup_str())
}
_ => err.into(),
})?;
@ -694,13 +672,12 @@ where
}
}
impl<'i, B, I, Ix> Iterator for XmloReader<'i, B, I, Ix>
impl<B, Ix> Iterator for XmloReader<B, Ix>
where
B: BufRead,
I: Interner<'i, Ix>,
Ix: SymbolIndexSize,
{
type Item = XmloResult<XmloEvent<'i, Ix>>;
type Item = XmloResult<XmloEvent<Ix>>;
/// Invoke [`XmloReader::read_event`] and yield the result via an
/// [`Iterator`] API.
@ -716,14 +693,13 @@ where
}
}
impl<'i, B, I, Ix> From<(B, &'i I)> for XmloReader<'i, B, I, Ix>
impl<B, Ix> From<B> for XmloReader<B, Ix>
where
B: BufRead,
I: Interner<'i, Ix>,
Ix: SymbolIndexSize,
{
fn from(args: (B, &'i I)) -> Self {
Self::new(args.0, args.1)
fn from(buf: B) -> Self {
Self::new(buf)
}
}
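
The hunks above replace every `interner.intern_utf8_unchecked(&attr.value)` call with a method on the value itself, and `name` with `name.lookup_str()`. As a rough illustration of how such an API can be layered over a global pool with extension traits, here is a minimal sketch. It is not the implementation from this commit: `SymId`, `Intern`, `Resolve`, and `POOL` are hypothetical names, and the thread-local pool with leaked strings is an assumption made to keep the example within the standard library.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for `SymbolId<Ix>`: a small, `Copy` integer handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SymId(u32);

/// Hypothetical pool: strings by index, plus a reverse map for deduplication.
#[derive(Default)]
struct Pool {
    strings: Vec<&'static str>,
    map: HashMap<&'static str, SymId>,
}

thread_local! {
    // Assumption: one pool per thread keeps the sketch free of locking.
    static POOL: RefCell<Pool> = RefCell::new(Pool::default());
}

/// Extension trait so that callers can write `"foo".intern()`.
pub trait Intern {
    fn intern(self) -> SymId;
}

impl Intern for &str {
    fn intern(self) -> SymId {
        POOL.with(|pool| {
            let mut pool = pool.borrow_mut();

            // Already interned: hand back the existing identifier.
            if let Some(id) = pool.map.get(self) {
                return *id;
            }

            // Leak the copy so that the slice is `'static`; a real interner
            // would more likely allocate from an arena instead.
            let stored: &'static str =
                Box::leak(self.to_owned().into_boxed_str());
            let id = SymId(pool.strings.len() as u32);

            pool.strings.push(stored);
            pool.map.insert(stored, id);
            id
        })
    }
}

/// Extension trait for resolving an identifier back into its string.
pub trait Resolve {
    fn lookup_str(self) -> &'static str;
}

impl Resolve for SymId {
    fn lookup_str(self) -> &'static str {
        // Panics if the identifier did not come from this thread's pool.
        POOL.with(|pool| pool.borrow().strings[self.0 as usize])
    }
}

fn main() {
    let a = "pkg/name".intern();
    let b = String::from("pkg/name").as_str().intern();

    // Equal strings yield equal identifiers, so comparison is an integer
    // comparison, and no interner has to be threaded through call sites.
    assert_eq!(a, b);
    assert_ne!(a, "other".intern());
    assert_eq!("pkg/name", a.lookup_str());
}
```

Because the handle carries no lifetime, types that store it (such as `XmloEvent<Ix>` above) no longer need an `'i` parameter.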
@ -737,24 +713,24 @@ where
/// we should instead prefer not to put data into object files that won't
/// be useful and can't be easily skipped without parsing.
#[derive(Debug, PartialEq, Eq)]
pub enum XmloEvent<'i, Ix: SymbolIndexSize> {
pub enum XmloEvent<Ix: SymbolIndexSize> {
/// Package declaration.
///
/// This contains data gathered from the root `lv:package` node.
Package(PackageAttrs<'i, Ix>),
Package(PackageAttrs<Ix>),
/// Symbol declaration.
///
/// This represents an entry in the symbol table,
/// which includes a symbol along with its variable metadata as
/// [`SymAttrs`].
SymDecl(&'i Symbol<'i, Ix>, SymAttrs<'i, Ix>),
SymDecl(SymbolId<Ix>, SymAttrs<Ix>),
/// Dependencies of a given symbol.
///
/// Note that, for simplicity, an owned vector is returned rather than a
/// slice into an internal buffer.
SymDeps(&'i Symbol<'i, Ix>, Vec<&'i Symbol<'i, Ix>>),
SymDeps(SymbolId<Ix>, Vec<SymbolId<Ix>>),
/// Text (compiled code) fragment for a given symbol.
///
@ -763,7 +739,7 @@ pub enum XmloEvent<'i, Ix: SymbolIndexSize> {
/// Given that fragments can be quite large,
/// a caller not interested in these data should choose to skip
/// fragments entirely rather than simply ignoring fragment events.
Fragment(&'i Symbol<'i, Ix>, String),
Fragment(SymbolId<Ix>, String),
/// End-of-header.
///
@ -806,7 +782,7 @@ pub enum XmloError {
/// A `preproc:fragment` element was found, but is missing `@id`.
UnassociatedFragment,
/// A `preproc:fragment` element was found, but is missing `text()`.
MissingFragmentText(String),
MissingFragmentText(SymbolStr<'static>),
}
impl From<InnerXmlError> for XmloError {


@ -18,27 +18,27 @@
// along with this program. If not, see <http://www.gnu.org/licenses/>.
use super::*;
use crate::global;
use crate::ir::legacyir::{SymDtype, SymType};
use crate::sym::DefaultInterner;
use crate::sym::GlobalSymbolIntern;
use crate::test::quick_xml::*;
type Sut<'i, B, I> = XmloReader<'i, B, I, u16>;
type Sut<B> = XmloReader<B, global::PkgIdentExprSize>;
macro_rules! xmlo_tests {
($(fn $fn:ident($sut:ident, $interner:ident) $body:block)*) => {
($(fn $fn:ident($sut:ident) $body:block)*) => {
$(
#[test]
fn $fn() -> XmloResult<()> {
let stub_data: &[u8] = &[];
let $interner = DefaultInterner::new();
#[allow(unused_mut)]
let mut $sut = Sut::new(stub_data, &$interner);
let mut $sut = Sut::new(stub_data);
// We don't want to have to output a proper root node
// for every one of our tests.
$sut.seen_root = true;
$sut.pkg_name = Some($interner.intern("pkg/name"));
$sut.pkg_name = Some("pkg/name".intern());
$body;
@ -49,11 +49,11 @@ macro_rules! xmlo_tests {
}
xmlo_tests! {
fn sets_parsing_options(sut, interner) {
fn sets_parsing_options(sut) {
assert_eq!(Some(false), sut.reader.check_end);
}
fn proxies_xml_failures(sut, interner) {
fn proxies_xml_failures(sut) {
sut.reader.next_event =
Some(Box::new(|_, _| Err(InnerXmlError::UnexpectedEof("test".into()))));
@ -63,7 +63,7 @@ xmlo_tests! {
}
}
fn sym_fails_without_name(sut, interner) {
fn sym_fails_without_name(sut) {
sut.reader.next_event = Some(Box::new(|_, _| {
Ok(XmlEvent::Start(MockBytesStart::new(
b"preproc:sym",
@ -77,7 +77,7 @@ xmlo_tests! {
}
}
fn fails_on_invalid_root(sut, interner) {
fn fails_on_invalid_root(sut) {
// xmlo_tests macro sets this for us, so we need to clear it to
// be able to perform the check
sut.seen_root = false;
@ -95,7 +95,7 @@ xmlo_tests! {
}
}
fn recognizes_valid_roots(sut, interner) {
fn recognizes_valid_roots(sut) {
// xmlo_tests macro sets this for us, so we need to clear it to
// be able to perform the check
sut.seen_root = false;
@ -125,7 +125,7 @@ xmlo_tests! {
sut.read_event()?;
}
fn package_event_program(sut, interner) {
fn package_event_program(sut) {
sut.reader.next_event = Some(Box::new(|_, _| {
Ok(XmlEvent::Start(MockBytesStart::new(
b"package",
@ -143,14 +143,14 @@ xmlo_tests! {
assert_eq!(
XmloEvent::Package(PackageAttrs {
program: true,
elig: Some(interner.intern("eligClassYields")),
elig: Some("eligClassYields".intern()),
..Default::default()
}),
result
);
}
fn package_event_nonprogram(sut, interner) {
fn package_event_nonprogram(sut) {
sut.reader.next_event = Some(Box::new(|_, _| {
Ok(XmlEvent::Start(MockBytesStart::new(
b"package",
@ -169,7 +169,7 @@ xmlo_tests! {
);
}
fn package_event_name(sut, interner) {
fn package_event_name(sut) {
sut.reader.next_event = Some(Box::new(|_, _| {
Ok(XmlEvent::Start(MockBytesStart::new(
b"package",
@ -184,7 +184,7 @@ xmlo_tests! {
assert_eq!(
XmloEvent::Package(PackageAttrs {
name: Some(interner.intern("pkg/name")),
name: Some("pkg/name".intern()),
relroot: Some("../../".into()),
program: false,
..Default::default()
@ -193,7 +193,7 @@ xmlo_tests! {
);
}
fn sym_dep_event(sut, interner) {
fn sym_dep_event(sut) {
sut.reader.next_event = Some(Box::new(|_, event_i| match event_i {
0 => Ok(XmlEvent::Start(MockBytesStart::new(
b"preproc:sym-dep",
@ -223,14 +223,14 @@ xmlo_tests! {
assert_eq!(
XmloEvent::SymDeps(
interner.intern("depsym"),
vec![interner.intern("dep1"), interner.intern("dep2")]
"depsym".intern(),
vec!["dep1".intern(), "dep2".intern()]
),
result
);
}
fn sym_dep_fails_with_missing_name(sut, interner) {
fn sym_dep_fails_with_missing_name(sut) {
sut.reader.next_event = Some(Box::new(|_, _| {
Ok(XmlEvent::Start(MockBytesStart::new(
b"preproc:sym-dep",
@ -244,7 +244,7 @@ xmlo_tests! {
}
}
fn sym_dep_malformed_ref_missing_name(sut, interner) {
fn sym_dep_malformed_ref_missing_name(sut) {
sut.reader.next_event = Some(Box::new(|_, event_i| match event_i {
0 => Ok(XmlEvent::Start(MockBytesStart::new(
b"preproc:sym-dep",
@ -270,7 +270,7 @@ xmlo_tests! {
}
}
fn sym_dep_malformed_ref_unexpected_element(sut, interner) {
fn sym_dep_malformed_ref_unexpected_element(sut) {
sut.reader.next_event = Some(Box::new(|_, event_i| match event_i {
0 => Ok(XmlEvent::Start(MockBytesStart::new(
b"preproc:sym-dep",
@ -303,7 +303,7 @@ xmlo_tests! {
assert_eq!(3, sut.reader.event_i, "Did not ignore Text");
}
fn eoh_after_fragments(sut, interner) {
fn eoh_after_fragments(sut) {
sut.reader.next_event = Some(Box::new(|_, _| {
Ok(XmlEvent::End(MockBytesEnd::new(b"preproc:fragments")))
}));
@ -313,7 +313,7 @@ xmlo_tests! {
assert_eq!(XmloEvent::Eoh, result);
}
fn fragment_event(sut, interner) {
fn fragment_event(sut) {
let expected = "fragment text".to_string();
sut.reader.next_text = Some(Ok(expected.clone()));
@ -329,7 +329,7 @@ xmlo_tests! {
let result = sut.read_event()?;
assert_eq!(
XmloEvent::Fragment(interner.intern("fragsym"), expected),
XmloEvent::Fragment("fragsym".intern(), expected),
result
);
@ -340,7 +340,7 @@ xmlo_tests! {
);
}
fn fragment_fails_with_missing_id(sut, interner) {
fn fragment_fails_with_missing_id(sut) {
sut.reader.next_event = Some(Box::new(|_, _| {
Ok(XmlEvent::Start(MockBytesStart::new(
b"preproc:fragment",
@ -355,7 +355,7 @@ xmlo_tests! {
}
// Yes, this happened.
fn fragment_fails_with_empty_id(sut, interner) {
fn fragment_fails_with_empty_id(sut) {
sut.reader.next_event = Some(Box::new(|_, _| {
Ok(XmlEvent::Start(MockBytesStart::new(
b"preproc:fragment",
@ -371,7 +371,7 @@ xmlo_tests! {
}
}
fn fragment_fails_with_missing_text(sut, interner) {
fn fragment_fails_with_missing_text(sut) {
sut.reader.next_text = Some(Err(InnerXmlError::TextNotFound));
sut.reader.next_event = Some(Box::new(|_, _| {
@ -385,13 +385,13 @@ xmlo_tests! {
match sut.read_event() {
Err(XmloError::MissingFragmentText(symname)) => {
assert_eq!("fragsym".to_string(), symname)
assert_eq!("fragsym", symname)
}
bad => panic!("expected XmloError: {:?}", bad),
}
}
fn skips_unneeded_nodes(sut, interner) {
fn skips_unneeded_nodes(sut) {
sut.reader.next_event = Some(Box::new(|_, event_i| match event_i {
// Skip over this
0 => Ok(XmlEvent::End(MockBytesEnd::new(
@ -421,9 +421,9 @@ xmlo_tests! {
assert_eq!(
XmloEvent::SymDecl(
interner.intern("sym-expected"),
"sym-expected".intern(),
SymAttrs {
pkg_name: Some(interner.intern("pkg/name")),
pkg_name: Some("pkg/name".intern()),
..Default::default()
},
),
@ -434,7 +434,7 @@ xmlo_tests! {
// Some preproc:sym nodes have children (`func` symbols,
// specifically) that we choose to ignore. See next test for
// data we do care about.
fn sym_nonempty_element(sut, interner) {
fn sym_nonempty_element(sut) {
sut.reader.next_event = Some(Box::new(|_, _| {
// Notice Start, not Empty
Ok(XmlEvent::Start(MockBytesStart::new(
@ -455,10 +455,10 @@ xmlo_tests! {
assert_eq!(
XmloEvent::SymDecl(
interner.intern("sym-nonempty"),
"sym-nonempty".intern(),
SymAttrs {
dim: Some(2),
pkg_name: Some(interner.intern("pkg/name")),
pkg_name: Some("pkg/name".intern()),
..Default::default()
},
),
@ -473,7 +473,7 @@ xmlo_tests! {
// `map` symbols include information about their source
// fields.
fn sym_map_from(sut, interner) {
fn sym_map_from(sut) {
sut.reader.next_event = Some(Box::new(|_, event_i| match event_i {
// Notice Start, not Empty
0 => Ok(XmlEvent::Start(MockBytesStart::new(
@ -524,14 +524,14 @@ xmlo_tests! {
assert_eq!(
XmloEvent::SymDecl(
interner.intern("sym-map-from"),
"sym-map-from".intern(),
SymAttrs {
ty: Some(SymType::Map),
from: Some(vec![
interner.intern("from-a"),
interner.intern("from-b"),
"from-a".intern(),
"from-b".intern(),
]),
pkg_name: Some(interner.intern("pkg/name")),
pkg_name: Some("pkg/name".intern()),
..Default::default()
},
),
@ -542,7 +542,7 @@ xmlo_tests! {
assert_eq!(None, sut.reader.read_to_end_name);
}
fn sym_map_from_missing_name(sut, interner) {
fn sym_map_from_missing_name(sut) {
sut.reader.next_event = Some(Box::new(|_, event_i| match event_i {
// Notice Start, not Empty
0 => Ok(XmlEvent::Start(MockBytesStart::new(
@ -580,7 +580,7 @@ xmlo_tests! {
}
}
fn sym_map_from_unexpected_data(sut, interner) {
fn sym_map_from_unexpected_data(sut) {
sut.reader.next_event = Some(Box::new(|_, event_i| match event_i {
// Notice Start, not Empty
0 => Ok(XmlEvent::Start(MockBytesStart::new(
@ -612,7 +612,7 @@ xmlo_tests! {
}
}
fn read_events_via_iterator(sut, interner) {
fn read_events_via_iterator(sut) {
sut.reader.next_event = Some(Box::new(|_, _| {
Ok(XmlEvent::Start(MockBytesStart::new(
b"package",
@ -633,7 +633,7 @@ xmlo_tests! {
}
macro_rules! sym_test_reader_event {
($sut:ident, $interner:ident, $name:ident, $($key:ident=$val:literal),*) => {
($sut:ident, $name:ident, $($key:ident=$val:literal),*) => {
// See xmlo_tests macro for explanation
$sut.seen_root = true;
@ -676,24 +676,23 @@ macro_rules! sym_test_reader_event {
}
macro_rules! sym_tests {
(($interner:ident) $($name:ident: [$($key:ident=$val:literal),*] => $expect:expr)*) => {
($($name:ident: [$($key:ident=$val:literal),*] => $expect:expr)*) => {
$(
#[test]
fn $name() -> XmloResult<()> {
let stub_data: &[u8] = &[];
let $interner = DefaultInterner::new();
let mut sut = Sut::new(stub_data, &$interner);
let mut sut = Sut::new(stub_data);
sym_test_reader_event!(sut, $interner, $name, $( $key=$val ),*);
sym_test_reader_event!(sut, $name, $( $key=$val ),*);
let result = sut.read_event()?;
let mut expected_attrs = $expect;
expected_attrs.pkg_name = Some($interner.intern("pkg/name"));
expected_attrs.pkg_name = Some("pkg/name".intern());
assert_eq!(
XmloEvent::SymDecl(
$interner.intern(stringify!($name)),
stringify!($name).intern(),
expected_attrs
),
result
@ -705,13 +704,11 @@ macro_rules! sym_tests {
}
sym_tests! {
(interner)
src: [src="foo/bar/baz"] => SymAttrs {
// see macro for src relpath
src: Some(interner.intern("foo/bar/baz")),
..Default::default()
}
src: [src="foo/bar/baz"] => SymAttrs {
// see macro for src relpath
src: Some("foo/bar/baz".intern()),
..Default::default()
}
// note that this doesn't test every type; we're not going to
// duplicate the mapping for all of them here
@ -764,12 +761,12 @@ sym_tests! {
}
parent: [parent="foo"] => SymAttrs {
parent: Some(interner.intern("foo")),
parent: Some("foo".intern()),
..Default::default()
}
yields: [yields="yield"] => SymAttrs {
yields: Some(interner.intern("yield")),
yields: Some("yield".intern()),
..Default::default()
}
@ -792,7 +789,7 @@ sym_tests! {
multi: [src="foo", type="class", dim="1", dtype="float", extern="true"]
=> SymAttrs {
// see macro for src relpath
src: Some(interner.intern("foo")),
src: Some("foo".intern()),
ty: Some(SymType::Class),
dim: Some(1),
dtype: Some(SymDtype::Float),
@ -805,12 +802,11 @@ sym_tests! {
#[test]
fn generated_true() -> XmloResult<()> {
let stub_data: &[u8] = &[];
let interner = DefaultInterner::new();
let mut sut = Sut::new(stub_data, &interner);
let mut sut = Sut::new(stub_data);
// See xmlo_tests macro for explanation
sut.seen_root = true;
sut.pkg_name = Some(interner.intern("pkg/name"));
sut.pkg_name = Some("pkg/name".intern());
sut.reader.next_event = Some(Box::new(|_, _| {
Ok(XmlEvent::Empty(MockBytesStart::new(
@ -826,12 +822,12 @@ fn generated_true() -> XmloResult<()> {
let expected_attrs = SymAttrs {
generated: true,
pkg_name: Some(interner.intern("pkg/name")),
pkg_name: Some("pkg/name".intern()),
..Default::default()
};
assert_eq!(
XmloEvent::SymDecl(interner.intern("generated_true"), expected_attrs),
XmloEvent::SymDecl("generated_true".intern(), expected_attrs),
result
);
@ -841,10 +837,9 @@ fn generated_true() -> XmloResult<()> {
#[test]
fn fails_on_non_ascii_dim() {
let stub_data: &[u8] = &[];
let interner = DefaultInterner::new();
let mut sut = Sut::new(stub_data, &interner);
let mut sut = Sut::new(stub_data);
sym_test_reader_event!(sut, interner, fail_sym, dim = "X1");
sym_test_reader_event!(sut, fail_sym, dim = "X1");
match sut.read_event() {
Err(XmloError::InvalidDim(msg)) => assert!(msg.contains("X1")),
@ -855,10 +850,9 @@ fn fails_on_non_ascii_dim() {
#[test]
fn fails_on_multi_char_dim() {
let stub_data: &[u8] = &[];
let interner = DefaultInterner::new();
let mut sut = Sut::new(stub_data, &interner);
let mut sut = Sut::new(stub_data);
sym_test_reader_event!(sut, interner, fail_sym, dim = "11");
sym_test_reader_event!(sut, fail_sym, dim = "11");
match sut.read_event() {
Err(XmloError::InvalidDim(msg)) => assert!(msg.contains("11")),
@ -869,10 +863,9 @@ fn fails_on_multi_char_dim() {
#[test]
fn fails_on_invalid_type() {
let stub_data: &[u8] = &[];
let interner = DefaultInterner::new();
let mut sut = Sut::new(stub_data, &interner);
let mut sut = Sut::new(stub_data);
sym_test_reader_event!(sut, interner, fail_sym, type = "foo");
sym_test_reader_event!(sut, fail_sym, type = "foo");
match sut.read_event() {
Err(XmloError::InvalidType(msg)) => assert!(msg.contains("foo")),
@ -883,10 +876,9 @@ fn fails_on_invalid_type() {
#[test]
fn fails_on_invalid_dtype() {
let stub_data: &[u8] = &[];
let interner = DefaultInterner::new();
let mut sut = Sut::new(stub_data, &interner);
let mut sut = Sut::new(stub_data);
sym_test_reader_event!(sut, interner, fail_sym, dtype = "foo");
sym_test_reader_event!(sut, fail_sym, dtype = "foo");
match sut.read_event() {
Err(XmloError::InvalidDtype(msg)) => assert!(msg.contains("foo")),
@ -897,10 +889,9 @@ fn fails_on_invalid_dtype() {
#[test]
fn fails_when_missing_sym_name() {
let stub_data: &[u8] = &[];
let interner = DefaultInterner::new();
let mut sut = Sut::new(stub_data, &interner);
let mut sut = Sut::new(stub_data);
sym_test_reader_event!(sut, interner, fail_sym, dtype = "foo");
sym_test_reader_event!(sut, fail_sym, dtype = "foo");
match sut.read_event() {
Err(XmloError::InvalidDtype(msg)) => assert!(msg.contains("foo")),


@ -20,8 +20,55 @@
//! Interners used to intern values as symbols.
//!
//! See the [parent module](super) for more information.
//!
//!
//! Using Interners Directly (Without Global State)
//! ===============================================
//! Please do not do this unless you have a compelling use case and know
//! what you are doing,
//! including understanding how to mitigate mixing of [`SymbolId`]s,
//! such as with newtypes or encapsulation.
//! Otherwise,
//! use the global interners instead,
//! as documented in the [parent module](super).
//!
//! ```
//! use tamer::sym::{Interner, DefaultPkgInterner, SymbolId};
//!
//! // Inputs to be interned
//! let a = "foo";
//! let b = &"foo".to_string();
//! let c = "foobar";
//! let d = &c[0..3];
//!
//! // Interners employ interior mutability and so do not need to be
//! // declared `mut`
//! let interner = DefaultPkgInterner::new();
//!
//! let (ia, ib, ic, id) = (
//! interner.intern(a),
//! interner.intern(b),
//! interner.intern(c),
//! interner.intern(d),
//! );
//!
//! assert_eq!(ia, ib);
//! assert_eq!(ia, id);
//! assert_eq!(ib, id);
//! assert_ne!(ia, ic);
//!
//! // Only "foo" and "foobar" are interned
//! assert_eq!(2, interner.len());
//! assert!(interner.contains("foo"));
//! assert!(interner.contains("foobar"));
//! assert!(!interner.contains("something else"));
//!
//! // Symbols can also be looked up by index.
//! assert_eq!("foo", interner.index_lookup(ia).unwrap());
//! ```
use super::{Symbol, SymbolId, SymbolIndexSize};
use super::symbol::SymbolStr;
use super::{SymbolId, SymbolIndexSize};
use crate::global;
use bumpalo::Bump;
use fxhash::FxBuildHasher;
@ -31,71 +78,64 @@ use std::convert::{TryFrom, TryInto};
use std::fmt::Debug;
use std::hash::BuildHasher;
/// Create, store, compare, and retrieve [`Symbol`] values.
/// Create, store, compare, and retrieve interned values.
///
/// Interners accept string slices and produce values of type [`Symbol`].
/// A reference to the same [`Symbol`] will always be returned for a given
/// string,
/// allowing symbols to be compared for equality cheaply by comparing
/// pointers.
/// Symbol locations in memory are fixed for the lifetime of the interner.
/// Interners accept string slices and produce values of type [`SymbolId`].
/// The same [`SymbolId`] will always be returned for a given string,
/// allowing symbols to be compared for equality cheaply by comparing
/// integers.
/// Symbol locations in memory are fixed for the lifetime of the interner,
/// and can be retrieved as [`SymbolStr`] using
/// [`index_lookup`](Interner::index_lookup).
///
/// If you care whether a value has been interned yet or not,
/// see [`intern_soft`](Interner::intern_soft) and
/// [`contains`](Interner::contains).
///
/// See the [module-level documentation](self) for an example.
/// For interfaces to the global interners that indirectly use these
/// methods,
/// see the [parent module](super).
pub trait Interner<'i, Ix: SymbolIndexSize> {
/// Intern a string slice or return an existing [`Symbol`].
/// Intern a string slice or return an existing [`SymbolId`].
///
/// If the provided string has already been interned,
/// then a reference to the existing [`Symbol`] will be returned.
/// then an existing [`SymbolId`] will be returned.
/// Otherwise,
/// the string will be interned and a new [`Symbol`] created.
///
/// The lifetime of the returned symbol is bound to the lifetime of the
/// underlying intern pool.
/// the string will be interned and a new [`SymbolId`] allocated.
///
/// To retrieve an existing symbol _without_ interning,
/// see [`intern_soft`](Interner::intern_soft).
fn intern(&'i self, value: &str) -> &'i Symbol<'i, Ix>;
fn intern(&self, value: &str) -> SymbolId<Ix>;
/// Retrieve an existing intern for the string slice `s`.
/// Retrieve an existing intern for the provided string slice.
///
/// Unlike [`intern`](Interner::intern),
/// this will _not_ intern the string if it has not already been
/// interned.
fn intern_soft(&'i self, value: &str) -> Option<&'i Symbol<'i, Ix>>;
fn intern_soft(&self, value: &str) -> Option<SymbolId<Ix>>;
/// Determine whether the given value has already been interned.
///
/// This is equivalent to `intern_soft(value).is_some()`.
fn contains(&self, value: &str) -> bool;
/// Number of interned strings.
/// Number of interned strings in this interner's pool.
///
/// This count will increase each time a unique string is interned.
/// It does not increase when a string is already interned.
fn len(&self) -> usize;
/// Look up a previously interned [`Symbol`] by its [`SymbolId`].
/// Look up a symbol's string value by its [`SymbolId`].
///
/// This will always return a [`Symbol`] as long as the provided `index`
/// represents a symbol interned with this interner.
/// This will always return a [`SymbolStr`] as long as the provided
/// `index` represents a symbol interned with this interner.
/// If the index is not found,
/// the result is [`None`].
///
/// This method is most useful when storing [`Symbol`] is not possible
/// or desirable.
/// For example,
/// borrowed [`Symbol`] references require lifetimes,
/// whereas [`SymbolId`] is both owned _and_ [`Copy`].
/// [`SymbolId`] is also much smaller than [`Symbol`].
fn index_lookup(
&'i self,
index: SymbolId<Ix>,
) -> Option<&'i Symbol<'i, Ix>>;
fn index_lookup(&'i self, index: SymbolId<Ix>) -> Option<SymbolStr<'i>>;
/// Intern an assumed-UTF8 slice of bytes or return an existing
/// [`Symbol`].
/// [`SymbolId`].
///
/// Safety
/// ======
@ -106,20 +146,17 @@ pub trait Interner<'i, Ix: SymbolIndexSize> {
/// (such as [object files][]).
///
/// [object files]: crate::obj
unsafe fn intern_utf8_unchecked(
&'i self,
value: &[u8],
) -> &'i Symbol<'i, Ix> {
unsafe fn intern_utf8_unchecked(&self, value: &[u8]) -> SymbolId<Ix> {
self.intern(std::str::from_utf8_unchecked(value))
}
}
/// An interner backed by an [arena](bumpalo).
///
/// Since interns exist until the interner itself is freed,
/// Since all symbols exist until the interner itself is freed,
/// an arena is a much more efficient and appropriate memory allocation
/// strategy.
/// This further provides a stable location in memory for symbol data.
/// This also provides a stable location in memory for symbol data.
///
/// For the recommended configuration,
/// see [`DefaultInterner`].
@ -131,22 +168,22 @@ where
S: BuildHasher + Default,
Ix: SymbolIndexSize,
{
/// String and [`Symbol`] storage.
/// Storage for interned strings.
arena: Bump,
/// Symbol references by index.
///
/// This vector enables looking up a [`Symbol`] using its
/// [`SymbolId`].
/// Interned strings by [`SymbolId`].
///
/// The first index must always be populated during initialization to
/// ensure that [`SymbolId`] will never be `0`.
indexes: RefCell<Vec<&'i Symbol<'i, Ix>>>,
/// Map of interned strings to their respective [`Symbol`].
///
/// Both strings and symbols are allocated within `arena`.
map: RefCell<HashMap<&'i str, &'i Symbol<'i, Ix>, S>>,
/// These string slices are stored in `arena`.
strings: RefCell<Vec<&'i str>>,
/// Map of interned strings to their respective [`SymbolId`].
///
/// This allows us to determine whether a string has already been
/// interned and, if so, to return its corresponding symbol.
map: RefCell<HashMap<&'i str, SymbolId<Ix>, S>>,
}
impl<'i, S, Ix> ArenaInterner<'i, S, Ix>
@ -179,14 +216,14 @@ where
/// [consistent]: https://en.wikipedia.org/wiki/Consistent_hashing
#[inline]
pub fn with_capacity(capacity: usize) -> Self {
let mut indexes = Vec::<&'i Symbol<'i, Ix>>::with_capacity(capacity);
let mut strings = Vec::<_>::with_capacity(capacity);
// The first index is not used since SymbolId cannot be 0.
indexes.push(Ix::dummy_sym());
strings.push("");
Self {
arena: Bump::new(),
indexes: RefCell::new(indexes),
strings: RefCell::new(strings),
map: RefCell::new(HashMap::with_capacity_and_hasher(
capacity,
Default::default(),
@ -201,14 +238,14 @@ where
Ix: SymbolIndexSize,
<Ix as TryFrom<usize>>::Error: Debug,
{
fn intern(&'i self, value: &str) -> &'i Symbol<'i, Ix> {
fn intern(&self, value: &str) -> SymbolId<Ix> {
let mut map = self.map.borrow_mut();
if let Some(sym) = map.get(value) {
return sym;
return *sym;
}
let mut syms = self.indexes.borrow_mut();
let mut syms = self.strings.borrow_mut();
let next_index: Ix = syms
.len()
@ -227,18 +264,14 @@ where
) as *const str)
};
// Symbols are also stored within the arena, adjacent to the
// string. This ensures that both have stable locations in memory.
let sym: &'i Symbol<'i, Ix> = self.arena.alloc(Symbol::new(id, clone));
map.insert(clone, id);
syms.push(clone);
map.insert(clone, sym);
syms.push(sym);
sym
id
}
#[inline]
fn intern_soft(&'i self, value: &str) -> Option<&'i Symbol<'i, Ix>> {
fn intern_soft(&self, value: &str) -> Option<SymbolId<Ix>> {
self.map.borrow().get(value).map(|sym| *sym)
}
@ -252,11 +285,11 @@ where
self.map.borrow().len()
}
fn index_lookup(
&'i self,
index: SymbolId<Ix>,
) -> Option<&'i Symbol<'i, Ix>> {
self.indexes.borrow().get(index.as_usize()).map(|sym| *sym)
fn index_lookup(&'i self, index: SymbolId<Ix>) -> Option<SymbolStr<'i>> {
self.strings
.borrow()
.get(index.as_usize())
.map(|str| SymbolStr::from_interned_slice(*str))
}
}
@ -297,6 +330,8 @@ pub type DefaultPkgInterner<'i> = DefaultInterner<'i, global::PkgSymSize>;
/// a large number of packages in a program simultaneously.
pub type DefaultProgInterner<'i> = DefaultInterner<'i, global::ProgSymSize>;
// Note that these tests assert on standalone interners, not on the globals;
// see the `global` sibling package for those tests.
#[cfg(test)]
mod test {
use super::*;
@ -316,16 +351,8 @@ mod test {
(sut.intern(a), sut.intern(&b), sut.intern(c), sut.intern(&d));
assert_eq!(ia, ib);
assert_eq!(&ia, &ib);
assert_eq!(*ia, *ib);
assert_eq!(ic, id);
assert_eq!(&ic, &id);
assert_eq!(*ic, *id);
assert_ne!(ia, ic);
assert_ne!(&ia, &ic);
assert_ne!(*ia, *ic);
}
#[test]
@ -335,19 +362,19 @@ mod test {
// Remember that identifiers begin at 1
assert_eq!(
SymbolId::from_int(1),
sut.intern("foo").index(),
sut.intern("foo"),
"First index should be 1"
);
assert_eq!(
SymbolId::from_int(1),
sut.intern("foo").index(),
sut.intern("foo"),
"Index should not increment for already-interned symbols"
);
assert_eq!(
SymbolId::from_int(2),
sut.intern("bar").index(),
sut.intern("bar"),
"Index should increment for new symbols"
);
}
@ -415,7 +442,6 @@ mod test {
assert!(sut.index_lookup(SymbolId::from_int(1)).is_none());
let sym = sut.intern("foo");
assert_eq!(Some(sym), sut.index_lookup(sym.index()));
assert_eq!(Some(sym), sut.index_lookup(sym.into()));
assert_eq!("foo", sut.index_lookup(sym).unwrap());
}
}
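
For readers following the hunks above, the id-based design can be sketched in isolation. This is a minimal illustration under stated assumptions, not TAMER's implementation: `SketchId` and `SketchInterner` are hypothetical names, owned `String`s stand in for the arena, `&mut self` stands in for the interior mutability (`borrow()`) visible in the diff, and ids are mapped to vector slots by subtracting one rather than by reserving slot 0.

```rust
use std::collections::HashMap;
use std::num::NonZeroU32;

/// Hypothetical stand-in for `SymbolId<u32>`; ids begin at 1.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct SketchId(NonZeroU32);

/// Hypothetical interner: a map from string to id for interning,
/// plus a vector from id to string for lookups.
#[derive(Default)]
struct SketchInterner {
    map: HashMap<String, SketchId>,
    strings: Vec<String>,
}

impl SketchInterner {
    /// Return the existing id for `value`, or allocate the next one.
    fn intern(&mut self, value: &str) -> SketchId {
        if let Some(id) = self.map.get(value) {
            return *id;
        }

        // The next id is one past the number of strings already stored.
        // (A real implementation would check for index overflow.)
        let next = self.strings.len() as u32 + 1;
        let id = SketchId(NonZeroU32::new(next).expect("ids begin at 1"));

        self.strings.push(value.to_string());
        self.map.insert(value.to_string(), id);
        id
    }

    /// Look up an id without interning anything new.
    fn intern_soft(&self, value: &str) -> Option<SketchId> {
        self.map.get(value).copied()
    }

    /// Resolve an id back into the string it represents.
    fn index_lookup(&self, id: SketchId) -> Option<&str> {
        self.strings
            .get(id.0.get() as usize - 1)
            .map(String::as_str)
    }
}
```

As in the updated tests, `intern` here returns the id itself, so callers compare ids directly and only call `index_lookup` when the string is actually needed.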

@ -19,79 +19,88 @@
//! String internment system.
//!
//! Interned strings are represented by [`Symbol`],
//! created by an [`Interner`]:
//!
//! Interned strings are represented by an integer [`SymbolId`],
//! created by an [`Interner`].
//!
//! - [`ArenaInterner`] - Intern pool backed by an [arena][] for fast
//! and stable allocation.
//! - [`DefaultInterner`] - The currently recommended intern pool
//! configuration for symbol interning.
//! - [`FxArenaInterner`] - Intern pool backed by an [arena][] using the
//! [Fx Hash][fxhash] hashing algorithm.
//! - [`DefaultInterner`] - The currently recommended intern pool
//! configuration for symbol interning (size-agnostic).
//! - [`DefaultPkgInterner`] - The currently recommended intern pool
//! configuration for individual packages and their imports.
//! - [`DefaultProgInterner`] - The currently recommended intern pool
//! configuration for all packages within a program.
//!
//! Interners return symbols by reference which allows for `O(1)` comparison
//! by pointer.
//! Interners represent symbols as integer values which allows for `O(1)`
//! comparison of any arbitrary interned value,
//! regardless of length.
//!
//! The most common way to intern strings is using the global static
//! interners,
//! which offer several conveniences that are discussed below.
//! However,
//! interners may also be used standalone without requiring global state.
//!
//! [arena]: bumpalo
//!
//! ```
//! use tamer::sym::{Interner, DefaultInterner, Symbol, SymbolId};
//! use tamer::sym::{GlobalSymbolIntern, GlobalSymbolResolve, PkgSymbolId};
//!
//! // Inputs to be interned
//! let a = "foo";
//! let b = &"foo".to_string();
//! let c = "foobar";
//! let d = &c[0..3];
//! // Interns are represented by `SymbolId`. You should choose one of
//! // `ProgSymbolId` or `PkgSymbolId`, unless both must be supported.
//! let foo: PkgSymbolId = "foo".intern();
//! assert_eq!(foo, foo);
//!
//! // Interners employ interior mutability and so do not need to be
//! // declared `mut`
//! let interner = DefaultInterner::new();
//! // Interning the same string twice returns the same intern
//! assert_eq!(foo, "foo".intern());
//!
//! let (ia, ib, ic, id) = (
//! interner.intern(a),
//! interner.intern(b),
//! interner.intern(c),
//! interner.intern(d),
//! );
//! // All interns can be freely copied.
//! let foo2 = foo;
//! assert_eq!(foo, foo2);
//!
//! assert_eq!(ia, ib);
//! assert_eq!(ia, id);
//! assert_eq!(ib, id);
//! assert_ne!(ia, ic);
//! // Different strings intern to different values
//! assert_ne!(foo, "bar".intern());
//!
//! // All interns can be cloned and clones are eq
//! assert_eq!(*ia, ia.clone());
//!
//! // Only "foo" and "foobar" are interned
//! assert_eq!(2, interner.len());
//! assert!(interner.contains("foo"));
//! assert!(interner.contains("foobar"));
//! assert!(!interner.contains("something else"));
//!
//! // Each symbol has an associated, densely-packed integer value
//! // that can be used for indexing
//! assert_eq!(SymbolId::from_int(1u16), ia.index());
//! assert_eq!(SymbolId::from_int(1u16), ib.index());
//! assert_eq!(SymbolId::from_int(2u16), ic.index());
//! assert_eq!(SymbolId::from_int(1u16), id.index());
//!
//! // Symbols can also be looked up by index.
//! assert_eq!(Some(ia), interner.index_lookup(ia.index()));
//! // Interned slices can be looked up by their symbol id.
//! assert_eq!(&"foo", &foo.lookup_str());
//! ```
//!
//! What Is String Interning?
//! =========================
//! _[String interning][]_ is a process by which a single copy of a string
//! is stored immutably in memory as part of a _pool_.
//! When the same string is later encountered,
//! a reference to the string in the pool is used rather than allocating a
//! new string.
//! Interned strings are typically referred to as "symbols" or "atoms".
//! Once a string has been interned,
//! attempting to intern it again will always return the same [`SymbolId`].
//! Interned strings are typically referred to as "symbols" or "atoms".
//!
//! String comparison then amounts to comparing pointers (`O(1)`)
//! String comparison then amounts to comparing integer values (`O(1)`)
//! rather than having to scan the string (`O(n)`).
//! There is, however, a hashing cost of interning strings,
//! as well as looking up strings in the intern pool.
//! as well as looking up strings in the intern pool (both `O(1)`).
//!
//! It is expected that strings are interned as soon as they are encountered,
//! which is likely to be from source inputs or previously compiled object
//! files.
//! Processing stages will then hold the interned [`SymbolId`] and use those
//! for any needed comparisons,
//! without any need to look up the string from the pool.
//! Strings should only be looked up
//! (using [`GlobalSymbolResolve::lookup_str`] or
//! [`Interner::index_lookup`]) when they need to be written
//! (e.g. into a target or displayed to the user).
//!
//! [`SymbolId`] is monotonically increasing from 1,
//! making it a useful densely-packed index as an alternative to [`HashMap`]
//! when most of the symbols will be represented as part of the map.
//! This also means that strings can be interned in bulk and have a
//! predictable relationship to one-another---for
//! example,
//! if strings are interned in lexicographic order,
//! their [`SymbolId`]s will reflect that same ordering,
//! so long as those strings have not previously been interned.
//! Bulk insertion should therefore be done before processing user input.
//!
//! [string interning]: https://en.wikipedia.org/wiki/String_interning
//!
@ -105,78 +114,121 @@
//! 1. Strings are compared against the existing intern pool using a
//! [`HashMap`].
//! 2. If a string has not yet been interned:
//! - The string is copied into the arena-backed pool;
//! - A new [`Symbol`] is allocated adjacent to it in the arena holding
//! a string slice referencing the arena-allocated string; and
//! - The symbol is stored as the value in the [`HashMap`] for that key.
//! 3. Otherwise, a reference to the existing [`Symbol`] is returned.
//! - A new integer [`SymbolId`] index is allocated;
//! - The string is copied into the arena-backed pool at that new index;
//! and
//! - The string is hashed and will resolve to the new [`SymbolId`] for
//! future lookups and internment attempts.
//! 3. Otherwise, the existing [`SymbolId`] associated with the provided
//! string is returned.
//!
//! Since the arena provides a stable location in memory,
//! and all symbols are immutable,
//! [`ArenaInterner`] is able to safely return any number of references to
//! a single [`Symbol`],
//! bound to the lifetime of the arena itself.
//! Since the [`Symbol`] contains the string slice,
//! it also acts as a [smart pointer] for the interned string itself,
//! allowing [`Symbol`] to be used in any context where `&str` is
//! expected.
//! Dropping a [`Symbol`] does _not_ affect the underlying arena-allocated
//! data.
//! The string associated with a [`SymbolId`] can be looked up from the pool
//! using [`GlobalSymbolResolve::lookup_str`] for global interners,
//! or [`Interner::index_lookup`] otherwise.
//! Interned strings are represented by [`SymbolStr`],
//! which can be dereferenced into [`&str`].
//! Symbols allocated using a global interner will have a `'static`
//! lifetime.
//!
//! [smart pointer]: https://doc.rust-lang.org/book/ch15-00-smart-pointers.html
//!
//! Each symbol also has an associated integer index value
//! (see [`Symbol::index`]),
//! which provides a dense range of values suitable for use in vectors
//! as an alternative to [`HashMap`] for mapping symbol data.
//! A [`SymbolId`] can be mapped back into its associated [`Symbol`]
//! using [`Interner::index_lookup`].
//!
//! Since a reference to the same [`Symbol`] is returned for each
//! [`Interner::intern`] and [`Interner::intern_soft`] call,
//! symbols can be compared by pointer in `O(1)` time.
//! Symbols also implement [`Copy`],
//! and will still compare equal to other symbols referencing the same
//! interned value by comparing the underlying string slice pointers.
//! Since [`SymbolId`] is an integer value,
//! it implements [`Copy`] and will still compare equal to other symbols
//! referencing the same interned value.
//!
//! This implementation was heavily motivated by [Rustc's own internment
//! system][rustc-intern],
//! but differs in significant ways:
//!
//! - This implementation stores string references in [`Symbol`] rather
//! than relying on a global singleton [`Interner`];
//! - Consequently, associates the lifetime of interned strings with that
//! of the underlying arena rather than casting to `&'static`;
//! - Retrieves symbol values by pointer reference without requiring use
//! of [`Interner`] or a locking mechanism; and
//! - Stores [`Symbol`] objects in the arena rather than within a vector
//! indexed by [`SymbolId`].
//! system][rustc-intern].
//!
//! [`HashMap`]: std::collections::HashMap
//! [`NonZeroU32`]: std::num::NonZeroU32
//!
//!
//! Name Mangling
//! =============
//! Interners do not perform [name mangling][].
//! For future consideration,
//! see [RFC 2603][rfc-2603] and the [Itanium ABI][itanium-abi].
//! Symbol Index Sizes
//! ------------------
//! [`SymbolId`] is generic over [`SymbolIndexSize`],
//! which is implemented for
//! [`global::PkgSymSize`](crate::global::PkgSymSize) and
//! [`global::ProgSymSize`](crate::global::ProgSymSize).
//! This allows the compiler---which processes far less data than the
//! linker---to use a smaller index size.
//! This is desirable for certain core data structures,
//! like spans,
//! which try to pack a lot of information into 64-bit structures.
//!
//! [name mangling]: https://en.wikipedia.org/wiki/Name_mangling
//! [rfc-2603]: https://rust-lang.github.io/rfcs/2603-symbol-name-mangling-v2.html
//! [itanium-abi]: http://refspecs.linuxbase.org/cxxabi-1.86.html#mangling
//! But the cost is that of another trait bound on any systems that must
//! accommodate any [`SymbolIndexSize`].
//! Systems should therefore favor one of these two types if they are not
//! shared between e.g. compilers and linkers:
//!
//! - [`PkgSymbolId`] for individual packages and their imports; and
//! - [`ProgSymbolId`] for all packages in a program.
//!
//! Note that _it is not permissible to cast between different index sizes_!
//! Even though a [`PkgSymbolId`] could fit within the index size of a
//! [`ProgSymbolId`],
//! for example,
//! they use _different_ interners with their own distinct index
//! sets.
//! A system should avoid using multiple interners at the same time,
//! and trait bounds will make such a mistake painfully obvious.
//!
//! Global Interners
//! ----------------
//! TAMER offers two thread-local global interners that intern strings with
//! a `'static` lifetime,
//! simplifying the handling of lifetimes;
//! they produce symbols of type [`PkgSymbolId`] and [`ProgSymbolId`]
//! and are intended for packages and entire programs respectively.
//! These interners are lazily initialized on first use.
//! Symbols from the two interners cannot be mixed;
//! you must use the largest [`SymbolIndexSize`] needed.
//!
//! Global interners were introduced because symbols are used by virtually
//! every part of the system,
//! which polluted everything with interner lifetimes.
//! This suggested that the interner should be treated instead as if it were
//! a part of Rust itself,
//! and treated no differently than other core memory allocation.
//!
//! All [`SymbolStr`] objects returned from global interners hold a
//! `'static` lifetime to simplify lifetime management and borrowing.
//! However,
//! these should not be used in place of [`SymbolId`] if the string value
//! is not actually needed.
//!
//! Global interners are exposed via friendly APIs using two traits:
//!
//! - [`GlobalSymbolIntern`] provides an `intern` method that can be used
//! on any [`&str`] (e.g. `"foo".intern()`); and
//! - [`GlobalSymbolResolve`] provides a `lookup_str` method on
//! [`SymbolId`] which resolves the symbol using the appropriate
//! global interner,
//! producing a [`SymbolStr`] holding a reference to the `'static`
//! string slice within the pool.
//!
//! These traits are intentionally separate so that it is clear how a
//! particular package or object makes use of symbols.
//! If this distinction proves too cumbersome,
//! then they may be combined in the future.
//!
//! TAMER does not currently utilize threads,
//! and global interners are never dropped,
//! and so [`SymbolStr`] will always refer to a valid string.
//!
//! There is no mechanism preventing [`SymbolId`] from one interner from
//! being used with another beyond [`SymbolIndexSize`] bounds;
//! if you utilize interners for any other purpose,
//! it is advised that you create newtypes for their [`SymbolId`]s.
//!
//!
//! Related Work and Further Reading
//! ================================
//! String interning is often tightly coupled with symbols (in the generic
//! sense),
//! sometimes called atoms.
//! Symbols can often be either interned,
//! String interning is used in a variety of systems and languages.
//! Symbols can typically be either interned,
//! and therefore compared for equivalency,
//! or _uninterned_,
//! which makes them unique even to symbols of the same name.
//! Interning may also be done automatically by a language for performance.
//! Interning may also be done automatically by a language as a performance
//! optimization,
//! or by a compiler for storage in an object file such as ELF.
//! Languages listed below that allow for explicit interning may also
//! perform automatic interning
//! (for example, `'symbol` in Lisp and `lowercase_vars` as atoms in
@ -231,12 +283,9 @@
//! for Rust developed by Mozilla for Servo.
//! - [`string-interner`][rust-string-interner] is another string
//! interning library for Rust.
//! - [Rustc interns strings as `Symbol`s][rustc-intern] using an
//! [arena allocator][rustc-arena] and avoids `Rc` by representing
//! symbols as integer values and converting them to strings using a
//! global pool and unsafe rust to cast to a `static` slice.
//! - Rustc identifies symbols by integer value encapsulated within a
//! `Symbol`.
//! - [Rustc interns strings as `Symbol`s][rustc-intern] using a
//! global [arena allocator][rustc-arena] and unsafe Rust to cast to
//! a `'static` slice.
//! - Rustc's [`newtype_index!` macro][rustc-nt] uses
//! [`NonZeroU32`] so that [`Option`] uses no
//! additional space (see [pull request `53315`][rustc-nt-pr]).
@ -247,7 +296,7 @@
//! [rust-string-cache]: https://github.com/servo/string-cache
//! [rust-string-interner]: https://github.com/robbepop/string-interner
//! [rfc-1845]: https://rust-lang.github.io/rfcs/1845-shared-from-slice.html
//! [rustc-intern]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/ast/struct.Name.html
//! [rustc-intern]: https://doc.rust-lang.org/nightly/nightly-rustc/src/rustc_span/symbol.rs.html
//! [rustc-arena]: https://doc.rust-lang.org/nightly/nightly-rustc/arena/index.html
//! [rustc-nt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_index/macro.newtype_index.html
//! [rustc-nt-pr]: https://github.com/rust-lang/rust/pull/53315
@ -276,12 +325,7 @@ pub use interner::{
ArenaInterner, DefaultInterner, DefaultPkgInterner, DefaultProgInterner,
FxArenaInterner, Interner,
};
pub use symbol::{PkgSymbol, ProgSymbol, Symbol, SymbolId, SymbolIndexSize};
/// Concisely define dummy symbols for testing.
#[cfg(test)]
macro_rules! symbol_dummy {
($id:expr, $name:expr) => {
Symbol::new_dummy(SymbolId::from_int($id), $name);
};
}
pub use symbol::{
GlobalSymbolIntern, GlobalSymbolInternUnchecked, GlobalSymbolResolve,
PkgSymbolId, ProgSymbolId, SymbolId, SymbolIndexSize, SymbolStr,
};
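
The global API described in the module documentation above (`"foo".intern()` and `lookup_str`) can be approximated with a thread-local pool and two extension traits. The following is only a sketch under stated assumptions: `SketchId`, `Pool`, `POOL`, `SketchIntern`, and `SketchResolve` are hypothetical names, and the `'static` strings are obtained with `Box::leak` rather than the arena and lifetime transmutation that this commit actually uses.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::num::NonZeroU32;

/// Hypothetical stand-in for `SymbolId`; the non-zero niche means
/// `Option<SketchId>` needs no extra space.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct SketchId(NonZeroU32);

/// Hypothetical pool holding leaked, and therefore `'static`, strings.
#[derive(Default)]
struct Pool {
    map: HashMap<&'static str, SketchId>,
    strings: Vec<&'static str>,
}

thread_local! {
    // Lazily initialized on first use, one pool per thread.
    static POOL: RefCell<Pool> = RefCell::new(Pool::default());
}

/// Hypothetical analogue of `GlobalSymbolIntern`.
trait SketchIntern {
    fn intern(self) -> SketchId;
}

/// Hypothetical analogue of `GlobalSymbolResolve`.
trait SketchResolve {
    fn lookup_str(&self) -> &'static str;
}

impl SketchIntern for &str {
    fn intern(self) -> SketchId {
        POOL.with(|pool| {
            let mut pool = pool.borrow_mut();

            if let Some(id) = pool.map.get(self) {
                return *id;
            }

            // Leaking the allocation yields a `'static` slice without
            // `unsafe`; the string is intentionally never freed.
            let stored: &'static str =
                Box::leak(self.to_string().into_boxed_str());

            let next = pool.strings.len() as u32 + 1;
            let id = SketchId(NonZeroU32::new(next).expect("ids begin at 1"));

            pool.strings.push(stored);
            pool.map.insert(stored, id);
            id
        })
    }
}

impl SketchResolve for SketchId {
    fn lookup_str(&self) -> &'static str {
        // Panics if the id was not produced by this thread's pool.
        POOL.with(|pool| pool.borrow().strings[self.0.get() as usize - 1])
    }
}
```

With both traits in scope, `"foo".intern()` yields a copyable id and `id.lookup_str()` resolves it. Keeping the traits separate mirrors the rationale given above: it stays visible which code interns and which code resolves.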

@ -17,34 +17,59 @@
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.
//! Symbol objects for string internment system.
//! Symbol objects representing interned strings.
//!
//! See the [parent module](super) for more information.
use super::{DefaultPkgInterner, DefaultProgInterner, Interner};
use crate::global;
use std::convert::{TryFrom, TryInto};
use std::fmt::{self, Debug};
use std::num::{NonZeroU16, NonZeroU32, NonZeroU8};
use std::fmt::{self, Debug, Display};
use std::hash::Hash;
use std::num::{NonZeroU16, NonZeroU32};
use std::ops::Deref;
use std::thread::LocalKey;
/// Unique symbol identifier.
/// Unique symbol identifier produced by an [`Interner`].
///
/// _Do not construct this value yourself;_
/// use an [`Interner`].
/// Use one of [`PkgSymbolId`] or [`ProgSymbolId`] unless a generic size is
/// actually needed
/// (e.g. implementations shared between a compiler and linker).
///
/// This newtype helps to prevent other indexes from being used where a
/// symbol index is expected.
/// Note, however, that it provides no defense against mixing symbol indexes
/// between multiple [`Interner`]s.
/// between multiple [`Interner`]s;
/// you should create your own newtypes to resolve that concern.
///
/// The index `0` is never valid because of
/// [`SymbolIndexSize::NonZero`],
/// which allows us to have `Option<SymbolId>` at no space cost.
///
/// [`Interner`]: super::Interner
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
/// Symbol Strings
/// ==============
/// [`SymbolId`] intentionally omits the [`Display`] trait to ensure that
/// compile-time errors occur when symbols are used in contexts where
/// strings are expected.
/// To resolve a [`SymbolId`] into the string that it represents,
/// see either [`GlobalSymbolResolve::lookup_str`] or
/// [`Interner::index_lookup`].
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct SymbolId<Ix: SymbolIndexSize>(Ix::NonZero);
assert_eq_size!(Option<Symbol<u16>>, Symbol<u16>);
assert_eq_size!(Option<SymbolId<u16>>, SymbolId<u16>);
/// Identifier of a symbol within a single package.
///
/// This type should be preferred to [`ProgSymbolId`] when only a single
/// package's symbols are being processed.
pub type PkgSymbolId = SymbolId<global::PkgSymSize>;
/// Identifier of a symbol within an entire program.
///
/// This symbol type is preconfigured to accommodate a larger number of
/// symbols than [`PkgSymbolId`] and is suitable for use in a linker.
/// Use this type only when necessary.
pub type ProgSymbolId = SymbolId<global::ProgSymSize>;
impl<Ix: SymbolIndexSize> SymbolId<Ix> {
/// Construct index from a non-zero `u16` value.
@ -66,24 +91,6 @@ impl<Ix: SymbolIndexSize> SymbolId<Ix> {
SymbolId(Ix::new_unchecked(n))
}
/// Construct index from a non-zero `u16` value.
///
/// Panics
/// ------
/// Will panic if `n == 0`.
pub fn from_u16(n: u16) -> SymbolId<u16> {
SymbolId::from_int(n)
}
/// Construct index from a non-zero `u32` value.
///
/// Panics
/// ------
/// Will panic if `n == 0`.
pub fn from_u32(n: u32) -> SymbolId<u32> {
SymbolId::from_int(n)
}
pub fn as_usize(self) -> usize {
self.0.into().as_usize()
}
@ -98,12 +105,6 @@ where
}
}
impl<'i, Ix: SymbolIndexSize> From<&Symbol<'i, Ix>> for SymbolId<Ix> {
fn from(sym: &Symbol<'i, Ix>) -> Self {
sym.index()
}
}
/// An integer type paired with its respective `NonZero` type that may be
/// used to index symbols.
///
@ -119,15 +120,14 @@ pub trait SymbolIndexSize:
+ Eq
+ TryFrom<usize>
+ TryInto<usize>
+ Hash
+ 'static
{
/// The associated `NonZero*` type (e.g. [`NonZeroU16`]).
type NonZero: Copy + Into<Self> + Debug;
type NonZero: Copy + Into<Self> + Debug + PartialEq + Eq + Hash;
/// A symbol with a static lifetime suitable for placement at index 0 in
/// the string internment table,
/// which is not a valid [`SymbolId`] value.
fn dummy_sym() -> &'static Symbol<'static, Self>;
/// Global interner for this index type.
type Interner: Interner<'static, Self>;
/// Construct a new non-zero value from the provided primitive value.
///
@ -140,16 +140,26 @@ pub trait SymbolIndexSize:
/// Convert primitive value into a [`usize`].
fn as_usize(self) -> usize;
/// Perform an operation using the global interner for this index type.
///
/// This solves the problem of determining which global interner must be
/// used for a given [`SymbolIndexSize`] without having to resort to
/// dynamic dispatch.
fn with_static_interner<F, R>(f: F) -> R
where
F: FnOnce(&'static Self::Interner) -> R;
}
macro_rules! supported_symbol_index {
($prim:ty, $nonzero:ty, $dummy:ident) => {
($prim:ty, $nonzero:ty, $interner:ty, $global:ident) => {
thread_local! {
pub(super) static $global: $interner = <$interner>::new();
}
impl SymbolIndexSize for $prim {
type NonZero = $nonzero;
fn dummy_sym() -> &'static Symbol<'static, Self> {
&$dummy
}
type Interner = $interner;
fn new(n: Self) -> Option<Self::NonZero> {
Self::NonZero::new(n)
@ -162,159 +172,206 @@ macro_rules! supported_symbol_index {
fn as_usize(self) -> usize {
self as usize
}
fn with_static_interner<F, R>(f: F) -> R
where
F: FnOnce(&'static Self::Interner) -> R,
{
with_static_interner(&$global, f)
}
}
};
}
supported_symbol_index!(u8, NonZeroU8, DUMMY_SYM_8);
supported_symbol_index!(u16, NonZeroU16, DUMMY_SYM_16);
supported_symbol_index!(u32, NonZeroU32, DUMMY_SYM_32);
type StaticPkgInterner = DefaultPkgInterner<'static>;
type StaticProgInterner = DefaultProgInterner<'static>;
/// Interned string.
///
/// A reference to this symbol is returned each time the same string is
/// interned with the same [`Interner`];
/// as such,
/// symbols can be compared for equality by pointer;
/// the underlying symbol id need not be used.
///
/// Each symbol is identified by a unique integer
/// (see [`index`](Symbol::index)).
/// The use of integers creates a more dense range of values than pointers,
/// which allows callers to use a plain [`Vec`] as a map instead of
/// something far more expensive like
/// [`HashSet`](std::collections::HashSet);
/// this is especially beneficial for portions of the system that make
/// use of nearly all interned symbols,
/// like the ASG.
/// A [`SymbolId`] can be mapped back into its [`Symbol`] by calling
/// [`Interner::index_lookup`] on the same interner that produced it.
///
/// The symbol also stores a string slice referencing the interned string
/// itself,
/// whose lifetime is that of the [`Interner`]'s underlying data store.
/// Dereferencing the symbol will expose the underlying slice.
///
/// [`Interner`]: super::Interner
/// [`Interner::index_lookup`]: super::Interner::index_lookup
#[derive(Copy, Clone, Debug)]
pub struct Symbol<'i, Ix: SymbolIndexSize> {
index: SymbolId<Ix>,
str: &'i str,
}
supported_symbol_index!(u16, NonZeroU16, StaticPkgInterner, INTERNER_PKG);
supported_symbol_index!(u32, NonZeroU32, StaticProgInterner, INTERNER_PROG);
/// Interned string within a single package.
/// A string retrieved from the intern pool using a [`SymbolId`].
///
/// This type should be preferred to [`ProgSymbol`] when only a single
/// package's symbols are being processed.
pub type PkgSymbol<'i> = Symbol<'i, global::PkgSymSize>;
/// Interned string within an entire program.
/// The lifetime of the inner string is constrained to the lifetime of the
/// interner itself.
/// For global interners,
/// this means that the string slice has a `'static` lifetime.
///
/// This symbol type is preconfigured to accommodate a larger number of
/// symbols than [`PkgSymbol`] and is suitable for use in a linker.
/// Use this type only when necessary.
pub type ProgSymbol<'i> = Symbol<'i, global::ProgSymSize>;
/// [`SymbolStr`] requires significantly more storage than an appropriate
/// [`SymbolId`] and should only be used when a string value must be
/// written (e.g. to a file or displayed to the user).
///
/// This value is intended to be short-lived.
#[derive(Debug, Default, Clone)]
pub struct SymbolStr<'i>(&'i str);
impl<'i, Ix: SymbolIndexSize> Symbol<'i, Ix> {
/// Construct a new interned value.
///
/// _This must only be done by an [`Interner`]._
/// As such,
/// this function is not public.
///
/// For test builds (when `cfg(test)`),
/// `new_dummy` is available to create symbols for tests.
///
/// [`Interner`]: super::Interner
#[inline]
pub(super) fn new(index: SymbolId<Ix>, str: &'i str) -> Symbol<'i, Ix> {
Self { index, str }
impl<'i> SymbolStr<'i> {
pub fn as_str(&self) -> &'i str {
self.0
}
/// Retrieve unique symbol index.
/// Create a [`SymbolStr`] from a string for testing.
///
/// This is a densely-packed identifier that can be used as an index for
/// mapping.
/// See [`SymbolId`] for more information.
#[inline]
pub fn index(&self) -> SymbolId<Ix> {
self.index
}
/// Construct a new interned value _for testing_.
///
/// This is a public version of [`Symbol::new`] available for test
/// builds.
/// This separate name is meant to strongly imply that you should not be
/// doing this otherwise.
///
/// See also `dummy_symbol!`.
/// _This function is only available for tests for convenience!_
/// `SymbolStr` must always represent a real, interned string in
/// non-test code.
#[cfg(test)]
#[inline(always)]
pub fn new_dummy(index: SymbolId<Ix>, str: &'i str) -> Symbol<'i, Ix> {
Self::new(index, str)
pub fn test_from_str(s: &'i str) -> Self {
SymbolStr(s)
}
}
impl<'i, Ix: SymbolIndexSize> PartialEq for Symbol<'i, Ix> {
fn eq(&self, other: &Self) -> bool {
std::ptr::eq(self as *const _, other as *const _)
|| std::ptr::eq(self.str.as_ptr(), other.str.as_ptr())
impl<'i> SymbolStr<'i> {
pub(super) fn from_interned_slice(slice: &'i str) -> SymbolStr<'i> {
SymbolStr(slice)
}
}
impl<'i, Ix: SymbolIndexSize> Eq for Symbol<'i, Ix> {}
impl<'i, T: Deref<Target = str>> PartialEq<T> for SymbolStr<'i> {
fn eq(&self, other: &T) -> bool {
self.0 == other.deref()
}
}
impl<'i, Ix: SymbolIndexSize> Deref for Symbol<'i, Ix> {
impl PartialEq<SymbolStr<'_>> for &str {
fn eq(&self, other: &SymbolStr<'_>) -> bool {
*self == other.0
}
}
impl<'i> Display for SymbolStr<'i> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{}", self.0)
}
}
// Once we have unsafe_impls stabilized,
// we should prevent `SymbolStr` from crossing threads.
// TAMER does not use threads at the time of writing,
// so this isn't a practical concern.
// If we _do_ want to pass between threads,
// we need to ensure the thread holding the interner lives longer than all
// other threads.
//impl<'i> !Send for SymbolStr<'i> {}
//impl<'i> !Sync for SymbolStr<'i> {}
impl<'i> Deref for SymbolStr<'i> {
type Target = str;
/// Dereference to interned string slice.
///
/// This allows for symbols to be used where strings are expected.
#[inline]
fn deref(&self) -> &str {
self.str
fn deref(&self) -> &'i str {
self.as_str()
}
}
impl<'i, Ix: SymbolIndexSize> fmt::Display for Symbol<'i, Ix> {
/// Display name of underlying string.
/// Acquire a static reference to a global interner.
///
/// Global interners are static and thread-local.
/// They are created using the [`thread_local!`] macro,
/// which produces a [`LocalKey`] that provides access with a lifetime
/// that cannot exceed that of the closure.
/// This is a problem,
/// because we must return a value from the interner's storage.
///
/// This function transmutes the lifetime of [`LocalKey`] back to
/// `'static`.
/// This has the benefit of requiring no further casting of the [`Interner`],
/// since the lifetime of its storage is already `'static`,
/// and so the retrieved interner can be used to return a static string
/// slice without any further unsafe code.
///
/// This lifetime transmutation is expected to be safe,
/// because the thread-local storage is never deallocated,
/// and the storage is only accessible to one thread.
fn with_static_interner<F, R, I, Ix>(key: &'static LocalKey<I>, f: F) -> R
where
Ix: SymbolIndexSize,
I: Interner<'static, Ix> + 'static,
F: FnOnce(&'static I) -> R,
{
key.with(|interner| {
f(unsafe {
// These type annotations are inferred, but please leave
// them here; transmute is especially dangerous, and we want
// to be sure reality always matches our expectations.
std::mem::transmute::<&I, &'static I>(interner)
})
})
}
/// Resolve a [`SymbolId`] to the string value it represents using a global
/// interner.
///
/// This exists as its own trait
/// (rather than simply adding to [`SymbolId`])
/// to make it easy to see what systems rely on global state.
pub trait GlobalSymbolResolve {
/// Resolve a [`SymbolId`] allocated using a global interner.
///
/// Since symbols contain pointers to their interned slices,
/// we effectively get this for free.
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{}", self.str)
/// This name is intended to convey that this operation has a cost---a
/// lookup is performed on the global interner pool,
/// which requires locking and so comes at a (small) cost.
/// This shouldn't be done more than is necessary.
fn lookup_str(&self) -> SymbolStr<'static>;
}
impl<Ix: SymbolIndexSize> GlobalSymbolResolve for SymbolId<Ix> {
fn lookup_str(&self) -> SymbolStr<'static> {
Ix::with_static_interner(|interner| {
interner.index_lookup(*self).unwrap()
})
}
}
lazy_static! {
/// Dummy 8-bit [`Symbol`] for use at index `0`.
///
/// A symbol must never have an index of `0`,
/// so this can be used as a placeholder.
/// The chosen [`SymbolId`] here does not matter since this will
/// never be referenced.
static ref DUMMY_SYM_8: Symbol<'static, u8> =
Symbol::new(SymbolId::from_int(1), "!BADSYMREF!");
/// Intern a string using a global interner.
///
/// This provides a convenient API that creates the appearance that string
/// interning is a core Rust language feature
/// (e.g. `"foo".intern()`).
/// This speaks to the rationale of introducing global interners to begin
/// with---mainly
/// that symbols are so pervasive that they may as well be a language
/// feature so that they are more natural to work with.
///
/// This will automatically intern using the proper global interner based on
/// the resolved [`SymbolIndexSize`].
/// In most situations within real (non-test) code,
/// Rust is able to infer this itself and so it looks quite natural.
pub trait GlobalSymbolIntern<Ix: SymbolIndexSize> {
/// Intern a string using a global interner.
fn intern(self) -> SymbolId<Ix>;
}
/// Dummy 16-bit [`Symbol`] for use at index `0`.
/// Intern a byte slice using a global interner.
///
/// See also [`GlobalSymbolIntern`].
/// This uses [`Interner::intern_utf8_unchecked`].
pub trait GlobalSymbolInternUnchecked<Ix: SymbolIndexSize> {
/// Intern a byte slice using a global interner.
///
/// A symbol must never have an index of `0`,
/// so this can be used as a placeholder.
/// The chosen [`SymbolId`] here does not matter since this will
/// never be referenced.
static ref DUMMY_SYM_16: Symbol<'static, u16> =
Symbol::new(SymbolId::from_int(1), "!BADSYMREF!");
/// Safety
/// ======
/// This function is unsafe because it uses
/// [`Interner::intern_utf8_unchecked`].
/// It is provided for convenience when interning from trusted binary
/// data
/// (such as [object files][]).
///
/// [object files]: crate::obj
unsafe fn intern_utf8_unchecked(self) -> SymbolId<Ix>;
}
/// Dummy 32-bit [`Symbol`] for use at index `0`.
///
/// A symbol must never have an index of `0`,
/// so this can be used as a placeholder.
/// The chosen [`SymbolId`] here does not matter since this will
/// never be referenced.
static ref DUMMY_SYM_32: Symbol<'static, u32> =
Symbol::new(SymbolId::from_int(1), "!BADSYMREF!");
impl<Ix: SymbolIndexSize> GlobalSymbolIntern<Ix> for &str {
fn intern(self) -> SymbolId<Ix> {
Ix::with_static_interner(|interner| interner.intern(self))
}
}
impl<Ix: SymbolIndexSize> GlobalSymbolInternUnchecked<Ix> for &[u8] {
unsafe fn intern_utf8_unchecked(self) -> SymbolId<Ix> {
Ix::with_static_interner(|interner| {
interner.intern_utf8_unchecked(self)
})
}
}
#[cfg(test)]
@ -323,63 +380,19 @@ mod test {
#[test]
fn self_compares_eq() {
let sym = Symbol::new(SymbolId::from_int(1u16), "str");
let sym = SymbolId::from_int(1u16);
assert_eq!(&sym, &sym);
}
#[test]
fn copy_compares_equal() {
let sym = Symbol::new(SymbolId::from_int(1u16), "str");
let sym = SymbolId::from_int(1u16);
let cpy = sym;
assert_eq!(sym, cpy);
}
// Integer values are for convenience, not identity. They cannot be
// used as a unique identifier across different interners.
#[test]
fn same_index_different_slices_compare_unequal() {
let a = Symbol::new(SymbolId::from_int(1u16), "a");
let b = Symbol::new(SymbolId::from_int(1u16), "b");
assert_ne!(a, b);
}
// As mentioned above, ids are _not_ the identity of the symbol. If
// two values point to the same location in memory, they are assumed
// to have come from the same interner, and should therefore have
// the same index; this should never happen unless symbols are
// being created without the use of interners, which is unsupported.
//
// This test is a cautionary tale.
#[test]
fn different_index_same_slices_compare_equal() {
let slice = "str";
let a = Symbol::new(SymbolId::from_int(1u16), slice);
let b = Symbol::new(SymbolId::from_int(2u16), slice);
assert_eq!(a, b);
}
#[test]
fn cloned_symbols_compare_equal() {
let sym = Symbol::new(SymbolId::from_int(1u16), "foo");
assert_eq!(sym, sym.clone());
}
// &Symbol can be used where string slices are expected (this won't
// compile otherwise).
#[test]
fn ref_can_be_used_as_string_slice() {
let slice = "str";
let sym_slice: &str = &Symbol::new(SymbolId::from_int(1u16), slice);
assert_eq!(slice, sym_slice);
}
// For use when we can guarantee proper ids.
#[test]
fn can_create_index_unchecked() {
@ -388,17 +401,33 @@ mod test {
});
}
#[test]
fn can_retrieve_symbol_index() {
let index = SymbolId::from_int(1u16);
mod global {
use super::*;
assert_eq!(index, Symbol::new(index, "").index());
}
#[test]
fn str_lookup_using_global_interner() {
INTERNER_PKG.with(|interner| {
let given = "test global intern";
let sym = interner.intern(given);
#[test]
fn displays_as_interned_value() {
let sym = Symbol::new(SymbolId::from_int(1u16), "foo");
assert_eq!(given, sym.lookup_str());
});
}
assert_eq!(format!("{}", sym), sym.str);
#[test]
fn str_intern_uses_global_interner() {
// This creates the illusion of a core Rust language feature
let sym = "foo".intern();
assert_eq!("foo", sym.lookup_str());
INTERNER_PKG.with(|interner| {
assert_eq!(
sym,
interner.intern("foo"),
"GlobalSymbolIntern<&str>::intern must use the global interner"
);
});
}
}
}