tame/tamer/src/nir.rs

// IR that is "near" the source code.
//
//  Copyright (C) 2014-2022 Ryan Specialty Group, LLC.
//
//  This file is part of TAME.
//
//  This program is free software: you can redistribute it and/or modify
//  it under the terms of the GNU General Public License as published by
//  the Free Software Foundation, either version 3 of the License, or
//  (at your option) any later version.
//
//  This program is distributed in the hope that it will be useful,
//  but WITHOUT ANY WARRANTY; without even the implied warranty of
//  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
//  GNU General Public License for more details.
//
//  You should have received a copy of the GNU General Public License
//  along with this program.  If not, see <http://www.gnu.org/licenses/>.

//! An IR that is "near" the source code.
//!
//! This IR is "near" the source code written by the user,
//!   performing only basic normalization tasks like desugaring.
//! It takes a verbose input language and translates it into a much more
//!   concise internal representation.
//! The hope is that most desugaring will be done by templates in the future.
//!
//! NIR cannot completely normalize the source input because it does not
//!   have enough information to do so---the
//!     template system requires a compile-time interpreter that is beyond
//!     the capabilities of NIR,
//!       and so a final normalization pass must be done later on in the
//!       lowering pipeline.
//!
//! This is a streaming IR,
//!   meaning that the equivalent AST is not explicitly represented as a
//!   tree structure in memory.
//!
//! NIR is lossy and does not retain enough information for code
//!   formatting---that
//!     type of operation will require a mapping between
//!     XIRF and NIR,
//!       where the latter is used to gather enough context for formatting
//!       and the former is used as a concrete representation of what the user
//!       actually typed.
//!
//! For more information on the parser,
//!   see [`parse`].
//! The entry point for NIR in the lowering pipeline is exported as
//!   [`XirfToNir`].

mod desugar;
mod parse;

use crate::{
    diagnose::{Annotate, Diagnostic},
    fmt::{DisplayWrapper, TtQuote},
    parse::{Object, Token},
    span::{Span, UNKNOWN_SPAN},
    sym::{st::quick_contains_byte, GlobalSymbolResolve, SymbolId},
    xir::{
        attr::{Attr, AttrSpan},
        fmt::TtXmlAttr,
        QName,
    },
};
use memchr::memchr;
use std::{
    convert::Infallible,
    error::Error,
    fmt::{Debug, Display},
};

pub use desugar::{DesugarNir, DesugarNirError};
pub use parse::{
    NirParseState as XirfToNir, NirParseStateError_ as XirfToNirError,
};

/// IR that is "near" the source code,
///   without its syntactic sugar.
///
/// This form contains only primitives that cannot be reasonably represented
///   by other primitives.
/// This is somewhat arbitrary and may change over time,
///   but represents a balance between the level of abstraction of the IR
///   and performance of lowering operations.
///
/// See [`SugaredNir`] for more information about the sugared form.
#[derive(Debug, PartialEq, Eq)]
pub enum PlainNir {
    Todo,
}

impl Token for PlainNir {
    fn ir_name() -> &'static str {
        "Plain NIR"
    }

    fn span(&self) -> Span {
        use PlainNir::*;

        match self {
            Todo => UNKNOWN_SPAN,
        }
    }
}

impl Object for PlainNir {}

impl Display for PlainNir {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        use PlainNir::*;

        match self {
            Todo => write!(f, "TODO"),
        }
    }
}

/// Syntactic sugar atop [`PlainNir`].
///
/// NIR contains various syntax features that serve as mere quality-of-life
///   conveniences for users
///     ("sugar" to sweeten the experience).
/// These features do not add any expressiveness to the language,
///   and are able to be lowered into other primitives without changing
///   its meaning.
///
/// The process of lowering syntactic sugar into primitives is called
///   "desugaring" and is carried out by the [`DesugarNir`] lowering
///     operation,
///       producing [`PlainNir`].
#[derive(Debug, PartialEq, Eq)]
pub enum SugaredNir {
    /// A primitive token that may have sugared values.
    Todo,
}

impl Token for SugaredNir {
    fn ir_name() -> &'static str {
        "Sugared NIR"
    }

    fn span(&self) -> Span {
        use SugaredNir::*;

        match self {
            Todo => UNKNOWN_SPAN,
        }
    }
}

impl Object for SugaredNir {}

impl Display for SugaredNir {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        use SugaredNir::*;

        match self {
            Todo => write!(f, "TODO"),
        }
    }
}

/// Tag representing the type of a NIR value.
///
/// NIR values originate from attributes,
///   which are refined into types as enough information becomes available.
/// Value parsing must be deferred if a value requires desugaring or
///   metavalue expansion.
#[derive(Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum NirSymbolTy {
    AnyIdent,
    BooleanLiteral,
    ClassIdent,
    ClassIdentList,
    ConstIdent,
    DescLiteral,
    Dim,
    DynNodeLiteral,
    FuncIdent,
    IdentDtype,
    IdentType,
    MapTransformLiteral,
    NumLiteral,
    ParamDefault,
    ParamIdent,
    ParamName,
    ParamType,
    PkgPath,
    ShortDimNumLiteral,
    StringLiteral,
    SymbolTableKey,
    TexMathLiteral,
    Title,
    TplMetaIdent,
    TplIdent,
    TplParamIdent,
    TypeIdent,
    ValueIdent,
}

impl Display for NirSymbolTy {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        use NirSymbolTy::*;

        match self {
            AnyIdent => write!(f, "any identifier"),
            BooleanLiteral => write!(
                f,
                "boolean literal {fmt_true} or {fmt_false}",
                fmt_true = TtQuote::wrap("true"),
                fmt_false = TtQuote::wrap("false"),
            ),
            ClassIdent => write!(f, "classification identifier"),
            ClassIdentList => {
                write!(f, "space-delimited list of classification identifiers")
            }
            ConstIdent => write!(f, "constant identifier"),
            DescLiteral => write!(f, "description literal"),
            Dim => write!(f, "dimension declaration"),
            DynNodeLiteral => write!(f, "dynamic node literal"),
            FuncIdent => write!(f, "function identifier"),
            IdentDtype => write!(f, "identifier primitive datatype"),
            IdentType => write!(f, "identifier type"),
            MapTransformLiteral => write!(f, "map transformation literal"),
            NumLiteral => write!(f, "numeric literal"),
            ParamDefault => write!(f, "param default"),
            ParamIdent => write!(f, "param identifier"),
            ParamName => write!(f, "param name"),
            ParamType => write!(f, "param type"),
            PkgPath => write!(f, "package path"),
            ShortDimNumLiteral => {
                write!(f, "short-hand dimensionalized numeric literal")
            }
            StringLiteral => write!(f, "string literal"),
            SymbolTableKey => write!(f, "symbol table key name"),
            TexMathLiteral => write!(f, "TeX math literal"),
            Title => write!(f, "title"),
            TplMetaIdent => write!(f, "template metadata identifier"),
            TplIdent => write!(f, "template name"),
            TplParamIdent => write!(f, "template param identifier"),
            TypeIdent => write!(f, "type identifier"),
            ValueIdent => write!(f, "value identifier"),
        }
    }
}

/// A ([`SymbolId`], [`Span`]) pair in an attribute value context that may
///   require desugaring and interpretation within the context of a template
///   application.
///
/// Interpolated values require desugaring;
///   see [`DesugarNir`] for more information.
///
/// _This object must be kept small_,
///   since it is used in objects that aggregate portions of the token
///   stream,
///     which must persist in memory for a short period of time,
///     and therefore cannot be optimized away like other portions of the IR.
/// As such,
///   this does not nest enums.
#[derive(Debug, PartialEq, Eq)]
pub enum SugaredNirSymbol<const TY: NirSymbolTy> {
    /// The symbol contains an expression representing the concatenation of
    ///   any number of literals and metavariables
    ///     (referred to as "string interpolation" in many languages).
    Interpolate(SymbolId, Span),

    /// It's not ripe yet.
    ///
    /// No parsing has been performed.
    Todo(SymbolId, Span),
}

// Force developer to be conscious of any changes in size;
//   see `SugaredNirSymbol` docs for more information.
assert_eq_size!(SugaredNirSymbol<{ NirSymbolTy::AnyIdent }>, u128);

/// Character whose presence in a string indicates that interpolation
///   parsing must occur.
pub const INTERPOLATE_CHAR: u8 = b'{';

/// Intended use of a package.
#[derive(Debug, PartialEq, Eq)]
pub enum PkgType {
    /// Package is intended to produce an executable program.
    ///
    /// This is specified by the `rater` root node.
    Prog,
    /// Package is intended to be imported as a component of a larger
    ///   program.
    Mod,
}

/// Whether a value represented by the provided [`SymbolId`] requires
///   interpolation.
///
/// _NB: This dereferences the provided [`SymbolId`] if it is dynamically
///   allocated._
///
/// The provided value requires interpolation if it contains,
///   anywhere in the string,
///   the character [`INTERPOLATE_CHAR`].
/// This does not know if the string will parse correctly;
///   that job is left for desugaring,
///     and so this will flag syntactically invalid interpolated strings
///       (which is expected).
#[inline]
fn needs_interpolation(val: SymbolId) -> bool {
    // We can skip pre-interned symbols that we know cannot include the
    //   interpolation character.
    // TODO: Abstract into `sym::symbol` module.
    let ch = INTERPOLATE_CHAR;
    quick_contains_byte(val, ch)
        .or_else(|| memchr(ch, val.lookup_str().as_bytes()).map(|_| true))
        .unwrap_or(false)
}

impl<const TY: NirSymbolTy> TryFrom<(SymbolId, Span)> for SugaredNirSymbol<TY> {
    type Error = NirAttrParseError;

    fn try_from((val, span): (SymbolId, Span)) -> Result<Self, Self::Error> {
        match needs_interpolation(val) {
            true => Ok(SugaredNirSymbol::Interpolate(val, span)),
            false => Ok(SugaredNirSymbol::Todo(val, span)),
        }
    }
}

impl<const TY: NirSymbolTy> TryFrom<Attr> for SugaredNirSymbol<TY> {
    type Error = NirAttrParseError;

    fn try_from(attr: Attr) -> Result<Self, Self::Error> {
        match attr {
            Attr(_, val, AttrSpan(_, vspan)) => (val, vspan).try_into(),
        }
    }
}

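/// An attribute value that must exactly match the symbol `S`.
///
/// Conversion from an [`Attr`] fails with
///   [`NirAttrParseError::LiteralMismatch`] if the value differs.
#[derive(Debug, PartialEq, Eq)]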
pub struct Literal<const S: SymbolId>;

impl<const S: SymbolId> TryFrom<Attr> for Literal<S> {
    type Error = NirAttrParseError;

    fn try_from(attr: Attr) -> Result<Self, Self::Error> {
        match attr {
            Attr(_, val, _) if val == S => Ok(Literal),
            Attr(name, _, aspan) => Err(NirAttrParseError::LiteralMismatch(
                name,
                aspan.value_span(),
                S,
            )),
        }
    }
}

impl From<Infallible> for NirAttrParseError {
    fn from(x: Infallible) -> Self {
        match x {}
    }
}

type ExpectedSymbolId = SymbolId;

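/// Error converting an [`Attr`] into a NIR value.
#[derive(Debug, PartialEq, Eq)]
pub enum NirAttrParseError {
    /// The attribute's value did not match the expected literal;
    ///   holds the attribute name,
    ///   the span of its value,
    ///   and the expected symbol.
    LiteralMismatch(QName, Span, ExpectedSymbolId),
}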

impl Error for NirAttrParseError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        None
    }
}

impl Display for NirAttrParseError {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        match self {
            Self::LiteralMismatch(name, _, _) => {
                write!(f, "unexpected value for {}", TtXmlAttr::wrap(name),)
            }
        }
    }
}

impl Diagnostic for NirAttrParseError {
    fn describe(&self) -> Vec<crate::diagnose::AnnotatedSpan> {
        match self {
            Self::LiteralMismatch(_, span, expected) => span
                .error(format!("expecting {}", TtQuote::wrap(expected)))
                .into(),
        }
    }
}

#[cfg(test)]
mod test;

tamer: nir: Sugared and plain flavors

This introduces the concept of sugared NIR and provides the boilerplate for
a desugaring pass.  The earlier commits dealing with cleaning up the
lowering pipeline were to support this work, in particular to ensure that
reporting and recovery properly applied to this lowering operation without
adding a ton more boilerplate.

DEV-13158

2022-10-19 10:00:08 -04:00

tamer: Introduce NIR (accepting only)

This introduces NIR, but only as an accepting grammar; it doesn't yet emit
NIR tokens, beyond TODOs.

This modifies `tamec` so that, while copying XIR, it also attempts to lower
into NIR in order to produce parser errors, if any.  It does not yet fail
compilation, as I just want to be cautious and observe that everything is
working properly for a little while as people use it, before I potentially
break builds.

This is the culmination of months of supporting effort.  The NIR grammar is
derived from our existing TAME sources internally, which I use for now as a
test case until I introduce test cases directly into TAMER later on (I'd do
it now, if I hadn't spent so much time on this; I'll start introducing tests
as I begin emitting NIR tokens).  This is capable of fully parsing our
largest system with >900 packages, as well as `core`.

`tamec`'s lowering is a mess; that'll be cleaned up in future commits.  The
same can be said about `tameld`.

NIR's grammar has some initial documentation, but this will improve over
time as well.

The generated docs still need some improvement, too, especially with
generated identifiers; I just want to get this out here for testing.

DEV-7145

2022-08-29 15:28:03 -04:00

tamer: nir: Re-define "NIR"

This was originally the "normalized" IR, but that's not possible to do
without template expansion, which is going to happen at a later point.  So,
this is just "NIR", pronounced "near", which is an IR that is "near" to the
source code.  You can define it as "Near IR" if you want, but it's just a
homonym with a not-quite-defined acronym to me.

DEV-7145

2022-09-16 09:59:38 -04:00

tamer: nir::parse: Grammar summary docs

This is intended to provide just enough information to help elucidate how
the system works and why.

DEV-7145

2022-09-19 09:22:07 -04:00

tamer: nir: Remove token `todo!`s

Just preparing to actually define NIR itself.  The _grammar_ has been
represented (derived from our internal systems, using them as a test case),
but the IR itself has not yet received a definition.

DEV-7145

2022-09-19 16:21:41 -04:00

tamer: nir: Detect interpolated values

This simply detects whether a value will need to be further parsed for
interpolation; it does not yet perform the parsing itself, which will happen
during desugaring.
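
The detection step can be sketched using only the standard library.  This is
a hypothetical illustration rather than the code in this commit: the real
check operates on an interned `SymbolId` and uses `memchr`, whereas
`needs_interpolation_str` below is an assumed helper that takes a plain
`&str`.

```rust
/// Byte whose presence marks a value as needing interpolation parsing.
const INTERPOLATE_CHAR: u8 = b'{';

/// Hypothetical helper (not part of TAMER):
///   report whether `val` contains the interpolation byte anywhere.
///
/// This only detects the byte;
///   it makes no attempt to check that the string would parse.
fn needs_interpolation_str(val: &str) -> bool {
    val.as_bytes().contains(&INTERPOLATE_CHAR)
}
```

For example, `needs_interpolation_str("a{b}")` is `true` and
`needs_interpolation_str("plain")` is `false`.  A malformed value such as
`"unclosed {"` is flagged as well; rejecting it is left to desugaring.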

This introduces a performance regression, for an interesting reason.  I
found that introducing a single new variant to `SugaredNir` (with a
`(SymbolId, Span)` pair), was causing the width of the `NirParseState` type
to increase just enough to cause Rust to be unable to optimize away a
significant number of memcpys related to `Parser` moves, and consequently
reducing performance by nearly 50% for `tamec`.  Yikes.

I suspected this would be a problem, and indeed have tried in all other
cases to avoid aggregation until the ASG---the problem is that I had wanted
to aggregate attributes for NIR so that the IR could actually make some
progress toward simplifying the stream (and therefore working with the
data), and be able to validate against a grammar defined in a single
place.  The problem is that the `NirParseState` type contains a sum type for
every attribute parser, and is therefore as wide as the largest one.  That
is what Rust is having trouble optimizing memcpy away for.
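
The width effect can be shown in isolation.  The types below are hypothetical
stand-ins, not TAMER's parser states: the point is only that a sum type must
be at least as wide as its largest variant, so adding one wide variant widens
every value of that type.

```rust
use std::mem::size_of;

/// Hypothetical sum type with only small variants.
#[allow(dead_code)]
enum Narrow {
    A(u8),
    B(u16),
}

/// The same variants plus a single wide one;
///   every `Wide` value must now reserve room for `C`.
#[allow(dead_code)]
enum Wide {
    A(u8),
    B(u16),
    C([u64; 8]),
}

/// Sizes in bytes of (`Narrow`, `Wide`).
///
/// Exact values depend on the compiler's layout choices,
///   so only the relationship between them should be relied upon.
fn state_sizes() -> (usize, usize) {
    (size_of::<Narrow>(), size_of::<Wide>())
}
```

Moving a `Wide` value copies its full width regardless of which variant is
active, which is why a wider type gives the optimizer more to elide.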

Indeed, reducing the number of attributes improves the situation
drastically.  However, it doesn't make it go away entirely.

If you look at a callgrind profile for `tameld` (or a disassembly), you'll
notice that I put quite a bit of effort into ensuring that the hot code path
for the lowering pipeline contains _no_ memcpys for the parsers.  But that
is not the case with `tamec`---I had to move on.  But I do still have the
same escape hatch that I introduced for `tameld`, which is the mutable
`Context`.

It seems that may be the solution there too, but I want to get a bit further
along first to see how these data end up propagating before I go through
that somewhat significant effort.

DEV-13156

2022-11-01 14:30:34 -04:00
+								    sym::{st::quick_contains_byte, GlobalSymbolResolve, SymbolId},
 								    xir::{
 								        attr::{Attr, AttrSpan},
 								        fmt::TtXmlAttr,
 								        QName,
 								    },
 								};
 								use memchr::memchr;
 								use std::{
 								    convert::Infallible,
 								    error::Error,
 								    fmt::{Debug, Display},
-												tamer: Introduce NIR (accepting only)

This introduces NIR, but only as an accepting grammar; it doesn't yet emit
the NIR IR, beyond TODOs.

This modifies `tamec` to, while copying XIR, also attempt to lower NIR to
produce parser errors, if any.  It does not yet fail compilation, as I just
want to be cautious and observe that everything's working properly for a
little while as people use it, before I potentially break builds.

This is the culmination of months of supporting effort.  The NIR grammar is
derived from our existing TAME sources internally, which I use for now as a
test case until I introduce test cases directly into TAMER later on (I'd do
it now, if I hadn't spent so much time on this; I'll start introducing tests
as I begin emitting NIR tokens).  This is capable of fully parsing our
largest system with >900 packages, as well as `core`.

`tamec`'s lowering is a mess; that'll be cleaned up in future commits.  The
same can be said about `tameld`.

NIR's grammar has some initial documentation, but this will improve over
time as well.

The generated docs still need some improvement, too, especially with
generated identifiers; I just want to get this out here for testing.

DEV-7145

											
										
										
											2022-08-29 15:28:03 -04:00
+								};
-												tamer: nir: Sugared and plain flavors

This introduces the concept of sugared NIR and provides the boilerplate for
a desugaring pass.  The earlier commits dealing with cleaning up the
lowering pipeline were to support this work, in particular to ensure that
reporting and recovery properly applied to this lowering operation without
adding a ton more boilerplate.

DEV-13158

											
										
										
											2022-10-19 10:00:08 -04:00
+								pub use desugar::{DesugarNir, DesugarNirError};
-												tamer: Introduce NIR (accepting only)

This introduces NIR, but only as an accepting grammar; it doesn't yet emit
the NIR IR, beyond TODOs.

This modifies `tamec` to, while copying XIR, also attempt to lower NIR to
produce parser errors, if any.  It does not yet fail compilation, as I just
want to be cautious and observe that everything's working properly for a
little while as people use it, before I potentially break builds.

This is the culmination of months of supporting effort.  The NIR grammar is
derived from our existing TAME sources internally, which I use for now as a
test case until I introduce test cases directly into TAMER later on (I'd do
it now, if I hadn't spent so much time on this; I'll start introducing tests
as I begin emitting NIR tokens).  This is capable of fully parsing our
largest system with >900 packages, as well as `core`.

`tamec`'s lowering is a mess; that'll be cleaned up in future commits.  The
same can be said about `tameld`.

NIR's grammar has some initial documentation, but this will improve over
time as well.

The generated docs still need some improvement, too, especially with
generated identifiers; I just want to get this out here for testing.

DEV-7145

											
										
										
											2022-08-29 15:28:03 -04:00
+								pub use parse::{
 								    NirParseState as XirfToNir, NirParseStateError_ as XirfToNirError,
 								};
-												tamer: nir: Sugared and plain flavors

This introduces the concept of sugared NIR and provides the boilerplate for
a desugaring pass.  The earlier commits dealing with cleaning up the
lowering pipeline were to support this work, in particular to ensure that
reporting and recovery properly applied to this lowering operation without
adding a ton more boilerplate.

DEV-13158

											
										
										
											2022-10-19 10:00:08 -04:00
+								/// IR that is "near" the source code,
 								///   without its syntactic sugar.
 								///
 								/// This form contains only primitives that cannot be reasonably represented
 								///   by other primitives.
 								/// This is somewhat arbitrary and may change over time,
 								///   but represents a balance between the level of abstraction of the IR
 								///   and performance of lowering operations.
 								///
 								/// See [`SugaredNir`] for more information about the sugared form.
+								#[derive(Debug, PartialEq, Eq)]
+								pub enum PlainNir {
+								    Todo,
 								}
+								impl Token for PlainNir {
+								    fn ir_name() -> &'static str {
+								        "Plain NIR"
+								    }
+								    fn span(&self) -> Span {
 								        use PlainNir::*;
-												tamer: nir: Remove token `todo!`s

Just preparing to actually define NIR itself.  The _grammar_ has been
represented (derived from our internal systems, using them as a test case),
but the IR itself has not yet received a definition.

DEV-7145

											
										
										
											2022-09-19 16:21:41 -04:00
 								        match self {
 								            Todo => UNKNOWN_SPAN,
 								        }
+								    }
 								}
+								impl Object for PlainNir {}
+								impl Display for PlainNir {
+								    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
+								        use PlainNir::*;
 								        match self {
 								            Todo => write!(f, "TODO"),
 								        }
+								    }
 								}
+								/// Syntactic sugar atop [`PlainNir`].
 								///
 								/// NIR contains various syntax features that serve as mere quality-of-life
 								///   conveniences for users
 								///     ("sugar" to sweeten the experience).
 								/// These features do not add any expressiveness to the language,
 								///   and are able to be lowered into other primitives without changing
 								///   its meaning.
 								///
 								/// The process of lowering syntactic sugar into primitives is called
 								///   "desugaring" and is carried out by the [`DesugarNir`] lowering
 								///     operation,
 								///       producing [`PlainNir`].
 								#[derive(Debug, PartialEq, Eq)]
 								pub enum SugaredNir {
-												tamer: nir: Detect interpolated values

This simply detects whether a value will need to be further parsed for
interpolation; it does not yet perform the parsing itself, which will happen
during desugaring.

This introduces a performance regression, for an interesting reason.  I
found that introducing a single new variant to `SugaredNir` (with a
`(SymbolId, Span)` pair) widened the `NirParseState` type just enough that
Rust could no longer optimize away a significant number of memcpys related
to `Parser` moves, which reduced performance by nearly 50% for `tamec`.
Yikes.

I suspected this would be a problem, and indeed have tried in all other
cases to avoid aggregation until the ASG---the problem is that I had wanted
to aggregate attributes for NIR so that the IR could actually make some
progress toward simplifying the stream (and therefore working with the
data), and be able to validate against a grammar defined in a single
place.  The problem is that the `NirParseState` type contains a sum type for
every attribute parser, and is therefore as wide as the largest one.  That
is what Rust is having trouble optimizing memcpy away for.
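
The width problem described above can be illustrated with a minimal sketch.
`SmallState`, `LargeState`, and `ParseState` are hypothetical stand-ins, not
TAMER's actual parser types; the point is only that a sum type is at least
as wide as its widest variant, so every move of it may copy that many bytes.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a narrow attribute parser state.
#[allow(dead_code)]
pub struct SmallState(u8);

/// Hypothetical stand-in for a wide attribute parser state.
#[allow(dead_code)]
pub struct LargeState([u64; 8]);

/// A sum type must be able to hold any of its variants,
///   so it is at least as wide as the widest one.
#[allow(dead_code)]
pub enum ParseState {
    Small(SmallState),
    Large(LargeState),
}

/// Returns the widths in bytes of the two variant payloads and of the
///   enclosing sum type.
pub fn widths() -> (usize, usize, usize) {
    (
        size_of::<SmallState>(),
        size_of::<LargeState>(),
        size_of::<ParseState>(),
    )
}
```

Adding one more wide variant to such an enum widens every value of the type,
even those holding the narrow variant.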

Indeed, reducing the number of attributes improves the situation
drastically.  However, it doesn't make it go away entirely.

If you look at a callgrind profile for `tameld` (or a disassembly), you'll
notice that I put quite a bit of effort into ensuring that the hot code path
for the lowering pipeline contains _no_ memcpys for the parsers.  But that
is not the case with `tamec`---I had to move on.  But I do still have the
same escape hatch that I introduced for `tameld`, which is the mutable
`Context`.

It seems that may be the solution there too, but I want to get a bit further
along first to see how these data end up propagating before I go through
that somewhat significant effort.

DEV-13156

											
										
										
											2022-11-01 14:30:34 -04:00
+								    /// A primitive token that may have sugared values.
-												tamer: nir (SugaredNir): Mirror PlainNir

This mirror is only a `Todo` variant at the moment, but my hope had been to
try to creatively nest or use generics to simplify the conversion between
the two flavors without a lot of boilerplate.  But it doesn't seem like I'm
going to be successful, and may have to resort to macros to remove
boilerplate.

But I need to stop fighting with myself and move on.  Though I would still
like to keep the types purely compile-time via const generics if possible,
since they're not needed in memory (or disk) until we get to templates;
they're otherwise static relative to a NIR token variant.

DEV-13209

											
										
										
											2022-11-01 15:13:29 -04:00
+								    Todo,
+								}
 								impl Token for SugaredNir {
 								    fn ir_name() -> &'static str {
 								        "Sugared NIR"
 								    }
 								    fn span(&self) -> Span {
 								        use SugaredNir::*;
 								        match self {
+								            Todo => UNKNOWN_SPAN,
+								        }
 								    }
 								}
 								impl Object for SugaredNir {}
 								impl Display for SugaredNir {
 								    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
 								        use SugaredNir::*;
 								        match self {
+								            Todo => write!(f, "TODO"),
+								        }
 								    }
 								}
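
A minimal sketch of what a desugaring pass over these two flavors might look
like, assuming a plain iterator rather than TAMER's parser framework.  The
`desugar` and `desugar_all` functions are hypothetical and are not the real
`DesugarNir`; the enums are reduced to the `Todo` variants shown above.

```rust
/// Reduced stand-in for the plain flavor.
#[derive(Debug, PartialEq, Eq)]
pub enum PlainNir {
    Todo,
}

/// Reduced stand-in for the sugared flavor.
#[derive(Debug, PartialEq, Eq)]
pub enum SugaredNir {
    Todo,
}

/// Lower a single sugared token into plain tokens.
///
/// With only `Todo` available this is one-to-one,
///   but returning an iterator leaves room for one sugared token to
///   expand into several primitives.
pub fn desugar(tok: SugaredNir) -> impl Iterator<Item = PlainNir> {
    match tok {
        SugaredNir::Todo => std::iter::once(PlainNir::Todo),
    }
}

/// Lower a stream of sugared tokens lazily,
///   without building an intermediate tree.
pub fn desugar_all(
    toks: impl IntoIterator<Item = SugaredNir>,
) -> impl Iterator<Item = PlainNir> {
    toks.into_iter().flat_map(desugar)
}
```

Because both functions return iterators, tokens flow through one at a time,
which matches the streaming nature of the IR.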
+								/// Tag representing the type of a NIR value.
 								///
 								/// NIR values originate from attributes,
 								///   which are refined into types as enough information becomes available.
 								/// Value parsing must be deferred if a value requires desugaring or
 								///   metavalue expansion.
 								#[derive(Debug, PartialEq, Eq)]
 								#[repr(u8)]
 								pub enum NirSymbolTy {
 								    AnyIdent,
 								    BooleanLiteral,
 								    ClassIdent,
 								    ClassIdentList,
 								    ConstIdent,
 								    DescLiteral,
 								    Dim,
 								    DynNodeLiteral,
 								    FuncIdent,
 								    IdentDtype,
 								    IdentType,
 								    MapTransformLiteral,
 								    NumLiteral,
 								    ParamDefault,
 								    ParamIdent,
 								    ParamName,
 								    ParamType,
 								    PkgPath,
 								    ShortDimNumLiteral,
 								    StringLiteral,
 								    SymbolTableKey,
 								    TexMathLiteral,
 								    Title,
 								    TplMetaIdent,
-												tamer: nir::NirSymbolTy (Display): Add impl

Add initial descriptions and consolidate some of the types.  There'll be
more to come; this is just to get `Display` derives working for types
that'll be using it.  I'd like to see where this description manifests
itself before I decide how user-friendly I'd like it to be.

DEV-13156

											
										
										
											2022-11-01 16:23:51 -04:00
+								    TplIdent,
+								    TplParamIdent,
 								    TypeIdent,
 								    ValueIdent,
 								}
+								impl Display for NirSymbolTy {
 								    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
 								        use NirSymbolTy::*;
 								        match self {
 								            AnyIdent => write!(f, "any identifier"),
 								            BooleanLiteral => write!(
 								                f,
 								                "boolean literal {fmt_true} or {fmt_false}",
 								                fmt_true = TtQuote::wrap("true"),
 								                fmt_false = TtQuote::wrap("false"),
 								            ),
 								            ClassIdent => write!(f, "classification identifier"),
 								            ClassIdentList => {
 								                write!(f, "space-delimited list of classification identifiers")
 								            }
 								            ConstIdent => write!(f, "constant identifier"),
 								            DescLiteral => write!(f, "description literal"),
 								            Dim => write!(f, "dimension declaration"),
 								            DynNodeLiteral => write!(f, "dynamic node literal"),
 								            FuncIdent => write!(f, "function identifier"),
 								            IdentDtype => write!(f, "identifier primitive datatype"),
 								            IdentType => write!(f, "identifier type"),
 								            MapTransformLiteral => write!(f, "map transformation literal"),
 								            NumLiteral => write!(f, "numeric literal"),
 								            ParamDefault => write!(f, "param default"),
 								            ParamIdent => write!(f, "param identifier"),
 								            ParamName => write!(f, "param name"),
 								            ParamType => write!(f, "param type"),
 								            PkgPath => write!(f, "package path"),
 								            ShortDimNumLiteral => {
 								                write!(f, "short-hand dimensionalized numeric literal")
 								            }
 								            StringLiteral => write!(f, "string literal"),
 								            SymbolTableKey => write!(f, "symbol table key name"),
 								            TexMathLiteral => write!(f, "TeX math literal"),
 								            Title => write!(f, "title"),
 								            TplMetaIdent => write!(f, "template metadata identifier"),
 								            TplIdent => write!(f, "template name"),
 								            TplParamIdent => write!(f, "template param identifier"),
 								            TypeIdent => write!(f, "type identifier"),
 								            ValueIdent => write!(f, "value identifier"),
 								        }
 								    }
 								}
 								/// A ([`SymbolId`], [`Span`]) pair in an attribute value context that may
 								///   require desugaring and interpretation within the context of a template
 								///   application.
 								///
 								/// Interpolated values require desugaring;
 								///   see [`DesugarNir`] for more information.
 								///
 								/// _This object must be kept small_,
 								///   since it is used in objects that aggregate portions of the token
 								///   stream,
 								///     which must persist in memory for a short period of time,
 								///     and therefore cannot be optimized away as other portions of the IR are.
 								/// As such,
 								///   this does not nest enums.
 								#[derive(Debug, PartialEq, Eq)]
 								pub enum SugaredNirSymbol<const TY: NirSymbolTy> {
 								    /// The symbol contains an expression representing the concatenation of
 								    ///   any number of literals and metavariables
 								    ///     (referred to as "string interpolation" in many languages).
 								    Interpolate(SymbolId, Span),
 								    /// It's not ripe yet.
 								    ///
 								    /// No parsing has been performed.
 								    Todo(SymbolId, Span),
 								}
 								// Force developer to be conscious of any changes in size;
 								//   see `SugaredNirSymbol` docs for more information.
 								assert_eq_size!(SugaredNirSymbol<{ NirSymbolTy::AnyIdent }>, u128);
 								/// Character whose presence in a string indicates that interpolation
 								///   parsing must occur.
 								pub const INTERPOLATE_CHAR: u8 = b'{';
-												tamer: Introduce NIR (accepting only)

This introduces NIR, but only as an accepting grammar; it doesn't yet emit
the NIR IR, beyond TODOs.

This modifies `tamec` to, while copying XIR, also attempt to lower NIR to
produce parser errors, if any.  It does not yet fail compilation, as I just
want to be cautious and observe that everything's working properly for a
little while as people use it, before I potentially break builds.

This is the culmination of months of supporting effort.  The NIR grammar is
derived from our existing TAME sources internally, which I use for now as a
test case until I introduce test cases directly into TAMER later on (I'd do
it now, if I hadn't spent so much time on this; I'll start introducing tests
as I begin emitting NIR tokens).  This is capable of fully parsing our
largest system with >900 packages, as well as `core`.

`tamec`'s lowering is a mess; that'll be cleaned up in future commits.  The
same can be said about `tameld`.

NIR's grammar has some initial documentation, but this will improve over
time as well.

The generated docs still need some improvement, too, especially with
generated identifiers; I just want to get this out here for testing.

DEV-7145

											
										
										
											2022-08-29 15:28:03 -04:00
 								#[derive(Debug, PartialEq, Eq)]
 								pub enum PkgType {
 								    /// Package is intended to produce an executable program.
 								    ///
 								    /// This is specified by the `rater` root node.
 								    Prog,
 								    /// Package is intended to be imported as a component of a larger
 								    ///   program.
 								    Mod,
 								}
 								/// Whether a value represented by the provided [`SymbolId`] requires
 								///   interpolation.
 								///
 								/// _NB: This dereferences the provided [`SymbolId`] if it is dynamically
 								///   allocated._
 								///
 								/// The provided value requires interpolation if it contains,
 								///   anywhere in the string,
 								///   the character [`INTERPOLATE_CHAR`].
 								/// This does not know if the string will parse correctly;
 								///   that job is left for desugaring,
 								///     and so this will flag syntactically invalid interpolated strings
 								///       (which is expected).
 								#[inline]
 								fn needs_interpolation(val: SymbolId) -> bool {
 								    // We can skip pre-interned symbols that we know cannot include the
 								    //   interpolation character.
 								    // TODO: Abstract into `sym::symbol` module.
 								    let ch = INTERPOLATE_CHAR;
 								    quick_contains_byte(val, ch)
 								        .or_else(|| memchr(ch, val.lookup_str().as_bytes()).map(|_| true))
 								        .unwrap_or(false)
 								}
 								impl<const TY: NirSymbolTy> TryFrom<(SymbolId, Span)> for SugaredNirSymbol<TY> {
 								    type Error = NirAttrParseError;
 								    fn try_from((val, span): (SymbolId, Span)) -> Result<Self, Self::Error> {
 								        match needs_interpolation(val) {
 								            true => Ok(SugaredNirSymbol::Interpolate(val, span)),
 								            false => Ok(SugaredNirSymbol::Todo(val, span)),
 								        }
 								    }
 								}
 								impl<const TY: NirSymbolTy> TryFrom<Attr> for SugaredNirSymbol<TY> {
 								    type Error = NirAttrParseError;
 								    fn try_from(attr: Attr) -> Result<Self, Self::Error> {
 								        match attr {
 								            Attr(_, val, AttrSpan(_, vspan)) => (val, vspan).try_into(),
 								        }
 								    }
 								}
 								#[derive(Debug, PartialEq, Eq)]
 								pub struct Literal<const S: SymbolId>;
 								impl<const S: SymbolId> TryFrom<Attr> for Literal<S> {
 								    type Error = NirAttrParseError;
 								    fn try_from(attr: Attr) -> Result<Self, Self::Error> {
 								        match attr {
 								            Attr(_, val, _) if val == S => Ok(Literal),
 								            Attr(name, _, aspan) => Err(NirAttrParseError::LiteralMismatch(
 								                name,
 								                aspan.value_span(),
 								                S,
 								            )),
 								        }
 								    }
 								}
 								impl From<Infallible> for NirAttrParseError {
 								    fn from(x: Infallible) -> Self {
 								        match x {}
 								    }
 								}
 								type ExpectedSymbolId = SymbolId;
 								#[derive(Debug, PartialEq, Eq)]
 								pub enum NirAttrParseError {
 								    LiteralMismatch(QName, Span, ExpectedSymbolId),
 								}
 								impl Error for NirAttrParseError {
 								    fn source(&self) -> Option<&(dyn Error + 'static)> {
 								        None
 								    }
 								}
 								impl Display for NirAttrParseError {
 								    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
 								        match self {
 								            Self::LiteralMismatch(name, _, _) => {
 								                write!(f, "unexpected value for {}", TtXmlAttr::wrap(name),)
 								            }
 								        }
 								    }
 								}
 								impl Diagnostic for NirAttrParseError {
 								    fn describe(&self) -> Vec<crate::diagnose::AnnotatedSpan> {
 								        match self {
 								            Self::LiteralMismatch(_, span, expected) => span
 								                .error(format!("expecting {}", TtQuote::wrap(expected)))
 								                .into(),
 								        }
 								    }
 								}
 								#[cfg(test)]
 								mod test;