tame/tamer/src/parse.rs

// Basic streaming parsing framework
//
//  Copyright (C) 2014-2022 Ryan Specialty Group, LLC.
//
//  This file is part of TAME.
//
//  This program is free software: you can redistribute it and/or modify
//  it under the terms of the GNU General Public License as published by
//  the Free Software Foundation, either version 3 of the License, or
//  (at your option) any later version.
//
//  This program is distributed in the hope that it will be useful,
//  but WITHOUT ANY WARRANTY; without even the implied warranty of
//  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
//  GNU General Public License for more details.
//
//  You should have received a copy of the GNU General Public License
//  along with this program.  If not, see <http://www.gnu.org/licenses/>.

//! Basic streaming parser framework for lowering operations.
//!
//! _TODO: Some proper docs and examples!_

use crate::diagnose::{Annotate, AnnotatedSpan, Diagnostic};
use crate::iter::{TripIter, TrippableIterator};
use crate::span::{Span, UNKNOWN_SPAN};
use std::fmt::Debug;
use std::hint::unreachable_unchecked;
use std::iter::{self, Empty};
use std::mem::take;
use std::ops::{ControlFlow, Deref, DerefMut, FromResidual, Try};
use std::{convert::Infallible, error::Error, fmt::Display};

/// Result of applying a [`Token`] to a [`ParseState`],
///   with any error having been wrapped in a [`ParseError`].
pub type ParsedResult<S> = ParseResult<S, Parsed<<S as ParseState>::Object>>;

/// Result of some non-parsing operation on a [`Parser`],
///   with any error having been wrapped in a [`ParseError`].
pub type ParseResult<S, T> =
    Result<T, ParseError<<S as ParseState>::Token, <S as ParseState>::Error>>;

/// A single datum from a streaming IR with an associated [`Span`].
///
/// A token may be a lexeme with associated data,
///   or a more structured object having been lowered from other IRs.
pub trait Token: Display + Debug + PartialEq {
    /// Retrieve the [`Span`] representing the source location of the token.
    fn span(&self) -> Span;
}

impl<T: Token> From<T> for Span {
    fn from(tok: T) -> Self {
        tok.span()
    }
}

/// An IR object produced by a lowering operation on one or more [`Token`]s.
///
/// Note that an [`Object`] may also be a [`Token`] if it will be in turn
///   fed to another [`Parser`] for lowering.
///
/// This trait exists to disambiguate an otherwise unbounded type for
///   [`From`] conversions,
///     used in the [`Transition`] API to provide greater flexibility.
pub trait Object: Debug + PartialEq {}

/// An infallible [`Token`] stream.
///
/// If the token stream originates from an operation that could potentially
///   fail and ought to be propagated,
///     use [`TokenResultStream`].
///
/// The name "stream" in place of "iterator" is intended to convey that this
///   type is expected to be processed in real-time as a stream,
///     not read into memory.
pub trait TokenStream<T: Token> = Iterator<Item = T>;

/// A [`Token`] stream that may encounter errors during parsing.
///
/// If the stream cannot fail,
///   consider using [`TokenStream`].
pub trait TokenResultStream<T: Token, E: Error> = Iterator<Item = Result<T, E>>;

/// A [`ParseState`] capable of being automatically stitched together with
///   a parent [`ParseState`] `SP` to create a composite parser.
///
/// Conceptually,
///   this can be visualized as combining the state machines of multiple
///   parsers into one larger state machine.
///
/// The term _state stitching_ refers to a particular pattern able to be
///   performed automatically by this parsing framework;
///     it is not necessary for parser composition,
///       provided that you perform the necessary wiring yourself in absence
///       of state stitching.
pub trait StitchableParseState<SP: ParseState> = ParseState
where
    SP: ParseState<Token = <Self as ParseState>::Token>,
    <Self as ParseState>::Object: Into<<SP as ParseState>::Object>,
    <Self as ParseState>::Error: Into<<SP as ParseState>::Error>;

/// A parsing automaton.
///
/// These states are utilized by a [`Parser`].
///
/// A [`ParseState`] is also responsible for storing data about the
///   accepted input,
///     and handling appropriate type conversions into the final type.
/// That is---an
///   automaton may store metadata that is subsequently emitted once an
///   accepting state has been reached.
/// Whatever the underlying automaton,
///   a `(state, token, context)` triple must uniquely determine the next
///   parser action.
pub trait ParseState: Default + PartialEq + Eq + Display + Debug {
    /// Input tokens to the parser.
    type Token: Token;

    /// Objects produced by a parser utilizing these states.
    type Object: Object;

    /// Errors specific to this set of states.
    type Error: Debug + Diagnostic + PartialEq;

    /// Object provided to parser alongside each token.
    ///
    /// This may be used in situations where Rust/LLVM are unable to
    ///   optimize away moves of interior data associated with the
    ///   otherwise-immutable [`ParseState`].
    type Context: Debug = EmptyContext;

    /// Construct a parser.
    ///
    /// Whether this method is helpful or provides any clarity depends on
    ///   the context and the types that are able to be inferred.
    fn parse<I: TokenStream<Self::Token>>(toks: I) -> Parser<Self, I>
    where
        Self::Context: Default,
    {
        Parser::from(toks)
    }

    /// Construct a parser with a non-default [`ParseState::Context`].
    ///
    /// This is useful in two ways:
    ///
    ///   1. To allow for parsing using a context that does not implement
    ///        [`Default`],
    ///          or whose default is not sufficient; and
    ///   2. To re-use a context from a previous [`Parser`].
    ///
    /// If neither of these apply to your situation,
    ///   consider [`ParseState::parse`] instead.
    ///
    /// To retrieve a context from a parser for re-use,
    ///   see [`Parser::finalize`].
    fn parse_with_context<I: TokenStream<Self::Token>>(
        toks: I,
        ctx: Self::Context,
    ) -> Parser<Self, I> {
        Parser::from((toks, ctx))
    }

    /// Parse a single [`Token`] and optionally perform a state transition.
    ///
    /// The current state is represented by `self`.
    /// The result of a parsing operation is a state transition with
    ///   associated [`ParseStatus`] data.
    ///
    /// Note that `self` is owned,
    ///   for a couple primary reasons:
    ///
    ///   1. This forces the parser to explicitly consider and document all
    ///        state transitions,
    ///          rather than potentially missing unintended behavior through
    ///          implicit behavior; and
    ///   2. It allows for more natural functional composition of state,
    ///        which in turn makes it easier to compose parsers
    ///          (which conceptually involves stitching together state
    ///            machines).
    ///
    /// Since a [`ParseState`] produces a new version of itself with each
    ///   invocation,
    ///     it is functionally pure.
    /// Generally,
    ///   Rust/LLVM are able to optimize moves into direct assignments.
    /// However,
    ///   there are circumstances where this is _not_ the case,
    ///   in which case [`Context`] can be used to provide a mutable context
    ///     owned by the caller (e.g. [`Parser`]) to store additional
    ///     information that is not subject to Rust's move semantics.
    /// If this is not necessary,
    ///   see [`NoContext`].
    fn parse_token(
        self,
        tok: Self::Token,
        ctx: &mut Self::Context,
    ) -> TransitionResult<Self>;

    /// Whether the current state represents an accepting state.
    ///
    /// An accepting state represents a valid state to stop parsing.
    /// If parsing stops at a state that is _not_ accepting,
    ///   then the [`TokenStream`] has ended unexpectedly and should produce
    ///   a [`ParseError::UnexpectedEof`].
    ///
    /// It makes sense for there to be exist multiple accepting states for a
    ///   parser.
    /// For example:
    ///   A parser that parses a list of attributes may be used to parse one
    ///   or more attributes,
    ///     or the entire list of attributes.
    ///   It is acceptable to attempt to parse just one of those attributes,
    ///     or it is acceptable to parse all the way until the end.
    fn is_accepting(&self) -> bool;

    /// Delegate parsing from a compatible, stitched [`ParseState`]~`SP`.
    ///
    /// This helps to combine two state machines that speak the same input
    ///   language
    ///   (share the same [`Self::Token`]),
    ///     handling the boilerplate of delegating [`Self::Token`] from a
    ///     parent state~`SP` to `Self`.
    ///
    /// Token delegation happens after [`Self`] has been entered from a
    ///   parent [`ParseState`] context~`SP`,
    ///     so stitching the start and accepting states must happen elsewhere
    ///     (for now).
    ///
    /// This assumes that no lookahead token from [`ParseStatus::Dead`] will
    ///   need to be handled by the parent state~`SP`.
    /// To handle a token of lookahead,
    ///   use [`Self::delegate_lookahead`] instead.
    ///
    /// _TODO: More documentation once this is finalized._
    fn delegate<SP, C>(
        self,
        mut context: C,
        tok: <Self as ParseState>::Token,
        into: impl FnOnce(Self) -> SP,
    ) -> TransitionResult<SP>
    where
        Self: StitchableParseState<SP>,
        C: AsMut<<Self as ParseState>::Context>,
    {
        use ParseStatus::{Dead, Incomplete, Object as Obj};

        let (Transition(newst), result) =
            self.parse_token(tok, context.as_mut()).into();

        // This does not use `delegate_lookahead` so that we can have
        //   `into: impl FnOnce` instead of `Fn`.
        Transition(into(newst)).result(match result {
            Ok(Incomplete) => Ok(Incomplete),
            Ok(Obj(obj)) => Ok(Obj(obj.into())),
            Ok(Dead(tok)) => Ok(Dead(tok)),
            Err(e) => Err(e.into()),
        })
    }

    /// Delegate parsing from a compatible, stitched [`ParseState`]~`SP` with
    ///   support for a lookahead token.
    ///
    /// This does the same thing as [`Self::delegate`],
    ///   but allows for the handling of a lookahead token from [`Self`]
    ///   rather than simply proxying [`ParseStatus::Dead`].
    ///
    /// _TODO: More documentation once this is finalized._
    fn delegate_lookahead<SP, C>(
        self,
        mut context: C,
        tok: <Self as ParseState>::Token,
        into: impl FnOnce(Self) -> SP,
    ) -> ControlFlow<TransitionResult<SP>, (Self, <Self as ParseState>::Token, C)>
    where
        Self: StitchableParseState<SP>,
        C: AsMut<<Self as ParseState>::Context>,
    {
        use ControlFlow::*;
        use ParseStatus::{Dead, Incomplete, Object as Obj};

        // NB: Rust/LLVM are generally able to elide these moves into direct
        //   assignments,
        //     but sometimes this does not work
        //       (e.g. XIRF's use of `ArrayVec`).
        // If your [`ParseState`] has a lot of `memcpy`s or other
        //   performance issues,
        //     move heavy objects into `context`.
        let (Transition(newst), result) =
            self.parse_token(tok, context.as_mut()).into();

        match result {
            Ok(Incomplete) => Break(Transition(into(newst)).incomplete()),
            Ok(Obj(obj)) => Break(Transition(into(newst)).ok(obj.into())),
            Ok(Dead(tok)) => Continue((newst, tok, context)),
            Err(e) => Break(Transition(into(newst)).err(e)),
        }
    }
}

/// Empty [`Context`] for [`ParseState`]s with pure functional
///   implementations with no mutable state.
///
/// Using this value means that a [`ParseState`] does not require a
///   context.
/// All [`Context`]s implement [`AsMut<EmptyContext>`](AsMut),
///   and so all pure [`ParseState`]s have contexts compatible with every
///   other parser for composition
///     (provided that the other invariants in [`StitchableParseState`] are
///       met).
///
/// This can be clearly represented in function signatures using
///   [`EmptyContext`].
#[derive(Debug, PartialEq, Eq, Default)]
pub struct EmptyContext;

impl AsMut<EmptyContext> for EmptyContext {
    fn as_mut(&mut self) -> &mut EmptyContext {
        self
    }
}

/// A [`ParseState`] does not require any mutable [`Context`].
///
/// A [`ParseState`] using this context is pure
///   (has no mutable state),
///     returning a new version of itself on each state change.
///
/// This type is intended to be self-documenting:
///   `_: EmptyContext` is nicer to readers than `_: &mut EmptyContext`.
///
/// See [`EmptyContext`] for more information.
pub type NoContext<'a> = &'a mut EmptyContext;

/// Mutable context for [`ParseState`].
///
/// [`ParseState`]s are immutable and pure---they
///   are invoked via [`ParseState::parse_token`] and return a new version
///   of themselves representing their new state.
/// Rust/LLVM are generally able to elide intermediate values and moves,
///   optimizing these parsers away into assignments.
///
/// However,
///   there are circumstances where moves may not be elided and may retain
///   their `memcpy` equivalents.
/// To work around this,
///   [`ParseState::parse_token`] accepts a mutable [`Context`] reference
///   which is held by the parent [`Parser`],
///     which can be mutated in-place without worrying about Rust's move
///     semantics.
///
/// Plainly: you should only use this if you have to.
/// This was added because certain parsers may be invoked millions of times
///   for each individual token in systems with many source packages,
///     which may otherwise result in millions of `memcpy`s.
///
/// When composing two [`ParseState`]s `A<B, C>`,
///   a [`Context<B, C>`](Context) must be contravariant over `B` and~`C`.
/// Concretely,
///   this means that [`AsMut<B::Context>`](AsMut) and
///   [`AsMut<C::Context>`](AsMut) must be implemented for `A::Context`.
/// This almost certainly means that `A::Context` is a product type.
/// Consequently,
///   a single [`Parser`] is able to hold a composite [`Context`] in a
///   single memory location.
///
/// [`Context<T>`](Context) implements [`Deref<T>`](Deref) for convenience.
///
/// If your [`ParseState`] does not require a mutable [`Context`],
///   see [`NoContext`].
#[derive(Debug, Default)]
pub struct Context<T: Debug + Default>(T, EmptyContext);

impl<T: Debug + Default> AsMut<EmptyContext> for Context<T> {
    fn as_mut(&mut self) -> &mut EmptyContext {
        &mut self.1
    }
}

impl<T: Debug + Default> Deref for Context<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl<T: Debug + Default> DerefMut for Context<T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

impl<T: Debug + Default> From<T> for Context<T> {
    fn from(x: T) -> Self {
        Context(x, EmptyContext)
    }
}

/// Result of applying a [`Token`] to a [`ParseState`].
///
/// This is used by [`ParseState::parse_token`];
///   see that function for rationale.
pub type ParseStateResult<S> = Result<ParseStatus<S>, <S as ParseState>::Error>;

/// A state transition with associated data.
///
/// Conceptually,
///   imagine the act of a state transition producing data.
/// See [`Transition`] for convenience methods for producing this tuple.
#[derive(Debug, PartialEq)]
pub struct TransitionResult<S: ParseState>(
    pub Transition<S>,
    pub ParseStateResult<S>,
);

/// Denotes a state transition.
///
/// This newtype was created to produce clear, self-documenting code;
///   parsers can get confusing to read with all of the types involved,
///     so this provides a mental synchronization point.
///
/// This also provides some convenience methods to help remote boilerplate
///   and further improve code clarity.
#[derive(Debug, PartialEq, Eq)]
pub struct Transition<S: ParseState>(pub S);

impl<S: ParseState> Transition<S> {
    /// A state transition with corresponding data.
    ///
    /// This allows [`ParseState::parse_token`] to emit a parsed object and
    ///   corresponds to [`ParseStatus::Object`].
    pub fn ok<T>(self, obj: T) -> TransitionResult<S>
    where
        T: Into<ParseStatus<S>>,
    {
        TransitionResult(self, Ok(obj.into()))
    }

    /// A transition with corresponding error.
    ///
    /// This indicates a parsing failure.
    /// The state ought to be suitable for error recovery.
    pub fn err<E: Into<S::Error>>(self, err: E) -> TransitionResult<S> {
        TransitionResult(self, Err(err.into()))
    }

    /// A state transition with corresponding [`Result`].
    ///
    /// This translates the provided [`Result`] in a manner equivalent to
    ///   [`Transition::ok`] and [`Transition::err`].
    pub fn result<T, E>(self, result: Result<T, E>) -> TransitionResult<S>
    where
        T: Into<ParseStatus<S>>,
        E: Into<S::Error>,
    {
        TransitionResult(self, result.map(Into::into).map_err(Into::into))
    }

    /// A state transition indicating that more data is needed before an
    ///   object can be emitted.
    ///
    /// This corresponds to [`ParseStatus::Incomplete`].
    pub fn incomplete(self) -> TransitionResult<S> {
        TransitionResult(self, Ok(ParseStatus::Incomplete))
    }

    /// A dead state transition.
    ///
    /// This corresponds to [`ParseStatus::Dead`],
    ///   and a calling parser should use the provided [`Token`] as
    ///   lookahead.
    pub fn dead(self, tok: S::Token) -> TransitionResult<S> {
        TransitionResult(self, Ok(ParseStatus::Dead(tok)))
    }
}

impl<S: ParseState> Into<(Transition<S>, ParseStateResult<S>)>
    for TransitionResult<S>
{
    fn into(self) -> (Transition<S>, ParseStateResult<S>) {
        (self.0, self.1)
    }
}

impl<S: ParseState> Try for TransitionResult<S> {
    type Output = (Transition<S>, ParseStateResult<S>);
    type Residual = (Transition<S>, ParseStateResult<S>);

    fn from_output(output: Self::Output) -> Self {
        match output {
            (st, result) => Self(st, result),
        }
    }

    fn branch(self) -> ControlFlow<Self::Residual, Self::Output> {
        match self.into() {
            (st, Ok(x)) => ControlFlow::Continue((st, Ok(x))),
            (st, Err(e)) => ControlFlow::Break((st, Err(e))),
        }
    }
}

impl<S: ParseState> FromResidual<(Transition<S>, ParseStateResult<S>)>
    for TransitionResult<S>
{
    fn from_residual(residual: (Transition<S>, ParseStateResult<S>)) -> Self {
        match residual {
            (st, result) => Self(st, result),
        }
    }
}

impl<S: ParseState> FromResidual<Result<Infallible, TransitionResult<S>>>
    for TransitionResult<S>
{
    fn from_residual(
        residual: Result<Infallible, TransitionResult<S>>,
    ) -> Self {
        match residual {
            Err(e) => e,
            // SAFETY: This match arm doesn't seem to be required in
            //   core::result::Result's FromResidual implementation,
            //     but as of 1.61 nightly it is here.
            // Since this is Infallable,
            //   it cannot occur.
            Ok(_) => unsafe { unreachable_unchecked() },
        }
    }
}

impl<S: ParseState> FromResidual<ControlFlow<TransitionResult<S>, Infallible>>
    for TransitionResult<S>
{
    fn from_residual(
        residual: ControlFlow<TransitionResult<S>, Infallible>,
    ) -> Self {
        match residual {
            ControlFlow::Break(result) => result,
            // SAFETY: Infallible, so cannot hit.
            ControlFlow::Continue(_) => unsafe { unreachable_unchecked() },
        }
    }
}

/// An object able to be used as data for a state [`Transition`].
///
/// This flips the usual order of things:
///   rather than using a method of [`Transition`] to provide data,
///     this starts with the data and produces a transition from it.
/// This is sometimes necessary to satisfy ownership/borrowing rules.
///
/// This trait simply removes boilerplate associated with storing
///   intermediate values and translating into the resulting type.
pub trait Transitionable<S: ParseState> {
    /// Perform a state transition to `S` using [`Self`] as the associated
    ///   data.
    ///
    /// This may be necessary to satisfy ownership/borrowing rules when
    ///   state data from `S` is used to compute [`Self`].
    fn transition(self, to: S) -> TransitionResult<S>;
}

impl<S, E> Transitionable<S> for Result<ParseStatus<S>, E>
where
    S: ParseState,
    <S as ParseState>::Error: From<E>,
{
    fn transition(self, to: S) -> TransitionResult<S> {
        Transition(to).result(self)
    }
}

impl<S, E> Transitionable<S> for Result<(), E>
where
    S: ParseState,
    <S as ParseState>::Error: From<E>,
{
    fn transition(self, to: S) -> TransitionResult<S> {
        Transition(to).result(self.map(|_| ParseStatus::Incomplete))
    }
}

/// A streaming parser defined by a [`ParseState`] with exclusive
///   mutable access to an underlying [`TokenStream`].
///
/// This parser handles operations that are common among all types of
///   parsers,
///     such that specialized parsers need only implement logic that is
///     unique to their operation.
/// This also simplifies combinators,
///   since there is more uniformity among distinct parser types.
///
/// After you have finished with a parser,
///   if you have not consumed the entire iterator,
///   call [`finalize`](Parser::finalize) to ensure that parsing has
///     completed in an accepting state.
#[derive(Debug, PartialEq, Eq)]
pub struct Parser<S: ParseState, I: TokenStream<S::Token>> {
    toks: I,
    state: S,
    last_span: Span,
    ctx: S::Context,
}

impl<S: ParseState, I: TokenStream<S::Token>> Parser<S, I> {
    /// Indicate that no further parsing will take place using this parser,
    ///   retrieve any final aggregate state (the context),
    ///   and [`drop`] it.
    ///
    /// Invoking the method is equivalent to stating that the stream has
    ///   ended,
    ///     since the parser will have no later opportunity to continue
    ///     parsing.
    /// Consequently,
    ///   the caller should expect [`ParseError::UnexpectedEof`] if the
    ///   parser is not in an accepting state.
    ///
    /// To re-use the context returned by this method,
    ///   see [`ParseState::parse_with_context`].
    /// Note that whether the context is permitted to be reused,
    ///   or is useful independently to the caller,
    ///   is a decision made by the [`ParseState`].
    pub fn finalize(
        self,
    ) -> Result<S::Context, (Self, ParseError<S::Token, S::Error>)> {
        match self.assert_accepting() {
            Ok(()) => Ok(self.ctx),
            Err(err) => Err((self, err)),
        }
    }

    /// Return [`Ok`] if the parser is in an accepting state,
    ///   otherwise [`Err`] with [`ParseError::UnexpectedEof`].
    ///
    /// See [`finalize`](Self::finalize) for the public-facing method.
    fn assert_accepting(&self) -> Result<(), ParseError<S::Token, S::Error>> {
        if self.state.is_accepting() {
            Ok(())
        } else {
            let endpoints = self.last_span.endpoints();
            Err(ParseError::UnexpectedEof(
                endpoints.1.unwrap_or(endpoints.0),
                self.state.to_string(),
            ))
        }
    }

    /// Feed an input token to the parser.
    ///
    /// This _pushes_ data into the parser,
    ///   rather than the typical pull system used by [`Parser`]'s
    ///   [`Iterator`] implementation.
    /// The pull system also uses this method to provided data to the
    ///   parser.
    ///
    /// This method is intentionally private,
    ///   since push parsers are currently supported only internally.
    /// The only thing preventing this being public is formalization and a
    ///   commitment to maintain it.
    fn feed_tok(&mut self, tok: S::Token) -> ParsedResult<S> {
        // Store the most recently encountered Span for error
        //   reporting in case we encounter an EOF.
        self.last_span = tok.span();

        let result;
        TransitionResult(Transition(self.state), result) =
            take(&mut self.state).parse_token(tok, &mut self.ctx);

        use ParseStatus::*;
        match result {
            // Nothing handled this dead state,
            //   and we cannot discard a lookahead token,
            //   so we have no choice but to produce an error.
            Ok(Dead(invalid)) => Err(ParseError::UnexpectedToken(
                invalid,
                self.state.to_string(),
            )),

            Ok(parsed @ (Incomplete | Object(..))) => Ok(parsed.into()),
            Err(e) => Err(e.into()),
        }
    }

    /// Lower the IR produced by this [`Parser`] into another IR by piping
    ///   the output to a new parser defined by the [`ParseState`] `LS`.
    ///
    /// This parser consumes tokens `S::Token` and produces the IR
    ///   `S::Output`.
    /// If there is some other [`ParseState`] `LS` such that
    ///   `LS::Token == S::Output`
    ///     (that is—the output of this parser is the input to another),
    ///     then this method will wire the two together into a new iterator
    ///       that produces `LS::Output`.
    ///
    /// Visually, we have,
    ///   within the provided closure `f`,
    ///   a [`LowerIter`] that acts as this pipeline:
    ///
    /// ```text
    /// (S::Token) -> (S::Output == LS::Token) -> (LS::Output)
    /// ```
    ///
    /// The new iterator is a [`LowerIter`],
    ///   and scoped to the provided closure `f`.
    /// The outer [`Result`] of `Self`'s [`ParsedResult`] is stripped by
    ///   a [`TripIter`] before being provided as input to a new push
    ///   [`Parser`] utilizing `LS`.
    /// A push parser,
    ///   rather than pulling tokens from a [`TokenStream`],
    ///   has tokens pushed into it;
    ///     this parser is created automatically for you.
    ///
    /// _TODO_: There's no way to access the inner parser for error recovery
    ///   after tripping the [`TripIter`].
    /// Consequently,
    ///   this API (likely the return type) will change.
    #[inline]
    pub fn lower_while_ok<LS, U, E>(
        &mut self,
        f: impl FnOnce(&mut LowerIter<S, Parser<S, I>, LS>) -> Result<U, E>,
    ) -> Result<U, E>
    where
        LS: ParseState<Token = S::Object>,
        <S as ParseState>::Object: Token,
        <LS as ParseState>::Context: Default,
        ParseError<S::Token, S::Error>: Into<E>,
    {
        self.while_ok(|toks| {
            // TODO: This parser is not accessible after error recovery!
            let lower = LS::parse(iter::empty());
            f(&mut LowerIter { lower, toks })
        })
        .map_err(Into::into)
    }
}

/// An IR lowering operation that pipes the output of one [`Parser`] to the
///   input of another.
///
/// This is produced by [`Parser::lower_while_ok`].
pub struct LowerIter<'a, 'b, S, I, LS>
where
    S: ParseState,
    I: Iterator<Item = ParsedResult<S>>,
    LS: ParseState<Token = S::Object>,
    <S as ParseState>::Object: Token,
{
    /// A push [`Parser`].
    lower: Parser<LS, Empty<LS::Token>>,

    /// Source tokens from higher-level [`Parser`],
    ///   with the outer [`Result`] having been stripped by a [`TripIter`].
    toks: &'a mut TripIter<
        'b,
        I,
        Parsed<S::Object>,
        ParseError<S::Token, S::Error>,
    >,
}

impl<'a, 'b, S, I, LS> Iterator for LowerIter<'a, 'b, S, I, LS>
where
    S: ParseState,
    I: Iterator<Item = ParsedResult<S>>,
    LS: ParseState<Token = S::Object>,
    <S as ParseState>::Object: Token,
{
    type Item = ParsedResult<LS>;

    /// Pull a token through the higher-level [`Parser`],
    ///   push it to the lowering parser,
    ///   and yield the resulting [`ParseResult`].
    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        match self.toks.next() {
            None => None,
            Some(Parsed::Incomplete) => Some(Ok(Parsed::Incomplete)),
            Some(Parsed::Object(obj)) => Some(self.lower.feed_tok(obj)),
        }
    }
}

impl<S: ParseState, I: TokenStream<S::Token>> Iterator for Parser<S, I> {
    type Item = ParsedResult<S>;

    /// Parse a single [`Token`] according to the current
    ///   [`ParseState`],
    ///     if available.
    ///
    /// If the underlying [`TokenStream`] yields [`None`],
    ///   then the [`ParseState`] must be in an accepting state;
    ///     otherwise, [`ParseError::UnexpectedEof`] will occur.
    ///
    /// This is intended to be invoked by [`Iterator::next`].
    /// Accepting a token rather than the [`TokenStream`] allows the caller
    ///   to inspect the token first
    ///     (e.g. to store a copy of the [`Span`][crate::span::Span]).
    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        let otok = self.toks.next();

        match otok {
            None => match self.assert_accepting() {
                Ok(()) => None,
                Err(e) => Some(Err(e)),
            },

            Some(tok) => Some(self.feed_tok(tok)),
        }
    }
}

/// Common parsing errors produced by [`Parser`].
///
/// These errors are common enough that they are handled in a common way,
///   such that individual parsers needn't check for these situations
///   themselves.
///
/// Having a common type also allows combinators to handle error types in a
///   consistent way when composing parsers.
///
/// Parsers may return their own unique errors via the
///   [`StateError`][ParseError::StateError] variant.
#[derive(Debug, PartialEq)]
pub enum ParseError<T: Token, E: Diagnostic + PartialEq> {
    /// Token stream ended unexpectedly.
    ///
    /// This error means that the parser was expecting more input before
    ///   reaching an accepting state.
    /// This could represent a truncated file,
    ///   a malformed stream,
    ///   or maybe just a user that's not done typing yet
    ///     (e.g. in the case of an LSP implementation).
    ///
    /// If no span is available,
    ///   then parsing has not even had the chance to begin.
    /// If this parser follows another,
    ///   then the combinator ought to substitute a missing span with
    ///   whatever span preceded this invocation.
    ///
    /// The string is intended to describe what was expected to have been
    ///   available based on the current [`ParseState`].
    /// It is a heap-allocated string so that a copy of [`ParseState`]
    ///   needn't be stored.
    UnexpectedEof(Span, String),

    /// The parser reached an unhandled dead state.
    ///
    /// Once a parser returns [`ParseStatus::Dead`],
    ///   a parent context must use that provided token as a lookahead.
    /// If that does not occur,
    ///   [`Parser`] produces this error.
    ///
    /// The string is intended to describe what was expected to have been
    ///   available based on the current [`ParseState`].
    /// It is a heap-allocated string so that a copy of [`ParseState`]
    ///   needn't be stored.
    UnexpectedToken(T, String),

    /// A parser-specific error associated with an inner
    ///   [`ParseState`].
    StateError(E),
}

impl<T: Token, EA: Diagnostic + PartialEq> ParseError<T, EA> {
    pub fn inner_into<EB: Diagnostic + PartialEq + Eq>(
        self,
    ) -> ParseError<T, EB>
    where
        EA: Into<EB>,
    {
        use ParseError::*;
        match self {
            UnexpectedEof(span, desc) => UnexpectedEof(span, desc),
            UnexpectedToken(x, desc) => UnexpectedToken(x, desc),
            StateError(e) => StateError(e.into()),
        }
    }
}

impl<T: Token, E: Diagnostic + PartialEq> From<E> for ParseError<T, E> {
    fn from(e: E) -> Self {
        Self::StateError(e)
    }
}

impl<T: Token, E: Diagnostic + PartialEq> Display for ParseError<T, E> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::UnexpectedEof(_, desc) => {
                write!(f, "unexpected end of input while {desc}")
            }
            Self::UnexpectedToken(_, desc) => {
                write!(f, "unexpected input while {desc}")
            }
            Self::StateError(e) => Display::fmt(e, f),
        }
    }
}

impl<T: Token, E: Diagnostic + PartialEq + 'static> Error for ParseError<T, E> {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::StateError(e) => Some(e),
            _ => None,
        }
    }
}

impl<T: Token, E: Diagnostic + PartialEq + 'static> Diagnostic
    for ParseError<T, E>
{
    fn describe(&self) -> Vec<AnnotatedSpan> {
        use ParseError::*;

        match self {
            UnexpectedEof(span, desc) => span.error(desc).into(),

            UnexpectedToken(tok, desc) => tok.span().error(desc).into(),

            // TODO: Is there any additional useful context we can augment
            //   this with?
            StateError(e) => e.describe(),
        }
    }
}

impl<S, I> From<I> for Parser<S, I>
where
    S: ParseState,
    I: TokenStream<S::Token>,
    <S as ParseState>::Context: Default,
{
    /// Create a new parser with a default context.
    ///
    /// This can only be used if the associated [`ParseState::Context`] does
    ///   not implement [`Default`];
    ///     otherwise,
    ///       consider instantiating from a `(TokenStream, Context)` pair.
    /// See also [`ParseState::parse`] and
    ///   [`ParseState::parse_with_context`].
    fn from(toks: I) -> Self {
        Self {
            toks,
            state: Default::default(),
            last_span: UNKNOWN_SPAN,
            ctx: Default::default(),
        }
    }
}

impl<S, I, C> From<(I, C)> for Parser<S, I>
where
    S: ParseState<Context = C>,
    I: TokenStream<S::Token>,
{
    /// Create a new parser with a provided context.
    ///
    /// For more information,
    ///   see [`ParseState::parse_with_context`].
    ///
    /// See also [`ParseState::parse`].
    fn from((toks, ctx): (I, C)) -> Self {
        Self {
            toks,
            state: Default::default(),
            last_span: UNKNOWN_SPAN,
            ctx,
        }
    }
}

/// Result of a parsing operation.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseStatus<S: ParseState> {
    /// Additional tokens are needed to complete parsing of the next object.
    Incomplete,

    /// Parsing of an object is complete.
    ///
    /// This does not indicate that the parser is complete,
    ///   as more objects may be able to be emitted.
    Object(S::Object),

    /// Parser encountered a dead state relative to the given token.
    ///
    /// A dead state is an empty accepting state that has no state
    ///   transition for the given token.
    /// A state is empty if a [`ParseStatus::Object`] will not be lost if
    ///   parsing ends at this point
    ///     (that is---there is no partially-built object).
    /// This could simply mean that the parser has completed its job and
    ///   that control must be returned to a parent context.
    ///
    /// If a parser is _not_ in an accepting state,
    ///   then an error ought to occur rather than a dead state;
    ///     the difference between the two is that the token associated with
    ///       a dead state can be used as a lookahead token in order to
    ///       produce a state transition at a higher level,
    ///     whereas an error indicates that parsing has failed.
    /// Intuitively,
    ///   this means that a [`ParseStatus::Object`] had just been emitted
    ///   and that the token following it isn't something that can be
    ///   parsed.
    ///
    /// If there is no parent context to handle the token,
    ///   [`Parser`] must yield an error.
    Dead(S::Token),
}

impl<S: ParseState<Object = T>, T: Object> From<T> for ParseStatus<S> {
    fn from(obj: T) -> Self {
        Self::Object(obj)
    }
}

/// Result of a parsing operation.
///
/// Whereas [`ParseStatus`] is used by [`ParseState`] to influence parser
///   operation,
///     this type is public-facing and used by [`Parser`].
#[derive(Debug, PartialEq, Eq)]
pub enum Parsed<O> {
    /// Additional tokens are needed to complete parsing of the next object.
    Incomplete,

    /// Parsing of an object is complete.
    ///
    /// This does not indicate that the parser is complete,
    ///   as more objects may be able to be emitted.
    Object(O),
}

impl<S: ParseState> From<ParseStatus<S>> for Parsed<S::Object> {
    fn from(status: ParseStatus<S>) -> Self {
        match status {
            ParseStatus::Incomplete => Parsed::Incomplete,
            ParseStatus::Object(x) => Parsed::Object(x),
            ParseStatus::Dead(_) => {
                unreachable!("Dead status must be filtered by Parser")
            }
        }
    }
}

#[cfg(test)]
pub mod test {
    use std::{assert_matches::assert_matches, iter::once};

    use super::*;
    use crate::{span::DUMMY_SPAN as DS, sym::GlobalSymbolIntern};

    #[derive(Debug, PartialEq, Eq, Clone)]
    enum TestToken {
        Close(Span),
        MarkDone(Span),
        Text(Span),
        SetCtxVal(u8),
    }

    impl Display for TestToken {
        fn fmt(&self, _f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
            unimplemented!("fmt::Display")
        }
    }

    impl Token for TestToken {
        fn span(&self) -> Span {
            use TestToken::*;
            match self {
                Close(span) | MarkDone(span) | Text(span) => *span,
                _ => UNKNOWN_SPAN,
            }
        }
    }

    impl Object for TestToken {}

    #[derive(Debug, PartialEq, Eq)]
    enum EchoState {
        Empty,
        Done,
    }

    impl Default for EchoState {
        fn default() -> Self {
            Self::Empty
        }
    }

    #[derive(Debug, PartialEq, Default)]
    struct StubContext {
        val: u8,
    }

    impl ParseState for EchoState {
        type Token = TestToken;
        type Object = TestToken;
        type Error = EchoStateError;

        type Context = StubContext;

        fn parse_token(
            self,
            tok: TestToken,
            ctx: &mut StubContext,
        ) -> TransitionResult<Self> {
            match tok {
                TestToken::MarkDone(..) => Transition(Self::Done).ok(tok),
                TestToken::Close(..) => {
                    Transition(self).err(EchoStateError::InnerError(tok))
                }
                TestToken::Text(..) => Transition(self).dead(tok),
                TestToken::SetCtxVal(val) => {
                    ctx.val = val;
                    Transition(Self::Done).incomplete()
                }
            }
        }

        fn is_accepting(&self) -> bool {
            *self == Self::Done
        }
    }

    impl Display for EchoState {
        fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
            write!(f, "<EchoState as Display>::fmt")
        }
    }

    #[derive(Debug, PartialEq, Eq)]
    enum EchoStateError {
        InnerError(TestToken),
    }

    impl Display for EchoStateError {
        fn fmt(&self, _: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
            unimplemented!()
        }
    }

    impl Error for EchoStateError {
        fn source(&self) -> Option<&(dyn Error + 'static)> {
            None
        }
    }

    impl Diagnostic for EchoStateError {
        fn describe(&self) -> Vec<AnnotatedSpan> {
            unimplemented!()
        }
    }

    type Sut<I> = Parser<EchoState, I>;

    #[test]
    fn successful_parse_in_accepting_state_with_spans() {
        // EchoState is placed into a Done state given Comment.
        let tok = TestToken::MarkDone(DS);
        let mut toks = once(tok.clone());

        let mut sut = Sut::from(&mut toks);

        // The first token should be processed normally.
        // EchoState proxies the token back.
        assert_eq!(Some(Ok(Parsed::Object(tok))), sut.next());

        // This is now the end of the token stream,
        //   which should be okay provided that the first token put us into
        //   a proper accepting state.
        assert_eq!(None, sut.next());

        // Further, finalizing should work in this state.
        assert!(sut.finalize().is_ok());
    }

    #[test]
    fn fails_on_end_of_stream_when_not_in_accepting_state() {
        let span = Span::new(10, 20, "ctx".intern());
        let mut toks = [TestToken::Close(span)].into_iter();

        let mut sut = Sut::from(&mut toks);

        // The first token is fine,
        //   and allows us to acquire our most recent span.
        sut.next();

        // Given that we have no tokens,
        //   and that EchoState::default does not start in an accepting
        //     state,
        //   we must fail when we encounter the end of the stream.
        assert_eq!(
            Some(Err(ParseError::UnexpectedEof(
                span.endpoints().1.unwrap(),
                // All the states have the same string
                //   (at time of writing).
                EchoState::default().to_string(),
            ))),
            sut.next()
        );
    }

    #[test]
    fn returns_state_specific_error() {
        // TestToken::Close causes EchoState to produce an error.
        let errtok = TestToken::Close(DS);
        let mut toks = [errtok.clone()].into_iter();

        let mut sut = Sut::from(&mut toks);

        assert_eq!(
            Some(Err(ParseError::StateError(EchoStateError::InnerError(
                errtok
            )))),
            sut.next()
        );

        // The token must have been consumed.
        // It is up to a recovery process to either bail out or provide
        //   recovery tokens;
        //     continuing without recovery is unlikely to make sense.
        assert_eq!(0, toks.len());
    }

    #[test]
    fn fails_when_parser_is_finalized_in_non_accepting_state() {
        let span = Span::new(10, 10, "ctx".intern());

        // Set up so that we have a single token that we can use for
        //   recovery as part of the same iterator.
        let recovery = TestToken::MarkDone(DS);
        let mut toks = [
            // Used purely to populate a Span.
            TestToken::Close(span),
            // Recovery token here:
            recovery.clone(),
        ]
        .into_iter();

        let mut sut = Sut::from(&mut toks);

        // Populate our most recently seen token's span.
        sut.next();

        // Attempting to finalize now in a non-accepting state should fail
        //   in the same way that encountering an end-of-stream does,
        //     since we're effectively saying "we're done with the stream"
        //     and the parser will have no further opportunity to reach an
        //     accepting state.
        let result = sut.finalize();
        assert_matches!(
            result,
            Err((_, ParseError::UnexpectedEof(s, _))) if s == span.endpoints().1.unwrap()
        );

        // The sut should have been re-returned,
        //   allowing for attempted error recovery if the caller can manage
        //   to produce a sequence of tokens that will be considered valid.
        // `toks` above is set up already for this,
        //   which allows us to assert that we received back the same `sut`.
        let mut sut = result.unwrap_err().0;
        assert_eq!(Some(Ok(Parsed::Object(recovery))), sut.next());

        // And so we should now be in an accepting state,
        //   able to finalize.
        assert!(sut.finalize().is_ok());
    }

    #[test]
    fn unhandled_dead_state_results_in_error() {
        // A Text will cause our parser to return Dead.
        let tok = TestToken::Text(DS);
        let mut toks = once(tok.clone());

        let mut sut = Sut::from(&mut toks);

        // Our parser returns a Dead status,
        //   which is unhandled by any parent context
        //     (since we're not composing parsers),
        //     which causes an error due to an unhandled Dead state.
        assert_eq!(
            sut.next(),
            Some(Err(ParseError::UnexpectedToken(
                tok,
                EchoState::default().to_string()
            ))),
        );
    }

    // A context can be both retrieved from a finished parser and provided
    //   to a new one.
    #[test]
    fn provide_and_retrieve_context() {
        // First, verify that it's initialized to a default context.
        let mut toks = vec![TestToken::MarkDone(DS)].into_iter();
        let mut sut = Sut::from(&mut toks);
        sut.next().unwrap().unwrap();
        let ctx = sut.finalize().unwrap();
        assert_eq!(ctx, Default::default());

        // Next, verify that the context that is manipulated is the context
        //   that is returned to us.
        let val = 5;
        let mut toks = vec![TestToken::SetCtxVal(5)].into_iter();
        let mut sut = Sut::from(&mut toks);
        sut.next().unwrap().unwrap();
        let ctx = sut.finalize().unwrap();
        assert_eq!(ctx, StubContext { val });

        // Finally, verify that the context provided is the context that is
        //   used.
        let val = 10;
        let given_ctx = StubContext { val };
        let mut toks = vec![TestToken::MarkDone(DS)].into_iter();
        let mut sut = EchoState::parse_with_context(&mut toks, given_ctx);
        sut.next().unwrap().unwrap();
        let ctx = sut.finalize().unwrap();
        assert_eq!(ctx, StubContext { val });
    }
}
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								// Basic streaming parsing framework
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								//
-												Copyright year update 2022

RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no
"Group"), but I'm not sure if the legal name has been changed yet or not, so
I'll wait on that.

											
										
										
											2022-05-03 14:14:29 -04:00
+								//  Copyright (C) 2014-2022 Ryan Specialty Group, LLC.
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								//
 								//  This file is part of TAME.
 								//
 								//  This program is free software: you can redistribute it and/or modify
 								//  it under the terms of the GNU General Public License as published by
 								//  the Free Software Foundation, either version 3 of the License, or
 								//  (at your option) any later version.
 								//
 								//  This program is distributed in the hope that it will be useful,
 								//  but WITHOUT ANY WARRANTY; without even the implied warranty of
 								//  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 								//  GNU General Public License for more details.
 								//
 								//  You should have received a copy of the GNU General Public License
 								//  along with this program.  If not, see <http://www.gnu.org/licenses/>.
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								//! Basic streaming parser framework for lowering operations.
 								//!
 								//! _TODO: Some proper docs and examples!_
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
-												tamer: diagnose: Introduction of diagnostic system

This is a working concept that will continue to evolve.  I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.

This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does.  The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them.  I need to
balance this work with everything else I have going on.

This is a large commit, but it converts the existing Error Display impls
into Diagnostic.  This separation is a bit verbose, so I'll see how this
ends up evolving.

Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.

Output is integrated into tameld only in this commit; I'll add tamec
next.  Examples of what this outputs are available in the test cases in this
commit.

DEV-10935

											
										
										
											2022-04-13 14:41:54 -04:00
+								use crate::diagnose::{Annotate, AnnotatedSpan, Diagnostic};
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											
										
										
											2022-03-23 14:25:04 -04:00
+								use crate::iter::{TripIter, TrippableIterator};
-												tamer: parse::Parser (last_span): Replace Option with UNKNOWN_SPAN

There's no use in complicating the error handling here when we'd just
default to `UNKNOWN_SPAN` anyway when trying to render it.  `UNKNOWN_SPAN`
didn't exist at the time of writing.

DEV-10935

											
										
										
											2022-04-12 09:59:00 -04:00
+								use crate::span::{Span, UNKNOWN_SPAN};
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
+								use std::fmt::Debug;
-												tamer: parse::TransitionResult: Alias=>newtype

This converts the tuple type alias into a newtype, so that we may provide
our own implementations.

This differs from a previous approach that I took, which involved making
this type `Result<(S, T), (S, E)>` so that the return values composed well
with other functions.  But the reality is that this is used only by other
`ParseState`s and `Parser`, so it's unnecessary.

However, this is also an attempt to utilize the new Try and FromResidual
traits; note how the Try associated types match precisely what I was trying
to do before, though they're used as intermediate types.  I'll see how this
evolves.

DEV-10863

											
										
										
											2022-03-25 09:56:22 -04:00
+								use std::hint::unreachable_unchecked;
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											
										
										
											2022-03-23 14:25:04 -04:00
+								use std::iter::{self, Empty};
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision lead to code that was much more
difficult than necessary to understand, and ended up having a net negative
benefit by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also opens up the possibility to do something that I wasn't able to
before, which was continue to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting the way, the previous state coupling
issues with the parent parser go away.

DEV-10863

											
										
										
											2022-03-17 15:50:35 -04:00
+								use std::mem::take;
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								use std::ops::{ControlFlow, Deref, DerefMut, FromResidual, Try};
-												tamer: parse::TransitionResult: Alias=>newtype

This converts the tuple type alias into a newtype, so that we may provide
our own implementations.

This differs from a previous approach that I took, which involved making
this type `Result<(S, T), (S, E)>` so that the return values composed well
with other functions.  But the reality is that this is used only by other
`ParseState`s and `Parser`, so it's unnecessary.

However, this is also an attempt to utilize the new Try and FromResidual
traits; note how the Try associated types match precisely what I was trying
to do before, though they're used as intermediate types.  I'll see how this
evolves.

DEV-10863

											
										
										
											2022-03-25 09:56:22 -04:00
+								use std::{convert::Infallible, error::Error, fmt::Display};
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more in formation.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											
										
										
											2021-12-10 15:39:59 -05:00
+								/// Result of applying a [`Token`] to a [`ParseState`],
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								///   with any error having been wrapped in a [`ParseError`].
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more in formation.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											
										
										
											2021-12-10 15:39:59 -05:00
+								pub type ParsedResult<S> = ParseResult<S, Parsed<<S as ParseState>::Object>>;
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
-												tamer: xir::tree::parse: Remove TokenStreamParser trait

This just leaves Parser, which is what I started with, but I wasn't sure how
far I was going to take this.  I went against my usual judgment in creating
a trait that I may not need, in an attempt to try to reason about the API
that I wanted, because it wasn't yet clear at the time whether the Parser
ought to be generic.

Since then (as detailed in the last commit), this has become more of a
coordinator/mediator, and the real parser is actually TokenStreamState,
which will be renamed shortly.

DEV-11268

											
										
										
											2021-12-10 14:58:44 -05:00
+								/// Result of some non-parsing operation on a [`Parser`],
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								///   with any error having been wrapped in a [`ParseError`].
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863

											
										
										
											2022-03-18 15:26:05 -04:00
+								pub type ParseResult<S, T> =
 								    Result<T, ParseError<<S as ParseState>::Token, <S as ParseState>::Error>>;
 								/// A single datum from a streaming IR with an associated [`Span`].
 								///
 								/// A token may be a lexeme with associated data,
 								///   or a more structured object having been lowered from other IRs.
-												tamer: parse::Token: Remove Eq trait bound

`PartialEq` remains, and is all that is needed.  See previous commit
regarding the removal of this same bound from `Context`.

This can be re-added if it ends up actually being necessary.  But Tokens are
ephemeral and used only in lowering pipelines, using pattern matching.

DEV-11864

											
										
										
											2022-05-16 10:05:14 -04:00
+								pub trait Token: Display + Debug + PartialEq {
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863

											
										
										
											2022-03-18 15:26:05 -04:00
+								    /// Retrieve the [`Span`] representing the source location of the token.
 								    fn span(&self) -> Span;
 								}
 								impl<T: Token> From<T> for Span {
 								    fn from(tok: T) -> Self {
 								        tok.span()
 								    }
 								}
-												tamer: parse: More flexible Transition API

This does some cleanup and adds `parse::Object` for use in disambiguating
`From` for `ParseStatus`, allowing the `Transition` API to be much more
flexible in the data it accepts and automatically converts.  This allows us
to concisely provide raw output data to be wrapped, or provide `ParseStatus`
directly when more convenient.

There aren't yet examples in the docs; I'll do so once I make sure this API
is actually utilized as intended.

DEV-10863

											
										
										
											2022-03-25 16:45:32 -04:00
+								/// An IR object produced by a lowering operation on one or more [`Token`]s.
 								///
 								/// Note that an [`Object`] may also be a [`Token`] if it will be in turn
 								///   fed to another [`Parser`] for lowering.
 								///
 								/// This trait exists to disambiguate an otherwise unbounded type for
 								///   [`From`] conversions,
 								///     used in the [`Transition`] API to provide greater flexibility.
-												tamer: parse::Token: Remove Eq trait bound

`PartialEq` remains, and is all that is needed.  See previous commit
regarding the removal of this same bound from `Context`.

This can be re-added if it ends up actually being necessary.  But Tokens are
ephemeral and used only in lowering pipelines, using pattern matching.

DEV-11864

											
										
										
											2022-05-16 10:05:14 -04:00
+								pub trait Object: Debug + PartialEq {}
-												tamer: parse: More flexible Transition API

This does some cleanup and adds `parse::Object` for use in disambiguating
`From` for `ParseStatus`, allowing the `Transition` API to be much more
flexible in the data it accepts and automatically converts.  This allows us
to concisely provide raw output data to be wrapped, or provide `ParseStatus`
directly when more convenient.

There aren't yet examples in the docs; I'll do so once I make sure this API
is actually utilized as intended.

DEV-10863

											
										
										
											2022-03-25 16:45:32 -04:00
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863

											
										
										
											2022-03-18 15:26:05 -04:00
+								/// An infallible [`Token`] stream.
 								///
 								/// If the token stream originates from an operation that could potentially
 								///   fail and ought to be propagated,
 								///     use [`TokenResultStream`].
 								///
 								/// The name "stream" in place of "iterator" is intended to convey that this
 								///   type is expected to be processed in real-time as a stream,
 								///     not read into memory.
 								pub trait TokenStream<T: Token> = Iterator<Item = T>;
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								/// A [`Token`] stream that may encounter errors during parsing.
 								///
 								/// If the stream cannot fail,
 								///   consider using [`TokenStream`].
 								pub trait TokenResultStream<T: Token, E: Error> = Iterator<Item = Result<T, E>>;
-												tamer: obj::xmlo::reader: Working xmlo reader

This makes the necessary tweaks to have the entire linker work end-to-end
and produce a compatible xmle file (that is, identical except for
nondeterministic topological ordering).  That's good, and finally that can
get off of my plate.

What's disappointing, and what I'll have more information on in future
commits, is how slow it is.

The linking of our largest package goes from ~1s -> ~15s with this
change.  The reason is because of tens of millions of `memcpy` calls.  Why?

The ParseState abstraction is pure and passes an owned `self` around, and
Parser replaces its own reference using this:

        let result;
        TransitionResult(Transition(self.state), result) =
            take(&mut self.state).parse_token(tok);

Naively, this would store a copy of the old state in `result`, allocate a
new ParseState for `self.state`, pass the original or a copy to
`parse_token`, and then overwrite `self.state` with the new ParseState that
is returned once it is all over.

Of course, that'd be devastating.  What we want to happen is for Rust to
realize that it can just pass a reference to `self.state` and perform no
copying at all.

For certain parsers, this is exactly what happens.  Great!

But for XIRF, it we have this:

  /// Stack of element [`QName`] and [`Span`] pairs,
  ///   representing the current level of nesting.
  ///
  /// This storage is statically allocated,
  ///   allowing XIRF's parser to avoid memory allocation entirely.
  type ElementStack<const MAX_DEPTH: usize> = ArrayVec<(QName, Span), MAX_DEPTH>;

  /// XIRF document parser state.
  ///
  /// This parser is a pushdown automaton that parses a single XML document.
  #[derive(Debug, Default, PartialEq, Eq)]
  pub enum State<const MAX_DEPTH: usize, SA = AttrParseState>
  where
      SA: FlatAttrParseState,
  {
      /// Document parsing has not yet begun.
      #[default]
      PreRoot,

      /// Parsing nodes.
      NodeExpected(ElementStack<MAX_DEPTH>),

      /// Delegating to attribute parser.
      AttrExpected(ElementStack<MAX_DEPTH>, SA),

      /// End of document has been reached.
      Done,
  }

ParseState contains an ArrayVec, and its implementation details are causes
LLVM _not_ to elide the `memcpy`.  And there's a lot of them.

Considering that ParseState is supposed to use only statically allocated
memory and be zero-copy, this is rather ironic.

Now, this _could_ be potentially fixed by not using ArrayVec; removing
it (and the corresponding checks for balanced tags) gets us down to
2s (which still needs improvement), but we can't have a core abstraction in
our system resting on a house of cards.  What if the optimization changes
between releases and suddenly linking / building becomes shit slow?  That's
too much of a risk.

Further, having to limit what abstractions we use just to appease the
compiler to optimize away moves is very restrictive.

The better option seems like to go back to what I used to do: pass around
`&mut self`.  I had moved to an owned `self` to force consideration of _all_
state transitions, but I can try to do the same thing in a different type of
way using mutable references, and then we avoid this problem.  The
abstraction isn't pure (in the functional sense) anymore, but it's safe and
isn't relying on delicate inlining and optimizer implementation details to
have a performant system.

More information to come.

DEV-10863

											
										
										
											2022-04-01 16:14:35 -04:00
+								/// A [`ParseState`] capable of being automatically stitched together with
 								///   a parent [`ParseState`] `SP` to create a composite parser.
-												tamer: obj::xmlo::reader: preproc:sym-deps processing

This parses the symbol dependency list (adjacency list).

I'm noticing some glaring issues in error handling, particularly that the
token being parsed while an error occurs is not returned and so recovery is
impossible.  I'll have to address that later on, after I get this parser
completed.

Another previous question that I had a hard time answering in prior months
was how I was going to compose boilerplate parsers, e.g. handling the
parsing of single-attribute elements and such.  A pattern is clearly taking
shape, and with the composition of parsers more formalized, that'll be able
to be abstracted away.  But again, that's going to wait until after this
parser is actually functioning.  Too many delays so far.

DEV-10863

											
										
										
											2022-03-30 15:03:50 -04:00
+								///
 								/// Conceptually,
 								///   this can be visualized as combining the state machines of multiple
 								///   parsers into one larger state machine.
 								///
 								/// The term _state stitching_ refers to a particular pattern able to be
 								///   performed automatically by this parsing framework;
 								///     it is not necessary for parser composition,
 								///       provided that you perform the necessary wiring yourself in absence
 								///       of state stitching.
 								pub trait StitchableParseState<SP: ParseState> = ParseState
 								where
 								    SP: ParseState<Token = <Self as ParseState>::Token>,
 								    <Self as ParseState>::Object: Into<<SP as ParseState>::Object>,
 								    <Self as ParseState>::Error: Into<<SP as ParseState>::Error>;
-												tamer: parse (ParseState): Doc correction regarding determinism

The pair is now a triple and parsers are often NFAs.

											
										
										
											2022-04-05 15:55:58 -04:00
+								/// A parsing automaton.
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								///
-												tamer: xir::tree::parse: Remove TokenStreamParser trait

This just leaves Parser, which is what I started with, but I wasn't sure how
far I was going to take this.  I went against my usual judgment in creating
a trait that I may not need, in an attempt to try to reason about the API
that I wanted, because it wasn't yet clear at the time whether the Parser
ought to be generic.

Since then (as detailed in the last commit), this has become more of a
coordinator/mediator, and the real parser is actually TokenStreamState,
which will be renamed shortly.

DEV-11268

											
										
										
											2021-12-10 14:58:44 -05:00
+								/// These states are utilized by a [`Parser`].
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								///
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more in formation.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											
										
										
											2021-12-10 15:39:59 -05:00
+								/// A [`ParseState`] is also responsible for storing data about the
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								///   accepted input,
 								///     and handling appropriate type conversions into the final type.
 								/// That is---an
 								///   automaton may store metadata that is subsequently emitted once an
 								///   accepting state has been reached.
 								/// Whatever the underlying automaton,
-												tamer: parse (ParseState): Doc correction regarding determinism

The pair is now a triple and parsers are often NFAs.

											
										
										
											2022-04-05 15:55:58 -04:00
+								///   a `(state, token, context)` triple must uniquely determine the next
 								///   parser action.
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								pub trait ParseState: Default + PartialEq + Eq + Display + Debug {
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863

											
										
										
											2022-03-18 15:26:05 -04:00
+								    /// Input tokens to the parser.
 								    type Token: Token;
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    /// Objects produced by a parser utilizing these states.
-												tamer: parse: More flexible Transition API

This does some cleanup and adds `parse::Object` for use in disambiguating
`From` for `ParseStatus`, allowing the `Transition` API to be much more
flexible in the data it accepts and automatically converts.  This allows us
to concisely provide raw output data to be wrapped, or provide `ParseStatus`
directly when more convenient.

There aren't yet examples in the docs; I'll do so once I make sure this API
is actually utilized as intended.

DEV-10863

											
										
										
											2022-03-25 16:45:32 -04:00
+								    type Object: Object;
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								    /// Errors specific to this set of states.
-												tamer: parse::ParseState:Error: Relax Eq trait bound

This is unnecessarily restrictive, since we do not require anything further
than `PartialEq` for the situations where we care about equality (tests).

DEV-11864

											
										
										
											2022-05-06 15:28:47 -04:00
+								    type Error: Debug + Diagnostic + PartialEq;
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
-												tamer: parse::ParseState::Context: Add missing comment

DEV-11864

											
										
										
											2022-05-10 11:06:22 -04:00
+								    /// Object provided to parser alongside each token.
 								    ///
 								    /// This may be used in situations where Rust/LLVM are unable to
 								    ///   optimize away moves of interior data associated with the
 								    ///   otherwise-immutable [`ParseState`].
-												tamer: parse::ParseState::Context: Remove Default trait bound

This is too restrictive, especially for parsers that fold into something,
like the ASG, which may exist prior to invoking the parser.

This moves the trait bound to the functions that actually need it.  Those
obviously cannot be used if the Context does not implement `Default`, but
I'll provide alternative conveniences.

DEV-11864

											
										
										
											2022-05-05 15:55:04 -04:00
+								    type Context: Debug = EmptyContext;
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    /// Construct a parser.
 								    ///
 								    /// Whether this method is helpful or provides any clarity depends on
 								    ///   the context and the types that are able to be inferred.
-												tamer: parse::ParseState::Context: Remove Default trait bound

This is too restrictive, especially for parsers that fold into something,
like the ASG, which may exist prior to invoking the parser.

This moves the trait bound to the functions that actually need it.  Those
obviously cannot be used if the Context does not implement `Default`, but
I'll provide alternative conveniences.

DEV-11864

											
										
										
											2022-05-05 15:55:04 -04:00
+								    fn parse<I: TokenStream<Self::Token>>(toks: I) -> Parser<Self, I>
 								    where
 								        Self::Context: Default,
 								    {
-												tamer: xir::tree::parse: Remove TokenStreamParser trait

This just leaves Parser, which is what I started with, but I wasn't sure how
far I was going to take this.  I went against my usual judgment in creating
a trait that I may not need, in an attempt to try to reason about the API
that I wanted, because it wasn't yet clear at the time whether the Parser
ought to be generic.

Since then (as detailed in the last commit), this has become more of a
coordinator/mediator, and the real parser is actually TokenStreamState,
which will be renamed shortly.

DEV-11268

											
										
										
											2021-12-10 14:58:44 -05:00
+								        Parser::from(toks)
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    }
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								    /// Construct a parser with a non-default [`ParseState::Context`].
 								    ///
 								    /// This is useful in two ways:
 								    ///
 								    ///   1. To allow for parsing using a context that does not implement
 								    ///        [`Default`],
 								    ///          or whose default is not sufficient; and
 								    ///   2. To re-use a context from a previous [`Parser`].
 								    ///
 								    /// If neither of these apply to your situation,
 								    ///   consider [`ParseState::parse`] instead.
 								    ///
 								    /// To retrieve a context from a parser for re-use,
 								    ///   see [`Parser::finalize`].
 								    fn parse_with_context<I: TokenStream<Self::Token>>(
 								        toks: I,
 								        ctx: Self::Context,
 								    ) -> Parser<Self, I> {
 								        Parser::from((toks, ctx))
 								    }
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    /// Parse a single [`Token`] and optionally perform a state transition.
 								    ///
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision lead to code that was much more
difficult than necessary to understand, and ended up having a net negative
benefit by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also opens up the possibility to do something that I wasn't able to
before, which was continue to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting the way, the previous state coupling
issues with the parent parser go away.

DEV-10863

											
										
										
											2022-03-17 15:50:35 -04:00
+								    /// The current state is represented by `self`.
 								    /// The result of a parsing operation is a state transition with
 								    ///   associated [`ParseStatus`] data.
 								    ///
 								    /// Note that `self` is owned,
 								    ///   for a couple primary reasons:
 								    ///
 								    ///   1. This forces the parser to explicitly consider and document all
 								    ///        state transitions,
 								    ///          rather than potentially missing unintended behavior through
 								    ///          implicit behavior; and
 								    ///   2. It allows for more natural functional composition of state,
 								    ///        which in turn makes it easier to compose parsers
 								    ///          (which conceptually involves stitching together state
 								    ///            machines).
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								    ///
 								    /// Since a [`ParseState`] produces a new version of itself with each
 								    ///   invocation,
 								    ///     it is functionally pure.
 								    /// Generally,
 								    ///   Rust/LLVM are able to optimize moves into direct assignments.
 								    /// However,
 								    ///   there are circumstances where this is _not_ the case,
 								    ///   in which case [`Context`] can be used to provide a mutable context
 								    ///     owned by the caller (e.g. [`Parser`]) to store additional
 								    ///     information that is not subject to Rust's move semantics.
 								    /// If this is not necessary,
 								    ///   see [`NoContext`].
 								    fn parse_token(
 								        self,
 								        tok: Self::Token,
 								        ctx: &mut Self::Context,
 								    ) -> TransitionResult<Self>;
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								    /// Whether the current state represents an accepting state.
 								    ///
 								    /// An accepting state represents a valid state to stop parsing.
 								    /// If parsing stops at a state that is _not_ accepting,
 								    ///   then the [`TokenStream`] has ended unexpectedly and should produce
 								    ///   a [`ParseError::UnexpectedEof`].
 								    ///
 								    /// It makes sense for there to be exist multiple accepting states for a
 								    ///   parser.
 								    /// For example:
 								    ///   A parser that parses a list of attributes may be used to parse one
 								    ///   or more attributes,
 								    ///     or the entire list of attributes.
 								    ///   It is acceptable to attempt to parse just one of those attributes,
 								    ///     or it is acceptable to parse all the way until the end.
 								    fn is_accepting(&self) -> bool;
-												tamer: parse::ParseState::delegate: Initial state stitching concept

This is the delegation portion of what I've come to call "state
stitching"---wiring together two state machines that recognize the same
input tokens.

This handles the delegation of tokens once the parser has been entered, but
does not yet handle the actual stitching part of it: wiring the start and
accepting states of the child parser to the parent.

This is indirectly tested by the XmloReader, but it will receive its own
tests once I further finalize this concept.  I'm playing around with some
ideas.  With that said, a quick visual inspection together with the
guarantees provided by the type system should convince any familiar reader
of its correctness.

DEV-10863

											
										
										
											2022-03-29 12:46:16 -04:00
-												tamer: parser::ParseState::delegate_lookahead: New concept

This introduces a new method similar to the previous `delegate`, but with
another closure that allows for handling lookahead tokens from the child
parser.

Admittedly, this isn't exactly what I was going for---a list of arguments
isn't exactly self-documenting, especially with the brevity when the
arguments line up---but this was easy to do and so I'll run with this for
now.

This also modified `delegate` to accept a context, even though it wasn't
necessary, both for consistency with its lookup counterpart and for brevity
with the `into` argument (allowing, in our case, to just pass the name of
the variant, rather than a closure).

I'm not going to handle the actual starting and accepting state stitching
abstraction for now; I'd like to observe future boilerplate more before I
consider the best way to handle it, though I do have some ideas.

DEV-10863

											
										
										
											2022-03-29 14:18:08 -04:00
+								    /// Delegate parsing from a compatible, stitched [`ParseState`]~`SP`.
-												tamer: parse::ParseState::delegate: Initial state stitching concept

This is the delegation portion of what I've come to call "state
stitching"---wiring together two state machines that recognize the same
input tokens.

This handles the delegation of tokens once the parser has been entered, but
does not yet handle the actual stitching part of it: wiring the start and
accepting states of the child parser to the parent.

This is indirectly tested by the XmloReader, but it will receive its own
tests once I further finalize this concept.  I'm playing around with some
ideas.  With that said, a quick visual inspection together with the
guarantees provided by the type system should convince any familiar reader
of its correctness.

DEV-10863

											
										
										
											2022-03-29 12:46:16 -04:00
+								    ///
 								    /// This helps to combine two state machines that speak the same input
 								    ///   language
 								    ///   (share the same [`Self::Token`]),
 								    ///     handling the boilerplate of delegating [`Self::Token`] from a
 								    ///     parent state~`SP` to `Self`.
 								    ///
 								    /// Token delegation happens after [`Self`] has been entered from a
 								    ///   parent [`ParseState`] context~`SP`,
 								    ///     so stitching the start and accepting states must happen elsewhere
 								    ///     (for now).
 								    ///
 								    /// This assumes that no lookahead token from [`ParseStatus::Dead`] will
 								    ///   need to be handled by the parent state~`SP`.
-												tamer: parser::ParseState::delegate_lookahead: New concept

This introduces a new method similar to the previous `delegate`, but with
another closure that allows for handling lookahead tokens from the child
parser.

Admittedly, this isn't exactly what I was going for---a list of arguments
isn't exactly self-documenting, especially with the brevity when the
arguments line up---but this was easy to do and so I'll run with this for
now.

This also modified `delegate` to accept a context, even though it wasn't
necessary, both for consistency with its lookup counterpart and for brevity
with the `into` argument (allowing, in our case, to just pass the name of
the variant, rather than a closure).

I'm not going to handle the actual starting and accepting state stitching
abstraction for now; I'd like to observe future boilerplate more before I
consider the best way to handle it, though I do have some ideas.

DEV-10863

											
										
										
											2022-03-29 14:18:08 -04:00
+								    /// To handle a token of lookahead,
 								    ///   use [`Self::delegate_lookahead`] instead.
-												tamer: parse::ParseState::delegate: Initial state stitching concept

This is the delegation portion of what I've come to call "state
stitching"---wiring together two state machines that recognize the same
input tokens.

This handles the delegation of tokens once the parser has been entered, but
does not yet handle the actual stitching part of it: wiring the start and
accepting states of the child parser to the parent.

This is indirectly tested by the XmloReader, but it will receive its own
tests once I further finalize this concept.  I'm playing around with some
ideas.  With that said, a quick visual inspection together with the
guarantees provided by the type system should convince any familiar reader
of its correctness.

DEV-10863

											
										
										
											2022-03-29 12:46:16 -04:00
+								    ///
 								    /// _TODO: More documentation once this is finalized._
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								    fn delegate<SP, C>(
-												tamer: parse::ParseState::delegate: Initial state stitching concept

This is the delegation portion of what I've come to call "state
stitching"---wiring together two state machines that recognize the same
input tokens.

This handles the delegation of tokens once the parser has been entered, but
does not yet handle the actual stitching part of it: wiring the start and
accepting states of the child parser to the parent.

This is indirectly tested by the XmloReader, but it will receive its own
tests once I further finalize this concept.  I'm playing around with some
ideas.  With that said, a quick visual inspection together with the
guarantees provided by the type system should convince any familiar reader
of its correctness.

DEV-10863

											
										
										
											2022-03-29 12:46:16 -04:00
+								        self,
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        mut context: C,
-												tamer: obj::xmlo::reader: preproc:sym-deps processing

This parses the symbol dependency list (adjacency list).

I'm noticing some glaring issues in error handling, particularly that the
token being parsed while an error occurs is not returned and so recovery is
impossible.  I'll have to address that later on, after I get this parser
completed.

Another previous question that I had a hard time answering in prior months
was how I was going to compose boilerplate parsers, e.g. handling the
parsing of single-attribute elements and such.  A pattern is clearly taking
shape, and with the composition of parsers more formalized, that'll be able
to be abstracted away.  But again, that's going to wait until after this
parser is actually functioning.  Too many delays so far.

DEV-10863

											
										
										
											2022-03-30 15:03:50 -04:00
+								        tok: <Self as ParseState>::Token,
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        into: impl FnOnce(Self) -> SP,
-												tamer: parse::ParseState::delegate: Initial state stitching concept

This is the delegation portion of what I've come to call "state
stitching"---wiring together two state machines that recognize the same
input tokens.

This handles the delegation of tokens once the parser has been entered, but
does not yet handle the actual stitching part of it: wiring the start and
accepting states of the child parser to the parent.

This is indirectly tested by the XmloReader, but it will receive its own
tests once I further finalize this concept.  I'm playing around with some
ideas.  With that said, a quick visual inspection together with the
guarantees provided by the type system should convince any familiar reader
of its correctness.

DEV-10863

											
										
										
											2022-03-29 12:46:16 -04:00
+								    ) -> TransitionResult<SP>
 								    where
-												tamer: obj::xmlo::reader: preproc:sym-deps processing

This parses the symbol dependency list (adjacency list).

I'm noticing some glaring issues in error handling, particularly that the
token being parsed while an error occurs is not returned and so recovery is
impossible.  I'll have to address that later on, after I get this parser
completed.

Another previous question that I had a hard time answering in prior months
was how I was going to compose boilerplate parsers, e.g. handling the
parsing of single-attribute elements and such.  A pattern is clearly taking
shape, and with the composition of parsers more formalized, that'll be able
to be abstracted away.  But again, that's going to wait until after this
parser is actually functioning.  Too many delays so far.

DEV-10863

											
										
										
											2022-03-30 15:03:50 -04:00
+								        Self: StitchableParseState<SP>,
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        C: AsMut<<Self as ParseState>::Context>,
-												tamer: parse::ParseState::delegate: Initial state stitching concept

This is the delegation portion of what I've come to call "state
stitching"---wiring together two state machines that recognize the same
input tokens.

This handles the delegation of tokens once the parser has been entered, but
does not yet handle the actual stitching part of it: wiring the start and
accepting states of the child parser to the parent.

This is indirectly tested by the XmloReader, but it will receive its own
tests once I further finalize this concept.  I'm playing around with some
ideas.  With that said, a quick visual inspection together with the
guarantees provided by the type system should convince any familiar reader
of its correctness.

DEV-10863

											
										
										
											2022-03-29 12:46:16 -04:00
+								    {
 								        use ParseStatus::{Dead, Incomplete, Object as Obj};
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        let (Transition(newst), result) =
 								            self.parse_token(tok, context.as_mut()).into();
-												tamer: parse::ParseState::delegate: Initial state stitching concept

This is the delegation portion of what I've come to call "state
stitching"---wiring together two state machines that recognize the same
input tokens.

This handles the delegation of tokens once the parser has been entered, but
does not yet handle the actual stitching part of it: wiring the start and
accepting states of the child parser to the parent.

This is indirectly tested by the XmloReader, but it will receive its own
tests once I further finalize this concept.  I'm playing around with some
ideas.  With that said, a quick visual inspection together with the
guarantees provided by the type system should convince any familiar reader
of its correctness.

DEV-10863

											
										
										
											2022-03-29 12:46:16 -04:00
-												tamer: parser::ParseState::delegate_lookahead: New concept

This introduces a new method similar to the previous `delegate`, but with
another closure that allows for handling lookahead tokens from the child
parser.

Admittedly, this isn't exactly what I was going for---a list of arguments
isn't exactly self-documenting, especially with the brevity when the
arguments line up---but this was easy to do and so I'll run with this for
now.

This also modified `delegate` to accept a context, even though it wasn't
necessary, both for consistency with its lookup counterpart and for brevity
with the `into` argument (allowing, in our case, to just pass the name of
the variant, rather than a closure).

I'm not going to handle the actual starting and accepting state stitching
abstraction for now; I'd like to observe future boilerplate more before I
consider the best way to handle it, though I do have some ideas.

DEV-10863

											
										
										
											2022-03-29 14:18:08 -04:00
+								        // This does not use `delegate_lookahead` so that we can have
 								        //   `into: impl FnOnce` instead of `Fn`.
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        Transition(into(newst)).result(match result {
-												tamer: parse::ParseState::delegate: Initial state stitching concept

This is the delegation portion of what I've come to call "state
stitching"---wiring together two state machines that recognize the same
input tokens.

This handles the delegation of tokens once the parser has been entered, but
does not yet handle the actual stitching part of it: wiring the start and
accepting states of the child parser to the parent.

This is indirectly tested by the XmloReader, but it will receive its own
tests once I further finalize this concept.  I'm playing around with some
ideas.  With that said, a quick visual inspection together with the
guarantees provided by the type system should convince any familiar reader
of its correctness.

DEV-10863

											
										
										
											2022-03-29 12:46:16 -04:00
+								            Ok(Incomplete) => Ok(Incomplete),
 								            Ok(Obj(obj)) => Ok(Obj(obj.into())),
 								            Ok(Dead(tok)) => Ok(Dead(tok)),
 								            Err(e) => Err(e.into()),
 								        })
 								    }
-												tamer: parser::ParseState::delegate_lookahead: New concept

This introduces a new method similar to the previous `delegate`, but with
another closure that allows for handling lookahead tokens from the child
parser.

Admittedly, this isn't exactly what I was going for---a list of arguments
isn't exactly self-documenting, especially with the brevity when the
arguments line up---but this was easy to do and so I'll run with this for
now.

This also modified `delegate` to accept a context, even though it wasn't
necessary, both for consistency with its lookup counterpart and for brevity
with the `into` argument (allowing, in our case, to just pass the name of
the variant, rather than a closure).

I'm not going to handle the actual starting and accepting state stitching
abstraction for now; I'd like to observe future boilerplate more before I
consider the best way to handle it, though I do have some ideas.

DEV-10863

											
										
										
											2022-03-29 14:18:08 -04:00
 								    /// Delegate parsing from a compatible, stitched [`ParseState`]~`SP` with
 								    ///   support for a lookahead token.
 								    ///
 								    /// This does the same thing as [`Self::delegate`],
 								    ///   but allows for the handling of a lookahead token from [`Self`]
 								    ///   rather than simply proxying [`ParseStatus::Dead`].
 								    ///
 								    /// _TODO: More documentation once this is finalized._
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								    fn delegate_lookahead<SP, C>(
-												tamer: parser::ParseState::delegate_lookahead: New concept

This introduces a new method similar to the previous `delegate`, but with
another closure that allows for handling lookahead tokens from the child
parser.

Admittedly, this isn't exactly what I was going for---a list of arguments
isn't exactly self-documenting, especially with the brevity when the
arguments line up---but this was easy to do and so I'll run with this for
now.

This also modified `delegate` to accept a context, even though it wasn't
necessary, both for consistency with its lookup counterpart and for brevity
with the `into` argument (allowing, in our case, to just pass the name of
the variant, rather than a closure).

I'm not going to handle the actual starting and accepting state stitching
abstraction for now; I'd like to observe future boilerplate more before I
consider the best way to handle it, though I do have some ideas.

DEV-10863

											
										
										
											2022-03-29 14:18:08 -04:00
+								        self,
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        mut context: C,
-												tamer: obj::xmlo::reader: preproc:sym-deps processing

This parses the symbol dependency list (adjacency list).

I'm noticing some glaring issues in error handling, particularly that the
token being parsed while an error occurs is not returned and so recovery is
impossible.  I'll have to address that later on, after I get this parser
completed.

Another previous question that I had a hard time answering in prior months
was how I was going to compose boilerplate parsers, e.g. handling the
parsing of single-attribute elements and such.  A pattern is clearly taking
shape, and with the composition of parsers more formalized, that'll be able
to be abstracted away.  But again, that's going to wait until after this
parser is actually functioning.  Too many delays so far.

DEV-10863

											
										
										
											2022-03-30 15:03:50 -04:00
+								        tok: <Self as ParseState>::Token,
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        into: impl FnOnce(Self) -> SP,
 								    ) -> ControlFlow<TransitionResult<SP>, (Self, <Self as ParseState>::Token, C)>
-												tamer: parser::ParseState::delegate_lookahead: New concept

This introduces a new method similar to the previous `delegate`, but with
another closure that allows for handling lookahead tokens from the child
parser.

Admittedly, this isn't exactly what I was going for---a list of arguments
isn't exactly self-documenting, especially with the brevity when the
arguments line up---but this was easy to do and so I'll run with this for
now.

This also modified `delegate` to accept a context, even though it wasn't
necessary, both for consistency with its lookup counterpart and for brevity
with the `into` argument (allowing, in our case, to just pass the name of
the variant, rather than a closure).

I'm not going to handle the actual starting and accepting state stitching
abstraction for now; I'd like to observe future boilerplate more before I
consider the best way to handle it, though I do have some ideas.

DEV-10863

											
										
										
											2022-03-29 14:18:08 -04:00
+								    where
-												tamer: obj::xmlo::reader: preproc:sym-deps processing

This parses the symbol dependency list (adjacency list).

I'm noticing some glaring issues in error handling, particularly that the
token being parsed while an error occurs is not returned and so recovery is
impossible.  I'll have to address that later on, after I get this parser
completed.

Another previous question that I had a hard time answering in prior months
was how I was going to compose boilerplate parsers, e.g. handling the
parsing of single-attribute elements and such.  A pattern is clearly taking
shape, and with the composition of parsers more formalized, that'll be able
to be abstracted away.  But again, that's going to wait until after this
parser is actually functioning.  Too many delays so far.

DEV-10863

											
										
										
											2022-03-30 15:03:50 -04:00
+								        Self: StitchableParseState<SP>,
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        C: AsMut<<Self as ParseState>::Context>,
-												tamer: parser::ParseState::delegate_lookahead: New concept

This introduces a new method similar to the previous `delegate`, but with
another closure that allows for handling lookahead tokens from the child
parser.

Admittedly, this isn't exactly what I was going for---a list of arguments
isn't exactly self-documenting, especially with the brevity when the
arguments line up---but this was easy to do and so I'll run with this for
now.

This also modified `delegate` to accept a context, even though it wasn't
necessary, both for consistency with its lookup counterpart and for brevity
with the `into` argument (allowing, in our case, to just pass the name of
the variant, rather than a closure).

I'm not going to handle the actual starting and accepting state stitching
abstraction for now; I'd like to observe future boilerplate more before I
consider the best way to handle it, though I do have some ideas.

DEV-10863

											
										
										
											2022-03-29 14:18:08 -04:00
+								    {
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        use ControlFlow::*;
-												tamer: parser::ParseState::delegate_lookahead: New concept

This introduces a new method similar to the previous `delegate`, but with
another closure that allows for handling lookahead tokens from the child
parser.

Admittedly, this isn't exactly what I was going for---a list of arguments
isn't exactly self-documenting, especially with the brevity when the
arguments line up---but this was easy to do and so I'll run with this for
now.

This also modified `delegate` to accept a context, even though it wasn't
necessary, both for consistency with its lookup counterpart and for brevity
with the `into` argument (allowing, in our case, to just pass the name of
the variant, rather than a closure).

I'm not going to handle the actual starting and accepting state stitching
abstraction for now; I'd like to observe future boilerplate more before I
consider the best way to handle it, though I do have some ideas.

DEV-10863

											
										
										
											2022-03-29 14:18:08 -04:00
+								        use ParseStatus::{Dead, Incomplete, Object as Obj};
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        // NB: Rust/LLVM are generally able to elide these moves into direct
 								        //   assignments,
 								        //     but sometimes this does not work
 								        //       (e.g. XIRF's use of `ArrayVec`).
 								        // If your [`ParseState`] has a lot of `memcpy`s or other
 								        //   performance issues,
 								        //     move heavy objects into `context`.
 								        let (Transition(newst), result) =
 								            self.parse_token(tok, context.as_mut()).into();
-												tamer: parser::ParseState::delegate_lookahead: New concept

This introduces a new method similar to the previous `delegate`, but with
another closure that allows for handling lookahead tokens from the child
parser.

Admittedly, this isn't exactly what I was going for---a list of arguments
isn't exactly self-documenting, especially with the brevity when the
arguments line up---but this was easy to do and so I'll run with this for
now.

This also modified `delegate` to accept a context, even though it wasn't
necessary, both for consistency with its lookup counterpart and for brevity
with the `into` argument (allowing, in our case, to just pass the name of
the variant, rather than a closure).

I'm not going to handle the actual starting and accepting state stitching
abstraction for now; I'd like to observe future boilerplate more before I
consider the best way to handle it, though I do have some ideas.

DEV-10863

											
										
										
											2022-03-29 14:18:08 -04:00
 								        match result {
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								            Ok(Incomplete) => Break(Transition(into(newst)).incomplete()),
 								            Ok(Obj(obj)) => Break(Transition(into(newst)).ok(obj.into())),
 								            Ok(Dead(tok)) => Continue((newst, tok, context)),
 								            Err(e) => Break(Transition(into(newst)).err(e)),
-												tamer: parser::ParseState::delegate_lookahead: New concept

This introduces a new method similar to the previous `delegate`, but with
another closure that allows for handling lookahead tokens from the child
parser.

Admittedly, this isn't exactly what I was going for---a list of arguments
isn't exactly self-documenting, especially with the brevity when the
arguments line up---but this was easy to do and so I'll run with this for
now.

This also modified `delegate` to accept a context, even though it wasn't
necessary, both for consistency with its lookup counterpart and for brevity
with the `into` argument (allowing, in our case, to just pass the name of
the variant, rather than a closure).

I'm not going to handle the actual starting and accepting state stitching
abstraction for now; I'd like to observe future boilerplate more before I
consider the best way to handle it, though I do have some ideas.

DEV-10863

											
										
										
											2022-03-29 14:18:08 -04:00
+								        }
 								    }
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								}
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								/// Empty [`Context`] for [`ParseState`]s with pure functional
 								///   implementations with no mutable state.
 								///
 								/// Using this value means that a [`ParseState`] does not require a
 								///   context.
 								/// All [`Context`]s implement [`AsMut<EmptyContext>`](AsMut),
 								///   and so all pure [`ParseState`]s have contexts compatible with every
 								///   other parser for composition
 								///     (provided that the other invariants in [`StitchableParseState`] are
 								///       met).
 								///
 								/// This can be clearly represented in function signatures using
 								///   [`EmptyContext`].
 								#[derive(Debug, PartialEq, Eq, Default)]
 								pub struct EmptyContext;
 								impl AsMut<EmptyContext> for EmptyContext {
 								    fn as_mut(&mut self) -> &mut EmptyContext {
 								        self
 								    }
 								}
 								/// A [`ParseState`] does not require any mutable [`Context`].
 								///
 								/// A [`ParseState`] using this context is pure
 								///   (has no mutable state),
 								///     returning a new version of itself on each state change.
 								///
 								/// This type is intended to be self-documenting:
 								///   `_: EmptyContext` is nicer to readers than `_: &mut EmptyContext`.
 								///
 								/// See [`EmptyContext`] for more information.
 								pub type NoContext<'a> = &'a mut EmptyContext;
 								/// Mutable context for [`ParseState`].
 								///
 								/// [`ParseState`]s are immutable and pure---they
 								///   are invoked via [`ParseState::parse_token`] and return a new version
 								///   of themselves representing their new state.
 								/// Rust/LLVM are generally able to elide intermediate values and moves,
 								///   optimizing these parsers away into assignments.
 								///
 								/// However,
 								///   there are circumstances where moves may not be elided and may retain
 								///   their `memcpy` equivalents.
 								/// To work around this,
 								///   [`ParseState::parse_token`] accepts a mutable [`Context`] reference
 								///   which is held by the parent [`Parser`],
 								///     which can be mutated in-place without worrying about Rust's move
 								///     semantics.
 								///
 								/// Plainly: you should only use this if you have to.
 								/// This was added because certain parsers may be invoked millions of times
 								///   for each individual token in systems with many source packages,
 								///     which may otherwise result in millions of `memcpy`s.
 								///
 								/// When composing two [`ParseState`]s `A<B, C>`,
 								///   a [`Context<B, C>`](Context) must be contravariant over `B` and~`C`.
 								/// Concretely,
 								///   this means that [`AsMut<B::Context>`](AsMut) and
 								///   [`AsMut<C::Context>`](AsMut) must be implemented for `A::Context`.
 								/// This almost certainly means that `A::Context` is a product type.
 								/// Consequently,
 								///   a single [`Parser`] is able to hold a composite [`Context`] in a
 								///   single memory location.
 								///
 								/// [`Context<T>`](Context) implements [`Deref<T>`](Deref) for convenience.
 								///
 								/// If your [`ParseState`] does not require a mutable [`Context`],
 								///   see [`NoContext`].
 								#[derive(Debug, Default)]
 								pub struct Context<T: Debug + Default>(T, EmptyContext);
 								impl<T: Debug + Default> AsMut<EmptyContext> for Context<T> {
 								    fn as_mut(&mut self) -> &mut EmptyContext {
 								        &mut self.1
 								    }
 								}
 								impl<T: Debug + Default> Deref for Context<T> {
 								    type Target = T;
 								    fn deref(&self) -> &Self::Target {
 								        &self.0
 								    }
 								}
 								impl<T: Debug + Default> DerefMut for Context<T> {
 								    fn deref_mut(&mut self) -> &mut Self::Target {
 								        &mut self.0
 								    }
 								}
 								impl<T: Debug + Default> From<T> for Context<T> {
 								    fn from(x: T) -> Self {
 								        Context(x, EmptyContext)
 								    }
 								}
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more in formation.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											
										
										
											2021-12-10 15:39:59 -05:00
+								/// Result of applying a [`Token`] to a [`ParseState`].
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision lead to code that was much more
difficult than necessary to understand, and ended up having a net negative
benefit by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also opens up the possibility to do something that I wasn't able to
before, which was continue to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting the way, the previous state coupling
issues with the parent parser go away.

DEV-10863

											
										
										
											2022-03-17 15:50:35 -04:00
+								///
 								/// This is used by [`ParseState::parse_token`];
 								///   see that function for rationale.
-												tamer: parse: More flexible Transition API

This does some cleanup and adds `parse::Object` for use in disambiguating
`From` for `ParseStatus`, allowing the `Transition` API to be much more
flexible in the data it accepts and automatically converts.  This allows us
to concisely provide raw output data to be wrapped, or provide `ParseStatus`
directly when more convenient.

There aren't yet examples in the docs; I'll do so once I make sure this API
is actually utilized as intended.

DEV-10863

											
										
										
											2022-03-25 16:45:32 -04:00
+								pub type ParseStateResult<S> = Result<ParseStatus<S>, <S as ParseState>::Error>;
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
-												tamer: parse::TransitionResult: Alias=>newtype

This converts the tuple type alias into a newtype, so that we may provide
our own implementations.

This differs from a previous approach that I took, which involved making
this type `Result<(S, T), (S, E)>` so that the return values composed well
with other functions.  But the reality is that this is used only by other
`ParseState`s and `Parser`, so it's unnecessary.

However, this is also an attempt to utilize the new Try and FromResidual
traits; note how the Try associated types match precisely what I was trying
to do before, though they're used as intermediate types.  I'll see how this
evolves.

DEV-10863

											
										
										
											2022-03-25 09:56:22 -04:00
+								/// A state transition with associated data.
 								///
 								/// Conceptually,
 								///   imagine the act of a state transition producing data.
 								/// See [`Transition`] for convenience methods for producing this tuple.
-												tamer: parse::ParseState:Error: Relax Eq trait bound

This is unnecessarily restrictive, since we do not require anything further
than `PartialEq` for the situations where we care about equality (tests).

DEV-11864

											
										
										
											2022-05-06 15:28:47 -04:00
+								#[derive(Debug, PartialEq)]
-												tamer: parse::TransitionResult: Alias=>newtype

This converts the tuple type alias into a newtype, so that we may provide
our own implementations.

This differs from a previous approach that I took, which involved making
this type `Result<(S, T), (S, E)>` so that the return values composed well
with other functions.  But the reality is that this is used only by other
`ParseState`s and `Parser`, so it's unnecessary.

However, this is also an attempt to utilize the new Try and FromResidual
traits; note how the Try associated types match precisely what I was trying
to do before, though they're used as intermediate types.  I'll see how this
evolves.

DEV-10863

											
										
										
											2022-03-25 09:56:22 -04:00
+								pub struct TransitionResult<S: ParseState>(
 								    pub Transition<S>,
 								    pub ParseStateResult<S>,
 								);
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision lead to code that was much more
difficult than necessary to understand, and ended up having a net negative
benefit by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also opens up the possibility to do something that I wasn't able to
before, which was continue to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting the way, the previous state coupling
issues with the parent parser go away.

DEV-10863

											
										
										
											2022-03-17 15:50:35 -04:00
+								/// Denotes a state transition.
 								///
 								/// This newtype was created to produce clear, self-documenting code;
 								///   parsers can get confusing to read with all of the types involved,
 								///     so this provides a mental synchronization point.
 								///
 								/// This also provides some convenience methods to help remote boilerplate
 								///   and further improve code clarity.
 								#[derive(Debug, PartialEq, Eq)]
 								pub struct Transition<S: ParseState>(pub S);
 								impl<S: ParseState> Transition<S> {
 								    /// A state transition with corresponding data.
 								    ///
 								    /// This allows [`ParseState::parse_token`] to emit a parsed object and
 								    ///   corresponds to [`ParseStatus::Object`].
-												tamer: parse: More flexible Transition API

This does some cleanup and adds `parse::Object` for use in disambiguating
`From` for `ParseStatus`, allowing the `Transition` API to be much more
flexible in the data it accepts and automatically converts.  This allows us
to concisely provide raw output data to be wrapped, or provide `ParseStatus`
directly when more convenient.

There aren't yet examples in the docs; I'll do so once I make sure this API
is actually utilized as intended.

DEV-10863

											
										
										
											2022-03-25 16:45:32 -04:00
+								    pub fn ok<T>(self, obj: T) -> TransitionResult<S>
 								    where
 								        T: Into<ParseStatus<S>>,
 								    {
 								        TransitionResult(self, Ok(obj.into()))
 								    }
 								    /// A transition with corresponding error.
 								    ///
 								    /// This indicates a parsing failure.
 								    /// The state ought to be suitable for error recovery.
 								    pub fn err<E: Into<S::Error>>(self, err: E) -> TransitionResult<S> {
 								        TransitionResult(self, Err(err.into()))
 								    }
 								    /// A state transition with corresponding [`Result`].
 								    ///
 								    /// This translates the provided [`Result`] in a manner equivalent to
 								    ///   [`Transition::ok`] and [`Transition::err`].
 								    pub fn result<T, E>(self, result: Result<T, E>) -> TransitionResult<S>
 								    where
 								        T: Into<ParseStatus<S>>,
 								        E: Into<S::Error>,
 								    {
 								        TransitionResult(self, result.map(Into::into).map_err(Into::into))
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision lead to code that was much more
difficult than necessary to understand, and ended up having a net negative
benefit by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also opens up the possibility to do something that I wasn't able to
before, which was continue to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting the way, the previous state coupling
issues with the parent parser go away.

DEV-10863

											
										
										
											2022-03-17 15:50:35 -04:00
+								    }
 								    /// A state transition indicating that more data is needed before an
 								    ///   object can be emitted.
 								    ///
 								    /// This corresponds to [`ParseStatus::Incomplete`].
-												tamer: parse::TransitionResult: Alias=>newtype

This converts the tuple type alias into a newtype, so that we may provide
our own implementations.

This differs from a previous approach that I took, which involved making
this type `Result<(S, T), (S, E)>` so that the return values composed well
with other functions.  But the reality is that this is used only by other
`ParseState`s and `Parser`, so it's unnecessary.

However, this is also an attempt to utilize the new Try and FromResidual
traits; note how the Try associated types match precisely what I was trying
to do before, though they're used as intermediate types.  I'll see how this
evolves.

DEV-10863

											
										
										
											2022-03-25 09:56:22 -04:00
+								    pub fn incomplete(self) -> TransitionResult<S> {
 								        TransitionResult(self, Ok(ParseStatus::Incomplete))
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision lead to code that was much more
difficult than necessary to understand, and ended up having a net negative
benefit by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also opens up the possibility to do something that I wasn't able to
before, which was continue to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting the way, the previous state coupling
issues with the parent parser go away.

DEV-10863

											
										
										
											2022-03-17 15:50:35 -04:00
+								    }
 								    /// A dead state transition.
 								    ///
 								    /// This corresponds to [`ParseStatus::Dead`],
 								    ///   and a calling parser should use the provided [`Token`] as
 								    ///   lookahead.
-												tamer: parse::TransitionResult: Alias=>newtype

This converts the tuple type alias into a newtype, so that we may provide
our own implementations.

This differs from a previous approach that I took, which involved making
this type `Result<(S, T), (S, E)>` so that the return values composed well
with other functions.  But the reality is that this is used only by other
`ParseState`s and `Parser`, so it's unnecessary.

However, this is also an attempt to utilize the new Try and FromResidual
traits; note how the Try associated types match precisely what I was trying
to do before, though they're used as intermediate types.  I'll see how this
evolves.

DEV-10863

											
										
										
											2022-03-25 09:56:22 -04:00
+								    pub fn dead(self, tok: S::Token) -> TransitionResult<S> {
 								        TransitionResult(self, Ok(ParseStatus::Dead(tok)))
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision lead to code that was much more
difficult than necessary to understand, and ended up having a net negative
benefit by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also opens up the possibility to do something that I wasn't able to
before, which was continue to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting the way, the previous state coupling
issues with the parent parser go away.

DEV-10863

											
										
										
											2022-03-17 15:50:35 -04:00
+								    }
 								}
-												tamer: parse::TransitionResult: Alias=>newtype

This converts the tuple type alias into a newtype, so that we may provide
our own implementations.

This differs from a previous approach that I took, which involved making
this type `Result<(S, T), (S, E)>` so that the return values composed well
with other functions.  But the reality is that this is used only by other
`ParseState`s and `Parser`, so it's unnecessary.

However, this is also an attempt to utilize the new Try and FromResidual
traits; note how the Try associated types match precisely what I was trying
to do before, though they're used as intermediate types.  I'll see how this
evolves.

DEV-10863

											
										
										
											2022-03-25 09:56:22 -04:00
+								impl<S: ParseState> Into<(Transition<S>, ParseStateResult<S>)>
 								    for TransitionResult<S>
 								{
 								    fn into(self) -> (Transition<S>, ParseStateResult<S>) {
 								        (self.0, self.1)
 								    }
 								}
 								impl<S: ParseState> Try for TransitionResult<S> {
 								    type Output = (Transition<S>, ParseStateResult<S>);
 								    type Residual = (Transition<S>, ParseStateResult<S>);
 								    fn from_output(output: Self::Output) -> Self {
 								        match output {
 								            (st, result) => Self(st, result),
 								        }
 								    }
 								    fn branch(self) -> ControlFlow<Self::Residual, Self::Output> {
 								        match self.into() {
 								            (st, Ok(x)) => ControlFlow::Continue((st, Ok(x))),
 								            (st, Err(e)) => ControlFlow::Break((st, Err(e))),
 								        }
 								    }
 								}
 								impl<S: ParseState> FromResidual<(Transition<S>, ParseStateResult<S>)>
 								    for TransitionResult<S>
 								{
 								    fn from_residual(residual: (Transition<S>, ParseStateResult<S>)) -> Self {
 								        match residual {
 								            (st, result) => Self(st, result),
 								        }
 								    }
 								}
 								impl<S: ParseState> FromResidual<Result<Infallible, TransitionResult<S>>>
 								    for TransitionResult<S>
 								{
 								    fn from_residual(
 								        residual: Result<Infallible, TransitionResult<S>>,
 								    ) -> Self {
 								        match residual {
 								            Err(e) => e,
 								            // SAFETY: This match arm doesn't seem to be required in
 								            //   core::result::Result's FromResidual implementation,
 								            //     but as of 1.61 nightly it is here.
 								            // Since this is Infallable,
 								            //   it cannot occur.
 								            Ok(_) => unsafe { unreachable_unchecked() },
 								        }
 								    }
 								}
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision lead to code that was much more
difficult than necessary to understand, and ended up having a net negative
benefit by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also opens up the possibility to do something that I wasn't able to
before, which was continue to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting the way, the previous state coupling
issues with the parent parser go away.

DEV-10863

											
										
										
											2022-03-17 15:50:35 -04:00
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								impl<S: ParseState> FromResidual<ControlFlow<TransitionResult<S>, Infallible>>
 								    for TransitionResult<S>
 								{
 								    fn from_residual(
 								        residual: ControlFlow<TransitionResult<S>, Infallible>,
 								    ) -> Self {
 								        match residual {
 								            ControlFlow::Break(result) => result,
 								            // SAFETY: Infallible, so cannot hit.
 								            ControlFlow::Continue(_) => unsafe { unreachable_unchecked() },
 								        }
 								    }
 								}
-												tamer: parse (Transitionable): New

This simply removes boilerplate.

This will receive concrete examples once I come up with docs for the entire
module; there's boilerplate involved in testing and documenting this in
isolation and the time investment is not worth it yet until I'm certain that
this will not be changed.

DEV-10863

											
										
										
											2022-03-30 10:00:17 -04:00
+								/// An object able to be used as data for a state [`Transition`].
 								///
 								/// This flips the usual order of things:
 								///   rather than using a method of [`Transition`] to provide data,
 								///     this starts with the data and produces a transition from it.
 								/// This is sometimes necessary to satisfy ownership/borrowing rules.
 								///
 								/// This trait simply removes boilerplate associated with storing
 								///   intermediate values and translating into the resulting type.
 								pub trait Transitionable<S: ParseState> {
 								    /// Perform a state transition to `S` using [`Self`] as the associated
 								    ///   data.
 								    ///
 								    /// This may be necessary to satisfy ownership/borrowing rules when
 								    ///   state data from `S` is used to compute [`Self`].
 								    fn transition(self, to: S) -> TransitionResult<S>;
 								}
 								impl<S, E> Transitionable<S> for Result<ParseStatus<S>, E>
 								where
 								    S: ParseState,
 								    <S as ParseState>::Error: From<E>,
 								{
 								    fn transition(self, to: S) -> TransitionResult<S> {
 								        Transition(to).result(self)
 								    }
 								}
 								impl<S, E> Transitionable<S> for Result<(), E>
 								where
 								    S: ParseState,
 								    <S as ParseState>::Error: From<E>,
 								{
 								    fn transition(self, to: S) -> TransitionResult<S> {
 								        Transition(to).result(self.map(|_| ParseStatus::Incomplete))
 								    }
 								}
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more in formation.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											
										
										
											2021-12-10 15:39:59 -05:00
+								/// A streaming parser defined by a [`ParseState`] with exclusive
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								///   mutable access to an underlying [`TokenStream`].
 								///
 								/// This parser handles operations that are common among all types of
 								///   parsers,
 								///     such that specialized parsers need only implement logic that is
 								///     unique to their operation.
 								/// This also simplifies combinators,
 								///   since there is more uniformity among distinct parser types.
-												tamer: xir::tree::parse: Remove TokenStreamParser trait

This just leaves Parser, which is what I started with, but I wasn't sure how
far I was going to take this.  I went against my usual judgment in creating
a trait that I may not need, in an attempt to try to reason about the API
that I wanted, because it wasn't yet clear at the time whether the Parser
ought to be generic.

Since then (as detailed in the last commit), this has become more of a
coordinator/mediator, and the real parser is actually TokenStreamState,
which will be renamed shortly.

DEV-11268

											
										
										
											2021-12-10 14:58:44 -05:00
+								///
 								/// After you have finished with a parser,
 								///   if you have not consumed the entire iterator,
 								///   call [`finalize`](Parser::finalize) to ensure that parsing has
 								///     completed in an accepting state.
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								#[derive(Debug, PartialEq, Eq)]
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863

											
										
										
											2022-03-18 15:26:05 -04:00
+								pub struct Parser<S: ParseState, I: TokenStream<S::Token>> {
-												tamer: xir::tree::parse::Parser: Remove lifetime

This will allow Parser to operate on both owned and &mut values, and is the
same approach that Rust's built-in iterators take.

This is at first quite surprising, and I often forget that this is a
feature, and, as a bonus, an attractive way to avoid lifetimes in struct
definitions when generics are used for the type that may become a
reference.

DEV-11268

											
										
										
											2021-12-13 16:51:15 -05:00
+								    toks: I,
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    state: S,
-												tamer: parse::Parser (last_span): Replace Option with UNKNOWN_SPAN

There's no use in complicating the error handling here when we'd just
default to `UNKNOWN_SPAN` anyway when trying to render it.  `UNKNOWN_SPAN`
didn't exist at the time of writing.

DEV-10935

											
										
										
											2022-04-12 09:59:00 -04:00
+								    last_span: Span,
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								    ctx: S::Context,
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								}
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863

											
										
										
											2022-03-18 15:26:05 -04:00
+								impl<S: ParseState, I: TokenStream<S::Token>> Parser<S, I> {
-												tamer: xir::tree::parse: Remove TokenStreamParser trait

This just leaves Parser, which is what I started with, but I wasn't sure how
far I was going to take this.  I went against my usual judgment in creating
a trait that I may not need, in an attempt to try to reason about the API
that I wanted, because it wasn't yet clear at the time whether the Parser
ought to be generic.

Since then (as detailed in the last commit), this has become more of a
coordinator/mediator, and the real parser is actually TokenStreamState,
which will be renamed shortly.

DEV-11268

											
										
										
											2021-12-10 14:58:44 -05:00
+								    /// Indicate that no further parsing will take place using this parser,
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								    ///   retrieve any final aggregate state (the context),
-												tamer: xir::tree::parse: Remove TokenStreamParser trait

This just leaves Parser, which is what I started with, but I wasn't sure how
far I was going to take this.  I went against my usual judgment in creating
a trait that I may not need, in an attempt to try to reason about the API
that I wanted, because it wasn't yet clear at the time whether the Parser
ought to be generic.

Since then (as detailed in the last commit), this has become more of a
coordinator/mediator, and the real parser is actually TokenStreamState,
which will be renamed shortly.

DEV-11268

											
										
										
											2021-12-10 14:58:44 -05:00
+								    ///   and [`drop`] it.
 								    ///
 								    /// Invoking the method is equivalent to stating that the stream has
 								    ///   ended,
 								    ///     since the parser will have no later opportunity to continue
 								    ///     parsing.
 								    /// Consequently,
 								    ///   the caller should expect [`ParseError::UnexpectedEof`] if the
 								    ///   parser is not in an accepting state.
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								    ///
 								    /// To re-use the context returned by this method,
 								    ///   see [`ParseState::parse_with_context`].
 								    /// Note that whether the context is permitted to be reused,
 								    ///   or is useful independently to the caller,
 								    ///   is a decision made by the [`ParseState`].
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863

											
										
										
											2022-03-18 15:26:05 -04:00
+								    pub fn finalize(
 								        self,
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								    ) -> Result<S::Context, (Self, ParseError<S::Token, S::Error>)> {
 								        match self.assert_accepting() {
 								            Ok(()) => Ok(self.ctx),
 								            Err(err) => Err((self, err)),
 								        }
-												tamer: parse::Parser: Extract logic from Iterator impl

This introduces a (still-private) way to _push_ tokens into the parser,
rather than relying purely on a pull-based interface.  Not only does this
simplify the iterator, but this is also preparing to make the new `feed_tok`
public so that parsers can be composed in more contexts.  I suspect that
this method may also be useful for error recovery, since it can be used to
inject tokens into arbitrary points of a token stream.

I kept the new method private for now so that I can introduce the new API
and docs separate from this refactoring.

DEV-10863

											
										
										
											2022-03-22 10:10:59 -04:00
+								    }
 								    /// Return [`Ok`] if the parser is in an accepting state,
 								    ///   otherwise [`Err`] with [`ParseError::UnexpectedEof`].
 								    ///
 								    /// See [`finalize`](Self::finalize) for the public-facing method.
 								    fn assert_accepting(&self) -> Result<(), ParseError<S::Token, S::Error>> {
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								        if self.state.is_accepting() {
 								            Ok(())
 								        } else {
-												tamer: parse::Parser (last_span): Replace Option with UNKNOWN_SPAN

There's no use in complicating the error handling here when we'd just
default to `UNKNOWN_SPAN` anyway when trying to render it.  `UNKNOWN_SPAN`
didn't exist at the time of writing.

DEV-10935

											
										
										
											2022-04-12 09:59:00 -04:00
+								            let endpoints = self.last_span.endpoints();
 								            Err(ParseError::UnexpectedEof(
 								                endpoints.1.unwrap_or(endpoints.0),
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								                self.state.to_string(),
-												tamer: parse::Parser (last_span): Replace Option with UNKNOWN_SPAN

There's no use in complicating the error handling here when we'd just
default to `UNKNOWN_SPAN` anyway when trying to render it.  `UNKNOWN_SPAN`
didn't exist at the time of writing.

DEV-10935

											
										
										
											2022-04-12 09:59:00 -04:00
+								            ))
-												tamer: parse::Parser: Extract logic from Iterator impl

This introduces a (still-private) way to _push_ tokens into the parser,
rather than relying purely on a pull-based interface.  Not only does this
simplify the iterator, but this is also preparing to make the new `feed_tok`
public so that parsers can be composed in more contexts.  I suspect that
this method may also be useful for error recovery, since it can be used to
inject tokens into arbitrary points of a token stream.

I kept the new method private for now so that I can introduce the new API
and docs separate from this refactoring.

DEV-10863

											
										
										
											2022-03-22 10:10:59 -04:00
+								        }
 								    }
 								    /// Feed an input token to the parser.
 								    ///
 								    /// This _pushes_ data into the parser,
 								    ///   rather than the typical pull system used by [`Parser`]'s
 								    ///   [`Iterator`] implementation.
 								    /// The pull system also uses this method to provided data to the
 								    ///   parser.
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											
										
										
											2022-03-23 14:25:04 -04:00
+								    ///
 								    /// This method is intentionally private,
 								    ///   since push parsers are currently supported only internally.
 								    /// The only thing preventing this being public is formalization and a
 								    ///   commitment to maintain it.
 								    fn feed_tok(&mut self, tok: S::Token) -> ParsedResult<S> {
-												tamer: parse::Parser: Extract logic from Iterator impl

This introduces a (still-private) way to _push_ tokens into the parser,
rather than relying purely on a pull-based interface.  Not only does this
simplify the iterator, but this is also preparing to make the new `feed_tok`
public so that parsers can be composed in more contexts.  I suspect that
this method may also be useful for error recovery, since it can be used to
inject tokens into arbitrary points of a token stream.

I kept the new method private for now so that I can introduce the new API
and docs separate from this refactoring.

DEV-10863

											
										
										
											2022-03-22 10:10:59 -04:00
+								        // Store the most recently encountered Span for error
 								        //   reporting in case we encounter an EOF.
-												tamer: parse::Parser (last_span): Replace Option with UNKNOWN_SPAN

There's no use in complicating the error handling here when we'd just
default to `UNKNOWN_SPAN` anyway when trying to render it.  `UNKNOWN_SPAN`
didn't exist at the time of writing.

DEV-10935

											
										
										
											2022-04-12 09:59:00 -04:00
+								        self.last_span = tok.span();
-												tamer: parse::Parser: Extract logic from Iterator impl

This introduces a (still-private) way to _push_ tokens into the parser,
rather than relying purely on a pull-based interface.  Not only does this
simplify the iterator, but this is also preparing to make the new `feed_tok`
public so that parsers can be composed in more contexts.  I suspect that
this method may also be useful for error recovery, since it can be used to
inject tokens into arbitrary points of a token stream.

I kept the new method private for now so that I can introduce the new API
and docs separate from this refactoring.

DEV-10863

											
										
										
											2022-03-22 10:10:59 -04:00
 								        let result;
-												tamer: parse::TransitionResult: Alias=>newtype

This converts the tuple type alias into a newtype, so that we may provide
our own implementations.

This differs from a previous approach that I took, which involved making
this type `Result<(S, T), (S, E)>` so that the return values composed well
with other functions.  But the reality is that this is used only by other
`ParseState`s and `Parser`, so it's unnecessary.

However, this is also an attempt to utilize the new Try and FromResidual
traits; note how the Try associated types match precisely what I was trying
to do before, though they're used as intermediate types.  I'll see how this
evolves.

DEV-10863

											
										
										
											2022-03-25 09:56:22 -04:00
+								        TransitionResult(Transition(self.state), result) =
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								            take(&mut self.state).parse_token(tok, &mut self.ctx);
-												tamer: parse::Parser: Extract logic from Iterator impl

This introduces a (still-private) way to _push_ tokens into the parser,
rather than relying purely on a pull-based interface.  Not only does this
simplify the iterator, but this is also preparing to make the new `feed_tok`
public so that parsers can be composed in more contexts.  I suspect that
this method may also be useful for error recovery, since it can be used to
inject tokens into arbitrary points of a token stream.

I kept the new method private for now so that I can introduce the new API
and docs separate from this refactoring.

DEV-10863

											
										
										
											2022-03-22 10:10:59 -04:00
 								        use ParseStatus::*;
 								        match result {
 								            // Nothing handled this dead state,
 								            //   and we cannot discard a lookahead token,
 								            //   so we have no choice but to produce an error.
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								            Ok(Dead(invalid)) => Err(ParseError::UnexpectedToken(
 								                invalid,
 								                self.state.to_string(),
 								            )),
-												tamer: parse::Parser: Extract logic from Iterator impl

This introduces a (still-private) way to _push_ tokens into the parser,
rather than relying purely on a pull-based interface.  Not only does this
simplify the iterator, but this is also preparing to make the new `feed_tok`
public so that parsers can be composed in more contexts.  I suspect that
this method may also be useful for error recovery, since it can be used to
inject tokens into arbitrary points of a token stream.

I kept the new method private for now so that I can introduce the new API
and docs separate from this refactoring.

DEV-10863

											
										
										
											2022-03-22 10:10:59 -04:00
 								            Ok(parsed @ (Incomplete | Object(..))) => Ok(parsed.into()),
 								            Err(e) => Err(e.into()),
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								        }
 								    }
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											
										
										
											2022-03-23 14:25:04 -04:00
 								    /// Lower the IR produced by this [`Parser`] into another IR by piping
 								    ///   the output to a new parser defined by the [`ParseState`] `LS`.
 								    ///
 								    /// This parser consumes tokens `S::Token` and produces the IR
 								    ///   `S::Output`.
 								    /// If there is some other [`ParseState`] `LS` such that
 								    ///   `LS::Token == S::Output`
 								    ///     (that is—the output of this parser is the input to another),
 								    ///     then this method will wire the two together into a new iterator
 								    ///       that produces `LS::Output`.
 								    ///
 								    /// Visually, we have,
 								    ///   within the provided closure `f`,
 								    ///   a [`LowerIter`] that acts as this pipeline:
 								    ///
 								    /// ```text
 								    /// (S::Token) -> (S::Output == LS::Token) -> (LS::Output)
 								    /// ```
 								    ///
 								    /// The new iterator is a [`LowerIter`],
 								    ///   and scoped to the provided closure `f`.
 								    /// The outer [`Result`] of `Self`'s [`ParsedResult`] is stripped by
 								    ///   a [`TripIter`] before being provided as input to a new push
 								    ///   [`Parser`] utilizing `LS`.
 								    /// A push parser,
 								    ///   rather than pulling tokens from a [`TokenStream`],
 								    ///   has tokens pushed into it;
 								    ///     this parser is created automatically for you.
 								    ///
 								    /// _TODO_: There's no way to access the inner parser for error recovery
 								    ///   after tripping the [`TripIter`].
 								    /// Consequently,
 								    ///   this API (likely the return type) will change.
 								    #[inline]
-												tamer: iter::trip: Flatten Result

The `*_iter_while_ok` functions now compose like monads, flattening `Result`
at each step and drastically simplifying handling of error types.  This also
removes the bunch of `?`s at the end of the expression, and allows me to use
`?` within the callback itself.

I had originally not used `Result` as the return type of the callback
because I was not entirely sure how I was going to use them, but it's now
clear that I _always_ use `Result` as the return type, and so there's no use
in trying to be too accommodating; it can always change in the future.

This is desirable not just for cleanup, but because trying to refactor
`asg_builder` into a pair of `Parser`s is really messy to chain without
flattening, especially given some state that has to leak temporarily to the
caller.  More on that in a future commit.

DEV-11864

											
										
										
											2022-05-20 15:43:49 -04:00
+								    pub fn lower_while_ok<LS, U, E>(
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											
										
										
											2022-03-23 14:25:04 -04:00
+								        &mut self,
-												tamer: parse::LowerIter: Generic inner TripIter iterator

This commit is preparing to compose LowerIter directly.

DEV-11864

											
										
										
											2022-05-24 10:27:14 -04:00
+								        f: impl FnOnce(&mut LowerIter<S, Parser<S, I>, LS>) -> Result<U, E>,
-												tamer: iter::trip: Flatten Result

The `*_iter_while_ok` functions now compose like monads, flattening `Result`
at each step and drastically simplifying handling of error types.  This also
removes the bunch of `?`s at the end of the expression, and allows me to use
`?` within the callback itself.

I had originally not used `Result` as the return type of the callback
because I was not entirely sure how I was going to use them, but it's now
clear that I _always_ use `Result` as the return type, and so there's no use
in trying to be too accommodating; it can always change in the future.

This is desirable not just for cleanup, but because trying to refactor
`asg_builder` into a pair of `Parser`s is really messy to chain without
flattening, especially given some state that has to leak temporarily to the
caller.  More on that in a future commit.

DEV-11864

											
										
										
											2022-05-20 15:43:49 -04:00
+								    ) -> Result<U, E>
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											
										
										
											2022-03-23 14:25:04 -04:00
+								    where
 								        LS: ParseState<Token = S::Object>,
 								        <S as ParseState>::Object: Token,
-												tamer: parse::ParseState::Context: Remove Default trait bound

This is too restrictive, especially for parsers that fold into something,
like the ASG, which may exist prior to invoking the parser.

This moves the trait bound to the functions that actually need it.  Those
obviously cannot be used if the Context does not implement `Default`, but
I'll provide alternative conveniences.

DEV-11864

											
										
										
											2022-05-05 15:55:04 -04:00
+								        <LS as ParseState>::Context: Default,
-												tamer: iter::trip: Flatten Result

The `*_iter_while_ok` functions now compose like monads, flattening `Result`
at each step and drastically simplifying handling of error types.  This also
removes the bunch of `?`s at the end of the expression, and allows me to use
`?` within the callback itself.

I had originally not used `Result` as the return type of the callback
because I was not entirely sure how I was going to use them, but it's now
clear that I _always_ use `Result` as the return type, and so there's no use
in trying to be too accommodating; it can always change in the future.

This is desirable not just for cleanup, but because trying to refactor
`asg_builder` into a pair of `Parser`s is really messy to chain without
flattening, especially given some state that has to leak temporarily to the
caller.  More on that in a future commit.

DEV-11864

											
										
										
											2022-05-20 15:43:49 -04:00
+								        ParseError<S::Token, S::Error>: Into<E>,
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											
										
										
											2022-03-23 14:25:04 -04:00
+								    {
 								        self.while_ok(|toks| {
 								            // TODO: This parser is not accessible after error recovery!
 								            let lower = LS::parse(iter::empty());
 								            f(&mut LowerIter { lower, toks })
 								        })
-												tamer: iter::trip: Flatten Result

The `*_iter_while_ok` functions now compose like monads, flattening `Result`
at each step and drastically simplifying handling of error types.  This also
removes the bunch of `?`s at the end of the expression, and allows me to use
`?` within the callback itself.

I had originally not used `Result` as the return type of the callback
because I was not entirely sure how I was going to use them, but it's now
clear that I _always_ use `Result` as the return type, and so there's no use
in trying to be too accommodating; it can always change in the future.

This is desirable not just for cleanup, but because trying to refactor
`asg_builder` into a pair of `Parser`s is really messy to chain without
flattening, especially given some state that has to leak temporarily to the
caller.  More on that in a future commit.

DEV-11864

											
										
										
											2022-05-20 15:43:49 -04:00
+								        .map_err(Into::into)
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											
										
										
											2022-03-23 14:25:04 -04:00
+								    }
 								}
 								/// An IR lowering operation that pipes the output of one [`Parser`] to the
 								///   input of another.
 								///
 								/// This is produced by [`Parser::lower_while_ok`].
 								pub struct LowerIter<'a, 'b, S, I, LS>
 								where
 								    S: ParseState,
-												tamer: parse::LowerIter: Generic inner TripIter iterator

This commit is preparing to compose LowerIter directly.

DEV-11864

											
										
										
											2022-05-24 10:27:14 -04:00
+								    I: Iterator<Item = ParsedResult<S>>,
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											
										
										
											2022-03-23 14:25:04 -04:00
+								    LS: ParseState<Token = S::Object>,
 								    <S as ParseState>::Object: Token,
 								{
 								    /// A push [`Parser`].
 								    lower: Parser<LS, Empty<LS::Token>>,
 								    /// Source tokens from higher-level [`Parser`],
 								    ///   with the outer [`Result`] having been stripped by a [`TripIter`].
 								    toks: &'a mut TripIter<
 								        'b,
-												tamer: parse::LowerIter: Generic inner TripIter iterator

This commit is preparing to compose LowerIter directly.

DEV-11864

											
										
										
											2022-05-24 10:27:14 -04:00
+								        I,
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											
										
										
											2022-03-23 14:25:04 -04:00
+								        Parsed<S::Object>,
 								        ParseError<S::Token, S::Error>,
 								    >,
 								}
 								impl<'a, 'b, S, I, LS> Iterator for LowerIter<'a, 'b, S, I, LS>
 								where
 								    S: ParseState,
-												tamer: parse::LowerIter: Generic inner TripIter iterator

This commit is preparing to compose LowerIter directly.

DEV-11864

											
										
										
											2022-05-24 10:27:14 -04:00
+								    I: Iterator<Item = ParsedResult<S>>,
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											
										
										
											2022-03-23 14:25:04 -04:00
+								    LS: ParseState<Token = S::Object>,
 								    <S as ParseState>::Object: Token,
 								{
 								    type Item = ParsedResult<LS>;
 								    /// Pull a token through the higher-level [`Parser`],
 								    ///   push it to the lowering parser,
 								    ///   and yield the resulting [`ParseResult`].
 								    #[inline]
 								    fn next(&mut self) -> Option<Self::Item> {
 								        match self.toks.next() {
 								            None => None,
 								            Some(Parsed::Incomplete) => Some(Ok(Parsed::Incomplete)),
 								            Some(Parsed::Object(obj)) => Some(self.lower.feed_tok(obj)),
 								        }
 								    }
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								}
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863

											
										
										
											2022-03-18 15:26:05 -04:00
+								impl<S: ParseState, I: TokenStream<S::Token>> Iterator for Parser<S, I> {
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more in formation.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											
										
										
											2021-12-10 15:39:59 -05:00
+								    type Item = ParsedResult<S>;
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								    /// Parse a single [`Token`] according to the current
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more in formation.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											
										
										
											2021-12-10 15:39:59 -05:00
+								    ///   [`ParseState`],
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								    ///     if available.
 								    ///
 								    /// If the underlying [`TokenStream`] yields [`None`],
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more in formation.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											
										
										
											2021-12-10 15:39:59 -05:00
+								    ///   then the [`ParseState`] must be in an accepting state;
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								    ///     otherwise, [`ParseError::UnexpectedEof`] will occur.
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    ///
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								    /// This is intended to be invoked by [`Iterator::next`].
 								    /// Accepting a token rather than the [`TokenStream`] allows the caller
 								    ///   to inspect the token first
 								    ///     (e.g. to store a copy of the [`Span`][crate::span::Span]).
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    #[inline]
 								    fn next(&mut self) -> Option<Self::Item> {
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								        let otok = self.toks.next();
 								        match otok {
-												tamer: parse::Parser: Extract logic from Iterator impl

This introduces a (still-private) way to _push_ tokens into the parser,
rather than relying purely on a pull-based interface.  Not only does this
simplify the iterator, but this is also preparing to make the new `feed_tok`
public so that parsers can be composed in more contexts.  I suspect that
this method may also be useful for error recovery, since it can be used to
inject tokens into arbitrary points of a token stream.

I kept the new method private for now so that I can introduce the new API
and docs separate from this refactoring.

DEV-10863

											
										
										
											2022-03-22 10:10:59 -04:00
+								            None => match self.assert_accepting() {
 								                Ok(()) => None,
 								                Err(e) => Some(Err(e)),
 								            },
 								            Some(tok) => Some(self.feed_tok(tok)),
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								        }
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    }
 								}
-												tamer: xir::tree::parse: Remove TokenStreamParser trait

This just leaves Parser, which is what I started with, but I wasn't sure how
far I was going to take this.  I went against my usual judgment in creating
a trait that I may not need, in an attempt to try to reason about the API
that I wanted, because it wasn't yet clear at the time whether the Parser
ought to be generic.

Since then (as detailed in the last commit), this has become more of a
coordinator/mediator, and the real parser is actually TokenStreamState,
which will be renamed shortly.

DEV-11268

											
										
										
											2021-12-10 14:58:44 -05:00
+								/// Common parsing errors produced by [`Parser`].
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								///
 								/// These errors are common enough that they are handled in a common way,
 								///   such that individual parsers needn't check for these situations
 								///   themselves.
 								///
 								/// Having a common type also allows combinators to handle error types in a
 								///   consistent way when composing parsers.
 								///
 								/// Parsers may return their own unique errors via the
 								///   [`StateError`][ParseError::StateError] variant.
-												tamer: parse::ParseError: Remove Eq trait bound

Just as in other commits, since it's an unnecessary limitation.

DEV-11864

											
										
										
											2022-05-18 15:48:08 -04:00
+								#[derive(Debug, PartialEq)]
-												tamer: parse::ParseState:Error: Relax Eq trait bound

This is unnecessarily restrictive, since we do not require anything further
than `PartialEq` for the situations where we care about equality (tests).

DEV-11864

											
										
										
											2022-05-06 15:28:47 -04:00
+								pub enum ParseError<T: Token, E: Diagnostic + PartialEq> {
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								    /// Token stream ended unexpectedly.
 								    ///
 								    /// This error means that the parser was expecting more input before
 								    ///   reaching an accepting state.
 								    /// This could represent a truncated file,
 								    ///   a malformed stream,
 								    ///   or maybe just a user that's not done typing yet
 								    ///     (e.g. in the case of an LSP implementation).
 								    ///
 								    /// If no span is available,
 								    ///   then parsing has not even had the chance to begin.
 								    /// If this parser follows another,
 								    ///   then the combinator ought to substitute a missing span with
 								    ///   whatever span preceded this invocation.
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								    ///
 								    /// The string is intended to describe what was expected to have been
 								    ///   available based on the current [`ParseState`].
 								    /// It is a heap-allocated string so that a copy of [`ParseState`]
 								    ///   needn't be stored.
 								    UnexpectedEof(Span, String),
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
+								    /// The parser reached an unhandled dead state.
 								    ///
 								    /// Once a parser returns [`ParseStatus::Dead`],
 								    ///   a parent context must use that provided token as a lookahead.
 								    /// If that does not occur,
 								    ///   [`Parser`] produces this error.
 								    ///
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								    /// The string is intended to describe what was expected to have been
 								    ///   available based on the current [`ParseState`].
 								    /// It is a heap-allocated string so that a copy of [`ParseState`]
 								    ///   needn't be stored.
 								    UnexpectedToken(T, String),
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    /// A parser-specific error associated with an inner
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more in formation.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											
										
										
											2021-12-10 15:39:59 -05:00
+								    ///   [`ParseState`].
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    StateError(E),
 								}
-												tamer: parse::ParseError: Remove Eq trait bound

Just as in other commits, since it's an unnecessary limitation.

DEV-11864

											
										
										
											2022-05-18 15:48:08 -04:00
+								impl<T: Token, EA: Diagnostic + PartialEq> ParseError<T, EA> {
-												tamer: diagnose: Introduction of diagnostic system

This is a working concept that will continue to evolve.  I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.

This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does.  The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them.  I need to
balance this work with everything else I have going on.

This is a large commit, but it converts the existing Error Display impls
into Diagnostic.  This separation is a bit verbose, so I'll see how this
ends up evolving.

Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.

Output is integrated into tameld only in this commit; I'll add tamec
next.  Examples of what this outputs are available in the test cases in this
commit.

DEV-10935

											
										
										
											2022-04-13 14:41:54 -04:00
+								    pub fn inner_into<EB: Diagnostic + PartialEq + Eq>(
 								        self,
 								    ) -> ParseError<T, EB>
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
+								    where
 								        EA: Into<EB>,
 								    {
 								        use ParseError::*;
 								        match self {
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								            UnexpectedEof(span, desc) => UnexpectedEof(span, desc),
 								            UnexpectedToken(x, desc) => UnexpectedToken(x, desc),
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
+								            StateError(e) => StateError(e.into()),
 								        }
 								    }
 								}
-												tamer: parse::ParseState:Error: Relax Eq trait bound

This is unnecessarily restrictive, since we do not require anything further
than `PartialEq` for the situations where we care about equality (tests).

DEV-11864

											
										
										
											2022-05-06 15:28:47 -04:00
+								impl<T: Token, E: Diagnostic + PartialEq> From<E> for ParseError<T, E> {
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    fn from(e: E) -> Self {
 								        Self::StateError(e)
 								    }
 								}
-												tamer: parse::ParseError: Remove Eq trait bound

Just as in other commits, since it's an unnecessary limitation.

DEV-11864

											
										
										
											2022-05-18 15:48:08 -04:00
+								impl<T: Token, E: Diagnostic + PartialEq> Display for ParseError<T, E> {
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
 								        match self {
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								            Self::UnexpectedEof(_, desc) => {
 								                write!(f, "unexpected end of input while {desc}")
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								            }
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								            Self::UnexpectedToken(_, desc) => {
 								                write!(f, "unexpected input while {desc}")
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
+								            }
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								            Self::StateError(e) => Display::fmt(e, f),
 								        }
 								    }
 								}
-												tamer: parse::ParseError: Remove Eq trait bound

Just as in other commits, since it's an unnecessary limitation.

DEV-11864

											
										
										
											2022-05-18 15:48:08 -04:00
+								impl<T: Token, E: Diagnostic + PartialEq + 'static> Error for ParseError<T, E> {
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    fn source(&self) -> Option<&(dyn Error + 'static)> {
 								        match self {
 								            Self::StateError(e) => Some(e),
 								            _ => None,
 								        }
 								    }
 								}
-												tamer: parse::ParseError: Remove Eq trait bound

Just as in other commits, since it's an unnecessary limitation.

DEV-11864

											
										
										
											2022-05-18 15:48:08 -04:00
+								impl<T: Token, E: Diagnostic + PartialEq + 'static> Diagnostic
-												tamer: diagnose: Introduction of diagnostic system

This is a working concept that will continue to evolve.  I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.

This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does.  The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them.  I need to
balance this work with everything else I have going on.

This is a large commit, but it converts the existing Error Display impls
into Diagnostic.  This separation is a bit verbose, so I'll see how this
ends up evolving.

Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.

Output is integrated into tameld only in this commit; I'll add tamec
next.  Examples of what this outputs are available in the test cases in this
commit.

DEV-10935

											
										
										
											2022-04-13 14:41:54 -04:00
+								    for ParseError<T, E>
 								{
 								    fn describe(&self) -> Vec<AnnotatedSpan> {
 								        use ParseError::*;
 								        match self {
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								            UnexpectedEof(span, desc) => span.error(desc).into(),
-												tamer: diagnose: Introduction of diagnostic system

This is a working concept that will continue to evolve.  I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.

This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does.  The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them.  I need to
balance this work with everything else I have going on.

This is a large commit, but it converts the existing Error Display impls
into Diagnostic.  This separation is a bit verbose, so I'll see how this
ends up evolving.

Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.

Output is integrated into tameld only in this commit; I'll add tamec
next.  Examples of what this outputs are available in the test cases in this
commit.

DEV-10935

											
										
										
											2022-04-13 14:41:54 -04:00
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								            UnexpectedToken(tok, desc) => tok.span().error(desc).into(),
-												tamer: diagnose: Introduction of diagnostic system

This is a working concept that will continue to evolve.  I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.

This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does.  The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them.  I need to
balance this work with everything else I have going on.

This is a large commit, but it converts the existing Error Display impls
into Diagnostic.  This separation is a bit verbose, so I'll see how this
ends up evolving.

Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.

Output is integrated into tameld only in this commit; I'll add tamec
next.  Examples of what this outputs are available in the test cases in this
commit.

DEV-10935

											
										
										
											2022-04-13 14:41:54 -04:00
 								            // TODO: Is there any additional useful context we can augment
 								            //   this with?
 								            StateError(e) => e.describe(),
 								        }
 								    }
 								}
-												tamer: parse::ParseState::Context: Remove Default trait bound

This is too restrictive, especially for parsers that fold into something,
like the ASG, which may exist prior to invoking the parser.

This moves the trait bound to the functions that actually need it.  Those
obviously cannot be used if the Context does not implement `Default`, but
I'll provide alternative conveniences.

DEV-11864

											
										
										
											2022-05-05 15:55:04 -04:00
+								impl<S, I> From<I> for Parser<S, I>
 								where
 								    S: ParseState,
 								    I: TokenStream<S::Token>,
 								    <S as ParseState>::Context: Default,
 								{
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								    /// Create a new parser with a default context.
 								    ///
 								    /// This can only be used if the associated [`ParseState::Context`] does
 								    ///   not implement [`Default`];
 								    ///     otherwise,
 								    ///       consider instantiating from a `(TokenStream, Context)` pair.
 								    /// See also [`ParseState::parse`] and
 								    ///   [`ParseState::parse_with_context`].
-												tamer: xir::tree::parse::Parser: Remove lifetime

This will allow Parser to operate on both owned and &mut values, and is the
same approach that Rust's built-in iterators take.

This is at first quite surprising, and I often forget that this is a
feature, and, as a bonus, an attractive way to avoid lifetimes in struct
definitions when generics are used for the type that may become a
reference.

DEV-11268

											
										
										
											2021-12-13 16:51:15 -05:00
+								    fn from(toks: I) -> Self {
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								        Self {
 								            toks,
 								            state: Default::default(),
-												tamer: parse::Parser (last_span): Replace Option with UNKNOWN_SPAN

There's no use in complicating the error handling here when we'd just
default to `UNKNOWN_SPAN` anyway when trying to render it.  `UNKNOWN_SPAN`
didn't exist at the time of writing.

DEV-10935

											
										
										
											2022-04-12 09:59:00 -04:00
+								            last_span: UNKNOWN_SPAN,
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								            ctx: Default::default(),
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								        }
 								    }
 								}
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								impl<S, I, C> From<(I, C)> for Parser<S, I>
 								where
 								    S: ParseState<Context = C>,
 								    I: TokenStream<S::Token>,
 								{
 								    /// Create a new parser with a provided context.
 								    ///
 								    /// For more information,
 								    ///   see [`ParseState::parse_with_context`].
 								    ///
 								    /// See also [`ParseState::parse`].
 								    fn from((toks, ctx): (I, C)) -> Self {
 								        Self {
 								            toks,
 								            state: Default::default(),
 								            last_span: UNKNOWN_SPAN,
 								            ctx,
 								        }
 								    }
 								}
-												tamer: xir::tree::parse: Use new Parsed::Done variant over None

This removes Option from ParseState, as mentioned in previous commits.

This is ideal because it not only removes a layer of abstraction, but also
makes the intent very clear; the use of None was too tied to the concept of
an Iterator, which is the concern of Parser, _not_ ParseState.

This is now similar to tree::Parsed, which will help with that refactoring
shortly.

The Done variant is not accessible outside of Parser, since it always
coverts it to None (to halt iteration); given that, we should have another
public-facing type, as was also mentioned in a previous commit.

DEV-11268

											
										
										
											2021-12-10 16:22:02 -05:00
+								/// Result of a parsing operation.
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								#[derive(Debug, PartialEq, Eq)]
-												tamer: parse: More flexible Transition API

This does some cleanup and adds `parse::Object` for use in disambiguating
`From` for `ParseStatus`, allowing the `Transition` API to be much more
flexible in the data it accepts and automatically converts.  This allows us
to concisely provide raw output data to be wrapped, or provide `ParseStatus`
directly when more convenient.

There aren't yet examples in the docs; I'll do so once I make sure this API
is actually utilized as intended.

DEV-10863

											
										
										
											2022-03-25 16:45:32 -04:00
+								pub enum ParseStatus<S: ParseState> {
-												tamer: xir::tree::parse: Use new Parsed::Done variant over None

This removes Option from ParseState, as mentioned in previous commits.

This is ideal because it not only removes a layer of abstraction, but also
makes the intent very clear; the use of None was too tied to the concept of
an Iterator, which is the concern of Parser, _not_ ParseState.

This is now similar to tree::Parsed, which will help with that refactoring
shortly.

The Done variant is not accessible outside of Parser, since it always
coverts it to None (to halt iteration); given that, we should have another
public-facing type, as was also mentioned in a previous commit.

DEV-11268

											
										
										
											2021-12-10 16:22:02 -05:00
+								    /// Additional tokens are needed to complete parsing of the next object.
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    Incomplete,
-												tamer: xir::tree::parse: Use new Parsed::Done variant over None

This removes Option from ParseState, as mentioned in previous commits.

This is ideal because it not only removes a layer of abstraction, but also
makes the intent very clear; the use of None was too tied to the concept of
an Iterator, which is the concern of Parser, _not_ ParseState.

This is now similar to tree::Parsed, which will help with that refactoring
shortly.

The Done variant is not accessible outside of Parser, since it always
coverts it to None (to halt iteration); given that, we should have another
public-facing type, as was also mentioned in a previous commit.

DEV-11268

											
										
										
											2021-12-10 16:22:02 -05:00
 								    /// Parsing of an object is complete.
 								    ///
 								    /// This does not indicate that the parser is complete,
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create a LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								    ///   as more objects may be able to be emitted.
-												tamer: parse: More flexible Transition API

This does some cleanup and adds `parse::Object` for use in disambiguating
`From` for `ParseStatus`, allowing the `Transition` API to be much more
flexible in the data it accepts and automatically converts.  This allows us
to concisely provide raw output data to be wrapped, or provide `ParseStatus`
directly when more convenient.

There aren't yet examples in the docs; I'll do so once I make sure this API
is actually utilized as intended.

DEV-10863

											
										
										
											2022-03-25 16:45:32 -04:00
+								    Object(S::Object),
-												tamer: xir::tree::parse: Use new Parsed::Done variant over None

This removes Option from ParseState, as mentioned in previous commits.

This is ideal because it not only removes a layer of abstraction, but also
makes the intent very clear; the use of None was too tied to the concept of
an Iterator, which is the concern of Parser, _not_ ParseState.

This is now similar to tree::Parsed, which will help with that refactoring
shortly.

The Done variant is not accessible outside of Parser, since it always
coverts it to None (to halt iteration); given that, we should have another
public-facing type, as was also mentioned in a previous commit.

DEV-11268

											
										
										
											2021-12-10 16:22:02 -05:00
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
+								    /// Parser encountered a dead state relative to the given token.
 								    ///
 								    /// A dead state is an empty accepting state that has no state
 								    ///   transition for the given token.
 								    /// A state is empty if a [`ParseStatus::Object`] will not be lost if
 								    ///   parsing ends at this point
 								    ///     (that is---there is no partially-built object).
 								    /// This could simply mean that the parser has completed its job and
 								    ///   that control must be returned to a parent context.
 								    ///
 								    /// If a parser is _not_ in an accepting state,
 								    ///   then an error ought to occur rather than a dead state;
 								    ///     the difference between the two is that the token associated with
 								    ///       a dead state can be used as a lookahead token in order to
 								    ///       produce a state transition at a higher level,
 								    ///     whereas an error indicates that parsing has failed.
 								    /// Intuitively,
 								    ///   this means that a [`ParseStatus::Object`] had just been emitted
 								    ///   and that the token following it isn't something that can be
 								    ///   parsed.
 								    ///
 								    /// If there is no parent context to handle the token,
 								    ///   [`Parser`] must yield an error.
-												tamer: parse: More flexible Transition API

This does some cleanup and adds `parse::Object` for use in disambiguating
`From` for `ParseStatus`, allowing the `Transition` API to be much more
flexible in the data it accepts and automatically converts.  This allows us
to concisely provide raw output data to be wrapped, or provide `ParseStatus`
directly when more convenient.

There aren't yet examples in the docs; I'll do so once I make sure this API
is actually utilized as intended.

DEV-10863

											
										
										
											2022-03-25 16:45:32 -04:00
+								    Dead(S::Token),
 								}
 								impl<S: ParseState<Object = T>, T: Object> From<T> for ParseStatus<S> {
 								    fn from(obj: T) -> Self {
 								        Self::Object(obj)
 								    }
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								}
-												tamer: xir::tree::parse: ParseStatus and Parsed

The old Parsed was renamed to ParseStatus to be used by Parser, and Parser
converts it into Parsed, which has the same variants as it did before and
has all but the Done variant, since it's not possible for Parser to yield
it.

DEV-11268

											
										
										
											2021-12-10 16:51:53 -05:00
+								/// Result of a parsing operation.
 								///
 								/// Whereas [`ParseStatus`] is used by [`ParseState`] to influence parser
 								///   operation,
 								///     this type is public-facing and used by [`Parser`].
 								#[derive(Debug, PartialEq, Eq)]
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863

											
										
										
											2022-03-18 15:26:05 -04:00
+								pub enum Parsed<O> {
-												tamer: xir::tree::parse: ParseStatus and Parsed

The old Parsed was renamed to ParseStatus to be used by Parser, and Parser
converts it into Parsed, which has the same variants as it did before and
has all but the Done variant, since it's not possible for Parser to yield
it.

DEV-11268

											
										
										
											2021-12-10 16:51:53 -05:00
+								    /// Additional tokens are needed to complete parsing of the next object.
 								    Incomplete,
 								    /// Parsing of an object is complete.
 								    ///
 								    /// This does not indicate that the parser is complete,
 								    ///   as more objects may be able to be emitted.
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863

											
										
										
											2022-03-18 15:26:05 -04:00
+								    Object(O),
-												tamer: xir::tree::parse: ParseStatus and Parsed

The old Parsed was renamed to ParseStatus to be used by Parser, and Parser
converts it into Parsed, which has the same variants as it did before and
has all but the Done variant, since it's not possible for Parser to yield
it.

DEV-11268

											
										
										
											2021-12-10 16:51:53 -05:00
+								}
-												tamer: parse: More flexible Transition API

This does some cleanup and adds `parse::Object` for use in disambiguating
`From` for `ParseStatus`, allowing the `Transition` API to be much more
flexible in the data it accepts and automatically converts.  This allows us
to concisely provide raw output data to be wrapped, or provide `ParseStatus`
directly when more convenient.

There aren't yet examples in the docs; I'll do so once I make sure this API
is actually utilized as intended.

DEV-10863

											
										
										
											2022-03-25 16:45:32 -04:00
+								impl<S: ParseState> From<ParseStatus<S>> for Parsed<S::Object> {
 								    fn from(status: ParseStatus<S>) -> Self {
-												tamer: xir::tree::parse: ParseStatus and Parsed

The old Parsed was renamed to ParseStatus to be used by Parser, and Parser
converts it into Parsed, which has the same variants as it did before and
has all but the Done variant, since it's not possible for Parser to yield
it.

DEV-11268

											
										
										
											2021-12-10 16:51:53 -05:00
+								        match status {
 								            ParseStatus::Incomplete => Parsed::Incomplete,
 								            ParseStatus::Object(x) => Parsed::Object(x),
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create a LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								            ParseStatus::Dead(_) => {
 								                unreachable!("Dead status must be filtered by Parser")
-												tamer: xir::tree::parse: ParseStatus and Parsed

The old Parsed was renamed to ParseStatus to be used by Parser, and Parser
converts it into Parsed, which has the same variants as it did before and
has all but the Done variant, since it's not possible for Parser to yield
it.

DEV-11268

											
										
										
											2021-12-10 16:51:53 -05:00
+								            }
 								        }
 								    }
 								}
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								#[cfg(test)]
 								pub mod test {
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
+								    use std::{assert_matches::assert_matches, iter::once};
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								    use super::*;
-												tamer: xir::parse: UnexpectedEof Span at final offset

I'm not rendering errors yet in practice, so this wouldn't have been
noticed, but we want error messages to reference the final byte in a file on
EOF, not the offset of the last-encountered token, which would be confusing.

This doesn't _directly_ pertain to what I'm working on; I just happened to
notice it.

DEV-10863

											
										
										
											2022-03-17 21:33:05 -04:00
+								    use crate::{span::DUMMY_SPAN as DS, sym::GlobalSymbolIntern};
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								    #[derive(Debug, PartialEq, Eq, Clone)]
 								    enum TestToken {
 								        Close(Span),
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								        MarkDone(Span),
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								        Text(Span),
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								        SetCtxVal(u8),
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								    }
 								    impl Display for TestToken {
 								        fn fmt(&self, _f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
 								            unimplemented!("fmt::Display")
 								        }
 								    }
 								    impl Token for TestToken {
 								        fn span(&self) -> Span {
 								            use TestToken::*;
 								            match self {
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								                Close(span) | MarkDone(span) | Text(span) => *span,
 								                _ => UNKNOWN_SPAN,
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								            }
 								        }
 								    }
-												tamer: parse: More flexible Transition API

This does some cleanup and adds `parse::Object` for use in disambiguating
`From` for `ParseStatus`, allowing the `Transition` API to be much more
flexible in the data it accepts and automatically converts.  This allows us
to concisely provide raw output data to be wrapped, or provide `ParseStatus`
directly when more convenient.

There aren't yet examples in the docs; I'll do so once I make sure this API
is actually utilized as intended.

DEV-10863

											
										
										
											2022-03-25 16:45:32 -04:00
+								    impl Object for TestToken {}
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    #[derive(Debug, PartialEq, Eq)]
 								    enum EchoState {
 								        Empty,
 								        Done,
 								    }
 								    impl Default for EchoState {
 								        fn default() -> Self {
 								            Self::Empty
 								        }
 								    }
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								    #[derive(Debug, PartialEq, Default)]
 								    struct StubContext {
 								        val: u8,
 								    }
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more in formation.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											
										
										
											2021-12-10 15:39:59 -05:00
+								    impl ParseState for EchoState {
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								        type Token = TestToken;
 								        type Object = TestToken;
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								        type Error = EchoStateError;
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								        type Context = StubContext;
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        fn parse_token(
 								            self,
 								            tok: TestToken,
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								            ctx: &mut StubContext,
-												tamer: parse: Introduce mutable Context

This resolves the performance issues caused by Rust's failure to elide the
ElementStack (ArrayVec) memcpys on move.

Since XIRF is invoked tens of millions of times in some cases for larger
systems, prior to this change, failure to optimize away moves for XIRF
resulted in tens of millions of memcpys.  This resulted in linking of one
program going from 1s -> ~15s.  This change reduces it to ~2.5s with the
wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the
subject of future changes).

In particular, this change introduces a new mutable reference to
`ParseState::parse_token`, which is a reference to a `Context` owned by the
caller (e.g. `Parser`).  In the case of XIRF, this means that
`Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of
`flat::State`; this allows the latter to remain pure and benefit from Rust's
move optimizations, without sacrificing the otherwise-pure implementation.

ParseStates that do not need a mutable context can use `NoContext` and
remain pure.

DEV-12024

											
										
										
											2022-04-04 21:50:47 -04:00
+								        ) -> TransitionResult<Self> {
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								            match tok {
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								                TestToken::MarkDone(..) => Transition(Self::Done).ok(tok),
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								                TestToken::Close(..) => {
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision lead to code that was much more
difficult than necessary to understand, and ended up having a net negative
benefit by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also opens up the possibility to do something that I wasn't able to
before, which was continue to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting the way, the previous state coupling
issues with the parent parser go away.

DEV-10863

											
										
										
											2022-03-17 15:50:35 -04:00
+								                    Transition(self).err(EchoStateError::InnerError(tok))
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								                }
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								                TestToken::Text(..) => Transition(self).dead(tok),
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								                TestToken::SetCtxVal(val) => {
 								                    ctx.val = val;
 								                    Transition(Self::Done).incomplete()
 								                }
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								            }
 								        }
 								        fn is_accepting(&self) -> bool {
 								            *self == Self::Done
 								        }
 								    }
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								    impl Display for EchoState {
 								        fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
 								            write!(f, "<EchoState as Display>::fmt")
 								        }
 								    }
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
+								    #[derive(Debug, PartialEq, Eq)]
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    enum EchoStateError {
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								        InnerError(TestToken),
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    }
 								    impl Display for EchoStateError {
 								        fn fmt(&self, _: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
 								            unimplemented!()
 								        }
 								    }
 								    impl Error for EchoStateError {
 								        fn source(&self) -> Option<&(dyn Error + 'static)> {
 								            None
 								        }
 								    }
-												tamer: diagnose: Introduction of diagnostic system

This is a working concept that will continue to evolve.  I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.

This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does.  The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them.  I need to
balance this work with everything else I have going on.

This is a large commit, but it converts the existing Error Display impls
into Diagnostic.  This separation is a bit verbose, so I'll see how this
ends up evolving.

Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful in the future both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.

Output is integrated into tameld only in this commit; I'll add tamec
next.  Examples of what this outputs are available in the test cases in this
commit.

DEV-10935

											
										
										
											2022-04-13 14:41:54 -04:00
+								    impl Diagnostic for EchoStateError {
 								        fn describe(&self) -> Vec<AnnotatedSpan> {
 								            unimplemented!()
 								        }
 								    }
-												tamer: xir::tree::parse::Parser: Remove lifetime

This will allow Parser to operate on both owned and &mut values, and is the
same approach that Rust's built-in iterators take.

This is at first quite surprising, and I often forget that this is a
feature, and, as a bonus, an attractive way to avoid lifetimes in struct
definitions when generics are used for the type that may become a
reference.

DEV-11268

											
										
										
											2021-12-13 16:51:15 -05:00
+								    type Sut<I> = Parser<EchoState, I>;
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								    #[test]
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								    fn successful_parse_in_accepting_state_with_spans() {
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create a LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								        // EchoState is placed into a Done state given Comment.
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								        let tok = TestToken::MarkDone(DS);
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create a LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								        let mut toks = once(tok.clone());
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								        let mut sut = Sut::from(&mut toks);
 								        // The first token should be processed normally.
 								        // EchoState proxies the token back.
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create a LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								        assert_eq!(Some(Ok(Parsed::Object(tok))), sut.next());
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								        // This is now the end of the token stream,
 								        //   which should be okay provided that the first token put us into
 								        //   a proper accepting state.
 								        assert_eq!(None, sut.next());
 								        // Further, finalizing should work in this state.
 								        assert!(sut.finalize().is_ok());
 								    }
 								    #[test]
 								    fn fails_on_end_of_stream_when_not_in_accepting_state() {
-												tamer: xir::parse: UnexpectedEof Span at final offset

I'm not rendering errors yet in practice, so this wouldn't have been
noticed, but we want error messages to reference the final byte in a file on
EOF, not the offset of the last-encountered token, which would be confusing.

This doesn't _directly_ pertain to what I'm working on; I just happened to
notice it.

DEV-10863

											
										
										
											2022-03-17 21:33:05 -04:00
+								        let span = Span::new(10, 20, "ctx".intern());
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								        let mut toks = [TestToken::Close(span)].into_iter();
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								        let mut sut = Sut::from(&mut toks);
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								        // The first token is fine,
 								        //   and allows us to acquire our most recent span.
 								        sut.next();
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								        // Given that we have no tokens,
 								        //   and that EchoState::default does not start in an accepting
 								        //     state,
 								        //   we must fail when we encounter the end of the stream.
-												tamer: xir::parse: UnexpectedEof Span at final offset

I'm not rendering errors yet in practice, so this wouldn't have been
noticed, but we want error messages to reference the final byte in a file on
EOF, not the offset of the last-encountered token, which would be confusing.

This doesn't _directly_ pertain to what I'm working on; I just happened to
notice it.

DEV-10863

											
										
										
											2022-03-17 21:33:05 -04:00
+								        assert_eq!(
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								            Some(Err(ParseError::UnexpectedEof(
 								                span.endpoints().1.unwrap(),
 								                // All the states have the same string
 								                //   (at time of writing).
 								                EchoState::default().to_string(),
 								            ))),
-												tamer: xir::parse: UnexpectedEof Span at final offset

I'm not rendering errors yet in practice, so this wouldn't have been
noticed, but we want error messages to reference the final byte in a file on
EOF, not the offset of the last-encountered token, which would be confusing.

This doesn't _directly_ pertain to what I'm working on; I just happened to
notice it.

DEV-10863

											
										
										
											2022-03-17 21:33:05 -04:00
+								            sut.next()
 								        );
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    }
 								    #[test]
 								    fn returns_state_specific_error() {
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								        // TestToken::Close causes EchoState to produce an error.
 								        let errtok = TestToken::Close(DS);
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								        let mut toks = [errtok.clone()].into_iter();
 								        let mut sut = Sut::from(&mut toks);
 								        assert_eq!(
 								            Some(Err(ParseError::StateError(EchoStateError::InnerError(
 								                errtok
 								            )))),
 								            sut.next()
 								        );
 								        // The token must have been consumed.
 								        // It is up to a recovery process to either bail out or provide
 								        //   recovery tokens;
 								        //     continuing without recovery is unlikely to make sense.
 								        assert_eq!(0, toks.len());
 								    }
 								    #[test]
 								    fn fails_when_parser_is_finalized_in_non_accepting_state() {
-												tamer: xir::parse: UnexpectedEof Span at final offset

I'm not rendering errors yet in practice, so this wouldn't have been
noticed, but we want error messages to reference the final byte in a file on
EOF, not the offset of the last-encountered token, which would be confusing.

This doesn't _directly_ pertain to what I'm working on; I just happened to
notice it.

DEV-10863

											
										
										
											2022-03-17 21:33:05 -04:00
+								        let span = Span::new(10, 10, "ctx".intern());
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								        // Set up so that we have a single token that we can use for
 								        //   recovery as part of the same iterator.
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
+								        let recovery = TestToken::MarkDone(DS);
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								        let mut toks = [
 								            // Used purely to populate a Span.
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								            TestToken::Close(span),
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								            // Recovery token here:
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create a LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								            recovery.clone(),
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								        ]
 								        .into_iter();
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								        let mut sut = Sut::from(&mut toks);
 								        // Populate our most recently seen token's span.
 								        sut.next();
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								        // Attempting to finalize now in a non-accepting state should fail
 								        //   in the same way that encountering an end-of-stream does,
 								        //     since we're effectively saying "we're done with the stream"
 								        //     and the parser will have no further opportunity to reach an
 								        //     accepting state.
 								        let result = sut.finalize();
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								        assert_matches!(
 								            result,
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								            Err((_, ParseError::UnexpectedEof(s, _))) if s == span.endpoints().1.unwrap()
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								        );
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								        // The sut should have been re-returned,
 								        //   allowing for attempted error recovery if the caller can manage
 								        //   to produce a sequence of tokens that will be considered valid.
 								        // `toks` above is set up already for this,
 								        //   which allows us to assert that we received back the same `sut`.
 								        let mut sut = result.unwrap_err().0;
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create a LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								        assert_eq!(Some(Ok(Parsed::Object(recovery))), sut.next());
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								        // And so we should now be in an accepting state,
 								        //   able to finalize.
 								        assert!(sut.finalize().is_ok());
 								    }
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
 								    #[test]
 								    fn unhandled_dead_state_results_in_error() {
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create a LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								        // A Text will cause our parser to return Dead.
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								        let tok = TestToken::Text(DS);
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
+								        let mut toks = once(tok.clone());
 								        let mut sut = Sut::from(&mut toks);
 								        // Our parser returns a Dead status,
 								        //   which is unhandled by any parent context
 								        //     (since we're not composing parsers),
 								        //     which causes an error due to an unhandled Dead state.
-												tamer: Add Display impl for each ParseState for generic ParseErrors

This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864

											
										
										
											2022-05-25 14:20:10 -04:00
+								        assert_eq!(
 								            sut.next(),
 								            Some(Err(ParseError::UnexpectedToken(
 								                tok,
 								                EchoState::default().to_string()
 								            ))),
 								        );
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
+								    }
-												tamer: parse: Persistent context

This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864

											
										
										
											2022-05-18 16:06:48 -04:00
 								    // A context can be both retrieved from a finished parser and provided
 								    //   to a new one.
 								    #[test]
 								    fn provide_and_retrieve_context() {
 								        // First, verify that it's initialized to a default context.
 								        let mut toks = vec![TestToken::MarkDone(DS)].into_iter();
 								        let mut sut = Sut::from(&mut toks);
 								        sut.next().unwrap().unwrap();
 								        let ctx = sut.finalize().unwrap();
 								        assert_eq!(ctx, Default::default());
 								        // Next, verify that the context that is manipulated is the context
 								        //   that is returned to us.
 								        let val = 5;
 								        let mut toks = vec![TestToken::SetCtxVal(5)].into_iter();
 								        let mut sut = Sut::from(&mut toks);
 								        sut.next().unwrap().unwrap();
 								        let ctx = sut.finalize().unwrap();
 								        assert_eq!(ctx, StubContext { val });
 								        // Finally, verify that the context provided is the context that is
 								        //   used.
 								        let val = 10;
 								        let given_ctx = StubContext { val };
 								        let mut toks = vec![TestToken::MarkDone(DS)].into_iter();
 								        let mut sut = EchoState::parse_with_context(&mut toks, given_ctx);
 								        sut.next().unwrap().unwrap();
 								        let ctx = sut.finalize().unwrap();
 								        assert_eq!(ctx, StubContext { val });
 								    }
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								}