// Basic streaming parsing framework
//
//  Copyright (C) 2014-2021 Ryan Specialty Group, LLC.
//
//  This file is part of TAME.
//
//  This program is free software: you can redistribute it and/or modify
//  it under the terms of the GNU General Public License as published by
//  the Free Software Foundation, either version 3 of the License, or
//  (at your option) any later version.
//
//  This program is distributed in the hope that it will be useful,
//  but WITHOUT ANY WARRANTY; without even the implied warranty of
//  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
//  GNU General Public License for more details.
//
//  You should have received a copy of the GNU General Public License
//  along with this program.  If not, see <http://www.gnu.org/licenses/>.

//! Basic streaming parser framework for lowering operations.
//!
//! _TODO: Some proper docs and examples!_
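//!
//! The framework is easiest to see in miniature.
//! What follows is a self-contained sketch,
//!   not this module's API:
//!     `Status`, `WordState`, `Driver`, and `driver` are hypothetical,
//!     simplified stand-ins for [`ParseStatus`], a [`ParseState`],
//!     and [`Parser`].
//! As with the real types,
//!   each token consumes the current state by value,
//!   the driver moves the state out of `&mut self` using
//!     [`std::mem::take`]
//!       (one reason [`ParseState`] requires [`Default`]),
//!   and reaching the end of input in a non-accepting state is an error.
//!
//! ```rust
//! use std::mem::take;
//!
//! // Hypothetical stand-in for `ParseStatus` (without `Dead`).
//! #[derive(Debug, PartialEq, Eq)]
//! enum Status {
//!     Incomplete,
//!     Object(String),
//! }
//!
//! // Hypothetical state: accumulate characters into a word until a space.
//! #[derive(Debug, PartialEq, Eq)]
//! enum WordState {
//!     Empty,
//!     InWord(String),
//! }
//!
//! impl Default for WordState {
//!     fn default() -> Self {
//!         Self::Empty
//!     }
//! }
//!
//! impl WordState {
//!     // Owned `self`: each (state, token) pair names its successor.
//!     fn parse_token(self, tok: char) -> (Self, Status) {
//!         match (self, tok) {
//!             (Self::Empty, ' ') => (Self::Empty, Status::Incomplete),
//!             (Self::Empty, c) => (Self::InWord(c.to_string()), Status::Incomplete),
//!             (Self::InWord(w), ' ') => (Self::Empty, Status::Object(w)),
//!             (Self::InWord(mut w), c) => {
//!                 w.push(c);
//!                 (Self::InWord(w), Status::Incomplete)
//!             }
//!         }
//!     }
//!
//!     // Stopping mid-word would lose a partially-built object.
//!     fn is_accepting(&self) -> bool {
//!         *self == Self::Empty
//!     }
//! }
//!
//! // Hypothetical pull driver standing in for `Parser`.
//! struct Driver<I: Iterator<Item = char>> {
//!     toks: I,
//!     state: WordState,
//! }
//!
//! impl<I: Iterator<Item = char>> Iterator for Driver<I> {
//!     type Item = Result<Status, &'static str>;
//!
//!     fn next(&mut self) -> Option<Self::Item> {
//!         match self.toks.next() {
//!             // End of input is fine only in an accepting state.
//!             None if self.state.is_accepting() => None,
//!             None => Some(Err("unexpected end of input")),
//!             Some(tok) => {
//!                 // Move the state out, leaving a default behind until
//!                 //   the successor state is stored.
//!                 let (next, status) = take(&mut self.state).parse_token(tok);
//!                 self.state = next;
//!                 Some(Ok(status))
//!             }
//!         }
//!     }
//! }
//!
//! fn driver(input: &str) -> Driver<std::str::Chars<'_>> {
//!     Driver { toks: input.chars(), state: WordState::default() }
//! }
//! ```
//!
//! With this sketch,
//!   `driver("ab ")` yields `Incomplete` twice and then `Object("ab")`,
//!   whereas `driver("ab")` ends with an end-of-input error.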

use crate::iter::{TripIter, TrippableIterator};
use crate::span::Span;
use std::fmt::Debug;
use std::iter::{self, Empty};
use std::mem::take;
use std::{error::Error, fmt::Display};

/// Result of applying a [`Token`] to a [`ParseState`],
///   with any error having been wrapped in a [`ParseError`].
pub type ParsedResult<S> = ParseResult<S, Parsed<<S as ParseState>::Object>>;

/// Result of some non-parsing operation on a [`Parser`],
///   with any error having been wrapped in a [`ParseError`].
pub type ParseResult<S, T> =
    Result<T, ParseError<<S as ParseState>::Token, <S as ParseState>::Error>>;

/// A single datum from a streaming IR with an associated [`Span`].
///
/// A token may be a lexeme with associated data,
///   or a more structured object having been lowered from other IRs.
pub trait Token: Display + Debug + PartialEq + Eq {
    /// Retrieve the [`Span`] representing the source location of the token.
    fn span(&self) -> Span;
}

impl<T: Token> From<T> for Span {
    fn from(tok: T) -> Self {
        tok.span()
    }
}

/// An infallible [`Token`] stream.
///
/// If the token stream originates from an operation that could potentially
///   fail and ought to be propagated,
///     use [`TokenResultStream`].
///
/// The name "stream" in place of "iterator" is intended to convey that this
///   type is expected to be processed in real-time as a stream,
///     not read into memory.
pub trait TokenStream<T: Token> = Iterator<Item = T>;

/// A [`Token`] stream that may encounter errors during parsing.
///
/// If the stream cannot fail,
///   consider using [`TokenStream`].
pub trait TokenResultStream<T: Token, E: Error> = Iterator<Item = Result<T, E>>;

/// A deterministic parsing automaton.
///
/// These states are utilized by a [`Parser`].
///
/// A [`ParseState`] is also responsible for storing data about the
///   accepted input,
///     and handling appropriate type conversions into the final type.
/// That is---an
///   automaton may store metadata that is subsequently emitted once an
///   accepting state has been reached.
/// Whatever the underlying automaton,
///   a `(state, token)` pair must uniquely determine the next parser
///   action.
///
/// Intuitively,
///   since only one [`Parser`] may hold a mutable reference to
///   an underlying [`TokenStream`] at any given point,
///   a [`ParseState`] does in fact represent the state of the entire
///     [`TokenStream`] at its current position for a given parser
///     composition.
pub trait ParseState: Default + PartialEq + Eq + Debug {
    /// Input tokens to the parser.
    type Token: Token;

    /// Objects produced by a parser utilizing these states.
    type Object;

    /// Errors specific to this set of states.
    type Error: Error + PartialEq + Eq;

    /// Construct a parser.
    ///
    /// Whether this method is helpful or provides any clarity depends on
    ///   the context and the types that are able to be inferred.
    fn parse<I: TokenStream<Self::Token>>(toks: I) -> Parser<Self, I> {
        Parser::from(toks)
    }

    /// Parse a single [`Token`] and optionally perform a state transition.
    ///
    /// The current state is represented by `self`.
    /// The result of a parsing operation is a state transition with
    ///   associated [`ParseStatus`] data.
    ///
    /// Note that `self` is owned,
    ///   for two primary reasons:
    ///
    ///   1. This forces the parser to explicitly consider and document all
    ///        state transitions,
    ///          rather than potentially missing unintended behavior through
    ///          implicit behavior; and
    ///   2. It allows for more natural functional composition of state,
    ///        which in turn makes it easier to compose parsers
    ///          (which conceptually involves stitching together state
    ///            machines).
    fn parse_token(self, tok: Self::Token) -> TransitionResult<Self>;

    /// Whether the current state represents an accepting state.
    ///
    /// An accepting state represents a valid state to stop parsing.
    /// If parsing stops at a state that is _not_ accepting,
    ///   then the [`TokenStream`] has ended unexpectedly and should produce
    ///   a [`ParseError::UnexpectedEof`].
    ///
    /// A parser may reasonably have multiple accepting states.
    /// For example:
    ///   A parser that parses a list of attributes may be used to parse one
    ///   or more attributes,
    ///     or the entire list of attributes.
    ///   It is acceptable to attempt to parse just one of those attributes,
    ///     or it is acceptable to parse all the way until the end.
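    ///
    /// As a hypothetical sketch
    ///   (`AttrState`, `Tok`, and `accepts` are illustrative only,
    ///     and it is assumed that an empty attribute list is valid),
    ///   an attribute automaton might accept both before any attribute
    ///   and after each complete one,
    ///     but not while a name still awaits its value:
    ///
    /// ```rust
    /// #[derive(Debug, PartialEq, Eq)]
    /// enum AttrState {
    ///     // No attributes yet.
    ///     Empty,
    ///     // Saw a name; its value must follow.
    ///     AwaitingValue,
    ///     // At least one complete `name=value` pair.
    ///     Done,
    /// }
    ///
    /// #[derive(Debug, Clone, Copy)]
    /// enum Tok {
    ///     Name,
    ///     Value,
    /// }
    ///
    /// impl AttrState {
    ///     fn step(self, tok: Tok) -> Result<Self, &'static str> {
    ///         match (self, tok) {
    ///             (Self::Empty | Self::Done, Tok::Name) => Ok(Self::AwaitingValue),
    ///             (Self::AwaitingValue, Tok::Value) => Ok(Self::Done),
    ///             _ => Err("unexpected token"),
    ///         }
    ///     }
    ///
    ///     // Two distinct accepting states.
    ///     fn is_accepting(&self) -> bool {
    ///         matches!(self, Self::Empty | Self::Done)
    ///     }
    /// }
    ///
    /// // Run the automaton over `toks`,
    /// //   reporting whether it stopped in an accepting state.
    /// fn accepts(toks: &[Tok]) -> Result<bool, &'static str> {
    ///     let mut st = AttrState::Empty;
    ///     for &tok in toks {
    ///         st = st.step(tok)?;
    ///     }
    ///     Ok(st.is_accepting())
    /// }
    /// ```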
    fn is_accepting(&self) -> bool;
}

/// Result of applying a [`Token`] to a [`ParseState`].
///
/// This is used by [`ParseState::parse_token`];
///   see that function for rationale.
pub type ParseStateResult<S> = Result<
    ParseStatus<<S as ParseState>::Token, <S as ParseState>::Object>,
    <S as ParseState>::Error,
>;

/// Denotes a state transition.
///
/// This newtype was created to produce clear, self-documenting code;
///   parsers can get confusing to read with all of the types involved,
///     so this provides a mental synchronization point.
///
/// This also provides some convenience methods to help remove boilerplate
///   and further improve code clarity.
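///
/// For illustration only,
///   here is a self-contained sketch of the same idea;
///     `State`, `Status`, `StepResult`, `Transition`, and `Flag` below are
///     hypothetical, simplified re-creations rather than this module's
///     types.
/// Each arm of the state's `parse_token` reads as a transition followed
///   by what that transition produces.
///
/// ```rust
/// #[derive(Debug, PartialEq, Eq)]
/// enum Status<T, O> {
///     Incomplete,
///     Object(O),
///     Dead(T),
/// }
///
/// trait State: Sized {
///     type Token;
///     type Object;
///     type Error;
/// }
///
/// // Simplified re-creation of `Transition` and its helpers.
/// struct Transition<S>(S);
///
/// type StepResult<S> = (
///     Transition<S>,
///     Result<
///         Status<<S as State>::Token, <S as State>::Object>,
///         <S as State>::Error,
///     >,
/// );
///
/// impl<S: State> Transition<S> {
///     fn with(self, obj: S::Object) -> StepResult<S> {
///         (self, Ok(Status::Object(obj)))
///     }
///
///     fn incomplete(self) -> StepResult<S> {
///         (self, Ok(Status::Incomplete))
///     }
///
///     fn dead(self, tok: S::Token) -> StepResult<S> {
///         (self, Ok(Status::Dead(tok)))
///     }
///
///     fn err<E: Into<S::Error>>(self, err: E) -> StepResult<S> {
///         (self, Err(err.into()))
///     }
/// }
///
/// // Hypothetical two-state automaton: `+` arms it, `!` fires it.
/// #[derive(Debug, PartialEq, Eq)]
/// enum Flag {
///     Off,
///     On,
/// }
///
/// impl State for Flag {
///     type Token = char;
///     type Object = bool;
///     type Error = String;
/// }
///
/// impl Flag {
///     fn parse_token(self, tok: char) -> StepResult<Self> {
///         match (self, tok) {
///             (Flag::Off, '+') => Transition(Flag::On).incomplete(),
///             (Flag::On, '!') => Transition(Flag::Off).with(true),
///             // Leave `.` for a parent context to use as lookahead.
///             (st, '.') => Transition(st).dead('.'),
///             (st, c) => Transition(st).err(format!("unexpected {}", c)),
///         }
///     }
/// }
/// ```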
#[derive(Debug, PartialEq, Eq)]
pub struct Transition<S: ParseState>(pub S);

impl<S: ParseState> Transition<S> {
    /// A state transition with corresponding data.
    ///
    /// This allows [`ParseState::parse_token`] to emit a parsed object and
    ///   corresponds to [`ParseStatus::Object`].
    pub fn with(self, obj: S::Object) -> (Self, ParseStateResult<S>) {
        (self, Ok(ParseStatus::Object(obj)))
    }

    /// A state transition indicating that more data is needed before an
    ///   object can be emitted.
    ///
    /// This corresponds to [`ParseStatus::Incomplete`].
    pub fn incomplete(self) -> (Self, ParseStateResult<S>) {
        (self, Ok(ParseStatus::Incomplete))
    }

    /// A dead state transition.
    ///
    /// This corresponds to [`ParseStatus::Dead`],
    ///   and a calling parser should use the provided [`Token`] as
    ///   lookahead.
    pub fn dead(self, tok: S::Token) -> (Self, ParseStateResult<S>) {
        (self, Ok(ParseStatus::Dead(tok)))
    }

    /// A transition with corresponding error.
    ///
    /// This indicates a parsing failure.
    /// The state ought to be suitable for error recovery.
    pub fn err<E: Into<S::Error>>(self, err: E) -> (Self, ParseStateResult<S>) {
        (self, Err(err.into()))
    }
}

/// A state transition with associated data.
///
/// Conceptually,
///   imagine the act of a state transition producing data.
/// See [`Transition`] for convenience methods for producing this tuple.
pub type TransitionResult<S> = (Transition<S>, ParseStateResult<S>);

/// A streaming parser defined by a [`ParseState`] with exclusive
///   mutable access to an underlying [`TokenStream`].
///
/// This parser handles operations that are common among all types of
///   parsers,
///     such that specialized parsers need only implement logic that is
///     unique to their operation.
/// This also simplifies combinators,
///   since there is more uniformity among distinct parser types.
///
/// After you have finished with a parser,
///   if you have not consumed the entire iterator,
///   call [`finalize`](Parser::finalize) to ensure that parsing has
///     completed in an accepting state.
#[derive(Debug, PartialEq, Eq)]
pub struct Parser<S: ParseState, I: TokenStream<S::Token>> {
    toks: I,
    state: S,
    last_span: Option<Span>,
}

impl<S: ParseState, I: TokenStream<S::Token>> Parser<S, I> {
    /// Indicate that no further parsing will take place using this parser,
    ///   and [`drop`] it.
    ///
    /// Invoking the method is equivalent to stating that the stream has
    ///   ended,
    ///     since the parser will have no later opportunity to continue
    ///     parsing.
    /// Consequently,
    ///   the caller should expect [`ParseError::UnexpectedEof`] if the
    ///   parser is not in an accepting state.
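    ///
    /// Because the parser is handed back alongside the error,
    ///   a caller may attempt recovery and then finalize again.
    /// A minimal sketch of that ownership pattern,
    ///   using a hypothetical `Pairs` parser
    ///     (not part of this module)
    ///   that accepts only after an even number of tokens:
    ///
    /// ```rust
    /// #[derive(Debug, PartialEq, Eq)]
    /// struct UnexpectedEof;
    ///
    /// #[derive(Debug)]
    /// struct Pairs {
    ///     // Tokens seen so far; accepting when even.
    ///     seen: usize,
    /// }
    ///
    /// impl Pairs {
    ///     fn feed(&mut self, _tok: char) {
    ///         self.seen += 1;
    ///     }
    ///
    ///     // Consume the parser; on failure, return it to the caller.
    ///     fn finalize(self) -> Result<(), (Self, UnexpectedEof)> {
    ///         if self.seen % 2 == 0 {
    ///             Ok(())
    ///         } else {
    ///             Err((self, UnexpectedEof))
    ///         }
    ///     }
    /// }
    ///
    /// // Feed one token, fail to finalize, recover with another, retry.
    /// fn recover() -> Result<(), UnexpectedEof> {
    ///     let mut p = Pairs { seen: 0 };
    ///     p.feed('a');
    ///
    ///     match p.finalize() {
    ///         Ok(()) => Ok(()),
    ///         Err((mut p, _eof)) => {
    ///             p.feed('b');
    ///             p.finalize().map_err(|(_, e)| e)
    ///         }
    ///     }
    /// }
    /// ```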
    pub fn finalize(
        self,
    ) -> Result<(), (Self, ParseError<S::Token, S::Error>)> {
        self.assert_accepting().map_err(|err| (self, err))
    }

    /// Return [`Ok`] if the parser is in an accepting state,
    ///   otherwise [`Err`] with [`ParseError::UnexpectedEof`].
    ///
    /// See [`finalize`](Self::finalize) for the public-facing method.
    fn assert_accepting(&self) -> Result<(), ParseError<S::Token, S::Error>> {
        if self.state.is_accepting() {
            Ok(())
        } else {
            let span = self.last_span.and_then(|s| s.endpoints().1);
            Err(ParseError::UnexpectedEof(span))
        }
    }

    /// Feed an input token to the parser.
    ///
    /// This _pushes_ data into the parser,
    ///   rather than the typical pull system used by [`Parser`]'s
    ///   [`Iterator`] implementation.
    /// The pull system also uses this method to provide data to the
    ///   parser.
    ///
    /// This method is intentionally private,
    ///   since push parsers are currently supported only internally.
    /// The only thing preventing this being public is formalization and a
    ///   commitment to maintain it.
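    ///
    /// A hypothetical sketch of one method serving both roles
    ///   (`Echo` and `ParseErr` are illustrative, not this module's types):
    ///     the pull [`Iterator`] simply forwards each token to `feed`,
    ///     while a push parser is given an empty source
    ///       (as [`lower_while_ok`](Self::lower_while_ok) does with
    ///         [`iter::empty`])
    ///     and has tokens fed to it directly.
    /// Mirroring how this method treats an unhandled dead state,
    ///   a token that the sketch cannot handle becomes an error.
    ///
    /// ```rust
    /// #[derive(Debug, PartialEq, Eq)]
    /// enum ParseErr {
    ///     UnexpectedToken(char),
    /// }
    ///
    /// // Hypothetical parser that echoes letters and rejects all else.
    /// struct Echo<I> {
    ///     toks: I,
    /// }
    ///
    /// impl<I> Echo<I> {
    ///     // Push interface: the caller supplies one token at a time.
    ///     fn feed(&mut self, tok: char) -> Result<char, ParseErr> {
    ///         if tok.is_alphabetic() {
    ///             Ok(tok)
    ///         } else {
    ///             // No parent context can use this token as lookahead.
    ///             Err(ParseErr::UnexpectedToken(tok))
    ///         }
    ///     }
    /// }
    ///
    /// impl<I: Iterator<Item = char>> Iterator for Echo<I> {
    ///     type Item = Result<char, ParseErr>;
    ///
    ///     // Pull interface: fetch a token, then push it through `feed`.
    ///     fn next(&mut self) -> Option<Self::Item> {
    ///         let tok = self.toks.next()?;
    ///         Some(self.feed(tok))
    ///     }
    /// }
    /// ```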
    fn feed_tok(&mut self, tok: S::Token) -> ParsedResult<S> {
        // Store the most recently encountered Span for error
        //   reporting in case we encounter an EOF.
        self.last_span = Some(tok.span());

        let result;
        (Transition(self.state), result) =
            take(&mut self.state).parse_token(tok);

        use ParseStatus::*;
        match result {
            // Nothing handled this dead state,
            //   and we cannot discard a lookahead token,
            //   so we have no choice but to produce an error.
            Ok(Dead(invalid)) => Err(ParseError::UnexpectedToken(invalid)),

            Ok(parsed @ (Incomplete | Object(..))) => Ok(parsed.into()),
            Err(e) => Err(e.into()),
        }
    }

    /// Lower the IR produced by this [`Parser`] into another IR by piping
    ///   the output to a new parser defined by the [`ParseState`] `LS`.
    ///
    /// This parser consumes tokens `S::Token` and produces the IR
    ///   `S::Object`.
    /// If there is some other [`ParseState`] `LS` such that
    ///   `LS::Token == S::Object`
    ///     (that is, the output of this parser is the input to another),
    ///     then this method will wire the two together into a new iterator
    ///       that produces `LS::Object`.
    ///
    /// Visually, we have,
    ///   within the provided closure `f`,
    ///   a [`LowerIter`] that acts as this pipeline:
    ///
    /// ```text
    /// (S::Token) -> (S::Object == LS::Token) -> (LS::Object)
    /// ```
    ///
    /// The new iterator is a [`LowerIter`],
    ///   and is scoped to the provided closure `f`.
    /// The outer [`Result`] of `Self`'s [`ParsedResult`] is stripped by
    ///   a [`TripIter`] before being provided as input to a new push
    ///   [`Parser`] utilizing `LS`.
    /// A push parser,
    ///   rather than pulling tokens from a [`TokenStream`],
    ///   has tokens pushed into it;
    ///     this parser is created automatically for you.
    ///
    /// _TODO_: There's no way to access the inner parser for error recovery
    ///   after tripping the [`TripIter`].
    /// Consequently,
    ///   this API (likely the return type) will change.
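    ///
    /// The following self-contained sketch illustrates the shape of that
    ///   pipeline under stated assumptions;
    ///     it is not this module's API.
    /// It assumes that a trip iterator yields the [`Ok`] values of a
    ///   [`Result`] stream,
    ///     stops at the first [`Err`],
    ///     and records that error for the caller to inspect afterward.
    /// `Trip`, `lex`, and `lower_sum` are hypothetical,
    ///   and a running-sum [`Iterator::scan`] stands in for the lowering
    ///   push parser.
    ///
    /// ```rust
    /// // Hypothetical trip iterator: strip `Ok`, stop at the first `Err`,
    /// //   and remember that error.
    /// struct Trip<'a, I, E> {
    ///     inner: I,
    ///     err: &'a mut Option<E>,
    /// }
    ///
    /// impl<'a, I, T, E> Iterator for Trip<'a, I, E>
    /// where
    ///     I: Iterator<Item = Result<T, E>>,
    /// {
    ///     type Item = T;
    ///
    ///     fn next(&mut self) -> Option<T> {
    ///         if self.err.is_some() {
    ///             return None;
    ///         }
    ///
    ///         match self.inner.next()? {
    ///             Ok(x) => Some(x),
    ///             Err(e) => {
    ///                 *self.err = Some(e);
    ///                 None
    ///             }
    ///         }
    ///     }
    /// }
    ///
    /// // Upper stage: decimal digits, failing on any other character.
    /// fn lex(input: &str) -> impl Iterator<Item = Result<u32, char>> + '_ {
    ///     input.chars().map(|c| c.to_digit(10).ok_or(c))
    /// }
    ///
    /// // Lower stage: running sums over the stripped upper output,
    /// //   returned along with any error that tripped the pipeline.
    /// fn lower_sum(input: &str) -> (Vec<u32>, Option<char>) {
    ///     let mut err = None;
    ///     let trip = Trip { inner: lex(input), err: &mut err };
    ///
    ///     let sums = trip
    ///         .scan(0, |acc, d| {
    ///             *acc += d;
    ///             Some(*acc)
    ///         })
    ///         .collect();
    ///
    ///     (sums, err)
    /// }
    /// ```
    ///
    /// With this sketch,
    ///   `lower_sum("12x4")` produces `(vec![1, 3], Some('x'))`:
    ///     lowering stops at `x`,
    ///     and the `4` after it is never read.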
    #[inline]
    pub fn lower_while_ok<LS, U>(
        &mut self,
        f: impl FnOnce(&mut LowerIter<S, I, LS>) -> U,
    ) -> Result<U, ParseError<S::Token, S::Error>>
    where
        LS: ParseState<Token = S::Object>,
        <S as ParseState>::Object: Token,
    {
        self.while_ok(|toks| {
            // TODO: This parser is not accessible after error recovery!
            let lower = LS::parse(iter::empty());
            f(&mut LowerIter { lower, toks })
        })
    }
}

/// An IR lowering operation that pipes the output of one [`Parser`] to the
///   input of another.
///
/// This is produced by [`Parser::lower_while_ok`].
pub struct LowerIter<'a, 'b, S, I, LS>
where
    S: ParseState,
    I: TokenStream<S::Token>,
    LS: ParseState<Token = S::Object>,
    <S as ParseState>::Object: Token,
{
    /// A push [`Parser`].
    lower: Parser<LS, Empty<LS::Token>>,

    /// Source tokens from higher-level [`Parser`],
    ///   with the outer [`Result`] having been stripped by a [`TripIter`].
    toks: &'a mut TripIter<
        'b,
        Parser<S, I>,
        Parsed<S::Object>,
        ParseError<S::Token, S::Error>,
    >,
}

impl<'a, 'b, S, I, LS> Iterator for LowerIter<'a, 'b, S, I, LS>
where
    S: ParseState,
    I: TokenStream<S::Token>,
    LS: ParseState<Token = S::Object>,
    <S as ParseState>::Object: Token,
{
    type Item = ParsedResult<LS>;

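    /// Pull a token through the higher-level [`Parser`],
    ///   push any resulting object to the lowering parser,
    ///   and yield the resulting [`ParsedResult`].
    ///
    /// If the higher-level parser yields [`Parsed::Incomplete`],
    ///   nothing is pushed and [`Parsed::Incomplete`] is yielded as-is.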
    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        match self.toks.next() {
            None => None,
            Some(Parsed::Incomplete) => Some(Ok(Parsed::Incomplete)),
            Some(Parsed::Object(obj)) => Some(self.lower.feed_tok(obj)),
        }
    }
}

impl<S: ParseState, I: TokenStream<S::Token>> Iterator for Parser<S, I> {
    type Item = ParsedResult<S>;

    /// Parse a single [`Token`] according to the current
    ///   [`ParseState`],
    ///     if available.
    ///
    /// If the underlying [`TokenStream`] yields [`None`],
    ///   then the [`ParseState`] must be in an accepting state;
    ///     otherwise, [`ParseError::UnexpectedEof`] will occur.
    ///
    /// Each token pulled from the [`TokenStream`] is passed to
    ///   `feed_tok`,
    ///     which records its [`Span`][crate::span::Span] before parsing
    ///     so that a later [`ParseError::UnexpectedEof`] can report the
    ///     location where input ended.
    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        let otok = self.toks.next();

        match otok {
            None => match self.assert_accepting() {
                Ok(()) => None,
                Err(e) => Some(Err(e)),
            },

            Some(tok) => Some(self.feed_tok(tok)),
        }
    }
}

/// Common parsing errors produced by [`Parser`].
///
/// These errors are common enough that they are handled in a common way,
///   such that individual parsers needn't check for these situations
///   themselves.
///
/// Having a common type also allows combinators to handle error types in a
///   consistent way when composing parsers.
///
/// Parsers may return their own unique errors via the
///   [`StateError`][ParseError::StateError] variant.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError<T: Token, E: Error + PartialEq + Eq> {
    /// Token stream ended unexpectedly.
    ///
    /// This error means that the parser was expecting more input before
    ///   reaching an accepting state.
    /// This could represent a truncated file,
    ///   a malformed stream,
    ///   or maybe just a user that's not done typing yet
    ///     (e.g. in the case of an LSP implementation).
    ///
    /// If no span is available,
    ///   then parsing has not even had the chance to begin.
    /// If this parser follows another,
    ///   then the combinator ought to substitute a missing span with
    ///   whatever span preceded this invocation.
    UnexpectedEof(Option<Span>),

    /// The parser reached an unhandled dead state.
    ///
    /// Once a parser returns [`ParseStatus::Dead`],
    ///   a parent context must use that provided token as a lookahead.
    /// If that does not occur,
    ///   [`Parser`] produces this error.
    ///
    /// In the future,
    ///   it may be desirable to be able to query [`ParseState`] for what
    ///   tokens are acceptable at this point,
    ///     to provide better error messages.
    UnexpectedToken(T),

    /// A parser-specific error associated with an inner
    ///   [`ParseState`].
    StateError(E),
}

impl<T: Token, EA: Error + PartialEq + Eq> ParseError<T, EA> {
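    /// Convert the error type of the
    ///   [`StateError`](ParseError::StateError) variant using [`Into`],
    ///     passing all other variants through unchanged.
    ///
    /// This allows the errors of one [`ParseState`] to be reported in
    ///   terms of another error type.
    /// The self-contained sketch below mirrors that mapping with
    ///   hypothetical types
    ///     (`Wrapped`, `ChildErr`, and `ParentErr` are illustrative only):
    ///
    /// ```rust
    /// #[derive(Debug, PartialEq, Eq)]
    /// enum Wrapped<E> {
    ///     UnexpectedEof(Option<usize>),
    ///     StateError(E),
    /// }
    ///
    /// impl<EA> Wrapped<EA> {
    ///     fn inner_into<EB>(self) -> Wrapped<EB>
    ///     where
    ///         EA: Into<EB>,
    ///     {
    ///         match self {
    ///             Wrapped::UnexpectedEof(x) => Wrapped::UnexpectedEof(x),
    ///             Wrapped::StateError(e) => Wrapped::StateError(e.into()),
    ///         }
    ///     }
    /// }
    ///
    /// #[derive(Debug, PartialEq, Eq)]
    /// struct ChildErr(u8);
    ///
    /// // A parent error type that embeds the child's errors.
    /// #[derive(Debug, PartialEq, Eq)]
    /// enum ParentErr {
    ///     Child(ChildErr),
    /// }
    ///
    /// impl From<ChildErr> for ParentErr {
    ///     fn from(e: ChildErr) -> Self {
    ///         ParentErr::Child(e)
    ///     }
    /// }
    /// ```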
    pub fn inner_into<EB: Error + PartialEq + Eq>(self) -> ParseError<T, EB>
    where
        EA: Into<EB>,
    {
        use ParseError::*;
        match self {
            UnexpectedEof(x) => UnexpectedEof(x),
            UnexpectedToken(x) => UnexpectedToken(x),
            StateError(e) => StateError(e.into()),
        }
    }
}

impl<T: Token, E: Error + PartialEq + Eq> From<E> for ParseError<T, E> {
    fn from(e: E) -> Self {
        Self::StateError(e)
    }
}

impl<T: Token, E: Error + PartialEq + Eq> Display for ParseError<T, E> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::UnexpectedEof(ospan) => {
                write!(f, "unexpected end of input at ")?;

                match ospan {
                    None => write!(f, "<unknown location>"),
                    Some(span) => write!(f, "{}", span),
                }
            }
            Self::UnexpectedToken(tok) => {
                write!(f, "unexpected {}", tok)
            }
            Self::StateError(e) => Display::fmt(e, f),
        }
    }
}

impl<T: Token, E: Error + PartialEq + Eq + 'static> Error for ParseError<T, E> {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::StateError(e) => Some(e),
            _ => None,
        }
    }
}

impl<S: ParseState, I: TokenStream<S::Token>> From<I> for Parser<S, I> {
    fn from(toks: I) -> Self {
        Self {
            toks,
            state: Default::default(),
            last_span: None,
        }
    }
}

/// Result of a parsing operation.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseStatus<T, O> {
    /// Additional tokens are needed to complete parsing of the next object.
    Incomplete,

    /// Parsing of an object is complete.
    ///
    /// This does not indicate that the parser is complete,
    ///   as more objects may be able to be emitted.
    Object(O),

    /// Parser encountered a dead state relative to the given token.
    ///
    /// A dead state is an empty accepting state that has no state
    ///   transition for the given token.
    /// A state is empty if a [`ParseStatus::Object`] will not be lost if
    ///   parsing ends at this point
    ///     (that is---there is no partially-built object).
    /// This could simply mean that the parser has completed its job and
    ///   that control must be returned to a parent context.
    ///
    /// If a parser is _not_ in an accepting state,
    ///   then an error ought to occur rather than a dead state;
    ///     the difference between the two is that the token associated with
    ///       a dead state can be used as a lookahead token in order to
    ///       produce a state transition at a higher level,
    ///     whereas an error indicates that parsing has failed.
    /// Intuitively,
    ///   this means that a [`ParseStatus::Object`] had just been emitted
    ///   and that the token following it isn't something that can be
    ///   parsed.
    ///
    /// If there is no parent context to handle the token,
    ///   [`Parser`] must yield an error.
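    ///
    /// A hypothetical sketch of a parent context honoring that contract
    ///   (`Status`, `child`, `Item`, and `parent` are illustrative only):
    ///     when the child reports `Dead`,
    ///     the parent takes over and reprocesses the returned token as
    ///       lookahead rather than discarding it.
    ///
    /// ```rust
    /// // Simplified stand-in for `ParseStatus` (`Incomplete` omitted).
    /// #[derive(Debug, PartialEq, Eq)]
    /// enum Status<T, O> {
    ///     Object(O),
    ///     Dead(T),
    /// }
    ///
    /// // Child: one object per digit; dead on anything else.
    /// fn child(tok: char) -> Status<char, u32> {
    ///     match tok.to_digit(10) {
    ///         Some(d) => Status::Object(d),
    ///         None => Status::Dead(tok),
    ///     }
    /// }
    ///
    /// #[derive(Debug, PartialEq, Eq)]
    /// enum Item {
    ///     Digit(u32),
    ///     Letter(char),
    /// }
    ///
    /// // Parent: digits via `child`, then letters handled by the parent.
    /// fn parent(input: &str) -> Result<Vec<Item>, char> {
    ///     let mut out = Vec::new();
    ///     let mut in_child = true;
    ///
    ///     for tok in input.chars() {
    ///         let tok = if in_child {
    ///             match child(tok) {
    ///                 Status::Object(d) => {
    ///                     out.push(Item::Digit(d));
    ///                     continue;
    ///                 }
    ///                 // Reuse the returned token as lookahead.
    ///                 Status::Dead(lookahead) => {
    ///                     in_child = false;
    ///                     lookahead
    ///                 }
    ///             }
    ///         } else {
    ///             tok
    ///         };
    ///
    ///         // With no context above the parent,
    ///         //   an unhandled token is an error.
    ///         if tok.is_alphabetic() {
    ///             out.push(Item::Letter(tok));
    ///         } else {
    ///             return Err(tok);
    ///         }
    ///     }
    ///
    ///     Ok(out)
    /// }
    /// ```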
    Dead(T),
}

/// Result of a parsing operation.
///
/// Whereas [`ParseStatus`] is used by [`ParseState`] to influence parser
///   operation,
///     this type is public-facing and used by [`Parser`].
#[derive(Debug, PartialEq, Eq)]
pub enum Parsed<O> {
    /// Additional tokens are needed to complete parsing of the next object.
    Incomplete,

    /// Parsing of an object is complete.
    ///
    /// This does not indicate that the parser is complete,
    ///   as more objects may be able to be emitted.
    Object(O),
}

impl<T: Token, O> From<ParseStatus<T, O>> for Parsed<O> {
    fn from(status: ParseStatus<T, O>) -> Self {
        match status {
            ParseStatus::Incomplete => Parsed::Incomplete,
            ParseStatus::Object(x) => Parsed::Object(x),
            ParseStatus::Dead(_) => {
                unreachable!("Dead status must be filtered by Parser")
            }
        }
    }
}

#[cfg(test)]
pub mod test {
    use std::{assert_matches::assert_matches, iter::once};

    use super::*;
    use crate::{span::DUMMY_SPAN as DS, sym::GlobalSymbolIntern};

    #[derive(Debug, PartialEq, Eq, Clone)]
    enum TestToken {
        Close(Span),
        Comment(Span),
        Text(Span),
    }

    impl Display for TestToken {
        fn fmt(&self, _f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
            unimplemented!("fmt::Display")
        }
    }

    impl Token for TestToken {
        fn span(&self) -> Span {
            use TestToken::*;
            match self {
                Close(span) | Comment(span) | Text(span) => *span,
            }
        }
    }

    #[derive(Debug, PartialEq, Eq)]
    enum EchoState {
        Empty,
        Done,
    }

    impl Default for EchoState {
        fn default() -> Self {
            Self::Empty
        }
    }

    impl ParseState for EchoState {
        type Token = TestToken;
        type Object = TestToken;
        type Error = EchoStateError;

        fn parse_token(self, tok: TestToken) -> TransitionResult<Self> {
            match tok {
                TestToken::Comment(..) => Transition(Self::Done).with(tok),
                TestToken::Close(..) => {
                    Transition(self).err(EchoStateError::InnerError(tok))
                }
                TestToken::Text(..) => Transition(self).dead(tok),
            }
        }

        fn is_accepting(&self) -> bool {
            *self == Self::Done
        }
    }

    #[derive(Debug, PartialEq, Eq)]
    enum EchoStateError {
        InnerError(TestToken),
    }

    impl Display for EchoStateError {
        fn fmt(&self, _: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
            unimplemented!()
        }
    }

    impl Error for EchoStateError {
        fn source(&self) -> Option<&(dyn Error + 'static)> {
            None
        }
    }

    type Sut<I> = Parser<EchoState, I>;

    #[test]
    fn successful_parse_in_accepting_state_with_spans() {
        // EchoState is placed into a Done state given Comment.
        let tok = TestToken::Comment(DS);
        let mut toks = once(tok.clone());

        let mut sut = Sut::from(&mut toks);

        // The first token should be processed normally.
        // EchoState proxies the token back.
        assert_eq!(Some(Ok(Parsed::Object(tok))), sut.next());

        // This is now the end of the token stream,
        //   which should be okay provided that the first token put us into
        //   a proper accepting state.
        assert_eq!(None, sut.next());

        // Further, finalizing should work in this state.
        assert!(sut.finalize().is_ok());
    }

    #[test]
    fn fails_on_end_of_stream_when_not_in_accepting_state() {
        let span = Span::new(10, 20, "ctx".intern());
        let mut toks = [TestToken::Close(span)].into_iter();

        let mut sut = Sut::from(&mut toks);

        // The first token produces a state error,
        //   which we ignore here;
        //     it serves only to record our most recent span.
        sut.next();

        // Given that we have no more tokens,
        //   and that EchoState::default does not start in an accepting
        //     state
        //       (and the erroring token did not change that),
        //   we must fail when we encounter the end of the stream.
        assert_eq!(
            Some(Err(ParseError::UnexpectedEof(span.endpoints().1))),
            sut.next()
        );
    }

    #[test]
    fn returns_state_specific_error() {
        // TestToken::Close causes EchoState to produce an error.
        let errtok = TestToken::Close(DS);
        let mut toks = [errtok.clone()].into_iter();

        let mut sut = Sut::from(&mut toks);

        assert_eq!(
            Some(Err(ParseError::StateError(EchoStateError::InnerError(
                errtok
            )))),
            sut.next()
        );

        // The token must have been consumed.
        // It is up to a recovery process to either bail out or provide
        //   recovery tokens;
        //     continuing without recovery is unlikely to make sense.
        assert_eq!(0, toks.len());
    }

    #[test]
    fn fails_when_parser_is_finalized_in_non_accepting_state() {
        let span = Span::new(10, 10, "ctx".intern());

        // Set up so that we have a single token that we can use for
        //   recovery as part of the same iterator.
        let recovery = TestToken::Comment(DS);
        let mut toks = [
            // Used purely to populate a Span.
            TestToken::Close(span),
            // Recovery token here:
            recovery.clone(),
        ]
        .into_iter();

        let mut sut = Sut::from(&mut toks);

        // Populate our most recently seen token's span.
        sut.next();

        // Attempting to finalize now in a non-accepting state should fail
        //   in the same way that encountering an end-of-stream does,
        //     since we're effectively saying "we're done with the stream"
        //     and the parser will have no further opportunity to reach an
        //     accepting state.
        let result = sut.finalize();
        assert_matches!(
            result,
            Err((_, ParseError::UnexpectedEof(s))) if s == span.endpoints().1
        );

        // The sut should have been re-returned,
        //   allowing for attempted error recovery if the caller can manage
        //   to produce a sequence of tokens that will be considered valid.
        // `toks` above is set up already for this,
        //   which allows us to assert that we received back the same `sut`.
        let mut sut = result.unwrap_err().0;
        assert_eq!(Some(Ok(Parsed::Object(recovery))), sut.next());

        // And so we should now be in an accepting state,
        //   able to finalize.
        assert!(sut.finalize().is_ok());
    }

    #[test]
    fn unhandled_dead_state_results_in_error() {
        // A Text will cause our parser to return Dead.
        let tok = TestToken::Text(DS);
        let mut toks = once(tok.clone());

        let mut sut = Sut::from(&mut toks);

        // Our parser returns a Dead status,
        //   which is unhandled by any parent context
        //     (since we're not composing parsers),
        //     which causes an error due to an unhandled Dead state.
        assert_eq!(Some(Err(ParseError::UnexpectedToken(tok))), sut.next());
    }
}
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								// Basic streaming parsing framework
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								//
 								//  Copyright (C) 2014-2021 Ryan Specialty Group, LLC.
 								//
 								//  This file is part of TAME.
 								//
 								//  This program is free software: you can redistribute it and/or modify
 								//  it under the terms of the GNU General Public License as published by
 								//  the Free Software Foundation, either version 3 of the License, or
 								//  (at your option) any later version.
 								//
 								//  This program is distributed in the hope that it will be useful,
 								//  but WITHOUT ANY WARRANTY; without even the implied warranty of
 								//  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 								//  GNU General Public License for more details.
 								//
 								//  You should have received a copy of the GNU General Public License
 								//  along with this program.  If not, see <http://www.gnu.org/licenses/>.
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								//! Basic streaming parser framework for lowering operations.
 								//!
 								//! _TODO: Some proper docs and examples!_
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is important for providing an API at the
proper level of abstraction for actually understanding what the
system is doing.
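One piece of that wiring can be illustrated with a sketch that hands a lowering step only the `Ok` values of a fallible token stream and stops at the first error. The `while_ok` helper is hypothetical and does not reproduce the signature of TAMER's `lower_while_ok`. It only assumes that lowering can be expressed as a function over an infallible iterator.

```rust
/// Hypothetical helper: passes `lower` an infallible iterator over the `Ok`
///   items of `toks`, ending at the first `Err`, which is then returned in
///   place of `lower`'s result.
fn while_ok<T, E, R>(
    toks: impl Iterator<Item = Result<T, E>>,
    lower: impl FnOnce(&mut dyn Iterator<Item = T>) -> R,
) -> Result<R, E> {
    let mut first_err = None;

    let lowered = {
        // Yield `Ok` values; on the first `Err`, stash it and end the stream.
        let mut oks = toks.map_while(|tok| match tok {
            Ok(t) => Some(t),
            Err(e) => {
                first_err = Some(e);
                None
            }
        });
        lower(&mut oks)
    };

    match first_err {
        Some(e) => Err(e),
        None => Ok(lowered),
    }
}

fn main() {
    let toks = vec![Ok::<i32, &str>(1), Ok(2), Err("unexpected token"), Ok(4)];
    let result = while_ok(toks.into_iter(), |oks| oks.map(|n| n * 10).collect::<Vec<i32>>());
    println!("{:?}", result); // prints: Err("unexpected token")
}
```

An error is only observed if `lower` pulls far enough to reach it. A fuller implementation would also need to decide what happens to tokens that `lower` leaves unread.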

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863

											2022-03-23 14:25:04 -04:00
+								use crate::iter::{TripIter, TrippableIterator};
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses it when reporting EOF, so that the
user can be told exactly where the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value is supplemented by their
own last span.
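Here is a minimal sketch of the EOF-span idea. It assumes a hypothetical `Span` of byte offsets and a hypothetical `ParseError::UnexpectedEof`. The wrapper records the span of every token it yields, so running out of input can be reported at the last known location rather than with no location at all.

```rust
/// Hypothetical stand-in for TAMER's `Span`: a byte range in some source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    offset: usize,
    len: usize,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    /// Input ended too early; carries the span of the last token seen
    ///   (`None` if the input was empty).
    UnexpectedEof(Option<Span>),
}

/// Wraps a token stream and remembers the span of the most recent token.
struct SpanTracking<I> {
    toks: I,
    last_span: Option<Span>,
}

impl<I: Iterator<Item = (char, Span)>> SpanTracking<I> {
    fn new(toks: I) -> Self {
        Self { toks, last_span: None }
    }

    /// Yields the next token, or an EOF error located at the last span seen.
    fn next_or_eof(&mut self) -> Result<char, ParseError> {
        match self.toks.next() {
            Some((c, span)) => {
                self.last_span = Some(span);
                Ok(c)
            }
            None => Err(ParseError::UnexpectedEof(self.last_span)),
        }
    }
}

/// A toy grammar that reads the first two tokens of its input.
fn take_two(toks: impl Iterator<Item = (char, Span)>) -> Result<(char, char), ParseError> {
    let mut p = SpanTracking::new(toks);
    let a = p.next_or_eof()?;
    let b = p.next_or_eof()?;
    Ok((a, b))
}

fn main() {
    let one_tok = vec![('a', Span { offset: 0, len: 1 })];
    // The error carries the span of 'a', the last token before input ended.
    println!("{:?}", take_two(one_tok.into_iter()));
}
```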

DEV-11268

											2021-12-06 15:34:29 -05:00
+								use crate::span::Span;
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is
stricter than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.
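The following sketch illustrates this notion of a dead state under some simplifying assumptions. The tokens, `AttrState`, and `Transition` variants are hypothetical. The attribute parser requires at least one attribute only so that it has both a non-accepting and an accepting state. In the accepting state, a token with no transition is handed back unconsumed (`Dead`) for the parent to use as its single token of lookahead. In the non-accepting state, the same token is simply an error.

```rust
/// Hypothetical input tokens for this sketch (not XIR's token type).
#[derive(Debug)]
enum Tok {
    Open(&'static str),
    Attr(&'static str),
    Text(&'static str),
}

/// Result of feeding one token to the child (attribute) parser.
enum Transition<S> {
    /// The token was consumed; continue in the given state.
    Next(S),
    /// Accepting state with no transition for this token: the token is
    ///   returned unconsumed so that a parent may treat it as lookahead.
    Dead(S, Tok),
    /// Non-accepting state with no transition: an error, with no backtracking.
    Err(Tok),
}

/// Child parser.  Requires at least one attribute (an assumption made
///   only so that both kinds of state exist).
enum AttrState {
    /// Not accepting: no attribute seen yet.
    NeedAttr,
    /// Accepting: one or more attributes seen.
    Attrs(Vec<&'static str>),
}

impl AttrState {
    fn parse(self, tok: Tok) -> Transition<Self> {
        match (self, tok) {
            (AttrState::NeedAttr, Tok::Attr(a)) => Transition::Next(AttrState::Attrs(vec![a])),
            (AttrState::Attrs(mut v), Tok::Attr(a)) => {
                v.push(a);
                Transition::Next(AttrState::Attrs(v))
            }
            (st @ AttrState::Attrs(_), other) => Transition::Dead(st, other),
            (AttrState::NeedAttr, other) => Transition::Err(other),
        }
    }
}

/// Parent parser: `Open`, then attributes via `AttrState`, then `Text`.
fn parse_elem(
    toks: Vec<Tok>,
) -> Result<(&'static str, Vec<&'static str>, &'static str), String> {
    let mut toks = toks.into_iter();
    let name = match toks.next() {
        Some(Tok::Open(name)) => name,
        other => return Err(format!("expected Open, found {:?}", other)),
    };

    let mut attrs = AttrState::NeedAttr;
    for tok in toks {
        attrs = match attrs.parse(tok) {
            Transition::Next(next) => next,
            // The child is done; the token it declined is our lookahead.
            Transition::Dead(AttrState::Attrs(v), Tok::Text(text)) => return Ok((name, v, text)),
            Transition::Dead(_, tok) | Transition::Err(tok) => {
                return Err(format!("unexpected {:?}", tok))
            }
        };
    }
    Err("unexpected end of input".to_string())
}

fn main() {
    let toks = vec![Tok::Open("e"), Tok::Attr("a"), Tok::Text("hi")];
    println!("{:?}", parse_elem(toks));
}
```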

This was done because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies; I had just hoped that
those inconsistencies wouldn't manifest as practical issues.

Well, now they have, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											2021-12-16 09:44:02 -05:00
+								use std::fmt::Debug;
-												tamer: parse::Parser (lower_while_ok): New method
											2022-03-23 14:25:04 -04:00
+								use std::iter::{self, Empty};
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision led to code that was much more
difficult to understand than necessary, and it ended up being a net negative
by leading to more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also makes possible something that I wasn't able to do
before: continuing to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting in the way, the previous state coupling
issues with the parent parser go away.
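Here is a rough sketch of the owned-state style described above. It assumes hypothetical definitions of `ParseState` and `Transition`, and TAMER's real types differ. In the sketch, `parse_token` consumes the current state and must return the next one, and the small constructors on `Transition` stand in for the convenience methods. The `PairSum` automaton and the `run` driver exist only for illustration.

```rust
/// Hypothetical trait: `parse_token` takes the state by value and must hand
///   back the next state along with the token's result.
trait ParseState: Sized {
    type Token;
    type Object;
    type Error;

    fn parse_token(self, tok: Self::Token) -> Transition<Self>;
}

/// The next state, paired with what (if anything) the token produced.
struct Transition<S: ParseState>(S, Result<Option<S::Object>, S::Error>);

impl<S: ParseState> Transition<S> {
    /// Move to `state`, emitting `obj`.
    fn ok(state: S, obj: S::Object) -> Self {
        Transition(state, Ok(Some(obj)))
    }

    /// Move to `state` without emitting anything yet.
    fn incomplete(state: S) -> Self {
        Transition(state, Ok(None))
    }

    /// Remain in (or move to) `state`, reporting `err`.
    fn err(state: S, err: S::Error) -> Self {
        Transition(state, Err(err))
    }
}

/// Example automaton: emits the sum of each consecutive pair of tokens.
enum PairSum {
    Empty,
    Half(i32),
}

impl ParseState for PairSum {
    type Token = i32;
    type Object = i32;
    type Error = String;

    fn parse_token(self, tok: i32) -> Transition<Self> {
        if tok < 0 {
            return Transition::err(self, format!("negative token {}", tok));
        }
        match self {
            PairSum::Empty => Transition::incomplete(PairSum::Half(tok)),
            PairSum::Half(first) => Transition::ok(PairSum::Empty, first + tok),
        }
    }
}

/// Threads the owned state through each token, collecting emitted objects.
/// (End-of-input handling, e.g. a dangling `Half`, is omitted for brevity.)
fn run<S: ParseState>(
    mut state: S,
    toks: impl IntoIterator<Item = S::Token>,
) -> Result<Vec<S::Object>, S::Error> {
    let mut out = Vec::new();
    for tok in toks {
        let Transition(next, result) = state.parse_token(tok);
        state = next;
        if let Some(obj) = result? {
            out.push(obj);
        }
    }
    Ok(out)
}

fn main() {
    println!("{:?}", run(PairSum::Empty, vec![1, 2, 3, 4]));
}
```

Because every call returns a fresh state, a parent automaton can embed a child's state in one of its own variants and re-wrap whatever the child returns, without sharing a mutable reference.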

DEV-10863

											2022-03-17 15:50:35 -04:00
+								use std::mem::take;
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								use std::{error::Error, fmt::Display};
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more information.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											2021-12-10 15:39:59 -05:00
+								/// Result of applying a [`Token`] to a [`ParseState`],
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								///   with any error having been wrapped in a [`ParseError`].
-												tamer: xir::tree: {TokenStream=>ParseState}
											2021-12-10 15:39:59 -05:00
+								pub type ParsedResult<S> = ParseResult<S, Parsed<<S as ParseState>::Object>>;
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
-												tamer: xir::tree::parse: Remove TokenStreamParser trait

This just leaves Parser, which is what I started with, but I wasn't sure how
far I was going to take this.  I went against my usual judgment in creating
a trait that I may not need, in an attempt to try to reason about the API
that I wanted, because it wasn't yet clear at the time whether the Parser
ought to be generic.

Since then (as detailed in the last commit), this has become more of a
coordinator/mediator, and the real parser is actually TokenStreamState,
which will be renamed shortly.

DEV-11268

											2021-12-10 14:58:44 -05:00
+								/// Result of some non-parsing operation on a [`Parser`],
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								///   with any error having been wrapped in a [`ParseError`].
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863

											2022-03-18 15:26:05 -04:00
+								pub type ParseResult<S, T> =
 								    Result<T, ParseError<<S as ParseState>::Token, <S as ParseState>::Error>>;
 								/// A single datum from a streaming IR with an associated [`Span`].
 								///
 								/// A token may be a lexeme with associated data,
 								///   or a more structured object having been lowered from other IRs.
 								pub trait Token: Display + Debug + PartialEq + Eq {
 								    /// Retrieve the [`Span`] representing the source location of the token.
 								    fn span(&self) -> Span;
 								}
 								impl<T: Token> From<T> for Span {
 								    fn from(tok: T) -> Self {
 								        tok.span()
 								    }
 								}
 								/// An infallible [`Token`] stream.
 								///
 								/// If the token stream originates from an operation that could potentially
 								///   fail and ought to be propagated,
 								///     use [`TokenResultStream`].
 								///
 								/// The name "stream" in place of "iterator" is intended to convey that this
 								///   type is expected to be processed in real-time as a stream,
 								///     not read into memory.
 								pub trait TokenStream<T: Token> = Iterator<Item = T>;
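For illustration, the sketch below implements a trait of the same shape as `Token` for a hypothetical `Word` type and uses the blanket `From<T> for Span` conversion. `Span` here is a hypothetical stand-in with made-up fields. `TokenStream` is a trait alias, which is an unstable feature that requires a nightly compiler, so the sketch writes the equivalent `Iterator<Item = T>` bound directly.

```rust
use std::fmt::{self, Debug, Display};

/// Hypothetical stand-in for TAMER's `Span`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    offset: u32,
    len: u16,
}

/// Same shape as the `Token` trait above.
trait Token: Display + Debug + PartialEq + Eq {
    fn span(&self) -> Span;
}

impl<T: Token> From<T> for Span {
    fn from(tok: T) -> Self {
        tok.span()
    }
}

/// A hypothetical token type: a word and its source location.
#[derive(Debug, PartialEq, Eq)]
struct Word {
    text: String,
    span: Span,
}

impl Display for Word {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "word `{}`", self.text)
    }
}

impl Token for Word {
    fn span(&self) -> Span {
        self.span
    }
}

/// Accepts any `Iterator<Item = T>`, which is all that the `TokenStream<T>`
///   alias requires.  Returns the span of the final token, if any.
fn last_span<T: Token>(toks: impl Iterator<Item = T>) -> Option<Span> {
    toks.last().map(Span::from)
}

fn main() {
    let toks = vec![
        Word { text: "hello".into(), span: Span { offset: 0, len: 5 } },
        Word { text: "world".into(), span: Span { offset: 6, len: 5 } },
    ];
    println!("{}", toks[1]);
    println!("{:?}", last_span(toks.into_iter()));
}
```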
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
-												tamer: {xir::=>}parse: Move parser out of XIR
											2022-03-18 16:24:53 -04:00
+								/// A [`Token`] stream that may encounter errors during parsing.
 								///
 								/// If the stream cannot fail,
 								///   consider using [`TokenStream`].
 								pub trait TokenResultStream<T: Token, E: Error> = Iterator<Item = Result<T, E>>;
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								/// A deterministic parsing automaton.
 								///
-												tamer: xir::tree::parse: Remove TokenStreamParser trait
											2021-12-10 14:58:44 -05:00
+								/// These states are utilized by a [`Parser`].
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this commit simply
introduces my work thus far so that I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
     automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.
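
The minimal Rust sketch below illustrates the second and third points. The primitive parser is an explicit automaton: the `St` enum plus a transition `match`. It lowers a XIR-like token stream directly into IR objects as tokens arrive, without building a tree first. `XirTok`, `Ir`, `St`, and `Lower` are hypothetical names invented for this sketch; they are not TAMER types.

```rust
// Hypothetical XIR-like input tokens; not TAMER's `xir::Token`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum XirTok {
    Open(&'static str),
    Attr(&'static str, &'static str),
    Close,
}

// Hypothetical target IR object.
#[derive(Debug, PartialEq, Eq)]
pub enum Ir {
    Package { name: &'static str },
}

// States of an explicit automaton for `<package name="...">...</package>`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum St {
    Empty,
    InPackage,
    Done,
}

/// Lowers tokens into IR objects one token at a time; no tree is built.
pub struct Lower<I: Iterator<Item = XirTok>> {
    toks: I,
    st: St,
}

impl<I: Iterator<Item = XirTok>> Lower<I> {
    pub fn new(toks: I) -> Self {
        Self { toks, st: St::Empty }
    }
}

impl<I: Iterator<Item = XirTok>> Iterator for Lower<I> {
    type Item = Result<Ir, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull tokens until one of them completes an object or fails.
        loop {
            let tok = self.toks.next()?;
            match (self.st, tok) {
                (St::Empty, XirTok::Open("package")) => self.st = St::InPackage,
                // The IR object is emitted as soon as its data are known.
                (St::InPackage, XirTok::Attr("name", name)) => {
                    return Some(Ok(Ir::Package { name }))
                }
                (St::InPackage, XirTok::Close) => self.st = St::Done,
                (st, other) => {
                    return Some(Err(format!("unexpected {:?} in state {:?}", other, st)))
                }
            }
        }
    }
}
```

`Lower` is itself an `Iterator`, so it can be chained onto another lowering step. That is the streaming pipeline shape argued for above. Yielding errors in-band, rather than aborting the stream, is an assumption of this sketch.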

											2021-12-06 11:26:53 -05:00
+								///
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more information.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											2021-12-10 15:39:59 -05:00
+								/// A [`ParseState`] is also responsible for storing data about the
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								///   accepted input,
 								///     and handling appropriate type conversions into the final type.
 								/// That is---an
 								///   automaton may store metadata that is subsequently emitted once an
 								///   accepting state has been reached.
 								/// Whatever the underlying automaton,
 								///   a `(state, token)` pair must uniquely determine the next parser
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by
their own last span.

DEV-11268
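
Below is a minimal sketch of this technique. It assumes a hypothetical `Span` made of byte offsets and a toy automaton, `EvenCount`, that accepts only after an even number of tokens. The driver records the span of each token it consumes. If the stream ends while the automaton is not accepting, the error carries the last recorded span instead of no location at all. `Driver` and `ParseError` are illustrative stand-ins, not TAMER's definitions.

```rust
/// Hypothetical span: byte offsets `[start, end)` into some source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span(pub u32, pub u32);

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    /// Input ended in a non-accepting state.  Carries the span of the last
    /// token seen, or `None` if the stream was empty.
    UnexpectedEof(Option<Span>),
}

/// Toy automaton: accepting only after an even number of tokens.
#[derive(Debug, Default)]
struct EvenCount(usize);

impl EvenCount {
    fn is_accepting(&self) -> bool {
        self.0 % 2 == 0
    }
}

/// Driver that remembers the most recent span for EOF reporting.
pub struct Driver<I: Iterator<Item = (char, Span)>> {
    toks: I,
    state: EvenCount,
    last_span: Option<Span>,
    finished: bool,
}

impl<I: Iterator<Item = (char, Span)>> Driver<I> {
    pub fn new(toks: I) -> Self {
        Self { toks, state: EvenCount::default(), last_span: None, finished: false }
    }
}

impl<I: Iterator<Item = (char, Span)>> Iterator for Driver<I> {
    type Item = Result<char, ParseError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.finished {
            return None;
        }
        match self.toks.next() {
            Some((c, span)) => {
                // Remember where we are before doing anything else.
                self.last_span = Some(span);
                self.state.0 += 1;
                Some(Ok(c))
            }
            None => {
                self.finished = true;
                if self.state.is_accepting() {
                    None
                } else {
                    // Point the user at the last token actually seen.
                    Some(Err(ParseError::UnexpectedEof(self.last_span)))
                }
            }
        }
    }
}
```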

											2021-12-06 15:34:29 -05:00
+								///   action.
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								///
 								/// Intuitively,
-												tamer: xir::tree::parse: Remove TokenStreamParser trait
											2021-12-10 14:58:44 -05:00
+								///   since only one [`Parser`] may hold a mutable reference to
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								///   an underlying [`TokenStream`] at any given point,
 								///   this does in fact represent the current state of the entire
 								///     [`TokenStream`] at the current position for a given parser
 								///     composition.
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept; here, a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

I chose to make a dead state accepting for a simple reason: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

This was done because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies; I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now they have, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268
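
The sketch below shows the composition described above, using simplified, hypothetical types (`Tok`, `Step`, `AttrState`, `Elem`) rather than TAMER's. The attribute automaton returns `Step::Dead` only from its accepting state, handing the unrecognized token back to the parent. The parent then re-dispatches that token in its own next state. From a non-accepting state, the same token is simply an error. The composition needs exactly one token of lookahead, never backtracks, and has no `AttrEnd` delimiter.

```rust
// Hypothetical XIR-like tokens for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    AttrName(&'static str),
    AttrValue(&'static str),
    Child(&'static str),
    Close,
}

type Attr = (&'static str, &'static str);

/// Outcome of feeding one token to the child (attribute) automaton.
#[derive(Debug, PartialEq, Eq)]
pub enum Step<S> {
    /// Token consumed; continue in the given state.
    Next(S),
    /// Accepting state with no transition for this token: the token is
    /// handed back so that the parent can use it (one token of lookahead).
    Dead(S, Tok),
    /// Non-accepting state and an unexpected token: simply an error.
    Fail(Tok),
}

#[derive(Debug, PartialEq, Eq)]
pub enum AttrState {
    /// Accepting: between attributes.
    Ready(Vec<Attr>),
    /// Not accepting: an attribute name was seen; its value must follow.
    AfterName(Vec<Attr>, &'static str),
}

impl AttrState {
    pub fn parse_token(self, tok: Tok) -> Step<Self> {
        match (self, tok) {
            (AttrState::Ready(attrs), Tok::AttrName(n)) => {
                Step::Next(AttrState::AfterName(attrs, n))
            }
            (AttrState::AfterName(mut attrs, n), Tok::AttrValue(v)) => {
                attrs.push((n, v));
                Step::Next(AttrState::Ready(attrs))
            }
            // Accepting but no transition: dead, so hand the token back.
            (st @ AttrState::Ready(_), other) => Step::Dead(st, other),
            // Not accepting: no backtracking, just fail.
            (AttrState::AfterName(..), other) => Step::Fail(other),
        }
    }
}

/// Parent automaton: attributes (delegated to `AttrState`), then child
/// names, then `Close`.
#[derive(Debug, PartialEq, Eq)]
pub enum Elem {
    Attrs(AttrState),
    Children(Vec<Attr>, Vec<&'static str>),
    Done(Vec<Attr>, Vec<&'static str>),
}

impl Elem {
    pub fn parse_token(self, tok: Tok) -> Result<Self, Tok> {
        match self {
            Elem::Attrs(st) => match st.parse_token(tok) {
                Step::Next(st) => Ok(Elem::Attrs(st)),
                // The child is done; re-dispatch the same token in the
                // parent's next state instead of requiring a delimiter.
                Step::Dead(AttrState::Ready(attrs), tok) => {
                    Elem::Children(attrs, Vec::new()).parse_token(tok)
                }
                Step::Dead(AttrState::AfterName(..), tok) | Step::Fail(tok) => Err(tok),
            },
            Elem::Children(attrs, mut kids) => match tok {
                Tok::Child(name) => {
                    kids.push(name);
                    Ok(Elem::Children(attrs, kids))
                }
                Tok::Close => Ok(Elem::Done(attrs, kids)),
                other => Err(other),
            },
            Elem::Done(..) => Err(tok),
        }
    }
}
```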

											2021-12-16 09:44:02 -05:00
+								pub trait ParseState: Default + PartialEq + Eq + Debug {
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863
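
The sketch below shows what an associated input token type makes possible. It uses a hypothetical, pared-down `State` trait rather than the real `ParseState`. One generic driver, `run`, works for automata over different token types, and the objects produced by one automaton can serve as the tokens of the next. `Digits` and `Pairs` are toy examples invented for this sketch.

```rust
/// Simplified, hypothetical stand-in for a parse state with an associated
/// input token type.  Not TAMER's actual `ParseState`.
pub trait State: Default {
    /// Input tokens accepted by this automaton.
    type Token;
    /// Objects it produces.
    type Object;
    /// Consume one token, returning the next state and possibly an object.
    fn step(self, tok: Self::Token) -> (Self, Option<Self::Object>);
    fn is_accepting(&self) -> bool;
}

/// Generic driver: works for any `State`, whatever its token type.
/// Returns `None` if input ends in a non-accepting state.
pub fn run<S, I>(toks: I) -> Option<Vec<S::Object>>
where
    S: State,
    I: IntoIterator<Item = S::Token>,
{
    let mut st = S::default();
    let mut out = Vec::new();
    for tok in toks {
        let (next, obj) = st.step(tok);
        st = next;
        out.extend(obj);
    }
    if st.is_accepting() { Some(out) } else { None }
}

/// Works on characters (lexeme-like tokens): emits each decimal digit.
#[derive(Default)]
pub struct Digits;

impl State for Digits {
    type Token = char;
    type Object = u32;
    fn step(self, tok: char) -> (Self, Option<u32>) {
        (self, tok.to_digit(10))
    }
    fn is_accepting(&self) -> bool {
        true
    }
}

/// Works on already-lowered numbers: sums them in pairs.
#[derive(Default)]
pub struct Pairs(Option<u32>);

impl State for Pairs {
    type Token = u32;
    type Object = u32;
    fn step(self, tok: u32) -> (Self, Option<u32>) {
        match self.0 {
            None => (Pairs(Some(tok)), None),
            Some(a) => (Pairs(None), Some(a + tok)),
        }
    }
    fn is_accepting(&self) -> bool {
        // Accepting only between pairs.
        self.0.is_none()
    }
}
```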

											2022-03-18 15:26:05 -04:00
+								    /// Input tokens to the parser.
 								    type Token: Token;
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								    /// Objects produced by a parser utilizing these states.
 								    type Object;
 								    /// Errors specific to this set of states.
-												tamer: xir::tree: Integrate AttrParserState into Stack
											2021-12-16 09:44:02 -05:00
+								    type Error: Error + PartialEq + Eq;
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
 								    /// Construct a parser.
 								    ///
 								    /// Whether this method is helpful or provides any clarity depends on
 								    ///   the context and the types that are able to be inferred.
-												tamer: xir::parse: Generalize input token type
											2022-03-18 15:26:05 -04:00
+								    fn parse<I: TokenStream<Self::Token>>(toks: I) -> Parser<Self, I> {
-												tamer: xir::tree::parse: Remove TokenStreamParser trait
											2021-12-10 14:58:44 -05:00
+								        Parser::from(toks)
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								    }
 								    /// Parse a single [`Token`] and optionally perform a state transition.
 								    ///
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision led to code that was much more
difficult than necessary to understand, and ended up having a net negative
benefit by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also opens up the possibility of doing something that I wasn't able to
do before: continuing to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting in the way, the previous state
coupling issues with the parent parser go away.

DEV-10863
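
The sketch below shows the owned-state style described above. `parse_token` takes `self` by value and must hand back the next state wrapped in `Transition`, together with an object, an incomplete status, or an error. The `Transition`, `Status`, and `TransitionResult` shapes are assumptions loosely modeled on this commit message, not TAMER's exact API. `NumState` is a toy automaton.

```rust
/// The next state, wrapped so that every transition is explicit.
#[derive(Debug, PartialEq, Eq)]
pub struct Transition<S>(pub S);

/// What a single token produced.
#[derive(Debug, PartialEq, Eq)]
pub enum Status<O> {
    /// More input is needed before an object can be emitted.
    Incomplete,
    /// A complete object.
    Object(O),
}

/// Next state paired with the result of this step.
pub type TransitionResult<S, O, E> = (Transition<S>, Result<Status<O>, E>);

impl<S> Transition<S> {
    pub fn incomplete<O, E>(self) -> TransitionResult<S, O, E> {
        (self, Ok(Status::Incomplete))
    }
    pub fn ok<O, E>(self, obj: O) -> TransitionResult<S, O, E> {
        (self, Ok(Status::Object(obj)))
    }
    pub fn err<O, E>(self, e: E) -> TransitionResult<S, O, E> {
        (self, Err(e))
    }
}

/// Parses `;`-terminated decimal numbers from a stream of characters
/// (overflow is not handled in this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum NumState {
    Empty,
    Digits(u32),
}

impl NumState {
    /// `self` is owned: each arm must name the state it moves to.
    pub fn parse_token(self, tok: char) -> TransitionResult<Self, u32, char> {
        use NumState::{Digits, Empty};
        match (self, tok.to_digit(10)) {
            (Empty, Some(d)) => Transition(Digits(d)).incomplete(),
            (Digits(n), Some(d)) => Transition(Digits(n * 10 + d)).incomplete(),
            (Digits(n), None) if tok == ';' => Transition(Empty).ok(n),
            // Any other character is an error; stay in the current state.
            (st, None) => Transition(st).err(tok),
        }
    }
}
```

The old state is moved into `parse_token`, so there is no `&mut self` to leave half-updated. Every arm has to name the state it transitions to.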

											2022-03-17 15:50:35 -04:00
+								    /// The current state is represented by `self`.
 								    /// The result of a parsing operation is a state transition with
 								    ///   associated [`ParseStatus`] data.
 								    ///
 								    /// Note that `self` is owned,
 								    ///   for two primary reasons:
 								    ///
 								    ///   1. This forces the parser to explicitly consider and document all
 								    ///        state transitions,
 								    ///          rather than potentially missing unintended behavior through
 								    ///          implicit behavior; and
 								    ///   2. It allows for more natural functional composition of state,
 								    ///        which in turn makes it easier to compose parsers
 								    ///          (which conceptually involves stitching together state
 								    ///            machines).
-												tamer: xir::parse: Generalize input token type
											2022-03-18 15:26:05 -04:00
+								    fn parse_token(self, tok: Self::Token) -> TransitionResult<Self>;
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
 								    /// Whether the current state represents an accepting state.
 								    ///
 								    /// An accepting state represents a valid state to stop parsing.
 								    /// If parsing stops at a state that is _not_ accepting,
 								    ///   then the [`TokenStream`] has ended unexpectedly and should produce
 								    ///   a [`ParseError::UnexpectedEof`].
 								    ///
 								    /// It makes sense for there to exist multiple accepting states for a
 								    ///   parser.
 								    /// For example:
 								    ///   A parser that parses a list of attributes may be used to parse one
 								    ///   or more attributes,
 								    ///     or the entire list of attributes.
 								    ///   It is acceptable to attempt to parse just one of those attributes,
 								    ///     or it is acceptable to parse all the way until the end.
 								    fn is_accepting(&self) -> bool;
 								}
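
/*
A minimal, self-contained sketch of the accepting-state rule above,
assuming a hypothetical attribute-list grammar of name/value pairs.
`Tok`, `AttrListState`, `Attr` and `parse_attrs` are illustrative names
only and are not part of this module. Stopping between attributes is
valid (including after zero attributes); stopping after a name but before
its value is the unexpected-EOF case.

```rust
// Hypothetical tokens: an attribute name or an attribute value.
#[derive(Debug, PartialEq, Eq)]
enum Tok {
    Name(&'static str),
    Value(&'static str),
}

// Hypothetical parser state for a list of name/value attributes.
#[derive(Debug, PartialEq, Eq)]
enum AttrListState {
    // Between attributes: a valid place to stop.
    Ready,
    // A name has been read, but not yet its value.
    AwaitingValue(&'static str),
}

impl AttrListState {
    // Only the state between complete attributes is accepting.
    fn is_accepting(&self) -> bool {
        matches!(self, AttrListState::Ready)
    }
}

// A parsed attribute: (name, value).
type Attr = (&'static str, &'static str);

// Parse every token, collecting complete attributes.
// The returned flag reports whether parsing stopped in an accepting state;
//   `false` is the case that should become an unexpected-EOF error.
fn parse_attrs(toks: Vec<Tok>) -> Result<(Vec<Attr>, bool), String> {
    let mut state = AttrListState::Ready;
    let mut attrs = Vec::new();

    for tok in toks {
        state = match (state, tok) {
            (AttrListState::Ready, Tok::Name(name)) => AttrListState::AwaitingValue(name),
            (AttrListState::AwaitingValue(name), Tok::Value(value)) => {
                attrs.push((name, value));
                AttrListState::Ready
            }
            (st, tok) => return Err(format!("unexpected {:?} in state {:?}", tok, st)),
        };
    }

    Ok((attrs, state.is_accepting()))
}
```

For instance, `parse_attrs(vec![Tok::Name("a")])` yields `Ok((vec![], false))`:
no token was rejected, but the input ended before an accepting state was
reached.
*/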

/// Result of applying a [`Token`] to a [`ParseState`].
///
/// This is used by [`ParseState::parse_token`];
///   see that function for rationale.
pub type ParseStateResult<S> = Result<
    ParseStatus<<S as ParseState>::Token, <S as ParseState>::Object>,
    <S as ParseState>::Error,
>;

/// Denotes a state transition.
///
/// This newtype was created to produce clear, self-documenting code;
///   parsers can get confusing to read with all of the types involved,
///     so this provides a mental synchronization point.
///
/// This also provides some convenience methods to help remove boilerplate
///   and further improve code clarity.
#[derive(Debug, PartialEq, Eq)]
pub struct Transition<S: ParseState>(pub S);

impl<S: ParseState> Transition<S> {
    /// A state transition with corresponding data.
    ///
    /// This allows [`ParseState::parse_token`] to emit a parsed object and
    ///   corresponds to [`ParseStatus::Object`].
    pub fn with(self, obj: S::Object) -> (Self, ParseStateResult<S>) {
        (self, Ok(ParseStatus::Object(obj)))
    }

    /// A state transition indicating that more data is needed before an
    ///   object can be emitted.
    ///
    /// This corresponds to [`ParseStatus::Incomplete`].
    pub fn incomplete(self) -> (Self, ParseStateResult<S>) {
        (self, Ok(ParseStatus::Incomplete))
    }

    /// A dead state transition.
    ///
    /// This corresponds to [`ParseStatus::Dead`],
    ///   and a calling parser should use the provided [`Token`] as
    ///   lookahead.
    pub fn dead(self, tok: S::Token) -> (Self, ParseStateResult<S>) {
        (self, Ok(ParseStatus::Dead(tok)))
    }

    /// A transition with corresponding error.
    ///
    /// This indicates a parsing failure.
    /// The state ought to be suitable for error recovery.
    pub fn err<E: Into<S::Error>>(self, err: E) -> (Self, ParseStateResult<S>) {
        (self, Err(err.into()))
    }
}

/// A state transition with associated data.
///
/// Conceptually,
///   imagine the act of a state transition producing data.
/// See [`Transition`] for convenience methods for producing this tuple.
pub type TransitionResult<S> = (Transition<S>, ParseStateResult<S>);
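
/*
The owned-state style that `Transition` supports can be sketched in
isolation. This is a simplified, self-contained model and not this
module's definitions: `Status`, `MiniState`, `StepResult`, `Step` and the
`Summer` state are hypothetical stand-ins for `ParseStatus`, `ParseState`,
`ParseStateResult`, `Transition` and a real parser state, and tokens and
objects are plain integers. `parse_token` takes the state by value and must
hand back the next state alongside the outcome of the step.

```rust
// Simplified stand-in for `ParseStatus` (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum Status<T, O> {
    // More input is needed before an object can be emitted.
    Incomplete,
    // A complete object was parsed.
    Object(O),
    // This parser cannot accept the token; the caller may use it as lookahead.
    Dead(T),
}

// Simplified stand-in for `ParseState` (hypothetical).
trait MiniState: Sized {
    type Token;
    type Object;
    type Error;

    // Takes the current state by value and returns the next one,
    //   together with the outcome of this step.
    fn parse_token(self, tok: Self::Token) -> (Step<Self>, StepResult<Self>);
}

// Simplified stand-in for `ParseStateResult` (hypothetical).
type StepResult<S> = Result<
    Status<<S as MiniState>::Token, <S as MiniState>::Object>,
    <S as MiniState>::Error,
>;

// Simplified stand-in for `Transition` (hypothetical).
struct Step<S: MiniState>(S);

impl<S: MiniState> Step<S> {
    fn with(self, obj: S::Object) -> (Self, StepResult<S>) {
        (self, Ok(Status::Object(obj)))
    }

    fn incomplete(self) -> (Self, StepResult<S>) {
        (self, Ok(Status::Incomplete))
    }

    fn dead(self, tok: S::Token) -> (Self, StepResult<S>) {
        (self, Ok(Status::Dead(tok)))
    }

    fn err<E: Into<S::Error>>(self, err: E) -> (Self, StepResult<S>) {
        (self, Err(err.into()))
    }
}

// Hypothetical state that emits the sum of each pair of positive integers.
// A negative integer is not for this parser (dead); zero is an error.
#[derive(Debug, PartialEq, Eq)]
enum Summer {
    Empty,
    Half(i32),
}

impl MiniState for Summer {
    type Token = i32;
    type Object = i32;
    type Error = String;

    fn parse_token(self, tok: i32) -> (Step<Self>, StepResult<Self>) {
        match (self, tok) {
            (st, t) if t < 0 => Step(st).dead(t),
            (st, 0) => Step(st).err("zero is not allowed"),
            (Summer::Empty, t) => Step(Summer::Half(t)).incomplete(),
            (Summer::Half(a), t) => Step(Summer::Empty).with(a + t),
        }
    }
}
```

Each arm reads as "move to this state, producing that result", which is the
kind of self-documenting code the newtype is meant to encourage. A caller
receiving `Status::Dead(tok)` could pass `tok` to another parser as
lookahead; the sketch only returns it.
*/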

/// A streaming parser defined by a [`ParseState`] with exclusive
///   mutable access to an underlying [`TokenStream`].
///
/// This parser handles operations that are common among all types of
///   parsers,
///     such that specialized parsers need only implement logic that is
///     unique to their operation.
/// This also simplifies combinators,
///   since there is more uniformity among distinct parser types.
///
/// After you have finished with a parser,
///   if you have not consumed the entire iterator,
///   call [`finalize`](Parser::finalize) to ensure that parsing has
///     completed in an accepting state.
#[derive(Debug, PartialEq, Eq)]
pub struct Parser<S: ParseState, I: TokenStream<S::Token>> {
    toks: I,
    state: S,
    last_span: Option<Span>,
}
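
/*
A hedged sketch of how a driver holding the three fields above (`toks`,
`state`, `last_span`) might pull tokens, assuming a token stream behaves
like an `Iterator` over tokens. `MiniParser`, `Tok` and `Span` are
hypothetical simplifications, and the "state" is only a counter. Because
the token source is a type parameter rather than a borrowed reference, the
same driver accepts either an owned iterator or `&mut` to one, as Rust's
own iterator adapters do; borrowing lets the caller continue with whatever
tokens the parser did not consume.

```rust
// Hypothetical source location: byte offset and length.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    offset: usize,
    len: usize,
}

// Hypothetical token: a word together with its span.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Tok {
    word: &'static str,
    span: Span,
}

// Hypothetical driver mirroring the fields of `Parser`:
//   the token source, the current state, and the last span seen.
// The state is simplified to "number of tokens parsed so far".
struct MiniParser<I: Iterator<Item = Tok>> {
    toks: I,
    state: usize,
    last_span: Option<Span>,
}

impl<I: Iterator<Item = Tok>> MiniParser<I> {
    fn new(toks: I) -> Self {
        Self { toks, state: 0, last_span: None }
    }
}

// Pull-based interface: each call to `next` pulls one token from `toks`,
//   records its span, and yields the parsed object (the uppercased word).
impl<I: Iterator<Item = Tok>> Iterator for MiniParser<I> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let tok = self.toks.next()?;
        self.last_span = Some(tok.span);
        self.state += 1;
        Some(tok.word.to_uppercase())
    }
}
```

Recording `last_span` on every pull is what later allows an end-of-input
error to point at a real location rather than having no span at all.
*/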

impl<S: ParseState, I: TokenStream<S::Token>> Parser<S, I> {
    /// Indicate that no further parsing will take place using this parser,
    ///   and [`drop`] it.
    ///
    /// Invoking this method is equivalent to stating that the stream has
    ///   ended,
    ///     since the parser will have no later opportunity to continue
    ///     parsing.
    /// Consequently,
    ///   the caller should expect [`ParseError::UnexpectedEof`] if the
    ///   parser is not in an accepting state.
    pub fn finalize(
        self,
    ) -> Result<(), (Self, ParseError<S::Token, S::Error>)> {
        self.assert_accepting().map_err(|err| (self, err))
    }

    /// Return [`Ok`] if the parser is in an accepting state,
    ///   otherwise [`Err`] with [`ParseError::UnexpectedEof`].
    ///
    /// See [`finalize`](Self::finalize) for the public-facing method.
    fn assert_accepting(&self) -> Result<(), ParseError<S::Token, S::Error>> {
        if self.state.is_accepting() {
            Ok(())
        } else {
            let span = self.last_span.and_then(|s| s.endpoints().1);
-												tamer: parse::Parser: Extract logic from Iterator impl

This introduces a (still-private) way to _push_ tokens into the parser,
rather than relying purely on a pull-based interface.  Not only does this
simplify the iterator, but this is also preparing to make the new `feed_tok`
public so that parsers can be composed in more contexts.  I suspect that
this method may also be useful for error recovery, since it can be used to
inject tokens into arbitrary points of a token stream.

I kept the new method private for now so that I can introduce the new API
and docs separate from this refactoring.

DEV-10863
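The relationship between the two interfaces can be sketched with a toy parser. This is an illustration under assumptions, not TAMER's `Parser`; `PairSum`, `PairError` and the integer tokens are hypothetical. The point is that `Iterator::next` (pull) only fetches a token and hands it to `feed_tok` (push), so the push path comes for free.

```rust
/// Outcome of feeding one token (mirrors the shape of `Parsed`).
#[derive(Debug, PartialEq, Eq)]
pub enum Parsed<T> {
    Incomplete,
    Object(T),
}

#[derive(Debug, PartialEq, Eq)]
pub enum PairError {
    UnexpectedEof,
}

/// Hypothetical parser that pairs up integers and emits each pair's sum.
pub struct PairSum<I> {
    pending: Option<i32>,
    toks: I,
}

impl<I: Iterator<Item = i32>> PairSum<I> {
    pub fn new(toks: I) -> Self {
        PairSum { pending: None, toks }
    }

    /// Push interface: the caller hands the parser one token directly.
    fn feed_tok(&mut self, tok: i32) -> Result<Parsed<i32>, PairError> {
        match self.pending.take() {
            None => {
                self.pending = Some(tok);
                Ok(Parsed::Incomplete)
            }
            Some(first) => Ok(Parsed::Object(first + tok)),
        }
    }
}

/// Pull interface: built entirely on top of `feed_tok`.
impl<I: Iterator<Item = i32>> Iterator for PairSum<I> {
    type Item = Result<Parsed<i32>, PairError>;

    fn next(&mut self) -> Option<Self::Item> {
        match self.toks.next() {
            // EOF with half a pair buffered is an error; clear the buffer
            // so that the error is reported only once.
            None if self.pending.is_some() => {
                self.pending = None;
                Some(Err(PairError::UnexpectedEof))
            }
            None => None,
            Some(tok) => Some(self.feed_tok(tok)),
        }
    }
}
```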

											
										
										
											2022-03-22 10:10:59 -04:00
+								            Err(ParseError::UnexpectedEof(span))
 								        }
 								    }
 								    /// Feed an input token to the parser.
 								    ///
 								    /// This _pushes_ data into the parser,
 								    ///   rather than the typical pull system used by [`Parser`]'s
 								    ///   [`Iterator`] implementation.
 								    /// The pull system also uses this method to provide data to the
 								    ///   parser.
-												tamer: parse::Parser (lower_while_ok): New method

This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is important for providing an API at the
proper level of abstraction for understanding what the system is actually
doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes `Parser::feed_tok` private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863
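The wiring that `lower_while_ok` hides can be approximated as follows. This is a simplified sketch that assumes a hypothetical `PushParser` trait and omits the `TripIter` error stripping entirely. Objects from the upstream iterator become tokens for a push parser, while `Incomplete` passes through unchanged.

```rust
/// Hypothetical stand-in for `Parsed`.
#[derive(Debug, PartialEq, Eq)]
pub enum Parsed<T> {
    Incomplete,
    Object(T),
}

/// Hypothetical push-parser interface for the lower (target) IR.
pub trait PushParser {
    type Token;
    type Object;
    fn feed_tok(&mut self, tok: Self::Token) -> Parsed<Self::Object>;
}

/// Pipes each object from an upstream parser into a push parser.
/// Error handling (the role of `TripIter`) is deliberately omitted.
pub struct LowerIter<I, L> {
    toks: I,
    lower: L,
}

pub fn lower_with<I, L>(toks: I, lower: L) -> LowerIter<I, L> {
    LowerIter { toks, lower }
}

impl<I, L> Iterator for LowerIter<I, L>
where
    L: PushParser,
    I: Iterator<Item = Parsed<L::Token>>,
{
    type Item = Parsed<L::Object>;

    fn next(&mut self) -> Option<Self::Item> {
        match self.toks.next()? {
            // Upstream made progress without emitting; nothing to lower yet.
            Parsed::Incomplete => Some(Parsed::Incomplete),
            // Upstream output is the lower parser's input.
            Parsed::Object(obj) => Some(self.lower.feed_tok(obj)),
        }
    }
}

/// Example lower parser: emits a running total of its input.
pub struct RunningTotal(pub i32);

impl PushParser for RunningTotal {
    type Token = i32;
    type Object = i32;

    fn feed_tok(&mut self, tok: i32) -> Parsed<i32> {
        self.0 += tok;
        Parsed::Object(self.0)
    }
}
```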

											
										
										
											2022-03-23 14:25:04 -04:00
+								    ///
 								    /// This method is intentionally private,
 								    ///   since push parsers are currently supported only internally.
 								    /// The only thing preventing this being public is formalization and a
 								    ///   commitment to maintain it.
 								    fn feed_tok(&mut self, tok: S::Token) -> ParsedResult<S> {
-												tamer: parse::Parser: Extract logic from Iterator impl

											
										
										
											2022-03-22 10:10:59 -04:00
+								        // Store the most recently encountered Span for error
 								        //   reporting in case we encounter an EOF.
 								        self.last_span = Some(tok.span());
 								        let result;
 								        (Transition(self.state), result) =
 								            take(&mut self.state).parse_token(tok);
 								        use ParseStatus::*;
 								        match result {
 								            // Nothing handled this dead state,
 								            //   and we cannot discard a lookahead token,
 								            //   so we have no choice but to produce an error.
 								            Ok(Dead(invalid)) => Err(ParseError::UnexpectedToken(invalid)),
 								            Ok(parsed @ (Incomplete | Object(..))) => Ok(parsed.into()),
 								            Err(e) => Err(e.into()),
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
+								        }
 								    }
-												tamer: parse::Parser (lower_while_ok): New method

											
										
										
											2022-03-23 14:25:04 -04:00
 								    /// Lower the IR produced by this [`Parser`] into another IR by piping
 								    ///   the output to a new parser defined by the [`ParseState`] `LS`.
 								    ///
 								    /// This parser consumes tokens `S::Token` and produces the IR
 								    ///   `S::Object`.
 								    /// If there is some other [`ParseState`] `LS` such that
 								    ///   `LS::Token == S::Object`
 								    ///     (that is, the output of this parser is the input to another),
 								    ///     then this method will wire the two together into a new iterator
 								    ///       that produces `LS::Object`.
 								    ///
 								    /// Visually, we have,
 								    ///   within the provided closure `f`,
 								    ///   a [`LowerIter`] that acts as this pipeline:
 								    ///
 								    /// ```text
 								    /// (S::Token) -> (S::Object == LS::Token) -> (LS::Object)
 								    /// ```
 								    ///
 								    /// The new iterator is a [`LowerIter`],
 								    ///   and scoped to the provided closure `f`.
 								    /// The outer [`Result`] of `Self`'s [`ParsedResult`] is stripped by
 								    ///   a [`TripIter`] before being provided as input to a new push
 								    ///   [`Parser`] utilizing `LS`.
 								    /// A push parser,
 								    ///   rather than pulling tokens from a [`TokenStream`],
 								    ///   has tokens pushed into it;
 								    ///     this parser is created automatically for you.
 								    ///
 								    /// _TODO_: There's no way to access the inner parser for error recovery
 								    ///   after tripping the [`TripIter`].
 								    /// Consequently,
 								    ///   this API (likely the return type) will change.
 								    #[inline]
 								    pub fn lower_while_ok<LS, U>(
 								        &mut self,
 								        f: impl FnOnce(&mut LowerIter<S, I, LS>) -> U,
 								    ) -> Result<U, ParseError<S::Token, S::Error>>
 								    where
 								        LS: ParseState<Token = S::Object>,
 								        <S as ParseState>::Object: Token,
 								    {
 								        self.while_ok(|toks| {
 								            // TODO: This parser is not accessible after error recovery!
 								            let lower = LS::parse(iter::empty());
 								            f(&mut LowerIter { lower, toks })
 								        })
 								    }
 								}
 								/// An IR lowering operation that pipes the output of one [`Parser`] to the
 								///   input of another.
 								///
 								/// This is produced by [`Parser::lower_while_ok`].
 								pub struct LowerIter<'a, 'b, S, I, LS>
 								where
 								    S: ParseState,
 								    I: TokenStream<S::Token>,
 								    LS: ParseState<Token = S::Object>,
 								    <S as ParseState>::Object: Token,
 								{
 								    /// A push [`Parser`].
 								    lower: Parser<LS, Empty<LS::Token>>,
 								    /// Source tokens from higher-level [`Parser`],
 								    ///   with the outer [`Result`] having been stripped by a [`TripIter`].
 								    toks: &'a mut TripIter<
 								        'b,
 								        Parser<S, I>,
 								        Parsed<S::Object>,
 								        ParseError<S::Token, S::Error>,
 								    >,
 								}
 								impl<'a, 'b, S, I, LS> Iterator for LowerIter<'a, 'b, S, I, LS>
 								where
 								    S: ParseState,
 								    I: TokenStream<S::Token>,
 								    LS: ParseState<Token = S::Object>,
 								    <S as ParseState>::Object: Token,
 								{
 								    type Item = ParsedResult<LS>;
 								    /// Pull a token through the higher-level [`Parser`],
 								    ///   push it to the lowering parser,
 								    ///   and yield the resulting [`ParseResult`].
 								    #[inline]
 								    fn next(&mut self) -> Option<Self::Item> {
 								        match self.toks.next() {
 								            None => None,
 								            Some(Parsed::Incomplete) => Some(Ok(Parsed::Incomplete)),
 								            Some(Parsed::Object(obj)) => Some(self.lower.feed_tok(obj)),
 								        }
 								    }
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
+								}
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863
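Generalizing the input token amounts to making it an associated type of the state trait. The sketch below is hypothetical and much smaller than TAMER's `ParseState`: it has no errors and no `Parsed` status. It shows a single driver that works both for a state over `char` tokens and for one over `u32` tokens, and that the output of one can feed the other.

```rust
/// Hypothetical, heavily simplified state trait whose input token type is
/// an associated type rather than being fixed to one IR.
pub trait ParseState: Default {
    type Token;
    type Object;

    /// Consume the current state and a token, returning the next state and
    /// any object produced.
    fn parse_token(self, tok: Self::Token) -> (Self, Option<Self::Object>);
}

/// Drive any `ParseState` over a matching token stream, collecting objects.
pub fn run<S, I>(toks: I) -> Vec<S::Object>
where
    S: ParseState,
    I: IntoIterator<Item = S::Token>,
{
    let mut state = S::default();
    let mut out = Vec::new();
    for tok in toks {
        let (next, obj) = state.parse_token(tok);
        state = next;
        out.extend(obj);
    }
    out
}

/// Works on `char` tokens: emits the value of each decimal digit.
#[derive(Default)]
pub struct Digits;

impl ParseState for Digits {
    type Token = char;
    type Object = u32;

    fn parse_token(self, tok: char) -> (Self, Option<u32>) {
        (self, tok.to_digit(10))
    }
}

/// Works on `u32` tokens (e.g. the output of `Digits`): emits running sums.
#[derive(Default)]
pub struct Sum(u32);

impl ParseState for Sum {
    type Token = u32;
    type Object = u32;

    fn parse_token(self, tok: u32) -> (Self, Option<u32>) {
        let total = self.0 + tok;
        (Sum(total), Some(total))
    }
}
```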

											
										
										
											2022-03-18 15:26:05 -04:00
+								impl<S: ParseState, I: TokenStream<S::Token>> Iterator for Parser<S, I> {
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more information.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											
										
										
											2021-12-10 15:39:59 -05:00
+								    type Item = ParsedResult<S>;
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses it when reporting EOF, so that the
user can be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value is supplemented by their
own last span.

DEV-11268
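Here is a minimal sketch of the technique, using hypothetical `Span`, `Tok` and `Balanced` types rather than TAMER's. The parser records the span of every token it consumes. If the input ends in a non-accepting state, it attaches the last recorded span to the EOF error.

```rust
/// Hypothetical span type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub offset: u32,
    pub len: u32,
}

/// Hypothetical tokens, each carrying its source span.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tok {
    Open(Span),
    Close(Span),
}

impl Tok {
    pub fn span(&self) -> Span {
        match *self {
            Tok::Open(s) | Tok::Close(s) => s,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    UnexpectedToken(Tok),
    /// Carries the last span seen, if any, so the user can be pointed at it.
    UnexpectedEof(Option<Span>),
}

/// Hypothetical parser checking that `Open`/`Close` tokens balance.
pub struct Balanced<I> {
    toks: I,
    depth: usize,
    last_span: Option<Span>,
    done: bool,
}

pub fn balanced<I: Iterator<Item = Tok>>(toks: I) -> Balanced<I> {
    Balanced { toks, depth: 0, last_span: None, done: false }
}

impl<I: Iterator<Item = Tok>> Iterator for Balanced<I> {
    type Item = Result<(), ParseError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        match self.toks.next() {
            None => {
                self.done = true;
                // Only the depth-0 state is accepting.
                if self.depth == 0 {
                    None
                } else {
                    Some(Err(ParseError::UnexpectedEof(self.last_span)))
                }
            }
            Some(tok) => {
                // Record the span before parsing, in case the stream ends
                // before this construct is closed.
                self.last_span = Some(tok.span());
                match tok {
                    Tok::Open(_) => {
                        self.depth += 1;
                        Some(Ok(()))
                    }
                    Tok::Close(_) if self.depth > 0 => {
                        self.depth -= 1;
                        Some(Ok(()))
                    }
                    Tok::Close(_) => Some(Err(ParseError::UnexpectedToken(tok))),
                }
            }
        }
    }
}
```

The `done` flag ensures that the EOF error is yielded once, not on every later call to `next`.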

											
										
										
											2021-12-06 15:34:29 -05:00
+								    /// Parse a single [`Token`] according to the current
-												tamer: xir::tree: {TokenStream=>ParseState}

											
										
										
											2021-12-10 15:39:59 -05:00
+								    ///   [`ParseState`],
-												tamer: xir::tree::parse: EOF span

											
										
										
											2021-12-06 15:34:29 -05:00
+								    ///     if available.
 								    ///
 								    /// If the underlying [`TokenStream`] yields [`None`],
-												tamer: xir::tree: {TokenStream=>ParseState}

											
										
										
											2021-12-10 15:39:59 -05:00
+								    ///   then the [`ParseState`] must be in an accepting state;
-												tamer: xir::tree::parse: EOF span

											
										
										
											2021-12-06 15:34:29 -05:00
+								    ///     otherwise, [`ParseError::UnexpectedEof`] will occur.
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
+								    ///
-												tamer: xir::tree::parse: EOF span

											
										
										
											2021-12-06 15:34:29 -05:00
+								    /// This is intended to be invoked by [`Iterator::next`].
 								    /// Accepting a token rather than the [`TokenStream`] allows the caller
 								    ///   to inspect the token first
 								    ///     (e.g. to store a copy of the [`Span`][crate::span::Span]).
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
+								    #[inline]
 								    fn next(&mut self) -> Option<Self::Item> {
-												tamer: xir::tree::parse: EOF span

											
										
										
											2021-12-06 15:34:29 -05:00
+								        let otok = self.toks.next();
 								        match otok {
-												tamer: parse::Parser: Extract logic from Iterator impl

											
										
										
											2022-03-22 10:10:59 -04:00
+								            None => match self.assert_accepting() {
 								                Ok(()) => None,
 								                Err(e) => Some(Err(e)),
 								            },
 								            Some(tok) => Some(self.feed_tok(tok)),
-												tamer: xir::tree::parse: EOF span

											
										
										
											2021-12-06 15:34:29 -05:00
+								        }
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
+								    }
 								}
-												tamer: xir::tree::parse: Remove TokenStreamParser trait

This just leaves Parser, which is what I started with, but I wasn't sure how
far I was going to take this.  I went against my usual judgment in creating
a trait that I may not need, in an attempt to reason about the API that I
wanted, because it wasn't yet clear at the time whether the Parser ought to
be generic.

Since then (as detailed in the last commit), this has become more of a
coordinator/mediator, and the real parser is actually TokenStreamState,
which will be renamed shortly.
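A minimal sketch of that split, assuming a hypothetical `StreamState` trait that stands in for the state half (it is not TAMER's actual trait or signatures).  The `Parser` here only pulls tokens, threads the state through each transition, and turns end-of-input in a non-accepting state into an error.  `Pairs`, `DriveError`, and all method names are illustrative.

```rust
use std::mem::take;

/// Hypothetical stand-in for the "real parser": a single-token transition
/// function plus a query for whether the current state is accepting.
pub trait StreamState: Default {
    type Token;
    type Object;
    type Error;

    fn step(self, tok: Self::Token) -> Result<(Self, Option<Self::Object>), Self::Error>;
    fn is_accepting(&self) -> bool;
}

#[derive(Debug, PartialEq, Eq)]
pub enum DriveError<E> {
    UnexpectedEof,
    State(E),
}

/// The coordinator: owns the token source and the current state, but knows
/// nothing about any particular grammar.
pub struct Parser<S, I> {
    toks: I,
    state: S,
    done: bool,
}

impl<S: StreamState, I: Iterator<Item = S::Token>> Parser<S, I> {
    pub fn new(toks: I) -> Self {
        Self { toks, state: S::default(), done: false }
    }
}

impl<S: StreamState, I: Iterator<Item = S::Token>> Iterator for Parser<S, I> {
    type Item = Result<S::Object, DriveError<S::Error>>;

    fn next(&mut self) -> Option<Self::Item> {
        while !self.done {
            match self.toks.next() {
                // End of input is only acceptable in an accepting state.
                None => {
                    self.done = true;
                    if !self.state.is_accepting() {
                        return Some(Err(DriveError::UnexpectedEof));
                    }
                }
                // Move the state out, transition, and put the result back.
                Some(tok) => match take(&mut self.state).step(tok) {
                    Ok((next, obj)) => {
                        self.state = next;
                        if let Some(obj) = obj {
                            return Some(Ok(obj));
                        }
                    }
                    Err(e) => {
                        self.done = true;
                        return Some(Err(DriveError::State(e)));
                    }
                },
            }
        }
        None
    }
}

/// Example state: emits the sum of each pair of numbers; zero is rejected.
#[derive(Debug)]
pub enum Pairs {
    Empty,
    Half(u32),
}

impl Default for Pairs {
    fn default() -> Self {
        Pairs::Empty
    }
}

impl StreamState for Pairs {
    type Token = u32;
    type Object = u32;
    type Error = &'static str;

    fn step(self, tok: u32) -> Result<(Self, Option<u32>), &'static str> {
        match (self, tok) {
            (_, 0) => Err("zero is not a valid token"),
            (Pairs::Empty, n) => Ok((Pairs::Half(n), None)),
            (Pairs::Half(a), b) => Ok((Pairs::Empty, Some(a + b))),
        }
    }

    fn is_accepting(&self) -> bool {
        matches!(self, Pairs::Empty)
    }
}
```

Keeping the grammar entirely in the state type is what lets the coordinator stay non-generic in behavior: all it ever needs is `step` and `is_accepting`.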

DEV-11268

											2021-12-10 14:58:44 -05:00
+								/// Common parsing errors produced by [`Parser`].
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
+								///
 								/// These errors are common enough that they are handled in a common way,
 								///   such that individual parsers needn't check for these situations
 								///   themselves.
 								///
 								/// Having a common type also allows combinators to handle error types in a
 								///   consistent way when composing parsers.
 								///
 								/// Parsers may return their own unique errors via the
 								///   [`StateError`][ParseError::StateError] variant.
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where a dead state is defined here as an _accepting_
state that has no state transitions for the given input token.  This is
stricter than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

I chose to make a Dead state accepting for a simple reason: it represents a
lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.
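The dead-state rule can be sketched with two hypothetical parsers (these are not TAMER's types).  An attribute sub-parser hands an unrecognized token back as `Dead` only when it is in an accepting state, and a parent then uses that token as its single token of lookahead.  In the non-accepting state (between an attribute name and its value), the same token is simply an error, with no backtracking.

```rust
/// Hypothetical tokens for an element: attributes first, then children.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    AttrName(&'static str),
    AttrValue(&'static str),
    Child(&'static str),
}

/// Result of feeding one token to a sub-parser.
#[derive(Debug, PartialEq, Eq)]
pub enum Status {
    /// Token consumed; the sub-parser may want more.
    Incomplete,
    /// Accepting state with no transition for this token: hand it back.
    Dead(Tok),
}

#[derive(Debug, PartialEq, Eq)]
pub struct UnexpectedToken(pub Tok);

/// Hypothetical attribute sub-parser.  `pending` is `Some` between a name
/// and its value; that is its only non-accepting state.
#[derive(Debug, Default)]
pub struct AttrParser {
    pending: Option<&'static str>,
    pub attrs: Vec<(&'static str, &'static str)>,
}

impl AttrParser {
    pub fn feed(&mut self, tok: Tok) -> Result<Status, UnexpectedToken> {
        match (self.pending, tok) {
            (None, Tok::AttrName(name)) => {
                self.pending = Some(name);
                Ok(Status::Incomplete)
            }
            (Some(name), Tok::AttrValue(value)) => {
                self.pending = None;
                self.attrs.push((name, value));
                Ok(Status::Incomplete)
            }
            // No transition, but accepting: dead state, return the token.
            (None, other) => Ok(Status::Dead(other)),
            // No transition and not accepting: plain error.
            (Some(_), other) => Err(UnexpectedToken(other)),
        }
    }
}

/// Hypothetical parent that composes the attribute parser.
#[derive(Debug, Default)]
pub struct ElemParser {
    pub attr_parser: AttrParser,
    in_children: bool,
    pub children: Vec<&'static str>,
}

impl ElemParser {
    pub fn feed(&mut self, tok: Tok) -> Result<(), UnexpectedToken> {
        if self.in_children {
            return match tok {
                Tok::Child(name) => {
                    self.children.push(name);
                    Ok(())
                }
                other => Err(UnexpectedToken(other)),
            };
        }
        match self.attr_parser.feed(tok)? {
            Status::Incomplete => Ok(()),
            // The child is done; its rejected token is our lookahead.
            Status::Dead(lookahead) => {
                self.in_children = true;
                self.feed(lookahead)
            }
        }
    }
}
```

The parent never needs more than the one token that the child hands back, and no delimiter token is required to end the attribute list.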

This was done because it's otherwise difficult to compose the two parsers
without requiring that AttrEnd exist in every XIR stream; this has always
been an awkward delimiter that was introduced to make the parser LL(0), but
I tried to compromise by saying that it was optional.  Of course, I knew
that decision caused awkward inconsistencies; I had just hoped that those
inconsistencies wouldn't manifest as practical issues.

Well, now they have, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											2021-12-16 09:44:02 -05:00
+								#[derive(Debug, PartialEq, Eq)]
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.
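To illustrate what an input `Token` type on the state buys, here is a minimal sketch using a hypothetical `ParseStateSketch` trait.  Its name and methods are assumptions and do not match TAMER's `ParseState`.  One generic driver runs two states with different token types, and the output of one stage can be fed to the next, much as `XmloReader` is expected to consume `xir::flat::Object` rather than `xir::Token`.

```rust
/// Hypothetical simplification: each state declares its own input token type.
pub trait ParseStateSketch: Default {
    type Token;
    type Object;

    fn parse_token(&mut self, tok: Self::Token) -> Option<Self::Object>;
}

/// Generic driver: works for any state, whatever its token type.
pub fn run<S, I>(toks: I) -> Vec<S::Object>
where
    S: ParseStateSketch,
    I: IntoIterator<Item = S::Token>,
{
    let mut state = S::default();
    toks.into_iter().filter_map(|tok| state.parse_token(tok)).collect()
}

/// Consumes raw characters (standing in for a lexer-level stream).
#[derive(Default)]
pub struct Digits;

impl ParseStateSketch for Digits {
    type Token = char;
    type Object = u32;

    fn parse_token(&mut self, c: char) -> Option<u32> {
        // Non-digit characters produce no object.
        c.to_digit(10)
    }
}

/// Consumes the output of another stage (standing in for a lowered IR).
#[derive(Default)]
pub struct RunningTotal(u32);

impl ParseStateSketch for RunningTotal {
    type Token = u32;
    type Object = u32;

    fn parse_token(&mut self, n: u32) -> Option<u32> {
        self.0 += n;
        Some(self.0)
    }
}
```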

DEV-10863

											2022-03-18 15:26:05 -04:00
+								pub enum ParseError<T: Token, E: Error + PartialEq + Eq> {
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value is supplemented by their
own last span.
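A minimal sketch of the span bookkeeping described here, using a hypothetical byte-offset `Span` and a simplified `EofError` (neither is TAMER's type).  The parser remembers the span of the last token it saw, and a combinator fills in a missing EOF span with whatever span preceded the inner parser's invocation.

```rust
/// Hypothetical span: a byte-offset range in some source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum EofError {
    UnexpectedEof(Option<Span>),
}

/// Tracks the span of the last token seen so that EOF can be reported there.
#[derive(Debug, Default)]
pub struct LastSpan(Option<Span>);

impl LastSpan {
    pub fn observe(&mut self, span: Span) {
        self.0 = Some(span);
    }

    /// `None` here means the parser never saw a single token.
    pub fn eof(&self) -> EofError {
        EofError::UnexpectedEof(self.0)
    }
}

/// What a combinator might do: keep the inner span if there is one,
/// otherwise fall back to the span that preceded the inner parser.
pub fn supplement(err: EofError, preceding: Option<Span>) -> EofError {
    match err {
        EofError::UnexpectedEof(ospan) => EofError::UnexpectedEof(ospan.or(preceding)),
    }
}
```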

DEV-11268

											2021-12-06 15:34:29 -05:00
+								    /// Token stream ended unexpectedly.
 								    ///
 								    /// This error means that the parser was expecting more input before
 								    ///   reaching an accepting state.
 								    /// This could represent a truncated file,
 								    ///   a malformed stream,
 								    ///   or maybe just a user that's not done typing yet
 								    ///     (e.g. in the case of an LSP implementation).
 								    ///
 								    /// If no span is available,
 								    ///   then parsing has not even had the chance to begin.
 								    /// If this parser follows another,
 								    ///   then the combinator ought to substitute a missing span with
 								    ///   whatever span preceded this invocation.
 								    UnexpectedEof(Option<Span>),
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
-												tamer: xir::tree: Integrate AttrParserState into Stack

											2021-12-16 09:44:02 -05:00
+								    /// The parser reached an unhandled dead state.
 								    ///
 								    /// Once a parser returns [`ParseStatus::Dead`],
 								    ///   a parent context must use that provided token as a lookahead.
 								    /// If that does not occur,
 								    ///   [`Parser`] produces this error.
 								    ///
 								    /// In the future,
 								    ///   it may be desirable to be able to query [`ParseState`] for what
 								    ///   tokens are acceptable at this point,
 								    ///     to provide better error messages.
-												tamer: xir::parse: Generalize input token type

											2022-03-18 15:26:05 -04:00
+								    UnexpectedToken(T),
-												tamer: xir::tree: Integrate AttrParserState into Stack

											2021-12-16 09:44:02 -05:00
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
+								    /// A parser-specific error associated with an inner
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more information.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											2021-12-10 15:39:59 -05:00
+								    ///   [`ParseState`].
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
+								    StateError(E),
 								}
-												tamer: xir::parse: Generalize input token type

											2022-03-18 15:26:05 -04:00
+								impl<T: Token, EA: Error + PartialEq + Eq> ParseError<T, EA> {
 								    pub fn inner_into<EB: Error + PartialEq + Eq>(self) -> ParseError<T, EB>
-												tamer: xir::tree: Integrate AttrParserState into Stack

											2021-12-16 09:44:02 -05:00
+								    where
 								        EA: Into<EB>,
 								    {
 								        use ParseError::*;
 								        match self {
 								            UnexpectedEof(x) => UnexpectedEof(x),
 								            UnexpectedToken(x) => UnexpectedToken(x),
 								            StateError(e) => StateError(e.into()),
 								        }
 								    }
 								}
-												tamer: xir::parse: Generalize input token type

											2022-03-18 15:26:05 -04:00
+								impl<T: Token, E: Error + PartialEq + Eq> From<E> for ParseError<T, E> {
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
+								    fn from(e: E) -> Self {
 								        Self::StateError(e)
 								    }
 								}
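The `inner_into` and `From<E>` impls above can be exercised with a simplified copy of `ParseError`.  In this sketch the trait bounds are dropped and a `usize` offset stands in for `Span`; the `AttrError` and `ElemError` types and the step functions are hypothetical.  `?` relies on `From<E>` to wrap a bare state error as `StateError`, and `inner_into` lets a parent re-expose a child's error under its own error type while passing the other variants through unchanged.

```rust
/// Simplified copy of `ParseError`: bounds dropped, `usize` in place of `Span`.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError<T, E> {
    UnexpectedEof(Option<usize>),
    UnexpectedToken(T),
    StateError(E),
}

impl<T, EA> ParseError<T, EA> {
    pub fn inner_into<EB>(self) -> ParseError<T, EB>
    where
        EA: Into<EB>,
    {
        use ParseError::*;
        match self {
            UnexpectedEof(x) => UnexpectedEof(x),
            UnexpectedToken(x) => UnexpectedToken(x),
            StateError(e) => StateError(e.into()),
        }
    }
}

impl<T, E> From<E> for ParseError<T, E> {
    fn from(e: E) -> Self {
        Self::StateError(e)
    }
}

/// Hypothetical child-parser error.
#[derive(Debug, PartialEq, Eq)]
pub struct AttrError(pub &'static str);

/// Hypothetical parent-parser error that can wrap the child's.
#[derive(Debug, PartialEq, Eq)]
pub enum ElemError {
    Attr(AttrError),
}

impl From<AttrError> for ElemError {
    fn from(e: AttrError) -> Self {
        ElemError::Attr(e)
    }
}

fn check_attr(name: &'static str) -> Result<(), AttrError> {
    if name.is_empty() {
        Err(AttrError("empty attribute name"))
    } else {
        Ok(())
    }
}

/// `?` converts `AttrError` into `ParseError::StateError` via `From<E>`.
pub fn attr_step(name: &'static str) -> Result<(), ParseError<char, AttrError>> {
    check_attr(name)?;
    Ok(())
}

/// The parent lifts the child's inner error type with `inner_into`.
pub fn elem_step(name: &'static str) -> Result<(), ParseError<char, ElemError>> {
    attr_step(name).map_err(|e| e.inner_into())
}
```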
-												tamer: xir::parse: Generalize input token type

											2022-03-18 15:26:05 -04:00
+								impl<T: Token, E: Error + PartialEq + Eq> Display for ParseError<T, E> {
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
+								    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
 								        match self {
-												tamer: xir::tree::parse: EOF span

											2021-12-06 15:34:29 -05:00
+								            Self::UnexpectedEof(ospan) => {
 								                write!(f, "unexpected end of input at ")?;
 								                match ospan {
 								                    None => write!(f, "<unknown location>"),
 								                    Some(span) => write!(f, "{}", span),
 								                }
 								            }
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where a dead state is defined here as an _accepting_
state that has no state transitions for the given input token.  This is
stricter than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

I chose to make the Dead state accepting for a simple reason: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

This was done because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies; I had just hoped that
those inconsistencies wouldn't manifest as practical issues.

Well, now they have, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268
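A minimal sketch of that rule, using a hypothetical attribute parser over a hypothetical `Tok` type (not TAMER's XIR tokens), with `Err(tok)` standing in for a real parse error.  In the accepting `Empty` state an unrecognized token comes back as `Dead(tok)` so a parent can use it as lookahead; in the non-accepting `Named` state, where an attribute is half-built, the same token is an error.

```rust
use std::mem::take;

/// Hypothetical tokens: attribute name/value pairs followed by other content.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    AttrName(&'static str),
    AttrValue(&'static str),
    Text(&'static str),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseStatus<T, O> {
    Incomplete,
    Object(O),
    Dead(T),
}

/// `Empty` is accepting; `Named` holds a partially-built attribute.
#[derive(Debug, PartialEq, Eq)]
pub enum AttrState {
    Empty,
    Named(&'static str),
}

impl Default for AttrState {
    fn default() -> Self {
        AttrState::Empty
    }
}

pub type Attr = (&'static str, &'static str);

impl AttrState {
    /// `Err(tok)` stands in for a real parse error.
    pub fn parse_token(&mut self, tok: Tok) -> Result<ParseStatus<Tok, Attr>, Tok> {
        match (take(self), tok) {
            (AttrState::Empty, Tok::AttrName(name)) => {
                *self = AttrState::Named(name);
                Ok(ParseStatus::Incomplete)
            }
            // Completing the attribute leaves us back in `Empty`.
            (AttrState::Named(name), Tok::AttrValue(value)) => {
                Ok(ParseStatus::Object((name, value)))
            }
            // Accepting, but no transition: dead state; the token is lookahead.
            (AttrState::Empty, tok) => Ok(ParseStatus::Dead(tok)),
            // Not accepting: an attribute would be lost, so this is an error.
            (AttrState::Named(name), tok) => {
                *self = AttrState::Named(name);
                Err(tok)
            }
        }
    }
}
```

Only one token of lookahead is ever involved: the dead token is returned as-is and nothing is backtracked.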

											
										
										
											2021-12-16 09:44:02 -05:00
+								            Self::UnexpectedToken(tok) => {
 								                write!(f, "unexpected {}", tok)
 								            }
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
could more easily assert on generated token streams (XIR).  While broader use
was planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate it.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and extracting a simple
parser (an attribute parser) from a complex one, we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this commit simply
introduces my work thus far so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used to express combinators, so that
     parsers are automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.
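Point 1 can be illustrated with a sketch, assuming a simplified `ParseState` trait and hypothetical primitive parsers `Digits` and `Letters` over `char`.  The composite `Seq<A, B>` is obtained purely from its type: it runs `A` until `A` reaches a dead state, then hands that lookahead token to `B`.  This illustrates the direction described above; it is not TAMER's combinator API.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ParseStatus<T, O> {
    Incomplete,
    Object(O),
    Dead(T),
}

/// Simplified parser state: no errors, only objects and dead states.
pub trait ParseState: Default {
    type Token;
    type Object;
    fn parse_token(&mut self, tok: Self::Token) -> ParseStatus<Self::Token, Self::Object>;
}

/// Primitive automaton: emits each decimal digit; dead on anything else.
#[derive(Default)]
pub struct Digits;

impl ParseState for Digits {
    type Token = char;
    type Object = u32;
    fn parse_token(&mut self, tok: char) -> ParseStatus<char, u32> {
        match tok.to_digit(10) {
            Some(d) => ParseStatus::Object(d),
            None => ParseStatus::Dead(tok),
        }
    }
}

/// Primitive automaton: emits each ASCII letter; dead on anything else.
#[derive(Default)]
pub struct Letters;

impl ParseState for Letters {
    type Token = char;
    type Object = char;
    fn parse_token(&mut self, tok: char) -> ParseStatus<char, char> {
        if tok.is_ascii_alphabetic() {
            ParseStatus::Object(tok)
        } else {
            ParseStatus::Dead(tok)
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Either<L, R> {
    Left(L),
    Right(R),
}

/// `A` followed by `B`; the composite parser is defined by the type alone.
pub enum Seq<A, B> {
    First(A),
    Second(B),
}

impl<A: ParseState, B: ParseState<Token = A::Token>> Default for Seq<A, B> {
    fn default() -> Self {
        Seq::First(A::default())
    }
}

impl<A: ParseState, B: ParseState<Token = A::Token>> ParseState for Seq<A, B> {
    type Token = A::Token;
    type Object = Either<A::Object, B::Object>;

    fn parse_token(&mut self, tok: Self::Token) -> ParseStatus<Self::Token, Self::Object> {
        match self {
            Seq::First(a) => match a.parse_token(tok) {
                ParseStatus::Incomplete => ParseStatus::Incomplete,
                ParseStatus::Object(o) => ParseStatus::Object(Either::Left(o)),
                // `A` is finished: its lookahead token is `B`'s first token.
                ParseStatus::Dead(tok) => {
                    *self = Seq::Second(B::default());
                    self.parse_token(tok)
                }
            },
            Seq::Second(b) => match b.parse_token(tok) {
                ParseStatus::Incomplete => ParseStatus::Incomplete,
                ParseStatus::Object(o) => ParseStatus::Object(Either::Right(o)),
                // Neither part accepts the token; a parent context may.
                ParseStatus::Dead(tok) => ParseStatus::Dead(tok),
            },
        }
    }
}
```

Writing `Seq<Digits, Letters>` is all it takes to obtain a parser for digits followed by letters; the primitives stay explicit automata (point 2).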

											
										
										
											2021-12-06 11:26:53 -05:00
+								            Self::StateError(e) => Display::fmt(e, f),
 								        }
 								    }
 								}
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863
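A sketch of what an associated input token type allows, assuming a simplified `ParseState` trait whose `ParseStatus` has no `Dead` variant.  One generic driver runs states with different token types, including a state whose tokens are another state's objects.  `DigitState`, `SumState`, and `parse_all` are hypothetical.

```rust
/// Simplified status: no `Dead` variant, to keep the sketch short.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseStatus<O> {
    Incomplete,
    Object(O),
}

pub trait ParseState: Default {
    /// The input token type is chosen by each parser.
    type Token;
    type Object;
    fn parse_token(&mut self, tok: Self::Token) -> ParseStatus<Self::Object>;
}

/// Run any `ParseState` over a stream of its own token type,
///   keeping only the emitted objects.
pub fn parse_all<S, I>(toks: I) -> Vec<S::Object>
where
    S: ParseState,
    I: IntoIterator<Item = S::Token>,
{
    let mut state = S::default();
    toks.into_iter()
        .filter_map(|tok| match state.parse_token(tok) {
            ParseStatus::Object(o) => Some(o),
            ParseStatus::Incomplete => None,
        })
        .collect()
}

/// Hypothetical low-level state: `char` tokens in, digit values out
///   (non-digits are skipped by reporting `Incomplete`).
#[derive(Default)]
pub struct DigitState;

impl ParseState for DigitState {
    type Token = char;
    type Object = u32;
    fn parse_token(&mut self, tok: char) -> ParseStatus<u32> {
        match tok.to_digit(10) {
            Some(d) => ParseStatus::Object(d),
            None => ParseStatus::Incomplete,
        }
    }
}

/// Hypothetical higher-level state: consumes `DigitState`'s objects
///   as its tokens and emits a running total.
#[derive(Default)]
pub struct SumState {
    total: u32,
}

impl ParseState for SumState {
    type Token = u32;
    type Object = u32;
    fn parse_token(&mut self, tok: u32) -> ParseStatus<u32> {
        self.total += tok;
        ParseStatus::Object(self.total)
    }
}
```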

											
										
										
											2022-03-18 15:26:05 -04:00
+								impl<T: Token, E: Error + PartialEq + Eq + 'static> Error for ParseError<T, E> {
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
+								    fn source(&self) -> Option<&(dyn Error + 'static)> {
 								        match self {
 								            Self::StateError(e) => Some(e),
 								            _ => None,
 								        }
 								    }
 								}
-												tamer: xir::parse: Generalize input token type

											
										
										
											2022-03-18 15:26:05 -04:00
+								impl<S: ParseState, I: TokenStream<S::Token>> From<I> for Parser<S, I> {
-												tamer: xir::tree::parse::Parser: Remove lifetime

This will allow Parser to operate on both owned and &mut values, and is the
same approach that Rust's built-in iterators take.

This is at first quite surprising, and I often forget that it is a feature
(`&mut I` is itself an `Iterator` whenever `I` is); as a bonus, it is an
attractive way to avoid lifetimes in struct definitions when generics are
used for the type that may become a reference.

DEV-11268
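A small sketch of the technique, with a hypothetical `SumParser` standing in for `Parser`.  Because the struct is generic over `I: Iterator`, it needs no lifetime parameter, yet it accepts either an owned iterator or a `&mut` reference to one.

```rust
/// Hypothetical stand-in for `Parser`: generic over the token iterator,
///   with no lifetime parameter.
pub struct SumParser<I: Iterator<Item = u32>> {
    toks: I,
}

impl<I: Iterator<Item = u32>> SumParser<I> {
    pub fn new(toks: I) -> Self {
        Self { toks }
    }

    /// Sum tokens up to (and consuming) a `0` terminator or end of input.
    pub fn sum_until_zero(&mut self) -> u32 {
        let mut sum = 0;
        for tok in self.toks.by_ref() {
            if tok == 0 {
                break;
            }
            sum += tok;
        }
        sum
    }
}

impl<I: Iterator<Item = u32>> From<I> for SumParser<I> {
    fn from(toks: I) -> Self {
        Self::new(toks)
    }
}
```

With `vec.into_iter()` the parser owns the stream.  With `&mut toks`, `I` becomes `&mut std::vec::IntoIter<u32>`, and the caller keeps the rest of the stream after parsing stops, which is what lets a token stream be handed on to the next system.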

											
										
										
											2021-12-13 16:51:15 -05:00
+								    fn from(toks: I) -> Self {
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
+								        Self {
 								            toks,
 								            state: Default::default(),
-												tamer: xir::tree::parse: EOF span

											
										
										
											2021-12-06 15:34:29 -05:00
+								            last_span: None,
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
+								        }
 								    }
 								}
-												tamer: xir::tree::parse: Use new Parsed::Done variant over None

This removes Option from ParseState, as mentioned in previous commits.

This is ideal because it not only removes a layer of abstraction, but also
makes the intent very clear; the use of None was too tied to the concept of
an Iterator, which is the concern of Parser, _not_ ParseState.

This is now similar to tree::Parsed, which will help with that refactoring
shortly.

The Done variant is not accessible outside of Parser, since it always
converts it to None (to halt iteration); given that, we should have another
public-facing type, as was also mentioned in a previous commit.

DEV-11268
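A minimal sketch of that split, assuming simplified `ParseStatus` and `Parsed` types over `u32` tokens and a hypothetical `PairSum` state that sums tokens in pairs and reports `Done` on a `0` token.  The state never mentions `Option`; only the `Iterator` implementation on `Parser` turns `Done` into `None`.

```rust
/// What a state reports; `Done` replaces the old `Option::None`.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseStatus<O> {
    Incomplete,
    Object(O),
    Done,
}

/// Public-facing result yielded by `Parser`; it has no `Done` variant.
#[derive(Debug, PartialEq, Eq)]
pub enum Parsed<O> {
    Incomplete,
    Object(O),
}

/// Hypothetical state: sums tokens in pairs and is done at a `0` token.
#[derive(Default)]
pub struct PairSum {
    first: Option<u32>,
}

impl PairSum {
    fn parse_token(&mut self, tok: u32) -> ParseStatus<u32> {
        match (tok, self.first.take()) {
            (0, _) => ParseStatus::Done,
            (n, None) => {
                self.first = Some(n);
                ParseStatus::Incomplete
            }
            (n, Some(a)) => ParseStatus::Object(a + n),
        }
    }
}

pub struct Parser<I: Iterator<Item = u32>> {
    toks: I,
    state: PairSum,
}

impl<I: Iterator<Item = u32>> Parser<I> {
    pub fn new(toks: I) -> Self {
        Self { toks, state: PairSum::default() }
    }
}

impl<I: Iterator<Item = u32>> Iterator for Parser<I> {
    type Item = Parsed<u32>;

    fn next(&mut self) -> Option<Self::Item> {
        // Exhausted input also ends iteration.
        let tok = self.toks.next()?;
        // Only `Parser` deals in `Option`: `Done` halts iteration here.
        match self.state.parse_token(tok) {
            ParseStatus::Incomplete => Some(Parsed::Incomplete),
            ParseStatus::Object(o) => Some(Parsed::Object(o)),
            ParseStatus::Done => None,
        }
    }
}
```

Since `Parsed` has no `Done` variant, callers cannot observe it; they simply see iteration stop.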

											
										
										
											2021-12-10 16:22:02 -05:00
+								/// Result of a parsing operation.
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
+								#[derive(Debug, PartialEq, Eq)]
-												tamer: xir::parse: Generalize input token type

											
										
										
											2022-03-18 15:26:05 -04:00
+								pub enum ParseStatus<T, O> {
-												tamer: xir::tree::parse: Use new Parsed::Done variant over None

											
										
										
											2021-12-10 16:22:02 -05:00
+								    /// Additional tokens are needed to complete parsing of the next object.
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
+								    Incomplete,
-												tamer: xir::tree::parse: Use new Parsed::Done variant over None

											
										
										
											2021-12-10 16:22:02 -05:00
 								    /// Parsing of an object is complete.
 								    ///
 								    /// This does not indicate that the parser is complete,
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create an LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								    ///   as more objects may be able to be emitted.
-												tamer: xir::parse: Generalize input token type

											
										
										
											2022-03-18 15:26:05 -04:00
+								    Object(O),
-												tamer: xir::tree::parse: Use new Parsed::Done variant over None

											
										
										
											2021-12-10 16:22:02 -05:00
-												tamer: xir::tree: Integrate AttrParserState into Stack

											
										
										
											2021-12-16 09:44:02 -05:00
+								    /// Parser encountered a dead state relative to the given token.
 								    ///
 								    /// A dead state is an empty accepting state that has no state
 								    ///   transition for the given token.
 								    /// A state is empty if a [`ParseStatus::Object`] will not be lost if
 								    ///   parsing ends at this point
 								    ///     (that is---there is no partially-built object).
 								    /// This could simply mean that the parser has completed its job and
 								    ///   that control must be returned to a parent context.
 								    ///
 								    /// If a parser is _not_ in an accepting state,
 								    ///   then an error ought to occur rather than a dead state;
 								    ///     the difference between the two is that the token associated with
 								    ///       a dead state can be used as a lookahead token in order to
 								    ///       produce a state transition at a higher level,
 								    ///     whereas an error indicates that parsing has failed.
 								    /// Intuitively,
 								    ///   this means that a [`ParseStatus::Object`] had just been emitted
 								    ///   and that the token following it isn't something that can be
 								    ///   parsed.
 								    ///
 								    /// If there is no parent context to handle the token,
 								    ///   [`Parser`] must yield an error.
-												tamer: xir::parse: Generalize input token type

											
										
										
											2022-03-18 15:26:05 -04:00
+								    Dead(T),
-												tamer: xir:tree: Begin work on composable XIRT parser

											
										
										
											2021-12-06 11:26:53 -05:00
+								}
-												tamer: xir::tree::parse: ParseStatus and Parsed

The old Parsed was renamed to ParseStatus to be used by Parser, and Parser
converts it into the new Parsed, which has the same variants as before
except for Done, since it's not possible for Parser to yield it.

DEV-11268
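In the current code, the internal-only variant is `Dead` rather than `Done`, and the `From<ParseStatus<T, O>> for Parsed<O>` impl relies on `Parser` having filtered it out first.  A minimal sketch of that filtering step, assuming simplified stand-ins for `ParseStatus` and `Parsed` and a hypothetical `ParseError::UnexpectedToken` for a dead token that reaches the top level:

```rust
/// Internal status a parser state may report (simplified stand-in).
#[derive(Debug, PartialEq)]
pub enum ParseStatus<T, O> {
    Incomplete,
    Object(O),
    /// Token handed back as lookahead; never surfaced to callers.
    Dead(T),
}

/// Public-facing result: callers have no `Dead` variant to handle.
#[derive(Debug, PartialEq)]
pub enum Parsed<O> {
    Incomplete,
    Object(O),
}

/// Hypothetical top-level error for a token that no state accepted.
#[derive(Debug, PartialEq)]
pub enum ParseError<T> {
    UnexpectedToken(T),
}

/// Filter `Dead` first; the remaining conversion is then total, which is
/// why a later `From` conversion can treat `Dead` as unreachable.
pub fn finalize<T, O>(status: ParseStatus<T, O>) -> Result<Parsed<O>, ParseError<T>> {
    match status {
        // No parent context exists at the top level to take the token.
        ParseStatus::Dead(tok) => Err(ParseError::UnexpectedToken(tok)),
        ParseStatus::Incomplete => Ok(Parsed::Incomplete),
        ParseStatus::Object(obj) => Ok(Parsed::Object(obj)),
    }
}
```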

											2021-12-10 16:51:53 -05:00
+								/// Result of a parsing operation.
 								///
 								/// Whereas [`ParseStatus`] is used by [`ParseState`] to influence parser
 								///   operation,
 								///     this type is public-facing and used by [`Parser`].
 								#[derive(Debug, PartialEq, Eq)]
-												tamer: xir::parse: Generalize input token type

This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863
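To illustrate what an associated `Token` type buys, here is a minimal sketch of a simplified `ParseState`-like trait (hypothetical, not the real trait) whose single generic driver, `parse_all`, works for states over different token types: plain `char`s for `Upper`, and already-structured `u32` values for `RunningTotal`, loosely analogous to `xir::Token` versus `xir::flat::Object`.

```rust
/// Simplified parser-state trait with an associated input token type.
pub trait ParseState: Default {
    /// Input token type, chosen per parser rather than fixed globally.
    type Token;
    /// Output object type.
    type Object;

    /// Consume one token, returning the next state and any emitted object.
    fn parse_token(self, tok: Self::Token) -> (Self, Option<Self::Object>);
}

/// One driver for every `ParseState`, whatever its token type.
pub fn parse_all<S, I>(toks: I) -> Vec<S::Object>
where
    S: ParseState,
    I: IntoIterator<Item = S::Token>,
{
    let mut state = S::default();
    let mut out = Vec::new();
    for tok in toks {
        let (next, obj) = state.parse_token(tok);
        state = next;
        out.extend(obj); // `Option` is iterable: pushes zero or one object
    }
    out
}

/// Consumes `char`s; emits letters uppercased and drops everything else.
#[derive(Default)]
pub struct Upper;

impl ParseState for Upper {
    type Token = char;
    type Object = char;

    fn parse_token(self, tok: char) -> (Self, Option<char>) {
        (self, tok.is_ascii_alphabetic().then(|| tok.to_ascii_uppercase()))
    }
}

/// Consumes structured `u32` tokens; emits the running total after each.
#[derive(Default)]
pub struct RunningTotal(u32);

impl ParseState for RunningTotal {
    type Token = u32;
    type Object = u32;

    fn parse_token(self, tok: u32) -> (Self, Option<u32>) {
        let total = self.0 + tok;
        (RunningTotal(total), Some(total))
    }
}
```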

											2022-03-18 15:26:05 -04:00
+								pub enum Parsed<O> {
-												tamer: xir::tree::parse: ParseStatus and Parsed
											2021-12-10 16:51:53 -05:00
+								    /// Additional tokens are needed to complete parsing of the next object.
 								    Incomplete,
 								    /// Parsing of an object is complete.
 								    ///
 								    /// This does not indicate that the parser is complete,
 								    ///   as more objects may be able to be emitted.
-												tamer: xir::parse: Generalize input token type
											2022-03-18 15:26:05 -04:00
+								    Object(O),
-												tamer: xir::tree::parse: ParseStatus and Parsed
											2021-12-10 16:51:53 -05:00
+								}
-												tamer: xir::parse: Generalize input token type
											2022-03-18 15:26:05 -04:00
+								impl<T: Token, O> From<ParseStatus<T, O>> for Parsed<O> {
 								    fn from(status: ParseStatus<T, O>) -> Self {
-												tamer: xir::tree::parse: ParseStatus and Parsed
											2021-12-10 16:51:53 -05:00
+								        match status {
 								            ParseStatus::Incomplete => Parsed::Incomplete,
 								            ParseStatus::Object(x) => Parsed::Object(x),
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create an LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339
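The sketch below shows why one token of lookahead makes a delimiter like `AttrEnd` unnecessary: the attribute parser stops at the first token it cannot use and hands that token back to its caller.  The `Tok` type and both functions are hypothetical simplifications, not XIR's tokens or TAMER's parser.

```rust
/// Simplified token stream; note that there is no `AttrEnd` variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Open(&'static str),
    Attr(&'static str, &'static str),
    Text(&'static str),
    Close,
}

/// Collects attributes until the first non-attribute token, which is
/// returned as a single token of lookahead (LL(1)) instead of an error.
pub fn parse_attrs<I>(toks: &mut I) -> (Vec<(&'static str, &'static str)>, Option<Tok>)
where
    I: Iterator<Item = Tok>,
{
    let mut attrs = Vec::new();
    for tok in toks {
        match tok {
            Tok::Attr(name, value) => attrs.push((name, value)),
            // Not ours: hand it back to the parent context.
            other => return (attrs, Some(other)),
        }
    }
    (attrs, None)
}

/// Parses `Open Attr* Text* Close`, returning (name, #attrs, #texts).
pub fn parse_element<I>(toks: I) -> Result<(&'static str, usize, usize), String>
where
    I: IntoIterator<Item = Tok>,
{
    let mut toks = toks.into_iter();
    let name = match toks.next() {
        Some(Tok::Open(name)) => name,
        other => return Err(format!("expected Open, found {:?}", other)),
    };
    // The token that ended the attribute list is handled here, not lost.
    let (attrs, mut lookahead) = parse_attrs(&mut toks);
    let mut texts = 0;
    loop {
        match lookahead {
            Some(Tok::Text(_)) => texts += 1,
            Some(Tok::Close) => return Ok((name, attrs.len(), texts)),
            other => return Err(format!("unexpected {:?}", other)),
        }
        lookahead = toks.next();
    }
}
```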

											2021-12-17 10:14:31 -05:00
+								            ParseStatus::Dead(_) => {
 								                unreachable!("Dead status must be filtered by Parser")
-												tamer: xir::tree::parse: ParseStatus and Parsed
											2021-12-10 16:51:53 -05:00
+								            }
 								        }
 								    }
 								}
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								#[cfg(test)]
 								pub mod test {
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

This was done because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies; I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now they have, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268
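The definition of a dead state above can be written down directly.  This minimal sketch (all names hypothetical) takes any automaton as a transition function `delta` plus an `accepting` predicate and classifies each token as a transition, a dead state that hands the token to the parent, or an error with no backtracking.  The example automaton recognizes `ab+`.

```rust
/// What a state does with one token, per the dead-state definition.
#[derive(Debug, PartialEq)]
pub enum Outcome<S, T> {
    /// A transition exists for the token.
    Next(S),
    /// No transition, but the state is accepting: yield the token as
    /// lookahead for a parent context.
    Dead(S, T),
    /// No transition from a non-accepting state: fail, with no backtracking.
    Error(T),
}

/// Applies the dead-state rule to an automaton given as a transition
/// function `delta` and an `accepting` predicate.
pub fn offer<S, T>(
    state: S,
    tok: T,
    delta: impl Fn(&S, &T) -> Option<S>,
    accepting: impl Fn(&S) -> bool,
) -> Outcome<S, T> {
    match delta(&state, &tok) {
        Some(next) => Outcome::Next(next),
        None if accepting(&state) => Outcome::Dead(state, tok),
        None => Outcome::Error(tok),
    }
}

/// Example automaton for `ab+`: 0 --a--> 1 --b--> 2 --b--> 2.
pub fn ab_plus_delta(state: &u8, tok: &char) -> Option<u8> {
    match (*state, *tok) {
        (0, 'a') => Some(1),
        (1, 'b') | (2, 'b') => Some(2),
        _ => None,
    }
}

/// Only state 2 (at least one `b` seen) is accepting.
pub fn ab_plus_accepting(state: &u8) -> bool {
    *state == 2
}
```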

											2021-12-16 09:44:02 -05:00
+								    use std::{assert_matches::assert_matches, iter::once};
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
 								    use super::*;
-												tamer: xir::parse: UnexpectedEof Span at final offset

I'm not rendering errors yet in practice, so this wouldn't have been
noticed, but we want error messages to reference the final byte in a file on
EOF, not the offset of the last-encountered token, which would be confusing.

This doesn't _directly_ pertain to what I'm working on; I just happened to
notice it.

DEV-10863
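A minimal sketch of the intended behavior, using a simplified `Span` (hypothetical, not TAMER's `Span`) made of a byte offset and a length: the unexpected-EOF location is derived from the length of the input, not from the last token read.  Returning an empty span for empty input is an assumption.

```rust
/// Simplified byte span: `offset` of the first byte and `len` in bytes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub offset: usize,
    pub len: usize,
}

/// Location to report for an unexpected EOF: the final byte of the input.
///
/// The span of the last-encountered token could sit well before the end
/// of the file (e.g. if whitespace follows it), which would be confusing.
pub fn eof_span(input: &str) -> Span {
    match input.len() {
        // Assumption: with no bytes at all, point at offset 0 with no width.
        0 => Span { offset: 0, len: 0 },
        n => Span { offset: n - 1, len: 1 },
    }
}
```

For the 6-byte input `"<a/>\n\n"`, the last token `<a/>` covers offsets 0 through 3, but `eof_span` reports offset 5.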

											2022-03-17 21:33:05 -04:00
+								    use crate::{span::DUMMY_SPAN as DS, sym::GlobalSymbolIntern};
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											2022-03-18 16:24:53 -04:00
+								    #[derive(Debug, PartialEq, Eq, Clone)]
 								    enum TestToken {
 								        Close(Span),
 								        Comment(Span),
 								        Text(Span),
 								    }
 								    impl Display for TestToken {
 								        fn fmt(&self, _f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
 								            unimplemented!("fmt::Display")
 								        }
 								    }
 								    impl Token for TestToken {
 								        fn span(&self) -> Span {
 								            use TestToken::*;
 								            match self {
 								                Close(span) | Comment(span) | Text(span) => *span,
 								            }
 								        }
 								    }
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								    #[derive(Debug, PartialEq, Eq)]
 								    enum EchoState {
 								        Empty,
 								        Done,
 								    }
 								    impl Default for EchoState {
 								        fn default() -> Self {
 								            Self::Empty
 								        }
 								    }
-												tamer: xir::tree: {TokenStream=>ParseState}

This also renames related types.

See previous commits for more information.  In essence, this trait
represents the reification of all parser state.  The omission of "r" in the
name ParseState is intentional, since it indicates the state of a current
parse.  We'll see whether that naming ends up being too confusing; it's easy
enough to change.

DEV-11268

											2021-12-10 15:39:59 -05:00
+								    impl ParseState for EchoState {
-												tamer: {xir::=>}parse: Move parser out of XIR
											2022-03-18 16:24:53 -04:00
+								        type Token = TestToken;
 								        type Object = TestToken;
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								        type Error = EchoStateError;
-												tamer: {xir::=>}parse: Move parser out of XIR
											2022-03-18 16:24:53 -04:00
+								        fn parse_token(self, tok: TestToken) -> TransitionResult<Self> {
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								            match tok {
-												tamer: {xir::=>}parse: Move parser out of XIR
											2022-03-18 16:24:53 -04:00
+								                TestToken::Comment(..) => Transition(Self::Done).with(tok),
 								                TestToken::Close(..) => {
-												tamer: xir::parse::Transition: Generalize flat::Transition

XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision led to code that was much more
difficult to understand than necessary, and ended up being a net negative
by leading to more boilerplate for the nested types (granted, that could
have been alleviated in other ways).

This also opens up the possibility to do something that I wasn't able to
before, which was to continue abstracting away parser composition by
stitching their state machines together.  I don't know if this'll be done
immediately, but because the actual parsing operations are now able to
compose functionally without mutability getting in the way, the previous
state coupling issues with the parent parser go away.

DEV-10863
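A minimal sketch of the owned-state style, loosely modeled on the `Transition(..).with(..)`, `.err(..)`, and `.dead(..)` calls in this file's test module.  The types here (`Status`, `TransitionResult`, `Transition`, `Echo`, `Tok`) are simplified, hypothetical stand-ins rather than TAMER's definitions; the point is that `parse_token` takes `self` by value and must return the next state alongside the result, so there is no `&mut self` to reason about.

```rust
/// Simplified parse status.
#[derive(Debug, PartialEq)]
pub enum Status<T, O> {
    Incomplete,
    Object(O),
    Dead(T),
}

/// The next state paired with the outcome of one transition.
#[derive(Debug, PartialEq)]
pub struct TransitionResult<S, T, O, E>(pub S, pub Result<Status<T, O>, E>);

/// Wraps the state being transitioned *to*.
pub struct Transition<S>(pub S);

impl<S> Transition<S> {
    pub fn incomplete<T, O, E>(self) -> TransitionResult<S, T, O, E> {
        TransitionResult(self.0, Ok(Status::Incomplete))
    }
    pub fn with<T, O, E>(self, obj: O) -> TransitionResult<S, T, O, E> {
        TransitionResult(self.0, Ok(Status::Object(obj)))
    }
    pub fn dead<T, O, E>(self, tok: T) -> TransitionResult<S, T, O, E> {
        TransitionResult(self.0, Ok(Status::Dead(tok)))
    }
    pub fn err<T, O, E>(self, e: E) -> TransitionResult<S, T, O, E> {
        TransitionResult(self.0, Err(e))
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Comment,
    Close,
    Text,
}

/// Similar in spirit to the test module's `EchoState`.
#[derive(Debug, PartialEq)]
pub enum Echo {
    Empty,
    Done,
}

impl Echo {
    /// Owned state in, next state out; every arm names its target state.
    pub fn parse_token(self, tok: Tok) -> TransitionResult<Self, Tok, Tok, String> {
        match tok {
            Tok::Comment => Transition(Echo::Done).with(tok),
            Tok::Close => Transition(self).err(format!("unexpected {:?}", tok)),
            Tok::Text => Transition(self).dead(tok),
        }
    }
}
```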

											2022-03-17 15:50:35 -04:00
+								                    Transition(self).err(EchoStateError::InnerError(tok))
-												tamer: xir:tree: Begin work on composable XIRT parser
											2021-12-06 11:26:53 -05:00
+								                }
-												tamer: {xir::=>}parse: Move parser out of XIR
											2022-03-18 16:24:53 -04:00
+								                TestToken::Text(..) => Transition(self).dead(tok),
-												tamer: xir:tree: Begin work on composable XIRT parser

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								            }
 								        }
 								        fn is_accepting(&self) -> bool {
 								            *self == Self::Done
 								        }
 								    }
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where a dead state is defined here as an _accepting_
state that has no state transitions for the given input token.  This is
stricter than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

I chose to make a Dead state accepting for a simple reason: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.
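The dead-state rule described above can be shown in a minimal sketch. It uses hypothetical names (`State`, `Status`, `Tok`, `UnexpectedToken`) rather than TAMER's actual `ParseState` API. When no transition exists for a token, an accepting state hands the token back as `Dead` lookahead. A non-accepting state reports an error instead. No backtracking is attempted in either case.

```rust
/// Hypothetical input tokens.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    Open,
    Close,
    Other,
}

/// Hypothetical result of feeding one token to the automaton.
#[derive(Debug, PartialEq, Eq)]
pub enum Status {
    /// Token consumed; more input is required.
    Incomplete,
    /// Token consumed; the automaton is now in an accepting state.
    Done,
    /// Accepting state with no transition for this token: the token is
    /// handed back unconsumed as one token of lookahead for a parent.
    Dead(Tok),
}

/// Error for a token that has no transition from a non-accepting state.
#[derive(Debug, PartialEq, Eq)]
pub struct UnexpectedToken(pub Tok);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Start,
    InOpen,
    Closed,
}

impl State {
    pub fn is_accepting(self) -> bool {
        self == State::Closed
    }

    pub fn step(&mut self, tok: Tok) -> Result<Status, UnexpectedToken> {
        match (*self, tok) {
            (State::Start, Tok::Open) => {
                *self = State::InOpen;
                Ok(Status::Incomplete)
            }
            (State::InOpen, Tok::Close) => {
                *self = State::Closed;
                Ok(Status::Done)
            }
            // No transition exists.  Never backtrack: either yield the
            // token as lookahead (accepting) or fail (not accepting).
            (state, tok) if state.is_accepting() => Ok(Status::Dead(tok)),
            (_, tok) => Err(UnexpectedToken(tok)),
        }
    }
}
```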

This was done because it is otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by making it optional.  Of course,
I knew that decision caused awkward inconsistencies; I had just hoped that
those inconsistencies wouldn't manifest as practical issues.

Well, now they have, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

											
										
										
											2021-12-16 09:44:02 -05:00
+								    #[derive(Debug, PartialEq, Eq)]
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    enum EchoStateError {
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								        InnerError(TestToken),
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								    }
 								    impl Display for EchoStateError {
 								        fn fmt(&self, _: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
 								            unimplemented!()
 								        }
 								    }
 								    impl Error for EchoStateError {
 								        fn source(&self) -> Option<&(dyn Error + 'static)> {
 								            None
 								        }
 								    }
-												tamer: xir::tree::parse::Parser: Remove lifetime

This will allow Parser to operate on both owned and &mut values, and is the
same approach that Rust's built-in iterators take.

This is surprising at first, and I often forget that it is a feature; as a
bonus, it is an attractive way to avoid lifetimes in struct definitions when
generics are used for a type that may become a reference.
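The following minimal sketch shows why no lifetime parameter is needed. It uses a hypothetical `Parser`, not TAMER's own. The standard library implements `Iterator` for `&mut I` whenever `I: Iterator`. As a result, a struct that is generic over its iterator accepts both an owned iterator and a mutable borrow of one.

```rust
/// Hypothetical parser holding its token source by value.  No lifetime
/// parameter is declared: `I` may itself be `&mut SomeIter`.
pub struct Parser<I: Iterator<Item = u32>> {
    toks: I,
}

impl<I: Iterator<Item = u32>> Parser<I> {
    pub fn new(toks: I) -> Self {
        Self { toks }
    }
}

impl<I: Iterator<Item = u32>> Iterator for Parser<I> {
    type Item = u32;

    // Stand-in for real parsing: double each token.
    fn next(&mut self) -> Option<u32> {
        self.toks.next().map(|t| t * 2)
    }
}
```

In the borrowed case, the caller keeps ownership and can continue reading from the same iterator once the parser is dropped.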

DEV-11268

											
										
										
											2021-12-13 16:51:15 -05:00
+								    type Sut<I> = Parser<EchoState, I>;
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								    #[test]
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses it when reporting EOF, so that the
user can be told exactly where the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to supplement any None return value with their own last span.
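Here is a minimal sketch of the idea. It assumes a hypothetical `Span` made of an offset and a length. It also uses an arbitrary acceptance rule (an even number of tokens), chosen only so that EOF is sometimes invalid. The parser records the span of every token it reads and attaches the most recent one to the unexpected-EOF error.

```rust
/// Hypothetical source location: byte offset and length.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub offset: u32,
    pub len: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    /// Input ended in a non-accepting state; carries the last span seen,
    /// if any token was read at all.
    UnexpectedEof(Option<Span>),
}

/// Hypothetical parser that yields a running token count and, purely for
/// illustration, is accepting only after an even number of tokens.
pub struct Parser<I: Iterator<Item = Span>> {
    toks: I,
    count: usize,
    last_span: Option<Span>,
}

impl<I: Iterator<Item = Span>> Parser<I> {
    pub fn new(toks: I) -> Self {
        Self { toks, count: 0, last_span: None }
    }
}

impl<I: Iterator<Item = Span>> Iterator for Parser<I> {
    type Item = Result<usize, ParseError>;

    fn next(&mut self) -> Option<Self::Item> {
        match self.toks.next() {
            Some(span) => {
                // Remember where we are so that EOF can be reported here.
                self.last_span = Some(span);
                self.count += 1;
                Some(Ok(self.count))
            }
            // Accepting state: a clean end of input.
            None if self.count % 2 == 0 => None,
            // Non-accepting state: point the user at the last token seen.
            None => Some(Err(ParseError::UnexpectedEof(self.last_span))),
        }
    }
}
```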

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								    fn successful_parse_in_accepting_state_with_spans() {
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create an LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.
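The following minimal sketch shows how single-token lookahead lets a parent parser compose with an attribute parser without an `AttrEnd`-style delimiter. All names here (`Tok`, `AttrStep`, `attr_step`, `parse_elem`) are hypothetical and simplified. The child never errors because it is always accepting. Any token it cannot handle is returned as `Dead` and re-dispatched in the parent's next state.

```rust
/// Hypothetical XIR-like tokens; note that there is no attribute-end token.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tok {
    Open(&'static str),
    Attr(&'static str),
    Text(&'static str),
    Close,
}

/// Child (attribute) parser outcome: it either consumes a token or is
/// dead and returns the token unconsumed.
pub enum AttrStep {
    Consumed(&'static str),
    Dead(Tok),
}

pub fn attr_step(tok: Tok) -> AttrStep {
    match tok {
        Tok::Attr(name) => AttrStep::Consumed(name),
        other => AttrStep::Dead(other),
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElemState {
    Start,
    Attrs,
    Body,
    Done,
}

/// Parent (element) parser that delegates to the child while in `Attrs`.
/// `Err(Some(tok))` is an unexpected token; `Err(None)` is an unexpected EOF.
pub fn parse_elem(toks: &[Tok]) -> Result<Vec<String>, Option<Tok>> {
    let mut state = ElemState::Start;
    let mut out = Vec::new();

    for &tok in toks {
        // At most one token of lookahead may need to be re-processed.
        let mut pending = Some(tok);

        while let Some(tok) = pending.take() {
            state = match (state, tok) {
                (ElemState::Start, Tok::Open(name)) => {
                    out.push(format!("open {}", name));
                    ElemState::Attrs
                }
                (ElemState::Attrs, tok) => match attr_step(tok) {
                    AttrStep::Consumed(name) => {
                        out.push(format!("attr {}", name));
                        ElemState::Attrs
                    }
                    // The child is done: re-dispatch its lookahead token
                    // in the parent's next state.
                    AttrStep::Dead(tok) => {
                        pending = Some(tok);
                        ElemState::Body
                    }
                },
                (ElemState::Body, Tok::Text(text)) => {
                    out.push(format!("text {}", text));
                    ElemState::Body
                }
                (ElemState::Body, Tok::Close) => {
                    out.push("close".to_string());
                    ElemState::Done
                }
                (_, tok) => return Err(Some(tok)),
            };
        }
    }

    if state == ElemState::Done {
        Ok(out)
    } else {
        Err(None)
    }
}
```

An element with no attributes (`Open` immediately followed by `Close`) works without any delimiter. The child simply goes dead on `Close`, and the parent handles that token in its body state.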

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								        // EchoState is placed into a Done state given Comment.
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								        let tok = TestToken::Comment(DS);
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create a LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								        let mut toks = once(tok.clone());
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								        let mut sut = Sut::from(&mut toks);
 								        // The first token should be processed normally.
 								        // EchoState proxies the token back.
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create a LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339

											
										
										
											2021-12-17 10:14:31 -05:00
+								        assert_eq!(Some(Ok(Parsed::Object(tok))), sut.next());
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								        // This is now the end of the token stream,
 								        //   which should be okay provided that the first token put us into
 								        //   a proper accepting state.
 								        assert_eq!(None, sut.next());
 								        // Further, finalizing should work in this state.
 								        assert!(sut.finalize().is_ok());
 								    }
 								    #[test]
 								    fn fails_on_end_of_stream_when_not_in_accepting_state() {
-												tamer: xir::parse: UnexpectedEof Span at final offset

I'm not rendering errors yet in practice, so this wouldn't have been
noticed, but we want error messages to reference the final byte in a file on
EOF, not the offset of the last-encountered token, which would be confusing.
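The following minimal sketch reports EOF at the final offset of the last token rather than at that token's start. It assumes a hypothetical `Span` made of an offset and a length. The `endpoints` method here only mirrors the idea behind the `span.endpoints().1` call used in this file's tests, and may not match TAMER's actual signature.

```rust
/// Hypothetical source location: byte offset and length.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub offset: u32,
    pub len: u32,
}

impl Span {
    /// Zero-length spans at the start and end of `self`.  The end offset
    /// saturates rather than overflowing.
    pub fn endpoints(self) -> (Span, Span) {
        (
            Span { offset: self.offset, len: 0 },
            Span { offset: self.offset.saturating_add(self.len), len: 0 },
        )
    }
}

/// Location to report for an unexpected EOF: the final offset of the last
/// token, rather than that token's starting offset.
pub fn eof_span(last: Span) -> Span {
    last.endpoints().1
}
```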

This doesn't _directly_ pertain to what I'm working on; I just happened to
notice it.

DEV-10863

											
										
										
											2022-03-17 21:33:05 -04:00
+								        let span = Span::new(10, 20, "ctx".intern());
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											
										
										
											2022-03-18 16:24:53 -04:00
+								        let mut toks = [TestToken::Close(span)].into_iter();
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
 								        let mut sut = Sut::from(&mut toks);
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses that when reporting EOF, so that the
user will be able to be notified of where exactly the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value will be supplemented by its
own last span.

DEV-11268

											
										
										
											2021-12-06 15:34:29 -05:00
+								        // The first token is fine,
 								        //   and allows us to acquire our most recent span.
 								        sut.next();
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this is commit is simply
getting my work thusfar introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
  automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and produce parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

											
										
										
											2021-12-06 11:26:53 -05:00
+								        // Given that we have no more tokens,
 								        //   and that EchoState::default does not start in an accepting
 								        //     state,
 								        //   we must fail when we encounter the end of the stream.
-												tamer: xir::parse: UnexpectedEof Span at final offset

I'm not rendering errors yet in practice, so this wouldn't have been
noticed, but we want error messages to reference the final byte in a file on
EOF, not the offset of the last-encountered token, which would be confusing.

This doesn't _directly_ pertain to what I'm working on; I just happened to
notice it.

DEV-10863

											
										
										
											2022-03-17 21:33:05 -04:00
+								        assert_eq!(
 								            Some(Err(ParseError::UnexpectedEof(span.endpoints().1))),
 								            sut.next()
 								        );
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR).  While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this commit is simply
getting my work thus far introduced so I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate.  Specifically:

  1. Rust’s type system should be used as combinators, so that parsers are
     automatically constructed from the type definition.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and parse themselves immediately, during streaming.

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.
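Points 2 and 3 can be illustrated with a rough sketch. It shows a primitive parser written as an explicit automaton. The automaton lowers a hypothetical XIR-like token stream directly into a hypothetical IR object (`Rate`), one token at a time, without building a tree. All names here (`XirTok`, `Rate`, `RateState`, `RateError`) are invented for illustration and are not TAMER's API:

```rust
/// Hypothetical XIR-like input tokens (not TAMER's `xir::Token`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum XirTok {
    Open(&'static str),
    Attr(&'static str, &'static str),
    Close,
}

/// Hypothetical target IR object, produced directly from the stream.
#[derive(Debug, PartialEq, Eq)]
pub struct Rate {
    pub yields: &'static str,
}

/// Each variant is a state of the automaton; `step` is its transition function.
#[derive(Debug, PartialEq, Eq)]
pub enum RateState {
    Start,
    /// Inside `<rate>`, holding the `yields` value seen so far (if any).
    InRate(Option<&'static str>),
    Done,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RateError {
    Unexpected(XirTok),
    MissingYields,
}

impl RateState {
    /// Consume one token, yielding the next state and possibly an IR object.
    pub fn step(self, tok: XirTok) -> Result<(RateState, Option<Rate>), RateError> {
        match (self, tok) {
            (RateState::Start, XirTok::Open("rate")) => Ok((RateState::InRate(None), None)),
            (RateState::InRate(_), XirTok::Attr("yields", v)) => {
                Ok((RateState::InRate(Some(v)), None))
            }
            // Closing the element emits the IR object immediately.
            (RateState::InRate(Some(yields)), XirTok::Close) => {
                Ok((RateState::Done, Some(Rate { yields })))
            }
            (RateState::InRate(None), XirTok::Close) => Err(RateError::MissingYields),
            (_, tok) => Err(RateError::Unexpected(tok)),
        }
    }

    /// Only `Done` is an accepting state.
    pub fn is_accepting(&self) -> bool {
        matches!(self, RateState::Done)
    }
}
```

Because the state is a plain enum moved through `step`, nothing is heap-allocated per token. The grammar is enforced by the match arms themselves.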

											2021-12-06 11:26:53 -05:00
+								    }
 								    #[test]
 								    fn returns_state_specific_error() {
-												tamer: {xir::=>}parse: Move parser out of XIR

The parsing framework originally created for XIR is now more general and
useful for other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863

											2022-03-18 16:24:53 -04:00
+								        // TestToken::Close causes EchoState to produce an error.
 								        let errtok = TestToken::Close(DS);
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
+								        let mut toks = [errtok.clone()].into_iter();
 								        let mut sut = Sut::from(&mut toks);
 								        assert_eq!(
 								            Some(Err(ParseError::StateError(EchoStateError::InnerError(
 								                errtok
 								            )))),
 								            sut.next()
 								        );
 								        // The token must have been consumed.
 								        // It is up to a recovery process to either bail out or provide
 								        //   recovery tokens;
 								        //     continuing without recovery is unlikely to make sense.
 								        assert_eq!(0, toks.len());
 								    }
 								    #[test]
 								    fn fails_when_parser_is_finalized_in_non_accepting_state() {
-												tamer: xir::parse: UnexpectedEof Span at final offset

											2022-03-17 21:33:05 -04:00
+								        let span = Span::new(10, 10, "ctx".intern());
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
+								        // Set up so that we have a single token that we can use for
 								        //   recovery as part of the same iterator.
-												tamer: {xir::=>}parse: Move parser out of XIR

											2022-03-18 16:24:53 -04:00
+								        let recovery = TestToken::Comment(DS);
-												tamer: xir::tree::parse: EOF span

This stores the last seen Span and uses it when reporting EOF, so that the
user can be told exactly where the problem occurred.

When I get into creating combinators, it'll be the responsibility of those
combinators to ensure that any None return value is supplemented by their
own last span.

DEV-11268
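Here is a rough sketch of the approach. It assumes a hypothetical token type and a toy parser that reaches its accepting state only after an `end` token. The wrapper records the span of each token it yields. If the stream runs dry before acceptance, it reuses that last span in the EOF error. None of these names are TAMER's:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub offset: u32,
    pub len: u32,
}

/// Hypothetical token carrying its source span.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tok {
    pub text: &'static str,
    pub span: Span,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    /// `None` only if the stream ended before any token was seen.
    UnexpectedEof(Option<Span>),
}

/// Toy parser over `toks` that accepts once it sees an `end` token.
pub struct UntilEnd<I> {
    toks: I,
    last_span: Option<Span>,
    finished: bool,
}

impl<I: Iterator<Item = Tok>> UntilEnd<I> {
    pub fn new(toks: I) -> Self {
        UntilEnd { toks, last_span: None, finished: false }
    }
}

impl<I: Iterator<Item = Tok>> Iterator for UntilEnd<I> {
    type Item = Result<Tok, ParseError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.finished {
            return None;
        }
        match self.toks.next() {
            Some(tok) => {
                // Remember where we are for any later EOF report.
                self.last_span = Some(tok.span);
                // `end` moves the toy parser into its accepting state.
                self.finished = tok.text == "end";
                Some(Ok(tok))
            }
            None => {
                // Stream ended before `end`: report EOF at the last span seen.
                self.finished = true;
                Some(Err(ParseError::UnexpectedEof(self.last_span)))
            }
        }
    }
}
```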

											2021-12-06 15:34:29 -05:00
+								        let mut toks = [
 								            // Used purely to populate a Span.
-												tamer: {xir::=>}parse: Move parser out of XIR

											2022-03-18 16:24:53 -04:00
+								            TestToken::Close(span),
-												tamer: xir::tree::parse: EOF span

											2021-12-06 15:34:29 -05:00
+								            // Recovery token here:
-												tamer: xir::Token::AttrEnd: Remove

More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create an LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339
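The following sketch shows what single-token lookahead buys. It assumes hypothetical `Tok`, `Status`, and `AttrState` types, which are not TAMER's `ParseStatus` or XIR tokens. The attribute parser stops at the first non-attribute token and hands it back as `Dead`. No `AttrEnd` delimiter is needed, and the caller continues with the rest of the stream:

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    Attr(&'static str, &'static str),
    Text(&'static str),
}

/// Outcome of feeding a single token to the attribute parser.
#[derive(Debug, PartialEq, Eq)]
pub enum Status<O> {
    /// The token was consumed and produced an object.
    Object(O),
    /// The token is not ours; it is handed back unconsumed as lookahead.
    Dead(Tok),
}

/// Attribute parser; stateless here, since each `Attr` token is complete.
pub struct AttrState;

impl AttrState {
    pub fn parse_token(&mut self, tok: Tok) -> Status<(&'static str, &'static str)> {
        match tok {
            Tok::Attr(name, value) => Status::Object((name, value)),
            other => Status::Dead(other),
        }
    }
}

/// Collect leading attributes, returning them with the lookahead token
/// (if any) that ended the list; the remainder of `toks` is left untouched.
pub fn take_attrs<I>(toks: &mut I) -> (Vec<(&'static str, &'static str)>, Option<Tok>)
where
    I: Iterator<Item = Tok>,
{
    let mut state = AttrState;
    let mut attrs = Vec::new();
    for tok in toks {
        match state.parse_token(tok) {
            Status::Object(attr) => attrs.push(attr),
            Status::Dead(lookahead) => return (attrs, Some(lookahead)),
        }
    }
    (attrs, None)
}
```

Since only one token is ever returned, the grammar needs exactly one token of lookahead (LL(1)), and the underlying iterator can still be passed on to the next system.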

											2021-12-17 10:14:31 -05:00
+								            recovery.clone(),
-												tamer: xir::tree::parse: EOF span

											2021-12-06 15:34:29 -05:00
+								        ]
 								        .into_iter();
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
-												tamer: xir::tree::parse: EOF span

											2021-12-06 15:34:29 -05:00
+								        let mut sut = Sut::from(&mut toks);
 								        // Populate our most recently seen token's span.
 								        sut.next();
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
 								        // Attempting to finalize now in a non-accepting state should fail
 								        //   in the same way that encountering an end-of-stream does,
 								        //     since we're effectively saying "we're done with the stream"
 								        //     and the parser will have no further opportunity to reach an
 								        //     accepting state.
 								        let result = sut.finalize();
-												tamer: xir::tree::parse: EOF span

											2021-12-06 15:34:29 -05:00
+								        assert_matches!(
 								            result,
-												tamer: xir::parse: UnexpectedEof Span at final offset

											2022-03-17 21:33:05 -04:00
+								            Err((_, ParseError::UnexpectedEof(s))) if s == span.endpoints().1
-												tamer: xir::tree::parse: EOF span

											2021-12-06 15:34:29 -05:00
+								        );
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
 								        // The sut should have been re-returned,
 								        //   allowing for attempted error recovery if the caller can manage
 								        //   to produce a sequence of tokens that will be considered valid.
 								        // `toks` above is set up already for this,
 								        //   which allows us to assert that we received back the same `sut`.
 								        let mut sut = result.unwrap_err().0;
-												tamer: xir::Token::AttrEnd: Remove

											2021-12-17 10:14:31 -05:00
+								        assert_eq!(Some(Ok(Parsed::Object(recovery))), sut.next());
-												tamer: xir:tree: Begin work on composable XIRT parser

											2021-12-06 11:26:53 -05:00
 								        // And so we should now be in an accepting state,
 								        //   able to finalize.
 								        assert!(sut.finalize().is_ok());
 								    }
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

This was done because it is otherwise difficult to compose the two parsers
without requiring that AttrEnd exist in every XIR stream; this has always
been an awkward delimiter that was introduced to make the parser LL(0), but
I tried to compromise by saying that it was optional.  Of course, I knew
that decision caused awkward inconsistencies; I had just hoped that those
inconsistencies wouldn't manifest in practical issues.

Well, now they have, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268
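The following is a minimal sketch of the stitching described above. All types are hypothetical and illustrative only; `AttrState`, `ElemState`, and related names are not TAMER's. The child attribute automaton reports `Dead` only from its accepting `Empty` state. When that happens, the parent takes over the returned token. An unknown token while an attribute name still awaits its value is an error, and there is no backtracking:

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    Open(&'static str),
    AttrName(&'static str),
    AttrValue(&'static str),
    Text(&'static str),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseErr {
    Unexpected(Tok),
}

pub type Attr = (&'static str, &'static str);

/// Child automaton for attributes given as separate name/value tokens.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AttrState {
    /// Accepting: no attribute is half-finished.
    Empty,
    /// Not accepting: a name was seen and its value is still expected.
    Name(&'static str),
}

pub enum AttrStep {
    Continue(AttrState, Option<Attr>),
    /// Accepting state with no transition for this token: hand it back.
    Dead(Tok),
}

impl AttrState {
    pub fn step(self, tok: Tok) -> Result<AttrStep, ParseErr> {
        match (self, tok) {
            (AttrState::Empty, Tok::AttrName(n)) => {
                Ok(AttrStep::Continue(AttrState::Name(n), None))
            }
            (AttrState::Name(n), Tok::AttrValue(v)) => {
                Ok(AttrStep::Continue(AttrState::Empty, Some((n, v))))
            }
            // Dead is only possible from an accepting state...
            (AttrState::Empty, tok) => Ok(AttrStep::Dead(tok)),
            // ...otherwise an unknown token is simply an error.
            (AttrState::Name(_), tok) => Err(ParseErr::Unexpected(tok)),
        }
    }
}

/// Parent automaton, stitched to the child through its `Attrs` state.
#[derive(Debug, PartialEq, Eq)]
pub enum ElemState {
    Start,
    Attrs(AttrState),
    Body,
}

impl ElemState {
    /// Returns the next state and any attribute completed by this token.
    pub fn step(self, tok: Tok) -> Result<(ElemState, Option<Attr>), ParseErr> {
        match (self, tok) {
            (ElemState::Start, Tok::Open(_)) => Ok((ElemState::Attrs(AttrState::Empty), None)),
            (ElemState::Attrs(attrs), tok) => match attrs.step(tok)? {
                AttrStep::Continue(next, attr) => Ok((ElemState::Attrs(next), attr)),
                // The child is done; its lookahead token is processed by the parent.
                AttrStep::Dead(tok) => ElemState::Body.step(tok),
            },
            (ElemState::Body, Tok::Text(_)) => Ok((ElemState::Body, None)),
            (_, tok) => Err(ParseErr::Unexpected(tok)),
        }
    }
}
```

Taken together, the two state enums behave like the original monolithic automaton. Each one can still be read and tested on its own.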

											2021-12-16 09:44:02 -05:00
 								    #[test]
 								    fn unhandled_dead_state_results_in_error() {
-												tamer: xir::Token::AttrEnd: Remove

											2021-12-17 10:14:31 -05:00
+								        // A Text will cause our parser to return Dead.
-												tamer: {xir::=>}parse: Move parser out of XIR

											2022-03-18 16:24:53 -04:00
+								        let tok = TestToken::Text(DS);
-												tamer: xir::tree: Integrate AttrParserState into Stack

Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

This was done because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by making it optional.  Of course,
I knew that decision caused awkward inconsistencies; I had just hoped that
those inconsistencies wouldn't manifest as practical issues.

Well, now they have, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary; a future
commit will remove it entirely.
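The shift from LL(0) to LL(1) can be sketched by stitching a child automaton into a parent's states by hand. This is roughly the kind of integration the commit message above describes for the attribute parser and `Stack`. Every name here (`Tok`, `AttrState`, `ChildResult`, `ElemState`, `parse_all`) is hypothetical, and the model is far simpler than XIR. Each attribute is a single token, so every child state is accepting. When the child hits the token that ends the attribute list, it hands that token back, and the parent consumes it immediately. No `AttrEnd` delimiter is needed.

```rust
/// Hypothetical tokens; note that there is no `AttrEnd` delimiter.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Tok {
    Open(&'static str),
    Attr(&'static str, &'static str),
    Text(&'static str),
    Close,
}

/// Hypothetical child automaton for attributes.  Each attribute is a
///   single token here, so every state of the child is accepting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AttrState {
    count: usize,
}

/// What the child reports back to its parent.
enum ChildResult {
    Ok(AttrState),
    /// Accepting with no transition: the token is the parent's lookahead.
    Dead(Tok),
}

impl AttrState {
    fn parse_token(self, tok: Tok) -> ChildResult {
        match tok {
            Tok::Attr(..) => ChildResult::Ok(AttrState { count: self.count + 1 }),
            other => ChildResult::Dead(other),
        }
    }
}

/// Hypothetical parent automaton with the child's states stitched in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElemState {
    Start,
    Attrs(AttrState),
    Children { attrs: usize, texts: usize },
    Done { attrs: usize, texts: usize },
}

impl ElemState {
    /// Feed one token; `Err` carries the token that could not be handled.
    fn parse_token(self, tok: Tok) -> Result<ElemState, Tok> {
        match (self, tok) {
            (ElemState::Start, Tok::Open(_)) => Ok(ElemState::Attrs(AttrState { count: 0 })),
            // Delegate to the child until it reports `Dead`.
            (ElemState::Attrs(child), tok) => match child.parse_token(tok) {
                ChildResult::Ok(next) => Ok(ElemState::Attrs(next)),
                // One token of lookahead (LL(1)): the token that ended the
                //   attribute list is handled right away by the parent.
                ChildResult::Dead(tok) => {
                    ElemState::Children { attrs: child.count, texts: 0 }.parse_token(tok)
                }
            },
            (ElemState::Children { attrs, texts }, Tok::Text(_)) => {
                Ok(ElemState::Children { attrs, texts: texts + 1 })
            }
            (ElemState::Children { attrs, texts }, Tok::Close) => {
                Ok(ElemState::Done { attrs, texts })
            }
            // No transition anywhere else (including after `Done`).
            (_, tok) => Err(tok),
        }
    }
}

/// Run the parent over a complete token stream.
fn parse_all(toks: Vec<Tok>) -> Result<ElemState, Tok> {
    toks.into_iter()
        .try_fold(ElemState::Start, |st, tok| st.parse_token(tok))
}
```

For instance, the stream `Open("p")`, `Attr("id", "x")`, `Text("hi")`, `Close` parses to `Done { attrs: 1, texts: 1 }`, with no delimiter between the attribute and the text. The parent passes the lookahead token straight to its next state, so nothing is buffered or re-read.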

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268

2021-12-16 09:44:02 -05:00
+								        let mut toks = once(tok.clone());
 								        let mut sut = Sut::from(&mut toks);
 								        // Our parser returns a Dead status,
 								        //   which is unhandled by any parent context
 								        //     (since we're not composing parsers),
 								        //     which causes an error due to an unhandled Dead state.
 								        assert_eq!(sut.next(), Some(Err(ParseError::UnexpectedToken(tok))),);
 								    }
-												tamer: xir:tree: Begin work on composable XIRT parser

The XIRT parser was initially written for test cases, so that unit tests
could assert more easily on generated token streams (XIR).  While broader use
was planned, it wasn't clear what the eventual needs would be, which were
expected to differ.  Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.

When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate it.  The opposite approach should have been taken---instead of
adding complexity and special cases to the parser and then extracting a
simple parser (an attribute parser) from a complex one, we should be
composing the larger (full XIRT) parser from smaller ones (e.g. attribute,
child elements).

A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones.  The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this commit simply
introduces my work thus far so that I can do some light refactoring before
continuing on it.

TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser boilerplate.
Specifically:

  1. Rust’s type system should serve as the combinators, so that parsers
     are automatically constructed from their type definitions.

  2. Primitive parsers are written as explicit automata, not as primitive
     combinators.

  3. Parsing should directly produce IRs as a lowering operation below XIRT,
     rather than producing XIRT itself.  That is, target IRs should consume
     XIRT and parse themselves immediately, during streaming.
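Points 1 and 2 might look something like the sketch below. This is speculative. The trait `Automaton`, the `Step` outcome type, the `Seq` combinator, and the leaf parsers `OpenTag`, `Attrs`, and `Texts` are all hypothetical, not TAMER's API. The real `parse.rs` defines its own `ParseState` trait, which this sketch does not try to mirror. The primitive parsers are hand-written automata. `Default` supplies each initial state, so a composite parser such as `Seq<OpenTag, Seq<Attrs, Texts>>` is assembled from its type alone.

```rust
/// Hypothetical tokens, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok { Open, Attr, Text }

/// Outcome of feeding one token to an automaton.
enum Step<S, T> {
    Next(S), // token consumed; continue in this state
    Dead(T), // accepting, no transition: token is lookahead for a parent
    Err(T),  // not accepting, no transition: hard error
}

/// Hypothetical parser trait.  `Default` supplies the initial state,
///   so a parser can be constructed from its type alone.
trait Automaton: Default {
    type Tok;
    fn step(self, tok: Self::Tok) -> Step<Self, Self::Tok>;
}

/// Hypothetical sequencing combinator: run `A` until it is dead, then `B`.
enum Seq<A, B> { First(A), Second(B) }

impl<A: Default, B> Default for Seq<A, B> {
    fn default() -> Self { Self::First(A::default()) }
}

impl<A, B> Automaton for Seq<A, B>
where
    A: Automaton,
    B: Automaton<Tok = A::Tok>,
{
    type Tok = A::Tok;
    fn step(self, tok: Self::Tok) -> Step<Self, Self::Tok> {
        match self {
            Self::First(a) => match a.step(tok) {
                Step::Next(a) => Step::Next(Self::First(a)),
                // `A` is done; its lookahead token becomes `B`'s first token.
                Step::Dead(tok) => Self::Second(B::default()).step(tok),
                Step::Err(tok) => Step::Err(tok),
            },
            Self::Second(b) => match b.step(tok) {
                Step::Next(b) => Step::Next(Self::Second(b)),
                Step::Dead(tok) => Step::Dead(tok),
                Step::Err(tok) => Step::Err(tok),
            },
        }
    }
}

/// Hypothetical primitive automata, written by hand: exactly one `Open`,
///   zero or more `Attr` tokens, and zero or more `Text` tokens.
#[derive(Default)]
enum OpenTag { #[default] Start, Seen }
#[derive(Default)]
struct Attrs(usize);
#[derive(Default)]
struct Texts(usize);

impl Automaton for OpenTag {
    type Tok = Tok;
    fn step(self, tok: Tok) -> Step<Self, Tok> {
        match (self, tok) {
            (OpenTag::Start, Tok::Open) => Step::Next(OpenTag::Seen),
            (OpenTag::Start, tok) => Step::Err(tok), // not yet accepting
            (OpenTag::Seen, tok) => Step::Dead(tok), // accepting: lookahead
        }
    }
}

impl Automaton for Attrs {
    type Tok = Tok;
    fn step(self, tok: Tok) -> Step<Self, Tok> {
        if tok == Tok::Attr { Step::Next(Attrs(self.0 + 1)) } else { Step::Dead(tok) }
    }
}

impl Automaton for Texts {
    type Tok = Tok;
    fn step(self, tok: Tok) -> Step<Self, Tok> {
        if tok == Tok::Text { Step::Next(Texts(self.0 + 1)) } else { Step::Dead(tok) }
    }
}

/// The composite parser is defined entirely by its type.
type Elem = Seq<OpenTag, Seq<Attrs, Texts>>;

/// Feed every token; a top-level `Dead` has no parent, so it is an error.
/// (End-of-input acceptance is not checked in this sketch.)
fn run<P: Automaton>(toks: Vec<P::Tok>) -> Result<P, P::Tok> {
    let mut st = P::default();
    for tok in toks {
        st = match st.step(tok) {
            Step::Next(next) => next,
            Step::Dead(tok) | Step::Err(tok) => return Err(tok),
        };
    }
    Ok(st)
}
```

With `Elem` defined this way, `Open, Attr, Text` is accepted. A trailing `Attr` after the text is rejected, because `Texts` goes dead and no enclosing parser has a transition for that token.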

In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now.  And, to be
honest, I’m hoping that won’t be necessary.

2021-12-06 11:26:53 -05:00
+								}