tame/tamer/src/xir/parse/ele.rs

// XIR element parser generator
//
//  Copyright (C) 2014-2022 Ryan Specialty Group, LLC.
//
//  This file is part of TAME.
//
//  This program is free software: you can redistribute it and/or modify
//  it under the terms of the GNU General Public License as published by
//  the Free Software Foundation, either version 3 of the License, or
//  (at your option) any later version.
//
//  This program is distributed in the hope that it will be useful,
//  but WITHOUT ANY WARRANTY; without even the implied warranty of
//  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
//  GNU General Public License for more details.
//
//  You should have received a copy of the GNU General Public License
//  along with this program.  If not, see <http://www.gnu.org/licenses/>.

//! Element parser generator for parsing of [XIRF](super::super::flat).

use arrayvec::ArrayVec;
use std::fmt::Display;

use crate::{
    diagnostic_panic,
    fmt::{DisplayWrapper, TtQuote},
    parse::{
        ClosedParseState, Context, ParseState, Transition, TransitionResult,
    },
    xir::{Prefix, QName},
};

#[cfg(doc)]
use crate::{ele_parse, parse::Parser};

/// A parser accepting a single element.
pub trait EleParseState: ParseState {}

/// Element parser configuration.
///
/// This configuration is set on a nonterminal reference using square
///   brackets
///     (e.g. `Foo[*]`).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct EleParseCfg {
    /// Whether to allow zero-or-more repetition for this element.
    ///
    /// This is the Kleene star modifier (`*`).
    pub repeat: bool,
}

// This is an implementation detail for the internal state of EleParseState.
impl From<EleParseCfg> for () {
    fn from(_: EleParseCfg) -> Self {
        ()
    }
}

/// Maximum level of nesting for source XML trees.
///
/// Technically this is the maximum level of nesting for _parsing_ those
///   trees,
///     which may end up being less than this value.
///
/// This should be set to something reasonable,
///   but is not an alternative to coming up with code conventions that
///   disallow ridiculous levels of nesting.
/// TAME does have a lot of nesting with primitives,
///   but that nesting is easily abstracted with templates.
/// Templates may expand into ridiculous levels of nesting---this
///   has no impact on the template expansion phase.
///
/// Note that this is assuming that this parser is used only for TAME
///   sources.
/// If that's not the case,
///   this can be made to be configurable like XIRF.
pub const MAX_DEPTH: usize = 16;

/// Parser stack for trampoline.
///
/// This can be used as a call stack for parsers while avoiding creating
///   otherwise-recursive data structures with composition-based delegation.
/// However,
///   it is more similar to CPS,
///   in that the parser popped off the stack need not be the parser that
///     initiated the request and merely represents the next step in
///     a delayed computation.
/// If such a return context is unneeded,
///   a [`ParseState`] may implement tail calls by simply not pushing itself
///   onto the stack before requesting transfer to another [`ParseState`].
#[derive(Debug, Default)]
pub struct StateStack<S: ClosedParseState>(ArrayVec<S, MAX_DEPTH>);

pub type StateStackContext<S> = Context<StateStack<S>>;

// Note that public visibility is needed because `ele_parse` expands outside
//   of this module.
impl<S: ClosedParseState> StateStack<S> {
    /// Request a transfer to another [`ParseState`],
    ///   expecting that control be returned to `ret` after it has
    ///   completed.
    ///
    /// This can be reasoned about like calling a thunk:
    ///   the return [`ParseState`] is put onto the stack,
    ///   the target [`ParseState`] is used for the state transition to
    ///     cause [`Parser`] to perform the call to it,
    ///   and when it is done
    ///     (e.g. a dead state),
    ///     `ret` will be pop'd from the stack and we'll transition back to
    ///     it.
    /// Note that this method is not responsible for returning;
    ///   see [`Self::ret_or_dead`] to perform a return.
    ///
    /// However,
    ///   the calling [`ParseState`] is not responsible for its return,
    ///   unlike a typical function call.
    /// Instead,
    ///   this _actually_ more closely resembles CPS
    ///     (continuation passing style),
    ///     and so [`ele_parse!`] must be careful to ensure that stack
    ///     operations are properly paired.
    /// On the upside,
    ///   if something is erroneously `ret`'d,
    ///   the parser is guaranteed to be in a consistent state since the
    ///   entire state has been reified
    ///     (but the input would then be parsed incorrectly).
    ///
    /// Note that tail calls can be implemented by transferring control
    ///   without pushing an entry on the stack to return to,
    ///     but that hasn't been formalized \[yet\] and requires extra care.
    pub fn transfer_with_ret<SA, ST>(
        &mut self,
        Transition(ret): Transition<SA>,
        target: TransitionResult<ST>,
    ) -> TransitionResult<ST>
    where
        SA: ParseState<Super = S::Super>,
        ST: ParseState,
    {
        let Self(stack) = self;

        // TODO: Global configuration to (hopefully) ensure that XIRF will
        //   actually catch this.
        if stack.is_full() {
            // TODO: We need some spans here and ideally convert the
            //   parenthetical error message into a diagnostic footnote.
            // TODO: Or should we have a special error type that tells the
            //   parent `Parser` to panic with context?
            diagnostic_panic!(
                vec![],
                "maximum parsing depth of {} exceeded while attempting \
                   to push return state {} \
                   (expected XIRF configuration to prevent this error)",
                MAX_DEPTH,
                TtQuote::wrap(ret),
            );
        }

        stack.push(ret.into());
        target
    }

    /// Attempt to return to a previous [`ParseState`] that transferred
    ///   control away from itself,
    ///     otherwise yield a dead state transition to `deadst`.
    ///
    /// Conceptually,
    ///   this is like returning from a function call,
    ///   where the function was invoked using [`Self::transfer_with_ret`].
    /// However,
    ///   this system is more akin to CPS
    ///     (continuation passing style);
    ///       see [`Self::transfer_with_ret`] for important information.
    ///
    /// If there is no state to return to on the stack,
    ///   then it is assumed that we have received more input than expected
    ///   after having completed a full parse.
    pub fn ret_or_dead(
        &mut self,
        lookahead: S::Token,
        deadst: S,
    ) -> TransitionResult<S> {
        let Self(stack) = self;

        // This should certainly never happen unless there is a bug in the
        //   `ele_parse!` parser-generator,
        //     since it means that we're trying to return to a caller that
        //     does not exist.
        match stack.pop() {
            Some(st) => Transition(st).incomplete().with_lookahead(lookahead),
            None => Transition(deadst).dead(lookahead),
        }
    }

    /// Test every [`ParseState`] on the stack against the predicate `f`.
    pub fn all(&self, f: impl Fn(&S) -> bool) -> bool {
        let Self(stack) = self;
        stack[..].iter().all(f)
    }
}

/// Match some type of node.
#[derive(Debug, PartialEq, Eq)]
pub enum NodeMatcher {
    /// Static [`QName`] with a simple equality check.
    QName(QName),
    /// Any element with a matching [`Prefix`].
    Prefix(Prefix),
}

impl NodeMatcher {
    /// Match against the provided [`QName`].
    pub fn matches(&self, qname: QName) -> bool {
        match self {
            Self::QName(qn_match) if qn_match == &qname => true,
            Self::Prefix(prefix) if Some(*prefix) == qname.prefix() => true,
            _ => false,
        }
    }
}

impl From<QName> for NodeMatcher {
    fn from(qname: QName) -> Self {
        Self::QName(qname)
    }
}

impl From<Prefix> for NodeMatcher {
    fn from(prefix: Prefix) -> Self {
        Self::Prefix(prefix)
    }
}

impl Display for NodeMatcher {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        use crate::xir::fmt::XmlPrefixAnyLocal;

        match self {
            Self::QName(qname) => Display::fmt(qname, f),
            Self::Prefix(prefix) => XmlPrefixAnyLocal::fmt(prefix, f),
        }
    }
}

#[macro_export]
macro_rules! ele_parse {
    (
        $vis:vis enum $super:ident;

        // Attr has to be first to avoid ambiguity with `$rest`.
        $(type AttrValueError = $evty:ty;)?
        type Object = $objty:ty;

        $($rest:tt)*
    ) => {
        ele_parse! {@!next $vis $super
            $(type AttrValueError = $evty;)?
            type Object = $objty;
            $($rest)*
        }

        ele_parse!(@!super_sum <$objty> $vis $super $($rest)*);
    };

    (@!next $vis:vis $super:ident
        // Attr has to be first to avoid ambiguity with `$rest`.
        $(type AttrValueError = $evty:ty;)?
        type Object = $objty:ty;

        $($rest:tt)*
    ) => {
        ele_parse!(@!nonterm_decl <$objty, $($evty)?> $vis $super $($rest)*);
    };

    (@!nonterm_decl <$objty:ty, $($evty:ty)?>
        $vis:vis $super:ident $nt:ident := $($rest:tt)*
    ) => {
        ele_parse!(@!nonterm_def <$objty, $($evty)?> $vis $super $nt $($rest)*);
    };

    (@!nonterm_def <$objty:ty, $($evty:ty)?>
        $vis:vis $super:ident $nt:ident $qname:ident $(($($ntp:tt)*))?
        { $($matches:tt)* }; $($rest:tt)*
    ) => {
        ele_parse!(@!ele_expand_body <$objty, $($evty)?>
            $vis $super $nt $qname ($($($ntp)*)?) $($matches)*
        );

        ele_parse! {@!next $vis $super
            $(type AttrValueError = $evty;)?
            type Object = $objty;
            $($rest)*
        }
    };

    (@!nonterm_def <$objty:ty, $($evty:ty)?>
        $vis:vis $super:ident $nt:ident
        ($ntref_first:ident $(| $ntref:ident)+); $($rest:tt)*
    ) => {
        ele_parse!(@!ele_dfn_sum <$objty>
            $vis $super $nt [$ntref_first $($ntref)*]
        );

        ele_parse! {@!next $vis $super
            $(type AttrValueError = $evty;)?
            type Object = $objty;
            $($rest)*
        }
    };

    (@!nonterm_decl <$objty:ty, $($evty:ty)?> $vis:vis $super:ident) => {};

    // Expand the provided data to a more verbose form that provides the
    //   context necessary for state transitions.
    (@!ele_expand_body <$objty:ty, $($evty:ty)?>
        $vis:vis $super:ident $nt:ident $qname:ident ($($ntp:tt)*)

        @ { $($attrbody:tt)* } => $attrmap:expr,
        $(/$(($close_span:ident))? => $closemap:expr,)?

        // Special forms (`[sp](args) => expr`).
        $(
            [$special:ident]$(($($special_arg:ident),*))?
                => $special_map:expr,
        )?

        // Nonterminal references are provided as a list.
        // A configuration specifier can be provided,
        //   currently intended to support the Kleene star.
        $(
            $ntref:ident $([$ntref_cfg:tt])?,
        )*
    ) => {
        ele_parse! {
            @!ele_dfn_body <$objty, $($evty)?> $vis $super $nt $qname ($($ntp)*)
            @ { $($attrbody)* } => $attrmap,
            /$($($close_span)?)? => ele_parse!(@!ele_close $($closemap)?),

            $([$special]$(($($special_arg),*))? => $special_map,)?

            <> {
                $(
                    $ntref,
                )*
            }

            // Generate state transitions of the form `(S) -> (S')`.
            -> {
                @ ->
                $(
                    ($nt::$ntref, $ntref) [$($ntref_cfg)?],
                    ($nt::$ntref) ->
                )* ($nt::ExpectClose_, ()) [],
            }
        }
    };

    // No explicit Close mapping defaults to doing nothing at all
    //   (so yield Incomplete).
    (@!ele_close) => {
        crate::parse::ParseStatus::Incomplete
    };

    (@!ele_close $close:expr) => {
        crate::parse::ParseStatus::Object($close)
    };

    // NT[*] modifier.
    (@!ntref_cfg *) => {
        crate::xir::parse::EleParseCfg {
            repeat: true,
        }
    };

    // No bracketed modifier following NT.
    (@!ntref_cfg) => {
        crate::xir::parse::EleParseCfg {
            repeat: false,
        }
    };

    // Delegation when the destination type is `()`,
    //   indicating that the next state is not a child NT
    //   (it is likely the state expecting a closing tag).
    (@!ntref_delegate
        $stack:ident, $ret:expr, (), $_target:expr, $done:expr
    ) => {
        $done
    };

    // Delegate to a child parser by pushing self onto the stack and
    //   yielding to one of the child's states.
    // This uses a trampoline,
    //   which avoids recursive data structures
    //     (due to `ParseState` composition/stitching)
    //   and does not grow the call stack.
    (@!ntref_delegate
        $stack:ident, $ret:expr, $ntnext_st:ty, $target:expr, $_done:expr
    ) => {
        $stack.transfer_with_ret(
            Transition($ret),
            $target,
        )
    };

    (@!ele_dfn_body <$objty:ty, $($evty:ty)?>
        $vis:vis $super:ident $nt:ident $qname:ident
        ($($qname_matched:pat, $open_span:pat)?)

        // Attribute definition special form.
        @ {
            // We must lightly parse attributes here so that we can retrieve
            //   the field identifiers that may be later used as bindings in
            //   `$attrmap`.
            $(
                $(#[$fattr:meta])*
                $field:ident: ($($fmatch:tt)+) => $fty:ty,
            )*
        } => $attrmap:expr,

        // Close expression
        //   (defaulting to Incomplete via @!ele_expand_body).
        /$($close_span:ident)? => $closemap:expr,

        // Non-whitespace text nodes can be mapped into elements with the
        //   given QName as a preprocessing step,
        //     allowing them to reuse the existing element NT system.
        $([text]($text:ident, $text_span:ident) => $text_map:expr,)?

        // Nonterminal references.
        <> {
            $(
                $ntref:ident,
            )*
        }

        -> {
            @ -> ($ntfirst:path, $ntfirst_st:ty) [$($ntfirst_cfg:tt)?],
            $(
                ($ntprev:path) -> ($ntnext:path, $ntnext_st:ty) [$($ntnext_cfg:tt)?],
            )*
        }
    ) => {
        paste::paste! {
            crate::attr_parse! {
                vis($vis);
                $(type ValueError = $evty;)?

                struct [<$nt AttrsState_>] -> [<$nt Attrs_>] {
                    $(
                        $(#[$fattr])*
                        $field: ($($fmatch)+) => $fty,
                    )*
                }
            }

            #[doc=concat!("Parser for element [`", stringify!($qname), "`].")]
            #[derive(Debug, PartialEq, Eq)]
            $vis enum $nt {
                #[doc=concat!(
                    "Expecting opening tag for element [`",
                    stringify!($qname),
                    "`]."
                )]
                Expecting_(crate::xir::parse::EleParseCfg),
                /// Recovery state ignoring all remaining tokens for this
                ///   element.
                RecoverEleIgnore_(
                    crate::xir::parse::EleParseCfg,
                    crate::xir::QName,
                    crate::xir::OpenSpan,
                    crate::xir::flat::Depth
                ),
                // Recovery completed because end tag corresponding to the
                //   invalid element has been found.
                RecoverEleIgnoreClosed_(
                    crate::xir::parse::EleParseCfg,
                    crate::xir::QName,
                    crate::xir::CloseSpan
                ),
                /// Recovery state ignoring all tokens when a `Close` is
                ///   expected.
                ///
                /// This is token-agnostic---it
                ///   may be a child element,
                ///     but it may be text,
                ///     for example.
                CloseRecoverIgnore_(
                    (
                        crate::xir::parse::EleParseCfg,
                        crate::xir::QName,
                        crate::span::Span,
                        crate::xir::flat::Depth
                    ),
                    crate::span::Span
                ),
                /// Parsing element attributes.
                Attrs_(
                    (
                        crate::xir::parse::EleParseCfg,
                        crate::xir::QName,
                        crate::span::Span,
                        crate::xir::flat::Depth
                    ),
                    [<$nt AttrsState_>]
                ),
                $(
                    $ntref(
                        (
                            crate::xir::parse::EleParseCfg,
                            crate::xir::QName,
                            crate::span::Span,
                            crate::xir::flat::Depth
                        ),
                    ),
                )*
                ExpectClose_(
                    (
                        crate::xir::parse::EleParseCfg,
                        crate::xir::QName,
                        crate::span::Span,
                        crate::xir::flat::Depth
                    ),
                ),
                /// Closing tag found and parsing of the element is
                ///   complete.
                Closed_(
                    crate::xir::parse::EleParseCfg,
                    crate::xir::QName,
                    crate::span::Span
                ),
            }

            impl From<crate::xir::parse::EleParseCfg> for $nt {
                fn from(repeat: crate::xir::parse::EleParseCfg) -> Self {
                    Self::Expecting_(repeat)
                }
            }

            impl crate::xir::parse::EleParseState for $nt {}

            impl $nt {
                /// Matcher describing the node recognized by this parser.
                #[allow(dead_code)] // used by sum parser
                fn matcher() -> crate::xir::parse::NodeMatcher {
                    crate::xir::parse::NodeMatcher::from($qname)
                }

                /// Yield the expected depth of child elements,
                ///   if known.
                #[allow(dead_code)] // used by text special form
                fn child_depth(&self) -> Option<crate::xir::flat::Depth> {
                    match self {
                        $ntfirst((_, _, _, depth)) => Some(depth.child_depth()),
                        $(
                            $ntnext((_, _, _, depth)) => Some(depth.child_depth()),
                        )*
                        _ => None,
                    }
                }
            }

            impl std::fmt::Display for $nt {
                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
                    use crate::{
                        fmt::{DisplayWrapper, TtQuote},
                        xir::fmt::{TtOpenXmlEle, TtCloseXmlEle},
                    };

                    match self {
                        Self::Expecting_(_) => write!(
                            f,
                            "expecting opening tag {}",
                            TtOpenXmlEle::wrap(Self::matcher()),
                        ),
                        Self::RecoverEleIgnore_(_, name, _, _)
                        | Self::RecoverEleIgnoreClosed_(_, name, _) => write!(
                            f,
                            "attempting to recover by ignoring element \
                               with unexpected name {given} \
                               (expected {expected})",
                            given = TtQuote::wrap(name),
                            expected = TtQuote::wrap(Self::matcher()),
                        ),
                        Self::CloseRecoverIgnore_((_, qname, _, depth), _) => write!(
                            f,
                            "attempting to recover by ignoring input \
                               until the expected end tag {expected} \
                               at depth {depth}",
                            expected = TtCloseXmlEle::wrap(qname),
                        ),

                        Self::Attrs_(_, sa) => std::fmt::Display::fmt(sa, f),
                        Self::ExpectClose_((_, qname, _, depth)) => write!(
                            f,
                            "expecting closing element {} at depth {depth}",
                            TtCloseXmlEle::wrap(qname)
                        ),
                        Self::Closed_(_, qname, _) => write!(
                            f,
                            "done parsing element {}",
                            TtQuote::wrap(qname),
                        ),
                        $(
                            // TODO: A better description.
                            Self::$ntref(_) => {
                                write!(
                                    f,
                                    "preparing to transition to \
                                       parser for next child element(s)"
                                )
                            },
                        )*
                    }
                }
            }

            #[derive(Debug, PartialEq)]
            $vis enum [<$nt Error_>] {
                /// An element was expected,
                ///   but the name of the element was unexpected.
                UnexpectedEle_(crate::xir::QName, crate::span::Span),

                /// Unexpected input while expecting an end tag for this
                ///   element.
                ///
                /// The span corresponds to the opening tag.
                CloseExpected_(
                    crate::xir::QName,
                    crate::span::Span,
                    crate::xir::flat::XirfToken<crate::xir::flat::RefinedText>,
                ),

                Attrs_(crate::xir::parse::AttrParseError<[<$nt AttrsState_>]>),
            }

            impl From<crate::xir::parse::AttrParseError<[<$nt AttrsState_>]>>
                for [<$nt Error_>]
            {
                fn from(
                    e: crate::xir::parse::AttrParseError<[<$nt AttrsState_>]>
                ) -> Self {
                    [<$nt Error_>]::Attrs_(e)
                }
            }

            impl std::error::Error for [<$nt Error_>] {
                fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
                    // TODO
                    None
                }
            }

            impl std::fmt::Display for [<$nt Error_>] {
                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
                    use crate::{
                        fmt::{DisplayWrapper, TtQuote},
                        xir::fmt::{TtOpenXmlEle, TtCloseXmlEle},
                    };

                    match self {
                        Self::UnexpectedEle_(name, _) => write!(
                            f,
                            "unexpected {unexpected} (expecting {expected})",
                            unexpected = TtOpenXmlEle::wrap(name),
                            expected = TtOpenXmlEle::wrap($nt::matcher()),
                        ),
                        Self::CloseExpected_(qname, _, tok) => write!(
                            f,
                            "expected {}, but found {}",
                            TtCloseXmlEle::wrap(qname),
                            TtQuote::wrap(tok)
                        ),
                        Self::Attrs_(e) => std::fmt::Display::fmt(e, f),
                    }
                }
            }

            impl crate::diagnose::Diagnostic for [<$nt Error_>] {
                fn describe(&self) -> Vec<crate::diagnose::AnnotatedSpan> {
                    use crate::{
                        diagnose::Annotate,
                        fmt::{DisplayWrapper, TtQuote},
                        parse::Token,
                        xir::fmt::{TtCloseXmlEle},
                    };

                    match self {
                        Self::UnexpectedEle_(_, ospan) => ospan.error(
                            format!(
                                "expected {ele_name} here",
                                ele_name = TtQuote::wrap($nt::matcher())
                            )
                        ).into(),

                        Self::CloseExpected_(qname, span, tok) => vec![
                            span.note("element starts here"),
                            tok.span().error(format!(
                                "expected {}",
                                TtCloseXmlEle::wrap(qname),
                            )),
                        ],

                        Self::Attrs_(e) => e.describe(),
                    }
                }
            }

            impl crate::parse::ParseState for $nt {
                type Token = crate::xir::flat::XirfToken<
                    crate::xir::flat::RefinedText
                >;
                type Object = $objty;
                type Error = [<$nt Error_>];
                type Context = crate::xir::parse::StateStackContext<Self::Super>;
                type Super = $super;

                fn parse_token(
                    self,
                    tok: Self::Token,
                    #[allow(unused_variables)] // used only if child NTs
                    stack: &mut Self::Context,
                ) -> crate::parse::TransitionResult<Self::Super> {
                    use crate::{
                        parse::{EmptyContext, Transition, Transitionable},
                        xir::{
                            EleSpan,
                            flat::XirfToken,
                            parse::parse_attrs,
                        },
                    };

                    // Used only by _some_ expansions.
                    #[allow(unused_imports)]
                    use crate::xir::flat::Text;

                    use $nt::{
                        Attrs_, Expecting_, RecoverEleIgnore_,
                        CloseRecoverIgnore_, RecoverEleIgnoreClosed_,
                        ExpectClose_, Closed_
                    };

                    match (self, tok) {
                        (
                            Expecting_(cfg),
                            XirfToken::Open(qname, span, depth)
                        ) if $nt::matcher().matches(qname) => {
                            Transition(Attrs_(
                                (cfg, qname, span.tag_span(), depth),
                                parse_attrs(qname, span)
                            )).incomplete()
                        },

                        (
                            Closed_(cfg, ..),
                            XirfToken::Open(qname, span, depth)
                        ) if cfg.repeat && Self::matcher().matches(qname) => {
                            Transition(Attrs_(
                                (cfg, qname, span.tag_span(), depth),
                                parse_attrs(qname, span)
                            )).incomplete()
                        },

                        (
                            Expecting_(cfg),
                            XirfToken::Open(qname, span, depth)
                        ) => {
                            Transition(RecoverEleIgnore_(cfg, qname, span, depth)).err(
                                [<$nt Error_>]::UnexpectedEle_(qname, span.name_span())
                            )
                        },

                        (
                            RecoverEleIgnore_(cfg, qname, _, depth_open),
                            XirfToken::Close(_, span, depth_close)
                        ) if depth_open == depth_close => {
                            Transition(
                                RecoverEleIgnoreClosed_(cfg, qname, span)
                            ).incomplete()
                        },

                        (Attrs_(meta @ (_, qname, _, _), sa), tok) => {
                            sa.delegate_until_obj::<Self, _>(
                                tok,
                                EmptyContext,
                                |sa| Transition(Attrs_(meta, sa)),
                                || unreachable!("see ParseState::delegate_until_obj dead"),
                                |#[allow(unused_variables)] sa, attrs| {
                                    let obj = match attrs {
                                        // Attribute field bindings for `$attrmap`
                                        [<$nt Attrs_>] {
                                            $(
                                                $field,
                                            )*
                                        } => {
                                            // Optional `OpenSpan` binding
                                            let _ = qname; // avoid unused warning
                                            $(
                                                use crate::xir::parse::attr::AttrParseState;
                                                let $qname_matched = qname;
                                                let $open_span = sa.element_span();
                                            )?

                                            $attrmap
                                        },
                                    };

                                    // Lookahead is added by `delegate_until_obj`.
                                    ele_parse!(@!ntref_delegate
                                        stack,
                                        $ntfirst(meta),
                                        $ntfirst_st,
                                        Transition(
                                            Into::<$ntfirst_st>::into(
                                                ele_parse!(@!ntref_cfg $($ntfirst_cfg)?)
                                            )
                                        ).ok(obj),
                                        Transition($ntfirst(meta)).ok(obj)
                                    )
                                }
                            )
                        },

                        $(
                            ($ntprev(meta), tok) => {
                                ele_parse!(@!ntref_delegate
                                    stack,
                                    $ntnext(meta),
                                    $ntnext_st,
                                    Transition(
                                        Into::<$ntnext_st>::into(
                                            ele_parse!(@!ntref_cfg $($ntnext_cfg)?)
                                        )
                                    ).incomplete().with_lookahead(tok),
                                    Transition($ntnext(meta)).incomplete().with_lookahead(tok)
                                )
                            },
                        )*

                        // XIRF ensures proper nesting,
                        //   so we do not need to check the element name.
                        (
                            ExpectClose_((cfg, qname, _, depth))
                            | CloseRecoverIgnore_((cfg, qname, _, depth), _),
                            XirfToken::Close(_, span, tok_depth)
                        ) if tok_depth == depth => {
                            $(
                                let $close_span = span;
                            )?
                            $closemap.transition(Closed_(cfg, qname, span.tag_span()))
                        },

                        (ExpectClose_(meta @ (_, qname, otspan, _)), unexpected_tok) => {
                            use crate::parse::Token;
                            Transition(
                                CloseRecoverIgnore_(meta, unexpected_tok.span())
                            ).err([<$nt Error_>]::CloseExpected_(qname, otspan, unexpected_tok))
                        }

                        // We're still in recovery,
                        //   so this token gets thrown out.
                        (st @ (RecoverEleIgnore_(..) | CloseRecoverIgnore_(..)), _) => {
                            Transition(st).incomplete()
                        },

                        // TODO: Use `is_accepting` guard if we do not utilize
                        //   exhaustiveness check.
                        (st @ (Closed_(..) | RecoverEleIgnoreClosed_(..)), tok) => {
                            Transition(st).dead(tok)
                        }

                        todo => todo!("{todo:?}"),
                    }
                }

                fn is_accepting(&self, _: &Self::Context) -> bool {
                    matches!(*self, Self::Closed_(..) | Self::RecoverEleIgnoreClosed_(..))
                }
            }
        }
    };

    (@!ele_dfn_sum <$objty:ty> $vis:vis $super:ident $nt:ident [$($ntref:ident)*]) => {
        $(
            // Provide a (hopefully) helpful error that can be corrected
            //   rather than any obscure errors that may follow from trying
            //   to compose parsers that were not generated with this macro.
            assert_impl_all!($ntref: crate::xir::parse::EleParseState);
        )*

        paste::paste! {
            #[doc=concat!(
                "Parser expecting one of ",
                $("[`", stringify!($ntref), "`], ",)*
                "."
            )]
            #[derive(Debug, PartialEq, Eq)]
            $vis enum $nt {
                Expecting_(crate::xir::parse::EleParseCfg),
                /// Recovery state ignoring all remaining tokens for this
                ///   element.
                RecoverEleIgnore_(
                    crate::xir::parse::EleParseCfg,
                    crate::xir::QName,
                    crate::xir::OpenSpan,
                    crate::xir::flat::Depth,
                ),
                RecoverEleIgnoreClosed_(
                    crate::xir::parse::EleParseCfg,
                    crate::xir::QName,
                    crate::xir::CloseSpan
                ),
                /// Inner element has been parsed and is dead;
                ///   this indicates that this parser is also dead.
                Done_,
            }

            impl std::fmt::Display for $nt {
                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
                    use crate::{
                        fmt::{DisplayWrapper, ListDisplayWrapper, TtQuote},
                        xir::fmt::EleSumList,
                    };

                    let ntrefs = [
                        $(
                            $ntref::matcher(),
                        )*
                    ];
                    let expected = EleSumList::wrap(&ntrefs);

                    match self {
                        Self::Expecting_(_) => {
                            write!(f, "expecting {expected}")
                        },

                        Self::RecoverEleIgnore_(_, name, _, _)
                        | Self::RecoverEleIgnoreClosed_(_, name, _) => write!(
                            f,
                            "attempting to recover by ignoring element \
                               with unexpected name {given} \
                               (expected {expected})",
                            given = TtQuote::wrap(name),
                        ),

                        Self::Done_ => write!(f, "done parsing {expected}"),
                    }
                }
            }

            impl From<crate::xir::parse::EleParseCfg> for $nt {
                fn from(repeat: crate::xir::parse::EleParseCfg) -> Self {
                    Self::Expecting_(repeat)
                }
            }

            #[derive(Debug, PartialEq)]
            $vis enum [<$nt Error_>] {
                UnexpectedEle_(crate::xir::QName, crate::span::Span),
            }

            impl std::error::Error for [<$nt Error_>] {
                fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
                    // TODO
                    None
                }
            }

            impl std::fmt::Display for [<$nt Error_>] {
                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
                    use crate::{
                        fmt::DisplayWrapper,
                        xir::fmt::TtOpenXmlEle,
                    };

                    match self {
                        Self::UnexpectedEle_(qname, _) => {
                            write!(f, "unexpected {}", TtOpenXmlEle::wrap(qname))
                        },
                    }
                }
            }

            impl crate::diagnose::Diagnostic for [<$nt Error_>] {
                fn describe(&self) -> Vec<crate::diagnose::AnnotatedSpan> {
                    use crate::{
                        diagnose::Annotate,
                        fmt::{DisplayWrapper, ListDisplayWrapper, TtQuote},
                        xir::fmt::EleSumList,
                    };

                    let ntrefs = [
                        $(
                            $ntref::matcher(),
                        )*
                    ];
                    let expected = EleSumList::wrap(&ntrefs);

                    match self {
                        Self::UnexpectedEle_(qname, span) => {
                            span
                                .error(format!(
                                    "element {name} cannot appear here",
                                    name = TtQuote::wrap(qname),
                                ))
                                .with_help(format!("expecting {expected}"))
                                .into()
                        },
                    }
                }
            }

            impl crate::parse::ParseState for $nt {
                type Token = crate::xir::flat::XirfToken<
                    crate::xir::flat::RefinedText
                >;
                type Object = $objty;
                type Error = [<$nt Error_>];
                type Context = crate::xir::parse::StateStackContext<Self::Super>;
                type Super = $super;

                fn parse_token(
                    self,
                    tok: Self::Token,
                    stack: &mut Self::Context,
                ) -> crate::parse::TransitionResult<Self::Super> {
                    use crate::{
                        parse::Transition,
                        xir::{
                            flat::XirfToken,
                            parse::EleParseCfg,
                        },
                    };

                    use $nt::{
                        Expecting_, RecoverEleIgnore_,
                        RecoverEleIgnoreClosed_, Done_
                    };

                    match (self, tok) {
                        $(
                            (
                                Expecting_(cfg),
                                XirfToken::Open(qname, span, depth)
                            ) if $ntref::matcher().matches(qname) => {
                                ele_parse!(@!ntref_delegate
                                    stack,
                                    match cfg.repeat {
                                        true => Expecting_(cfg),
                                        false => Done_,
                                    },
                                    $ntref,
                                    Transition(
                                        $ntref::from(
                                            EleParseCfg::default()
                                        )
                                    ).incomplete().with_lookahead(
                                        XirfToken::Open(qname, span, depth)
                                    ),
                                    unreachable!("TODO: remove me (ntref_delegate done)")
                                )
                            },
                        )*

                        // An unexpected token when repeating ends
                        //   repetition and should not result in an error.
                        (Expecting_(cfg), tok) if cfg.repeat => {
                            Transition(Done_).dead(tok)
                        }

                        (Expecting_(cfg), XirfToken::Open(qname, span, depth)) => {
                            use crate::xir::EleSpan;
                            Transition(RecoverEleIgnore_(cfg, qname, span, depth)).err(
                                // Use name span rather than full `OpenSpan`
                                //   since it's specifically the name that
                                //   was unexpected,
                                //     not the fact that it's an element.
                                [<$nt Error_>]::UnexpectedEle_(qname, span.name_span())
                            )
                        },

                        // XIRF ensures that the closing tag matches the opening,
                        //   so we need only check depth.
                        (
                            RecoverEleIgnore_(cfg, qname, _, depth_open),
                            XirfToken::Close(_, span, depth_close)
                        ) if depth_open == depth_close => {
                            Transition(RecoverEleIgnoreClosed_(cfg, qname, span)).incomplete()
                        },

                        (st @ RecoverEleIgnore_(..), _) => {
                            Transition(st).incomplete()
                        },

                        (
                            st @ (Done_ | RecoverEleIgnoreClosed_(..)),
                            tok
                        ) => Transition(st).dead(tok),

                        todo => todo!("sum {todo:?}"),
                    }
                }

                fn is_accepting(&self, _: &Self::Context) -> bool {
                    match self {
                        Self::RecoverEleIgnoreClosed_(..) | Self::Done_ => true,
                        _ => false,
                    }
                }
            }
        }
    };

    // Generate superstate sum type.
    //
    // This is really annoying because we cannot read the output of another
    //   macro,
    //     and so we have to do our best to re-parse the body of the
    //     original `ele_parse!` invocation without duplicating too much
    //     logic,
    //       and we have to do so in a way that we can aggregate all of
    //       those data.
    (@!super_sum <$objty:ty> $vis:vis $super:ident
        $(
            // NT definition is always followed by `:=`.
            $nt:ident :=
                // Identifier if an element NT.
                $($_i:ident)?
                // Parenthesis for a sum NT,
                //   or possibly the span match for an element NT.
                // So: `:= QN_IDENT(span)` or `:= (A | B | C)`.
                $( ($($_p:tt)*) )?
                // Braces for an element NT body.
                $( {$($_b:tt)*} )?
            // Element and sum NT both conclude with a semicolon,
            //   which we need to disambiguate the next `$nt`.
            ;
        )*
    ) => {
        paste::paste! {
            /// Superstate representing the union of all related parsers.
            ///
            /// This [`ParseState`] allows sub-parsers to independently
            ///   the states associated with their own subgraph,
            ///     and then yield a state transition directly to a state of
            ///     another parser.
            /// This is conceptually like CPS (continuation passing style),
            ///   where this [`ParseState`] acts as a trampoline.
            ///
            /// This [`ParseState`] is required for use with [`Parser`];
            ///   see [`ClosedParseState`] for more information.
            #[derive(Debug, PartialEq, Eq)]
            $vis enum $super {
                $(
                    $nt($nt),
                )*
            }

            // Default parser is the first NT.
            impl Default for $super {
                fn default() -> Self {
                    use $super::*;
                    ele_parse!(@!ntfirst $($nt)*)(
                        crate::xir::parse::EleParseCfg::default().into()
                    )
                }
            }

            $(
                impl From<$nt> for $super {
                    fn from(st: $nt) -> Self {
                        $super::$nt(st)
                    }
                }
            )*

            impl std::fmt::Display for $super {
                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
                    match self {
                        $(
                            Self::$nt(e) => std::fmt::Display::fmt(e, f),
                        )*
                    }
                }
            }

            /// Superstate error object representing the union of all
            ///   related parsers' errors.
            #[derive(Debug, PartialEq)]
            $vis enum [<$super Error_>] {
                $(
                    $nt([<$nt Error_>]),
                )*
            }

            $(
                impl From<[<$nt Error_>]> for [<$super Error_>] {
                    fn from(e: [<$nt Error_>]) -> Self {
                        [<$super Error_>]::$nt(e)
                    }
                }
            )*

            impl std::error::Error for [<$super Error_>] {
                fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
                    // TODO
                    None
                }
            }

            impl std::fmt::Display for [<$super Error_>] {
                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
                    match self {
                        $(
                            Self::$nt(e) => std::fmt::Display::fmt(e, f),
                        )*
                    }
                }
            }

            impl crate::diagnose::Diagnostic for [<$super Error_>] {
                fn describe(&self) -> Vec<crate::diagnose::AnnotatedSpan> {
                    match self {
                        $(
                            Self::$nt(e) => e.describe(),
                        )*
                    }
                }
            }

            impl crate::parse::ParseState for $super {
                type Token = crate::xir::flat::XirfToken<
                    crate::xir::flat::RefinedText
                >;
                type Object = $objty;
                type Error = [<$super Error_>];
                type Context = crate::xir::parse::StateStackContext<Self>;

                fn parse_token(
                    self,
                    tok: Self::Token,
                    stack: &mut Self::Context,
                ) -> crate::parse::TransitionResult<Self> {
                    use crate::{
                        parse::Transition,
                        xir::flat::{XirfToken, RefinedText},
                    };

                    match (self, tok) {
                        // Depth check is unnecessary since _all_ xir::parse
                        //   parsers
                        //     (at least at the time of writing)
                        //     ignore whitespace and comments,
                        //       so may as well return early.
                        // TODO: I'm ignoring _all_ text for now to
                        //   proceed with development; fix.
                        (
                            st,
                            XirfToken::Text(RefinedText::Whitespace(..), _)
                            | XirfToken::Text(RefinedText::Unrefined(..), _) // XXX
                            | XirfToken::Comment(..)
                        ) => {
                            Transition(st).incomplete()
                        }

                        $(
                            // Pass token directly to child until it reports
                            //   a dead state,
                            //     after which we return to the `ParseState`
                            //     atop of the stack.
                            (Self::$nt(st), tok) => st.delegate_child(
                                tok,
                                stack,
                                |deadst, tok, stack| {
                                    stack.ret_or_dead(tok, deadst)
                                },
                            ),
                        )*
                    }
                }

                fn is_accepting(&self, stack: &Self::Context) -> bool {
                    // This is short-circuiting,
                    //   starting at the _bottom_ of the stack and
                    //   moving upward.
                    // The idea is that,
                    //   is we're still in the middle of parsing,
                    //   then it's almost certain that the [`ParseState`] on
                    //     the bottom of the stack will not be in an
                    //     accepting state,
                    //       and so we can stop checking early.
                    // In most cases,
                    //   if we haven't hit EOF early,
                    //   the stack should be either empty or consist of only
                    //     the root state.
                    //
                    // After having considered the stack,
                    //   we can then consider the active `ParseState`.
                    stack.all(|st| st.is_inner_accepting(stack))
                        && self.is_inner_accepting(stack)
                }
            }

            impl $super {
                /// Whether the inner (active child) [`ParseState`] is in an
                ///   accepting state.
                fn is_inner_accepting(
                    &self,
                    ctx: &<Self as crate::parse::ParseState>::Context
                ) -> bool {
                    use crate::parse::ParseState;

                    match self {
                        $(
                            Self::$nt(st) => st.is_accepting(ctx),
                        )*
                    }
                }
            }
        }
    };

    (@!ntfirst $ntfirst:ident $($nt:ident)*) => {
        $ntfirst
    }
}

#[cfg(test)]
mod test;
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								// XIR element parser generator
 								//
 								//  Copyright (C) 2014-2022 Ryan Specialty Group, LLC.
 								//
 								//  This file is part of TAME.
 								//
 								//  This program is free software: you can redistribute it and/or modify
 								//  it under the terms of the GNU General Public License as published by
 								//  the Free Software Foundation, either version 3 of the License, or
 								//  (at your option) any later version.
 								//
 								//  This program is distributed in the hope that it will be useful,
 								//  but WITHOUT ANY WARRANTY; without even the implied warranty of
 								//  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 								//  GNU General Public License for more details.
 								//
 								//  You should have received a copy of the GNU General Public License
 								//  along with this program.  If not, see <http://www.gnu.org/licenses/>.
 								//! Element parser generator for parsing of [XIRF](super::super::flat).
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								use arrayvec::ArrayVec;
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								use std::fmt::Display;
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
 								use crate::{
 								    diagnostic_panic,
 								    fmt::{DisplayWrapper, TtQuote},
 								    parse::{
-												tamer: xir::parse::ele: Hoist whitespace/comment handling to superstate

All child parsers do the same thing, so this simplifies things.

DEV-7145

											
										
										
											2022-08-11 13:00:07 -04:00
+								        ClosedParseState, Context, ParseState, Transition, TransitionResult,
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								    },
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								    xir::{Prefix, QName},
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								};
 								#[cfg(doc)]
 								use crate::{ele_parse, parse::Parser};
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
 								/// A parser accepting a single element.
 								pub trait EleParseState: ParseState {}
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								/// Element parser configuration.
 								///
 								/// This configuration is set on a nonterminal reference using square
 								///   brackets
 								///     (e.g. `Foo[*]`).
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								pub struct EleParseCfg {
 								    /// Whether to allow zero-or-more repetition for this element.
 								    ///
 								    /// This is the Kleene star modifier (`*`).
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								    pub repeat: bool,
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								}
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								// This is an implementation detail for the internal state of EleParseState.
 								impl From<EleParseCfg> for () {
 								    fn from(_: EleParseCfg) -> Self {
 								        ()
 								    }
 								}
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								/// Maximum level of nesting for source XML trees.
 								///
 								/// Technically this is the maximum level of nesting for _parsing_ those
 								///   trees,
 								///     which may end up being less than this value.
 								///
 								/// This should be set to something reasonable,
 								///   but is not an alternative to coming up with code conventions that
 								///   disallow ridiculous levels of nesting.
 								/// TAME does have a lot of nesting with primitives,
 								///   but that nesting is easily abstracted with templates.
 								/// Templates may expand into ridiculous levels of nesting---this
 								///   has no impact on the template expansion phase.
 								///
 								/// Note that this is assuming that this parser is used only for TAME
 								///   sources.
 								/// If that's not the case,
 								///   this can be made to be configurable like XIRF.
 								pub const MAX_DEPTH: usize = 16;
 								/// Parser stack for trampoline.
 								///
 								/// This can be used as a call stack for parsers while avoiding creating
 								///   otherwise-recursive data structures with composition-based delegation.
 								/// However,
 								///   it is more similar to CPS,
 								///   in that the parser popped off the stack need not be the parser that
 								///     initiated the request and merely represents the next step in
 								///     a delayed computation.
 								/// If such a return context is unneeded,
 								///   a [`ParseState`] may implement tail calls by simply not pushing itself
 								///   onto the stack before requesting transfer to another [`ParseState`].
 								#[derive(Debug, Default)]
 								pub struct StateStack<S: ClosedParseState>(ArrayVec<S, MAX_DEPTH>);
 								pub type StateStackContext<S> = Context<StateStack<S>>;
 								// Note that public visibility is needed because `ele_parse` expands outside
 								//   of this module.
 								impl<S: ClosedParseState> StateStack<S> {
 								    /// Request a transfer to another [`ParseState`],
 								    ///   expecting that control be returned to `ret` after it has
 								    ///   completed.
 								    ///
 								    /// This can be reasoned about like calling a thunk:
 								    ///   the return [`ParseState`] is put onto the stack,
 								    ///   the target [`ParseState`] is used for the state transition to
 								    ///     cause [`Parser`] to perform the call to it,
 								    ///   and when it is done
 								    ///     (e.g. a dead state),
 								    ///     `ret` will be pop'd from the stack and we'll transition back to
 								    ///     it.
 								    /// Note that this method is not responsible for returning;
-												tamer: xir::parse::ele: Correct handling of sum dead state post-recovery

Along with this change we also had to change how we handle dead states in
the superstate.  So there were two problems here:

  1. Sum states were not yielding a dead state after recovery, which meant
     that parsing was unable to continue (we still have a `todo!`); and
  2. The superstate considered it an error when there was nothing left on
     the stack, because I assumed that ought not happen.

Regarding #2---it _shouldn't_ happen, _unless_ we have extra input after we
have completed parsing.  Which happens to be the case for this test case,
but more importantly, we shouldn't be panicing with errors about TAMER bugs
if somebody puts extra input after a closing root tag in a source file.

DEV-7145

											
										
										
											2022-08-11 10:39:00 -04:00
+								    ///   see [`Self::ret_or_dead`] to perform a return.
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								    ///
 								    /// However,
 								    ///   the calling [`ParseState`] is not responsible for its return,
 								    ///   unlike a typical function call.
 								    /// Instead,
 								    ///   this _actually_ more closely resembles CPS
 								    ///     (continuation passing style),
 								    ///     and so [`ele_parse!`] must be careful to ensure that stack
 								    ///     operations are properly paired.
 								    /// On the upside,
 								    ///   if something is erroneously `ret`'d,
 								    ///   the parser is guaranteed to be in a consistent state since the
 								    ///   entire state has been reified
 								    ///     (but the input would then be parsed incorrectly).
 								    ///
 								    /// Note that tail calls can be implemented by transferring control
 								    ///   without pushing an entry on the stack to return to,
 								    ///     but that hasn't been formalized \[yet\] and requires extra care.
 								    pub fn transfer_with_ret<SA, ST>(
 								        &mut self,
 								        Transition(ret): Transition<SA>,
 								        target: TransitionResult<ST>,
 								    ) -> TransitionResult<ST>
 								    where
 								        SA: ParseState<Super = S::Super>,
 								        ST: ParseState,
 								    {
 								        let Self(stack) = self;
 								        // TODO: Global configuration to (hopefully) ensure that XIRF will
 								        //   actually catch this.
 								        if stack.is_full() {
 								            // TODO: We need some spans here and ideally convert the
 								            //   parenthetical error message into a diagnostic footnote.
 								            // TODO: Or should we have a special error type that tells the
 								            //   parent `Parser` to panic with context?
 								            diagnostic_panic!(
 								                vec![],
 								                "maximum parsing depth of {} exceeded while attempting \
 								                   to push return state {} \
 								                   (expected XIRF configuration to prevent this error)",
 								                MAX_DEPTH,
 								                TtQuote::wrap(ret),
 								            );
 								        }
 								        stack.push(ret.into());
 								        target
 								    }
-												tamer: xir::parse::ele: Correct handling of sum dead state post-recovery

Along with this change we also had to change how we handle dead states in
the superstate.  So there were two problems here:

  1. Sum states were not yielding a dead state after recovery, which meant
     that parsing was unable to continue (we still have a `todo!`); and
  2. The superstate considered it an error when there was nothing left on
     the stack, because I assumed that ought not happen.

Regarding #2---it _shouldn't_ happen, _unless_ we have extra input after we
have completed parsing.  Which happens to be the case for this test case,
but more importantly, we shouldn't be panicing with errors about TAMER bugs
if somebody puts extra input after a closing root tag in a source file.

DEV-7145

											
										
										
											2022-08-11 10:39:00 -04:00
+								    /// Attempt to return to a previous [`ParseState`] that transferred
 								    ///   control away from itself,
 								    ///     otherwise yield a dead state transition to `deadst`.
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								    ///
 								    /// Conceptually,
 								    ///   this is like returning from a function call,
 								    ///   where the function was invoked using [`Self::transfer_with_ret`].
 								    /// However,
 								    ///   this system is more akin to CPS
 								    ///     (continuation passing style);
 								    ///       see [`Self::transfer_with_ret`] for important information.
-												tamer: xir::parse::ele: Correct handling of sum dead state post-recovery

Along with this change we also had to change how we handle dead states in
the superstate.  So there were two problems here:

  1. Sum states were not yielding a dead state after recovery, which meant
     that parsing was unable to continue (we still have a `todo!`); and
  2. The superstate considered it an error when there was nothing left on
     the stack, because I assumed that ought not happen.

Regarding #2---it _shouldn't_ happen, _unless_ we have extra input after we
have completed parsing.  Which happens to be the case for this test case,
but more importantly, we shouldn't be panicing with errors about TAMER bugs
if somebody puts extra input after a closing root tag in a source file.

DEV-7145

											
										
										
											2022-08-11 10:39:00 -04:00
+								    ///
 								    /// If there is no state to return to on the stack,
 								    ///   then it is assumed that we have received more input than expected
 								    ///   after having completed a full parse.
 								    pub fn ret_or_dead(
 								        &mut self,
 								        lookahead: S::Token,
 								        deadst: S,
 								    ) -> TransitionResult<S> {
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								        let Self(stack) = self;
 								        // This should certainly never happen unless there is a bug in the
 								        //   `ele_parse!` parser-generator,
 								        //     since it means that we're trying to return to a caller that
 								        //     does not exist.
-												tamer: xir::parse::ele: Correct handling of sum dead state post-recovery

Along with this change we also had to change how we handle dead states in
the superstate.  So there were two problems here:

  1. Sum states were not yielding a dead state after recovery, which meant
     that parsing was unable to continue (we still have a `todo!`); and
  2. The superstate considered it an error when there was nothing left on
     the stack, because I assumed that ought not happen.

Regarding #2---it _shouldn't_ happen, _unless_ we have extra input after we
have completed parsing.  Which happens to be the case for this test case,
but more importantly, we shouldn't be panicing with errors about TAMER bugs
if somebody puts extra input after a closing root tag in a source file.

DEV-7145

											
										
										
											2022-08-11 10:39:00 -04:00
+								        match stack.pop() {
 								            Some(st) => Transition(st).incomplete().with_lookahead(lookahead),
 								            None => Transition(deadst).dead(lookahead),
 								        }
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								    }
-												tamer: xir::parse::ele: Superstate not to accept early EOF

This was accepting an early EOF when the active child `ParseState` was in an
accepting state, because it was not ensuring that anything on the stack was
also accepting.

Ideally, there should be nothing on the stack, and hopefully in the future
that's what happens.  But with how things are today, it's important that, if
anything is on the stack, it is accepting.

Since `is_accepting` on the superstate is only called during finalization,
and because the check terminates early, and because the stack practically
speaking will only have a couple things on it max (unless we're in tail
position in a deeply nested tree, without TCO [yet]), this shouldn't be an
expensive check.

Implementing this did require that we expose `Context` to `is_accepting`,
which I had hoped to avoid having to do, but here we are.

DEV-7145

											
										
										
											2022-08-11 13:49:11 -04:00
 								    /// Test every [`ParseState`] on the stack against the predicate `f`.
 								    pub fn all(&self, f: impl Fn(&S) -> bool) -> bool {
 								        let Self(stack) = self;
 								        stack[..].iter().all(f)
 								    }
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								}
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								/// Match some type of node.
 								#[derive(Debug, PartialEq, Eq)]
 								pub enum NodeMatcher {
 								    /// Static [`QName`] with a simple equality check.
 								    QName(QName),
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								    /// Any element with a matching [`Prefix`].
 								    Prefix(Prefix),
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								}
 								impl NodeMatcher {
 								    /// Match against the provided [`QName`].
 								    pub fn matches(&self, qname: QName) -> bool {
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								        match self {
 								            Self::QName(qn_match) if qn_match == &qname => true,
 								            Self::Prefix(prefix) if Some(*prefix) == qname.prefix() => true,
 								            _ => false,
 								        }
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								    }
 								}
 								impl From<QName> for NodeMatcher {
 								    fn from(qname: QName) -> Self {
 								        Self::QName(qname)
 								    }
 								}
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								impl From<Prefix> for NodeMatcher {
 								    fn from(prefix: Prefix) -> Self {
 								        Self::Prefix(prefix)
 								    }
 								}
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								impl Display for NodeMatcher {
 								    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								        use crate::xir::fmt::XmlPrefixAnyLocal;
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								        match self {
 								            Self::QName(qname) => Display::fmt(qname, f),
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								            Self::Prefix(prefix) => XmlPrefixAnyLocal::fmt(prefix, f),
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								        }
 								    }
 								}
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								#[macro_export]
 								macro_rules! ele_parse {
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								    (
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								        $vis:vis enum $super:ident;
-												tamer: xir::parse::ele: Visibility specifier

We need to be able to export generated identifiers.  Trying to figure out a
syntax for this was a bit tricky considering how much is generated, so I
just settled on something that's reasonably clear and easy to parse with
`macro_rules!`.

I had intended to just make everything public by default and encapsulate
using private modules, but that then required making everything else that it
uses public (e.g. error and token objects), which would have been a bizarre
thing to do in e.g. test cases.

DEV-7145

											
										
										
											2022-07-21 14:56:43 -04:00
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								        // Attr has to be first to avoid ambiguity with `$rest`.
 								        $(type AttrValueError = $evty:ty;)?
 								        type Object = $objty:ty;
 								        $($rest:tt)*
 								    ) => {
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								        ele_parse! {@!next $vis $super
 								            $(type AttrValueError = $evty;)?
 								            type Object = $objty;
 								            $($rest)*
 								        }
 								        ele_parse!(@!super_sum <$objty> $vis $super $($rest)*);
 								    };
 								    (@!next $vis:vis $super:ident
 								        // Attr has to be first to avoid ambiguity with `$rest`.
 								        $(type AttrValueError = $evty:ty;)?
 								        type Object = $objty:ty;
 								        $($rest:tt)*
 								    ) => {
 								        ele_parse!(@!nonterm_decl <$objty, $($evty)?> $vis $super $($rest)*);
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								    };
-												tamer: xir::parse::ele: Visibility specifier

We need to be able to export generated identifiers.  Trying to figure out a
syntax for this was a bit tricky considering how much is generated, so I
just settled on something that's reasonably clear and easy to parse with
`macro_rules!`.

I had intended to just make everything public by default and encapsulate
using private modules, but that then required making everything else that it
uses public (e.g. error and token objects), which would have been a bizarre
thing to do in e.g. test cases.

DEV-7145

											
										
										
											2022-07-21 14:56:43 -04:00
+								    (@!nonterm_decl <$objty:ty, $($evty:ty)?>
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								        $vis:vis $super:ident $nt:ident := $($rest:tt)*
-												tamer: xir::parse::ele: Visibility specifier

We need to be able to export generated identifiers.  Trying to figure out a
syntax for this was a bit tricky considering how much is generated, so I
just settled on something that's reasonably clear and easy to parse with
`macro_rules!`.

I had intended to just make everything public by default and encapsulate
using private modules, but that then required making everything else that it
uses public (e.g. error and token objects), which would have been a bizarre
thing to do in e.g. test cases.

DEV-7145

											
										
										
											2022-07-21 14:56:43 -04:00
+								    ) => {
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								        ele_parse!(@!nonterm_def <$objty, $($evty)?> $vis $super $nt $($rest)*);
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								    };
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								    (@!nonterm_def <$objty:ty, $($evty:ty)?>
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								        $vis:vis $super:ident $nt:ident $qname:ident $(($($ntp:tt)*))?
 								        { $($matches:tt)* }; $($rest:tt)*
-												tamer: xir::parse::ele: Introduce open/close span bindings

This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps.  Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.

There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping.  I'll change this if it's actually needed; this keeps things simple
for now.

DEV-7145

											
										
										
											2022-07-13 23:42:51 -04:00
+								    ) => {
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								        ele_parse!(@!ele_expand_body <$objty, $($evty)?>
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								            $vis $super $nt $qname ($($($ntp)*)?) $($matches)*
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								        );
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								        ele_parse! {@!next $vis $super
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								            $(type AttrValueError = $evty;)?
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								            type Object = $objty;
 								            $($rest)*
 								        }
 								    };
-												tamer: xir::parse::ele: Visibility specifier

We need to be able to export generated identifiers.  Trying to figure out a
syntax for this was a bit tricky considering how much is generated, so I
just settled on something that's reasonably clear and easy to parse with
`macro_rules!`.

I had intended to just make everything public by default and encapsulate
using private modules, but that then required making everything else that it
uses public (e.g. error and token objects), which would have been a bizarre
thing to do in e.g. test cases.

DEV-7145

											
										
										
											2022-07-21 14:56:43 -04:00
+								    (@!nonterm_def <$objty:ty, $($evty:ty)?>
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								        $vis:vis $super:ident $nt:ident
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								        ($ntref_first:ident $(| $ntref:ident)+); $($rest:tt)*
 								    ) => {
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								        ele_parse!(@!ele_dfn_sum <$objty>
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								            $vis $super $nt [$ntref_first $($ntref)*]
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								        );
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								        ele_parse! {@!next $vis $super
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								            $(type AttrValueError = $evty;)?
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								            type Object = $objty;
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								            $($rest)*
 								        }
 								    };
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								    (@!nonterm_decl <$objty:ty, $($evty:ty)?> $vis:vis $super:ident) => {};
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
 								    // Expand the provided data to a more verbose form that provides the
 								    //   context necessary for state transitions.
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								    (@!ele_expand_body <$objty:ty, $($evty:ty)?>
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								        $vis:vis $super:ident $nt:ident $qname:ident ($($ntp:tt)*)
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								        @ { $($attrbody:tt)* } => $attrmap:expr,
-												tamer: xir::parse::ele: Introduce open/close span bindings

This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps.  Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.

There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping.  I'll change this if it's actually needed; this keeps things simple
for now.

DEV-7145

											
										
										
											2022-07-13 23:42:51 -04:00
+								        $(/$(($close_span:ident))? => $closemap:expr,)?
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
-												tamer: xir::parse::ele: Mixed content parsing

"Mixed content" is the XML term representing element nodes mixed with text
nodes.  For example, `foo <strong>bar</strong> baz` is mixed.

TAME supports text nodes as documentation, intended to be in a literate
style but never fully realized.  In any case, we need to permit them, and I
wanted to do more than just ignore the nodes.

This takes a different approach than typical parser delegation---it has the
parent parser _preempt_ the child by intercepting text before delegation
takes place, rather than having the child reject the token (or possibly
interpret it itself!) and have to handle an error or dead state.

And while this makes it more confusing in terms of state machine stitching,
it does make sense, in the sense that the parent parser is really what
"owns" the text node---the parser is delegating _element_ parsing only, take
asserts authority when necessary to take back control where it shouldn't be
delegated.

DEV-7145

											
										
										
											2022-08-01 13:37:16 -04:00
+								        // Special forms (`[sp](args) => expr`).
 								        $(
 								            [$special:ident]$(($($special_arg:ident),*))?
 								                => $special_map:expr,
 								        )?
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								        // Nonterminal references are provided as a list.
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								        // A configuration specifier can be provided,
 								        //   currently intended to support the Kleene star.
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								        $(
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								            $ntref:ident $([$ntref_cfg:tt])?,
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								        )*
 								    ) => {
 								        ele_parse! {
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								            @!ele_dfn_body <$objty, $($evty)?> $vis $super $nt $qname ($($ntp)*)
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								            @ { $($attrbody)* } => $attrmap,
-												tamer: xir::parse::ele: Introduce open/close span bindings

This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps.  Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.

There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping.  I'll change this if it's actually needed; this keeps things simple
for now.

DEV-7145

											
										
										
											2022-07-13 23:42:51 -04:00
+								            /$($($close_span)?)? => ele_parse!(@!ele_close $($closemap)?),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
-												tamer: xir::parse::ele: Mixed content parsing

"Mixed content" is the XML term representing element nodes mixed with text
nodes.  For example, `foo <strong>bar</strong> baz` is mixed.

TAME supports text nodes as documentation, intended to be in a literate
style but never fully realized.  In any case, we need to permit them, and I
wanted to do more than just ignore the nodes.

This takes a different approach than typical parser delegation---it has the
parent parser _preempt_ the child by intercepting text before delegation
takes place, rather than having the child reject the token (or possibly
interpret it itself!) and have to handle an error or dead state.

And while this makes it more confusing in terms of state machine stitching,
it does make sense, in the sense that the parent parser is really what
"owns" the text node---the parser is delegating _element_ parsing only, take
asserts authority when necessary to take back control where it shouldn't be
delegated.

DEV-7145

											
										
										
											2022-08-01 13:37:16 -04:00
+								            $([$special]$(($($special_arg),*))? => $special_map,)?
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								            <> {
 								                $(
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                    $ntref,
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                )*
 								            }
 								            // Generate state transitions of the form `(S) -> (S')`.
 								            -> {
 								                @ ->
 								                $(
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                    ($nt::$ntref, $ntref) [$($ntref_cfg)?],
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                    ($nt::$ntref) ->
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                )* ($nt::ExpectClose_, ()) [],
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								            }
 								        }
 								    };
-												tamer: xir::parse::ele: Initial Close mapping support

Since the parsers produce streaming IRs, we need to be able to emit tokens
representing closing delimiters, where they are important.

This notably doesn't use spans; I'll add those next, since they're also
needed for the previous work.

DEV-7145

											
										
										
											2022-07-13 15:02:46 -04:00
+								    // No explicit Close mapping defaults to doing nothing at all
 								    //   (so yield Incomplete).
 								    (@!ele_close) => {
 								        crate::parse::ParseStatus::Incomplete
 								    };
 								    (@!ele_close $close:expr) => {
 								        crate::parse::ParseStatus::Object($close)
 								    };
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								    // NT[*] modifier.
 								    (@!ntref_cfg *) => {
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								        crate::xir::parse::EleParseCfg {
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								            repeat: true,
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								        }
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								    };
 								    // No bracketed modifier following NT.
 								    (@!ntref_cfg) => {
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								        crate::xir::parse::EleParseCfg {
 								            repeat: false,
 								        }
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								    };
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								    // Delegation when the destination type is `()`,
 								    //   indicating that the next state is not a child NT
 								    //   (it is likely the state expecting a closing tag).
 								    (@!ntref_delegate
 								        $stack:ident, $ret:expr, (), $_target:expr, $done:expr
 								    ) => {
 								        $done
 								    };
 								    // Delegate to a child parser by pushing self onto the stack and
 								    //   yielding to one of the child's states.
 								    // This uses a trampoline,
 								    //   which avoids recursive data structures
 								    //     (due to `ParseState` composition/stitching)
 								    //   and does not grow the call stack.
 								    (@!ntref_delegate
 								        $stack:ident, $ret:expr, $ntnext_st:ty, $target:expr, $_done:expr
 								    ) => {
 								        $stack.transfer_with_ret(
 								            Transition($ret),
 								            $target,
 								        )
 								    };
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								    (@!ele_dfn_body <$objty:ty, $($evty:ty)?>
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								        $vis:vis $super:ident $nt:ident $qname:ident
 								        ($($qname_matched:pat, $open_span:pat)?)
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								        // Attribute definition special form.
 								        @ {
 								            // We must lightly parse attributes here so that we can retrieve
 								            //   the field identifiers that may be later used as bindings in
 								            //   `$attrmap`.
 								            $(
 								                $(#[$fattr:meta])*
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                $field:ident: ($($fmatch:tt)+) => $fty:ty,
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								            )*
 								        } => $attrmap:expr,
-												tamer: xir::parse::ele: Initial Close mapping support

Since the parsers produce streaming IRs, we need to be able to emit tokens
representing closing delimiters, where they are important.

This notably doesn't use spans; I'll add those next, since they're also
needed for the previous work.

DEV-7145

											
										
										
											2022-07-13 15:02:46 -04:00
+								        // Close expression
 								        //   (defaulting to Incomplete via @!ele_expand_body).
-												tamer: xir::parse::ele: Introduce open/close span bindings

This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps.  Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.

There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping.  I'll change this if it's actually needed; this keeps things simple
for now.

DEV-7145

											
										
										
											2022-07-13 23:42:51 -04:00
+								        /$($close_span:ident)? => $closemap:expr,
-												tamer: xir::parse::ele: Initial Close mapping support

Since the parsers produce streaming IRs, we need to be able to emit tokens
representing closing delimiters, where they are important.

This notably doesn't use spans; I'll add those next, since they're also
needed for the previous work.

DEV-7145

											
										
										
											2022-07-13 15:02:46 -04:00
-												tamer: xir::parse::ele: Mixed content parsing

"Mixed content" is the XML term representing element nodes mixed with text
nodes.  For example, `foo <strong>bar</strong> baz` is mixed.

TAME supports text nodes as documentation, intended to be in a literate
style but never fully realized.  In any case, we need to permit them, and I
wanted to do more than just ignore the nodes.

This takes a different approach than typical parser delegation---it has the
parent parser _preempt_ the child by intercepting text before delegation
takes place, rather than having the child reject the token (or possibly
interpret it itself!) and have to handle an error or dead state.

And while this makes it more confusing in terms of state machine stitching,
it does make sense, in the sense that the parent parser is really what
"owns" the text node---the parser is delegating _element_ parsing only, take
asserts authority when necessary to take back control where it shouldn't be
delegated.

DEV-7145

											
										
										
											2022-08-01 13:37:16 -04:00
+								        // Non-whitespace text nodes can be mapped into elements with the
 								        //   given QName as a preprocessing step,
 								        //     allowing them to reuse the existing element NT system.
 								        $([text]($text:ident, $text_span:ident) => $text_map:expr,)?
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								        // Nonterminal references.
 								        <> {
 								            $(
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                $ntref:ident,
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								            )*
 								        }
 								        -> {
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								            @ -> ($ntfirst:path, $ntfirst_st:ty) [$($ntfirst_cfg:tt)?],
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								            $(
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                ($ntprev:path) -> ($ntnext:path, $ntnext_st:ty) [$($ntnext_cfg:tt)?],
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								            )*
 								        }
 								    ) => {
 								        paste::paste! {
 								            crate::attr_parse! {
-												tamer: xir::parse::ele: Visibility specifier

We need to be able to export generated identifiers.  Trying to figure out a
syntax for this was a bit tricky considering how much is generated, so I
just settled on something that's reasonably clear and easy to parse with
`macro_rules!`.

I had intended to just make everything public by default and encapsulate
using private modules, but that then required making everything else that it
uses public (e.g. error and token objects), which would have been a bizarre
thing to do in e.g. test cases.

DEV-7145

											
										
										
											2022-07-21 14:56:43 -04:00
+								                vis($vis);
-												tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError

This integrates the previous ValueError for `attr_parse!` into
`ele_parse!`.

DEV-7145

											
										
										
											2022-07-21 09:23:34 -04:00
+								                $(type ValueError = $evty;)?
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                struct [<$nt AttrsState_>] -> [<$nt Attrs_>] {
 								                    $(
 								                        $(#[$fattr])*
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                        $field: ($($fmatch)+) => $fty,
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                    )*
 								                }
 								            }
 								            #[doc=concat!("Parser for element [`", stringify!($qname), "`].")]
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								            #[derive(Debug, PartialEq, Eq)]
-												tamer: xir::parse::ele: Visibility specifier

We need to be able to export generated identifiers.  Trying to figure out a
syntax for this was a bit tricky considering how much is generated, so I
just settled on something that's reasonably clear and easy to parse with
`macro_rules!`.

I had intended to just make everything public by default and encapsulate
using private modules, but that then required making everything else that it
uses public (e.g. error and token objects), which would have been a bizarre
thing to do in e.g. test cases.

DEV-7145

											
										
										
											2022-07-21 14:56:43 -04:00
+								            $vis enum $nt {
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                #[doc=concat!(
 								                    "Expecting opening tag for element [`",
 								                    stringify!($qname),
 								                    "`]."
 								                )]
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                Expecting_(crate::xir::parse::EleParseCfg),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                /// Recovery state ignoring all remaining tokens for this
 								                ///   element.
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                RecoverEleIgnore_(
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                    crate::xir::parse::EleParseCfg,
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                    crate::xir::QName,
 								                    crate::xir::OpenSpan,
 								                    crate::xir::flat::Depth
 								                ),
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                // Recovery completed because end tag corresponding to the
 								                //   invalid element has been found.
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                RecoverEleIgnoreClosed_(
 								                    crate::xir::parse::EleParseCfg,
 								                    crate::xir::QName,
 								                    crate::xir::CloseSpan
 								                ),
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                /// Recovery state ignoring all tokens when a `Close` is
 								                ///   expected.
 								                ///
 								                /// This is token-agnostic---it
 								                ///   may be a child element,
 								                ///     but it may be text,
 								                ///     for example.
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                CloseRecoverIgnore_(
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                    (
 								                        crate::xir::parse::EleParseCfg,
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                        crate::xir::QName,
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        crate::span::Span,
 								                        crate::xir::flat::Depth
 								                    ),
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                    crate::span::Span
 								                ),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                /// Parsing element attributes.
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                Attrs_(
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                    (
 								                        crate::xir::parse::EleParseCfg,
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                        crate::xir::QName,
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        crate::span::Span,
 								                        crate::xir::flat::Depth
 								                    ),
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                    [<$nt AttrsState_>]
 								                ),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                $(
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                    $ntref(
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        (
 								                            crate::xir::parse::EleParseCfg,
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                            crate::xir::QName,
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                            crate::span::Span,
 								                            crate::xir::flat::Depth
 								                        ),
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                    ),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                )*
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                ExpectClose_(
 								                    (
 								                        crate::xir::parse::EleParseCfg,
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                        crate::xir::QName,
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        crate::span::Span,
 								                        crate::xir::flat::Depth
 								                    ),
 								                ),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                /// Closing tag found and parsing of the element is
 								                ///   complete.
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                Closed_(
 								                    crate::xir::parse::EleParseCfg,
 								                    crate::xir::QName,
 								                    crate::span::Span
 								                ),
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								            }
 								            impl From<crate::xir::parse::EleParseCfg> for $nt {
 								                fn from(repeat: crate::xir::parse::EleParseCfg) -> Self {
 								                    Self::Expecting_(repeat)
 								                }
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								            }
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								            impl crate::xir::parse::EleParseState for $nt {}
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
 								            impl $nt {
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								                /// Matcher describing the node recognized by this parser.
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                #[allow(dead_code)] // used by sum parser
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								                fn matcher() -> crate::xir::parse::NodeMatcher {
 								                    crate::xir::parse::NodeMatcher::from($qname)
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                }
-												tamer: xir::parse::ele: Mixed content parsing

"Mixed content" is the XML term representing element nodes mixed with text
nodes.  For example, `foo <strong>bar</strong> baz` is mixed.

TAME supports text nodes as documentation, intended to be in a literate
style but never fully realized.  In any case, we need to permit them, and I
wanted to do more than just ignore the nodes.

This takes a different approach than typical parser delegation---it has the
parent parser _preempt_ the child by intercepting text before delegation
takes place, rather than having the child reject the token (or possibly
interpret it itself!) and have to handle an error or dead state.

And while this makes it more confusing in terms of state machine stitching,
it does make sense, in the sense that the parent parser is really what
"owns" the text node---the parser is delegating _element_ parsing only, take
asserts authority when necessary to take back control where it shouldn't be
delegated.

DEV-7145

											
										
										
											2022-08-01 13:37:16 -04:00
 								                /// Yield the expected depth of child elements,
 								                ///   if known.
 								                #[allow(dead_code)] // used by text special form
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                fn child_depth(&self) -> Option<crate::xir::flat::Depth> {
-												tamer: xir::parse::ele: Mixed content parsing

"Mixed content" is the XML term representing element nodes mixed with text
nodes.  For example, `foo <strong>bar</strong> baz` is mixed.

TAME supports text nodes as documentation, intended to be in a literate
style but never fully realized.  In any case, we need to permit them, and I
wanted to do more than just ignore the nodes.

This takes a different approach than typical parser delegation---it has the
parent parser _preempt_ the child by intercepting text before delegation
takes place, rather than having the child reject the token (or possibly
interpret it itself!) and have to handle an error or dead state.

And while this makes it more confusing in terms of state machine stitching,
it does make sense, in the sense that the parent parser is really what
"owns" the text node---the parser is delegating _element_ parsing only, take
asserts authority when necessary to take back control where it shouldn't be
delegated.

DEV-7145

											
										
										
											2022-08-01 13:37:16 -04:00
+								                    match self {
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                        $ntfirst((_, _, _, depth)) => Some(depth.child_depth()),
-												tamer: xir::parse::ele: Mixed content parsing

"Mixed content" is the XML term representing element nodes mixed with text
nodes.  For example, `foo <strong>bar</strong> baz` is mixed.

TAME supports text nodes as documentation, intended to be in a literate
style but never fully realized.  In any case, we need to permit them, and I
wanted to do more than just ignore the nodes.

This takes a different approach than typical parser delegation---it has the
parent parser _preempt_ the child by intercepting text before delegation
takes place, rather than having the child reject the token (or possibly
interpret it itself!) and have to handle an error or dead state.

And while this makes it more confusing in terms of state machine stitching,
it does make sense, in the sense that the parent parser is really what
"owns" the text node---the parser is delegating _element_ parsing only, take
asserts authority when necessary to take back control where it shouldn't be
delegated.

DEV-7145

											
										
										
											2022-08-01 13:37:16 -04:00
+								                        $(
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                            $ntnext((_, _, _, depth)) => Some(depth.child_depth()),
-												tamer: xir::parse::ele: Mixed content parsing

"Mixed content" is the XML term representing element nodes mixed with text
nodes.  For example, `foo <strong>bar</strong> baz` is mixed.

TAME supports text nodes as documentation, intended to be in a literate
style but never fully realized.  In any case, we need to permit them, and I
wanted to do more than just ignore the nodes.

This takes a different approach than typical parser delegation---it has the
parent parser _preempt_ the child by intercepting text before delegation
takes place, rather than having the child reject the token (or possibly
interpret it itself!) and have to handle an error or dead state.

And while this makes it more confusing in terms of state machine stitching,
it does make sense, in the sense that the parent parser is really what
"owns" the text node---the parser is delegating _element_ parsing only, take
asserts authority when necessary to take back control where it shouldn't be
delegated.

DEV-7145

											
										
										
											2022-08-01 13:37:16 -04:00
+								                        )*
 								                        _ => None,
 								                    }
 								                }
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								            }
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								            impl std::fmt::Display for $nt {
 								                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
 								                    use crate::{
 								                        fmt::{DisplayWrapper, TtQuote},
-												tamer: xir::parse::ele: Remaining Display::fmt for nonterminals

The following commit (test tracing) requires non-panicing `Display` and
`Debug` values.

DEV-7145

											
										
										
											2022-07-18 14:31:42 -04:00
+								                        xir::fmt::{TtOpenXmlEle, TtCloseXmlEle},
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                    };
 								                    match self {
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        Self::Expecting_(_) => write!(
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                            f,
 								                            "expecting opening tag {}",
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								                            TtOpenXmlEle::wrap(Self::matcher()),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                        ),
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        Self::RecoverEleIgnore_(_, name, _, _)
 								                        | Self::RecoverEleIgnoreClosed_(_, name, _) => write!(
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                            f,
 								                            "attempting to recover by ignoring element \
 								                               with unexpected name {given} \
 								                               (expected {expected})",
 								                            given = TtQuote::wrap(name),
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								                            expected = TtQuote::wrap(Self::matcher()),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                        ),
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                        Self::CloseRecoverIgnore_((_, qname, _, depth), _) => write!(
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                            f,
 								                            "attempting to recover by ignoring input \
 								                               until the expected end tag {expected} \
 								                               at depth {depth}",
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                            expected = TtCloseXmlEle::wrap(qname),
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        ),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        Self::Attrs_(_, sa) => std::fmt::Display::fmt(sa, f),
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                        Self::ExpectClose_((_, qname, _, depth)) => write!(
-												tamer: xir::parse::ele: Remaining Display::fmt for nonterminals

The following commit (test tracing) requires non-panicing `Display` and
`Debug` values.

DEV-7145

											
										
										
											2022-07-18 14:31:42 -04:00
+								                            f,
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                            "expecting closing element {} at depth {depth}",
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                            TtCloseXmlEle::wrap(qname)
-												tamer: xir::parse::ele: Remaining Display::fmt for nonterminals

The following commit (test tracing) requires non-panicing `Display` and
`Debug` values.

DEV-7145

											
										
										
											2022-07-18 14:31:42 -04:00
+								                        ),
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                        Self::Closed_(_, qname, _) => write!(
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                            f,
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                            "done parsing element {}",
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                            TtQuote::wrap(qname),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                        ),
 								                        $(
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                            // TODO: A better description.
 								                            Self::$ntref(_) => {
 								                                write!(
 								                                    f,
 								                                    "preparing to transition to \
 								                                       parser for next child element(s)"
 								                                )
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                            },
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                        )*
 								                    }
 								                }
 								            }
 								            #[derive(Debug, PartialEq)]
-												tamer: xir::parse::ele: Visibility specifier

We need to be able to export generated identifiers.  Trying to figure out a
syntax for this was a bit tricky considering how much is generated, so I
just settled on something that's reasonably clear and easy to parse with
`macro_rules!`.

I had intended to just make everything public by default and encapsulate
using private modules, but that then required making everything else that it
uses public (e.g. error and token objects), which would have been a bizarre
thing to do in e.g. test cases.

DEV-7145

											
										
										
											2022-07-21 14:56:43 -04:00
+								            $vis enum [<$nt Error_>] {
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                /// An element was expected,
 								                ///   but the name of the element was unexpected.
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                UnexpectedEle_(crate::xir::QName, crate::span::Span),
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                /// Unexpected input while expecting an end tag for this
 								                ///   element.
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                ///
 								                /// The span corresponds to the opening tag.
-												tamer: Xirf::Text refinement

This teaches XIRF to optionally refine Text into RefinedText, which
determines whether the given SymbolId represents entirely whitespace.

This is something I've been putting off for some time, but now that I'm
parsing source language for NIR, it is necessary, in that we can only permit
whitespace Text nodes in certain contexts.

The idea is to capture the most common whitespace as preinterned
symbols.  Note that this heuristic ought to be determined from scanning a
codebase, which I haven't done yet; this is just an initial list.

The fallback is to look up the string associated with the SymbolId and
perform a linear scan, aborting on the first non-whitespace character.  This
combination of checks should be sufficiently performant for now considering
that this is only being run on source files, which really are not all that
large.  (They become large when template-expanded.)  I'll optimize further
if I notice it show up during profiling.

This also frees XIR itself from being concerned by Whitespace.  Initially I
had used quick-xml's whitespace trimming, but it messed up my span
calculations, and those were a pain in the ass to implement to begin with,
since I had to resort to pointer arithmetic.  I'd rather avoid tweaking it.

tameld will not check for whitespace, since it's not important---xmlo files,
if malformed, are the fault of the compiler; we can ignore text nodes except
in the context of code fragments, where they are never whitespace (unless
that's also a compiler bug).

Onward and yonward.

DEV-7145

											
										
										
											2022-07-27 15:49:38 -04:00
+								                CloseExpected_(
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                    crate::xir::QName,
-												tamer: Xirf::Text refinement

This teaches XIRF to optionally refine Text into RefinedText, which
determines whether the given SymbolId represents entirely whitespace.

This is something I've been putting off for some time, but now that I'm
parsing source language for NIR, it is necessary, in that we can only permit
whitespace Text nodes in certain contexts.

The idea is to capture the most common whitespace as preinterned
symbols.  Note that this heuristic ought to be determined from scanning a
codebase, which I haven't done yet; this is just an initial list.

The fallback is to look up the string associated with the SymbolId and
perform a linear scan, aborting on the first non-whitespace character.  This
combination of checks should be sufficiently performant for now considering
that this is only being run on source files, which really are not all that
large.  (They become large when template-expanded.)  I'll optimize further
if I notice it show up during profiling.

This also frees XIR itself from being concerned by Whitespace.  Initially I
had used quick-xml's whitespace trimming, but it messed up my span
calculations, and those were a pain in the ass to implement to begin with,
since I had to resort to pointer arithmetic.  I'd rather avoid tweaking it.

tameld will not check for whitespace, since it's not important---xmlo files,
if malformed, are the fault of the compiler; we can ignore text nodes except
in the context of code fragments, where they are never whitespace (unless
that's also a compiler bug).

Onward and yonward.

DEV-7145

											
										
										
											2022-07-27 15:49:38 -04:00
+								                    crate::span::Span,
 								                    crate::xir::flat::XirfToken<crate::xir::flat::RefinedText>,
 								                ),
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                Attrs_(crate::xir::parse::AttrParseError<[<$nt AttrsState_>]>),
 								            }
 								            impl From<crate::xir::parse::AttrParseError<[<$nt AttrsState_>]>>
 								                for [<$nt Error_>]
 								            {
 								                fn from(
 								                    e: crate::xir::parse::AttrParseError<[<$nt AttrsState_>]>
 								                ) -> Self {
 								                    [<$nt Error_>]::Attrs_(e)
 								                }
 								            }
 								            impl std::error::Error for [<$nt Error_>] {
 								                fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
 								                    // TODO
 								                    None
 								                }
 								            }
 								            impl std::fmt::Display for [<$nt Error_>] {
 								                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
 								                    use crate::{
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        fmt::{DisplayWrapper, TtQuote},
 								                        xir::fmt::{TtOpenXmlEle, TtCloseXmlEle},
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                    };
 								                    match self {
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                        Self::UnexpectedEle_(name, _) => write!(
 								                            f,
 								                            "unexpected {unexpected} (expecting {expected})",
 								                            unexpected = TtOpenXmlEle::wrap(name),
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								                            expected = TtOpenXmlEle::wrap($nt::matcher()),
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                        ),
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                        Self::CloseExpected_(qname, _, tok) => write!(
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                            f,
 								                            "expected {}, but found {}",
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                            TtCloseXmlEle::wrap(qname),
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                            TtQuote::wrap(tok)
 								                        ),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                        Self::Attrs_(e) => std::fmt::Display::fmt(e, f),
 								                    }
 								                }
 								            }
 								            impl crate::diagnose::Diagnostic for [<$nt Error_>] {
 								                fn describe(&self) -> Vec<crate::diagnose::AnnotatedSpan> {
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                    use crate::{
 								                        diagnose::Annotate,
 								                        fmt::{DisplayWrapper, TtQuote},
 								                        parse::Token,
 								                        xir::fmt::{TtCloseXmlEle},
 								                    };
 								                    match self {
 								                        Self::UnexpectedEle_(_, ospan) => ospan.error(
 								                            format!(
 								                                "expected {ele_name} here",
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								                                ele_name = TtQuote::wrap($nt::matcher())
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                            )
 								                        ).into(),
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                        Self::CloseExpected_(qname, span, tok) => vec![
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                            span.note("element starts here"),
 								                            tok.span().error(format!(
 								                                "expected {}",
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                                TtCloseXmlEle::wrap(qname),
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                            )),
 								                        ],
 								                        Self::Attrs_(e) => e.describe(),
 								                    }
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                }
 								            }
 								            impl crate::parse::ParseState for $nt {
-												tamer: Xirf::Text refinement

This teaches XIRF to optionally refine Text into RefinedText, which
determines whether the given SymbolId represents entirely whitespace.

This is something I've been putting off for some time, but now that I'm
parsing source language for NIR, it is necessary, in that we can only permit
whitespace Text nodes in certain contexts.

The idea is to capture the most common whitespace as preinterned
symbols.  Note that this heuristic ought to be determined from scanning a
codebase, which I haven't done yet; this is just an initial list.

The fallback is to look up the string associated with the SymbolId and
perform a linear scan, aborting on the first non-whitespace character.  This
combination of checks should be sufficiently performant for now considering
that this is only being run on source files, which really are not all that
large.  (They become large when template-expanded.)  I'll optimize further
if I notice it show up during profiling.

This also frees XIR itself from being concerned by Whitespace.  Initially I
had used quick-xml's whitespace trimming, but it messed up my span
calculations, and those were a pain in the ass to implement to begin with,
since I had to resort to pointer arithmetic.  I'd rather avoid tweaking it.

tameld will not check for whitespace, since it's not important---xmlo files,
if malformed, are the fault of the compiler; we can ignore text nodes except
in the context of code fragments, where they are never whitespace (unless
that's also a compiler bug).

Onward and yonward.

DEV-7145

											
										
										
											2022-07-27 15:49:38 -04:00
+								                type Token = crate::xir::flat::XirfToken<
 								                    crate::xir::flat::RefinedText
 								                >;
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                type Object = $objty;
 								                type Error = [<$nt Error_>];
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                type Context = crate::xir::parse::StateStackContext<Self::Super>;
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								                type Super = $super;
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
 								                fn parse_token(
 								                    self,
 								                    tok: Self::Token,
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                    #[allow(unused_variables)] // used only if child NTs
 								                    stack: &mut Self::Context,
 								                ) -> crate::parse::TransitionResult<Self::Super> {
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                    use crate::{
-												tamer: xir::parse::ele: Initial Close mapping support

Since the parsers produce streaming IRs, we need to be able to emit tokens
representing closing delimiters, where they are important.

This notably doesn't use spans; I'll add those next, since they're also
needed for the previous work.

DEV-7145

											
										
										
											2022-07-13 15:02:46 -04:00
+								                        parse::{EmptyContext, Transition, Transitionable},
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                        xir::{
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                            EleSpan,
-												tamer: xir::parse::ele: Hoist whitespace/comment handling to superstate

All child parsers do the same thing, so this simplifies things.

DEV-7145

											
										
										
											2022-08-11 13:00:07 -04:00
+								                            flat::XirfToken,
-												tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module

The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.

DEV-7145

											
										
										
											2022-07-20 15:36:28 -04:00
+								                            parse::parse_attrs,
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                        },
 								                    };
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                    // Used only by _some_ expansions.
 								                    #[allow(unused_imports)]
 								                    use crate::xir::flat::Text;
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                    use $nt::{
 								                        Attrs_, Expecting_, RecoverEleIgnore_,
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        CloseRecoverIgnore_, RecoverEleIgnoreClosed_,
 								                        ExpectClose_, Closed_
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                    };
 								                    match (self, tok) {
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        (
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                            Expecting_(cfg),
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                            XirfToken::Open(qname, span, depth)
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								                        ) if $nt::matcher().matches(qname) => {
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                            Transition(Attrs_(
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                                (cfg, qname, span.tag_span(), depth),
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                                parse_attrs(qname, span)
 								                            )).incomplete()
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        },
 								                        (
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                            Closed_(cfg, ..),
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                            XirfToken::Open(qname, span, depth)
-												tamer: xir::parse::ele: Initial namespace prefix matching support

This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`.  This works, but is missing a couple notable things (and
possibly more):

  1. Tracking the QName that is _actually_ matched so that it can be used in
     messages stating what the expected closing tag is; and
  2. Making that QName available via a binding.

This will be used to match on `t:*` in NIR.  If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.

DEV-7145

											
										
										
											2022-08-10 16:50:02 -04:00
+								                        ) if cfg.repeat && Self::matcher().matches(qname) => {
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                            Transition(Attrs_(
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                                (cfg, qname, span.tag_span(), depth),
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                                parse_attrs(qname, span)
 								                            )).incomplete()
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                        },
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        (
 								                            Expecting_(cfg),
 								                            XirfToken::Open(qname, span, depth)
 								                        ) => {
 								                            Transition(RecoverEleIgnore_(cfg, qname, span, depth)).err(
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                                [<$nt Error_>]::UnexpectedEle_(qname, span.name_span())
 								                            )
 								                        },
 								                        (
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                            RecoverEleIgnore_(cfg, qname, _, depth_open),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                            XirfToken::Close(_, span, depth_close)
 								                        ) if depth_open == depth_close => {
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                            Transition(
 								                                RecoverEleIgnoreClosed_(cfg, qname, span)
 								                            ).incomplete()
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                        },
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                        (Attrs_(meta @ (_, qname, _, _), sa), tok) => {
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                            sa.delegate_until_obj::<Self, _>(
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                                tok,
 								                                EmptyContext,
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                                |sa| Transition(Attrs_(meta, sa)),
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                                || unreachable!("see ParseState::delegate_until_obj dead"),
-												tamer: xir::parse::ele: Introduce open/close span bindings

This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps.  Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.

There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping.  I'll change this if it's actually needed; this keeps things simple
for now.

DEV-7145

											
										
										
											2022-07-13 23:42:51 -04:00
+								                                |#[allow(unused_variables)] sa, attrs| {
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                                    let obj = match attrs {
-												tamer: xir::parse::ele: Introduce open/close span bindings

This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps.  Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.

There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping.  I'll change this if it's actually needed; this keeps things simple
for now.

DEV-7145

											
										
										
											2022-07-13 23:42:51 -04:00
+								                                        // Attribute field bindings for `$attrmap`
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                                        [<$nt Attrs_>] {
 								                                            $(
 								                                                $field,
 								                                            )*
-												tamer: xir::parse::ele: Introduce open/close span bindings

This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps.  Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.

There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping.  I'll change this if it's actually needed; this keeps things simple
for now.

DEV-7145

											
										
										
											2022-07-13 23:42:51 -04:00
+								                                        } => {
 								                                            // Optional `OpenSpan` binding
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                                            let _ = qname; // avoid unused warning
-												tamer: xir::parse::ele: Introduce open/close span bindings

This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps.  Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.

There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping.  I'll change this if it's actually needed; this keeps things simple
for now.

DEV-7145

											
										
										
											2022-07-13 23:42:51 -04:00
+								                                            $(
 								                                                use crate::xir::parse::attr::AttrParseState;
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                                                let $qname_matched = qname;
-												tamer: xir::parse::ele: Introduce open/close span bindings

This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps.  Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.

There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping.  I'll change this if it's actually needed; this keeps things simple
for now.

DEV-7145

											
										
										
											2022-07-13 23:42:51 -04:00
+								                                                let $open_span = sa.element_span();
 								                                            )?
 								                                            $attrmap
 								                                        },
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                                    };
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                                    // Lookahead is added by `delegate_until_obj`.
 								                                    ele_parse!(@!ntref_delegate
 								                                        stack,
 								                                        $ntfirst(meta),
 								                                        $ntfirst_st,
 								                                        Transition(
 								                                            Into::<$ntfirst_st>::into(
 								                                                ele_parse!(@!ntref_cfg $($ntfirst_cfg)?)
 								                                            )
 								                                        ).ok(obj),
 								                                        Transition($ntfirst(meta)).ok(obj)
 								                                    )
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                                }
 								                            )
 								                        },
 								                        $(
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                            ($ntprev(meta), tok) => {
 								                                ele_parse!(@!ntref_delegate
 								                                    stack,
 								                                    $ntnext(meta),
 								                                    $ntnext_st,
 								                                    Transition(
 								                                        Into::<$ntnext_st>::into(
 								                                            ele_parse!(@!ntref_cfg $($ntnext_cfg)?)
 								                                        )
 								                                    ).incomplete().with_lookahead(tok),
 								                                    Transition($ntnext(meta)).incomplete().with_lookahead(tok)
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                                )
 								                            },
 								                        )*
 								                        // XIRF ensures proper nesting,
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        //   so we do not need to check the element name.
 								                        (
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                            ExpectClose_((cfg, qname, _, depth))
 								                            | CloseRecoverIgnore_((cfg, qname, _, depth), _),
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                            XirfToken::Close(_, span, tok_depth)
 								                        ) if tok_depth == depth => {
-												tamer: xir::parse::ele: Introduce open/close span bindings

This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps.  Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.

There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping.  I'll change this if it's actually needed; this keeps things simple
for now.

DEV-7145

											
										
										
											2022-07-13 23:42:51 -04:00
+								                            $(
 								                                let $close_span = span;
 								                            )?
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                            $closemap.transition(Closed_(cfg, qname, span.tag_span()))
-												tamer: xir::parse::ele: Introduce open/close span bindings

This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps.  Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.

There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping.  I'll change this if it's actually needed; this keeps things simple
for now.

DEV-7145

											
										
										
											2022-07-13 23:42:51 -04:00
+								                        },
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                        (ExpectClose_(meta @ (_, qname, otspan, _)), unexpected_tok) => {
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                            use crate::parse::Token;
 								                            Transition(
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                                CloseRecoverIgnore_(meta, unexpected_tok.span())
-												tamer: xir::parse::ele: Store matching QName on NS match

When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding.  The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.

DEV-7145

											
										
										
											2022-08-11 01:15:44 -04:00
+								                            ).err([<$nt Error_>]::CloseExpected_(qname, otspan, unexpected_tok))
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        }
 								                        // We're still in recovery,
 								                        //   so this token gets thrown out.
 								                        (st @ (RecoverEleIgnore_(..) | CloseRecoverIgnore_(..)), _) => {
 								                            Transition(st).incomplete()
 								                        },
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                        // TODO: Use `is_accepting` guard if we do not utilize
 								                        //   exhaustiveness check.
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        (st @ (Closed_(..) | RecoverEleIgnoreClosed_(..)), tok) => {
 								                            Transition(st).dead(tok)
 								                        }
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
 								                        todo => todo!("{todo:?}"),
 								                    }
 								                }
-												tamer: xir::parse::ele: Superstate not to accept early EOF

This was accepting an early EOF when the active child `ParseState` was in an
accepting state, because it was not ensuring that anything on the stack was
also accepting.

Ideally, there should be nothing on the stack, and hopefully in the future
that's what happens.  But with how things are today, it's important that, if
anything is on the stack, it is accepting.

Since `is_accepting` on the superstate is only called during finalization,
and because the check terminates early, and because the stack practically
speaking will only have a couple things on it max (unless we're in tail
position in a deeply nested tree, without TCO [yet]), this shouldn't be an
expensive check.

Implementing this did require that we expose `Context` to `is_accepting`,
which I had hoped to avoid having to do, but here we are.

DEV-7145

											
										
										
											2022-08-11 13:49:11 -04:00
+								                fn is_accepting(&self, _: &Self::Context) -> bool {
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								                    matches!(*self, Self::Closed_(..) | Self::RecoverEleIgnoreClosed_(..))
 								                }
 								            }
 								        }
 								    };
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								    (@!ele_dfn_sum <$objty:ty> $vis:vis $super:ident $nt:ident [$($ntref:ident)*]) => {
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								        $(
 								            // Provide a (hopefully) helpful error that can be corrected
 								            //   rather than any obscure errors that may follow from trying
 								            //   to compose parsers that were not generated with this macro.
-												tamer: xir::parse::ele: Expand types for external expansion for sum NT

Like a previous commit, this corrects the types for sum NTs so that they
properly resolve in contexts external to xir::parse.

DEV-7145

											
										
										
											2022-07-21 13:44:30 -04:00
+								            assert_impl_all!($ntref: crate::xir::parse::EleParseState);
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								        )*
 								        paste::paste! {
 								            #[doc=concat!(
 								                "Parser expecting one of ",
 								                $("[`", stringify!($ntref), "`], ",)*
 								                "."
 								            )]
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								            #[derive(Debug, PartialEq, Eq)]
-												tamer: xir::parse::ele: Visibility specifier

We need to be able to export generated identifiers.  Trying to figure out a
syntax for this was a bit tricky considering how much is generated, so I
just settled on something that's reasonably clear and easy to parse with
`macro_rules!`.

I had intended to just make everything public by default and encapsulate
using private modules, but that then required making everything else that it
uses public (e.g. error and token objects), which would have been a bizarre
thing to do in e.g. test cases.

DEV-7145

											
										
										
											2022-07-21 14:56:43 -04:00
+								            $vis enum $nt {
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                Expecting_(crate::xir::parse::EleParseCfg),
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                /// Recovery state ignoring all remaining tokens for this
 								                ///   element.
-												tamer: xir::parse::ele: Expand types for external expansion for sum NT

Like a previous commit, this corrects the types for sum NTs so that they
properly resolve in contexts external to xir::parse.

DEV-7145

											
										
										
											2022-07-21 13:44:30 -04:00
+								                RecoverEleIgnore_(
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                    crate::xir::parse::EleParseCfg,
-												tamer: xir::parse::ele: Expand types for external expansion for sum NT

Like a previous commit, this corrects the types for sum NTs so that they
properly resolve in contexts external to xir::parse.

DEV-7145

											
										
										
											2022-07-21 13:44:30 -04:00
+								                    crate::xir::QName,
 								                    crate::xir::OpenSpan,
 								                    crate::xir::flat::Depth,
 								                ),
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                RecoverEleIgnoreClosed_(
 								                    crate::xir::parse::EleParseCfg,
 								                    crate::xir::QName,
 								                    crate::xir::CloseSpan
 								                ),
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                /// Inner element has been parsed and is dead;
 								                ///   this indicates that this parser is also dead.
 								                Done_,
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								            }
 								            impl std::fmt::Display for $nt {
 								                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
 								                    use crate::{
 								                        fmt::{DisplayWrapper, ListDisplayWrapper, TtQuote},
-												tamer: xir::parse::ele: Adjust diagnostic display of expected element list

This does two things:

  1. Places the expected list on a separate help line as a footnote where
     it'll be a bit more tolerable when it inevitably overflows the terminal
     width in certain contexts (we may wrap in the future); and
  2. Removes angled brackets from the element names so that they (a) better
     correspond with the span which highlights only the element name and (b)
     do not imply that the elements take no attributes.

DEV-7145

											
										
										
											2022-08-11 01:33:24 -04:00
+								                        xir::fmt::EleSumList,
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                    };
 								                    let ntrefs = [
 								                        $(
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								                            $ntref::matcher(),
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                        )*
 								                    ];
-												tamer: xir::parse::ele: Adjust diagnostic display of expected element list

This does two things:

  1. Places the expected list on a separate help line as a footnote where
     it'll be a bit more tolerable when it inevitably overflows the terminal
     width in certain contexts (we may wrap in the future); and
  2. Removes angled brackets from the element names so that they (a) better
     correspond with the span which highlights only the element name and (b)
     do not imply that the elements take no attributes.

DEV-7145

											
										
										
											2022-08-11 01:33:24 -04:00
+								                    let expected = EleSumList::wrap(&ntrefs);
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
 								                    match self {
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        Self::Expecting_(_) => {
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                            write!(f, "expecting {expected}")
 								                        },
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        Self::RecoverEleIgnore_(_, name, _, _)
 								                        | Self::RecoverEleIgnoreClosed_(_, name, _) => write!(
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                            f,
 								                            "attempting to recover by ignoring element \
 								                               with unexpected name {given} \
 								                               (expected {expected})",
 								                            given = TtQuote::wrap(name),
 								                        ),
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        Self::Done_ => write!(f, "done parsing {expected}"),
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                    }
 								                }
 								            }
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								            impl From<crate::xir::parse::EleParseCfg> for $nt {
 								                fn from(repeat: crate::xir::parse::EleParseCfg) -> Self {
 								                    Self::Expecting_(repeat)
 								                }
 								            }
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								            #[derive(Debug, PartialEq)]
-												tamer: xir::parse::ele: Visibility specifier

We need to be able to export generated identifiers.  Trying to figure out a
syntax for this was a bit tricky considering how much is generated, so I
just settled on something that's reasonably clear and easy to parse with
`macro_rules!`.

I had intended to just make everything public by default and encapsulate
using private modules, but that then required making everything else that it
uses public (e.g. error and token objects), which would have been a bizarre
thing to do in e.g. test cases.

DEV-7145

											
										
										
											2022-07-21 14:56:43 -04:00
+								            $vis enum [<$nt Error_>] {
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                UnexpectedEle_(crate::xir::QName, crate::span::Span),
 								            }
 								            impl std::error::Error for [<$nt Error_>] {
 								                fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
 								                    // TODO
 								                    None
 								                }
 								            }
 								            impl std::fmt::Display for [<$nt Error_>] {
 								                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
 								                    use crate::{
 								                        fmt::DisplayWrapper,
 								                        xir::fmt::TtOpenXmlEle,
 								                    };
 								                    match self {
 								                        Self::UnexpectedEle_(qname, _) => {
 								                            write!(f, "unexpected {}", TtOpenXmlEle::wrap(qname))
 								                        },
 								                    }
 								                }
 								            }
 								            impl crate::diagnose::Diagnostic for [<$nt Error_>] {
 								                fn describe(&self) -> Vec<crate::diagnose::AnnotatedSpan> {
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                    use crate::{
 								                        diagnose::Annotate,
 								                        fmt::{DisplayWrapper, ListDisplayWrapper, TtQuote},
-												tamer: xir::parse::ele: Adjust diagnostic display of expected element list

This does two things:

  1. Places the expected list on a separate help line as a footnote where
     it'll be a bit more tolerable when it inevitably overflows the terminal
     width in certain contexts (we may wrap in the future); and
  2. Removes angled brackets from the element names so that they (a) better
     correspond with the span which highlights only the element name and (b)
     do not imply that the elements take no attributes.

DEV-7145

											
										
										
											2022-08-11 01:33:24 -04:00
+								                        xir::fmt::EleSumList,
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                    };
 								                    let ntrefs = [
 								                        $(
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								                            $ntref::matcher(),
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                        )*
 								                    ];
-												tamer: xir::parse::ele: Adjust diagnostic display of expected element list

This does two things:

  1. Places the expected list on a separate help line as a footnote where
     it'll be a bit more tolerable when it inevitably overflows the terminal
     width in certain contexts (we may wrap in the future); and
  2. Removes angled brackets from the element names so that they (a) better
     correspond with the span which highlights only the element name and (b)
     do not imply that the elements take no attributes.

DEV-7145

											
										
										
											2022-08-11 01:33:24 -04:00
+								                    let expected = EleSumList::wrap(&ntrefs);
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
 								                    match self {
 								                        Self::UnexpectedEle_(qname, span) => {
-												tamer: xir::parse::ele: Adjust diagnostic display of expected element list

This does two things:

  1. Places the expected list on a separate help line as a footnote where
     it'll be a bit more tolerable when it inevitably overflows the terminal
     width in certain contexts (we may wrap in the future); and
  2. Removes angled brackets from the element names so that they (a) better
     correspond with the span which highlights only the element name and (b)
     do not imply that the elements take no attributes.

DEV-7145

											
										
										
											2022-08-11 01:33:24 -04:00
+								                            span
 								                                .error(format!(
 								                                    "element {name} cannot appear here",
 								                                    name = TtQuote::wrap(qname),
 								                                ))
 								                                .with_help(format!("expecting {expected}"))
 								                                .into()
-												tamer: xir::parse::ele: Diagnostic output

The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.

This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.

DEV-7145

											
										
										
											2022-07-20 12:17:15 -04:00
+								                        },
 								                    }
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                }
 								            }
 								            impl crate::parse::ParseState for $nt {
-												tamer: Xirf::Text refinement

This teaches XIRF to optionally refine Text into RefinedText, which
determines whether the given SymbolId represents entirely whitespace.

This is something I've been putting off for some time, but now that I'm
parsing source language for NIR, it is necessary, in that we can only permit
whitespace Text nodes in certain contexts.

The idea is to capture the most common whitespace as preinterned
symbols.  Note that this heuristic ought to be determined from scanning a
codebase, which I haven't done yet; this is just an initial list.

The fallback is to look up the string associated with the SymbolId and
perform a linear scan, aborting on the first non-whitespace character.  This
combination of checks should be sufficiently performant for now considering
that this is only being run on source files, which really are not all that
large.  (They become large when template-expanded.)  I'll optimize further
if I notice it show up during profiling.

This also frees XIR itself from being concerned by Whitespace.  Initially I
had used quick-xml's whitespace trimming, but it messed up my span
calculations, and those were a pain in the ass to implement to begin with,
since I had to resort to pointer arithmetic.  I'd rather avoid tweaking it.

tameld will not check for whitespace, since it's not important---xmlo files,
if malformed, are the fault of the compiler; we can ignore text nodes except
in the context of code fragments, where they are never whitespace (unless
that's also a compiler bug).

Onward and yonward.

DEV-7145

											
										
										
											2022-07-27 15:49:38 -04:00
+								                type Token = crate::xir::flat::XirfToken<
 								                    crate::xir::flat::RefinedText
 								                >;
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                type Object = $objty;
 								                type Error = [<$nt Error_>];
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                type Context = crate::xir::parse::StateStackContext<Self::Super>;
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								                type Super = $super;
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
 								                fn parse_token(
 								                    self,
 								                    tok: Self::Token,
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                    stack: &mut Self::Context,
 								                ) -> crate::parse::TransitionResult<Self::Super> {
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                    use crate::{
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        parse::Transition,
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        xir::{
-												tamer: xir::parse::ele: Hoist whitespace/comment handling to superstate

All child parsers do the same thing, so this simplifies things.

DEV-7145

											
										
										
											2022-08-11 13:00:07 -04:00
+								                            flat::XirfToken,
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                            parse::EleParseCfg,
 								                        },
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                    };
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                    use $nt::{
 								                        Expecting_, RecoverEleIgnore_,
 								                        RecoverEleIgnoreClosed_, Done_
 								                    };
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
 								                    match (self, tok) {
 								                        $(
 								                            (
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                                Expecting_(cfg),
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                                XirfToken::Open(qname, span, depth)
-												tamer: xir::parse::ele: Abstract node matching

This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes.  It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).

DEV-7145

											
										
										
											2022-08-10 13:50:40 -04:00
+								                            ) if $ntref::matcher().matches(qname) => {
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                                ele_parse!(@!ntref_delegate
 								                                    stack,
 								                                    match cfg.repeat {
 								                                        true => Expecting_(cfg),
 								                                        false => Done_,
 								                                    },
 								                                    $ntref,
 								                                    Transition(
 								                                        $ntref::from(
 								                                            EleParseCfg::default()
 								                                        )
 								                                    ).incomplete().with_lookahead(
 								                                        XirfToken::Open(qname, span, depth)
 								                                    ),
 								                                    unreachable!("TODO: remove me (ntref_delegate done)")
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                                )
 								                            },
 								                        )*
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        // An unexpected token when repeating ends
 								                        //   repetition and should not result in an error.
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        (Expecting_(cfg), tok) if cfg.repeat => {
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                            Transition(Done_).dead(tok)
 								                        }
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                        (Expecting_(cfg), XirfToken::Open(qname, span, depth)) => {
-												tamer: xir::parse::ele: Expand types for external expansion for sum NT

Like a previous commit, this corrects the types for sum NTs so that they
properly resolve in contexts external to xir::parse.

DEV-7145

											
										
										
											2022-07-21 13:44:30 -04:00
+								                            use crate::xir::EleSpan;
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                            Transition(RecoverEleIgnore_(cfg, qname, span, depth)).err(
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                                // Use name span rather than full `OpenSpan`
 								                                //   since it's specifically the name that
 								                                //   was unexpected,
 								                                //     not the fact that it's an element.
 								                                [<$nt Error_>]::UnexpectedEle_(qname, span.name_span())
 								                            )
 								                        },
 								                        // XIRF ensures that the closing tag matches the opening,
 								                        //   so we need only check depth.
 								                        (
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                            RecoverEleIgnore_(cfg, qname, _, depth_open),
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                            XirfToken::Close(_, span, depth_close)
 								                        ) if depth_open == depth_close => {
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                            Transition(RecoverEleIgnoreClosed_(cfg, qname, span)).incomplete()
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                        },
 								                        (st @ RecoverEleIgnore_(..), _) => {
 								                            Transition(st).incomplete()
 								                        },
-												tamer: xir::parse::ele: Correct handling of sum dead state post-recovery

Along with this change we also had to change how we handle dead states in
the superstate.  So there were two problems here:

  1. Sum states were not yielding a dead state after recovery, which meant
     that parsing was unable to continue (we still have a `todo!`); and
  2. The superstate considered it an error when there was nothing left on
     the stack, because I assumed that ought not happen.

Regarding #2---it _shouldn't_ happen, _unless_ we have extra input after we
have completed parsing.  Which happens to be the case for this test case,
but more importantly, we shouldn't be panicing with errors about TAMER bugs
if somebody puts extra input after a closing root tag in a source file.

DEV-7145

											
										
										
											2022-08-11 10:39:00 -04:00
+								                        (
 								                            st @ (Done_ | RecoverEleIgnoreClosed_(..)),
 								                            tok
 								                        ) => Transition(st).dead(tok),
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                        todo => todo!("sum {todo:?}"),
 								                    }
 								                }
-												tamer: xir::parse::ele: Superstate not to accept early EOF

This was accepting an early EOF when the active child `ParseState` was in an
accepting state, because it was not ensuring that anything on the stack was
also accepting.

Ideally, there should be nothing on the stack, and hopefully in the future
that's what happens.  But with how things are today, it's important that, if
anything is on the stack, it is accepting.

Since `is_accepting` on the superstate is only called during finalization,
and because the check terminates early, and because the stack practically
speaking will only have a couple things on it max (unless we're in tail
position in a deeply nested tree, without TCO [yet]), this shouldn't be an
expensive check.

Implementing this did require that we expose `Context` to `is_accepting`,
which I had hoped to avoid having to do, but here we are.

DEV-7145

											
										
										
											2022-08-11 13:49:11 -04:00
+								                fn is_accepting(&self, _: &Self::Context) -> bool {
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                    match self {
-												tamer: xir::parse::ele: Nonterminal repetition (Kleene star)

This allows an element to be repeated by the parent NT.  The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.

This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.

DEV-7145

											
										
										
											2022-07-19 12:15:59 -04:00
+								                        Self::RecoverEleIgnoreClosed_(..) | Self::Done_ => true,
-												tamer: xir::parse::ele: Introduce sum nonterminals

This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.

This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser.  This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.

This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that.  However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins.  Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.

This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.

The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element.  But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one).  But more to come
on that later; it's not critical at this point.  I need to get parsing
completed for TAME's input language.

DEV-7145

											
										
										
											2022-07-14 15:12:57 -04:00
+								                        _ => false,
 								                    }
 								                }
 								            }
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								        }
 								    };
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
 								    // Generate superstate sum type.
 								    //
 								    // This is really annoying because we cannot read the output of another
 								    //   macro,
 								    //     and so we have to do our best to re-parse the body of the
 								    //     original `ele_parse!` invocation without duplicating too much
 								    //     logic,
 								    //       and we have to do so in a way that we can aggregate all of
 								    //       those data.
 								    (@!super_sum <$objty:ty> $vis:vis $super:ident
 								        $(
 								            // NT definition is always followed by `:=`.
 								            $nt:ident :=
 								                // Identifier if an element NT.
 								                $($_i:ident)?
 								                // Parenthesis for a sum NT,
 								                //   or possibly the span match for an element NT.
 								                // So: `:= QN_IDENT(span)` or `:= (A | B | C)`.
 								                $( ($($_p:tt)*) )?
 								                // Braces for an element NT body.
 								                $( {$($_b:tt)*} )?
 								            // Element and sum NT both conclude with a semicolon,
 								            //   which we need to disambiguate the next `$nt`.
 								            ;
 								        )*
 								    ) => {
 								        paste::paste! {
 								            /// Superstate representing the union of all related parsers.
 								            ///
 								            /// This [`ParseState`] allows sub-parsers to independently
 								            ///   the states associated with their own subgraph,
 								            ///     and then yield a state transition directly to a state of
 								            ///     another parser.
 								            /// This is conceptually like CPS (continuation passing style),
 								            ///   where this [`ParseState`] acts as a trampoline.
 								            ///
 								            /// This [`ParseState`] is required for use with [`Parser`];
 								            ///   see [`ClosedParseState`] for more information.
 								            #[derive(Debug, PartialEq, Eq)]
 								            $vis enum $super {
 								                $(
 								                    $nt($nt),
 								                )*
 								            }
 								            // Default parser is the first NT.
 								            impl Default for $super {
 								                fn default() -> Self {
 								                    use $super::*;
-												tamer: xir::parse::ele: Move repeat configuration out of Context

I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option.  But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.

This adds to a mess that needs cleaning up, but I'll do that after
everything is working.

DEV-7145

											
										
										
											2022-08-05 16:25:34 -04:00
+								                    ele_parse!(@!ntfirst $($nt)*)(
 								                        crate::xir::parse::EleParseCfg::default().into()
 								                    )
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								                }
 								            }
 								            $(
 								                impl From<$nt> for $super {
 								                    fn from(st: $nt) -> Self {
 								                        $super::$nt(st)
 								                    }
 								                }
 								            )*
 								            impl std::fmt::Display for $super {
 								                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
 								                    match self {
 								                        $(
 								                            Self::$nt(e) => std::fmt::Display::fmt(e, f),
 								                        )*
 								                    }
 								                }
 								            }
-												tamer: xir::parse::ele: Un-nest child parser errors

This will utilize the superstate's error object in place of nested errors,
which was the result of the previous composition-based delegation.

As you can see, all we had to do was remove the special handling of these
errors; the existing delegation setup continues to handle the types properly
with no change.  The composition continues to work for `*Attr_`.

The alternative was to box inner errors, since they're far from the hot code
path, but that's clearly unnecessary.

To be clear: this is necessary to allow for recursive grammars in
`ele_parse` without creating recursive data structures in Rust.

DEV-7145

											
										
										
											2022-08-10 10:23:20 -04:00
+								            /// Superstate error object representing the union of all
 								            ///   related parsers' errors.
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								            #[derive(Debug, PartialEq)]
 								            $vis enum [<$super Error_>] {
 								                $(
 								                    $nt([<$nt Error_>]),
 								                )*
 								            }
 								            $(
 								                impl From<[<$nt Error_>]> for [<$super Error_>] {
 								                    fn from(e: [<$nt Error_>]) -> Self {
 								                        [<$super Error_>]::$nt(e)
 								                    }
 								                }
 								            )*
 								            impl std::error::Error for [<$super Error_>] {
 								                fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
 								                    // TODO
 								                    None
 								                }
 								            }
 								            impl std::fmt::Display for [<$super Error_>] {
 								                fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
 								                    match self {
 								                        $(
 								                            Self::$nt(e) => std::fmt::Display::fmt(e, f),
 								                        )*
 								                    }
 								                }
 								            }
 								            impl crate::diagnose::Diagnostic for [<$super Error_>] {
 								                fn describe(&self) -> Vec<crate::diagnose::AnnotatedSpan> {
 								                    match self {
 								                        $(
 								                            Self::$nt(e) => e.describe(),
 								                        )*
 								                    }
 								                }
 								            }
 								            impl crate::parse::ParseState for $super {
 								                type Token = crate::xir::flat::XirfToken<
 								                    crate::xir::flat::RefinedText
 								                >;
 								                type Object = $objty;
 								                type Error = [<$super Error_>];
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                type Context = crate::xir::parse::StateStackContext<Self>;
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
 								                fn parse_token(
 								                    self,
 								                    tok: Self::Token,
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                    stack: &mut Self::Context,
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								                ) -> crate::parse::TransitionResult<Self> {
-												tamer: xir::parse::ele: Superstate not to accept early EOF

This was accepting an early EOF when the active child `ParseState` was in an
accepting state, because it was not ensuring that anything on the stack was
also accepting.

Ideally, there should be nothing on the stack, and hopefully in the future
that's what happens.  But with how things are today, it's important that, if
anything is on the stack, it is accepting.

Since `is_accepting` on the superstate is only called during finalization,
and because the check terminates early, and because the stack practically
speaking will only have a couple things on it max (unless we're in tail
position in a deeply nested tree, without TCO [yet]), this shouldn't be an
expensive check.

Implementing this did require that we expose `Context` to `is_accepting`,
which I had hoped to avoid having to do, but here we are.

DEV-7145

											
										
										
											2022-08-11 13:49:11 -04:00
+								                    use crate::{
 								                        parse::Transition,
 								                        xir::flat::{XirfToken, RefinedText},
 								                    };
-												tamer: xir::parse::ele: Hoist whitespace/comment handling to superstate

All child parsers do the same thing, so this simplifies things.

DEV-7145

											
										
										
											2022-08-11 13:00:07 -04:00
 								                    match (self, tok) {
 								                        // Depth check is unnecessary since _all_ xir::parse
 								                        //   parsers
 								                        //     (at least at the time of writing)
 								                        //     ignore whitespace and comments,
 								                        //       so may as well return early.
 								                        // TODO: I'm ignoring _all_ text for now to
 								                        //   proceed with development; fix.
 								                        (
 								                            st,
 								                            XirfToken::Text(RefinedText::Whitespace(..), _)
 								                            | XirfToken::Text(RefinedText::Unrefined(..), _) // XXX
 								                            | XirfToken::Comment(..)
 								                        ) => {
 								                            Transition(st).incomplete()
 								                        }
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								                        $(
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                            // Pass token directly to child until it reports
 								                            //   a dead state,
 								                            //     after which we return to the `ParseState`
 								                            //     atop of the stack.
-												tamer: xir::parse::ele: Hoist whitespace/comment handling to superstate

All child parsers do the same thing, so this simplifies things.

DEV-7145

											
										
										
											2022-08-11 13:00:07 -04:00
+								                            (Self::$nt(st), tok) => st.delegate_child(
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								                                tok,
-												tamer: xir::parse::ele: Transition trampoline

This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145

											
										
										
											2022-08-10 00:21:45 -04:00
+								                                stack,
-												tamer: xir::parse::ele: Correct handling of sum dead state post-recovery

Along with this change we also had to change how we handle dead states in
the superstate.  So there were two problems here:

  1. Sum states were not yielding a dead state after recovery, which meant
     that parsing was unable to continue (we still have a `todo!`); and
  2. The superstate considered it an error when there was nothing left on
     the stack, because I assumed that ought not happen.

Regarding #2---it _shouldn't_ happen, _unless_ we have extra input after we
have completed parsing.  Which happens to be the case for this test case,
but more importantly, we shouldn't be panicing with errors about TAMER bugs
if somebody puts extra input after a closing root tag in a source file.

DEV-7145

											
										
										
											2022-08-11 10:39:00 -04:00
+								                                |deadst, tok, stack| {
 								                                    stack.ret_or_dead(tok, deadst)
 								                                },
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								                            ),
 								                        )*
 								                    }
 								                }
-												tamer: xir::parse::ele: Superstate not to accept early EOF

This was accepting an early EOF when the active child `ParseState` was in an
accepting state, because it was not ensuring that anything on the stack was
also accepting.

Ideally, there should be nothing on the stack, and hopefully in the future
that's what happens.  But with how things are today, it's important that, if
anything is on the stack, it is accepting.

Since `is_accepting` on the superstate is only called during finalization,
and because the check terminates early, and because the stack practically
speaking will only have a couple things on it max (unless we're in tail
position in a deeply nested tree, without TCO [yet]), this shouldn't be an
expensive check.

Implementing this did require that we expose `Context` to `is_accepting`,
which I had hoped to avoid having to do, but here we are.

DEV-7145

											
										
										
											2022-08-11 13:49:11 -04:00
+								                fn is_accepting(&self, stack: &Self::Context) -> bool {
 								                    // This is short-circuiting,
 								                    //   starting at the _bottom_ of the stack and
 								                    //   moving upward.
 								                    // The idea is that,
 								                    //   is we're still in the middle of parsing,
 								                    //   then it's almost certain that the [`ParseState`] on
 								                    //     the bottom of the stack will not be in an
 								                    //     accepting state,
 								                    //       and so we can stop checking early.
 								                    // In most cases,
 								                    //   if we haven't hit EOF early,
 								                    //   the stack should be either empty or consist of only
 								                    //     the root state.
 								                    //
 								                    // After having considered the stack,
 								                    //   we can then consider the active `ParseState`.
 								                    stack.all(|st| st.is_inner_accepting(stack))
 								                        && self.is_inner_accepting(stack)
 								                }
 								            }
 								            impl $super {
 								                /// Whether the inner (active child) [`ParseState`] is in an
 								                ///   accepting state.
 								                fn is_inner_accepting(
 								                    &self,
 								                    ctx: &<Self as crate::parse::ParseState>::Context
 								                ) -> bool {
 								                    use crate::parse::ParseState;
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								                    match self {
 								                        $(
-												tamer: xir::parse::ele: Superstate not to accept early EOF

This was accepting an early EOF when the active child `ParseState` was in an
accepting state, because it was not ensuring that anything on the stack was
also accepting.

Ideally, there should be nothing on the stack, and hopefully in the future
that's what happens.  But with how things are today, it's important that, if
anything is on the stack, it is accepting.

Since `is_accepting` on the superstate is only called during finalization,
and because the check terminates early, and because the stack practically
speaking will only have a couple things on it max (unless we're in tail
position in a deeply nested tree, without TCO [yet]), this shouldn't be an
expensive check.

Implementing this did require that we expose `Context` to `is_accepting`,
which I had hoped to avoid having to do, but here we are.

DEV-7145

											
										
										
											2022-08-11 13:49:11 -04:00
+								                            Self::$nt(st) => st.is_accepting(ctx),
-												tamer: xir::parse::ele: Generate superstate

And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved.  But, it's not too terrible.

This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.

_This does not yet use trampolining in place of the call stack._  That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing.  This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation.  That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline.  This will require an explicit stack
via `Context`, like XIRF.  And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.

The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants).  Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.

Once this is sorted out, this should be the last big thing for the
parser.  This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.

DEV-7145

											
										
										
											2022-08-04 10:03:07 -04:00
+								                        )*
 								                    }
 								                }
 								            }
 								        }
 								    };
 								    (@!ntfirst $ntfirst:ident $($nt:ident)*) => {
 								        $ntfirst
 								    }
-												tamer: xir::parse::ele: Initial element parser generator concept

This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which has been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145

											
										
										
											2022-07-13 13:55:32 -04:00
+								}
 								#[cfg(test)]
 								mod test;