tamer: xir::parse::ele: Properly handle previous state transitions

This includes when on the last state / expecting a close. Previously, there were a couple major issues: 1. After parsing an NT, we can't allow preemption because we must emit a dead state so that we can remove the NT from the stack, otherwise they'll never close (until the parent does) and that results in unbounded stack growth for a lot of siblings. Therefore, we cannot preempt on `Text`, which causes the NT to receive it, emit a dead state, transition away from the NT, and not accept another NT of the same type after `Text`. 2. When encountering an unknown element, the error message stated that a closing tag was expected rather than one of the elements accepted by the final NT. For #1, this was solved by allowing the parent to transition back to the NT if it would have been matched by the previous NT. A future change may therefore allow us to remove repetition handling entirely and allow the parent to deal with it (maybe). For #2, the trouble is with the parser generator macro---we don't have a good way of knowing the last NT, and the last NT may not even exist if none was provided. This solution is a compromise, after having tried and failed at many others; I desperately need to move on, and this results in the correct behavior and doesn't sacrifice performance. But it can be done better in the future. It's also worth noting for #2 that the behavior isn't _entirely_ desirable, but in practice it is mostly correct. Specifically, if we encounter an unknown token, we're going to blow through all NTs until the last one, which will be forced to handle it. After that, we cannot return to a previous NT, and so we've forefitted the ability to parse anything that came before it. NIR's grammar is such that sequences are rare and, if present, there's really only ever two NTs, and so this awkward behavior will rarely cause practical issues. With that said, it ought to be improved in the future, but let's wait to see if other parts of the lowering pipeline provide more appropriate places to handle some of these things (even though it really ought to be handled at the grammar level). But I'm well out of time to spend on this. I have to move on. DEV-7145
2022-08-24 13:55:18 -04:00 · 2022-08-24 13:55:18 -04:00 · 51728545f7
parent 2fcd0b35ae
commit 51728545f7
3 changed files with 223 additions and 30 deletions
--- a/tamer/src/xir/fmt.rs
+++ b/tamer/src/xir/fmt.rs
@ -26,6 +26,10 @@ use crate::fmt::{
 /// Denote an XML attribute by prefixing the value with `@`.
 pub type XmlAttr = Prefix<"@", Raw>;

+/// [`XmlAttr`] formatted as teletypewriter
+///   (for use in sentences).
+pub type TtXmlAttr = Tt<XmlAttr>;
+
 /// A list of XML attributes [`Tt`]-quoted.
 pub type XmlAttrList = AndQualConjList<"attribute", "attributes", Tt<XmlAttr>>;

--- a/tamer/src/xir/parse/ele.rs
+++ b/tamer/src/xir/parse/ele.rs
@ -335,7 +335,7 @@ macro_rules! ele_parse {
                @ ->
                $(
                    ($nt::$ntref, $ntref),
-                    ($nt::$ntref) ->
+                    ($nt::$ntref, $ntref) ->
                )* ($nt::ExpectClose_, ()),
            }
        }
@ -407,7 +407,7 @@ macro_rules! ele_parse {
        -> {
            @ -> ($ntfirst:path, $ntfirst_st:ty),
            $(
-                ($ntprev:path) -> ($ntnext:path, $ntnext_st:ty),
+                ($ntprev:path, $ntprev_st:ty) -> ($ntnext:path, $ntnext_st:ty),
            )*
        }
    ) => {
@ -609,6 +609,26 @@ macro_rules! ele_parse {
                        | Closed_(..) => false,
                    }
                }
+
+                #[allow(dead_code)] // used only when there are child NTs
+                /// Whether the current state represents the last child NT.
+                fn is_last_nt(&self) -> bool {
+                    // This results in `Self::$ntref(..) => true,` for the
+                    //   _last_ NT,
+                    //     and `=> false` for all others.
+                    // If there are no NTs,
+                    //   it results in `Self::Attrs(..) => true,`,
+                    //     which is technically true but will never be
+                    //     called in that context.
+                    match self {
+                        Self::Attrs_(..) => $(
+                            false,
+                            Self::$ntref(..) =>
+                        )* true,
+
+                        _ => false,
+                    }
+                }
            }

            impl std::fmt::Display for $nt {
@ -836,8 +856,11 @@ macro_rules! ele_parse {
                            )).incomplete()
                        },

+                        // We only attempt recovery when encountering an
+                        //   unknown token if we're forced to accept that
+                        //   token.
                        (
-                            Expecting_ | NonPreemptableExpecting_,
+                            NonPreemptableExpecting_,
                            XirfToken::Open(qname, span, depth)
                        ) => {
                            Transition(RecoverEleIgnore_(qname, span, depth)).err(
@ -939,20 +962,93 @@ macro_rules! ele_parse {
                        },

                        $(
+                            // We're transitioning from `(ntprev) -> (ntnext)`.
+                            // If we have a token that matches `ntprev`,
+                            //   we can transition _back_ to that state
+                            //   rather than transitioning forward.
+                            // We can _only_ do this when we know we are
+                            //   transitioning away from this state,
+                            //     otherwise we could return to a previous state,
+                            //     which violates the semantics of the
+                            //     implied DFA.
+                            (
+                                $ntprev(meta),
+                                XirfToken::Open(qname, span, depth)
+                            ) if $ntprev_st::matches(qname) => {
+                                let tok = XirfToken::Open(qname, span, depth);
+
+                                ele_parse!(@!ntref_delegate
+                                    stack,
+                                    $ntprev(meta),
+                                    $ntprev_st,
+                                    // This NT said it could process this token,
+                                    //   so force it to either do so or error,
+                                    //   to ensure that bugs don't cause
+                                    //   infinite processing of lookahead.
+                                    Transition(<$ntprev_st>::NonPreemptableExpecting_)
+                                        .incomplete()
+                                        .with_lookahead(tok),
+                                    Transition($ntprev(meta)).incomplete().with_lookahead(tok)
+                                )
+                            },
+
                            ($ntprev(meta), tok) => {
                                ele_parse!(@!ntref_delegate
                                    stack,
                                    $ntnext(meta),
                                    $ntnext_st,
-                                    // Since we're just transitioning,
-                                    //   this _must_ accept the token of input,
-                                    //   otherwise error.
                                    Transition(<$ntnext_st>::default())
                                        .incomplete()
                                        .with_lookahead(tok),
                                    Transition($ntnext(meta)).incomplete().with_lookahead(tok)
                                )
                            },
+
+                            // Since `ExpectClose_` does not have an
+                            //   `$ntprev` match,
+                            //     we have to handle transitioning back to
+                            //     the previous state as a special case.
+                            // Further,
+                            //   we choose to transition back to this state
+                            //   _no matter what the element_,
+                            //     to force error recovery and diagnostics
+                            //     in that context,
+                            //       which will tell the user what elements
+                            //       were expected in the last NT rather
+                            //       than just telling them a closing tag
+                            //       was expected.
+                            //
+                            // To avoid a bunch of rework of this macro
+                            //   (which can hopefully be done in the future),
+                            //   this match is output for _every_ NT,
+                            //     but takes effect only for the final NT
+                            //     because of the `is_last_nt` predicate.
+                            // _It is important that it only affect the
+                            //   final NT_,
+                            //     otherwise we'll transition back to _any_
+                            //     previous state at the close,
+                            //       which completely defeats the purpose of
+                            //       having ordered states.
+                            (
+                                ExpectClose_(meta),
+                                XirfToken::Open(qname, span, depth)
+                            ) if $ntprev(meta).is_last_nt() => {
+                                let tok = XirfToken::Open(qname, span, depth);
+                                ele_parse!(@!ntref_delegate
+                                    stack,
+                                    $ntprev(meta),
+                                    $ntprev_st,
+                                    // If this NT cannot handle this element,
+                                    //   it should error and enter recovery
+                                    //   to ignore it.
+                                    Transition(<$ntprev_st>::NonPreemptableExpecting_)
+                                        .incomplete()
+                                        .with_lookahead(tok),
+                                    Transition(ExpectClose_(meta))
+                                        .incomplete()
+                                        .with_lookahead(tok)
+                                )
+                            },
                        )*

                        // XIRF ensures proper nesting,
--- a/tamer/src/xir/parse/ele/test.rs
+++ b/tamer/src/xir/parse/ele/test.rs
@ -37,7 +37,7 @@ use std::{assert_matches::assert_matches, error::Error, fmt::Display};
 use crate::{
    convert::ExpectInto,
    diagnose::Diagnostic,
-    parse::{Object, ParseError, ParseState, Parsed},
+    parse::{Object, ParseError, ParseState, Parsed, ParsedResult},
    span::{dummy::*, Span},
    sym::SymbolId,
    xir::{
@ -898,11 +898,15 @@ fn child_error_and_recovery() {

    impl crate::parse::Object for Foo {}

+    const QN_ROOT: QName = QN_PACKAGE;
+    const QN_A: QName = QN_CLASSIFY;
+    const QN_B: QName = QN_EXPORT;
+
    ele_parse! {
        enum Sut;
        type Object = Foo;

-        Root := QN_PACKAGE {
+        Root := QN_ROOT {
            @ {} => Foo::RootOpen,

            // Must be emitted if `RootOpen` is to maintain balance.
@ -917,11 +921,11 @@ fn child_error_and_recovery() {
            ChildB,
        };

-        ChildA := QN_CLASSIFY {
+        ChildA := QN_A {
            @ {} => Foo::ChildABad,
        };

-        ChildB := QN_EXPORT {
+        ChildB := QN_B {
            @ {} => Foo::ChildB,
        };
    }
@ -931,7 +935,7 @@ fn child_error_and_recovery() {

    let toks = vec![
        // The first token is the expected root.
-        XirfToken::Open(QN_PACKAGE, OpenSpan(S1, N), Depth(0)),
+        XirfToken::Open(QN_ROOT, OpenSpan(S1, N), Depth(0)),
        // --> But this one is unexpected (name).
        XirfToken::Open(unexpected, span, Depth(1)),
        // And so we should ignore it up to this point.
@ -943,9 +947,9 @@ fn child_error_and_recovery() {
        //     for `ChildA`,
        //       which means that we expect `ChildB`.
        // Parsing continues normally.
-        XirfToken::Open(QN_EXPORT, OpenSpan(S4, N), Depth(1)),
+        XirfToken::Open(QN_B, OpenSpan(S4, N), Depth(1)),
        XirfToken::Close(None, CloseSpan::empty(S5), Depth(1)),
-        XirfToken::Close(Some(QN_PACKAGE), CloseSpan(S4, N), Depth(0)),
+        XirfToken::Close(Some(QN_ROOT), CloseSpan(S4, N), Depth(0)),
    ];

    let mut sut = Sut::parse(toks.into_iter());
@ -967,20 +971,25 @@ fn child_error_and_recovery() {
    );

    // The token of lookahead (`Open`) is unexpected for `ChildA`,
-    //   which must throw an error and enter a recovery state.
+    //   when then skips to `ChildB`,
+    //   which is _also_ not expecting it and must throw an error and enter
+    //   a recovery state.
    // The token should be consumed and returned in the error,
    //   _not_ produced as a token of lookahead,
    //   since we do not want to reprocess bad input.
    let err = sut.next().unwrap().unwrap_err();
    assert_eq!(
-        // TODO: This references generated identifiers.
-        ParseError::StateError(SutError_::ChildA(ChildAError_::UnexpectedEle(
-            unexpected,
-            span.name_span()
-        ))),
        err,
+        // TODO: This references generated identifiers.
+        ParseError::StateError(SutError_::ChildB(ChildBError_::UnexpectedEle(
+            unexpected,
+            span.name_span(),
+        ))),
    );

+    // TODO: Can't deal with this until we know exactly what error we'll
+    //   have above;
+    //     see above TODO.
    // Diagnostic message should be delegated to the child.
    assert_eq!(err.describe()[0].span(), span.name_span());

@ -1574,6 +1583,93 @@ fn child_repetition() {
    );
 }

+// Once we transition `(S) -> (S')`,
+//   we should not be able to transition back under any circumstance.
+#[test]
+fn child_nt_sequence_no_prev_after_next() {
+    #[derive(Debug, PartialEq, Eq)]
+    enum Foo {
+        Open(QName),
+        Close(QName),
+    }
+
+    impl crate::parse::Object for Foo {}
+
+    const QN_ROOT: QName = QN_PACKAGE;
+    const QN_A: QName = QN_DIM;
+    const QN_B: QName = QN_CLASSIFY;
+
+    ele_parse! {
+        enum Sut;
+        type Object = Foo;
+
+        Root := QN_ROOT {
+            @ {} => Foo::Open(QN_ROOT),
+            / => Foo::Close(QN_ROOT),
+
+            A,
+            B,
+        };
+
+        A := QN_A {
+            @ {} => Foo::Open(QN_A),
+            / => Foo::Close(QN_A),
+        };
+
+        B := QN_B {
+            @ {} => Foo::Open(QN_B),
+            / => Foo::Close(QN_B),
+        };
+    }
+
+    let toks = vec![
+        XirfToken::Open(QN_ROOT, OpenSpan(S1, N), Depth(0)),
+        // A
+        XirfToken::Open(QN_A, OpenSpan(S2, N), Depth(1)),
+        XirfToken::Close(None, CloseSpan::empty(S2), Depth(1)),
+        // A -> A OK
+        XirfToken::Open(QN_A, OpenSpan(S3, N), Depth(1)),
+        XirfToken::Close(None, CloseSpan::empty(S3), Depth(1)),
+        // A -> B
+        XirfToken::Open(QN_B, OpenSpan(S4, N), Depth(1)),
+        XirfToken::Close(None, CloseSpan::empty(S4), Depth(1)),
+        // B -> B OK
+        XirfToken::Open(QN_B, OpenSpan(S4, N), Depth(1)),
+        XirfToken::Close(None, CloseSpan::empty(S4), Depth(1)),
+        // B -> A _not_ OK.
+        XirfToken::Open(QN_A, OpenSpan(S6, N), Depth(1)),
+        XirfToken::Close(None, CloseSpan::empty(S6), Depth(1)),
+        XirfToken::Close(Some(QN_ROOT), CloseSpan(S8, N), Depth(0)),
+    ];
+
+    use Parsed::*;
+
+    assert_eq!(
+        vec![
+            Ok(Incomplete),                 // [Root]  Root Open
+            Ok(Object(Foo::Open(QN_ROOT))), // [Root@] A Open (>LA)
+            Ok(Incomplete),                 // [A]     A Open (<LA)
+            Ok(Object(Foo::Open(QN_A))),    // [A@]    A Close (>LA)
+            Ok(Object(Foo::Close(QN_A))),   // [A]     A Close (<LA)
+            Ok(Incomplete),                 // [A]     A Open (<LA)
+            Ok(Object(Foo::Open(QN_A))),    // [A@]    A Close (>LA)
+            Ok(Object(Foo::Close(QN_A))),   // [A]     A Close (<LA)
+            Ok(Incomplete),                 // [B]     B Open (<LA)
+            Ok(Object(Foo::Open(QN_B))),    // [B@]    B Close (>LA)
+            Ok(Object(Foo::Close(QN_B))),   // [B]     B Close (<LA)
+            Ok(Incomplete),                 // [B]     B Open (<LA)
+            Ok(Object(Foo::Open(QN_B))),    // [B@]    B Close (>LA)
+            Ok(Object(Foo::Close(QN_B))),   // [B]     B Close (<LA)
+            Err(ParseError::StateError(SutError_::B(
+                BError_::UnexpectedEle(QN_A, OpenSpan(S6, N).name_span())
+            ))), // [B!] A Open
+            Ok(Incomplete),                 // [B!] A Close
+            Ok(Object(Foo::Close(QN_ROOT))), // [Root] Root Close
+        ],
+        Sut::parse(toks.into_iter()).collect::<Vec<ParsedResult<Sut>>>(),
+    );
+}
+
 #[test]
 fn child_repetition_invalid_tok_dead() {
    #[derive(Debug, PartialEq, Eq)]
@ -1613,7 +1709,7 @@ fn child_repetition_invalid_tok_dead() {
        // Child (success)
        XirfToken::Open(QN_CHILD, OpenSpan(S2, N), Depth(1)),
        XirfToken::Close(None, CloseSpan::empty(S3), Depth(1)),
-        // Repeat (unexpected)
+        // unexpected
        XirfToken::Open(unexpected, OpenSpan(S2, N), Depth(1)),
        XirfToken::Close(None, CloseSpan::empty(S3), Depth(1)),
        XirfToken::Close(Some(QN_ROOT), CloseSpan(S8, N), Depth(0)),
@ -1640,18 +1736,15 @@ fn child_repetition_invalid_tok_dead() {
    // Given that dead state and token of lookahead,
    //   `Parser` will immediately recurse to re-process the erroneous
    //   `Open`.
-    // Since the next token expected after the `Child` NT is `Close`,
-    //   this will result in an error and trigger recovery _on `Root`_,
-    //     which will ignore the erroneous `Open`.
+    // The next state after the `Child` NT is expecting a `Close`,
+    //   but upon encountering a `Open` it forces the last NT to perform the
+    //   processing,
+    //     and so the error will occur on `Child`.
    assert_eq!(
        next(),
-        // TODO: This references generated identifiers.
-        Some(Err(ParseError::StateError(SutError_::Root(
-            RootError_::CloseExpected(
-                QN_ROOT,
-                OpenSpan(S1, N),
-                XirfToken::Open(unexpected, OpenSpan(S2, N), Depth(1)),
-            )
+        Some(Err(ParseError::StateError(SutError_::Child(
+            // TODO: This references generated identifiers.
+            ChildError_::UnexpectedEle(unexpected, OpenSpan(S2, N).name_span())
        )))),
    );