tamer: xir::parse::ele: Properly handle previous state transitions

This includes when on the last state / expecting a close.

Previously, there were a couple major issues:

  1. After parsing an NT, we can't allow preemption because we must emit a
     dead state so that we can remove the NT from the stack, otherwise
     they'll never close (until the parent does) and that results in
     unbounded stack growth for a lot of siblings.  Therefore, we cannot
     preempt on `Text`, which causes the NT to receive it, emit a dead
     state, transition away from the NT, and not accept another NT of the
     same type after `Text`.

  2. When encountering an unknown element, the error message stated that a
     closing tag was expected rather than one of the elements accepted by the
     final NT.

For #1, this was solved by allowing the parent to transition back to the NT
if it would have been matched by the previous NT.  A future change may
therefore allow us to remove repetition handling entirely and allow the
parent to deal with it (maybe).

For #2, the trouble is with the parser generator macro---we don't have a
good way of knowing the last NT, and the last NT may not even exist if none
was provided.  This solution is a compromise, after having tried and failed
at many others; I desperately need to move on, and this results in the
correct behavior and doesn't sacrifice performance.  But it can be done
better in the future.

It's also worth noting for #2 that the behavior isn't _entirely_ desirable,
but in practice it is mostly correct.  Specifically, if we encounter an
unknown token, we're going to blow through all NTs until the last one, which
will be forced to handle it.  After that, we cannot return to a previous NT,
and so we've forefitted the ability to parse anything that came before it.

NIR's grammar is such that sequences are rare and, if present, there's
really only ever two NTs, and so this awkward behavior will rarely cause
practical issues.  With that said, it ought to be improved in the future,
but let's wait to see if other parts of the lowering pipeline provide more
appropriate places to handle some of these things (even though it really
ought to be handled at the grammar level).

But I'm well out of time to spend on this.  I have to move on.

DEV-7145
main
Mike Gerwitz 2022-08-24 13:55:18 -04:00
parent 2fcd0b35ae
commit 51728545f7
3 changed files with 223 additions and 30 deletions

View File

@ -26,6 +26,10 @@ use crate::fmt::{
/// Denote an XML attribute by prefixing the value with `@`.
pub type XmlAttr = Prefix<"@", Raw>;
/// [`XmlAttr`] formatted as teletypewriter
/// (for use in sentences).
pub type TtXmlAttr = Tt<XmlAttr>;
/// A list of XML attributes [`Tt`]-quoted.
pub type XmlAttrList = AndQualConjList<"attribute", "attributes", Tt<XmlAttr>>;

View File

@ -335,7 +335,7 @@ macro_rules! ele_parse {
@ ->
$(
($nt::$ntref, $ntref),
($nt::$ntref) ->
($nt::$ntref, $ntref) ->
)* ($nt::ExpectClose_, ()),
}
}
@ -407,7 +407,7 @@ macro_rules! ele_parse {
-> {
@ -> ($ntfirst:path, $ntfirst_st:ty),
$(
($ntprev:path) -> ($ntnext:path, $ntnext_st:ty),
($ntprev:path, $ntprev_st:ty) -> ($ntnext:path, $ntnext_st:ty),
)*
}
) => {
@ -609,6 +609,26 @@ macro_rules! ele_parse {
| Closed_(..) => false,
}
}
#[allow(dead_code)] // used only when there are child NTs
/// Whether the current state represents the last child NT.
fn is_last_nt(&self) -> bool {
// This results in `Self::$ntref(..) => true,` for the
// _last_ NT,
// and `=> false` for all others.
// If there are no NTs,
// it results in `Self::Attrs(..) => true,`,
// which is technically true but will never be
// called in that context.
match self {
Self::Attrs_(..) => $(
false,
Self::$ntref(..) =>
)* true,
_ => false,
}
}
}
impl std::fmt::Display for $nt {
@ -836,8 +856,11 @@ macro_rules! ele_parse {
)).incomplete()
},
// We only attempt recovery when encountering an
// unknown token if we're forced to accept that
// token.
(
Expecting_ | NonPreemptableExpecting_,
NonPreemptableExpecting_,
XirfToken::Open(qname, span, depth)
) => {
Transition(RecoverEleIgnore_(qname, span, depth)).err(
@ -939,20 +962,93 @@ macro_rules! ele_parse {
},
$(
// We're transitioning from `(ntprev) -> (ntnext)`.
// If we have a token that matches `ntprev`,
// we can transition _back_ to that state
// rather than transitioning forward.
// We can _only_ do this when we know we are
// transitioning away from this state,
// otherwise we could return to a previous state,
// which violates the semantics of the
// implied DFA.
(
$ntprev(meta),
XirfToken::Open(qname, span, depth)
) if $ntprev_st::matches(qname) => {
let tok = XirfToken::Open(qname, span, depth);
ele_parse!(@!ntref_delegate
stack,
$ntprev(meta),
$ntprev_st,
// This NT said it could process this token,
// so force it to either do so or error,
// to ensure that bugs don't cause
// infinite processing of lookahead.
Transition(<$ntprev_st>::NonPreemptableExpecting_)
.incomplete()
.with_lookahead(tok),
Transition($ntprev(meta)).incomplete().with_lookahead(tok)
)
},
($ntprev(meta), tok) => {
ele_parse!(@!ntref_delegate
stack,
$ntnext(meta),
$ntnext_st,
// Since we're just transitioning,
// this _must_ accept the token of input,
// otherwise error.
Transition(<$ntnext_st>::default())
.incomplete()
.with_lookahead(tok),
Transition($ntnext(meta)).incomplete().with_lookahead(tok)
)
},
// Since `ExpectClose_` does not have an
// `$ntprev` match,
// we have to handle transitioning back to
// the previous state as a special case.
// Further,
// we choose to transition back to this state
// _no matter what the element_,
// to force error recovery and diagnostics
// in that context,
// which will tell the user what elements
// were expected in the last NT rather
// than just telling them a closing tag
// was expected.
//
// To avoid a bunch of rework of this macro
// (which can hopefully be done in the future),
// this match is output for _every_ NT,
// but takes effect only for the final NT
// because of the `is_last_nt` predicate.
// _It is important that it only affect the
// final NT_,
// otherwise we'll transition back to _any_
// previous state at the close,
// which completely defeats the purpose of
// having ordered states.
(
ExpectClose_(meta),
XirfToken::Open(qname, span, depth)
) if $ntprev(meta).is_last_nt() => {
let tok = XirfToken::Open(qname, span, depth);
ele_parse!(@!ntref_delegate
stack,
$ntprev(meta),
$ntprev_st,
// If this NT cannot handle this element,
// it should error and enter recovery
// to ignore it.
Transition(<$ntprev_st>::NonPreemptableExpecting_)
.incomplete()
.with_lookahead(tok),
Transition(ExpectClose_(meta))
.incomplete()
.with_lookahead(tok)
)
},
)*
// XIRF ensures proper nesting,

View File

@ -37,7 +37,7 @@ use std::{assert_matches::assert_matches, error::Error, fmt::Display};
use crate::{
convert::ExpectInto,
diagnose::Diagnostic,
parse::{Object, ParseError, ParseState, Parsed},
parse::{Object, ParseError, ParseState, Parsed, ParsedResult},
span::{dummy::*, Span},
sym::SymbolId,
xir::{
@ -898,11 +898,15 @@ fn child_error_and_recovery() {
impl crate::parse::Object for Foo {}
const QN_ROOT: QName = QN_PACKAGE;
const QN_A: QName = QN_CLASSIFY;
const QN_B: QName = QN_EXPORT;
ele_parse! {
enum Sut;
type Object = Foo;
Root := QN_PACKAGE {
Root := QN_ROOT {
@ {} => Foo::RootOpen,
// Must be emitted if `RootOpen` is to maintain balance.
@ -917,11 +921,11 @@ fn child_error_and_recovery() {
ChildB,
};
ChildA := QN_CLASSIFY {
ChildA := QN_A {
@ {} => Foo::ChildABad,
};
ChildB := QN_EXPORT {
ChildB := QN_B {
@ {} => Foo::ChildB,
};
}
@ -931,7 +935,7 @@ fn child_error_and_recovery() {
let toks = vec![
// The first token is the expected root.
XirfToken::Open(QN_PACKAGE, OpenSpan(S1, N), Depth(0)),
XirfToken::Open(QN_ROOT, OpenSpan(S1, N), Depth(0)),
// --> But this one is unexpected (name).
XirfToken::Open(unexpected, span, Depth(1)),
// And so we should ignore it up to this point.
@ -943,9 +947,9 @@ fn child_error_and_recovery() {
// for `ChildA`,
// which means that we expect `ChildB`.
// Parsing continues normally.
XirfToken::Open(QN_EXPORT, OpenSpan(S4, N), Depth(1)),
XirfToken::Open(QN_B, OpenSpan(S4, N), Depth(1)),
XirfToken::Close(None, CloseSpan::empty(S5), Depth(1)),
XirfToken::Close(Some(QN_PACKAGE), CloseSpan(S4, N), Depth(0)),
XirfToken::Close(Some(QN_ROOT), CloseSpan(S4, N), Depth(0)),
];
let mut sut = Sut::parse(toks.into_iter());
@ -967,20 +971,25 @@ fn child_error_and_recovery() {
);
// The token of lookahead (`Open`) is unexpected for `ChildA`,
// which must throw an error and enter a recovery state.
// when then skips to `ChildB`,
// which is _also_ not expecting it and must throw an error and enter
// a recovery state.
// The token should be consumed and returned in the error,
// _not_ produced as a token of lookahead,
// since we do not want to reprocess bad input.
let err = sut.next().unwrap().unwrap_err();
assert_eq!(
// TODO: This references generated identifiers.
ParseError::StateError(SutError_::ChildA(ChildAError_::UnexpectedEle(
unexpected,
span.name_span()
))),
err,
// TODO: This references generated identifiers.
ParseError::StateError(SutError_::ChildB(ChildBError_::UnexpectedEle(
unexpected,
span.name_span(),
))),
);
// TODO: Can't deal with this until we know exactly what error we'll
// have above;
// see above TODO.
// Diagnostic message should be delegated to the child.
assert_eq!(err.describe()[0].span(), span.name_span());
@ -1574,6 +1583,93 @@ fn child_repetition() {
);
}
// Once we transition `(S) -> (S')`,
// we should not be able to transition back under any circumstance.
#[test]
fn child_nt_sequence_no_prev_after_next() {
#[derive(Debug, PartialEq, Eq)]
enum Foo {
Open(QName),
Close(QName),
}
impl crate::parse::Object for Foo {}
const QN_ROOT: QName = QN_PACKAGE;
const QN_A: QName = QN_DIM;
const QN_B: QName = QN_CLASSIFY;
ele_parse! {
enum Sut;
type Object = Foo;
Root := QN_ROOT {
@ {} => Foo::Open(QN_ROOT),
/ => Foo::Close(QN_ROOT),
A,
B,
};
A := QN_A {
@ {} => Foo::Open(QN_A),
/ => Foo::Close(QN_A),
};
B := QN_B {
@ {} => Foo::Open(QN_B),
/ => Foo::Close(QN_B),
};
}
let toks = vec![
XirfToken::Open(QN_ROOT, OpenSpan(S1, N), Depth(0)),
// A
XirfToken::Open(QN_A, OpenSpan(S2, N), Depth(1)),
XirfToken::Close(None, CloseSpan::empty(S2), Depth(1)),
// A -> A OK
XirfToken::Open(QN_A, OpenSpan(S3, N), Depth(1)),
XirfToken::Close(None, CloseSpan::empty(S3), Depth(1)),
// A -> B
XirfToken::Open(QN_B, OpenSpan(S4, N), Depth(1)),
XirfToken::Close(None, CloseSpan::empty(S4), Depth(1)),
// B -> B OK
XirfToken::Open(QN_B, OpenSpan(S4, N), Depth(1)),
XirfToken::Close(None, CloseSpan::empty(S4), Depth(1)),
// B -> A _not_ OK.
XirfToken::Open(QN_A, OpenSpan(S6, N), Depth(1)),
XirfToken::Close(None, CloseSpan::empty(S6), Depth(1)),
XirfToken::Close(Some(QN_ROOT), CloseSpan(S8, N), Depth(0)),
];
use Parsed::*;
assert_eq!(
vec![
Ok(Incomplete), // [Root] Root Open
Ok(Object(Foo::Open(QN_ROOT))), // [Root@] A Open (>LA)
Ok(Incomplete), // [A] A Open (<LA)
Ok(Object(Foo::Open(QN_A))), // [A@] A Close (>LA)
Ok(Object(Foo::Close(QN_A))), // [A] A Close (<LA)
Ok(Incomplete), // [A] A Open (<LA)
Ok(Object(Foo::Open(QN_A))), // [A@] A Close (>LA)
Ok(Object(Foo::Close(QN_A))), // [A] A Close (<LA)
Ok(Incomplete), // [B] B Open (<LA)
Ok(Object(Foo::Open(QN_B))), // [B@] B Close (>LA)
Ok(Object(Foo::Close(QN_B))), // [B] B Close (<LA)
Ok(Incomplete), // [B] B Open (<LA)
Ok(Object(Foo::Open(QN_B))), // [B@] B Close (>LA)
Ok(Object(Foo::Close(QN_B))), // [B] B Close (<LA)
Err(ParseError::StateError(SutError_::B(
BError_::UnexpectedEle(QN_A, OpenSpan(S6, N).name_span())
))), // [B!] A Open
Ok(Incomplete), // [B!] A Close
Ok(Object(Foo::Close(QN_ROOT))), // [Root] Root Close
],
Sut::parse(toks.into_iter()).collect::<Vec<ParsedResult<Sut>>>(),
);
}
#[test]
fn child_repetition_invalid_tok_dead() {
#[derive(Debug, PartialEq, Eq)]
@ -1613,7 +1709,7 @@ fn child_repetition_invalid_tok_dead() {
// Child (success)
XirfToken::Open(QN_CHILD, OpenSpan(S2, N), Depth(1)),
XirfToken::Close(None, CloseSpan::empty(S3), Depth(1)),
// Repeat (unexpected)
// unexpected
XirfToken::Open(unexpected, OpenSpan(S2, N), Depth(1)),
XirfToken::Close(None, CloseSpan::empty(S3), Depth(1)),
XirfToken::Close(Some(QN_ROOT), CloseSpan(S8, N), Depth(0)),
@ -1640,18 +1736,15 @@ fn child_repetition_invalid_tok_dead() {
// Given that dead state and token of lookahead,
// `Parser` will immediately recurse to re-process the erroneous
// `Open`.
// Since the next token expected after the `Child` NT is `Close`,
// this will result in an error and trigger recovery _on `Root`_,
// which will ignore the erroneous `Open`.
// The next state after the `Child` NT is expecting a `Close`,
// but upon encountering a `Open` it forces the last NT to perform the
// processing,
// and so the error will occur on `Child`.
assert_eq!(
next(),
// TODO: This references generated identifiers.
Some(Err(ParseError::StateError(SutError_::Root(
RootError_::CloseExpected(
QN_ROOT,
OpenSpan(S1, N),
XirfToken::Open(unexpected, OpenSpan(S2, N), Depth(1)),
)
Some(Err(ParseError::StateError(SutError_::Child(
// TODO: This references generated identifiers.
ChildError_::UnexpectedEle(unexpected, OpenSpan(S2, N).name_span())
)))),
);