tamer: Replace ParseStatus::Dead with generic lookahead

Oh what a tortured journey.  I had originally tried to avoid formalizing
lookahead for all parsers by pretending that it was only needed for dead
state transitions (that is---states that have no transitions for a given
input token), but then I needed to yield information for aggregation.  So I
added the ability to override the token for `Dead` to yield that, in
addition to the token.  But then I also needed to yield lookahead for error
conditions.  It was a mess that didn't make sense.

This eliminates `ParseStatus::Dead` entirely and fully integrates the
lookahead token in `Parser` that was previously implemented.

Notably, the lookahead token is encapsulated in `TransitionResult` and
unavailable to `ParseState` implementations, forcing them to rely on
`Parser` for recursion.  This not only prevents `ParseState` from recursing,
but also simplifies delegation by removing the need to manually handle
tokens of lookahead.
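
A minimal sketch of that encapsulation, using simplified stand-ins rather
than the real TAMER types.  Errors are omitted, the type parameters replace
`ParseState`, and the `lookahead` accessor is hypothetical: it stands in for
the access that only `Parser` has.

```rust
/// Token to be re-processed in place of the next input token.
/// The field is private, so only this module can unwrap it.
#[derive(Debug, PartialEq)]
pub struct Lookahead<T>(T);

/// Simplified transition data (assumption: `None` in the first field of
/// `Result` stands for an incomplete parse; errors are left out).
#[derive(Debug, PartialEq)]
pub enum TransitionData<T, O> {
    Result(Option<O>, Option<Lookahead<T>>),
    Dead(Lookahead<T>),
}

/// New state paired with the data describing the transition.
#[derive(Debug, PartialEq)]
pub struct TransitionResult<S, T, O>(S, TransitionData<T, O>);

impl<S, T: std::fmt::Debug, O> TransitionResult<S, T, O> {
    pub fn incomplete(state: S) -> Self {
        Self(state, TransitionData::Result(None, None))
    }

    pub fn dead(state: S, tok: T) -> Self {
        Self(state, TransitionData::Dead(Lookahead(tok)))
    }

    /// Attach a token of lookahead; attaching a second one is a bug.
    pub fn with_lookahead(self, tok: T) -> Self {
        match self {
            Self(st, TransitionData::Result(res, None)) => {
                Self(st, TransitionData::Result(res, Some(Lookahead(tok))))
            }
            Self(
                _,
                TransitionData::Result(_, Some(prev))
                | TransitionData::Dead(prev),
            ) => panic!("lookahead token overwrite: {prev:?}"),
        }
    }

    /// Hypothetical accessor standing in for what `Parser` alone may see.
    pub fn lookahead(&self) -> Option<&T> {
        match &self.1 {
            TransitionData::Result(_, Some(Lookahead(tok)))
            | TransitionData::Dead(Lookahead(tok)) => Some(tok),
            TransitionData::Result(_, None) => None,
        }
    }
}
```

A state implementation can attach a token via `with_lookahead` or `dead`,
but it has no way to take one back out and feed it to itself.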

The awkward case here is XIRT, which does not follow the streaming parsing
convention, because it was conceived before the parsing framework.  It needs
to go away, but doing so right now would be a lot of work, so it has to
stick around for a little bit longer until the new parser generators can be
used instead.  It is a persistent thorn in my side, going against the grain.

`Parser` will immediately recurse if it sees a token of lookahead with an
incomplete parse.  This is because stitched parsers will frequently yield a
dead state indication when they're done parsing, and there's no use in
propagating an `Incomplete` status down the entire lowering pipeline.  That
does mean that the toplevel is no longer the only thing recursing, _but_ the
behavior doesn't really change: a misbehaving parser would otherwise recurse
infinitely down the entire lowering stack (though there'd be an opportunity
to detect that).  This should never happen with a correct parser, but it's not
worth the effort right now to try to force such a thing with Rust's type
system.  Something like TLA+ is better suited here as an aid, but it
shouldn't be necessary with clear implementations and proper test
cases.  Parser generators will also ensure such a thing cannot occur.
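
A rough sketch of that immediate retry, with a toy state machine standing in
for the real `Parser` and `ParseState`.  All names here (`State`, `Step`,
`SkipSpace`, `Body`) are made up for illustration, and the token type is
assumed to be `char`.

```rust
#[derive(Debug, PartialEq)]
pub enum Parsed {
    Incomplete,
    Object(char),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    /// Hypothetical child parser: consumes spaces, dead on anything else.
    SkipSpace,
    /// Parent state entered once the child is done.
    Body,
}

/// Outcome of one transition: next state, result, optional lookahead.
pub struct Step(State, Parsed, Option<char>);

pub fn parse_token(state: State, tok: char) -> Step {
    match (state, tok) {
        (State::SkipSpace, ' ') => {
            Step(State::SkipSpace, Parsed::Incomplete, None)
        }
        // The child is dead: leave it and hand the token back as lookahead.
        // Re-consuming it in `Body` makes progress, so recursion is bounded.
        (State::SkipSpace, tok) => {
            Step(State::Body, Parsed::Incomplete, Some(tok))
        }
        (State::Body, tok) => Step(State::Body, Parsed::Object(tok), None),
    }
}

pub struct Parser {
    state: State,
    lookahead: Option<char>,
}

impl Parser {
    pub fn new() -> Self {
        Parser { state: State::SkipSpace, lookahead: None }
    }

    pub fn feed_tok(&mut self, tok: char) -> Parsed {
        let Step(next, parsed, lookahead) = parse_token(self.state, tok);
        self.state = next;

        match (parsed, lookahead) {
            // Incomplete plus lookahead: retry at once rather than
            // reporting `Incomplete` to the caller.
            (Parsed::Incomplete, Some(tok)) => self.feed_tok(tok),
            // Otherwise store any lookahead for the next read.
            (parsed, lookahead) => {
                self.lookahead = lookahead;
                parsed
            }
        }
    }

    pub fn take_lookahead(&mut self) -> Option<char> {
        self.lookahead.take()
    }
}
```

Feeding `' '` and then `'a'` produces one `Incomplete` and then the object
for `'a'`; the caller never sees the `Incomplete` from the dead state.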

I had hoped to remove `ParseStatus` entirely in favor of `Parsed`, but there's a
lot of type inference that happens based on the fact that `ParseStatus` has
a `ParseState` type parameter; `Parsed` has only `Object`.  It is desirable
for a public-facing `Parsed` to not be tied to `ParseState`, since consumers
need not be concerned with such a heavy type; however, we _do_ want that
heavy type internally, as it carries a lot of useful information that allows
for significant and powerful type inference, which in turn creates
expressive and convenient APIs.
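
To make the difference between the two types concrete, here is a stripped-down
sketch.  The trait is reduced to its `Object` associated type and `Digits` is
a hypothetical parser state; the `From` impl mirrors the one in the diff
below.

```rust
/// Reduced stand-in for the real trait (assumption: only `Object` matters
/// for this illustration).
pub trait ParseState {
    type Object;
}

/// Internal status: tied to the full parser type `S`,
/// which is what gives inference something to work with.
pub enum ParseStatus<S: ParseState> {
    Incomplete,
    Object(S::Object),
}

/// Public-facing result: knows only about the object type.
#[derive(Debug, PartialEq)]
pub enum Parsed<O> {
    Incomplete,
    Object(O),
}

impl<S: ParseState> From<ParseStatus<S>> for Parsed<S::Object> {
    fn from(status: ParseStatus<S>) -> Self {
        match status {
            ParseStatus::Incomplete => Parsed::Incomplete,
            ParseStatus::Object(obj) => Parsed::Object(obj),
        }
    }
}

/// Hypothetical parser state used only for this illustration.
pub struct Digits;

impl ParseState for Digits {
    type Object = u32;
}
```

A consumer holding a `Parsed<u32>` needs to know nothing about `Digits`,
while `ParseStatus<Digits>` still names the parser it came from.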

DEV-7145
main
Mike Gerwitz 2022-07-11 23:49:57 -04:00
parent 61ce7d3fc7
commit bd783ac08b
12 changed files with 409 additions and 332 deletions

@ -136,12 +136,11 @@ impl Display for XmloToken {
}
/// A parser capable of being composed with [`XmloReader`].
pub trait XmloState =
ParseState<Token = Xirf, DeadToken = Xirf, Context = EmptyContext>
where
Self: Default,
<Self as ParseState>::Error: Into<XmloError>,
<Self as ParseState>::Object: Into<XmloToken>;
pub trait XmloState = ParseState<Token = Xirf, Context = EmptyContext>
where
Self: Default,
<Self as ParseState>::Error: Into<XmloError>,
<Self as ParseState>::Object: Into<XmloToken>;
#[derive(Debug, Default, PartialEq, Eq)]
pub enum XmloReader<
@ -227,9 +226,12 @@ impl<SS: XmloState, SD: XmloState, SF: XmloState> ParseState
// TODO: It'd be nice to augment errors with the symbol table
// span as well (e.g. "while processing symbol table at <loc>").
(Symtable(span, ss), tok) => {
ss.delegate(ctx, tok, |ss| Symtable(span, ss))
}
(Symtable(span, ss), tok) => ss.delegate(
tok,
ctx,
|ss| Transition(Symtable(span, ss)),
|| unreachable!(), // TODO: currently caught by preceding match
),
(SymDepsExpected, Xirf::Open(QN_SYM_DEPS, span, _)) => {
Transition(SymDeps(span.tag_span(), SD::default())).incomplete()
@ -241,9 +243,12 @@ impl<SS: XmloState, SD: XmloState, SF: XmloState> ParseState
Transition(FragmentsExpected).incomplete()
}
(SymDeps(span, sd), tok) => {
sd.delegate(ctx, tok, |sd| SymDeps(span, sd))
}
(SymDeps(span, sd), tok) => sd.delegate(
tok,
ctx,
|sd| Transition(SymDeps(span, sd)),
|| unreachable!(), // TODO: currently caught by preceding match
),
(FragmentsExpected, Xirf::Open(QN_FRAGMENTS, span, _)) => {
Transition(Fragments(span.tag_span(), SF::default()))
@ -257,9 +262,12 @@ impl<SS: XmloState, SD: XmloState, SF: XmloState> ParseState
Transition(Eoh).ok(XmloToken::Eoh(span.tag_span()))
}
(Fragments(span, sf), tok) => {
sf.delegate(ctx, tok, |sf| Fragments(span, sf))
}
(Fragments(span, sf), tok) => sf.delegate(
tok,
ctx,
|sf| Transition(Fragments(span, sf)),
|| unreachable!(), // TODO: currently caught by preceding match
),
(Eoh, Xirf::Close(Some(QN_PACKAGE), ..)) => {
Transition(Done).incomplete()

@ -31,8 +31,8 @@ pub use lower::{Lower, LowerIter, ParsedObject};
pub use parser::{Parsed, ParsedResult, Parser};
pub use state::{
context::{Context, Empty as EmptyContext, NoContext},
Aggregate, ParseResult, ParseState, ParseStatus, Transition,
TransitionResult, Transitionable,
ParseResult, ParseState, ParseStatus, Transition, TransitionResult,
Transitionable,
};
use crate::span::{Span, DUMMY_SPAN};

@ -65,10 +65,8 @@ pub enum ParseError<T: Token, E: Diagnostic + PartialEq> {
/// The parser reached an unhandled dead state.
///
/// Once a parser returns [`ParseStatus::Dead`],
/// a parent context must use that provided token as a lookahead.
/// If that does not occur,
/// [`Parser`] produces this error.
/// For more information,
/// see [`ParseState::delegate`] and [`Parser::feed_tok`].
///
/// The string is intended to describe what was expected to have been
/// available based on the current [`ParseState`].

@ -52,7 +52,7 @@ where
'b,
I,
Parsed<S::Object>,
ParseError<S::DeadToken, S::Error>,
ParseError<S::Token, S::Error>,
>,
}
@ -65,9 +65,7 @@ where
{
/// Consume inner parser and yield its context.
#[inline]
fn finalize(
self,
) -> Result<LS::Context, ParseError<LS::DeadToken, LS::Error>> {
fn finalize(self) -> Result<LS::Context, ParseError<LS::Token, LS::Error>> {
self.lower.finalize().map_err(|(_, e)| e)
}
}
@ -123,8 +121,8 @@ where
where
Self: Iterator<Item = ParsedResult<S>> + Sized,
<LS as ParseState>::Context: Default,
ParseError<S::DeadToken, S::Error>: Into<E>,
ParseError<LS::DeadToken, LS::Error>: Into<E>,
ParseError<S::Token, S::Error>: Into<E>,
ParseError<LS::Token, LS::Error>: Into<E>,
{
self.while_ok(|toks| {
// TODO: This parser is not accessible after error recovery!
@ -149,8 +147,8 @@ where
) -> Result<(U, LS::Context), E>
where
Self: Iterator<Item = ParsedResult<S>> + Sized,
ParseError<S::DeadToken, S::Error>: Into<E>,
ParseError<LS::DeadToken, LS::Error>: Into<E>,
ParseError<S::Token, S::Error>: Into<E>,
ParseError<LS::Token, LS::Error>: Into<E>,
{
self.while_ok(|toks| {
let lower = LS::parse_with_context(iter::empty(), ctx);

@ -23,7 +23,10 @@ use super::{
ParseError, ParseResult, ParseState, ParseStatus, TokenStream, Transition,
TransitionResult,
};
use crate::span::{Span, UNKNOWN_SPAN};
use crate::{
parse::state::{Lookahead, TransitionData},
span::{Span, UNKNOWN_SPAN},
};
#[cfg(doc)]
use super::Token;
@ -54,9 +57,6 @@ impl<S: ParseState> From<ParseStatus<S>> for Parsed<S::Object> {
match status {
ParseStatus::Incomplete => Parsed::Incomplete,
ParseStatus::Object(x) => Parsed::Object(x),
ParseStatus::Dead(_) => {
unreachable!("Dead status must be filtered by Parser")
}
}
}
}
@ -75,7 +75,7 @@ impl<S: ParseState> From<ParseStatus<S>> for Parsed<S::Object> {
/// if you have not consumed the entire iterator,
/// call [`finalize`](Parser::finalize) to ensure that parsing has
/// completed in an accepting state.
#[derive(Debug, PartialEq, Eq)]
#[derive(Debug, PartialEq)]
pub struct Parser<S: ParseState, I: TokenStream<S::Token>> {
/// Input token stream to be parsed by the [`ParseState`] `S`.
toks: I,
@ -85,7 +85,7 @@ pub struct Parser<S: ParseState, I: TokenStream<S::Token>> {
///
/// See [`take_lookahead_tok`](Parser::take_lookahead_tok) for more
/// information.
lookahead: Option<S::Token>,
lookahead: Option<Lookahead<S>>,
/// Parsing automaton.
///
@ -108,6 +108,8 @@ pub struct Parser<S: ParseState, I: TokenStream<S::Token>> {
/// [`ParseState::parse_token`] in [`Parser::feed_tok`],
/// so it is safe to call [`unwrap`](Option::unwrap) without
/// worrying about panics.
/// This is also why Dead states require transitions,
/// given that [`ParseState`] does not implement [`Default`].
///
/// For more information,
/// see the implementation of [`Parser::feed_tok`].
@ -166,7 +168,7 @@ impl<S: ParseState, I: TokenStream<S::Token>> Parser<S, I> {
/// is a decision made by the [`ParseState`].
pub fn finalize(
self,
) -> Result<S::Context, (Self, ParseError<S::DeadToken, S::Error>)> {
) -> Result<S::Context, (Self, ParseError<S::Token, S::Error>)> {
match self.assert_accepting() {
Ok(()) => Ok(self.ctx),
Err(err) => Err((self, err)),
@ -178,12 +180,10 @@ impl<S: ParseState, I: TokenStream<S::Token>> Parser<S, I> {
/// otherwise [`Err`] with [`ParseError::UnexpectedEof`].
///
/// See [`finalize`](Self::finalize) for the public-facing method.
fn assert_accepting(
&self,
) -> Result<(), ParseError<S::DeadToken, S::Error>> {
fn assert_accepting(&self) -> Result<(), ParseError<S::Token, S::Error>> {
let st = self.state.as_ref().unwrap();
if let Some(lookahead) = &self.lookahead {
if let Some(Lookahead(lookahead)) = &self.lookahead {
Err(ParseError::Lookahead(lookahead.span(), st.to_string()))
} else if st.is_accepting() {
Ok(())
@ -209,6 +209,26 @@ impl<S: ParseState, I: TokenStream<S::Token>> Parser<S, I> {
/// The only thing preventing this being public is formalization and a
/// commitment to maintain it.
///
/// Recursion Warning
/// -----------------
/// If a [`ParseState`] yields an incomplete parse along with a token of
/// lookahead,
/// this will immediately recurse with that token;
/// that situation is common with [`ParseState::delegate`].
/// This is intended as an optimization to save a wasteful
/// [`Parsed::Incomplete`] from being propagated down the entire
/// lowering pipeline,
/// but it could potentially result in unbounded recursion if a
/// misbehaving [`ParseState`] continuously yields the same token of
/// lookahead.
/// Such behavior would be incorrect,
/// but would otherwise result in recursion across the entire lowering
/// pipeline.
///
/// A [`ParseState`] should never yield a token of lookahead unless
/// consuming that same token will result in either a state transition
/// or a dead state.
///
/// Panics
/// ------
/// This uses a debug assertion to enforce the invariant
@ -241,23 +261,47 @@ impl<S: ParseState, I: TokenStream<S::Token>> Parser<S, I> {
//
// Note that this used to use `mem::take`,
// and the generated assembly was identical in both cases.
let TransitionResult(Transition(state), result, lookahead) =
//
// Note also that this is why Dead states require transitions.
let TransitionResult(Transition(state), data) =
self.state.take().unwrap().parse_token(tok, &mut self.ctx);
self.state.replace(state);
self.lookahead = lookahead;
use ParseStatus::*;
match result {
use ParseStatus::{Incomplete, Object};
match data {
// Nothing handled this dead state,
// and we cannot discard a lookahead token,
// so we have no choice but to produce an error.
Ok(Dead(invalid)) => Err(ParseError::UnexpectedToken(
invalid,
self.state.as_ref().unwrap().to_string(),
)),
TransitionData::Dead(Lookahead(invalid)) => {
Err(ParseError::UnexpectedToken(
invalid,
self.state.as_ref().unwrap().to_string(),
))
}
Ok(parsed @ (Incomplete | Object(..))) => Ok(parsed.into()),
Err(e) => Err(e.into()),
// If provided a token of lookahead and an incomplete parse,
// then just try again right away and avoid propagating this
// delay throughout the entire lowering pipeline.
// This is likely to happen on a dead state transition during
// parser delegation
// (see [`ParseState::delegate`]).
// This will only result in unbounded recursion if the parser
// keeps yielding the same token of lookahead,
// which represents an implementation flaw in the parser.
TransitionData::Result(
Ok(Incomplete),
Some(Lookahead(lookahead)),
) => self.feed_tok(lookahead),
TransitionData::Result(result, lookahead) => {
self.lookahead = lookahead;
match result {
Ok(parsed @ (Incomplete | Object(..))) => Ok(parsed.into()),
Err(e) => Err(e.into()),
}
}
}
}
@ -306,7 +350,7 @@ impl<S: ParseState, I: TokenStream<S::Token>> Parser<S, I> {
/// but proving correctness is better left to proof systems than
/// Rust's type system.
pub(super) fn take_lookahead_tok(&mut self) -> Option<S::Token> {
self.lookahead.take()
self.lookahead.take().map(|Lookahead(tok)| tok)
}
}

@ -23,10 +23,7 @@ mod transition;
use super::{Object, ParseError, Parser, Token, TokenStream};
use crate::diagnose::Diagnostic;
use std::{
fmt::{Debug, Display},
ops::ControlFlow,
};
use std::fmt::{Debug, Display};
pub use transition::*;
#[cfg(doc)]
@ -34,10 +31,8 @@ use context::{Context, NoContext};
/// Result of some non-parsing operation on a [`Parser`],
/// with any error having been wrapped in a [`ParseError`].
pub type ParseResult<S, T> = Result<
T,
ParseError<<S as ParseState>::DeadToken, <S as ParseState>::Error>,
>;
pub type ParseResult<S, T> =
Result<T, ParseError<<S as ParseState>::Token, <S as ParseState>::Error>>;
/// Result of a parsing operation.
#[derive(Debug, PartialEq, Eq)]
@ -50,32 +45,6 @@ pub enum ParseStatus<S: ParseState> {
/// This does not indicate that the parser is complete,
/// as more objects may be able to be emitted.
Object(S::Object),
/// Parser encountered a dead state relative to the given token.
///
/// A dead state is an accepting state that has no state transition for
/// the given token.
/// This could simply mean that the parser has completed its job and
/// that control must be returned to a parent context.
///
/// If a parser is _not_ in an accepting state,
/// then an error ought to occur rather than a dead state;
/// the difference between the two is that the token associated with
/// a dead state can be used as a lookahead token in order to
/// produce a state transition at a higher level,
/// whereas an error indicates that parsing has failed.
/// Intuitively,
/// this means that a [`ParseStatus::Object`] had just been emitted
/// and that the token following it isn't something that can be
/// parsed.
///
/// Certain parsers may aggregate data until reaching a dead state,
/// in which case [`Aggregate`] may be of use to yield both a
/// lookahead token and an aggregate [`ParseStatus::Object`].
///
/// If there is no parent context to handle the token,
/// [`Parser`] must yield an error.
Dead(S::DeadToken),
}
impl<S: ParseState<Object = T>, T: Object> From<T> for ParseStatus<S> {
@ -122,16 +91,6 @@ pub trait ParseState: PartialEq + Eq + Display + Debug + Sized {
/// otherwise-immutable [`ParseState`].
type Context: Debug = context::Empty;
/// Token returned when the parser cannot perform a state transition.
///
/// This is generally the type of the input token itself
/// (and so the same as [`ParseState::Token`]),
/// which can be used as a token of lookahead.
/// Parsers may change this type to provide additional data.
/// For more information and a practical use case of this,
/// see [`Aggregate`].
type DeadToken: Token = Self::Token;
/// Construct a parser with a [`Default`] state.
///
/// Whether this method is helpful or provides any clarity depends on
@ -199,6 +158,12 @@ pub trait ParseState: PartialEq + Eq + Display + Debug + Sized {
/// information that is not subject to Rust's move semantics.
/// If this is not necessary,
/// see [`NoContext`].
///
/// This method must not produce a token of [`Lookahead`] _unless_
/// consuming that token again will result in either a state
/// transition or a dead state indication.
/// Otherwise,
/// the system may recurse indefinitely.
fn parse_token(
self,
tok: Self::Token,
@ -222,90 +187,128 @@ pub trait ParseState: PartialEq + Eq + Display + Debug + Sized {
/// or it is acceptable to parse all the way until the end.
fn is_accepting(&self) -> bool;
/// Delegate parsing from a compatible, stitched [`ParseState`]~`SP`.
/// Delegate parsing from a compatible, stitched [`ParseState`] `SP`.
///
/// This helps to combine two state machines that speak the same input
/// language
/// (share the same [`Self::Token`]),
/// handling the boilerplate of delegating [`Self::Token`] from a
/// parent state~`SP` to `Self`.
/// parent state `SP` to `Self`.
///
/// Token delegation happens after [`Self`] has been entered from a
/// parent [`ParseState`] context~`SP`,
/// parent [`ParseState`] context `SP`,
/// so stitching the start and accepting states must happen elsewhere
/// (for now).
///
/// This assumes that no lookahead token from [`ParseStatus::Dead`] will
/// need to be handled by the parent state~`SP`.
/// To handle a token of lookahead,
/// use [`Self::delegate_lookahead`] instead.
///
/// _TODO: More documentation once this is finalized._
/// If the parser indicates a dead state,
/// the token of lookahead will be delegated to the parent `SP` and
/// result in an incomplete parse to the state indicated by the `dead`
/// callback.
/// This will cause a [`Parser`] to yield that token of lookahead back
/// to `SP`
/// (or whatever ancestor exists at the root)
/// for re-processing.
/// It is expected that the `dead` callback will cause `SP` to
/// transition into a state that will avoid invoking this parser again
/// with the same token,
/// which may otherwise result in unbounded recursion
/// (see Recursion Warning in [`Parser::feed_tok`]).
fn delegate<SP, C>(
self,
mut context: C,
tok: <Self as ParseState>::Token,
into: impl FnOnce(Self) -> SP,
mut context: C,
into: impl FnOnce(Self) -> Transition<SP>,
dead: impl FnOnce() -> Transition<SP>,
) -> TransitionResult<SP>
where
Self: StitchableParseState<SP>
+ ParseState<DeadToken = <SP as ParseState>::DeadToken>,
C: AsMut<<Self as ParseState>::Context>,
{
use ParseStatus::{Dead, Incomplete, Object as Obj};
let (Transition(newst), result) =
self.parse_token(tok, context.as_mut()).into();
// This does not use `delegate_lookahead` so that we can have
// `into: impl FnOnce` instead of `Fn`.
Transition(into(newst)).result(match result {
Ok(Incomplete) => Ok(Incomplete),
Ok(Obj(obj)) => Ok(Obj(obj.into())),
Ok(Dead(tok)) => Ok(Dead(tok.into())),
Err(e) => Err(e.into()),
})
}
/// Delegate parsing from a compatible, stitched [`ParseState`]~`SP` with
/// support for a lookahead token.
///
/// This does the same thing as [`Self::delegate`],
/// but allows for the handling of a lookahead token from [`Self`]
/// rather than simply proxying [`ParseStatus::Dead`].
///
/// _TODO: More documentation once this is finalized._
fn delegate_lookahead<SP, C>(
self,
mut context: C,
tok: <Self as ParseState>::Token,
into: impl FnOnce(Self) -> SP,
) -> ControlFlow<
TransitionResult<SP>,
(Self, <Self as ParseState>::DeadToken, C),
>
where
Self: StitchableParseState<SP>,
C: AsMut<<Self as ParseState>::Context>,
{
use ControlFlow::*;
use ParseStatus::{Dead, Incomplete, Object as Obj};
use ParseStatus::{Incomplete, Object as Obj};
// NB: Rust/LLVM are generally able to elide these moves into direct
// assignments,
// but sometimes this does not work
// (e.g. XIRF's use of `ArrayVec`).
// If your [`ParseState`] has a lot of `memcpy`s or other
// performance issues,
// move heavy objects into `context`.
let (Transition(newst), result) =
self.parse_token(tok, context.as_mut()).into();
let TransitionResult(Transition(newst), data) =
self.parse_token(tok, context.as_mut());
match result {
Ok(Incomplete) => Break(Transition(into(newst)).incomplete()),
Ok(Obj(obj)) => Break(Transition(into(newst)).ok(obj.into())),
Ok(Dead(tok)) => Continue((newst, tok, context)),
Err(e) => Break(Transition(into(newst)).err(e)),
match data {
// The token of lookahead must bubble up to the ancestor
// [`Parser`] so that it knows to provide that token in place
// of the next from the token stream,
// otherwise the token will be lost.
// Since we have stitched together states,
// the dead state simply means that we should transition back
// out of this parser back to `SP` so that it can use the
// token of lookahead.
TransitionData::Dead(Lookahead(lookahead)) => {
dead().incomplete().with_lookahead(lookahead)
}
TransitionData::Result(result, lookahead) => TransitionResult(
into(newst),
TransitionData::Result(
match result {
Ok(Incomplete) => Ok(Incomplete),
Ok(Obj(obj)) => Ok(Obj(obj.into())),
Err(e) => Err(e.into()),
},
lookahead.map(|Lookahead(la)| Lookahead(la)),
),
),
}
}
/// Delegate parsing from a compatible, stitched [`ParseState`] `SP`
/// while consuming objects during `SP` state transition.
///
/// See [`ParseState::delegate`] for more information.
/// This method exists for XIRT and ought to be removed when it is no
/// longer needed.
fn delegate_with_obj<SP, C, X>(
self,
tok: <Self as ParseState>::Token,
mut context: C,
env: X,
into: impl FnOnce(
Self,
Option<<Self as ParseState>::Object>,
X,
) -> Transition<SP>,
dead: impl FnOnce(X) -> Transition<SP>,
) -> TransitionResult<SP>
where
Self: PartiallyStitchableParseState<SP>,
C: AsMut<<Self as ParseState>::Context>,
{
use ParseStatus::{Incomplete, Object as Obj};
let TransitionResult(Transition(newst), data) =
self.parse_token(tok, context.as_mut());
match data {
TransitionData::Dead(Lookahead(lookahead)) => {
dead(env).incomplete().with_lookahead(lookahead)
}
// Consume object and allow processing as part of state
// transition.
TransitionData::Result(Ok(Obj(obj)), lookahead) => {
TransitionResult(
into(newst, Some(obj), env),
TransitionData::Result(
Ok(Incomplete),
lookahead.map(|Lookahead(la)| Lookahead(la)),
),
)
}
TransitionData::Result(result, lookahead) => TransitionResult(
into(newst, None, env),
TransitionData::Result(
match result {
Ok(_) => Ok(Incomplete),
Err(e) => Err(e.into()),
},
lookahead.map(|Lookahead(la)| Lookahead(la)),
),
),
}
}
}
@ -328,43 +331,15 @@ pub type ParseStateResult<S> = Result<ParseStatus<S>, <S as ParseState>::Error>;
/// it is not necessary for parser composition,
/// provided that you perform the necessary wiring yourself in absence
/// of state stitching.
pub trait StitchableParseState<SP: ParseState> = ParseState
pub trait StitchableParseState<SP: ParseState> =
PartiallyStitchableParseState<SP>
where <Self as ParseState>::Object: Into<<SP as ParseState>::Object>;
pub trait PartiallyStitchableParseState<SP: ParseState> = ParseState
where
SP: ParseState<Token = <Self as ParseState>::Token>,
<Self as ParseState>::Object: Into<<SP as ParseState>::Object>,
<Self as ParseState>::Error: Into<<SP as ParseState>::Error>;
/// Indicates that a parser has completed an aggregate operation,
/// marked by having reached a [dead state](ParseStatus::Dead).
///
/// This struct is compatible with [`ParseState::DeadToken`] and is intended
/// to be used with parsers that continue to aggregate data until they no
/// longer can.
/// For example,
/// an attribute parser may continue to parse element attributes until it
/// reaches the end of the attribute list,
/// which cannot be determined until reading a [`ParseState::Token`]
/// that must result in a [`ParseStatus::Dead`].
#[derive(Debug, PartialEq, Eq)]
pub struct Aggregate<O: Object, T: Token>(pub O, pub T);
impl<O: Object, T: Token> Token for Aggregate<O, T> {
fn span(&self) -> crate::span::Span {
let Aggregate(_, tok) = self;
tok.span()
}
}
impl<O: Object, T: Token> Object for Aggregate<O, T> {}
impl<O: Object, T: Token> Display for Aggregate<O, T> {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
match self {
Aggregate(_obj, tok) => write!(f, "{tok} with associated object"),
}
}
}
pub mod context {
use super::Debug;
use std::ops::{Deref, DerefMut};

@ -23,11 +23,11 @@ use super::{ParseState, ParseStateResult, ParseStatus};
use std::{
convert::Infallible,
hint::unreachable_unchecked,
ops::{ControlFlow, FromResidual, Try},
ops::{ControlFlow, FromResidual},
};
#[cfg(doc)]
use super::Token;
use super::{Parser, Token};
/// A state transition with associated data.
///
@ -54,10 +54,7 @@ pub struct TransitionResult<S: ParseState>(
/// New parser state.
pub(in super::super) Transition<S>,
/// Result of the parsing operation.
pub(in super::super) ParseStateResult<S>,
/// Optional unused token to use as a lookahead token in place of
/// the next token from the input stream.
pub(in super::super) Option<S::Token>,
pub(in super::super) TransitionData<S>,
);
impl<S: ParseState> TransitionResult<S> {
@ -66,22 +63,74 @@ impl<S: ParseState> TransitionResult<S> {
/// next token from the input stream.
pub fn with_lookahead(self, lookahead: S::Token) -> Self {
match self {
Self(transition, result, None) => {
Self(transition, result, Some(lookahead))
}
Self(transition, TransitionData::Result(result, None)) => Self(
transition,
TransitionData::Result(result, Some(Lookahead(lookahead))),
),
// This represents a problem with the parser;
// we should never specify a lookahead token more than once.
// This could be enforced statically with the type system if
// ever such a thing is deemed to be worth doing.
Self(.., Some(prev)) => {
Self(
..,
TransitionData::Result(_, Some(prev))
| TransitionData::Dead(prev),
) => {
panic!("internal error: lookahead token overwrite: {prev:?}")
}
}
}
}
/// Denotes a state transition.
/// Token to use as a lookahead token in place of the next token from the
/// input stream.
#[derive(Debug, PartialEq)]
pub struct Lookahead<S: ParseState>(pub(in super::super) S::Token);
/// Information about the state transition.
///
/// Note: Ideally a state wouldn't even be required for
/// [`Dead`](TransitionData::Dead),
/// but [`ParseState`] does not implement [`Default`] and [`Parser`]
/// requires _some_ state exist.
#[derive(Debug, PartialEq)]
pub(in super::super) enum TransitionData<S: ParseState> {
/// State transition was successful or not attempted,
/// with an optional token of [`Lookahead`].
///
/// Note that a successful state transition _does not_ imply a
/// successful [`ParseStateResult`]---the
/// parser may choose to successfully transition into an error
/// recovery state to accommodate future tokens.
Result(ParseStateResult<S>, Option<Lookahead<S>>),
/// No valid state transition exists from the current state for the
/// given input token,
/// which is returned as a token of [`Lookahead`].
///
/// A dead state is an accepting state that has no state transition for
/// the given token.
/// This could simply mean that the parser has completed its job and
/// that control must be returned to a parent context.
/// Note that this differs from an error state,
/// where a parser is unable to reach an accepting state because it
/// received unexpected input.
///
/// Note that the parser may still choose to perform a state transition
/// for the sake of error recovery,
/// but the dead state is generally interpreted to mean
/// "I have no further work that I am able to perform"
/// and may lead to finalization of the parser.
/// If a parser intends to do additional work,
/// it should return an error instead via [`TransitionData::Result`].
Dead(Lookahead<S>),
}
/// A verb denoting a state transition.
///
/// This is typically instantiated directly by a [`ParseState`] to perform a
/// state transition in [`ParseState::parse_token`].
///
/// This newtype was created to produce clear, self-documenting code;
/// parsers can get confusing to read with all of the types involved,
@ -101,7 +150,7 @@ impl<S: ParseState> Transition<S> {
where
T: Into<ParseStatus<S>>,
{
TransitionResult(self, Ok(obj.into()), None)
TransitionResult(self, TransitionData::Result(Ok(obj.into()), None))
}
/// A transition with corresponding error.
@ -109,7 +158,7 @@ impl<S: ParseState> Transition<S> {
/// This indicates a parsing failure.
/// The state ought to be suitable for error recovery.
pub fn err<E: Into<S::Error>>(self, err: E) -> TransitionResult<S> {
TransitionResult(self, Err(err.into()), None)
TransitionResult(self, TransitionData::Result(Err(err.into()), None))
}
/// A state transition with corresponding [`Result`].
@ -121,7 +170,13 @@ impl<S: ParseState> Transition<S> {
T: Into<ParseStatus<S>>,
E: Into<S::Error>,
{
TransitionResult(self, result.map(Into::into).map_err(Into::into), None)
TransitionResult(
self,
TransitionData::Result(
result.map(Into::into).map_err(Into::into),
None,
),
)
}
/// A state transition indicating that more data is needed before an
@ -129,42 +184,28 @@ impl<S: ParseState> Transition<S> {
///
/// This corresponds to [`ParseStatus::Incomplete`].
pub fn incomplete(self) -> TransitionResult<S> {
TransitionResult(self, Ok(ParseStatus::Incomplete), None)
TransitionResult(
self,
TransitionData::Result(Ok(ParseStatus::Incomplete), None),
)
}
/// A dead state transition.
/// A state transition could not be performed and parsing will not
/// continue.
///
/// This corresponds to [`ParseStatus::Dead`],
/// and a calling parser should use the provided [`Token`] as
/// lookahead.
pub fn dead(self, tok: S::DeadToken) -> TransitionResult<S> {
TransitionResult(self, Ok(ParseStatus::Dead(tok)), None)
}
}
impl<S: ParseState> Into<(Transition<S>, ParseStateResult<S>)>
for TransitionResult<S>
{
fn into(self) -> (Transition<S>, ParseStateResult<S>) {
(self.0, self.1)
}
}
impl<S: ParseState> Try for TransitionResult<S> {
type Output = (Transition<S>, ParseStateResult<S>);
type Residual = (Transition<S>, ParseStateResult<S>);
fn from_output(output: Self::Output) -> Self {
match output {
(st, result) => Self(st, result, None),
}
}
fn branch(self) -> ControlFlow<Self::Residual, Self::Output> {
match self.into() {
(st, Ok(x)) => ControlFlow::Continue((st, Ok(x))),
(st, Err(e)) => ControlFlow::Break((st, Err(e))),
}
/// A dead state represents an _accepting state_ that has no edge to
/// another state for the given `tok`.
/// Rather than throw an error,
/// a parser uses this status to indicate that it has completed
/// parsing and that the token should be utilized elsewhere;
/// the provided token will be used as a token of [`Lookahead`].
///
/// If a parser is not prepared to be finalized and needs to yield an
/// object first,
/// use [`Transition::result`] or other methods along with a token
/// of [`Lookahead`].
pub fn dead(self, tok: S::Token) -> TransitionResult<S> {
TransitionResult(self, TransitionData::Dead(Lookahead(tok)))
}
}
@ -173,7 +214,7 @@ impl<S: ParseState> FromResidual<(Transition<S>, ParseStateResult<S>)>
{
fn from_residual(residual: (Transition<S>, ParseStateResult<S>)) -> Self {
match residual {
(st, result) => Self(st, result, None),
(st, result) => Self(st, TransitionData::Result(result, None)),
}
}
}

@ -190,6 +190,11 @@ impl AttrList {
self
}
pub fn extend<T: IntoIterator<Item = Attr>>(mut self, iter: T) -> Self {
self.attrs.extend(iter);
self
}
/// Search for an attribute of the given `name`.
///
/// _You should use this method only when a linear search makes sense._

@ -144,10 +144,12 @@ impl Diagnostic for AttrParseError {
#[cfg(test)]
mod test {
use std::assert_matches::assert_matches;
use super::*;
use crate::{
convert::ExpectInto,
parse::{EmptyContext, ParseError, ParseStatus, Parsed},
parse::{ParseError, Parsed},
sym::GlobalSymbolIntern,
xir::test::{close_empty, open},
};
@ -159,18 +161,13 @@ mod test {
fn dead_if_first_token_is_non_attr() {
let tok = open("foo", S);
let sut = AttrParseState::default();
let mut sut = AttrParseState::parse(vec![tok.clone()].into_iter());
// There is no state that we can transition to,
// and we're in an empty accepting state.
assert_eq!(
(
// Make sure we're in the same state we started in so that
// we know we can accommodate recovery token(s).
Transition(AttrParseState::default()),
Ok(ParseStatus::Dead(tok.clone()))
),
sut.parse_token(tok, &mut EmptyContext).into()
assert_matches!(
sut.next(),
Some(Err(ParseError::UnexpectedToken(given, _))) if given == tok,
);
}

@ -167,7 +167,7 @@ impl From<Attr> for XirfToken {
/// XIRF-compatible attribute parser.
pub trait FlatAttrParseState<const MAX_DEPTH: usize> =
ParseState<Token = XirToken, DeadToken = XirToken, Object = Attr>
ParseState<Token = XirToken, Object = Attr>
where
Self: Default,
<Self as ParseState>::Error: Into<XirToXirfError>,
@ -232,12 +232,12 @@ where
(NodeExpected, tok) => Self::parse_node(tok, stack),
(AttrExpected(sa), tok) => {
let (_sa, lookahead, stack) =
sa.delegate_lookahead(stack, tok, AttrExpected)?;
Self::parse_node(lookahead, stack)
}
(AttrExpected(sa), tok) => sa.delegate(
tok,
stack,
|sa| Transition(AttrExpected(sa)),
|| Transition(NodeExpected),
),
(Done, tok) => Transition(Done).dead(tok),
}
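
The shape of the `delegate` call above (one closure for "the child is still parsing", one for "the child is dead") can be sketched with toy types. This is only an assumption-laden illustration: the `delegate` function below has a simplified, hypothetical signature and is not the method defined by TAMER's `ParseState`; `Child`, `Parent`, and `child_step` are likewise invented for the example.

```rust
/// Result of one child step: the token was either consumed or
/// handed back as lookahead. (Hypothetical type.)
enum Child {
    Consumed,
    Dead(char),
}

/// Toy parent states: still inside the child (carrying the child's
/// counter), or past it.
#[derive(Debug, PartialEq)]
enum Parent {
    InChild(u32),
    AfterChild,
}

/// Toy child parser: counts digits and is dead on anything else.
fn child_step(count: &mut u32, tok: char) -> Child {
    if tok.is_ascii_digit() {
        *count += 1;
        Child::Consumed
    } else {
        Child::Dead(tok)
    }
}

/// Hypothetical delegation helper.
/// `into` wraps the child state back into the parent while the child
/// is still running;
/// `dead` chooses the parent state once the child is dead.
/// The second tuple element is the lookahead token, if any, that the
/// driver (not the parent state) must feed back in.
fn delegate(
    mut count: u32,
    tok: char,
    into: impl FnOnce(u32) -> Parent,
    dead: impl FnOnce() -> Parent,
) -> (Parent, Option<char>) {
    match child_step(&mut count, tok) {
        Child::Consumed => (into(count), None),
        Child::Dead(lookahead) => (dead(), Some(lookahead)),
    }
}
```

The point of the design, as the commit message describes it, is that the parent never touches the lookahead token itself; it only names the two possible next states and leaves re-feeding the token to the driver.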

@ -68,7 +68,7 @@ pub enum AttrParseError<S: AttrParseState> {
/// The caller must determine whether to proceed with parsing of the
/// element despite these problems;
/// such recovery is beyond the scope of this parser.
MissingRequired(S::Token, S),
MissingRequired(S),
/// An attribute was encountered that was not expected by this parser.
///
@ -79,7 +79,7 @@ pub enum AttrParseError<S: AttrParseState> {
impl<S: AttrParseState> Display for AttrParseError<S> {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
match self {
Self::MissingRequired(_, st) => {
Self::MissingRequired(st) => {
let ele_name = st.element_name();
write!(f, "element `{ele_name}` missing required ")?;
@ -105,7 +105,7 @@ impl<S: AttrParseState> Error for AttrParseError<S> {
impl<S: AttrParseState> Diagnostic for AttrParseError<S> {
fn describe(&self) -> Vec<AnnotatedSpan> {
match self {
Self::MissingRequired(_, st) => st
Self::MissingRequired(st) => st
.element_span()
.error(format!(
"missing required {}",
@ -149,10 +149,7 @@ pub trait AttrParseState: ParseState {
/// are missing.
/// The list of missing fields is generated dynamically during
/// diagnostic reporting.
fn finalize_attr(
self,
tok_dead: <Self as ParseState>::Token,
) -> Result<<Self as ParseState>::DeadToken, AttrParseError<Self>>;
fn finalize_attr(self) -> Result<Self::Object, AttrParseError<Self>>;
/// Names of attributes that are required but do not yet have a value.
fn required_missing(&self) -> Vec<QName>;
@ -200,6 +197,8 @@ macro_rules! attr_parse {
$vis struct $state_name {
#[doc(hidden)]
___ctx: (QName, Span),
#[doc(hidden)]
___done: bool,
$(
pub $field: Option<$ty>,
)*
@ -209,6 +208,7 @@ macro_rules! attr_parse {
fn with_element(ele: QName, span: Span) -> Self {
Self {
___ctx: (ele, span),
___done: false,
$(
$field: None,
)*
@ -229,14 +229,12 @@ macro_rules! attr_parse {
fn finalize_attr(
self,
tok_dead: <Self as ParseState>::Token,
) -> Result<<Self as ParseState>::DeadToken, AttrParseError<Self>> {
) -> Result<Self::Object, AttrParseError<Self>> {
// Validate required fields before we start moving data.
$(
attr_parse!(@if_missing_req $($fmod)? self.$field {
return Err(
AttrParseError::MissingRequired(
tok_dead,
self,
)
)
@ -251,7 +249,7 @@ macro_rules! attr_parse {
)*
};
Ok(parse::Aggregate(obj, tok_dead))
Ok(obj)
}
fn required_missing(&self) -> Vec<QName> {
@ -268,6 +266,14 @@ macro_rules! attr_parse {
}
}
impl $state_name {
fn done_with_element(ele: QName, span: Span) -> Self {
let mut new = Self::with_element(ele, span);
new.___done = true;
new
}
}
$(#[$sattr])*
#[doc=""]
#[doc=concat!(
@ -296,9 +302,8 @@ macro_rules! attr_parse {
impl parse::ParseState for $state_name {
type Token = flat::XirfToken;
type Object = ();
type Object = $struct_name;
type Error = AttrParseError<Self>;
type DeadToken = parse::Aggregate<$struct_name, Self::Token>;
fn parse_token(
mut self,
@ -330,13 +335,21 @@ macro_rules! attr_parse {
))
},
// Any tokens received after aggregation is completed
// must not be processed,
// otherwise we'll recurse indefinitely.
tok_dead if self.___done => {
Transition(self).dead(tok_dead)
},
// Aggregation complete (dead state).
tok_dead => {
let (ele, span) = self.___ctx;
self.finalize_attr(tok_dead)
.map(ParseStatus::Dead)
.transition(Self::with_element(ele, span))
self.finalize_attr()
.map(ParseStatus::Object)
.transition(Self::done_with_element(ele, span))
.with_lookahead(tok_dead)
}
}
}
@ -384,7 +397,7 @@ macro_rules! attr_parse {
mod test {
use super::*;
use crate::{
parse::{Aggregate, ParseError, ParseState, Parser, TokenStream},
parse::{ParseError, ParseState, Parsed, Parser, TokenStream},
span::{Span, DUMMY_SPAN},
xir::{
attr::{Attr, AttrSpan},
@ -402,9 +415,9 @@ mod test {
// Random choice of QName for tests.
const QN_ELE: QName = QN_YIELDS;
fn parse_aggregate<S>(
fn parse_aggregate<S: AttrParseState>(
toks: impl TokenStream<S::Token>,
) -> Result<S::DeadToken, ParseError<S::DeadToken, S::Error>>
) -> Result<(S::Object, S::Token), ParseError<S::Token, S::Error>>
where
S: AttrParseState,
S::Context: Default,
@ -415,21 +428,37 @@ mod test {
))
}
fn parse_aggregate_with<S, I>(
fn parse_aggregate_with<S: AttrParseState, I>(
sut: &mut Parser<S, I>,
) -> Result<S::DeadToken, ParseError<S::DeadToken, S::Error>>
) -> Result<(S::Object, S::Token), ParseError<S::Token, S::Error>>
where
S: ParseState,
S::Context: Default,
I: TokenStream<S::Token>,
{
match sut.collect::<Result<Vec<_>, _>>() {
Err(ParseError::UnexpectedToken(agg, _)) => Ok(agg),
Err(other) => Err(other),
unexpected => {
panic!("expected ParseError::UnexpectedToken: {unexpected:?}")
let mut obj = None;
for item in sut {
match item {
Ok(Parsed::Object(result)) => {
obj.replace(result);
}
Ok(Parsed::Incomplete) => continue,
// This represents the dead state,
// since this is the top-level parser.
Err(ParseError::UnexpectedToken(tok, _)) => {
return Ok((
obj.expect(
"parser did not produce aggregate attribute object",
),
tok,
))
}
Err(other) => return Err(other),
}
}
panic!("expected AttrParseState dead state (obj: {obj:?})");
}
#[test]
@ -454,12 +483,12 @@ mod test {
.into_iter();
assert_eq!(
Ok(Aggregate(
Ok((
ReqValues {
name: attr_name,
yields: attr_yields,
},
tok_dead,
tok_dead
)),
parse_aggregate::<ReqValuesState>(toks),
);
@ -490,12 +519,12 @@ mod test {
.into_iter();
assert_eq!(
Ok(Aggregate(
Ok((
ReqValues {
name: attr_name,
yields: attr_yields,
},
tok_dead,
tok_dead
)),
parse_aggregate::<ReqValuesState>(toks),
);
@ -523,12 +552,12 @@ mod test {
.into_iter();
assert_eq!(
Ok(Aggregate(
Ok((
OptValues {
name: Some(attr_name),
yields: Some(attr_yields),
},
tok_dead,
tok_dead
)),
parse_aggregate::<OptValuesState>(toks),
);
@ -552,12 +581,12 @@ mod test {
.into_iter();
assert_eq!(
Ok(Aggregate(
Ok((
OptMissing {
name: None,
yields: None,
},
tok_dead,
tok_dead
)),
parse_aggregate::<OptMissingState>(toks),
);
@ -587,13 +616,13 @@ mod test {
.into_iter();
assert_eq!(
Ok(Aggregate(
Ok((
Mixed {
name: attr_name,
src: Some(attr_src),
yields: None,
},
tok_dead,
tok_dead
)),
parse_aggregate::<MixedState>(toks),
);
@ -643,7 +672,6 @@ mod test {
assert_matches!(
err,
ParseError::StateError(AttrParseError::MissingRequired(
ref given_tok,
ReqMissingState {
name: Some(ref given_name),
src: None, // cause of the error
@ -653,7 +681,6 @@ mod test {
},
)) if given_name == &ATTR_NAME
&& given_yields == &ATTR_YIELDS
&& given_tok == &tok_dead,
);
}
@ -668,10 +695,7 @@ mod test {
partial.name.replace(ATTR_NAME);
partial.yields.replace(ATTR_YIELDS);
// The dead token doesn't matter;
// it needs to be present but is otherwise ignored for this test.
let tok_dead = close_empty(S3, Depth(0));
let err = AttrParseError::MissingRequired(tok_dead, partial);
let err = AttrParseError::MissingRequired(partial);
// When represented as a string,
// the error should produce _all_ required attributes that do not
@ -704,11 +728,7 @@ mod test {
partial.name.replace(ATTR_NAME);
partial.yields.replace(ATTR_YIELDS);
// The dead token doesn't matter;
// it needs to be present but is otherwise ignored for this test.
let tok_dead = close_empty(S3, Depth(0));
let err = AttrParseError::MissingRequired(tok_dead, partial);
let err = AttrParseError::MissingRequired(partial);
let desc = err.describe();
// The diagnostic message should reference the element.
@ -780,12 +800,12 @@ mod test {
// The final result,
// after having failed and recovered.
assert_eq!(
Ok(Aggregate(
Ok((
Unexpected {
name: attr_name,
src: attr_src,
},
tok_dead,
tok_dead
)),
parse_aggregate_with(&mut sut),
);
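
The `___done` flag added to `attr_parse!` in this file guards against re-processing the lookahead token after the aggregate object has been yielded. A minimal sketch of that idea follows, under the assumption of a driver that feeds the lookahead token back to the same state; `Outcome`, `Agg`, and `drive_agg` are hypothetical names, not TAMER APIs.

```rust
/// Outcome of one step of a toy aggregating parser.
/// (Hypothetical type.)
#[derive(Debug, PartialEq)]
enum Outcome {
    Incomplete,
    /// Aggregate emitted together with the unconsumed token.
    ObjectWithLookahead(Vec<u8>, char),
    /// Nothing left to do; the token is handed back untouched.
    Dead(char),
}

/// Toy aggregator: collects digits until it sees a non-digit.
#[derive(Default)]
struct Agg {
    buf: Vec<u8>,
    done: bool,
}

impl Agg {
    fn parse_token(&mut self, tok: char) -> Outcome {
        match tok.to_digit(10) {
            Some(d) if !self.done => {
                self.buf.push(d as u8);
                Outcome::Incomplete
            }
            // Already aggregated: do not finalize a second time,
            // otherwise the re-fed lookahead token would loop forever.
            _ if self.done => Outcome::Dead(tok),
            // First non-digit: emit the aggregate and return the token.
            _ => {
                self.done = true;
                Outcome::ObjectWithLookahead(std::mem::take(&mut self.buf), tok)
            }
        }
    }
}

/// Toy driver that re-feeds any lookahead token before reading more
/// input, and stops at the first dead state.
fn drive_agg(toks: &str) -> (Option<Vec<u8>>, Option<char>) {
    let mut st = Agg::default();
    let mut input = toks.chars();
    let mut lookahead: Option<char> = None;
    let mut obj = None;

    loop {
        let Some(tok) = lookahead.take().or_else(|| input.next()) else {
            return (obj, None);
        };
        match st.parse_token(tok) {
            Outcome::Incomplete => {}
            Outcome::ObjectWithLookahead(o, la) => {
                obj = Some(o);
                lookahead = Some(la);
            }
            Outcome::Dead(la) => return (obj, Some(la)),
        }
    }
}
```

Without the `done` check, the re-fed token in this sketch would reach the finalizing arm again and the driver would never terminate, which mirrors the "recurse indefinitely" comment in the macro.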

@ -182,7 +182,7 @@ use crate::{
diagnose::{AnnotatedSpan, Diagnostic},
parse::{
self, EmptyContext, NoContext, ParseError, ParseResult, ParseState,
ParseStatus, ParsedResult, Transition, TransitionResult,
ParsedResult, Transition, TransitionResult,
},
span::Span,
sym::SymbolId,
@ -504,12 +504,11 @@ where
Done,
}
pub trait StackAttrParseState =
ParseState<Token = XirToken, DeadToken = XirToken, Object = Attr>
where
Self: Default,
<Self as ParseState>::Error: Into<StackError>,
EmptyContext: AsMut<<Self as ParseState>::Context>;
pub trait StackAttrParseState = ParseState<Token = XirToken, Object = Attr>
where
Self: Default,
<Self as ParseState>::Error: Into<StackError>,
EmptyContext: AsMut<<Self as ParseState>::Context>;
impl<SA: StackAttrParseState> Default for Stack<SA> {
fn default() -> Self {
@ -551,25 +550,17 @@ impl<SA: StackAttrParseState> ParseState for Stack<SA> {
),
// Attribute parsing.
(AttrState(estack, attrs, sa), tok) => {
use ParseStatus::*;
match sa.parse_token(tok, ctx.as_mut()).into() {
(Transition(sa), Ok(Incomplete)) => {
Transition(AttrState(estack, attrs, sa)).incomplete()
}
(Transition(sa), Ok(Object(attr))) => {
Transition(AttrState(estack, attrs.push(attr), sa))
.incomplete()
}
(_, Ok(Dead(lookahead))) => {
BuddingElement(estack.consume_attrs(attrs))
.parse_token(lookahead, ctx)
}
(Transition(sa), Err(x)) => {
Transition(AttrState(estack, attrs, sa)).err(x.into())
}
}
}
(AttrState(estack, attrs, sa), tok) => sa.delegate_with_obj(
tok,
ctx,
(estack, attrs),
|sa, obj, (estack, attrs)| {
Transition(AttrState(estack, attrs.extend(obj), sa))
},
|(estack, attrs)| {
Transition(BuddingElement(estack.consume_attrs(attrs)))
},
),
(BuddingElement(stack), Token::Close(name, CloseSpan(span, _))) => {
stack