tame/tamer/src/parse/state/transition.rs

452 lines
16 KiB
Rust
Raw Normal View History

// Parsing automaton
//
// Copyright (C) 2014-2022 Ryan Specialty Group, LLC.
//
// This file is part of TAME.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.
//! State transitions for parser automata.
use super::{
ClosedParseState, ParseState, ParseStateResult, ParseStatus, Token,
};
use std::{
convert::Infallible,
hint::unreachable_unchecked,
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
ops::{ControlFlow, FromResidual},
};
#[cfg(doc)]
use super::Parser;
/// A state transition with associated data.
///
/// Conceptually,
/// imagine the act of a state transition producing data.
/// See [`Transition`] for convenience methods for producing this tuple.
///
/// Sometimes a parser is not able to complete the operation requested
/// based on the provided input token.
/// Since TAMER uses a streaming parsing framework that places strict
/// limits on control flow,
/// a single token can be returned as lookahead to indicate that the
/// token could not be parsed yet and should be provided once again
/// in place of the next token from the input stream.
/// This allows,
/// for example,
/// for multiple data to be emitted in response to a single token.
///
tamer: parse::state::ParseState::Super: Superstate concept I'm disappointed that I keep having to implement features that I had hoped to avoid implementing. This introduces a "superstate" feature, which is intended really just to be a sum type that is able to delegate to stitched `ParseState`s. This then allows a `ParseState` to transition directly to another `ParseState` and have the parent `ParseState` handle the delegation---a trampoline. This issue naturally arises out of the recursive nature of parsing a TAME XML document, where certain statements can be nested (like `<section>`), and where expressions can be nested. I had gotten away with composition-based delegation for now because `xmlo` headers do not have such nesting. The composition-based approach falls flat for recursive structures. The typical naive solution is boxing, which I cannot do, because not only is this on an extremely hot code path, but I require that Rust be able to deeply introspect and optimize away the lowering pipeline as much as possible. Many months ago, I figured that such a solution would require a trampoline, as it typically does in stack-based languages, but I was hoping to avoid it. Well, no longer; let's just get on with it. This intends to implement trampolining in a `ParseState` that serves as that sum type, rather than introducing it as yet another feature to `Parser`; the latter would provide a more convenient API, but it would continue to bloat `Parser` itself. Right now, only the element parser generator will require use of this, so if it's needed beyond that, then I'll debate whether it's worth providing a better abstraction. For now, the intent will be to use the `Context` to store a stack that it can pop off of to restore the previous `ParseState` before delegation. DEV-7145
2022-08-03 12:53:50 -04:00
/// If a [`ParseState`] is not a [`ClosedParseState`],
/// the transition will be to its superstate ([`ParseState::Super`]);
/// this conversion is performed automatically by the [`Transition`]
/// methods that produce [`TransitionResult`],
/// (such as [`Transition::ok`]).
///
/// This struct is opaque to ensure that critical invariants involving
/// transitions and lookahead are properly upheld;
/// callers must use the appropriate parsing APIs.
#[derive(Debug, PartialEq)]
pub struct TransitionResult<S: ParseState>(
/// New parser state.
pub(in super::super) Transition<S>,
/// Result of the parsing operation.
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
pub(in super::super) TransitionData<S>,
);
impl<S: ParseState> TransitionResult<S> {
pub fn into_super(self) -> TransitionResult<S::Super> {
match self {
Self(t, data) => {
TransitionResult(t.into_super(), data.into_super())
}
}
}
/// Indicate that this transition include a single token of lookahead,
/// which should be provided back to the parser in place of the
/// next token from the input stream.
pub fn with_lookahead(self, lookahead: S::Token) -> Self {
match self {
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
Self(transition, TransitionData::Result(result, None)) => Self(
transition,
TransitionData::Result(result, Some(Lookahead(lookahead))),
),
// This represents a problem with the parser;
// we should never specify a lookahead token more than once.
// This could be enforced statically with the type system if
// ever such a thing is deemed to be worth doing.
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
Self(
..,
TransitionData::Result(_, Some(prev))
| TransitionData::Dead(prev),
) => {
panic!("internal error: lookahead token overwrite: {prev:?}")
}
}
}
tamer: xir::parse::ele: Initial element parser generator concept This begins generating parsers that are capable of parsing elements. I need to move on, so this abstraction isn't going to go as far as it could, but let's see where it takes me. This was the work that required the recent lookahead changes, which has been detailed in previous commits. This initial support is basic, but robust. It supports parsing elements with attributes and children, but it does not yet support the equivalent of the Kleene star (`*`). Such support will likely be added by supporting parsers that are able to recurse on their own definition in tail position, which will also require supporting parsers that do not add to the stack. This generates parsers that, like all the other parsers, use enums to provide a typed stack. Stitched parsers produce a nested stack that is always bounded in size. Fortunately, expressions---which can nest deeply---do not need to maintain ancestor context on the stack, and so this should work fine; we can get away with this because XIRF ensures proper nesting for us. Statements that _do_ need to maintain such context are not nested. This also does not yet support emitting an object on closing tag, which will be necessary for NIR, which will be a streaming IR that is "near" to the source XML in structure. This will then be used to lower into AIR for the ASG, which gives structure needed for further analysis. More information to come; I just want to get this committed to serve as a mental synchronization point and clear my head, since I've been sitting on these changes for so long and have to keep stashing them as I tumble down rabbit holes covered in yak hair. DEV-7145
2022-07-13 13:55:32 -04:00
/// Possibly indicate that this transition includes a single token of
/// lookahead.
///
/// If the argument is [`None`],
/// this returns `self` unchanged.
///
/// This is useful when working with the output of other parsers.
/// See [`with_lookahead`](TransitionResult::with_lookahead) for more
/// information.
pub(in super::super) fn maybe_with_lookahead(
self,
lookahead: Option<Lookahead<S::Token>>,
) -> Self {
match lookahead {
Some(Lookahead(lookahead)) => self.with_lookahead(lookahead),
None => self,
}
}
}
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
/// Token to use as a lookahead token in place of the next token from the
/// input stream.
#[derive(Debug, PartialEq)]
pub struct Lookahead<T: Token>(pub(in super::super) T);
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
/// Information about the state transition.
///
/// Note: Ideally a state wouldn't even be required for
/// [`Dead`](TransitionData::Dead),
/// but [`ParseState`] does not implement [`Default`] and [`Parser`]
/// requires _some_ state exist.
#[derive(Debug, PartialEq)]
pub(in super::super) enum TransitionData<S: ParseState> {
/// State transition was successful or not attempted,
/// with an optional token of [`Lookahead`].
///
/// Note that a successful state transition _does not_ imply a
/// successful [`ParseStateResult`]---the
/// parser may choose to successfully transition into an error
/// recovery state to accommodate future tokens.
Result(ParseStateResult<S>, Option<Lookahead<S::Token>>),
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
/// No valid state transition exists from the current state for the
/// given input token,
/// which is returned as a token of [`Lookahead`].
///
/// A dead state is an accepting state that has no state transition for
/// the given token.
/// This could simply mean that the parser has completed its job and
/// that control must be returned to a parent context.
/// Note that this differs from an error state,
/// where a parser is unable to reach an accepting state because it
/// received unexpected input.
///
/// Note that the parser may still choose to perform a state transition
/// for the sake of error recovery,
/// but note that the dead state is generally interpreted to mean
/// "I have no further work that I am able to perform"
/// and may lead to finalization of the parser.
/// If a parser intends to do additional work,
/// it should return an error instead via [`TransitionData::Result`].
Dead(Lookahead<S::Token>),
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
}
tamer: parser::Parser: cfg(test) tracing This produces useful parse traces that are output as part of a failing test case. The parser generator macros can be a bit confusing to deal with when things go wrong, so this helps to clarify matters. This is _not_ intended to be machine-readable, but it does show that it would be possible to generate machine-readable output to visualize the entire lowering pipeline. Perhaps something for the future. I left these inline in Parser::feed_tok because they help to elucidate what is going on, just by reading what the trace would output---that is, it helps to make the method more self-documenting, albeit a tad bit more verbose. But with that said, it should probably be extracted at some point; I don't want this to set a precedent where composition is feasible. Here's an example from test cases: [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is parsing attributes for `package`. | | Attrs_(SutAttrsState_ { ___ctx: (QName(None, LocalPart(NCName(SymbolId(46 "package")))), OpenSpan(Span { len: 0, offset: 0, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10)), ___done: false }) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is expecting opening tag `<classify>`. | | ChildA(Expecting_) | | Lookahead: Some(Lookahead(Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)))) = note: this trace was output as a debugging aid because `cfg(test)`. [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is expecting opening tag `<classify>`. | | ChildA(Expecting_) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is attempting to recover by ignoring element with unexpected name `unexpected` (expected `classify`). | | ChildA(RecoverEleIgnore_(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))) | | Lookahead: None = note: this trace was output as a debugging aid because `cfg(test)`. DEV-7145
2022-07-18 14:32:34 -04:00
impl<S: ParseState> TransitionData<S> {
pub fn into_super(self) -> TransitionData<S::Super> {
match self {
Self::Result(st_result, ola) => TransitionData::Result(
st_result.map(ParseStatus::into_super).map_err(|e| e.into()),
ola,
),
Self::Dead(la) => TransitionData::Dead(la),
}
}
tamer: parser::Parser: cfg(test) tracing This produces useful parse traces that are output as part of a failing test case. The parser generator macros can be a bit confusing to deal with when things go wrong, so this helps to clarify matters. This is _not_ intended to be machine-readable, but it does show that it would be possible to generate machine-readable output to visualize the entire lowering pipeline. Perhaps something for the future. I left these inline in Parser::feed_tok because they help to elucidate what is going on, just by reading what the trace would output---that is, it helps to make the method more self-documenting, albeit a tad bit more verbose. But with that said, it should probably be extracted at some point; I don't want this to set a precedent where composition is feasible. Here's an example from test cases: [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is parsing attributes for `package`. | | Attrs_(SutAttrsState_ { ___ctx: (QName(None, LocalPart(NCName(SymbolId(46 "package")))), OpenSpan(Span { len: 0, offset: 0, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10)), ___done: false }) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is expecting opening tag `<classify>`. | | ChildA(Expecting_) | | Lookahead: Some(Lookahead(Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)))) = note: this trace was output as a debugging aid because `cfg(test)`. [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is expecting opening tag `<classify>`. | | ChildA(Expecting_) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is attempting to recover by ignoring element with unexpected name `unexpected` (expected `classify`). | | ChildA(RecoverEleIgnore_(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))) | | Lookahead: None = note: this trace was output as a debugging aid because `cfg(test)`. DEV-7145
2022-07-18 14:32:34 -04:00
/// Reference to the token of lookahead,
/// if any.
pub(in super::super) fn lookahead_ref(
&self,
) -> Option<&Lookahead<S::Token>> {
match self {
TransitionData::Dead(ref la)
| TransitionData::Result(_, Some(ref la)) => Some(la),
_ => None,
}
}
/// Reference to parsed object,
/// if any.
pub(in super::super) fn object_ref(&self) -> Option<&S::Object> {
match self {
TransitionData::Result(Ok(ParseStatus::Object(obj)), _) => {
Some(obj)
}
_ => None,
}
}
/// Reference to parsing error,
/// if any.
pub(in super::super) fn err_ref(&self) -> Option<&S::Error> {
match self {
TransitionData::Result(Err(e), _) => Some(e),
_ => None,
}
}
/// Map [`TransitionData`] when the inner result is of type
/// [`ParseStatus::Object`].
///
/// This will map over `self` within the context of an inner
/// [`ParseStatus::Object`] and an associated optional token of
/// [`Lookahead`].
/// This allows using objects to influence parser operations more
/// broadly.
///
/// _This method is private to this module because it requires that the
/// caller be diligent in not discarding the provided token of
/// lookahead._
/// Since this token may be stored and later emitted,
/// there is no reliable automated way at present to ensure that this
/// invariant is upheld;
/// such an effort is far beyond the scope of current work at the
/// time of writing.
pub(in super::super) fn map_when_obj<SB: ParseState>(
self,
f: impl FnOnce(S::Object, Option<Lookahead<S::Token>>) -> TransitionData<SB>,
) -> TransitionData<SB>
where
SB: ParseState<Token = S::Token, Error = S::Error>,
{
// Ideally this will be decomposed into finer-grained functions
// (as in a more traditional functional style),
// but such wasn't needed at the time of writing.
// But this is dizzying.
match self {
TransitionData::Result(Ok(ParseStatus::Object(obj)), la) => {
f(obj, la)
}
TransitionData::Result(Ok(ParseStatus::Incomplete), la) => {
TransitionData::Result(Ok(ParseStatus::Incomplete), la)
}
TransitionData::Result(Err(e), la) => {
TransitionData::Result(Err(e), la)
}
TransitionData::Dead(la) => TransitionData::Dead(la),
}
}
tamer: parser::Parser: cfg(test) tracing This produces useful parse traces that are output as part of a failing test case. The parser generator macros can be a bit confusing to deal with when things go wrong, so this helps to clarify matters. This is _not_ intended to be machine-readable, but it does show that it would be possible to generate machine-readable output to visualize the entire lowering pipeline. Perhaps something for the future. I left these inline in Parser::feed_tok because they help to elucidate what is going on, just by reading what the trace would output---that is, it helps to make the method more self-documenting, albeit a tad bit more verbose. But with that said, it should probably be extracted at some point; I don't want this to set a precedent where composition is feasible. Here's an example from test cases: [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is parsing attributes for `package`. | | Attrs_(SutAttrsState_ { ___ctx: (QName(None, LocalPart(NCName(SymbolId(46 "package")))), OpenSpan(Span { len: 0, offset: 0, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10)), ___done: false }) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is expecting opening tag `<classify>`. | | ChildA(Expecting_) | | Lookahead: Some(Lookahead(Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)))) = note: this trace was output as a debugging aid because `cfg(test)`. [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is expecting opening tag `<classify>`. | | ChildA(Expecting_) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is attempting to recover by ignoring element with unexpected name `unexpected` (expected `classify`). | | ChildA(RecoverEleIgnore_(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))) | | Lookahead: None = note: this trace was output as a debugging aid because `cfg(test)`. DEV-7145
2022-07-18 14:32:34 -04:00
}
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
/// A verb denoting a state transition.
///
/// This is typically instantiated directly by a [`ParseState`] to perform a
/// state transition in [`ParseState::parse_token`].
///
/// This newtype was created to produce clear, self-documenting code;
/// parsers can get confusing to read with all of the types involved,
/// so this provides a mental synchronization point.
///
/// This also provides some convenience methods to help remove boilerplate
/// and further improve code clarity.
#[derive(Debug, PartialEq, Eq)]
pub struct Transition<S: ParseState>(pub S);
impl<S: ParseState> Transition<S> {
tamer: parse::state::ParseState::Super: Superstate concept I'm disappointed that I keep having to implement features that I had hoped to avoid implementing. This introduces a "superstate" feature, which is intended really just to be a sum type that is able to delegate to stitched `ParseState`s. This then allows a `ParseState` to transition directly to another `ParseState` and have the parent `ParseState` handle the delegation---a trampoline. This issue naturally arises out of the recursive nature of parsing a TAME XML document, where certain statements can be nested (like `<section>`), and where expressions can be nested. I had gotten away with composition-based delegation for now because `xmlo` headers do not have such nesting. The composition-based approach falls flat for recursive structures. The typical naive solution is boxing, which I cannot do, because not only is this on an extremely hot code path, but I require that Rust be able to deeply introspect and optimize away the lowering pipeline as much as possible. Many months ago, I figured that such a solution would require a trampoline, as it typically does in stack-based languages, but I was hoping to avoid it. Well, no longer; let's just get on with it. This intends to implement trampolining in a `ParseState` that serves as that sum type, rather than introducing it as yet another feature to `Parser`; the latter would provide a more convenient API, but it would continue to bloat `Parser` itself. Right now, only the element parser generator will require use of this, so if it's needed beyond that, then I'll debate whether it's worth providing a better abstraction. For now, the intent will be to use the `Context` to store a stack that it can pop off of to restore the previous `ParseState` before delegation. DEV-7145
2022-08-03 12:53:50 -04:00
/// Transform a [`Transition`] into a transition of its superstate
/// [`ParseState::Super`].
///
/// This is needed because trait specialization does not yet have a path
/// to stabilization as of the time of writing,
/// and so `From<Transition<S>> for Transition<S::Super>` cannot be
/// implemented because those types overlap.
pub fn into_super(self) -> Transition<S::Super> {
match self {
Transition(st) => Transition(st.into()),
}
}
/// A state transition with corresponding data.
///
/// This allows [`ParseState::parse_token`] to emit a parsed object and
/// corresponds to [`ParseStatus::Object`].
pub fn ok<T>(self, obj: T) -> TransitionResult<S::Super>
where
T: Into<ParseStatus<S::Super>>,
{
tamer: parse::state::ParseState::Super: Superstate concept I'm disappointed that I keep having to implement features that I had hoped to avoid implementing. This introduces a "superstate" feature, which is intended really just to be a sum type that is able to delegate to stitched `ParseState`s. This then allows a `ParseState` to transition directly to another `ParseState` and have the parent `ParseState` handle the delegation---a trampoline. This issue naturally arises out of the recursive nature of parsing a TAME XML document, where certain statements can be nested (like `<section>`), and where expressions can be nested. I had gotten away with composition-based delegation for now because `xmlo` headers do not have such nesting. The composition-based approach falls flat for recursive structures. The typical naive solution is boxing, which I cannot do, because not only is this on an extremely hot code path, but I require that Rust be able to deeply introspect and optimize away the lowering pipeline as much as possible. Many months ago, I figured that such a solution would require a trampoline, as it typically does in stack-based languages, but I was hoping to avoid it. Well, no longer; let's just get on with it. This intends to implement trampolining in a `ParseState` that serves as that sum type, rather than introducing it as yet another feature to `Parser`; the latter would provide a more convenient API, but it would continue to bloat `Parser` itself. Right now, only the element parser generator will require use of this, so if it's needed beyond that, then I'll debate whether it's worth providing a better abstraction. For now, the intent will be to use the `Context` to store a stack that it can pop off of to restore the previous `ParseState` before delegation. DEV-7145
2022-08-03 12:53:50 -04:00
TransitionResult(
self.into_super(),
TransitionData::Result(Ok(obj.into()), None),
)
}
/// A transition with corresponding error.
///
/// This indicates a parsing failure.
/// The state ought to be suitable for error recovery.
pub fn err<E: Into<S::Error>>(self, err: E) -> TransitionResult<S::Super> {
// The first error conversion is into that expected by S,
// which will _then_ (below) be converted into S::Super
// (if they're not the same).
let err_s: S::Error = err.into();
tamer: parse::state::ParseState::Super: Superstate concept I'm disappointed that I keep having to implement features that I had hoped to avoid implementing. This introduces a "superstate" feature, which is intended really just to be a sum type that is able to delegate to stitched `ParseState`s. This then allows a `ParseState` to transition directly to another `ParseState` and have the parent `ParseState` handle the delegation---a trampoline. This issue naturally arises out of the recursive nature of parsing a TAME XML document, where certain statements can be nested (like `<section>`), and where expressions can be nested. I had gotten away with composition-based delegation for now because `xmlo` headers do not have such nesting. The composition-based approach falls flat for recursive structures. The typical naive solution is boxing, which I cannot do, because not only is this on an extremely hot code path, but I require that Rust be able to deeply introspect and optimize away the lowering pipeline as much as possible. Many months ago, I figured that such a solution would require a trampoline, as it typically does in stack-based languages, but I was hoping to avoid it. Well, no longer; let's just get on with it. This intends to implement trampolining in a `ParseState` that serves as that sum type, rather than introducing it as yet another feature to `Parser`; the latter would provide a more convenient API, but it would continue to bloat `Parser` itself. Right now, only the element parser generator will require use of this, so if it's needed beyond that, then I'll debate whether it's worth providing a better abstraction. For now, the intent will be to use the `Context` to store a stack that it can pop off of to restore the previous `ParseState` before delegation. DEV-7145
2022-08-03 12:53:50 -04:00
TransitionResult(
self.into_super(),
TransitionData::Result(Err(err_s.into()), None),
tamer: parse::state::ParseState::Super: Superstate concept I'm disappointed that I keep having to implement features that I had hoped to avoid implementing. This introduces a "superstate" feature, which is intended really just to be a sum type that is able to delegate to stitched `ParseState`s. This then allows a `ParseState` to transition directly to another `ParseState` and have the parent `ParseState` handle the delegation---a trampoline. This issue naturally arises out of the recursive nature of parsing a TAME XML document, where certain statements can be nested (like `<section>`), and where expressions can be nested. I had gotten away with composition-based delegation for now because `xmlo` headers do not have such nesting. The composition-based approach falls flat for recursive structures. The typical naive solution is boxing, which I cannot do, because not only is this on an extremely hot code path, but I require that Rust be able to deeply introspect and optimize away the lowering pipeline as much as possible. Many months ago, I figured that such a solution would require a trampoline, as it typically does in stack-based languages, but I was hoping to avoid it. Well, no longer; let's just get on with it. This intends to implement trampolining in a `ParseState` that serves as that sum type, rather than introducing it as yet another feature to `Parser`; the latter would provide a more convenient API, but it would continue to bloat `Parser` itself. Right now, only the element parser generator will require use of this, so if it's needed beyond that, then I'll debate whether it's worth providing a better abstraction. For now, the intent will be to use the `Context` to store a stack that it can pop off of to restore the previous `ParseState` before delegation. DEV-7145
2022-08-03 12:53:50 -04:00
)
}
/// A state transition with corresponding [`Result`].
///
/// This translates the provided [`Result`] in a manner equivalent to
/// [`Transition::ok`] and [`Transition::err`].
pub fn result<T, E>(
self,
result: Result<T, E>,
) -> TransitionResult<S::Super>
where
T: Into<ParseStatus<S>>,
E: Into<S::Error>,
{
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
TransitionResult(
tamer: parse::state::ParseState::Super: Superstate concept I'm disappointed that I keep having to implement features that I had hoped to avoid implementing. This introduces a "superstate" feature, which is intended really just to be a sum type that is able to delegate to stitched `ParseState`s. This then allows a `ParseState` to transition directly to another `ParseState` and have the parent `ParseState` handle the delegation---a trampoline. This issue naturally arises out of the recursive nature of parsing a TAME XML document, where certain statements can be nested (like `<section>`), and where expressions can be nested. I had gotten away with composition-based delegation for now because `xmlo` headers do not have such nesting. The composition-based approach falls flat for recursive structures. The typical naive solution is boxing, which I cannot do, because not only is this on an extremely hot code path, but I require that Rust be able to deeply introspect and optimize away the lowering pipeline as much as possible. Many months ago, I figured that such a solution would require a trampoline, as it typically does in stack-based languages, but I was hoping to avoid it. Well, no longer; let's just get on with it. This intends to implement trampolining in a `ParseState` that serves as that sum type, rather than introducing it as yet another feature to `Parser`; the latter would provide a more convenient API, but it would continue to bloat `Parser` itself. Right now, only the element parser generator will require use of this, so if it's needed beyond that, then I'll debate whether it's worth providing a better abstraction. For now, the intent will be to use the `Context` to store a stack that it can pop off of to restore the previous `ParseState` before delegation. DEV-7145
2022-08-03 12:53:50 -04:00
self.into_super(),
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
TransitionData::Result(
result
.map(Into::into)
.map(ParseStatus::into_super)
.map_err(Into::<S::Error>::into)
.map_err(Into::into),
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
None,
),
)
}
/// A state transition indicating that more data is needed before an
/// object can be emitted.
///
/// This corresponds to [`ParseStatus::Incomplete`].
pub fn incomplete(self) -> TransitionResult<S::Super> {
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
TransitionResult(
tamer: parse::state::ParseState::Super: Superstate concept I'm disappointed that I keep having to implement features that I had hoped to avoid implementing. This introduces a "superstate" feature, which is intended really just to be a sum type that is able to delegate to stitched `ParseState`s. This then allows a `ParseState` to transition directly to another `ParseState` and have the parent `ParseState` handle the delegation---a trampoline. This issue naturally arises out of the recursive nature of parsing a TAME XML document, where certain statements can be nested (like `<section>`), and where expressions can be nested. I had gotten away with composition-based delegation for now because `xmlo` headers do not have such nesting. The composition-based approach falls flat for recursive structures. The typical naive solution is boxing, which I cannot do, because not only is this on an extremely hot code path, but I require that Rust be able to deeply introspect and optimize away the lowering pipeline as much as possible. Many months ago, I figured that such a solution would require a trampoline, as it typically does in stack-based languages, but I was hoping to avoid it. Well, no longer; let's just get on with it. This intends to implement trampolining in a `ParseState` that serves as that sum type, rather than introducing it as yet another feature to `Parser`; the latter would provide a more convenient API, but it would continue to bloat `Parser` itself. Right now, only the element parser generator will require use of this, so if it's needed beyond that, then I'll debate whether it's worth providing a better abstraction. For now, the intent will be to use the `Context` to store a stack that it can pop off of to restore the previous `ParseState` before delegation. DEV-7145
2022-08-03 12:53:50 -04:00
self.into_super(),
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
TransitionData::Result(Ok(ParseStatus::Incomplete), None),
)
}
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
/// A state transition could not be performed and parsing will not
/// continue.
///
tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145
2022-07-11 23:49:57 -04:00
/// A dead state represents an _accepting state_ that has no edge to
/// another state for the given `tok`.
/// Rather than throw an error,
/// a parser uses this status to indicate that it has completed
/// parsing and that the token should be utilized elsewhere;
/// the provided token will be used as a token of [`Lookahead`].
///
/// If a parser is not prepared to be finalized and needs to yield an
/// object first,
/// use [`Transition::result`] or other methods along with a token
/// of [`Lookahead`].
pub fn dead(self, tok: S::Token) -> TransitionResult<S::Super> {
tamer: parse::state::ParseState::Super: Superstate concept I'm disappointed that I keep having to implement features that I had hoped to avoid implementing. This introduces a "superstate" feature, which is intended really just to be a sum type that is able to delegate to stitched `ParseState`s. This then allows a `ParseState` to transition directly to another `ParseState` and have the parent `ParseState` handle the delegation---a trampoline. This issue naturally arises out of the recursive nature of parsing a TAME XML document, where certain statements can be nested (like `<section>`), and where expressions can be nested. I had gotten away with composition-based delegation for now because `xmlo` headers do not have such nesting. The composition-based approach falls flat for recursive structures. The typical naive solution is boxing, which I cannot do, because not only is this on an extremely hot code path, but I require that Rust be able to deeply introspect and optimize away the lowering pipeline as much as possible. Many months ago, I figured that such a solution would require a trampoline, as it typically does in stack-based languages, but I was hoping to avoid it. Well, no longer; let's just get on with it. This intends to implement trampolining in a `ParseState` that serves as that sum type, rather than introducing it as yet another feature to `Parser`; the latter would provide a more convenient API, but it would continue to bloat `Parser` itself. Right now, only the element parser generator will require use of this, so if it's needed beyond that, then I'll debate whether it's worth providing a better abstraction. For now, the intent will be to use the `Context` to store a stack that it can pop off of to restore the previous `ParseState` before delegation. DEV-7145
2022-08-03 12:53:50 -04:00
TransitionResult(
self.into_super(),
TransitionData::Dead(Lookahead(tok)),
)
}
}
impl<S: ClosedParseState> FromResidual<(Transition<S>, ParseStateResult<S>)>
for TransitionResult<S>
{
fn from_residual(residual: (Transition<S>, ParseStateResult<S>)) -> Self {
match residual {
(st, result) => Self(st, TransitionData::Result(result, None)),
}
}
}
impl<S: ParseState> FromResidual<Result<Infallible, TransitionResult<S>>>
for TransitionResult<S>
{
fn from_residual(
residual: Result<Infallible, TransitionResult<S>>,
) -> Self {
match residual {
Err(e) => e,
// SAFETY: This match arm doesn't seem to be required in
// core::result::Result's FromResidual implementation,
// but as of 1.61 nightly it is here.
// Since this is Infallable,
// it cannot occur.
Ok(_) => unsafe { unreachable_unchecked() },
}
}
}
impl<S: ParseState> FromResidual<ControlFlow<TransitionResult<S>, Infallible>>
for TransitionResult<S>
{
fn from_residual(
residual: ControlFlow<TransitionResult<S>, Infallible>,
) -> Self {
match residual {
ControlFlow::Break(result) => result,
// SAFETY: Infallible, so cannot hit.
ControlFlow::Continue(_) => unsafe { unreachable_unchecked() },
}
}
}
/// An object able to be used as data for a state [`Transition`].
///
/// This flips the usual order of things:
/// rather than using a method of [`Transition`] to provide data,
/// this starts with the data and produces a transition from it.
/// This is sometimes necessary to satisfy ownership/borrowing rules.
///
/// This trait simply removes boilerplate associated with storing
/// intermediate values and translating into the resulting type.
pub trait Transitionable<S: ParseState> {
/// Perform a state transition to `S` using [`Self`] as the associated
/// data.
///
/// This may be necessary to satisfy ownership/borrowing rules when
/// state data from `S` is used to compute [`Self`].
fn transition(self, to: S) -> TransitionResult<S::Super>;
}
impl<S, E> Transitionable<S> for Result<ParseStatus<S>, E>
where
S: ParseState,
<S as ParseState>::Error: From<E>,
{
fn transition(self, to: S) -> TransitionResult<S::Super> {
Transition(to).result(self)
}
}
impl<S, E> Transitionable<S> for Result<(), E>
where
S: ParseState,
<S as ParseState>::Error: From<E>,
{
fn transition(self, to: S) -> TransitionResult<S::Super> {
Transition(to).result(self.map(|_| ParseStatus::Incomplete))
}
}
impl<S: ParseState> Transitionable<S> for ParseStatus<S> {
fn transition(self, to: S) -> TransitionResult<S::Super> {
Transition(to).ok(self.into_super())
}
}