2022-03-17 12:20:20 -04:00
// XIR flat (XIRF)
//
2022-05-03 14:14:29 -04:00
// Copyright (C) 2014-2022 Ryan Specialty Group, LLC.
2022-03-17 12:20:20 -04:00
//
// This file is part of TAME.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.

//! Lightly-parsed XIR as a flat stream (XIRF).
//!
//! XIRF lightly parses a raw XIR [`TokenStream`] into a stream of
2022-06-02 13:41:24 -04:00
//! [`XirfToken`]s that are,
2022-03-17 12:20:20 -04:00
//! like a [`TokenStream`],
//! flat in structure.
//! It provides the following features over raw XIR:
//!
//! 1. All closing tags must correspond to a matching opening tag at the
//!    same depth;
2022-06-02 13:41:24 -04:00
//! 2. [`XirfToken`] exposes the [`Depth`] of each opening/closing tag;
2022-03-17 23:22:38 -04:00
//! 3. Attribute tokens are parsed into [`Attr`] objects;
//! 4. Documents must begin with an element and end with the closing of
//!    that element;
//! 5. Parsing will fail if input ends before all elements have been
2022-03-17 12:20:20 -04:00
//!    closed.
//!
//! XIRF lowering does not perform any dynamic memory allocation;
//! maximum element nesting depth is set statically depending on the needs
//! of the caller.

use super::{
2022-03-17 16:10:56 -04:00
    attr::{Attr, AttrParseError, AttrParseState},
tamer: xir: Introduce {Ele,Open,Close}Span
This isn't conceptually all that significant of a change, but there was a lot
to modify to get it working. I would generally separate this into a commit
for the implementation and another commit for the integration, but I decided
to keep things together.
This serves a role similar to AttrSpan---this allows deriving a span
representing the element name from a span representing the entire XIR
token. This will provide more useful context for errors---including the tag
delimiter(s) means that we care about the fact that an element is in that
position (as opposed to some other type of node) within the context of an
error. However, if we are expecting an element but take issue with the
element name itself, we want to place emphasis on that instead.
This also starts to consider the issue of span contexts---a blob of detached
data that is `Span` is useful for error context, but it's not useful for
manipulation or deriving additional information. For that, we need to
encode additional context, and this is an attempt at that.
I am interested in the concept of providing Spans that are guaranteed to
actually make sense---that are instantiated and manipulated with APIs that
ensure consistency. But such a thing buys us very little, practically
speaking, over what I have now for TAMER, and so I don't expect to actually
implement that for this project; I'll leave that for a personal
project. TAMER's already taken a lot of my personal interests and it can
cause me a lot of grief sometimes (with regards to letting my aspirations
cause me more work).
DEV-7145
2022-06-24 13:51:49 -04:00
    CloseSpan, OpenSpan, QName, Token as XirToken, TokenStream, Whitespace,
2022-03-17 12:20:20 -04:00
};
2022-03-18 16:24:53 -04:00
use crate::{
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
2022-04-13 14:41:54 -04:00
    diagnose::{Annotate, AnnotatedSpan, Diagnostic},
2022-03-18 16:24:53 -04:00
    parse::{
2022-06-13 11:17:21 -04:00
        Context, Object, ParseState, ParsedResult, Token, Transition,
2022-04-04 21:50:47 -04:00
        TransitionResult,
2022-03-18 16:24:53 -04:00
    },
    span::Span,
    sym::SymbolId,
2022-06-24 13:51:49 -04:00
    xir::EleSpan,
2022-03-18 16:24:53 -04:00
};
2022-03-17 12:20:20 -04:00
use arrayvec::ArrayVec;
tamer: xir::parse::Transition: Generalize flat::Transition
XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system. I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.
Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser). Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference. Unfortunately, that concision led to code that was much more
difficult than necessary to understand, and ended up having a net negative
effect by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).
This also opens up the possibility to do something that I wasn't able to
before, which was to continue to abstract away parser composition by stitching
their state machines together. I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting in the way, the previous state coupling
issues with the parent parser go away.
DEV-10863
2022-03-17 15:50:35 -04:00
use std::{error::Error, fmt::Display};
2022-03-17 12:20:20 -04:00

/// Tag nesting depth
/// (`0` represents the root).
#[derive(Debug, Clone, PartialEq, Eq)]
tamer: obj::xmlo::reader: Begin conversion to ParseState
This begins to transition XmloReader into a ParseState. Unlike previous
changes where ParseStates were composed into a single ParseState, this is
instead a lowering operation that will take the output of one Parser and
provide it to another.
The mess in ld::poc (...which still needs to be refactored and removed)
shows the concept, which will be abstracted away. This won't actually get
to the ASG in order to test that this works with the
wip-xmlo-xir-reader flag on (development hasn't gotten that far yet), but
since it type-checks, it should conceptually work.
Wiring lowering operations together is something that I've been dreading for
months, but my approach of only abstracting after-the-fact has helped to
guide a sane approach for this. For some definition of "sane".
It's also worth noting that AsgBuilder will too become a ParseState
implemented as another lowering operation, so:
XIR -> XIRF -> XMLO -> ASG
These steps will all be streaming, with iteration happening only at the
topmost level. For this reason, it's important that ASG not be responsible
for doing that pull, and further we should propagate Parsed::Incomplete
rather than filtering it out and looping an indeterminate number of times
outside of the toplevel.
One final note: the choice of 64 for the maximum depth is entirely
arbitrary and should be more than generous; it'll be finalized at some point
in the future once I actually evaluate what maximum depth is reasonable
based on how the system is used, with some added growing room.
DEV-10863
2022-03-22 13:56:43 -04:00
pub struct Depth(pub usize);
2022-03-17 12:20:20 -04:00

impl Display for Depth {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        Display::fmt(&self.0, f)
    }
}

/// A lightly-parsed XIRF object.
///
/// Certain XIR [`Token`]s are formed into a single object,
/// such as an [`Attr`].
/// Other objects retain the same format as their underlying token,
/// but are still validated to ensure that they are well-formed and that
/// the XML is well-structured.
#[derive(Debug, Clone, PartialEq, Eq)]
2022-06-02 13:41:24 -04:00
pub enum XirfToken {
2022-03-17 12:20:20 -04:00
    /// Opening tag of an element.
2022-06-24 13:51:49 -04:00
    Open(QName, OpenSpan, Depth),
2022-03-17 12:20:20 -04:00

    /// Closing tag of an element.
    ///
    /// If the name is [`None`],
    /// then the tag is self-closing.
    /// If the name is [`Some`],
    /// then the tag is guaranteed to be balanced
    /// (matching the depth of its opening tag).
2022-06-24 13:51:49 -04:00
    Close(Option<QName>, CloseSpan, Depth),
2022-03-17 12:20:20 -04:00

    /// An attribute and its value.
    ///
    /// The associated [`Span`]s can be found on the enclosed [`Attr`]
    /// object.
    Attr(Attr),

    /// Comment node.
    Comment(SymbolId, Span),

    /// Character data as part of an element.
    ///
2022-06-02 13:41:24 -04:00
    /// See also [`CData`](XirfToken::CData) variant.
2022-03-17 12:20:20 -04:00
    Text(SymbolId, Span),

    /// CData node (`<![CDATA[...]]>`).
    ///
    /// _Warning: It is up to the caller to ensure that the string `]]>` is
    /// not present in the text!_
    /// This is intended for reading existing XML data where CData is
    /// already present,
    /// not for producing new CData safely!
    CData(SymbolId, Span),

    /// Similar to `Text`,
    /// but intended for use where only whitespace is allowed,
    /// such as alignment of attributes.
    Whitespace(Whitespace, Span),
}

2022-06-02 13:41:24 -04:00
impl Token for XirfToken {
2022-03-21 13:40:54 -04:00
    fn span(&self) -> Span {
2022-06-02 13:41:24 -04:00
        use XirfToken::*;
2022-03-21 13:40:54 -04:00

        match self {
2022-06-24 13:51:49 -04:00
            Open(_, OpenSpan(span, _), _)
            | Close(_, CloseSpan(span, _), _)
2022-03-21 13:40:54 -04:00
            | Comment(_, span)
            | Text(_, span)
            | CData(_, span)
            | Whitespace(_, span) => *span,

            Attr(attr) => attr.span(),
        }
    }
}

2022-06-13 11:17:21 -04:00
impl Object for XirfToken {}
2022-03-25 16:45:32 -04:00

2022-06-02 13:41:24 -04:00
impl Display for XirfToken {
2022-03-21 13:40:54 -04:00
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
2022-06-02 13:41:24 -04:00
        use XirfToken::*;
2022-03-21 13:40:54 -04:00

        match self {
            Open(qname, span, _) => {
                Display::fmt(&XirToken::Open(*qname, *span), f)
            }
            Close(oqname, span, _) => {
                Display::fmt(&XirToken::Close(*oqname, *span), f)
            }
            Attr(attr) => Display::fmt(&attr, f),
            Comment(sym, span) => {
                Display::fmt(&XirToken::Comment(*sym, *span), f)
            }
            Text(sym, span) => Display::fmt(&XirToken::Text(*sym, *span), f),
            CData(sym, span) => Display::fmt(&XirToken::CData(*sym, *span), f),
            Whitespace(ws, span) => {
                Display::fmt(&XirToken::Whitespace(*ws, *span), f)
            }
        }
    }
}

2022-06-02 13:41:24 -04:00
impl From<Attr> for XirfToken {
2022-03-29 14:18:08 -04:00
    fn from(attr: Attr) -> Self {
        Self::Attr(attr)
    }
}

2022-03-17 12:20:20 -04:00
/// XIRF-compatible attribute parser.
2022-04-04 21:50:47 -04:00
pub trait FlatAttrParseState<const MAX_DEPTH: usize> =
2022-06-07 09:21:53 -04:00
    ParseState<Token = XirToken, DeadToken = XirToken, Object = Attr>
2022-04-04 21:50:47 -04:00
where
2022-06-07 15:02:41 -04:00
    Self: Default,
2022-06-02 13:41:24 -04:00
    <Self as ParseState>::Error: Into<XirToXirfError>,
2022-04-04 21:50:47 -04:00
    StateContext<MAX_DEPTH>: AsMut<<Self as ParseState>::Context>;
2022-03-17 12:20:20 -04:00

/// Stack of element [`QName`] and [`Span`] pairs,
/// representing the current level of nesting.
///
/// This storage is statically allocated,
/// allowing XIRF's parser to avoid memory allocation entirely.
type ElementStack<const MAX_DEPTH: usize> = ArrayVec<(QName, Span), MAX_DEPTH>;
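
As a rough illustration of the fixed-capacity storage that `ElementStack` relies on, the sketch below shows a stack whose capacity is a const generic and which never touches the heap. It is a simplified stand-in written for this example, not `arrayvec::ArrayVec` itself: `FixedStack`, its methods, and `demo_depth_limit` are hypothetical names, and `(u32, u32)` pairs stand in for `(QName, Span)`.

```rust
/// Hypothetical fixed-capacity stack (illustration only; a simplified
/// stand-in for the role `ArrayVec` plays, not its implementation).
#[derive(Debug)]
struct FixedStack<T: Copy + Default, const CAP: usize> {
    /// Inline storage; its size is known at compile time.
    items: [T; CAP],
    /// Number of slots currently in use.
    len: usize,
}

impl<T: Copy + Default, const CAP: usize> FixedStack<T, CAP> {
    fn new() -> Self {
        Self { items: [T::default(); CAP], len: 0 }
    }

    fn len(&self) -> usize {
        self.len
    }

    /// Push without allocating; the item is handed back if the stack is full.
    fn try_push(&mut self, item: T) -> Result<(), T> {
        if self.len == CAP {
            return Err(item);
        }
        self.items[self.len] = item;
        self.len += 1;
        Ok(())
    }

    /// Pop the most recently pushed item, if any.
    fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        Some(self.items[self.len])
    }
}

/// Exercise the stack with a capacity of two: the third push is rejected,
/// which corresponds to exceeding the maximum nesting depth.
fn demo_depth_limit() -> (bool, Option<(u32, u32)>, usize) {
    let mut stack: FixedStack<(u32, u32), 2> = FixedStack::new();
    stack.try_push((1, 0)).unwrap();
    stack.try_push((2, 10)).unwrap();
    let third_rejected = stack.try_push((3, 20)).is_err();
    let top = stack.pop();
    (third_rejected, top, stack.len())
}
```

Because the capacity is part of the type, a caller chooses the maximum depth at compile time and a full stack becomes an ordinary error value rather than a reallocation.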

2022-03-17 23:22:38 -04:00
/// XIRF document parser state.
2022-03-17 12:20:20 -04:00
///
2022-03-17 23:22:38 -04:00
/// This parser is a pushdown automaton that parses a single XML document.
#[derive(Debug, Default, PartialEq, Eq)]
2022-06-02 13:41:24 -04:00
pub enum XirToXirf<const MAX_DEPTH: usize, SA = AttrParseState>
2022-03-17 12:20:20 -04:00
where
2022-04-04 21:50:47 -04:00
    SA: FlatAttrParseState<MAX_DEPTH>,
2022-03-17 12:20:20 -04:00
{
2022-03-17 23:22:38 -04:00
    /// Document parsing has not yet begun.
    #[default]
    PreRoot,
2022-03-17 12:20:20 -04:00
    /// Parsing nodes.
2022-04-04 21:50:47 -04:00
    NodeExpected,
2022-03-17 12:20:20 -04:00
    /// Delegating to attribute parser.
2022-04-04 21:50:47 -04:00
    AttrExpected(SA),
2022-03-17 23:22:38 -04:00
    /// End of document has been reached.
    Done,
2022-03-17 12:20:20 -04:00
}

2022-04-04 21:50:47 -04:00
pub type StateContext<const MAX_DEPTH: usize> =
    Context<ElementStack<MAX_DEPTH>>;

2022-06-02 13:41:24 -04:00
impl<const MAX_DEPTH: usize, SA> ParseState for XirToXirf<MAX_DEPTH, SA>
2022-03-17 12:20:20 -04:00
where
2022-04-04 21:50:47 -04:00
    SA: FlatAttrParseState<MAX_DEPTH>,
2022-03-17 12:20:20 -04:00
{
2022-03-18 15:26:05 -04:00
    type Token = XirToken;
2022-06-02 13:41:24 -04:00
    type Object = XirfToken;
    type Error = XirToXirfError;
2022-04-04 21:50:47 -04:00
    type Context = StateContext<MAX_DEPTH>;
2022-03-17 12:20:20 -04:00

2022-04-04 21:50:47 -04:00
    fn parse_token(
        self,
        tok: Self::Token,
        stack: &mut Self::Context,
    ) -> TransitionResult<Self> {
2022-06-02 13:41:24 -04:00
        use XirToXirf::{AttrExpected, Done, NodeExpected, PreRoot};
2022-03-17 12:20:20 -04:00

2022-03-17 15:50:35 -04:00
        match (self, tok) {
2022-03-17 23:22:38 -04:00
            // Comments are permitted before and after the first root element.
2022-03-21 13:40:54 -04:00
            (st @ (PreRoot | Done), XirToken::Comment(sym, span)) => {
2022-06-02 13:41:24 -04:00
                Transition(st).ok(XirfToken::Comment(sym, span))
2022-03-17 23:22:38 -04:00
            }

2022-04-04 21:50:47 -04:00
            (PreRoot, tok @ XirToken::Open(..)) => Self::parse_node(tok, stack),
2022-03-17 23:22:38 -04:00

            (PreRoot, tok) => {
2022-06-02 13:41:24 -04:00
                Transition(PreRoot).err(XirToXirfError::RootOpenExpected(tok))
2022-03-17 23:22:38 -04:00
            }

2022-04-04 21:50:47 -04:00
            (NodeExpected, tok) => Self::parse_node(tok, stack),
2022-03-17 12:20:20 -04:00

2022-04-04 21:50:47 -04:00
            (AttrExpected(sa), tok) => {
                let (_sa, lookahead, stack) =
                    sa.delegate_lookahead(stack, tok, AttrExpected)?;

                Self::parse_node(lookahead, stack)
            }
2022-03-17 23:22:38 -04:00

            (Done, tok) => Transition(Done).dead(tok),
2022-03-17 15:50:35 -04:00
        }
2022-03-17 12:20:20 -04:00
    }

    /// Whether all elements have been closed.
    ///
    /// Parsing will fail if there are any open elements.
    /// Intuitively,
    /// this means that the parser must have encountered the closing tag
    /// for the root element.
    fn is_accepting(&self) -> bool {
        // TODO: It'd be nice if we could also return additional context to
        // aid the user in diagnosing the problem,
        // e.g. what element(s) still need closing.
2022-06-02 13:41:24 -04:00
        *self == XirToXirf::Done
2022-03-17 12:20:20 -04:00
    }
}
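
The pushdown behaviour implemented above (track open elements on a bounded stack, reject mismatched closing tags, and accept only once the root has been closed) can be sketched in isolation. This is a simplified model under stated assumptions, not the XIRF implementation: `Tok`, `Flat`, `FlatError`, `State`, and `flatten` are hypothetical names, element names are plain strings rather than `QName`s, spans, attributes, and comments are omitted, and output is collected into a `Vec` for convenience even though XIRF itself streams tokens without allocating.

```rust
/// Simplified input token (hypothetical; real XIR tokens carry spans).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Open(&'static str),
    Close(&'static str),
}

/// Simplified output token: each tag is annotated with its nesting depth.
#[derive(Debug, PartialEq, Eq)]
enum Flat {
    Open(&'static str, usize),
    Close(&'static str, usize),
}

#[derive(Debug, PartialEq, Eq)]
enum FlatError {
    RootOpenExpected,
    MaxDepthExceeded,
    Unbalanced { open: &'static str, close: &'static str },
    TrailingToken,
    UnexpectedEof,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    PreRoot,
    NodeExpected,
    Done,
}

/// Flatten a token sequence, tracking open elements in a fixed-size array.
fn flatten<const MAX_DEPTH: usize>(toks: &[Tok]) -> Result<Vec<Flat>, FlatError> {
    let mut stack = [""; MAX_DEPTH]; // bounded element stack, no heap use
    let mut len = 0; // current nesting depth
    let mut state = State::PreRoot;
    let mut out = Vec::new(); // demo convenience only

    for &tok in toks {
        match (state, tok) {
            // Nothing but the end of input is expected after the root closes.
            (State::Done, _) => return Err(FlatError::TrailingToken),
            // A document must begin with an opening tag.
            (State::PreRoot, Tok::Close(_)) => {
                return Err(FlatError::RootOpenExpected)
            }
            (_, Tok::Open(_)) if len == MAX_DEPTH => {
                return Err(FlatError::MaxDepthExceeded)
            }
            (_, Tok::Open(name)) => {
                out.push(Flat::Open(name, len));
                stack[len] = name;
                len += 1;
                state = State::NodeExpected;
            }
            (_, Tok::Close(name)) => {
                // `NodeExpected` implies at least one open element.
                len -= 1;
                let open = stack[len];
                if open != name {
                    return Err(FlatError::Unbalanced { open, close: name });
                }
                out.push(Flat::Close(name, len));
                // Assumption: closing the root moves the parser to `Done`,
                // as suggested by `is_accepting` above.
                state = if len == 0 { State::Done } else { State::NodeExpected };
            }
        }
    }

    // Accept only if the root element has been closed.
    if state == State::Done { Ok(out) } else { Err(FlatError::UnexpectedEof) }
}
```

For example, `flatten::<4>` applied to open `a`, open `b`, close `b`, close `a` yields depths 0, 1, 1, 0, while the same input with `MAX_DEPTH` set to 1 fails on the second opening tag.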

2022-06-02 13:41:24 -04:00
impl<const MAX_DEPTH: usize, SA> Display for XirToXirf<MAX_DEPTH, SA>
2022-05-25 14:20:10 -04:00
where
    SA: FlatAttrParseState<MAX_DEPTH>,
{
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
2022-06-02 13:41:24 -04:00
        use XirToXirf::*;
2022-05-25 14:20:10 -04:00

        match self {
            PreRoot => write!(f, "expecting document root"),
            NodeExpected => write!(f, "expecting a node"),
            AttrExpected(sa) => Display::fmt(sa, f),
            Done => write!(f, "done parsing"),
        }
    }
}

2022-06-02 13:41:24 -04:00
impl<const MAX_DEPTH: usize, SA> XirToXirf<MAX_DEPTH, SA>
|
2022-03-17 12:20:20 -04:00
|
|
|
where
|
2022-04-04 21:50:47 -04:00
|
|
|
SA: FlatAttrParseState<MAX_DEPTH>,
|
2022-03-17 12:20:20 -04:00
|
|
|
{
|
|
|
|
/// Parse a token while in a state expecting a node.
|
|
|
|
fn parse_node(
|
2022-03-21 13:40:54 -04:00
|
|
|
tok: <Self as ParseState>::Token,
|
2022-04-04 21:50:47 -04:00
|
|
|
stack: &mut ElementStack<MAX_DEPTH>,
|
2022-03-17 16:30:35 -04:00
|
|
|
) -> TransitionResult<Self> {
|
2022-06-02 13:41:24 -04:00
|
|
|
use XirToXirf::{AttrExpected, Done, NodeExpected};
|
|
|
|
use XirfToken::*;
|
2022-03-17 12:20:20 -04:00
|
|
|
|
|
|
|
match tok {
|
2022-03-21 13:40:54 -04:00
|
|
|
XirToken::Open(qname, span) if stack.len() == MAX_DEPTH => {
|
2022-06-02 13:41:24 -04:00
|
|
|
Transition(NodeExpected).err(XirToXirfError::MaxDepthExceeded {
|
tamer: xir: Introduce {Ele,Open,Close}Span
This isn't conceptally all that significant of a change, but there was a lot
of modify to get it working. I would generally separate this into a commit
for the implementation and another commit for the integration, but I decided
to keep things together.
This serves a role similar to AttrSpan---this allows deriving a span
representing the element name from a span representing the entire XIR
token. This will provide more useful context for errors---including the tag
delimiter(s) means that we care about the fact that an element is in that
position (as opposed to some other type of node) within the context of an
error. However, if we are expecting an element but take issue with the
element name itself, we want to place emphasis on that instead.
This also starts to consider the issue of span contexts---a blob of detached
data that is `Span` is useful for error context, but it's not useful for
manipulation or deriving additional information. For that, we need to
encode additional context, and this is an attempt at that.
I am interested in the concept of providing Spans that are guaranteed to
actually make sense---that are instantiated and manipulated with APIs that
ensure consistency. But such a thing buys us very little, practically
speaking, over what I have now for TAMER, and so I don't expect to actually
implement that for this project; I'll leave that for a personal
project. TAMER's already take a lot of my personal interests and it can
cause me a lot of grief sometimes (with regards to letting my aspirations
cause me more work).
DEV-7145
2022-06-24 13:51:49 -04:00
|
|
|
open: (qname, span.tag_span()),
|
2022-04-04 21:50:47 -04:00
|
|
|
max: Depth(MAX_DEPTH),
|
|
|
|
})
|
2022-03-21 13:40:54 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
XirToken::Open(qname, span) => {
|
2022-03-17 12:20:20 -04:00
|
|
|
let depth = stack.len();
|
tamer: xir: Introduce {Ele,Open,Close}Span
This isn't conceptally all that significant of a change, but there was a lot
of modify to get it working. I would generally separate this into a commit
for the implementation and another commit for the integration, but I decided
to keep things together.
This serves a role similar to AttrSpan---this allows deriving a span
representing the element name from a span representing the entire XIR
token. This will provide more useful context for errors---including the tag
delimiter(s) means that we care about the fact that an element is in that
position (as opposed to some other type of node) within the context of an
error. However, if we are expecting an element but take issue with the
element name itself, we want to place emphasis on that instead.
This also starts to consider the issue of span contexts---a blob of detached
data that is `Span` is useful for error context, but it's not useful for
manipulation or deriving additional information. For that, we need to
encode additional context, and this is an attempt at that.
I am interested in the concept of providing Spans that are guaranteed to
actually make sense---that are instantiated and manipulated with APIs that
ensure consistency. But such a thing buys us very little, practically
speaking, over what I have now for TAMER, and so I don't expect to actually
implement that for this project; I'll leave that for a personal
project. TAMER's already take a lot of my personal interests and it can
cause me a lot of grief sometimes (with regards to letting my aspirations
cause me more work).
DEV-7145
2022-06-24 13:51:49 -04:00
|
|
|
stack.push((qname, span.tag_span()));
|
2022-03-17 12:20:20 -04:00
|
|
|
|
|
|
|
// Delegate to the attribute parser until it is complete.
|
2022-04-04 21:50:47 -04:00
|
|
|
Transition(AttrExpected(SA::default())).ok(Open(
|
tamer: xir::parse::Transition: Generalize flat::Transition
XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system. I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.
Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser). Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference. Unfortunately, that concision led to code that was much more
difficult than necessary to understand, and ended up being a net negative
by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).
This also opens up the possibility to do something that I wasn't able to
before, which was to continue to abstract away parser composition by stitching
their state machines together. I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting in the way, the previous state coupling
issues with the parent parser go away.
DEV-10863
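The owned-state style described above can be sketched as follows. This is an illustration under assumptions: `Transition`, its `ok`/`err` helpers, `State`, `Tok`, and `parse_token` are simplified, hypothetical versions written for this example and do not reproduce TAMER's real signatures.

```rust
/// Hypothetical wrapper marking the state being transitioned to.
#[derive(Debug, PartialEq, Eq)]
pub struct Transition<S>(pub S);

impl<S> Transition<S> {
    /// Pair the new state with a successfully produced object.
    pub fn ok<T, E>(self, obj: T) -> (Self, Result<T, E>) {
        (self, Ok(obj))
    }

    /// Pair the new state with an error.
    pub fn err<T, E>(self, e: E) -> (Self, Result<T, E>) {
        (self, Err(e))
    }
}

/// Parser state: the current nesting depth.
/// Deliberately not `Copy`, so ownership is passed in and handed back.
#[derive(Debug, PartialEq, Eq)]
pub struct State(pub usize);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tok {
    Open,
    Close,
}

/// Consume the old state and return the new one with the result.
/// The yielded number is the depth of the tag just processed.
pub fn parse_token(
    state: State,
    tok: Tok,
) -> (Transition<State>, Result<usize, &'static str>) {
    match (state, tok) {
        (State(n), Tok::Open) => Transition(State(n + 1)).ok(n),
        (State(0), Tok::Close) => Transition(State(0)).err("unbalanced close"),
        (State(n), Tok::Close) => Transition(State(n - 1)).ok(n - 1),
    }
}

/// Drive the parser over a slice of tokens, threading the owned state.
pub fn run(toks: &[Tok]) -> (State, Vec<Result<usize, &'static str>>) {
    let mut state = State(0);
    let mut out = Vec::new();
    for &tok in toks {
        let (Transition(next), result) = parse_token(state, tok);
        state = next;
        out.push(result);
    }
    (state, out)
}
```

Because every arm must end in a `Transition`, the next state is stated explicitly at each exit point, which is the "mental synchronization point" the message refers to.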
2022-03-17 15:50:35 -04:00
|
|
|
qname,
|
|
|
|
span,
|
|
|
|
Depth(depth),
|
|
|
|
))
|
2022-03-17 12:20:20 -04:00
|
|
|
}
|
|
|
|
|
2022-03-21 13:40:54 -04:00
|
|
|
XirToken::Close(close_oqname, close_span) => {
|
2022-03-17 12:20:20 -04:00
|
|
|
match (close_oqname, stack.pop()) {
|
2022-03-17 23:22:38 -04:00
|
|
|
(_, None) => unreachable!("parser should be in Done state"),
|
2022-03-17 12:20:20 -04:00
|
|
|
|
|
|
|
(Some(qname), Some((open_qname, open_span)))
|
|
|
|
if qname != open_qname =>
|
|
|
|
{
|
2022-04-04 21:50:47 -04:00
|
|
|
Transition(NodeExpected).err(
|
2022-06-02 13:41:24 -04:00
|
|
|
XirToXirfError::UnbalancedTag {
|
2022-03-17 12:20:20 -04:00
|
|
|
open: (open_qname, open_span),
|
2022-06-24 13:51:49 -04:00
|
|
|
close: (qname, close_span.tag_span()),
|
2022-03-17 15:50:35 -04:00
|
|
|
},
|
2022-03-17 12:20:20 -04:00
|
|
|
)
|
|
|
|
}
|
|
|
|
|
2022-03-17 23:22:38 -04:00
|
|
|
// Final closing tag (for root node) completes the document.
|
2022-03-25 16:45:32 -04:00
|
|
|
(..) if stack.len() == 0 => Transition(Done).ok(Close(
|
2022-03-17 23:22:38 -04:00
|
|
|
close_oqname,
|
|
|
|
close_span,
|
|
|
|
Depth(0),
|
|
|
|
)),
|
|
|
|
|
2022-03-17 12:20:20 -04:00
|
|
|
(..) => {
|
|
|
|
let depth = stack.len();
|
2022-03-17 15:50:35 -04:00
|
|
|
|
2022-04-04 21:50:47 -04:00
|
|
|
Transition(NodeExpected).ok(Close(
|
2022-03-17 15:50:35 -04:00
|
|
|
close_oqname,
|
|
|
|
close_span,
|
|
|
|
Depth(depth),
|
|
|
|
))
|
2022-03-17 12:20:20 -04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-03-21 13:40:54 -04:00
|
|
|
XirToken::Comment(sym, span) => {
|
2022-04-04 21:50:47 -04:00
|
|
|
Transition(NodeExpected).ok(Comment(sym, span))
|
2022-03-17 15:50:35 -04:00
|
|
|
}
|
2022-03-21 13:40:54 -04:00
|
|
|
XirToken::Text(sym, span) => {
|
2022-04-04 21:50:47 -04:00
|
|
|
Transition(NodeExpected).ok(Text(sym, span))
|
2022-03-17 15:50:35 -04:00
|
|
|
}
|
2022-03-21 13:40:54 -04:00
|
|
|
XirToken::CData(sym, span) => {
|
2022-04-04 21:50:47 -04:00
|
|
|
Transition(NodeExpected).ok(CData(sym, span))
|
2022-03-17 15:50:35 -04:00
|
|
|
}
|
2022-03-21 13:40:54 -04:00
|
|
|
XirToken::Whitespace(ws, span) => {
|
2022-04-04 21:50:47 -04:00
|
|
|
Transition(NodeExpected).ok(Whitespace(ws, span))
|
2022-03-17 15:50:35 -04:00
|
|
|
}
|
2022-03-17 12:20:20 -04:00
|
|
|
|
|
|
|
// We should transition to `State::Attr` before encountering any
|
|
|
|
// of these tokens.
|
2022-03-21 13:40:54 -04:00
|
|
|
XirToken::AttrName(..)
|
|
|
|
| XirToken::AttrValue(..)
|
tamer: xir: Initial re-introduction of AttrEnd
AttrEnd was initially removed in
0cc0bc9d5a92e666e4ec8319f6bd29c35cc331a8 (and the commit prior), because
there was not a compelling reason to use it over a lookahead
operation (returning a token via a dead state transition); `AttrEnd`
simply introduced inconsistencies between the XIR reader (which produced
AttrEnd) and internal XIR stream generators (e.g. the lowering operations
into XIR->XML, which do not).
But now that parsers are performing aggregation---in particular the
attribute parser-generator `xir::parse::attr`---this has become quite a
pain, because the dead state is an actionable token. For example:
1. Open
2. Attr
3. Attr
4. Open
5. ...
In the happy case, token #4 results in `Parsed::Incomplete`, and so can just
be transformed into the object representing the aggregated attributes. But
even in this happy path, it's ugly, and it requires non-tail recursion on
the parser which requires a duplicate stack allocation for the
`ParserState`. That violates a core principle of the system.
But if there is an error at #4---e.g. an unexpected element---then we no
longer have a `Parsed::Incomplete` to hijack for our own uses, and we'd have
to introduce the ability to return both an error and a token, or we'd have
to introduce the ability to keep a token of lookahead instead of reading
from the underlying token stream, but that's complicated with push parsers,
which are used for parser composition. Yikes.
And furthermore, the aggregation has caused me to introduce the ability to
override the dead state type to introduce both a token of lookahead and
aggregation information. This complicates the system and is going to be
confusing to others.
Given all of this, AttrEnd does now seem appropriate to reintroduce, since
it will allow processing of aggregate operations when encountering that
token without having to worry about the above scenario; without having to
duplicate a `ParseState` stack; without having to hijack dead state
transitions for producing our aggregate object; and everything else
mentioned above.
This commit does not modify those abstractions to use AttrEnd yet; it
re-introduces the token to the core system, not the parser-generators, and
it doesn't yet replace lookahead operations in the parsers that use
them. That'll come next. Unlike the commit that removed it, though, we are
now generating proper spans, so make note of that here. This also does not
introduce the concept to XIRF yet, which did not exist at the time that it
was removed, so XIRF is filtering it out until a following commit.
DEV-7145
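The benefit of an explicit end-of-attributes token for aggregation can be sketched as below. Everything here is hypothetical and simplified for illustration: `Tok`, `AttrAggregator`, and `push` are invented names, and the sketch uses a heap-allocated `Vec` for brevity even though the real system avoids that kind of allocation.

```rust
/// Hypothetical, simplified token type for this sketch only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    Open(&'static str),
    Attr(&'static str, &'static str),
    AttrEnd,
    Close,
}

/// Collected attribute name/value pairs.
pub type Attrs = Vec<(&'static str, &'static str)>;

/// Aggregates attributes until `AttrEnd` is seen.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct AttrAggregator {
    attrs: Attrs,
}

impl AttrAggregator {
    /// Feed one token.
    ///
    /// - `Ok(None)`: still aggregating.
    /// - `Ok(Some(attrs))`: `AttrEnd` reached; aggregate is complete.
    /// - `Err(tok)`: a non-attribute token arrived before `AttrEnd`.
    pub fn push(&mut self, tok: Tok) -> Result<Option<Attrs>, Tok> {
        match tok {
            Tok::Attr(name, value) => {
                self.attrs.push((name, value));
                Ok(None)
            }
            // The aggregate is emitted on a token that belongs to the
            // attribute list, so no token of lookahead has to be
            // returned to the caller.
            Tok::AttrEnd => Ok(Some(std::mem::take(&mut self.attrs))),
            other => Err(other),
        }
    }
}
```

With `AttrEnd`, the aggregate is produced by a token the attribute parser consumes itself. Without it, the only signal that the list has ended is the following `Open`, which the parser would have to hand back as lookahead.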
2022-06-28 16:10:57 -04:00
|
|
|
| XirToken::AttrValueFragment(..)
|
|
|
|
| XirToken::AttrEnd(..) => {
|
2022-03-17 12:20:20 -04:00
|
|
|
unreachable!("attribute token in NodeExpected state: {tok:?}")
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Produce a streaming parser lowering a XIR [`TokenStream`] into a XIRF
|
|
|
|
/// stream.
|
|
|
|
pub fn parse<const MAX_DEPTH: usize>(
|
|
|
|
toks: impl TokenStream,
|
2022-06-02 13:41:24 -04:00
|
|
|
) -> impl Iterator<Item = ParsedResult<XirToXirf<MAX_DEPTH>>> {
|
|
|
|
XirToXirf::<MAX_DEPTH>::parse(toks)
|
2022-03-17 12:20:20 -04:00
|
|
|
}
|
|
|
|
|
2022-06-02 13:41:24 -04:00
|
|
|
/// Parsing error from [`XirToXirf`].
|
2022-03-17 12:20:20 -04:00
|
|
|
#[derive(Debug, Eq, PartialEq)]
|
2022-06-02 13:41:24 -04:00
|
|
|
pub enum XirToXirfError {
|
2022-03-17 23:22:38 -04:00
|
|
|
/// Opening root element tag was expected.
|
2022-03-21 13:40:54 -04:00
|
|
|
RootOpenExpected(XirToken),
|
2022-03-17 23:22:38 -04:00
|
|
|
|
2022-03-17 12:20:20 -04:00
|
|
|
/// Opening tag exceeds the maximum nesting depth for this parser.
|
|
|
|
MaxDepthExceeded { open: (QName, Span), max: Depth },
|
|
|
|
|
|
|
|
/// The closing tag does not match the opening tag at the same level of
|
|
|
|
/// nesting.
|
|
|
|
UnbalancedTag {
|
|
|
|
open: (QName, Span),
|
|
|
|
close: (QName, Span),
|
|
|
|
},
|
|
|
|
|
|
|
|
/// Error from the attribute parser.
|
|
|
|
AttrError(AttrParseError),
|
|
|
|
}
|
|
|
|
|
2022-06-02 13:41:24 -04:00
|
|
|
impl Display for XirToXirfError {
|
2022-03-17 12:20:20 -04:00
|
|
|
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
2022-06-02 13:41:24 -04:00
|
|
|
use XirToXirfError::*;
|
2022-03-17 12:20:20 -04:00
|
|
|
|
|
|
|
match self {
|
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine in the future
that any object would be able to describe itself, error or not, which would
be useful both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
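As a rough illustration of separating `Display` from `Diagnostic`, the following sketch mirrors the unbalanced-tag case. The types are simplified stand-ins written for this example (`Span`, `Level`, `AnnotatedSpan`, and the standalone `UnbalancedTag` struct are assumptions); only the trait name `Diagnostic`, the `describe` method, and the message strings follow the surrounding source.

```rust
use std::fmt;

/// Simplified stand-in for a span (assumption: offset and length only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub offset: u32,
    pub len: u16,
}

/// Severity attached to an annotated span (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Error,
    Note,
}

/// A span together with a severity and a label describing it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AnnotatedSpan {
    pub span: Span,
    pub level: Level,
    pub label: String,
}

impl Span {
    pub fn error(self, label: impl Into<String>) -> AnnotatedSpan {
        AnnotatedSpan { span: self, level: Level::Error, label: label.into() }
    }

    pub fn note(self, label: impl Into<String>) -> AnnotatedSpan {
        AnnotatedSpan { span: self, level: Level::Note, label: label.into() }
    }
}

/// Describes where in the source a problem occurred and why.
pub trait Diagnostic {
    fn describe(&self) -> Vec<AnnotatedSpan>;
}

/// Standalone stand-in for the unbalanced-tag error variant.
#[derive(Debug)]
pub struct UnbalancedTag {
    pub open: (&'static str, Span),
    pub close: (&'static str, Span),
}

/// `Display` gives the one-line summary of the error.
impl fmt::Display for UnbalancedTag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "expected closing tag for `{}`", self.open.0)
    }
}

/// `Diagnostic` attaches labels to the relevant source locations.
impl Diagnostic for UnbalancedTag {
    fn describe(&self) -> Vec<AnnotatedSpan> {
        let (name, open_span) = self.open;
        let (_, close_span) = self.close;
        vec![
            open_span.note(format!("element `{name}` is opened here")),
            close_span.error(format!("expected `</{name}>`")),
        ]
    }
}
```

The summary line and the per-span labels are produced independently, so a renderer can decide later how to lay them out.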
2022-04-13 14:41:54 -04:00
|
|
|
RootOpenExpected(_tok) => {
|
|
|
|
write!(f, "missing opening root element")
|
2022-03-17 23:22:38 -04:00
|
|
|
}
|
|
|
|
|
2022-03-17 12:20:20 -04:00
|
|
|
MaxDepthExceeded {
|
2022-04-13 14:41:54 -04:00
|
|
|
open: (_name, _),
|
2022-03-17 12:20:20 -04:00
|
|
|
max,
|
|
|
|
} => {
|
|
|
|
write!(
|
|
|
|
f,
|
2022-04-13 14:41:54 -04:00
|
|
|
"maximum XML element nesting depth of `{max}` exceeded"
|
2022-03-17 12:20:20 -04:00
|
|
|
)
|
|
|
|
}
|
|
|
|
|
|
|
|
UnbalancedTag {
|
2022-04-13 14:41:54 -04:00
|
|
|
open: (open_name, _),
|
|
|
|
close: (_close_name, _),
|
2022-03-17 12:20:20 -04:00
|
|
|
} => {
|
2022-04-13 14:41:54 -04:00
|
|
|
write!(f, "expected closing tag for `{open_name}`")
|
2022-03-17 12:20:20 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
AttrError(e) => Display::fmt(e, f),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-06-02 13:41:24 -04:00
|
|
|
impl Error for XirToXirfError {
|
2022-03-17 12:20:20 -04:00
|
|
|
fn source(&self) -> Option<&(dyn Error + 'static)> {
|
2022-04-13 14:41:54 -04:00
|
|
|
match self {
|
|
|
|
Self::AttrError(e) => Some(e),
|
|
|
|
_ => None,
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-06-02 13:41:24 -04:00
|
|
|
impl Diagnostic for XirToXirfError {
|
2022-04-13 14:41:54 -04:00
|
|
|
fn describe(&self) -> Vec<AnnotatedSpan> {
|
2022-06-02 13:41:24 -04:00
|
|
|
use XirToXirfError::*;
|
2022-04-13 14:41:54 -04:00
|
|
|
|
|
|
|
match self {
|
|
|
|
RootOpenExpected(tok) => {
|
|
|
|
// TODO: Should the span be the first byte,
|
|
|
|
// or should we delegate that question to an e.g. `SpanLike`?
|
|
|
|
tok.span()
|
|
|
|
.error("an opening root node was expected here")
|
|
|
|
.into()
|
|
|
|
}
|
|
|
|
|
|
|
|
MaxDepthExceeded {
|
|
|
|
open: (_, span),
|
|
|
|
max,
|
|
|
|
} => span
|
|
|
|
.error(format!(
|
|
|
|
"this opening tag increases the level of nesting \
|
|
|
|
past the limit of {max}"
|
|
|
|
))
|
|
|
|
.into(),
|
|
|
|
|
|
|
|
UnbalancedTag {
|
|
|
|
open: (open_name, open_span),
|
2022-04-28 15:47:34 -04:00
|
|
|
close: (_close_name, close_span),
|
2022-04-13 14:41:54 -04:00
            } => {
                // TODO: hint saying that the nesting could be wrong, etc;
                //   we can't just suggest a replacement,
                //   since that's not necessarily the problem
                vec![
                    open_span
                        .note(format!("element `{open_name}` is opened here")),
2022-04-28 15:47:34 -04:00
                    // No need to state the close name since the source line
                    //   will be highlighted by the diagnostic message.
                    close_span.error(format!("expected `</{open_name}>`")),
2022-04-13 14:41:54 -04:00
                ]
            }

            AttrError(e) => e.describe(),
        }
2022-03-17 12:20:20 -04:00
    }
}
2022-06-02 13:41:24 -04:00
impl From<AttrParseError> for XirToXirfError {
2022-03-17 12:20:20 -04:00
    fn from(e: AttrParseError) -> Self {
        Self::AttrError(e)
    }
}

#[cfg(test)]
tamer: xir: Introduce {Ele,Open,Close}Span
This isn't conceptually all that significant of a change, but there was a lot
to modify to get it working. I would generally separate this into a commit
for the implementation and another commit for the integration, but I decided
to keep things together.
This serves a role similar to AttrSpan---this allows deriving a span
representing the element name from a span representing the entire XIR
token. This will provide more useful context for errors---including the tag
delimiter(s) means that we care about the fact that an element is in that
position (as opposed to some other type of node) within the context of an
error. However, if we are expecting an element but take issue with the
element name itself, we want to place emphasis on that instead.
This also starts to consider the issue of span contexts---a blob of detached
data that is `Span` is useful for error context, but it's not useful for
manipulation or deriving additional information. For that, we need to
encode additional context, and this is an attempt at that.
I am interested in the concept of providing Spans that are guaranteed to
actually make sense---that are instantiated and manipulated with APIs that
ensure consistency. But such a thing buys us very little, practically
speaking, over what I have now for TAMER, and so I don't expect to actually
implement that for this project; I'll leave that for a personal
project. TAMER has already taken a lot of my personal interests and it can
cause me a lot of grief sometimes (with regard to letting my aspirations
cause me more work).
DEV-7145
2022-06-24 13:51:49 -04:00
pub mod test;
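The `describe` implementation above labels spans with `.error(..)` and `.note(..)` and relies on `.into()` to turn a single label into a list. The following is only a rough, self-contained sketch of how that pattern might fit together. `Span`, `Level`, `AnnotatedSpan`, `TagError` and the free function `describe` are hypothetical stand-ins written for illustration; they are not TAMER's actual definitions, which are not shown in this file.

```rust
// Hypothetical stand-ins for the span and diagnostic types used above;
// none of these definitions are taken from the real codebase.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub offset: u32,
    pub len: u16,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Error,
    Note,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AnnotatedSpan {
    pub span: Span,
    pub level: Level,
    pub label: String,
}

impl Span {
    /// Label this span as the location of the error itself.
    pub fn error(self, label: impl Into<String>) -> AnnotatedSpan {
        AnnotatedSpan { span: self, level: Level::Error, label: label.into() }
    }

    /// Label this span with supporting context for an error elsewhere.
    pub fn note(self, label: impl Into<String>) -> AnnotatedSpan {
        AnnotatedSpan { span: self, level: Level::Note, label: label.into() }
    }
}

/// Lets a single label stand in for a list of labels via `.into()`.
impl From<AnnotatedSpan> for Vec<AnnotatedSpan> {
    fn from(single: AnnotatedSpan) -> Self {
        vec![single]
    }
}

/// Simplified, hypothetical error type with two of the cases shown above.
pub enum TagError {
    RootOpenExpected(Span),
    UnbalancedTag { open: (String, Span), close: Span },
}

/// Produce the labelled spans for an error, mirroring the shape of the
/// `match` above: one label for simple cases, several for related spans.
pub fn describe(err: &TagError) -> Vec<AnnotatedSpan> {
    match err {
        TagError::RootOpenExpected(span) => span
            .error("an opening root node was expected here")
            .into(),

        TagError::UnbalancedTag { open: (name, open_span), close } => vec![
            open_span.note(format!("element `{name}` is opened here")),
            close.error(format!("expected `</{name}>`")),
        ],
    }
}
```

In this sketch the note comes first and the error second, so a reporter that prints labels in order would show where the element was opened before pointing at the mismatched closing tag. Whether the real diagnostic system orders or renders labels this way is an assumption.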