tamer: xir:tree: Begin work on composable XIRT parser
The XIRT parser was initially written for test cases, so that unit tests
could assert more easily on generated token streams (XIR). While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ. Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.
When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate. The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).
A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones. The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this commit is simply
getting my work thus far introduced so I can do some light refactoring before
continuing on it.
TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate. Specifically:
1. Rust’s type system should be used as combinators, so that parsers are
automatically constructed from the type definition.
2. Primitive parsers are written as explicit automata, not as primitive
combinators.
3. Parsing should directly produce IRs as a lowering operation below XIRT,
rather than producing XIRT itself. That is, target IRs should consume
XIRT and parse themselves immediately, during streaming.
In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now. And, to be
honest, I’m hoping that won’t be necessary.
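To make the direction concrete, here is a minimal sketch of composing a larger parser from a smaller explicit automaton while producing an IR object directly from the stream. Everything in it (`Tok`, `AttrState`, `AttrStep`, `Element`, `parse_element`) is a hypothetical stand-in invented for illustration; it is not the code in this commit and does not claim to match TAMER's types.

```rust
// Hypothetical sketch: these types are illustrative stand-ins, not TAMER's API.

/// Simplified stand-in for a XIR token stream item.
#[derive(Debug, PartialEq, Eq)]
pub enum Tok {
    Open(&'static str),
    AttrName(&'static str),
    AttrValue(&'static str),
    Close,
}

/// Small, explicit automaton that recognizes a single attribute.
#[derive(Debug, PartialEq, Eq)]
pub enum AttrState {
    Empty,
    Name(&'static str),
}

/// Outcome of feeding one token to the attribute automaton.
#[derive(Debug, PartialEq, Eq)]
pub enum AttrStep {
    /// More input is needed to finish the attribute.
    Incomplete(AttrState),
    /// A complete (name, value) pair was recognized.
    Done((&'static str, &'static str)),
    /// The token does not begin an attribute; hand it back to the parent.
    Dead(Tok),
    /// A name was seen but no value followed it.
    Error,
}

impl AttrState {
    pub fn step(self, tok: Tok) -> AttrStep {
        match (self, tok) {
            (AttrState::Empty, Tok::AttrName(name)) => {
                AttrStep::Incomplete(AttrState::Name(name))
            }
            (AttrState::Name(name), Tok::AttrValue(value)) => {
                AttrStep::Done((name, value))
            }
            (AttrState::Empty, other) => AttrStep::Dead(other),
            (AttrState::Name(_), _) => AttrStep::Error,
        }
    }
}

/// Target IR object, built during streaming rather than from a generic tree.
#[derive(Debug, PartialEq, Eq)]
pub struct Element {
    pub name: &'static str,
    pub attrs: Vec<(&'static str, &'static str)>,
}

/// Larger parser composed from the smaller one: it delegates every token to
/// `AttrState` until that automaton reports a dead state.
pub fn parse_element(toks: impl IntoIterator<Item = Tok>) -> Option<Element> {
    let mut toks = toks.into_iter();
    let name = match toks.next()? {
        Tok::Open(name) => name,
        _ => return None,
    };
    let mut ele = Element { name, attrs: Vec::new() };
    let mut attr = AttrState::Empty;
    for tok in toks {
        attr = match attr.step(tok) {
            AttrStep::Incomplete(next) => next,
            AttrStep::Done(pair) => {
                ele.attrs.push(pair);
                AttrState::Empty
            }
            AttrStep::Dead(Tok::Close) => return Some(ele),
            AttrStep::Dead(_) | AttrStep::Error => return None,
        };
    }
    None // the stream ended before the element was closed
}
```

The parent never inspects the attribute automaton's internals; it only reacts to the step outcome, which is the property that keeps special cases out of the larger parser.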
2021-12-06 11:26:53 -05:00
// XIRT attribute parsers
//
2022-05-03 14:14:29 -04:00
// Copyright (C) 2014-2022 Ryan Specialty Group, LLC.
2021-12-06 11:26:53 -05:00
//
// This file is part of TAME.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.

//! Parse XIR attribute [`TokenStream`][super::super::TokenStream]s.

use crate::{
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine that eventually
any object would be able to describe itself, error or not, which would
be useful both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
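As a rough illustration of separating an error's one-line summary (`Display`) from the labeled spans it can describe, consider the sketch below. The names `Span`, `LabeledSpan`, `Describe`, `UnexpectedToken`, and `render` are all assumptions made up for this example; they are not the `Diagnostic`, `Annotate`, or `AnnotatedSpan` definitions from this commit, and the output format is invented.

```rust
use std::fmt;

// Hypothetical sketch: these definitions are assumptions for illustration
// and are not TAMER's diagnostic types.

/// Byte range within some source, standing in for a real span type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub offset: usize,
    pub len: usize,
}

/// A span paired with a human-readable label.
#[derive(Debug, PartialEq, Eq)]
pub struct LabeledSpan {
    pub span: Span,
    pub label: String,
}

/// Anything that can describe itself using labeled spans. `Display` still
/// provides the one-line summary; `describe` adds the source context.
pub trait Describe: fmt::Display {
    fn describe(&self) -> Vec<LabeledSpan>;
}

#[derive(Debug)]
pub struct UnexpectedToken {
    pub span: Span,
}

impl fmt::Display for UnexpectedToken {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unexpected token")
    }
}

impl Describe for UnexpectedToken {
    fn describe(&self) -> Vec<LabeledSpan> {
        vec![LabeledSpan {
            span: self.span,
            label: "expected an attribute name here".to_string(),
        }]
    }
}

/// Render a summary line followed by one line per labeled span.
pub fn render(err: &dyn Describe) -> String {
    let mut out = format!("error: {}\n", err);
    for labeled in err.describe() {
        out.push_str(&format!(
            "  --> offset {} (len {}): {}\n",
            labeled.span.offset, labeled.span.len, labeled.label
        ));
    }
    out
}
```

Because `render` only depends on the trait, resolving line and column numbers later would be a change to the renderer rather than to each error type.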
2022-04-13 14:41:54 -04:00
    diagnose::{Annotate, AnnotatedSpan, Diagnostic},
    parse::{NoContext, ParseState, Token, Transition, TransitionResult},
2021-12-06 11:26:53 -05:00
    span::Span,
2022-03-18 16:24:53 -04:00
    xir::{QName, Token as XirToken},
2021-12-06 11:26:53 -05:00
};
tamer: xir::parse::Transition: Generalize flat::Transition
XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system. I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.
Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser). Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference. Unfortunately, that concision led to code that was much more
difficult than necessary to understand, and ended up being a net negative
by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).
This also opens up the possibility to do something that I wasn't able to
before, which was to continue to abstract away parser composition by stitching
their state machines together. I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting in the way, the previous state coupling
issues with the parent parser go away.
DEV-10863
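The owned-state idea can be sketched as follows: the step function consumes the current state and cannot return without naming the next one. The definitions of `Tok`, `State`, `Outcome`, and the helper methods are assumptions written for this example; the real `Transition` and `TransitionResult` may differ in shape.

```rust
// Hypothetical sketch: illustrative definitions, not the actual
// `Transition` / `TransitionResult` types.

#[derive(Debug, PartialEq, Eq)]
pub enum Tok {
    Name(&'static str),
    Value(&'static str),
}

#[derive(Debug, PartialEq, Eq)]
pub enum State {
    Empty,
    Name(&'static str),
}

/// Newtype marking the state that the parser is transitioning *to*.
#[derive(Debug, PartialEq, Eq)]
pub struct Transition(pub State);

/// What a single step produced alongside the new state.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Incomplete,
    Object((&'static str, &'static str)),
    Dead(Tok),
}

impl Transition {
    /// More tokens are required before an object can be emitted.
    pub fn incomplete(self) -> (Transition, Outcome) {
        (self, Outcome::Incomplete)
    }

    /// An object was produced by this step.
    pub fn ok(self, obj: (&'static str, &'static str)) -> (Transition, Outcome) {
        (self, Outcome::Object(obj))
    }

    /// The token cannot be handled in this state; return it to the caller.
    pub fn dead(self, tok: Tok) -> (Transition, Outcome) {
        (self, Outcome::Dead(tok))
    }
}

/// Takes the state by value, so every match arm must state explicitly which
/// state comes next; there is no `&mut self` to forget to update.
pub fn parse_token(state: State, tok: Tok) -> (Transition, Outcome) {
    match (state, tok) {
        (State::Empty, Tok::Name(name)) => {
            Transition(State::Name(name)).incomplete()
        }
        (State::Name(name), Tok::Value(value)) => {
            Transition(State::Empty).ok((name, value))
        }
        (state, tok) => Transition(state).dead(tok),
    }
}
```

Since each step is a plain function from (state, token) to (state, outcome), a parent parser can call it and wrap the returned state in its own, without sharing mutable references with the child.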
2022-03-17 15:50:35 -04:00
use std::{error::Error, fmt::Display};
2021-12-06 11:26:53 -05:00
use super::Attr;

/// Attribute parser DFA.
///
/// While this parser does store the most recently encountered [`QName`]
/// and [`Span`],
/// these data are used only for emitting data about the accepted state;
/// they do not influence the automaton's state transitions.
/// The actual parsing operation is therefore a FSM,
/// not a PDA.
#[derive(Debug, Eq, PartialEq)]
2021-12-17 10:22:05 -05:00
pub enum AttrParseState {
2021-12-06 11:26:53 -05:00
    Empty,
    Name(QName, Span),
}

2021-12-17 10:22:05 -05:00
impl ParseState for AttrParseState {
2022-03-18 15:26:05 -04:00
    type Token = XirToken;
2021-12-06 11:26:53 -05:00
    type Object = Attr;
    type Error = AttrParseError;

2022-04-04 21:50:47 -04:00
    fn parse_token(
        self,
        tok: Self::Token,
        _: NoContext,
    ) -> TransitionResult<Self> {
2022-03-17 15:50:35 -04:00
        use AttrParseState::{Empty, Name};
2021-12-06 11:26:53 -05:00
2022-03-17 15:50:35 -04:00
        match (self, tok) {
2022-03-18 15:26:05 -04:00
            (Empty, XirToken::AttrName(name, span)) => {
2022-03-17 15:50:35 -04:00
                Transition(Name(name, span)).incomplete()
2021-12-06 11:26:53 -05:00
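As an illustration of point 2 above, a primitive parser written as an explicit automaton might look roughly like the following. This is only a sketch: `Token`, `State`, `step`, and `parse_attrs` are hypothetical simplifications and not TAMER's actual types, and the driver collects into a `Vec` purely so the result is easy to inspect (a streaming consumer would act on each attribute as it is emitted).

```rust
// Hypothetical, simplified token type; real XIR tokens carry more
// information (e.g. spans).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Token {
    AttrName(String),
    AttrValue(String),
    Other,
}

// Every state of the automaton is an explicit enum variant.
#[derive(Debug, PartialEq, Eq)]
enum State {
    Empty,
    Name(String),
}

// One step of the automaton: the current state and one token go in, and
// the next state plus an optionally emitted (name, value) pair come out.
fn step(
    state: State,
    tok: Token,
) -> Result<(State, Option<(String, String)>), String> {
    match (state, tok) {
        (State::Empty, Token::AttrName(name)) => Ok((State::Name(name), None)),
        (State::Name(name), Token::AttrValue(value)) => {
            Ok((State::Empty, Some((name, value))))
        }
        (State::Empty, other) => {
            Err(format!("attribute name expected, found {other:?}"))
        }
        (State::Name(name), other) => {
            Err(format!("value expected for {name}, found {other:?}"))
        }
    }
}

// Drive the automaton over a token stream. Collecting into a Vec is for
// demonstration only.
fn parse_attrs(
    toks: impl IntoIterator<Item = Token>,
) -> Result<Vec<(String, String)>, String> {
    let mut state = State::Empty;
    let mut out = Vec::new();

    for tok in toks {
        let (next, emitted) = step(state, tok)?;
        state = next;
        out.extend(emitted);
    }

    // Only `Empty` is an accepting state; ending on `Name` means a value
    // was still pending.
    if state == State::Empty {
        Ok(out)
    } else {
        Err("unexpected end of input".into())
    }
}
```

Because all states and transitions are spelled out in a single `match`, the compiler's exhaustiveness checking covers every (state, token) pair.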
            }

tamer: xir::parse::Transition: Generalize flat::Transition
XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system. I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.
Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser). Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference. Unfortunately, that concision led to code that was much more
difficult to understand than necessary, and it ended up being a net negative
by requiring more boilerplate for the nested types (granted,
that could have been alleviated in other ways).
This also opens up the possibility of doing something that I wasn't able to
do before: continuing to abstract away parser composition by stitching
their state machines together. I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting in the way, the previous state
coupling issues with the parent parser go away.
DEV-10863
2022-03-17 15:50:35 -04:00
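The owned-state approach described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not TAMER's actual definitions: `Outcome`, the toy `Counter` parser, and the exact method signatures on `Transition` are all hypothetical stand-ins. The point is that `parse_token` takes the state by value and must return a new state, and that wrapping the next state in `Transition` makes each match arm state explicitly where it goes.

```rust
// All names here are illustrative stand-ins, not TAMER's definitions.

/// Result of feeding a single token to a parser.
#[derive(Debug, PartialEq, Eq)]
enum Outcome<O, E> {
    Incomplete,
    Object(O),
    Error(E),
}

/// Wraps the *next* state, so every match arm names its destination.
struct Transition<S>(S);

impl<S> Transition<S> {
    /// Move to the next state; more input is needed.
    fn incomplete<O, E>(self) -> (S, Outcome<O, E>) {
        (self.0, Outcome::Incomplete)
    }

    /// Move to the next state and emit a parsed object.
    fn ok<O, E>(self, obj: O) -> (S, Outcome<O, E>) {
        (self.0, Outcome::Object(obj))
    }

    /// Move to the next state (often the previous one, for recovery)
    /// and report an error.
    fn err<O, E>(self, e: E) -> (S, Outcome<O, E>) {
        (self.0, Outcome::Error(e))
    }
}

/// Toy parser state: counts the dots between a pair of parentheses.
#[derive(Debug, PartialEq, Eq)]
enum Counter {
    Idle,
    Counting(u32),
}

/// Takes the state by value and must hand a state back; there is no
/// mutable reference through which a transition could be forgotten.
fn parse_token(state: Counter, tok: char) -> (Counter, Outcome<u32, String>) {
    use Counter::*;

    match (state, tok) {
        (Idle, '(') => Transition(Counting(0)).incomplete(),
        (Counting(n), '.') => Transition(Counting(n + 1)).incomplete(),
        (Counting(n), ')') => Transition(Idle).ok(n),
        // Restore the previous state so the caller may attempt recovery.
        (st, other) => Transition(st).err(format!("unexpected {other:?}")),
    }
}
```

Since the state is moved into `parse_token`, the compiler rejects any arm that fails to produce a successor state, which is the kind of "mental synchronization point" the message refers to.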
            (Empty, invalid) => Transition(Empty).dead(invalid),
tamer: xir::tree::attr_parser_from: Integrate AttrParser
This begins to integrate the isolated AttrParser. The next step will be
integrating it into the larger XIRT parser.
There's been considerable delay in getting this committed, because I went
through quite the struggle with myself trying to determine what balance I
want to strike between Rust's type system; convenience with parser
combinators; iterators; and various other abstractions. I ended up being
confounded by trying to maintain the current XmloReader abstraction, which
is fundamentally incompatible with the way the new parsing system
works (streaming iterators that do not collect or perform heap
allocations).
There'll be more information on this to come, but there are certain things
that will be changing.
There are a few problems highlighted by this commit (not in code, but
conceptually):
1. Introducing Option here for the TokenParserState doesn't feel right, in
the sense that the abstraction is inappropriate. We should perhaps
introduce a new variant Parsed::Done or something to indicate intent,
rather than leaving the reader to have to read about what None actually
means.
2. This turns Parsed into more of a statement influencing control
flow/logic, and so should be encapsulated, with an external equivalent
of Parsed that omits variants that ought to remain encapsulated.
3. TokenStreamState is an accurate name, but these really are the actual
parsers; TokenStreamParser is more of a coordinator, and helps to abstract
away some of the common logic so lower-level parsers do not have to worry
about it. Calling it TokenStreamState is both a bit confusing and an
understatement---it _does_ hold the state, but it also holds the current
parsing stack in its variants.
Another thing that is not yet entirely clear is whether this AttrParser
ought to care about detection of duplicate attributes, or if that should be
done in a separate parser, perhaps even at the XIR level. The same can be
said for checking for balanced tags. By pushing it to TokenStream in XIR,
we would get a guaranteed check regardless of what parsers are used, which
is attractive because it reduces the (almost certain-to-otherwise-occur)
risk that individual parsers will not sufficiently check for semantically
valid XML. It does _potentially_ make error recovery more complicated,
though at the same time, perhaps more specific parsers ought not
care about recovery at that level.
Anyway, point being, more to come, but I am disappointed how much time I'm
spending considering parsing, given that there are so many things I need to
move onto. I just want this done right and in a way that feels like it's
working well with Rust while it's all in working memory, otherwise it's
going to be a significant effort to get back into.
DEV-11268
2021-12-10 14:13:02 -05:00
2022-03-18 15:26:05 -04:00
            (Name(name, nspan), XirToken::AttrValue(value, vspan)) => {
2022-03-25 16:45:32 -04:00
                Transition(Empty).ok(Attr::new(name, value, (nspan, vspan)))
2021-12-06 11:26:53 -05:00
            }

            (Name(name, nspan), invalid) => {
                // Restore state for error recovery.
2022-03-17 15:50:35 -04:00
                Transition(Name(name, nspan)).err(
                    AttrParseError::AttrValueExpected(name, nspan, invalid),
                )
2021-12-06 11:26:53 -05:00
            }
2021-12-10 16:22:02 -05:00
        }
2021-12-06 11:26:53 -05:00
    }

    #[inline]
    fn is_accepting(&self) -> bool {
        *self == Self::Empty
    }
}

2021-12-17 10:22:05 -05:00
impl Default for AttrParseState {
2021-12-06 11:26:53 -05:00
    fn default() -> Self {
        Self::Empty
    }
}

2022-05-25 14:20:10 -04:00
impl Display for AttrParseState {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        use AttrParseState::*;

        match self {
            Empty => write!(f, "expecting an attribute"),
            Name(name, _) => {
                write!(f, "expecting an attribute value for {name}")
            }
        }
    }
}

2021-12-06 11:26:53 -05:00
/// Attribute parsing error.
tamer: xir::tree: Integrate AttrParserState into Stack
Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too. This commit message is accurate, but confusing.
This performs the long-awaited task of trying to observe, concretely, how to
combine two automata. This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.
The next step will be to abstract this away.
There are some important things to note here. First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token. This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.
The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation. It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context". The "I've done my
job" part is only applicable in an accepting state.
If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.
This was done because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional. Of course,
I knew that decision caused awkward inconsistencies; I had just hoped that
those inconsistencies wouldn't manifest in practical issues.
Well, now they have, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one. Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.
All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.
DEV-11268
2021-12-16 09:44:02 -05:00
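The "dead" state and single-token lookahead described in this commit message can be sketched as two stitched automata. This is a hedged illustration only: `Tok`, `AttrState`, `Step`, `Elem`, `attr_step`, and `elem_step` are hypothetical simplified names, not the real `Stack` or `AttrParseState` definitions. The child returns the unrecognized token to the parent only when it is in an accepting state; in any other state the same token is an error, and no backtracking takes place.

```rust
// Hypothetical, simplified types for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Tok {
    Open,
    AttrName(&'static str),
    AttrValue(&'static str),
    Close,
}

/// Child automaton (attributes). `Empty` is its only accepting state.
#[derive(Debug, PartialEq, Eq)]
enum AttrState {
    Empty,
    Name(&'static str),
}

#[derive(Debug, PartialEq, Eq)]
enum Step {
    Incomplete,
    Object(&'static str, &'static str),
    /// Accepting state with no transition for the token: hand the token
    /// back to the parent as a single token of lookahead.
    Dead(Tok),
    /// Non-accepting state with no transition: simply an error.
    Error(Tok),
}

fn attr_step(state: AttrState, tok: Tok) -> (AttrState, Step) {
    match (state, tok) {
        (AttrState::Empty, Tok::AttrName(n)) => (AttrState::Name(n), Step::Incomplete),
        (AttrState::Name(n), Tok::AttrValue(v)) => (AttrState::Empty, Step::Object(n, v)),
        (AttrState::Empty, other) => (AttrState::Empty, Step::Dead(other)),
        (AttrState::Name(n), other) => (AttrState::Name(n), Step::Error(other)),
    }
}

/// Parent automaton; it embeds the child's state in one of its variants.
#[derive(Debug, PartialEq, Eq)]
enum Elem {
    Start,
    Attrs(AttrState),
    Done,
}

fn elem_step(
    state: Elem,
    tok: Tok,
    out: &mut Vec<(&'static str, &'static str)>,
) -> Result<Elem, Tok> {
    match (state, tok) {
        (Elem::Start, Tok::Open) => Ok(Elem::Attrs(AttrState::Empty)),
        // Delegate to the child while it can make progress.
        (Elem::Attrs(attr), tok) => match attr_step(attr, tok) {
            (next, Step::Incomplete) => Ok(Elem::Attrs(next)),
            (next, Step::Object(n, v)) => {
                out.push((n, v));
                Ok(Elem::Attrs(next))
            }
            // The child has done its job; the parent interprets the
            // lookahead token itself.
            (_, Step::Dead(Tok::Close)) => Ok(Elem::Done),
            (_, Step::Dead(other)) | (_, Step::Error(other)) => Err(other),
        },
        (_, other) => Err(other),
    }
}
```

With this arrangement no explicit end-of-attributes delimiter is needed in the token stream: the first token the child cannot handle while accepting is what tells the parent that attribute parsing has finished.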
#[derive(Debug, PartialEq, Eq)]
2021-12-06 11:26:53 -05:00
pub enum AttrParseError {
2022-03-21 13:40:54 -04:00
    /// [`XirToken::AttrName`] was expected.
2022-03-18 15:26:05 -04:00
    AttrNameExpected(XirToken),
2021-12-06 11:26:53 -05:00
2022-03-21 13:40:54 -04:00
    /// [`XirToken::AttrValue`] was expected.
2022-03-18 15:26:05 -04:00
    AttrValueExpected(QName, Span, XirToken),
2021-12-06 11:26:53 -05:00
}

impl Display for AttrParseError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine that eventually
any object would be able to describe itself, error or not, which would
be useful both for the Summary Page and for query
functionality, to help developers understand the systems they are writing
using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
2022-04-13 14:41:54 -04:00
            Self::AttrNameExpected(_) => {
                write!(f, "attribute name expected")
tamer: xir:tree: Begin work on composable XIRT parser
The XIRT parser was initially written for test cases, so that unit tests
should assert more easily on generated token streams (XIR). While it was
planned, it wasn't clear what the eventual needs would be, which were
expected to differ. Indeed, loading everything into a generic tree
representation in memory is not appropriate---we should prefer streaming and
avoiding heap allocations when they’re not necessary, and we should parse
into an IR rather than a generic format, which ensures that the data follow
a proper grammar and are semantically valid.
When parsing attributes in an isolated context became necessary for the
aforementioned task, the state machine of the XIRT parser was modified to
accommodate. The opposite approach should have been taken---instead of
adding complexity and special cases to the parser, and from a complex parser
extracting a simple one (an attribute parser), we should be composing the
larger (full XIRT) parser from smaller ones (e.g. attribute, child
elements).
A combinator, when used in a functional sense, refers not to combinatory
logic but to the composition of more complex systems from smaller ones. The
changes made as part of this commit begin to work toward combinators, though
it's not necessarily evident yet (to you, the reader) how that'll work,
since the code for it hasn't yet been written; this commit is simply
getting my work thus far introduced so I can do some light refactoring before
continuing on it.
TAMER does not aim to introduce a parser combinator framework in its usual
sense---it favors, instead, striking a proper balance with Rust’s type
system that permits the convenience of combinators only in situations where
they are needed, to avoid having to write new parser
boilerplate. Specifically:
1. Rust’s type system should be used as combinators, so that parsers are
automatically constructed from the type definition.
2. Primitive parsers are written as explicit automata, not as primitive
combinators.
3. Parsing should directly produce IRs as a lowering operation below XIRT,
rather than producing XIRT itself. That is, target IRs should consume
XIRT and parse themselves immediately, during streaming.
In the future, if more combinators are needed, they will be added; maybe
this will eventually evolve into a more generic parser combinator framework
for TAME, but that is certainly a waste of time right now. And, to be
honest, I’m hoping that won’t be necessary.
2021-12-06 11:26:53 -05:00
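The first two points above (types acting as combinators, with primitives written by hand) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `FromTokens` trait, the `Tok` type, and the pull-based iterator interface are hypothetical and are not taken from TAMER, whose parsers are streaming automata.

```rust
/// Hypothetical token type standing in for XIR tokens.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Name(String),
    Value(String),
}

/// Hypothetical trait: a type that knows how to parse itself from tokens.
trait FromTokens: Sized {
    fn from_tokens<I: Iterator<Item = Tok>>(toks: &mut I) -> Result<Self, String>;
}

#[derive(Debug, PartialEq)]
struct Name(String);

#[derive(Debug, PartialEq)]
struct Value(String);

// Primitive parsers are written explicitly, by hand.
impl FromTokens for Name {
    fn from_tokens<I: Iterator<Item = Tok>>(toks: &mut I) -> Result<Self, String> {
        match toks.next() {
            Some(Tok::Name(n)) => Ok(Name(n)),
            other => Err(format!("attribute name expected, found {other:?}")),
        }
    }
}

impl FromTokens for Value {
    fn from_tokens<I: Iterator<Item = Tok>>(toks: &mut I) -> Result<Self, String> {
        match toks.next() {
            Some(Tok::Value(v)) => Ok(Value(v)),
            other => Err(format!("attribute value expected, found {other:?}")),
        }
    }
}

// The type system does the combining: a pair parses its parts in sequence,
// so no explicit "sequence" combinator has to be called by the user.
impl<A: FromTokens, B: FromTokens> FromTokens for (A, B) {
    fn from_tokens<I: Iterator<Item = Tok>>(toks: &mut I) -> Result<Self, String> {
        let a = A::from_tokens(toks)?;
        let b = B::from_tokens(toks)?;
        Ok((a, b))
    }
}

/// The parser for an attribute falls out of its type definition.
type Attr = (Name, Value);
```

With these definitions, `<Attr as FromTokens>::from_tokens(&mut tokens)` parses a name followed by a value without any parser having been written for `Attr` itself.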
            }
            Self::AttrValueExpected(name, _span, _tok) => {
                write!(f, "expected value for `@{name}`",)
            }
        }
    }
}

impl Error for AttrParseError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        None
    }
}

impl Diagnostic for AttrParseError {
    fn describe(&self) -> Vec<AnnotatedSpan> {
        match self {
            Self::AttrNameExpected(tok) => tok.span().mark_error().into(),

            Self::AttrValueExpected(_name, span, _tok) => {
                span.mark_error().into()
            }
        }
    }
}

#[cfg(test)]
mod test {
    use super::*;
tamer: xir::parse::Transition: Generalize flat::Transition
XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system. I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.
Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser). Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference. Unfortunately, that concision led to code that was much more
difficult than necessary to understand, and ended up being a net negative
by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).
This also opens up the possibility to do something that I wasn't able to
before, which was to continue to abstract away parser composition by stitching
their state machines together. I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting in the way, the previous state coupling
issues with the parent parser go away.
DEV-10863
2022-03-17 15:50:35 -04:00
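The owned-state style described in this message can be sketched as below: `parse_token` consumes the current state and must hand back a `Transition` to the next one together with the parse result. This is a simplified illustration under assumptions; the `Tok` and `State` types, the return shape, and the `ok`/`err` helpers are hypothetical and only loosely modelled on the names that appear in the message.

```rust
/// Hypothetical token type standing in for XIR tokens.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Name(String),
    Value(String),
}

/// Marks a state transition explicitly, as a reading aid.
#[derive(Debug, PartialEq)]
struct Transition<S>(S);

impl<S> Transition<S> {
    /// Pair the new state with a successful result.
    fn ok<T, E>(self, obj: T) -> (Self, Result<T, E>) {
        (self, Ok(obj))
    }

    /// Pair the new state with an error.
    fn err<T, E>(self, e: E) -> (Self, Result<T, E>) {
        (self, Err(e))
    }
}

#[derive(Debug, PartialEq)]
enum State {
    Empty,
    Name(String),
}

type Attr = (String, String);

impl State {
    /// Takes the state by value, so every arm has to say where it goes next.
    fn parse_token(self, tok: Tok) -> (Transition<State>, Result<Option<Attr>, String>) {
        match (self, tok) {
            (State::Empty, Tok::Name(n)) => Transition(State::Name(n)).ok(None),
            (State::Name(n), Tok::Value(v)) => {
                Transition(State::Empty).ok(Some((n, v)))
            }
            // Stay in the same state so that the caller may attempt recovery.
            (st, tok) => Transition(st).err(format!("unexpected token {tok:?}")),
        }
    }
}
```

Because the old state is moved into the call, data such as the attribute name can be carried into the next state or the emitted object without cloning, and forgetting to choose a next state becomes a compile-time error rather than a silent bug.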
    use crate::{
        convert::ExpectInto,
        parse::{EmptyContext, ParseStatus, Parsed},
        sym::GlobalSymbolIntern,
    };

    const S: Span = crate::span::DUMMY_SPAN;
    const S2: Span = S.offset_add(1).unwrap();

    #[test]
tamer: xir::tree: Integrate AttrParserState into Stack
Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too. This commit message is accurate, but confusing.
This performs the long-awaited task of trying to observe, concretely, how to
combine two automata. This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.
The next step will be to abstract this away.
There are some important things to note here. First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token. This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.
The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation. It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context". The "I've done my
job" part is only applicable in an accepting state.
If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.
This was done because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional. Of course,
I knew that decision caused awkward inconsistencies; I had just hoped that
those inconsistencies wouldn't manifest in practical issues.
Well, now they have, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one. Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.
All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.
DEV-11268
2021-12-16 09:44:02 -05:00
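The "dead state" idea from this message can be sketched as follows: a child parser in an accepting state hands an unrecognized token back to its parent as one token of lookahead, while the same situation in a non-accepting state is an error. The types and the `parse_element` function below are hypothetical simplifications written for illustration, not the code from this commit.

```rust
/// Hypothetical token type standing in for XIR tokens.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Open(String),
    AttrName(String),
    AttrValue(String),
    Close,
}

#[derive(Debug, PartialEq)]
enum Status<T> {
    /// More tokens are needed.
    Incomplete,
    /// An object was produced.
    Object(T),
    /// Accepting state with no transition: return the token as lookahead.
    Dead(Tok),
}

#[derive(Debug, PartialEq)]
enum AttrState {
    Empty,
    Name(String),
}

type Attr = (String, String);

impl AttrState {
    fn parse_token(self, tok: Tok) -> (Self, Result<Status<Attr>, String>) {
        match (self, tok) {
            (AttrState::Empty, Tok::AttrName(n)) => {
                (AttrState::Name(n), Ok(Status::Incomplete))
            }
            (AttrState::Name(n), Tok::AttrValue(v)) => {
                (AttrState::Empty, Ok(Status::Object((n, v))))
            }
            // `Empty` is accepting: the job is done, so yield the token.
            (AttrState::Empty, tok) => (AttrState::Empty, Ok(Status::Dead(tok))),
            // `Name` is not accepting: an unknown token is simply an error.
            (AttrState::Name(n), tok) => {
                let msg = format!("expected value for `@{n}`, found {tok:?}");
                (AttrState::Name(n), Err(msg))
            }
        }
    }
}

/// Parent parser: delegates to the attribute parser until it reports `Dead`,
/// then interprets the returned token itself. No end delimiter is required.
fn parse_element(toks: Vec<Tok>) -> Result<(String, Vec<Attr>), String> {
    let mut iter = toks.into_iter();
    let name = match iter.next() {
        Some(Tok::Open(name)) => name,
        other => return Err(format!("expected opening tag, found {other:?}")),
    };

    let mut attrs = Vec::new();
    let mut st = AttrState::Empty;
    for tok in iter {
        let (next, result) = st.parse_token(tok);
        st = next;
        match result? {
            Status::Incomplete => {}
            Status::Object(attr) => attrs.push(attr),
            Status::Dead(Tok::Close) => return Ok((name, attrs)),
            Status::Dead(tok) => return Err(format!("unexpected token {tok:?}")),
        }
    }
    Err("unexpected end of input".to_string())
}
```

In this sketch the parent never needs more than the single token that the child returned, which matches the single-token lookahead described above.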
    fn dead_if_first_token_is_non_attr() {
        let tok = XirToken::Open("foo".unwrap_into(), S);

        let sut = AttrParseState::default();

        // There is no state that we can transition to,
        // and we're in an empty accepting state.
        assert_eq!(
            (
                // Make sure we're in the same state we started in so that
                // we know we can accommodate recovery token(s).
                Transition(AttrParseState::default()),
                Ok(ParseStatus::Dead(tok.clone()))
            ),
            sut.parse_token(tok, &mut EmptyContext).into()
        );
|
|
|
|
}
#[test]
fn parse_single_attr() {
let attr = "attr".unwrap_into();
let val = "val".intern();
let toks = [XirToken::AttrName(attr, S), XirToken::AttrValue(val, S2)]
.into_iter();
let sut = AttrParseState::parse(toks);
assert_eq!(
Ok(vec![
Parsed::Incomplete,
Parsed::Object(Attr::new(attr, val, (S, S2))),
]),
sut.collect()
);
}
#[test]
fn parse_fails_when_attribute_value_missing_but_can_recover() {
let attr = "bad".unwrap_into();
let sut = AttrParseState::default();
// This token indicates that we're expecting a value to come next in
// the token stream.
let TransitionResult(Transition(sut), result) =
sut.parse_token(XirToken::AttrName(attr, S), &mut EmptyContext);
assert_eq!(result, Ok(ParseStatus::Incomplete));
// But we provide something else unexpected.
let TransitionResult(Transition(sut), result) =
sut.parse_token(XirToken::Close(None, S2), &mut EmptyContext);
assert_eq!(
result,
Err(AttrParseError::AttrValueExpected(
attr,
S,
XirToken::Close(None, S2)
))
);
// We should not be in an accepting state,
// given that we haven't finished parsing the attribute.
assert!(!sut.is_accepting());
// Despite this error,
// we should remain in a state that permits recovery should a
// proper token be substituted.
// Rather than checking for that state,
// let's actually attempt a recovery.
let recover = "value".intern();
let TransitionResult(Transition(sut), result) = sut
.parse_token(XirToken::AttrValue(recover, S2), &mut EmptyContext);
assert_eq!(
tamer: xir::parse::Transition: Generalize flat::Transition
XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system. I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.
Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser). Passing an owned state is something that I had wanted to do
originally, but I thought that using a mutable reference would lead to more
concise code. Unfortunately, that concision led to code that was much more
difficult to understand than necessary, and it ended up being a net negative
because it required more boilerplate for the nested types (granted,
that could have been alleviated in other ways).
This also opens up the possibility of doing something that I wasn't able to
do before: continuing to abstract away parser composition by stitching
parsers' state machines together. I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting in the way, the previous state coupling
issues with the parent parser go away.
DEV-10863
2022-03-17 15:50:35 -04:00
result,
Ok(ParseStatus::Object(Attr::new(attr, recover, (S, S2)))),
);
// Finally, we should now be in an accepting state.
assert!(sut.is_accepting());
}
}
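
The owned-state `parse_token` described in the `Transition` commit message above can be sketched roughly as follows. This is a minimal, self-contained illustration, not TAMER's actual code: `Token`, `AttrState`, the simplified `ParseStatus`, the `String` error type, and the `run` helper are all hypothetical stand-ins, and the context argument that the real `parse_token` takes is omitted here.

```rust
// Hypothetical, simplified stand-ins for illustration only.
#[derive(Debug, PartialEq)]
enum Token {
    AttrName(&'static str),
    AttrValue(&'static str),
}

#[derive(Debug, PartialEq)]
enum ParseStatus {
    /// More tokens are needed before an object can be produced.
    Incomplete,
    /// A completed (name, value) pair.
    Object((&'static str, &'static str)),
}

#[derive(Debug, PartialEq)]
enum AttrState {
    Empty,
    Name(&'static str),
    Done,
}

/// Newtype marking the state that the parser transitions to.
#[derive(Debug, PartialEq)]
struct Transition(AttrState);

/// The next state paired with the result of parsing one token.
#[derive(Debug, PartialEq)]
struct TransitionResult(Transition, Result<ParseStatus, String>);

impl Transition {
    fn incomplete(self) -> TransitionResult {
        TransitionResult(self, Ok(ParseStatus::Incomplete))
    }

    fn ok(self, obj: (&'static str, &'static str)) -> TransitionResult {
        TransitionResult(self, Ok(ParseStatus::Object(obj)))
    }

    fn err(self, msg: String) -> TransitionResult {
        TransitionResult(self, Err(msg))
    }
}

impl AttrState {
    /// Consumes the current state and returns the next one; no `&mut self`.
    fn parse_token(self, tok: Token) -> TransitionResult {
        match (self, tok) {
            (AttrState::Empty, Token::AttrName(name)) => {
                Transition(AttrState::Name(name)).incomplete()
            }
            (AttrState::Name(name), Token::AttrValue(value)) => {
                Transition(AttrState::Done).ok((name, value))
            }
            // Assumed recovery policy: report the error but keep the
            // current state so that a later token can still succeed.
            (st, tok) => {
                Transition(st).err(format!("unexpected token {:?}", tok))
            }
        }
    }

    fn is_accepting(&self) -> bool {
        matches!(self, AttrState::Done)
    }
}

/// Threads the owned state through a sequence of tokens.
fn run(
    init: AttrState,
    toks: Vec<Token>,
) -> (AttrState, Vec<Result<ParseStatus, String>>) {
    let mut state = init;
    let mut results = Vec::new();
    for tok in toks {
        let TransitionResult(Transition(next), result) = state.parse_token(tok);
        state = next;
        results.push(result);
    }
    (state, results)
}
```

Because each call consumes the old state and hands back a new one, the caller destructures `TransitionResult(Transition(sut), result)` just as the test above does, and a parent parser could wrap the returned state in its own state type without sharing a mutable reference.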