tamer: Initial frontend concept

This introduces the beginnings of frontends for TAMER, gated behind a
`wip-frontends` flag.

This will be introduced in stages:

  1. Replace the existing copy with a parser-based copy (echo back out the
     tokens), when the flag is on.
  2. Begin to parse portions of the source, augmenting the output xmlo (xmli
     at the moment).  The XSLT-based compiler will be modified to skip
     compilation steps as necessary.

As portions of the compilation are implemented in TAMER, they'll be placed
behind their own feature flags and stabilized, which will incrementally
remove the compilation steps from the XSLT-based system.  The result should
be substantial incremental performance improvements.

Short-term, the priorities for loading identifiers into an IR are
(though the order may change):

  1. Echo.
  2. Imports.
  3. Extern declarations.
  4. Simple identifiers (e.g. param, const, template, etc).
  5. Classifications.
  6. Documentation expressions.
  7. Calculation expressions.
  8. Template applications.
  9. Template definitions.
  10. Inline templates.

After each of those is done, the resulting xmlo (xmli) will have fully
reconstructed the source document from the IR produced during parsing.
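
The first stage above (a parser-based copy that echoes the tokens back out) might look roughly like the following sketch. `EchoToken`, `tokenize`, and `echo` are hypothetical names that are not part of TAMER, and the tokenizer is deliberately naive; the only point is that writing out the lexeme of every token reproduces the input byte-for-byte.

```rust
use std::io::{self, Write};

/// Hypothetical token: a borrowed slice of the input (nothing is copied).
#[derive(Debug, PartialEq, Eq)]
pub enum EchoToken<'a> {
    /// Text from `<` through the next `>`, brackets included.
    Markup(&'a str),
    /// Everything between markup.
    Text(&'a str),
}

/// Split `src` into tokens such that every byte belongs to exactly one
/// token.  This is NOT a real XML tokenizer (it ignores comments, CDATA,
/// and `>` inside attribute values).
pub fn tokenize(src: &str) -> Vec<EchoToken<'_>> {
    let mut tokens = Vec::new();
    let mut rest = src;

    while !rest.is_empty() {
        if rest.starts_with('<') {
            // Run through the closing `>`, or to the end if unterminated.
            let end = rest.find('>').map(|i| i + 1).unwrap_or(rest.len());
            let (tok, tail) = rest.split_at(end);
            tokens.push(EchoToken::Markup(tok));
            rest = tail;
        } else {
            // Run up to (not including) the next `<`.
            let end = rest.find('<').unwrap_or(rest.len());
            let (tok, tail) = rest.split_at(end);
            tokens.push(EchoToken::Text(tok));
            rest = tail;
        }
    }

    tokens
}

/// Write the lexeme of each token to `out`, in order.
pub fn echo<W: Write>(tokens: &[EchoToken<'_>], mut out: W) -> io::Result<()> {
    for tok in tokens {
        let lexeme = match tok {
            EchoToken::Markup(s) | EchoToken::Text(s) => s,
        };
        out.write_all(lexeme.as_bytes())?;
    }

    Ok(())
}
```

Later stages would then replace individual token kinds with IR-derived output while the remaining tokens continue to be echoed unchanged.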
Mike Gerwitz 2021-07-23 22:24:08 -04:00
parent 60372d2960
commit fb8422d670
7 changed files with 396 additions and 0 deletions

@@ -36,3 +36,17 @@ exitcode = "1.1.2"
lazy_static = ">= 1.4.0"
petgraph-graphml = ">= 2.0.1"
# Feature flags can be specified using `./configure FEATURES=foo,bar,baz`.
#
# Flags beginning with "wip-" are short-lived flags that exist only during
# development of a particular feature; you should not hard-code them
# anywhere, since the build will break once they are removed. Enabling WIP
# flags should also be expected to cause undesirable behavior in some form
# or another. Once WIP features are finalized, they are enabled by default
# and the flag removed.
[features]
# Process source files using available frontends rather than copying
# the files verbatim to XMLI files. This begins the process of moving
# compilation from XSLT into TAMER, and so the XSLT-based compiler must be
# expecting it so that it can skip those compilation steps.
wip-frontends = []
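
As a rough illustration of how such a flag gates code, the sketch below mirrors the shape of the change to the compile command in this commit: the parsing step exists only when `wip-frontends` is enabled (for example via `./configure FEATURES=wip-frontends`), while the verbatim copy always runs. `compile_steps` is a hypothetical function used only to make the effect observable.

```rust
/// Names of the steps that a compile command would perform, in order.
///
/// Hypothetical helper; the real command performs the steps rather than
/// naming them.
pub fn compile_steps() -> Vec<&'static str> {
    let mut steps = Vec::new();

    // Compiled in only when the feature is enabled; otherwise this block
    // does not exist in the resulting binary at all.
    #[cfg(feature = "wip-frontends")]
    {
        steps.push("parse");
    }

    // Always performed, with or without the flag.
    steps.push("copy");

    steps
}
```

Because the gated block is removed at compile time, a build without the flag behaves exactly as it did before the feature was introduced.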

@@ -29,6 +29,13 @@ use std::ffi::OsStr;
use std::fs;
use std::path::Path;
#[cfg(feature = "wip-frontends")]
use {
    std::io::BufReader,
    tamer::frontend::{FrontendParser, XmlFrontendParser},
    tamer::fs::File,
};
/// Types of commands
enum Command {
Compile(String, String, String),
@@ -50,7 +57,18 @@ pub fn main() -> Result<(), Box<dyn Error>> {
}
let dest = Path::new(&output);
// This will eventually replace `fs::copy` below.
#[cfg(feature = "wip-frontends")]
{
    let file: BufReader<fs::File> = File::open(source)?;
    let mut parser = XmlFrontendParser::new(file);
    parser.parse_next()?;
}
fs::copy(source, dest)?;
Ok(())
}
Ok(Command::Usage) => {

@@ -0,0 +1,55 @@
// TAME frontends
//
// Copyright (C) 2014-2021 Ryan Specialty Group, LLC.
//
// This file is part of TAME.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.
//! Frontends for the TAME programming language.
//!
//! A _frontend_ represents a source language.
//! The principal frontend for TAME is the XML-based package specification
//! language ([`XmlFrontendParser`]).
//!
//! Parsing
//! =======
//! [Parsers](parser) for frontends are expected to fulfill three primary
//! roles:
//!
//! 1. Produce a sequence of tokens from a source input (see [`Token`]);
//! 2. Perform no implicit copying of source buffer data (zero-copy); and
//! 3. Attempt recovery to continue parsing in the event of an error.
//!
//! Recovery allows the parser to find and report more errors at once,
//! rather than requiring a developer to correct and recompile one error
//! at a time.
//! Recovery further makes parsers suitable for static analysis in
//! situations where correctness is non-critical,
//! such as for linting; checkstyle; and language servers.
//!
//! Parsers are expected to be scannerless
//! (that is, not require a separate scanning/lexing process),
//! or to at least encapsulate lexing.
//!
//! *TODO*: Mention IR and guide reader to the next steps in the pipeline.
mod parser;
mod xml;
pub use parser::{
    FrontendError, FrontendEvent, FrontendParser, FrontendResult, Token,
};
pub use xml::XmlFrontendParser;
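
The zero-copy role described in the module documentation above can be sketched with `Cow`, which is also what the commit's `Lexeme` wraps. The type below is a stand-in with a public field and hypothetical helper methods (`borrowed`, `into_owned`, `is_borrowed`), none of which exist in the commit; it only illustrates that a lexeme borrows from the source buffer until a caller explicitly asks for a copy.

```rust
use std::borrow::Cow;

/// Stand-in for the commit's `Lexeme`, with the field made public so
/// that the sketch is easy to inspect.
#[derive(Debug, PartialEq, Eq)]
pub struct Lexeme<'a>(pub Cow<'a, [u8]>);

impl<'a> Lexeme<'a> {
    /// Borrow the bytes at `start..=end_inclusive` of the source buffer.
    /// No data is copied.
    pub fn borrowed(buf: &'a [u8], start: usize, end_inclusive: usize) -> Self {
        Lexeme(Cow::Borrowed(&buf[start..=end_inclusive]))
    }

    /// Explicitly copy the bytes so that the lexeme can outlive the
    /// source buffer.  This is the only place a copy happens.
    pub fn into_owned(self) -> Lexeme<'static> {
        Lexeme(Cow::Owned(self.0.into_owned()))
    }

    /// Whether this lexeme still refers to the source buffer.
    pub fn is_borrowed(&self) -> bool {
        matches!(self.0, Cow::Borrowed(_))
    }
}
```

The lifetime on the borrowed form is what ties a token to its buffer: the compiler rejects any attempt to keep the lexeme after the buffer is gone unless `into_owned` has been called first.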

@@ -0,0 +1,190 @@
// TAME frontend parser
//
// Copyright (C) 2014-2021 Ryan Specialty Group, LLC.
//
// This file is part of TAME.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.
//! Recovering, zero-copy, scannerless parsers for TAME frontends.
//!
//! See the [parent module](super) for more information.
use std::{borrow::Cow, fmt::Display, num::NonZeroUsize};
/// Recovering, zero-copy, scannerless parser.
pub trait FrontendParser<'l, T, E> {
/// Human-readable short description of parser.
///
/// TAME consists of a number of source languages, so this should be
/// sufficient to state what parser was chosen for a given source
/// file.
fn desc() -> &'static str;
/// Attempt to parse the next token.
///
/// A [`FrontendEvent::Token`] contains common information about the
/// encountered lexeme and source byte interval,
/// but the token kind is frontend-specific.
///
/// When a parsing error occurs,
/// frontends are encouraged to self-correct if possible.
/// If this is able to happen,
/// [`FrontendEvent::RecoverableError`] will be emitted with zero or
/// more tokens that may be used in place of the erroneous input to
/// possibly continue parsing in a useful way;
/// this can be used for further static analysis or error
/// checking.
///
/// If the end of the file
/// (or end of the parsable region of a file)
/// has been reached,
/// [`FrontendEvent::Eof`] will be emitted,
/// unless a [`FrontendEvent::RecoverableError`] has been previously
/// emitted,
/// in which case [`FrontendError::EofWithRecoverableErrors`]
/// indicates that the caller should take special care in
/// determining whether parsing should be considered to be a
/// failure.
fn parse_next(&mut self) -> FrontendResult<FrontendEvent<'l, T, E>, E>;
}
/// Raw input string associated with a token.
#[derive(Debug, PartialEq, Eq)]
pub struct Lexeme<'a>(Cow<'a, [u8]>);
/// A lexeme combined with a type (kind) and location.
///
/// The `interval` represents the starting and ending offset, inclusive, of
/// the lexeme used to produce this token.
/// The `kind` is the token type,
/// specific to each individual frontend.
///
/// Tokens are intended to be short-lived and lowered into another
/// intermediate representation (IR) for further processing and analysis.
#[derive(Debug, PartialEq, Eq)]
pub struct Token<'l, T> {
/// Token type and associated data.
///
/// The token kind represents the parsed information and should always
/// be used in place of the lexeme (which may not be available),
/// unless referring back to the source input.
kind: T,
/// Raw input from which the token was generated.
///
/// A lexeme may not be available if a token was generated by the
/// compiler in a manner that is not associated with any source
/// input.
///
/// Since frontend parsers are zero-copy by default,
/// a lexeme may be available only immediately after a token is
/// emitted,
/// unless the caller wishes to copy its value.
lexeme: Option<Lexeme<'l>>,
/// Starting and ending offset of the lexeme, inclusive.
///
/// An interval may not be available if a token was generated by the
/// compiler in a manner that is not associated with any source
/// input.
interval: Option<(usize, NonZeroUsize)>,
}
/// Result of attempting to parse input for the next token.
#[derive(Debug, PartialEq)]
pub enum FrontendEvent<'l, T, E> {
/// Successfully parsed token.
Token(Token<'l, T>),
/// An error occurred,
/// but zero or more tokens are provided in an attempt to self-correct
/// so parsing may continue.
///
/// The provided interval represents all source bytes consumed for all
/// recovery tokens;
/// parsing will continue at the next byte after the end of that
/// interval.
/// The recovery tokens may very well be nonsense;
/// the goal is to continue parsing to find more errors,
/// not to infer a correct program.
RecoverableError {
/// Source error.
source: E,
/// Starting and ending offset of all bytes associated with this
/// error, inclusive.
///
/// Note that recovery tokens may not have interval information if
/// their source input is not sensible.
interval: (usize, usize),
/// Zero or more tokens that may be substituted in place of the
/// erroneous input in an attempt to continue parsing.
///
/// These recovery tokens are not guaranteed to be successful,
/// nor can they be used to confidently repair a program with
/// parse errors.
recovery_tokens: Vec<Token<'l, T>>,
},
/// End of the file has been reached with no recoverable errors.
///
/// See also [`FrontendError::EofWithRecoverableErrors`].
Eof,
}
/// Error attempting to parse input for the next token.
#[derive(Debug, PartialEq, Eq)]
pub enum FrontendError<E> {
/// An error occurred during parsing and the parser was either unable to
/// determine how to recover or did not attempt recovery.
UnrecoverableError {
/// Source error.
source: E,
/// Starting and ending byte offsets of source input that produced
/// the error.
interval: (usize, usize),
},
/// EOF reached with recoverable errors.
///
/// This error indicates that the end of the file has been reached,
/// but recoverable errors have been previously emitted,
/// and so parsing should fail.
/// If the caller chooses to ignore this error and accept the recovery
/// tokens,
/// the emitted tokens may not represent a valid program.
/// However,
/// if parsing was performed for syntax checking or static analysis,
/// then this error might be able to be safely ignored.
///
/// See also [`FrontendEvent::Eof`].
EofWithRecoverableErrors,
}
impl<E> Display for FrontendError<E> {
fn fmt(&self, fmt: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(fmt, "TODO fmt")
}
}
impl<E: std::fmt::Debug> std::error::Error for FrontendError<E> {
fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
None
}
}
pub type FrontendResult<T, E> = Result<T, FrontendError<E>>;
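
The event protocol documented above could be consumed by a caller along the following lines. Everything here is a simplified stand-in: `Event` and `ParseError` mirror `FrontendEvent` and `FrontendError` without lexemes, lifetimes, or intervals, and `ScriptedParser` and `drain` are hypothetical names that replay a fixed list of events rather than parsing anything.

```rust
/// Simplified stand-in for `FrontendEvent`.
#[derive(Debug, PartialEq)]
pub enum Event<T, E> {
    Token(T),
    RecoverableError { source: E, recovery_tokens: Vec<T> },
    Eof,
}

/// Simplified stand-in for `FrontendError`.
#[derive(Debug, PartialEq)]
pub enum ParseError<E> {
    Unrecoverable(E),
    EofWithRecoverableErrors,
}

/// Hypothetical parser that replays pre-scripted events.
pub struct ScriptedParser<T, E> {
    events: std::vec::IntoIter<Event<T, E>>,
    saw_recoverable: bool,
}

impl<T, E> ScriptedParser<T, E> {
    pub fn new(events: Vec<Event<T, E>>) -> Self {
        Self { events: events.into_iter(), saw_recoverable: false }
    }

    pub fn parse_next(&mut self) -> Result<Event<T, E>, ParseError<E>> {
        match self.events.next() {
            // Remember that recovery happened so EOF can report it.
            Some(Event::RecoverableError { source, recovery_tokens }) => {
                self.saw_recoverable = true;
                Ok(Event::RecoverableError { source, recovery_tokens })
            }
            // EOF after any recoverable error becomes an error.
            Some(Event::Eof) | None if self.saw_recoverable => {
                Err(ParseError::EofWithRecoverableErrors)
            }
            Some(ev) => Ok(ev),
            None => Ok(Event::Eof),
        }
    }
}

/// Collect every token (recovery tokens included) until EOF.  On
/// failure, return the recoverable errors seen so far with the error.
pub fn drain<T, E>(
    parser: &mut ScriptedParser<T, E>,
) -> Result<Vec<T>, (Vec<E>, ParseError<E>)> {
    let mut tokens = Vec::new();
    let mut errors = Vec::new();

    loop {
        match parser.parse_next() {
            Ok(Event::Token(tok)) => tokens.push(tok),
            Ok(Event::RecoverableError { source, recovery_tokens }) => {
                errors.push(source);
                tokens.extend(recovery_tokens);
            }
            Ok(Event::Eof) => return Ok(tokens),
            Err(e) => return Err((errors, e)),
        }
    }
}
```

A compiler would treat the `Err` case as a failed build and report every collected error at once, whereas a linter or language server might choose to keep the tokens gathered so far, as the documentation for `EofWithRecoverableErrors` suggests.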

@@ -0,0 +1,84 @@
// XML frontend
//
// Copyright (C) 2014-2021 Ryan Specialty Group, LLC.
//
// This file is part of TAME.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.
//! XML frontend for the TAME programming language.
use super::{FrontendEvent, FrontendParser, FrontendResult};
use quick_xml::{Error as XmlError, Reader as XmlReader};
use std::fmt::Display;
use std::io::BufRead;
/// Parser for XML-based sources.
pub struct XmlFrontendParser<B>
where
B: BufRead,
{
_reader: XmlReader<B>,
}
impl<B> XmlFrontendParser<B>
where
B: BufRead,
{
pub fn new(buf_read: B) -> Self {
let reader = XmlReader::from_reader(buf_read);
Self { _reader: reader }
}
}
impl<'l, B> FrontendParser<'l, XmlToken, XmlFrontendError>
for XmlFrontendParser<B>
where
B: BufRead,
{
fn desc() -> &'static str {
"XML-based package specification language"
}
fn parse_next(&mut self) -> XmlFrontendResult<XmlFrontendEvent<'l>> {
Ok(FrontendEvent::Eof)
}
}
pub type XmlFrontendEvent<'l> = FrontendEvent<'l, XmlToken, XmlFrontendError>;
pub enum XmlToken {}
#[derive(Debug)]
pub enum XmlFrontendError {
XmlError(XmlError),
}
impl Display for XmlFrontendError {
fn fmt(&self, fmt: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(fmt, "TODO fmt")
}
}
impl std::error::Error for XmlFrontendError {
fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
None
}
}
pub type XmlFrontendResult<T> = FrontendResult<T, XmlFrontendError>;
#[cfg(test)]
mod test;
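
Both `Display` implementations in this commit are placeholders (`"TODO fmt"`) and both `source` methods return `None`. One way they might eventually be filled in is sketched below. `SketchFrontendError` is a hypothetical stand-in that wraps `std::io::Error` rather than `quick_xml::Error` so that the example needs no external crates, and the message text is invented for illustration.

```rust
use std::error::Error;
use std::fmt::{self, Display};
use std::io;

/// Hypothetical stand-in for `XmlFrontendError`.
#[derive(Debug)]
pub enum SketchFrontendError {
    Io(io::Error),
}

impl Display for SketchFrontendError {
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        match self {
            // Describe this layer and include the underlying message.
            Self::Io(e) => {
                write!(fmt, "I/O error while parsing XML source: {}", e)
            }
        }
    }
}

impl Error for SketchFrontendError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            // Expose the wrapped error so callers can walk the chain.
            Self::Io(e) => Some(e),
        }
    }
}

impl From<io::Error> for SketchFrontendError {
    // Allows `?` to convert the underlying error automatically.
    fn from(e: io::Error) -> Self {
        Self::Io(e)
    }
}
```

Returning the wrapped error from `source` lets generic error reporters print the full cause chain without knowing anything about the frontend.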

@@ -0,0 +1,32 @@
// Tests for XML frontend
//
// Copyright (C) 2014-2021 Ryan Specialty Group, LLC.
//
// This file is part of TAME.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.
use super::*;
type Sut<B> = XmlFrontendParser<B>;
#[test]
fn emits_eof() {
    let stub_data: &[u8] = &[];
    let mut sut = Sut::new(stub_data);
    let result = sut.parse_next();
    assert!(matches!(result, Ok(FrontendEvent::Eof)));
}

@@ -30,6 +30,9 @@ extern crate lazy_static;
#[macro_use]
pub mod sym;
#[cfg(feature = "wip-frontends")]
pub mod frontend;
pub mod fs;
pub mod ir;
pub mod ld;