tamer: nir::tplshort: Desugar body into @values@

This represents a significant departure from how the XSLT-based TAME handles
the `@values@` param, but it will end up having the same effect.  It builds
upon prior work, utilizing the fact that referencing a closed template in
TAMER expands it.

The problem is this: allowing trees in `Meta` would add yet another
container; we have `Pkg` and `Tpl` already.  This was the same problem with
template application---I didn't want to add support for binding arguments
separately, and so re-used templates themselves, reaching the generalization
I just mentioned above.

`Meta` is intended to be a lexical metasyntactic variable, which keeps its
implementation quite simple.  Allowing trees would complicate that very
quickly, and would require much more complex AIR parser state.

But we can accomplish the same behavior by desugaring into an existing
container---a template---and placing the body within it.  Then, in the
future, we'll parse `param-copy` into a simple `Air::RefIdent`, which will
expand the closed template and produce the same result as it does today in
the XSLT-based system.
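A minimal sketch of the streaming desugaring described above, modeled loosely on the `TplShortDesugar` parser in this commit.  `Tok`, `State`, `gen_name`, and `desugar` are hypothetical simplifications, not TAMER's API: there are no spans (a byte offset stands in for each `Span`), and the token set is reduced to what the example needs.  As in the commit, tokens queued on a stack are pushed in reverse so that they come out in order.

```rust
// Hypothetical, simplified tokens: TAMER's real `Nir` tokens carry spans
// and many more variants. A byte offset stands in for each `Span` here.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    ShortApply(String),         // `<t:name ...>` (shorthand application)
    ShortParam(String, String), // shorthand attribute `name="value"`
    OpenApply,                  // long-form `<apply-template>`
    Ref(String),
    OpenParam,
    BindIdent(String),
    Text(String),
    CloseParam,
    CloseApply,
    OpenTpl,
    CloseTpl,
    Open(String), // any other element, e.g. `c:sum`
    Close(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Scanning,
    DesugaringParams(usize), // offset of the shorthand application
    DesugaringBody,
}

// Same naming scheme as the commit's `gen_tpl_name_at_offset`.
fn gen_name(offset: usize) -> String {
    format!("___dsgr-{:x}___", offset)
}

/// Desugars shorthand applications in one streaming pass. Tokens pushed
/// onto `stack` are drained (last in, first out) right after the token
/// emitted for each input, so they are pushed in reverse order.
fn desugar(input: Vec<(Tok, usize)>) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut stack: Vec<Tok> = Vec::new();
    let mut state = State::Scanning;

    for (tok, offset) in input {
        let (next, emit) = match (state, tok) {
            (State::Scanning, Tok::ShortApply(name)) => {
                stack.push(Tok::Ref(format!("_{name}_")));
                (State::DesugaringParams(offset), Tok::OpenApply)
            }
            // Shorthand param names lack the surrounding `@`s.
            (s @ State::DesugaringParams(_), Tok::ShortParam(name, val)) => {
                stack.push(Tok::CloseParam);
                stack.push(Tok::Text(val));
                stack.push(Tok::BindIdent(format!("@{name}@")));
                (s, Tok::OpenParam)
            }
            // The first child element starts the body: bind `@values@` to
            // a generated name, close the application, and open a new
            // template that absorbs this token and the rest of the body.
            (State::DesugaringParams(at), tok @ Tok::Open(_)) => {
                let gen_id = gen_name(at);
                stack.push(tok);
                stack.push(Tok::BindIdent(gen_id.clone()));
                stack.push(Tok::OpenTpl);
                stack.push(Tok::CloseApply);
                stack.push(Tok::CloseParam);
                stack.push(Tok::Text(gen_id));
                stack.push(Tok::BindIdent("@values@".to_string()));
                (State::DesugaringBody, Tok::OpenParam)
            }
            (State::DesugaringBody, Tok::CloseApply) => (State::Scanning, Tok::CloseTpl),
            (State::DesugaringParams(_), Tok::CloseApply) => (State::Scanning, Tok::CloseApply),
            // Everything else (including nested applications) passes through.
            (s, tok) => (s, tok),
        };
        state = next;
        out.push(emit);
        while let Some(t) = stack.pop() {
            out.push(t);
        }
    }
    out
}
```

For `<t:short bar="baz"><c:sum /></t:short>` at offset `0x10`, the sketch emits the long-form application with `@bar@` and `@values@` bound, closes it, and then opens `___dsgr-10___` to hold `<c:sum />`, in the same order asserted by `desugars_body_into_tpl_with_ref_in_values_param` below.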

This leaves open issues of closure (variable binding) in complex scenarios,
such as in templates that introduce metavariables to be utilized by the
body.  That's never a practice I liked, but we'll see how things evolve.
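The generated template can be expanded at its reference site only because it is closed.  A minimal sketch of that check, assuming a hypothetical `Template` model that records only the metavariables a template binds as params and the ones its body references; `free_metavars` and `is_closed` are illustrative names, not TAMER functions.  A shorthand body that used a metavariable introduced by the applied template (say, a hypothetical `@name@`) would leave it free in the generated template, which is the closure problem noted above.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of a template: the metavariables it binds as
/// params and the metavariable references appearing in its body.
struct Template<'a> {
    params: &'a [&'a str],
    body_refs: &'a [&'a str],
}

/// Metavariables referenced in the body but not bound by the template.
fn free_metavars<'a>(tpl: &Template<'a>) -> BTreeSet<&'a str> {
    tpl.body_refs
        .iter()
        .copied()
        .filter(|r| !tpl.params.contains(r))
        .collect()
}

/// A closed template (no free metavariables) can be expanded wherever it
/// is referenced, without needing bindings from the reference site.
fn is_closed(tpl: &Template) -> bool {
    free_metavars(tpl).is_empty()
}
```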

Further, this does not yet handle nested template applications.

But this saved me a ton of work.  Desugaring is much simpler.

The open question is how the XSLT-based compiler will perform on large
packages with thousands of template applications.  I'll have to see at that
time whether the performance cost is acceptable, or whether we should inline
the body when generating the `xmli` file, producing the same `@values@` tree
as before.  As it stands, the output is _not_ compatible with the current
compiler, which expects `@values@` to be a tree, so the compiler would have
to be modified to accept it.
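If inlining turns out to be necessary, it might look something like this minimal sketch, which operates on an already-built output tree rather than on TAMER's token stream.  `Node` and `inline_values` are hypothetical; the generated `___dsgr-*___` template definitions would also need to be dropped from the output, which this sketch does not do.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal model of `xmli` output nodes.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// `<with-param name="..." value="..." />`
    ParamValue { name: String, value: String },
    /// `<with-param name="...">body</with-param>`
    ParamTree { name: String, body: Vec<Node> },
    /// Any other element and its children.
    Elem { name: String, children: Vec<Node> },
}

/// Replaces each `@values@` param whose value names a generated template
/// with a tree-valued param holding a copy of that template's body.
/// Params that do not name a generated template are left untouched.
fn inline_values(nodes: Vec<Node>, generated: &HashMap<String, Vec<Node>>) -> Vec<Node> {
    nodes
        .into_iter()
        .map(|node| match node {
            Node::ParamValue { name, value }
                if name == "@values@" && generated.contains_key(&value) =>
            {
                let body = generated[&value].clone();
                Node::ParamTree { name, body }
            }
            Node::Elem { name, children } => Node::Elem {
                name,
                children: inline_values(children, generated),
            },
            other => other,
        })
        .collect()
}
```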

DEV-13708
main
Mike Gerwitz 2023-03-23 14:40:40 -04:00
parent 120f5bdfef
commit 975f60bff9
5 changed files with 264 additions and 15 deletions

@ -27,13 +27,16 @@
//! <c:sum />
//! </t:foo>
//!
//! <!-- desugars into -->
//! <!-- the above desugars into the below -->
//!
//! <apply-template name="_foo_">
//! <with-param name="@bar@" value="baz" />
//! <with-param name="@values@">
//! <c:sum />
//! </with-param>
//! <with-param name="@values@" value="___dsgr-01___" />
//! </apply-template>
//!
//! <template name="___dsgr-01___">
//! <c:sum />
//! </template>
//! ```
//!
//! The shorthand syntax makes templates look like another language
@ -49,6 +52,16 @@
//! like language primitives.
//! Shorthand form was added well after the long `apply-template` form.
//!
//! The body of a shorthand template becomes the body of a new template,
//! and its id referenced as the lexical value of the param `@values@`.
//! (This poor name is a historical artifact.)
//! Since the template is closed
//! (has no free metavariables),
//! it will be expanded on reference,
//! inlining its body into the reference site.
//! This is a different and generalized approach to the `param-copy`
//! behavior of the XSLT-based TAME.
//!
//! This shorthand version does not permit metavariables for template or
//! param names,
//! so the long form is still a useful language feature for more
@ -58,13 +71,19 @@
//! `:src/current/include/preproc/template.xsl`.
//! You may need to consult the Git history if this file is no longer
//! available or if the XSLT template was since removed.
//! The XSLT-based compiler did not produce a separate template for
//! `@values@`.
use arrayvec::ArrayVec;
use super::{Nir, NirEntity};
use crate::{
parse::prelude::*,
sym::{GlobalSymbolIntern, GlobalSymbolResolve},
span::Span,
sym::{
st::raw::L_TPLP_VALUES, GlobalSymbolIntern, GlobalSymbolResolve,
SymbolId,
},
};
use std::convert::Infallible;
@ -77,6 +96,15 @@ pub enum TplShortDesugar {
/// passing tokens along in the meantime.
#[default]
Scanning,
/// A shorthand template application associated with the provided
/// [`Span`] was encountered and shorthand params are being desugared.
DesugaringParams(Span),
/// A child element was encountered while desugaring params,
/// indicating a body of the shorthand application that needs
/// desugaring into `@values@`.
DesugaringBody,
}
impl Display for TplShortDesugar {
@ -85,6 +113,12 @@ impl Display for TplShortDesugar {
Self::Scanning => {
write!(f, "awaiting shorthand template application")
}
Self::DesugaringParams(_) => {
write!(f, "desugaring shorthand template application params")
}
Self::DesugaringBody => {
write!(f, "desugaring shorthand template application body")
}
}
}
}
@ -123,19 +157,66 @@ impl ParseState for TplShortDesugar {
stack.push(Ref(SPair(tpl_name, span)));
Transition(Scanning).ok(Open(TplApply(None), span))
Transition(DesugaringParams(span))
.ok(Open(TplApply(None), span))
}
// Shorthand template params' names do not contain the
// surrounding `@`s.
(Scanning, Open(TplParam(Some((name, val))), span)) => {
(
DesugaringParams(ospan),
Open(TplParam(Some((name, val))), span),
) => {
let pname = format!("@{name}@").intern();
// note: reversed (stack)
stack.push(Close(TplParam(None), span));
stack.push(Text(val));
stack.push(BindIdent(SPair(pname, name.span())));
Transition(DesugaringParams(ospan))
.ok(Open(TplParam(None), span))
}
Transition(Scanning).ok(Open(TplParam(None), span))
// A child element while we're desugaring template params
// means that we have reached the body,
// which is to desugar into `@values@`.
// We generate a name for a new template,
// set `@values@` to the name of the template,
// close our active template application,
// and then place the body into that template.
//
// TODO: This does not handle nested template applications.
(DesugaringParams(ospan), tok @ Open(..)) => {
let gen_name = gen_tpl_name_at_offset(ospan);
// The spans are awkward here because we are streaming,
// and so don't have much choice but to use the opening
// span for everything.
// If this ends up being unhelpful for diagnostics,
// we can have AIR do some adjustment through some
// yet-to-be-defined means.
//
// note: reversed (stack)
stack.push(tok);
stack.push(BindIdent(SPair(gen_name, ospan)));
stack.push(Open(Tpl, ospan));
// Application ends here,
// and the new template (above) will absorb both this
// token `tok` and all tokens that come after.
stack.push(Close(TplApply(None), ospan));
stack.push(Close(TplParam(None), ospan));
stack.push(Text(SPair(gen_name, ospan)));
stack.push(BindIdent(SPair(L_TPLP_VALUES, ospan)));
Transition(DesugaringBody).ok(Open(TplParam(None), ospan))
}
(DesugaringBody, Close(TplApply(_), span)) => {
Transition(Scanning).ok(Close(Tpl, span))
}
(DesugaringParams(_), tok @ Close(TplApply(_), _)) => {
Transition(Scanning).ok(tok)
}
// Any tokens that we don't recognize will be passed on unchanged.
@ -148,7 +229,25 @@ impl ParseState for TplShortDesugar {
}
}
type Stack = ArrayVec<Nir, 3>;
type Stack = ArrayVec<Nir, 7>;
/// Generate a deterministic template identifier name that is unique
/// relative to the offset in the source context (file) of the given
/// [`Span`].
///
/// Hygiene is not a concern since identifiers cannot be redeclared,
/// so conflicts with manually-created identifiers will result in a
/// compilation error
/// (albeit a cryptic one);
/// the hope is that the informally-compiler-reserved `___` convention
/// mitigates that unlikely occurrence.
/// Consequently,
/// we _must_ intern to ensure that error can occur
/// (we cannot use [`GlobalSymbolIntern::clone_uninterned`]).
#[inline]
fn gen_tpl_name_at_offset(span: Span) -> SymbolId {
format!("___dsgr-{:x}___", span.offset()).intern()
}
#[cfg(test)]
mod test;
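The interning requirement in the doc comment above can be illustrated with a toy interner, assuming (as that comment implies) that an uninterned symbol never compares equal to an interned one.  `Interner`, `SymbolId`, `declare`, and `gen_name` here are hypothetical stand-ins for TAMER's symbol machinery; `gen_name` reuses the `___dsgr-{:x}___` format with a raw offset instead of a `Span`, so offset `0xc23` yields the `___dsgr-c23___` name seen in the expected output.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical symbol id; TAMER has its own `SymbolId`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct SymbolId(u32);

/// A toy interner: interning the same string twice yields the same id,
/// while an uninterned clone always yields a fresh, distinct id.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, SymbolId>,
    next: u32,
}

impl Interner {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.fresh();
        self.ids.insert(s.to_string(), id);
        id
    }

    fn clone_uninterned(&mut self, _s: &str) -> SymbolId {
        self.fresh()
    }

    fn fresh(&mut self) -> SymbolId {
        self.next += 1;
        SymbolId(self.next)
    }
}

/// Same format string as `gen_tpl_name_at_offset`, but with a raw offset.
fn gen_name(offset: u32) -> String {
    format!("___dsgr-{:x}___", offset)
}

/// Declares an identifier in `scope`, returning `Err` on redeclaration.
fn declare(scope: &mut HashSet<SymbolId>, id: SymbolId) -> Result<(), SymbolId> {
    if scope.insert(id) { Ok(()) } else { Err(id) }
}
```

Interning the generated name makes a clash with a hand-written `___dsgr-c23___` template a redeclaration error; an uninterned clone would receive a fresh id and slip past the check.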

@ -94,6 +94,71 @@ fn desugars_unary() {
);
}
// Body of shorthand is desugared into `@values@` param.
#[test]
fn desugars_body_into_tpl_with_ref_in_values_param() {
// Shorthand converts `t:tpl-name` into `_tpl-name_`.
let qname = ("t", "short").unwrap_into();
let name = SPair("_short_".into(), S1);
#[rustfmt::skip]
let toks = vec![
// <t:qname>
Open(TplApply(Some(qname)), S1),
// Body to desugar into own template (@values@).
Open(Sum, S2),
Open(Product, S3),
Close(Product, S4),
Close(Sum, S5),
// Body can contain siblings.
Open(Product, S6),
Close(Product, S7),
// </t:qname>
Close(TplApply(None), S8),
];
// The name of the generated template.
// This test is a bit too friendly with implementation details,
// but it does allow us to be perfectly precise in the output
// assertion.
let gen_name = gen_tpl_name_at_offset(S1);
#[rustfmt::skip]
assert_eq!(
Ok(vec![
O(Open(TplApply(None), S1)),
O(Ref(name)),
// @values@ remains lexical by referencing the name of a
// template we're about to generate.
O(Open(TplParam(None), S1)),
O(BindIdent(SPair(L_TPLP_VALUES, S1))),
O(Text(SPair(gen_name, S1))), //:-.
O(Close(TplParam(None), S1)), // |
O(Close(TplApply(None), S1)), // |
// |
// Generate a template to hold the // |
// body of `@values@`. // |
// It is closed and so expandable. // |
O(Open(Tpl, S1)), // /
O(BindIdent(SPair(gen_name, S1))), //<`
// And here we have the body of the above
// shorthand application.
O(Open(Sum, S2)),
O(Open(Product, S3)),
O(Close(Product, S4)),
O(Close(Sum, S5)),
O(Open(Product, S6)),
O(Close(Product, S7)),
O(Close(Tpl, S8)),
]),
Sut::parse(toks.into_iter()).collect(),
);
}
// Don't parse what we desugar into!
#[test]
fn does_not_desugar_long_form() {

@ -716,6 +716,8 @@ pub mod st {
L_YIELD: cid "yield",
L_YIELDS: cid "yields",
L_TPLP_VALUES: str "@values@",
CC_ANY_OF: cid "anyOf",
L_MAP_UUUHEAD: str ":map:___head",

@ -80,5 +80,50 @@
<with-param name="@bar@" value="baz" />
<with-param name="@baz@" value="quux" />
</apply-template>
<template name="_short-hand-nullary-body_" />
<apply-template name="_short-hand-nullary-body_">
<with-param name="@values@" value="___dsgr-c23___" />
</apply-template>
<template name="___dsgr-c23___">
<c:product>
<c:sum />
</c:product>
</template>
<template name="_short-hand-nary-body_" />
<apply-template name="_short-hand-nary-body_">
<with-param name="@bar@" value="baz" />
<with-param name="@baz@" value="quux" />
<with-param name="@values@" value="___dsgr-cc2___" />
</apply-template>
<template name="___dsgr-cc2___">
<c:sum>
<c:product />
</c:sum>
</template>
</package>

@ -80,19 +80,57 @@
<!-- TODO
Shorthand template bodies desugar into the param `@values@`.
Unlike in the XSLT-based TAME,
metavariables (template parameters) are purely lexical
and do not contain trees,
simplifying their implementation.
Desugaring instead takes advantage of existing features by generating a
_new_ closed template with the body from the shorthand application.
Since closed templates can be applied by referencing them as a value,
which expands them in place,
this ends up having the same effect as a `param-copy`.
For now,
the expected output asserts on this behavior,
but if this has a significantly negative impact on performance of the
XSLT-based compiler,
then it'll have to inline during desugaring.
This asserts verbatim on the output,
which uses a generated id based on the span.
This is fragile,
and it may break often;
just take the hex span from the test failure in that case.
<template name="_short-hand-nullary-body_" />
<t:short-hand-nullary-body>
<c:sum />
<c:product>
<c:sum />
</c:product>
</t:short-hand-nullary-body>
<template name="_short-hand-nary-body_" />
<t:short-hand-nary-body bar="baz" baz="quux">
<c:sum>
<c:product />
</c:sum>
</t:short-hand-nary-body>
<!-- TODO
<t:short-hand-nullary-inner>
<t:inner-short />
</t:short-hand-nullary-inner>
<t:short-hand foo="bar">
<c:sum />
</t:short-hand>
<t:short-hand foo="bar">
<t:inner-short />
</t:short-hand>