employer/tame - tame - Mike Gerwitz's Forge


Author	SHA1	Message	Date
Mike Gerwitz	22a9596cf4	tamer: xir::parse::ele: Hoist whitespace/comment handling to superstate All child parsers do the same thing, so this simplifies things. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	f8a9e952e5	tamer: xir::parse::ele: Correct handling of sum dead state post-recovery Along with this change we also had to change how we handle dead states in the superstate. So there were two problems here: 1. Sum states were not yielding a dead state after recovery, which meant that parsing was unable to continue (we still have a `todo!`); and 2. The superstate considered it an error when there was nothing left on the stack, because I assumed that ought not happen. Regarding #2---it _shouldn't_ happen, _unless_ we have extra input after we have completed parsing. Which happens to be the case for this test case, but more importantly, we shouldn't be panicking with errors about TAMER bugs if somebody puts extra input after a closing root tag in a source file. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	b95ec5a9d8	tamer: xir::parse::ele: Adjust diagnostic display of expected element list This does two things: 1. Places the expected list on a separate help line as a footnote where it'll be a bit more tolerable when it inevitably overflows the terminal width in certain contexts (we may wrap in the future); and 2. Removes angled brackets from the element names so that they (a) better correspond with the span which highlights only the element name and (b) do not imply that the elements take no attributes. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	67ee914505	tamer: xir::parse::ele: Store matching QName on NS match When we match a QName against a namespace, we ought to store the matching QName to use (a) in error messages and (b) to make available as a binding. The former is necessary for sensible errors (rather than saying that it's e.g. expecting a closing `t:*`) and the latter is necessary for e.g. getting the template name out of `t:foo`. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	8cb03d8d16	tamer: xir::parse::ele: Initial namespace prefix matching support This allows matching on a namespace prefix by providing a `Prefix` instead of a `QName`. This works, but is missing a couple notable things (and possibly more): 1. Tracking the QName that is _actually_ matched so that it can be used in messages stating what the expected closing tag is; and 2. Making that QName available via a binding. This will be used to match on `t:*` in NIR. If you're wondering how attribute parsing is supposed to work with that (of course you're wondering that, random person reading this)---that'll have to work differently for those matches, since template shorthand application contains argument names as attributes. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	f9fe4aa13b	tamer: xir::st: Static namespace prefixes (c and t) In particular, `t:*` will be recognized by NIR for short-hand template application. These will be utilized in an upcoming commit. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	88fa0688fa	tamer: xir::parse::ele: Abstract node matching This introduces `NodeMatcher`, with the intent of introducing wildcard QName matches for e.g. `t:*` nodes. It's not yet clear if I'll expand this to support text nodes yet, or if I'll convert text nodes into elements to re-use the existing system (which I had initially planned on doing, but didn't because of the work and expense (token expansion) involved in the conversion). DEV-7145	2022-08-12 00:47:13 -04:00
Mike Gerwitz	7b9bc9e108	tamer: xir::parse::ele: Ignore Text nodes for now I need to move on, and there are (a) a couple different ways to proceed that I want to mull over and (b) upcoming changes that may influence my decision one way or another. DEV-7145	2022-08-12 00:47:12 -04:00
Mike Gerwitz	4aaf91a9e7	tamer: xir::parse::ele: Un-nest child parser errors This will utilize the superstate's error object in place of nested errors, which was the result of the previous composition-based delegation. As you can see, all we had to do was remove the special handling of these errors; the existing delegation setup continues to handle the types properly with no change. The composition continues to work for `*Attr_`. The alternative was to box inner errors, since they're far from the hot code path, but that's clearly unnecessary. To be clear: this is necessary to allow for recursive grammars in `ele_parse` without creating recursive data structures in Rust. DEV-7145	2022-08-10 11:46:54 -04:00
Mike Gerwitz	adf7baf115	tamer: xir::parse::ele: Handle comments like whitespace Comments ought not have any more semantic meaning than whitespace. Other languages may have conventions that allow for various types of things in comments, like annotations, but those are symptoms of language limitations---we control the source language here. DEV-7145	2022-08-10 11:46:54 -04:00
Mike Gerwitz	15e04d63e2	tamer: xir::parse::ele: Transition trampoline This properly integrates the trampoline into `ele_parse!`. The implementation leaves some TODOs, most notably broken mixed text handling since we can no longer intercept those tokens before passing to the child. That is temporarily marked as incomplete; see a future commit. The introduced test `ParseState`s were to help me reason about the system intuitively as I struggled to track down some type errors in the monstrosity that is `ele_parse!`. It will fail to compile if those invariants are violated. (In the end, the problems were pretty simple to resolve, and the struggle was the type system doing its job in telling me that I needed to step back and try to reason about the problem again until it was intuitive.) This keeps around the NT states for now, which are quickly used to transition to the next NT state, like a couple of bounces on a trampoline: NT -> Dead -> Parent -> Next NT This could be optimized in the future, if it's worth doing. This also makes no attempt to implement tail calls; that would have to come after fixing mixed content and really isn't worth the added complexity now. I (desperately) need to move on, and still have a bunch of cleanup to do. I had hoped for a smaller commit, but that was too difficult to do with all the types involved. DEV-7145	2022-08-10 11:46:45 -04:00
Mike Gerwitz	233fa7de6a	tamer: diagnose::panic: New module This change introduces diagnostic messages for panics. The intent is to be able to use panics in situations where it is either not possible to or not worth the time to recover from errors and ensure a consistent/sensible system state. In those situations, we still ought to be able to provide the user with useful information to attempt to get unstuck, since the error is surely in response to some particular input, and maybe that input can be tweaked to work around the problem. Ideally, invalid states are avoided using the type system and statically verified at compile-time. But this is not always possible, or in some cases may be way more effort or cause way more code complexity than is worth, given the unlikelihood of the error occurring. With that said, it's been interesting, over the past >10y that TAME has existed, seeing how unlikely errors do sometimes pop up many years after they were written. It's also interesting to have my intuition of what is "unlikely" challenged, but hopefully it holds generally. DEV-7145	2022-08-09 15:20:37 -04:00
Mike Gerwitz	454b7a163f	tamer: xir::parse::ele: Move repeat configuration out of Context I had previously used `Context` to hold the parser configuration for repetition, since that was the easier option. But I now want to utilize the `Context` for a stack for the superstate trampoline, and I don't want to have to deal with the awkwardness of the repetition in doing so, since it requires that the configuration be created during delegation, rather than just being passed through to all child parsers. This adds to a mess that needs cleaning up, but I'll do that after everything is working. DEV-7145	2022-08-08 15:23:55 -04:00
Mike Gerwitz	6bc872eb38	tamer: xir::parse::ele: Generate superstate And here's the thing that I've been dreading, partly because of the `macro_rules` issues involved. But, it's not too terrible. This module was already large and complex, and this just adds to it---it's in need of refactoring, but I want to be sure it's fully working and capable of handling NIR before I go spending time refactoring only to undo it. _This does not yet use trampolining in place of the call stack._ That'll come next; I just wanted to get the macro updated, the superstate generated, and tests passing. This does convert into the superstate (`ParseState::Super`), but then converts back to the original `ParseState` for BC with the existing composition-based delegation. That will go away and will then use the equivalent of CPS, using the superstate+`Parser` as a trampoline. This will require an explicit stack via `Context`, like XIRF. And it will allow for tail calls, with respect to parser delegation, if I decide it's worth doing. The root problem is that source XML requires recursive parsing (for expressions and statements like `<section>`), which results in recursive data structures (`ParseState` enum variants). Resolving this with boxing is not appropriate, because that puts heap indirection in an extremely hot code path, and may also inhibit the aggressive optimizations that I need Rust to perform to optimize away the majority of the lowering pipeline. Once this is sorted out, this should be the last big thing for the parser. This unfortunately has been a nagging and looming issue for months, that I was hoping to avoid, and in retrospect that was naive. DEV-7145	2022-08-08 15:23:55 -04:00
Mike Gerwitz	53a689741b	tamer: parse::state::ParseState::Super: Superstate concept I'm disappointed that I keep having to implement features that I had hoped to avoid implementing. This introduces a "superstate" feature, which is intended really just to be a sum type that is able to delegate to stitched `ParseState`s. This then allows a `ParseState` to transition directly to another `ParseState` and have the parent `ParseState` handle the delegation---a trampoline. This issue naturally arises out of the recursive nature of parsing a TAME XML document, where certain statements can be nested (like `<section>`), and where expressions can be nested. I had gotten away with composition-based delegation for now because `xmlo` headers do not have such nesting. The composition-based approach falls flat for recursive structures. The typical naive solution is boxing, which I cannot do, because not only is this on an extremely hot code path, but I require that Rust be able to deeply introspect and optimize away the lowering pipeline as much as possible. Many months ago, I figured that such a solution would require a trampoline, as it typically does in stack-based languages, but I was hoping to avoid it. Well, no longer; let's just get on with it. This intends to implement trampolining in a `ParseState` that serves as that sum type, rather than introducing it as yet another feature to `Parser`; the latter would provide a more convenient API, but it would continue to bloat `Parser` itself. Right now, only the element parser generator will require use of this, so if it's needed beyond that, then I'll debate whether it's worth providing a better abstraction. For now, the intent will be to use the `Context` to store a stack that it can pop off of to restore the previous `ParseState` before delegation. DEV-7145	2022-08-08 15:23:54 -04:00
Mike Gerwitz	7a5f731cac	tamer: tameld: XIRF nesting 64=>4 Since we'll never be reading past the header, this is all that is needed. If in the future this is violated, XIRF will cause a nice diagnostic error displaying precisely what opening tag caused the increased level of nesting, which will aid in debugging and allow us to determine if it ought to be increased. Here's an example, if I set the max to `3`: error: maximum XML element nesting depth of `3` exceeded --> /home/.../foo.xmlo:261:10 \| 261 \| <preproc:sym-ref name=":_vproduct:vector_a"/> \| ^^^^^^^^^^^^^^^^ error: this opening tag increases the level of nesting past the limit of 3 Of course, the longer-term goal is to do away with `xmlo` entirely. This had no (perceivable via `/usr/bin/time -v`, at least) impact on memory or CPU time. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	77efefe680	tamer: xir::attr::parse: Better parser state descriptions The attribute name was neither quoted nor `@`-prefixed. (I noticed this in the traces.) DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	2d117a4864	tamer: xir::parse::ele: Mixed content parsing "Mixed content" is the XML term representing element nodes mixed with text nodes. For example, `foo <strong>bar</strong> baz` is mixed. TAME supports text nodes as documentation, intended to be in a literate style but never fully realized. In any case, we need to permit them, and I wanted to do more than just ignore the nodes. This takes a different approach than typical parser delegation---it has the parent parser _preempt_ the child by intercepting text before delegation takes place, rather than having the child reject the token (or possibly interpret it itself!) and have to handle an error or dead state. And while this makes it more confusing in terms of state machine stitching, it does make sense, in the sense that the parent parser is really what "owns" the text node---the parser is delegating _element_ parsing only, and asserts authority when necessary to take back control where it shouldn't be delegated. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	8779abe2bb	tamer: xir::flat: Expose depth for all node-related tokens Previously a `Depth` was provided only for `Open` and `Close`. This depth information, for example, will be used by NIR to quickly determine whether a given parser ought to assert ownership of a text/comment token rather than delegating it. This involved modifying a number of test cases, but it's worth repeating in these commits that this is intentional---I've been bit in the past using `..` in contexts where I really do want to know if variant fields change so that I can consider whether and how that change may affect the code utilizing that variant. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	b3c0bdc786	tamer: xir::parse::ele: Ignore whitespace around elements Recent changes regarding whitespace were all to support this change (though it was also needed for XIRF, pre- and post-root). Now I'll have to contend with how I want to handle text nodes in various circumstances, in terms of `ele_parse!`. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	8f3301431c	tamer: span::dummy: New module to hold DUMMY_SPAN and derivatives Various DUMMY_SPAN-derived spans are used by many test cases, so this finally extracts them---something I've been meaning to do for some time. This also places DUMMY_SPAN behind a `cfg(test)` directive to ensure that it is _only_ used in tests; UNKNOWN_SPAN should be used when a span is actually unknown, which may also be the case during development. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	0edb21429d	tamer: parse::error: Describe unexpected token of input When Parser has an unhandled dead state and fails due to an unexpected token of input, we should display what we interpreted that token as. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	18803ea576	tamer: xir: Format tokens without tt quotes Whether or not quoting is appropriate depends on context, and that parent context is already performing the quoting. For example: error: expected `</rater>`, but found `<import>` --> /home/[...]/foo.xml:2:1 \| 2 \| <rater xmlns="http://www.lovullo.com/rater" \| ------ note: element starts here --> /home/[...]/foo.xml:7:3 \| 7 \| <import package="/rater/core/base" /> \| ^^^^^^^ error: expected `</rater>` In these cases (obviously I'm still working on the parser, since this is nonsense), the parser is responsible for quoting the token "<import>". DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	8778976018	tamer: xir::flat: Ignore whitespace both before and after root DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	4f2b27f944	tamer: xir: Attribute error formatting/typo fixes There were two problem errors: one showing "element element" and one showing the value along with the name of the attribute. The change for `<Attr as Display>::fmt` is debatable. I'm going to do this for now (only show `@name`) and adjust later if necessary. I'll need to go use `crate::fmt` consistently in previously-existing format strings at some point, too. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	41b41e02c1	tamer: Xirf::Text refinement This teaches XIRF to optionally refine Text into RefinedText, which determines whether the given SymbolId represents entirely whitespace. This is something I've been putting off for some time, but now that I'm parsing source language for NIR, it is necessary, in that we can only permit whitespace Text nodes in certain contexts. The idea is to capture the most common whitespace as preinterned symbols. Note that this heuristic ought to be determined from scanning a codebase, which I haven't done yet; this is just an initial list. The fallback is to look up the string associated with the SymbolId and perform a linear scan, aborting on the first non-whitespace character. This combination of checks should be sufficiently performant for now considering that this is only being run on source files, which really are not all that large. (They become large when template-expanded.) I'll optimize further if I notice it show up during profiling. This also frees XIR itself from being concerned by Whitespace. Initially I had used quick-xml's whitespace trimming, but it messed up my span calculations, and those were a pain in the ass to implement to begin with, since I had to resort to pointer arithmetic. I'd rather avoid tweaking it. tameld will not check for whitespace, since it's not important---xmlo files, if malformed, are the fault of the compiler; we can ignore text nodes except in the context of code fragments, where they are never whitespace (unless that's also a compiler bug). Onward and yonward. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	b38c16fd08	tamer: parse::trace: Generalize reason for trace output The trace outputs a note in the footer indicating _why_ it's being output, so that the reader understands both where the potentially-unexpected behavior originates from and so they know (in the case of the feature flag) how to inhibit it. That information originally lived in `Parser`, where the `cfg` directive to enable it lives, but it was moved into the abstraction. This corrects that. DEV-7145	2022-08-01 15:01:12 -04:00
Mike Gerwitz	17327f1b64	tamer: parse::trace: Extract tracing into new module This has gotten large and was cluttering `feed_tok`. This also provides the ability to more easily expand into other types of tracing in the future. DEV-7145	2022-07-26 09:29:17 -04:00
Mike Gerwitz	8f25c9ae0a	tamer: parse::parser: Include object in parser trace This information is likely redundant in a lowering pipeline, but is more useful outside of such a pipeline. It's also more clear. `Object` does not implement `Display`, though, because that's too burdensome for how it's currently used. Many `Object`s are also `Token`s though and, if fed to another `Parser` for lowering, it'll get `Display::fmt`'d. DEV-7145	2022-07-26 09:28:39 -04:00
Mike Gerwitz	4b5e51b0f0	tamer: parse::parser::Parser::feed_tok: cfg note precedence Rust was warning that `cfg` was unused if both `test` and `parser-trace-stderr`. This both allows that and adjusts the precedence to make more sense for tests. DEV-7145	2022-07-26 09:28:39 -04:00
Mike Gerwitz	c3dfcc565c	tamer: parse::parser::Parser: Include errors in parse trace Because of recovery, the trace otherwise paints a really confusing-looking picture when given unexpected input. This is large enough now that it really ought to be extracted from `feed_tok`, but I'll wait to see how this evolves further. I considered adding color too, but it's not yet clear to me that the visual noise will be all that helpful. DEV-7145	2022-07-26 09:28:37 -04:00
Mike Gerwitz	422f3d9c0c	tamer: New parser-trace-stderr feature flag This flag allows toggling the parser trace that was previously only available to tests. Unfortunately, at the time of writing, Cargo cannot enable flags in profiles, so I have to check for either `test` or this flag being set to enable relevant features. This trace is useful as I start to run the parser against existing code written in TAME so that our existing systems can help to guide my development. Unlike the current tests, it also allows seeing real-world data as part of the lowering pipeline, where multiple `Parser`s are in play. Having this feature flag also makes this feature more easily discoverable to those wishing to observe how the lowering pipeline works. DEV-7145	2022-07-21 22:10:08 -04:00
Mike Gerwitz	de35cc37fd	tamer: xir::writer::XmlWriter: Do not take Token ownership impl for `&Token` instead of Token; the writer is just copying data into the destination stream anyway. This will allow us to continue writing the token while also using it for further processing, like `tee`. DEV-7145	2022-07-21 15:29:55 -04:00
Mike Gerwitz	0504788a16	tamer: xir::parse::ele: Visibility specifier We need to be able to export generated identifiers. Trying to figure out a syntax for this was a bit tricky considering how much is generated, so I just settled on something that's reasonably clear and easy to parse with `macro_rules!`. I had intended to just make everything public by default and encapsulate using private modules, but that then required making everything else that it uses public (e.g. error and token objects), which would have been a bizarre thing to do in e.g. test cases. DEV-7145	2022-07-21 14:56:43 -04:00
Mike Gerwitz	acced76788	tamer: xir::parse::ele: Expand types for external expansion for sum NT Like a previous commit, this corrects the types for sum NTs so that they properly resolve in contexts external to xir::parse. DEV-7145	2022-07-21 13:44:30 -04:00
Mike Gerwitz	992c000b68	tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError This integrates the previous ValueError for `attr_parse!` into `ele_parse!`. DEV-7145	2022-07-21 09:23:34 -04:00
Mike Gerwitz	3a764d111e	tamer: xir::parse::attr: Fallible value parsing Values can be parsed using `TryFrom<Attr>`. Previously only `From<Attr>` was supported, which could not fail. This is critical for parsing values into types, which will wrap `SymbolId` to provide data assurances. DEV-7145	2022-07-21 09:23:11 -04:00
Mike Gerwitz	184ff6bdcc	tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module The tests had certain things in scope, but now that I'm trying to use it outside of those modules, some fixes are needed. This is admittedly a sloppy commit, with a number of miscellaneous fixes. I didn't bother separating it more because most of them are type fixes, and the `From<Attr>` stuff is going to have to change into, likely, `TryFrom<Attr>` so that parse failures can occur when attributes do not match certain patterns. DEV-7145	2022-07-20 15:40:28 -04:00
Mike Gerwitz	e517e15a29	tamer: parse::Token: Swap trait method order This just places `ir_name` first in the trait definition so that it'll be inserted in that same order when using LSP. DEV-7145	2022-07-20 13:58:44 -04:00
Mike Gerwitz	c856fd72d9	tamer: xir::parse::ele: Diagnostic output The only additional information needed was opening spans so that we can provide useful information regarding closing tags. This uses a generic Span in place of {Open,Close}Span because the latter wasn't necessary, but more descriptive types would be nice; it may be beneficial later on to introduce newtypes for each of the span generated by {Open,Close}Span. DEV-7145	2022-07-20 12:17:15 -04:00
Mike Gerwitz	ce765d3b56	tamer: xir::parse::attr: Error and recovery on duplicate attr This was a TODO for the attribute parser generator. The first attribute will be kept and later ones will be ignored, producing an error. Recovery permits further attribute parsing having ignored the duplicate. DEV-7145	2022-07-20 12:16:13 -04:00
Mike Gerwitz	21dfff0110	tamer: xir::parse::attr::test: Extract into own file It's not going to be getting any smaller. DEV-7145	2022-07-20 10:02:41 -04:00
Mike Gerwitz	1ec9c963fd	tamer: xir::parse::ele: Nonterminal repetition (Kleene star) This allows an element to be repeated by the parent NT. The easiest way I saw to implement this for now was to abuse the Context to provide a runtime configuration that would allow the state machine to reset after it has completed parsing. This also influences error recovery, in that if we're expecting zero or more of something, we cannot provide an error for an unexpected name, and instead must emit a dead state so that the caller can determine what to do. DEV-7145	2022-07-19 16:14:12 -04:00
Mike Gerwitz	e73c223a55	tamer: parser::Parser: cfg(test) tracing This produces useful parse traces that are output as part of a failing test case. The parser generator macros can be a bit confusing to deal with when things go wrong, so this helps to clarify matters. This is _not_ intended to be machine-readable, but it does show that it would be possible to generate machine-readable output to visualize the entire lowering pipeline. Perhaps something for the future. I left these inline in Parser::feed_tok because they help to elucidate what is going on, just by reading what the trace would output---that is, it helps to make the method more self-documenting, albeit a tad bit more verbose. But with that said, it should probably be extracted at some point; I don't want this to set a precedent where composition is feasible. Here's an example from test cases: [Parser::feed_tok] (input IR: XIRF) \| ==> Parser before tok is parsing attributes for `package`. \| \| Attrs_(SutAttrsState_ { ___ctx: (QName(None, LocalPart(NCName(SymbolId(46 "package")))), OpenSpan(Span { len: 0, offset: 0, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10)), ___done: false }) \| \| ==> XIRF tok: `<unexpected>` \| \| Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) \| \| ==> Parser after tok is expecting opening tag `<classify>`. \| \| ChildA(Expecting_) \| \| Lookahead: Some(Lookahead(Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)))) = note: this trace was output as a debugging aid because `cfg(test)`. [Parser::feed_tok] (input IR: XIRF) \| ==> Parser before tok is expecting opening tag `<classify>`. \| \| ChildA(Expecting_) \| \| ==> XIRF tok: `<unexpected>` \| \| Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) \| \| ==> Parser after tok is attempting to recover by ignoring element with unexpected name `unexpected` (expected `classify`). \| \| ChildA(RecoverEleIgnore_(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))) \| \| Lookahead: None = note: this trace was output as a debugging aid because `cfg(test)`. DEV-7145	2022-07-19 14:44:18 -04:00
Mike Gerwitz	f462c7daec	tamer: xir::parse::attr: Display: element name This resolves a TODO by including the name of the element whose attributes are currently being parsed. This also frees a parent from having to provide additional context, allowing Display to be fully delegated when stitching. DEV-7145	2022-07-18 14:43:29 -04:00
Mike Gerwitz	2f4c20dac8	tamer: xir::parse::ele: Remaining Display::fmt for nonterminals The following commit (test tracing) requires non-panicking `Display` and `Debug` values. DEV-7145	2022-07-18 14:31:42 -04:00
Mike Gerwitz	cf2cd882ca	tamer: xir::parse::ele: Introduce sum nonterminals This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the nonterminal and `A ... Z` are the inner nonterminals---it produces a parser that provides a choice between a set of nonterminals. This is implemented efficiently by understanding the QName that is accepted by each of the inner nonterminals and delegating that token immediately to the appropriate parser. This is a benefit of using a parser generator macro over parser combinators---we do not need to implement backtracking by letting inner parsers fail, because we know ahead of time exactly what parser we need. This _does not_ verify that each of the inner parsers accept a unique QName; maybe at a later time I can figure out something for that. However, because this compiles into a `match`, there is no ambiguity---like a PEG parser, there is precedence in the face of an ambiguous token, and the first one wins. Consequently, tests would surely fail, since the latter wouldn't be able to be parsed. This also demonstrates how we can have good error suggestions for this parsing framework: because the inner nonterminals and their QNames are known at compile time, error messages simply generate a list of QNames that are expected. The error recovery strategy is the same as previously noted, and subject to the same concerns, though it may be more appropriate here: it is desirable for the inner parser to fail rather than retrying, so that the sum parser is able to fail and, once the Kleene operator is introduced, retry on another potential element. But again, that recovery strategy may happen to work in some cases, but will fail miserably in others (e.g. placing an unknown element at the head of a block that expects a sequence of elements would potentially fail the entire block rather than just the invalid one). But more to come on that later; it's not critical at this point. I need to get parsing completed for TAME's input language. DEV-7145	2022-07-14 15:12:57 -04:00
Mike Gerwitz	1fdfc0aa4d	tamer: xir::parse::ele: Introduce open/close span bindings This adds the ability to bind identifiers to represent `OpenSpan` and `CloseSpan`, available to the `@` and `/` maps. Since identifiers in TAME originate from attributes, this may not get a whole lot of use, but it's important to be available. There is some awkwardness in that the opening span appears to be scoped to the entire nonterminal, but it's actually only available in the `@` mapping. I'll change this if it's actually needed; this keeps things simple for now. DEV-7145	2022-07-13 23:42:51 -04:00
Mike Gerwitz	cceb8c7fb9	tamer: xir::parse::ele: Initial Close mapping support Since the parsers produce streaming IRs, we need to be able to emit tokens representing closing delimiters, where they are important. This notably doesn't use spans; I'll add those next, since they're also needed for the previous work. DEV-7145	2022-07-13 15:02:46 -04:00
Mike Gerwitz	c30c0e268d	tamer: xir::parse::ele::test: TODO regarding recovery strategy The comment explains the issue. I don't think the strategy is going to be a desirable one, but I want to move on and observe in retrospect how it ought to be handled. The important part right now is that recovery is accounted for and possible, which was a long-standing concern. DEV-7145	2022-07-13 14:25:25 -04:00
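
The superstate commits listed above (53a689741b, 6bc872eb38, 15e04d63e2) describe replacing recursive `ParseState` nesting with a sum type plus an explicit stack held in the `Context`, so that a child parser's completion pops the parent state back off the stack. A minimal sketch of that idea follows. It is not TAMER's code: `Tok`, `Super`, `step`, and `parse` are hypothetical names for a toy grammar of nested `<section>` elements, and the real `ele_parse!` output is considerably more involved.

```rust
// Hypothetical toy types; none of these are TAMER's actual definitions.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Open(&'static str),
    Close(&'static str),
}

/// Sum type over every parser state (the "superstate" in this sketch).
/// It is flat: no variant contains another parser, so no boxing is needed.
#[derive(Debug, Clone, PartialEq)]
enum Super {
    /// Expecting the root `<section>` to open.
    Root,
    /// Inside a `<section>`, which may itself contain sections.
    Section,
    /// The root element has been closed.
    Done,
}

/// Feed one token, using `stack` in place of the call stack.
fn step(st: Super, tok: &Tok, stack: &mut Vec<Super>) -> Result<Super, String> {
    match (st, tok) {
        // Delegation: record the state to resume, then bounce to the child.
        (Super::Root, Tok::Open("section")) => {
            stack.push(Super::Done);
            Ok(Super::Section)
        }
        (Super::Section, Tok::Open("section")) => {
            stack.push(Super::Section);
            Ok(Super::Section)
        }
        // Child finished: pop the saved parent state and resume it.
        (Super::Section, Tok::Close("section")) => stack
            .pop()
            .ok_or_else(|| "nothing left on the stack".to_string()),
        // Anything else is unexpected in this toy grammar.
        (st, tok) => Err(format!("unexpected {tok:?} in state {st:?}")),
    }
}

/// Drive the trampoline over a token slice, returning the final state and
/// the deepest the explicit stack grew.
fn parse(toks: &[Tok]) -> Result<(Super, usize), String> {
    let mut stack = Vec::new();
    let mut st = Super::Root;
    let mut max_depth = 0;
    for tok in toks {
        st = step(st, tok, &mut stack)?;
        max_depth = max_depth.max(stack.len());
    }
    Ok((st, max_depth))
}
```

Feeding `<section><section></section></section>` to `parse` ends in `Super::Done` with a maximum stack depth of 2. Extra input after the root closes is reported as an ordinary error in this sketch rather than a panic.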


533 Commits (22a9596cf436c4e63290c6468cf9f815b1081a03)
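
Commit 3a764d111e in the log above notes that attribute values can be parsed with `TryFrom<Attr>` where previously only the infallible `From<Attr>` was supported, so that newtypes can reject values that do not fit. The sketch below illustrates that pattern only. `Attr`, `PkgPath`, and `PkgPathError` are hypothetical stand-ins with plain `String` fields, not TAMER's types (which wrap `SymbolId`), and the "must be an absolute path" rule is an invented example constraint.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a XIR attribute.
#[derive(Debug, Clone, PartialEq)]
struct Attr {
    name: String,
    value: String,
}

/// Hypothetical newtype: a package path known to be non-empty and absolute.
#[derive(Debug, PartialEq)]
struct PkgPath(String);

/// Why an attribute value could not become a `PkgPath`.
#[derive(Debug, PartialEq)]
enum PkgPathError {
    Empty { attr: String },
    NotAbsolute { attr: String, value: String },
}

impl TryFrom<Attr> for PkgPath {
    type Error = PkgPathError;

    // Validation happens once, at parse time; holders of a `PkgPath`
    // need not re-check the value.
    fn try_from(attr: Attr) -> Result<Self, Self::Error> {
        if attr.value.is_empty() {
            Err(PkgPathError::Empty { attr: attr.name })
        } else if !attr.value.starts_with('/') {
            Err(PkgPathError::NotAbsolute {
                attr: attr.name,
                value: attr.value,
            })
        } else {
            Ok(PkgPath(attr.value))
        }
    }
}
```

Because the error type carries the attribute name, a caller can report which attribute was rejected without holding on to the original `Attr`.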