employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Mike Gerwitz	1ec9c963fd	tamer: xir::parse::ele: Nonterminal repetition (Kleene star) This allows an element to be repeated by the parent NT. The easiest way I saw to implement this for now was to abuse the Context to provide a runtime configuration that would allow the state machine to reset after it has completed parsing. This also influences error recovery, in that if we're expecting zero or more of something, we cannot provide an error for an unexpected name, and instead must emit a dead state so that the caller can determine what to do. DEV-7145	2022-07-19 16:14:12 -04:00
Mike Gerwitz	e73c223a55	tamer: parser::Parser: cfg(test) tracing This produces useful parse traces that are output as part of a failing test case. The parser generator macros can be a bit confusing to deal with when things go wrong, so this helps to clarify matters. This is _not_ intended to be machine-readable, but it does show that it would be possible to generate machine-readable output to visualize the entire lowering pipeline. Perhaps something for the future. I left these inline in Parser::feed_tok because they help to elucidate what is going on, just by reading what the trace would output---that is, it helps to make the method more self-documenting, albeit a tad bit more verbose. But with that said, it should probably be extracted at some point; I don't want this to set a precedent where composition is feasible. Here's an example from test cases: [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is parsing attributes for `package`. | | Attrs_(SutAttrsState_ { ___ctx: (QName(None, LocalPart(NCName(SymbolId(46 "package")))), OpenSpan(Span { len: 0, offset: 0, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10)), ___done: false }) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is expecting opening tag `<classify>`. | | ChildA(Expecting_) | | Lookahead: Some(Lookahead(Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)))) = note: this trace was output as a debugging aid because `cfg(test)`. [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is expecting opening tag `<classify>`. | | ChildA(Expecting_) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is attempting to recover by ignoring element with unexpected name `unexpected` (expected `classify`). | | ChildA(RecoverEleIgnore_(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))) | | Lookahead: None = note: this trace was output as a debugging aid because `cfg(test)`. DEV-7145	2022-07-19 14:44:18 -04:00
Mike Gerwitz	1fdfc0aa4d	tamer: xir::parse::ele: Introduce open/close span bindings This adds the ability to bind identifiers to represent `OpenSpan` and `CloseSpan`, available to the `@` and `/` maps. Since identifiers in TAME originate from attributes, this may not get a whole lot of use, but it's important to be available. There is some awkwardness in that the opening span appears to be scoped to the entire nonterminal, but it's actually only available in the `@` mapping. I'll change this if it's actually needed; this keeps things simple for now. DEV-7145	2022-07-13 23:42:51 -04:00
Mike Gerwitz	cceb8c7fb9	tamer: xir::parse::ele: Initial Close mapping support Since the parsers produce streaming IRs, we need to be able to emit tokens representing closing delimiters, where they are important. This notably doesn't use spans; I'll add those next, since they're also needed for the previous work. DEV-7145	2022-07-13 15:02:46 -04:00
Mike Gerwitz	73efc59582	tamer: xir::parse::ele: Initial element parser generator concept This begins generating parsers that are capable of parsing elements. I need to move on, so this abstraction isn't going to go as far as it could, but let's see where it takes me. This was the work that required the recent lookahead changes, which has been detailed in previous commits. This initial support is basic, but robust. It supports parsing elements with attributes and children, but it does not yet support the equivalent of the Kleene star (`*`). Such support will likely be added by supporting parsers that are able to recurse on their own definition in tail position, which will also require supporting parsers that do not add to the stack. This generates parsers that, like all the other parsers, use enums to provide a typed stack. Stitched parsers produce a nested stack that is always bounded in size. Fortunately, expressions---which can nest deeply---do not need to maintain ancestor context on the stack, and so this should work fine; we can get away with this because XIRF ensures proper nesting for us. Statements that _do_ need to maintain such context are not nested. This also does not yet support emitting an object on closing tag, which will be necessary for NIR, which will be a streaming IR that is "near" to the source XML in structure. This will then be used to lower into AIR for the ASG, which gives structure needed for further analysis. More information to come; I just want to get this committed to serve as a mental synchronization point and clear my head, since I've been sitting on these changes for so long and have to keep stashing them as I tumble down rabbit holes covered in yak hair. DEV-7145	2022-07-13 14:08:47 -04:00
Mike Gerwitz	c9b3b84f90	tamer: parse::transition::Lookahead: ParseState=>Token type param Having the lookahead token generic over the `ParseState` was a pain in the ass for stitching, since they shared the same token type but not the same parser. I don't expect there to be any need to be able to infer other parser-related types for a token of lookahead, so I'd rather just make my life easier until such a thing is needed. DEV-7145	2022-07-13 10:13:35 -04:00
Mike Gerwitz	bd783ac08b	tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145	2022-07-12 00:11:45 -04:00
Mike Gerwitz	61ce7d3fc7	tamer: parse::state::transition: Extract module into own file That's it. Just preparing for changes that will change how lookaheads and dead state transitions will work. DEV-7145	2022-07-07 12:47:31 -04:00
Mike Gerwitz	e54f93b30f	tamer: parse: Introduce lookahead token in Parser NB: This is the initial change to introduce the token of lookahead, but this does not fully integrate it. In particular, this is missing from the stitching/delegation layer. This has been a long time coming, I suppose, though I had tried to avoid it with `Parser::delegate_lookahead`. But the problem with doing that is that it forced the ParserState to recurse, which both violates that I want no looping constructs except for the toplevel, and performs additional stack allocation as it is not in tail position. The final straw was having to both return an error _and_ an aggregate object for the attribute parser when an unexpected element is encountered (this code is not yet committed). One option was to add a recovery object to the error object, and formalize that, but then we have other concerns; for example, what if that recovery object triggered an error? We'd have to mask either the old or the new error. But we wouldn't want to mask either, because the object causing the error would be the aggregate attributes, which is _not_ a recovery object, but actual data we want to emit. And so it's a kluge right off of the bat. The use of a token of lookahead is a more traditional approach and has uses outside of just this one scenario. It'll also allow for the removal of recursion from the existing ParserStates, and possibly the elimination of dead state associated data, though I may end up leaving that; more to come. Rust will also optimize away lookahead storage and processing in Parsers that do not utilize it. DEV-7145	2022-07-07 11:19:55 -04:00
Mike Gerwitz	40c68d3e1e	tamer: parse::state::TransitionResult: Make opaque There was only one test outside of the `parse` module using these fields. The next commit will be introducing lookahead, and I do not want to have to trust callers to ensure invariants are met. DEV-7145	2022-07-05 14:12:06 -04:00
Mike Gerwitz	adc45d90df	tamer: xir::parse: Attribute parser generator This is the first parser generator for the parsing framework. I've been waiting quite a while to do this because I wanted to be sure that I understood how I intended to write the attribute parsers manually. Now that I'm about to start parsing source XML files, it is necessary to have a parser generator. Typically one thinks of a parser generator as a separate program that generates code for some language, but that is not always the case---that represents a lack of expressiveness in the language itself (e.g. C). Here, I simply use Rust's macro system, which should be a concept familiar to someone coming from a language like Lisp. This also resolves where I stand on parser combinators with respect to this abstraction: they both accomplish the exact same thing (composition of smaller parsers), but this abstraction doesn't do so in the typical functional way. But the end result is the same. The parser generated by this abstraction will be optimized and inlined in the same manner as the hand-written parsers. Since they'll be tightly coupled with an element parser (which too will have a parser generator), I expect that most attribute parsers will simply be inlined; they exist as separate parsers conceptually, for the same reason that you'd use parser combinators. It's worth mentioning that this awkward reliance on dead state for a lookahead token to determine when aggregation is complete rubs me the wrong way, but resolving it would involve reintroducing the XIR AttrEnd that I had previously removed. I'll keep fighting with myself on this, but I want to get a bit further before I determine if it's worth the tradeoff of reintroducing (more complex IR but simplified parsing). DEV-7145	2022-06-21 13:23:02 -04:00
Mike Gerwitz	f7752436da	tamer: parse::Parser: Add remaining field docs DEV-7145	2022-06-07 15:23:20 -04:00
Mike Gerwitz	3c227e5a2d	tamer: parse::ParseState: Remove Default trait bound `ParseState` originally required `Default` for use with `mem::take` in `Parser::feed_tok`. This unfortunately cannot last, since more specialized parsers require context during initialization in order to provide useful diagnostic information. (The other option is to require the caller to augment errors with diagnostic information, but that would have to be duplicated by every caller and complicates parser composition; I'd prefer those diagnostic details remain encapsulated.) Replacing `Default` with `Option` is uglier, but it ends up producing the same assembly as `mem::take` did, at least at the time of writing. Because Rust is able to elide unnecessary moves using this implementation, there is no need for `unwrap_unchecked` or other unsafe methods, which is great, since it shows that this parsing methodology is viable entirely in safe Rust. DEV-7145	2022-06-07 15:08:40 -04:00
Mike Gerwitz	f14ffc87c2	tamer: parse::state::ParseState::DeadToken: New associated type Previously, `ParseStatus::Dead` always yielded `ParseState::Token`. However, I'm working on introducing parsers that aggregate (parsing XML attributes into structs), and those parsers do not know that they have completed aggregation until they reach a dead state; given that, I need to yield additional information at that time. I played around with a number of alternative ideas, but this ended up being the cleanest, relative to the effort involved. For example, introducing another parameter to `ParseStatus::Dead` was too burdensome on APIs that ought not concern themselves with the possibility of receiving an object in addition to a lookahead token, since many parsers are not capable of doing so (given that they map M:(N<=M)). Another option that I abandoned fairly quickly was having `is_accepting` (potentially renamed) return an aggregate object, since that's on the side and didn't feel like it was part of the parsing pipeline. The intent is to abstract this some in a new `ParseState` method for delegation + aggregation. DEV-7145	2022-06-07 09:37:41 -04:00
Mike Gerwitz	8d92667388	tamer: Integrate xir::reader as a parser in the lowering pipeline This allows `XmlXirReader` to be used in a `Lower` operation, just as everything else, bringing me one step closer to a pipeline that can be concisely represented; this is finally beginning to unify in a clear way, though it is still a bit of a mess. This causes `XmlXirReader` to _act_ like a `parse::Parser` in that it yields a `ParsedResult`, but it does not use `parse::Parser` itself; that was the _original_ plan: convert it into a `ParseState` where `XmlXirReader` became a context, and force `Parser` to yield by feeding it a stream of tokens with `repeat`, but that ended up performing poorly relative to this change. I did some investigation, which I might write about in the future, but for now, this solution works just fine. DEV-7145	2022-06-02 10:30:44 -04:00
Mike Gerwitz	f8c28655dc	tamer: parse: Split into multiple modules This abstraction has grown quite a bit, and it's time to start formalizing it a bit. This split doesn't change any behavior, but it does start to make it easier to reason about by clearly stating the broad components and how they interact with one another. This doesn't yet move the tests; those will come next, but they are very few. The reasons I gave previously for this were that (a) they're tested indirectly via the systems that utilize them and (b) the abstraction was not yet settled and the process was already very expensive. No test coverage was lost---it's only that test failures were potentially harder to debug, but in practice not even this was true, because the deeply expressive types all but ensured that, if it compiles, it will function in a way that is expected. Unit tests and documentation for this system will be added once I'm sure that this abstraction is in a proper state. DEV-7145	2022-06-01 11:32:58 -04:00
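
Commit 3c227e5a2d describes replacing the `Default` bound (needed for `mem::take`) with an `Option` around the parser state. The following is only a minimal sketch of that idea and is not TAMER's code: `State`, its variants, `transition`, and this `Parser` are hypothetical stand-ins, and the real `Parser::feed_tok` has a different signature.

```rust
// Hypothetical state that needs context (an element name) at construction,
// so it has no sensible `Default` implementation.
#[derive(Debug, PartialEq)]
enum State {
    Expecting { name: &'static str },
    Done { name: &'static str },
}

impl State {
    // Transitions take the state by value and return the next state.
    fn transition(self, tok: &str) -> State {
        match self {
            State::Expecting { name } if tok == name => State::Done { name },
            other => other,
        }
    }
}

struct Parser {
    // `None` only transiently, while a transition owns the state.
    state: Option<State>,
}

impl Parser {
    fn new(state: State) -> Self {
        Parser { state: Some(state) }
    }

    // Move the state out with `Option::take`, transition it, and put the
    // result back. No `Default` bound and no unsafe code are required.
    fn feed_tok(&mut self, tok: &str) {
        let st = self.state.take().expect("parser state missing");
        self.state = Some(st.transition(tok));
    }
}
```

In this sketch the `expect` can only fail if a previous transition panicked midway, which is the price of dropping `Default`; whether the optimizer elides the moves, as the commit message reports for TAMER, is not something this example demonstrates.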

16 Commits (184ff6bdccc52b0703f798b6ecbbe316f8407666)
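
Commits e54f93b30f and bd783ac08b describe a token of lookahead that is held by the parser, so that states never recurse and the toplevel is the only loop. The sketch below illustrates that general technique under stated assumptions; `Tok`, `Obj`, `State`, `Transition`, and `parse` are hypothetical names and do not reflect TAMER's `TransitionResult` or `Parser` APIs. End-of-input handling is omitted for brevity.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tok {
    Attr(u32),
    Open,
}

#[derive(Debug, PartialEq)]
enum Obj {
    Attrs(Vec<u32>),
    Element,
}

enum State {
    Attrs(Vec<u32>),
    Children,
}

// Result of one transition: the next state, an optional emitted object,
// and an optional token handed back to the driver as lookahead.
struct Transition {
    next: State,
    obj: Option<Obj>,
    lookahead: Option<Tok>,
}

impl State {
    fn step(self, tok: Tok) -> Transition {
        match (self, tok) {
            (State::Attrs(mut acc), Tok::Attr(v)) => {
                acc.push(v);
                Transition { next: State::Attrs(acc), obj: None, lookahead: None }
            }
            // Aggregation is known to be complete only once a non-attribute
            // token arrives: emit the aggregate and hand the token back
            // instead of recursing to process it here.
            (State::Attrs(acc), tok) => Transition {
                next: State::Children,
                obj: Some(Obj::Attrs(acc)),
                lookahead: Some(tok),
            },
            (State::Children, Tok::Open) => {
                Transition { next: State::Children, obj: Some(Obj::Element), lookahead: None }
            }
            // Stray attributes after children begin are ignored in this sketch.
            (State::Children, Tok::Attr(_)) => {
                Transition { next: State::Children, obj: None, lookahead: None }
            }
        }
    }
}

// The driver owns the only loop: a pending lookahead token is consumed
// before the next input token is read.
fn parse(toks: impl IntoIterator<Item = Tok>) -> Vec<Obj> {
    let mut toks = toks.into_iter();
    let mut state = Some(State::Attrs(Vec::new()));
    let mut lookahead: Option<Tok> = None;
    let mut out = Vec::new();
    while let Some(tok) = lookahead.take().or_else(|| toks.next()) {
        let t = state.take().expect("state missing").step(tok);
        state = Some(t.next);
        lookahead = t.lookahead;
        out.extend(t.obj);
    }
    out
}
```

Feeding `Attr(1), Attr(2), Open, Open` to this sketch yields the aggregate `Attrs([1, 2])` followed by two `Element` objects: the first `Open` both ends attribute aggregation and is then replayed to the `Children` state through the lookahead slot.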