employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Mike Gerwitz	954b5a2795	Copyright year and name update Ryan Specialty Group (RSG) rebranded to Ryan Specialty after its IPO.	2023-01-20 23:37:30 -05:00
Mike Gerwitz	ed8a2ce28a	tamer: xir::parse::ele: Superstate not to accept early EOF This was accepting an early EOF when the active child `ParseState` was in an accepting state, because it was not ensuring that anything on the stack was also accepting. Ideally, there should be nothing on the stack, and hopefully in the future that's what happens. But with how things are today, it's important that, if anything is on the stack, it is accepting. Since `is_accepting` on the superstate is only called during finalization, and because the check terminates early, and because the stack practically speaking will only have a couple things on it max (unless we're in tail position in a deeply nested tree, without TCO [yet]), this shouldn't be an expensive check. Implementing this did require that we expose `Context` to `is_accepting`, which I had hoped to avoid having to do, but here we are. DEV-7145	2022-08-12 00:47:15 -04:00
Mike Gerwitz	77efefe680	tamer: xir::attr::parse: Better parser state descriptions The attribute name was neither quoted nor `@`-prefixed. (I noticed this in the traces.) DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	8f3301431c	tamer: span::dummy: New module to hold DUMMY_SPAN and derivatives Various DUMMY_SPAN-derived spans are used by many test cases, so this finally extracts them---something I've been meaning to do for some time. This also places DUMMY_SPAN behind a `cfg(test)` directive to ensure that it is _only_ used in tests; UNKNOWN_SPAN should be used when a span is actually unknown, which may also be the case during development. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	bd783ac08b	tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). 
This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145	2022-07-12 00:11:45 -04:00
Mike Gerwitz	40c68d3e1e	tamer: parse::state::TransitionResult: Make opaque There was only one test outside of the `parse` module using these fields. The next commit will be introducing lookahead, and I do not want to have to trust callers to ensure invariants are met. DEV-7145	2022-07-05 14:12:06 -04:00
Mike Gerwitz	a16a0d9138	Revert "tamer: xir: Initial re-introduction of AttrEnd" This reverts commit `b973d36862`. Alright, I'm getting sick of fighting with myself on this. But rather than just removing the last commit, I'm going to keep it around, so that my thoughts are clearly documented for my future quarrels with myself. Firstly: this added more overhead than I wanted it to. While it wasn't significant, it did add 100--150ms to one of our largest systems, up from ~2.8s, which seems a bit much for a token that's really just meant to make life easier for the parser. Further, it seems that all I've managed to do is push my original problem to a different layer---this started as a means to resolve having to emit both an object and an error simultaneously in the case where aggregate attribute parsing has completed, but we encounter an error on the next token (e.g. an unexpected element). But XIRF, if it's missing AttrEnd, should throw an error, but should also recover. Recovery is easy---just assume that it was present---_but then we don't emit a XIRF `AttrEnd` token_, which is necessary for downstream systems. So we'd need to either: (a) emit both a token and an error; or (b) panic. But if we're doing (a), then the need for `AttrEnd` goes away, because it solves the original problem (though the other concerns of the previous commit still stand). (b) is not ideal at all, even though the missing token does represent an internal system error; it's not something the user can correct. But, given that it's something that the user cannot correct, doesn't that imply that it's an awkward thing to include in the token stream? So back to `AttrEnd` being an awkward PITA to have. So, given (a), I'll just do that: errors will become more of a "hey, this error just occurred, but I'm trying to recover---here's an object that you should use if you choose to continue parsing, but it may or may not be what you're looking for; proceed with caution". 
That flips the original script: I imagined having external systems feed recovery tokens, but this encapsulates recovery within the parser, which really is more appropriate, though less flexible than having an omniscient external recovery system; such a monolith was always an awkward concept and would be difficult to implement cleanly. This can also potentially be implemented as a generalization of the Dead state change that allowed an object to be emitted alongside the lookahead/error. Anyway, back to where I was...I'm sure I'll look back on this in the future shaking my head, reflecting on how naive I was. DEV-7145	2022-06-29 11:25:44 -04:00
Mike Gerwitz	b973d36862	tamer: xir: Initial re-introduction of AttrEnd AttrEnd was initially removed in `0cc0bc9d5a` (and the commit prior), because there was not a compelling reason to use it over a lookahead operation (returning a token via a dead state transition); `AttrEnd` simply introduced inconsistencies between the XIR reader (which produced AttrEnd) and internal XIR stream generators (e.g. the lowering operations into XIR->XML, which do not). But now that parsers are performing aggregation---in particular the attribute parser-generator `xir::parse::attr`---this has become quite a pain, because the dead state is an actionable token. For example: 1. Open 2. Attr 3. Attr 4. Open 5. ... In the happy case, token #4 results in `Parsed::Incomplete`, and so can just be transformed into the object representing the aggregated attributes. But even in this happy path, it's ugly, and it requires non-tail recursion on the parser which requires a duplicate stack allocation for the `ParserState`. That violates a core principle of the system. But if there is an error at #4---e.g. an unexpected element---then we no longer have a `Parsed::Incomplete` to hijack for our own uses, and we'd have to introduce the ability to return both an error and a token, or we'd have to introduce the ability to keep a token of lookahead instead of reading from the underlying token stream, but that's complicated with push parsers, which are used for parser composition. Yikes. And furthermore, the aggregation has caused me to introduce the ability to override the dead state type to introduce both a token of lookahead and aggregation information. This complicates the system and is going to be confusing to others. 
Given all of this, AttrEnd does now seem appropriate to reintroduce, since it will allow processing of aggregate operations when encountering that token without having to worry about the above scenario; without having to duplicate a `ParseState` stack; without having to hijack dead state transitions for producing our aggregate object; and everything else mentioned above. This commit does not modify those abstractions to use AttrEnd yet; it re-introduces the token to the core system, not the parser-generators, and it doesn't yet replace lookahead operations in the parsers that use them. That'll come next. Unlike the commit that removed it, though, we are now generating proper spans, so make note of that here. This also does not introduce the concept to XIRF yet, which did not exist at the time that it was removed, so XIRF is filtering it out until a following commit. DEV-7145	2022-06-29 11:02:02 -04:00
Mike Gerwitz	c671bf6a9c	tamer: xir: Introduce {Ele,Open,Close}Span This isn't conceptually all that significant of a change, but there was a lot to modify to get it working. I would generally separate this into a commit for the implementation and another commit for the integration, but I decided to keep things together. This serves a role similar to AttrSpan---this allows deriving a span representing the element name from a span representing the entire XIR token. This will provide more useful context for errors---including the tag delimiter(s) means that we care about the fact that an element is in that position (as opposed to some other type of node) within the context of an error. However, if we are expecting an element but take issue with the element name itself, we want to place emphasis on that instead. This also starts to consider the issue of span contexts---a blob of detached data that is `Span` is useful for error context, but it's not useful for manipulation or deriving additional information. For that, we need to encode additional context, and this is an attempt at that. I am interested in the concept of providing Spans that are guaranteed to actually make sense---that are instantiated and manipulated with APIs that ensure consistency. But such a thing buys us very little, practically speaking, over what I have now for TAMER, and so I don't expect to actually implement that for this project; I'll leave that for a personal project. TAMER's already taken a lot of my personal interests and it can cause me a lot of grief sometimes (with regard to letting my aspirations cause me more work). DEV-7145	2022-06-24 14:16:29 -04:00
Mike Gerwitz	eafb3b2a1b	tamer: Add Display impl for each ParseState for generic ParseErrors This is intended to describe, to the user, the state that the parser is in. This will be used to convey additional information for general parser errors, but it should also probably be integrated into parsers' individual errors as well when appropriate. This is something I expected to add at some point, but I wanted to add them because, when dealing with lowering errors, it can be difficult to tell what parser the error originated from. DEV-11864	2022-05-25 15:26:02 -04:00
Mike Gerwitz	1ad2fb1dc8	Copyright year update 2022 RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no "Group"), but I'm not sure if the legal name has been changed yet or not, so I'll wait on that.	2022-05-03 14:14:29 -04:00
Mike Gerwitz	eaa8133d21	tamer: diagnose: Introduction of diagnostic system This is a working concept that will continue to evolve. I wanted to start with some basic output before getting too carried away, since there's a lot of potential here. This is heavily influenced by Rust's helpful diagnostic messages, but will take some time to realize a lot of the things that Rust does. The next step will be to resolve line and column numbers, and then possibly include snippets and underline spans, placing the labels alongside them. I need to balance this work with everything else I have going on. This is a large commit, but it converts the existing Error Display impls into Diagnostic. This separation is a bit verbose, so I'll see how this ends up evolving. Diagnostics are tied to Error at the moment, but I imagine in the future that any object would be able to describe itself, error or not, which would be useful in the future both for the Summary Page and for query functionality, to help developers understand the systems they are writing using TAME. Output is integrated into tameld only in this commit; I'll add tamec next. Examples of what this outputs are available in the test cases in this commit. DEV-10935	2022-04-13 15:22:46 -04:00
Mike Gerwitz	e77bdaf19a	tamer: parse: Introduce mutable Context This resolves the performance issues caused by Rust's failure to elide the ElementStack (ArrayVec) memcpys on move. Since XIRF is invoked tens of millions of times in some cases for larger systems, prior to this change, failure to optimize away moves for XIRF resulted in tens of millions of memcpys. This resulted in linking of one program going from 1s -> ~15s. This change reduces it to ~2.5s with the wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the subject of future changes). In particular, this change introduces a new mutable reference to `ParseState::parse_token`, which is a reference to a `Context` owned by the caller (e.g. `Parser`). In the case of XIRF, this means that `Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of `flat::State`; this allows the latter to remain pure and benefit from Rust's move optimizations, without sacrificing the otherwise-pure implementation. ParseStates that do not need a mutable context can use `NoContext` and remain pure. DEV-12024	2022-04-05 15:50:53 -04:00
Mike Gerwitz	f402e51d04	tamer: parse: More flexible Transition API This does some cleanup and adds `parse::Object` for use in disambiguating `From` for `ParseStatus`, allowing the `Transition` API to be much more flexible in the data it accepts and automatically converts. This allows us to concisely provide raw output data to be wrapped, or provide `ParseStatus` directly when more convenient. There aren't yet examples in the docs; I'll do so once I make sure this API is actually utilized as intended. DEV-10863	2022-03-25 16:45:32 -04:00
Mike Gerwitz	279ddc79d7	tamer: parse::TransitionResult: Alias=>newtype This converts the tuple type alias into a newtype, so that we may provide our own implementations. This differs from a previous approach that I took, which involved making this type `Result<(S, T), (S, E)>` so that the return values composed well with other functions. But the reality is that this is used only by other `ParseState`s and `Parser`, so it's unnecessary. However, this is also an attempt to utilize the new Try and FromResidual traits; note how the Try associated types match precisely what I was trying to do before, though they're used as intermediate types. I'll see how this evolves. DEV-10863	2022-03-25 12:28:50 -04:00
Mike Gerwitz	2e98a69d15	Revert "tamer: parse::TransitionResult: Move common Transition into Result" This reverts commit `bf5da75096`.	2022-03-25 09:17:25 -04:00
Mike Gerwitz	bf5da75096	tamer: parse::TransitionResult: Move common Transition into Result This allows the Results to compose and, importantly, is compatible with `?` without having to put in any extra effort. This puts the caller in an awkward spot, so I introduced a utility function `result_tup0_invert` for now; we'll see if that stays or evolves differently. DEV-10863	2022-03-24 23:48:30 -04:00
Mike Gerwitz	ceb00c4df5	tamer: xir: Complete parse type migration A previous commit moved the parser. This updates the types so that they can actually be utilized in that context. DEV-10863	2022-03-21 15:50:43 -04:00
Mike Gerwitz	14638a612f	tamer: {xir::=>}parse: Move parser out of XIR The parsing framework originally created for XIR is now more general and useful to other things. We'll see how this evolves. This needs additional documentation, but I'd like to see how it changes as I implement XmloReader and then some of the source readers first. DEV-10863	2022-03-18 16:24:53 -04:00
Mike Gerwitz	0360226caa	tamer: xir::parse: Generalize input token type This adds a `Token` type to `ParseState`. Everything uses `xir::Token` currently, but `XmloReader` will use `xir::flat::Object`. Now that this has been generalized beyond XIR, the parser ought to be hoisted up a level. DEV-10863	2022-03-18 15:26:05 -04:00
Mike Gerwitz	7b6d68af85	tamer: xir::parse::Transition: Generalize flat::Transition XIRF introduced the concept of `Transition` to help document code and provide mental synchronization points that make it easier to reason about the system. I decided to hoist this into XIR's parser itself, and have `parse_token` accept an owned state and require a new state to be returned, utilizing `Transition`. Together with the convenience methods introduced on `Transition` itself, this produces much clearer code, as is evidenced by tree::Stack (XIRT's parser). Passing an owned state is something that I had wanted to do originally, but I thought it'd lead to more concise code to use a mutable reference. Unfortunately, that concision led to code that was much more difficult than necessary to understand, and ended up having a net negative benefit by leading to some more boilerplate for the nested types (granted, that could have been alleviated in other ways). This also opens up the possibility to do something that I wasn't able to before, which was continue to abstract away parser composition by stitching their state machines together. I don't know if this'll be done immediately, but because the actual parsing operations are now able to compose functionally without mutability getting in the way, the previous state coupling issues with the parent parser go away. DEV-10863	2022-03-17 16:02:05 -04:00
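
The owned-state design described in `7b6d68af85` can be illustrated with a minimal sketch. Everything below (`Token`, `State`, the toy grammar, and the method names on `Transition`) is hypothetical and chosen only for illustration; it is not TAMER's actual API. The only ideas taken from the commit message are that `parse_token` consumes an owned state and must hand back a new state through a `Transition`.

```rust
// Hypothetical sketch; names and signatures are assumptions, not TAMER's API.

/// Input tokens for a toy element parser.
#[derive(Debug, PartialEq)]
enum Token {
    Open(String),
    Close,
}

/// Parser states. Each state owns its data, so a transition moves that
/// data into the next state instead of mutating it in place.
#[derive(Debug, PartialEq)]
enum State {
    Ready,
    InElement(String),
    Done(String),
}

/// Next state paired with the outcome of the step:
/// `Ok(None)` is incomplete, `Ok(Some(_))` yields an object.
type TransitionResult = (State, Result<Option<String>, String>);

/// Wrapper that forces every return site to name its target state.
struct Transition(State);

impl Transition {
    /// Transition and yield a parsed object.
    fn ok(self, obj: String) -> TransitionResult {
        (self.0, Ok(Some(obj)))
    }

    /// Transition without yielding anything yet.
    fn incomplete(self) -> TransitionResult {
        (self.0, Ok(None))
    }

    /// Transition (possibly to the same state) and report an error.
    fn err(self, msg: String) -> TransitionResult {
        (self.0, Err(msg))
    }
}

/// Consumes the current state and returns the next one.
fn parse_token(state: State, tok: Token) -> TransitionResult {
    match (state, tok) {
        (State::Ready, Token::Open(name)) => {
            Transition(State::InElement(name)).incomplete()
        }
        (State::InElement(name), Token::Close) => {
            let obj = format!("<{}/>", name);
            Transition(State::Done(name)).ok(obj)
        }
        // No transition defined for this input: keep the state, report it.
        (st, tok) => {
            let msg = format!("unexpected {:?} in state {:?}", tok, st);
            Transition(st).err(msg)
        }
    }
}
```

Because the state is moved in and out, a caller simply threads it through successive calls, e.g. `let (st, res) = parse_token(st, tok);`, and no `&mut` borrow of the parser state is needed.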

21 Commits (bef68e16340ab5e6abdcf2807e535771d8e98436)
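
The early-EOF fix in `ed8a2ce28a` amounts to requiring that the active child state and everything still on the stack are accepting. A rough sketch of that check follows, assuming a hypothetical `Child` state enum, a `Context` that holds the stack, and a `Superstate` wrapper; none of these definitions are taken from TAMER itself.

```rust
// Hypothetical sketch; types and fields are assumptions, not TAMER's API.

/// Toy child parser states.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Child {
    Expecting,
    Done,
}

impl Child {
    /// A child accepts only once it has finished parsing.
    fn is_accepting(self) -> bool {
        matches!(self, Child::Done)
    }
}

/// Caller-owned context holding suspended parent states.
struct Context {
    stack: Vec<Child>,
}

/// Superstate wrapping the currently active child.
struct Superstate {
    active: Child,
}

impl Superstate {
    /// Accept EOF only if the active child and every suspended state on
    /// the stack are accepting. `&&` and `Iterator::all` both
    /// short-circuit, so the check stops at the first non-accepting state.
    fn is_accepting(&self, ctx: &Context) -> bool {
        self.active.is_accepting()
            && ctx.stack.iter().all(|st| st.is_accepting())
    }
}
```

With an accepting active child but a non-accepting state left on the stack, `is_accepting` returns `false`, which is the case the commit message says was previously let through.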