employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Mike Gerwitz	57a805b495	tamer: src::pipeline: Eliminate most error type references Just cleaning up a bit, removing some unnecessary types, since there are so many involved. DEV-13162	2023-05-25 16:58:44 -04:00
Mike Gerwitz	ea6259570e	tamer: ld::poc: Extract xmlo loading pipeline into new pipeline module I want to clean this up a bit further. The motivation is that we need this for imports in `tamec`. Eventually this will be cleaned up to the point where it's declarative and easy to understand---there's a mess of types involved now and, when something goes wrong, it can be brutally confusing. DEV-13162	2023-05-25 16:38:41 -04:00
Mike Gerwitz	e6c6028b37	tamer: xir::parse::ele: Move StateStack into parse::state This will be utilized by `AirAggregate`. DEV-13708	2023-03-30 15:44:12 -04:00
Mike Gerwitz	4fd8e9ea40	tamer: asg::air: Extract expression parsing into `expr` This is more of the same refactoring that has been happening. This extraction also helps emphasize the relationship between imported objects, and isolates the growing number of test cases. This parser will only grow. DEV-13708	2023-03-10 14:27:59 -05:00
Mike Gerwitz	b6d0569b99	tamer: asg::air: Expression parser This delegates expression parsing to `AirExprAggregate`, in an effort to both begin to simplify the understanding and maintenance of `AirAggregate`; and allow for parser composition for template parsing. This utilizes the prior changes for token sum types to precisely define the subset of AIR tokens supported by the expression parser. This differs from prior approaches which delegated until a dead state, relying on runtime information to determine if a parser has finished. This allows us to determine that statically. I do want to be able to eliminate the dead state from the parser so we can get rid of the `unreachable!`, but I need to move on; that's something I had tried to do in the past too, which ended up adding a bit of complexity, and I'll have to consider my options in the future, including whether the dead state transition can be entirely eliminated in favor of the combination of these sum types and recovery; the parsing framework decisions were made while recovery was still an open question, at least in practice. DEV-13708	2023-03-10 14:27:59 -05:00
Mike Gerwitz	33d2b4f0b8	tamer: tamec: POC lowering pipeline with XirfAutoClose and XirfToXir This replaces the stub `derive_xmli` with the same result (well, minus a space before the '/' in the output) using what will become the lowering pipeline. Once again, this is quite verbose, and the lowering pipeline in general needs to be further abstracted away. Unlike the rest of the pipeline, an error during the derivation process will immediately terminate with an unrecoverable error, because we do not want to write partial files. This does not remove the garbage file, because the build system ought to do that itself (e.g. `make`)...but that is certainly open for debate. DEV-13708	2023-03-10 14:27:57 -05:00
Mike Gerwitz	29178f2360	tamer: xir::reader: Divorce from `parse` The reader previously yielded a `ParsedResult`, presumably to simplify lowering operations. But the reader is not a `ParseState`, and does not otherwise use the parsing API, so this was an inappropriate and confusing coupling. This resolves that, introducing a new `lowerable` which will translate an iterator into something that can be placed in a lowering pipeline. See the previous commit for more information. DEV-13708	2023-03-10 14:27:57 -05:00
Mike Gerwitz	79cc61f996	tamer: xir::flat::XirfToXir: New lowering operation This parser does exactly what it says it does. Its implementation is simple, but I added a test anyway just to prove that it works, and the test seems more complicated than the implementation itself, given the types involved. DEV-13708	2023-03-10 14:27:57 -05:00
Mike Gerwitz	954b5a2795	Copyright year and name update Ryan Specialty Group (RSG) rebranded to Ryan Specialty after its IPO.	2023-01-20 23:37:30 -05:00
Mike Gerwitz	aa1ca06a0e	tamer: tamec: Introduce NIR->AIR->ASG lowering This does not yet yield the produced ASG, but does set up the lowering pipeline to prepare to produce it. It's also currently a no-op, with `NirToAsg` just yielding `Incomplete`. The goal is to begin to move toward vertical slices for TAMER as I start to return to the previous approach of a handoff with the old compiler. Now that I've gained clarity from my previous failed approach (which I documented in previous commits), I feel that this is the best way forward that will allow me to incrementally introduce more fine-grained performance improvements, at the cost of some throwaway work as this progresses. But the cost of delay with these build times is far greater. DEV-13429	2022-12-13 13:37:07 -05:00
Mike Gerwitz	03cf652c41	tamer: parse::util: Introduce StitchableExpansionState This parser really just allows me to continue developing the NIR interpolation system using `Expansion` terminology, and avoid having to use dead states in tests. This allows for the appropriate level of abstraction to be used in isolation, and then only be stripped when stitching is necessary. Future commits will show how this is actually integrated and may introduce additional abstraction to help. DEV-13156	2022-11-15 12:19:25 -05:00
Mike Gerwitz	4117efc50c	tamer: nir::desugar::interp: Generalize without NIR symbol types This is a shift in approach. My original idea was to try to keep NIR parsing the way it was, since it's already hard enough to reason about with the `ele_parse!` parser-generator macro mess. The idea was to produce an IR that would explicitly be denoted as "maybe sugared", and have a desugaring operation as part of the lowering pipeline that would perform interpolation and lower the symbol into a plain version. The problem with that is: 1. The use of the type was going to introduce a lot of mapping for all the NIR token variants there are going to be; and 2. _The types weren't even utilized for interpolation._ Instead, if we interpolated _as attributes are encountered_ while parsing NIR, then we'd be able to expand directly into that NIR token stream and handle _all_ symbols in a generic way, without any mapping beyond the definition of NIR's grammar using `ele_parse!`. This is a step in that direction---it removes `NirSymbolTy` and introduces a generic abstraction for the concept of expansion, which will be utilized soon by the attribute parser to allow replacing `TryFrom` with something akin to `ParseFrom`, or something like that, which is able to produce a token stream before finally yielding the value of the attribute (which will be either the original symbol or the replacement metavariable, in the case of interpolation). (Note that interpolation isn't yet finished---errors still need to be implemented. But I want a working vertical slice first.) DEV-13156	2022-11-10 12:33:30 -05:00
Mike Gerwitz	66f09fa4c9	tamer: parse::prelude: New module Not sure why I didn't add a prelude sooner, considering all the import boilerplate. This will evolve as needed and I'll go back and replace other imports when I'm not in the middle of something. DEV-13156	2022-11-02 14:56:26 -04:00
Mike Gerwitz	26aaf6efc1	tamer: parse::error::ParseError: Extract some variants into FinalizeError This helps to clarify the situations under which these errors can occur, and the generality also helps to show why the inner types are as they are (e.g. use of `String`). But more importantly, this allows for an error type in `finalize` that is detached from the `ParseState`, which will be able to be utilized in the lowering pipeline as a more general error distinguishable from other lowering errors. At the moment I'm maintaining BC, but a following commit will demonstrate the use case to introduce recoverable vs. non-recoverable errors. DEV-13158	2022-10-26 12:44:19 -04:00
Mike Gerwitz	2087672c47	tamer: parse::parser::finalize: Introduce FinalizedParser This newtype allows a caller to prove (using types) that a parser of a given type (`ParseState`) has been finalized. This will be used by the lowering pipeline to ensure that all parsers in the pipeline end up getting finalized (as you can see from a TODO added in the code, one of them is missing). The lack of such a type was an oversight during the (rather stressed) development of the parsing system, and I shouldn't need to resort to unit tests to verify that parsers have been finalized. DEV-13158	2022-10-26 12:44:19 -04:00
Mike Gerwitz	ed8a2ce28a	tamer: xir::parse::ele: Superstate not to accept early EOF This was accepting an early EOF when the active child `ParseState` was in an accepting state, because it was not ensuring that anything on the stack was also accepting. Ideally, there should be nothing on the stack, and hopefully in the future that's what happens. But with how things are today, it's important that, if anything is on the stack, it is accepting. Since `is_accepting` on the superstate is only called during finalization, and because the check terminates early, and because the stack practically speaking will only have a couple things on it max (unless we're in tail position in a deeply nested tree, without TCO [yet]), this shouldn't be an expensive check. Implementing this did require that we expose `Context` to `is_accepting`, which I had hoped to avoid having to do, but here we are. DEV-7145	2022-08-12 00:47:15 -04:00
Mike Gerwitz	15e04d63e2	tamer: xir::parse::ele: Transition trampoline This properly integrates the trampoline into `ele_parse!`. The implementation leaves some TODOs, most notably broken mixed text handling since we can no longer intercept those tokens before passing to the child. That is temporarily marked as incomplete; see a future commit. The introduced test `ParseState`s were to help me reason about the system intuitively as I struggled to track down some type errors in the monstrosity that is `ele_parse!`. It will fail to compile if those invariants are violated. (In the end, the problems were pretty simple to resolve, and the struggle was the type system doing its job in telling me that I needed to step back and try to reason about the problem again until it was intuitive.) This keeps around the NT states for now, which are quickly used to transition to the next NT state, like a couple of bounces on a trampoline: NT -> Dead -> Parent -> Next NT This could be optimized in the future, if it's worth doing. This also makes no attempt to implement tail calls; that would have to come after fixing mixed content and really isn't worth the added complexity now. I (desperately) need to move on, and still have a bunch of cleanup to do. I had hoped for a smaller commit, but that was too difficult to do with all the types involved. DEV-7145	2022-08-10 11:46:45 -04:00
Mike Gerwitz	53a689741b	tamer: parse::state::ParseState::Super: Superstate concept I'm disappointed that I keep having to implement features that I had hoped to avoid implementing. This introduces a "superstate" feature, which is intended really just to be a sum type that is able to delegate to stitched `ParseState`s. This then allows a `ParseState` to transition directly to another `ParseState` and have the parent `ParseState` handle the delegation---a trampoline. This issue naturally arises out of the recursive nature of parsing a TAME XML document, where certain statements can be nested (like `<section>`), and where expressions can be nested. I had gotten away with composition-based delegation for now because `xmlo` headers do not have such nesting. The composition-based approach falls flat for recursive structures. The typical naive solution is boxing, which I cannot do, because not only is this on an extremely hot code path, but I require that Rust be able to deeply introspect and optimize away the lowering pipeline as much as possible. Many months ago, I figured that such a solution would require a trampoline, as it typically does in stack-based languages, but I was hoping to avoid it. Well, no longer; let's just get on with it. This intends to implement trampolining in a `ParseState` that serves as that sum type, rather than introducing it as yet another feature to `Parser`; the latter would provide a more convenient API, but it would continue to bloat `Parser` itself. Right now, only the element parser generator will require use of this, so if it's needed beyond that, then I'll debate whether it's worth providing a better abstraction. For now, the intent will be to use the `Context` to store a stack that it can pop off of to restore the previous `ParseState` before delegation. DEV-7145	2022-08-08 15:23:54 -04:00
Mike Gerwitz	8f3301431c	tamer: span::dummy: New module to hold DUMMY_SPAN and derivatives Various DUMMY_SPAN-derived spans are used by many test cases, so this finally extracts them---something I've been meaning to do for some time. This also places DUMMY_SPAN behind a `cfg(test)` directive to ensure that it is _only_ used in tests; UNKNOWN_SPAN should be used when a span is actually unknown, which may also be the case during development. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	17327f1b64	tamer: parse::trace: Extract tracing into new module This has gotten large and was cluttering `feed_tok`. This also provides the ability to more easily expand into other types of tracing in the future. DEV-7145	2022-07-26 09:29:17 -04:00
Mike Gerwitz	c3dfcc565c	tamer: parse::parser::Parser: Include errors in parse trace Because of recovery, the trace otherwise paints a really confusing-looking picture when given unexpected input. This is large enough now that it really ought to be extracted from `feed_tok`, but I'll wait to see how this evolves further. I considered adding color too, but it's not yet clear to me that the visual noise will be all that helpful. DEV-7145	2022-07-26 09:28:37 -04:00
Mike Gerwitz	e517e15a29	tamer: parse::Token: Swap trait method order This just places `ir_name` first in the trait definition so that it'll be inserted in that same order when using LSP. DEV-7145	2022-07-20 13:58:44 -04:00
Mike Gerwitz	e73c223a55	tamer: parser::Parser: cfg(test) tracing This produces useful parse traces that are output as part of a failing test case. The parser generator macros can be a bit confusing to deal with when things go wrong, so this helps to clarify matters. This is _not_ intended to be machine-readable, but it does show that it would be possible to generate machine-readable output to visualize the entire lowering pipeline. Perhaps something for the future. I left these inline in Parser::feed_tok because they help to elucidate what is going on, just by reading what the trace would output---that is, it helps to make the method more self-documenting, albeit a tad bit more verbose. But with that said, it should probably be extracted at some point; I don't want this to set a precedent where composition is feasible. Here's an example from test cases: [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is parsing attributes for `package`. | | Attrs_(SutAttrsState_ { ___ctx: (QName(None, LocalPart(NCName(SymbolId(46 "package")))), OpenSpan(Span { len: 0, offset: 0, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10)), ___done: false }) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is expecting opening tag `<classify>`. | | ChildA(Expecting_) | | Lookahead: Some(Lookahead(Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)))) = note: this trace was output as a debugging aid because `cfg(test)`. [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is expecting opening tag `<classify>`. 
| | ChildA(Expecting_) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is attempting to recover by ignoring element with unexpected name `unexpected` (expected `classify`). | | ChildA(RecoverEleIgnore_(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))) | | Lookahead: None = note: this trace was output as a debugging aid because `cfg(test)`. DEV-7145	2022-07-19 14:44:18 -04:00
Mike Gerwitz	bd783ac08b	tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). 
This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145	2022-07-12 00:11:45 -04:00
Mike Gerwitz	f14ffc87c2	tamer: parse::state::ParseState::DeadToken: New associated type Previously, `ParseStatus::Dead` always yielded `ParseState::Token`. However, I'm working on introducing parsers that aggregate (parsing XML attributes into structs), and those parsers do not know that they have completed aggregation until they reach a dead state; given that, I need to yield additional information at that time. I played around with a number of alternative ideas, but this ended up being the cleanest, relative to the effort involved. For example, introducing another parameter to `ParseStatus::Dead` was too burdensome on APIs that ought not concern themselves with the possibility of receiving an object in addition to a lookahead token, since many parsers are not capable of doing so (given that they map M:(N<=M)). Another option that I abandoned fairly quickly was having `is_accepting` (potentially renamed) return an aggregate object, since that's on the side and didn't feel like it was part of the parsing pipeline. The intent is to abstract this some in a new `ParseState` method for delegation + aggregation. DEV-7145	2022-06-07 09:37:41 -04:00
Mike Gerwitz	8d92667388	tamer: Integrate xir::reader as a parser in the lowering pipeline This allows `XmlXirReader` to be used in a `Lower` operation, just as everything else, bringing me one step closer to a pipeline that can be concisely represented; this is finally beginning to unify in a clear way, though it is still a bit of a mess. This causes `XmlXirReader` to _act_ like a `parse::Parser` in that it yields a `ParsedResult`, but it does not use `parse::Parser` itself; that was the _original_ plan: convert it into a `ParseState` where `XmlXirReader` became a context, and force `Parser` to yield by feeding it a stream of tokens with `repeat`, but that ended up performing poorly relative to this change. I did some investigation, which I might write about in the future, but for now, this solution works just fine. DEV-7145	2022-06-02 10:30:44 -04:00
Mike Gerwitz	f8c28655dc	tamer: parse: Split into multiple modules This abstraction has grown quite a bit, and it's time to start formalizing it a bit. This split doesn't change any behavior, but it does start to make it easier to reason about by clearly stating the broad components and how they interact with one another. This doesn't yet move the tests; those will come next, but they are very few. The reason I gave previously for this was that (a) they're tested indirectly via the systems that utilize them and (b) the abstraction was not yet settled and the process was already very expensive. No test coverage was lost---it's only that failures were potentially harder to debug, but in practice not even this was true, because the deeply expressive types all but ensured that, if it compiles, it will function in a way that is expected. Unit tests and documentation for this system will be added once I'm sure that this abstraction is in a proper state. DEV-7145	2022-06-01 11:32:58 -04:00
Mike Gerwitz	63aa452197	tamer: parse: Move parse::lower into Lower This also modifies `poc` such that `Lower` is invoked as an associated function rather than a method to emphasize the pattern that is forming, so that it can be later abstracted away. DEV-11864	2022-06-01 11:15:43 -04:00
Mike Gerwitz	f40f8bbafc	tamer: parse: Rename {lower__while_ok=>lower_} The `while_ok` can just be implied with a lowering operation, and that reduces the name complexity so that we can maybe introduce even more specialized methods without resulting in a huge sentence as a name. DEV-11864	2022-05-27 14:10:55 -04:00
Mike Gerwitz	b084e23497	tamer: Refactor asg_builder into obj::xmlo::lower and asg::air This finally uses `parse` all the way up to aggregation into the ASG, as can be seen by the mess in `poc`. This will be further simplified---I just need to get this committed so that I can mentally get it off my plate. I've been separating this commit into smaller commits, but there's a point where it's just not worth the effort anymore. I don't like making large changes such as this one. There is still work to do here. First, it's worth re-mentioning that `poc` means "proof-of-concept", and represents things that still need a proper home/abstraction. Secondly, `poc` is retrieving the context of two parsers---`LowerContext` and `Asg`. The latter is desirable, since it's the final aggregation point, but the former needs to be eliminated; in particular, packages need to be worked into the ASG so that `found` can be removed. Recursively loading `xmlo` files still happens in `poc`, but the compiler will need this as well. Once packages are on the ASG, along with their state, that responsibility can be generalized as well. That will then simplify lowering even further, to the point where hopefully everything has the same shape (once final aggregation has an abstraction), after which we can then create a final abstraction to concisely stitch everything together. Right now, Rust isn't able to infer `S` for `Lower<S, LS>`, which is unfortunate, but we'll be able to help it along with a more explicit abstraction. DEV-11864	2022-05-27 13:51:29 -04:00
Mike Gerwitz	eafb3b2a1b	tamer: Add Display impl for each ParseState for generic ParseErrors This is intended to describe, to the user, the state that the parser is in. This will be used to convey additional information for general parser errors, but it should also probably be integrated into parsers' individual errors as well when appropriate. This is something I expected to add at some point, but I wanted to add them because, when dealing with lowering errors, it can be difficult to tell what parser the error originated from. DEV-11864	2022-05-25 15:26:02 -04:00
Mike Gerwitz	9edc32dd3b	tamer: parse::LowerIter: Generic inner TripIter iterator This commit is preparing to compose LowerIter directly. DEV-11864	2022-05-24 10:27:14 -04:00
Mike Gerwitz	f218c452b9	tamer: iter::trip: Flatten Result The `*_iter_while_ok` functions now compose like monads, flattening `Result` at each step and drastically simplifying handling of error types. This also removes the bunch of `?`s at the end of the expression, and allows me to use `?` within the callback itself. I had originally not used `Result` as the return type of the callback because I was not entirely sure how I was going to use them, but it's now clear that I _always_ use `Result` as the return type, and so there's no use in trying to be too accommodating; it can always change in the future. This is desirable not just for cleanup, but because trying to refactor `asg_builder` into a pair of `Parser`s is really messy to chain without flattening, especially given some state that has to leak temporarily to the caller. More on that in a future commit. DEV-11864	2022-05-20 16:08:16 -04:00
Mike Gerwitz	263cb68380	tamer: parse: Persistent context This allows retrieving and providing a context to a `Parser`. This is intended for use with an aggregating parser, in particular to construct the ASG and return it. This is a component of a change that replaces `asg_builder` with a `Parser`-based lowering into the ASG, but there are still changes that need to be made to simplify things and complete its integration. DEV-11864	2022-05-18 16:15:09 -04:00
Mike Gerwitz	001499d921	tamer: parse::ParseError: Remove Eq trait bound Just as in other commits, since it's an unnecessary limitation. DEV-11864	2022-05-18 16:06:22 -04:00
Mike Gerwitz	c49d87976d	tamer: parse::Token: Remove Eq trait bound `PartialEq` remains, and is all that is needed. See previous commit regarding the removal of this same bound from `Context`. This can be re-added if it ends up actually being necessary. But Tokens are ephemeral and used only in lowering pipelines, using pattern matching. DEV-11864	2022-05-16 10:05:14 -04:00
Mike Gerwitz	0493e68cb3	tamer: parse::ParseState::Context: Add missing comment DEV-11864	2022-05-10 11:06:22 -04:00
Mike Gerwitz	0ef0d2b553	tamer: parse::ParseState::Error: Relax Eq trait bound This is unnecessarily restrictive, since we do not require anything further than `PartialEq` for the situations where we care about equality (tests). DEV-11864	2022-05-06 15:28:47 -04:00
Mike Gerwitz	9f990e19e9	tamer: parse::ParseState::Context: Remove Default trait bound This is too restrictive, especially for parsers that fold into something, like the ASG, which may exist prior to invoking the parser. This moves the trait bound to the functions that actually need it. Those obviously cannot be used if the Context does not implement `Default`, but I'll provide alternative conveniences. DEV-11864	2022-05-05 15:55:04 -04:00
Mike Gerwitz	1ad2fb1dc8	Copyright year update 2022 RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no "Group"), but I'm not sure if the legal name has been changed yet or not, so I'll wait on that.	2022-05-03 14:14:29 -04:00
Mike Gerwitz	eaa8133d21	tamer: diagnose: Introduction of diagnostic system This is a working concept that will continue to evolve. I wanted to start with some basic output before getting too carried away, since there's a lot of potential here. This is heavily influenced by Rust's helpful diagnostic messages, but will take some time to realize a lot of the things that Rust does. The next step will be to resolve line and column numbers, and then possibly include snippets and underline spans, placing the labels alongside them. I need to balance this work with everything else I have going on. This is a large commit, but it converts the existing Error Display impls into Diagnostic. This separation is a bit verbose, so I'll see how this ends up evolving. Diagnostics are tied to Error at the moment, but I imagine in the future that any object would be able to describe itself, error or not, which would be useful in the future both for the Summary Page and for query functionality, to help developers understand the systems they are writing using TAME. Output is integrated into tameld only in this commit; I'll add tamec next. Examples of what this outputs are available in the test cases in this commit. DEV-10935	2022-04-13 15:22:46 -04:00
Mike Gerwitz	c49510646b	tamer: parse::Parser (last_span): Replace Option with UNKNOWN_SPAN There's no use in complicating the error handling here when we'd just default to `UNKNOWN_SPAN` anyway when trying to render it. `UNKNOWN_SPAN` didn't exist at the time of writing. DEV-10935	2022-04-12 09:59:00 -04:00
Mike Gerwitz	6871a0cdc7	tamer: parse (ParseState): Doc correction regarding determinism The pair is now a triple and parsers are often NFAs.	2022-04-05 15:55:58 -04:00
Mike Gerwitz	e77bdaf19a	tamer: parse: Introduce mutable Context This resolves the performance issues caused by Rust's failure to elide the ElementStack (ArrayVec) memcpys on move. Since XIRF is invoked tens of millions of times in some cases for larger systems, prior to this change, failure to optimize away moves for XIRF resulted in tens of millions of memcpys. This resulted in linking of one program going from 1s -> ~15s. This change reduces it to ~2.5s with the wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the subject of future changes). In particular, this change introduces a new mutable reference to `ParseState::parse_token`, which is a reference to a `Context` owned by the caller (e.g. `Parser`). In the case of XIRF, this means that `Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of `flat::State`; this allows the latter to remain pure and benefit from Rust's move optimizations, without sacrificing the otherwise-pure implementation. ParseStates that do not need a mutable context can use `NoContext` and remain pure. DEV-12024	2022-04-05 15:50:53 -04:00
Mike Gerwitz	1a04d99f15	tamer: obj::xmlo::reader: Working xmlo reader This makes the necessary tweaks to have the entire linker work end-to-end and produce a compatible xmle file (that is, identical except for nondeterministic topological ordering). That's good, and finally that can get off of my plate. What's disappointing, and what I'll have more information on in future commits, is how slow it is. The linking of our largest package goes from ~1s -> ~15s with this change. The reason is tens of millions of `memcpy` calls. Why? The ParseState abstraction is pure and passes an owned `self` around, and Parser replaces its own reference using this: let result; TransitionResult(Transition(self.state), result) = take(&mut self.state).parse_token(tok); Naively, this would store a copy of the old state in `result`, allocate a new ParseState for `self.state`, pass the original or a copy to `parse_token`, and then overwrite `self.state` with the new ParseState that is returned once it is all over. Of course, that'd be devastating. What we want to happen is for Rust to realize that it can just pass a reference to `self.state` and perform no copying at all. For certain parsers, this is exactly what happens. Great! But for XIRF, we have this: /// Stack of element [`QName`] and [`Span`] pairs, /// representing the current level of nesting. /// /// This storage is statically allocated, /// allowing XIRF's parser to avoid memory allocation entirely. type ElementStack<const MAX_DEPTH: usize> = ArrayVec<(QName, Span), MAX_DEPTH>; /// XIRF document parser state. /// /// This parser is a pushdown automaton that parses a single XML document. #[derive(Debug, Default, PartialEq, Eq)] pub enum State<const MAX_DEPTH: usize, SA = AttrParseState> where SA: FlatAttrParseState, { /// Document parsing has not yet begun. #[default] PreRoot, /// Parsing nodes. NodeExpected(ElementStack<MAX_DEPTH>), /// Delegating to attribute parser. 
AttrExpected(ElementStack<MAX_DEPTH>, SA), /// End of document has been reached. Done, } ParseState contains an ArrayVec, and its implementation details are causes LLVM _not_ to elide the `memcpy`. And there's a lot of them. Considering that ParseState is supposed to use only statically allocated memory and be zero-copy, this is rather ironic. Now, this _could_ be potentially fixed by not using ArrayVec; removing it (and the corresponding checks for balanced tags) gets us down to 2s (which still needs improvement), but we can't have a core abstraction in our system resting on a house of cards. What if the optimization changes between releases and suddenly linking / building becomes shit slow? That's too much of a risk. Further, having to limit what abstractions we use just to appease the compiler to optimize away moves is very restrictive. The better option seems like to go back to what I used to do: pass around `&mut self`. I had moved to an owned `self` to force consideration of _all_ state transitions, but I can try to do the same thing in a different type of way using mutable references, and then we avoid this problem. The abstraction isn't pure (in the functional sense) anymore, but it's safe and isn't relying on delicate inlining and optimizer implementation details to have a performant system. More information to come. DEV-10863	2022-04-01 16:31:14 -04:00
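For readers unfamiliar with the `take` idiom quoted in 1a04d99f15, a reduced sketch follows. The tuple-struct names `Transition` and `TransitionResult` are borrowed from the commit message, but their definitions here are assumptions, and `State` is a toy machine over `char` tokens rather than XIRF. Semantically, `take` swaps a default value into `self.state` and hands the old one to `parse_token` by value; whether those moves compile down to nothing is up to the optimizer, which is the risk the commit describes.

```rust
use std::mem::take;

/// Toy state (hypothetical): counts consecutive '+' tokens.
#[derive(Debug, Default, PartialEq, Eq)]
pub enum State {
    #[default]
    Idle,
    Counting(u32),
}

/// Assumed shapes; the commit only shows these names in use.
pub struct Transition(pub State);
pub struct TransitionResult(pub Transition, pub Option<u32>);

impl State {
    /// Pure transition function: takes ownership of the old state and
    /// returns the new state along with any output.
    pub fn parse_token(self, tok: char) -> TransitionResult {
        match (self, tok) {
            (State::Idle, '+') => TransitionResult(Transition(State::Counting(1)), None),
            (State::Counting(n), '+') => {
                TransitionResult(Transition(State::Counting(n + 1)), None)
            }
            // Any other token ends the run and emits the count.
            (State::Counting(n), _) => TransitionResult(Transition(State::Idle), Some(n)),
            (State::Idle, _) => TransitionResult(Transition(State::Idle), None),
        }
    }
}

#[derive(Default)]
pub struct Parser {
    state: State,
}

impl Parser {
    pub fn feed(&mut self, tok: char) -> Option<u32> {
        let result;
        // Destructuring assignment: the new state is written straight
        // back into `self.state`, and the output into `result`.
        TransitionResult(Transition(self.state), result) =
            take(&mut self.state).parse_token(tok);
        result
    }
}
```

With a state this small the moves are trivially cheap. The problem reported in the commit appears only once a variant carries a large inline buffer, so each unelided move becomes a sizeable `memcpy`.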
Mike Gerwitz	fb3da09fa4	tamer: obj::xmlo::reader: preproc:sym-deps processing This parses the symbol dependency list (adjacency list). I'm noticing some glaring issues in error handling, particularly that the token being parsed while an error occurs is not returned and so recovery is impossible. I'll have to address that later on, after I get this parser completed. Another previous question that I had a hard time answering in prior months was how I was going to compose boilerplate parsers, e.g. handling the parsing of single-attribute elements and such. A pattern is clearly taking shape, and with the composition of parsers more formalized, that'll be able to be abstracted away. But again, that's going to wait until after this parser is actually functioning. Too many delays so far. DEV-10863	2022-03-30 15:05:55 -04:00
Mike Gerwitz	5c16add95d	tamer: parse (Transitionable): New This simply removes boilerplate. This will receive concrete examples once I come up with docs for the entire module; there's boilerplate involved in testing and documenting this in isolation and the time investment is not worth it yet until I'm certain that this will not be changed. DEV-10863	2022-03-30 10:03:14 -04:00
Mike Gerwitz	4cb478a42d	tamer: parser::ParseState::delegate_lookahead: New concept This introduces a new method similar to the previous `delegate`, but with another closure that allows for handling lookahead tokens from the child parser. Admittedly, this isn't exactly what I was going for---a list of arguments isn't exactly self-documenting, especially with the brevity when the arguments line up---but this was easy to do and so I'll run with this for now. This also modifies `delegate` to accept a context, even though it wasn't necessary, both for consistency with its lookahead counterpart and for brevity with the `into` argument (allowing, in our case, to just pass the name of the variant, rather than a closure). I'm not going to handle the actual starting and accepting state stitching abstraction for now; I'd like to observe future boilerplate more before I consider the best way to handle it, though I do have some ideas. DEV-10863	2022-03-29 14:46:43 -04:00
Mike Gerwitz	2a3d5be159	tamer: parse::ParseState::delegate: Initial state stitching concept This is the delegation portion of what I've come to call "state stitching"---wiring together two state machines that recognize the same input tokens. This handles the delegation of tokens once the parser has been entered, but does not yet handle the actual stitching part of it: wiring the start and accepting states of the child parser to the parent. This is indirectly tested by the XmloReader, but it will receive its own tests once I further finalize this concept. I'm playing around with some ideas. With that said, a quick visual inspection together with the guarantees provided by the type system should convince any familiar reader of its correctness. DEV-10863	2022-03-29 14:12:26 -04:00
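The delegation half of "state stitching" in 2a3d5be159 might look something like the sketch below. Everything here is hypothetical: the `Machine` trait, the free function `delegate`, and the `Digits`/`Parent` machines are invented for illustration and do not reflect the real `ParseState::delegate` signature. The parent holds the child's state inside one of its own variants, forwards tokens to it, and re-wraps the child's next state. As in the commit, entering and leaving the child is still wired by hand.

```rust
/// Hypothetical minimal state machine interface over `char` tokens.
pub trait Machine: Sized {
    type Output;
    fn step(self, tok: char) -> (Self, Option<Self::Output>);
}

/// Forward `tok` to `child`, then wrap the child's next state back into
/// a parent state using `into` (often just a variant constructor).
pub fn delegate<C: Machine, P>(
    child: C,
    tok: char,
    into: impl FnOnce(C) -> P,
) -> (P, Option<C::Output>) {
    let (next, out) = child.step(tok);
    (into(next), out)
}

/// Child machine: accumulates decimal digits and emits the number when
/// it sees a non-digit.
#[derive(Debug, PartialEq, Eq)]
pub struct Digits(pub u32);

impl Machine for Digits {
    type Output = u32;
    fn step(self, tok: char) -> (Self, Option<u32>) {
        match tok.to_digit(10) {
            Some(d) => (Digits(self.0 * 10 + d), None),
            None => (Digits(0), Some(self.0)),
        }
    }
}

/// Parent machine: recognizes `[` digits `]` and ignores everything else.
#[derive(Debug, PartialEq, Eq)]
pub enum Parent {
    Outside,
    InNumber(Digits),
}

impl Machine for Parent {
    type Output = u32;
    fn step(self, tok: char) -> (Self, Option<u32>) {
        match self {
            // Hand-written entry into the child's start state.
            Parent::Outside if tok == '[' => (Parent::InNumber(Digits(0)), None),
            Parent::Outside => (Parent::Outside, None),
            // Hand-written exit: let the child emit, then leave it.
            Parent::InNumber(child) if tok == ']' => {
                let (_, out) = child.step(tok);
                (Parent::Outside, out)
            }
            // Delegation proper: the variant name serves as `into`.
            Parent::InNumber(child) => delegate(child, tok, Parent::InNumber),
        }
    }
}
```

Feeding the characters of `x[42]` through `Parent::step` one at a time yields a single output, `42`, and leaves the parent in `Outside`. The two hand-written arms are the "start and accepting state" wiring that the commit leaves for later.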
Mike Gerwitz	f402e51d04	tamer: parse: More flexible Transition API This does some cleanup and adds `parse::Object` for use in disambiguating `From` for `ParseStatus`, allowing the `Transition` API to be much more flexible in the data it accepts and automatically converts. This allows us to concisely provide raw output data to be wrapped, or provide `ParseStatus` directly when more convenient. There aren't yet examples in the docs; I'll do so once I make sure this API is actually utilized as intended. DEV-10863	2022-03-25 16:45:32 -04:00
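One way to read the "more flexible `Transition` API" of f402e51d04 is a constructor that accepts anything convertible into a `ParseStatus`. The sketch below is an assumption-heavy reduction: the `Object` marker trait, the `ParseStatus` variants, and the `ok`/`incomplete` methods are guesses at the general shape, not the real definitions. In this reduced form the marker trait is not strictly required for coherence; it is included only to mirror the commit's `parse::Object`.

```rust
/// Marker for types that may be emitted as parser output (hypothetical).
pub trait Object: std::fmt::Debug + PartialEq {}

#[derive(Debug, PartialEq)]
pub enum ParseStatus<T: Object> {
    /// More tokens are needed before anything can be emitted.
    Incomplete,
    /// A complete output object.
    Object(T),
}

/// Raw output objects are wrapped automatically.
impl<T: Object> From<T> for ParseStatus<T> {
    fn from(obj: T) -> Self {
        ParseStatus::Object(obj)
    }
}

/// A transition to state `S` (assumed shape).
#[derive(Debug, PartialEq)]
pub struct Transition<S>(pub S);

impl<S> Transition<S> {
    /// Accepts either a raw object or an explicit `ParseStatus`.
    pub fn ok<T: Object>(self, status: impl Into<ParseStatus<T>>) -> (Self, ParseStatus<T>) {
        (self, status.into())
    }

    /// Shorthand for a transition that produces no output yet.
    pub fn incomplete<T: Object>(self) -> (Self, ParseStatus<T>) {
        (self, ParseStatus::Incomplete)
    }
}

/// Example output type (hypothetical).
#[derive(Debug, PartialEq)]
pub struct Sym(pub u32);
impl Object for Sym {}
```

A call site can then write `Transition(next).ok(Sym(5))` or `Transition(next).ok(ParseStatus::Incomplete)` with the same method, provided the output type is known from context or annotated.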

57 Commits (9c6b00a124cd4a381eadf4e0090a921d83620407)