employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Mike Gerwitz	f8c28655dc	tamer: parse: Split into multiple modules This abstraction has grown quite a bit, and it's time to start formalizing it a bit. This split doesn't change any behavior, but it does start to make it easier to reason about by clearly stating the broad components and how they interact with one another. This doesn't yet move the tests; those will come next, but they are very few. The reason I gave previously for this was because (a) they're tested indirectly via the systems that utilize them and (b) the abstraction was not yet settled on and the process was already very expensive. No test coverage was lost---it's only that failures were potentially harder to debug, but in practice not even this was true, because the deeply expressive types all but ensured that, if it compiles, it will function in a way that is expected. Unit tests and documentation for this system will be added once I'm sure that this abstraction is in a proper state. DEV-7145	2022-06-01 11:32:58 -04:00
Mike Gerwitz	63aa452197	tamer: parse: Move parse::lower into Lower This also modifies `poc` such that `Lower` is invoked as an associated function rather than a method to emphasize the pattern that is forming, so that it can be later abstracted away. DEV-11864	2022-06-01 11:15:43 -04:00
Mike Gerwitz	f40f8bbafc	tamer: parse: Rename {lower__while_ok=>lower_} The `while_ok` can just be implied with a lowering operation, and that reduces the name complexity so that we can maybe introduce even more specialized methods without resulting in a huge sentence as a name. DEV-11864	2022-05-27 14:10:55 -04:00
Mike Gerwitz	b084e23497	tamer: Refactor asg_builder into obj::xmlo::lower and asg::air This finally uses `parse` all the way up to aggregation into the ASG, as can be seen by the mess in `poc`. This will be further simplified---I just need to get this committed so that I can mentally get it off my plate. I've been separating this commit into smaller commits, but there's a point where it's just not worth the effort anymore. I don't like making large changes such as this one. There is still work to do here. First, it's worth re-mentioning that `poc` means "proof-of-concept", and represents things that still need a proper home/abstraction. Secondly, `poc` is retrieving the context of two parsers---`LowerContext` and `Asg`. The latter is desirable, since it's the final aggregation point, but the former needs to be eliminated; in particular, packages need to be worked into the ASG so that `found` can be removed. Recursively loading `xmlo` files still happens in `poc`, but the compiler will need this as well. Once packages are on the ASG, along with their state, that responsibility can be generalized as well. That will then simplify lowering even further, to the point where hopefully everything has the same shape (once final aggregation has an abstraction), after which we can then create a final abstraction to concisely stitch everything together. Right now, Rust isn't able to infer `S` for `Lower<S, LS>`, which is unfortunate, but we'll be able to help it along with a more explicit abstraction. DEV-11864	2022-05-27 13:51:29 -04:00
Mike Gerwitz	eafb3b2a1b	tamer: Add Display impl for each ParseState for generic ParseErrors This is intended to describe, to the user, the state that the parser is in. This will be used to convey additional information for general parser errors, but it should also probably be integrated into parsers' individual errors as well when appropriate. This is something I expected to add at some point, but I wanted to add them because, when dealing with lowering errors, it can be difficult to tell what parser the error originated from. DEV-11864	2022-05-25 15:26:02 -04:00
Mike Gerwitz	9edc32dd3b	tamer: parse::LowerIter: Generic inner TripIter iterator This commit is preparing to compose LowerIter directly. DEV-11864	2022-05-24 10:27:14 -04:00
Mike Gerwitz	f218c452b9	tamer: iter::trip: Flatten Result The `*_iter_while_ok` functions now compose like monads, flattening `Result` at each step and drastically simplifying handling of error types. This also removes the bunch of `?`s at the end of the expression, and allows me to use `?` within the callback itself. I had originally not used `Result` as the return type of the callback because I was not entirely sure how I was going to use them, but it's now clear that I _always_ use `Result` as the return type, and so there's no use in trying to be too accommodating; it can always change in the future. This is desirable not just for cleanup, but because trying to refactor `asg_builder` into a pair of `Parser`s is really messy to chain without flattening, especially given some state that has to leak temporarily to the caller. More on that in a future commit. DEV-11864	2022-05-20 16:08:16 -04:00
Mike Gerwitz	263cb68380	tamer: parse: Persistent context This allows retrieving and providing a context to a `Parser`. This is intended for use with an aggregating parser, in particular to construct the ASG and return it. This is a component of a change that replaces `asg_builder` with a `Parser`-based lowering into the ASG, but there are still changes that need to be made to simplify things and complete its integration. DEV-11864	2022-05-18 16:15:09 -04:00
Mike Gerwitz	001499d921	tamer: parse::ParseError: Remove Eq trait bound Just as in other commits, since it's an unnecessary limitation. DEV-11864	2022-05-18 16:06:22 -04:00
Mike Gerwitz	c49d87976d	tamer: parse::Token: Remove Eq trait bound `PartialEq` remains, and is all that is needed. See previous commit regarding the removal of this same bound from `Context`. This can be re-added if it ends up actually being necessary. But Tokens are ephemeral and used only in lowering pipelines, using pattern matching. DEV-11864	2022-05-16 10:05:14 -04:00
Mike Gerwitz	0493e68cb3	tamer: parse::ParseState::Context: Add missing comment DEV-11864	2022-05-10 11:06:22 -04:00
Mike Gerwitz	0ef0d2b553	tamer: parse::ParseState::Error: Relax Eq trait bound This is unnecessarily restrictive, since we do not require anything further than `PartialEq` for the situations where we care about equality (tests). DEV-11864	2022-05-06 15:28:47 -04:00
Mike Gerwitz	9f990e19e9	tamer: parse::ParseState::Context: Remove Default trait bound This is too restrictive, especially for parsers that fold into something, like the ASG, which may exist prior to invoking the parser. This moves the trait bound to the functions that actually need it. Those obviously cannot be used if the Context does not implement `Default`, but I'll provide alternative conveniences. DEV-11864	2022-05-05 15:55:04 -04:00
Mike Gerwitz	1ad2fb1dc8	Copyright year update 2022 RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no "Group"), but I'm not sure if the legal name has been changed yet or not, so I'll wait on that.	2022-05-03 14:14:29 -04:00
Mike Gerwitz	eaa8133d21	tamer: diagnose: Introduction of diagnostic system This is a working concept that will continue to evolve. I wanted to start with some basic output before getting too carried away, since there's a lot of potential here. This is heavily influenced by Rust's helpful diagnostic messages, but will take some time to realize a lot of the things that Rust does. The next step will be to resolve line and column numbers, and then possibly include snippets and underline spans, placing the labels alongside them. I need to balance this work with everything else I have going on. This is a large commit, but it converts the existing Error Display impls into Diagnostic. This separation is a bit verbose, so I'll see how this ends up evolving. Diagnostics are tied to Error at the moment, but I imagine in the future that any object would be able to describe itself, error or not, which would be useful in the future both for the Summary Page and for query functionality, to help developers understand the systems they are writing using TAME. Output is integrated into tameld only in this commit; I'll add tamec next. Examples of what this outputs are available in the test cases in this commit. DEV-10935	2022-04-13 15:22:46 -04:00
Mike Gerwitz	c49510646b	tamer: parse::Parser (last_span): Replace Option with UNKNOWN_SPAN There's no use in complicating the error handling here when we'd just default to `UNKNOWN_SPAN` anyway when trying to render it. `UNKNOWN_SPAN` didn't exist at the time of writing. DEV-10935	2022-04-12 09:59:00 -04:00
Mike Gerwitz	6871a0cdc7	tamer: parse (ParseState): Doc correction regarding determinism The pair is now a triple and parsers are often NFAs.	2022-04-05 15:55:58 -04:00
Mike Gerwitz	e77bdaf19a	tamer: parse: Introduce mutable Context This resolves the performance issues caused by Rust's failure to elide the ElementStack (ArrayVec) memcpys on move. Since XIRF is invoked tens of millions of times in some cases for larger systems, prior to this change, failure to optimize away moves for XIRF resulted in tens of millions of memcpys. This resulted in linking of one program going from 1s -> ~15s. This change reduces it to ~2.5s with the wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the subject of future changes). In particular, this change introduces a new mutable reference to `ParseState::parse_token`, which is a reference to a `Context` owned by the caller (e.g. `Parser`). In the case of XIRF, this means that `Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of `flat::State`; this allows the latter to remain pure and benefit from Rust's move optimizations, without sacrificing the otherwise-pure implementation. ParseStates that do not need a mutable context can use `NoContext` and remain pure. DEV-12024	2022-04-05 15:50:53 -04:00
Mike Gerwitz	1a04d99f15	tamer: obj::xmlo::reader: Working xmlo reader This makes the necessary tweaks to have the entire linker work end-to-end and produce a compatible xmle file (that is, identical except for nondeterministic topological ordering). That's good, and finally that can get off of my plate. What's disappointing, and what I'll have more information on in future commits, is how slow it is. The linking of our largest package goes from ~1s -> ~15s with this change. The reason is because of tens of millions of `memcpy` calls. Why? The ParseState abstraction is pure and passes an owned `self` around, and Parser replaces its own reference using this: let result; TransitionResult(Transition(self.state), result) = take(&mut self.state).parse_token(tok); Naively, this would store a copy of the old state in `result`, allocate a new ParseState for `self.state`, pass the original or a copy to `parse_token`, and then overwrite `self.state` with the new ParseState that is returned once it is all over. Of course, that'd be devastating. What we want to happen is for Rust to realize that it can just pass a reference to `self.state` and perform no copying at all. For certain parsers, this is exactly what happens. Great! But for XIRF, we have this: /// Stack of element [`QName`] and [`Span`] pairs, /// representing the current level of nesting. /// /// This storage is statically allocated, /// allowing XIRF's parser to avoid memory allocation entirely. type ElementStack<const MAX_DEPTH: usize> = ArrayVec<(QName, Span), MAX_DEPTH>; /// XIRF document parser state. /// /// This parser is a pushdown automaton that parses a single XML document. #[derive(Debug, Default, PartialEq, Eq)] pub enum State<const MAX_DEPTH: usize, SA = AttrParseState> where SA: FlatAttrParseState, { /// Document parsing has not yet begun. #[default] PreRoot, /// Parsing nodes. NodeExpected(ElementStack<MAX_DEPTH>), /// Delegating to attribute parser. AttrExpected(ElementStack<MAX_DEPTH>, SA), /// End of document has been reached. Done, } ParseState contains an ArrayVec, and its implementation details cause LLVM _not_ to elide the `memcpy`. And there's a lot of them. Considering that ParseState is supposed to use only statically allocated memory and be zero-copy, this is rather ironic. Now, this _could_ be potentially fixed by not using ArrayVec; removing it (and the corresponding checks for balanced tags) gets us down to 2s (which still needs improvement), but we can't have a core abstraction in our system resting on a house of cards. What if the optimization changes between releases and suddenly linking / building becomes shit slow? That's too much of a risk. Further, having to limit what abstractions we use just to appease the compiler to optimize away moves is very restrictive. The better option seems to be to go back to what I used to do: pass around `&mut self`. I had moved to an owned `self` to force consideration of _all_ state transitions, but I can try to do the same thing in a different type of way using mutable references, and then we avoid this problem. The abstraction isn't pure (in the functional sense) anymore, but it's safe and isn't relying on delicate inlining and optimizer implementation details to have a performant system. More information to come. DEV-10863	2022-04-01 16:31:14 -04:00
Mike Gerwitz	fb3da09fa4	tamer: obj::xmlo::reader: preproc:sym-deps processing This parses the symbol dependency list (adjacency list). I'm noticing some glaring issues in error handling, particularly that the token being parsed while an error occurs is not returned and so recovery is impossible. I'll have to address that later on, after I get this parser completed. Another previous question that I had a hard time answering in prior months was how I was going to compose boilerplate parsers, e.g. handling the parsing of single-attribute elements and such. A pattern is clearly taking shape, and with the composition of parsers more formalized, that'll be able to be abstracted away. But again, that's going to wait until after this parser is actually functioning. Too many delays so far. DEV-10863	2022-03-30 15:05:55 -04:00
Mike Gerwitz	5c16add95d	tamer: parse (Transitionable): New This simply removes boilerplate. This will receive concrete examples once I come up with docs for the entire module; there's boilerplate involved in testing and documenting this in isolation and the time investment is not worth it yet until I'm certain that this will not be changed. DEV-10863	2022-03-30 10:03:14 -04:00
Mike Gerwitz	4cb478a42d	tamer: parse::ParseState::delegate_lookahead: New concept This introduces a new method similar to the previous `delegate`, but with another closure that allows for handling lookahead tokens from the child parser. Admittedly, this isn't exactly what I was going for---a list of arguments isn't exactly self-documenting, especially with the brevity when the arguments line up---but this was easy to do and so I'll run with this for now. This also modified `delegate` to accept a context, even though it wasn't necessary, both for consistency with its lookahead counterpart and for brevity with the `into` argument (allowing, in our case, to just pass the name of the variant, rather than a closure). I'm not going to handle the actual starting and accepting state stitching abstraction for now; I'd like to observe future boilerplate more before I consider the best way to handle it, though I do have some ideas. DEV-10863	2022-03-29 14:46:43 -04:00
Mike Gerwitz	2a3d5be159	tamer: parse::ParseState::delegate: Initial state stitching concept This is the delegation portion of what I've come to call "state stitching"---wiring together two state machines that recognize the same input tokens. This handles the delegation of tokens once the parser has been entered, but does not yet handle the actual stitching part of it: wiring the start and accepting states of the child parser to the parent. This is indirectly tested by the XmloReader, but it will receive its own tests once I further finalize this concept. I'm playing around with some ideas. With that said, a quick visual inspection together with the guarantees provided by the type system should convince any familiar reader of its correctness. DEV-10863	2022-03-29 14:12:26 -04:00
Mike Gerwitz	f402e51d04	tamer: parse: More flexible Transition API This does some cleanup and adds `parse::Object` for use in disambiguating `From` for `ParseStatus`, allowing the `Transition` API to be much more flexible in the data it accepts and automatically converts. This allows us to concisely provide raw output data to be wrapped, or provide `ParseStatus` directly when more convenient. There aren't yet examples in the docs; I'll do so once I make sure this API is actually utilized as intended. DEV-10863	2022-03-25 16:45:32 -04:00
Mike Gerwitz	279ddc79d7	tamer: parse::TransitionResult: Alias=>newtype This converts the tuple type alias into a newtype, so that we may provide our own implementations. This differs from a previous approach that I took, which involved making this type `Result<(S, T), (S, E)>` so that the return values composed well with other functions. But the reality is that this is used only by other `ParseState`s and `Parser`, so it's unnecessary. However, this is also an attempt to utilize the new Try and FromResidual traits; note how the Try associated types match precisely what I was trying to do before, though they're used as intermediate types. I'll see how this evolves. DEV-10863	2022-03-25 12:28:50 -04:00
Mike Gerwitz	2e98a69d15	Revert "tamer: parse::TransitionResult: Move common Transition into Result" This reverts commit `bf5da75096`.	2022-03-25 09:17:25 -04:00
Mike Gerwitz	bf5da75096	tamer: parse::TransitionResult: Move common Transition into Result This allows the Results to compose and, importantly, is compatible with `?` without having to put in any extra effort. This puts the caller in an awkward spot, so I introduced a utility function `result_tup0_invert` for now; we'll see if that stays or evolves differently. DEV-10863	2022-03-24 23:48:30 -04:00
Mike Gerwitz	fbf786086a	tamer: parse::Parser (lower_while_ok): New method This introduces a WIP lowering operation, abstracting away quite a bit of the manual wiring work, which is really important to providing an API that provides the proper level of abstraction for actually understanding what the system is doing. This does not yet have tests associated with it---I had started, but it's a lot of work and boilerplate for something that is going to evolve. Generally, I wouldn't use that as an excuse, but the robust type definitions in play, combined with the tiny amount of actual logic, provide a pretty high level of confidence. It's very difficult to wire these types together and produce something incorrect without doing something obviously bad. Similarly, I'm holding off on proper docs too, though I did write some information here. More to come, after I actually get to work on the XmloReader. On a side note: I'm happy to have made progress on this, since this wiring is something I've been dreading and wondering about since before the Parser abstraction even existed. Note also that this makes parser::feed_toks private again---I don't intend to support push parsers yet, since they're only needed internally. Maybe for error recovery, but I'll wait to decide until it's actually needed. DEV-10863	2022-03-23 14:31:16 -04:00
Mike Gerwitz	b4a7591357	tamer: obj::xmlo::reader: Begin conversion to ParseState This begins to transition XmloReader into a ParseState. Unlike previous changes where ParseStates were composed into a single ParseState, this is instead a lowering operation that will take the output of one Parser and provide it to another. The mess in ld::poc (...which still needs to be refactored and removed) shows the concept, which will be abstracted away. This won't actually get to the ASG in order to test that this works with the wip-xmlo-xir-reader flag on (development hasn't gotten that far yet), but since it type-checks, it should conceptually work. Wiring lowering operations together is something that I've been dreading for months, but my approach of only abstracting after-the-fact has helped to guide a sane approach for this. For some definition of "sane". It's also worth noting that AsgBuilder will too become a ParseState implemented as another lowering operation, so: XIR -> XIRF -> XMLO -> ASG These steps will all be streaming, with iteration happening only at the topmost level. For this reason, it's important that ASG not be responsible for doing that pull, and further we should propagate Parsed::Incomplete rather than filtering it out and looping an indeterminate number of times outside of the toplevel. One final note: the choice of 64 for the maximum depth is entirely arbitrary and should be more than generous; it'll be finalized at some point in the future once I actually evaluate what maximum depth is reasonable based on how the system is used, with some added growing room. DEV-10863	2022-03-22 14:06:52 -04:00
Mike Gerwitz	f6957ff028	tamer: parse::Parser: Extract logic from Iterator impl This introduces a (still-private) way to _push_ tokens into the parser, rather than relying purely on a pull-based interface. Not only does this simplify the iterator, but this is also preparing to make the new `feed_tok` public so that parsers can be composed in more contexts. I suspect that this method may also be useful for error recovery, since it can be used to inject tokens into arbitrary points of a token stream. I kept the new method private for now so that I can introduce the new API and docs separate from this refactoring. DEV-10863	2022-03-22 10:10:59 -04:00
Mike Gerwitz	14638a612f	tamer: {xir::=>}parse: Move parser out of XIR The parsing framework originally created for XIR is now more general and useful to other things. We'll see how this evolves. This needs additional documentation, but I'd like to see how it changes as I implement XmloReader and then some of the source readers first. DEV-10863	2022-03-18 16:24:53 -04:00
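
Commits 1a04d99f15 and e77bdaf19a describe a parser that replaces its owned state with `take(&mut self.state)` and then moves the element stack out of the state into a caller-owned mutable context. The following is a minimal sketch of that shape only, not TAMER's actual code: `Tok`, `Stack`, `State`, `Parser`, and this `feed_tok` signature are hypothetical simplifications, and a `Vec` stands in for the `ArrayVec`-backed `ElementStack`.

```rust
use std::mem::take;

/// Hypothetical token type for this sketch (not TAMER's `Token`).
#[derive(Debug, PartialEq)]
pub enum Tok {
    Open(&'static str),
    Close(&'static str),
}

/// Hypothetical context owned by the parser; it stands in for the
/// `ElementStack` mentioned in the commit message.
#[derive(Debug, Default)]
pub struct Stack(Vec<&'static str>);

/// The state carries no stack, so moving it is cheap regardless of
/// whether the optimizer elides the move.
#[derive(Debug, Default, PartialEq)]
pub enum State {
    #[default]
    PreRoot,
    NodeExpected,
    Done,
}

impl State {
    /// Consume the current state and return the next one, mutating
    /// the caller-owned context as needed.
    pub fn parse_token(self, tok: Tok, ctx: &mut Stack) -> Result<State, String> {
        match (self, tok) {
            (State::PreRoot | State::NodeExpected, Tok::Open(name)) => {
                ctx.0.push(name);
                Ok(State::NodeExpected)
            }
            (State::NodeExpected, Tok::Close(name)) => match ctx.0.pop() {
                Some(open) if open == name => Ok(if ctx.0.is_empty() {
                    State::Done
                } else {
                    State::NodeExpected
                }),
                Some(open) => Err(format!("expected </{open}>, found </{name}>")),
                None => Err(format!("unbalanced </{name}>")),
            },
            (st, tok) => Err(format!("unexpected {tok:?} in state {st:?}")),
        }
    }
}

/// Hypothetical parser that owns both the state and the context.
#[derive(Debug, Default)]
pub struct Parser {
    state: State,
    ctx: Stack,
}

impl Parser {
    /// Push one token into the parser.
    pub fn feed_tok(&mut self, tok: Tok) -> Result<(), String> {
        // `take` leaves `State::default()` in place while the old state
        // is moved into `parse_token`; on error that default remains.
        self.state = take(&mut self.state).parse_token(tok, &mut self.ctx)?;
        Ok(())
    }

    /// Whether the document has been fully parsed.
    pub fn is_done(&self) -> bool {
        self.state == State::Done
    }
}
```

Feeding `Open("a")`, `Open("b")`, `Close("b")`, `Close("a")` to a default `Parser` leaves it in the `Done` state, while a mismatched closing tag returns an error. The state transitions still take an owned `self`, so every transition must be written out explicitly, but the only data that grows with nesting depth is reached through `&mut`.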

31 Commits (f8c28655dc2ffb810eef5193f89091f7909997da)
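
Commit f218c452b9 says the `*_iter_while_ok` functions flatten `Result` at each step so that lowering stages chain without a trailing run of `?`s. A rough sketch of that idea follows, under stated assumptions: `iter_while_ok`, this `TripIter`, and `sum_doubled` are hypothetical stand-ins written for illustration and do not reproduce the signatures in `tamer::iter::trip`.

```rust
/// Hypothetical adapter: yields `T` from `Result<T, E>` items until the
/// first `Err`, which is stashed for the caller and ends iteration.
struct TripIter<'a, I, E> {
    inner: I,
    err: &'a mut Option<E>,
}

impl<'a, I, T, E> Iterator for TripIter<'a, I, E>
where
    I: Iterator<Item = Result<T, E>>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        if self.err.is_some() {
            return None;
        }
        match self.inner.next()? {
            Ok(x) => Some(x),
            Err(e) => {
                *self.err = Some(e);
                None
            }
        }
    }
}

/// Run `f` over the `Ok` values of `from`. The callback itself returns
/// a `Result`, and the two layers are flattened into one: an upstream
/// error takes precedence, otherwise the callback's result is returned.
pub fn iter_while_ok<I, T, E, U>(
    from: I,
    f: impl FnOnce(&mut dyn Iterator<Item = T>) -> Result<U, E>,
) -> Result<U, E>
where
    I: Iterator<Item = Result<T, E>>,
{
    let mut err = None;
    let out = {
        let mut trip = TripIter { inner: from, err: &mut err };
        f(&mut trip)
    };
    match err {
        Some(e) => Err(e),
        None => out,
    }
}

/// Two chained stages: double each number (which may overflow), then
/// sum. Neither stage needs to unwrap a nested `Result`.
pub fn sum_doubled(input: Vec<Result<i32, String>>) -> Result<i32, String> {
    iter_while_ok(input.into_iter(), |nums| {
        let doubled =
            nums.map(|n| n.checked_mul(2).ok_or_else(|| "overflow".to_string()));
        iter_while_ok(doubled, |ds| Ok(ds.sum::<i32>()))
    })
}
```

Because each call returns a single `Result<U, E>`, the inner call can be the tail expression of the outer callback, which is the monad-like composition the commit message refers to. An error from either the input or the doubling stage surfaces as the one `Err` returned by `sum_doubled`.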