employer/tame - tame - Mike Gerwitz's Forge


Author	SHA1	Message	Date
Mike Gerwitz	34b64fd619	tamer: asg::air: AIR as a sum IR This introduces a new macro `sum_ir!` to help with a long-standing problem of not being able to easily narrow types in Rust without a whole lot of boilerplate. This patch includes a bit of documentation, so see that for more information. This was not a welcome change---I jumped down this rabbit hole trying to decompose `AirAggregate` so that I can share portions of parsing with the current parser and a template parser. I can now proceed with that. This is not the only implementation that I had tried. I previously inverted the approach, as I've been doing manually for some time: manually create types to hold the sets of variants, and then create a sum type to hold those types. That works, but it resulted in a mess for systems that have to use the IR, since now you have two enums to contend with. I didn't find that to be appropriate, because we shouldn't complicate the external API for implementation details. The enum for IRs is supposed to be like a bytecode---a list of operations that can be performed with the IR. They can be grouped if it makes sense for a public API, but in my case, I only wanted subsets for the sake of delegating responsibilities to smaller subsystems, while retaining the context that `match` provides via its exhaustiveness checking but does not expose as something concrete (which is deeply frustrating!). Anyway, here we are; this'll be refined over time, hopefully, and portions of it can be generalized for removing boilerplate from other IRs. Another thing to note is that this syntax is really a compromise---I had to move on, and I was spending too much time trying to get creative with `macro_rules!`. It isn't the best, and it doesn't seem very Rust-like in some places and is therefore not necessarily all that intuitive. This can be refined further in the future. But the end result, all things considered, isn't too bad. DEV-13708	2023-03-10 14:27:58 -05:00
Mike Gerwitz	d42a46d2b8	tamer: NIR->xmli template definition setup This sets the stage for template parsing, and finally decides how we're going to represent templates on the ASG. This is going to start simple, since my original plans for improving how templates are handled (conceptually) are going to have to wait. This is the last difficult object type to figure out, with respect to graph representation and derivation, so I wanted to get it out of the way. DEV-13708	2023-03-10 14:27:58 -05:00
Mike Gerwitz	08278bc867	tamer: asg::air::Air::{ExprIdent=>BindIdent}: Rename I wasn't initially sure whether I'd want separate tokens for different types of identifying operations, but now that I see that it is clear from the current state of the parser, there's no need. This matches the name of the token in NIR. DEV-13708	2023-03-10 14:27:58 -05:00
Mike Gerwitz	4afc8c22e6	tamer: asg::air: Merge Pkg closing span The `Pkg` span will now properly reflect the entire definition of the package including the opening and closing tags. This was found while I was working on a graph traversal. DEV-13597	2023-03-10 14:27:57 -05:00
Mike Gerwitz	39e98210be	tamer: asg::graph::object::ident::ObjectIndex::<Ident>::bind_definition: Replace ident span I noticed this while working on a graph traversal. The unit test used the same span for both the reference _and_ the binding, so I didn't notice. -_- The problem with this, though, is that we do not have a separate span representing the source location of the identifier reference. The reason is that we decided to re-use an existing node rather than creating another one, which would add another inconvenient layer of indirection (and complexity). So, I may have to add (optional?) spans to edges. DEV-13708	2023-03-10 14:27:57 -05:00
Mike Gerwitz	2d3b27ac01	tamer: asg: Root package definition This causes a package definition to be rooted (so that it can be easily accessed for a graph walk). This keeps consistent with the new `ObjectIndex`-based API by introducing a unit `Root` `ObjectKind` and the boilerplate that goes with it. This boilerplate, now glaringly obvious, will be refactored at some point, since its repetition is onerous and distracting. DEV-13159	2023-02-01 10:34:17 -05:00
Mike Gerwitz	f753a23bad	tamer: asg: Introduce edge from Package to Ident Included in this diff are the corresponding changes to the graph to support the change. Adding the edge was easy, but we also need a way to get the package for an identifier. The easiest way to do that is to modify the edge weight to include not just the target node type, but also the source. DEV-13159	2023-02-01 10:34:17 -05:00
Mike Gerwitz	39d093525c	tamer: nir, asg: Introduce package to ASG This does not yet create edges from identifiers to the package; just getting this introduced was quite a bit of work, so I want to get this committed. Note that this also includes a change to NIR so that `Close` contains the entity so that we can pattern-match for AIR transformations rather than retaining yet another stack with checks that are already going to be done by AIR. This makes NIR stand less on its own from a self-validation point, but that's okay, given that it's the language that the user entered and, conceptually, they could enter invalid NIR the same as they enter invalid XML (e.g. from a REPL). In _practice_, of course, NIR is lowered from XML and the schema is enforced during that lowering and so the validation does exist as part of that parsing. These concessions speak more to the verbosity of the language (Rust) than anything. DEV-13159	2023-02-01 10:34:16 -05:00
Mike Gerwitz	39ebb74583	tamer: asg: Expression identifier references This adds support for identifier references, adding `Ident` as a valid edge type for `Expr`. There is nothing in the system yet to enforce ontology through levels of indirection; that will come later on. I'm testing these changes with a very minimal NIR parse, which I'll commit shortly. DEV-13597	2023-01-26 14:45:17 -05:00
Mike Gerwitz	ee30600f67	tamer: asg::air::Air: {Expr=>Expr} Makes grouping and code completion easier when they're prefixed. DEV-13597	2023-01-23 11:48:28 -05:00
Mike Gerwitz	954b5a2795	Copyright year and name update Ryan Specialty Group (RSG) rebranded to Ryan Specialty after its IPO.	2023-01-20 23:37:30 -05:00
Mike Gerwitz	4e3a81d7f5	tamer: asg: Bind transparent ident This provides the initial implementation allowing an identifier to be defined (bound to an object and made transparent). I'm not yet entirely sure whether I'll stick with the "transparent" and "opaque" terminology when there's also "declare" and "define", but a `Missing` state is a type of declaration and so the distinction does still seem to be important. There is still work to be done on `ObjectIndex::<Ident>::bind_definition`, which will follow. I'm going to be balancing work to provide type-level guarantees, since I don't have the time to go as far as I'd like. DEV-13597	2023-01-20 23:37:29 -05:00
Mike Gerwitz	f1cf35f499	tamer: asg: Add expression edges This introduces a number of abstractions, whose concepts are not fully documented yet since I want to see how it evolves in practice first. This introduces the concept of edge ontology (similar to a schema) using the type system. Even though we are not able to determine what the graph will look like statically---since that's determined by data fed to us at runtime---we _can_ ensure that the code _producing_ the graph from those data will produce a graph that adheres to its ontology. Because of the typed `ObjectIndex`, we're also able to implement operations that are specific to the type of object that we're operating on. Though, since the type is not (yet?) stored on the edge itself, it is possible to walk the graph without looking at node weights (the `ObjectContainer`) and therefore avoid panics for invalid type assumptions, which is bad, but I don't think that'll happen in practice, since we'll want to be resolving nodes at some point. But I'll address that more in the future. Another thing to note is that walking edges is only done in tests right now, and so there's no filtering or anything; once there are nodes (if there are nodes) that allow for different outgoing edge types, we'll almost certainly want filtering as well, rather than panicking. We'll also want to be able to query for any object type, but filter only to what's permitted by the ontology. DEV-13160	2023-01-20 23:37:29 -05:00
Mike Gerwitz	8786ee74fa	tamer: asg::air: Expression building error cases This addresses the two outstanding `todo!` match arms representing errors in lowering expressions into the graph. As noted in the comments, these errors are unlikely to be hit when using TAME in the traditional way, since e.g. XIR and NIR are going to catch the equivalent problems within their own contexts (unbalanced tags and a valid expression grammar respectively). _But_, the IR does need to stand on its own, and I further hope that some tooling maybe can interact more directly with AIR in the future. DEV-13160	2023-01-20 23:37:29 -05:00
Mike Gerwitz	dc3cd8bbc8	tamer: asg::air::AirAggregate: Reduce duplication This refactors the previous commit a bit to remove the significant amount of duplication, as planned. DEV-7145	2023-01-20 23:37:29 -05:00
Mike Gerwitz	40c941d348	tamer: asg::air::AirAggregate: Initial impl of nested exprs This introduces a number of concepts together, again to demonstrate that they were derived. This introduces support for nested expressions, extending the previous work. It also supports error recovery for dangling expressions. The parser states are a mess; there is a lot of duplicate code here that needs refactoring, but I wanted to commit this first at a known-good state so that the diff will demonstrate the need for the change that will follow; the opportunities for abstraction are plainly visible. The immutable stack introduced here could be generalized, if needed, in the future. Another important note is that Rust optimizes away the `memcpy`s for the stack that was introduced here. The initial Parser Context was introduced because of `ArrayVec` inhibiting that elision, but Vec never had that problem. In the future, I may choose to go back and remove ArrayVec, but I had wanted to keep memory allocation out of the picture as much as possible to make the disassembly and call graph easier to reason about and to have confidence that optimizations were being performed as intended. With that said---it _should_ be eliding in tamec, since we're not doing anything meaningful yet with the graph. It does also elide in tameld, but it's possible that Rust recognizes that those code paths are never taken because tameld does nothing with expressions. So I'll have to monitor this as I progress and adjust accordingly; it's possible a future commit will call BS on everything I just said. Of course, the counter-point to that is that Rust is optimizing them away anyway, but Vec _does_ still require allocation; I was hoping to keep such allocation at the fringes. But another counter-point is that it _still_ is allocated at the fringe, when the context is initialized for the parser as part of the lowering pipeline. But I didn't know how that would all come together back then. 
...alright, enough rambling. DEV-13160	2023-01-20 23:37:29 -05:00
Mike Gerwitz	4b9b173e30	tamer: asg::air::Air::span: Provide spans Not that they're loaded from object files yet, but this will at least work once they are. DEV-13160	2023-01-20 23:37:29 -05:00
Mike Gerwitz	edbfc87a54	tamer: f::Functor: New trait This commit is purposefully coupled with changes that utilize it to demonstrate that the need for this abstraction has been _derived_, not forced; TAMER doesn't aim to be functional for the sake of it, since idiomatic Rust achieves many of its benefits without the formalisms. But, the formalisms do occasionally help, and this is one such example. There is other existing code that can be refactored to take advantage of this style as well. I do _not_ wish to pull an existing functional dependency into TAMER; I want to keep these abstractions light, and eliminate them as necessary, as Rust continues to integrate new features into its core. I also want to be able to modify the abstractions to suit our particular needs. (This is _not_ a general recommendation; it's particular to TAMER and to my experience.) This implementation of `Functor` is one such example. While it is modeled after Haskell in that it provides `fmap`, the primitive here is instead `map`, with `fmap` derived from it, since `map` allows for better use of Rust idioms. Furthermore, it's polymorphic over _trait_ type parameters, not method, allowing for separate trait impls for different container types, which can in turn be inferred by Rust and allow for some very concise mapping; this is particularly important for TAMER because of the disciplined use of newtypes. For example, `foo.overwrite(span)` and `foo.overwrite(name)` are both self-documenting, and better alternatives than, say, `foo.map_span(|_| span)` and `foo.map_symbol(|_| name)`; the latter are perfectly clear in what they do, but lack a layer of abstraction, and are verbose. But the clarity of the _new_ form does rely on either good naming conventions of arguments, or explicit type annotations using turbofish notation if necessary. This will be implemented on core Rust types as appropriate and as possible. 
At the time of writing, we do not yet have trait specialization, and there are too many soundness issues for me to be comfortable enabling it, which limits what we can do with something like, say, a generic `Result`, while also allowing for specialized implementations based on newtypes. DEV-13160	2023-01-20 23:37:27 -05:00
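The `Functor` commit above describes a trait whose type parameters live on the trait itself, so that one container can carry a separate impl per newtype and Rust can infer which one `overwrite` means. A minimal sketch of that idea follows; it is not TAMER's actual definition. `Span`, `Name`, `Ident`, and the `Target` associated type are hypothetical stand-ins chosen for illustration, and `fmap` is left out for brevity.

```rust
/// Sketch of a functor whose type parameters are on the trait (not the
/// method), so a container may implement it once per inner type.
/// NOTE: an illustrative assumption, not TAMER's real `f::Functor`.
trait Functor<T, U = T>: Sized {
    type Target;

    /// The primitive: apply `f` to the inner `T`.
    fn map(self, f: impl FnOnce(T) -> U) -> Self::Target;

    /// Derived from `map`: replace the inner value, ignoring the old one.
    fn overwrite(self, value: U) -> Self::Target {
        self.map(|_| value)
    }
}

// Hypothetical newtypes; the distinct types are what drive inference.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Span(u32);
#[derive(Debug, PartialEq, Clone, Copy)]
struct Name(&'static str);

#[derive(Debug, PartialEq, Clone, Copy)]
struct Ident {
    name: Name,
    span: Span,
}

impl Functor<Span> for Ident {
    type Target = Ident;

    fn map(self, f: impl FnOnce(Span) -> Span) -> Ident {
        Ident { span: f(self.span), ..self }
    }
}

impl Functor<Name> for Ident {
    type Target = Ident;

    fn map(self, f: impl FnOnce(Name) -> Name) -> Ident {
        Ident { name: f(self.name), ..self }
    }
}
```

With these impls, `ident.overwrite(Span(9))` and `ident.overwrite(Name("bar"))` select different impls purely from the argument type, which is the conciseness the commit message refers to.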
Mike Gerwitz	6e90867212	tamer: asg::object::Object{Ref=>Index}: Associate object type This makes the system a bit more ergonomic and introduces additional type safety by associating the narrowed object type with the `ObjectIndex` (previously `ObjectRef`). Not only does this allow us to explicitly state the type of object wherever those indices are stored, but it also allows the API to automatically narrow to that type when operating on it again without the caller having to worry about it. DEV-13160	2022-12-22 15:18:08 -05:00
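The idea of associating the narrowed object type with the index can be sketched roughly as below. This is a simplified illustration only: the `Asg`, `Object`, `ObjectKind`, and `ObjectIndex` shown here are hypothetical reductions that borrow the names from the commit messages, and they do not reflect TAMER's real signatures.

```rust
use std::marker::PhantomData;

#[derive(Debug, PartialEq)]
struct Ident(&'static str);
#[derive(Debug, PartialEq)]
struct Expr(u8);

/// Sum type stored on the graph (hypothetical, two variants only).
#[derive(Debug)]
enum Object {
    Ident(Ident),
    Expr(Expr),
}

/// Types that an `Object` can be narrowed to.
trait ObjectKind: Sized + Into<Object> {
    fn narrow(obj: &Object) -> Option<&Self>;
}

impl From<Ident> for Object {
    fn from(x: Ident) -> Self { Object::Ident(x) }
}
impl From<Expr> for Object {
    fn from(x: Expr) -> Self { Object::Expr(x) }
}

impl ObjectKind for Ident {
    fn narrow(obj: &Object) -> Option<&Self> {
        match obj { Object::Ident(x) => Some(x), _ => None }
    }
}
impl ObjectKind for Expr {
    fn narrow(obj: &Object) -> Option<&Self> {
        match obj { Object::Expr(x) => Some(x), _ => None }
    }
}

/// An index that remembers (at the type level only) what it points to.
struct ObjectIndex<O: ObjectKind>(usize, PhantomData<O>);

struct Asg {
    objects: Vec<Object>,
}

impl Asg {
    /// Store an object and return an index typed by the object's kind.
    fn create<O: ObjectKind>(&mut self, obj: O) -> ObjectIndex<O> {
        self.objects.push(obj.into());
        ObjectIndex(self.objects.len() - 1, PhantomData)
    }

    /// Narrowing happens here, so callers never match on `Object`.
    fn get<O: ObjectKind>(&self, oi: &ObjectIndex<O>) -> Option<&O> {
        self.objects.get(oi.0).and_then(|obj| O::narrow(obj))
    }
}
```

The caller receives an `&Ident` or `&Expr` directly from `get`; the narrowing check is a single discriminant comparison hidden behind the API.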
Mike Gerwitz	646633883f	tamer: Initial concept for AIR/ASG Expr This begins to place expressions on the graph---something that I've been thinking about for a couple of years now, so it's interesting to finally be doing it. This is going to evolve; I want to get some things committed so that it's clear how I'm moving forward. The ASG makes things a bit awkward for a number of reasons: 1. I'm dealing with older code where I had a different model of doing things; 2. It's mutable, rather than the mostly-functional lowering pipeline; 3. We're dealing with an aggregate ever-evolving blob of data (the graph) rather than a stream of tokens; and 4. We don't have as many type guarantees. I've shown with the lowering pipeline that I'm able to take a mutable reference and convert it into something that's both functional and performant, where I remove it from its container (an `Option`), create a new version of it, and place it back. Rust is able to optimize away the memcpys and such and just directly manipulate the underlying value, which is often a register with all of the inlining. _But_ this is a different scenario now. The lowering pipeline has a narrow context. The graph has to keep hitting memory. So we'll see how this goes. But it's most important to get this working and measure how it performs; I'm not trying to prematurely optimize. My attempts right now are for the way that I wish to develop. Speaking to #4 above, it also sucks that I'm not able to type the relationships between nodes on the graph. Rather, it's not that I _can't_, but a project to create a typed graph library is beyond the scope of this work and would take far too much time. I'll leave that to a personal, non-work project. Instead, I'm going to have to narrow the type any time the graph is accessed. And while that sucks, I'm going to do my best to encapsulate those details to make it as seamless as possible API-wise. 
The performance hit of performing the narrowing I'm hoping will be very small relative to all the business logic going on (a single cache miss is bound to be far more expensive than many narrowings which are just integer comparisons and branching)...but we'll see. Introducing branching sucks, but branch prediction is pretty damn good in modern CPUs. DEV-13160	2022-12-22 14:33:28 -05:00
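The "remove it from its container (an `Option`), create a new version of it, and place it back" technique mentioned in the commit above can be sketched with `Option::take`. The `State` type and `feed` function are hypothetical placeholders, not code from TAMER.

```rust
/// Hypothetical parser state with a purely functional transition.
#[derive(Debug, PartialEq)]
struct State {
    count: u32,
}

impl State {
    /// Consumes the old state and returns a new one (no mutation).
    fn step(self, n: u32) -> State {
        State { count: self.count + n }
    }
}

/// Bridges a mutable reference to the functional transition above.
fn feed(slot: &mut Option<State>, n: u32) {
    // Take ownership out of the container, leaving `None` behind...
    if let Some(state) = slot.take() {
        // ...and place the newly created version back.
        *slot = Some(state.step(n));
    }
}
```

Whether the intermediate moves are optimized away depends on the compiler and on inlining, as the commit message itself cautions; this sketch only shows the shape of the pattern.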
Mike Gerwitz	0b2e563cdb	tamer: asg: Associate spans with identifiers and introduce diagnostics This ASG implementation is a refactored form of original code from the proof-of-concept linker, which was well before the span and diagnostic implementations, and well before I knew for certain how I was going to solve that problem. This was quite the pain in the ass, but introduces spans to the AIR tokens and graph so that we always have useful diagnostic information. With that said, there are some important things to note: 1. Linker spans will originate from the `xmlo` files until we persist spans to those object files during `tamec`'s compilation. But it's better than nothing. 2. Some additional refactoring is still needed for consistency, e.g. use of `SPair`. 3. This is just a preliminary introduction. More refactoring will come as tamec is continued. DEV-13041	2022-12-16 14:44:38 -05:00
Mike Gerwitz	56d1ecf0a3	tamer: Air{Token=>} Consistency with `Nir` et al. DEV-13430	2022-12-13 14:36:38 -05:00
Mike Gerwitz	be41d056bb	tamer: nir::air: Lower to Air::TODO This actually passes data to the next parser, whereas before we were stopping short. DEV-13160	2022-12-13 14:28:16 -05:00
Mike Gerwitz	d55b3add77	tamer: asg::air::test: Extract into own file Just minor preparatory work. DEV-13160	2022-12-13 13:57:04 -05:00
Mike Gerwitz	2087672c47	tamer: parse::parser::finalize: Introduce FinalizedParser This newtype allows a caller to prove (using types) that a parser of a given type (`ParseState`) has been finalized. This will be used by the lowering pipeline to ensure that all parsers in the pipeline end up getting finalized (as you can see from a TODO added in the code, one of them is missing). The lack of such a type was an oversight during the (rather stressed) development of the parsing system, and I shouldn't need to resort to unit tests to verify that parsers have been finalized. DEV-13158	2022-10-26 12:44:19 -04:00
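A newtype that proves finalization, as described in the commit above, might look roughly like the following. This is a sketch under assumptions: the `pending` counter, `feed`, and `finish_pipeline` are invented for illustration, and the real `FinalizedParser` is parameterized over a `ParseState`, which is omitted here.

```rust
mod parse {
    /// Hypothetical parser that is accepting once nothing is pending.
    #[derive(Debug)]
    pub struct Parser {
        pending: usize,
    }

    /// Proof that a parser was finalized. The field is private, so the
    /// only way to obtain this value is through `Parser::finalize`.
    #[derive(Debug)]
    pub struct FinalizedParser(Parser);

    impl Parser {
        pub fn new(pending: usize) -> Self {
            Parser { pending }
        }

        /// Consume one pending token (illustrative only).
        pub fn feed(&mut self) {
            self.pending = self.pending.saturating_sub(1);
        }

        /// Succeeds only in an accepting state; otherwise hands the
        /// parser back so that the caller can report or recover.
        pub fn finalize(self) -> Result<FinalizedParser, Parser> {
            if self.pending == 0 {
                Ok(FinalizedParser(self))
            } else {
                Err(self)
            }
        }
    }
}

/// A pipeline stage can demand the proof in its signature, so forgetting
/// to finalize becomes a compile-time error rather than a test failure.
fn finish_pipeline(_proof: parse::FinalizedParser) -> &'static str {
    "done"
}
```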
Mike Gerwitz	ed8a2ce28a	tamer: xir::parse::ele: Superstate not to accept early EOF This was accepting an early EOF when the active child `ParseState` was in an accepting state, because it was not ensuring that anything on the stack was also accepting. Ideally, there should be nothing on the stack, and hopefully in the future that's what happens. But with how things are today, it's important that, if anything is on the stack, it is accepting. Since `is_accepting` on the superstate is only called during finalization, and because the check terminates early, and because the stack practically speaking will only have a couple things on it max (unless we're in tail position in a deeply nested tree, without TCO [yet]), this shouldn't be an expensive check. Implementing this did require that we expose `Context` to `is_accepting`, which I had hoped to avoid having to do, but here we are. DEV-7145	2022-08-12 00:47:15 -04:00
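The fix described in the commit above amounts to requiring that both the active child and everything on the stack be accepting. A rough sketch, with hypothetical `ChildState`, `Context`, and `Superstate` types that are far simpler than the generated parser states in TAMER:

```rust
/// Hypothetical child parser state.
struct ChildState {
    accepting: bool,
}

/// Hypothetical parser context holding suspended states.
struct Context {
    stack: Vec<ChildState>,
}

struct Superstate {
    active: ChildState,
}

impl Superstate {
    /// Accepting only if the active child is accepting *and* so is every
    /// state still on the stack. `all` stops at the first non-accepting
    /// entry, so the check terminates early.
    fn is_accepting(&self, ctx: &Context) -> bool {
        self.active.accepting && ctx.stack.iter().all(|st| st.accepting)
    }
}
```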
Mike Gerwitz	e73c223a55	tamer: parser::Parser: cfg(test) tracing This produces useful parse traces that are output as part of a failing test case. The parser generator macros can be a bit confusing to deal with when things go wrong, so this helps to clarify matters. This is _not_ intended to be machine-readable, but it does show that it would be possible to generate machine-readable output to visualize the entire lowering pipeline. Perhaps something for the future. I left these inline in Parser::feed_tok because they help to elucidate what is going on, just by reading what the trace would output---that is, it helps to make the method more self-documenting, albeit a tad bit more verbose. But with that said, it should probably be extracted at some point; I don't want this to set a precedent where composition is feasible. Here's an example from test cases: [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is parsing attributes for `package`. | | Attrs_(SutAttrsState_ { ___ctx: (QName(None, LocalPart(NCName(SymbolId(46 "package")))), OpenSpan(Span { len: 0, offset: 0, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10)), ___done: false }) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is expecting opening tag `<classify>`. | | ChildA(Expecting_) | | Lookahead: Some(Lookahead(Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)))) = note: this trace was output as a debugging aid because `cfg(test)`. [Parser::feed_tok] (input IR: XIRF) | ==> Parser before tok is expecting opening tag `<classify>`. 
| | ChildA(Expecting_) | | ==> XIRF tok: `<unexpected>` | | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) | | ==> Parser after tok is attempting to recover by ignoring element with unexpected name `unexpected` (expected `classify`). | | ChildA(RecoverEleIgnore_(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))) | | Lookahead: None = note: this trace was output as a debugging aid because `cfg(test)`. DEV-7145	2022-07-19 14:44:18 -04:00
Mike Gerwitz	91b55999e2	tamer: asg::air::{AirState=>AirAggregate}: Rename Like the previous commit, this emphasizes what is happening. DEV-7145	2022-06-02 13:26:46 -04:00
Mike Gerwitz	b084e23497	tamer: Refactor asg_builder into obj::xmlo::lower and asg::air This finally uses `parse` all the way up to aggregation into the ASG, as can be seen by the mess in `poc`. This will be further simplified---I just need to get this committed so that I can mentally get it off my plate. I've been separating this commit into smaller commits, but there's a point where it's just not worth the effort anymore. I don't like making large changes such as this one. There is still work to do here. First, it's worth re-mentioning that `poc` means "proof-of-concept", and represents things that still need a proper home/abstraction. Secondly, `poc` is retrieving the context of two parsers---`LowerContext` and `Asg`. The latter is desirable, since it's the final aggregation point, but the former needs to be eliminated; in particular, packages need to be worked into the ASG so that `found` can be removed. Recursively loading `xmlo` files still happens in `poc`, but the compiler will need this as well. Once packages are on the ASG, along with their state, that responsibility can be generalized as well. That will then simplify lowering even further, to the point where hopefully everything has the same shape (once final aggregation has an abstraction), after which we can then create a final abstraction to concisely stitch everything together. Right now, Rust isn't able to infer `S` for `Lower<S, LS>`, which is unfortunate, but we'll be able to help it along with a more explicit abstraction. DEV-11864	2022-05-27 13:51:29 -04:00

79 Commits (1cf54887565cd46dccc97b98ee0b9696235e1a52)