employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Mike Gerwitz	0e0f3e658d	tamer: pipeline: Remove explicit error specification in pipeline definition It does not matter what the error of the source is as long as the caller is able to deal with it, especially given that the particular error is a property of the source, which is under control of the caller. DEV-13162	2023-06-05 13:44:49 -04:00
Mike Gerwitz	9c6b00a124	tamer: pipeline: Initial concept for declarative pipeline definition This has been the ultimate goal for the pipeline for some time---the ability to declaratively define the lowering pipeline in a way that is clear, concise, and is correct by definition. The reason that the lowering pipeline required so much boilerplate was because of the robust types involved, which ensure that everything in the pipeline is compatible with one another---it's not possible to construct a pipeline that will not work. Of course, there is nuance involved in some cases---I didn't want to include the `until` clause, which makes it fail the "obviously correct" criterion, but that can be improved over time. This only abstracts away `load_xmlo` and `parse_package_xml`; next I'll have to evolve the abstraction to support lifetimes for `lower_xmli`'s `AsgTreeToXirf`. That pipeline also ends with a custom sink that really ought to become its own parser, but I don't want to jump down that rabbit hole right now, so we may just support custom sinks for now with the intent of removing it in the future. This has been a long time coming. The ultimate goal is that you should be able to look at the parser pipelines to have a clear, high-level overview of how everything fits together. I'm not generating documentation yet, but that'll help serve as a guide as well. DEV-13162	2023-06-05 13:44:49 -04:00
Mike Gerwitz	f34f2644e9	tamer: pipeline: Allow reporting on entire Result The report acts as the sink for `load_xmlo` and `parse_package_xml`. At the moment, the type is `()`, and so there's nothing to report on but the error. But the idea is to add logging via `AirAggregate::Object`, which is currently just `()`. This change therefore is only a refactoring---it changes no functionality but sets up for future changes. This also introduces consistency with `lower_xmli` in use of `terminal` for the final operation. DEV-13162	2023-06-05 13:44:49 -04:00
Mike Gerwitz	b5187de5dc	tamer: pipeline::load_xmlo: Accept reporter This makes the API of `load_xmlo` much closer to `parse_package_xml`, both accepting a reporter and distinguishing between recoverable and unrecoverable errors. The linker still does not use a reporter and still fails on the first error, as before; I wanted to keep this change small. DEV-13162	2023-06-05 13:44:49 -04:00
Mike Gerwitz	ea6259570e	tamer: ld::poc: Extract xmlo loading pipeline into new pipeline module I want to clean this up a bit further. The motivation is that we need this for imports in `tamec`. Eventually this will be cleaned up to the point where it's declarative and easy to understand---there's a mess of types involved now and, when something goes wrong, it can be brutally confusing. DEV-13162	2023-05-25 16:38:41 -04:00
Mike Gerwitz	7857460c1d	tamer: Re-use prior AirAggregateCtx for subsequent parsers A new AirAggregate parser is utilized for each package import. This prevents us from moving the index from `Asg` onto `AirAggregateCtx` because the index would be dropped between each import. This allows re-using that context and solves for problems that result from attempting to do so, as explained in the new `resume_previous_parsing_context` test case. But, it's now clear that there's a missing abstraction, and that reasoning about this problem at the topmost level of the compiler/linker in terms of internal parsing details like "context" is not appropriate. What we're doing is suspending parsing and resuming it later on for another package, aggregating into the same destination (ASG + index). An abstraction ought to be formed in terms of that. DEV-13162	2023-05-19 13:38:15 -04:00
Mike Gerwitz	799f2c6d96	tamer: tameld: Produce first error ...this has apparently been consuming errors for some time. This would cause the parser to enter an invalid state in some cases and terminate. This would _not_ permit an invalid link, as the graph would not be correct, but it was masking the actual error. This part of the linker is in dire need of tests. This also ought to be replaced with tamec's approach of reporting all errors. DEV-13162	2023-05-04 16:04:52 -04:00
Mike Gerwitz	6db70385d0	tamer: xir::flat: Introduce configurable acceptors Technically, an "acceptor" in the context of state machines is actually a state machine; the terminology here is more describing the configuration of the state machine (`XirToXirf`) as an acceptor. This change comes with significant documentation of the rationale and why this is important; see that for more information. This change is necessary so that we can enforce finalization on all parsers in the lowering pipeline, which is not currently being done. If we were to do that now, then `tameld` would fail because it halts parsing of the token stream at the end of the `xmlo` header. This is also quite the type soup, but I'm not going to refine this further right now, since my focus is elsewhere (XMLI lowering). DEV-13708	2023-03-10 14:27:57 -05:00
Mike Gerwitz	29178f2360	tamer: xir::reader: Divorce from `parse` The reader previously yielded a `ParsedResult`, presumably to simplify lowering operations. But the reader is not a `ParseState`, and does not otherwise use the parsing API, so this was an inappropriate and confusing coupling. This resolves that, introducing a new `lowerable` which will translate an iterator into something that can be placed in a lowering pipeline. See the previous commit for more information. DEV-13708	2023-03-10 14:27:57 -05:00
Mike Gerwitz	963688f889	tamer: parse::lower::ParsedObject: Include Token type parameter The token type was previously hard-coded to `UnknownToken`, since the use case was the beginning of the lowering pipeline at the start of the program, where there was no token type because the first parser (`XirReader`, currently) is responsible for producing the first token type. But when we're lowering from the graph (so, the other side of the lowering pipeline), we _do_ have token types to deal with. This also emphasizes the inappropriate coupling of `<XirReader as Iterator>::Item` with `ParsedResult`; I'd like to follow the same approach that I'm about to introduce with `tamec`, so see a future commit. DEV-13708	2023-03-10 14:27:57 -05:00
Mike Gerwitz	055ff4a9d9	tamer: Remove graphml target This was originally created to populate Neo4J for querying, but it has not been utilized. It's become a maintenance burden as I try to change the API of and encapsulate the graph, which is important for upholding its invariants. This feature, or one like it, will return in the future. I have other related plans; we'll see if they materialize. The graph can't be encapsulated fully just yet because of the linker; those commits will come in the following days. DEV-13597	2023-01-26 14:45:17 -05:00
Mike Gerwitz	954b5a2795	Copyright year and name update Ryan Specialty Group (RSG) rebranded to Ryan Specialty after its IPO.	2023-01-20 23:37:30 -05:00
Mike Gerwitz	e6640c0019	tamer: Integrate clippy This invokes clippy as part of `make check` now, which I had previously avoided doing (I'll elaborate on that below). This commit represents the changes needed to resolve all the warnings presented by clippy. Many changes have been made where I find the lints to be useful and agreeable, but there are a number of lints, rationalized in `src/lib.rs`, where I found the lints to be disagreeable. I have provided rationale, primarily for those wondering why I desire to deviate from the default lints, though it does feel backward to rationalize why certain lints ought to be applied (the reverse should be true). With that said, this did catch some legitimate issues, and it was also helpful in getting some older code up-to-date with new language additions that perhaps I used in new code but hadn't gone back and updated old code for. My goal was to get clippy working without errors so that, in the future, when others get into TAMER and are still getting used to Rust, clippy is able to help guide them in the right direction. One of the reasons I went without clippy for so long (though I admittedly forgot I wasn't using it for a period of time) was because there were a number of suggestions that I found disagreeable, and I didn't take the time to go through them and determine what I wanted to follow. Furthermore, it was hard to make that judgment when I was new to the language and lacked the necessary experience to do so. One thing I would like to comment further on is the use of `format!` with `expect`, which is also what the diagnostic system convenience methods do (which clippy does not cover). Because of all the work I've done trying to understand Rust and looking at disassemblies and seeing what it optimizes, I falsely assumed that Rust would convert such things into conditionals in my otherwise-pure code...but apparently that's not the case, when `format!` is involved. I noticed that, after making the suggested fix with `get_ident`, Rust proceeded to then inline it into each call site and then apply further optimizations. It was also previously invoking the thread lock (for the interner) unconditionally and invoking the `Display` implementation. That is not at all what I intended for, despite knowing the eager semantics of function calls in Rust. Anyway, possibly more to come on that, I'm just tired of typing and need to move on. I'll be returning to investigate further diagnostic messages soon.	2023-01-20 23:37:29 -05:00
Mike Gerwitz	5e13c93a8f	tamer: asg: New ObjectContainer for Node type Working with the graph can be confusing with all of the layers involved. This begins to provide a better layer of abstraction that can encapsulate the concept and enforce invariants. Since I'm better able to enforce invariants now, this also removes the span from the diagnostic message, since the invariant is now always enforced with certainty. I'm not removing the runtime panic, though; we can revisit that if future profiling shows that it makes a negative impact. DEV-13160	2023-01-20 23:37:29 -05:00
Mike Gerwitz	8c4923274a	tamer: ld::xmle::lower: Diagnostic message for cycles This moves the special handling of circular dependencies out of `poc.rs`---and to be clear, everything needs to be moved out of there---and into the source of the error. The diagnostic system did not exist at the time. This is one example of how easy it will be to create robust diagnostics once we have the spans on the graph. Once the spans resolve to the proper source locations rather than the `xmlo` file, it'll Just Work. It is worth noting, though, that this detection and error will ultimately need to be moved so that it can occur when performing other operations on the graph during compilation, such as type inference and unification. I don't expect to go out of my way to detect cycles, though, since the linker will. DEV-13430	2022-12-16 15:09:05 -05:00
Mike Gerwitz	0b2e563cdb	tamer: asg: Associate spans with identifiers and introduce diagnostics This ASG implementation is a refactored form of original code from the proof-of-concept linker, which was well before the span and diagnostic implementations, and well before I knew for certain how I was going to solve that problem. This was quite the pain in the ass, but introduces spans to the AIR tokens and graph so that we always have useful diagnostic information. With that said, there are some important things to note: 1. Linker spans will originate from the `xmlo` files until we persist spans to those object files during `tamec`'s compilation. But it's better than nothing. 2. Some additional refactoring is still needed for consistency, e.g. use of `SPair`. 3. This is just a preliminary introduction. More refactoring will come as tamec is continued. DEV-13041	2022-12-16 14:44:38 -05:00
Mike Gerwitz	56d1ecf0a3	tamer: Air{Token=>} Consistency with `Nir` et al. DEV-13430	2022-12-13 14:36:38 -05:00
Mike Gerwitz	7c4c0ebdda	tamer: parse::lower: Separate error types for lowering and return Lowering errors in tamec end up utilizing recovery and reporting, so there is a distinction between recoverable and unrecoverable errors. tameld aborts on the first error, since recovery is not currently supported (we'll want to add it, since tameld should output e.g. lists of unresolved externs). Note that tamec does not yet handle `FinalizeError` like tameld because it uses `Lower::lower`, which does not yet finalize (though it does in practice when it reaches the end of the stream and auto-finalizes, but that is widened into a `ParseError`). DEV-13158	2022-10-26 12:44:20 -04:00
Mike Gerwitz	1c181fe546	tamer: parse::lower: Propagate widened errors to terminal parser The term "terminal parser" isn't formalized yet in the system, but is meant to refer to the innermost parser that is responsible for pulling tokens through the lowering pipeline. This approach is more of what one would expect when dealing with `Result`-like monads---we are effectively chaining the inner operation while propagating errors to short-circuit lowering and let the caller decide whether recovery ought to be permitted with diagnostic messages. This will become more clear as it is further refactored. This also means that the previous changes for introducing interior mutability for a shared mutable `Reporter` can be reverted, which is great, since that approach was antithetical to how the streaming pipeline operates (and introduces awkward mutable state into an otherwise-mostly-immutable system). DEV-13158	2022-10-26 12:32:51 -04:00
Mike Gerwitz	7a5f731cac	tamer: tameld: XIRF nesting 64=>4 Since we'll never be reading past the header, this is all that is needed. If in the future this is violated, XIRF will cause a nice diagnostic error displaying precisely what opening tag caused the increased level of nesting, which will aid in debugging and allow us to determine if it ought to be increased. Here's an example, if I set the max to `3`: error: maximum XML element nesting depth of `3` exceeded --> /home/.../foo.xmlo:261:10 \| 261 \| <preproc:sym-ref name=":_vproduct:vector_a"/> \| ^^^^^^^^^^^^^^^^ error: this opening tag increases the level of nesting past the limit of 3 Of course, the longer-term goal is to do away with `xmlo` entirely. This had no (perceivable via `/usr/bin/time -v`, at least) impact on memory or CPU time. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	41b41e02c1	tamer: Xirf::Text refinement This teaches XIRF to optionally refine Text into RefinedText, which determines whether the given SymbolId represents entirely whitespace. This is something I've been putting off for some time, but now that I'm parsing source language for NIR, it is necessary, in that we can only permit whitespace Text nodes in certain contexts. The idea is to capture the most common whitespace as preinterned symbols. Note that this heuristic ought to be determined from scanning a codebase, which I haven't done yet; this is just an initial list. The fallback is to look up the string associated with the SymbolId and perform a linear scan, aborting on the first non-whitespace character. This combination of checks should be sufficiently performant for now considering that this is only being run on source files, which really are not all that large. (They become large when template-expanded.) I'll optimize further if I notice it show up during profiling. This also frees XIR itself from being concerned by Whitespace. Initially I had used quick-xml's whitespace trimming, but it messed up my span calculations, and those were a pain in the ass to implement to begin with, since I had to resort to pointer arithmetic. I'd rather avoid tweaking it. tameld will not check for whitespace, since it's not important---xmlo files, if malformed, are the fault of the compiler; we can ignore text nodes except in the context of code fragments, where they are never whitespace (unless that's also a compiler bug). Onward and yonward. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	3da82b351e	tamer: xir::flat::{State=>XirToXirf}: Rename Like the previous two commits, this states the intent of this parser, which results in more clear pipeline composition. DEV-7145	2022-06-02 13:48:54 -04:00
Mike Gerwitz	91b55999e2	tamer: asg::air::{AirState=>AirAggregate}: Rename Like the previous commit, this emphasizes what is happening. DEV-7145	2022-06-02 13:26:46 -04:00
Mike Gerwitz	45bbf3879e	tamer: obj::xmlo::{lower=>air}: Rename {LowerState=>XmloToAir} This provides much more clarity as to what is going on. Further, it's less ambiguous, since I'm about to introduce a new type of xmlo lowering into XIR for writing the actual xmlo files. DEV-7145	2022-06-02 13:23:41 -04:00
Mike Gerwitz	8d92667388	tamer: Integrate xir::reader as a parser in the lowering pipeline This allows `XmlXirReader` to be used in a `Lower` operation, just as everything else, bringing me one step closer to a pipeline that can be concisely represented; this is finally beginning to unify in a clear way, though it is still a bit of a mess. This causes `XmlXirReader` to _act_ like a `parse::Parser` in that it yields a `ParsedResult`, but it does not use `parse::Parser` itself; that was the _original_ plan: convert it into a `ParseState` where `XmlXirReader` became a context, and force `Parser` to yield by feeding it a stream of tokens with `repeat`, but that ended up performing poorly relative to this change. I did some investigation, which I might write about in the future, but for now, this solution works just fine. DEV-7145	2022-06-02 10:30:44 -04:00
Mike Gerwitz	63aa452197	tamer: parse: Move parse::lower into Lower This also modifies `poc` such that `Lower` is invoked as an associated function rather than a method to emphasize the pattern that is forming, so that it can be later abstracted away. DEV-11864	2022-06-01 11:15:43 -04:00
Mike Gerwitz	f40f8bbafc	tamer: parse: Rename {lower__while_ok=>lower_} The `while_ok` can just be implied with a lowering operation, and that reduces the name complexity so that we can maybe introduce even more specialized methods without resulting in a huge sentence as a name. DEV-11864	2022-05-27 14:10:55 -04:00
Mike Gerwitz	b084e23497	tamer: Refactor asg_builder into obj::xmlo::lower and asg::air This finally uses `parse` all the way up to aggregation into the ASG, as can be seen by the mess in `poc`. This will be further simplified---I just need to get this committed so that I can mentally get it off my plate. I've been separating this commit into smaller commits, but there's a point where it's just not worth the effort anymore. I don't like making large changes such as this one. There is still work to do here. First, it's worth re-mentioning that `poc` means "proof-of-concept", and represents things that still need a proper home/abstraction. Secondly, `poc` is retrieving the context of two parsers---`LowerContext` and `Asg`. The latter is desirable, since it's the final aggregation point, but the former needs to be eliminated; in particular, packages need to be worked into the ASG so that `found` can be removed. Recursively loading `xmlo` files still happens in `poc`, but the compiler will need this as well. Once packages are on the ASG, along with their state, that responsibility can be generalized as well. That will then simplify lowering even further, to the point where hopefully everything has the same shape (once final aggregation has an abstraction), after which we can then create a final abstraction to concisely stitch everything together. Right now, Rust isn't able to infer `S` for `Lower<S, LS>`, which is unfortunate, but we'll be able to help it along with a more explicit abstraction. DEV-11864	2022-05-27 13:51:29 -04:00
Mike Gerwitz	f218c452b9	tamer: iter::trip: Flatten Result The `*_iter_while_ok` functions now compose like monads, flattening `Result` at each step and drastically simplifying handling of error types. This also removes the bunch of `?`s at the end of the expression, and allows me to use `?` within the callback itself. I had originally not used `Result` as the return type of the callback because I was not entirely sure how I was going to use them, but it's now clear that I _always_ use `Result` as the return type, and so there's no use in trying to be too accommodating; it can always change in the future. This is desirable not just for cleanup, but because trying to refactor `asg_builder` into a pair of `Parser`s is really messy to chain without flattening, especially given some state that has to leak temporarily to the caller. More on that in a future commit. DEV-11864	2022-05-20 16:08:16 -04:00
Mike Gerwitz	958a707e02	tamer: asg: Hoist Root from Ident into Object This was always the intent, but I didn't have a higher-level object yet. This removes all the awkwardness that existed with working the root in as an identifier. DEV-11864	2022-05-19 12:48:43 -04:00
Mike Gerwitz	6252758730	tamer: asg::Object: Introduce Object::Ident This wraps `Ident` in a new `Object` variant and modifies `Asg` so that its nodes are of type `Object`. This unfortunately requires runtime type checking. Whether or not that's worth alleviating in the future depends on a lot of different things, since it'll require my own graph implementation, and I have to focus on other things right now. Maybe it'll be worth it in the future. Note that this also gets rid of some doc examples that simply aren't worth maintaining as the API evolves. DEV-11864	2022-05-19 12:33:59 -04:00
Mike Gerwitz	3e277270a7	tamer: asg: Track roots on graph Previously, since the graph contained only identifiers, discovered roots were stored in a separate vector and exposed to the caller. This not only leaked details, but added complexity; this was left over from the refactoring of the proof-of-concept linker some time ago. This moves the root management into the ASG itself, mostly, with one item being left over for now in the asg_builder (eligibility classifications). There are two roots that were added automatically: - __yield - __worksheet The former has been removed and is now expected to be explicitly mapped in the return map, which is now enforced with an extern in `core/base`. This is still special, in the sense that it is explicitly referenced by the generated code, but there's nothing inherently special about it and I'll continue to generalize it into oblivion in the future, such that the final yield is just a convention. `__worksheet` is the only symbol of type `IdentKind::Worksheet`, and so that was generalized just as the meta and map entries were. The goal in the future will be to have this more under the control of the source language, and to consolidate individual roots under packages, so that the _actual_ roots are few. As far as the actual ASG goes: this introduces a single root node that is used as the sole reference for reachability analysis and topological sorting. The edges of that root node replace the vector that was removed. DEV-11864	2022-05-17 10:42:05 -04:00
Mike Gerwitz	3748762d31	tamer: asg::graph::Asg: Remove type parameter O This removes the generic on the Asg (which was formerly BaseAsg), hard-coding `IdentObject`, which will further evolve. This makes the IR an actual concrete IR rather than an abstract data structure. These tests bring me back a bit, since they were written as I was still becoming familiar with Rust. DEV-11864	2022-05-12 15:46:17 -04:00
Mike Gerwitz	f2c5443176	tamer: asg: Remove generic Asg, rename {Base=>}Asg This is the beginning of an incremental refactoring to remove generics, to simplify the ASG. When I initially wrote the linker, I wasn't sure what direction I was going in, but I was also negatively influenced by more traditional approaches to both design and unit testing. If we're going to call the ASG an IR, then it needs to be one---if the core of the IR is generic, then it's more like an abstract data structure than anything. We can abstract around the IR to slice it up into components that are a little easier to reason about and understand how responsibilities are segregated. DEV-11864	2022-05-11 16:47:13 -04:00
Mike Gerwitz	1ad2fb1dc8	Copyright year update 2022 RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no "Group"), but I'm not sure if the legal name has been changed yet or not, so I'll wait on that.	2022-05-03 14:14:29 -04:00
Mike Gerwitz	eaa8133d21	tamer: diagnose: Introduction of diagnostic system This is a working concept that will continue to evolve. I wanted to start with some basic output before getting too carried away, since there's a lot of potential here. This is heavily influenced by Rust's helpful diagnostic messages, but will take some time to realize a lot of the things that Rust does. The next step will be to resolve line and column numbers, and then possibly include snippets and underline spans, placing the labels alongside them. I need to balance this work with everything else I have going on. This is a large commit, but it converts the existing Error Display impls into Diagnostic. This separation is a bit verbose, so I'll see how this ends up evolving. Diagnostics are tied to Error at the moment, but I imagine in the future that any object would be able to describe itself, error or not, which would be useful in the future both for the Summary Page and for query functionality, to help developers understand the systems they are writing using TAME. Output is integrated into tameld only in this commit; I'll add tamec next. Examples of what this outputs are available in the test cases in this commit. DEV-10935	2022-04-13 15:22:46 -04:00
Mike Gerwitz	cfc7f45bc4	tamer: Remove wip-xmlo-xir-reader This entirely removes the old XmloReader that has since been replaced with a XIR-based reader. I had been holding off on this because the new reader is slower, pending performance optimizations (which I'll do a little later on), however the performance loss is of no practical consideration and only affects the linker, which is still fast. Therefore, it's better to get this old code out of the way to simplify refactoring going forward. In particular, I'm working on the diagnostic system. This is a little sad, in a way---this is some of my first Rust code that I'm deleting. DEV-10935	2022-04-11 16:11:49 -04:00
Mike Gerwitz	f07c0e75be	tamer: tameld (TameldError): Error sum type This aggregates all non-panic errors that can occur during link time, making `Box<dyn Error>` unnecessary. I've been wanting to do this for a long time, so it's nice seeing this come together. This is a powerful tool, in that we know, at compile time, all errors that can occur, and properly report on them and compose them. This method of error composition ensures that all errors have a chance to be handled within their context, though it'll take time to do so in a decent way. This just maintains compatibility with the dynamic dispatch that was previously occurring. This work is being done to introduce the initial diagnostic system, which was really difficult/confusing to do without proper error types at the top level, considering the toplevel is responsible for triggering the diagnostic reporting. The cycle error is in particular going to be interesting once the system is in place, especially once it provides spans in the future, since it will guide the user through the code to understand how the cycle formed. More to come. DEV-10935	2022-04-11 15:15:04 -04:00
Mike Gerwitz	a1a4ad3e8e	tamer: Introduce context into XirReader tamec and tameld will now both introduce a `Context` to XIR, which will use it to create spans. Here's an example of an error, now that it's all working well together: $ target/release/tameld --emit xmle -o /dev/null path/to/package.xmlo error: invalid preproc:sym/@dim `9` at [/../path/to/package.xmlo offset 1175451-1175452] A future task will make this human-readable by producing line and column numbers, and perhaps even a snippet (if not now, then eventually). It's exciting to see this coming together finally. DEV-10934	2022-04-08 16:16:23 -04:00
Mike Gerwitz	2e3d94c3d6	tamer: obj::xmlo::reader: Simplify wip-xmlo-xir-reader flagging This removes the flag from most of the code, which also resolves the indentation. Not only was it bothering me, but I don't want (a) every line modified when the module body is hoisted and (b) `rustfmt` to reformat everything when that happens. This means that everything will be built, even though it's not used, when the flag is off, but I see that as a good thing. DEV-10863	2022-03-24 09:45:59 -04:00
Mike Gerwitz	fbf786086a	tamer: parse::Parser (lower_while_ok): New method This introduces a WIP lowering operation, abstracting away quite a bit of the manual wiring work, which is really important to providing an API that provides the proper level of abstraction for actually understanding what the system is doing. This does not yet have tests associated with it---I had started, but it's a lot of work and boilerplate for something that is going to evolve. Generally, I wouldn't use that as an excuse, but the robust type definitions in play, combined with the tiny amount of actual logic, provide a pretty high level of confidence. It's very difficult to wire these types together and produce something incorrect without doing something obviously bad. Similarly, I'm holding off on proper docs too, though I did write some information here. More to come, after I actually get to work on the XmloReader. On a side note: I'm happy to have made progress on this, since this wiring is something I've been dreading and wondering about since before the Parser abstraction even existed. Note also that this makes parser::feed_toks private again---I don't intend to support push parsers yet, since they're only needed internally. Maybe for error recovery, but I'll wait to decide until it's actually needed. DEV-10863	2022-03-23 14:31:16 -04:00
Mike Gerwitz	b4a7591357	tamer: obj::xmlo::reader: Begin conversion to ParseState This begins to transition XmloReader into a ParseState. Unlike previous changes where ParseStates were composed into a single ParseState, this is instead a lowering operation that will take the output of one Parser and provide it to another. The mess in ld::poc (...which still needs to be refactored and removed) shows the concept, which will be abstracted away. This won't actually get to the ASG in order to test that this works with the wip-xmlo-xir-reader flag on (development hasn't gotten that far yet), but since it type-checks, it should conceptually work. Wiring lowering operations together is something that I've been dreading for months, but my approach of only abstracting after-the-fact has helped to guide a sane approach for this. For some definition of "sane". It's also worth noting that AsgBuilder will too become a ParseState implemented as another lowering operation, so: XIR -> XIRF -> XMLO -> ASG These steps will all be streaming, with iteration happening only at the topmost level. For this reason, it's important that ASG not be responsible for doing that pull, and further we should propagate Parsed::Incomplete rather than filtering it out and looping an indeterminate number of times outside of the toplevel. One final note: the choice of 64 for the maximum depth is entirely arbitrary and should be more than generous; it'll be finalized at some point in the future once I actually evaluate what maximum depth is reasonable based on how the system is used, with some added growing room. DEV-10863	2022-03-22 14:06:52 -04:00
Mike Gerwitz	4c5b860195	tamer: Remove Ix generic from ASG This is simply not worth it; the size is not going to be the bottleneck (at least any time soon) and the generic not only pollutes all the things that will use ASG in the near future, but is also incompatible with the SymbolId default that is used everywhere; if we have to force it to 32 bits anyway, then we may as well just default it right off the bat. I thought that this seemed like a good idea at the time, and saving bits is certainly tempting, but it was premature.	2022-01-14 10:21:49 -05:00
Mike Gerwitz	d710437ee4	tamer: xir::escape::CachingEscaper: New Escaper As promised, this will cache previously seen escaped/unescaped values by creating a two-way mapping between them. DEV-11081	2021-11-15 16:44:24 -05:00
Mike Gerwitz	27ba03b59b	tamer: xir::escape: Remove XirString in favor of Escaper This rewrites a good portion of the previous commit. Rather than explicitly storing whether a given string has been escaped, we can instead assume that all SymbolIds leaving or entering XIR are unescaped, because there is no reason for any other part of the system to deal with such details of XML documents. Given that, we need only unescape on read and escape on write. This is customary, so why didn't I do that to begin with? The previous commit outlines the reason, mainly being an optimization for the echo writer that is upcoming. However, this solution will end up being better---it's not implemented yet, but we can have a caching layer, such that the Escaper records a mapping between escaped and unescaped SymbolIds to avoid work the next time around. If we share the Escaper between _all_ readers and the writer, the result is that 1. Duplicate strings between source files and object files (many of which are read by both the linker and compiler) avoid re-unescaping; and 2. Writers can use this cache to avoid re-escaping when we've already seen the escaped variant of the string during read. The alternative would be a global cache, like the internment system, but I did not find that to be appropriate here, since this is far less fundamental and is much easier to compose. DEV-11081	2021-11-12 14:03:23 -05:00
Mike Gerwitz	428d508be4	tamer: {ir::=>}{asg, xir} See the previous commit. There is no sense in some common "IR" namespace, since those IRs should live close to whatever system whose data they represent. In the case of these, they are general IRs that can apply to many different parts of the system. If that proves to be a false statement, they'll be moved. DEV-10863	2021-11-04 16:13:27 -04:00
Mike Gerwitz	18ab032ba0	tamer: Begin XIR-based xmlo reader impl There isn't a whole lot here, but there is additional work needed in various places to support upcoming changes and so I want to get this committed to ease the cognitive burden of what I have thus far. And to stop stashing. We have a feature flag for a reason. DEV-10863	2021-10-28 21:21:30 -04:00
Mike Gerwitz	739cf7e6eb	tamer: ir::asg::object::IdentObject: Define methods from IdentObjectData In particular, `name` needn't return an `Option`. `fragment` also returns a copy, since it's just a `SymbolId`. (It really ought to be a newtype rather than an alias, but we'll worry about that some other time.) These changes allow us to remove some runtime panics. DEV-10859	2021-10-14 14:38:02 -04:00
Mike Gerwitz	f055cb77c2	tamer: ld::xmle: Narrow Sections types This moves the logic that sorts identifiers into sections into Sections itself, and introduces XmleSections to allow for mocking for testing. This then allows us to narrow the types significantly, eliminating some runtime checks. The types can be narrowed further, but I'll be limiting the work I'll be doing now; this'll be inevitably addressed as we use the ASG for the compiler. This also handles moving Sections tests, which was a TODO from the previous commit. DEV-10859	2021-10-14 12:40:13 -04:00
Mike Gerwitz	08d92ca663	tamer: ld::xmle::sections: Remove generic object type xmle sections will only ever contain an object of one type, so there is no use in making this generic. I think the original plan was to have this represent, generically, sections of some object file (like ELF), but doing so would require a significant redesign anyway, so it makes no sense. This is easier to reason about. DEV-10859	2021-10-12 10:35:14 -04:00
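
The two-way mapping described in commits d710437ee4 and 27ba03b59b above (unescape on read, escape on write, with a shared cache so neither direction repeats work) can be illustrated with a minimal sketch. This is not TAMER's implementation: the struct layout, the `misses` counter, and the helpers `escape_xml` and `unescape_xml` are hypothetical, plain `String`s stand in for `SymbolId`s, and only four XML entities are handled. It also assumes well-formed input, since the cache records whichever escaped form was read first.

```rust
use std::collections::HashMap;

/// Hypothetical caching escaper; `String`s stand in for TAMER's `SymbolId`s.
#[derive(Default)]
pub struct CachingEscaper {
    /// unescaped -> escaped
    to_escaped: HashMap<String, String>,
    /// escaped -> unescaped
    to_unescaped: HashMap<String, String>,
    /// Number of cache misses, i.e. how often real (un)escaping was done.
    pub misses: usize,
}

/// Illustrative escaper covering only `&`, `<`, `>` and `"`.
fn escape_xml(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}

/// Illustrative inverse of `escape_xml`.
fn unescape_xml(s: &str) -> String {
    // `&amp;` is replaced last so that `&amp;lt;` yields `&lt;`, not `<`.
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&amp;", "&")
}

impl CachingEscaper {
    /// Escape on write, reusing any pairing seen earlier in either direction.
    pub fn escape(&mut self, raw: &str) -> String {
        if let Some(hit) = self.to_escaped.get(raw) {
            return hit.clone();
        }
        self.misses += 1;
        let escaped = escape_xml(raw);
        self.record(raw, &escaped);
        escaped
    }

    /// Unescape on read, reusing any pairing seen earlier in either direction.
    pub fn unescape(&mut self, escaped: &str) -> String {
        if let Some(hit) = self.to_unescaped.get(escaped) {
            return hit.clone();
        }
        self.misses += 1;
        let raw = unescape_xml(escaped);
        self.record(&raw, escaped);
        raw
    }

    /// Store the pairing in both maps so the opposite operation is a lookup.
    fn record(&mut self, raw: &str, escaped: &str) {
        self.to_escaped.insert(raw.to_string(), escaped.to_string());
        self.to_unescaped.insert(escaped.to_string(), raw.to_string());
    }
}
```

In this sketch, a string unescaped by a reader is later escaped by a writer without any new work, provided both share the same `CachingEscaper` value. That mirrors the benefit the commit message attributes to sharing one Escaper between all readers and the writer.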


116 Commits (6769f0c280227105bdae5c22f7ce26bb16fe6b7b)