employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Mike Gerwitz	c71f3247b1	tamer: Remove int_log feature flag (stabilized in 1.68-nightly) This also bumps the minimum nightly version.	2022-12-16 14:44:39 -05:00
Mike Gerwitz	7d86fdd97d	tamer: Make RUSTFLAGS explicit in the cargo invocation Previously this just exported the variable into the environment, but I'm not comfortable with the lack of visibility that provides; I want to be able to see not only that it's happening, which will help to debug issues, but also when it's _not_ happening so that I know that it needs to be introduced into a configuration at a particular installation site.	2022-12-16 14:44:39 -05:00
Mike Gerwitz	0b2e563cdb	tamer: asg: Associate spans with identifiers and introduce diagnostics This ASG implementation is a refactored form of original code from the proof-of-concept linker, which was well before the span and diagnostic implementations, and well before I knew for certain how I was going to solve that problem. This was quite the pain in the ass, but introduces spans to the AIR tokens and graph so that we always have useful diagnostic information. With that said, there are some important things to note: 1. Linker spans will originate from the `xmlo` files until we persist spans to those object files during `tamec`'s compilation. But it's better than nothing. 2. Some additional refactoring is still needed for consistency, e.g. use of `SPair`. 3. This is just a preliminary introduction. More refactoring will come as tamec is continued. DEV-13041	2022-12-16 14:44:38 -05:00
Mike Gerwitz	3cc40f387b	tamer: RUSTFLAGS support Primarily intended for `-C target-cpu=native`.	2022-12-14 19:56:57 -05:00
Mike Gerwitz	92afc19cf8	tamer: asg::ident::test: Extract into own file DEV-13430	2022-12-13 23:29:30 -05:00
Mike Gerwitz	56d1ecf0a3	tamer: Air{Token=>} Consistency with `Nir` et al. DEV-13430	2022-12-13 14:36:38 -05:00
Mike Gerwitz	be41d056bb	tamer: nir::air: Lower to Air::TODO This actually passes data to the next parser, whereas before we were stopping short. DEV-13160	2022-12-13 14:28:16 -05:00
Mike Gerwitz	d55b3add77	tamer: asg::air::test: Extract into own file Just minor preparatory work. DEV-13160	2022-12-13 13:57:04 -05:00
Mike Gerwitz	daeaade53c	tamer: tamec: Expose ASG context in lowering pipeline The previous commit had the ASG implicitly constructed and then discarded. This will keep it around, which will be necessary not only for imports, but for passing the ASG off to the next phases of lowering. DEV-13429	2022-12-13 13:46:31 -05:00
Mike Gerwitz	aa1ca06a0e	tamer: tamec: Introduce NIR->AIR->ASG lowering This does not yet yield the produced ASG, but does set up the lowering pipeline to prepare to produce it. It's also currently a no-op, with `NirToAsg` just yielding `Incomplete`. The goal is to begin to move toward vertical slices for TAMER as I start to return to the previous approach of a handoff with the old compiler. Now that I've gained clarity from my previous failed approach (which I documented in previous commits), I feel that this is the best way forward that will allow me to incrementally introduce more fine-grained performance improvements, at the cost of some throwaway work as this progresses. But the cost of delay with these build times is far greater. DEV-13429	2022-12-13 13:37:07 -05:00
Mike Gerwitz	f0aa8c7554	tamer: nir::parse: Remove enum prefix from variants This just makes things less verbose. Doing so in its own commit before I start making real changes. DEV-13159	2022-12-07 12:50:21 -05:00
Mike Gerwitz	cf2139a8ef	tamer: nir::interp: Errors and recovery This finalizes the implementation for interpolation. There is some more cleanup that can be done, but it is now functioning as intended and providing errors. Finally. How deeply exhausting all of this has been. DEV-13156	2022-12-07 10:54:21 -05:00
Mike Gerwitz	2f963fafb2	tamer: nir::interp::test: Remove significant duplication This just cleans up these tests a bit before I add to them. What we're left with follows the structure of most other parser tests and is atm a good balance between boilerplate and clarity in isolation (a fair level of abstraction). Could possibly do better by putting the inner objects in a callback so that the `Close` can be asserted on commonly as well, but that's a bit awkward with how the assertion is based on the collection; we'd have to keep the last item from being collected from the iterator. I'd rather not deal with such restructuring right now and figuring out a decent pattern. Perhaps in the future. DEV-13156	2022-12-06 12:04:48 -05:00
Mike Gerwitz	8d2d273932	tamer: nir::interp: Integrate NIR interpolation into lowering pipeline This is the culmination of all the recent work---the third attempt at trying to integrate this. It ended up much cleaner than what was originally going to be done, but only after gutting portions of the system and changing my approach to how NIR is parsed (WRT attributes). See prior commits for more information. The final step is to fill the error branches with actual errors rather than `todo!`s. What a relief. DEV-13156	2022-12-05 16:32:00 -05:00
Mike Gerwitz	3050566062	tamer: nir::interp: Expand into new NIR tokens This begins to introduce the new, simplified NIR by creating tokens that serve as the expansion for interpolation. Admittedly, `Text` may change, as it doesn't really represent `<text>foo</text>`, and I'd rather that node change as well, though I'll probably want to maintain some sort of BC. DEV-13156	2022-12-02 00:15:31 -05:00
Mike Gerwitz	07dff3ba4e	tamer: xir::parse::ele: Remove attr sum state This removes quite a bit of work, and work that was difficult to reason about. While I'm disappointed that that hard work is lost (aside from digging it up in the commit history), I am happy that it was able to be removed, because the extra complexity and cognitive burden was significant. This removes more `memcpy`s than the sum state could have hoped to, since aggregation is no longer necessary. Given that, there is a slight performance improvement. The re-introduction of required and duplicate checks later on should be more efficient than this was, and so this should be a net win overall in the end. DEV-13346	2022-12-01 11:09:26 -05:00
Mike Gerwitz	f872181f64	tamer: xir::parse: Remove old `attr_parse!` and unused error variants This cleans up the old implementation now that it's no longer used (as of the previous commit) by `ele_parse!`. It also removes the two error variants that no longer apply: required attributes and duplicate attributes. DEV-13346	2022-12-01 11:09:26 -05:00
Mike Gerwitz	ab0e4151a1	tamer: xir::parse::ele::ele_parse!: Integrate `attr_parse_stream!` This handles the bulk of the integration of the new `attr_parse_stream!` as a replacement for `attr_parse!`, which moves from aggregate attribute objects to a stream of attribute-derived tokens. Rationale for this change is in the preceding commit messages. The first striking change here is how it affects the test cases: nearly all `Incomplete`s are removed. Note that the parser has an existing optimization whereby `Incomplete` with lookahead causes immediate recursion within `Parser`, since those situations are used only for control flow and to keep recursion out of `ParseState`s. Next: this removes types from `nir::parse`'s grammar for attributes. The types will instead be derived from NIR tokens later in the lowering pipeline. This simplifies NIR considerably, since adding types into the mix at this point was taking an already really complex lowering phase and making it ever more difficult to reason about and get everything working together the way that I needed. Because of `attr_parse_stream!`, there are no more required attribute checks. Those will be handled later in the lowering pipeline, if they're actually needed in context, with possibly one exception: namespace declarations. Those are really part of the document and they ought to be handled _earlier_ in the pipeline; I'll do that at some point. It's not required for compilation; it's just required to maintain compliance with the XML spec. We also lose checks for duplicate attributes. This is also something that ought to be handled at the document level, and so earlier in the pipeline, since XML cares, not us---if we get a duplicate attribute that results in an extra NIR token, then the next parser will error out, since it has to check for those things anyway. A bunch of cleanup and simplification is still needed; I want to get the initial integration committed first. 
It's a shame I'm getting rid of so much work, but this is the right approach, and results in a much simpler system. DEV-13346	2022-12-01 11:09:26 -05:00
Mike Gerwitz	1983e73c81	tamer: xir::parse::attrstream: Value from SPair This really does need documentation. With that said, this changes things up a bit: the value is now derived from an `SPair` rather than an `Attr`, given that the name is redundant. We do not need the attribute name span, since the philosophy is that we're stripping the document and it should no longer be important beyond the current context. It does call into question errors, but my intent in the future is to be able to have the lowering pipeline augment errors with its current state---since we're streaming, then an error that is encountered during lowering of an element will still have the element parser in the state representing the parsing of that element; so that information does not need to be propagated down the pipeline, but can be augmented as it bubbles back up. More on that at some point in the future; not right now. DEV-13346	2022-12-01 11:09:25 -05:00
Mike Gerwitz	9ad7742ad2	tamer: xir::parse::attrstream: Streaming attribute parser As I talked about in the previous commit, this is going to be the replacement for the aggregate `attr_parse!`; the next commit will integrate it into `ele_parse!` so that I can begin to remove the old one. It is disappointing, since I did put a bit of work into this and I think the end result was pretty neat, even if it was never fully utilized. But, this simplifies things significantly; no use in maintaining features that serve no purpose but to confound people. DEV-13346	2022-12-01 11:09:25 -05:00
Mike Gerwitz	6d39474127	tamer: NIR re-simplification Alright, this has been a rather tortured experience. The previous commit began to state what is going on. This is reversing a lot of prior work, with the benefit of hindsight. Little bit of history, for the people who will probably never read this, but who knows: As noted at the top of NIR, I've long wanted a very simple set of general primitives where all desugaring is done by the template system---TAME is a metalanguage after all. Therefore, I never intended on having any explicit desugaring operations. But I didn't have time to augment the template system to support parsing on attribute strings (nor am I sure if I want to do such a thing), so it became clear that interpolation would be a pass in the compiler. Which led me to the idea of a desugaring pass. That in turn spiraled into representing the status of whether NIR was desugared, and separating primitives, etc, which led to a lot of additional complexity. The idea was to have a Sugared and Plain NIR, and further within them have symbols that have latent types---if they require interpolation, then those types would be deferred until after template expansion. The obvious problem there is that now: 1. NIR has the complexity of various types; and 2. Types were tightly coupled with NIR and how it was defined in terms of XML destructuring. The first attempt at this didn't go well: it was clear that the symbol types would make mapping from Sugared to Plain NIR very complicated. Further, since NIR had any number of symbols per Sugared NIR token, interpolation was a pain in the ass. So that led to the idea of interpolating at the _attribute_ level. That seemed to be going well at first, until I realized that the token stream of the attribute parser does not match that of the element parser, and so that general solution fell apart. It wouldn't have been great anyway, since then interpolation was _also_ coupled to the destructuring of the document. 
Another goal of mine has been to decouple TAME from XML. Not because I want to move away from XML (if I did, I'd want S-expressions, not YAML, but I don't think the team would go for that). This decoupling would allow the use of a subset of the syntax of TAME in other places, like CSVMs and YAML test cases, for example, if appropriate. This approach makes sense: the grammar of TAME isn't XML, it's _embedded within_ XML. The XML layer has to be stripped to expose it. And so that's what NIR is now evolving into---the stripped, bare representation of TAME's language. That also has other benefits too down the line, like a REPL where you can use any number of syntaxes. I intend for NIR to be stack-based, which I'd find to be intuitive for manipulating and querying packages, but it could have any number of grammars, including Prolog-like for expressing Horn clauses and querying with a Prolog/Datalog-like syntax. But that's for the future... The next issue is that of attribute types. If we have a better language for NIR, then the types can be associated with the NIR tokens, rather than having to associate each symbol with raw type data, which doesn't make a whole lot of sense. That also allows for AIR to better infer types and determine what they ought to be, and further makes checking types after template application natural, since it's not part of NIR at all. It also means the template system can naturally apply to any sources. Now, if we take that final step further, and make attributes streaming instead of aggregating, we're back to a streaming pipeline where all aggregation takes place on the ASG (which also resolves the memcpy concerns worked around previously, also further simplifying `ele_parse` again, though it sucks that I wasted that time). 
And, without the symbol types getting in the way, since now NIR has types more fundamentally associated with tokens, we're able to interpolate on a token stream using simple SPairs, like I always hoped (and reverted back to in the previous commit). Oh, and what about that desugaring pass? There's the issue of how to represent such a thing in the type system---ideally we'd know statically that desugaring always lowers into a more primitive NIR that reduces the mapping that needs to be done to AIR. But that adds complexity, as mentioned above. The alternative is to just use the template system, as I originally wanted to, and resolve shortcomings by augmenting the template system to be able to handle it. That not only keeps NIR and the compiler much simpler, but exposes more powerful tools to developers via TAME's metalanguage, if such a thing is appropriate. Anyway, this creates a system that's far more intuitive, and far simpler. It does kick the can to AIR, but that's okay, since it's also better positioned to deal with it. Everything I wrote above is a thought dump and has not been proof-read, so good luck! And let's hope this finally works out...it's actually feeling good this time. The journey was necessary to discover and justify what came out of it---everything I'm stripping away was like a cocoon, and within it is a more beautiful and more elegant TAME. DEV-13346	2022-12-01 11:09:25 -05:00
Mike Gerwitz	76beb117f9	Revert "tamer: nir::desugar::interp: Include attribute name in derived param name" Also: Revert "tamer: nir::desugar::interp: Token {SPair=>Attr}" This reverts commit 7fd60d6cdafaedc19642a3f10dfddfa7c7ae8f53. This reverts commit 12a008c66414c3d628097e503a98c80687e3c088. This has been quite a tortured experience, trying to figure out how to best fit desugaring into the existing system. The truth is that it ultimately failed because I was not sticking with my intuition---I was trying to get things out quickly by compromising on the design, and in the end, it saved me nothing. But I wouldn't say that it was a waste of time---the path was a dead end, but it was full of experiences. More to come, but interpolation is back to operating on NIR directly, and I chose to treat it as a source-to-source mapping and not represent it using the type system---interpolation can be an optional feature when writing TAME frontends (the principal one being the XML-based one), and it's up to later checks to assert that identifiers match a given domain. I am disappointed by the additional context we lose here, but that can always be introduced in the future differently, e.g. by maintaining a dictionary of additional context for spans that can be later referenced for diagnostic purposes. But let's worry about that in the future; it doesn't make sense to further complicate IRs for such a thing. DEV-13346	2022-12-01 11:09:25 -05:00
Mike Gerwitz	9da6cb439f	tamer: nir::desugar::interp: Include attribute name in derived param name This is simply to aid with debugging. See commit for information on why I didn't include the attribute name in the param name itself. DEV-13156	2022-12-01 11:09:25 -05:00
Mike Gerwitz	6a8befb98c	tamer: convert::Expect{From,Into}: Diagnostic panics Converts to use TAME's diagnostic panics, same as previous commits. Also introduces impl for `Result`, which I apparently hadn't needed yet. In the future, I hope trait impl specializations will be available to automatically derive and expose span information in these diagnostic messages for certain types. DEV-13156	2022-12-01 11:09:25 -05:00
Mike Gerwitz	d0a728c27f	tamer: nir::desugar::interp: Token {SPair=>Attr} This changes the input token from a more generic `SPair` to `Attr`, which reflects the new target integration point in the `attr_parse!` parser-generator. This is a compromise---I'd like for it to remain generic and have stitching deal with all integration concerns, but I have spent far too much time on this and need to keep moving. With that said, we do benefit from knowing where this must fit in---it's easier to reason about in a more concrete way, and we can take advantage of the extra information rather than being burdened by its presence and ignoring it. We need to be able to convert back into `XirfToken` (see a recent commit that discusses that) for `StitchExpansion`, which is why `Attr` is here. And since it is, we can use it to explain to the user not just the interpolation specification used to derive params, but also the attribute it is associated with. This is what TAME (in XSLT) does today, IIRC (I wrote it, I just forget exactly). It also means that I can name the parameters after the attribute. So, that'll be in a following commit; I was disappointed when my prior approach with `SPair` didn't give me enough information to be able to do that, since I think it's important that the system be as descriptive as possible in how it derives information. Of course, traces would reveal how the parser came about the derivation, but that requires recompilation in a special tracing mode. DEV-13156	2022-12-01 11:09:25 -05:00
Mike Gerwitz	99dcba690f	tamer: parse: SP::Token: From<Self::Token> Of course I would run into integration issues. My foresight is lacking. The purpose of this is to allow for type narrowing before passing data to a more specialized ParseState, so that the other ParseState doesn't need to concern itself with the entire domain of inputs that it doesn't need, and repeat unnecessary narrowing. For example, consider XIRF: it has an `Attr` variant, which holds an `Attr` object. We'll want to desugar that object. It does not make sense to require that the desugaring process accept `XirfToken` when we've already narrowed it to an `Attr`---we should accept an Attr. However, we run into a problem immediately: what happens with tokens that bubble back up due to lookahead or errors? Those tokens need to be converted _back_ (widened). Fortunately, widening is a much easier process than narrowing---we can simply use `From`, as we do today in so many other places. So, this still keeps the onus of narrowing on the caller, but for now that seems most appropriate. I suspect Rust would optimize away duplicate checks, but that still leaves the maintenance concern---the two narrowings could get out of sync, and that's not acceptable. Unfortunately, this is just one of the problems with integration... DEV-13156	2022-12-01 11:09:14 -05:00
Mike Gerwitz	1aca0945df	tamer: parse::util::expand::StitchExpansion: Began transition from ParseState to method My initial plan with expansion was to wrap a `ParseState` in another that unwraps `Expansion` and converts into a `Dead` state, so that existing `TransitionResult` stitching methods (`delegate`, specifically) could be used. But the desire to use that existing method was primarily because stitching was a complex operation that was abstracted away _as part of the `delegate` method_, which made writing new ones verbose and difficult. Thus began the previous commits to begin to move that responsibility elsewhere so that it could be more composable. This continues with that, introducing a new trait that will culminate in the removal of a wrapping `ParseState` in favor of a stitching method. The old `StitchableExpansionState` is still used for tests, which demonstrates that the boilerplate problem still exists despite improvements made here. These will become more generalized in the future as I have time (and the functional aspects of the code more formalized too, now that they're taking shape). The benefit of this is that we avoid having to warp our abstractions in ways that don't make sense (use of a dead state transition) just to satisfy existing APIs. It also means that we do not need the boilerplate of a `ParseState` any time we want to introduce this type of stitching/delegation. It also means that those methods can eventually be extracted into more general traits in the future as well. Ultimately, though, the two would have accomplished the same thing. But the difference is most emphasized in the _parent_---the actual stitching still has to take place for desugaring in the attribute parser, and I'd like for that abstraction to still be in terms of expansion. 
But if I utilized `StitchableExpansionState`, which converted into a dead state, I'd have to either forego the expansion abstraction---which would make the parser even more confusing---or I'd have to create _another_ abstraction around the dead state, which would mean that I stripped one abstraction just to introduce another one that's essentially the same thing. It didn't feel right, but it would have worked. The use of `PhantomData` in `StitchableExpansionState` was also a sign that something wasn't quite right, in terms of how the abstractions were integrating with one-another. And so here we are, as I struggle to wade my way through all of the yak shavings and make any meaningful progress on this project, while others continue to suffer due to slow build times. I'm sorry. Even if the system is improving. DEV-13156	2022-11-17 15:12:25 -05:00
Mike Gerwitz	1ce36225f6	tamer: diagnose::panic::DiagnosticOptionPanic: New panic This is just intended to simplify the job of panicking when something is expected to be `None`. In my case, `Lookahead`; see upcoming commits. This is intended to be generalized to more than just `Option`, but I have no use for it elsewhere yet; I primarily just needed to implement a method on `Option` so that I could have the ergonomics of the dot notation. DEV-13156	2022-11-17 14:36:00 -05:00
Mike Gerwitz	42618c5add	tamer: parse: Abstract lookahead token replacement panic There's no use in duplicating this in util::expand. Lookahead tokens are one of the few invariants that I haven't taken the time to enforce using the type system, because it'd be quite a bit of work that I do not have time for, and may not be worth it with changes that may make the system less ergonomic. Nonetheless, I do hope to address it at some point in the (possibly-far) future. If ever you encounter this diagnostic message, ask yourself how stable TAMER otherwise is and how many other issues like this have been entirely prevented through compile-time proofs using the type system. DEV-13156	2022-11-16 15:25:52 -05:00
Mike Gerwitz	a377261de3	tamer: parse::state::transition::TransitionResult::with_lookahead: {=>diagnostic_}panic! As in previous commits, this continues to replace panics with `diagnostic_panic!`, which provides much more useful information both for debugging and to help the user possibly work around the problem. And lets the user know that it's not their fault, and it's a TAMER bug that should be reported. ...am I going to rationalize it in each commit message? DEV-13156	2022-11-16 14:20:58 -05:00
Mike Gerwitz	8cb4eb5b81	tamer: parse::util::expand::StitchableExpansionState: Utilize bimap This is just a light refactoring to utilize the new `TransitionResult::bimap` method. DEV-13156	2022-11-16 14:09:14 -05:00
Mike Gerwitz	60ce1305cc	tamer: parse::state: Further generalize ParseState::delegate This moves enough of the handling of complex type conversions into the various components of `TransitionResult` (and itself), which simplifies delegation and opens up the possibility of having specialized delegation/stitching methods implemented atop of `TransitionResult`. DEV-13156	2022-11-16 14:09:11 -05:00
Mike Gerwitz	a17e53258b	tamer: parse::state: Begin to tame delegation methods These delegation methods have been a pain in my ass for quite some time, and their lack of generalization makes the introduction of new delegation methods (in the general sense, not necessarily trait methods) very tedious and prone to inconsistencies. I'm going to progressively refactor them in separate commits so it's clear what I'm doing, primarily for future me to reference if need be. DEV-13156	2022-11-16 10:38:58 -05:00
Mike Gerwitz	fc425ff1d5	tamer: parse::state: EchoState and TransitionResult constituent primitives This begins to introduce more primitive operations to `TransitionResult` and its components so that I can actually work with them without having to write a bunch of concrete, boilerplate implementations. This is demonstrated in part by `EchoState` (which is nearly all boilerplate, but whose correctness should be verifiable at a glance), which will be used going forward as a basis for default implementations for parsers (e.g. expansion delegation). DEV-13156	2022-11-16 10:37:10 -05:00
Mike Gerwitz	55c55cabd3	tamer: parse::util::expand: Move expansion into own module This has evolved into a more robust and independent concept, but it is still a utility in the sense that it's utilizing existing parsing framework features and making them more convenient. DEV-13156	2022-11-15 13:28:54 -05:00
Mike Gerwitz	ddb4f24ea5	tamer: parse::util (ExpandableParseState, ExpandableInto): Clarifying traits These traits serve to abstract away some of the type-level details and clearly state what the end result is (something stitchable with a parent). I'm admittedly battling myself on this concept a bit. The proper layer of abstraction is the concept of expansion, which is an abstraction that is likely to be maintained all the way through, but we strip the abstraction for the sake of delegation. Maybe the better option is to provide a different method of delegation and avoid the stripping at all, and avoid the awkward interaction with the dead state. The awkwardness comes from the fact that delegating right now is so rigid and defined in terms of a method on state rather than a mapping between `TransitionResult`s. But I really need to move on... ;_; The original design was trying to generalize this such that composition at the attribute parser level (for NIR) would be able to just accept any stitchable parser with the convention that the dead state is the replacement token. But that is the wrong layer of abstraction, which not only makes it confusing, but is asking for trouble when someone inevitably violates that contract. With all of that said, `StitchableExpansionState` _is_ a delegation. It could just as easily be a function (`is_accepting` always delegates too), so perhaps that should just be generalized as reifying delegation as a `ParseState`. DEV-13156	2022-11-15 12:56:25 -05:00
Mike Gerwitz	03cf652c41	tamer: parse::util: Introduce StitchableExpansionState This parser really just allows me to continue developing the NIR interpolation system using `Expansion` terminology, and avoid having to use dead states in tests. This allows for the appropriate level of abstraction to be used in isolation, and then only be stripped when stitching is necessary. Future commits will show how this is actually integrated and may introduce additional abstraction to help. DEV-13156	2022-11-15 12:19:25 -05:00
Mike Gerwitz	4117efc50c	tamer: nir::desugar::interp: Generalize without NIR symbol types This is a shift in approach. My original idea was to try to keep NIR parsing the way it was, since it's already hard enough to reason about with the `ele_parse!` parser-generator macro mess. The idea was to produce an IR that would explicitly be denoted as "maybe sugared", and have a desugaring operation as part of the lowering pipeline that would perform interpolation and lower the symbol into a plain version. The problem with that is: 1. The use of the type was going to introduce a lot of mapping for all the NIR token variants there are going to be; and 2. _The types weren't even utilized for interpolation._ Instead, if we interpolated _as attributes are encountered_ while parsing NIR, then we'd be able to expand directly into that NIR token stream and handle _all_ symbols in a generic way, without any mapping beyond the definition of NIR's grammar using `ele_parse!`. This is a step in that direction---it removes `NirSymbolTy` and introduces a generic abstraction for the concept of expansion, which will be utilized soon by the attribute parser to allow replacing `TryFrom` with something akin to `ParseFrom`, or something like that, which is able to produce a token stream before finally yielding the value of the attribute (which will be either the original symbol or the replacement metavariable, in the case of interpolation). (Note that interpolation isn't yet finished---errors still need to be implemented. But I want a working vertical slice first.) DEV-13156	2022-11-10 12:33:30 -05:00
Mike Gerwitz	8a430a52bc	tamer: xir::parse: Extract intermediate attribute aggregate state into Context This was a substantial change. Design and rationale are documented on `AttrFieldSum` and related as part of this change, so please review the diff for more information there. If you're a Ryan employee, DEV-13209 gives plenty of profiling information, including raw data and visualizations from kcachegrind. For everyone else: you're able to easily produce your own from this commit and the previous by comparing the `__memcpy_avx_unaligned_erms` calls. The reduction is significant in this commit (~90%), and the number of Parsers invoking it has been reduced. Rust has been able to optimize more aggressively, and compound some of those optimizations, with the smaller `NirParseState` width. It is also worth noting that `malloc` calls do not change at all between these two changes, so when we refer to memory, we're referring to pre-allocated memory on the stack, as TAMER was designed to utilize. DEV-13209	2022-11-09 16:01:09 -05:00
Mike Gerwitz	6ae6ca716c	tamer: diagnose::panic::diagnostic_unreachable!: New macro This is a diagnostic replacement for `unreachable!`. Eventually TAMER'll have build-time checks to enforce the use of these over alternatives; I need to survey the old instances on a case-by-case basis to see what diagnostic information can be reasonably presented in that context. DEV-13209	2022-11-09 10:47:17 -05:00
Mike Gerwitz	5c5041f90e	tamer: nir::desugar::interp: Proper span offsets The spans were previously not being calculated relative to the offset of the original symbol span. Tests were passing because all of those spans began at offset 0. DEV-13156	2022-11-08 00:55:45 -05:00
Mike Gerwitz	6b9979da9a	tamer: nir::desugar::interp: Valid parses This completes the valid parses, though some more refactoring will be done. Next up is error handling and recovery. DEV-13156	2022-11-07 23:59:47 -05:00
Mike Gerwitz	4a7fe887d5	tamer: nir::desugar: Initial interpolation desugaring This demonstrates how desugaring of interpolated strings will work, testing one of the happy paths. The remaining work to be done is largely refactoring; handling some other cases; and errors. Each of those items is marked with `todo!`s. I'm pleased with how this is turning out, and I'm excited to see diagnostic reporting within the specification string using the derived spans once I get a bit further along; this robust system is going to be much more helpful to developers than the existing system in XSLT. This also eliminates the ~50% performance degradation mentioned in a recent commit by eliminating the SugaredNirSymbol enum and replacing it with a newtype; this is a much better approach, though it doesn't change that I do need to eventually address the excessive `memcpy`s on hot code paths. DEV-13156	2022-11-07 14:15:16 -05:00
Mike Gerwitz	66f09fa4c9	tamer: parse::prelude: New module Not sure why I didn't add a prelude sooner, considering all the import boilerplate. This will evolve as needed and I'll go back and replace other imports when I'm not in the middle of something. DEV-13156	2022-11-02 14:56:26 -04:00
Mike Gerwitz	9922910d09	tamer: nir::NirSymbolTy (Display): Add impl Add initial descriptions and consolidate some of the types. There'll be more to come; this is just to get `Display` derives working for types that'll be using it. I'd like to see where this description manifests itself before I decide how user-friendly I'd like it to be. DEV-13156	2022-11-01 16:23:51 -04:00
Mike Gerwitz	5e2d8f13a7	tamer: nir (SugaredNir): Mirror PlainNir This mirror is only a `Todo` variant at the moment, but my hope had been to try to creatively nest or use generics to simplify the conversion between the two flavors without a lot of boilerplate. But it doesn't seem like I'm going to be successful, and may have to resort to macros to remove boilerplate. But I need to stop fighting with myself and move on. Though I would still like to keep the types purely compile-time via const generics if possible, since they're not needed in memory (or disk) until we get to templates; they're otherwise static relative to a NIR token variant. DEV-13209	2022-11-01 15:22:42 -04:00
Mike Gerwitz	7f71f3f09f	tamer: nir: Detect interpolated values This simply detects whether a value will need to be further parsed for interpolation; it does not yet perform the parsing itself, which will happen during desugaring. This introduces a performance regression, for an interesting reason. I found that introducing a single new variant to `SugaredNir` (with a `(SymbolId, Span)` pair) was causing the width of the `NirParseState` type to increase just enough to cause Rust to be unable to optimize away a significant number of memcpys related to `Parser` moves, consequently reducing performance by nearly 50% for `tamec`. Yikes. I suspected this would be a problem, and indeed have tried in all other cases to avoid aggregation until the ASG---the problem is that I had wanted to aggregate attributes for NIR so that the IR could actually make some progress toward simplifying the stream (and therefore working with the data), and be able to validate against a grammar defined in a single place. The problem is that the `NirParseState` type contains a sum type for every attribute parser, and is therefore as wide as the largest one. That is what Rust is having trouble optimizing memcpy away for. Indeed, reducing the number of attributes improves the situation drastically. However, it doesn't make it go away entirely. If you look at a callgrind profile for `tameld` (or a disassembly), you'll notice that I put quite a bit of effort into ensuring that the hot code path for the lowering pipeline contains _no_ memcpys for the parsers. But that is not the case with `tamec`---I had to move on. But I do still have the same escape hatch that I introduced for `tameld`, which is the mutable `Context`. It seems that may be the solution there too, but I want to get a bit further along first to see how these data end up propagating before I go through that somewhat significant effort. DEV-13156	2022-11-01 15:15:40 -04:00
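The width issue described in the commit above follows from how Rust lays out sum types: an enum is at least as wide as its largest variant, so one wide variant widens every value of the type, and each move of such a value may copy that many bytes. A minimal sketch, using hypothetical types that are not TAMER's:

```rust
use std::mem::size_of;

// Hypothetical stand-ins for parser states; not TAMER's actual types.
#[allow(dead_code)]
enum Narrow {
    A(u32),
    B(u32),
}

#[allow(dead_code)]
enum Wide {
    A(u32),
    B(u32),
    // A single large variant determines the width of the whole enum,
    // even for values that only ever hold `A` or `B`.
    C([u64; 16]),
}

fn main() {
    println!("Narrow: {} bytes", size_of::<Narrow>());
    println!("Wide:   {} bytes", size_of::<Wide>());
}
```

Comparing `size_of` before and after adding a variant is a cheap first check before reaching for a callgrind profile.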
Mike Gerwitz	37d44e42ad	tamer: sym::symbol: Use {=>diagnostic_}panic! for resolution failure Various parts of the system have to be converted to use `diagnostic_panic!`, which makes it very clear that this is a bug in TAMER that should be reported. I just happened to see this one near code I was about to touch. DEV-13156	2022-11-01 12:42:36 -04:00
Mike Gerwitz	2a70525275	tamer: sym::prefill::quick_contains_byte: New function This will be utilized by NIR to avoid having to perform memory lookups for preinterned static symbols. DEV-13156	2022-11-01 12:42:32 -04:00
Mike Gerwitz	d195eedacb	tamer: nir: Sugared and plain flavors This introduces the concept of sugared NIR and provides the boilerplate for a desugaring pass. The earlier commits dealing with cleaning up the lowering pipeline were to support this work, in particular to ensure that reporting and recovery properly applied to this lowering operation without adding a ton more boilerplate. DEV-13158	2022-10-26 14:19:19 -04:00
Mike Gerwitz	dbe834b48a	tamer: tamec: Remove lowering pipeline refactoring comment I'm struggling to go much further yet without sorting out some other things first with regards to mutable `Context` and, in particular, the ASG. I'm going to pause on refactoring the lowering pipeline---it's been improved significantly with the recent work---and I will continue in the next few weeks. DEV-13158	2022-10-26 12:44:20 -04:00
Mike Gerwitz	7c4c0ebdda	tamer: parse::lower: Separate error types for lowering and return Lowering errors in tamec end up utilizing recovery and reporting, so there is a distinction between recoverable and unrecoverable errors. tameld aborts on the first error, since recovery is not currently supported (we'll want to add it, since tameld should output e.g. lists of unresolved externs). Note that tamec does not yet handle `FinalizeError` like tameld because it uses `Lower::lower`, which does not yet finalize (though it does in practice when it reaches the end of the stream and auto-finalizes, but that is widened into a `ParseError`). DEV-13158	2022-10-26 12:44:20 -04:00
Mike Gerwitz	26aaf6efc1	tamer: parse::error::ParseError: Extract some variants into FinalizeError This helps to clarify the situations under which these errors can occur, and the generality also helps to show why the inner types are as they are (e.g. use of `String`). But more importantly, this allows for an error type in `finalize` that is detached from the `ParseState`, which will be able to be utilized in the lowering pipeline as a more general error distinguishable from other lowering errors. At the moment I'm maintaining BC, but a following commit will demonstrate the use case to introduce recoverable vs. non-recoverable errors. DEV-13158	2022-10-26 12:44:19 -04:00
Mike Gerwitz	2087672c47	tamer: parse::parser::finalize: Introduce FinalizedParser This newtype allows a caller to prove (using types) that a parser of a given type (`ParseState`) has been finalized. This will be used by the lowering pipeline to ensure that all parsers in the pipeline end up getting finalized (as you can see from a TODO added in the code, one of them is missing). The lack of such a type was an oversight during the (rather stressed) development of the parsing system, and I shouldn't need to resort to unit tests to verify that parsers have been finalized. DEV-13158	2022-10-26 12:44:19 -04:00
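The idea of proving finalization through types, as described in the commit above, can be sketched with a newtype whose only constructor is `finalize`. Apart from the name `FinalizedParser`, everything here (the `Parser` struct, its `pending` field, `FinalizeError`'s shape, `finish_pipeline`) is a hypothetical simplification, not TAMER's API.

```rust
mod parser {
    /// Hypothetical parser that tracks how many tokens remain unprocessed.
    #[derive(Debug)]
    pub struct Parser {
        pending: usize,
    }

    #[derive(Debug, PartialEq)]
    pub struct FinalizeError {
        pub pending: usize,
    }

    /// The field is private to this module, so the only way to obtain a
    /// `FinalizedParser` from outside is a successful `Parser::finalize`.
    #[derive(Debug)]
    #[allow(dead_code)]
    pub struct FinalizedParser(Parser);

    impl Parser {
        pub fn new(pending: usize) -> Self {
            Self { pending }
        }

        /// Consumes the parser, yielding proof of finalization on success.
        pub fn finalize(self) -> Result<FinalizedParser, FinalizeError> {
            if self.pending == 0 {
                Ok(FinalizedParser(self))
            } else {
                Err(FinalizeError { pending: self.pending })
            }
        }
    }
}

use parser::{FinalizedParser, Parser};

/// Callers must present the proof; an unfinalized parser will not type-check.
fn finish_pipeline(_proof: FinalizedParser) -> &'static str {
    "pipeline complete"
}

fn main() {
    let proof = Parser::new(0).finalize().expect("nothing pending");
    println!("{}", finish_pipeline(proof));
    assert!(Parser::new(2).finalize().is_err());
}
```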
Mike Gerwitz	7e62276907	tamer: Revert "tamer: diagnose::report::Report: {Mutable=>immutable} self reference" This reverts commit 85ec626fcd804eb2fac3fd6f0339182554f72cfd. This revert had to be modified to work alongside other changes. Interior mutability is fortunately no longer needed after the previous commit which allows reporting to occur in a single place in the lowering pipeline (at the terminal parser). DEV-13158	2022-10-26 12:44:18 -04:00
Mike Gerwitz	1c181fe546	tamer: parse::lower: Propagate widened errors to terminal parser The term "terminal parser" isn't formalized yet in the system, but is meant to refer to the innermost parser that is responsible for pulling tokens through the lowering pipeline. This approach is more of what one would expect when dealing with `Result`-like monads---we are effectively chaining the inner operation while propagating errors to short-circuit lowering and let the caller decide whether recovery ought to be permitted with diagnostic messages. This will become more clear as it is further refactored. This also means that the previous changes for introducing interior mutability for a shared mutable `Reporter` can be reverted, which is great, since that approach was antithetical to how the streaming pipeline operates (and introduces awkward mutable state into an otherwise-mostly-immutable system). DEV-13158	2022-10-26 12:32:51 -04:00
Mike Gerwitz	2ccdaf80fe	tamer: diagnose::report: Error tracking This extracts error tracking into the Reporter itself, which is already shared between lowering operations. This can then be used to display the number of errors. A new formatter (in tamer::fmt) will be added to handle the singular/plural conversion in place of "error(s)" in the future; I have more important things to work on right now. DEV-13158	2022-10-26 12:32:51 -04:00
Mike Gerwitz	f049da4496	tamer: tamec: Apply reporting (and continuing) to XirToXirf failure Previously these errors would immediately abort. This results in some duplicate code, but it's beginning to derive a common implementation. Check out the commits that follow; this is really an intermediate refactoring state. DEV-13158	2022-10-26 12:32:51 -04:00
Mike Gerwitz	733f44a616	tamer: diagnose::report::Report: {Mutable=>immutable} self reference VisualReporter now uses interior mutability so that we can hold multiple references to it for upcoming lowering pipeline changes. DEV-13158	2022-10-26 12:32:51 -04:00
Mike Gerwitz	a6e72b87f7	tamer: tamec: Extract compilation from main Another baby step. The small commits are intended to allow comprehension of what changes when looking at the diffs. This also removes a comment stating that errors do not fail compilation, since they most certainly do. DEV-13158	2022-10-26 12:32:51 -04:00
Mike Gerwitz	20ea83af1a	tamer: tamec: Extract source reading and writing This begins refactoring the lowering pipeline to begin to make abstraction boundaries obvious. The lowering pipeline is the backbone of the system, and so it needs to become clear and self-documenting, which will take a little bit of work. DEV-13158	2022-10-26 12:32:51 -04:00
Mike Gerwitz	8c32967cbf	tamer: Cargo.toml: Sort dependencies This always annoys me when I add a dependency and I don't know where I ought to put it. Anyway, I was originally going to add the `regex` crate, but with further planning, I may not end up having use for it. Nonetheless, at least this is consistent.	2022-10-18 14:48:14 -04:00
Brandon Ellis	00f46b0032	[DEV-12990] Add gt, gte, lt, lte operators to if/unless This includes updating Tamer's parser to account for the new operator possibilities.	2022-09-22 11:38:06 -04:00
Mike Gerwitz	80d7de7376	tamer: nir: Remove token `todo!`s Just preparing to actually define NIR itself. The _grammar_ has been represented (derived from our internal systems, using them as a test case), but the IR itself has not yet received a definition. DEV-7145	2022-09-19 16:21:42 -04:00
Mike Gerwitz	3456bd593a	tamer: tamec: Fail with non-zero status if any NIR parsing errors This is a quick-and-dirty change. The lowering pipeline needs a proper abstraction, but I'm about to be on vacation at the end of the week and would like to get NIR->AIR lowering started before I consider that abstraction further, so this will do for now. NIR parsing has been tested in production without failing for over a week. DEV-7145	2022-09-19 10:11:47 -04:00
Mike Gerwitz	5403dd06c6	tamer: Provide links to `tame{c,ld}` DEV-7145	2022-09-19 10:04:40 -04:00
Mike Gerwitz	9966b82b9d	tamer: nir::parse: Grammar summary docs This is intended to provide just enough information to help elucidate how the system works and why. DEV-7145	2022-09-19 09:26:38 -04:00
Mike Gerwitz	dcb42b6e4b	tamer: xir::parse: Improvements to generated docs for NIR attributes This hides the internal state machine and provides better language for what remains. DEV-7145	2022-09-16 13:37:46 -04:00
Mike Gerwitz	1dc691160b	tamer: nir: Re-define "NIR" This was originally the "normalized" IR, but that's not possible to do without template expansion, which is going to happen at a later point. So, this is just "NIR", pronounced "near", which is an IR that is "near" to the source code. You can define it as "Near IR" if you want, but it's just a homonym with a not-quite-defined acronym to me. DEV-7145	2022-09-16 09:59:38 -04:00
Mike Gerwitz	f9bdcc2775	tamer: xir::parse::ele: Remove `*Error_` types A type alias was added for BC before errors were hoisted out in a previous commit, but they are unnecessary because of the associated type on `ParseState`. This also corrects the long-existing issue of using generated identifiers in tests. DEV-7145	2022-09-15 16:10:47 -04:00
Mike Gerwitz	071c94790f	tamer: xir::ele::parse: Formatting: remove a level of indentation This moves `paste::paste!` up a line and reduces a level of indentation, since it's so squished. Aside from docblock reformatting, there are no other changes. DEV-7145	2022-09-15 16:09:49 -04:00
Mike Gerwitz	b3f4378517	tamer: xir::parse::ele: Hoist NT Display from `ele_parse!` macro This slims out the macro even further. It does result in an awkwardly-placed `PhantomData` because I don't want to add another variant that isn't actually used (since they represent states). DEV-7145	2022-09-14 16:34:59 -04:00
Mike Gerwitz	80f29e9420	tamer: xir::parse::ele: Hoist NtState out of `ele_parse!` macro This does the same as before with SumNtState, and takes advantage of the preparations made by the preceding commit. The macro is shrinking. DEV-7145	2022-09-14 15:35:58 -04:00
Mike Gerwitz	1817659811	tamer: xir::parse::ele: Abstract child NT states in parent parser This is in preparation for hoisting out the common states, as was done with the Sum NT in a previous commit. I also think that organizing states in this way is more clear. The previous embedding of the variants named after the NTs themselves was because the parser was storing the child state within it, before the introduction of the superstate trampoline. DEV-7145	2022-09-14 14:47:54 -04:00
Mike Gerwitz	d73a18d1a2	tamer: xir::parse::ele: Initial extraction of Sum NT state from macro After introducing the superstate and trampoline some time ago, the Sum NT states became fully generalized and can be hoisted out. DEV-7145	2022-09-14 12:23:52 -04:00
Mike Gerwitz	db3fd3f177	tamer: xir::parse::ele: Remove `unreachable!` in state transitions This will instead fail at compile time. DEV-7145	2022-09-14 10:00:10 -04:00
Mike Gerwitz	a5c7067c68	tamer: xir::parse::ele: Remove NT `todo!` for state transition Everything except for one state was already accounted for. We can now have confidence that the parser will never panic due to state transitions (beyond legitimate error conditions). There are some `unreachable!`s to contend with still. DEV-7145	2022-09-14 09:41:53 -04:00
Mike Gerwitz	212ca06efe	tamer: xir::parse: Extract and generalize NT errors This is the same as the previous commits, but for non-sum NTs. This also extracts errors into a separate module, which I had hoped to do in a separate commit, but it's not worth separating them. My _original_ reason for doing so was debugging (I'll get into that below), but I had wanted to trim down `ele.rs` anyway, since that mess is large and a lot to grok. My debugging was trying to figure out why Rust was failing to derive `PartialEq` on `NtError` because of `AttrParseError`. As it turns out, `AttrParseError::InvalidValue` was failing, thus the introduction of the `PartialEq` trait bound on `AttrParseState::ValueError`. Figuring this out required implementing `PartialEq` myself without `derive` (well, using LSP, which did all the work for me). I'm not sure why this was not failing previously, which is a bit of a concern, though perhaps in the context of the macro-expanded code, Rust was able to properly resolve the types. DEV-7145	2022-09-14 09:28:31 -04:00
Mike Gerwitz	5078bd8bda	tamer: xir::parse::ele: Extract sum NT error from `ele_parse!` The `ele_parse!` macro is a monstrosity, and expands into many different identifiers. The hope is that chipping away at things like this will not only make the template easier to understand by framing portions of the problem in terms of more traditional Rust code, but will also hopefully reduce compile times by reducing the amount of code that is expanded by the macro. DEV-7145	2022-09-13 09:20:29 -04:00
Mike Gerwitz	419b24f251	tamer: Introduce NIR (accepting only) This introduces NIR, but only as an accepting grammar; it doesn't yet emit the NIR IR, beyond TODOs. This modifies `tamec` to, while copying XIR, also attempt to lower NIR to produce parser errors, if any. It does not yet fail compilation, as I just want to be cautious and observe that everything's working properly for a little while as people use it, before I potentially break builds. This is the culmination of months of supporting effort. The NIR grammar is derived from our existing TAME sources internally, which I use for now as a test case until I introduce test cases directly into TAMER later on (I'd do it now, if I hadn't spent so much time on this; I'll start introducing tests as I begin emitting NIR tokens). This is capable of fully parsing our largest system with >900 packages, as well as `core`. `tamec`'s lowering is a mess; that'll be cleaned up in future commits. The same can be said about `tameld`. NIR's grammar has some initial documentation, but this will improve over time as well. The generated docs still need some improvement, too, especially with generated identifiers; I just want to get this out here for testing. DEV-7145	2022-08-29 15:52:04 -04:00
Mike Gerwitz	c420ab2730	tamer: xir::parse: Correct doc xrefs These weren't causing problems until they were output as part of NIR (in a separate module). NIR is about to be committed. DEV-7145	2022-08-29 15:52:04 -04:00
Mike Gerwitz	638a9c483b	tamer: xir::parse::ele: Hide internal NT enum variants The user never sees or interacts with these; they're macro-generated, and distract from the useful information in the generated docs. DEV-7145	2022-08-29 15:52:04 -04:00
Mike Gerwitz	2b33a45985	tamer: xir::parse::ele: Support NT docs This just modifies the macro to proxy attributes to generated NTs so that they can be documented. DEV-7145	2022-08-29 15:52:04 -04:00
Mike Gerwitz	51728545f7	tamer: xir::parse::ele: Properly handle previous state transitions This includes when on the last state / expecting a close. Previously, there were a couple major issues: 1. After parsing an NT, we can't allow preemption because we must emit a dead state so that we can remove the NT from the stack, otherwise they'll never close (until the parent does) and that results in unbounded stack growth for a lot of siblings. Therefore, we cannot preempt on `Text`, which causes the NT to receive it, emit a dead state, transition away from the NT, and not accept another NT of the same type after `Text`. 2. When encountering an unknown element, the error message stated that a closing tag was expected rather than one of the elements accepted by the final NT. For #1, this was solved by allowing the parent to transition back to the NT if it would have been matched by the previous NT. A future change may therefore allow us to remove repetition handling entirely and allow the parent to deal with it (maybe). For #2, the trouble is with the parser generator macro---we don't have a good way of knowing the last NT, and the last NT may not even exist if none was provided. This solution is a compromise, after having tried and failed at many others; I desperately need to move on, and this results in the correct behavior and doesn't sacrifice performance. But it can be done better in the future. It's also worth noting for #2 that the behavior isn't _entirely_ desirable, but in practice it is mostly correct. Specifically, if we encounter an unknown token, we're going to blow through all NTs until the last one, which will be forced to handle it. After that, we cannot return to a previous NT, and so we've forfeited the ability to parse anything that came before it. NIR's grammar is such that sequences are rare and, if present, there's really only ever two NTs, and so this awkward behavior will rarely cause practical issues.
With that said, it ought to be improved in the future, but let's wait to see if other parts of the lowering pipeline provide more appropriate places to handle some of these things (even though it really ought to be handled at the grammar level). But I'm well out of time to spend on this. I have to move on. DEV-7145	2022-08-29 15:52:04 -04:00
Mike Gerwitz	7466ecbe8b	tamer: xir::parse::ele: Accept missing child `ele_parse!` was recently converted to accept zero-or-more for every NT to simplify the parser-generator, since NIR isn't going to be able to accurately determine whether child requirements are met anyway (because of the template system). This ensures that `Close` can be accepted when we're expecting an element. It also adds a test for a scenario that's causing me some trouble in stashed code so that I can ensure that it doesn't break. DEV-7145	2022-08-22 09:43:59 -04:00
Mike Gerwitz	9366c0c154	tamer: xir::parse::ele: Increase parser nesting depth This sets the maximum depth to 64, which is still arbitrary, but unfortunately the sum types introduce multiple levels of nesting, in particular for template applications, so nested applications can result in a fairly large stack. I have various ideas to improve upon that---limited a bit in that repetition as it is currently implemented inhibits tail calls---but they're not worth doing just yet relative to other priorities. The impact of this change is not significant. DEV-7145	2022-08-18 16:16:45 -04:00
Mike Gerwitz	abb2c80e22	tamer: xir::parse::ele: Always repeat This removes support for configurable repetition. What? Why? As it turns out, the complexity that repetition adds is quite significant and is not worth the effort. The truth is that NIR is going to have to allow zero-or-more matches on virtually everything _anyway_ because template application is allowed virtually anywhere---it is not possible to fully statically analyze TAME's sources because templates can expand into just about anything. Given that, AIR (or something down the line) is going to have to supply the necessary invariants instead. It does suck, though, that this removes a lot of code that I fairly recently wrote, and spent a decent amount of time on. But it's important to know when to cut your losses. Perhaps I could have planned better, but deriving this whole system has been quite the experiment. DEV-7145	2022-08-18 15:19:40 -04:00
Mike Gerwitz	13d3c76a31	tamer: xir::parse::ele: Test to verify close after child recovery Just want to be sure that we emit a closing object to match the emitted opening one after recovery, otherwise the IR becomes unbalanced. DEV-7145	2022-08-18 12:41:27 -04:00
Mike Gerwitz	955131217b	tamer: xir::parse::ele: Attribute dead state recovery If attributes fail to parse (e.g. missing required attribute) and parsing reaches a dead state, this will recover by ignoring the entire element. It previously panicked with a TODO. DEV-7145	2022-08-18 12:41:26 -04:00
Mike Gerwitz	77fd92bbb2	tamer: xir::parse::ele: Remove `_` suffix from error variants These were initially used to prevent conflicts with generated variants, but we are no longer generating such variants since they're being jumped to via the trampoline. DEV-7145	2022-08-17 14:58:54 -04:00
Mike Gerwitz	b31ebc00a7	tamer: xir::parse::ele: Handle Close when expecting Open I'm starting to clean up some TODOs, and this was a glaring one causing panics when encountered. The recovery for this is simple, because we have no choice: just stop parsing; leave it to the next lowering operation(s) to complain that we didn't provide what was necessary. They'll have to, anyway, since templates mean that NIR cannot ever have enough information to guarantee that a document is well-formed, relative to what would expand from the template. DEV-7145	2022-08-17 14:49:34 -04:00
Mike Gerwitz	4c86c5b63c	tamer: xir::parse::ele: Support nested Sum NTs This allows for a construction like this: ``` ele_parse! { [...] StmtX := QN_X { [...] }; StmtY := QN_Y { [...] }; ExprA := QN_A { [...] }; ExprB := QN_B { [...] }; Expr := (A | B); Stmt := (StmtX | StmtY); // This previously was not allowed: StmtOrExpr := (Stmt | Expr); } ``` There were initially two barriers to doing so: 1. Efficiently matching; and 2. Outputting diagnostic information about the union of all expected elements. The first was previously resolved with the introduction of `NT::matches`, which is macro-expanded in a way that Rust will be able to optimize a bit. Worst case, it's effectively a linear search, but our Sum NTs are not that deep in practice, so I don't expect that to be a concern. The concern that I was trying to avoid was heap-allocated `NodeMatcher`s to avoid recursive data structures, since that would have put heap access in a very hot code path, which is not an option. That left problem #2, which ended up being the harder problem. The solution was detailed in the previous commit, so you should look there, but it amounts to being able to format individual entries as if they were a part of a list by making them a function of not just the matcher itself, but also the number of items in (recursively) the sum type and the position of the matcher relative to that list. The list length is easily computed (recursively) at compile-time using `const` functions (`NT::matches_n`). And with that, NIR can be abstracted in sane ways using Sum NTs without a bunch of duplication that would have been a maintenance burden and an inevitable source of bugs (from having to duplicate NT references). DEV-7145	2022-08-17 10:44:53 -04:00
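The compile-time list length mentioned in the commit above can be sketched with `const` functions that recurse through the sum types. This is a hand-written approximation: in TAMER these would be generated by `ele_parse!`, and the unit structs and inherent `matches_n` functions below are hypothetical stand-ins for the generated NTs.

```rust
// Leaf NTs each match exactly one element name.
#[allow(dead_code)]
struct StmtX;
#[allow(dead_code)]
struct StmtY;
#[allow(dead_code)]
struct ExprA;
#[allow(dead_code)]
struct ExprB;

impl StmtX { const fn matches_n() -> usize { 1 } }
impl StmtY { const fn matches_n() -> usize { 1 } }
impl ExprA { const fn matches_n() -> usize { 1 } }
impl ExprB { const fn matches_n() -> usize { 1 } }

// Sum NTs add up the counts of their constituents, recursively.
#[allow(dead_code)]
struct Stmt;
#[allow(dead_code)]
struct Expr;
#[allow(dead_code)]
struct StmtOrExpr;

impl Stmt { const fn matches_n() -> usize { StmtX::matches_n() + StmtY::matches_n() } }
impl Expr { const fn matches_n() -> usize { ExprA::matches_n() + ExprB::matches_n() } }
impl StmtOrExpr { const fn matches_n() -> usize { Stmt::matches_n() + Expr::matches_n() } }

// Evaluated entirely at compile time; no runtime traversal or allocation.
const STMT_OR_EXPR_N: usize = StmtOrExpr::matches_n();

fn main() {
    println!("StmtOrExpr matches {STMT_OR_EXPR_N} element names");
}
```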
Mike Gerwitz	fd3184c795	tamer: fmt (ListDisplayWrapper::fmt_nth): List display without a slice This exposes the internal rendering of `ListDisplayWrapper::fmt` such that we can output a list without actually creating a list. This is used in an upcoming change for =ele_parse!= so that Sum NTs can render the union of all the QNames that their constituent NTs match on, recursively, as a single list, without having to create an ephemeral collection only for display. If Rust supports const functions for arrays/Vecs in the future, we could generate this at compile-time, if we were okay with the (small) cost, but this solution seems just fine. But output may be even _more_ performant since they'd all be adjacent in memory. This is used in these scenarios: 1. Diagnostic messages; 2. Error messages (overlaps with #1); and 3. `Display::fmt` of the `ParseState`s themselves. The reason that we want this to be reasonably performant is because #3 results in a _lot_ of output---easily GiB of output depending on what is being traced. Adding heap allocations to this would make it even slower, since a description is generated for each individual trace. Anyway, this is a fairly simple solution, albeit a little bit less clear, and only came after I had tried a number of other different approaches related to recursively constructing QName lists at compile time; they weren't worth the effort when this was so easy to do. DEV-7145	2022-08-17 10:44:28 -04:00
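Rendering a list without building one, as the commit above describes, amounts to formatting each entry as a function of its position and the total length. A rough sketch follows; the free function `fmt_nth`, its signature, and the "a, b, or c" list style are assumptions for illustration and do not reproduce `ListDisplayWrapper`'s actual interface.

```rust
use std::fmt::{self, Display, Write};

/// Writes the `i`th of `n` list entries, including whatever separator
/// must precede it, so that callers never need to materialize a slice.
fn fmt_nth<W: Write>(out: &mut W, i: usize, n: usize, item: &dyn Display) -> fmt::Result {
    match (i, n) {
        (0, _) => write!(out, "{item}"),
        (1, 2) => write!(out, " or {item}"),
        (i, n) if i + 1 == n => write!(out, ", or {item}"),
        _ => write!(out, ", {item}"),
    }
}

// Hypothetical sum types: each renders its own entries at a given offset
// within a list whose total length `n` is supplied by the outermost caller.
fn fmt_stmt(out: &mut String, offset: usize, n: usize) -> fmt::Result {
    fmt_nth(out, offset, n, &"x")?;
    fmt_nth(out, offset + 1, n, &"y")
}

fn fmt_expr(out: &mut String, offset: usize, n: usize) -> fmt::Result {
    fmt_nth(out, offset, n, &"a")?;
    fmt_nth(out, offset + 1, n, &"b")
}

fn fmt_stmt_or_expr(out: &mut String) -> fmt::Result {
    let n = 4; // would be computed at compile time in practice
    fmt_stmt(out, 0, n)?;
    fmt_expr(out, 2, n)
}

fn main() -> fmt::Result {
    let mut out = String::new();
    fmt_stmt_or_expr(&mut out)?;
    println!("expected one of {out}");
    Ok(())
}
```

The only state threaded through the recursion is two integers, which keeps the hot tracing path free of intermediate collections.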
Mike Gerwitz	6b29479fd6	tamer: xir::fmt (DisplayFn): New fn wrapper See the docblock for a description. This is used in an upcoming commit for =ele_parse!=. DEV-7145	2022-08-17 10:01:47 -04:00
Mike Gerwitz	4177b8ed71	tamer: xir::parse::ele: Streaming attribute parsing This allows using a `[attr]` special form to stream attributes as they are encountered rather than aggregating a static attribute list. This is necessary in particular for short-hand template application and short-hand function application, since the attribute names are derived from template and function parameter lists, which are runtime values. The syntax for this is a bit odd since there's a semi-useless and confusing `@ {} => obj` still, but this is only going to be used by a couple of NTs and it's not worth the time to clean this up, given the rather significant macro complexity already. DEV-7145	2022-08-16 23:06:38 -04:00
Mike Gerwitz	43c64babb0	tamer: xir::parse::ele: Superstate element preemption This uses the same mechanism that was introduced for handling `Text` nodes in mixed content, allowing for arbitrary element `Open` matches for preemption by the superstate. This will be used to allow for template expansion virtually anywhere. Unlike the existing TAME, it'll even allow for it at the root, though whether that's ultimately permitted is really depending on how I approach template expansion; it may fail during a later lowering operation. This is interesting because this approach is only possible because of the CPS-style trampoline implementation. Previously, with the composition-based approach, each and every parser would have to perform this check, like we had to previously with `Text` nodes. As usual, this is still adding to the mess a bit, and it'll need some future cleanup. DEV-7145	2022-08-16 15:47:41 -04:00
Mike Gerwitz	6f53c0971b	tamer: xir::parse::ele: Superstate text node preemption This introduces the concept of superstate node preemption generally, which I hope to use for template application as well, since templates can appear in essentially any (syntactically valid, for XML) position. This implements mixed content handling by defining the mapping on the superstate itself, which really simplifies the problem but foregoes fine-grained text handling. I had hoped to avoid that, but oh well. This pushes the responsibility of whether text is semantically valid at that position to NIR->AIR lowering (which we're not transitioning to yet), which is really the better place for it anyway, since this is just the grammar. The lowering to AIR will need to validate anyway given that template expansion happens after NIR. Moving on! DEV-7145	2022-08-16 12:26:24 -04:00
Mike Gerwitz	65b42022f0	tamer: xir::st: Prefix all preproc-namespaced constants with `QN_P_` I had previously avoided this to keep names more concise, but now it's ambiguous with parsing actual TAME sources. DEV-7145	2022-08-15 13:00:10 -04:00
Mike Gerwitz	13641e1812	tamer: diagnose::report: `int_log` feature: {=>i}log10 https://github.com/rust-lang/rust/pull/100332 The above MR replaces `log10` and friends with `ilog10`; this is the first time an unstable feature bit us in a substantially backwards-incompatible way that's a pain to deal with. Fortunately, I'm just not going to deal with it: this is used with the diagnostic system, which isn't yet used by our projects (outside of me testing), and so those builds shouldn't fail before people upgrade. This is now pending stabilization with the new name, so hopefully we're good now: https://github.com/rust-lang/rust/issues/70887#issuecomment-1210602692	2022-08-12 16:42:30 -04:00
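For context on the rename above: `ilog10` is the integer base-10 logarithm on Rust's integer types, and a typical use in a report renderer is computing how many columns a line number needs. Whether TAMER's diagnostic system uses it exactly this way is an assumption; `gutter_width` is a hypothetical helper.

```rust
/// Number of decimal digits needed to print `max_line`.
/// `ilog10` panics on zero, so clamp to at least 1 first.
fn gutter_width(max_line: u32) -> u32 {
    max_line.max(1).ilog10() + 1
}

fn main() {
    for line in [0, 9, 10, 999, 1000] {
        println!("line {line}: width {}", gutter_width(line));
    }
}
```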
Mike Gerwitz	2a36bc4210	tamer: (explicit_generic_args_with_impl_trait): Remove unstable feature flag This was stabilized in Rust 1.63. I was waiting to be sure our build servers were updated properly before removing this (and they were, long ago).	2022-08-12 16:42:30 -04:00
Mike Gerwitz	ed8a2ce28a	tamer: xir::parse::ele: Superstate not to accept early EOF This was accepting an early EOF when the active child `ParseState` was in an accepting state, because it was not ensuring that anything on the stack was also accepting. Ideally, there should be nothing on the stack, and hopefully in the future that's what happens. But with how things are today, it's important that, if anything is on the stack, it is accepting. Since `is_accepting` on the superstate is only called during finalization, and because the check terminates early, and because the stack practically speaking will only have a couple things on it max (unless we're in tail position in a deeply nested tree, without TCO [yet]), this shouldn't be an expensive check. Implementing this did require that we expose `Context` to `is_accepting`, which I had hoped to avoid having to do, but here we are. DEV-7145	2022-08-12 00:47:15 -04:00
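The early-EOF fix above boils down to requiring that both the active child state and everything still on the stack be accepting. A minimal sketch under simplifying assumptions: the `Superstate` struct here reduces each state to a boolean, whereas TAMER's superstate holds real `ParseState`s and reads the stack from `Context`.

```rust
/// Hypothetical reduction of the superstate: each entry records only
/// whether the corresponding parser state is accepting.
struct Superstate {
    active_accepting: bool,
    stack: Vec<bool>,
}

impl Superstate {
    /// Accepting only if the active state and every stacked state accept.
    /// `all` short-circuits on the first non-accepting entry.
    fn is_accepting(&self) -> bool {
        self.active_accepting && self.stack.iter().all(|&accepting| accepting)
    }
}

fn main() {
    let early_eof = Superstate { active_accepting: true, stack: vec![true, false] };
    let complete = Superstate { active_accepting: true, stack: vec![] };
    println!("early EOF accepted? {}", early_eof.is_accepting());
    println!("complete accepted?  {}", complete.is_accepting());
}
```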
Mike Gerwitz	a4419413fb	tamer: parse::trace: Include context This is something that I had apparently forgotten to do, but is now useful in debugging `ele_parse!` issues with the trampoline. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	54d8348e95	tamer: Add `--quiet` flag to `make check` (`cargo test`) I wonder when this option was introduced, unless I never saw it because it is called "quiet". But this is what I always wanted (and how I write the output for my own tools, like progtest in this repo); the output has long gotten far too large. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	22a9596cf4	tamer: xir::parse::ele: Hoist whitespace/comment handling to superstate All child parsers do the same thing, so this simplifies things. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	f8a9e952e5	tamer: xir::parse::ele: Correct handling of sum dead state post-recovery Along with this change we also had to change how we handle dead states in the superstate. So there were two problems here: 1. Sum states were not yielding a dead state after recovery, which meant that parsing was unable to continue (we still have a `todo!`); and 2. The superstate considered it an error when there was nothing left on the stack, because I assumed that ought not happen. Regarding #2---it _shouldn't_ happen, _unless_ we have extra input after we have completed parsing. Which happens to be the case for this test case, but more importantly, we shouldn't be panicking with errors about TAMER bugs if somebody puts extra input after a closing root tag in a source file. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	b95ec5a9d8	tamer: xir::parse::ele: Adjust diagnostic display of expected element list This does two things: 1. Places the expected list on a separate help line as a footnote where it'll be a bit more tolerable when it inevitably overflows the terminal width in certain contexts (we may wrap in the future); and 2. Removes angled brackets from the element names so that they (a) better correspond with the span which highlights only the element name and (b) do not imply that the elements take no attributes. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	67ee914505	tamer: xir::parse::ele: Store matching QName on NS match When we match a QName against a namespace, we ought to store the matching QName to use (a) in error messages and (b) to make available as a binding. The former is necessary for sensible errors (rather than saying that it's e.g. expecting a closing `t:*`) and the latter is necessary for e.g. getting the template name out of `t:foo`. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	8cb03d8d16	tamer: xir::parse::ele: Initial namespace prefix matching support This allows matching on a namespace prefix by providing a `Prefix` instead of a `QName`. This works, but is missing a couple notable things (and possibly more): 1. Tracking the QName that is _actually_ matched so that it can be used in messages stating what the expected closing tag is; and 2. Making that QName available via a binding. This will be used to match on `t:*` in NIR. If you're wondering how attribute parsing is supposed to work with that (of course you're wondering that, random person reading this)---that'll have to work differently for those matches, since template shorthand application contains argument names as attributes. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	f9fe4aa13b	tamer: xir::st: Static namespace prefixes (c and t) In particular, `t:*` will be recognized by NIR for short-hand template application. These will be utilized in an upcoming commit. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	88fa0688fa	tamer: xir::parse::ele: Abstract node matching This introduces `NodeMatcher`, with the intent of introducing wildcard QName matches for e.g. `t:*` nodes. It's not yet clear if I'll expand this to support text nodes yet, or if I'll convert text nodes into elements to re-use the existing system (which I had initially planned on doing, but didn't because of the work and expense (token expansion) involved in the conversion). DEV-7145	2022-08-12 00:47:13 -04:00
Mike Gerwitz	7b9bc9e108	tamer: xir::parse::ele: Ignore Text nodes for now I need to move on, and there are (a) a couple different ways to proceed that I want to mull over and (b) upcoming changes that may influence my decision one way or another. DEV-7145	2022-08-12 00:47:12 -04:00
Mike Gerwitz	4aaf91a9e7	tamer: xir::parse::ele: Un-nest child parser errors This will utilize the superstate's error object in place of nested errors, which was the result of the previous composition-based delegation. As you can see, all we had to do was remove the special handling of these errors; the existing delegation setup continues to handle the types properly with no change. The composition continues to work for `*Attr_`. The alternative was to box inner errors, since they're far from the hot code path, but that's clearly unnecessary. To be clear: this is necessary to allow for recursive grammars in `ele_parse` without creating recursive data structures in Rust. DEV-7145	2022-08-10 11:46:54 -04:00
Mike Gerwitz	adf7baf115	tamer: xir::parse::ele: Handle comments like whitespace Comments ought not have any more semantic meaning than whitespace. Other languages may have conventions that allow for various types of things in comments, like annotations, but those are symptoms of language limitations---we control the source language here. DEV-7145	2022-08-10 11:46:54 -04:00
Mike Gerwitz	15e04d63e2	tamer: xir::parse::ele: Transition trampoline This properly integrates the trampoline into `ele_parse!`. The implementation leaves some TODOs, most notably broken mixed text handling since we can no longer intercept those tokens before passing to the child. That is temporarily marked as incomplete; see a future commit. The introduced test `ParseState`s were to help me reason about the system intuitively as I struggled to track down some type errors in the monstrosity that is `ele_parse!`. It will fail to compile if those invariants are violated. (In the end, the problems were pretty simple to resolve, and the struggle was the type system doing its job in telling me that I needed to step back and try to reason about the problem again until it was intuitive.) This keeps around the NT states for now, which are quickly used to transition to the next NT state, like a couple of bounces on a trampoline: NT -> Dead -> Parent -> Next NT This could be optimized in the future, if it's worth doing. This also makes no attempt to implement tail calls; that would have to come after fixing mixed content and really isn't worth the added complexity now. I (desperately) need to move on, and still have a bunch of cleanup to do. I had hoped for a smaller commit, but that was too difficult to do with all the types involved. DEV-7145	2022-08-10 11:46:45 -04:00
Mike Gerwitz	233fa7de6a	tamer: diagnose::panic: New module This change introduces diagnostic messages for panics. The intent is to be able to use panics in situations where it is either not possible to or not worth the time to recover from errors and ensure a consistent/sensible system state. In those situations, we still ought to be able to provide the user with useful information to attempt to get unstuck, since the error is surely in response to some particular input, and maybe that input can be tweaked to work around the problem. Ideally, invalid states are avoided using the type system and statically verified at compile-time. But this is not always possible, or in some cases may be way more effort or cause way more code complexity than it is worth, given the unlikelihood of the error occurring. With that said, it's been interesting, over the past >10y that TAME has existed, seeing how unlikely errors do sometimes pop up many years after they were written. It's also interesting to have my intuition of what is "unlikely" challenged, but hopefully it holds generally. DEV-7145	2022-08-09 15:20:37 -04:00
Mike Gerwitz	454b7a163f	tamer: xir::parse::ele: Move repeat configuration out of Context I had previously used `Context` to hold the parser configuration for repetition, since that was the easier option. But I now want to utilize the `Context` for a stack for the superstate trampoline, and I don't want to have to deal with the awkwardness of the repetition in doing so, since it requires that the configuration be created during delegation, rather than just being passed through to all child parsers. This adds to a mess that needs cleaning up, but I'll do that after everything is working. DEV-7145	2022-08-08 15:23:55 -04:00
Mike Gerwitz	6bc872eb38	tamer: xir::parse::ele: Generate superstate And here's the thing that I've been dreading, partly because of the `macro_rules` issues involved. But, it's not too terrible. This module was already large and complex, and this just adds to it---it's in need of refactoring, but I want to be sure it's fully working and capable of handling NIR before I go spending time refactoring only to undo it. _This does not yet use trampolining in place of the call stack._ That'll come next; I just wanted to get the macro updated, the superstate generated, and tests passing. This does convert into the superstate (`ParseState::Super`), but then converts back to the original `ParseState` for BC with the existing composition-based delegation. That will go away and will then use the equivalent of CPS, using the superstate+`Parser` as a trampoline. This will require an explicit stack via `Context`, like XIRF. And it will allow for tail calls, with respect to parser delegation, if I decide it's worth doing. The root problem is that source XML requires recursive parsing (for expressions and statements like `<section>`), which results in recursive data structures (`ParseState` enum variants). Resolving this with boxing is not appropriate, because that puts heap indirection in an extremely hot code path, and may also inhibit the aggressive optimizations that I need Rust to perform to optimize away the majority of the lowering pipeline. Once this is sorted out, this should be the last big thing for the parser. This unfortunately has been a nagging and looming issue for months, that I was hoping to avoid, and in retrospect that was naive. DEV-7145	2022-08-08 15:23:55 -04:00
Mike Gerwitz	53a689741b	tamer: parse::state::ParseState::Super: Superstate concept I'm disappointed that I keep having to implement features that I had hoped to avoid implementing. This introduces a "superstate" feature, which is intended really just to be a sum type that is able to delegate to stitched `ParseState`s. This then allows a `ParseState` to transition directly to another `ParseState` and have the parent `ParseState` handle the delegation---a trampoline. This issue naturally arises out of the recursive nature of parsing a TAME XML document, where certain statements can be nested (like `<section>`), and where expressions can be nested. I had gotten away with composition-based delegation for now because `xmlo` headers do not have such nesting. The composition-based approach falls flat for recursive structures. The typical naive solution is boxing, which I cannot do, because not only is this on an extremely hot code path, but I require that Rust be able to deeply introspect and optimize away the lowering pipeline as much as possible. Many months ago, I figured that such a solution would require a trampoline, as it typically does in stack-based languages, but I was hoping to avoid it. Well, no longer; let's just get on with it. This intends to implement trampolining in a `ParseState` that serves as that sum type, rather than introducing it as yet another feature to `Parser`; the latter would provide a more convenient API, but it would continue to bloat `Parser` itself. Right now, only the element parser generator will require use of this, so if it's needed beyond that, then I'll debate whether it's worth providing a better abstraction. For now, the intent will be to use the `Context` to store a stack that it can pop off of to restore the previous `ParseState` before delegation. DEV-7145	2022-08-08 15:23:54 -04:00
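A minimal sketch of the superstate idea, under simplifying assumptions: the token type, the two variants, and the `feed`/`run` functions are all hypothetical and far smaller than anything `ele_parse!` generates. What it does show is a flat sum type (no variant contains another parser, so nothing is boxed) with the parent state saved on an explicit stack in the context during delegation.

```rust
/// Hypothetical two-token input language: nested open/close tags.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Open,
    Close,
}

/// Flat sum type over every child parser state.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Super {
    /// Outside of any element.
    Root,
    /// Inside an element, having parsed this many children.
    Section(u32),
}

/// Explicit stack used in place of the call stack.
struct Context {
    stack: Vec<Super>,
}

/// One bounce of the trampoline: consume a token, yield the next state.
fn feed(state: Super, tok: Tok, ctx: &mut Context) -> Result<Super, &'static str> {
    match (state, tok) {
        // Delegation: suspend the parent and transition to a fresh child.
        (parent, Tok::Open) => {
            ctx.stack.push(parent);
            Ok(Super::Section(0))
        }
        // Child finished: restore the parent that delegated to it.
        (Super::Section(_), Tok::Close) => match ctx.stack.pop() {
            Some(Super::Section(n)) => Ok(Super::Section(n + 1)),
            Some(Super::Root) => Ok(Super::Root),
            None => Err("stack underflow"),
        },
        (Super::Root, Tok::Close) => Err("unexpected close"),
    }
}

/// Drive the trampoline; returns the final state and remaining stack size.
fn run(toks: &[Tok]) -> Result<(Super, usize), &'static str> {
    let mut ctx = Context { stack: Vec::new() };
    let mut state = Super::Root;
    for &tok in toks {
        state = feed(state, tok, &mut ctx)?;
    }
    Ok((state, ctx.stack.len()))
}
```

Nesting depth here is bounded only by the stack in `Context`, not by the shape of the `Super` type, which is the property the recursive grammar needs.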
Mike Gerwitz	7a5f731cac	tamer: tameld: XIRF nesting 64=>4 Since we'll never be reading past the header, this is all that is needed. If in the future this is violated, XIRF will cause a nice diagnostic error displaying precisely what opening tag caused the increased level of nesting, which will aid in debugging and allow us to determine if it ought to be increased. Here's an example, if I set the max to `3`: error: maximum XML element nesting depth of `3` exceeded --> /home/.../foo.xmlo:261:10 \| 261 \| <preproc:sym-ref name=":_vproduct:vector_a"/> \| ^^^^^^^^^^^^^^^^ error: this opening tag increases the level of nesting past the limit of 3 Of course, the longer-term goal is to do away with `xmlo` entirely. This had no (perceivable via `/usr/bin/time -v`, at least) impact on memory or CPU time. DEV-7145	2022-08-01 15:01:37 -04:00
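As a rough sketch of a compile-time nesting limit like the one described above, assuming the limit is supplied as a const generic parameter (an assumption; the commit only states the limit changed from 64 to 4). `DepthTracker` and its methods are hypothetical names.

```rust
/// Tracks element nesting up to a compile-time maximum.
/// Hypothetical type for illustration; not XIRF's implementation.
struct DepthTracker<const MAX: usize> {
    depth: usize,
}

impl<const MAX: usize> DepthTracker<MAX> {
    fn new() -> Self {
        Self { depth: 0 }
    }

    /// Record an opening tag, failing if it would exceed `MAX`.
    fn open(&mut self) -> Result<usize, String> {
        if self.depth == MAX {
            return Err(format!(
                "maximum XML element nesting depth of `{}` exceeded",
                MAX
            ));
        }
        self.depth += 1;
        Ok(self.depth)
    }

    /// Record a closing tag.
    fn close(&mut self) {
        self.depth = self.depth.saturating_sub(1);
    }
}
```

A real implementation would attach the span of the offending opening tag to the error so that the diagnostic can point at it, as in the example output quoted in the commit message.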
Mike Gerwitz	77efefe680	tamer: xir::attr::parse: Better parser state descriptions The attribute name was neither quoted nor `@`-prefixed. (I noticed this in the traces.) DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	2d117a4864	tamer: xir::parse::ele: Mixed content parsing "Mixed content" is the XML term representing element nodes mixed with text nodes. For example, `foo <strong>bar</strong> baz` is mixed. TAME supports text nodes as documentation, intended to be in a literate style but never fully realized. In any case, we need to permit them, and I wanted to do more than just ignore the nodes. This takes a different approach than typical parser delegation---it has the parent parser _preempt_ the child by intercepting text before delegation takes place, rather than having the child reject the token (or possibly interpret it itself!) and have to handle an error or dead state. And while this makes it more confusing in terms of state machine stitching, it does make sense, in the sense that the parent parser is really what "owns" the text node---the parser is delegating _element_ parsing only, and asserts authority when necessary to take back control where it shouldn't be delegated. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	8779abe2bb	tamer: xir::flat: Expose depth for all node-related tokens Previously a `Depth` was provided only for `Open` and `Close`. This depth information, for example, will be used by NIR to quickly determine whether a given parser ought to assert ownership of a text/comment token rather than delegating it. This involved modifying a number of test cases, but it's worth repeating in these commits that this is intentional---I've been bit in the past using `..` in contexts where I really do want to know if variant fields change so that I can consider whether and how that change may affect the code utilizing that variant. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	b3c0bdc786	tamer: xir::parse::ele: Ignore whitespace around elements Recent changes regarding whitespace were all to support this change (though it was also needed for XIRF, pre- and post-root). Now I'll have to contend with how I want to handle text nodes in various circumstances, in terms of `ele_parse!`. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	8f3301431c	tamer: span::dummy: New module to hold DUMMY_SPAN and derivatives Various DUMMY_SPAN-derived spans are used by many test cases, so this finally extracts them---something I've been meaning to do for some time. This also places DUMMY_SPAN behind a `cfg(test)` directive to ensure that it is _only_ used in tests; UNKNOWN_SPAN should be used when a span is actually unknown, which may also be the case during development. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	0edb21429d	tamer: parse::error: Describe unexpected token of input When Parser has an unhandled dead state and fails due to an unexpected token of input, we should display what we interpreted that token as. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	18803ea576	tamer: xir: Format tokens without tt quotes Whether or not quoting is appropriate depends on context, and that parent context is already performing the quoting. For example: error: expected `</rater>`, but found `<import>` --> /home/[...]/foo.xml:2:1 \| 2 \| <rater xmlns="http://www.lovullo.com/rater" \| ------ note: element starts here --> /home/[...]/foo.xml:7:3 \| 7 \| <import package="/rater/core/base" /> \| ^^^^^^^ error: expected `</rater>` In these cases (obviously I'm still working on the parser, since this is nonsense), the parser is responsible for quoting the token "<import>". DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	8778976018	tamer: xir::flat: Ignore whitespace both before and after root DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	4f2b27f944	tamer: xir: Attribute error formatting/typo fixes There were two problem errors: one showing "element element" and one showing the value along with the name of the attribute. The change for `<Attr as Display>::fmt` is debatable. I'm going to do this for now (only show `@name`) and adjust later if necessary. I'll need to go use `crate::fmt` consistently in previously-existing format strings at some point, too. DEV-7145	2022-08-01 15:01:37 -04:00
Mike Gerwitz	41b41e02c1	tamer: Xirf::Text refinement This teaches XIRF to optionally refine Text into RefinedText, which determines whether the given SymbolId represents entirely whitespace. This is something I've been putting off for some time, but now that I'm parsing source language for NIR, it is necessary, in that we can only permit whitespace Text nodes in certain contexts. The idea is to capture the most common whitespace as preinterned symbols. Note that this heuristic ought to be determined from scanning a codebase, which I haven't done yet; this is just an initial list. The fallback is to look up the string associated with the SymbolId and perform a linear scan, aborting on the first non-whitespace character. This combination of checks should be sufficiently performant for now considering that this is only being run on source files, which really are not all that large. (They become large when template-expanded.) I'll optimize further if I notice it show up during profiling. This also frees XIR itself from being concerned by Whitespace. Initially I had used quick-xml's whitespace trimming, but it messed up my span calculations, and those were a pain in the ass to implement to begin with, since I had to resort to pointer arithmetic. I'd rather avoid tweaking it. tameld will not check for whitespace, since it's not important---xmlo files, if malformed, are the fault of the compiler; we can ignore text nodes except in the context of code fragments, where they are never whitespace (unless that's also a compiler bug). Onward and yonward. DEV-7145	2022-08-01 15:01:37 -04:00
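The two-step check described above (a fast path over common whitespace, then a linear scan) might look roughly like this. This sketch operates on `&str` rather than `SymbolId`, and the `COMMON_WS` list is purely illustrative; `RefinedText` is named in the commit, but its shape here is an assumption.

```rust
/// Text classified by whether it is entirely whitespace.
/// The variants here are assumed for illustration.
#[derive(Debug, PartialEq)]
enum RefinedText<'a> {
    Whitespace(&'a str),
    Unrefined(&'a str),
}

/// Stand-in for preinterned whitespace symbols; the entries are illustrative.
const COMMON_WS: [&str; 4] = [" ", "\n", "\n  ", "\n    "];

fn refine(text: &str) -> RefinedText<'_> {
    // Fast path: compare against the most common whitespace strings.
    let is_ws = COMMON_WS.iter().any(|ws| *ws == text)
        // Fallback: linear scan; `all` aborts on the first
        // non-whitespace character.
        || text.chars().all(|c| matches!(c, ' ' | '\t' | '\n' | '\r'));

    if is_ws {
        RefinedText::Whitespace(text)
    } else {
        RefinedText::Unrefined(text)
    }
}
```

With interned symbols the fast path is an integer comparison rather than a string comparison, which is where the preinterning pays off.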
Mike Gerwitz	b38c16fd08	tamer: parse::trace: Generalize reason for trace output The trace outputs a note in the footer indicating _why_ it's being output, so that the reader understands both where the potentially-unexpected behavior originates from and so they know (in the case of the feature flag) how to inhibit it. That information originally lived in `Parser`, where the `cfg` directive to enable it lives, but it was moved into the abstraction. This corrects that. DEV-7145	2022-08-01 15:01:12 -04:00
Mike Gerwitz	17327f1b64	tamer: parse::trace: Extract tracing into new module This has gotten large and was cluttering `feed_tok`. This also provides the ability to more easily expand into other types of tracing in the future. DEV-7145	2022-07-26 09:29:17 -04:00
Mike Gerwitz	8f25c9ae0a	tamer: parse::parser: Include object in parser trace This information is likely redundant in a lowering pipeline, but is more useful outside of such a pipeline. It's also more clear. `Object` does not implement `Display`, though, because that's too burdensome for how it's currently used. Many `Object`s are also `Token`s though and, if fed to another `Parser` for lowering, it'll get `Display::fmt`'d. DEV-7145	2022-07-26 09:28:39 -04:00
Mike Gerwitz	4b5e51b0f0	tamer: parse::parser::Parser::feed_tok: cfg note precedence Rust was warning that `cfg` was unused if both `test` and `parser-trace-stderr` were set. This both allows that and adjusts the precedence to make more sense for tests. DEV-7145	2022-07-26 09:28:39 -04:00
Mike Gerwitz	c3dfcc565c	tamer: parse::parser::Parser: Include errors in parse trace Because of recovery, the trace otherwise paints a really confusing-looking picture when given unexpected input. This is large enough now that it really ought to be extracted from `feed_tok`, but I'll wait to see how this evolves further. I considered adding color too, but it's not yet clear to me that the visual noise will be all that helpful. DEV-7145	2022-07-26 09:28:37 -04:00
Mike Gerwitz	422f3d9c0c	tamer: New parser-trace-stderr feature flag This flag allows toggling the parser trace that was previously only available to tests. Unfortunately, at the time of writing, Cargo cannot enable flags in profiles, so I have to check for either `test` or this flag being set to enable relevant features. This trace is useful as I start to run the parser against existing code written in TAME so that our existing systems can help to guide my development. Unlike the current tests, it also allows seeing real-world data as part of the lowering pipeline, where multiple `Parser`s are in play. Having this feature flag also makes this feature more easily discoverable to those wishing to observe how the lowering pipeline works. DEV-7145	2022-07-21 22:10:08 -04:00
Mike Gerwitz	de35cc37fd	tamer: xir::writer::XmlWriter: Do not take Token ownership impl for `&Token` instead of Token; the writer is just copying data into the destination stream anyway. This will allow us to continue writing the token while also using it for further processing, like `tee`. DEV-7145	2022-07-21 15:29:55 -04:00
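A minimal sketch of implementing a writer trait for `&Token` rather than `Token`, so that the caller keeps ownership. The `Token` variants and the `XmlWrite` trait here are simplified, hypothetical stand-ins for the real `XmlWriter`, and the sketch performs no escaping.

```rust
use std::io::{self, Write};

/// Simplified, hypothetical token type.
#[derive(Debug, PartialEq)]
enum Token {
    Open(String),
    Text(String),
    Close(String),
}

/// Hypothetical writer trait; `self` is taken by value so that
/// the implementor decides whether that means ownership or a borrow.
trait XmlWrite {
    fn write_xml<W: Write>(self, sink: &mut W) -> io::Result<()>;
}

/// Implemented for `&Token`: the writer only copies data into the sink.
impl XmlWrite for &Token {
    fn write_xml<W: Write>(self, sink: &mut W) -> io::Result<()> {
        match self {
            Token::Open(name) => write!(sink, "<{}>", name),
            Token::Text(text) => write!(sink, "{}", text),
            Token::Close(name) => write!(sink, "</{}>", name),
        }
    }
}
```

Since only a borrow is consumed, the same token can be written out and then passed along for further processing, much like `tee`.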
Mike Gerwitz	0504788a16	tamer: xir::parse::ele: Visibility specifier We need to be able to export generated identifiers. Trying to figure out a syntax for this was a bit tricky considering how much is generated, so I just settled on something that's reasonably clear and easy to parse with `macro_rules!`. I had intended to just make everything public by default and encapsulate using private modules, but that then required making everything else that it uses public (e.g. error and token objects), which would have been a bizarre thing to do in e.g. test cases. DEV-7145	2022-07-21 14:56:43 -04:00
Mike Gerwitz	acced76788	tamer: xir::parse::ele: Expand types for external expansion for sum NT Like a previous commit, this corrects the types for sum NTs so that they properly resolve in contexts external to xir::parse. DEV-7145	2022-07-21 13:44:30 -04:00
Mike Gerwitz	992c000b68	tamer: xir::parse::ele: AttrValueError for attr_parse!'s ValueError This integrates the previous ValueError for `attr_parse!` into `ele_parse!`. DEV-7145	2022-07-21 09:23:34 -04:00
Mike Gerwitz	3a764d111e	tamer: xir::parse::attr: Fallible value parsing Values can be parsed using `TryFrom<Attr>`. Previously only `From<Attr>` was supported, which could not fail. This is critical for parsing values into types, which will wrap `SymbolId` to provide data assurances. DEV-7145	2022-07-21 09:23:11 -04:00
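To illustrate the difference between `From<Attr>` and `TryFrom<Attr>`, here is a minimal sketch of a newtype that validates its value during parsing. `Attr`, `Precision`, and `ValueError` as defined here are hypothetical; TAMER's real `Attr` carries a `SymbolId` and spans.

```rust
use std::convert::TryFrom;

/// Simplified, hypothetical attribute.
struct Attr<'a> {
    name: &'a str,
    value: &'a str,
}

/// Hypothetical newtype providing the assurance that the value is numeric.
#[derive(Debug, PartialEq)]
struct Precision(u8);

/// Hypothetical error for values that do not match the expected pattern.
#[derive(Debug, PartialEq)]
enum ValueError {
    NotANumber { attr: String, value: String },
}

impl<'a> TryFrom<Attr<'a>> for Precision {
    type Error = ValueError;

    fn try_from(attr: Attr<'a>) -> Result<Self, Self::Error> {
        attr.value
            .parse::<u8>()
            .map(Precision)
            .map_err(|_| ValueError::NotANumber {
                attr: attr.name.to_string(),
                value: attr.value.to_string(),
            })
    }
}
```

With `From<Attr>`, the conversion would have had no way to reject a malformed value short of panicking.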
Mike Gerwitz	184ff6bdcc	tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module The tests had certain things in scope, but now that I'm trying to use it outside of those modules, some fixes are needed. This is admittedly a sloppy commit, with a number of miscellaneous fixes. I didn't bother separating it more because most of them are type fixes, and the `From<Attr>` stuff is going to have to change into, likely, `TryFrom<Attr>` so that parse failures can occur when attributes do not match certain patterns. DEV-7145	2022-07-20 15:40:28 -04:00
Mike Gerwitz	e517e15a29	tamer: parse::Token: Swap trait method order This just places `ir_name` first in the trait definition so that it'll be inserted in that same order when using LSP. DEV-7145	2022-07-20 13:58:44 -04:00
Mike Gerwitz	c856fd72d9	tamer: xir::parse::ele: Diagnostic output The only additional information needed was opening spans so that we can provide useful information regarding closing tags. This uses a generic Span in place of {Open,Close}Span because the latter wasn't necessary, but more descriptive types would be nice; it may be beneficial later on to introduce newtypes for each of the spans generated by {Open,Close}Span. DEV-7145	2022-07-20 12:17:15 -04:00
Mike Gerwitz	ce765d3b56	tamer: xir::parse::attr: Error and recovery on duplicate attr This was a TODO for the attribute parser generator. The first attribute will be kept and later ones will be ignored, producing an error. Recovery permits further attribute parsing having ignored the duplicate. DEV-7145	2022-07-20 12:16:13 -04:00
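The recovery policy described above (keep the first attribute, report later duplicates, and continue) can be sketched as follows. `parse_attrs` and `DuplicateAttr` are hypothetical and operate on plain string pairs rather than on a token stream.

```rust
use std::collections::HashMap;

/// Hypothetical error naming the duplicated attribute.
#[derive(Debug, PartialEq)]
struct DuplicateAttr(String);

/// Collect attributes, keeping the first occurrence of each name.
/// Duplicates produce an error but do not stop parsing.
fn parse_attrs<'a>(
    attrs: &[(&'a str, &'a str)],
) -> (HashMap<&'a str, &'a str>, Vec<DuplicateAttr>) {
    let mut seen: HashMap<&'a str, &'a str> = HashMap::new();
    let mut errors = Vec::new();

    for &(name, value) in attrs {
        if seen.contains_key(name) {
            // Recovery: ignore the duplicate and keep going.
            errors.push(DuplicateAttr(name.to_string()));
        } else {
            seen.insert(name, value);
        }
    }

    (seen, errors)
}
```

Returning the errors alongside the parsed attributes mirrors the idea that recovery lets parsing continue after the duplicate has been reported.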
Mike Gerwitz	21dfff0110	tamer: xir::parse::attr::test: Extract into own file It's not going to be getting any smaller. DEV-7145	2022-07-20 10:02:41 -04:00
Mike Gerwitz	1ec9c963fd	tamer: xir::parse::ele: Nonterminal repetition (Kleene star) This allows an element to be repeated by the parent NT. The easiest way I saw to implement this for now was to abuse the Context to provide a runtime configuration that would allow the state machine to reset after it has completed parsing. This also influences error recovery, in that if we're expecting zero or more of something, we cannot provide an error for an unexpected name, and instead must emit a dead state so that the caller can determine what to do. DEV-7145	2022-07-19 16:14:12 -04:00
Mike Gerwitz	e73c223a55	tamer: parser::Parser: cfg(test) tracing This produces useful parse traces that are output as part of a failing test case. The parser generator macros can be a bit confusing to deal with when things go wrong, so this helps to clarify matters. This is _not_ intended to be machine-readable, but it does show that it would be possible to generate machine-readable output to visualize the entire lowering pipeline. Perhaps something for the future. I left these inline in Parser::feed_tok because they help to elucidate what is going on, just by reading what the trace would output---that is, it helps to make the method more self-documenting, albeit a tad bit more verbose. But with that said, it should probably be extracted at some point; I don't want this to set a precedent where composition is feasible. Here's an example from test cases: [Parser::feed_tok] (input IR: XIRF) \| ==> Parser before tok is parsing attributes for `package`. \| \| Attrs_(SutAttrsState_ { ___ctx: (QName(None, LocalPart(NCName(SymbolId(46 "package")))), OpenSpan(Span { len: 0, offset: 0, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10)), ___done: false }) \| \| ==> XIRF tok: `<unexpected>` \| \| Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) \| \| ==> Parser after tok is expecting opening tag `<classify>`. \| \| ChildA(Expecting_) \| \| Lookahead: Some(Lookahead(Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)))) = note: this trace was output as a debugging aid because `cfg(test)`. [Parser::feed_tok] (input IR: XIRF) \| ==> Parser before tok is expecting opening tag `<classify>`. 
\| \| ChildA(Expecting_) \| \| ==> XIRF tok: `<unexpected>` \| \| Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)) \| \| ==> Parser after tok is attempting to recover by ignoring element with unexpected name `unexpected` (expected `classify`). \| \| ChildA(RecoverEleIgnore_(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))) \| \| Lookahead: None = note: this trace was output as a debugging aid because `cfg(test)`. DEV-7145	2022-07-19 14:44:18 -04:00
Mike Gerwitz	f462c7daec	tamer: xir::parse::attr: Display: element name This resolves a TODO by including the name of the element whose attributes are currently being parsed. This also frees a parent from having to provide additional context, allowing Display to be fully delegated when stitching. DEV-7145	2022-07-18 14:43:29 -04:00
Mike Gerwitz	2f4c20dac8	tamer: xir::parse::ele: Remaining Display::fmt for nonterminals The following commit (test tracing) requires non-panicking `Display` and `Debug` values. DEV-7145	2022-07-18 14:31:42 -04:00
Mike Gerwitz	cf2cd882ca	tamer: xir::parse::ele: Introduce sum nonterminals This introduces `Nt := (A \| ... \| Z);`, where `Nt` is the name of the nonterminal and `A ... Z` are the inner nonterminals---it produces a parser that provides a choice between a set of nonterminals. This is implemented efficiently by understanding the QName that is accepted by each of the inner nonterminals and delegating that token immediately to the appropriate parser. This is a benefit of using a parser generator macro over parser combinators---we do not need to implement backtracking by letting inner parsers fail, because we know ahead of time exactly what parser we need. This _does not_ verify that each of the inner parsers accept a unique QName; maybe at a later time I can figure out something for that. However, because this compiles into a `match`, there is no ambiguity---like a PEG parser, there is precedence in the face of an ambiguous token, and the first one wins. Consequently, tests would surely fail, since the latter wouldn't be able to be parsed. This also demonstrates how we can have good error suggestions for this parsing framework: because the inner nonterminals and their QNames are known at compile time, error messages simply generate a list of QNames that are expected. The error recovery strategy is the same as previously noted, and subject to the same concerns, though it may be more appropriate here: it is desirable for the inner parser to fail rather than retrying, so that the sum parser is able to fail and, once the Kleene operator is introduced, retry on another potential element. But again, that recovery strategy may happen to work in some cases, but'll fail miserably in others (e.g. placing an unknown element at the head of a block that expects a sequence of elements would potentially fail the entire block rather than just the invalid one). But more to come on that later; it's not critical at this point. I need to get parsing completed for TAME's input language. 
DEV-7145	2022-07-14 15:12:57 -04:00
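The dispatch strategy for sum nonterminals can be sketched as a plain `match` on the element name. The nonterminal names and the `dispatch` function are hypothetical, and the real macro matches on `QName` rather than `&str`; the point is only that the choice is made immediately from the name, with no backtracking, and that the list of expected names is known up front for error messages.

```rust
/// Hypothetical inner nonterminals of a sum `Nt := (A | B | C)`.
#[derive(Debug, PartialEq)]
enum Nt {
    Classify,
    Rate,
    Section,
}

/// Names accepted by the inner nonterminals, known ahead of time.
const EXPECTED: [&str; 3] = ["classify", "rate", "section"];

/// Select the inner parser from the element name alone.
/// Arms are tried in order, so the first match wins.
fn dispatch(qname: &str) -> Result<Nt, String> {
    match qname {
        "classify" => Ok(Nt::Classify),
        "rate" => Ok(Nt::Rate),
        "section" => Ok(Nt::Section),
        other => Err(format!(
            "unexpected element `{}`; expected one of: {}",
            other,
            EXPECTED.join(", ")
        )),
    }
}
```

If two inner nonterminals accepted the same name, the later arm would be unreachable, which matches the commit's note that the first one wins.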
Mike Gerwitz	1fdfc0aa4d	tamer: xir::parse::ele: Introduce open/close span bindings This adds the ability to bind identifiers to represent `OpenSpan` and `CloseSpan`, available to the `@` and `/` maps. Since identifiers in TAME originate from attributes, this may not get a whole lot of use, but it's important to be available. There is some awkwardness in that the opening span appears to be scoped to the entire nonterminal, but it's actually only available in the `@` mapping. I'll change this if it's actually needed; this keeps things simple for now. DEV-7145	2022-07-13 23:42:51 -04:00
Mike Gerwitz	cceb8c7fb9	tamer: xir::parse::ele: Initial Close mapping support Since the parsers produce streaming IRs, we need to be able to emit tokens representing closing delimiters, where they are important. This notably doesn't use spans; I'll add those next, since they're also needed for the previous work. DEV-7145	2022-07-13 15:02:46 -04:00
Mike Gerwitz	c30c0e268d	tamer: xir::parse::ele::test: TODO regarding recovery strategy The comment explains the issue. I don't think the strategy is going to be a desirable one, but I want to move on and observe in retrospect how it ought to be handled. The important part right now is that recovery is accounted for and possible, which was a long-standing concern. DEV-7145	2022-07-13 14:25:25 -04:00
Mike Gerwitz	73efc59582	tamer: xir::parse::ele: Initial element parser generator concept This begins generating parsers that are capable of parsing elements. I need to move on, so this abstraction isn't going to go as far as it could, but let's see where it takes me. This was the work that required the recent lookahead changes, which has been detailed in previous commits. This initial support is basic, but robust. It supports parsing elements with attributes and children, but it does not yet support the equivalent of the Kleene star (`*`). Such support will likely be added by supporting parsers that are able to recurse on their own definition in tail position, which will also require supporting parsers that do not add to the stack. This generates parsers that, like all the other parsers, use enums to provide a typed stack. Stitched parsers produce a nested stack that is always bounded in size. Fortunately, expressions---which can nest deeply---do not need to maintain ancestor context on the stack, and so this should work fine; we can get away with this because XIRF ensures proper nesting for us. Statements that _do_ need to maintain such context are not nested. This also does not yet support emitting an object on closing tag, which will be necessary for NIR, which will be a streaming IR that is "near" to the source XML in structure. This will then be used to lower into AIR for the ASG, which gives structure needed for further analysis. More information to come; I just want to get this committed to serve as a mental synchronization point and clear my head, since I've been sitting on these changes for so long and have to keep stashing them as I tumble down rabbit holes covered in yak hair. DEV-7145	2022-07-13 14:08:47 -04:00
Mike Gerwitz	c9b3b84f90	tamer: parse::transition::Lookahead: ParseState=>Token type param Having the lookahead token generic over the `ParseState` was a pain in the ass for stitching, since they shared the same token type but not the same parser. I don't expect there to be any need to be able to infer other parser-related types for a token of lookahead, so I'd rather just make my life easier until such a thing is needed. DEV-7145	2022-07-13 10:13:35 -04:00
Mike Gerwitz	bd783ac08b	tamer: Replace ParseStatus::Dead with generic lookahead Oh what a tortured journey. I had originally tried to avoid formalizing lookahead for all parsers by pretending that it was only needed for dead state transitions (that is---states that have no transitions for a given input token), but then I needed to yield information for aggregation. So I added the ability to override the token for `Dead` to yield that, in addition to the token. But then I also needed to yield lookahead for error conditions. It was a mess that didn't make sense. This eliminates `ParseStatus::Dead` entirely and fully integrates the lookahead token in `Parser` that was previously implemented. Notably, the lookahead token is encapsulated in `TransitionResult` and unavailable to `ParseState` implementations, forcing them to rely on `Parser` for recursion. This not only prevents `ParseState` from recursing, but also simplifies delegation by removing the need to manually handle tokens of lookahead. The awkward case here is XIRT, which does not follow the streaming parsing convention, because it was conceived before the parsing framework. It needs to go away, but doing so right now would be a lot of work, so it has to stick around for a little bit longer until the new parser generators can be used instead. It is a persistent thorn in my side, going against the grain. `Parser` will immediately recurse if it sees a token of lookahead with an incomplete parse. This is because stitched parsers will frequently yield a dead state indication when they're done parsing, and there's no use in propagating an `Incomplete` status down the entire lowering pipeline. But, that does mean that the toplevel is not the only thing recursing. _But_, the behavior doesn't really change, in the sense that it would infinitely recurse down the entire lowering stack (though there'd be an opportunity to detect that). 
This should never happen with a correct parser, but it's not worth the effort right now to try to force such a thing with Rust's type system. Something like TLA+ is better suited here as an aid, but it shouldn't be necessary with clear implementations and proper test cases. Parser generators will also ensure such a thing cannot occur. I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a lot of type inference that happens based on the fact that `ParseStatus` has a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable for a public-facing `Parsed` to not be tied to `ParseState`, since consumers need not be concerned with such a heavy type; however, we _do_ want that heavy type internally, as it carries a lot of useful information that allows for significant and powerful type inference, which in turn creates expressive and convenient APIs. DEV-7145	2022-07-12 00:11:45 -04:00
Mike Gerwitz	61ce7d3fc7	tamer: parse::state::transition: Extract module into own file That's it. Just preparing for changes that will change how lookaheads and dead state transitions will work. DEV-7145	2022-07-07 12:47:31 -04:00
Mike Gerwitz	e54f93b30f	tamer: parse: Introduce lookahead token in Parser NB: This is the initial change to introduce the token of lookahead, but this does not fully integrate it. In particular, this is missing from the stitching/delegation layer. This has been a long time coming, I suppose, though I had tried to avoid it with `Parser::delegate_lookahead`. But the problem with doing that is that it forced the ParserState to recurse, which both violates my desire for no looping constructs except for the toplevel, and performs additional stack allocation as it is not in tail position. The final straw was having to both return an error _and_ an aggregate object for the attribute parser when an unexpected element is encountered (this code is not yet committed). One option was to add a recovery object to the error object, and formalize that, but then we have other concerns; for example, what if that recovery object triggered an error? We'd have to mask either the old or the new error. But we wouldn't want to mask either, because the object causing the error would be the aggregate attributes, which is _not_ a recovery object, but actual data we want to emit. And so it's a kluge right off of the bat. The use of a token of lookahead is a more traditional approach and has uses outside of just this one scenario. It'll also allow for the removal of recursion from the existing ParserStates, and possibly the elimination of dead state associated data, though I may end up leaving that; more to come. Rust will also optimize away lookahead storage and processing in Parsers that do not utilize it. DEV-7145	2022-07-07 11:19:55 -04:00
Mike Gerwitz	6385270fe6	tamer: Ensure debug_assert! takes effect in test profile I'd feel rather silly if I used `debug_assert!` for the sake of tests and they weren't actually being run due to optimization settings. This is just to catch potential future regressions; all is well today. DEV-7145	2022-07-05 14:59:35 -04:00
Mike Gerwitz	40c68d3e1e	tamer: parse::state::TransitionResult: Make opaque There was only one test outside of the `parse` module using these fields. The next commit will be introducing lookahead, and I do not want to have to trust callers to ensure invariants are met. DEV-7145	2022-07-05 14:12:06 -04:00
Mike Gerwitz	a16a0d9138	Revert "tamer: xir: Initial re-introduction of AttrEnd" This reverts commit `b973d36862`. Alright, I'm getting sick of fighting with myself on this. But rather than just removing the last commit, I'm going to keep it around, so that my thoughts are clearly documented for my future quarrels with myself. Firstly: this added more overhead than I wanted it to. While it wasn't significant, it did add 100--150ms to one of our largest systems, up from ~2.8s, which seems a bit much for a token that's really just meant to make life easier for the parser. Further, it seems that all I've managed to do is push my original problem to a different layer---this started as a means to resolve having to emit both an object and an error simultaneously in the case where aggregate attribute parsing has completed, but we encounter an error on the next token (e.g. an unexpected element). But XIRF, if it's missing AttrEnd, should throw an error, but should also recover. Recovery is easy---just assume that it was present---_but then we don't emit a XIRF `AttrEnd` token_, which is necessary for downstream systems. So we'd need to either: (a) emit both a token and an error; or (b) panic. But if we're doing (a), then the need for `AttrEnd` goes away, because it solves the original problem (though the other concerns of the previous commit still stand). (b) is not ideal at all, even though the missing token does represent an internal system error; it's not something the user can correct. But, given that it's something that the user cannot correct, doesn't that imply that it's an awkward thing to include in the token stream? So back to `AttrEnd` being an awkward PITA to have. So, given (a), I'll just do that: errors will become more of a "hey, this error just occurred, but I'm trying to recover---here's an object that you should use if you choose to continue parsing, but it may or may not be what you're looking for; proceed with caution". 
That flips the original script: I imagined having external systems feed recovery tokens, but this encapsulates recovery within the parser, which really is more appropriate, though less flexible than having an omniscient external recovery system; such a monolith was always an awkward concept and would be difficult to implement cleanly. This can also potentially be implemented as a generalization of the Dead state change that allowed an object to be emitted alongside the lookahead/error. Anyway, back to where I was...I'm sure I'll look back on this in the future shaking my head, reflecting on how naive I was. DEV-7145	2022-06-29 11:25:44 -04:00
Mike Gerwitz	b973d36862	tamer: xir: Initial re-introduction of AttrEnd AttrEnd was initially removed in `0cc0bc9d5a` (and the commit prior), because there was not a compelling reason to use it over a lookahead operation (returning a token via a dead state transition); `AttrEnd` simply introduced inconsistencies between the XIR reader (which produced AttrEnd) and internal XIR stream generators (e.g. the lowering operations into XIR->XML, which do not). But now that parsers are performing aggregation---in particular the attribute parser-generator `xir::parse::attr`---this has become quite a pain, because the dead state is an actionable token. For example: 1. Open 2. Attr 3. Attr 4. Open 5. ... In the happy case, token #4 results in `Parsed::Incomplete`, and so can just be transformed into the object representing the aggregated attributes. But even in this happy path, it's ugly, and it requires non-tail recursion on the parser which requires a duplicate stack allocation for the `ParserState`. That violates a core principle of the system. But if there is an error at #4---e.g. an unexpected element---then we no longer have a `Parsed::Incomplete` to hijack for our own uses, and we'd have to introduce the ability to return both an error and a token, or we'd have to introduce the ability to keep a token of lookahead instead of reading from the underlying token stream, but that's complicated with push parsers, which are used for parser composition. Yikes. And furthermore, the aggregation has caused me to introduce the ability to override the dead state type to introduce both a token of lookahead and aggregation information. This complicates the system and is going to be confusing to others. 
Given all of this, AttrEnd does now seem appropriate to reintroduce, since it will allow processing of aggregate operations when encountering that token without having to worry about the above scenario; without having to duplicate a `ParseState` stack; without having to hijack dead state transitions for producing our aggregate object; and everything else mentioned above. This commit does not modify those abstractions to use AttrEnd yet; it re-introduces the token to the core system, not the parser-generators, and it doesn't yet replace lookahead operations in the parsers that use them. That'll come next. Unlike the commit that removed it, though, we are now generating proper spans, so make note of that here. This also does not introduce the concept to XIRF yet, which did not exist at the time that it was removed, so XIRF is filtering it out until a following commit. DEV-7145	2022-06-29 11:02:02 -04:00
Mike Gerwitz	9276d00456	tamer: Cargo.toml: Remove lazy_static This is no longer needed after the previous commit, with static spans having been replaced by `const` spans. This used to be required before Rust acquired better const features, and before I had preinterned symbols. DEV-7145	2022-06-24 14:18:04 -04:00
Mike Gerwitz	c671bf6a9c	tamer: xir: Introduce {Ele,Open,Close}Span This isn't conceptually all that significant of a change, but there was a lot to modify to get it working. I would generally separate this into a commit for the implementation and another commit for the integration, but I decided to keep things together. This serves a role similar to AttrSpan---this allows deriving a span representing the element name from a span representing the entire XIR token. This will provide more useful context for errors---including the tag delimiter(s) means that we care about the fact that an element is in that position (as opposed to some other type of node) within the context of an error. However, if we are expecting an element but take issue with the element name itself, we want to place emphasis on that instead. This also starts to consider the issue of span contexts---a blob of detached data that is `Span` is useful for error context, but it's not useful for manipulation or deriving additional information. For that, we need to encode additional context, and this is an attempt at that. I am interested in the concept of providing Spans that are guaranteed to actually make sense---that are instantiated and manipulated with APIs that ensure consistency. But such a thing buys us very little, practically speaking, over what I have now for TAMER, and so I don't expect to actually implement that for this project; I'll leave that for a personal project. TAMER's already taken on a lot of my personal interests and it can cause me a lot of grief sometimes (with regards to letting my aspirations cause me more work). DEV-7145	2022-06-24 14:16:29 -04:00
Mike Gerwitz	873e5fc761	tamer: asg::ident: {prolog=>prologue} typo fix Somewhat humorous.	2022-06-23 09:19:12 -04:00
Mike Gerwitz	2fafc331a1	tamer: xir::reader: Opening and closing tag whitespace Non-attribute and non-empty start/end tags will have their whitespace as part of the produced span. This sets us up for a following change that will allow for deriving the name span from this span given a QName, which gives us a span that both represents the entire XIR token and allows deriving the element name. An accurate token span is necessary for parsing errors where an element was not expected, while an element name span is more appropriate for issues of grammar and semantic errors that deal not with the fact that an element was encountered, but _what_ element was encountered. DEV-7145	2022-06-22 15:10:49 -04:00
Mike Gerwitz	e5c8a218c3	tamer: xir::reader: Correct empty element whitespace handling This both adds clarifying tests and corrects the case of `<foo/>`, where the offset was erroneously off by one---it saw that there were no attributes and added a byte thinking it'd include `>`, as in `<foo>`. DEV-7145	2022-06-22 10:28:44 -04:00
Mike Gerwitz	adc45d90df	tamer: xir::parse: Attribute parser generator This is the first parser generator for the parsing framework. I've been waiting quite a while to do this because I wanted to be sure that I understood how I intended to write the attribute parsers manually. Now that I'm about to start parsing source XML files, it is necessary to have a parser generator. Typically one thinks of a parser generator as a separate program that generates code for some language, but that is not always the case---that represents a lack of expressiveness in the language itself (e.g. C). Here, I simply use Rust's macro system, which should be a concept familiar to someone coming from a language like Lisp. This also resolves where I stand on parser combinators with respect to this abstraction: they both accomplish the exact same thing (composition of smaller parsers), but this abstraction doesn't do so in the typical functional way. But the end result is the same. The parser generated by this abstraction will be optimized and inlined in the same manner as the hand-written parsers. Since they'll be tightly coupled with an element parser (which too will have a parser generator), I expect that most attribute parsers will simply be inlined; they exist as separate parsers conceptually, for the same reason that you'd use parser combinators. It's worth mentioning that this awkward reliance on dead state for a lookahead token to determine when aggregation is complete rubs me the wrong way, but resolving it would involve reintroducing the XIR AttrEnd that I had previously removed. I'll keep fighting with myself on this, but I want to get a bit further before I determine if it's worth the tradeoff of reintroducing (more complex IR but simplified parsing). DEV-7145	2022-06-21 13:23:02 -04:00
Mike Gerwitz	9598532d8b	tamer: xir::st: Add missing docs for generated QName constants This was missed. It was not possible, using the documentation alone (without looking at the linked source) to tell what the QName actually represented, though you could assume by the name. DEV-7145	2022-06-21 13:23:01 -04:00
Mike Gerwitz	3f23bc5e33	tamer: fmt: New type-based formatting system This is partly an experiment, but is designed to simplify producing English sentences in various contexts. It makes use of a not only unstable, but incomplete, Rust feature---adt_const_params, for a static str const type parameter. Hopefully that ends up being stabilized. This uses types, but it's the same as function composition due to Rust's monomorphization. DEV-7145	2022-06-10 16:28:15 -04:00
Mike Gerwitz	f7752436da	tamer: parse::Parser: Add remaining field docs DEV-7145	2022-06-07 15:23:20 -04:00
Mike Gerwitz	3c227e5a2d	tamer: parse::ParseState: Remove Default trait bound `ParseState` originally required `Default` for use with `mem::take` in `Parser::feed_tok`. This unfortunately cannot last, since more specialized parsers require context during initialization in order to provide useful diagnostic information. (The other option is to require the caller to augment errors with diagnostic information, but that would have to be duplicated by every caller and complicates parser composition; I'd prefer those diagnostic details remain encapsulated.) Replacing `Default` with `Option` is uglier, but it ends up producing the same assembly as `mem::take` did, at least at the time of writing. Because Rust is able to elide unnecessary moves using this implementation, there is no need for `unwrap_unchecked` or other unsafe methods, which is great, since it shows that this parsing methodology is viable entirely in safe Rust. DEV-7145	2022-06-07 15:08:40 -04:00
Mike Gerwitz	f14ffc87c2	tamer: parse::state::ParseState::DeadToken: New associated type Previously, `ParseStatus::Dead` always yielded `ParseState::Token`. However, I'm working on introducing parsers that aggregate (parsing XML attributes into structs), and those parsers do not know that they have completed aggregation until they reach a dead state; given that, I need to yield additional information at that time. I played around with a number of alternative ideas, but this ended up being the cleanest, relative to the effort involved. For example, introducing another parameter to `ParseStatus::Dead` was too burdensome on APIs that ought not concern themselves with the possibility of receiving an object in addition to a lookahead token, since many parsers are not capable of doing so (given that they map M:(N<=M)). Another option that I abandoned fairly quickly was having `is_accepting` (potentially renamed) return an aggregate object, since that's on the side and didn't feel like it was part of the parsing pipeline. The intent is to abstract this some in a new `ParseState` method for delegation + aggregation. DEV-7145	2022-06-07 09:37:41 -04:00
Mike Gerwitz	495c1438fd	tamer: Consistent span diagram representation I'll document it more formally eventually, but this settles on a mix of the two: square brackets and dashes for intervals, `+` for intersecting lines, byte offsets below interval endpoints, and names below that. The docblock for `Span` itself is still off; I'll probably just take one of the test cases and paste it there at some point. DEV-7145	2022-06-06 11:32:35 -04:00
Mike Gerwitz	bba181f573	tamer: xir::attr::Attr: Introduce AttrSpan This replaces a tuple with a tuple struct that allows for calculating more complete span information, such as the span encompassing the entire attribute and the value span including the surrounding quotes. This includes logic that ought to be abstracted into `Span` itself, and it's not as formal as I'd like it to be (e.g. not ensuring context), but this is a good starting point. Note that parsers call `Token::span`, which in turn calculates the attribute span, each time an attribute is encountered during lowering. But Rust does a good job at optimizing away unnecessary operations, so this didn't have an observable impact on time. DEV-7145	2022-06-06 11:31:28 -04:00
Mike Gerwitz	2b8e7e6031	tamer: xir::st::qname: New module This moves and deduplicates the static `QName`s into a common area. DEV-7145	2022-06-06 11:31:27 -04:00
Mike Gerwitz	3da82b351e	tamer: xir::flat::{State=>XirToXirf}: Rename Like the previous two commits, this states the intent of this parser, which results in more clear pipeline composition. DEV-7145	2022-06-02 13:48:54 -04:00
Mike Gerwitz	91b55999e2	tamer: asg::air::{AirState=>AirAggregate}: Rename Like the previous commit, this emphasizes what is happening. DEV-7145	2022-06-02 13:26:46 -04:00
Mike Gerwitz	45bbf3879e	tamer: obj::xmlo::{lower=>air}: Rename {LowerState=>XmloToAir} This provides much more clarity as to what is going on. Further, it's less ambiguous, since I'm about to introduce a new type of xmlo lowering into XIR for writing the actual xmlo files. DEV-7145	2022-06-02 13:23:41 -04:00
Mike Gerwitz	8d92667388	tamer: Integrate xir::reader as a parser in the lowering pipeline This allows `XmlXirReader` to be used in a `Lower` operation, just as everything else, bringing me one step closer to a pipeline that can be concisely represented; this is finally beginning to unify in a clear way, though it is still a bit of a mess. This causes `XmlXirReader` to _act_ like a `parse::Parser` in that it yields a `ParsedResult`, but it does not use `parse::Parser` itself; that was the _original_ plan: convert it into a `ParseState` where `XmlXirReader` became a context, and force `Parser` to yield by feeding it a stream of tokens with `repeat`, but that ended up performing poorly relative to this change. I did some investigation, which I might write about in the future, but for now, this solution works just fine. DEV-7145	2022-06-02 10:30:44 -04:00
Mike Gerwitz	f8c28655dc	tamer: parse: Split into multiple modules This abstraction has grown quite a bit, and it's time to start formalizing it a bit. This split doesn't change any behavior, but it does start to make it easier to reason about by clearly stating the broad components and how they interact with one-another. This doesn't yet move the tests; those will come next, but they are very few. The reason I gave previously for this was because (a) they're tested indirectly via the systems that utilize them and (b) because the abstraction was not yet settled and the process was already very expensive. No test coverage was lost---it's only that failures were potentially harder to debug, but in practice not even this was true, because the deeply expressive types all but ensured that, if it compiles, it will function in a way that is expected. Unit tests and documentation for this system will be added once I'm sure that this abstraction is in a proper state. DEV-7145	2022-06-01 11:32:58 -04:00
Mike Gerwitz	63aa452197	tamer: parse: Move parse::lower into Lower This also modifies `poc` such that `Lower` is invoked as an associated function rather than a method to emphasize the pattern that is forming, so that it can be later abstracted away. DEV-11864	2022-06-01 11:15:43 -04:00
Mike Gerwitz	f40f8bbafc	tamer: parse: Rename {lower__while_ok=>lower_} The `while_ok` can just be implied with a lowering operation, and that reduces the name complexity so that we can maybe introduce even more specialized methods without resulting in a huge sentence as a name. DEV-11864	2022-05-27 14:10:55 -04:00
Mike Gerwitz	b084e23497	tamer: Refactor asg_builder into obj::xmlo::lower and asg::air This finally uses `parse` all the way up to aggregation into the ASG, as can be seen by the mess in `poc`. This will be further simplified---I just need to get this committed so that I can mentally get it off my plate. I've been separating this commit into smaller commits, but there's a point where it's just not worth the effort anymore. I don't like making large changes such as this one. There is still work to do here. First, it's worth re-mentioning that `poc` means "proof-of-concept", and represents things that still need a proper home/abstraction. Secondly, `poc` is retrieving the context of two parsers---`LowerContext` and `Asg`. The latter is desirable, since it's the final aggregation point, but the former needs to be eliminated; in particular, packages need to be worked into the ASG so that `found` can be removed. Recursively loading `xmlo` files still happens in `poc`, but the compiler will need this as well. Once packages are on the ASG, along with their state, that responsibility can be generalized as well. That will then simplify lowering even further, to the point where hopefully everything has the same shape (once final aggregation has an abstraction), after which we can then create a final abstraction to concisely stitch everything together. Right now, Rust isn't able to infer `S` for `Lower<S, LS>`, which is unfortunate, but we'll be able to help it along with a more explicit abstraction. DEV-11864	2022-05-27 13:51:29 -04:00
Mike Gerwitz	eafb3b2a1b	tamer: Add Display impl for each ParseState for generic ParseErrors This is intended to describe, to the user, the state that the parser is in. This will be used to convey additional information for general parser errors, but it should also probably be integrated into parsers' individual errors as well when appropriate. This is something I expected to add at some point, but I wanted to add them because, when dealing with lowering errors, it can be difficult to tell what parser the error originated from. DEV-11864	2022-05-25 15:26:02 -04:00
Mike Gerwitz	9edc32dd3b	tamer: parse::LowerIter: Generic inner TripIter iterator This commit is preparing to compose LowerIter directly. DEV-11864	2022-05-24 10:27:14 -04:00
Mike Gerwitz	f218c452b9	tamer: iter::trip: Flatten Result The `*_iter_while_ok` functions now compose like monads, flattening `Result` at each step and drastically simplifying handling of error types. This also removes the bunch of `?`s at the end of the expression, and allows me to use `?` within the callback itself. I had originally not used `Result` as the return type of the callback because I was not entirely sure how I was going to use them, but it's now clear that I _always_ use `Result` as the return type, and so there's no use in trying to be too accommodating; it can always change in the future. This is desirable not just for cleanup, but because trying to refactor `asg_builder` into a pair of `Parser`s is really messy to chain without flattening, especially given some state that has to leak temporarily to the caller. More on that in a future commit. DEV-11864	2022-05-20 16:08:16 -04:00
Mike Gerwitz	958a707e02	tamer: asg: Hoist Root from Ident into Object This was always the intent, but I didn't have a higher-level object yet. This removes all the awkwardness that existed with working the root in as an identifier. DEV-11864	2022-05-19 12:48:43 -04:00
Mike Gerwitz	6252758730	tamer: asg::Object: Introduce Object::Ident This wraps `Ident` in a new `Object` variant and modifies `Asg` so that its nodes are of type `Object`. This unfortunately requires runtime type checking. Whether or not that's worth alleviating in the future depends on a lot of different things, since it'll require my own graph implementation, and I have to focus on other things right now. Maybe it'll be worth it in the future. Note that this also gets rid of some doc examples that simply aren't worth maintaining as the API evolves. DEV-11864	2022-05-19 12:33:59 -04:00
Mike Gerwitz	f75f1b605e	tamer: num: Header typo correction	2022-05-19 12:02:38 -04:00
Mike Gerwitz	ebf1de5a60	tamer: asg::Ident{Object=>}: Rename I think this may have been renamed _from_ `Ident` some time ago, but I'm too lazy to check. In any case, the name is redundant. DEV-11864	2022-05-19 11:17:04 -04:00
Mike Gerwitz	7d76cb53f6	tamer: asg: Move SymAttrs conversion into asg_builder This is a lowering operation and does not belong here. What a tangled mess this all was (see recent commits); no wonder it was so confusing. DEV-11864	2022-05-19 11:07:15 -04:00
Mike Gerwitz	eae194abc6	tamer: asg::object: Merge into asg::ident Everything in this file relates to identifiers, and I'm about to introduce a higher-level object, one of which may be an identifier. DEV-11864	2022-05-19 11:05:20 -04:00
Mike Gerwitz	92dba0a28c	tamer: obj::xmlo::asg_builder::IdentKindError: Merge into AsgBuilderError Now that these are in the same module, there's no need for them to be separate from one-another. DEV-11864	2022-05-19 10:56:07 -04:00
Mike Gerwitz	07d2ec1ffb	tamer: Move Dim and {Sym=>}Dtype into num module A previous commit mentioned that there's not a place for `Dim`, and duplicated it between `asg` and `xmlo`. Well, `Dtype` is also needed in both, and so here's a home for now. `Dtype` has always been an inappropriate detail for the system and will one day be removed entirely in favor of higher-level types; the machine representation is up to the compiler to decide. DEV-11864	2022-05-19 10:39:21 -04:00
Mike Gerwitz	b2a79e930b	tamer: Move SymAttrs lowering into asg_builder asg_builder is about to be replaced, but in the process of simplifying the destination IR (the ASG), I'm moving things into the proper place. This never belonged here---it belongs with the actual lowering operation. Previously, this was not reasoned about in terms of a lowering operation, and was written when I was first introducing myself to Rust and trying to get a proof-of-concept linker working. DEV-11864	2022-05-19 10:28:17 -04:00
Mike Gerwitz	8948452b71	tamer: asg::ident::Dim: Narrow type This matches xmlo::Dim, and could be the same thing, if we can find a home for it in the future; it's not worth creating such a home right now when I'm not yet sure what else ought to live there; the duplication may be fine. The conversion from xmlo needs to be moved, and `Dim` is going to be used for more than just identifiers (expressions will have type inference performed). DEV-11864	2022-05-19 09:32:43 -04:00
Mike Gerwitz	263cb68380	tamer: parse: Persistent context This allows retrieving and providing a context to a `Parser`. This is intended for use with an aggregating parser, in particular to construct the ASG and return it. This is a component of a change that replaces `asg_builder` with a `Parser`-based lowering into the ASG, but there are still changes that need to be made to simplify things and complete its integration. DEV-11864	2022-05-18 16:15:09 -04:00
Mike Gerwitz	001499d921	tamer: parse::ParseError: Remove Eq trait bound Just as in other commits, since it's an unnecessary limitation. DEV-11864	2022-05-18 16:06:22 -04:00
Mike Gerwitz	3e277270a7	tamer: asg: Track roots on graph Previously, since the graph contained only identifiers, discovered roots were stored in a separate vector and exposed to the caller. This not only leaked details, but added complexity; this was left over from the refactoring of the proof-of-concept linker some time ago. This moves the root management into the ASG itself, mostly, with one item being left over for now in the asg_builder (eligibility classifications). There are two roots that were added automatically: - __yield - __worksheet The former has been removed and is now expected to be explicitly mapped in the return map, which is now enforced with an extern in `core/base`. This is still special, in the sense that it is explicitly referenced by the generated code, but there's nothing inherently special about it and I'll continue to generalize it into oblivion in the future, such that the final yield is just a convention. `__worksheet` is the only symbol of type `IdentKind::Worksheet`, and so that was generalized just as the meta and map entries were. The goal in the future will be to have this more under the control of the source language, and to consolidate individual roots under packages, so that the _actual_ roots are few. As far as the actual ASG goes: this introduces a single root node that is used as the sole reference for reachability analysis and topological sorting. The edges of that root node replace the vector that was removed. DEV-11864	2022-05-17 10:42:05 -04:00
Mike Gerwitz	34eb994a0d	tamer: asg::Asg::set_fragment: {ObjectRef=>SymbolId} In the actual implementation (outside of tests), this is always looking up before adding the symbol. This will simplify the API, while still retaining errors, since the identifier will fail the state transition if the identifier did not exist before attempting to set a fragment. So while this is slower in microbenchmarks, this has no effect on real-world performance. Further, I'm refactoring toward a streaming ASG aggregation, which is a lot easier if we do not need to perform lookups in a separate step from the ASG's primitives. DEV-11864	2022-05-16 13:14:27 -04:00
Mike Gerwitz	c49d87976d	tamer: parse::Token: Remove Eq trait bound `PartialEq` remains, and is all that is needed. See previous commit regarding the removal of this same bound from `Context`. This can be re-added if it ends up actually being necessary. But Tokens are ephemeral and used only in lowering pipelines, using pattern matching. DEV-11864	2022-05-16 10:05:14 -04:00
Mike Gerwitz	d87006391e	tamer: asg::object: Remove IdentObjectState, IdentObjectData These traits are no longer necessary now that I'm using concrete types; they just add unnecessary noise and confusion as I attempt to further refactor. Don't abstract prematurely. DEV-11864	2022-05-12 16:31:36 -04:00
Mike Gerwitz	3748762d31	tamer: asg::graph::Asg: Remove type parameter O This removes the generic on the Asg (which was formerly BaseAsg), hard-coding `IdentObject`, which will further evolve. This makes the IR an actual concrete IR rather than an abstract data structure. These tests bring me back a bit, since they were written as I was still becoming familiar with Rust. DEV-11864	2022-05-12 15:46:17 -04:00
Mike Gerwitz	f2c5443176	tamer: asg: Remove generic Asg, rename {Base=>}Asg This is the beginning of an incremental refactoring to remove generics, to simplify the ASG. When I initially wrote the linker, I wasn't sure what direction I was going in, but I was also negatively influenced by more traditional approaches to both design and unit testing. If we're going to call the ASG an IR, then it needs to be one---if the core of the IR is generic, then it's more like an abstract data structure than anything. We can abstract around the IR to slice it up into components that are a little easier to reason about and understand how responsibilities are segregated. DEV-11864	2022-05-11 16:47:13 -04:00
Mike Gerwitz	0493e68cb3	tamer: parse::ParseState::Context: Add missing comment DEV-11864	2022-05-10 11:06:22 -04:00
Mike Gerwitz	0ef0d2b553	tamer: parse::ParseState:Error: Relax Eq trait bound This is unnecessarily restrictive, since we do not require anything further than `PartialEq` for the situations where we care about equality (tests). DEV-11864	2022-05-06 15:28:47 -04:00
Mike Gerwitz	9f990e19e9	tamer: parse::ParseState::Context: Remove Default trait bound This is too restrictive, especially for parsers that fold into something, like the ASG, which may exist prior to invoking the parser. This moves the trait bound to the functions that actually need it. Those obviously cannot be used if the Context does not implement `Default`, but I'll provide alternative conveniences. DEV-11864	2022-05-05 15:55:04 -04:00
Mike Gerwitz	ba9f429ee7	tamer: obj::xmlo::{XmloEvent=>XmloToken} The original "event" name was based on quick-xml's `Event`. This terminology shift is more closely matched with the new parsing system. DEV-11864	2022-05-05 12:25:59 -04:00
Mike Gerwitz	0281dfdf0d	tamer: Remove wip-frontends feature flag We want the new system to be used so that we can start catching any problems that may arise. Further changes will be flagged as necessary. DEV-10936	2022-05-04 09:37:10 -04:00
Mike Gerwitz	1ad2fb1dc8	Copyright year update 2022 RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no "Group"), but I'm not sure if the legal name has been changed yet or not, so I'll wait on that.	2022-05-03 14:14:29 -04:00
Mike Gerwitz	34fcd19cd0	tamer: obj::xmlo::reader: Replace todo! with error These are no longer TODOs---they represent invalid tokens. I'm going to put effort into providing further context with the diagnostic system [right now] because these are internal errors caused by either miscompilation or an incomplete reader. DEV-10936	2022-05-03 09:19:47 -04:00
Mike Gerwitz	5875477efa	tamer: xir::Token: Remove span from Display This was missed when removing it from other Display impls when the new diagnostic system was introduced. Raw `Span`s display byte offsets and the context, which is no longer desirable as part of an error message. DEV-10936	2022-05-03 09:09:55 -04:00
Mike Gerwitz	a2e6e37ed1	tamer: Bump nightly Rust version 1.{57=>62} This removes a couple of feature flags that are no longer necessary.	2022-05-02 11:05:32 -04:00
Mike Gerwitz	7248ef77e4	tamer: diagnose::resolve{r=>}: Rename Consistent with naming of other modules, which prefers to not needlessly transform words. DEV-12151	2022-05-02 09:49:22 -04:00
Mike Gerwitz	75b966c577	tamer: diagnose: Additional documentation I had waited to provide more documentation until I was sure that the abstraction was not going to change significantly; there was a lot of refactoring in prior commits. DEV-12151	2022-05-02 09:44:53 -04:00
Mike Gerwitz	fc1dad8483	tamer: diagnose::report::Section: Further refactor resolved constructor This speaks for itself. DEV-12151	2022-04-29 15:54:38 -04:00
Mike Gerwitz	ba0ceddd2d	tamer: diagnose::report::Section: Constructor refactoring This moves construction out of `From` and into separate associated functions, which can be further simplified in a bit. We also need unit tests for this, since this still relies on integration tests due to the cost of the aggressive and tight refactoring iterations. DEV-12151	2022-04-29 13:10:04 -04:00
Mike Gerwitz	3e04217741	tamer: diagnose::report::Section::maybe_squash_into: Remove syslabel TODO Previously, when adjacent duplicate spans were both resolved, if one failed, the other certainly would, which would result in duplicate labels each squash. Elided spans do not have syslabels, and so this is no longer a concern. DEV-12151	2022-04-29 13:07:51 -04:00
Mike Gerwitz	2ae6df38e7	tamer: diagnose::report: Restore source line preview for invalid UTF-8 This was removed in a previous commit while working on simplifying the implementation, with the hope of returning to it once things were in a better place. They are, so let's bring it back. DEV-12151	2022-04-29 12:41:56 -04:00
Mike Gerwitz	f8dda12fae	tamer: diagnose::report: Remove TODOs that are no longer applicable These relate to the most recent commits. DEV-12151	2022-04-29 12:34:48 -04:00
Mike Gerwitz	2ce0dbdd84	tamer: diagnose::report::SpanLabel: Remove in favor of separate Level and Label `SpanLabel` was created during a very early refactoring of this system, and I've just been fighting with it since. This removes it, and simplifies some things in the process. It also makes clear that `Level` is never optional and removes the awkward `Level::default` that was there previously; the default is now the lowest level, which will always be able to be escalated. DEV-12151	2022-04-29 12:13:11 -04:00
Mike Gerwitz	9a5a2c4f3f	tamer: diagnose::report: Avoid re-resolving adjacent identical spans This does what the original proof-of-concept implementation did---skip a span that was just processed, since it'll be squashed into the previous anyway. These duplicate spans originate from the diagnostic system when producing supplemental help information. DEV-12151	2022-04-29 11:57:50 -04:00
Mike Gerwitz	a533244473	tamer: diagnose::report::VisualReporter::render: Avoid mspan collection This used to be necessary when `Report` stored references to heap-allocated strings, but `Report` now owns those values itself. DEV-12151	2022-04-29 09:53:22 -04:00
Mike Gerwitz	b0a5265ad3	tamer: diagnose::report::test: Extract into separate file Tests are large and will be getting larger. The source will also grow as it's better documented and cleaned up. It's getting more difficult to navigate efficiently and concurrently modify implementation and tests, and parsing via LSP is getting slower with certain types of changes. DEV-12151	2022-04-29 09:23:06 -04:00
Mike Gerwitz	5c0e224d3c	tamer: diagnose::report: Line numbers in gutter Alright, starting to settle on an abstraction now, and things are coming together. This gives us line numbers in the previously-empty gutter, and widens the gutter to accommodate. Gutters are normalized across sections. Sections are not yet collapsed for sequential line numbers in the same context. Exciting! Here's an example, on an xmlo file: error: expected closing tag for `preproc:symtable` --> /home/.../foo.xmlo:16:4 | 16 | <preproc:symtable xmlns:map="http://www.w3.org/2005/xpath-functions/map"> | ----------------- note: element `preproc:symtable` is opened here --> /home/.../foo.xmlo:11326:4 | 11326 | </preproc:wrong> | ^^^^^^^^^^^^^^^^ error: expected `</preproc:symtable>` DEV-12151	2022-04-28 23:53:38 -04:00
Mike Gerwitz	5744e08984	tamer: diagnose::report: Hoist gutter output into Section The `Section` itself is now responsible for outputting the gutter, which puts us in a position to be able to apply consistent formatting without having to propagate width data to every line variant.	2022-04-28 22:59:13 -04:00
Mike Gerwitz	4e03a367a5	tamer: diagnose::report::SourceLine: Separate variants for each line Now `SourceLine` _does_ actually correspond to a line of output, which will allow for better formatting (e.g. collapsing padding) and, importantly, proper management of gutters. Note that the seemingly unnecessary `SectionSourceLine` allows for a subtle consistent formatting for all variants' gutters in `SectionLine`, which will allow us to hoist that rendering out in the next commit. The other option was to include a trailing space for padding and marks, but that is not only sloppy and undesirable, but asking for confusion, especially in editors (like mine) that trim trailing whitespace. DEV-12151	2022-04-28 22:49:35 -04:00
Mike Gerwitz	fd1c6430a8	tamer: diagnose::report::SectionSourceLine: {Option<Column>=>Column} If a column isn't present, it degrades to displaying labels like footnotes anyway, so this simplifies the system rather than catering to a rare case. With that said, this does lose functionality, since it does not render the source line at all, even though we _could_ do so. I may re-introduce that rendering after some further refactoring, specifically for gutters. DEV-12151	2022-04-28 22:23:58 -04:00
Mike Gerwitz	3a5dcfc016	tamer: diagnose::resolver::SourceLine: {Vec<u8>=>String} Using a byte vector just makes life more difficult with regard to preparing the diagnostic reports. We're already validating UTF-8 data for column generation, which is necessary for a robust report, so let's just store it as a String to begin with. DEV-12151	2022-04-28 22:03:37 -04:00
Mike Gerwitz	838db689ad	tamer: diagnose::report: Render labels on mark line Note that, if a span is first encountered with a mark but with _no_ label, the first label (if collapsed) will be on the next line. This allows a span to be marked without extra visual noise if it's not necessary, and to be able to trust that it'll stay that way. Until coloring is introduced, this may or may not be easier to read depending on context. This is also not yet taking into account where on the line it begins, and so may render poorly if the span is at the end of a line. That will be fixed later on. DEV-12151	2022-04-28 16:23:13 -04:00
Mike Gerwitz	a197267a2d	tamer: xir::flat: Remove closing tag name from label This is now visible in the diagnostic output. Example at this point in time, on an xmlo file for one of our smallest systems: error: expected closing tag for `preproc:symtable` --> /home/.../foo.xmlo:16:4 | | <preproc:symtable xmlns:map="http://www.w3.org/2005/xpath-functions/map"> | ----------------- = note: element `preproc:symtable` is opened here --> /home/.../foo.xmlo:11326:4 | | </preproc:wrong> | ^^^^^^^^^^^^^^^^ = error: expected `</preproc:symtable>` DEV-12151	2022-04-28 15:47:34 -04:00
Mike Gerwitz	33baca113a	tamer: diagnose::report: Vary mark character depending on level Looking more and more Rust-like. Shameless copy. TBH I forget what character it uses for help, but it's easy enough to change. Also, to be clear: this is modeled after Rust, but it's not a requirement of mine that it look exactly like it. I just like the general style; I'll surely deviate over time, as appropriate (or as I feel like it). DEV-12151	2022-04-28 15:44:50 -04:00
Mike Gerwitz	8119d1ca0d	tamer: diagnose::report: Render span marks under lines This has the effect of highlighting the columns of the source lines using '^' as an underline. The next step will be to have the underline character depend on the `Level`. If this commit message doesn't sound all that exciting, given what it finally achieved after all this time, it's because I'm exhausted, and my prototype has already taken my excitement. But this is significant, given all the work leading up to it. There is some code cleanup needed and some unit tests that ought to be written rather than relying on integration, but considering how much this is being refactored, I don't want to add to that refactoring cost just yet before gutters are introduced and I know things are settled for now. DEV-12151	2022-04-28 15:44:49 -04:00
Mike Gerwitz	5db026ed76	tamer: diagnose::report: Initial display of source lines This has been a lot of refactoring for something that I prototyped a week ago, and the prototype is still further along in its output formatting (it has line numbering in gutters and span markings). But, this has come a long way, and I'm happy with it overall, though I'm not happy with my slow pace and struggle to maintain focus. But those are personal issues. This leaves a lot to be desired, but at the same time is still really helpful. There are a couple of notable TODOs regarding pointless allocation and UTF-8 re-checking, but otherwise, the feature-related steps are: - Gutters with line numbers; and - Marking columns associated with the span. DEV-12151	2022-04-28 14:33:08 -04:00
Mike Gerwitz	3e06c9aaf3	tamer: diagnose::report: Prepare Section for output of source lines This lowers the resolved span data into `Section` for display. The next step is to actually output it. DEV-12151	2022-04-28 13:34:05 -04:00
Mike Gerwitz	331aada2bd	tamer: diagnose::report::MaybeResolvedSpan: Move up in file Just rearranging, since this was awkwardly placed relative to where it's used. DEV-12151	2022-04-28 11:00:36 -04:00
Mike Gerwitz	6a5a29c2f5	tamer: diagnose::report: Remove Section variants and eagerly squash Rather than squashing as a separate operation, and explicitly denoting when it occurred, we'll just always squash, as was done before these changes. It doesn't really make sense to make this optional and there's not any value in keeping the decision around. This also sets us up favorably for future changes: it creates a vector of labels, which can be analyzed later to determine how to best lay out marks and labels. DEV-12151	2022-04-28 10:30:04 -04:00
Mike Gerwitz	c8d919d0cc	tamer: diagnose::report: {'l=>'d} Just renames the lifetime to refer to the `Diagnostic`, rather than a `Label` returned by it, which was all `'l` was previously used for. Note that many labels have a `'static` lifetime; this doesn't change that or somehow cause it to reallocate; the label must live _for at least `'d`_. DEV-12151	2022-04-27 15:20:16 -04:00
Mike Gerwitz	e2c68c5e84	tamer: diagnose::report: Avoid message copy Rather than rendering the diagnostic `Display` message to a string only to copy it to yet another buffer later on, this simply stores a reference to the `Diagnostic` that was provided. This also adds a type to the `Report` associating it with the provided `Diagnostic`, which does seem appropriate, given that the report was produced for it. I should probably rename '{l=>d} now. DEV-12151	2022-04-27 15:20:14 -04:00
Mike Gerwitz	3dbab881da	tamer: diagnose::report: Produce Report object Rather than writing to the provided `Write` object, this produces a `Report` object. While a lifetime still exists for the diagnostic data (labels, specifically), I was able to remove the other lifetime resulting from `ResolvedSpan` by transferring ownership of the data to the `Report` itself. Once actual source lines are integrated shortly, `Report` will include those as well. This has been a tedious process, but it's coming together. Hopefully these commits documenting the progressive and ugly refactoring are found useful by some reader in the future. DEV-12151	2022-04-27 15:00:30 -04:00
Mike Gerwitz	3679ff590c	tamer: diagnose::report: Remove `L` type parameter The line number was getting special treatment that is simply not worth the cost (with regards to how burdensome it is on the type definitions). This simplifies things quite a bit. If we want header customization in the future, we can worry about that in a different way, or allow the header as a whole to be swapped out, rather than its constituents. DEV-12151	2022-04-27 14:23:58 -04:00
Mike Gerwitz	589f5e8c58	tamer: diagnose::report::HeadingLineNum: Compose HeadingColNum `HeadingColNum` is no longer constructed by `HeadingLineNum`. This both narrows the types and required data (e.g. removing dummy values in test cases), and reduces the coupling (by favoring composition, but still coupled with the concrete type). DEV-12151	2022-04-27 11:43:46 -04:00
Mike Gerwitz	7dbe25be05	tamer: diagnose::report::HeadingLineNum: Lower MaybeResolvedSpan Same as the previous commit with `HeadingColNum`---this removes the dependency on `MaybeResolvedSpan`. DEV-12151	2022-04-27 11:28:17 -04:00
Mike Gerwitz	68f9f4d241	tamer: diagnose::report::HeadingColNum: Lower MaybeResolvedSpan This eliminates `MaybeResolvedSpan` from `HeadingColNum`, along with its type parameters and lifetimes. DEV-12151	2022-04-27 11:10:16 -04:00
Mike Gerwitz	f29918b5a0	tamer: diagnose::report: Continue refactoring into report components I'm unhappy with the current state of this, which is why I haven't settled on docs or unit tests for these changes yet (though note that the integration tests do cover these changes)---this is still a prototype refactoring. In particular, this needs to do more lowering---the `ResolvedSpan` and `MaybeResolvedSpan` need to be eliminated and lowered into exactly what is needed so that we can stop reasoning about them and propagating them. Further, having lines and columns lazily evaluate themselves for display---based on `MaybeResolvedSpan`---adds extra generics that shouldn't be necessary; they should be pre-computed and store the concrete data they need in variants. Display shouldn't involve computation beyond formatting of pre-computed data. That was always the plan, but this refactoring has been incremental. Anyway: this is in a working and integration-tested state, but it's going to change. DEV-12151	2022-04-27 10:48:41 -04:00
Mike Gerwitz	e2f9d71c1f	tamer: diagnose::report: Refined report components This generalizes the types a bit more and introduces unit tests. Note that these are still also covered by integration tests. The next step will be to finish generalizing `<VisualReporter as Reporter>::render`, after which I'll get back to the task of outputting the source line along with markings and labels. DEV-12151	2022-04-26 13:26:52 -04:00
Mike Gerwitz	d05bcaab03	tamer: {Resolved,Span}::{ctx=>context}: Rename This is just to provide clarity. `ctx` is not so widely used that we benefit from such a short identifier, and it's not worth the cognitive burden of people unfamiliar with what it may mean. DEV-12151	2022-04-26 10:52:32 -04:00
Mike Gerwitz	16d76b95d0	tamer: diagnose::resolver::ResolvedSpanData: New trait This provides the methods originally implemented on `ResolvedSpan` itself, which will allow for mocking for unit testing. DEV-12151	2022-04-26 10:46:47 -04:00
Mike Gerwitz	0928427116	tamer: diagnose::resolver::Column::At: Remove This is redundant with the `Endpoints` variant, although it did read better. It's just another case to have to handle. I was originally going to use `std::ops::RangeInclusive` for `Endpoints`, however that struct also contains an extra bool indicating whether it was exhausted (as an iterator), which isn't appropriate for this. DEV-12151	2022-04-26 10:30:07 -04:00
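
The last entry above (0928427116) turns on two points: a single column is just a span whose endpoints coincide, and `std::ops::RangeInclusive` carries iterator state beyond its two bounds. A minimal sketch of both follows. The `Endpoints` struct here is a hypothetical stand-in with assumed `u32` columns, not TAMER's actual `Column::Endpoints` variant, and the size comparison reflects current compiler layout rather than a language guarantee.

```rust
use std::mem::size_of;
use std::ops::RangeInclusive;

/// Hypothetical stand-in for an inclusive column span (not TAMER's type).
/// Columns are assumed to be `u32` purely for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Endpoints {
    start: u32,
    end: u32,
}

impl Endpoints {
    /// A single column is a span whose endpoints coincide,
    /// so a dedicated `At` case adds nothing that this cannot express.
    fn at(col: u32) -> Self {
        Self { start: col, end: col }
    }

    /// Number of columns covered, counting both endpoints.
    fn width(&self) -> u32 {
        self.end - self.start + 1
    }
}

fn main() {
    // `RangeInclusive` stores an exhaustion flag alongside its bounds,
    // so it is larger than a plain pair of endpoints.
    println!("Endpoints:           {} bytes", size_of::<Endpoints>());
    println!("RangeInclusive<u32>: {} bytes", size_of::<RangeInclusive<u32>>());

    println!("width of at(4): {}", Endpoints::at(4).width());
}
```

A plain pair of endpoints also stays `Copy`, which `RangeInclusive` is not.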


836 Commits (7cfe6a6f8db9fcb0e88dea145c2bea8df6b29c74)
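
The 2022-04-28 commits above describe rendering a source line behind a line-number gutter, with a row of marks under the span's columns and a label on the mark line. A rough sketch of that layout follows. The function name, its parameters, and the exact spacing are assumptions for illustration and are not taken from `diagnose::report`. Columns are treated as 1-indexed character positions on an ASCII line, and the mark character and a pre-normalized gutter width are passed in by the caller rather than derived from a `Level`.

```rust
/// Hypothetical helper: render one source line plus a mark line beneath it.
///
/// - `gutter` is the shared gutter width (e.g. the widest line number),
///   so that several sections line up.
/// - `start..=end` are 1-indexed, inclusive columns to underline.
fn render_marked_line(
    line_no: usize,
    gutter: usize,
    src: &str,
    start: usize,
    end: usize,
    mark: char,
    label: &str,
) -> String {
    // Padding that moves the marks under column `start`.
    let pad = " ".repeat(start.saturating_sub(1));
    // One mark per covered column, counting both endpoints.
    let marks = mark.to_string().repeat(end.saturating_sub(start) + 1);

    // First row: right-aligned line number in the gutter, then the source.
    // Second row: empty gutter of the same width, then marks and label.
    format!(
        " {:>w$} | {}\n {:>w$} | {}{} {}\n",
        line_no,
        src,
        "",
        pad,
        marks,
        label,
        w = gutter
    )
}

fn main() {
    // Both sections share a gutter wide enough for the larger line number.
    let gutter = 11326.to_string().len();
    print!(
        "{}",
        render_marked_line(16, gutter, "<preproc:symtable>", 2, 17, '-', "note: opened here")
    );
    print!(
        "{}",
        render_marked_line(11326, gutter, "</preproc:wrong>", 1, 16, '^', "error: expected close")
    );
}
```

Passing the gutter width in, rather than computing it per line, mirrors the idea in 5744e08984 of applying formatting at the section level instead of propagating width data to every line variant.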