employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Mike Gerwitz	6d39474127	tamer: NIR re-simplification Alright, this has been a rather tortured experience. The previous commit began to state what is going on. This is reversing a lot of prior work, with the benefit of hindsight. Little bit of history, for the people who will probably never read this, but who knows: As noted at the top of NIR, I've long wanted a very simple set of general primitives where all desugaring is done by the template system---TAME is a metalanguage after all. Therefore, I never intended on having any explicit desugaring operations. But I didn't have time to augment the template system to support parsing on attribute strings (nor am I sure if I want to do such a thing), so it became clear that interpolation would be a pass in the compiler. Which led me to the idea of a desugaring pass. That in turn spiraled into representing the status of whether NIR was desugared, and separating primitives, etc, which led to a lot of additional complexity. The idea was to have a Sugared and Plain NIR, and further within them have symbols that have latent types---if they require interpolation, then those types would be deferred until after template expansion. The obvious problem there is that now: 1. NIR has the complexity of various types; and 2. Types were tightly coupled with NIR and how it was defined in terms of XML destructuring. The first attempt at this didn't go well: it was clear that the symbol types would make mapping from Sugared to Plain NIR very complicated. Further, since NIR had any number of symbols per Sugared NIR token, interpolation was a pain in the ass. So that led to the idea of interpolating at the _attribute_ level. That seemed to be going well at first, until I realized that the token stream of the attribute parser does not match that of the element parser, and so that general solution fell apart. It wouldn't have been great anyway, since then interpolation was _also_ coupled to the destructuring of the document. 
Another goal of mine has been to decouple TAME from XML. Not because I want to move away from XML (if I did, I'd want S-expressions, not YAML, but I don't think the team would go for that). This decoupling would allow the use of a subset of the syntax of TAME in other places, like CSVMs and YAML test cases, for example, if appropriate. This approach makes sense: the grammar of TAME isn't XML, it's _embedded within_ XML. The XML layer has to be stripped to expose it. And so that's what NIR is now evolving into---the stripped, bare representation of TAME's language. That also has other benefits down the line, like a REPL where you can use any number of syntaxes. I intend for NIR to be stack-based, which I'd find to be intuitive for manipulating and querying packages, but it could have any number of grammars, including Prolog-like for expressing Horn clauses and querying with a Prolog/Datalog-like syntax. But that's for the future... The next issue is that of attribute types. If we have a better language for NIR, then the types can be associated with the NIR tokens, rather than having to associate each symbol with raw type data, which doesn't make a whole lot of sense. That also allows for AIR to better infer types and determine what they ought to be, and further makes checking types after template application natural, since it's not part of NIR at all. It also means the template system can naturally apply to any sources. Now, if we take that final step further, and make attributes streaming instead of aggregating, we're back to a streaming pipeline where all aggregation takes place on the ASG (which also resolves the memcpy concerns worked around previously, also further simplifying `ele_parse` again, though it sucks that I wasted that time). 
And, without the symbol types getting in the way, since now NIR has types more fundamentally associated with tokens, we're able to interpolate on a token stream using simple SPairs, like I always hoped (and reverted back to in the previous commit). Oh, and what about that desugaring pass? There's the issue of how to represent such a thing in the type system---ideally we'd know statically that desugaring always lowers into a more primitive NIR that reduces the mapping that needs to be done to AIR. But that adds complexity, as mentioned above. The alternative is to just use the template system, as I originally wanted to, and resolve shortcomings by augmenting the template system to be able to handle it. That not only keeps NIR and the compiler much simpler, but exposes more powerful tools to developers via TAME's metalanguage, if such a thing is appropriate. Anyway, this creates a system that's far more intuitive, and far simpler. It does kick the can to AIR, but that's okay, since it's also better positioned to deal with it. Everything I wrote above is a thought dump and has not been proof-read, so good luck! And let's hope this finally works out...it's actually feeling good this time. The journey was necessary to discover and justify what came out of it---everything I'm stripping away was like a cocoon, and within it is a more beautiful and more elegant TAME. DEV-13346	2022-12-01 11:09:25 -05:00
Mike Gerwitz	76beb117f9	Revert "tamer: nir::desugar::interp: Include attribute name in derived param name" Also: Revert "tamer: nir::desugar::interp: Token {SPair=>Attr}" This reverts commit 7fd60d6cdafaedc19642a3f10dfddfa7c7ae8f53. This reverts commit 12a008c66414c3d628097e503a98c80687e3c088. This has been quite a tortured experience, trying to figure out how to best fit desugaring into the existing system. The truth is that it ultimately failed because I was not sticking with my intuition---I was trying to get things out quickly by compromising on the design, and in the end, it saved me nothing. But I wouldn't say that it was a waste of time---the path was a dead end, but it was full of experiences. More to come, but interpolation is back to operating on NIR directly, and I chose to treat it as a source-to-source mapping and not represent it using the type system---interpolation can be an optional feature when writing TAME frontends (the principal one being the XML-based one), and it's up to later checks to assert that identifiers match a given domain. I am disappointed by the additional context we lose here, but that can always be introduced in the future differently, e.g. by maintaining a dictionary of additional context for spans that can be later referenced for diagnostic purposes. But let's worry about that in the future; it doesn't make sense to further complicate IRs for such a thing. DEV-13346	2022-12-01 11:09:25 -05:00
Mike Gerwitz	9da6cb439f	tamer: nir::desugar::interp: Include attribute name in derived param name This is simply to aid with debugging. See commit for information on why I didn't include the attribute name in the param name itself. DEV-13156	2022-12-01 11:09:25 -05:00
Mike Gerwitz	d0a728c27f	tamer: nir::desugar::interp: Token {SPair=>Attr} This changes the input token from a more generic `SPair` to `Attr`, which reflects the new target integration point in the `attr_parse!` parser-generator. This is a compromise---I'd like for it to remain generic and have stitching deal with all integration concerns, but I have spent far too much time on this and need to keep moving. With that said, we do benefit from knowing where this must fit in---it's easier to reason about in a more concrete way, and we can take advantage of the extra information rather than being burdened by its presence and ignoring it. We need to be able to convert back into `XirfToken` (see a recent commit that discusses that) for `StitchExpansion`, which is why `Attr` is here. And since it is, we can use it to explain to the user not just the interpolation specification used to derive params, but also the attribute it is associated with. This is what TAME (in XSLT) does today, IIRC (I wrote it, I just forget exactly). It also means that I can name the parameters after the attribute. So, that'll be in a following commit; I was disappointed when my prior approach with `SPair` didn't give me enough information to be able to do that, since I think it's important that the system be as descriptive as possible in how it derives information. Of course, traces would reveal how the parser came about the derivation, but that requires recompilation in a special tracing mode. DEV-13156	2022-12-01 11:09:25 -05:00
Mike Gerwitz	55c55cabd3	tamer: parse::util::expand: Move expansion into own module This has evolved into a more robust and independent concept, but it is still a utility in the sense that it's utilizing existing parsing framework features and making them more convenient. DEV-13156	2022-11-15 13:28:54 -05:00
Mike Gerwitz	4117efc50c	tamer: nir::desugar::interp: Generalize without NIR symbol types This is a shift in approach. My original idea was to try to keep NIR parsing the way it was, since it's already hard enough to reason about with the `ele_parse!` parser-generator macro mess. The idea was to produce an IR that would explicitly be denoted as "maybe sugared", and have a desugaring operation as part of the lowering pipeline that would perform interpolation and lower the symbol into a plain version. The problem with that is: 1. The use of the type was going to introduce a lot of mapping for all the NIR token variants there are going to be; and 2. _The types weren't even utilized for interpolation._ Instead, if we interpolated _as attributes are encountered_ while parsing NIR, then we'd be able to expand directly into that NIR token stream and handle _all_ symbols in a generic way, without any mapping beyond the definition of NIR's grammar using `ele_parse!`. This is a step in that direction---it removes `NirSymbolTy` and introduces a generic abstraction for the concept of expansion, which will be utilized soon by the attribute parser to allow replacing `TryFrom` with something akin to `ParseFrom`, or something like that, which is able to produce a token stream before finally yielding the value of the attribute (which will be either the original symbol or the replacement metavariable, in the case of interpolation). (Note that interpolation isn't yet finished---errors still need to be implemented. But I want a working vertical slice first.) DEV-13156	2022-11-10 12:33:30 -05:00
Mike Gerwitz	5c5041f90e	tamer: nir::desugar::interp: Proper span offsets The spans were previously not being calculated relative to the offset of the original symbol span. Tests were passing because all of those spans began at offset 0. DEV-13156	2022-11-08 00:55:45 -05:00
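The span-offset bug described above can be illustrated with a minimal sketch. The `Span` type and both methods here are hypothetical stand-ins, not TAMER's actual span API; the point is only that a derived span must add the parent symbol's offset, and that a version which forgets to do so is indistinguishable from the correct one whenever the parent begins at offset 0.

```rust
/// Hypothetical span (not TAMER's real type): a byte offset into the
/// source and a length in bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    offset: usize,
    len: usize,
}

#[allow(dead_code)]
impl Span {
    /// Derive a sub-span that begins `rel_offset` bytes into `self`.
    /// Returns `None` if the sub-span would extend past the parent.
    fn slice(self, rel_offset: usize, len: usize) -> Option<Span> {
        if rel_offset.checked_add(len)? > self.len {
            return None;
        }
        Some(Span {
            // Relative to the parent's position in the source.
            offset: self.offset + rel_offset,
            len,
        })
    }

    /// Deliberately incorrect variant that ignores the parent's offset,
    /// included only to show why tests at offset 0 cannot catch it.
    fn slice_ignoring_parent(self, rel_offset: usize, len: usize) -> Span {
        Span { offset: rel_offset, len }
    }
}
```

With a parent span at offset 0 both methods agree, so a test suite that only uses such spans passes either way; a parent at any non-zero offset exposes the difference.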
Mike Gerwitz	6b9979da9a	tamer: nir::desugar::interp: Valid parses This completes the valid parses, though some more refactoring will be done. Next up is error handling and recovery. DEV-13156	2022-11-07 23:59:47 -05:00
Mike Gerwitz	4a7fe887d5	tamer: nir::desugar: Initial interpolation desugaring This demonstrates how desugaring of interpolated strings will work, testing one of the happy paths. The remaining work to be done is largely refactoring; handling some other cases; and errors. Each of those items is marked with `todo!`s. I'm pleased with how this is turning out, and I'm excited to see diagnostic reporting within the specification string using the derived spans once I get a bit further along; this robust system is going to be much more helpful to developers than the existing system in XSLT. This also eliminates the ~50% performance degradation mentioned in a recent commit by eliminating the SugaredNirSymbol enum and replacing it with a newtype; this is a much better approach, though it doesn't change that I do need to eventually address the excessive `memcpy`s on hot code paths. DEV-13156	2022-11-07 14:15:16 -05:00
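As a rough sketch of what splitting an interpolation specification into parts with derived offsets might look like: the `{name}` delimiter syntax, the `Part` type, and `parse_spec` are all assumptions made for illustration and are not taken from TAMER, whose actual syntax and token types are not shown in this log.

```rust
/// One piece of a hypothetical interpolation specification, paired with
/// its byte offset within the specification string.
#[derive(Debug, PartialEq, Eq)]
enum Part<'a> {
    /// Literal text copied through as-is.
    Literal(&'a str, usize),
    /// A parameter reference found between `{` and `}` (assumed syntax).
    Param(&'a str, usize),
}

/// Split `spec` into literal and parameter parts.
/// On an unterminated `{`, returns the offset of that brace as the error.
#[allow(dead_code)]
fn parse_spec(spec: &str) -> Result<Vec<Part<'_>>, usize> {
    let mut parts = Vec::new();
    let mut pos = 0;

    while pos < spec.len() {
        match spec[pos..].find('{') {
            // No further parameters: the remainder is one literal.
            None => {
                parts.push(Part::Literal(&spec[pos..], pos));
                break;
            }
            Some(rel) => {
                let open = pos + rel;
                if open > pos {
                    parts.push(Part::Literal(&spec[pos..open], pos));
                }
                let close = match spec[open..].find('}') {
                    Some(r) => open + r,
                    None => return Err(open),
                };
                // The parameter name begins just after the opening brace.
                parts.push(Part::Param(&spec[open + 1..close], open + 1));
                pos = close + 1;
            }
        }
    }

    Ok(parts)
}
```

Because each part carries its offset within the specification, a diagnostic could in principle point at the exact parameter inside the string rather than at the attribute as a whole, which is the kind of reporting the commit looks forward to.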
Mike Gerwitz	9922910d09	tamer: nir::NirSymbolTy (Display): Add impl Add initial descriptions and consolidate some of the types. There'll be more to come; this is just to get `Display` derives working for types that'll be using it. I'd like to see where this description manifests itself before I decide how user-friendly I'd like it to be. DEV-13156	2022-11-01 16:23:51 -04:00
Mike Gerwitz	5e2d8f13a7	tamer: nir (SugaredNir): Mirror PlainNir This mirror is only a `Todo` variant at the moment, but my hope had been to try to creatively nest or use generics to simplify the conversion between the two flavors without a lot of boilerplate. But it doesn't seem like I'm going to be successful, and may have to resort to macros to remove boilerplate. But I need to stop fighting with myself and move on. Though I would still like to keep the types purely compile-time via const generics if possible, since they're not needed in memory (or disk) until we get to templates; they're otherwise static relative to a NIR token variant. DEV-13209	2022-11-01 15:22:42 -04:00
Mike Gerwitz	7f71f3f09f	tamer: nir: Detect interpolated values This simply detects whether a value will need to be further parsed for interpolation; it does not yet perform the parsing itself, which will happen during desugaring. This introduces a performance regression, for an interesting reason. I found that introducing a single new variant to `SugaredNir` (with a `(SymbolId, Span)` pair), was causing the width of the `NirParseState` type to increase just enough to cause Rust to be unable to optimize away a significant number of memcpys related to `Parser` moves, and consequently reducing performance by nearly 50% for `tamec`. Yikes. I suspected this would be a problem, and indeed have tried in all other cases to avoid aggregation until the ASG---the problem is that I had wanted to aggregate attributes for NIR so that the IR could actually make some progress toward simplifying the stream (and therefore working with the data), and be able to validate against a grammar defined in a single place. The problem is that the `NirParseState` type contains a sum type for every attribute parser, and is therefore as wide as the largest one. That is what Rust is having trouble optimizing memcpy away for. Indeed, reducing the number of attributes improves the situation drastically. However, it doesn't make it go away entirely. If you look at a callgrind profile for `tameld` (or a disassembly), you'll notice that I put quite a bit of effort into ensuring that the hot code path for the lowering pipeline contains _no_ memcpys for the parsers. But that is not the case with `tamec`---I had to move on. But I do still have the same escape hatch that I introduced for `tameld`, which is the mutable `Context`. It seems that may be the solution there too, but I want to get a bit further along first to see how these data end up propagating before I go through that somewhat significant effort. DEV-13156	2022-11-01 15:15:40 -04:00
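The observation that a sum type is as wide as its largest variant can be checked directly with `std::mem::size_of`. The types below are hypothetical stand-ins, not TAMER's `NirParseState` or its attribute parsers; this is only a sketch of the general Rust layout property the commit relies on.

```rust
use std::mem::size_of;

// Hypothetical attribute-parser states (not TAMER's actual types).
#[allow(dead_code)]
struct SmallAttrs {
    name: u32,
}

#[allow(dead_code)]
struct LargeAttrs {
    fields: [u64; 16],
}

// A sum type over every parser state. Any value of this type occupies
// enough space for its widest variant, so moving even a `Small` value
// moves at least that many bytes unless the copy is optimized away.
#[allow(dead_code)]
enum ParseState {
    Small(SmallAttrs),
    Large(LargeAttrs),
}

/// Width of the sum type in bytes.
#[allow(dead_code)]
fn parse_state_width() -> usize {
    size_of::<ParseState>()
}

/// Width of the widest variant payload in bytes.
#[allow(dead_code)]
fn largest_variant_width() -> usize {
    size_of::<LargeAttrs>().max(size_of::<SmallAttrs>())
}
```

Adding one more wide variant, or widening an existing one, grows every value of the enum, which is consistent with the commit's report that a single new variant changed how moves of the parser state were compiled.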
Mike Gerwitz	d195eedacb	tamer: nir: Sugared and plain flavors This introduces the concept of sugared NIR and provides the boilerplate for a desugaring pass. The earlier commits dealing with cleaning up the lowering pipeline were to support this work, in particular to ensure that reporting and recovery properly applied to this lowering operation without adding a ton more boilerplate. DEV-13158	2022-10-26 14:19:19 -04:00
Brandon Ellis	00f46b0032	[DEV-12990] Add gt, gte, lt, lte operators to if/unless This includes updating Tamer's parser to account for the new operator possibilities.	2022-09-22 11:38:06 -04:00
Mike Gerwitz	9966b82b9d	tamer: nir::parse: Grammar summary docs This is intended to provide just enough information to help elucidate how the system works and why. DEV-7145	2022-09-19 09:26:38 -04:00
Mike Gerwitz	419b24f251	tamer: Introduce NIR (accepting only) This introduces NIR, but only as an accepting grammar; it doesn't yet emit the NIR IR, beyond TODOs. This modifies `tamec` to, while copying XIR, also attempt to lower NIR to produce parser errors, if any. It does not yet fail compilation, as I just want to be cautious and observe that everything's working properly for a little while as people use it, before I potentially break builds. This is the culmination of months of supporting effort. The NIR grammar is derived from our existing TAME sources internally, which I use for now as a test case until I introduce test cases directly into TAMER later on (I'd do it now, if I hadn't spent so much time on this; I'll start introducing tests as I begin emitting NIR tokens). This is capable of fully parsing our largest system with >900 packages, as well as `core`. `tamec`'s lowering is a mess; that'll be cleaned up in future commits. The same can be said about `tameld`. NIR's grammar has some initial documentation, but this will improve over time as well. The generated docs still need some improvement, too, especially with generated identifiers; I just want to get this out here for testing. DEV-7145	2022-08-29 15:52:04 -04:00
