employer/tame - tame - Mike Gerwitz's Forge


Author	SHA1	Message	Date
Mike Gerwitz	bba181f573	tamer: xir::attr::Attr: Introduce AttrSpan This replaces a tuple with a tuple struct that allows for calculating more complete span information, such as the span encompassing the entire attribute and the value span including the surrounding quotes. This includes logic that ought to be abstracted into `Span` itself, and it's not as formal as I'd like it to be (e.g. not ensuring context), but this is a good starting point. Note that parsers call `Token::span`, which in turn calculates the attribute span, each time an attribute is encountered during lowering. But Rust does a good job at optimizing away unnecessary operations, so this didn't have an observable impact on time. DEV-7145	2022-06-06 11:31:28 -04:00
Mike Gerwitz	2b8e7e6031	tamer: xir::st::qname: New module This moves and deduplicates the static `QName`s into a common area. DEV-7145	2022-06-06 11:31:27 -04:00
Mike Gerwitz	3da82b351e	tamer: xir::flat::{State=>XirToXirf}: Rename Like the previous two commits, this states the intent of this parser, which results in more clear pipeline composition. DEV-7145	2022-06-02 13:48:54 -04:00
Mike Gerwitz	8d92667388	tamer: Integrate xir::reader as a parser in the lowering pipeline This allows `XmlXirReader` to be used in a `Lower` operation, just as everything else, bringing me one step closer to a pipeline that can be concisely represented; this is finally beginning to unify in a clear way, though it is still a bit of a mess. This causes `XmlXirReader` to _act_ like a `parse::Parser` in that it yields a `ParsedResult`, but it does not use `parse::Parser` itself; that was the _original_ plan: convert it into a `ParseState` where `XmlXirReader` became a context, and force `Parser` to yield by feeding it a stream of tokens with `repeat`, but that ended up performing poorly relative to this change. I did some investigation, which I might write about in the future, but for now, this solution works just fine. DEV-7145	2022-06-02 10:30:44 -04:00
Mike Gerwitz	b084e23497	tamer: Refactor asg_builder into obj::xmlo::lower and asg::air This finally uses `parse` all the way up to aggregation into the ASG, as can be seen by the mess in `poc`. This will be further simplified---I just need to get this committed so that I can mentally get it off my plate. I've been separating this commit into smaller commits, but there's a point where it's just not worth the effort anymore. I don't like making large changes such as this one. There is still work to do here. First, it's worth re-mentioning that `poc` means "proof-of-concept", and represents things that still need a proper home/abstraction. Secondly, `poc` is retrieving the context of two parsers---`LowerContext` and `Asg`. The latter is desirable, since it's the final aggregation point, but the former needs to be eliminated; in particular, packages need to be worked into the ASG so that `found` can be removed. Recursively loading `xmlo` files still happens in `poc`, but the compiler will need this as well. Once packages are on the ASG, along with their state, that responsibility can be generalized as well. That will then simplify lowering even further, to the point where hopefully everything has the same shape (once final aggregation has an abstraction), after which we can then create a final abstraction to concisely stitch everything together. Right now, Rust isn't able to infer `S` for `Lower<S, LS>`, which is unfortunate, but we'll be able to help it along with a more explicit abstraction. DEV-11864	2022-05-27 13:51:29 -04:00
Mike Gerwitz	eafb3b2a1b	tamer: Add Display impl for each ParseState for generic ParseErrors This is intended to describe, to the user, the state that the parser is in. This will be used to convey additional information for general parser errors, but it should also probably be integrated into parsers' individual errors as well when appropriate. This is something I expected to add at some point, but I wanted to add them because, when dealing with lowering errors, it can be difficult to tell what parser the error originated from. DEV-11864	2022-05-25 15:26:02 -04:00
Mike Gerwitz	1ad2fb1dc8	Copyright year update 2022 RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no "Group"), but I'm not sure if the legal name has been changed yet or not, so I'll wait on that.	2022-05-03 14:14:29 -04:00
Mike Gerwitz	a197267a2d	tamer: xir::flat: Remove closing tag name from label This is now visible in the diagnostic output. Example at this point in time, on an xmlo file for one of our smallest systems: error: expected closing tag for `preproc:symtable` --> /home/.../foo.xmlo:16:4 \| \| <preproc:symtable xmlns:map="http://www.w3.org/2005/xpath-functions/map"> \| ----------------- = note: element `preproc:symtable` is opened here --> /home/.../foo.xmlo:11326:4 \| \| </preproc:wrong> \| ^^^^^^^^^^^^^^^^ = error: expected `</preproc:symtable>` DEV-12151	2022-04-28 15:47:34 -04:00
Mike Gerwitz	eaa8133d21	tamer: diagnose: Introduction of diagnostic system This is a working concept that will continue to evolve. I wanted to start with some basic output before getting too carried away, since there's a lot of potential here. This is heavily influenced by Rust's helpful diagnostic messages, but will take some time to realize a lot of the things that Rust does. The next step will be to resolve line and column numbers, and then possibly include snippets and underline spans, placing the labels alongside them. I need to balance this work with everything else I have going on. This is a large commit, but it converts the existing Error Display impls into Diagnostic. This separation is a bit verbose, so I'll see how this ends up evolving. Diagnostics are tied to Error at the moment, but I imagine in the future that any object would be able to describe itself, error or not, which would be useful in the future both for the Summary Page and for query functionality, to help developers understand the systems they are writing using TAME. Output is integrated into tameld only in this commit; I'll add tamec next. Examples of what this outputs are available in the test cases in this commit. DEV-10935	2022-04-13 15:22:46 -04:00
Mike Gerwitz	c49510646b	tamer: parse::Parser (last_span): Replace Option with UNKNOWN_SPAN There's no use in complicating the error handling here when we'd just default to `UNKNOWN_SPAN` anyway when trying to render it. `UNKNOWN_SPAN` didn't exist at the time of writing. DEV-10935	2022-04-12 09:59:00 -04:00
Mike Gerwitz	cfc7f45bc4	tamer: Remove wip-xmlo-xir-reader This entirely removes the old XmloReader that has since been replaced with a XIR-based reader. I had been holding off on this because the new reader is slower, pending performance optimizations (which I'll do a little later on), however the performance loss is of no practical consideration and only affects the linker, which is still fast. Therefore, it's better to get this old code out of the way to simplify refactoring going forward. In particular, I'm working on the diagnostic system. This is a little sad, in a way---this is some of my first Rust code that I'm deleting. DEV-10935	2022-04-11 16:11:49 -04:00
Mike Gerwitz	a1a4ad3e8e	tamer: Introduce context into XirReader tamec and tameld will now both introduce a `Context` to XIR, which will use it to create spans. Here's an example of an error, now that it's all working well together: $ target/release/tameld --emit xmle -o /dev/null path/to/package.xmlo error: invalid preproc:sym/@dim `9` at [/../path/to/package.xmlo offset 1175451-1175452] A future task will make this human-readable by producing line and column numbers, and perhaps even a snippet (if not now, then eventually). It's exciting to see this coming together finally. DEV-10934	2022-04-08 16:16:23 -04:00
Mike Gerwitz	68223cb7d3	tamer: xir::reader: Additional quick-xml error spans There's a bit to unpack here. Some of the spans originate from quick-xml's error handling, but in coming up with test cases to try to trigger errors, I found that quick-xml is far too permissive in what it accepts, and outright dangerous in some situations. I feel like the writing is on the wall for quick-xml, but I'll probably wait until replacing `xmlo` with a more efficient format before deciding whether to use a different library or implement parsing ourselves. There are a lot of factors to consider, and a library would have to not only be correct and performant, but provide useful information for span generation. But for now, I have other more important things to work on, like a functioning compiler. So while quick-xml is around, I'll just have to do the best I can to provide a correct parser with useful errors. DEV-10934	2022-04-08 14:54:49 -04:00
Mike Gerwitz	ab181670b5	tamer: xir::reader: Initial introduction of spans This is a large change, and was a bit of a tedious one, given the comprehensive tests. This introduces proper offsets and lengths for spans, with the exception of some quick-xml errors that still need proper mapping. Further, this still uses `UNKNOWN_CONTEXT`, which will be resolved shortly. This also introduces `SpanlessError`, which `Error` explicitly _does not_ implement `From<SpanlessError>` for---this forces the caller to provide a span before the error is compatible with the return value, ensuring that spans will actually be available rather than forgotten for errors. This is important, given that errors are generally less tested than the happy path, and errors are when users need us the most (so, need span information). Further, I had to use pointer arithmetic in order to calculate many of the spans, because quick-xml does not provide enough information. There are no safety considerations here, and the comprehensive unit test will ensure correct behavior if the implementation changes in the future. I would like to introduce typed spans at some point---I made some opinionated choices when it comes to what the spans ought to represent. Specifically, whether to include the `<` or `>` with the open span (depends), whether to include quotes with attribute values (no), and some other details highlighted in the test cases. If we provide typed spans, then we could, knowing the type of span, calculate other spans on request, e.g. to include or omit quotes for attributes. Different such spans may be useful in different situations when presenting information to the user. This also highlights gaps in the tokens emitted by XIR, such as whitespace between attributes, the `=` between name and value, and so on. These are important when it comes to code formatting, so that we can reliably reconstruct the XML tree, but it's not important right now. I anticipate future changes would allow the XIR reader to be configured (perhaps via generics, like a strategy-type pattern) to optionally omit these tokens if desired. Anyway, more to come. DEV-10934	2022-04-08 13:59:37 -04:00
Mike Gerwitz	99aacaf7ca	tamer: tamec: Replace copy with XIR parsing/writing When wip-frontends is on, this will parse the input file using XIR and then immediately output it again. This makes the necessary changes to be able to read every source file we have in our largest project, such that the output is identical after having been formatted with `xmllint --format -` (there are differences because e.g. whitespace between attributes is not yet maintained). This is performant too, with times remaining essentially identical despite the additional work. DEV-10413	2022-04-07 12:13:49 -04:00
Mike Gerwitz	2e386f1baf	tamer: xir::reader::XmlXirReader::refill_buf: Clear read buffer This was done in the old reader many months ago, but I somehow forgot to do it here. The new reader was using substantially more memory. Here's how this change affects the memory profile for one of our systems (output from `ms_print`): Before: [ms_print graph; y-axis maximum 79.75 MB, x-axis 0 to 15.20 Gi] After: [ms_print graph; y-axis maximum 63.25 MB, x-axis 0 to 15.20 Gi] The After graph is virtually identical to the memory profile of the old reader, just with the exception that it's interning a bit more data than before, because we're reading more comprehensively. That's (potentially) the subject of future changes. DEV-12038	2022-04-06 11:50:07 -04:00
Mike Gerwitz	e77bdaf19a	tamer: parse: Introduce mutable Context This resolves the performance issues caused by Rust's failure to elide the ElementStack (ArrayVec) memcpys on move. Since XIRF is invoked tens of millions of times in some cases for larger systems, prior to this change, failure to optimize away moves for XIRF resulted in tens of millions of memcpys. This resulted in linking of one program going from 1s -> ~15s. This change reduces it to ~2.5s with the wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the subject of future changes). In particular, this change introduces a new mutable reference to `ParseState::parse_token`, which is a reference to a `Context` owned by the caller (e.g. `Parser`). In the case of XIRF, this means that `Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of `flat::State`; this allows the latter to remain pure and benefit from Rust's move optimizations, without sacrificing the otherwise-pure implementation. ParseStates that do not need a mutable context can use `NoContext` and remain pure. DEV-12024	2022-04-05 15:50:53 -04:00
Mike Gerwitz	4cb478a42d	tamer: parser::ParseState::delegate_lookahead: New concept This introduces a new method similar to the previous `delegate`, but with another closure that allows for handling lookahead tokens from the child parser. Admittedly, this isn't exactly what I was going for---a list of arguments isn't exactly self-documenting, especially with the brevity when the arguments line up---but this was easy to do and so I'll run with this for now. This also modified `delegate` to accept a context, even though it wasn't necessary, both for consistency with its lookahead counterpart and for brevity with the `into` argument (allowing, in our case, to just pass the name of the variant, rather than a closure). I'm not going to handle the actual starting and accepting state stitching abstraction for now; I'd like to observe future boilerplate more before I consider the best way to handle it, though I do have some ideas. DEV-10863	2022-03-29 14:46:43 -04:00
Mike Gerwitz	f402e51d04	tamer: parse: More flexible Transition API This does some cleanup and adds `parse::Object` for use in disambiguating `From` for `ParseStatus`, allowing the `Transition` API to be much more flexible in the data it accepts and automatically converts. This allows us to concisely provide raw output data to be wrapped, or provide `ParseStatus` directly when more convenient. There aren't yet examples in the docs; I'll do so once I make sure this API is actually utilized as intended. DEV-10863	2022-03-25 16:45:32 -04:00
Mike Gerwitz	c0fa89222e	tamer: obj::xmlo::ir::Dim: New enum This replaces u8 and will be used for the new XmloReader. Previously I wasn't sure what direction TAMER was going to go in with regards to dimensionality, but I do not expect that higher dimensions will be supported, and if they are, they'd very likely compile down to lower ones and create an illusion of higher-dimensionality. Whatever the future holds, it's not used today, and I'd rather these types be correct. ASG needs changing too, but one step at a time. DEV-10863	2022-03-25 14:28:18 -04:00
Mike Gerwitz	279ddc79d7	tamer: parse::TransitionResult: Alias=>newtype This converts the tuple type alias into a newtype, so that we may provide our own implementations. This differs from a previous approach that I took, which involved making this type `Result<(S, T), (S, E)>` so that the return values composed well with other functions. But the reality is that this is used only by other `ParseState`s and `Parser`, so it's unnecessary. However, this is also an attempt to utilize the new Try and FromResidual traits; note how the Try associated types match precisely what I was trying to do before, though they're used as intermediate types. I'll see how this evolves. DEV-10863	2022-03-25 12:28:50 -04:00
Mike Gerwitz	2e98a69d15	Revert "tamer: parse::TransitionResult: Move common Transition into Result" This reverts commit `bf5da75096`.	2022-03-25 09:17:25 -04:00
Mike Gerwitz	bf5da75096	tamer: parse::TransitionResult: Move common Transition into Result This allows the Results to compose and, importantly, is compatible with `?` without having to put in any extra effort. This puts the caller in an awkward spot, so I introduced a utility function `result_tup0_invert` for now; we'll see if that stays or evolves differently. DEV-10863	2022-03-24 23:48:30 -04:00
Mike Gerwitz	ad8616aaa1	tamer: xir::attr::Attr: Convert to tuple struct with public fields This makes more sense for pattern matching. Encapsulation of these fields is not necessary, given that it's passed around as an owned value and its `new` method constructs it verbatim; the individual fields are self-validating. DEV-10863	2022-03-23 16:41:28 -04:00
Mike Gerwitz	b4a7591357	tamer: obj::xmlo::reader: Begin conversion to ParseState This begins to transition XmloReader into a ParseState. Unlike previous changes where ParseStates were composed into a single ParseState, this is instead a lowering operation that will take the output of one Parser and provide it to another. The mess in ld::poc (...which still needs to be refactored and removed) shows the concept, which will be abstracted away. This won't actually get to the ASG in order to test that this works with the wip-xmlo-xir-reader flag on (development hasn't gotten that far yet), but since it type-checks, it should conceptually work. Wiring lowering operations together is something that I've been dreading for months, but my approach of only abstracting after-the-fact has helped to guide a sane approach for this. For some definition of "sane". It's also worth noting that AsgBuilder will too become a ParseState implemented as another lowering operation, so: XIR -> XIRF -> XMLO -> ASG These steps will all be streaming, with iteration happening only at the topmost level. For this reason, it's important that ASG not be responsible for doing that pull, and further we should propagate Parsed::Incomplete rather than filtering it out and looping an indeterminate number of times outside of the toplevel. One final note: the choice of 64 for the maximum depth is entirely arbitrary and should be more than generous; it'll be finalized at some point in the future once I actually evaluate what maximum depth is reasonable based on how the system is used, with some added growing room. DEV-10863	2022-03-22 14:06:52 -04:00
Mike Gerwitz	ceb00c4df5	tamer: xir: Complete parse type migration A previous commit moved the parser. This updates the types so that they can actually be utilized in that context. DEV-10863	2022-03-21 15:50:43 -04:00
Mike Gerwitz	14638a612f	tamer: {xir::=>}parse: Move parser out of XIR The parsing framework originally created for XIR is now more general and useful to other things. We'll see how this evolves. This needs additional documentation, but I'd like to see how it changes as I implement XmloReader and then some of the source readers first. DEV-10863	2022-03-18 16:24:53 -04:00
Mike Gerwitz	0360226caa	tamer: xir::parse: Generalize input token type This adds a `Token` type to `ParseState`. Everything uses `xir::Token` currently, but `XmloReader` will use `xir::flat::Object`. Now that this has been generalized beyond XIR, the parser ought to be hoisted up a level. DEV-10863	2022-03-18 15:26:05 -04:00
Mike Gerwitz	150b3b9aa4	tamer: xir::flat: Improve parser validation This does a couple of things: it ensures that documents have one and only one root node, and it properly handles dead transitions once parsing is complete (allowing it to be composed). This should make XIRF feature-complete for the time being. It does rely on the assumption that the reader is stripping out any trailing whitespace, so I guess we'll see if that's true as we proceed. DEV-10863	2022-03-17 23:22:38 -04:00
Mike Gerwitz	f04d845452	tamer: xir::flat::parse_token: Remove now-unapplicable comment Forgot to delete this in a previous commit. DEV-10863	2022-03-17 21:37:05 -04:00
Mike Gerwitz	aba89f809d	tamer: xir::parse: UnexpectedEof Span at final offset I'm not rendering errors yet in practice, so this wouldn't have been noticed, but we want error messages to reference the final byte in a file on EOF, not the offset of the last-encountered token, which would be confusing. This doesn't _directly_ pertain to what I'm working on; I just happened to notice it. DEV-10863	2022-03-17 21:33:05 -04:00
Mike Gerwitz	e18eb2a4ac	tamer: xir::flat::State::parse_node: Use TransitionResult This was simply missed in a previous commit. DEV-10863	2022-03-17 16:30:35 -04:00
Mike Gerwitz	6b8f0663ea	tamer: xir::{tree::=>}attr: Move With the introduction of XIRF, attribute parsing is no longer a XIRT thing. DEV-10863	2022-03-17 16:10:56 -04:00
Mike Gerwitz	7b6d68af85	tamer: xir::parse::Transition: Generalize flat::Transition XIRF introduced the concept of `Transition` to help document code and provide mental synchronization points that make it easier to reason about the system. I decided to hoist this into XIR's parser itself, and have `parse_token` accept an owned state and require a new state to be returned, utilizing `Transition`. Together with the convenience methods introduced on `Transition` itself, this produces much clearer code, as is evidenced by tree::Stack (XIRT's parser). Passing an owned state is something that I had wanted to do originally, but I thought it'd lead to more concise code to use a mutable reference. Unfortunately, that concision led to code that was much more difficult than necessary to understand, and ended up having a net negative benefit by leading to some more boilerplate for the nested types (granted, that could have been alleviated in other ways). This also opens up the possibility to do something that I wasn't able to before, which was continue to abstract away parser composition by stitching their state machines together. I don't know if this'll be done immediately, but because the actual parsing operations are now able to compose functionally without mutability getting in the way, the previous state coupling issues with the parent parser go away. DEV-10863	2022-03-17 16:02:05 -04:00
Mike Gerwitz	899fa79e59	tamer: xir::flat: Initial XIRF implementation This introduces XIR Flat (XIRF), which is conceptually between XIR and XIRT. This provides a more appropriate level of abstraction for further lowering operations to parse against, and removes the need for other parsers to perform their own validations (inappropriately) to ensure well-formed XML. There is still some cleanup worth doing, including moving some of the parsing responsibility up a level back into the XIR parser. DEV-10863	2022-03-17 13:08:16 -04:00
Mike Gerwitz	74ddc77adb	tamer: xir::escape::CachingEscaper: allow(dead_code) for feature-flagged code For now, until this feature flag is removed, so that we do not see warnings when the flag is off.	2022-03-10 10:03:07 -05:00
Mike Gerwitz	5af698d15c	tamer: xir::{tree::=>}parse: Move module It's a bit odd that I've done next to nothing with TAMER for the past week or so, and decided to do this one small thing before I go on break for the holidays, but I felt compelled to do _something_. Besides, this gets me in a better spot for the inevitable mental planning and writing I'll be doing over the holidays. This move was natural, given what this has evolved into---it has nothing to do with the concept of a "tree", and the module's imports emphasized that fact given the level of inappropriate nesting.	2021-12-23 13:17:18 -05:00
Mike Gerwitz	8221e3a011	tamer: xir::tree::Stack: Refactor transitions Now that the parser has been simplified by removing attributes, we can further simplify the state transitions to make it more clear what further refactoring can be done. DEV-11339	2021-12-17 11:40:30 -05:00
Mike Gerwitz	d5a2d43526	tamer: xir::tree::attr::parse::AttrParse{r=>}State Simply correcting a naming inconsistency between the trait and the concrete type. DEV-11339 / DEV-11268	2021-12-17 10:22:29 -05:00
Mike Gerwitz	0cc0bc9d5a	tamer: xir::Token::AttrEnd: Remove More information can be found in the prior commit message, but I'll summarize here. This token was introduced to create an LL(0) parser---no tokens of lookahead. This allowed the underlying TokenStream to be freely passed to the next system that needed it. Since then, Parser and ParseState were introduced, along with ParseStatus::Dead, which introduces the concept of lookahead for a single token---an LL(1) grammar. I had always suspected that this would happen, given the awkwardness of AttrEnd; it was just a matter of time before the right abstraction manifested itself to handle lookahead. DEV-11339	2021-12-17 10:14:31 -05:00
Mike Gerwitz	61f7a12975	tamer: xir::tree: Integrate AttrParserState into Stack Note that AttrParse{r=>}State needs renaming, and Stack will get a better name down the line too. This commit message is accurate, but confusing. This performs the long-awaited task of trying to observe, concretely, how to combine two automata. This has the effect of stitching together the state machines, such that the union of the two is equivalent to the original monolith. The next step will be to abstract this away. There are some important things to note here. First, this introduces a new "dead" state concept, where here a dead state is defined as an _accepting_ state that has no state transitions for the given input token. This is more strict than a dead state as defined in, for example, the Dragon Book, where backtracking may occur. The reason I chose for a Dead state to be accepting is simple: it represents a lookahead situation. It says, "I don't know what this token is, but I've done my job, so it may be useful in a parent context". The "I've done my job" part is only applicable in an accepting state. If the parser is _not_ in an accepting state, then an unknown token is simply an error; we should _not_ try to backtrack or anything of the sort, because we want only a single token of lookahead. The reason this was done is because it's otherwise difficult to compose the two parsers without requiring that AttrEnd exist in every XIR stream; this has always been an awkward delimiter that was introduced to make the parser LL(0), but I tried to compromise by saying that it was optional. Of course, I knew that decision caused awkward inconsistencies, I had just hoped that those inconsistencies wouldn't manifest in practical issues. Well, now it did, and the benefits of AttrEnd that we had in the previous construction do not exist in this one. Consequently, it makes more sense to simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future commit will remove it entirely. All of this information will be documented, but I want to get further in the implementation first to make sure I don't change course again and therefore waste my time on docs. DEV-11268	2021-12-16 09:44:02 -05:00
Mike Gerwitz	0c7f04e092	tamer: xir::tree: Simplify Stack and remove isolated attr remnants These were missed from a couple of commits ago, after I recalled that I could now simplify the Stack variants; they were made more complicated due to isolated attribute parsing. These progressive refactorings do a good job illustrating why composing parsers is better than a monolith---the complexity of the parsers is significantly reduced, and the number of combinations of states are also greatly reduced, which allows us to reason about them in isolation. DEV-11268	2021-12-14 12:49:06 -05:00
Mike Gerwitz	0061a13d63	tree: xir::tree::Object: Remove now-unneeded enum This was added only for isolated attribute parsing. Of course, this does mean that a new union type will be needed when combining the two parsers, depending on the desired resolution, but that'll come at a later time and possibly in a more general way. DEV-11268	2021-12-14 12:44:32 -05:00
Mike Gerwitz	c7f846752d	tamer: xir::tree: Remove now-unused isolated attribute parsing This is handled by the new AttrState, so this is largely just removing now-duplicate code. DEV-11268	2021-12-14 12:42:02 -05:00
Mike Gerwitz	69acba3ec0	tamer: xir::tree: Use parse::Parser for parse All tree module parsing functions now make use of parse::Parser. This module will eventually be hoisted from tree. DEV-11268	2021-12-14 12:36:35 -05:00
Mike Gerwitz	b30d7dc84e	tamer: xir::tree::parser_from: Use parse::Parser This nearly completely integrates the new Parser with xir::tree, but does not yet compose AttrParseState. I also need to determine what to do with `parse()` and, further, make `parser_from` generic as part of mod parse. If we take a moment to reflect on all of the changes, this struggle has been a roundabout way of converting tree's parser into parse::Parser; providing a trait for Stack (as ParseState); beginning parser decomposition; and moving some common logic into Parser. The composition of parsers is the final piece to be realized. This could have been a lot less work if I really understood exactly what I wanted to do up front, but as was mentioned in previous commits, I was really confusing myself trying to maintain API BC in ways that I should not have for XmloReader. More on that will be coming soon as well. DEV-11268	2021-12-13 16:57:04 -05:00
Mike Gerwitz	6e9d139373	tamer: xir::tree::parse::Parser: Remove lifetime This will allow Parser to operate on both owned and &mut values, and is the same approach that Rust's built-in iterators take. This is at first quite surprising, and I often forget that this is a feature, and, as a bonus, an attractive way to avoid lifetimes in struct definitions when generics are used for the type that may become a reference. DEV-11268	2021-12-13 16:51:15 -05:00
Mike Gerwitz	f09900b80c	tamer: xir::tree: Remove isolated AttrList parsing This isn't currently used by anything, and this is collecting, which does not fit well with the streaming model. AttrList was originally written for Element parsing, and the isolated attr parser was written for test cases, before it was fully decided how this system ought to work. Instead, if AttrList is in fact needed, we can either collect (ideally not) or implement Extend for AttrList. (Or create TryExtend.) DEV-11268	2021-12-13 16:20:50 -05:00
Mike Gerwitz	29fdf5428c	tamer: xir::tree: {Parse=>Stack}Error Prepare to adopt parse::ParseError, which will contain StackError. DEV-11268	2021-12-13 15:27:20 -05:00
Mike Gerwitz	faed32af7e	tamer: xir::tree::ParserState: Remove and expose Stack directly This removes the layer of encapsulation that was hiding Stack, which is the actual parser. The new layer of encapsulation is parse::Parser, which will be introduced here soon. Baby steps, so it's clear how this evolves. DEV-11268	2021-12-13 15:02:08 -05:00
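
Commit 61f7a12975 above defines a dead state as an _accepting_ state with no transition for the given token, which the parent then treats as a single token of lookahead. Commit 7b6d68af85 has `parse_token` take an owned state and return the next one. The following is only a minimal sketch of those two ideas together. `Token`, `Status`, `AttrState`, `ParseError`, `Element`, and `parse_element` are hypothetical names made up for illustration and are not TAMER's actual types or API.

```rust
/// Hypothetical token type for illustration (no spans, no interning).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Open(&'static str),
    AttrName(&'static str),
    AttrValue(&'static str),
    Close,
}

/// Result of feeding one token to the child (attribute) parser.
#[derive(Debug, PartialEq)]
pub enum Status {
    /// Token consumed; nothing to emit yet.
    Incomplete,
    /// Token consumed; an attribute (name, value) was completed.
    Attr(&'static str, &'static str),
    /// Accepting state with no transition for this token:
    /// the token is handed back to the parent as lookahead.
    Dead(Token),
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// Unknown token while *not* in an accepting state.
    AttrValueExpected(Token),
    UnexpectedToken(Token),
    UnexpectedEof,
}

#[derive(Debug, PartialEq)]
pub enum AttrState {
    /// Accepting: between attributes.
    Empty,
    /// Not accepting: a name is still waiting for its value.
    Name(&'static str),
}

impl AttrState {
    /// Takes the state by value and returns the next state with the result.
    pub fn parse_token(self, tok: Token) -> (AttrState, Result<Status, ParseError>) {
        match (self, tok) {
            (AttrState::Empty, Token::AttrName(name)) => {
                (AttrState::Name(name), Ok(Status::Incomplete))
            }
            (AttrState::Name(name), Token::AttrValue(value)) => {
                (AttrState::Empty, Ok(Status::Attr(name, value)))
            }
            // Accepting, but no transition: dead state, return the token.
            (AttrState::Empty, tok) => (AttrState::Empty, Ok(Status::Dead(tok))),
            // Not accepting: an unknown token is simply an error.
            (AttrState::Name(name), tok) => {
                (AttrState::Name(name), Err(ParseError::AttrValueExpected(tok)))
            }
        }
    }
}

#[derive(Debug, PartialEq)]
pub struct Element {
    pub name: &'static str,
    pub attrs: Vec<(&'static str, &'static str)>,
}

/// Parent parser: delegates to `AttrState` and handles its lookahead token.
pub fn parse_element(tokens: impl IntoIterator<Item = Token>) -> Result<Element, ParseError> {
    let mut toks = tokens.into_iter();
    let name = match toks.next() {
        Some(Token::Open(name)) => name,
        Some(tok) => return Err(ParseError::UnexpectedToken(tok)),
        None => return Err(ParseError::UnexpectedEof),
    };

    let mut attrs = Vec::new();
    let mut state = AttrState::Empty;
    for tok in toks {
        let (next, result) = state.parse_token(tok);
        state = next;
        match result? {
            Status::Incomplete => {}
            Status::Attr(n, v) => attrs.push((n, v)),
            // The child is done; the parent decides what the token means.
            Status::Dead(Token::Close) => return Ok(Element { name, attrs }),
            Status::Dead(tok) => return Err(ParseError::UnexpectedToken(tok)),
        }
    }
    Err(ParseError::UnexpectedEof)
}
```

In this sketch no `AttrEnd`-style delimiter is needed: the `Close` token itself ends attribute parsing, because the child returns it unconsumed instead of failing.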


73 Commits (bba181f573de49a08e6312731697e163f09fc20e)
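
Commit ab181670b5 in the log above notes that `Error` deliberately does not implement `From<SpanlessError>`, so a caller has to attach a span before the error fits the return type. A rough sketch of that pattern follows. Only the names `SpanlessError` and `Error` come from the commit message. The `Span` fields, the `InvalidDim` variants, `with_span`, `parse_dim`, `read_dim`, and the accepted values `0` to `2` are assumptions made for illustration and are not taken from TAMER.

```rust
use std::fmt;

/// Hypothetical span: a byte offset and length (no context).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub offset: usize,
    pub len: usize,
}

/// Error produced where no span is known yet.
#[derive(Debug, PartialEq)]
pub enum SpanlessError {
    InvalidDim(String),
}

/// Error type of the public API; every variant carries a span.
#[derive(Debug, PartialEq)]
pub enum Error {
    InvalidDim(String, Span),
}

impl SpanlessError {
    /// The only way (in this sketch) to turn a spanless error into `Error`.
    pub fn with_span(self, span: Span) -> Error {
        match self {
            SpanlessError::InvalidDim(value) => Error::InvalidDim(value, span),
        }
    }
}

// Deliberately no `impl From<SpanlessError> for Error`:
// `?` cannot silently convert a spanless error.

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::InvalidDim(value, span) => write!(
                f,
                "invalid dim `{}` at offset {}-{}",
                value,
                span.offset,
                span.offset + span.len
            ),
        }
    }
}

/// Low-level helper that has no access to span information.
/// The accepted values here are an assumption for the example.
fn parse_dim(value: &str) -> Result<u8, SpanlessError> {
    match value {
        "0" => Ok(0),
        "1" => Ok(1),
        "2" => Ok(2),
        _ => Err(SpanlessError::InvalidDim(value.to_string())),
    }
}

/// Caller that knows the span. Writing `parse_dim(value)?` here would not
/// compile, because there is no `From<SpanlessError>` for `Error`.
pub fn read_dim(value: &str, span: Span) -> Result<u8, Error> {
    let dim = parse_dim(value).map_err(|e| e.with_span(span))?;
    Ok(dim)
}
```

The design choice is that the compiler rejects a forgotten span, rather than a test having to catch it on a rarely exercised error path.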