employer/tame - tame - Mike Gerwitz's Forge


Author	SHA1	Message	Date
Mike Gerwitz	2fafc331a1	tamer: xir::reader: Opening and closing tag whitespace Non-attribute and non-empty start/end tags will have their whitespace as part of the produced span. This sets us up for a following change that will allow for deriving the name span from this span given a QName, which gives us a span that both represents the entire XIR token and allows deriving the element name. An accurate token span is necessary for parsing errors where an element was not expected, while an element name span is more appropriate for issues of grammar and semantic errors that deal not with the fact that an element was encountered, but _what_ element was encountered. DEV-7145	2022-06-22 15:10:49 -04:00
Mike Gerwitz	e5c8a218c3	tamer: xir::reader: Correct empty element whitespace handling This both adds clarifying tests and corrects the case of `<foo/>`, where the offset was erroneously off by one---it saw that there were no attributes and added a byte thinking it'd include `>`, as in `<foo>`. DEV-7145	2022-06-22 10:28:44 -04:00
Mike Gerwitz	adc45d90df	tamer: xir::parse: Attribute parser generator This is the first parser generator for the parsing framework. I've been waiting quite a while to do this because I wanted to be sure that I understood how I intended to write the attribute parsers manually. Now that I'm about to start parsing source XML files, it is necessary to have a parser generator. Typically one thinks of a parser generator as a separate program that generates code for some language, but that is not always the case---that represents a lack of expressiveness in the language itself (e.g. C). Here, I simply use Rust's macro system, which should be a concept familiar to someone coming from a language like Lisp. This also resolves where I stand on parser combinators with respect to this abstraction: they both accomplish the exact same thing (composition of smaller parsers), but this abstraction doesn't do so in the typical functional way. But the end result is the same. The parser generated by this abstraction will be optimized and inlined in the same manner as the hand-written parsers. Since they'll be tightly coupled with an element parser (which too will have a parser generator), I expect that most attribute parsers will simply be inlined; they exist as separate parsers conceptually, for the same reason that you'd use parser combinators. It's worth mentioning that this awkward reliance on dead state for a lookahead token to determine when aggregation is complete rubs me the wrong way, but resolving it would involve reintroducing the XIR AttrEnd that I had previously removed. I'll keep fighting with myself on this, but I want to get a bit further before I determine if it's worth the tradeoff of reintroducing (more complex IR but simplified parsing). DEV-7145	2022-06-21 13:23:02 -04:00
Mike Gerwitz	9598532d8b	tamer: xir::st: Add missing docs for generated QName constants This was missed. It was not possible, using the documentation alone (without looking at the linked source) to tell what the QName actually represented, though you could assume by the name. DEV-7145	2022-06-21 13:23:01 -04:00
Mike Gerwitz	3f23bc5e33	tamer: fmt: New type-based formatting system This is partly an experiment, but is designed to simplify producing English sentences in various contexts. It makes use of a not only unstable, but incomplete, Rust feature---adt_const_params, for a static str const type parameter. Hopefully that ends up being stabilized. This uses types, but it's the same as function composition due to Rust's monomorphization. DEV-7145	2022-06-10 16:28:15 -04:00
Mike Gerwitz	f7752436da	tamer: parse::Parser: Add remaining field docs DEV-7145	2022-06-07 15:23:20 -04:00
Mike Gerwitz	3c227e5a2d	tamer: parse::ParseState: Remove Default trait bound `ParseState` originally required `Default` for use with `mem::take` in `Parser::feed_tok`. This unfortunately cannot last, since more specialized parsers require context during initialization in order to provide useful diagnostic information. (The other option is to require the caller to augment errors with diagnostic information, but that would have to be duplicated by every caller and complicates parser composition; I'd prefer those diagnostic details remain encapsulated.) Replacing `Default` with `Option` is uglier, but it ends up producing the same assembly as `mem::take` did, at least at the time of writing. Because Rust is able to elide unnecessary moves using this implementation, there is no need for `unwrap_unchecked` or other unsafe methods, which is great, since it shows that this parsing methodology is viable entirely in safe Rust. DEV-7145	2022-06-07 15:08:40 -04:00
Mike Gerwitz	f14ffc87c2	tamer: parse::state::ParseState::DeadToken: New associated type Previously, `ParseStatus::Dead` always yielded `ParseState::Token`. However, I'm working on introducing parsers that aggregate (parsing XML attributes into structs), and those parsers do not know that they have completed aggregation until they reach a dead state; given that, I need to yield additional information at that time. I played around with a number of alternative ideas, but this ended up being the cleanest, relative to the effort involved. For example, introducing another parameter to `ParseStatus::Dead` was too burdensome on APIs that ought not concern themselves with the possibility of receiving an object in addition to a lookahead token, since many parsers are not capable of doing so (given that they map M:(N<=M)). Another option that I abandoned fairly quickly was having `is_accepting` (potentially renamed) return an aggregate object, since that's on the side and didn't feel like it was part of the parsing pipeline. The intent is to abstract this some in a new `ParseState` method for delegation + aggregation. DEV-7145	2022-06-07 09:37:41 -04:00
Mike Gerwitz	495c1438fd	tamer: Consistent span diagram representation I'll document it more formally eventually, but this settles on a mix of the two: square brackets and dashes for intervals, `+` for intersecting lines, byte offsets below interval endpoints, and names below that. The docblock for `Span` itself is still off; I'll probably just take one of the test cases and paste it there at some point. DEV-7145	2022-06-06 11:32:35 -04:00
Mike Gerwitz	bba181f573	tamer: xir::attr::Attr: Introduce AttrSpan This replaces a tuple with a tuple struct that allows for calculating more complete span information, such as the span encompassing the entire attribute and the value span including the surrounding quotes. This includes logic that ought to be abstracted into `Span` itself, and it's not as formal as I'd like it to be (e.g. not ensuring context), but this is a good starting point. Note that parsers call `Token::span`, which in turn calculates the attribute span, each time an attribute is encountered during lowering. But Rust does a good job at optimizing away unnecessary operations, so this didn't have an observable impact on time. DEV-7145	2022-06-06 11:31:28 -04:00
Mike Gerwitz	2b8e7e6031	tamer: xir::st::qname: New module This moves and deduplicates the static `QName`s into a common area. DEV-7145	2022-06-06 11:31:27 -04:00
Mike Gerwitz	3da82b351e	tamer: xir::flat::{State=>XirToXirf}: Rename Like the previous two commits, this states the intent of this parser, which results in more clear pipeline composition. DEV-7145	2022-06-02 13:48:54 -04:00
Mike Gerwitz	91b55999e2	tamer: asg::air::{AirState=>AirAggregate}: Rename Like the previous commit, this emphasizes what is happening. DEV-7145	2022-06-02 13:26:46 -04:00
Mike Gerwitz	45bbf3879e	tamer: obj::xmlo::{lower=>air}: Rename {LowerState=>XmloToAir} This provides much more clarity as to what is going on. Further, it's less ambiguous, since I'm about to introduce a new type of xmlo lowering into XIR for writing the actual xmlo files. DEV-7145	2022-06-02 13:23:41 -04:00
Mike Gerwitz	8d92667388	tamer: Integrate xir::reader as a parser in the lowering pipeline This allows `XmlXirReader` to be used in a `Lower` operation, just as everything else, bringing me one step closer to a pipeline that can be concisely represented; this is finally beginning to unify in a clear way, though it is still a bit of a mess. This causes `XmlXirReader` to _act_ like a `parse::Parser` in that it yields a `ParsedResult`, but it does not use `parse::Parser` itself; that was the _original_ plan: convert it into a `ParseState` where `XmlXirReader` became a context, and force `Parser` to yield by feeding it a stream of tokens with `repeat`, but that ended up performing poorly relative to this change. I did some investigation, which I might write about in the future, but for now, this solution works just fine. DEV-7145	2022-06-02 10:30:44 -04:00
Mike Gerwitz	f8c28655dc	tamer: parse: Split into multiple modules This abstraction has grown quite a bit, and it's time to start formalizing it a bit. This split doesn't change any behavior, but it does start to make it easier to reason about by clearly stating the broad components and how they interact with one another. This doesn't yet move the tests; those will come next, but they are very few. The reason I gave previously was that (a) they're tested indirectly via the systems that utilize them and (b) since the abstraction was not yet settled, the process was already very expensive. No test coverage was lost---it's only that failures were potentially harder to debug, but in practice not even this was true, because the deeply expressive types all but ensured that, if it compiles, it will function in a way that is expected. Unit tests and documentation for this system will be added once I'm sure that this abstraction is in a proper state. DEV-7145	2022-06-01 11:32:58 -04:00
Mike Gerwitz	63aa452197	tamer: parse: Move parse::lower into Lower This also modifies `poc` such that `Lower` is invoked as an associated function rather than a method to emphasize the pattern that is forming, so that it can be later abstracted away. DEV-11864	2022-06-01 11:15:43 -04:00
Mike Gerwitz	f40f8bbafc	tamer: parse: Rename {lower__while_ok=>lower_} The `while_ok` can just be implied with a lowering operation, and that reduces the name complexity so that we can maybe introduce even more specialized methods without resulting in a huge sentence as a name. DEV-11864	2022-05-27 14:10:55 -04:00
Mike Gerwitz	b084e23497	tamer: Refactor asg_builder into obj::xmlo::lower and asg::air This finally uses `parse` all the way up to aggregation into the ASG, as can be seen by the mess in `poc`. This will be further simplified---I just need to get this committed so that I can mentally get it off my plate. I've been separating this commit into smaller commits, but there's a point where it's just not worth the effort anymore. I don't like making large changes such as this one. There is still work to do here. First, it's worth re-mentioning that `poc` means "proof-of-concept", and represents things that still need a proper home/abstraction. Secondly, `poc` is retrieving the context of two parsers---`LowerContext` and `Asg`. The latter is desirable, since it's the final aggregation point, but the former needs to be eliminated; in particular, packages need to be worked into the ASG so that `found` can be removed. Recursively loading `xmlo` files still happens in `poc`, but the compiler will need this as well. Once packages are on the ASG, along with their state, that responsibility can be generalized as well. That will then simplify lowering even further, to the point where hopefully everything has the same shape (once final aggregation has an abstraction), after which we can then create a final abstraction to concisely stitch everything together. Right now, Rust isn't able to infer `S` for `Lower<S, LS>`, which is unfortunate, but we'll be able to help it along with a more explicit abstraction. DEV-11864	2022-05-27 13:51:29 -04:00
Mike Gerwitz	95229916ca	current/compiler/worksheet: Generate lv:package/@name This is present on all other packages. Rather than complicating TAMER to accommodate a missing name, it's trivial to just add it. This will, unfortunately, invalidate and require rebuilding of all xmlo files, based on the `.rev-xmlo` bump. DEV-11864	2022-05-26 10:20:05 -04:00
Mike Gerwitz	eafb3b2a1b	tamer: Add Display impl for each ParseState for generic ParseErrors This is intended to describe, to the user, the state that the parser is in. This will be used to convey additional information for general parser errors, but it should also probably be integrated into parsers' individual errors as well when appropriate. This is something I expected to add at some point, but I wanted to add them now because, when dealing with lowering errors, it can be difficult to tell what parser the error originated from. DEV-11864	2022-05-25 15:26:02 -04:00
Mike Gerwitz	9edc32dd3b	tamer: parse::LowerIter: Generic inner TripIter iterator This commit is preparing to compose LowerIter directly. DEV-11864	2022-05-24 10:27:14 -04:00
Mike Gerwitz	f218c452b9	tamer: iter::trip: Flatten Result The `*_iter_while_ok` functions now compose like monads, flattening `Result` at each step and drastically simplifying handling of error types. This also removes the bunch of `?`s at the end of the expression, and allows me to use `?` within the callback itself. I had originally not used `Result` as the return type of the callback because I was not entirely sure how I was going to use them, but it's now clear that I _always_ use `Result` as the return type, and so there's no use in trying to be too accommodating; it can always change in the future. This is desirable not just for cleanup, but because trying to refactor `asg_builder` into a pair of `Parser`s is really messy to chain without flattening, especially given some state that has to leak temporarily to the caller. More on that in a future commit. DEV-11864	2022-05-20 16:08:16 -04:00
Mike Gerwitz	958a707e02	tamer: asg: Hoist Root from Ident into Object This was always the intent, but I didn't have a higher-level object yet. This removes all the awkwardness that existed with working the root in as an identifier. DEV-11864	2022-05-19 12:48:43 -04:00
Mike Gerwitz	6252758730	tamer: asg::Object: Introduce Object::Ident This wraps `Ident` in a new `Object` variant and modifies `Asg` so that its nodes are of type `Object`. This unfortunately requires runtime type checking. Whether or not that's worth alleviating in the future depends on a lot of different things, since it'll require my own graph implementation, and I have to focus on other things right now. Maybe it'll be worth it in the future. Note that this also gets rid of some doc examples that simply aren't worth maintaining as the API evolves. DEV-11864	2022-05-19 12:33:59 -04:00
Mike Gerwitz	f75f1b605e	tamer: num: Header typo correction	2022-05-19 12:02:38 -04:00
Mike Gerwitz	ebf1de5a60	tamer: asg::Ident{Object=>}: Rename I think this may have been renamed _from_ `Ident` some time ago, but I'm too lazy to check. In any case, the name is redundant. DEV-11864	2022-05-19 11:17:04 -04:00
Mike Gerwitz	7d76cb53f6	tamer: asg: Move SymAttrs conversion into asg_builder This is a lowering operation and does not belong here. What a tangled mess this all was (see recent commits); no wonder it was so confusing. DEV-11864	2022-05-19 11:07:15 -04:00
Mike Gerwitz	eae194abc6	tamer: asg::object: Merge into asg::ident Everything in this file relates to identifiers, and I'm about to introduce a higher-level object, one of which may be an identifier. DEV-11864	2022-05-19 11:05:20 -04:00
Mike Gerwitz	92dba0a28c	tamer: obj::xmlo::asg_builder::IdentKindError: Merge into AsgBuilderError Now that these are in the same module, there's no need for them to be separate from one-another. DEV-11864	2022-05-19 10:56:07 -04:00
Mike Gerwitz	07d2ec1ffb	tamer: Move Dim and {Sym=>}Dtype into num module A previous commit mentioned that there's not a place for `Dim`, and duplicated it between `asg` and `xmlo`. Well, `Dtype` is also needed in both, and so here's a home for now. `Dtype` has always been an inappropriate detail for the system and will one day be removed entirely in favor of higher-level types; the machine representation is up to the compiler to decide. DEV-11864	2022-05-19 10:39:21 -04:00
Mike Gerwitz	b2a79e930b	tamer: Move SymAttrs lowering into asg_builder asg_builder is about to be replaced, but in the process of simplifying the destination IR (the ASG), I'm moving things into the proper place. This never belonged here---it belongs with the actual lowering operation. Previously, this was not reasoned about in terms of a lowering operation, and was written when I was first introducing myself to Rust and trying to get a proof-of-concept linker working. DEV-11864	2022-05-19 10:28:17 -04:00
Mike Gerwitz	8948452b71	tamer: asg::ident::Dim: Narrow type This matches xmlo::Dim, and could be the same thing, if we can find a home for it in the future; it's not worth creating such a home right now when I'm not yet sure what else ought to live there; the duplication may be fine. The conversion from xmlo needs to be moved, and `Dim` is going to be used for more than just identifiers (expressions will have type inference performed). DEV-11864	2022-05-19 09:32:43 -04:00
Mike Gerwitz	263cb68380	tamer: parse: Persistent context This allows retrieving and providing a context to a `Parser`. This is intended for use with an aggregating parser, in particular to construct the ASG and return it. This is a component of a change that replaces `asg_builder` with a `Parser`-based lowering into the ASG, but there are still changes that need to be made to simplify things and complete its integration. DEV-11864	2022-05-18 16:15:09 -04:00
Mike Gerwitz	001499d921	tamer: parse::ParseError: Remove Eq trait bound Just as in other commits, since it's an unnecessary limitation. DEV-11864	2022-05-18 16:06:22 -04:00
Mike Gerwitz	3e277270a7	tamer: asg: Track roots on graph Previously, since the graph contained only identifiers, discovered roots were stored in a separate vector and exposed to the caller. This not only leaked details, but added complexity; this was left over from the refactoring of the proof-of-concept linker some time ago. This moves the root management into the ASG itself, mostly, with one item being left over for now in the asg_builder (eligibility classifications). There are two roots that were added automatically: `__yield` and `__worksheet`. The former has been removed and is now expected to be explicitly mapped in the return map, which is now enforced with an extern in `core/base`. This is still special, in the sense that it is explicitly referenced by the generated code, but there's nothing inherently special about it and I'll continue to generalize it into oblivion in the future, such that the final yield is just a convention. `__worksheet` is the only symbol of type `IdentKind::Worksheet`, and so that was generalized just as the meta and map entries were. The goal in the future will be to have this more under the control of the source language, and to consolidate individual roots under packages, so that the _actual_ roots are few. As far as the actual ASG goes: this introduces a single root node that is used as the sole reference for reachability analysis and topological sorting. The edges of that root node replace the vector that was removed. DEV-11864	2022-05-17 10:42:05 -04:00
Mike Gerwitz	5a866f7735	core/base (___yield): New extern Rather than having the linker add this symbol opaquely, let's remove the special case and generalize it. There's nothing special about yield, except historical precedent. Systems can explicitly add it as a root in a common return map. DEV-11864	2022-05-16 15:07:37 -04:00
Mike Gerwitz	34eb994a0d	tamer: asg::Asg::set_fragment: {ObjectRef=>SymbolId} In the actual implementation (outside of tests), this is always looking up before adding the symbol. This will simplify the API, while still retaining errors, since the identifier will fail the state transition if the identifier did not exist before attempting to set a fragment. So while this is slower in microbenchmarks, this has no effect on real-world performance. Further, I'm refactoring toward a streaming ASG aggregation, which is a lot easier if we do not need to perform lookups in a separate step from the ASG's primitives. DEV-11864	2022-05-16 13:14:27 -04:00
Mike Gerwitz	c49d87976d	tamer: parse::Token: Remove Eq trait bound `PartialEq` remains, and is all that is needed. See previous commit regarding the removal of this same bound from `Context`. This can be re-added if it ends up actually being necessary. But Tokens are ephemeral and used only in lowering pipelines, using pattern matching. DEV-11864	2022-05-16 10:05:14 -04:00
Mike Gerwitz	d87006391e	tamer: asg::object: Remove IdentObjectState, IdentObjectData These traits are no longer necessary now that I'm using concrete types; they just add unnecessary noise and confusion as I attempt to further refactor. Don't abstract prematurely. DEV-11864	2022-05-12 16:31:36 -04:00
Mike Gerwitz	3748762d31	tamer: asg::graph::Asg: Remove type parameter O This removes the generic on the Asg (which was formerly BaseAsg), hard-coding `IdentObject`, which will further evolve. This makes the IR an actual concrete IR rather than an abstract data structure. These tests bring me back a bit, since they were written as I was still becoming familiar with Rust. DEV-11864	2022-05-12 15:46:17 -04:00
Mike Gerwitz	1114edbc6e	rater/tame: Remove circular symlink This was added long ago to maintain BC in some bizarre situation, and I had forgotten about it, but it's causing problems with lsp-mode in Emacs.	2022-05-12 14:32:24 -04:00
Mike Gerwitz	f2c5443176	tamer: asg: Remove generic Asg, rename {Base=>}Asg This is the beginning of an incremental refactoring to remove generics, to simplify the ASG. When I initially wrote the linker, I wasn't sure what direction I was going in, but I was also negatively influenced by more traditional approaches to both design and unit testing. If we're going to call the ASG an IR, then it needs to be one---if the core of the IR is generic, then it's more like an abstract data structure than anything. We can abstract around the IR to slice it up into components that are a little easier to reason about and understand how responsibilities are segregated. DEV-11864	2022-05-11 16:47:13 -04:00
Mike Gerwitz	0493e68cb3	tamer: parse::ParseState::Context: Add missing comment DEV-11864	2022-05-10 11:06:22 -04:00
Mike Gerwitz	0ef0d2b553	tamer: parse::ParseState::Error: Relax Eq trait bound This is unnecessarily restrictive, since we do not require anything further than `PartialEq` for the situations where we care about equality (tests). DEV-11864	2022-05-06 15:28:47 -04:00
Mike Gerwitz	9f990e19e9	tamer: parse::ParseState::Context: Remove Default trait bound This is too restrictive, especially for parsers that fold into something, like the ASG, which may exist prior to invoking the parser. This moves the trait bound to the functions that actually need it. Those obviously cannot be used if the Context does not implement `Default`, but I'll provide alternative conveniences. DEV-11864	2022-05-05 15:55:04 -04:00
Mike Gerwitz	ba9f429ee7	tamer: obj::xmlo::{XmloEvent=>XmloToken} The original "event" name was based on quick-xml's `Event`. This terminology shift is more closely matched with the new parsing system. DEV-11864	2022-05-05 12:25:59 -04:00
Mike Gerwitz	0d999b56cd	src/current/summary.xsl: Correct invalid UTF-8 sequence This broke when encoding was set to UTF-8 on this file.	2022-05-04 11:11:02 -04:00
Mike Gerwitz	2954c591a1	src/current/include/preproc/symtable: Remove extern @dtype check I attempted to resolve an error previously, and I thought I had, but apparently some symbols acquire a @dtype at some point in the process, or lose it. Regardless, I have no interest in debugging or resolving this mess, since it's going away. The linker ensures that externs match, so while this could potentially allow conflicting imports within a package (unlikely, given that extern templates are recommended), it still will not resolve with a conflicting concrete implementation. I'm not worried. DEV-1036	2022-05-04 10:50:14 -04:00
Mike Gerwitz	0281dfdf0d	tamer: Remove wip-frontends feature flag We want the new system to be used so that we can start catching any problems that may arise. Further changes will be flagged as necessary. DEV-10936	2022-05-04 09:37:10 -04:00
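Commits 2fafc331a1 and e5c8a218c3 describe a tag span that covers the whole XIR token (including trailing whitespace) from which the element-name span can later be derived given the QName. A minimal sketch of that arithmetic, assuming a simplified `Span` with only a byte offset and length; TAMER's real `Span` also carries a context, and `name_span` is a hypothetical helper, not its API:

```rust
/// Hypothetical byte span (offset + length only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    offset: usize,
    len: usize,
}

/// Derives the element-name span from a whole-tag span, given the QName.
/// Opening and empty tags start with `<`, closing tags with `</`; any
/// whitespace before `>` belongs to the tag span but not to the name.
fn name_span(tag: Span, qname: &str, closing: bool) -> Span {
    let prefix_len = if closing { 2 } else { 1 };
    Span {
        offset: tag.offset + prefix_len,
        len: qname.len(),
    }
}
```

Because the name span is computed from the QName's length rather than the token's length, trailing whitespace such as in `<foo  >` never leaks into it, while the full tag span remains available for "unexpected element" errors.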
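Commit 3c227e5a2d replaces the `Default` bound (needed for `mem::take`) by holding the parser state in an `Option`. The pattern looks roughly like the sketch below; `State`, `Parser`, `transition`, and `feed_tok`'s signature here are hypothetical stand-ins, not TAMER's `ParseState` API:

```rust
/// Hypothetical parser states; note there is no `Default` impl.
#[derive(Debug, PartialEq)]
enum State {
    Start,
    InElement(String),
    Done,
}

impl State {
    /// Consumes the state by value and returns the next state together with
    /// the outcome; on error, the unchanged state is handed back.
    fn transition(self, tok: &str) -> (State, Result<(), String>) {
        match (self, tok) {
            (State::Start, name) => (State::InElement(name.to_string()), Ok(())),
            (State::InElement(_), "/") => (State::Done, Ok(())),
            (st @ State::InElement(_), other) => (st, Err(format!("unexpected `{other}`"))),
            (State::Done, other) => (State::Done, Err(format!("`{other}` after end"))),
        }
    }
}

struct Parser {
    /// `Option` lets the state be moved out without a `Default` placeholder.
    state: Option<State>,
}

impl Parser {
    fn new(initial: State) -> Self {
        Parser { state: Some(initial) }
    }

    fn feed_tok(&mut self, tok: &str) -> Result<(), String> {
        // Move the state out by value; it is always `Some` between calls.
        let state = self.state.take().expect("state is restored after every transition");
        let (next, result) = state.transition(tok);
        self.state = Some(next);
        result
    }

    fn state(&self) -> &State {
        self.state.as_ref().expect("state is restored after every transition")
    }
}
```

`None` exists only for the duration of a single transition, so states that need context at construction (and therefore cannot implement `Default`) work unchanged, and nothing here requires `unsafe`.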
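Commit f14ffc87c2 introduces a `DeadToken` associated type so that an aggregating parser, which only learns that aggregation is complete upon reaching a dead state, can yield its aggregate alongside the lookahead token. A simplified sketch of that idea; `Tok`, `ParseStatus`, `AttrAggregate`, and `run_until_dead` are hypothetical and much smaller than TAMER's actual types:

```rust
/// Hypothetical XIR-like tokens.
#[derive(Debug, PartialEq)]
enum Tok {
    Attr(String, String),
    Open(String),
}

/// Simplified status; a real framework has more variants.
enum ParseStatus<D> {
    /// Token consumed; more input may follow.
    Incomplete,
    /// The parser cannot accept this token and yields `D`.
    Dead(D),
}

trait ParseState {
    type Token;
    /// What a dead state yields. A non-aggregating parser would set this to
    /// `Self::Token` (just the lookahead); an aggregating parser can also
    /// attach its finished aggregate.
    type DeadToken;

    fn parse_token(&mut self, tok: Self::Token) -> ParseStatus<Self::DeadToken>;
}

/// Aggregates consecutive attributes; only the first non-attribute token
/// reveals that aggregation is complete.
#[derive(Default)]
struct AttrAggregate {
    attrs: Vec<(String, String)>,
}

impl ParseState for AttrAggregate {
    type Token = Tok;
    type DeadToken = (Vec<(String, String)>, Tok);

    fn parse_token(&mut self, tok: Tok) -> ParseStatus<Self::DeadToken> {
        match tok {
            Tok::Attr(name, value) => {
                self.attrs.push((name, value));
                ParseStatus::Incomplete
            }
            // Dead state: hand back the aggregate together with the lookahead.
            lookahead => ParseStatus::Dead((std::mem::take(&mut self.attrs), lookahead)),
        }
    }
}

/// Feeds tokens until the parser dies, returning what the dead state yielded.
fn run_until_dead<S: ParseState>(
    state: &mut S,
    toks: impl IntoIterator<Item = S::Token>,
) -> Option<S::DeadToken> {
    for tok in toks {
        if let ParseStatus::Dead(dead) = state.parse_token(tok) {
            return Some(dead);
        }
    }
    None
}
```

Only parsers that choose a richer `DeadToken` pay for it; callers of simple parsers still see just the lookahead token, which matches the commit's reason for not adding another parameter to `ParseStatus::Dead`.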
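Commit f218c452b9 makes the `*_iter_while_ok` functions flatten `Result`: the callback returns `Result<U, E>` with the same error type as the source iterator, so the caller receives one `Result` instead of a nested one. A minimal sketch of a "trip" iterator with that behavior; `TripIter` and `into_iter_while_ok` below are hypothetical reconstructions under that assumption, not TAMER's code:

```rust
/// Yields the `Ok` values of `inner` and "trips" on the first `Err`,
/// stashing it in `err` and ending iteration.
struct TripIter<'a, I, E> {
    inner: I,
    err: &'a mut Option<E>,
}

impl<'a, T, E, I> Iterator for TripIter<'a, I, E>
where
    I: Iterator<Item = Result<T, E>>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        if self.err.is_some() {
            return None;
        }
        match self.inner.next()? {
            Ok(value) => Some(value),
            Err(e) => {
                *self.err = Some(e);
                None
            }
        }
    }
}

/// Runs `f` over the `Ok` values of `iter`; the source error and the
/// callback's own error collapse into a single `Result<U, E>`.
fn into_iter_while_ok<I, T, U, E, F>(iter: I, f: F) -> Result<U, E>
where
    I: IntoIterator<Item = Result<T, E>>,
    F: FnOnce(&mut TripIter<'_, I::IntoIter, E>) -> Result<U, E>,
{
    let mut err = None;
    let mut trip = TripIter { inner: iter.into_iter(), err: &mut err };
    let result = f(&mut trip);
    drop(trip); // end the borrow of `err`

    // An error from the source iterator takes precedence over `f`'s result.
    match err {
        Some(e) => Err(e),
        None => result,
    }
}
```

Since the callback itself returns `Result`, `?` can be used inside it, and chaining several such calls needs no trailing `?`s to unwrap nested results.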
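Commit 3e277270a7 replaces the separate vector of roots with a single root node whose edges mark the actual roots, making that node the sole starting point for reachability analysis. A toy adjacency-list sketch of that design; `Graph`, `add_root`, and `reachable` are hypothetical and are not TAMER's ASG:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical graph where node 0 is reserved as the single root.
struct Graph {
    edges: HashMap<usize, Vec<usize>>,
    root: usize,
}

impl Graph {
    fn new() -> Self {
        let root = 0;
        Graph { edges: HashMap::from([(root, Vec::new())]), root }
    }

    fn add_edge(&mut self, from: usize, to: usize) {
        self.edges.entry(from).or_default().push(to);
    }

    /// Marking a node as a root is just an edge from the root node,
    /// replacing a separately maintained list of roots.
    fn add_root(&mut self, node: usize) {
        let root = self.root;
        self.add_edge(root, node);
    }

    /// Depth-first reachability from the single root; nodes not returned
    /// are unreachable from any root.
    fn reachable(&self) -> HashSet<usize> {
        let mut seen = HashSet::new();
        let mut stack = vec![self.root];
        while let Some(n) = stack.pop() {
            if seen.insert(n) {
                if let Some(succ) = self.edges.get(&n) {
                    stack.extend(succ.iter().copied());
                }
            }
        }
        seen.remove(&self.root);
        seen
    }
}
```

The same root node can also seed a topological sort, so no caller needs to know which nodes were roots.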


1740 Commits (31f6a102eb03f8d8224323687a1978f28d4c5359)