employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Mike Gerwitz	ea6259570e	tamer: ld::poc: Extract xmlo loading pipeline into new pipeline module I want to clean this up a bit further. The motivation is that we need this for imports in `tamec`. Eventually this will be cleaned up to the point where it's declarative and easy to understand---there's a mess of types involved now and, when something goes wrong, it can be brutally confusing. DEV-13162	2023-05-25 16:38:41 -04:00
Mike Gerwitz	79fa10f26b	tamer: ld::xmle::lower::test: Use AIR (decouple from Asg and index) This uses AIR---the ASG's proper public interface now---to construct the graph for tests, just as all the other modern tests do. This change works towards encapsulating index operations (both creation and lookups) so that the index can be moved off of Asg and into AIR, where it belongs. More information on that and rationale to come. DEV-13162	2023-05-17 10:50:57 -04:00
Mike Gerwitz	a9d0f43684	tamer: src::asg::graph::object::pkg::name: New module This introduces, but does not yet integrate, `CanonicalName`, which not only represents canonicalized package names, but handles namespec resolution. The term "namespec" is motivated by Git's use of *spec (e.g. refspec) referring to various ways of specifying a particular object. Names look like paths, and are derived from them, but they _are not paths_. Their resolution is a purely lexical operation, and they include a number of restrictions to simplify their clarity and handling. I expect them to evolve more in the future, and I've had ideas to do so for quite some time. In particular, resolving packages in this way and then loading them from the filesystem relative to the project root will ensure that traversing (conceptually) to a parent directory will not operate unintuitively with symlinks. The path will always resolve unambiguously. (With that said, if the symlink is to a shared directory with different directory structures, that doesn't solve the compilation problem---we'll have to move object files into a project-specific build directory to handle that.) Span Slicing ------------ Okay, it's worth commenting on the horridity of the path name slicing that goes on here. Care has been taken to ensure that spans will be able to be properly sliced in all relevant contexts, and there are plenty of words devoted to that in the documentation committed here. But there is a more fundamental problem here that I regret not having solved earlier, because I don't have the time for it right now: while we do have SPair, it makes no guarantees that the span associated with the corresponding SymbolId is actually the span that matches the original source lexeme. In fact, it's often not. This is a problem when we want to slice up a symbol in an SPair and produce a sensible span. If it _is_ a source lexeme with its original span, that's no problem. 
But if it's _not_, then the two are not in sync, and slicing up the span won't produce something that actually makes sense to the user. Or, worse (or maybe it's not worse?), it may cause a panic if the slicing is out of bounds. The solution in the future might be to store explicitly the state of an SPair, or call it Lexeme, or something, so that we know the conditions under which slicing is safe. If I ever have time for that in this project. But the result of the lack of a proper abstraction really shows here: this is some of the most confusing code in TAMER, and it's really not doing anything all that complicated. It is disproportionately confusing. DEV-13162	2023-05-05 10:26:56 -04:00
Mike Gerwitz	b7aae207c2	tamer: Rust v1.{68=>70}: Stabilized nonzero_min_max and is_some_and These two features have been stabilized in Rust 1.70.	2023-04-12 12:04:13 -04:00
Mike Gerwitz	f307f2d70b	tamer: asg::air: Extract template parsing into own parser Just as was done with the expression parser, which this will utilize. This initializes it, but doesn't yet make use of it (`AirExprAggregate`). Refactoring was definitely needed; decomposing this is quite a bit of work, in no small part because of the complexity. This helps significantly. DEV-13708	2023-03-10 14:27:59 -05:00
Mike Gerwitz	e6f736298b	tamer: asg::graph::visit::tree_reconstruction: New graph traversal This begins to introduce a graph traversal useful for a source reconstruction from the current state of the ASG. The idea is, after having parsed and ingested the source through the lowering pipeline, to re-output it to (a) prove that we have parsed correctly and (b) allow progressively moving things from the XSLT-based compiler into TAMER. There's quite a bit of documentation here; see that for more information. Generalizing this in an appropriate way took some time, but I think this makes sense (that work began with the introduction of cross edges in terms of the tree described by the graph's ontology). But I do need to come up with an illustration to include in the documentation. DEV-13708	2023-03-10 14:27:57 -05:00
Mike Gerwitz	954b5a2795	Copyright year and name update Ryan Specialty Group (RSG) rebranded to Ryan Specialty after its IPO.	2023-01-20 23:37:30 -05:00
Mike Gerwitz	e6640c0019	tamer: Integrate clippy This invokes clippy as part of `make check` now, which I had previously avoided doing (I'll elaborate on that below). This commit represents the changes needed to resolve all the warnings presented by clippy. Many changes have been made where I find the lints to be useful and agreeable, but there are a number of lints, rationalized in `src/lib.rs`, where I found the lints to be disagreeable. I have provided rationale, primarily for those wondering why I desire to deviate from the default lints, though it does feel backward to rationalize why certain lints ought to be applied (the reverse should be true). With that said, this did catch some legitimate issues, and it was also helpful in getting some older code up-to-date with new language additions that perhaps I used in new code but hadn't gone back and updated old code for. My goal was to get clippy working without errors so that, in the future, when others get into TAMER and are still getting used to Rust, clippy is able to help guide them in the right direction. One of the reasons I went without clippy for so long (though I admittedly forgot I wasn't using it for a period of time) was because there were a number of suggestions that I found disagreeable, and I didn't take the time to go through them and determine what I wanted to follow. Furthermore, it was hard to make that judgment when I was new to the language and lacked the necessary experience to do so. One thing I would like to comment further on is the use of `format!` with `expect`, which is also what the diagnostic system convenience methods do (which clippy does not cover). Because of all the work I've done trying to understand Rust and looking at disassemblies and seeing what it optimizes, I falsely assumed that Rust would convert such things into conditionals in my otherwise-pure code...but apparently that's not the case, when `format!` is involved. 
I noticed that, after making the suggested fix with `get_ident`, Rust proceeded to then inline it into each call site and then apply further optimizations. It was also previously invoking the thread lock (for the interner) unconditionally and invoking the `Display` implementation. That is not at all what I intended for, despite knowing the eager semantics of function calls in Rust. Anyway, possibly more to come on that, I'm just tired of typing and need to move on. I'll be returning to investigate further diagnostic messages soon.	2023-01-20 23:37:29 -05:00
Mike Gerwitz	edbfc87a54	tamer: f::Functor: New trait This commit is purposefully coupled with changes that utilize it to demonstrate that the need for this abstraction has been _derived_, not forced; TAMER doesn't aim to be functional for the sake of it, since idiomatic Rust achieves many of its benefits without the formalisms. But, the formalisms do occasionally help, and this is one such example. There is other existing code that can be refactored to take advantage of this style as well. I do _not_ wish to pull an existing functional dependency into TAMER; I want to keep these abstractions light, and eliminate them as necessary, as Rust continues to integrate new features into its core. I also want to be able to modify the abstractions to suit our particular needs. (This is _not_ a general recommendation; it's particular to TAMER and to my experience.) This implementation of `Functor` is one such example. While it is modeled after Haskell in that it provides `fmap`, the primitive here is instead `map`, with `fmap` derived from it, since `map` allows for better use of Rust idioms. Furthermore, it's polymorphic over _trait_ type parameters, not method, allowing for separate trait impls for different container types, which can in turn be inferred by Rust and allow for some very concise mapping; this is particularly important for TAMER because of the disciplined use of newtypes. For example, `foo.overwrite(span)` and `foo.overwrite(name)` are both self-documenting, and better alternatives than, say, `foo.map_span(|_| span)` and `foo.map_symbol(|_| name)`; the latter are perfectly clear in what they do, but lack a layer of abstraction, and are verbose. But the clarity of the _new_ form does rely on either good naming conventions of arguments, or explicit type annotations using turbofish notation if necessary. This will be implemented on core Rust types as appropriate and as possible. 
At the time of writing, we do not yet have trait specialization, and there are too many soundness issues for me to be comfortable enabling it, so that limits what we can do with something like, say, a generic `Result`, while also allowing for specialized implementations based on newtypes. DEV-13160	2023-01-20 23:37:27 -05:00
Mike Gerwitz	c71f3247b1	tamer: Remove int_log feature flag (stabilized in 1.68-nightly) This also bumps the minimum nightly version.	2022-12-16 14:44:39 -05:00
Mike Gerwitz	5403dd06c6	tamer: Provide links to `tame{c,ld}` DEV-7145	2022-09-19 10:04:40 -04:00
Mike Gerwitz	f9bdcc2775	tamer: xir::parse::ele: Remove `*Error_` types A type alias was added for BC before errors were hoisted out in a previous commit, but they are unnecessary because of the associated type on `ParseState`. This also corrects the long-existing issue of using generated identifiers in tests. DEV-7145	2022-09-15 16:10:47 -04:00
Mike Gerwitz	419b24f251	tamer: Introduce NIR (accepting only) This introduces NIR, but only as an accepting grammar; it doesn't yet emit the NIR IR, beyond TODOs. This modifies `tamec` to, while copying XIR, also attempt to lower NIR to produce parser errors, if any. It does not yet fail compilation, as I just want to be cautious and observe that everything's working properly for a little while as people use it, before I potentially break builds. This is the culmination of months of supporting effort. The NIR grammar is derived from our existing TAME sources internally, which I use for now as a test case until I introduce test cases directly into TAMER later on (I'd do it now, if I hadn't spent so much time on this; I'll start introducing tests as I begin emitting NIR tokens). This is capable of fully parsing our largest system with >900 packages, as well as `core`. `tamec`'s lowering is a mess; that'll be cleaned up in future commits. The same can be said about `tameld`. NIR's grammar has some initial documentation, but this will improve over time as well. The generated docs still need some improvement, too, especially with generated identifiers; I just want to get this out here for testing. DEV-7145	2022-08-29 15:52:04 -04:00
Mike Gerwitz	13641e1812	tamer: diagnose::report: `int_log` feature: {=>i}log10 https://github.com/rust-lang/rust/pull/100332 The above MR replaces `log10` and friends with `ilog10`; this is the first time an unstable feature bit us in a substantially backwards-incompatible way that's a pain to deal with. Fortunately, I'm just not going to deal with it: this is used with the diagnostic system, which isn't yet used by our projects (outside of me testing), and so those builds shouldn't fail before people upgrade. This is now pending stabilization with the new name, so hopefully we're good now: https://github.com/rust-lang/rust/issues/70887#issuecomment-1210602692	2022-08-12 16:42:30 -04:00
Mike Gerwitz	2a36bc4210	tamer: (explicit_generic_args_with_impl_trait): Remove unstable feature flag This was stabilized in Rust 1.63. I was waiting to be sure our build servers were updated properly before removing this (and they were, long ago).	2022-08-12 16:42:30 -04:00
Mike Gerwitz	f9fe4aa13b	tamer: xir::st: Static namespace prefixes (c and t) In particular, `t:*` will be recognized by NIR for short-hand template application. These will be utilized in an upcoming commit. DEV-7145	2022-08-12 00:47:14 -04:00
Mike Gerwitz	c671bf6a9c	tamer: xir: Introduce {Ele,Open,Close}Span This isn't conceptually all that significant of a change, but there was a lot to modify to get it working. I would generally separate this into a commit for the implementation and another commit for the integration, but I decided to keep things together. This serves a role similar to AttrSpan---this allows deriving a span representing the element name from a span representing the entire XIR token. This will provide more useful context for errors---including the tag delimiter(s) means that we care about the fact that an element is in that position (as opposed to some other type of node) within the context of an error. However, if we are expecting an element but take issue with the element name itself, we want to place emphasis on that instead. This also starts to consider the issue of span contexts---a blob of detached data that is `Span` is useful for error context, but it's not useful for manipulation or deriving additional information. For that, we need to encode additional context, and this is an attempt at that. I am interested in the concept of providing Spans that are guaranteed to actually make sense---that are instantiated and manipulated with APIs that ensure consistency. But such a thing buys us very little, practically speaking, over what I have now for TAMER, and so I don't expect to actually implement that for this project; I'll leave that for a personal project. TAMER has already taken on a lot of my personal interests and it can cause me a lot of grief sometimes (with regards to letting my aspirations cause me more work). DEV-7145	2022-06-24 14:16:29 -04:00
Mike Gerwitz	adc45d90df	tamer: xir::parse: Attribute parser generator This is the first parser generator for the parsing framework. I've been waiting quite a while to do this because I wanted to be sure that I understood how I intended to write the attribute parsers manually. Now that I'm about to start parsing source XML files, it is necessary to have a parser generator. Typically one thinks of a parser generator as a separate program that generates code for some language, but that is not always the case---that represents a lack of expressiveness in the language itself (e.g. C). Here, I simply use Rust's macro system, which should be a concept familiar to someone coming from a language like Lisp. This also resolves where I stand on parser combinators with respect to this abstraction: they both accomplish the exact same thing (composition of smaller parsers), but this abstraction doesn't do so in the typical functional way. But the end result is the same. The parser generated by this abstraction will be optimized and inlined in the same manner as the hand-written parsers. Since they'll be tightly coupled with an element parser (which too will have a parser generator), I expect that most attribute parsers will simply be inlined; they exist as separate parsers conceptually, for the same reason that you'd use parser combinators. It's worth mentioning that this awkward reliance on dead state for a lookahead token to determine when aggregation is complete rubs me the wrong way, but resolving it would involve reintroducing the XIR AttrEnd that I had previously removed. I'll keep fighting with myself on this, but I want to get a bit further before I determine if it's worth the tradeoff of reintroducing (more complex IR but simplified parsing). DEV-7145	2022-06-21 13:23:02 -04:00
Mike Gerwitz	3f23bc5e33	tamer: fmt: New type-based formatting system This is partly an experiment, but is designed to simplify producing English sentences in various contexts. It makes use of a not only unstable, but incomplete, Rust feature---adt_const_params, for a static str const type parameter. Hopefully that ends up being stabilized. This uses types, but it's the same as function composition due to Rust's monomorphization. DEV-7145	2022-06-10 16:28:15 -04:00
Mike Gerwitz	2b8e7e6031	tamer: xir::st::qname: New module This moves and deduplicates the static `QName`s into a common area. DEV-7145	2022-06-06 11:31:27 -04:00
Mike Gerwitz	07d2ec1ffb	tamer: Move Dim and {Sym=>}Dtype into num module A previous commit mentioned that there's not a place for `Dim`, and duplicated it between `asg` and `xmlo`. Well, `Dtype` is also needed in both, and so here's a home for now. `Dtype` has always been an inappropriate detail for the system and will one day be removed entirely in favor of higher-level types; the machine representation is up to the compiler to decide. DEV-11864	2022-05-19 10:39:21 -04:00
Mike Gerwitz	1ad2fb1dc8	Copyright year update 2022 RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no "Group"), but I'm not sure if the legal name has been changed yet or not, so I'll wait on that.	2022-05-03 14:14:29 -04:00
Mike Gerwitz	a2e6e37ed1	tamer: Bump nightly Rust version 1.{57=>62} This removes a couple of feature flags that are no longer necessary.	2022-05-02 11:05:32 -04:00
Mike Gerwitz	5c0e224d3c	tamer: diagnose::report: Line numbers in gutter Alright, starting to settle on an abstraction now, and things are coming together. This gives us line numbers in the previously-empty gutter, and widens the gutter to accommodate. Gutters are normalized across sections. Sections are not yet collapsed for sequential line numbers in the same context. Exciting! Here's an example, on an xmlo file: error: expected closing tag for `preproc:symtable` --> /home/.../foo.xmlo:16:4 | 16 | <preproc:symtable xmlns:map="http://www.w3.org/2005/xpath-functions/map"> | ----------------- note: element `preproc:symtable` is opened here --> /home/.../foo.xmlo:11326:4 | 11326 | </preproc:wrong> | ^^^^^^^^^^^^^^^^ error: expected `</preproc:symtable>` DEV-12151	2022-04-28 23:53:38 -04:00
Mike Gerwitz	ab48d79e1f	tamer: diagnost::resolver: Initial concept for line resolution This works, but it's ugly and requires some cleanup. It shows that there are some interesting considerations when determining how to best represent the location of spans to the user in a way that is intuitive. This is not yet integrated with the reporter, which will require a layer to load a `Context` from disk. DEV-10935	2022-04-20 09:42:13 -04:00
Mike Gerwitz	eaa8133d21	tamer: diagnose: Introduction of diagnostic system This is a working concept that will continue to evolve. I wanted to start with some basic output before getting too carried away, since there's a lot of potential here. This is heavily influenced by Rust's helpful diagnostic messages, but will take some time to realize a lot of the things that Rust does. The next step will be to resolve line and column numbers, and then possibly include snippets and underline spans, placing the labels alongside them. I need to balance this work with everything else I have going on. This is a large commit, but it converts the existing Error Display impls into Diagnostic. This separation is a bit verbose, so I'll see how this ends up evolving. Diagnostics are tied to Error at the moment, but I imagine in the future that any object would be able to describe itself, error or not, which would be useful in the future both for the Summary Page and for query functionality, to help developers understand the systems they are writing using TAME. Output is integrated into tameld only in this commit; I'll add tamec next. Examples of what this outputs are available in the test cases in this commit. DEV-10935	2022-04-13 15:22:46 -04:00
Mike Gerwitz	cfc7f45bc4	tamer: Remove wip-xmlo-xir-reader This entirely removes the old XmloReader that has since been replaced with a XIR-based reader. I had been holding off on this because the new reader is slower, pending performance optimizations (which I'll do a little later on), however the performance loss is of no practical consideration and only affects the linker, which is still fast. Therefore, it's better to get this old code out of the way to simplify refactoring going forward. In particular, I'm working on the diagnostic system. This is a little sad, in a way---this is some of my first Rust code that I'm deleting. DEV-10935	2022-04-11 16:11:49 -04:00
Mike Gerwitz	942bf66231	tamer: frontend: Clean up unused modules These were part of a POC for frontends quite some time ago. Some portions of this concept may be reintroduced, but this was pre-XIR. DEV-10413	2022-04-07 14:21:08 -04:00
Mike Gerwitz	e77bdaf19a	tamer: parse: Introduce mutable Context This resolves the performance issues caused by Rust's failure to elide the ElementStack (ArrayVec) memcpys on move. Since XIRF is invoked tens of millions of times in some cases for larger systems, prior to this change, failure to optimize away moves for XIRF resulted in tens of millions of memcpys. This resulted in linking of one program going from 1s -> ~15s. This change reduces it to ~2.5s with the wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the subject of future changes). In particular, this change introduces a new mutable reference to `ParseState::parse_token`, which is a reference to a `Context` owned by the caller (e.g. `Parser`). In the case of XIRF, this means that `Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of `flat::State`; this allows the latter to remain pure and benefit from Rust's move optimizations, without sacrificing the otherwise-pure implementation. ParseStates that do not need a mutable context can use `NoContext` and remain pure. DEV-12024	2022-04-05 15:50:53 -04:00
Mike Gerwitz	279ddc79d7	tamer: parse::TransitionResult: Alias=>newtype This converts the tuple type alias into a newtype, so that we may provide our own implementations. This differs from a previous approach that I took, which involved making this type `Result<(S, T), (S, E)>` so that the return values composed well with other functions. But the reality is that this is used only by other `ParseState`s and `Parser`, so it's unnecessary. However, this is also an attempt to utilize the new Try and FromResidual traits; note how the Try associated types match precisely what I was trying to do before, though they're used as intermediate types. I'll see how this evolves. DEV-10863	2022-03-25 12:28:50 -04:00
Mike Gerwitz	14638a612f	tamer: {xir::=>}parse: Move parser out of XIR The parsing framework originally created for XIR is now more general and useful to other things. We'll see how this evolves. This needs additional documentation, but I'd like to see how it changes as I implement XmloReader and then some of the source readers first. DEV-10863	2022-03-18 16:24:53 -04:00
Mike Gerwitz	150b3b9aa4	tamer: xir::flat: Improve parser validation This does a couple of things: it ensures that documents have one and only one root node, and it properly handles dead transitions once parsing is complete (allowing it to be composed). This should make XIRF feature-complete for the time being. It does rely on the assumption that the reader is stripping out any trailing whitespace, so I guess we'll see if that's true as we proceed. DEV-10863	2022-03-17 23:22:38 -04:00
Mike Gerwitz	899fa79e59	tamer: xir::flat: Initial XIRF implementation This introduces XIR Flat (XIRF), which is conceptually between XIR and XIRT. This provides a more appropriate level of abstraction for further lowering operations to parse against, and removes the need for other parsers to perform their own validations (inappropriately) to ensure well-formed XML. There is still some cleanup worth doing, including moving some of the parsing responsibility up a level back into the XIR parser. DEV-10863	2022-03-17 13:08:16 -04:00
Mike Gerwitz	428d508be4	tamer: {ir::=>}{asg, xir} See the previous commit. There is no sense in some common "IR" namespace, since those IRs should live close to whatever system whose data they represent. In the case of these, they are general IRs that can apply to many different parts of the system. If that proves to be a false statement, they'll be moved. DEV-10863	2021-11-04 16:13:27 -04:00
Mike Gerwitz	d045786cfb	tamer: ir::xir::tree::Element::attrs: Wrap in Option This allows AttrList not only to be lazily initialized (which is less of a problem at the moment with Vec, but may become one in the future), but also leaves a space open for attributes to be added _after_ having been parsed. It further leaves room to _take_ attributes from their `Element`. This is important because the next commit will re-introduce the ability to parse attributes independently, allowing us to put the parser in a state where we can parse AttrList without an Element context. To re-use that parsing under an Element context, we can simply attach an AttrList after it has been parsed. Option adds no additional size cost to Vec, so we get this for free (except for the tiny change that initializes the attribute list when we try to push to it). I also think this reads better ("attrs: None"). Though it makes the API slightly more of a pain to work with. DEV-10863	2021-10-29 16:34:05 -04:00
Mike Gerwitz	18ab032ba0	tamer: Begin XIR-based xmlo reader impl There isn't a whole lot here, but there is additional work needed in various places to support upcoming changes and so I want to get this committed to ease the cognitive burden of what I have thus far. And to stop stashing. We have a feature flag for a reason. DEV-10863	2021-10-28 21:21:30 -04:00
Mike Gerwitz	f6c5a224c8	tamer: iter::trip: Introduce initial TripIter concept See the documentation in this commit for more information. This is pretty significant, in that it's been a long-standing question for me how I'd like to join together `Result` iterators without having unnecessarily complex APIs, and also allow for error recovery. This solves both of those problems. It should be noted, however, that this does not yet explicitly implement error recovery, beyond being able to observe the failure as the result of the provided callback function. Proper recovery will be implemented once there's a use-case. DEV-11006	2021-10-28 14:50:41 -04:00
Mike Gerwitz	f9c9c95516	tamer: sym::prefill: Static symbol polymorphism See the docs for a much deeper discussion. In summary: traits do not support static methods, and this is the workaround, which relies on unstable nightly constant function features. This implementation is tested using `qname_const!`, and will be utilized with a new static type in a following commit.	2021-10-02 00:58:14 -04:00
Mike Gerwitz	5250571f15	tamer: ir::asg::ident: Use symbols in place of string slice mapping `IdentKind` needs to be written to `xmle` files and displayed in error messages. String slices were used when quick-xml was used for writing, which will be going away with the new writer.	2021-09-29 23:18:23 -04:00
Mike Gerwitz	6864fbc1cd	tamer: Start of XIR-based xmle writer This has been a long time coming, and has been repeatedly stashed as other parts of the system have evolved to support it. The introduction of the XIR tree was to write tests for this (which are sloppy atm). This currently writes out the `xmle` header and _most_ of the `l:dep` section; it's missing the object-type-specific attributes. There is, relatively speaking, not much more work to do here. The feature flag `wip-xir-xmle-writer` was introduced to toggle this system in place of `XmleWriter`. Initial benchmarks show that it will be competitive with the quick-xml-based writer, but remember that is not the goal: the purpose of this is to test XIR in a production system before we continue to implement it for a frontend, and to refactor so that we do not have multiple implementations writing XML files (once we echo the source XML files). I'm excited to get this done with so that I can move on. This has been rather exhausting.	2021-09-28 14:52:53 -04:00
Mike Gerwitz	3bb6f0cf35	tamer: ir::asg::ident: AsRef impls for SymbolId types This commit will make more sense once the broader context is committed, but it's needed for lowering from `Sections` into a XIR stream. This will also change once we pre-allocate symbols, like rustc, when the interner is initialized. This is my first use of the `paste` crate, which is used to generate identifiers. So this is partly an experiment, and it seems much better than having to write a proc macro, at least at this point in time. If this code stays around, it'll probably be generalized further and used elsewhere, but I'd prefer not to go this route long-term.	2021-09-20 16:50:40 -04:00
Mike Gerwitz	2586827d64	tamer: convert::{ExpectFrom, ExpectInto}: New traits These traits are intended to eliminate boilerplate, primarily in tests, in situations where from/into is not expected to fail. Given that TAMER must only panic for internal compiler errors, this should not often be used outside of test cases. Further, there may be better options in the future (e.g. QNames could be statically compiled rather than trying to convert at runtime, in this case).	2021-09-08 16:03:44 -04:00
Mike Gerwitz	a23bae5e4d	tamer: XIR: Working concept This is a working streaming IR for XML. I want to get this committed before I go further cleaning it up and integrating it into the xmle writer. This is lacking detailed documentation, and the names of things may end up changing. Initial benchmarks do show that it has a ~2x performance improvement over quick-xml when dealing with two attributes on a node, and I suspect that improvement will increase with the number of attributes. We will see how it compares in real-world benchmarks once the linker has been modified to use it. The goal isn't to _avoid_ quick-xml---it'll be used in the future for things like escaping that would be a huge waste to implement ourselves. It just so happened that quick-xml was not beneficial for these changes; indeed, its own writer is fairly simple for the portions that were implemented here, so there's no use in fighting with its API, particularly around attributes and our need to explicitly control whitespace (with the intent of handling code formatters in the future). To put this into perspective: the reason this work is being done isn't to refactor the linker, or to speed it up, but to generalize XML writing and provide a suitable IR for use in the compiler. The first step of the frontend is to essentially echo the XML token stream back out so we can incrementally parse it and do something useful, to incrementally rewrite the compiler in Rust.	2021-08-20 10:16:36 -04:00
Mike Gerwitz	0ff0f88e5f	tamer: Introduce span This is an initial implementation optimized for expected use cases. Hopefully that pans out and doesn't come back to bite me. Regarding the context: it only allows for interned paths atm, which are strings (and so must be valid UTF-8, which is fine for us, but sucks for something more general-purpose). I'll be curious if the context needs extension later on, or if different contexts will be stored in IRs (e.g. to store a template application site as well as the location of the expansion within the template body).	2021-08-13 15:16:39 -04:00
Mike Gerwitz	9deb393bfd	tamer: Global interners This is a major change, and I apologize for it all being in one commit. I had wanted to break it up, but doing so would have required a significant amount of temporary work that was not worth doing while I'm the only one working on this project at the moment. This accomplishes a number of important things, now that I'm preparing to write the first compiler frontend for TAMER: 1. `Symbol` has been removed; `SymbolId` is used in its place. 2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer. 3. Using symbols no longer requires dereferencing. 4. Lifetimes no longer pollute the entire system! (`'i`) 5. Two global interners are offered to produce `SymbolStr` with `'static` lifetimes, simplifying lifetime management and borrowing where strings are still needed. 6. A nice API is provided for interning and lookups (e.g. "foo".intern()) which makes this look like a core feature of Rust. Unfortunately, making this change required modifications to...virtually everything. And that serves to emphasize why this change was needed: _everything_ used symbols, and so there's no use in not providing globals. I implemented this in a way that still provides for loose coupling through Rust's trait system. Indeed, Rustc offers a global interner, and I decided not to go that route initially because it wasn't clear to me that such a thing was desirable. It didn't become apparent to me, in fact, until the recent commit where I introduced `SymbolIndexSize` and saw how many things had to be touched; the linker evolved so rapidly as I was trying to learn Rust that I lost track of how bad it got. Further, this shows how the design of the internment system was a bit naive---I assumed certain requirements that never panned out. In particular, everything using symbols stored `&'i Symbol<'i>`---that is, a reference (usize) to an object containing an index (32-bit) and a string slice (128-bit). 
So it was a reference to a pretty large value, which was allocated in the arena alongside the interned string itself. But, that was assuming that something would need both the symbol index _and_ a readily available string. That's not the case. In fact, it's pretty clear that interning happens at the beginning of execution, that `SymbolId` is all that's needed during processing (unless an error occurs; more on that below); and it's not until _the very end_ that we need to retrieve interned strings from the pool to write either to a file or to display to the user. It was horribly wasteful! So `SymbolId` solves the lifetime issue in itself for most systems, but it still requires that an interner be available for anything that needs to create or resolve symbols, which, as it turns out, is still a lot of things. Therefore, I decided to implement them as thread-local static variables, which is very similar to what Rustc does itself (Rustc's are scoped). TAMER does not use threads, so the resulting `'static` lifetime should be just fine for now. Eventually I'd like to implement `!Send` and `!Sync`, though, to prevent references from escaping the thread (as noted in the patch); I can't do that yet, since the feature has not yet been stabilized. In the end, this leaves us with a system that's much easier to use and maintain; hopefully easier for newcomers to get into without having to deal with so many complex lifetimes; and a nice API that makes it a pleasure to work with symbols. Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I end up regretting that down the line, but it exists for an important reason: the `Span` and other structures that'll be introduced need to pack a lot of data into 64 bits so they can be freely copied around to keep lifetimes simple without wreaking havoc in other ways, but a 32-bit symbol size needed by the linker is too large for that.
(Actually, the linker doesn't yet need 32 bits for our systems, but it's going to in the somewhat near future unless we optimize away a bunch of symbols...but I'd really rather not have the linker hit a limit that requires a lot of code changes to resolve). Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid that for now. Most systems can just use one of the `PkgSymbolId` or `ProgSymbolId` type aliases and not have to worry about it. Systems that are actually shared between the compiler and the linker do, though, but it's not like we don't already have a bunch of trait bounds. Of course, as we implement link-time optimizations (LTO) in the future, it's possible most things will need the size and I'll grow frustrated with that and possibly revisit this. We shall see. Anyway, this was exhausting...and...onward to the first frontend!	2021-08-11 14:24:55 -04:00
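The global-interner design described in 9deb393bfd can be illustrated with a minimal sketch. This is not TAMER's actual code: the `Interner` struct, the `Intern` trait, the `lookup_str` method, the fixed `u32` index, and the use of `Box::leak` in place of an arena are all assumptions made for illustration. Only the name `SymbolId`, the thread-local storage, and the `"foo".intern()` call style come from the commit message.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical symbol identifier: a plain 32-bit index that is `Copy`
/// and carries no lifetime.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SymbolId(u32);

/// Hypothetical interner mapping strings to ids and ids back to strings.
#[derive(Default)]
struct Interner {
    map: HashMap<&'static str, SymbolId>,
    strs: Vec<&'static str>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.map.get(s) {
            return id; // already interned: reuse the existing id
        }
        // Assumption: leak the string to obtain a `'static` slice.
        // An arena allocator would be the more realistic choice.
        let stored: &'static str = Box::leak(s.to_owned().into_boxed_str());
        let id = SymbolId(self.strs.len() as u32);
        self.strs.push(stored);
        self.map.insert(stored, id);
        id
    }

    fn lookup(&self, id: SymbolId) -> &'static str {
        self.strs[id.0 as usize]
    }
}

thread_local! {
    // One interner per thread, reachable without passing references around.
    static INTERNER: RefCell<Interner> = RefCell::new(Interner::default());
}

/// Hypothetical extension trait enabling `"foo".intern()`.
pub trait Intern {
    fn intern(&self) -> SymbolId;
}

impl Intern for str {
    fn intern(&self) -> SymbolId {
        INTERNER.with(|i| i.borrow_mut().intern(self))
    }
}

impl SymbolId {
    /// Resolve the id back to its string, e.g. for output or diagnostics.
    pub fn lookup_str(self) -> &'static str {
        INTERNER.with(|i| i.borrow().lookup(self))
    }
}
```

With this sketch, interning the same string twice yields equal ids, and `lookup_str` is only needed when a string must be written out or shown to the user. That matches the commit's observation that most processing needs only the id.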
Mike Gerwitz	0fc8a1a4df	tamer: Remove default SymbolIndex (et al) index type Oh boy. What a mess of a change. This demonstrates some significant issues we have with Symbol. I had originally modelled the system a bit after Rustc's, but deviated in certain regards: 1. This has a configurable base type to enable better packing without bit twiddling and potentially unsafe tricks I'd rather avoid unless necessary; and 2. The lifetime is not static, and there is no global, singleton interner; and 3. I pass around references to a Symbol rather than passing around an index into an interner. For #3---this is done because there's no singleton interner and therefore resolving a symbol requires a direct reference to an available interner. It also wasn't clear to me (and still isn't, in fact) whether more than one interner may be used for different contexts. But, that doesn't preclude removing lifetimes and just passing around indexes; in fact, I plan to do this in the frontend where the parser and such will have direct interner access and can therefore just look up based on a symbol index. We could reserve references for situations where exposing an interner would be undesirable. Anyway, more to come...	2021-07-29 14:26:40 -04:00
Mike Gerwitz	d9dcfe8777	tamer: Introduce tpwrap module to contain quick_xml::Error adapter This adapter exists to implement PartialEq so that it can be derived on Error objects. This is used primarily (well, exclusively atm) for tests.	2021-07-23 23:23:55 -04:00
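The adapter idea in d9dcfe8777 can be sketched as a newtype that supplies `PartialEq` for a third-party error type lacking it. This is not the actual `tpwrap` code: `std::io::Error` stands in for `quick_xml::Error` so the example stays self-contained, and the names `IoErrorWrap` and `LoadError`, as well as the choice to compare by error kind and rendered message, are assumptions.

```rust
use std::io;

/// Hypothetical adapter around an error type that does not implement
/// `PartialEq` (here `io::Error` as a stand-in for `quick_xml::Error`).
#[derive(Debug)]
pub struct IoErrorWrap(pub io::Error);

impl PartialEq for IoErrorWrap {
    fn eq(&self, other: &Self) -> bool {
        // Assumption: equal kind and equal message is sufficient for tests.
        self.0.kind() == other.0.kind()
            && self.0.to_string() == other.0.to_string()
    }
}

impl From<io::Error> for IoErrorWrap {
    fn from(e: io::Error) -> Self {
        IoErrorWrap(e)
    }
}

/// Hypothetical error enum that can now derive `PartialEq`,
/// because every variant payload implements it.
#[derive(Debug, PartialEq)]
pub enum LoadError {
    Io(IoErrorWrap),
    UnexpectedEof,
}
```

The derive on `LoadError` is the point of the exercise: tests can then compare whole error values with `assert_eq!` rather than matching on variants by hand.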
Mike Gerwitz	fb8422d670	tamer: Initial frontend concept This introduces the beginnings of frontends for TAMER, gated behind a `wip-features` flag. This will be introduced in stages: 1. Replace the existing copy with a parser-based copy (echo back out the tokens), when the flag is on. 2. Begin to parse portions of the source, augmenting the output xmlo (xmli at the moment). The XSLT-based compiler will be modified to skip compilation steps as necessary. As portions of the compilation are implemented in TAMER, they'll be placed behind their own feature flags and stabilized, which will incrementally remove the compilation steps from the XSLT-based system. The result should be substantial incremental performance improvements. Short-term, the priorities for loading identifiers into an IR are (though the order may change): 1. Echo 2. Imports 3. Extern declarations. 4. Simple identifiers (e.g. param, const, template, etc). 5. Classifications. 6. Documentation expressions. 7. Calculation expressions. 8. Template applications. 9. Template definitions. 10. Inline templates. After each of those is done, the resulting xmlo (xmli) will have fully reconstructed the source document from the IR produced during parsing.	2021-07-23 22:24:08 -04:00
Mike Gerwitz	2e50af1220	Copyright year update 2021	2021-07-22 15:00:15 -04:00
Mike Gerwitz	716556c39f	tamer: Rust 1.{42=>48}.0 for stable intra-doc links without nightly	2021-06-21 13:10:00 -04:00


60 Commits (9c6b00a124cd4a381eadf4e0090a921d83620407)