employer/tame - tame - Mike Gerwitz's Forge


Author	SHA1	Message	Date
Mike Gerwitz	e77bdaf19a	tamer: parse: Introduce mutable Context This resolves the performance issues caused by Rust's failure to elide the ElementStack (ArrayVec) memcpys on move. Since XIRF is invoked tens of millions of times in some cases for larger systems, prior to this change, failure to optimize away moves for XIRF resulted in tens of millions of memcpys. This resulted in linking of one program going from 1s -> ~15s. This change reduces it to ~2.5s with the wip-xmlo-xir-reader flag on, with the extra time coming from elsewhere (the subject of future changes). In particular, this change introduces a new mutable reference to `ParseState::parse_token`, which is a reference to a `Context` owned by the caller (e.g. `Parser`). In the case of XIRF, this means that `Parser<flat::State, _>` will own the `ElementStack`/`ArrayVec` instead of `flat::State`; this allows the latter to remain pure and benefit from Rust's move optimizations, without sacrificing the otherwise-pure implementation. ParseStates that do not need a mutable context can use `NoContext` and remain pure. DEV-12024	2022-04-05 15:50:53 -04:00
Mike Gerwitz	279ddc79d7	tamer: parse::TransitionResult: Alias=>newtype This converts the tuple type alias into a newtype, so that we may provide our own implementations. This differs from a previous approach that I took, which involved making this type `Result<(S, T), (S, E)>` so that the return values composed well with other functions. But the reality is that this is used only by other `ParseState`s and `Parser`, so it's unnecessary. However, this is also an attempt to utilize the new Try and FromResidual traits; note how the Try associated types match precisely what I was trying to do before, though they're used as intermediate types. I'll see how this evolves. DEV-10863	2022-03-25 12:28:50 -04:00
Mike Gerwitz	14638a612f	tamer: {xir::=>}parse: Move parser out of XIR The parsing framework originally created for XIR is now more general and useful to other things. We'll see how this evolves. This needs additional documentation, but I'd like to see how it changes as I implement XmloReader and then some of the source readers first. DEV-10863	2022-03-18 16:24:53 -04:00
Mike Gerwitz	150b3b9aa4	tamer: xir::flat: Improve parser validation This does a couple of things: it ensures that documents have one and only one root node, and it properly handles dead transitions once parsing is complete (allowing it to be composed). This should make XIRF feature-complete for the time being. It does rely on the assumption that the reader is stripping out any trailing whitespace, so I guess we'll see if that's true as we proceed. DEV-10863	2022-03-17 23:22:38 -04:00
Mike Gerwitz	899fa79e59	tamer: xir::flat: Initial XIRF implementation This introduces XIR Flat (XIRF), which is conceptually between XIR and XIRT. This provides a more appropriate level of abstraction for further lowering operations to parse against, and removes the need for other parsers to perform their own validations (inappropriately) to ensure well-formed XML. There is still some cleanup worth doing, including moving some of the parsing responsibility up a level back into the XIR parser. DEV-10863	2022-03-17 13:08:16 -04:00
Mike Gerwitz	428d508be4	tamer: {ir::=>}{asg, xir} See the previous commit. There is no sense in some common "IR" namespace, since those IRs should live close to whatever system whose data they represent. In the case of these, they are general IRs that can apply to many different parts of the system. If that proves to be a false statement, they'll be moved. DEV-10863	2021-11-04 16:13:27 -04:00
Mike Gerwitz	d045786cfb	tamer: ir::xir::tree::Element::attrs: Wrap in Option This allows AttrList not only to be lazily initialized (which is less of a problem at the moment with Vec, but may become one in the future), but also leaves a space open for attributes to be added _after_ having been parsed. It further leaves room to _take_ attributes from their `Element`. This is important because the next commit will re-introduce the ability to parse attributes independently, allowing us to put the parser in a state where we can parse AttrList without an Element context. To re-use that parsing under an Element context, we can simply attach an AttrList after it has been parsed. Option adds no additional size cost to Vec, so we get this for free (except for the tiny change that initializes the attribute list when we try to push to it). I also think this reads better ("attrs: None"). Though it makes the API slightly more of a pain to work with. DEV-10863	2021-10-29 16:34:05 -04:00
Mike Gerwitz	18ab032ba0	tamer: Begin XIR-based xmlo reader impl There isn't a whole lot here, but there is additional work needed in various places to support upcoming changes and so I want to get this committed to ease the cognitive burden of what I have thus far. And to stop stashing. We have a feature flag for a reason. DEV-10863	2021-10-28 21:21:30 -04:00
Mike Gerwitz	f6c5a224c8	tamer: iter::trip: Introduce initial TripIter concept See the documentation in this commit for more information. This is pretty significant, in that it's been a long-standing question for me how I'd like to join together `Result` iterators without having unnecessarily complex APIs, and also allow for error recovery. This solves both of those problems. It should be noted, however, that this does not yet explicitly implement error recovery, beyond being able to observe the failure as the result of the provided callback function. Proper recovery will be implemented once there's a use-case. DEV-11006	2021-10-28 14:50:41 -04:00
Mike Gerwitz	f9c9c95516	tamer: sym::prefill: Static symbol polymorphism See the docs for a much deeper discussion. In summary: traits do not support static methods, and this is the workaround, which relies on unstable nightly constant function features. This implementation is tested using `qname_const!`, and will be utilized with a new static type in a following commit.	2021-10-02 00:58:14 -04:00
Mike Gerwitz	5250571f15	tamer: ir::asg::ident: Use symbols in place of string slice mapping `IdentKind` needs to be written to `xmle` files and displayed in error messages. String slices were used when quick-xml was used for writing, which will be going away with the new writer.	2021-09-29 23:18:23 -04:00
Mike Gerwitz	6864fbc1cd	tamer: Start of XIR-based xmle writer This has been a long time coming, and has been repeatedly stashed as other parts of the system have evolved to support it. The introduction of the XIR tree was to write tests for this (which are sloppy atm). This currently writes out the `xmle` header and _most_ of the `l:dep` section; it's missing the object-type-specific attributes. There is, relatively speaking, not much more work to do here. The feature flag `wip-xir-xmle-writer` was introduced to toggle this system in place of `XmleWriter`. Initial benchmarks show that it will be competitive with the quick-xml-based writer, but remember that is not the goal: the purpose of this is to test XIR in a production system before we continue to implement it for a frontend, and to refactor so that we do not have multiple implementations writing XML files (once we echo the source XML files). I'm excited to get this done with so that I can move on. This has been rather exhausting.	2021-09-28 14:52:53 -04:00
Mike Gerwitz	3bb6f0cf35	tamer: ir::asg::ident: AsRef impls for SymbolId types This commit will make more sense once the broader context is committed, but it's needed for lowering from `Sections` into a XIR stream. This will also change once we pre-allocate symbols, like rustc, when the interner is initialized. This is my first use of the `paste` crate, which is used to generate identifiers. So this is partly an experiment, and it seems much better than having to write a proc macro, at least at this point in time. If this code stays around, it'll probably be generalized further and used elsewhere, but I'd prefer not to go this route long-term.	2021-09-20 16:50:40 -04:00
Mike Gerwitz	2586827d64	tamer: convert::{ExpectFrom, ExpectInto}: New traits These traits are intended to eliminate boilerplate, primarily in tests, in situations where from/into is not expected to fail. Given that TAMER must only panic for internal compiler errors, this should not often be used outside of test cases. Further, there may be better options in the future (e.g. QNames could be statically compiled rather than trying to convert at runtime, in this case).	2021-09-08 16:03:44 -04:00
Mike Gerwitz	a23bae5e4d	tamer: XIR: Working concept This is a working streaming IR for XML. I want to get this committed before I go further cleaning it up and integrating it into the xmle writer. This is lacking detailed documentation, and the names of things may end up changing. Initial benchmarks do show that it has a ~2x performance improvement over quick-xml when dealing with two attributes on a node, and I suspect that improvement will increase with the number of attributes. We will see how it compares in real-world benchmarks once the linker has been modified to use it. The goal isn't to _avoid_ quick-xml---it'll be used in the future for things like escaping that would be a huge waste to implement ourselves. It just so happened that quick-xml was not beneficial for these changes; indeed, its own writer is fairly simple for the portions that were implemented here, so there's no use in fighting with its API, particularly around attributes and our need to explicitly control whitespace (with the intent of handling code formatters in the future). To put this into perspective: the reason this work is being done isn't to refactor the linker, or to speed it up, but to generalize XML writing and provide a suitable IR for use in the compiler. The first step of the frontend is to essentially echo the XML token stream back out so we can incrementally parse it and do something useful, to incrementally rewrite the compiler in Rust.	2021-08-20 10:16:36 -04:00
Mike Gerwitz	0ff0f88e5f	tamer: Introduce span This is an initial implementation optimized for expected use cases. Hopefully that pans out and doesn't come back to bite me. Regarding the context: it only allows for interned paths atm, which are strings (and so must be valid UTF-8, which is fine for us, but sucks for something more general-purpose). I'll be curious if the context needs extension later on, or if different contexts will be stored in IRs (e.g. to store a template application site as well as the location of the expansion within the template body).	2021-08-13 15:16:39 -04:00
Mike Gerwitz	9deb393bfd	tamer: Global interners This is a major change, and I apologize for it all being in one commit. I had wanted to break it up, but doing so would have required a significant amount of temporary work that was not worth doing while I'm the only one working on this project at the moment. This accomplishes a number of important things, now that I'm preparing to write the first compiler frontend for TAMER: 1. `Symbol` has been removed; `SymbolId` is used in its place. 2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer. 3. Using symbols no longer requires dereferencing. 4. Lifetimes no longer pollute the entire system! (`'i`) 5. Two global interners are offered to produce `SymbolStr` with `'static` lifetimes, simplifying lifetime management and borrowing where strings are still needed. 6. A nice API is provided for interning and lookups (e.g. "foo".intern()) which makes this look like a core feature of Rust. Unfortunately, making this change required modifications to...virtually everything. And that serves to emphasize why this change was needed: _everything_ used symbols, and so there's no use in not providing globals. I implemented this in a way that still provides for loose coupling through Rust's trait system. Indeed, Rustc offers a global interner, and I decided not to go that route initially because it wasn't clear to me that such a thing was desirable. It didn't become apparent to me, in fact, until the recent commit where I introduced `SymbolIndexSize` and saw how many things had to be touched; the linker evolved so rapidly as I was trying to learn Rust that I lost track of how bad it got. Further, this shows how the design of the internment system was a bit naive---I assumed certain requirements that never panned out. In particular, everything using symbols stored `&'i Symbol<'i>`---that is, a reference (usize) to an object containing an index (32-bit) and a string slice (128-bit).
So it was a reference to a pretty large value, which was allocated in the arena alongside the interned string itself. But, that was assuming that something would need both the symbol index _and_ a readily available string. That's not the case. In fact, it's pretty clear that interning happens at the beginning of execution, that `SymbolId` is all that's needed during processing (unless an error occurs; more on that below); and it's not until _the very end_ that we need to retrieve interned strings from the pool to write either to a file or to display to the user. It was horribly wasteful! So `SymbolId` solves the lifetime issue in itself for most systems, but it still requires that an interner be available for anything that needs to create or resolve symbols, which, as it turns out, is still a lot of things. Therefore, I decided to implement them as thread-local static variables, which is very similar to what Rustc does itself (Rustc's are scoped). TAMER does not use threads, so the resulting `'static` lifetime should be just fine for now. Eventually I'd like to implement `!Send` and `!Sync`, though, to prevent references from escaping the thread (as noted in the patch); I can't do that yet, since the feature has not yet been stabilized. In the end, this leaves us with a system that's much easier to use and maintain; hopefully easier for newcomers to get into without having to deal with so many complex lifetimes; and a nice API that makes it a pleasure to work with symbols. Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I end up regretting that down the line, but it exists for an important reason: the `Span` and other structures that'll be introduced need to pack a lot of data into 64 bits so they can be freely copied around to keep lifetimes simple without wreaking havoc in other ways, but a 32-bit symbol size needed by the linker is too large for that.
(Actually, the linker doesn't yet need 32 bits for our systems, but it's going to in the somewhat near future unless we optimize away a bunch of symbols...but I'd really rather not have the linker hit a limit that requires a lot of code changes to resolve). Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid that for now. Most systems can just use one of the `PkgSymbolId` or `ProgSymbolId` type aliases and not have to worry about it. Systems that are actually shared between the compiler and the linker do, though, but it's not like we don't already have a bunch of trait bounds. Of course, as we implement link-time optimizations (LTO) in the future, it's possible most things will need the size and I'll grow frustrated with that and possibly revisit this. We shall see. Anyway, this was exhausting...and...onward to the first frontend!	2021-08-11 14:24:55 -04:00
Mike Gerwitz	0fc8a1a4df	tamer: Remove default SymbolIndex (et al) index type Oh boy. What a mess of a change. This demonstrates some significant issues we have with Symbol. I had originally modelled the system a bit after Rustc's, but deviated in certain regards: 1. This has a configurable base type to enable better packing without bit twiddling and potentially unsafe tricks I'd rather avoid unless necessary; and 2. The lifetime is not static, and there is no global, singleton interner; and 3. I pass around references to a Symbol rather than passing around an index into an interner. For #3---this is done because there's no singleton interner and therefore resolving a symbol requires a direct reference to an available interner. It also wasn't clear to me (and still isn't, in fact) whether more than one interner may be used for different contexts. But, that doesn't preclude removing lifetimes and just passing around indexes; in fact, I plan to do this in the frontend where the parser and such will have direct interner access and can therefore just look up based on a symbol index. We could reserve references for situations where exposing an interner would be undesirable. Anyway, more to come...	2021-07-29 14:26:40 -04:00
Mike Gerwitz	d9dcfe8777	tamer: Introduce tpwrap module to contain quick_xml::Error adapter This adapter exists to implement PartialEq so that it can be derived on Error objects. This is used primarily (well, exclusively atm) for tests.	2021-07-23 23:23:55 -04:00
Mike Gerwitz	fb8422d670	tamer: Initial frontend concept This introduces the beginnings of frontends for TAMER, gated behind a `wip-features` flag. This will be introduced in stages: 1. Replace the existing copy with a parser-based copy (echo back out the tokens), when the flag is on. 2. Begin to parse portions of the source, augmenting the output xmlo (xmli at the moment). The XSLT-based compiler will be modified to skip compilation steps as necessary. As portions of the compilation are implemented in TAMER, they'll be placed behind their own feature flags and stabilized, which will incrementally remove the compilation steps from the XSLT-based system. The result should be substantial incremental performance improvements. Short-term, the priorities for loading identifiers into an IR are (though the order may change): 1. Echo 2. Imports 3. Extern declarations. 4. Simple identifiers (e.g. param, const, template, etc). 5. Classifications. 6. Documentation expressions. 7. Calculation expressions. 8. Template applications. 9. Template definitions. 10. Inline templates. After each of those is done, the resulting xmlo (xmli) will have fully reconstructed the source document from the IR produced during parsing.	2021-07-23 22:24:08 -04:00
Mike Gerwitz	2e50af1220	Copyright year update 2021	2021-07-22 15:00:15 -04:00
Mike Gerwitz	716556c39f	tamer: Rust 1.{42=>48}.0 for stable intra-doc links without nightly	2021-06-21 13:10:00 -04:00
Mike Gerwitz	0127d4b698	TAMER: sym::Interner::index_lookup This was originally omitted because there wasn't a use case for it. Now that we're adding context to errors, however, an owned value is highly desirable. This adds almost no measurable overhead to the internment system in benchmarks (largely within the margin of error).	2020-04-29 11:33:41 -04:00
Mike Gerwitz	d97e53a835	[DEV-7084] TAMER: fs: Basic filesystem abstraction This also includes an implementation to visit paths only once. Note that it does not yet canonicalize the path before visiting, so relative paths to the same file can slip through, and relative paths to _different_ files could be erroneously considered to have been visited. This will be fixed in an upcoming commit.	2020-04-28 09:06:19 -04:00
Mike Gerwitz	7a4f6cf9f2	[DEV-7087] TAMER: symbol_dummy! macro	2020-03-24 14:14:05 -04:00
Mike Gerwitz	400d5b25a1	ir::asg::Object::Empty: Remove variant This variant is unnecessary, as it was used only by the indexer to represent the absence of a node, for which we can simply use `None` in the containing `Option`. * tamer/Cargo.toml: Add `lazy_static`. * tamer/Cargo.lock: Update. * tamer/src/ir/asg/base.rs (with_capacity): Use `None` in place of `Some(Object::Empty)`. * tamer/src/ir/asg/object.rs: Adjust state machine graphic. (Empty): Remove variant. (Missing): Remove reference to variance. * tamer/src/lib.rs: Import `lazy_static` for test builds. * tamer/obj/xmle/writer/writer.rs (Section::iter): Remove `Object::Empty` from documentation. (test::): Remove references to `Object::Missing`. `lazy_static!` used here. * tamer/obj/xmle/writer/xmle.rs (test::write_section_catch_missing): Replace reference to `Object::Missing`.	2020-03-19 15:42:06 -04:00
Mike Gerwitz	bfea768f89	Copyright year 2020 update	2020-03-06 11:05:18 -05:00
Mike Gerwitz	b89408e5bb	TAMER: Extract quick_xml event-related mocks	2020-02-26 10:49:01 -05:00
Mike Gerwitz	bcc2ab1221	TAMER: Initial abstract semantic graph (ASG) This begins to introduce the ASG, backed by Petgraph. The API will continue to evolve, and Petgraph will likely be encapsulated so that our implementation can vary independently from it (or even remove it in the future).	2020-02-26 10:48:59 -05:00
Mike Gerwitz	a929c8cae4	TAMER: xmlo reader This introduces the reader for xmlo files produced by the XSLT-based compiler. It is an initial implementation but is not complete; see future commits.	2020-02-25 16:46:25 -05:00
Mike Gerwitz	e4e0089815	TAMER: Initial string interning abstraction This is missing two key things that I'll add shortly: a HashMap-based one for use in the ASG for node mapping, and an entry-based system for manipulations. This has been a nice start for exploring various aspects of Rust development, as well as conventions that I'd like to implement. In particular: - Robust documentation intended to guide people through learning the necessary material about the compiler, as well as related work to rationalize design decisions; - Benchmarks; - TDD; - And just getting used to Rust in general. I've beat this one to death, so I'll commit this and make smaller changes going forward to show how easily it can evolve. (This module was originally named `intern` but this commit and those that follow rewrote it to `sym`.)	2020-02-24 14:56:28 -05:00
Mike Gerwitz	8374541965	tamer: Initial basic POC with no XML output This is garbage code. Do not use it. It is intentionally throwaway. While I've researched Rust, I haven't actually _used_ it for a project, so this is a combination of me exploring various ways of accomplishing the problem and forcing myself to learn certain aspects of the language. I'll likely be using petgraph, and this also currently lacks symbol abstractions. This commit also performs far too much heap allocation copying strings around. But it _does_ perform the topological sort. Since this only stores the symbol name, it lacks enough information about the symbol to perform a proper linking.	2019-12-02 10:00:53 -05:00
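
Commit d045786cfb wraps an element's attribute list in `Option` so that it can be initialized lazily on first push, attached after parsing, or taken from its `Element`, and notes that `Option` adds no size cost to `Vec`. The following is a minimal sketch of that idea, not TAMER's actual code: the `Attr` and `Element` definitions and the helpers `push_attr`, `take_attrs`, and `option_vec_is_free` are hypothetical stand-ins. The size equality relies on the compiler's niche optimization for `Vec`, which holds on current rustc.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for an attribute; not TAMER's actual type.
#[derive(Debug, Clone, PartialEq)]
pub struct Attr {
    pub name: String,
    pub value: String,
}

/// Hypothetical element whose attribute list is allocated lazily.
#[derive(Debug, Default, PartialEq)]
pub struct Element {
    /// `None` until the first attribute is pushed or a list is attached.
    pub attrs: Option<Vec<Attr>>,
}

impl Element {
    /// Push an attribute, creating the list on first use.
    pub fn push_attr(&mut self, attr: Attr) {
        self.attrs.get_or_insert_with(Vec::new).push(attr);
    }

    /// Take the attribute list out of the element, leaving `None` behind.
    pub fn take_attrs(&mut self) -> Option<Vec<Attr>> {
        self.attrs.take()
    }
}

/// Reports whether `Option<Vec<Attr>>` occupies the same space as
/// `Vec<Attr>` (the compiler stores `None` in an otherwise-invalid
/// bit pattern of the `Vec`, so no extra tag is needed).
pub fn option_vec_is_free() -> bool {
    size_of::<Option<Vec<Attr>>>() == size_of::<Vec<Attr>>()
}
```

An element built with `Element::default()` starts with `attrs: None`; an independently parsed list could be attached later by assigning `Some(list)` to `attrs`.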

32 Commits (b90bf9d8a8aad714bdbb989813df1652a337d583)
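
The thread-local global interner described in commit 9deb393bfd can be sketched roughly as follows. This is an illustration under stated assumptions, not TAMER's implementation: only the name `SymbolId` and the `"foo".intern()` call style come from the commit message. The `Interner` struct, the `Intern` trait, `lookup_str`, the fixed 32-bit index, and the use of `Box::leak` in place of an arena are all hypothetical simplifications.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Small, copyable symbol identifier (assumed here to be 32 bits wide).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SymbolId(u32);

/// Hypothetical interner mapping strings to ids and back.
#[derive(Default)]
struct Interner {
    map: HashMap<&'static str, SymbolId>,
    strings: Vec<&'static str>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.map.get(s) {
            return id; // already interned: reuse the existing id
        }
        // Leaking gives the string a `'static` lifetime; this stands in
        // for arena allocation and is an assumption of this sketch.
        let stored: &'static str = Box::leak(s.to_owned().into_boxed_str());
        let id = SymbolId(self.strings.len() as u32);
        self.strings.push(stored);
        self.map.insert(stored, id);
        id
    }

    fn lookup(&self, id: SymbolId) -> &'static str {
        self.strings[id.0 as usize]
    }
}

thread_local! {
    // One interner per thread, so callers never pass an interner around.
    static INTERNER: RefCell<Interner> = RefCell::new(Interner::default());
}

/// Hypothetical extension trait providing `"foo".intern()`.
pub trait Intern {
    fn intern(&self) -> SymbolId;
}

impl Intern for str {
    fn intern(&self) -> SymbolId {
        INTERNER.with(|i| i.borrow_mut().intern(self))
    }
}

impl SymbolId {
    /// Resolve the id back to its string, e.g. for output or error messages.
    pub fn lookup_str(self) -> &'static str {
        INTERNER.with(|i| i.borrow().lookup(self))
    }
}
```

With this shape, processing code holds only `SymbolId` values, which carry no lifetime parameter, and strings are resolved through `lookup_str` only when something must be written out or shown to the user.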