employer/tame - tame - Mike Gerwitz's Forge

employer

tame

Author	SHA1	Message	Date
Mike Gerwitz	954b5a2795	Copyright year and name update Ryan Specialty Group (RSG) rebranded to Ryan Specialty after its IPO.	2023-01-20 23:37:30 -05:00
Mike Gerwitz	e6640c0019	tamer: Integrate clippy This invokes clippy as part of `make check` now, which I had previously avoided doing (I'll elaborate on that below). This commit represents the changes needed to resolve all the warnings presented by clippy. Many changes have been made where I find the lints to be useful and agreeable, but there are a number of lints, rationalized in `src/lib.rs`, where I found the lints to be disagreeable. I have provided rationale, primarily for those wondering why I desire to deviate from the default lints, though it does feel backward to rationalize why certain lints ought to be applied (the reverse should be true). With that said, this did catch some legitimage issues, and it was also helpful in getting some older code up-to-date with new language additions that perhaps I used in new code but hadn't gone back and updated old code for. My goal was to get clippy working without errors so that, in the future, when others get into TAMER and are still getting used to Rust, clippy is able to help guide them in the right direction. One of the reasons I went without clippy for so long (though I admittedly forgot I wasn't using it for a period of time) was because there were a number of suggestions that I found disagreeable, and I didn't take the time to go through them and determine what I wanted to follow. Furthermore, it was hard to make that judgment when I was new to the language and lacked the necessary experience to do so. One thing I would like to comment further on is the use of `format!` with `expect`, which is also what the diagnostic system convenience methods do (which clippy does not cover). Because of all the work I've done trying to understand Rust and looking at disassemblies and seeing what it optimizes, I falsely assumed that Rust would convert such things into conditionals in my otherwise-pure code...but apparently that's not the case, when `format!` is involved. I noticed that, after making the suggested fix with `get_ident`, Rust proceeded to then inline it into each call site and then apply further optimizations. It was also previously invoking the thread lock (for the interner) unconditionally and invoking the `Display` implementation. That is not at all what I intended for, despite knowing the eager semantics of function calls in Rust. Anyway, possibly more to come on that, I'm just tired of typing and need to move on. I'll be returning to investigate further diagnostic messages soon.	2023-01-20 23:37:29 -05:00
Mike Gerwitz	1ad2fb1dc8	Copyright year update 2022 RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no "Group"), but I'm not sure if the legal name has been changed yet or not, so I'll wait on that.	2022-05-03 14:14:29 -04:00
Mike Gerwitz	eaa8133d21	tamer: diagnose: Introduction of diagnostic system This is a working concept that will continue to evolve. I wanted to start with some basic output before getting too carried away, since there's a lot of potential here. This is heavily influenced by Rust's helpful diagnostic messages, but will take some time to realize a lot of the things that Rust does. The next step will be to resolve line and column numbers, and then possibly include snippets and underline spans, placing the labels alongside them. I need to balance this work with everything else I have going on. This is a large commit, but it converts the existing Error Display impls into Diagnostic. This separation is a bit verbose, so I'll see how this ends up evolving. Diagnostics are tied to Error at the moment, but I imagine in the future that any object would be able to describe itself, error or not, which would be useful in the future both for the Summary Page and for query functionality, to help developers understand the systems they are writing using TAME. Output is integrated into tameld only in this commit; I'll add tamec next. Examples of what this outputs are available in the test cases in this commit. DEV-10935	2022-04-13 15:22:46 -04:00
Mike Gerwitz	cfc7f45bc4	tamer: Remove wip-xmlo-xir-reader This entirely removes the old XmloReader that has since been replaced with a XIR-based reader. I had been holding off on this because the new reader is slower, pending performance optimizations (which I'll do a little later on), however the performance loss is of no practical consideration and only affects the linker, which is still fast. Therefore, it's better to get this old code out of the way to simplify refactoring going forward. In particular, I'm working on the diagnostic system. This is a little sad, in a way---this is some of my first Rust code that I'm deleting. DEV-10935	2022-04-11 16:11:49 -04:00
Mike Gerwitz	68223cb7d3	tamer: xir::reader: Additional quick-xml error spans There's a bit to unpack here. Some of the spans originate from quick-xml's error handling, but in coming up with test cases to try to trigger errors, I found that quick-xml is far too permissive in what it accepts, and oughtright dangerous in some situations. I feel like the writing is on the wall for quick-xml, but I'll probably wait until replacing `xmlo` with a more efficient format before deciding whether to use a different library or implement parsing ourselves. There's a lot of factors to consider, and a library would have to not only be correct and performant, but provide useful information for span generation. But for now, I have other more important things to work on, like a functioning compiler. So while quick-xml is around, I'll just have to do the best I can to provide a correct parser with useful errors. DEV-10934	2022-04-08 14:54:49 -04:00
Mike Gerwitz	ab181670b5	tamer: xir::reader: Initial introduction of spans This is a large change, and was a bit of a tedious one, given the comprehensive tests. This introduces proper offsets and lengths for spans, with the exception of some quick-xml errors that still need proper mapping. Further, this still uses `UNKNOWN_CONTEXT`, which will be resolved shortly. This also introduces `SpanlessError`, which `Error` explicitly _does not_ implement `From<SpanlessError>` for---this forces the caller to provide a span before the error is compatable with the return value, ensuring that spans will actually be available rather than forgotten for errors. This is important, given that errors are generally less tested than the happy path, and errors are when users need us the most (so, need span information). Further, I had to use pointer arithmetic in order to calculate many of the spans, because quick-xml does not provide enough information. There's no safety considerations here, and the comprehensive unit test will ensure correct behavior if the implementation changes in the future. I would like to introduce typed spans at some point---I made some opinionated choices when it comes to what the spans ought to represent. Specifically, whether to include the `<` or `>` with the open span (depends), whether to include quotes with attribute values (no), and some other details highlighted in the test cases. If we provide typed spans, then we could, knowing the type of span, calculate other spans on request, e.g. to include or omit quotes for attributes. Different such spans may be useful in different situations when presenting information to the user. This also highlights gaps in the tokens emitted by XIR, such as whitespace between attributes, the `=` between name and value, and so on. These are important when it comes to code formatting, so that we can reliably reconstruct the XML tree, but it's not important right now. I anticipate future changes would allow the XIR reader to be configured (perhaps via generics, like a strategy-type pattern) to optionally omit these tokens if desired. Anyway, more to come. DEV-10934	2022-04-08 13:59:37 -04:00
Mike Gerwitz	99aacaf7ca	tamer: tamec: Replace copy with XIR parsing/writing When wip-frontends is on, this will parse the input file using XIR and then immediately output it again. This makes the necessary changes to be able to read every source file we have in our largest project, such that the output is identical after having been formatted with `xmllint --format -` (there are differences because e.g. whitespace between attributes is not yet maintained). This is performant too, with times remaining essentially identical despite the additional work. DEV-10413	2022-04-07 12:13:49 -04:00
Mike Gerwitz	b1c0783c75	tamer: xir::XirString: WIP implementation (likely going away) I'm not fond of this implementation, which is why it's not fully completed. I wanted to commit this for future reference, and take the opportunity to explain why I don't like it. First: this task started as an idea to implement a third variant to AttrValue and friends that indicates that a value is fixed, in the sense of a fixed-point function: escaped or unescaped, its value is the same. This would allow us to skip wasteful escape/unescape operations. In doing so, it became obvious that there's no need to leak this information through the API, and indeed, no part of the system should care. When we read XML, it should be unescaped, and when we write, it should be escaped. The reason that this didn't quite happen to begin with was an optimization: I'll be creating an echo writer in place of the current filesystem-based copy in tamec shortly, and this would allow streaming XIR directly from the reader to the writer without any unescaping or re-escaping. When we unescape, we know the value that it came from, so we could simply store both symbols---they're 32-bit, so it results in a nicely compressed 64-bit value, so it's essentially cost-free, as long as we accept the expense of internment. This is `XirString`. Then, when we want to escape or unescape, we first check to see whether a symbol already exists and, if so, use it. While this works well for echoing streams, it won't work all that well in practice: the unescaped SymbolId will be taken and the XirString discarded, since nothing after XIR should be coupled with it. Then, when we later construct a XIR stream for writting, XirString will no longer be available and our previously known escape is lost, so the writer will have to re-escape. Further, if we look at XirString's generic for the XirStringEscaper---it uses phantom, which hints that maybe it's not in the best place. Indeed, I've already acknowledged that only a reader unescapes and only a writer escapes, and that the rest of the system works with normal (unescaped) values, so only readers and writers should be part of this process. I also already acknowledged that XirString would be lost and only the unescaped SymbolId would be used. So what's the point of XirString, then, if it won't be a useful optimization beyond the temporary echo writer? Instead, we can take the XirStringWriter and implement two caches on that: mapping SymbolId from escaped->unescaped and vice-versa. These can be simple vectors, since SymbolId is a 32-bit value we will not have much wasted space for symbols that never get read or written. We could even optimize for preinterned symbols using markers, though I'll probably not do so, and I'll explain why later. If we do _that_, we get even _better_ optimizations through caching that _will_ apply in the general case (so, not just for echo), and we're able to ditch XirString entirely and simply use a SymbolId. This makes for a much more friendly API that isn't leaking implementation details, though it _does_ put an onus on the caller to pass the encoder to both the reader and the writer, _if_ it wants to take advantage of a cache. But that burden is not significant (and is, again, optional if we don't want it). So, that'll be the next step.	2021-11-10 12:22:10 -05:00
Mike Gerwitz	428d508be4	tamer: {ir::=>}{asg, xir} See the previous commit. There is no sense in some common "IR" namespace, since those IRs should live close to whatever system whose data they represent. In the case of these, they are general IRs that can apply to many different parts of the system. If that proves to be a false statement, they'll be moved. DEV-10863	2021-11-04 16:13:27 -04:00

10 Commits (bef68e16340ab5e6abdcf2807e535771d8e98436)