employer/tame - tame - Mike Gerwitz's Forge

Author	SHA1	Message	Date
Mike Gerwitz	eaa8133d21	tamer: diagnose: Introduction of diagnostic system This is a working concept that will continue to evolve. I wanted to start with some basic output before getting too carried away, since there's a lot of potential here. This is heavily influenced by Rust's helpful diagnostic messages, but will take some time to realize a lot of the things that Rust does. The next step will be to resolve line and column numbers, and then possibly include snippets and underline spans, placing the labels alongside them. I need to balance this work with everything else I have going on. This is a large commit, but it converts the existing Error Display impls into Diagnostic. This separation is a bit verbose, so I'll see how this ends up evolving. Diagnostics are tied to Error at the moment, but I imagine in the future that any object would be able to describe itself, error or not, which would be useful in the future both for the Summary Page and for query functionality, to help developers understand the systems they are writing using TAME. Output is integrated into tameld only in this commit; I'll add tamec next. Examples of what this outputs are available in the test cases in this commit. DEV-10935	2022-04-13 15:22:46 -04:00
Mike Gerwitz	cfc7f45bc4	tamer: Remove wip-xmlo-xir-reader This entirely removes the old XmloReader that has since been replaced with a XIR-based reader. I had been holding off on this because the new reader is slower, pending performance optimizations (which I'll do a little later on), however the performance loss is of no practical consideration and only affects the linker, which is still fast. Therefore, it's better to get this old code out of the way to simplify refactoring going forward. In particular, I'm working on the diagnostic system. This is a little sad, in a way---this is some of my first Rust code that I'm deleting. DEV-10935	2022-04-11 16:11:49 -04:00
Mike Gerwitz	f07c0e75be	tamer: tameld (TameldError): Error sum type This aggregates all non-panic errors that can occur during link time, making `Box<dyn Error>` unnecessary. I've been wanting to do this for a long time, so it's nice seeing this come together. This is a powerful tool, in that we know, at compile time, all errors that can occur, and properly report on them and compose them. This method of error composition ensures that all errors have a chance to be handled within their context, though it'll take time to do so in a decent way. This just maintains compatibility with the dynamic dispatch that was previously occurring. This work is being done to introduce the initial diagnostic system, which was really difficult/confusing to do without proper error types at the top level, considering the toplevel is responsible for triggering the diagnostic reporting. The cycle error is in particular going to be interesting once the system is in place, especially once it provides spans in the future, since it will guide the user through the code to understand how the cycle formed. More to come. DEV-10935	2022-04-11 15:15:04 -04:00
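The error composition described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual `TameldError` definition: the names `LinkError`, `SortError`, `sort` and `link` are hypothetical, and the "cycle" check is a toy stand-in (a repeated name).

```rust
use std::fmt;
use std::io;

/// Hypothetical error from one subsystem (here, a toy "sort" step).
#[derive(Debug)]
pub enum SortError {
    /// Names seen before the repeated one was encountered.
    Cycle(Vec<String>),
}

/// Hypothetical sum type aggregating every non-panic link-time error,
/// so that callers do not need `Box<dyn Error>`.
#[derive(Debug)]
pub enum LinkError {
    Io(io::Error),
    Sort(SortError),
}

impl From<io::Error> for LinkError {
    fn from(e: io::Error) -> Self {
        LinkError::Io(e)
    }
}

impl From<SortError> for LinkError {
    fn from(e: SortError) -> Self {
        LinkError::Sort(e)
    }
}

impl fmt::Display for LinkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LinkError::Io(e) => write!(f, "I/O error: {}", e),
            LinkError::Sort(SortError::Cycle(path)) => {
                write!(f, "cycle detected after: {}", path.join(" -> "))
            }
        }
    }
}

impl std::error::Error for LinkError {}

/// Toy stand-in for a sort: a repeated name is reported as a "cycle".
pub fn sort(names: &[&str]) -> Result<Vec<String>, SortError> {
    let mut seen: Vec<String> = Vec::new();
    for name in names {
        if seen.iter().any(|s| s.as_str() == *name) {
            return Err(SortError::Cycle(seen));
        }
        seen.push(name.to_string());
    }
    Ok(seen)
}

/// The `?` operator lifts `SortError` into `LinkError` via `From`.
pub fn link(names: &[&str]) -> Result<usize, LinkError> {
    let sorted = sort(names)?;
    Ok(sorted.len())
}
```

Because every variant is known at compile time, a top-level `match` on `LinkError` is exhaustive, which is what lets the caller report on each error in its own context.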
Mike Gerwitz	a1a4ad3e8e	tamer: Introduce context into XirReader tamec and tameld will now both introduce a `Context` to XIR, which will use it to create spans. Here's an example of an error, now that it's all working well together: $ target/release/tameld --emit xmle -o /dev/null path/to/package.xmlo error: invalid preproc:sym/@dim `9` at [/../path/to/package.xmlo offset 1175451-1175452] A future task will make this human-readable by producing line and column numbers, and perhaps even a snippet (if not now, then eventually). It's exciting to see this coming together finally. DEV-10934	2022-04-08 16:16:23 -04:00
Mike Gerwitz	2e3d94c3d6	tamer: obj::xmlo::reader: Simplify wip-xmlo-xir-reader flagging This removes the flag from most of the code, which also resolves the indentation. Not only was it bothering me, but I don't want (a) every line modified when the module body is hoisted and (b) `rustfmt` to reformat everything when that happens. This means that everything will be built, even though it's not used, when the flag is off, but I see that as a good thing. DEV-10863	2022-03-24 09:45:59 -04:00
Mike Gerwitz	fbf786086a	tamer: parse::Parser (lower_while_ok): New method This introduces a WIP lowering operation, abstracting away quite a bit of the manual wiring work, which is really important to providing an API that provides the proper level of abstraction for actually understanding what the system is doing. This does not yet have tests associated with it---I had started, but it's a lot of work and boilerplate for something that is going to evolve. Generally, I wouldn't use that as an excuse, but the robust type definitions in play, combined with the tiny amount of actual logic, provide a pretty high level of confidence. It's very difficult to wire these types together and produce something incorrect without doing something obviously bad. Similarly, I'm holding off on proper docs too, though I did write some information here. More to come, after I actually get to work on the XmloReader. On a side note: I'm happy to have made progress on this, since this wiring is something I've been dreading and wondering about since before the Parser abstraction even existed. Note also that this makes parser::feed_toks private again---I don't intend to support push parsers yet, since they're only needed internally. Maybe for error recovery, but I'll wait to decide until it's actually needed. DEV-10863	2022-03-23 14:31:16 -04:00
Mike Gerwitz	b4a7591357	tamer: obj::xmlo::reader: Begin conversion to ParseState This begins to transition XmloReader into a ParseState. Unlike previous changes where ParseStates were composed into a single ParseState, this is instead a lowering operation that will take the output of one Parser and provide it to another. The mess in ld::poc (...which still needs to be refactored and removed) shows the concept, which will be abstracted away. This won't actually get to the ASG in order to test that this works with the wip-xmlo-xir-reader flag on (development hasn't gotten that far yet), but since it type-checks, it should conceptually work. Wiring lowering operations together is something that I've been dreading for months, but my approach of only abstracting after-the-fact has helped to guide a sane approach for this. For some definition of "sane". It's also worth noting that AsgBuilder will too become a ParseState implemented as another lowering operation, so: XIR -> XIRF -> XMLO -> ASG These steps will all be streaming, with iteration happening only at the topmost level. For this reason, it's important that ASG not be responsible for doing that pull, and further we should propagate Parsed::Incomplete rather than filtering it out and looping an indeterminate number of times outside of the toplevel. One final note: the choice of 64 for the maximum depth is entirely arbitrary and should be more than generous; it'll be finalized at some point in the future once I actually evaluate what maximum depth is reasonable based on how the system is used, with some added growing room. DEV-10863	2022-03-22 14:06:52 -04:00
Mike Gerwitz	4c5b860195	tamer: Remove Ix generic from ASG This is simply not worth it; the size is not going to be the bottleneck (at least any time soon) and the generic not only pollutes all the things that will use ASG in the near future, but is also incompatible with the SymbolId default that is used everywhere; if we have to force it to 32 bits anyway, then we may as well just default it right off the bat. I thought that this seemed like a good idea at the time, and saving bits is certainly tempting, but it was premature.	2022-01-14 10:21:49 -05:00
Mike Gerwitz	61f7a12975	tamer: xir::tree: Integrate AttrParserState into Stack Note that AttrParse{r=>}State needs renaming, and Stack will get a better name down the line too. This commit message is accurate, but confusing. This performs the long-awaited task of trying to observe, concretely, how to combine two automata. This has the effect of stitching together the state machines, such that the union of the two is equivalent to the original monolith. The next step will be to abstract this away. There are some important things to note here. First, this introduces a new "dead" state concept, where here a dead state is defined as an _accepting_ state that has no state transitions for the given input token. This is more strict than a dead state as defined in, for example, the Dragon Book, where backtracking may occur. The reason I chose for a Dead state to be accepting is simple: it represents a lookahead situation. It says, "I don't know what this token is, but I've done my job, so it may be useful in a parent context". The "I've done my job" part is only applicable in an accepting state. If the parser is _not_ in an accepting state, then an unknown token is simply an error; we should _not_ try to backtrack or anything of the sort, because we want only a single token of lookahead. The reason this was done is because it's otherwise difficult to compose the two parsers without requiring that AttrEnd exist in every XIR stream; this has always been an awkward delimiter that was introduced to make the parser LL(0), but I tried to compromise by saying that it was optional. Of course, I knew that decision caused awkward inconsistencies, I had just hoped that those inconsistencies wouldn't manifest in practical issues. Well, now it did, and the benefits of AttrEnd that we had in the previous construction do not exist in this one. Consequently, it makes more sense to simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future commit will remove it entirely. 
All of this information will be documented, but I want to get further in the implementation first to make sure I don't change course again and therefore waste my time on docs. DEV-11268	2021-12-16 09:44:02 -05:00
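The "dead state" idea from the commit above can be sketched as follows, under heavy simplification. The `Tok`, `AttrState`, `Step` and `parse_element` names are hypothetical and much smaller than the real XIR types. The attribute parser hands an unrecognized token back to its parent only when it is in an accepting state; otherwise the token is an error.

```rust
/// Hypothetical, heavily simplified token stream.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Open(&'static str),
    AttrName(&'static str),
    AttrValue(&'static str),
    Close,
}

/// Attribute parser states: `Empty` is accepting, `Name` is not.
pub enum AttrState {
    Empty,
    Name(&'static str),
}

/// Result of feeding one token to the attribute parser.
pub enum Step {
    Incomplete,
    Attr(&'static str, &'static str),
    /// Accepting state with no transition: return the token as lookahead.
    Dead(Tok),
}

impl AttrState {
    pub fn feed(self, tok: Tok) -> Result<(AttrState, Step), String> {
        match (self, tok) {
            (AttrState::Empty, Tok::AttrName(n)) => {
                Ok((AttrState::Name(n), Step::Incomplete))
            }
            (AttrState::Name(n), Tok::AttrValue(v)) => {
                Ok((AttrState::Empty, Step::Attr(n, v)))
            }
            // "I've done my job": let the parent context decide.
            (AttrState::Empty, tok) => Ok((AttrState::Empty, Step::Dead(tok))),
            // Not accepting: an unknown token is simply an error.
            (AttrState::Name(n), tok) => {
                Err(format!("expected value for @{}, found {:?}", n, tok))
            }
        }
    }
}

type Attrs = Vec<(&'static str, &'static str)>;

/// Parent parser: delegates to `AttrState` until it reports `Dead`,
/// then handles that single token of lookahead itself.
pub fn parse_element(toks: Vec<Tok>) -> Result<(&'static str, Attrs), String> {
    let mut iter = toks.into_iter();
    let name = match iter.next() {
        Some(Tok::Open(name)) => name,
        other => return Err(format!("expected Open, found {:?}", other)),
    };

    let mut attrs = Vec::new();
    let mut state = AttrState::Empty;
    for tok in iter {
        let (next, step) = state.feed(tok)?;
        state = next;
        match step {
            Step::Incomplete => {}
            Step::Attr(k, v) => attrs.push((k, v)),
            Step::Dead(Tok::Close) => return Ok((name, attrs)),
            Step::Dead(tok) => return Err(format!("unexpected {:?}", tok)),
        }
    }
    Err("unexpected end of input".to_string())
}
```

Note that no `AttrEnd`-style delimiter is needed in the token stream: the first token the attribute parser cannot handle marks the end of the attribute list.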
Mike Gerwitz	77c18d0615	tamer: xir: Remove Attr::Extensible This removes XIRT support for attribute fragments. The reason is that this is a write-only operation---fragments are used to concatenate SymbolIds without reallocation, which can only happen if we are generating XIR internally. Given that this cannot happen during read, it was a mistake to complicate the parsers. But it makes sense why I did originally, given that the XIRT parser was written for simplifying test cases. But now that we want parsers for real, and are writing production-quality parsers, this extra complexity is very undesirable. As a bonus, we also avoid any potential for heap allocations related to attributes. Granted, they didn't _really_ exist to begin with, but it was part of XIRT, and was ugly. DEV-11268	2021-12-06 14:26:58 -05:00
Mike Gerwitz	f519dab2b6	tamer: xir::tree::attr::Attr::value_atom: Option<SymbolId>=>SymbolId To maintain a proper abstraction, this cannot be the responsibility of the caller; most callers should not know that fragments exist, let alone how to handle them.	2021-11-16 12:41:03 -05:00
Mike Gerwitz	5233822322	tamer: xir: Remove Text enum Like previous commits, this replaces the explicit escaping context with the convention that all values retrieved from `xir` are unescaped on read and escaped on write. Comments are a notable TODO, since we must escape only `--`. CData is also an issue. I had _expected_ to use it as a means to avoid unescaping fragments, but I had forgotten that quick_xml hard-codes escaping on read, so that it can re-use BytesStart! That is terribly unfortunate, and may result in us having to re-implement our own read method in the future to avoid this nonsense. So I'm just leaving it as a TODO for now. DEV-11081	2021-11-15 23:47:14 -05:00
Mike Gerwitz	d710437ee4	tamer: xir::escape::CachingEscaper: New Escaper As promised, this will cache previously seen escaped/unescaped values by creating a two-way mapping between them. DEV-11081	2021-11-15 16:44:24 -05:00
Mike Gerwitz	27ba03b59b	tamer: xir::escape: Remove XirString in favor of Escaper This rewrites a good portion of the previous commit. Rather than explicitly storing whether a given string has been escaped, we can instead assume that all SymbolIds leaving or entering XIR are unescaped, because there is no reason for any other part of the system to deal with such details of XML documents. Given that, we need only unescape on read and escape on write. This is customary, so why didn't I do that to begin with? The previous commit outlines the reason, mainly being an optimization for the echo writer that is upcoming. However, this solution will end up being better---it's not implemented yet, but we can have a caching layer, such that the Escaper records a mapping between escaped and unescaped SymbolIds to avoid work the next time around. If we share the Escaper between _all_ readers and the writer, the result is that 1. Duplicate strings between source files and object files (many of which are read by both the linker and compiler) avoid re-unescaping; and 2. Writers can use this cache to avoid re-escaping when we've already seen the escaped variant of the string during read. The alternative would be a global cache, like the internment system, but I did not find that to be appropriate here, since this is far less fundamental and is much easier to compose. DEV-11081	2021-11-12 14:03:23 -05:00
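The caching layer described in the commit above might look roughly like this. It is a sketch under assumptions: plain `String` keys stand in for `SymbolId`, and `CachingEscaperSketch`, `record` and `unescaped_of` are hypothetical names rather than the real `Escaper` API.

```rust
use std::collections::HashMap;

/// Uncached escaping of the five XML special characters.
fn escape_xml(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical escaper keeping a two-way escaped/unescaped mapping.
#[derive(Default)]
pub struct CachingEscaperSketch {
    to_escaped: HashMap<String, String>,
    to_unescaped: HashMap<String, String>,
    /// Number of times real escaping work had to be done.
    pub misses: usize,
}

impl CachingEscaperSketch {
    /// Record a pair in both directions, e.g. after a reader unescapes.
    pub fn record(&mut self, unescaped: &str, escaped: &str) {
        self.to_escaped
            .insert(unescaped.to_string(), escaped.to_string());
        self.to_unescaped
            .insert(escaped.to_string(), unescaped.to_string());
    }

    /// Escape for writing, reusing a previously seen escaped variant.
    pub fn escape(&mut self, unescaped: &str) -> String {
        if let Some(hit) = self.to_escaped.get(unescaped) {
            return hit.clone();
        }
        self.misses += 1;
        let escaped = escape_xml(unescaped);
        self.record(unescaped, &escaped);
        escaped
    }

    /// Look up the unescaped variant of an escaped string, if cached.
    pub fn unescaped_of(&self, escaped: &str) -> Option<&str> {
        self.to_unescaped.get(escaped).map(|s| s.as_str())
    }
}
```

Sharing one such value between the readers and the writer is what allows a string seen during read to skip re-escaping on write.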
Mike Gerwitz	b1c0783c75	tamer: xir::XirString: WIP implementation (likely going away) I'm not fond of this implementation, which is why it's not fully completed. I wanted to commit this for future reference, and take the opportunity to explain why I don't like it. First: this task started as an idea to implement a third variant to AttrValue and friends that indicates that a value is fixed, in the sense of a fixed-point function: escaped or unescaped, its value is the same. This would allow us to skip wasteful escape/unescape operations. In doing so, it became obvious that there's no need to leak this information through the API, and indeed, no part of the system should care. When we read XML, it should be unescaped, and when we write, it should be escaped. The reason that this didn't quite happen to begin with was an optimization: I'll be creating an echo writer in place of the current filesystem-based copy in tamec shortly, and this would allow streaming XIR directly from the reader to the writer without any unescaping or re-escaping. When we unescape, we know the value that it came from, so we could simply store both symbols---they're 32-bit, so it results in a nicely compressed 64-bit value, so it's essentially cost-free, as long as we accept the expense of internment. This is `XirString`. Then, when we want to escape or unescape, we first check to see whether a symbol already exists and, if so, use it. While this works well for echoing streams, it won't work all that well in practice: the unescaped SymbolId will be taken and the XirString discarded, since nothing after XIR should be coupled with it. Then, when we later construct a XIR stream for writing, XirString will no longer be available and our previously known escape is lost, so the writer will have to re-escape. Further, if we look at XirString's generic for the XirStringEscaper---it uses phantom, which hints that maybe it's not in the best place. 
Indeed, I've already acknowledged that only a reader unescapes and only a writer escapes, and that the rest of the system works with normal (unescaped) values, so only readers and writers should be part of this process. I also already acknowledged that XirString would be lost and only the unescaped SymbolId would be used. So what's the point of XirString, then, if it won't be a useful optimization beyond the temporary echo writer? Instead, we can take the XirStringWriter and implement two caches on that: mapping SymbolId from escaped->unescaped and vice-versa. These can be simple vectors, since SymbolId is a 32-bit value we will not have much wasted space for symbols that never get read or written. We could even optimize for preinterned symbols using markers, though I'll probably not do so, and I'll explain why later. If we do _that_, we get even _better_ optimizations through caching that _will_ apply in the general case (so, not just for echo), and we're able to ditch XirString entirely and simply use a SymbolId. This makes for a much more friendly API that isn't leaking implementation details, though it _does_ put an onus on the caller to pass the encoder to both the reader and the writer, _if_ it wants to take advantage of a cache. But that burden is not significant (and is, again, optional if we don't want it). So, that'll be the next step.	2021-11-10 12:22:10 -05:00
Mike Gerwitz	428d508be4	tamer: {ir::=>}{asg, xir} See the previous commit. There is no sense in some common "IR" namespace, since those IRs should live close to whatever system whose data they represent. In the case of these, they are general IRs that can apply to many different parts of the system. If that proves to be a false statement, they'll be moved. DEV-10863	2021-11-04 16:13:27 -04:00
Mike Gerwitz	cee6402f8b	tamer: Move {ir::legacyir=>obj::xmlo::legacyir} The IRs really ought to live where they are owned, especially given that "IR" is so generic that it makes no sense for there to be a single location for them; they're just data structures coupled with different phases of compilation. This will be renamed next commit; see that for details. This also removes some documentation describing the lowering process, because it's undergone a number of changes and needs to be accurately re-summarized in another location. That will come at a later time after the work is further along so that I don't have to keep spending the time rewriting it. DEV-10863	2021-11-04 13:20:38 -04:00
Mike Gerwitz	d045786cfb	tamer: ir::xir::tree::Element::attrs: Wrap in Option This allows AttrList not only to be lazily initialized (which is less of a problem at the moment with Vec, but may become one in the future), but also leaves a space open for attributes to be added _after_ having been parsed. It further leaves room to _take_ attributes from their `Element`. This is important because the next commit will re-introduce the ability to parse attributes independently, allowing us to put the parser in a state where we can parse AttrList without an Element context. To re-use that parsing under an Element context, we can simply attach an AttrList after it has been parsed. Option adds no additional size cost to Vec, so we get this for free (except for the tiny change that initializes the attribute list when we try to push to it). I also think this reads better ("attrs: None"). Though it makes the API slightly more of a pain to work with. DEV-10863	2021-10-29 16:34:05 -04:00
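The claim above that `Option` adds no size cost to `Vec` can be checked directly, and the lazy-initialization and take behaviors are easy to sketch. The `Element` type here is a hypothetical stand-in with `String` attributes; the size equality is what current rustc produces via niche optimization rather than a documented language guarantee.

```rust
use std::mem::size_of;

type AttrList = Vec<(String, String)>;

/// Hypothetical stand-in for an element with optional attributes.
pub struct Element {
    pub name: String,
    pub attrs: Option<AttrList>,
}

impl Element {
    /// Lazily create the attribute list on first push.
    pub fn push_attr(&mut self, key: &str, value: &str) {
        self.attrs
            .get_or_insert_with(Vec::new)
            .push((key.to_string(), value.to_string()));
    }

    /// Take the attributes out of the element, leaving `None` behind.
    pub fn take_attrs(&mut self) -> Option<AttrList> {
        self.attrs.take()
    }
}

/// True when wrapping the list in `Option` costs no extra space.
pub fn option_vec_is_free() -> bool {
    size_of::<Option<AttrList>>() == size_of::<AttrList>()
}
```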
Mike Gerwitz	18ab032ba0	tamer: Begin XIR-based xmlo reader impl There isn't a whole lot here, but there is additional work needed in various places to support upcoming changes and so I want to get this committed to ease the cognitive burden of what I have thus far. And to stop stashing. We have a feature flag for a reason. DEV-10863	2021-10-28 21:21:30 -04:00
Mike Gerwitz	581b9d4e65	tamer: Use `..` for tuple unimportant variant matches Tbh, I was unaware that this was supported by tuple variants until reading over the Rustc source code for something. (Which I had previously read, but I must have missed it.) This is more proper, in the sense that in a lot of cases we not only care about how many values a tuple has, but if we explicitly match on them using `_`, then any time we modify the number of values, it would _break_ any code doing so. Using this method, we improve maintainability by not causing breakages under those circumstances. But, consequently, it's important that we use this only when we _really_ don't care and don't want to be notified by the compiler. I did not use `..` as a prefix, even where supported, because the intent is to append additional information to tuples. Consequently, I also used `..` in places where no additional fields currently exist, since they may in the future (e.g. introducing `Span` for `IdentObject`).	2021-10-15 12:28:59 -04:00
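As a small illustration of the `..` pattern discussed in the commit above, consider the following sketch. The enum is a hypothetical, cut-down stand-in for `IdentObject`, with made-up fields.

```rust
/// Hypothetical, simplified identifier object.
pub enum IdentObject {
    Missing(String),
    /// (name, kind); more fields (e.g. a span) may be appended later.
    Ident(String, u8),
}

/// Only the name matters here, so trailing fields are ignored with `..`.
/// Appending a field to either variant will not break this match.
pub fn name(obj: &IdentObject) -> &str {
    match obj {
        // `..` is valid even when no further fields exist yet.
        IdentObject::Missing(name, ..) => name.as_str(),
        IdentObject::Ident(name, ..) => name.as_str(),
    }
}
```

Had the second arm been written as `IdentObject::Ident(name, _)`, adding a third field would turn it into a compile error, which is the right choice only when the code really does need to be revisited.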
Mike Gerwitz	739cf7e6eb	tamer: ir::asg::object::IdentObject: Define methods from IdentObjectData In particular, `name` needn't return an `Option`. `fragment` also returns a copy, since it's just a `SymbolId`. (It really ought to be a newtype rather than an alias, but we'll worry about that some other time.) These changes allow us to remove some runtime panics. DEV-10859	2021-10-14 14:38:02 -04:00
Mike Gerwitz	f055cb77c2	tamer: ld::xmle: Narrow Sections types This moves the logic that sorts identifiers into sections into Sections itself, and introduces XmleSections to allow for mocking for testing. This then allows us to narrow the types significantly, eliminating some runtime checks. The types can be narrowed further, but I'll be limiting the work I'll be doing now; this'll be inevitably addressed as we use the ASG for the compiler. This also handles moving Sections tests, which was a TODO from the previous commit. DEV-10859	2021-10-14 12:40:13 -04:00
Mike Gerwitz	ea11cf1416	tamer: ld::xmle::lower: Extract sectioning into Sections This is the appropriate place to be, now that we've begun narrowing the types. We'll be able to do so further; this is just the first step. This does not yet move the tests, but the code is still tested because it's tightly coupled with `sort`. Those will move in the next commit(s). DEV-10859	2021-10-12 12:15:11 -04:00
Mike Gerwitz	08d92ca663	tamer: ld::xmle::sections: Remove generic object type xmle sections will only ever contain an object of one type, so there is no use in making this generic. I think the original plan was to have this represent, generically, sections of some object file (like ELF), but doing so would require a significant redesign anyway, so it makes no sense. This is easier to reason about. DEV-10859	2021-10-12 10:35:14 -04:00
Mike Gerwitz	df328da71f	tamer: ir::asg::SortableAsg: Move into ld::xmle::lower This has always been a lowering operation, but it was not phrased in terms of it, which made the process a bit more confusing to understand. The implementation hasn't changed, but this is an incremental refactoring and so exposes BaseAsg and its `graph` field temporarily. DEV-10859	2021-10-12 09:49:33 -04:00
Mike Gerwitz	81ec65742a	tamer: {ir::asg=>ld::xmle}::section Sections, as written, are specific to xmle files. I think the intent originally was to have this be more generic, but that doesn't really make sense. By explicitly coupling it with `xmle` files, that will allow us to turn this into a proper lowering operation with its own validations that will allow `xmle::xir` to do its job without having to validate anything itself.	2021-10-12 00:05:44 -04:00
Mike Gerwitz	1c181b568d	tamer: ld::poc: Update comment reflecting current state The linker is feature-complete, but this file has lived on because the project was on pause for quite some time.	2021-10-11 23:54:24 -04:00
Mike Gerwitz	f899ac898e	tamer: {obj=>ld}::xmle This is a linker-specific module.	2021-10-11 23:52:59 -04:00
Mike Gerwitz	5ea5cffd09	tamer: relroot String->SymbolId This was [one of] the last remaining Strings; SymbolId should be used across the board.	2021-10-11 16:00:19 -04:00
Mike Gerwitz	85909f1590	tamer: sym::SymbolStr: Remove This removes `SymbolStr` in favor of, simply, `&'static str`. The abstraction provided no additional safety since the slice was trivially extracted (and commonly, in practice), and was inconvenient to work with. This is part of a process of relaxing lookups so that symbols can be conveniently displayed in errors; rather than trying to prevent the developer from doing something bad, we'll just rely on conventions, hope that it doesn't happen, and if it does, address it either at that time or when it shows up in the profiler.	2021-10-11 12:58:48 -04:00
Mike Gerwitz	3e385d1a1b	tamer: obj::xmle::xir: Finalize docs This could be improved upon, but there will be more work coming up for this to finalize Sections. DEV-10561	2021-10-11 11:43:49 -04:00
Mike Gerwitz	f70f5653b2	tamer: ir::asg::section: Head and tail can have only one object This is the beginning of a refactoring to simplify this implementation a little bit.	2021-10-09 00:27:03 -04:00
Mike Gerwitz	0626629cb3	tamer: Remove old xmle writer and wip-xir-xmle-writer flag The new writer has reached parity of the old, with the exception of some edge case explicit error handling that should never occur (which will be added), and cleanup/docs. Removing this flag now allows me to perform that cleanup without having to worry about updating the now-old implementation. I ran `tameld` with the new writer against our production system with numerous programs and a significant number of test cases, and diff'd the old and new xmle files, and everything looks good.	2021-10-08 22:04:42 -04:00
Austin Schaffer	d54ef62a0d	Fix import ordering	2021-10-04 17:15:02 -04:00
Mike Gerwitz	1a44e04333	tamer: ld: Write is unused outside of flag	2021-10-04 16:34:25 -04:00
Mike Gerwitz	5250571f15	tamer: ir::asg::ident: Use symbols in place of string slice mapping `IdentKind` needs to be written to `xmle` files and displayed in error messages. String slices were used when quick-xml was used for writing, which will be going away with the new writer.	2021-09-29 23:18:23 -04:00
Mike Gerwitz	6864fbc1cd	tamer: Start of XIR-based xmle writer This has been a long time coming, and has been repeatedly stashed as other parts of the system have evolved to support it. The introduction of the XIR tree was to write tests for this (which are sloppy atm). This currently writes out the `xmle` header and _most_ of the `l:dep` section; it's missing the object-type-specific attributes. There is, relatively speaking, not much more work to do here. The feature flag `wip-xir-xmle-writer` was introduced to toggle this system in place of `XmleWriter`. Initial benchmarks show that it will be competitive with the quick-xml-based writer, but remember that is not the goal: the purpose of this is to test XIR in a production system before we continue to implement it for a frontend, and to refactor so that we do not have multiple implementations writing XML files (once we echo the source XML files). I'm excited to get this done with so that I can move on. This has been rather exhausting.	2021-09-28 14:52:53 -04:00
Mike Gerwitz	e91aeef478	tamer: Remove Ix generalization throughout system This had the writing on the wall all the same as the `'i` interner lifetime that came before it. It was too much of a maintenance burden trying to accommodate both 16-bit and 32-bit symbols generically. There is a situation where we do still want 16-bit symbols---the `Span`. Therefore, I have left generic support for symbol sizes, as well as the different global interners, but `SymbolId` now defaults to 32-bit, as does `Asg`. Further, the size parameter has been removed from the rest of the code, with the exception of `Span`. This cleans things up quite a bit, and is much nicer to work with. If we want 16-bit symbols in the future for packing to increase CPU cache performance, we can handle that situation then in that specific case; it's a premature optimization that's not at all worth the effort here.	2021-09-23 14:52:54 -04:00
Mike Gerwitz	0a8fb71c1b	tamer: tameld: Use buffered writes This was an oversight. The difference is significant. I had my suspicions about this when I noticed the huge difference in time between writing to /dev/null vs. an actual file during profiling. On one of our systems, here's the number of syscalls _before_ this change:

  $ strace -c target/release/tameld --emit xmle -o foo foo.xmlo
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   85.05    4.966192          16    318473           write
    7.23    0.421977          13     32298           lstat
    6.53    0.381424          15     25113           read
    0.75    0.043691          13      3350           readlink
    0.25    0.014713          61       241           close
    0.12    0.007167          30       241           openat
    0.05    0.003175         151        21           munmap
    0.01    0.000488          14        35           brk
    0.01    0.000292           9        33           mmap
    0.00    0.000266          38         7           mremap
    0.00    0.000004           1         3           sigaltstack
    0.00    0.000000           0         6           fstat
    0.00    0.000000           0         1           poll
    0.00    0.000000           0        11           mprotect
    0.00    0.000000           0         7           rt_sigaction
    0.00    0.000000           0         1           rt_sigprocmask
    0.00    0.000000           0         6         6 access
    0.00    0.000000           0         1           execve
    0.00    0.000000           0         1           arch_prctl
    0.00    0.000000           0         1           sched_getaffinity
    0.00    0.000000           0         1           set_tid_address
    0.00    0.000000           0         1           set_robust_list
    0.00    0.000000           0         2           prlimit64
  ------ ----------- ----------- --------- --------- ----------------
  100.00    5.839389                379854         6 total

And _after_:

  $ strace -c target/release/tameld --emit xmle -o foo foo.xmlo
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   45.21    0.435010          13     32298           lstat
   40.09    0.385752          15     25113           read
    6.14    0.059113          21      2809           write
    4.75    0.045687          14      3350           readlink
    2.51    0.024115         100       241           close
    0.84    0.008045          33       241           openat
    0.26    0.002468         118        21           munmap
    0.06    0.000580          17        35           brk
    0.06    0.000566          17        33           mmap
    0.03    0.000279          40         7           mremap
    0.02    0.000181          16        11           mprotect
    0.01    0.000087          15         6         6 access
    0.01    0.000082          12         7           rt_sigaction
    0.01    0.000075          13         6           fstat
    0.00    0.000027           9         3           sigaltstack
    0.00    0.000024          12         2           prlimit64
    0.00    0.000018          18         1           execve
    0.00    0.000016          16         1           poll
    0.00    0.000013          13         1           sched_getaffinity
    0.00    0.000012          12         1           rt_sigprocmask
    0.00    0.000012          12         1           arch_prctl
    0.00    0.000012          12         1           set_robust_list
    0.00    0.000011          11         1           set_tid_address
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.962185                 64190         6 total

What a difference! There's still a lot of other red flags in there; those can be addressed separately. This was originally written as I was learning Rust, and I suspect that I didn't realize that File wasn't buffered at the time. For the above link: times go from 1.23s pre-change to 0.85s after:

  0.77user 0.44system 0:01.23elapsed 99%CPU (0avgtext+0avgdata 48520maxresident)k
  0inputs+43952outputs (0major+12825minor)pagefaults 0swaps

  0.69user 0.15system 0:00.85elapsed 98%CPU (0avgtext+0avgdata 48396maxresident)k
  0inputs+43952outputs (0major+12823minor)pagefaults 0swaps	2021-08-20 12:14:42 -04:00
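The effect of buffering on the number of underlying writes can be reproduced without a real file. The sketch below is not tameld's code: `CountingWriter` and the helper functions are hypothetical, and the counter stands in for `write` syscalls on a `File`.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink that counts calls to the underlying `write`,
/// standing in for the `write` syscalls made on an unbuffered `File`.
#[derive(Default)]
pub struct CountingWriter {
    pub calls: usize,
    pub bytes: Vec<u8>,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Emit `n` small tokens, as a streaming writer would, then flush.
pub fn write_tokens<W: Write>(mut out: W, n: usize) -> io::Result<W> {
    for _ in 0..n {
        out.write_all(b"<t/>")?;
    }
    out.flush()?;
    Ok(out)
}

/// One underlying write per token when nothing buffers.
pub fn unbuffered_calls(n: usize) -> io::Result<usize> {
    Ok(write_tokens(CountingWriter::default(), n)?.calls)
}

/// With an 8 KiB buffer, small tokens are coalesced before reaching the sink.
pub fn buffered_calls(n: usize) -> io::Result<usize> {
    let sink = CountingWriter::default();
    let out = write_tokens(BufWriter::with_capacity(8192, sink), n)?;
    Ok(out.get_ref().calls)
}
```

For a real output file the same change amounts to wrapping the `File` in `BufWriter::new` before handing it to the writer.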
Mike Gerwitz	9deb393bfd	tamer: Global interners This is a major change, and I apologize for it all being in one commit. I had wanted to break it up, but doing so would have required a significant amount of temporary work that was not worth doing while I'm the only one working on this project at the moment. This accomplishes a number of important things, now that I'm preparing to write the first compiler frontend for TAMER: 1. `Symbol` has been removed; `SymbolId` is used in its place. 2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer. 3. Using symbols no longer requires dereferencing. 4. Lifetimes no longer pollute the entire system! (`'i`) 5. Two global interners are offered to produce `SymbolStr` with `'static` lifetimes, simplifying lifetime management and borrowing where strings are still needed. 6. A nice API is provided for interning and lookups (e.g. "foo".intern()) which makes this look like a core feature of Rust. Unfortunately, making this change required modifications to...virtually everything. And that serves to emphasize why this change was needed: _everything_ used symbols, and so there's no use in not providing globals. I implemented this in a way that still provides for loose coupling through Rust's trait system. Indeed, Rustc offers a global interner, and I decided not to go that route initially because it wasn't clear to me that such a thing was desirable. It didn't become apparent to me, in fact, until the recent commit where I introduced `SymbolIndexSize` and saw how many things had to be touched; the linker evolved so rapidly as I was trying to learn Rust that I lost track of how bad it got. Further, this shows how the design of the internment system was a bit naive---I assumed certain requirements that never panned out. In particular, everything using symbols stored `&'i Symbol<'i>`---that is, a reference (usize) to an object containing an index (32-bit) and a string slice (128-bit). 
So it was a reference to a pretty large value, which was allocated in the arena alongside the interned string itself. But, that was assuming that something would need both the symbol index _and_ a readily available string. That's not the case. In fact, it's pretty clear that interning happens at the beginning of execution, that `SymbolId` is all that's needed during processing (unless an error occurs; more on that below); and it's not until _the very end_ that we need to retrieve interned strings from the pool to write either to a file or to display to the user. It was horribly wasteful! So `SymbolId` solves the lifetime issue in itself for most systems, but it still requires that an interner be available for anything that needs to create or resolve symbols, which, as it turns out, is still a lot of things. Therefore, I decided to implement them as thread-local static variables, which is very similar to what Rustc does itself (Rustc's are scoped). TAMER does not use threads, so the resulting `'static` lifetime should be just fine for now. Eventually I'd like to implement `!Send` and `!Sync`, though, to prevent references from escaping the thread (as noted in the patch); I can't do that yet, since the feature has not yet been stabilized. In the end, this leaves us with a system that's much easier to use and maintain; hopefully easier for newcomers to get into without having to deal with so many complex lifetimes; and a nice API that makes it a pleasure to work with symbols. Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I end up regretting that down the line, but it exists for an important reason: the `Span` and other structures that'll be introduced need to pack a lot of data into 64 bits so they can be freely copied around to keep lifetimes simple without wreaking havoc in other ways, but a 32-bit symbol size needed by the linker is too large for that. 
(Actually, the linker doesn't yet need 32 bits for our systems, but it's going to in the somewhat near future unless we optimize away a bunch of symbols...but I'd really rather not have the linker hit a limit that requires a lot of code changes to resolve). Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid that for now. Most systems can just use one of the `PkgSymbolId` or `ProgSymbolId` type aliases and not have to worry about it. Systems that are actually shared between the compiler and the linker do, though, but it's not like we don't already have a bunch of trait bounds. Of course, as we implement link-time optimizations (LTO) in the future, it's possible most things will need the size and I'll grow frustrated with that and possibly revisit this. We shall see. Anyway, this was exhausting...and...onward to the first frontend!	2021-08-11 14:24:55 -04:00
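The thread-local interner described in the commit above can be illustrated with a minimal sketch. This is not TAMER's implementation: the `SymbolId` here is a fixed 32-bit newtype (the commit describes a configurable 16- or 32-bit index), and the `Interner` struct, the `Intern` trait, and `lookup_str` are hypothetical names chosen for illustration. Strings are leaked to obtain `'static` lifetimes purely to keep the example short; a real interner would more likely allocate from an arena.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical symbol identifier: a small `Copy` index, so passing it
/// around needs neither references nor lifetimes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SymbolId(u32);

/// Hypothetical interner mapping strings to indexes and back.
#[derive(Default)]
struct Interner {
    map: HashMap<&'static str, SymbolId>,
    strings: Vec<&'static str>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.map.get(s) {
            return id;
        }
        // Assumption for brevity: leak the string so it lives for the
        // rest of the program instead of managing an arena.
        let stored: &'static str = Box::leak(s.to_owned().into_boxed_str());
        let id = SymbolId(self.strings.len() as u32);
        self.strings.push(stored);
        self.map.insert(stored, id);
        id
    }

    fn lookup(&self, id: SymbolId) -> &'static str {
        self.strings[id.0 as usize]
    }
}

thread_local! {
    // One interner per thread, reachable without passing it around.
    static INTERNER: RefCell<Interner> = RefCell::new(Interner::default());
}

/// Extension trait so that `"foo".intern()` reads like a built-in.
pub trait Intern {
    fn intern(&self) -> SymbolId;
}

impl Intern for str {
    fn intern(&self) -> SymbolId {
        INTERNER.with(|i| i.borrow_mut().intern(self))
    }
}

impl SymbolId {
    /// Resolve the symbol back to its string, e.g. for output or errors.
    pub fn lookup_str(self) -> &'static str {
        INTERNER.with(|i| i.borrow().lookup(self))
    }
}

fn main() {
    let a = "foo".intern();
    let b = "foo".intern();
    let c = "bar".intern();
    assert_eq!(a, b); // the same string yields the same id
    assert_ne!(a, c);
    println!("{}", c.lookup_str());
}
```

In this sketch, interning happens up front, only the 4-byte id travels through processing, and the string is resolved again only when something must be written or displayed, which matches the usage pattern the commit message describes.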
Mike Gerwitz	0fc8a1a4df	tamer: Remove default SymbolIndex (et al) index type Oh boy. What a mess of a change. This demonstrates some significant issues we have with Symbol. I had originally modelled the system a bit after Rustc's, but deviated in certain regards: 1. This has a configurable base type to enable better packing without bit twiddling and potentially unsafe tricks I'd rather avoid unless necessary; and 2. The lifetime is not static, and there is no global, singleton interner; and 3. I pass around references to a Symbol rather than passing around an index into an interner. For #3---this is done because there's no singleton interner and therefore resolving a symbol requires a direct reference to an available interner. It also wasn't clear to me (and still isn't, in fact) whether more than one interner may be used for different contexts. But, that doesn't preclude removing lifetimes and just passing around indexes; in fact, I plan to do this in the frontend where the parser and such will have direct interner access and can therefore just look up based on a symbol index. We could reserve references for situations where exposing an interner would be undesirable. Anyway, more to come...	2021-07-29 14:26:40 -04:00
Mike Gerwitz	2e50af1220	Copyright year update 2021	2021-07-22 15:00:15 -04:00
Mike Gerwitz	0d4bbe5e4e	[DEV-8000] ir::asg: Introduce SortableAsgError This will be used for the next commit, but this change has been isolated both because it distracts from the implementation change in the next commit, and because it cleans up the code by removing the need for a type parameter on `AsgError`. Note that the sort test cases now use `unwrap` instead of having `{,Sortable}AsgError` support one or the other---this is because that does not currently happen in practice, and there is not supposed to be a hierarchy; they are siblings (though perhaps their name may imply otherwise).	2020-07-01 13:42:14 -04:00
Joseph Frazer	43d00a8268	[DEV-7504] Add GraphML generation We want to be able to build a representation of the dependency graph so we can easily inspect it. We do not want to make GraphML by default. It is better to use a tool. We use "petgraph-graphml".	2020-05-13 08:04:48 -04:00
Mike Gerwitz	0f4b2d75f8	[DEV-7084] TAMER: obj::xmlo: Private inner modules	2020-04-28 11:08:05 -04:00
Mike Gerwitz	549e9ca23b	[DEV-7084] TAMER: AsgBuilderState::new: New constructor	2020-04-28 09:06:25 -04:00
Mike Gerwitz	21a0bdcce1	[DEV-7084] TAMER: AsgBuilderError: Introduce proper error variants This is a union (sum type) of three other error types, plus errors specific to this builder. This commit does a good job demonstrating the boilerplate, as well as a need for additional context (in the case of `IdentKindError`), that we'll want to work on abstracting away.	2020-04-28 09:06:25 -04:00
Mike Gerwitz	ecc2e33ba7	[DEV-7084] TAMER: xmlo::AsgBuilder: Accept XmloResult iterator This flips the API from using XmloWriter as the context to using Asg and consuming anything that can produce XmloResults. This not only makes more sense, but avoids having to create a trait for XmloReader, and simplifies the trait bounds we have to concern ourselves with.	2020-04-28 09:06:25 -04:00
Mike Gerwitz	0f423f3b24	[DEV-7084] TAMER: Simplify path canonicalization This abstracts away the canonicalizer and solves the problem whereby canonicalization was not being performed prior to recording whether a path has been visited. This ensures that multiple relative paths to the same file will be properly recognized as visited.	2020-04-28 09:06:25 -04:00
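The ordering fix described in the commit above (canonicalize first, then record the visit) can be sketched as follows. `VisitedPaths` and its `visit` method are hypothetical names, not taken from TAMER; only `std::fs::canonicalize` and `HashSet::insert` are real standard-library calls.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical visited-set keyed on canonical paths.
#[derive(Default)]
pub struct VisitedPaths {
    seen: HashSet<PathBuf>,
}

impl VisitedPaths {
    /// Canonicalizes `path` *before* recording it, so that different
    /// relative spellings of the same file map to a single entry.
    /// Returns `Ok(true)` if the file had not been visited before.
    pub fn visit(&mut self, path: &Path) -> io::Result<bool> {
        let canonical = fs::canonicalize(path)?;
        Ok(self.seen.insert(canonical))
    }
}

fn main() -> io::Result<()> {
    // Illustrative file name only; any existing file would do.
    let dir = std::env::temp_dir();
    let file = dir.join("visited_paths_example.xmlo");
    fs::write(&file, b"")?;

    let mut visited = VisitedPaths::default();
    let first = visited.visit(&file)?;
    // Same file, reached through a redundant `.` component.
    let second = visited.visit(&dir.join(".").join("visited_paths_example.xmlo"))?;
    println!("first visit new: {first}, second visit new: {second}");

    fs::remove_file(&file)?;
    Ok(())
}
```

Note that `fs::canonicalize` requires the path to exist and also resolves symlinks, which may or may not match what the actual canonicalizer abstraction does.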
Mike Gerwitz	4a7e00c404	[DEV-7084] TAMER: ld::poc: Remove unused fragments arg	2020-04-28 09:06:25 -04:00


90 Commits (a22e8e79f70108dc62c310d41ab7b13740ff9e5c)