Commit Graph

150 Commits (7d86fdd97d1a6a3264a6963494da4e5ae07fd40c)

Author SHA1 Message Date
Mike Gerwitz c0fa89222e tamer: obj::xmlo::ir::Dim: New enum
This replaces u8 and will be used for the new XmloReader.

Previously I wasn't sure what direction TAMER was going to go in with
regards to dimensionality, but I do not expect that higher dimensions will
be supported, and if they are, they'd very likely compile down to lower ones
and create an illusion of higher-dimensionality.

Whatever the future holds, it's not used today, and I'd rather these types
be correct.

ASG needs changing too, but one step at a time.

DEV-10863
2022-03-25 14:28:18 -04:00
Mike Gerwitz 9d9b1f30a8 tamer: obj::xmlo::reader: Move XmloEvent to top of module
Since this is the object produced by this parser, this is likely the most
useful first thing to present as a summary of what `XmloReader` actually
does.

DEV-10863
2022-03-24 10:14:40 -04:00
Mike Gerwitz 2e3d94c3d6 tamer: obj::xmlo::reader: Simplify wip-xmlo-xir-reader flagging
This removes the flag from most of the code, which also resolves the
indentation.  Not only was it bothering me, but I don't want (a) every line
modified when the module body is hoisted and (b) `rustfmt` to reformat
everything when that happens.

This means that everything will be built, even though it's not used, when
the flag is off, but I see that as a good thing.

DEV-10863
2022-03-24 09:45:59 -04:00
Mike Gerwitz fab7b16ea0 tamer: obj::xmlo::reader: Parse package attributes
Finally we get to do some actual parsing with all of the preparatory work!

This means that we're finally ready to fully replace the old XmloReader,
provided that I'm okay with some boilerplate / lack of abstractions for
now (and I am, because all I've been doing is working on abstractions to
prepare lowering operations).

DEV-10863
2022-03-23 16:48:51 -04:00
Mike Gerwitz fbf786086a tamer: parse::Parser (lower_while_ok): New method
This introduces a WIP lowering operation, abstracting away quite a bit of
the manual wiring work, which is really important to providing an API that
provides the proper level of abstraction for actually understanding what the
system is doing.

This does not yet have tests associated with it---I had started, but it's a
lot of work and boilerplate for something that is going to
evolve.  Generally, I wouldn't use that as an excuse, but the robust type
definitions in play, combined with the tiny amount of actual logic, provide
a pretty high level of confidence.  It's very difficult to wire these types
together and produce something incorrect without doing something obviously
bad.

Similarly, I'm holding off on proper docs too, though I did write some
information here.

More to come, after I actually get to work on the XmloReader.

On a side note: I'm happy to have made progress on this, since this wiring
is something I've been dreading and wondering about since before the Parser
abstraction even existed.

Note also that this makes parser::feed_toks private again---I don't intend
to support push parsers yet, since they're only needed internally.  Maybe
for error recovery, but I'll wait to decide until it's actually needed.

DEV-10863
2022-03-23 14:31:16 -04:00
Mike Gerwitz b4a7591357 tamer: obj::xmlo::reader: Begin conversion to ParseState
This begins to transition XmloReader into a ParseState.  Unlike previous
changes where ParseStates were composed into a single ParseState, this is
instead a lowering operation that will take the output of one Parser and
provide it to another.

The mess in ld::poc (...which still needs to be refactored and removed)
shows the concept, which will be abstracted away.  This won't actually get
to the ASG in order to test that that this works with the
wip-xmlo-xir-reader flag on (development hasn't gotten that far yet), but
since it type-checks, it should conceptually work.

Wiring lowering operations together is something that I've been dreading for
months, but my approach of only abstracting after-the-fact has helped to
guide a sane approach for this.  For some definition of "sane".

It's also worth noting that AsgBuilder will too become a ParseState
implemented as another lowering operation, so:

  XIR -> XIRF -> XMLO -> ASG

These steps will all be streaming, with iteration happening only at the
topmost level.  For this reason, it's important that ASG not be responsible
for doing that pull, and further we should propagate Parsed::Incomplete
rather than filtering it out and looping an indeterminate number of times
outside of the toplevel.

One final note: the choice of 64 for the maximum depth is entirely
arbitrary and should be more than generous; it'll be finalized at some point
in the future once I actually evaluate what maximum depth is reasonable
based on how the system is used, with some added growing room.

DEV-10863
2022-03-22 14:06:52 -04:00
Mike Gerwitz 14638a612f tamer: {xir::=>}parse: Move parser out of XIR
The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863
2022-03-18 16:24:53 -04:00
Mike Gerwitz 0360226caa tamer: xir::parse: Generalize input token type
This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863
2022-03-18 15:26:05 -04:00
Mike Gerwitz 2f703ab2df tamer: obj::xmlo: Remove PackageAttrs in favor of token stream
The Options here are awkward and will be able to go away in the new reader
and in AsgBuilder once it has a proper state machine.

This gets rid of some of the initial migratory work for the new reader,
because PackageAttrs is gone.  I'm going to wait to update this to the new
way until I get further into this.

DEV-11449
2022-03-10 15:44:54 -05:00
Mike Gerwitz d428755a2e tamer: obj::xmlo::XmloEvent::SymDeps: Remove
This is not longer needed after the previous commit.
2022-03-10 13:43:07 -05:00
Mike Gerwitz dcfae8a624 tamer: obj::xmlo: Begin transition to streaming quick-xml reader
I'm finally back to TAMER development.

The original plan, some time ago, was to gate an entirely new XmloReader
behind a feature flag (wip-xmlo-xir-reader), and go from there, leaving the
existing implementation untouched.  Unfortunately, it became too difficult
and confusing to marry the old aggregate API with the new streaming one.

AsgBuilder is the only system interacting with XmloReader, so I decided (see
previous commits) to just go the route of refactoring the existing
one.  I'm not yet sure if I'll continue to progressively refactor this one
and eliminate the two separate implementations behind the flag, or if I'll
get this API similar and then keep the flag and reimplement it.  But I'll
know soon.

DEV-11449
2022-03-10 13:31:24 -05:00
Mike Gerwitz 4c5b860195 tamer: Remove Ix generic from ASG
This is simply not worth it; the size is not going to be the bottleneck (at
least any time soon) and the generic not only pollutes all the things that
will use ASG in the near future, but is also incompatible with the SymbolId
default that is used everywhere; if we have to force it to 32 bits anyway,
then we may as well just default it right off the bat.

I thought that this seemed like a good idea at the time, and saving bits is
certainly tempting, but it was premature.
2022-01-14 10:21:49 -05:00
Mike Gerwitz 5af698d15c tamer: xir::{tree::=>}parse: Move module
It's a bit odd that I've done next to nothing with TAMER for the past week
or so, and decided to do this one small thing before I go on break for the
holidays, but I felt compelled to do _something_.  Besides, this gets me in
a better spot for the inevitable mental planning and writing I'll be doing
over the holidays.

This move was natural, given what this has evolved into---it has nothing to
do with the concept of a "tree", and the modules imports emphasized that
fact given the level of inappropriate nesting.
2021-12-23 13:17:18 -05:00
Mike Gerwitz 0cc0bc9d5a tamer: xir::Token::AttrEnd: Remove
More information can be found in the prior commit message, but I'll
summarize here.

This token was introduced to create a LL(0) parser---no tokens of
lookahead.  This allowed the underlying TokenStream to be freely passed to
the next system that needed it.

Since then, Parser and ParseState were introduced, along with
ParseStatus::Dead, which introduces the concept of lookahead for a single
token---an LL(1) grammar.

I had always suspected that this would happen, given the awkwardness of
AttrEnd; it was just a matter of time before the right abstraction
manifested itself to handle lookahead.

DEV-11339
2021-12-17 10:14:31 -05:00
Mike Gerwitz 61f7a12975 tamer: xir::tree: Integrate AttrParserState into Stack
Note that AttrParse{r=>}State needs renaming, and Stack will get a better
name down the line too.  This commit message is accurate, but confusing.

This performs the long-awaited task of trying to observe, concretely, how to
combine two automata.  This has the effect of stitching together the state
machines, such that the union of the two is equivalent to the original
monolith.

The next step will be to abstract this away.

There are some important things to note here.  First, this introduces a new
"dead" state concept, where here a dead state is defined as an _accepting_
state that has no state transitions for the given input token.  This is more
strict than a dead state as defined in, for example, the Dragon Book, where
backtracking may occur.

The reason I chose for a Dead state to be accepting is simple: it represents
a lookahead situation.  It says, "I don't know what this token is, but I've
done my job, so it may be useful in a parent context".  The "I've done my
job" part is only applicable in an accepting state.

If the parser is _not_ in an accepting state, then an unknown token is
simply an error; we should _not_ try to backtrack or anything of the sort,
because we want only a single token of lookahead.

The reason this was done is because it's otherwise difficult to compose the
two parsers without requiring that AttrEnd exist in every XIR stream; this
has always been an awkward delimiter that was introduced to make the parser
LL(0), but I tried to compromise by saying that it was optional.  Of course,
I knew that decision caused awkward inconsistencies, I had just hoped that
those inconsistencies wouldn't manifest in practical issues.

Well, now it did, and the benefits of AttrEnd that we had in the previous
construction do not exist in this one.  Consequently, it makes more sense to
simply go from LL(0) to LL(1), which makes AttrEnd unnecessary, and a future
commit will remove it entirely.

All of this information will be documented, but I want to get further in
the implementation first to make sure I don't change course again and
therefore waste my time on docs.

DEV-11268
2021-12-16 09:44:02 -05:00
Mike Gerwitz 29fdf5428c tamer: xir::tree: {Parse=>Stack}Error
Prepare to adopt parse::ParseError, which will contain StackError.

DEV-11268
2021-12-13 15:27:20 -05:00
Mike Gerwitz 325c3167ee tamer: xir::Token::span: New method
This permits retrieving a Span from any Token variant.  To support this,
rather than having this return an Option, Token::AttrEnd was augmented with
a Span; this results in a much simpler and friendlier API.

DEV-11268
2021-12-06 14:48:55 -05:00
Mike Gerwitz 77c18d0615 tamer: xir: Remove Attr::Extensible
This removes XIRT support for attribute fragments.  The reason is that
because this is a write-only operation---fragments are used to concatenate
SymbolIds without reallocation, which can only happen if we are generating
XIR internally.

Given that this cannot happen during read, it was a mistake to complicate
the parsers.  But it makes sense why I did originally, given that the XIRT
parser was written for simplifying test cases.  But now that we want parsers
for real, and are writing production-quality parsers, this extra complexity
is very undesirable.

As a bonus, we also avoid any potential for heap allocations related to
attributes.  Granted, they didn't _really_ exist to begin with, but it was
part of XIRT, and was ugly.

DEV-11268
2021-12-06 14:26:58 -05:00
Mike Gerwitz ba7ebad930 tamer: obj::xmlo::reader::test: {DUMMY_SPAN=>DS} for brevity
There's a lot of boilerplate that can be reduced in general, but I _really_
want to focus on getting this thing done; I can clean up later.
2021-11-22 11:16:43 -05:00
Mike Gerwitz ba4c32383f tamer: obj::xmlo::reader: Parse root package node attributes
Well, parse to the extent that it was being parsed before, anyway.

The core of this change demonstrates how well TAMER's abstractions work well
together.  (As long as you have an e.g. LSP to help you make sense of all of
the inference, I suppose.)

  Token::Open(QN_LV_PACKAGE | QN_PACKAGE, _) => {
      return Ok(XmloEvent::Package(
          attr_parser_from(&mut self.reader)
              .try_collect_ok()??,
      ));
  }

This finally makes use of `attr_parser_from` and `try_collect_ok`.  All of
the types are inferred---from the iterator transformations, to the error
conversions, to the destination PackageAttrs type.

DEV-10863
2021-11-18 00:59:10 -05:00
Mike Gerwitz 7367e20c01 tamer: obj::xmlo: Extract error types into own module 2021-11-16 15:47:52 -05:00
Mike Gerwitz 428d508be4 tamer: {ir::=>}{asg, xir}
See the previous commit.  There is no sense in some common "IR" namespace,
since those IRs should live close to whatever system whose data they
represent.

In the case of these, they are general IRs that can apply to many different
parts of the system.  If that proves to be a false statement, they'll be
moved.

DEV-10863
2021-11-04 16:13:27 -04:00
Mike Gerwitz 5a91db6d54 tamer: obj::xmlo::{legacy=>}ir
Calling it "legacyir" is just confusing.  The original hope, when beginning
TAMER, was that I'd be able to use a new object format in the near future to
help speed up the compilation process.  But that's far from our list of
priorities now, and so seeing "legacy" all over the place is really
confusing considering that it implies that perhaps it shouldn't be used for
new code.

This helps to clear up that cognitive dissonance by remaining neutral on the
topic.  And the reality is that it won't be "legacy" for some time.

DEV-10863
2021-11-04 13:23:38 -04:00
Mike Gerwitz cee6402f8b tamer: Move {ir::legacyir=>obj::xmlo::legacyir}
The IRs really ought to live where they are owned, especially given that
"IR" is so generic that it makes no sense for there to be a single location
for them; they're just data structures coupled with different phases of
compilation.

This will be renamed next commit; see that for details.

This also removes some documentation describing the lowering process,
because it's undergone a number of changes and needs to be accurately
re-summarized in another location.  That will come at a later time after the
work is further along so that I don't have to keep spending the time
rewriting it.

DEV-10863
2021-11-04 13:20:38 -04:00
Mike Gerwitz d06f31b4d3 tamer: obj::xmlo: Compile quickxml even with flag off
This was previous gated behind the negation of the wip-xmlo-xir-reader flag,
which meant that it was not being compiled or picked up by LSP.  Both of
those things are inconvenient and unideal.

DEV-10863
2021-11-04 12:35:08 -04:00
Mike Gerwitz 18ab032ba0 tamer: Begin XIR-based xmlo reader impl
There isn't a whole lot here, but there is additional work needed in various
places to support upcoming changes and so I want to get this commited to
ease the cognitive burden of what I have thusfar.  And to stop stashing.  We
have a feature flag for a reason.

DEV-10863
2021-10-28 21:21:30 -04:00
Mike Gerwitz f0f58a6e16 tamer: obj::xmlo::asg_builder: Remove example for now
Just until the new xmlo reader is ready, since it will be changing slightly
and fails to compile with the feature flag on now.

DEV-10863
2021-10-28 21:17:53 -04:00
Mike Gerwitz 18cadb9c7d tamer: obj::xmlo::reader: Better organize flagged code
This moves the Iterator impl and From<B> back into `quickxml`.  The type of
the new reader is different, taking an iterator instead of a BufRead.  This
will allow us to easily mock for unit tests, without the clustfuckery that
has ensued previously with quick-xml mocking.

DEV-10863
2021-10-25 13:47:26 -04:00
Mike Gerwitz c76fe87acd tamer: obj::xmlo::reader: Move Xmlo{Result,Error,Event}
These will need an API change, but are otherwise shared.  This means that
only the XmloReader is gated.
2021-10-25 12:26:25 -04:00
Mike Gerwitz f7d8aa1e4f tamer: wip-xml-xir-reader flag and setup
The original plan was to modify the existing reader to use the new
XmlXirReader, but that's going to be a lot of ongoing uncommitted work, with
both tests and implementation.  The better option seems to be to reimplement
it, since so many things are changing.

This flag will be short-lived and removed as soon as the implementation is
complete.

DEV-10863
2021-10-25 12:02:46 -04:00
Mike Gerwitz f899ac898e tamer: {obj=>ld}::xmle
This is a linker-specific module.
2021-10-11 23:52:59 -04:00
Mike Gerwitz 5ea5cffd09 tamer: relroot String->SymbolId
This was [one of] the last remaining Strings; SymbolId should be used across
the board.
2021-10-11 16:00:19 -04:00
Mike Gerwitz 7873d46afb tamer: Replace all &'static str in errors with SymbolId
Now that SymbolId implements Display and resolves, this works out well.
2021-10-11 15:39:53 -04:00
Mike Gerwitz 85909f1590 tamer: sym::SymbolStr: Remove
This removes `SymbolStr` in favor of, simply, `&'static str`.

The abstraction provided no additional safety since the slice was trivially
extracted (and commonly, in practice), and was inconvenient to work with.

This is part of a process of relaxing lookups so that symbols can be
conveniently displayed in errors; rather than trying to prevent the
developer from doing something bad, we'll just rely on conventions, hope
that it doesn't happen, and if it does, address it either at that time or
when it shows up in the profiler.
2021-10-11 12:58:48 -04:00
Mike Gerwitz f65ec818ab tamer: obj::xmle::xir: Correct doc typos Xml{e=>}Writer 2021-10-11 11:51:32 -04:00
Mike Gerwitz 3e385d1a1b tamer: obj::xmle::xir: Finalize docs
This could be improved upon, but there will be more work coming up for this
to finalize Sections.

DEV-10561
2021-10-11 11:43:49 -04:00
Mike Gerwitz bc5e8ebe75 tamer: obj::xmle::xir: Extract ElemWrap into ir::xir::iter
This generalizes it a bit and provides tests, which was always the intent;
the existing code was POC to determine if this could be done without
performance degradation (see that commit for more information).
2021-10-11 10:33:24 -04:00
Mike Gerwitz 1a2f6bd209 tamer: obj::xmle::xir: Extract ElemWrap into ir::xir::iter 2021-10-11 09:34:17 -04:00
Mike Gerwitz de62a2acbc tamer: ir::asg::section: Reduce fields
This is intended to represent the sections written to the final xmle file,
and there was unnecessary complexity in separating everything.

By reducing this IR further, we can begin to constrain its types to
eliminate some of the runtime panics and error checking we have/had in the
writer.
2021-10-11 09:07:48 -04:00
Mike Gerwitz 0626629cb3 tamer: Remove old xmle writer and wip-xir-xmle-writer flag
The new writer has reached parity of the old, with the exception of some
edge case explicit error handling that should never occur (which will be
added), and cleanup/docs.

Removing this flag now allows me to perform that cleanup without having to
worry about updating the now-old implementation.

I ran `tameld` with the new writer against our production system with
numerous programs and a significant number of test cases, and diff'd the old
and new xmle files, and everything looks good.
2021-10-08 22:04:42 -04:00
Mike Gerwitz 82727a5d66 tamer: obj::xmle::xir::header: Remove Rust 2018 comment
We're on 2021 now.
2021-10-08 21:43:28 -04:00
Mike Gerwitz d616d9475c tamer: obj::xmle::xir: Complete writer functionality
This is a significant milestone, in the sense that it is the culmination of
the past month or so of work to prove that an Iterator-based XIR will be
viable for the system.

This barely had any impact on the performance from the previous commit
reporting the profiling.  This performs at least as well as the quick-xml
based writer.  In isolated benchmarks, it performs better, but in the real
world, the linker spends most of its time reading xmlo files, and so minor
differences in writing do not have a significant overall impact.

With that said, a lot of cleanup and documentation is still needed.  That is
the subject of the upcoming commits, before this writer can finalized.
2021-10-08 16:37:46 -04:00
Mike Gerwitz 75d2ecf4dd tamer: obj::xmle::xir: Consideration of simplified iterators
The previous iterators had to be used in a certain order because they mixed
concerns, out of concern for performance.  This attempts to chain even more
iterators to see how it may perform.

To be clear: this will be cleaned up.  This was just an experiment.

Here were profiles on the average of 50 runs of linking our largest program:

  Baseline, pre-XIR (with fragments removed from output)               0.8082
  XIR writer, pre-ElemWrap, no #[inline]                               0.7844s
  XIR writer, ElemWrap, no #[inline]                                   0.7918s
  XIR writer, ElemWrap, inlines in obj::xmle::xir                      0.7892s
  XIR writer, ElemWrap, inlines in obj::xmle::xir and ir::asg::section 0.7858s
  XIR writer, ElemWrap, inline in only ir::asg::section                0.781s
  Pre-ElemWrap, inlines in ir::asg::section                            0.7772s

These profiles are difficult, because they hit the filesystem so much.  I
write to /dev/null, but it reads 100s of xmlo files from disk.

It's clear that the impact is fairly modest and within a margin of error; as
such, I will continue down the path of writing code that's easier to grok
and maintain, since not doing so would be a micro-optimization relative to
the concerns of the rest of the system at this point.

But the purpose of all of this work was to determine whether an
iterator-based XIR would be viable.  It seems to be competitive.  I'll
finish up the writer reimplementation and move on.
2021-10-07 16:48:58 -04:00
Mike Gerwitz 7f5064c665 tamer: obj::xmle::xir: Write l:map-from
This contains some awkward coupling for opening and closing tags to reduce
the complexity of the `Iterator` types that must be manually
specified.  That may be addressed shortly.
2021-10-05 16:13:47 -04:00
Mike Gerwitz e2c9944f1b tamer: Move Sections map from unique from writer into Sections
We're implementing an new XIR-based writer and don't want to have to
duplicate this; it didn't really belong there to begin with.
2021-10-04 16:31:30 -04:00
Mike Gerwitz 004f5dc312 tamer: Read only a single map preproc:from from xmlo files
This was creating a heap-allocated `Vec` for each map symbol despite not
actually needing it.  We do have multiple froms for return map values.

But by the time we may want this type of thing, we'll have a different IR
for it anyway.
2021-10-04 14:59:33 -04:00
Mike Gerwitz 772619f6f0 tamer: Replace explicit array::IntoIter::new with IntoIter
Now that we're on 2021 Edition, the default behavior has changed to be
consistent.
2021-10-02 01:03:19 -04:00
Mike Gerwitz 7c61a92d30 tamer: obj::xmle::xir: Minor clean and docs
This is far from fully documented; it's just a start.  I'll document fully
once the implementation is done, to ensure I don't waste time documenting
things that may change.
2021-10-02 00:58:14 -04:00
Mike Gerwitz 42188e80e7 tamer: obj::xmle::xir::test: Extract into own file
These are getting large and messy.

And I now notice that I never completed the header test after
prototyping.  Shame on me.

Also, errata from the previous commit message: the diffs are identical
_except for attribute escaping_ that is unnecessary; we're outputting data
read directly from existing XML files (output by Saxon), so characters are
already escaped as needed.

DEV-10561
2021-10-02 00:58:13 -04:00
Mike Gerwitz 7269e68b00 tamer: obj::xmle::xir: Complete l:dep
The `l:dep` section of the `xmle` file, after formatting (since XIR writes
without newlines and indentation), is now identical to the existing xmle
writer.  I can now move on to the other sections.

Note that the attribute movement in this commit is simply to get the diff to
properly align.  Once the current xmle writer is removed, I'll organize them
a bit more sensibly.

`obj::xmle::xir` also needs documentation, now that it's shown to be viable.
2021-09-30 13:06:30 -04:00
Mike Gerwitz acf55fad81 tamer: Intern desc from xmle on read
The new xmle writer was having to intern before write, which did not make
sense.

This continues with consistently using symbols throughout the system, and
is a smaller size than `String` as a bonus.
2021-09-29 23:31:07 -04:00
Mike Gerwitz 5250571f15 tamer: ir::asg::ident: Use symbols in place of string slice mapping
`IdentKind` needs to be written to `xmle` files and displayed in error
messages.  String slices were used when quick-xml was used for writing,
which will be going away with the new writer.
2021-09-29 23:18:23 -04:00
Mike Gerwitz 6864fbc1cd tamer: Start of XIR-based xmle writer
This has been a long time coming, and has been repeatedly stashed as other
parts of the system have evolved to support it.  The introduction of the XIR
tree was to write tests for this (which are sloppy atm).

This currently writes out the `xmle` header and _most_ of the `l:dep`
section; it's missing the object-type-specific attributes.  There is,
relatively speaking, not much more work to do here.

The feature flag `wip-xir-xmle-writer` was introduced to toggle this system
in place of `XmleWriter`.  Initial benchmarks show that it will be
competitive with the quick-xml-based writer, but remember that is not the
goal: the purpose of this is to test XIR in a production system before we
continue to implement it for a frontend, and to refactor so that we do not
have multiple implementations writing XML files (once we echo the source XML
files).

I'm excited to get this done with so that I can move on.  This has been
rather exhausting.
2021-09-28 14:52:53 -04:00
Mike Gerwitz e91aeef478 tamer: Remove Ix generalization throughout system
This had the writing on the wall all the same as the `'i` interner lifetime
that came before it.  It was too much of a maintenance burden trying to
accommodate both 16-bit and 32-bit symbols generically.

There is a situation where we do still want 16-bit symbols---the
`Span`.  Therefore, I have left generic support for symbol sizes, as well as
the different global interners, but `SymbolId` now defaults to 32-bit, as
does `Asg`.  Further, the size parameter has been removed from the rest of
the code, with the exception of `Span`.

This cleans things up quite a bit, and is much nicer to work with.  If we
want 16-bit symbols in the future for packing to increase CPU cache
performance, we can handle that situation then in that specific case; it's a
premature optimization that's not at all worth the effort here.
2021-09-23 14:52:54 -04:00
Mike Gerwitz ab093046e9 tamer: ir::asg::section: Provide iterators for major section groups
These groups happen to correspond with the sections of the xmle file, which
suggests again that this lives in the wrong place.  But I should really have
my focus elsewhere right now, so I don't know if I'll go any further right
now.  I guess we'll see as the writer is reimplemented.
2021-09-01 11:21:44 -04:00
Mike Gerwitz 1fa9614698 tamer: ir::asg::section: Improve iteration
`SectionsIter` was introduced to remove that responsibility from xmle
writer, since that's currently being reimplemented using XIR.

The existing iterator has been renamed SectionIter{ator=>} for a more
idiomatic name for iterator structs, and now has a static type rather than
relying on dynamic dispatch.  The author of that code wasn't sure how to
handle it otherwise.  (Which is understandable, since we were both still
getting acquainted with Rust.)  There's no notable change in performance in
my benchmarking.

This abstraction is a bit awkward, in that it's named for object file
sections, but they aren't.  Further, it's coupled with the ASG via
`SortableAsg` and perhaps should be generalized into a sorting routine that
takes a function for sorting, so that `Sections` can be moved into xmle's
packages.
2021-09-01 09:14:51 -04:00
Mike Gerwitz 1cdb3fbbc5 tamer: tameld: Skip fragment unescaping only to re-escape on write
Fragments' text were unescaped on reading, producing an owned String and
spending time parsing the text to unescape.  We were then copying that into
an internement pool (so, copying twice, effectively).

Further, we were then _re-escaping_ on write.

This was all wasteful, since we do not do any manipulation of the fragment
before outputting to the xmle file; we know that Saxon produced properly
escaped XML to begin with, and can trust to propagate it.

This also introduces a new global `clone_uninterned_utf8_unchecked` method.

In profiling this change, I tested (a) before this change, (b) after writing
without escaping, and (c) after both reading escaped and writing without
escaping.

     (a)              (b)              (c)
  sec   mem (B)    sec     B        sec     B
0:00.95 47896 -> 0:00.91 47988 -> 0:00.87 48288
0:00.40 30176 -> 0:00.37 25656 -> 0:00.36 25788
0:00.39 45672 -> 0:00.37 45756 -> 0:00.35 34952
0:00.39 20716 -> 0:00.38 19604 -> 0:00.36 19956
0:00.33 16836 -> 0:00.32 16988 -> 0:00.31 16892
0:00.23 15268 -> 0:00.23 15236 -> 0:00.22 15312
0:00.44 20780 -> 0:00.44 20048 -> 0:00.41 20148
0:00.54 44516 -> 0:00.50 36964 -> 0:00.49 36728
0:00.62 55976 -> 0:00.57 46204 -> 0:00.54 41468
0:00.31 28016 -> 0:00.30 27308 -> 0:00.28 23844
0:00.23 15388 -> 0:00.22 15316 -> 0:00.21 15304
0:00.05 4888  -> 0:00.05 4760  -> 0:00.05 4948
0:00.41 19756 -> 0:00.41 19852 -> 0:00.40 19992
0:00.47 20828 -> 0:00.46 20844 -> 0:00.44 20968
0:00.27 18152 -> 0:00.26 18184 -> 0:00.25 18312

Interestingly, the peak memory usage increases very slightly between the
second and third steps (though decreases from the first), likely because the
raw (encoded) is larger than the unencoded text (e.g. `&gt;` takes more
space than `>`).
2021-08-18 11:39:06 -04:00
Mike Gerwitz f97141f5c5 tamer: tameld: Use uninterned symbols for reader
Fragments were previously represented by `String` to avoid the cost of
interning (hashing and copying).  This change modifies it to use uninterned
symbols, which does still have a copy overhead but it does not hash.

Initial tests shows a small performance decrease of about 15% and a small
memory increase of similar proportion.  However, once I realized that I was
not clearing buffers from quick_xml events and implemented that change in a
previous commit, this change ended up being approximately on par with
`String`, despite the copying of some pretty large fragments.

YMMV, though, and perhaps on less powerful systems time may increase
slightly.

The upcoming XIR (XML IR) was originally going to support both owned strings
and symbols, but now we'll just use uninterned symbols; I can't rationalize
complicating the API at this time when it will provide an almost
imperceivable performance benefit.  If ever that changes in the future,
that change will be entertained.

The end result is that the fate of a fragment's underlying memory is
determined by whatever is processing the data, _not_ by the API itself---the
API was previously forcing use of a String, whereas now it's up to the
caller to determine whether we want comparable interns.  For fragments,
that's not likely ever to be the case, especially considering that the
representation will change so drastically in the future.
2021-08-16 14:05:32 -04:00
Mike Gerwitz d96dcad7d8 tamer: tameld: Reduce peak memory usage
This clears the buffers used by quick_xml, which was apparently forgotten
during initial development (I think I expected it to re-use the previously
allocated space automatically).

This has significant effects in some cases.  For example, one of our UI
builds drops from ~9KiB to ~5KiB peak memory usage.  Other builds for larger
suppliers are only slightly effected because of some of their massive
fragments.
2021-08-16 13:38:14 -04:00
Mike Gerwitz 9deb393bfd tamer: Global interners
This is a major change, and I apologize for it all being in one commit.  I
had wanted to break it up, but doing so would have required a significant
amount of temporary work that was not worth doing while I'm the only one
working on this project at the moment.

This accomplishes a number of important things, now that I'm preparing to
write the first compiler frontend for TAMER:

  1. `Symbol` has been removed; `SymbolId` is used in its place.
  2. Consequently, symbols use 16 or 32 bits, rather than a 64-bit pointer.
  3. Using symbols no longer requires dereferencing.
  4. **Lifetimes no longer pollute the entire system! (`'i`)**
  5. Two global interners are offered to produce `SymbolStr` with `'static`
     lifetimes, simplfiying lifetime management and borrowing where strings
     are still needed.
  6. A nice API is provided for interning and lookups (e.g. "foo".intern())
     which makes this look like a core feature of Rust.

Unfortunately, making this change required modifications to...virtually
everything.  And that serves to emphasize why this change was needed:
_everything_ used symbols, and so there's no use in not providing globals.

I implemented this in a way that still provides for loose coupling through
Rust's trait system.  Indeed, Rustc offers a global interner, and I decided
not to go that route initially because it wasn't clear to me that such a
thing was desirable.  It didn't become apparent to me, in fact, until the
recent commit where I introduced `SymbolIndexSize` and saw how many things
had to be touched; the linker evolved so rapidly as I was trying to learn
Rust that I lost track of how bad it got.

Further, this shows how the design of the internment system was a bit
naive---I assumed certain requirements that never panned out.  In
particular, everything using symbols stored `&'i Symbol<'i>`---that is, a
reference (usize) to an object containing an index (32-bit) and a string
slice (128-bit).  So it was a reference to a pretty large value, which was
allocated in the arena alongside the interned string itself.

But, that was assuming that something would need both the symbol index _and_
a readily available string.  That's not the case.  In fact, it's pretty
clear that interning happens at the beginning of execution, that `SymbolId`
is all that's needed during processing (unless an error occurs; more on that
below); and it's not until _the very end_ that we need to retrieve interned
strings from the pool to write either to a file or to display to the
user.  It was horribly wasteful!

So `SymbolId` solves the lifetime issue in itself for most systems, but it
still requires that an interner be available for anything that needs to
create or resolve symbols, which, as it turns out, is still a lot of
things.  Therefore, I decided to implement them as thread-local static
variables, which is very similar to what Rustc does itself (Rustc's are
scoped).  TAMER does not use threads, so the resulting `'static` lifetime
should be just fine for now.  Eventually I'd like to implement `!Send` and
`!Sync`, though, to prevent references from escaping the thread (as noted in
the patch); I can't do that yet, since the feature has not yet been
stabalized.

In the end, this leaves us with a system that's much easier to use and
maintain; hopefully easier for newcomers to get into without having to deal
with so many complex lifetimes; and a nice API that makes it a pleasure to
work with symbols.

Admittedly, the `SymbolIndexSize` adds some complexity, and we'll see if I
end up regretting that down the line, but it exists for an important reason:
the `Span` and other structures that'll be introduced need to pack a lot of
data into 64 bits so they can be freely copied around to keep lifetimes
simple without wreaking havoc in other ways, but a 32-bit symbol size needed
by the linker is too large for that.  (Actually, the linker doesn't yet need
32 bits for our systems, but it's going to in the somewhat near future
unless we optimize away a bunch of symbols...but I'd really rather not have
the linker hit a limit that requires a lot of code changes to resolve).

Rustc uses interned spans when they exceed 8 bytes, but I'd prefer to avoid
that for now.  Most systems can just use on of the `PkgSymbolId` or
`ProgSymbolId` type aliases and not have to worry about it.  Systems that
are actually shared between the compiler and the linker do, though, but it's
not like we don't already have a bunch of trait bounds.

Of course, as we implement link-time optimizations (LTO) in the future, it's
possible most things will need the size and I'll grow frustrated with that
and possibly revisit this.  We shall see.

Anyway, this was exhausting...and...onward to the first frontend!
2021-08-11 14:24:55 -04:00
Mike Gerwitz 01722c9c3b tamer: Symbol{Index=>Id}
The former was a misnomer (it represents an index _entry_).  This name is
also shorter, which is nice, considering how often it'll be used.
2021-07-30 13:32:32 -04:00
Mike Gerwitz 0fc8a1a4df tamer: Remove default SymbolIndex (et al) index type
Oh boy.  What a mess of a change.

This demonstrates some significant issues we have with Symbol.  I had
originally modelled the system a bit after Rustc's, but deviated in certain
regards:

  1. This has a confurable base type to enable better packing without bit
     twiddling and potentially unsafe tricks I'd rather avoid unless
     necessary; and
  2. The lifetime is not static, and there is no global, singleton interner;
     and
  3. I pass around references to a Symbol rather than passing around an
     index into an interner.

For #3---this is done because there's no singleton interner and therefore
resolving a symbol requires a direct reference to an available interner.  It
also wasn't clear to me (and still isn't, in fact) whether more than one
interner may be used for different contexts.

But, that doesn't preclude removing lifetimes and just passing around
indexes; in fact, I plan to do this in the frontend where the parser and
such will have direct interner access and can therefore just look up based
on a symbol index.  We could reserve references for situations where
exposing an interner would be undesirable.

Anyway, more to come...
2021-07-29 14:26:40 -04:00
Mike Gerwitz d9dcfe8777 tamer: Introduce tpwrap module to contain quick_xml::Error adapter
This adapter exists to implement PartialEq so that it can be derived on
Error objects.  This is used primarily (well, exclusively atm) for tests.
2021-07-23 23:23:55 -04:00
Mike Gerwitz 5aaa1106cb tamer: obj::xmlo::reader::mock: Extract into crate::test::quick_xml
Other mocks exist here, and here it can be re-used for the upcoming XML
frontend.
2021-07-22 15:32:30 -04:00
Mike Gerwitz 2e50af1220 Copyright year update 2021 2021-07-22 15:00:15 -04:00
Mike Gerwitz e5bbd49166 tamer: obj::xmlo::reader: Extract tests separate file
The file's getting a bit large and the tests are rather complex.  Further,
LSP does better on smaller, less complex files.
2021-07-22 14:39:06 -04:00
Mike Gerwitz 1f24cfdf25 Remove :map: sym-dep generation
This was incorrect to begin with---it does not make sense that an input
mapping should depend upon the identifier that it maps to, in the sense that
we make use of these dependencies.  If we add weak symbol references in the
future, then this can be reintroduced.

By removing this, we free tameld from having to perform the check itself.

.rev-xmlo bumped to force rebuilding of object files since the linker now
expects that no such dependencies will exist within them.
2021-07-22 14:27:15 -04:00
Mike Gerwitz 90c6b51fd5 tamer: tameld: Place constants into static section in executable
This is something that changed when the TAMER POC was initially created, as
I was learning Rust.  I don't recall the original reason why this was moved,
but it could have been moved back long ago.

In our systems, constants can hold tables (as matrices) with tens or
hundreds of thousands of rows, and there are a number of them in certain
projects.  As an example, the YAML-based test cases for one of our systems
went from ~2m30s to ~45s after this change was made.  Much of the cost
savings comes from saving GC.
2021-07-21 14:53:15 -04:00
Mike Gerwitz 96ffd5f6e5 [DEV-8000] ir::asg: Error types for unresolved identifiers during sorting
This checks explicitly for unresolved objects while sorting and provides an
explicit error for them.  For example, this will catch externs that have no
concrete resolution.

This previously fell all the way through to the unreachable! block.  The old
POC implementation was catching unresolved objects, albeit with a debug
error.
2020-07-02 01:38:32 -04:00
Mike Gerwitz 0d4bbe5e4e [DEV-8000] ir::asg: Introduce SortableAsgError
This will be used for the next commit, but this change has been isolated
both because it distracts from the implementation change in the next commit,
and because it cleans up the code by removing the need for a type parameter
on `AsgError`.

Note that the sort test cases now use `unwrap` instead of having
`{,Sortable}AsgError` support one or the other---this is because that does
not currently happen in practice, and there is not supposed to be a
hierarchy; they are siblings (though perhaps their name may imply otherwise).
2020-07-01 13:42:14 -04:00
Mike Gerwitz bcca5f7c49 [DEV-7084] TAMER: AsgBuilder and IR lowering docs 2020-04-28 13:39:55 -04:00
Mike Gerwitz 0f4b2d75f8 [DEV-7084] TAMER: obj::xmlo: Private inner modules 2020-04-28 11:08:05 -04:00
Mike Gerwitz 549e9ca23b [DEV-7084] TAMER: AsgBuilderState:🆕 New constructor 2020-04-28 09:06:25 -04:00
Mike Gerwitz 9893d56775 [DEV-7084] TAMER: Finalize AsgBuilder 2020-04-28 09:06:25 -04:00
Mike Gerwitz 32abc7dce2 [DEV-7084] TAMER: impl PartialEq for XmloError
This cannot be dervied because XmlError does not implement PartialEq,
which is quite the annoyance in tests.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 21a0bdcce1 [DEV-7084] TAMER: AsgBuilderError: Introduce proper error variants
This is a union (sum type) of three other errors types, plus errors specific
to this builder.

This commit does a good job demonstrating the boilerplate, as well as a need
for additional context (in the case of `IdentKindError`), that we'll want to
work on abstracting away.
2020-04-28 09:06:25 -04:00
Mike Gerwitz ecc2e33ba7 [DEV-7084] TAMER: xmlo::AsgBuilder: Accept XmloResult iterator
This flips the API from using XmloWriter as the context to using Asg and
consuming anything that can produce XmloResults.  This not only makes more
sense, but avoids having to create a trait for XmloReader, and simplifies
the trait bounds we have to concern ourselves with.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 323ea79bf8 [DEV-7084] TAMER: Basic AsgBuilder cleanup
This just tidies things up a little bit before I get into some further
refactoring.  I wrote the original code when I was just learning Rust not
too long ago, so it's interesting to see how my understanding has changed
over that relatively short period of time.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 9220de4769 [DEV-7084] TAMER: Finish encapsulating petgraph
This will allow us to migrate away from Petgraph in the future should we
choose to do so.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 0f423f3b24 [DEV-7084] TAMER: Simplify path canonicalization
This abstracts away the canonicalizer and solves the problem whereby
canonicalization was not being performed prior to recording whether a path
has been visited.  This ensures that multiple relative paths to the same
file will be properly recognized as visited.
2020-04-28 09:06:25 -04:00
Mike Gerwitz da69118592 [DEV-7084] TAMER: AsgBuilderState
This completes the POC extraction for AsgBuilder, but is still POC
code.  The commits that follow will clean it up and provide tests.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 3f46917da9 [DEV-7084] TAMER: AsgBuilder extracted from POC
This extracts the changes nearly verbatim before doing refactoring so that
it's easier to observe what changes have been made.
2020-04-28 09:06:25 -04:00
Mike Gerwitz 90ed4e9bd6 [DEV-7084] TAMER: From<B, &I> for XmloReader
This serves as a constructor for the time being, decoupling from POC.  We
may do something better once we have a better idea of how the various
abstractions around this will evolve.
2020-04-20 10:53:51 -04:00
Mike Gerwitz 7a4f6cf9f2 [DEV-7087] TAMER: symbol_dummy! macro 2020-03-24 14:14:05 -04:00
Mike Gerwitz f969877324 [DEV-7087] TAMER: {=>Ident}Object{,State,Data}
This is essential to clarify what exactly the different object types
represent with the new generic abstractions.  For example, we will have
expressions as an object type.
2020-03-24 09:56:25 -04:00
Mike Gerwitz 5fb68f9b67 TAMER: Make Asg generic over object
There's a lot here to make the object stored on the `Asg` generic.  This
introduces `ObjectState` for state transitions and `ObjectData` for pure
data retrieval.  This will allow not only for mocking, but will be useful to
enforce compile-time restrictions on the type of objects expected by the
linker vs. the compiler (e.g. the linker will not have expressions).

This commit intentionally leaves the corresponding tests in their original
location to prove that the functionality has not changed; they'll be moved
in a future commit.

This also leaves the names as "Object" to reduce the number the cognative
overhead of this commit.  It will be renamed to something like "IdentObject"
in the near future to clarify the intent of the current object type and to
open the way for expressions and a type that marries both of them in the
future.

Once all of this is done, we'll finally be able to make changes to the
compatibility logic in state transitions to implement extern compatibility
checks during resolution.

DEV-7087
2020-03-24 09:56:20 -04:00
Mike Gerwitz 400d5b25a1 ir::asg::Object::Empty: Remove variant
This variant is unnecessary, as it was used only by the indexer to represent
the absence of a node, for which was can simply use `None` in the containing
`Option`.

* tamer/Cargo.toml: Add `lazy_static`.
* tamer/Cargo.lock: Update.
* tamer/src/ir/asg/base.rs (with_capacity): Use `None` in place of
    `Some(Object::Empty)`.
* tamer/src/ir/asg/object.rs: Adjust state machine graphic.
  (Empty): Remove variant.
  (Missing): Remove reference to variance.
* tamer/src/lib.rs: Import `lazy_static` for test builds.
* tamer/obj/xmle/writer/writer.rs (Section::iter): Remove `Object::Empty`
    from documentation.
  (test::): Remove references to `Object::Missing`.  `lazy_static!` used
    here.
* tamer/obj/xmle/writer/xmle.rs (test::write_section_catch_missing): Replace
    reference to `Object::Missing`.
2020-03-19 15:42:06 -04:00
Joseph Frazer 59a0c382af [DEV-7085] Move sections to IR module
We need to use `Sections` in both the writer and the ASG so it needs to
be in a place that makes sense.
2020-03-13 11:51:59 -04:00
Joseph Frazer 01e7d3e560 [DEV-7134] Propagate errors from the writer
When an error occurs during the XML writing, they should be shown to the
user.
2020-03-09 08:23:13 -04:00
Joseph Frazer 2a5551a04a [DEV-7134] Propagate errors setting fragments
If we cannot set a fragment, we need to display the error to the user.

We are currently ignoring "___head", "___tail", and objects that are
both virtual and overridden. Those will be corrected in with future
changes.
2020-03-09 08:23:13 -04:00
Mike Gerwitz bfea768f89 Copyright year 2020 update 2020-03-06 11:05:18 -05:00
Joseph Frazer 6ac7641087 [DEV-7083] TAMER: xmle writer
This introduces the writer for xmle files.
2020-03-03 11:21:18 -05:00
Mike Gerwitz b89408e5bb TAMER: Extract quick_xml event-related mocks 2020-02-26 10:49:01 -05:00
Mike Gerwitz 7c60b53de8 TAMER: Virtual symbol override 2020-02-26 10:49:01 -05:00
Mike Gerwitz 645908e258 TAMER: xmle output changes to support Summary Page
Co-Authored-By: Joseph Frazer <joseph.frazer@ryansg.com>
2020-02-26 10:49:00 -05:00
Mike Gerwitz 6939753ca0 TAMER: POC: Output xmle
This is a working proof-of-concept that will be finalized in future commits.
2020-02-26 10:49:00 -05:00
Mike Gerwitz 85a4934db5 TAMER: Symbol source data and metadata 2020-02-26 10:49:00 -05:00
Mike Gerwitz 10b9caa7ad TAMER: Fail on empty fragment ids (and fix underlying problem) 2020-02-25 16:46:28 -05:00
Mike Gerwitz a0893da577 TAMER: xmlo: Add Package event 2020-02-25 16:46:27 -05:00
Mike Gerwitz a929c8cae4 TAMER: xmlo reader
This introduces the reader for xmlo files produced by the XSLT-based
compiler.  It is an initial implementation but is not complete; see future
commits.
2020-02-25 16:46:25 -05:00