Commit Graph

1181 Commits (eaa8133d211346f012785e70ffadbcbecef6b17a)

Author SHA1 Message Date
Mike Gerwitz 9d87962e96 tamer: Use Rust 2021 Edition
The 2021 Edition will be stable on Oct 21; until then, this uses nightly.
2021-10-02 00:58:14 -04:00
Mike Gerwitz 885d5e4d8f tamer: Switch back to nightly toolchain
This is to support two things:
  1. Early switch to 2021 Edition, which is stable Oct 21; and
  2. To make use of unstable const features.

The rationale is that switching to nightly does not really have any
significant downside for us, given that TAMER is used only by us and
the only risk is that unstable features may change a bit, which can be
mitigated with certain precautions.

The rationale for each unstable feature will be documented as they are used,
including documentation on what would be required to remove it and what
functionality would be lost / need to change in doing so.
2021-10-02 00:58:14 -04:00
Mike Gerwitz 7c61a92d30 tamer: obj::xmle::xir: Minor clean and docs
This is far from fully documented; it's just a start.  I'll document fully
once the implementation is done, to ensure I don't waste time documenting
things that may change.
2021-10-02 00:58:14 -04:00
Mike Gerwitz 42188e80e7 tamer: obj::xmle::xir::test: Extract into own file
These are getting large and messy.

And I now notice that I never completed the header test after
prototyping.  Shame on me.

Also, errata from the previous commit message: the diffs are identical
_except for attribute escaping_ that is unnecessary; we're outputting data
read directly from existing XML files (output by Saxon), so characters are
already escaped as needed.
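To illustrate why that escaping is unnecessary: if a value is read from Saxon's output already escaped and is then escaped again, its entities get escaped a second time. Below is a minimal sketch using a hypothetical `escape_attr` function, which is not TAMER's API:

```rust
/// Hypothetical attribute escaper for the five XML special characters.
pub fn escape_attr(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}
```

Applied to raw text this is correct. Applied to text that is already escaped, `&amp;` becomes `&amp;amp;`. Writing already-escaped data verbatim avoids that hazard.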

DEV-10561
2021-10-02 00:58:13 -04:00
Mike Gerwitz 7269e68b00 tamer: obj::xmle::xir: Complete l:dep
The `l:dep` section of the `xmle` file, after formatting (since XIR writes
without newlines and indentation), is now identical to the existing xmle
writer.  I can now move on to the other sections.

Note that the attribute movement in this commit is simply to get the diff to
properly align.  Once the current xmle writer is removed, I'll organize them
a bit more sensibly.

`obj::xmle::xir` also needs documentation, now that it's shown to be viable.
2021-09-30 13:06:30 -04:00
Mike Gerwitz acf55fad81 tamer: Intern desc from xmle on read
The new xmle writer was having to intern before write, which did not make
sense.

This continues the effort to use symbols consistently throughout the system;
as a bonus, a symbol is smaller than a `String`.
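Interning at read time stores each string once; the rest of the system then passes around a 4-byte handle. The sketch below shows the idea. `Interner`, `SymbolId` and `handle_sizes` are hypothetical stand-ins, not TAMER's actual types:

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Hypothetical 32-bit symbol handle standing in for TAMER's `SymbolId`.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct SymbolId(u32);

/// Minimal interner: each distinct string is stored once, and callers
/// keep only the small `SymbolId`.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, SymbolId>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing id for `s`, or assigns the next one.
    pub fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = SymbolId(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }

    /// Resolves an id back to its string.
    pub fn lookup(&self, id: SymbolId) -> &str {
        &self.strings[id.0 as usize]
    }
}

/// Sizes of a symbol handle and of a `String` (pointer, capacity, length).
pub fn handle_sizes() -> (usize, usize) {
    (size_of::<SymbolId>(), size_of::<String>())
}
```

On a 64-bit target a `String` is 24 bytes, versus 4 bytes for a `u32`-backed symbol.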
2021-09-29 23:31:07 -04:00
Mike Gerwitz 5250571f15 tamer: ir::asg::ident: Use symbols in place of string slice mapping
`IdentKind` needs to be written to `xmle` files and displayed in error
messages.  String slices were used when quick-xml was used for writing,
which will be going away with the new writer.
2021-09-29 23:18:23 -04:00
Mike Gerwitz fa4181770f tamer: src::ir::asg::ident::Dim: Assert n<10
This replaces a TODO with an assertion.
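Below is a minimal sketch of replacing such a TODO with an assertion. `Dim` here is a hypothetical newtype over `u8`. The `as_char` method assumes, for illustration only, that a dimension is rendered as a single decimal digit; the `n<10` bound guarantees that:

```rust
/// Hypothetical dimensionality newtype; only 0..=9 is valid.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Dim(u8);

impl Dim {
    /// Panics on out-of-range input instead of silently accepting it.
    pub fn from_u8(n: u8) -> Self {
        assert!(n < 10, "Dim must be < 10, got {}", n);
        Dim(n)
    }

    /// Single ASCII digit, valid because of the assertion above.
    pub fn as_char(self) -> char {
        (b'0' + self.0) as char
    }
}
```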
2021-09-29 16:26:41 -04:00
Mike Gerwitz 6864fbc1cd tamer: Start of XIR-based xmle writer
This has been a long time coming, and has been repeatedly stashed as other
parts of the system have evolved to support it.  The XIR tree was introduced
in order to write tests for this (which are sloppy atm).

This currently writes out the `xmle` header and _most_ of the `l:dep`
section; it's missing the object-type-specific attributes.  There is,
relatively speaking, not much more work to do here.

The feature flag `wip-xir-xmle-writer` was introduced to toggle this system
in place of `XmleWriter`.  Initial benchmarks show that it will be
competitive with the quick-xml-based writer, but remember that is not the
goal: the purpose of this is to test XIR in a production system before we
continue to implement it for a frontend, and to refactor so that we do not
have multiple implementations writing XML files (once we echo the source XML
files).
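Cargo features like this one are conventionally declared in `Cargo.toml` under `[features]` (e.g. `wip-xir-xmle-writer = []`) and enabled with `cargo build --features wip-xir-xmle-writer`. The sketch below shows how code can select an implementation at compile time. `xmle_writer_name` is hypothetical and returns only a label, where real code would construct a writer:

```rust
/// Selected when the feature is enabled.
#[cfg(feature = "wip-xir-xmle-writer")]
pub fn xmle_writer_name() -> &'static str {
    "XIR-based xmle writer"
}

/// Default: the existing quick-xml-based writer.
#[cfg(not(feature = "wip-xir-xmle-writer"))]
pub fn xmle_writer_name() -> &'static str {
    "XmleWriter"
}
```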

I'm excited to get this done with so that I can move on.  This has been
rather exhausting.
2021-09-28 14:52:53 -04:00
Mike Gerwitz 863d990cbd tamer: sym: 16-bit static symbol prefill
The 16-bit interner at present will be used only for span contexts.  In the
future, this interner may become specialized specifically for that, but for
now let's just re-use what we already have so that I can move on.

DEV-10733
2021-09-28 10:39:46 -04:00
Mike Gerwitz 96b16c6de9 tamer: sym::prefill::test::global_sanity_check: Note duplicate strings
I want to make it clear in the assertion that the problem could be caused by
duplicate strings.  We do not sort by string, in part because we may in the
future want to group certain symbols together in some arbitrary way so that
we can compare ranges (using the markers).

If that doesn't end up happening, it may be better to just sort by string
to obviate the problem.
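The sketch below shows why a duplicate string breaks such a check. It assumes static symbol ids are assigned by interning the prefill list in order, and that each constant is declared with its string's position. `PREFILL`, `prefill_ids` and `first_mismatch` are hypothetical:

```rust
use std::collections::HashMap;

/// Hypothetical prefill list: a string's position is the id that its
/// static symbol constant is declared with.
pub const PREFILL: &[&str] = &["", "l:dep", "name", "type"];

/// Interns `strings` in order and returns the id each entry receives.
/// A duplicate gets the id of its first occurrence, so from that point on
/// ids no longer match positions.
pub fn prefill_ids(strings: &[&str]) -> Vec<u32> {
    let mut seen: HashMap<&str, u32> = HashMap::new();
    let mut next = 0u32;
    strings
        .iter()
        .map(|&s| {
            *seen.entry(s).or_insert_with(|| {
                let id = next;
                next += 1;
                id
            })
        })
        .collect()
}

/// Sanity check: the first index whose interned id differs from its
/// position, which most likely indicates a duplicate string.
pub fn first_mismatch(strings: &[&str]) -> Option<usize> {
    prefill_ids(strings)
        .iter()
        .enumerate()
        .position(|(i, &id)| id as usize != i)
}
```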
2021-09-24 16:25:29 -04:00
Mike Gerwitz db8a098452 tamer: sym: Minor documentation refinement
Mostly rewording.
2021-09-24 10:11:19 -04:00
Mike Gerwitz c71d36b154 tamer: sym::prefill: All-caps constants for static symbols
It's really awkward not having them in caps: not only are constants
expected to be, but we also cannot maintain consistency between the string
and the identifier name in even the simplest of cases.

(We could use `r#`, but that's too cumbersome.)
2021-09-23 23:48:28 -04:00
Mike Gerwitz 785ca0fe9e tamer: sym::prefill: Remove StaticSymbolId in favor of refined types
`StaticSymbolId` was created before the more specific types, which render it
unnecessary.  If we need a generic type, it can be re-introduced, but using
`static_symbol_newtypes!`.
2021-09-23 23:35:45 -04:00
Mike Gerwitz 15ff00b3cf tamer: sym: Only prefill 32-bit global interner
This is the interner that is intended to be used with the majority of the
system; the 16-bit interner is left around for the moment, but will likely
later become specialized.
2021-09-23 16:11:17 -04:00
Mike Gerwitz e91aeef478 tamer: Remove Ix generalization throughout system
Like the `'i` interner lifetime that came before it, this had the writing on
the wall.  It was too much of a maintenance burden trying to
accommodate both 16-bit and 32-bit symbols generically.

There is a situation where we do still want 16-bit symbols---the
`Span`.  Therefore, I have left generic support for symbol sizes, as well as
the different global interners, but `SymbolId` now defaults to 32-bit, as
does `Asg`.  Further, the size parameter has been removed from the rest of
the code, with the exception of `Span`.

This cleans things up quite a bit, and is much nicer to work with.  If we
want 16-bit symbols in the future for packing to increase CPU cache
performance, we can handle that situation then in that specific case; it's a
premature optimization that's not at all worth the effort here.
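The sketch below uses a default type parameter to show the resulting shape. The definitions are hypothetical, and the field layout of `Span` is illustrative only. The source establishes just that `Span` keeps 16-bit symbols, for its context:

```rust
/// Hypothetical symbol id, generic over its index width but defaulting to
/// `u32`, so most code can write plain `SymbolId`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct SymbolId<Ix = u32>(pub Ix);

/// Hypothetical span: the one place that opts into 16-bit symbols.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Span {
    pub ctx: SymbolId<u16>,
    pub offset: u32,
    pub len: u16,
}

/// Code outside of `Span` names the type without any size parameter.
pub fn first_sym(syms: &[SymbolId]) -> Option<SymbolId> {
    syms.first().copied()
}
```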
2021-09-23 14:52:54 -04:00
Mike Gerwitz ed245bb099 tamer: sym::prefill: Initial typed static symbol concept
We'll see how the syntax evolves over time.  It's not ideal to have to
specify the type, rather than having the compiler infer it, but I don't much
feel like getting into my first procedural macro right now, so we'll stick
with this approach for the time being.

This will set the stage to be able to safely e.g. create QNames statically
at compile-time and would allow us to make any attempts to bypass it
unsafe.
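Here is a minimal sketch of the idea. It assumes a newtype per kind of static symbol whose constructor is private to the symbol module, so the only way to obtain one is through a predeclared constant. `CNameSym`, `QName::from_static` and the numeric ids are hypothetical. The sketch enforces the guarantee with module privacy; it does not model the `unsafe` bypass mentioned above:

```rust
pub mod sym {
    /// Hypothetical untyped symbol id; the field is private to this module.
    #[derive(Copy, Clone, Debug, PartialEq, Eq)]
    pub struct SymbolId(u32);

    /// Hypothetical typed static symbol known to be a valid QName part.
    #[derive(Copy, Clone, Debug, PartialEq, Eq)]
    pub struct CNameSym(SymbolId);

    impl CNameSym {
        // Private: other modules cannot mint one from an arbitrary id.
        const fn new(id: u32) -> Self {
            CNameSym(SymbolId(id))
        }

        pub const fn as_sym(self) -> SymbolId {
            self.0
        }
    }

    // Static symbols (ids are arbitrary for illustration).
    pub const L: CNameSym = CNameSym::new(1);
    pub const DEP: CNameSym = CNameSym::new(2);

    /// Hypothetical QName buildable at compile time from typed symbols.
    #[derive(Copy, Clone, Debug, PartialEq, Eq)]
    pub struct QName {
        pub prefix: Option<SymbolId>,
        pub local: SymbolId,
    }

    impl QName {
        pub const fn from_static(prefix: Option<CNameSym>, local: CNameSym) -> Self {
            let prefix = match prefix {
                Some(p) => Some(p.as_sym()),
                None => None,
            };
            QName { prefix, local: local.as_sym() }
        }
    }
}

/// `l:dep`, constructed entirely at compile time.
pub const L_DEP: sym::QName = sym::QName::from_static(Some(sym::L), sym::DEP);
```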
2021-09-23 00:37:39 -04:00
Mike Gerwitz b972b0b202 tamer: sym::StaticSymbolId: Introduce
Previously, we were allocating only u32 versions of `SymbolId` for the
statically allocated symbols.  This introduces a new symbol type with a very
small datatype (8 bits) that is able to cast into any `SymbolId`.  This is
explained in the docs.
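The sketch below shows how an 8-bit static id can convert into a symbol of any width. It assumes a `SymbolId` that is generic over its index type; these are hypothetical definitions, and the real ones are described in the docs mentioned above. Because `u16` and `u32` both implement `From<u8>`, a single blanket `From` impl covers every width losslessly:

```rust
/// Hypothetical symbol id, generic over index width.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct SymbolId<Ix>(pub Ix);

/// Hypothetical 8-bit id for a statically allocated symbol.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct StaticSymbolId(u8);

/// Any index type that can be built from a `u8` accepts a static id.
impl<Ix: From<u8>> From<StaticSymbolId> for SymbolId<Ix> {
    fn from(st: StaticSymbolId) -> Self {
        SymbolId(Ix::from(st.0))
    }
}

pub const L_DEP: StaticSymbolId = StaticSymbolId(1);
```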

We'll be taking this typing further in future commits so that static symbols
are better-suited for compile-time guarantees for static newtype
construction.

DEV-10710
2021-09-22 21:37:06 -04:00
Mike Gerwitz c87147c277 configure.ac: Bump Rust 1.{53=>54} for using macros in attribute values
The previous commit uses `concat!` for doc generation.  I forgot that this
was only recently stabilized.
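The macro below uses `concat!` inside a `#[doc]` attribute to generate documentation. It is a hypothetical illustration, not TAMER's actual macro. Macro calls in attribute values like this were stabilized in Rust 1.54:

```rust
/// Hypothetical macro declaring static symbol constants whose docs are
/// generated from each symbol's string.
macro_rules! static_symbols {
    ($($name:ident: $str:literal => $id:literal),* $(,)?) => {
        $(
            // A macro call as an attribute value: needs Rust >= 1.54.
            #[doc = concat!("Static symbol for the string `", $str, "`.")]
            pub const $name: (u32, &str) = ($id, $str);
        )*
    };
}

static_symbols! {
    L_DEP: "l:dep" => 1,
    NAME: "name" => 2,
}
```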
2021-09-22 16:47:17 -04:00
Mike Gerwitz 366fef714b tamer: sym::prefill: Introduce static symbols
This is the beginning of static symbols, which are becoming increasingly
necessary as it's quite a pain to have to deal with interning static strings
any place they're used.

It's _more_ of a pain to do that in conjunction with newtypes (e.g. `QName`,
`AttValue`, etc) that make use of `SymbolId`; this will allow us to
construct _those_ statically as well, and additional work to support that
will be coming up.

DEV-10701
2021-09-22 16:08:40 -04:00
Mike Gerwitz e0a209d417 tamer: bench: xir: Reduce writer benchmark memory usage
These were using GiB of memory, which is ...unnecessary.

I reduced the iteration count significantly, but it was still wasting a lot
of time and memory and needed `with_capacity` to reduce the number of copies
after reallocation.

It is not typical that a buffer would contain this much information.
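Here is a minimal illustration of the `with_capacity` point, independent of the actual benchmark code. Reserving the final size up front means the vector never has to reallocate and copy its contents while it is being filled. `fill_preallocated` is hypothetical:

```rust
/// Fills a buffer with `n` copies of `chunk`, preallocating exactly enough
/// capacity that the vector never reallocates while being filled.
pub fn fill_preallocated(chunk: &[u8], n: usize) -> Vec<u8> {
    let mut buf = Vec::with_capacity(chunk.len() * n);
    let start = buf.as_ptr();
    for _ in 0..n {
        buf.extend_from_slice(chunk);
    }
    // The allocation did not move (no copy after reallocation).
    debug_assert_eq!(start, buf.as_ptr());
    buf
}
```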
2021-09-21 16:21:32 -04:00
Mike Gerwitz aee781a6fb tamer: bench: xir: Fix broken benchmark
This broke when I removed `SelfClose`.  I used to run
`make all fmt check bench` before every push, but they take a while to run,
in part because it uses nightly and has to recompile too.

But it looks like I need to be more diligent again.
2021-09-21 16:09:50 -04:00
Mike Gerwitz b348892276 tamer: ir::xir::tree: Introduce attribute fragment parsing
This is exactly what I said I was _not_ going to do in the previous commit,
but apparently hacking late at night had me forget the whole reason that
XIRT is being introduced now---unit tests.  I'll be emitting a XIR stream
and I need to parse it for convenience in the tests.

So, here's a good start.  Next will be some generalizations that are useful
for the tests as well.  This is pretty bare, but accomplishes the task.

See docs for more info.
2021-09-21 16:07:38 -04:00
Mike Gerwitz a5afc76568 tamer: ir::xir::tree: Extract Attr{,List} into new module
The `tree` module is getting more difficult to navigate.  The tests still
remain where they were, since a bunch of concerns are mixed together.  Any
tests specific only to this module will be added here.
2021-09-21 10:43:23 -04:00
Mike Gerwitz fe7b64fe62 tamer: ir::xir::tree::AttrName: Remove unused, rename {Ele=>}AttrName
Attributes used to be able to be emitted standalone, but that was abandoned
a while back to clean things up a bit.  This cleanup was missed.
2021-09-21 09:29:56 -04:00
Mike Gerwitz c6a7988bc8 tamer: ir::xir: Add Token::AttrValueFragment with writer support
This is implemented only for the writer, since its use case is to be able to
concatenate strings without copying during writing.

It doesn't really make sense to support this in XIR Tree, since a reader
should never produce this.  But if we ever run into this (e.g. due to some
internal processing pipeline), we'll address it then; XIR Tree might have to
do copying, then, but should probably wait until encountering all fragments
before interning.  That'd be a distraction right now.
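The sketch below shows a writer that handles value fragments without joining them into a new string. Consecutive fragments are written straight to the sink inside a single pair of quotes. The `Token` enum is a hypothetical simplification that uses string slices in place of symbols:

```rust
use std::io::{self, Write};

/// Hypothetical subset of XIR tokens; string slices stand in for symbols.
pub enum Token<'a> {
    Open(&'a str),
    AttrName(&'a str),
    /// A piece of an attribute value; more pieces follow.
    AttrValueFragment(&'a str),
    /// The final (or only) piece of an attribute value.
    AttrValue(&'a str),
    /// `None` closes a self-closing element.
    Close(Option<&'a str>),
}

pub fn write_tokens<W: Write>(tokens: &[Token<'_>], out: &mut W) -> io::Result<()> {
    let mut in_value = false; // an attribute value's opening quote is written
    let mut tag_open = false; // an opening tag still lacks its `>`

    for tok in tokens {
        match *tok {
            Token::Open(name) => {
                if tag_open {
                    out.write_all(b">")?;
                }
                write!(out, "<{}", name)?;
                tag_open = true;
            }
            Token::AttrName(name) => write!(out, " {}=", name)?,
            Token::AttrValueFragment(frag) => {
                if !in_value {
                    out.write_all(b"\"")?;
                    in_value = true;
                }
                out.write_all(frag.as_bytes())?;
            }
            Token::AttrValue(value) => {
                if !in_value {
                    out.write_all(b"\"")?;
                }
                out.write_all(value.as_bytes())?;
                out.write_all(b"\"")?;
                in_value = false;
            }
            Token::Close(None) => {
                out.write_all(b"/>")?;
                tag_open = false;
            }
            Token::Close(Some(name)) => {
                if tag_open {
                    out.write_all(b">")?;
                }
                write!(out, "</{}>", name)?;
                tag_open = false;
            }
        }
    }
    Ok(())
}
```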
2021-09-21 00:16:30 -04:00
Mike Gerwitz e95afe2658 tamer: ir::xir::tree::Element::open: Fix doc typo
2021-09-21 00:16:30 -04:00
Mike Gerwitz 3bb6f0cf35 tamer: ir::asg::ident: AsRef impls for SymbolId types
This commit will make more sense once the broader context is committed, but
it's needed for lowering from `Sections` into a XIR stream.

This will also change once we pre-allocate symbols, like rustc, when the
interner is initialized.

This is my first use of the `paste` crate, which is used to generate
identifiers.  So this is partly an experiment, and it seems much better than
having to write a proc macro, at least at this point in time.  If this code
stays around, it'll probably be generalized further and used elsewhere, but
I'd prefer not to go this route long-term.
2021-09-20 16:50:40 -04:00
Mike Gerwitz 12daddcc2d tamer: ir::xir::tree::Element: Open element constructor
This simply moves the construction into `Element`.
2021-09-16 10:52:00 -04:00
Mike Gerwitz ea50e1112a tamer: ir::xir::tree: Extract tests into own file
This file's getting large, and will only grow more complex.
2021-09-16 10:18:02 -04:00
Mike Gerwitz 3484336b1d tamer: ir::xir::tree::Stack: Encapsulate ElementStack manipulation
This moves some logic into `ElementStack` (which would be part of `Stack` if
variants were their own types), rather than peering so deeply into its
data.
2021-09-16 10:07:37 -04:00
Mike Gerwitz a49ac23aeb tamer: ir::xir::tree: Child element attribute parsing
This correctly retains and restores the parent stack after processing an
attribute for a child element.

This does increase the size of `Stack` a bit, but we can evaluate whether
it's too large at a later time.  It's currently 832 bits with `Ix=u32`,
which is large, but the question is whether it matters; we'll see as we
begin to use it.
2021-09-15 16:46:15 -04:00
Mike Gerwitz 61e493066c tamer: ir::xir::tree: Clean up parser implementation
This moves most of the parsing logic into `Stack`, which rightfully owns the
stack manipulation and state transitions.  `ParserState` becomes exactly
what it says it is---a management of the persistent state of the parser, and
is also responsible for digesting tokens and dispatching their data to the
proper event.

This approach has a number of benefits over the old design: it's
self-documenting, making the intent clear; and it is easier to reason about
the subset of states (for both humans and the compiler) than a large match of
transitions.

This contains a number of TODO items that will be addressed shortly.  It
also revealed that the previous commit was incomplete---it doesn't persist
`pstack` for attributes on child elements!  That'll be fixed too.
2021-09-15 16:33:08 -04:00
Mike Gerwitz 366ecca8ea tamer: ir::xir::tree: Initial child element parsing
This modifies the tree parser to handle child elements.  It's mostly
proof-of-concept code; the next commit will clean it up a bit so that it's
largely self-documenting.
2021-09-15 11:19:08 -04:00
Mike Gerwitz 51507ccdad tamer: ir::xir: Combine Token::{SelfClose, Close} variants
This removes `SelfClose` and merges it with `Close` by making the first
parameter an `Option`.  This isn't really ideal, but it really simplifies
pattern matching, especially for the next commit.  I'll have more details
there.

The primary motivation was the lack of stabilization for bindings after `@` in
matches, e.g. `Foo(name, ele) | ele @ Element { name, .. }`.  It looks like
it's ready, though; maybe next Rust release?

  https://github.com/rust-lang/rust/issues/65490

I don't know if I'll revert this change after then.  This seems plenty
clear, albeit more verbose.
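The sketch below uses a hypothetical two-variant `Token` to show why the merge simplifies matching. Any close is matched by a single pattern, and the `Option` distinguishes the two kinds where that matters. With separate variants, `count_closes` would need an or-pattern across both of them:

```rust
/// Hypothetical token subset after the merge.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token<'a> {
    Open(&'a str),
    /// `None` is what used to be `SelfClose`.
    Close(Option<&'a str>),
}

/// One pattern matches every kind of close.
pub fn count_closes(tokens: &[Token<'_>]) -> usize {
    tokens.iter().filter(|t| matches!(t, Token::Close(_))).count()
}

/// Where the kind matters, match on the `Option`.
pub fn close_markup(name: Option<&str>) -> String {
    match name {
        Some(name) => format!("</{}>", name),
        None => "/>".to_string(),
    }
}
```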
2021-09-13 13:06:20 -04:00
Mike Gerwitz 1c40b9c504 tamer: ir::xir::tree: Closing element parsing with balance check
This introduces parser errors, but does not yet support error recovery; that
problem will be discussed in a commit in the near future, after the writer
is sorted out a bit more.
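Here is a minimal sketch of a stack-based balance check without error recovery, assuming elements are identified by name alone. `Event`, `ParseError` and `check_balance` are hypothetical and much simpler than XIRT's `Stack`:

```rust
/// Hypothetical parser errors.
#[derive(Debug, PartialEq)]
pub enum ParseError<'a> {
    /// A close did not match the innermost open element.
    UnbalancedTag { open: &'a str, close: &'a str },
    /// A close arrived with no element open.
    UnexpectedClose(&'a str),
    /// Input ended with an element still open.
    Unclosed(&'a str),
}

/// Hypothetical event: an element opened or closed by name.
pub enum Event<'a> {
    Open(&'a str),
    Close(&'a str),
}

/// Each close must match the innermost open; the first error ends parsing.
pub fn check_balance<'a>(events: &[Event<'a>]) -> Result<(), ParseError<'a>> {
    let mut stack: Vec<&'a str> = Vec::new();
    for ev in events {
        match *ev {
            Event::Open(name) => stack.push(name),
            Event::Close(name) => match stack.pop() {
                Some(open) if open == name => {}
                Some(open) => return Err(ParseError::UnbalancedTag { open, close: name }),
                None => return Err(ParseError::UnexpectedClose(name)),
            },
        }
    }
    match stack.pop() {
        Some(open) => Err(ParseError::Unclosed(open)),
        None => Ok(()),
    }
}
```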

DEV-10561
2021-09-13 10:45:38 -04:00
Mike Gerwitz 5979e1fb90 tamer: ir::xir::tree: Correct italic formatting in docs
I was using an Org mode format.
2021-09-13 09:47:39 -04:00
Mike Gerwitz fd8a05164d tamer: ir::xir::tree: Remove Tree::Attr, add AttrList
The idea, previously, was that parsing could begin at attributes selectively
and be parsed independently.  But that's really awkward with `Tree`, since
it effectively allows orphan attributes as children of an
`Element`.  Nonsense.

Instead, if we truly only want an attribute list, we can offer a function to
create a parser with an empty `Stack::BuddingElement` that can accumulate
them.
2021-09-09 14:40:58 -04:00
Mike Gerwitz 4987bc39b0 tamer: ir::xir::tree::parser_from: Yield parsed trees
Previously, `parser_from` was a simple wrapper around `parse`; now, this
provides a more convenient API where `next` will yield the next parsed
object.

See docs for much more information and rationale.
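The sketch below shows an iterator-style API under simplified assumptions: an open token carries only an element name, a close token carries nothing, and each call to `next` yields one completed top-level element. The types and this `parser_from` signature are hypothetical, not the actual XIRT API:

```rust
/// Hypothetical minimal tree: an element name and its children.
#[derive(Debug, PartialEq)]
pub struct Element {
    pub name: String,
    pub children: Vec<Element>,
}

pub enum Token<'a> {
    Open(&'a str),
    Close,
}

/// Wraps a token iterator and keeps the stack of unfinished elements.
pub struct TreeParser<I> {
    tokens: I,
    stack: Vec<Element>,
}

pub fn parser_from<'a, I: Iterator<Item = Token<'a>>>(tokens: I) -> TreeParser<I> {
    TreeParser { tokens, stack: Vec::new() }
}

impl<'a, I: Iterator<Item = Token<'a>>> Iterator for TreeParser<I> {
    type Item = Element;

    /// Consumes tokens until a top-level element is complete.
    fn next(&mut self) -> Option<Element> {
        while let Some(tok) = self.tokens.next() {
            match tok {
                Token::Open(name) => self.stack.push(Element {
                    name: name.to_string(),
                    children: Vec::new(),
                }),
                Token::Close => {
                    // An unbalanced close simply ends iteration here.
                    let done = self.stack.pop()?;
                    match self.stack.last_mut() {
                        Some(parent) => parent.children.push(done),
                        None => return Some(done),
                    }
                }
            }
        }
        None
    }
}
```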
2021-09-09 13:05:11 -04:00
Mike Gerwitz 1452a4186a tamer: convert: Add missing method-level docs
2021-09-08 16:12:53 -04:00
Mike Gerwitz 2586827d64 tamer: convert::{ExpectFrom, ExpectInto}: New traits
These traits are intended to eliminate boilerplate, primarily in tests, in
situations where from/into is not expected to fail.

Given that TAMER must only panic for internal compiler errors, this should
not often be used outside of test cases.  Further, there may be better
options in the future (e.g. QNames could be statically compiled rather than
trying to convert at runtime, in this case).
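The sketch below shows how such traits can be layered over `TryFrom`/`TryInto` with blanket impls. The trait names come from the commit. The method names (`expect_from`, `unwrap_from`, `expect_into`, `unwrap_into`) and the exact bounds are assumptions:

```rust
use std::convert::{TryFrom, TryInto};
use std::fmt::Debug;

/// `TryFrom` for conversions the caller asserts cannot fail; a failure is
/// treated as an internal error (panic).
pub trait ExpectFrom<T>: TryFrom<T>
where
    <Self as TryFrom<T>>::Error: Debug,
{
    fn expect_from(value: T, msg: &str) -> Self {
        Self::try_from(value).expect(msg)
    }

    fn unwrap_from(value: T) -> Self {
        Self::try_from(value).unwrap()
    }
}

impl<T, U> ExpectFrom<T> for U
where
    U: TryFrom<T>,
    <U as TryFrom<T>>::Error: Debug,
{
}

/// The `TryInto` counterpart.
pub trait ExpectInto<T>: TryInto<T>
where
    <Self as TryInto<T>>::Error: Debug,
{
    fn expect_into(self, msg: &str) -> T {
        self.try_into().expect(msg)
    }

    fn unwrap_into(self) -> T {
        self.try_into().unwrap()
    }
}

impl<T, U> ExpectInto<U> for T
where
    T: TryInto<U>,
    <T as TryInto<U>>::Error: Debug,
{
}
```

In a test this reads as `let n: u8 = u8::expect_from(42u32, "fits");`, without a `.unwrap()` chain after `try_from`.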
2021-09-08 16:03:44 -04:00
Mike Gerwitz 12bb88e4b5 tamer: ir::xir::tree: Introduce XIR tree
This begins to introduce the XIR tree.  I was originally going to wait on
this until after implementing the xmle writer in terms of XIR, but writing
unit tests is too much of a pain on the stream, so now is as good of a time
as any.

This has very limited support so far; it'll be added to as time goes on.
2021-09-08 13:56:04 -04:00
Mike Gerwitz ab093046e9 tamer: ir::asg::section: Provide iterators for major section groups
These groups happen to correspond with the sections of the xmle file, which
suggests again that this lives in the wrong place.  But I should really have
my focus elsewhere right now, so I don't know whether I'll go any further
for the time being.  I guess we'll see as the writer is reimplemented.
2021-09-01 11:21:44 -04:00
Mike Gerwitz 1fa9614698 tamer: ir::asg::section: Improve iteration
`SectionsIter` was introduced to remove that responsibility from xmle
writer, since that's currently being reimplemented using XIR.

The existing iterator has been renamed SectionIter{ator=>} for a more
idiomatic name for iterator structs, and now has a static type rather than
relying on dynamic dispatch.  The author of that code wasn't sure how to
handle it otherwise.  (Which is understandable, since we were both still
getting acquainted with Rust.)  There's no notable change in performance in
my benchmarking.

This abstraction is a bit awkward, in that it's named for object file
sections, but they aren't.  Further, it's coupled with the ASG via
`SortableAsg` and perhaps should be generalized into a sorting routine that
takes a function for sorting, so that `Sections` can be moved into xmle's
packages.
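The sketch below contrasts a statically typed iterator struct with boxed dynamic dispatch. It assumes sections are plain vectors of ids. `Sections`, `SectionsIter` and the field names are hypothetical and far simpler than the ASG-backed originals:

```rust
use std::iter::Chain;
use std::slice::Iter;

/// Hypothetical sections holding object ids.
pub struct Sections {
    pub head: Vec<u32>,
    pub body: Vec<u32>,
    pub tail: Vec<u32>,
}

/// The concrete `Chain` type is spelled out, so no `Box<dyn Iterator>`.
pub struct SectionsIter<'a>(Chain<Chain<Iter<'a, u32>, Iter<'a, u32>>, Iter<'a, u32>>);

impl<'a> Iterator for SectionsIter<'a> {
    type Item = &'a u32;

    fn next(&mut self) -> Option<Self::Item> {
        self.0.next()
    }
}

impl Sections {
    /// Static dispatch: no allocation, calls can be inlined.
    pub fn iter(&self) -> SectionsIter<'_> {
        SectionsIter(self.head.iter().chain(self.body.iter()).chain(self.tail.iter()))
    }

    /// Dynamic-dispatch alternative: a heap allocation plus virtual calls.
    pub fn iter_dyn(&self) -> Box<dyn Iterator<Item = &u32> + '_> {
        Box::new(self.head.iter().chain(self.body.iter()).chain(self.tail.iter()))
    }
}
```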
2021-09-01 09:14:51 -04:00
Mike Gerwitz b80064f59e tamer: configure: Check for Rust 1.{52=>53}.
Or-pattern syntax is used; I had forgotten to bump this version.

For example, match on `Foo(Bar | Baz)` vs. `Foo(Bar) | Foo(Baz)`.
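Here is a minimal example of the nested or-pattern syntax that requires Rust 1.53, using hypothetical `Outer`/`Inner` enums. Before 1.53, the first arm had to be spelled `Outer::Foo(Inner::Bar) | Outer::Foo(Inner::Baz)`:

```rust
#[derive(Debug, Clone, Copy)]
pub enum Inner {
    Bar,
    Baz,
    Quux,
}

#[derive(Debug, Clone, Copy)]
pub enum Outer {
    Foo(Inner),
}

pub fn is_bar_or_baz(o: Outer) -> bool {
    match o {
        // Nested or-pattern inside the tuple variant.
        Outer::Foo(Inner::Bar | Inner::Baz) => true,
        Outer::Foo(Inner::Quux) => false,
    }
}
```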
2021-08-30 15:19:14 -04:00
Mike Gerwitz 9331858c6d doc: Give @mdash macro an argument
This macro is used to consume whitespace so that the following sentence can
start on the next line without producing any whitespace in the output.  Its
argument is, therefore, whitespace.

This used to work in earlier versions of Texinfo, but around 6.{6,7} it
began failing because an argument was provided when it wasn't defined with
one.
2021-08-30 10:41:49 -04:00
Mike Gerwitz 0a8fb71c1b tamer: tameld: Use buffered writes
This was an oversight.  The difference is significant.  I had my suspicions
about this when I noticed the huge difference in time between writing to
/dev/null vs. an actual file during profiling.

On one of our systems, here's the number of syscalls _before_ this change:

  $ strace -c target/release/tameld --emit xmle -o foo foo.xmlo
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   85.05    4.966192          16    318473           write
    7.23    0.421977          13     32298           lstat
    6.53    0.381424          15     25113           read
    0.75    0.043691          13      3350           readlink
    0.25    0.014713          61       241           close
    0.12    0.007167          30       241           openat
    0.05    0.003175         151        21           munmap
    0.01    0.000488          14        35           brk
    0.01    0.000292           9        33           mmap
    0.00    0.000266          38         7           mremap
    0.00    0.000004           1         3           sigaltstack
    0.00    0.000000           0         6           fstat
    0.00    0.000000           0         1           poll
    0.00    0.000000           0        11           mprotect
    0.00    0.000000           0         7           rt_sigaction
    0.00    0.000000           0         1           rt_sigprocmask
    0.00    0.000000           0         6         6 access
    0.00    0.000000           0         1           execve
    0.00    0.000000           0         1           arch_prctl
    0.00    0.000000           0         1           sched_getaffinity
    0.00    0.000000           0         1           set_tid_address
    0.00    0.000000           0         1           set_robust_list
    0.00    0.000000           0         2           prlimit64
  ------ ----------- ----------- --------- --------- ----------------
  100.00    5.839389                379854         6 total

And _after_:

  $ strace -c target/release/tameld --emit xmle -o foo foo.xmlo
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   45.21    0.435010          13     32298           lstat
   40.09    0.385752          15     25113           read
    6.14    0.059113          21      2809           write
    4.75    0.045687          14      3350           readlink
    2.51    0.024115         100       241           close
    0.84    0.008045          33       241           openat
    0.26    0.002468         118        21           munmap
    0.06    0.000580          17        35           brk
    0.06    0.000566          17        33           mmap
    0.03    0.000279          40         7           mremap
    0.02    0.000181          16        11           mprotect
    0.01    0.000087          15         6         6 access
    0.01    0.000082          12         7           rt_sigaction
    0.01    0.000075          13         6           fstat
    0.00    0.000027           9         3           sigaltstack
    0.00    0.000024          12         2           prlimit64
    0.00    0.000018          18         1           execve
    0.00    0.000016          16         1           poll
    0.00    0.000013          13         1           sched_getaffinity
    0.00    0.000012          12         1           rt_sigprocmask
    0.00    0.000012          12         1           arch_prctl
    0.00    0.000012          12         1           set_robust_list
    0.00    0.000011          11         1           set_tid_address
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.962185                 64190         6 total

What a difference!

There are still a lot of other red flags in there; those can be addressed
separately.

This was originally written as I was learning Rust, and I suspect that I
didn't realize at the time that `File` isn't buffered.

For the above link operation, elapsed time goes from 1.23s before the change
to 0.85s after:

  0.77user 0.44system 0:01.23elapsed 99%CPU (0avgtext+0avgdata 48520maxresident)k
  0inputs+43952outputs (0major+12825minor)pagefaults 0swaps

  0.69user 0.15system 0:00.85elapsed 98%CPU (0avgtext+0avgdata 48396maxresident)k
  0inputs+43952outputs (0major+12823minor)pagefaults 0swaps
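The standard remedy in Rust is to wrap the output in `std::io::BufWriter`, which collects small writes in memory and forwards them to the underlying writer in large chunks. The sketch assumes that is the mechanism used here. In place of counting `write(2)` syscalls, it counts calls to an inner writer, using a hypothetical `CountingWriter` around `io::sink()`:

```rust
use std::io::{self, BufWriter, Write};

/// Counts calls to `write`, standing in for syscalls on an unbuffered file.
pub struct CountingWriter<W> {
    pub inner: W,
    pub writes: usize,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.inner.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Emits `n` small fragments, as an XML writer does, and returns how many
/// times the underlying writer was called.
pub fn count_writes(n: usize, buffered: bool) -> io::Result<usize> {
    let mut sink = CountingWriter { inner: io::sink(), writes: 0 };
    if buffered {
        let mut w = BufWriter::new(&mut sink);
        for _ in 0..n {
            w.write_all(b"<x/>")?;
        }
        w.flush()?;
    } else {
        for _ in 0..n {
            sink.write_all(b"<x/>")?;
        }
    }
    Ok(sink.writes)
}
```

In the linker, this would amount to something like `BufWriter::new(File::create(path)?)`. `BufWriter` flushes on drop but ignores any error there, so calling `flush()` explicitly is the way to observe a failed final write.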
2021-08-20 12:14:42 -04:00
Mike Gerwitz c9a2ae533f tamer: xir (XmlWriter)[write_new]: Correct #[must_use] declaration
The return value has no meaningful side-effects at all; the write operation
failing isn't worth pointing out, since it has to be used regardless.

The normal `write` does have useful side-effects, of course.
2021-08-20 11:38:58 -04:00
Mike Gerwitz 59d578e669 tamer: xir (XmlWriter)[write_new]: New method
This change was primarily intended to clean up unit tests.  Since it
allocates and returns a new buffer, I do not expect this to have much use
within TAMER itself in the near future.  Maybe in later tooling.

If this is abused, person from the future: add `#[cfg(test)]` to its
definition.
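The sketch below shows the shape such a method can take, assuming a writer trait whose `write` targets any `std::io::Write` sink. The trait and the `Empty` type are hypothetical stand-ins, not TAMER's actual `XmlWriter` definition:

```rust
use std::io::{self, Write};

/// Hypothetical writer trait: `write` emits into any sink.
pub trait XmlWriter {
    fn write<W: Write>(&self, sink: &mut W) -> io::Result<()>;

    /// Allocates a fresh buffer, writes into it, and returns it.
    fn write_new(&self) -> io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        self.write(&mut buf)?;
        Ok(buf)
    }
}

/// Toy self-closing element with the given name.
pub struct Empty<'a>(pub &'a str);

impl XmlWriter for Empty<'_> {
    fn write<W: Write>(&self, sink: &mut W) -> io::Result<()> {
        write!(sink, "<{}/>", self.0)
    }
}
```

This shape suits tests because the result can be compared directly against the expected bytes, at the cost of one allocation per call.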
2021-08-20 11:37:01 -04:00
Mike Gerwitz cd1eae95ca tamer: xir: {NodeStream=>Token}
I decided not to do this in a previous commit because I had documented
"NodeStream" elsewhere, so I'd like it to be in the Git history to
understand its evolution.

This never was a "Node" stream beyond the initial concept phase, because it
represents tokens that aren't themselves nodes.  It is intended to generate
XML nodes, but may need to accommodate non-nodes (e.g. XML declarations) in
the future.

The name originated from `Node`, which was a tree-based IR that was
initially conceived, but removed because it's not yet needed.  What we need
is a streaming IR for xmle writing, and then for reading and echoing back
out XML for the new frontend.
2021-08-20 10:30:27 -04:00