519 Commits (53a689741b8d99dc5243799452e01b00cc3d25e5)

Author SHA1 Message Date
Mike Gerwitz adc45d90df tamer: xir::parse: Attribute parser generator
This is the first parser generator for the parsing framework.  I've been
waiting quite a while to do this because I wanted to be sure that I
understood how I intended to write the attribute parsers manually.  Now that
I'm about to start parsing source XML files, it is necessary to have a
parser generator.

Typically one thinks of a parser generator as a separate program that
generates code for some language, but that is not always the case---that
represents a lack of expressiveness in the language itself (e.g. C).  Here,
I simply use Rust's macro system, which should be a concept familiar to
someone coming from a language like Lisp.

This also resolves where I stand on parser combinators with respect to this
abstraction: they both accomplish the exact same thing (composition of
smaller parsers), but this abstraction doesn't do so in the typical
functional way.  But the end result is the same.

The parser generated by this abstraction will be optimized and inlined in the
same manner as the hand-written parsers.  Since they'll be tightly coupled
with an element parser (which too will have a parser generator), I expect
that most attribute parsers will simply be inlined; they exist as separate
parsers conceptually, for the same reason that you'd use parser combinators.

It's worth mentioning that this awkward reliance on dead state for a
lookahead token to determine when aggregation is complete rubs me the wrong
way, but resolving it would involve reintroducing the XIR AttrEnd that I had
previously removed.  I'll keep fighting with myself on this, but I want to
get a bit further before I determine if it's worth the tradeoff of
reintroducing (more complex IR but simplified parsing).
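
As a rough illustration of using the macro system as a parser generator, the
sketch below generates an aggregating attribute parser from a field list.  It
is not TAMER's actual macro: the name `attr_parser!`, the `ImportAttrs` struct,
and the representation of attributes as `(&str, &str)` pairs are all
assumptions made for the example.  An unrecognized attribute stands in for the
dead state and is handed back as the lookahead token.

```rust
/// Hypothetical generator: expands to a struct and a parser that
/// aggregates `(name, value)` attribute pairs into that struct.
macro_rules! attr_parser {
    (struct $name:ident { $($field:ident: $attr:literal),+ $(,)? }) => {
        #[derive(Debug, PartialEq)]
        struct $name {
            $($field: String,)+
        }

        impl $name {
            /// Consume attributes until one is not recognized (standing in
            /// for the dead state), returning the aggregate together with
            /// that unrecognized attribute as lookahead, if any.
            fn parse<'a>(
                toks: impl Iterator<Item = (&'a str, &'a str)>,
            ) -> Result<(Self, Option<(&'a str, &'a str)>), String> {
                $(let mut $field: Option<String> = None;)+
                let mut lookahead = None;

                for (k, v) in toks {
                    match k {
                        $($attr => $field = Some(v.to_string()),)+
                        _ => {
                            lookahead = Some((k, v));
                            break;
                        }
                    }
                }

                Ok((
                    $name {
                        $($field: $field.ok_or_else(|| {
                            format!("missing attribute `{}`", $attr)
                        })?,)+
                    },
                    lookahead,
                ))
            }
        }
    };
}

// One invocation per attribute set; the names here are illustrative only.
attr_parser! {
    struct ImportAttrs { package: "package", export: "export" }
}
```

Because the expansion is ordinary Rust, the generated `parse` is subject to the
same inlining and optimization as hand-written code.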

DEV-7145
2022-06-21 13:23:02 -04:00
Mike Gerwitz 9598532d8b tamer: xir::st: Add missing docs for generated QName constants
This was missed.  It was not possible, using the documentation
alone (without looking at the linked source) to tell what the QName actually
represented, though you could assume by the name.

DEV-7145
2022-06-21 13:23:01 -04:00
Mike Gerwitz 3f23bc5e33 tamer: fmt: New type-based formatting system
This is partly an experiment, but is designed to simplify producing English
sentences in various contexts.  It makes use of a not only unstable, but
incomplete, Rust feature---adt_const_params, for a static str const type
parameter.  Hopefully that ends up being stabilized.

This uses types, but it's the same as function composition due to Rust's
monomorphization.

DEV-7145
2022-06-10 16:28:15 -04:00
Mike Gerwitz f7752436da tamer: parse::Parser: Add remaining field docs
DEV-7145
2022-06-07 15:23:20 -04:00
Mike Gerwitz 3c227e5a2d tamer: parse::ParseState: Remove Default trait bound
`ParseState` originally required `Default` for use with `mem::take` in
`Parser::feed_tok`.  This unfortunately cannot last, since more specialized
parsers require context during initialization in order to provide useful
diagnostic information.  (The other option is to require the caller to
augment errors with diagnostic information, but that would have to be
duplicated by every caller and complicates parser composition; I'd prefer
those diagnostic details remain encapsulated.)

Replacing `Default` with `Option` is uglier, but it ends up producing the
same assembly as `mem::take` did, at least at the time of writing.  Because
Rust is able to elide unnecessary moves using this implementation, there is
no need for `unwrap_unchecked` or other unsafe methods, which is great,
since it shows that this parsing methodology is viable entirely in safe
Rust.
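
A minimal sketch of the `Option` approach, assuming a simplified parser whose
state transitions take `self` by value.  `Counter` and the shape of `Parser`
are hypothetical stand-ins, not the real `ParseState` or `Parser` definitions.

```rust
/// Hypothetical state that carries context (`label`) and therefore has no
/// sensible `Default`.
#[derive(Debug, PartialEq)]
enum Counter {
    Counting { label: String, seen: u32 },
}

impl Counter {
    /// Transition by value, consuming the previous state.
    fn parse_token(self, _tok: char) -> Self {
        match self {
            Counter::Counting { label, seen } => {
                Counter::Counting { label, seen: seen + 1 }
            }
        }
    }
}

/// Hypothetical parser that owns its state through an `Option` so that the
/// state can be moved out from behind `&mut self` without `Default`.
struct Parser {
    state: Option<Counter>,
}

impl Parser {
    fn new(state: Counter) -> Self {
        Self { state: Some(state) }
    }

    fn feed_tok(&mut self, tok: char) {
        // `take` leaves `None` behind; the state is restored before
        // returning, so `None` is never observable between calls.
        let state = self.state.take().expect("state is always restored");
        self.state = Some(state.parse_token(tok));
    }
}
```

Everything here is safe Rust; whether the moves are elided in practice depends
on the optimizer, as the message notes.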

DEV-7145
2022-06-07 15:08:40 -04:00
Mike Gerwitz f14ffc87c2 tamer: parse::state::ParseState::DeadToken: New associated type
Previously, `ParseStatus::Dead` always yielded
`ParseState::Token`.  However, I'm working on introducing parsers that
aggregate (parsing XML attributes into structs), and those parsers do not
know that they have completed aggregation until they reach a dead state;
given that, I need to yield additional information at that time.

I played around with a number of alternative ideas, but this ended up being
the cleanest, relative to the effort involved.  For example, introducing
another parameter to `ParseStatus::Dead` was too burdensome on APIs that
ought not concern themselves with the possibility of receiving an object in
addition to a lookahead token, since many parsers are not capable of doing
so (given that they map M:(N<=M)).

Another option that I abandoned fairly quickly was having
`is_accepting` (potentially renamed) return an aggregate object, since
that's on the side and didn't feel like it was part of the parsing pipeline.

The intent is to abstract this some in a new `ParseState` method for
delegation + aggregation.
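
The idea can be sketched with a cut-down trait.  `State`, `Status`, `Digits`,
and `Skip` are hypothetical names for this example and are far simpler than
the real `ParseState` and `ParseStatus`; the point is only that each parser
chooses what it yields upon reaching a dead state.

```rust
/// Simplified stand-in for a transition result.
enum Status<S: State> {
    /// More input is needed.
    Incomplete(S),
    /// The token cannot be accepted; the parser decides what to hand back.
    Dead(S::DeadToken),
}

trait State: Sized {
    type Token;
    /// What is yielded upon reaching a dead state.
    type DeadToken;

    fn step(self, tok: Self::Token) -> Status<Self>;
}

/// Aggregating parser: collects digits until a non-digit is seen.
struct Digits(Vec<u32>);

impl State for Digits {
    type Token = char;
    /// The aggregate plus the lookahead token that ended aggregation.
    type DeadToken = (Vec<u32>, char);

    fn step(mut self, tok: char) -> Status<Self> {
        match tok.to_digit(10) {
            Some(d) => {
                self.0.push(d);
                Status::Incomplete(self)
            }
            None => Status::Dead((self.0, tok)),
        }
    }
}

/// Non-aggregating parser: skips spaces and hands back only the token.
struct Skip;

impl State for Skip {
    type Token = char;
    type DeadToken = char;

    fn step(self, tok: char) -> Status<Self> {
        if tok == ' ' {
            Status::Incomplete(self)
        } else {
            Status::Dead(tok)
        }
    }
}
```

Callers of `Skip` never have to consider an aggregate, while callers of
`Digits` receive one exactly when aggregation is known to be complete.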

DEV-7145
2022-06-07 09:37:41 -04:00
Mike Gerwitz 495c1438fd tamer: Consistent span diagram representation
I'll document it more formally eventually, but this settles on a mix of the
two: square brackets and dashes for intervals, `+` for intersecting lines,
byte offsets below interval endpoints, and names below that.

The docblock for `Span` itself is still off; I'll probably just take one of
the test cases and paste it there at some point.

DEV-7145
2022-06-06 11:32:35 -04:00
Mike Gerwitz bba181f573 tamer: xir::attr::Attr: Introduce AttrSpan
This replaces a tuple with a tuple struct that allows for calculating more
complete span information, such as the span encompassing the entire
attribute and the value span including the surrounding quotes.

This includes logic that ought to be abstracted into `Span` itself, and it's
not as formal as I'd like it to be (e.g. not ensuring context), but this is
a good starting point.
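
A sketch of the kind of calculation involved, under stated assumptions: `Span`
is reduced to an offset and length (no context), the tuple struct holds the
name span and the unquoted value span, and the attribute is written as
`name="value"` with no whitespace around `=`.  The method names are
hypothetical.

```rust
/// Reduced span: byte offset and length only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    offset: u32,
    len: u16,
}

impl Span {
    /// Offset one past the last byte of the span.
    fn end(self) -> u32 {
        self.offset + self.len as u32
    }
}

/// Name span and value span (the value span excludes the quotes).
#[derive(Debug, Clone, Copy, PartialEq)]
struct AttrSpan(Span, Span);

impl AttrSpan {
    /// Value span widened by one byte on each side to cover the quotes.
    fn value_with_quotes(self) -> Span {
        let AttrSpan(_, value) = self;
        Span { offset: value.offset - 1, len: value.len + 2 }
    }

    /// Span from the start of the name through the closing quote.
    fn attr_span(self) -> Span {
        let AttrSpan(name, _) = self;
        let end = self.value_with_quotes().end();
        Span { offset: name.offset, len: (end - name.offset) as u16 }
    }
}
```

For `foo="bar"` beginning at byte 10, the name span is offset 10 with length 3
and the value span is offset 15 with length 3, so the quoted value covers
bytes 14 through 18 and the whole attribute has length 9.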

Note that parsers call `Token::span`, which in turn calculates the attribute
span, each time an attribute is encountered during lowering.  But Rust does
a good job at optimizing away unnecessary operations, so this didn't have an
observable impact on time.

DEV-7145
2022-06-06 11:31:28 -04:00
Mike Gerwitz 2b8e7e6031 tamer: xir::st::qname: New module
This moves and deduplicates the static `QName`s into a common area.

DEV-7145
2022-06-06 11:31:27 -04:00
Mike Gerwitz 3da82b351e tamer: xir::flat::{State=>XirToXirf}: Rename
Like the previous two commits, this states the intent of this parser, which
results in more clear pipeline composition.

DEV-7145
2022-06-02 13:48:54 -04:00
Mike Gerwitz 91b55999e2 tamer: asg::air::{AirState=>AirAggregate}: Rename
Like the previous commit, this emphasizes what is happening.

DEV-7145
2022-06-02 13:26:46 -04:00
Mike Gerwitz 45bbf3879e tamer: obj::xmlo::{lower=>air}: Rename {LowerState=>XmloToAir}
This provides much more clarity as to what is going on.  Further, it's less
ambiguous, since I'm about to introduce a new type of xmlo lowering into XIR
for writing the actual xmlo files.

DEV-7145
2022-06-02 13:23:41 -04:00
Mike Gerwitz 8d92667388 tamer: Integrate xir::reader as a parser in the lowering pipeline
This allows `XmlXirReader` to be used in a `Lower` operation, just as
everything else, bringing me one step closer to a pipeline that can be
concisely represented; this is finally beginning to unify in a clear way,
though it is still a bit of a mess.

This causes `XmlXirReader` to _act_ like a `parse::Parser` in that it yields
a `ParsedResult`, but it does not use `parse::Parser` itself; that was the
_original_ plan: convert it into a `ParseState` where `XmlXirReader` became
a context, and force `Parser` to yield by feeding it a stream of tokens with
`repeat`, but that ended up performing poorly relative to this change.  I
did some investigation, which I might write about in the future, but for
now, this solution works just fine.

DEV-7145
2022-06-02 10:30:44 -04:00
Mike Gerwitz f8c28655dc tamer: parse: Split into multiple modules
This abstraction has grown quite a bit, and it's time to start formalizing
it a bit.  This split doesn't change any behavior, but it does start to make
it easier to reason about by clearly stating the broad components and how
they interact with one-another.

This doesn't yet move the tests; those will come next, but they are very
few. The reason I gave previously for this was because (a) they're tested
indirectly via the systems that utilize them and (b) because the abstraction
was not yet settled and the process was already very expensive.  No test
coverage was lost---it's only that failures were potentially harder to debug
on test failures, but in practice not even this was true, because the deeply
expressive types all but ensured that, if it compiles, it will function in a
way that is expected.  Unit tests and documentation for this system will be
added once I'm sure that this abstraction is in a proper state.

DEV-7145
2022-06-01 11:32:58 -04:00
Mike Gerwitz 63aa452197 tamer: parse: Move parse::lower into Lower
This also modifies `poc` such that `Lower` is invoked as an associated
function rather than a method to emphasize the pattern that is forming, so
that it can be later abstracted away.

DEV-11864
2022-06-01 11:15:43 -04:00
Mike Gerwitz f40f8bbafc tamer: parse: Rename {lower_*_while_ok=>lower_*}
The `while_ok` can just be implied with a lowering operation, and that
reduces the name complexity so that we can maybe introduce even more
specialized methods without resulting in a huge sentence as a name.

DEV-11864
2022-05-27 14:10:55 -04:00
Mike Gerwitz b084e23497 tamer: Refactor asg_builder into obj::xmlo::lower and asg::air
This finally uses `parse` all the way up to aggregation into the ASG, as can
be seen by the mess in `poc`.  This will be further simplified---I just need
to get this committed so that I can mentally get it off my plate.  I've been
separating this commit into smaller commits, but there's a point where it's
just not worth the effort anymore.  I don't like making large changes such
as this one.

There is still work to do here.  First, it's worth re-mentioning that
`poc` means "proof-of-concept", and represents things that still need a
proper home/abstraction.

Secondly, `poc` is retrieving the context of two parsers---`LowerContext`
and `Asg`.  The latter is desirable, since it's the final aggregation point,
but the former needs to be eliminated; in particular, packages need to be
worked into the ASG so that `found` can be removed.

Recursively loading `xmlo` files still happens in `poc`, but the compiler
will need this as well.  Once packages are on the ASG, along with their
state, that responsibility can be generalized as well.

That will then simplify lowering even further, to the point where hopefully
everything has the same shape (once final aggregation has an abstraction),
after which we can then create a final abstraction to concisely stitch
everything together.  Right now, Rust isn't able to infer `S` for
`Lower<S, LS>`, which is unfortunate, but we'll be able to help it along
with a more explicit abstraction.

DEV-11864
2022-05-27 13:51:29 -04:00
Mike Gerwitz eafb3b2a1b tamer: Add Display impl for each ParseState for generic ParseErrors
This is intended to describe, to the user, the state that the parser is
in.  This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.

This is something I expected to add at some point, but I wanted to add them
now because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.

DEV-11864
2022-05-25 15:26:02 -04:00
Mike Gerwitz 9edc32dd3b tamer: parse::LowerIter: Generic inner TripIter iterator
This commit is preparing to compose LowerIter directly.

DEV-11864
2022-05-24 10:27:14 -04:00
Mike Gerwitz f218c452b9 tamer: iter::trip: Flatten Result
The `*_iter_while_ok` functions now compose like monads, flattening `Result`
at each step and drastically simplifying handling of error types.  This also
removes the bunch of `?`s at the end of the expression, and allows me to use
`?` within the callback itself.

I had originally not used `Result` as the return type of the callback
because I was not entirely sure how I was going to use them, but it's now
clear that I _always_ use `Result` as the return type, and so there's no use
in trying to be too accommodating; it can always change in the future.
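
The flattening can be sketched as follows.  This is an assumption-laden
reduction, not the `iter::trip` implementation: `iter_while_ok` and this
`TripIter` are hypothetical, and the callback receives a `&mut dyn Iterator`
to keep the signature short.

```rust
/// Yields `T` until the first `Err`, which is stashed for the caller.
struct TripIter<'e, I, E> {
    inner: I,
    err: &'e mut Option<E>,
}

impl<'e, I, T, E> Iterator for TripIter<'e, I, E>
where
    I: Iterator<Item = Result<T, E>>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // Once tripped, stay tripped.
        if self.err.is_some() {
            return None;
        }
        match self.inner.next()? {
            Ok(x) => Some(x),
            Err(e) => {
                *self.err = Some(e);
                None
            }
        }
    }
}

/// Run `f` over the `Ok` items of `iter`.  The callback itself returns a
/// `Result`, and both error sources collapse into a single `Result`.
fn iter_while_ok<I, T, U, E>(
    iter: I,
    f: impl FnOnce(&mut dyn Iterator<Item = T>) -> Result<U, E>,
) -> Result<U, E>
where
    I: Iterator<Item = Result<T, E>>,
{
    let mut err = None;
    let out = f(&mut TripIter { inner: iter, err: &mut err });

    // An error from the source iterator takes precedence over the
    // callback's result, since the callback saw truncated input.
    match err {
        Some(e) => Err(e),
        None => out,
    }
}
```

Since the callback returns `Result<U, E>`, `?` can be used inside it and the
caller sees one `Result` rather than a nested one.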

This is desirable not just for cleanup, but because trying to refactor
`asg_builder` into a pair of `Parser`s is really messy to chain without
flattening, especially given some state that has to leak temporarily to the
caller.  More on that in a future commit.

DEV-11864
2022-05-20 16:08:16 -04:00
Mike Gerwitz 958a707e02 tamer: asg: Hoist Root from Ident into Object
This was always the intent, but I didn't have a higher-level object
yet.  This removes all the awkwardness that existed with working the root
in as an identifier.

DEV-11864
2022-05-19 12:48:43 -04:00
Mike Gerwitz 6252758730 tamer: asg::Object: Introduce Object::Ident
This wraps `Ident` in a new `Object` variant and modifies `Asg` so that its
nodes are of type `Object`.

This unfortunately requires runtime type checking.  Whether or not that's
worth alleviating in the future depends on a lot of different things, since
it'll require my own graph implementation, and I have to focus on other
things right now.  Maybe it'll be worth it in the future.
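
To make the runtime check concrete, here is a sketch with hypothetical
definitions.  `Ident` is reduced to a name, `as_ident` is an invented accessor,
and a `Root` variant (which a later commit in this log hoists into `Object`) is
included only so that there is more than one variant to distinguish.

```rust
/// Reduced identifier for illustration.
#[derive(Debug, PartialEq)]
struct Ident {
    name: String,
}

/// Graph nodes are `Object`s rather than `Ident`s.
#[derive(Debug, PartialEq)]
enum Object {
    /// Included for illustration; see the later `Root` commit.
    Root,
    Ident(Ident),
}

impl Object {
    /// Narrowing from `Object` to `Ident` is a runtime check: the type
    /// system no longer guarantees that a given node is an identifier.
    fn as_ident(&self) -> Option<&Ident> {
        match self {
            Object::Ident(ident) => Some(ident),
            Object::Root => None,
        }
    }
}
```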

Note that this also gets rid of some doc examples that simply aren't worth
maintaining as the API evolves.

DEV-11864
2022-05-19 12:33:59 -04:00
Mike Gerwitz f75f1b605e tamer: num: Header typo correction
2022-05-19 12:02:38 -04:00
Mike Gerwitz ebf1de5a60 tamer: asg::Ident{Object=>}: Rename
I think this may have been renamed _from_ `Ident` some time ago, but I'm too
lazy to check.  In any case, the name is redundant.

DEV-11864
2022-05-19 11:17:04 -04:00
Mike Gerwitz 7d76cb53f6 tamer: asg: Move SymAttrs conversion into asg_builder
This is a lowering operation and does not belong here.

What a tangled mess this all was (see recent commits); no wonder it was so
confusing.

DEV-11864
2022-05-19 11:07:15 -04:00
Mike Gerwitz eae194abc6 tamer: asg::object: Merge into asg::ident
Everything in this file relates to identifiers, and I'm about to introduce a
higher-level object, one of which may be an identifier.

DEV-11864
2022-05-19 11:05:20 -04:00
Mike Gerwitz 92dba0a28c tamer: obj::xmlo::asg_builder::IdentKindError: Merge into AsgBuilderError
Now that these are in the same module, there's no need for them to be
separate from one-another.

DEV-11864
2022-05-19 10:56:07 -04:00
Mike Gerwitz 07d2ec1ffb tamer: Move Dim and {Sym=>}Dtype into num module
A previous commit mentioned that there's not a place for `Dim`, and
duplicated it between `asg` and `xmlo`.  Well, `Dtype` is also needed in
both, and so here's a home for now.

`Dtype` has always been an inappropriate detail for the system and will one
day be removed entirely in favor of higher-level types; the machine
representation is up to the compiler to decide.

DEV-11864
2022-05-19 10:39:21 -04:00
Mike Gerwitz b2a79e930b tamer: Move SymAttrs lowering into asg_builder
asg_builder is about to be replaced, but in the process of simplifying the
destination IR (the ASG), I'm moving things into the proper place.  This
never belonged here---it belongs with the actual lowering operation.

Previously, this was not reasoned about in terms of a lowering operation,
and was written when I was first introducing myself to Rust and trying to
get a proof-of-concept linker working.

DEV-11864
2022-05-19 10:28:17 -04:00
Mike Gerwitz 8948452b71 tamer: asg::ident::Dim: Narrow type
This matches xmlo::Dim, and could be the same thing, if we can find a home
for it in the future; it's not worth creating such a home right now when I'm
not yet sure what else ought to live there; the duplication may be fine.

The conversion from xmlo needs to be moved, and `Dim` is going to be used
for more than just identifiers (expressions will have type inference
performed).

DEV-11864
2022-05-19 09:32:43 -04:00
Mike Gerwitz 263cb68380 tamer: parse: Persistent context
This allows retrieving and providing a context to a `Parser`.  This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.

This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.

DEV-11864
2022-05-18 16:15:09 -04:00
Mike Gerwitz 001499d921 tamer: parse::ParseError: Remove Eq trait bound
Just as in other commits, since it's an unnecessary limitation.

DEV-11864
2022-05-18 16:06:22 -04:00
Mike Gerwitz 3e277270a7 tamer: asg: Track roots on graph
Previously, since the graph contained only identifiers, discovered roots
were stored in a separate vector and exposed to the caller.  This not only
leaked details, but added complexity; this was left over from the
refactoring of the proof-of-concept linker some time ago.

This moves the root management into the ASG itself, mostly, with one item
being left over for now in the asg_builder (eligibility classifications).

There are two roots that were added automatically:

  - __yield
  - __worksheet

The former has been removed and is now expected to be explicitly mapped in
the return map, which is now enforced with an extern in `core/base`.  This
is still special, in the sense that it is explicitly referenced by the
generated code, but there's nothing inherently special about it and I'll
continue to generalize it into oblivion in the future, such that the final
yield is just a convention.

`__worksheet` is the only symbol of type `IdentKind::Worksheet`, and so that
was generalized just as the meta and map entries were.

The goal in the future will be to have this more under the control of the
source language, and to consolidate individual roots under packages, so that
the _actual_ roots are few.

As far as the actual ASG goes: this introduces a single root node that is
used as the sole reference for reachability analysis and topological
sorting.  The edges of that root node replace the vector that was removed.
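
A small sketch of the single-root idea, assuming a plain adjacency list with
node 0 reserved as the root and an acyclic graph.  `Graph` and its methods are
hypothetical and bear no relation to the actual `Asg` API.

```rust
/// Minimal adjacency-list graph; node 0 is the single root.
struct Graph {
    edges: Vec<Vec<usize>>,
}

const ROOT: usize = 0;

impl Graph {
    fn new() -> Self {
        // The root exists from the start and has no edges yet.
        Self { edges: vec![Vec::new()] }
    }

    fn add_node(&mut self) -> usize {
        self.edges.push(Vec::new());
        self.edges.len() - 1
    }

    fn add_dep(&mut self, from: usize, to: usize) {
        self.edges[from].push(to);
    }

    /// Rooting a node is just an edge from the root; no separate vector.
    fn add_root(&mut self, node: usize) {
        self.add_dep(ROOT, node);
    }

    /// Depth-first post-order from the root: dependencies come before
    /// their dependents, and unreachable nodes are omitted.
    fn sorted(&self) -> Vec<usize> {
        let mut seen = vec![false; self.edges.len()];
        let mut out = Vec::new();
        self.visit(ROOT, &mut seen, &mut out);
        out.pop(); // the root itself is always last; drop it
        out
    }

    fn visit(&self, n: usize, seen: &mut [bool], out: &mut Vec<usize>) {
        if seen[n] {
            return;
        }
        seen[n] = true;
        for &dep in &self.edges[n] {
            self.visit(dep, seen, out);
        }
        out.push(n);
    }
}
```

Reachability and ordering both fall out of one traversal that starts at the
root, so the caller no longer needs to hold a list of roots.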

DEV-11864
2022-05-17 10:42:05 -04:00
Mike Gerwitz 34eb994a0d tamer: asg::Asg::set_fragment: {ObjectRef=>SymbolId}
In the actual implementation (outside of tests), this is always looking up
before adding the symbol.  This will simplify the API, while still retaining
errors, since the identifier will fail the state transition if the
identifier did not exist before attempting to set a fragment.  So while this
is slower in microbenchmarks, this has no effect on real-world performance.

Further, I'm refactoring toward a streaming ASG aggregation, which is a lot
easier if we do not need to perform lookups in a separate step from the
ASG's primitives.

DEV-11864
2022-05-16 13:14:27 -04:00
Mike Gerwitz c49d87976d tamer: parse::Token: Remove Eq trait bound
`PartialEq` remains, and is all that is needed.  See previous commit
regarding the removal of this same bound from `Context`.

This can be re-added if it ends up actually being necessary.  But Tokens are
ephemeral and used only in lowering pipelines, using pattern matching.

DEV-11864
2022-05-16 10:05:14 -04:00
Mike Gerwitz d87006391e tamer: asg::object: Remove IdentObjectState, IdentObjectData
These traits are no longer necessary now that I'm using concrete types; they
just add unnecessary noise and confusion as I attempt to further refactor.

Don't abstract prematurely.

DEV-11864
2022-05-12 16:31:36 -04:00
Mike Gerwitz 3748762d31 tamer: asg::graph::Asg: Remove type parameter O
This removes the generic on the Asg (which was formerly BaseAsg),
hard-coding `IdentObject`, which will further evolve.  This makes the IR an
actual concrete IR rather than an abstract data structure.

These tests bring me back a bit, since they were written as I was still
becoming familiar with Rust.

DEV-11864
2022-05-12 15:46:17 -04:00
Mike Gerwitz f2c5443176 tamer: asg: Remove generic Asg, rename {Base=>}Asg
This is the beginning of an incremental refactoring to remove generics, to
simplify the ASG.  When I initially wrote the linker, I wasn't sure what
direction I was going in, but I was also negatively influenced by more
traditional approaches to both design and unit testing.

If we're going to call the ASG an IR, then it needs to be one---if the core
of the IR is generic, then it's more like an abstract data structure than
anything.  We can abstract around the IR to slice it up into components that
are a little easier to reason about and understand how responsibilities are
segregated.

DEV-11864
2022-05-11 16:47:13 -04:00
Mike Gerwitz 0493e68cb3 tamer: parse::ParseState::Context: Add missing comment
DEV-11864
2022-05-10 11:06:22 -04:00
Mike Gerwitz 0ef0d2b553 tamer: parse::ParseState:Error: Relax Eq trait bound
This is unnecessarily restrictive, since we do not require anything further
than `PartialEq` for the situations where we care about equality (tests).

DEV-11864
2022-05-06 15:28:47 -04:00
Mike Gerwitz 9f990e19e9 tamer: parse::ParseState::Context: Remove Default trait bound
This is too restrictive, especially for parsers that fold into something,
like the ASG, which may exist prior to invoking the parser.

This moves the trait bound to the functions that actually need it.  Those
obviously cannot be used if the Context does not implement `Default`, but
I'll provide alternative conveniences.
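
In sketch form, with heavily simplified stand-ins for `ParseState` and
`Parser` (the constructor names and the `Plain`, `Folding`, and `NoDefault`
types are assumptions for the example):

```rust
trait ParseState {
    /// No `Default` bound here anymore.
    type Context;
}

struct Parser<S: ParseState> {
    ctx: S::Context,
}

impl<S: ParseState> Parser<S> {
    /// Always available: the caller supplies an existing context.
    fn with_context(ctx: S::Context) -> Self {
        Self { ctx }
    }

    fn finalize(self) -> S::Context {
        self.ctx
    }
}

/// The bound lives only on the convenience that needs it.
impl<S: ParseState> Parser<S>
where
    S::Context: Default,
{
    fn new() -> Self {
        Self::with_context(Default::default())
    }
}

/// A parser whose context can be defaulted.
struct Plain;
impl ParseState for Plain {
    type Context = Vec<u32>;
}

/// A context with no `Default`, e.g. something built before parsing.
struct NoDefault(u32);

/// A parser that folds into a pre-existing context.
struct Folding;
impl ParseState for Folding {
    type Context = NoDefault;
}
```

`Parser::<Plain>::new()` compiles, whereas `Parser::<Folding>::new()` does not;
`Parser::<Folding>::with_context(..)` remains available.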

DEV-11864
2022-05-05 15:55:04 -04:00
Mike Gerwitz ba9f429ee7 tamer: obj::xmlo::{XmloEvent=>XmloToken}
The original "event" name was based on quick-xml's `Event`.  This
terminology shift is more closely matched with the new parsing system.

DEV-11864
2022-05-05 12:25:59 -04:00
Mike Gerwitz 0281dfdf0d tamer: Remove wip-frontends feature flag
We want the new system to be used so that we can start catching any problems
that may arise.  Further changes will be flagged as necessary.

DEV-10936
2022-05-04 09:37:10 -04:00
Mike Gerwitz 1ad2fb1dc8 Copyright year update 2022
RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no
"Group"), but I'm not sure if the legal name has been changed yet or not, so
I'll wait on that.
2022-05-03 14:14:29 -04:00
Mike Gerwitz 34fcd19cd0 tamer: obj::xmlo::reader: Replace todo! with error
These are no longer TODOs---they represent invalid tokens.

I'm going to put effort into providing further context with the diagnostic
system [right now] because these are internal errors caused by either
miscompilation or an incomplete reader.

DEV-10936
2022-05-03 09:19:47 -04:00
Mike Gerwitz 5875477efa tamer: xir::Token: Remove span from Display
This was missed when removing it from other Display impls when the new
diagnostic system was introduced.  Raw `Span`s display byte offsets and the
context, which is no longer desirable as part of an error message.

DEV-10936
2022-05-03 09:09:55 -04:00
Mike Gerwitz a2e6e37ed1 tamer: Bump nightly Rust version 1.{57=>62}
This removes a couple of feature flags that are no longer necessary.
2022-05-02 11:05:32 -04:00
Mike Gerwitz 7248ef77e4 tamer: diagnose::resolve{r=>}: Rename
Consistent with naming of other modules, which prefers to not needlessly
transform words.

DEV-12151
2022-05-02 09:49:22 -04:00
Mike Gerwitz 75b966c577 tamer: diagnose: Additional documentation
I had waited to provide more documentation until I was sure that the
abstraction was not going to change significantly; there was a lot of
refactoring in prior commits.

DEV-12151
2022-05-02 09:44:53 -04:00
Mike Gerwitz fc1dad8483 tamer: diagnose::report::Section: Further refactor resolved constructor
This speaks for itself.

DEV-12151
2022-04-29 15:54:38 -04:00
Mike Gerwitz ba0ceddd2d tamer: diagnose::report::Section: Constructor refactoring
This moves construction out of `From` and into separate associated
functions, which can be further simplified in a bit.

We also need unit tests for this, since this still relies on integration
tests due to the cost of the aggressive and tight refactoring iterations.

DEV-12151
2022-04-29 13:10:04 -04:00
Mike Gerwitz 3e04217741 tamer: diagnose::report::Section::maybe_squash_into: Remove syslabel TODO
Previously, when adjacent duplicate spans were both resolved, if one failed,
the other certainly would, which would result in duplicate labels each
squash.  Elided spans do not have syslabels, and so this is no longer a
concern.

DEV-12151
2022-04-29 13:07:51 -04:00
Mike Gerwitz 2ae6df38e7 tamer: diagnose::report: Restore source line preview for invalid UTF-8
This was removed in a previous commit while working on simplifying the
implementation, with the hope of returning to it once things were in a
better place.  They are, so let's bring it back.

DEV-12151
2022-04-29 12:41:56 -04:00
Mike Gerwitz f8dda12fae tamer: diagnose::report: Remove TODOs that are no longer applicable
These relate to the most recent commits.

DEV-12151
2022-04-29 12:34:48 -04:00
Mike Gerwitz 2ce0dbdd84 tamer: diagnose::report::SpanLabel: Remove in favor of separate Level and Label
`SpanLabel` was created during a very early refactoring of this system, and
I've just been fighting with it since.  This removes it, and simplifies
some things in the process.

It also makes clear that `Level` is never optional and removes the awkward
`Level::default` that was there previously; the default is now the lowest
level, which will always be able to be escalated.
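
One way to express a lowest-level default that can only be escalated is to
derive an ordering on the enum.  The variant set and its order below are
assumptions (the levels mentioned in these messages are error, note, and
help), and `escalate` is a hypothetical helper.

```rust
/// Variants are declared from lowest to highest so that the derived
/// `Ord` reflects severity.  The ordering is assumed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default)]
enum Level {
    #[default]
    Help,
    Note,
    Error,
}

impl Level {
    /// Combining two levels never lowers the result.
    fn escalate(self, other: Level) -> Level {
        self.max(other)
    }
}
```

Starting from `Level::default()`, escalating with any other level yields that
other level, so there is no need for an optional level.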

DEV-12151
2022-04-29 12:13:11 -04:00
Mike Gerwitz 9a5a2c4f3f tamer: diagnose::report: Avoid re-resolving adjacent identical spans
This does what the original proof-of-concept implementation did---skip a
span that was just processed, since it'll be squashed into the previous
anyway.  These duplicate spans originate from the diagnostic system when
producing supplemental help information.
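
A sketch of the skip, assuming spans are cheap to copy and compare.
`resolve_unique` and its `resolve` callback are hypothetical; the callback
stands in for whatever resolution is expensive enough to avoid repeating.

```rust
/// Resolve each span, skipping any span identical to the one just
/// processed.  Only adjacent duplicates are skipped.
fn resolve_unique<S, R>(spans: &[S], mut resolve: impl FnMut(S) -> R) -> Vec<R>
where
    S: PartialEq + Copy,
{
    let mut prev = None;
    let mut out = Vec::new();

    for &span in spans {
        if prev == Some(span) {
            continue;
        }
        out.push(resolve(span));
        prev = Some(span);
    }

    out
}
```

A span that recurs later, after a different span, is resolved again; only an
immediate repeat is skipped.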

DEV-12151
2022-04-29 11:57:50 -04:00
Mike Gerwitz a533244473 tamer: diagnose::report::VisualReporter::render: Avoid mspan collection
This used to be necessary when `Report` stored references to heap-allocated
strings, but `Report` now owns those values itself.

DEV-12151
2022-04-29 09:53:22 -04:00
Mike Gerwitz b0a5265ad3 tamer: diagnose::report::test: Extract into separate file
Tests are large and will be getting larger.  The source will also grow as
it's better documented and cleaned up.  It's getting more difficult to
navigate efficiently and concurrently modify implementation and tests, and
parsing via LSP is getting slower with certain types of changes.

DEV-12151
2022-04-29 09:23:06 -04:00
Mike Gerwitz 5c0e224d3c tamer: diagnose::report: Line numbers in gutter
Alright, starting to settle on an abstraction now, and things are coming
together.  This gives us line numbers in the previously-empty gutter, and
widens the gutter to accommodate.  Gutters are normalized across
sections.  Sections are not yet collapsed for sequential line numbers in the
same context.

Exciting!

Here's an example, on an xmlo file:

error: expected closing tag for `preproc:symtable`
     --> /home/.../foo.xmlo:16:4
      |
   16 |    <preproc:symtable xmlns:map="http://www.w3.org/2005/xpath-functions/map">
      |    ----------------- note: element `preproc:symtable` is opened here

     --> /home/.../foo.xmlo:11326:4
      |
11326 |    </preproc:wrong>
      |    ^^^^^^^^^^^^^^^^ error: expected `</preproc:symtable>`
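
The gutter normalization visible in that output can be sketched as follows.
`gutter_width` and `gutter` are hypothetical helpers, not functions from
`diagnose::report`; they assume the width is the digit count of the largest
line number across all sections.

```rust
/// Number of decimal digits needed for the largest line number.
fn gutter_width(line_nums: &[u32]) -> usize {
    line_nums
        .iter()
        .max()
        .map_or(1, |n| n.to_string().len())
}

/// Render a gutter of the given width, right-aligning the line number
/// if there is one and leaving the gutter blank otherwise.
fn gutter(width: usize, line: Option<u32>) -> String {
    match line {
        Some(n) => format!("{:>width$} |", n, width = width),
        None => format!("{:>width$} |", "", width = width),
    }
}
```

With line numbers 16 and 11326 the width is 5, so line 16 is padded with three
leading spaces and the two sections line up.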

DEV-12151
2022-04-28 23:53:38 -04:00
Mike Gerwitz 5744e08984 tamer: diagnostic::report: Hoist gutter output into Section
The `Section` itself is now responsible for outputting the gutter, which
puts us in a position to be able to apply consistent formatting without
having to propagate width data to every line variant.
2022-04-28 22:59:13 -04:00
Mike Gerwitz 4e03a367a5 tamer: diagnose::report::SourceLine: Separate variants for each line
Now `SourceLine` _does_ actually correspond to a line of output, which will
allow for better formatting (e.g. collapsing padding) and, importantly,
proper management of gutters.

Note that the seemingly unnecessary `SectionSourceLine` allows for a subtle
consistent formatting for all variants' gutters in `SectionLine`, which will
allow us to hoist that rendering out in the next commit.  The other option
was to include a trailing space for padding and marks, but that is not only
sloppy and undesirable, but asking for confusion, especially in editors (like
mine) that trim trailing whitespace.

DEV-12151
2022-04-28 22:49:35 -04:00
Mike Gerwitz fd1c6430a8 tamer: diagnose::report::SectionSourceLine: {Option<Column>=>Column}
If a column isn't present, it degrades to displaying labels like footnotes
anyway, so this simplifies the system rather than catering to a rare
case.  With that said, this does lose functionality, since it does not
render the source line at all, even though we _could_ do so.

I may re-introduce that rendering after some further refactoring,
specifically for gutters.

DEV-12151
2022-04-28 22:23:58 -04:00
Mike Gerwitz 3a5dcfc016 tamer: diagnose::resolver::SourceLine: {Vec<u8>=>String}
Using a byte vector just makes life more difficult with regard to preparing
the diagnostic reports.  We're already validating UTF-8 data for column
generation, which is necessary for a robust report, so let's just store it
as a String to begin with.

DEV-12151
2022-04-28 22:03:37 -04:00
Mike Gerwitz 838db689ad tamer: diagnose::report: Render labels on mark line
Note that, if a span is first encountered with a mark but with _no_ label,
the first label (if collapsed) will be on the next line.  This allows a span
to be marked without extra visual noise if it's not necessary, and to be
able to trust that it'll stay that way.

Until coloring is introduced, this may or may not be easier to read
depending on context.

This is also not yet taking into account where on the line it begins, and so
may render poorly if the span is at the end of a line.  That will be fixed
later on.

DEV-12151
2022-04-28 16:23:13 -04:00
Mike Gerwitz a197267a2d tamer: xir::flat: Remove closing tag name from label
This is now visible in the diagnostic output.  Example at this point in
time, on an xmlo file for one of our smallest systems:

error: expected closing tag for `preproc:symtable`
  --> /home/.../foo.xmlo:16:4
   |
   |    <preproc:symtable xmlns:map="http://www.w3.org/2005/xpath-functions/map">
   |    -----------------
   = note: element `preproc:symtable` is opened here

  --> /home/.../foo.xmlo:11326:4
   |
   |    </preproc:wrong>
   |    ^^^^^^^^^^^^^^^^
   = error: expected `</preproc:symtable>`

DEV-12151
2022-04-28 15:47:34 -04:00
Mike Gerwitz 33baca113a tamer: diagnose::report: Vary mark character depending on level
Looking more and more Rust-like.  Shameless copy.

TBH I forget what character it uses for help, but it's easy enough to
change.

Also, to be clear: this is modeled after Rust, but it's not a requirement of
mine that it look exactly like it.  I just like the general style; I'll
surely deviate over time, as appropriate (or as I feel like it).
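
A sketch of level-dependent marks.  The `^` for errors and `-` for notes match
the example output elsewhere in this log; the character for help is an
assumption, as are the helper names and the use of 1-indexed, inclusive
columns.

```rust
#[derive(Clone, Copy)]
enum Level {
    Error,
    Note,
    Help,
}

fn mark_char(level: Level) -> char {
    match level {
        Level::Error => '^',
        Level::Note => '-',
        // Assumed for this sketch; the message leaves this one open.
        Level::Help => '-',
    }
}

/// Build the mark line for columns `col_start..=col_end` (1-indexed).
fn mark_line(col_start: usize, col_end: usize, level: Level) -> String {
    let pad = " ".repeat(col_start.saturating_sub(1));
    let width = (col_end + 1).saturating_sub(col_start);
    let marks = mark_char(level).to_string().repeat(width);
    format!("{}{}", pad, marks)
}
```

For a span covering columns 4 through 20, this yields three spaces followed by
17 mark characters.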

DEV-12151
2022-04-28 15:44:50 -04:00
Mike Gerwitz 8119d1ca0d tamer: diagnose::report: Render span marks under lines
This has the effect of highlighting the columns of the source lines using
'^' as an underline.

The next step will be to have the underline character depend on the
`Level`.

If this commit message doesn't sound all that exciting, given what it
finally achieved after all this time, it's because I'm exhausted, and my
prototype has already taken my excitement.  But this is significant, given
all the work leading up to it.

There is some code cleanup needed and some unit tests that ought to be
written rather than relying on integration, but considering how much this is
being refactored, I don't want to add to that refactoring cost just yet
before gutters are introduced and I know things are settled for now.

DEV-12151
2022-04-28 15:44:49 -04:00
Mike Gerwitz 5db026ed76 tamer: diagnose::report: Initial display of source lines
This has been a lot of refactoring for something that I prototyped a week
ago, and the prototype is still further along in its output formatting (it
has line numbering in gutters and span markings).

But, this has come a long way, and I'm happy with it overall, though I'm not
happy with my slow pace and struggle to maintain focus.  But those are
personal issues.

This leaves a lot to be desired, but at the same time is still really
helpful.  There's a couple notable TODOs regarding pointless allocation and
UTF8 re-checking, but otherwise, the feature-related steps are:

  - Gutters with line numbers; and
  - Marking columns associated with the span.

DEV-12151
2022-04-28 14:33:08 -04:00
Mike Gerwitz 3e06c9aaf3 tamer: diagnose::report: Prepare Section for output of source lines
This lowers the resolved span data into `Section` for display.  The next
step is to actually output it.

DEV-12151
2022-04-28 13:34:05 -04:00
Mike Gerwitz 331aada2bd tamer: diagnose::report::MaybeResolvedSpan: Move up in file
Just rearranging, since this was awkwardly placed relative to where it's
used.

DEV-12151
2022-04-28 11:00:36 -04:00
Mike Gerwitz 6a5a29c2f5 tamer: diagnose::report: Remove Section variants and eagerly squash
Rather than squashing as a separate operation, and explicitly denoting when
it occurred, we'll just always squash, as was done before these changes.  It
doesn't really make sense to make this optional and there's not any value in
keeping the decision around.

This also sets us up favorably for future changes: it creates a vector of
labels, which can be analyzed later to determine how to best lay out marks
and labels.

DEV-12151
2022-04-28 10:30:04 -04:00
Mike Gerwitz c8d919d0cc tamer: diagnose::report: {'l=>'d}
Just renames the lifetime to refer to the `Diagnostic`, rather than a
`Label` returned by it, which was all `'l` was previously used for.

Note that many labels have a `'static` lifetime; this doesn't change that or
somehow cause it to reallocate; the label must live _for at least `'d`_.

DEV-12151
2022-04-27 15:20:16 -04:00
Mike Gerwitz e2c68c5e84 tamer: diagnose::report: Avoid message copy
Rather than rendering the diagnostic `Display` message to a string only to
copy it to yet another buffer later on, this simply stores a reference to
the `Diagnostic` that was provided.  This also adds a type to the `Report`
associating it with the provided `Diagnostic`, which does seem appropriate,
given that the report was produced for it.

I should probably rename {'l=>'d} now.

DEV-12151
2022-04-27 15:20:14 -04:00
Mike Gerwitz 3dbab881da tamer: diagnose::report: Produce Report object
Rather than writing to the provided `Write` object, this produces a `Report`
object.  While a lifetime still exists for the diagnostic data (labels,
specifically), I was able to remove the other lifetime resulting from
`ResolvedSpan` by transferring ownership of the data to the `Report`
itself.  Once actual source lines are integrated shortly, `Report` will
include those as well.

This has been a tedious process, but it's coming together.  Hopefully these
commits documenting the progressive and ugly refactoring are found useful by
some reader in the future.

DEV-12151
2022-04-27 15:00:30 -04:00
Mike Gerwitz 3679ff590c tamer: diagnose::report: Remove `L` type parameter
The line number was getting special treatment that is simply not worth the
cost (with regards to how burdensome it is on the type definitions).  This
simplifies things quite a bit.

If we want header customization in the future, we can worry about that in a
different way, or allow the header as a whole to be swapped out, rather than
its constituents.

DEV-12151
2022-04-27 14:23:58 -04:00
Mike Gerwitz 589f5e8c58 tamer: diagnose::report::HeadingLineNum: Compose HeadingColNum
`HeadingColNum` is no longer constructed by `HeadingLineNum`.  This both
narrows the types and required data (e.g. removing dummy values in test
cases), and reduces the coupling (by favoring composition, but still coupled
with the concrete type).

DEV-12151
2022-04-27 11:43:46 -04:00
Mike Gerwitz 7dbe25be05 tamer: diagnose::report::HeadingLineNum: Lower MaybeResolvedSpan
Same as the previous commit with `HeadingColNum`---this removes the
dependency on `MaybeResolvedSpan`.

DEV-12151
2022-04-27 11:28:17 -04:00
Mike Gerwitz 68f9f4d241 tamer: diagnose::report::HeadingColNum: Lower MaybeResolvedSpan
This eliminates `MaybeResolvedSpan` from `HeadingColNum`, along with its
type parameters and lifetimes.

DEV-12151
2022-04-27 11:10:16 -04:00
Mike Gerwitz f29918b5a0 tamer: diagnose::report: Continue refactoring into report components
I'm unhappy with the current state of this, which is why I haven't settled
on docs or unit tests for these changes yet (though note that the
integration tests do cover these changes)---this is still a prototype
refactoring.

In particular, this needs to do more lowering---the `ResolvedSpan` and
`MaybeResolvedSpan` need to be eliminated and lowered into exactly what is
needed so that we can stop reasoning about them and propagating them.

Further, having lines and columns lazily evaluate themselves for
display---based on `MaybeResolvedSpan`---adds extra generics that shouldn't
be necessary; they should be pre-computed and store the concrete data they
need in variants.  Display shouldn't involve computation beyond formatting
of pre-computed data.

That was always the plan, but this refactoring has been incremental.

Anyway: this is in a working and integration-tested state, but it's going to
change.

DEV-12151
2022-04-27 10:48:41 -04:00
Mike Gerwitz e2f9d71c1f tamer: diagnose::report: Refined report components
This generalizes the types a bit more and introduces unit tests.  Note that
these are still also covered by integration tests.

The next step will be to finish generalizing
`<VisualReporter as Reporter>::render`, after which I'll get back to the
task of outputting the source line along with markings and labels.

DEV-12151
2022-04-26 13:26:52 -04:00
Mike Gerwitz d05bcaab03 tamer: {Resolved,Span}::{ctx=>context}: Rename
This is just to provide clarity.  `ctx` is not so widely used that we
benefit from such a short identifier, and it's not worth the cognitive
burden on people unfamiliar with what it may mean.

DEV-12151
2022-04-26 10:52:32 -04:00
Mike Gerwitz 16d76b95d0 tamer: diagnose::resolver::ResolvedSpanData: New trait
This provides the methods originally implemented on `ResolvedSpan` itself,
which will allow for mocking for unit testing.
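
A minimal sketch of why a trait helps here, assuming hypothetical method
names (`line_num`, `col_num`) and a hypothetical `heading` consumer; the
real trait's methods are not shown in this log.  Code that depends only on
the trait can be unit tested with a mock, without resolving a real span.

```rust
/// Hypothetical subset of the data a resolved span can provide.
pub trait ResolvedSpanData {
    fn line_num(&self) -> u32;
    fn col_num(&self) -> Option<u32>;
}

/// Consumer written against the trait rather than a concrete span type.
pub fn heading<S: ResolvedSpanData>(path: &str, span: &S) -> String {
    match span.col_num() {
        Some(col) => format!("{}:{}:{}", path, span.line_num(), col),
        None => format!("{}:{}", path, span.line_num()),
    }
}

/// Mock used only by tests; no source file or resolver is needed.
pub struct MockSpan {
    pub line: u32,
    pub col: Option<u32>,
}

impl ResolvedSpanData for MockSpan {
    fn line_num(&self) -> u32 {
        self.line
    }

    fn col_num(&self) -> Option<u32> {
        self.col
    }
}
```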

DEV-12151
2022-04-26 10:46:47 -04:00
Mike Gerwitz 0928427116 tamer: diagnose::resolver::Column::At: Remove
This is redundant with the `Endpoints` variant, although it did read
better.  It's just another case to have to handle.

I was originally going to use `std::ops::RangeInclusive` for `Endpoints`,
however that struct also contains an extra bool indicating whether it was
exhausted (as an iterator), which isn't appropriate for this.
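
To make the size argument concrete, here is a small sketch comparing a
plain pair of endpoints with `std::ops::RangeInclusive`.  The `Endpoints`
struct and the use of `u32` for columns are assumptions for illustration
only.

```rust
use std::mem::size_of;
use std::ops::RangeInclusive;

/// Hypothetical stand-in for the payload of the `Endpoints` variant:
/// just the first and last column, with no iterator state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Endpoints(pub u32, pub u32);

/// Returns (size of `Endpoints`, size of `RangeInclusive<u32>`) in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<Endpoints>(), size_of::<RangeInclusive<u32>>())
}
```

The second value is larger than the first because `RangeInclusive` carries
the exhaustion flag (plus padding) in addition to the two bounds.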

DEV-12151
2022-04-26 10:30:07 -04:00
Mike Gerwitz ec93488365 tamer: diagnose::resolver::ResolvedSpan: Clear methods for all data
This (a) makes the intent of these methods clear and (b) will allow
introducing a trait for mocking it.

DEV-12151
2022-04-26 10:22:31 -04:00
Mike Gerwitz b9ff7770aa tamer: diagnose::report: Begin refactoring into Display impls
This logic is still covered by the integration tests; I'll be adding unit
tests once it's decoupled to the point where that's possible, which should
be shortly, and after I make sure this is the route I do want to go down.

DEV-12151
2022-04-26 10:14:51 -04:00
Mike Gerwitz c0ace258f0 tamer: diagnose::resolver::SourceLine: Guarantee non-empty lines
This simplifies types and error handling since we will always have at least
one line, provided that the span is within the range of the context.  To
ensure that, this patch introduces a new error.

DEV-12151
2022-04-22 16:50:16 -04:00
Mike Gerwitz 56b8aec9b7 tamer: diagnose::resolver::test: Extract into own file
There's just a lot here.

DEV-12151
2022-04-22 15:31:12 -04:00
Mike Gerwitz 2e0925627e tamer: diagnose::Label: Introduce lifetime and inner Cow
I did not initially introduce lifetimes because I wasn't sure how the system
was going to evolve, but now lifetimes are going to be needed in a number of
contexts.  The core of TAMER is able to avoid lifetimes in most instances
because of its internment system, but its use is not appropriate for the
diagnostic system's buffers (beyond sourcing strings from already-interned
data).
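
A minimal sketch of a label with a lifetime and an inner `Cow`, assuming a
hypothetical field and hypothetical constructor names.  Static or
already-available strings are borrowed, while formatted text is owned.

```rust
use std::borrow::Cow;

/// Hypothetical label type; the real `Label` carries more than text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Label<'a> {
    text: Cow<'a, str>,
}

impl<'a> Label<'a> {
    /// Borrow existing text (including `'static` literals) without copying.
    pub fn borrowed(text: &'a str) -> Self {
        Label { text: Cow::Borrowed(text) }
    }

    /// Take ownership of text that had to be built at runtime.
    pub fn owned(text: String) -> Self {
        Label { text: Cow::Owned(text) }
    }

    pub fn text(&self) -> &str {
        &self.text
    }

    pub fn is_borrowed(&self) -> bool {
        matches!(self.text, Cow::Borrowed(_))
    }
}
```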

DEV-12151
2022-04-22 13:23:53 -04:00
Mike Gerwitz aeff7aeed3 tamer: diagnose::test: Extract into own file
This is going to get quite large over time.

DEV-12151
2022-04-22 09:21:18 -04:00
Mike Gerwitz 596c9de85e tamer: diagnose::resolver::SourceLine (line=>num): Rename
`line.line` was rather confounding.

DEV-12151
2022-04-21 15:47:15 -04:00
Mike Gerwitz 5b1f0ab6c6 tamer: diagnostic: Column resolution
Determining the column number is not as simple as performing byte
arithmetic, because certain characters have different widths.  Even if we
only accepted ASCII, control characters aren't visible to the user.

This uses the unicode-width crate as an alternative to POSIX wcwidth, to
determine (hopefully) the number of fixed-width cells that a unicode
character will take up on a terminal.  For example, control characters are
zero-width, while an emoji is likely double-width.  See test cases for more
information on that.

There is also the unicode-segmentation crate, which can handle extended
grapheme clusters and such, but (a) we'll be outputting the line to the
terminal and (b) there's no guarantee that the user's editor displays
grapheme clusters as a single column.  LSP measures in UTF-16,
apparently.  I use both Emacs and Vim from a terminal, so unicode-width
applies to me.  There's too much variation to try to solve that right now.

The columns can be considered a visual span---this gives us enough
information to draw line annotations, which will happen soon.
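
The difference between byte offsets and visual columns can be sketched as
follows.  This does not use the unicode-width crate; `cell_width` is a toy
stand-in with a deliberately tiny table (control characters are zero cells,
one rough emoji range is two cells, everything else is one), and
`visual_column` is a hypothetical helper.

```rust
/// Toy approximation of a terminal cell width lookup.  A real
/// implementation would consult full Unicode width tables.
pub fn cell_width(c: char) -> usize {
    if c.is_control() {
        0
    } else if ('\u{1F300}'..='\u{1FAFF}').contains(&c) {
        // Rough emoji range, assumed double-width for this sketch.
        2
    } else {
        1
    }
}

/// 1-indexed visual column of the character that starts at `byte_offset`
/// within `line`: the summed widths of all preceding characters, plus one.
pub fn visual_column(line: &str, byte_offset: usize) -> usize {
    line.char_indices()
        .take_while(|(i, _)| *i < byte_offset)
        .map(|(_, c)| cell_width(c))
        .sum::<usize>()
        + 1
}
```

For the line `a😀b`, the `b` begins at byte offset 5, so byte arithmetic
would suggest column 6, whereas this sketch places it at column 4.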

Here are some useful links:

  - https://hsivonen.fi/string-length/
  - https://unicode.org/reports/tr29/
  - https://github.com/rust-analyzer/rowan/issues/17
  - https://www.reddit.com/r/rust/comments/gpw2ra/how_is_the_rust_compiler_able_to_tell_the_visible/

DEV-10935
2022-04-21 14:27:36 -04:00
Mike Gerwitz e555955450 tamer: span::Span::endpoints_saturated: New method
This gets rid of the `Option` and is used in the diagnostic system (next
commit).

DEV-10935
2022-04-21 14:15:25 -04:00
Mike Gerwitz a22e8e79f7 tamer: diagnose: Integrate resolver for source lines
This does not yet resolve columns, and omits the length of the span, but
it's starting to come together.

This is particularly exciting for me to see because I've been wanting line
numbers in TAME error messages for over a decade.

DEV-10935
2022-04-21 12:34:17 -04:00
Mike Gerwitz 9b4c84de26 tamer: diagnose::resolver: Support rewinding
This adds support for rewinding the underlying buffer when necessary to
read a span that occurs earlier within the same context (which could also
include the same span read twice).
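
The rewinding behavior might look roughly like the sketch below.  It is an
assumption-laden simplification: `read_at` is a hypothetical function that
reads raw bytes rather than whole lines, and it rewinds to the start of the
source whenever the requested offset lies before the current position.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Read `len` bytes starting at `offset`, rewinding `src` first if the
/// offset is earlier than the current read position.
pub fn read_at<R: Read + Seek>(
    src: &mut R,
    offset: u64,
    len: usize,
) -> io::Result<Vec<u8>> {
    if offset < src.stream_position()? {
        // The span occurs earlier in the same context: start over.
        src.seek(SeekFrom::Start(0))?;
    }

    // Discard bytes between the current position and the span.
    let skip = offset - src.stream_position()?;
    io::copy(&mut src.by_ref().take(skip), &mut io::sink())?;

    let mut buf = vec![0; len];
    src.read_exact(&mut buf)?;
    Ok(buf)
}
```

With this shape, reading the same span twice simply triggers a rewind on
the second request.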

As part of this change, I cleaned up the code a bit.  Working with this
system can be confusing with the different meanings of the byte offsets and
the different ways of interpreting lines relative to the span that is
provided.  There's not a lot of code here, but it represents a lot of work
to get right.
2022-04-21 12:33:27 -04:00
Mike Gerwitz 1b02e77537 tamer: span (SpanOffsetSize, SpanLenSize): New type aliases
Callers can use these types instead of having to reference globals.

DEV-10935
2022-04-20 09:42:13 -04:00
Mike Gerwitz ab48d79e1f tamer: diagnose::resolver: Initial concept for line resolution
This works, but it's ugly and requires some cleanup.  It shows that there
are some interesting considerations when determining how to best represent
the location of spans to the user in a way that is intuitive.

This is not yet integrated with the reporter, which will require a layer to
load a `Context` from disk.

DEV-10935
2022-04-20 09:42:13 -04:00
Mike Gerwitz a77eb7d937 tamer: span: Minor test refactoring
Just some cleanup based on some new conventions, now that I'm about to make
some changes.

DEV-10935
2022-04-20 09:42:12 -04:00
Mike Gerwitz 725dc3fb54 tamer: tamec: Use diagnostic system for errors
This is a POC, minimal-effort integration that also creates the TamecError
sum type analogous to TameldError.

I'll work on reducing the boilerplate in the future.

A note regarding the type and boilerplate vs. dynamic dispatch, for any
future readers: the purpose of this is to be explicit about the error types
so that the system is self-documenting and it forces an understanding of
its error conditions.  `Box<dyn Error>` is basically "eh idk anything can
happen!", which is not what I'm interested in having.
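
A small sketch of the explicit sum type approach, for contrast with
`Box<dyn Error>`.  The variants shown here are hypothetical; only the name
`TamecError` comes from this commit.  Every error source must be named in
the enum, and callers can match on it exhaustively.

```rust
use std::fmt;
use std::io;

/// Explicit sum of the errors this (hypothetical) pipeline can produce.
#[derive(Debug)]
pub enum TamecError {
    Io(io::Error),
    Fmt(fmt::Error),
}

impl From<io::Error> for TamecError {
    fn from(e: io::Error) -> Self {
        TamecError::Io(e)
    }
}

impl From<fmt::Error> for TamecError {
    fn from(e: fmt::Error) -> Self {
        TamecError::Fmt(e)
    }
}

impl fmt::Display for TamecError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TamecError::Io(e) => write!(f, "I/O error: {}", e),
            TamecError::Fmt(e) => write!(f, "formatting error: {}", e),
        }
    }
}

impl std::error::Error for TamecError {}

/// Exhaustive match: adding a variant forces this to be revisited.
pub fn describe(e: &TamecError) -> &'static str {
    match e {
        TamecError::Io(_) => "I/O failure",
        TamecError::Fmt(_) => "formatting failure",
    }
}
```

The `From` impls are the boilerplate referred to above; they are what
allow `?` to convert each underlying error into `TamecError`.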

DEV-10935
2022-04-20 09:42:11 -04:00
Mike Gerwitz eaa8133d21 tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve.  I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.

This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does.  The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them.  I need to
balance this work with everything else I have going on.

This is a large commit, but it converts the existing Error Display impls
into Diagnostic.  This separation is a bit verbose, so I'll see how this
ends up evolving.

Diagnostics are tied to Error at the moment, but I imagine that eventually
any object would be able to describe itself, error or not, which would be
useful both for the Summary Page and for query functionality, to help
developers understand the systems they are writing using TAME.

Output is integrated into tameld only in this commit; I'll add tamec
next.  Examples of what this outputs are available in the test cases in this
commit.

DEV-10935
2022-04-13 15:22:46 -04:00
Mike Gerwitz 702b5ebb23 tamer: span: Remove PathIndex
We can just use PathSymbolId directly and simplify things.  Typing can (and
should) happen on the symbol itself, and if we want a separate symbol type,
it ought to have its own interner.

For now, it doesn't, and having this extra type is just a PITA.

DEV-10935
2022-04-13 09:59:11 -04:00