I wonder when this option was introduced, unless I never saw it because it
is called "quiet". But this is what I always wanted (and how I write the
output for my own tools, like progtest in this repo); the output has long
gotten far too large.
DEV-7145
Along with this change we also had to change how we handle dead states in
the superstate. So there were two problems here:
1. Sum states were not yielding a dead state after recovery, which meant
that parsing was unable to continue (we still have a `todo!`); and
2. The superstate considered it an error when there was nothing left on
the stack, because I assumed that ought not happen.
Regarding #2---it _shouldn't_ happen, _unless_ we have extra input after we
have completed parsing. Which happens to be the case for this test case,
but more importantly, we shouldn't be panicing with errors about TAMER bugs
if somebody puts extra input after a closing root tag in a source file.
DEV-7145
This does two things:
1. Places the expected list on a separate help line as a footnote where
it'll be a bit more tolerable when it inevitably overflows the terminal
width in certain contexts (we may wrap in the future); and
2. Removes angled brackets from the element names so that they (a) better
correspond with the span which highlights only the element name and (b)
do not imply that the elements take no attributes.
DEV-7145
When we match a QName against a namespace, we ought to store the matching
QName to use (a) in error messages and (b) to make available as a
binding. The former is necessary for sensible errors (rather than saying
that it's e.g. expecting a closing `t:*`) and the latter is necessary for
e.g. getting the template name out of `t:foo`.
DEV-7145
This allows matching on a namespace prefix by providing a `Prefix` instead
of a `QName`. This works, but is missing a couple notable things (and
possibly more):
1. Tracking the QName that is _actually_ matched so that it can be used in
messages stating what the expected closing tag is; and
2. Making that QName available via a binding.
This will be used to match on `t:*` in NIR. If you're wondering how
attribute parsing is supposed to work with that (of course you're wondering
that, random person reading this)---that'll have to work differently for
those matches, since template shorthand application contains argument names
as attributes.
DEV-7145
This introduces `NodeMatcher`, with the intent of introducing wildcard QName
matches for e.g. `t:*` nodes. It's not yet clear if I'll expand this to
support text nodes yet, or if I'll convert text nodes into elements to
re-use the existing system (which I had initially planned on doing, but
didn't because of the work and expense (token expansion) involved in the
conversion).
DEV-7145
I need to move on, and there are (a) a couple different ways to proceed that
I want to mull over and (b) upcoming changes that may influence my decision
one way or another.
DEV-7145
This will utilize the superstate's error object in place of nested errors,
which was the result of the previous composition-based delegation.
As you can see, all we had to do was remove the special handling of these
errors; the existing delegation setup continues to handle the types properly
with no change. The composition continues to work for `*Attr_`.
The alternative was to box inner errors, since they're far from the hot code
path, but that's clearly unnecessary.
To be clear: this is necessary to allow for recursive grammars in
`ele_parse` without creating recursive data structures in Rust.
DEV-7145
Comments ought not have any more semantic meaning than whitespace. Other
languages may have conventions that allow for various types of things in
comments, like annotations, but those are symptoms of language
limitations---we control the source language here.
DEV-7145
This properly integrates the trampoline into `ele_parse!`. The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child. That is temporarily marked as incomplete; see a future commit.
The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`. It will fail to compile if those invariants are
violated. (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)
This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:
NT -> Dead -> Parent -> Next NT
This could be optimized in the future, if it's worth doing.
This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now. I (desperately) need to move on, and still have a bunch of cleanup to
do.
I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.
DEV-7145
This change introduces diagnostic messages for panics. The intent is to be
able to use panics in situations where it is either not possible to or not
worth the time to recover from errors and ensure a consistent/sensible
system state. In those situations, we still ought to be able to provide the
user with useful information to attempt to get unstuck, since the error is
surely in response to some particular input, and maybe that input can be
tweaked to work around the problem.
Ideally, invalid states are avoided using the type system and statically
verified at compile-time. But this is not always possible, or in some cases
may be way more effort or cause way more code complexity than is worth,
given the unliklihood of the error occurring.
With that said, it's been interesting, over the past >10y that TAME has
existed, seeing how unlikely errors do sometimes pop up many years after
they were written. It's also interesting to have my intuition of what is
"unlikely" challenged, but hopefully it holds generally.
DEV-7145
I had previously used `Context` to hold the parser configuration for
repetition, since that was the easier option. But I now want to utilize the
`Context` for a stack for the superstate trampoline, and I don't want to
have to deal with the awkwardness of the repetition in doing so, since it
requires that the configuration be created during delegation, rather than
just being passed through to all child parsers.
This adds to a mess that needs cleaning up, but I'll do that after
everything is working.
DEV-7145
And here's the thing that I've been dreading, partly because of the
`macro_rules` issues involved. But, it's not too terrible.
This module was already large and complex, and this just adds to it---it's
in need of refactoring, but I want to be sure it's fully working and capable
of handling NIR before I go spending time refactoring only to undo it.
_This does not yet use trampolining in place of the call stack._ That'll
come next; I just wanted to get the macro updated, the superstate generated,
and tests passing. This does convert into the
superstate (`ParseState::Super`), but then converts back to the original
`ParseState` for BC with the existing composition-based delegation. That
will go away and will then use the equivalent of CPS, using the
superstate+`Parser` as a trampoline. This will require an explicit stack
via `Context`, like XIRF. And it will allow for tail calls, with respect to
parser delegation, if I decide it's worth doing.
The root problem is that source XML requires recursive parsing (for
expressions and statements like `<section>`), which results in recursive
data structures (`ParseState` enum variants). Resolving this with boxing is
not appropriate, because that puts heap indirection in an extremely hot code
path, and may also inhibit the aggressive optimizations that I need Rust to
perform to optimize away the majority of the lowering pipeline.
Once this is sorted out, this should be the last big thing for the
parser. This unfortunately has been a nagging and looming issue for months,
that I was hoping to avoid, and in retrospect that was naive.
DEV-7145
I'm disappointed that I keep having to implement features that I had hoped
to avoid implementing.
This introduces a "superstate" feature, which is intended really just to be
a sum type that is able to delegate to stitched `ParseState`s. This then
allows a `ParseState` to transition directly to another `ParseState` and
have the parent `ParseState` handle the delegation---a trampoline.
This issue naturally arises out of the recursive nature of parsing a TAME
XML document, where certain statements can be nested (like `<section>`), and
where expressions can be nested. I had gotten away with composition-based
delegation for now because `xmlo` headers do not have such nesting.
The composition-based approach falls flat for recursive structures. The
typical naive solution is boxing, which I cannot do, because not only is
this on an extremely hot code path, but I require that Rust be able to
deeply introspect and optimize away the lowering pipeline as much as
possible.
Many months ago, I figured that such a solution would require a trampoline,
as it typically does in stack-based languages, but I was hoping to avoid
it. Well, no longer; let's just get on with it.
This intends to implement trampolining in a `ParseState` that serves as that
sum type, rather than introducing it as yet another feature to `Parser`; the
latter would provide a more convenient API, but it would continue to bloat
`Parser` itself. Right now, only the element parser generator will require
use of this, so if it's needed beyond that, then I'll debate whether it's
worth providing a better abstraction. For now, the intent will be to use
the `Context` to store a stack that it can pop off of to restore the
previous `ParseState` before delegation.
DEV-7145
Since we'll never be reading past the header, this is all that is needed.
If in the future this is violated, XIRF will cause a nice diagnostic error
displaying precisely what opening tag caused the increased level of nesting,
which will aid in debugging and allow us to determine if it ought to be
increased. Here's an example, if I set the max to `3`:
error: maximum XML element nesting depth of `3` exceeded
--> /home/.../foo.xmlo:261:10
|
261 | <preproc:sym-ref name=":_vproduct:vector_a"/>
| ^^^^^^^^^^^^^^^^ error: this opening tag increases the level of nesting past the limit of 3
Of course, the longer-term goal is to do away with `xmlo` entirely.
This had no (perceivable via `/usr/bin/time -v`, at least) impact on memory
or CPU time.
DEV-7145
"Mixed content" is the XML term representing element nodes mixed with text
nodes. For example, `foo <strong>bar</strong> baz` is mixed.
TAME supports text nodes as documentation, intended to be in a literate
style but never fully realized. In any case, we need to permit them, and I
wanted to do more than just ignore the nodes.
This takes a different approach than typical parser delegation---it has the
parent parser _preempt_ the child by intercepting text before delegation
takes place, rather than having the child reject the token (or possibly
interpret it itself!) and have to handle an error or dead state.
And while this makes it more confusing in terms of state machine stitching,
it does make sense, in the sense that the parent parser is really what
"owns" the text node---the parser is delegating _element_ parsing only, take
asserts authority when necessary to take back control where it shouldn't be
delegated.
DEV-7145
Previously a `Depth` was provided only for `Open` and `Close`. This depth
information, for example, will be used by NIR to quickly determine whether a
given parser ought to assert ownership of a text/comment token rather than
delegating it.
This involved modifying a number of test cases, but it's worth repeating in
these commits that this is intentional---I've been bit in the past using
`..` in contexts where I really do want to know if variant fields change so
that I can consider whether and how that change may affect the code
utilizing that variant.
DEV-7145
Recent changes regarding whitespace were all to support this change (though
it was also needed for XIRF, pre- and post-root).
Now I'll have to conted with how I want to handle text nodes in various
circumstances, in terms of `ele_parse!`.
DEV-7145
Various DUMMY_SPAN-derived spans are used by many test cases, so this
finally extracts them---something I've been meaning to do for some time.
This also places DUMMY_SPAN behind a `cfg(test)` directive to ensure that it
is _only_ used in tests; UNKNOWN_SPAN should be used when a span is actually
unknown, which may also be the case during development.
DEV-7145
Whether or not quoting is appropriate depends on context, and that parent
context is already performing the quoting. For example:
error: expected `</rater>`, but found `<import>`
--> /home/[...]/foo.xml:2:1
|
2 | <rater xmlns="http://www.lovullo.com/rater"
| ------ note: element starts here
--> /home/[...]/foo.xml:7:3
|
7 | <import package="/rater/core/base" />
| ^^^^^^^ error: expected `</rater>`
In these cases (obviously I'm still working on the parser, since this is
nonsense), the parser is responsible for quoting the token "<import>".
DEV-7145
There were two problem errors: one showing "element element" and one showing
the value along with the name of the attribute.
The change for `<Attr as Display>::fmt` is debatable. I'm going to do this
for now (only show `@name`) and adjust later if necessary.
I'll need to go use `crate::fmt` consistently in previously-existing format
strings at some point, too.
DEV-7145
This teaches XIRF to optionally refine Text into RefinedText, which
determines whether the given SymbolId represents entirely whitespace.
This is something I've been putting off for some time, but now that I'm
parsing source language for NIR, it is necessary, in that we can only permit
whitespace Text nodes in certain contexts.
The idea is to capture the most common whitespace as preinterned
symbols. Note that this heuristic ought to be determined from scanning a
codebase, which I haven't done yet; this is just an initial list.
The fallback is to look up the string associated with the SymbolId and
perform a linear scan, aborting on the first non-whitespace character. This
combination of checks should be sufficiently performant for now considering
that this is only being run on source files, which really are not all that
large. (They become large when template-expanded.) I'll optimize further
if I notice it show up during profiling.
This also frees XIR itself from being concerned by Whitespace. Initially I
had used quick-xml's whitespace trimming, but it messed up my span
calculations, and those were a pain in the ass to implement to begin with,
since I had to resort to pointer arithmetic. I'd rather avoid tweaking it.
tameld will not check for whitespace, since it's not important---xmlo files,
if malformed, are the fault of the compiler; we can ignore text nodes except
in the context of code fragments, where they are never whitespace (unless
that's also a compiler bug).
Onward and yonward.
DEV-7145
The trace outputs a note in the footer indicating _why_ it's being output,
so that the reader understands both where the potentially-unexpected
behavior originates from and so they know (in the case of the feature flag)
how to inhibit it.
That information originally lived in `Parser`, where the `cfg` directive to
enable it lives, but it was moved into the abstraction. This corrects that.
DEV-7145
This has gotten large and was cluttering `feed_tok`. This also provides the
ability to more easily expand into other types of tracing in the future.
DEV-7145
This information is likely redundant in a lowering pipeline, but is more
useful outside of such a pipeline. It's also more clear.
`Object` does not implement `Display`, though, because that's too burdensome
for how it's currently used. Many `Object`s are also `Token`s though and,
if fed to another `Parser` for lowering, it'll get `Display::fmt`'d.
DEV-7145
Rust was warning that `cfg` was unused if both `test` and
`parser-trace-stderr`. This both allows that and adjusts the precedence to
make more sense for tests.
DEV-7145
Because of recovery, the trace otherwise paints a really confusing-looking
picture when given unexpected input.
This is large enough now that it really ought to be extracted from
`feed_tok`, but I'll wait to see how this evolves further. I considered
adding color too, but it's not yet clear to me that the visual noise will be
all that helpful.
DEV-7145
This flag allows toggling the parser trace that was previously only
available to tests. Unfortunately, at the time of writing, Cargo cannot
enable flags in profiles, so I have to check for either `test` or this flag
being set to enable relevant features.
This trace is useful as I start to run the parser against existing code
written in TAME so that our existing systems can help to guide my
development. Unlike the current tests, it also allows seeing real-world
data as part of the lowering pipeline, where multiple `Parser`s are in
play.
Having this feature flag also makes this feature more easily discoverable to
those wishing to observe how the lowering pipeline works.
DEV-7145
impl for `&Token` instead of Token; the writer is just copying data into the
destination stream anyway.
This will allow us to continue writing the token while also using it for
further processing, like `tee`.
DEV-7145
We need to be able to export generated identifiers. Trying to figure out a
syntax for this was a bit tricky considering how much is generated, so I
just settled on something that's reasonably clear and easy to parse with
`macro_rules!`.
I had intended to just make everything public by default and encapsulate
using private modules, but that then required making everything else that it
uses public (e.g. error and token objects), which would have been a bizarre
thing to do in e.g. test cases.
DEV-7145
Values can be parsed using `TryFrom<Attr>`. Previously only `From<Attr>`
was supported, which could not fail.
This is critical for parsing values into types, which will wrap `SymbolId`
to provide data assurances.
DEV-7145
The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.
This is admittedly a sloppy commit, with a number of miscellaneous fixes. I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.
DEV-7145
The only additional information needed was opening spans so that we can
provide useful information regarding closing tags.
This uses a generic Span in place of {Open,Close}Span because the latter
wasn't necessary, but more descriptive types would be nice; it may be
beneficial later on to introduce newtypes for each of the span generated by
{Open,Close}Span.
DEV-7145
This was a TODO for the attribute parser generator. The first attribute
will be kept and later ones will be ignored, producing an error. Recovery
permits further attribute parsing having ignored the duplicate.
DEV-7145
This allows an element to be repeated by the parent NT. The easiest way I
saw to implement this for now was to abuse the Context to provide a runtime
configuration that would allow the state machine to reset after it has
completed parsing.
This also influences error recovery, in that if we're expecting zero or more
of something, we cannot provide an error for an unexpected name, and instead
must emit a dead state so that the caller can determine what to do.
DEV-7145
This produces useful parse traces that are output as part of a failing test
case. The parser generator macros can be a bit confusing to deal with when
things go wrong, so this helps to clarify matters.
This is _not_ intended to be machine-readable, but it does show that it
would be possible to generate machine-readable output to visualize the
entire lowering pipeline. Perhaps something for the future.
I left these inline in Parser::feed_tok because they help to elucidate what
is going on, just by reading what the trace would output---that is, it helps
to make the method more self-documenting, albeit a tad bit more
verbose. But with that said, it should probably be extracted at some point;
I don't want this to set a precedent where composition is feasible.
Here's an example from test cases:
[Parser::feed_tok] (input IR: XIRF)
| ==> Parser before tok is parsing attributes for `package`.
| | Attrs_(SutAttrsState_ { ___ctx: (QName(None, LocalPart(NCName(SymbolId(46 "package")))), OpenSpan(Span { len: 0, offset: 0, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10)), ___done: false })
|
| ==> XIRF tok: `<unexpected>`
| | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))
|
| ==> Parser after tok is expecting opening tag `<classify>`.
| | ChildA(Expecting_)
| | Lookahead: Some(Lookahead(Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))))
= note: this trace was output as a debugging aid because `cfg(test)`.
[Parser::feed_tok] (input IR: XIRF)
| ==> Parser before tok is expecting opening tag `<classify>`.
| | ChildA(Expecting_)
|
| ==> XIRF tok: `<unexpected>`
| | Open(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1))
|
| ==> Parser after tok is attempting to recover by ignoring element with unexpected name `unexpected` (expected `classify`).
| | ChildA(RecoverEleIgnore_(QName(None, LocalPart(NCName(SymbolId(82 "unexpected")))), OpenSpan(Span { len: 0, offset: 1, ctx: Context(SymbolId(1 "#!DUMMY")) }, 10), Depth(1)))
| | Lookahead: None
= note: this trace was output as a debugging aid because `cfg(test)`.
DEV-7145
This resolves a TODO by including the name of the element whose attributes
are currently being parsed.
This also frees a parent from having to provide additional context, allowing
Display to be fully delegated when stitching.
DEV-7145
This introduces `Nt := (A | ... | Z);`, where `Nt` is the name of the
nonterminal and `A ... Z` are the inner nonterminals---it produces a parser
that provides a choice between a set of nonterminals.
This is implemented efficiently by understanding the QName that is accepted
by each of the inner nonterminals and delegating that token immediately to
the appropriate parser. This is a benefit of using a parser generator macro
over parser combinators---we do not need to implement backtracking by
letting inner parsers fail, because we know ahead of time exactly what
parser we need.
This _does not_ verify that each of the inner parsers accept a unique QName;
maybe at a later time I can figure out something for that. However, because
this compiles into a `match`, there is no ambiguity---like a PEG parser,
there is precedence in the face of an ambiguous token, and the first one
wins. Consequently, tests would surely fail, since the latter wouldn't be
able to be parsed.
This also demonstrates how we can have good error suggestions for this
parsing framework: because the inner nonterminals and their QNames are known
at compile time, error messages simply generate a list of QNames that are
expected.
The error recovery strategy is the same as previously noted, and subject to
the same concerns, though it may be more appropriate here: it is desirable
for the inner parser to fail rather than retrying, so that the sum parser is
able to fail and, once the Kleene operator is introduced, retry on another
potential element. But again, that recovery strategy may happen to work in
some cases, but'll fail miserably in others (e.g. placing an unknown element
at the head of a block that expects a sequence of elements would potentially
fail the entire block rather than just the invalid one). But more to come
on that later; it's not critical at this point. I need to get parsing
completed for TAME's input language.
DEV-7145
This adds the ability to bind identifiers to represent `OpenSpan` and
`CloseSpan`, available to the `@` and `/` maps. Since identifiers in TAME
originate from attributes, this may not get a whole lot of use, but it's
important to be available.
There is some awkwardness in that the opening span appears to be scoped to
the entire nonterminal, but it's actually only available in the `@`
mapping. I'll change this if it's actually needed; this keeps things simple
for now.
DEV-7145
Since the parsers produce streaming IRs, we need to be able to emit tokens
representing closing delimiters, where they are important.
This notably doesn't use spans; I'll add those next, since they're also
needed for the previous work.
DEV-7145
The comment explains the issue. I don't think the strategy is going to be a
desirable one, but I want to move on and observe in retrospect how it ought
to be handled.
The important part right now is that recovery is accounted for and possible,
which was a long-standing concern.
DEV-7145
This begins generating parsers that are capable of parsing elements. I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.
This was the work that required the recent lookahead changes, which has been
detailed in previous commits.
This initial support is basic, but robust. It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`). Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.
This generates parsers that, like all the other parsers, use enums to
provide a typed stack. Stitched parsers produce a nested stack that is
always bounded in size. Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us. Statements that _do_ need to maintain such context are not
nested.
This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure. This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.
More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.
DEV-7145
Having the lookahead token generic over the `ParseState` was a pain in the
ass for stitching, since they shared the same token type but not the same
parser. I don't expect there to be any need to be able to infer other
parser-related types for a token of lookahead, so I'd rather just make my
life easier until such a thing is needed.
DEV-7145
Oh what a tortured journey. I had originally tried to avoid formalizing
lookahead for all parsers by pretending that it was only needed for dead
state transitions (that is---states that have no transitions for a given
input token), but then I needed to yield information for aggregation. So I
added the ability to override the token for `Dead` to yield that, in
addition to the token. But then I also needed to yield lookahead for error
conditions. It was a mess that didn't make sense.
This eliminates `ParseStatus::Dead` entirely and fully integrates the
lookahead token in `Parser` that was previously implemented.
Notably, the lookahead token is encapsulated in `TransitionResult` and
unavailable to `ParseState` implementations, forcing them to rely on
`Parser` for recursion. This not only prevents `ParseState` from recursing,
but also simplifies delegation by removing the need to manually handle
tokens of lookahead.
The awkward case here is XIRT, which does not follow the streaming parsing
convention, because it was conceived before the parsing framework. It needs
to go away, but doing so right now would be a lot of work, so it has to
stick around for a little bit longer until the new parser generators can be
used instead. It is a persistent thorn in my side, going against the grain.
`Parser` will immediately recurse if it sees a token of lookahead with an
incomplete parse. This is because stitched parsers will frequently yield a
dead state indication when they're done parsing, and there's no use in
propagating an `Incomplete` status down the entire lowering pipeline. But,
that does mean that the toplevel is not the only thing recursing. _But_,
the behavior doesn't really change, in the sense that it would infinitely
recurse down the entire lowering stack (though there'd be an opportunity to
detect that). This should never happen with a correct parser, but it's not
worth the effort right now to try to force such a thing with Rust's type
system. Something like TLA+ is better suited here as an aid, but it
shouldn't be necessary with clear implementations and proper test
cases. Parser generators will also ensure such a thing cannot occur.
I had hoped to remove ParseStatus entirely in favor of Parsed, but there's a
lot of type inference that happens based on the fact that `ParseStatus` has
a `ParseState` type parameter; `Parsed` has only `Object`. It is desirable
for a public-facing `Parsed` to not be tied to `ParseState`, since consumers
need not be concerned with such a heavy type; however, we _do_ want that
heavy type internally, as it carries a lot of useful information that allows
for significant and powerful type inference, which in turn creates
expressive and convenient APIs.
DEV-7145
*NB: This is the initial change to introduce the token of lookahead, but this
does not fully integrate it. In particular, this is missing from the
stitching/delegation layer.*
This has been a long time coming, I suppose, though I had tried to avoid it
with `Parser::delegate_lookahead`. But the problem with doing that is that
it forced the ParserState to recurse, which both violates that I want no
looping constructs except for the toplevel, and performs additional stack
allocation as it is not in tail position.
The final straw was having to both return an error _and_ an aggregate object
for the attribute parser when an unexpected element is encountered (this
code is not yet committed). One option was to add a recovery object to the
error object, and formalize that, but then we have other concerns; for
example, what if that recovery object triggered an error? We'd have to mask
either the old or the new error. But we wouldn't want to mask either,
because the object causing the error would be the aggregate attributes,
which is _not_ a recovery object, but actual data we want to emit. And so
it's a kluge right off of the bat.
The use of a token of lookahaed is a more traditional approach and has uses
outside of just this one scenario. It'll also allow for the removal of
recursion from the existing ParserStates, and possibly the elimination of
dead state associated data, though I may end up leaving that; more to come.
Rust will also optimize away lookahead storage and processing in Parsers
that do not utilize it.
DEV-7145
I'd feel rather silly if I used `debug_assert!` for the sake of tests and
they weren't actually being run due to optimization settings.
This is just to catch potential future regressions; all is well today.
DEV-7145
There was only one test outside of the `parse` module using these
fields. The next commit will be introducing lookahead, and I do not want to
have to trust callers to ensure invariants are met.
DEV-7145
This reverts commit b973d36862.
Alright, I'm getting sick of fighting with myself on this. But rather than
just removing the last commit, I'm going to keep it around, so that my
thoughts are clearly documented for my future quarrels with myself.
Firstly: this added more overhead than I wanted it to. While it wasn't
significant, it did add 100--150ms to one of our largest systems, up from
~2.8s, which seems a bit much for a token that's really just meant to make
life easier for the parser.
Further, it seems that all I've managed to do is push my original problem to
a different layer---this started as a means to resolve having to emit both
an object and an error simultaneously in the case where aggregate attribute
parsing has completed, but we encounter an error on the next token (e.g. an
unexpected element). But XIRF, if it's missing AttrEnd, should throw an
error, but should also recover. Recovery is easy---just assume that it was
present---_but then we don't emit a XIRF `AttrEnd` token_, which is
necessary for downstream systems. So we'd need to either:
(a) emit both a token and an error; or
(b) panic.
But if we're doing (a), then the need for `AttrEnd` goes away, because it
solves the original problem (though the other concerns of the previous
commit still stand). (b) is not ideal at all, even though the missing token
does represent an internal system error; it's not something the user can
correct. But, given that it's something that the user cannot correct,
doesn't that imply that it's an awkward thing to include in the token
stream? So back to `AttrEnd` being an awkward PITA to have.
So, given (a), I'll just do that: errors will become more of a "hey, this
error just occurred, but I'm trying to recover---here's an object that you
should use if you choose to continue parsing, but it may or may not be what
you're looking for; proceed with caution". That flips the original script:
I imagined having external systems feed recovery tokens, but this
encapsulates recovery within the parser, which really is more appropriate,
though less flexible than having an omniscient external recovery system;
such a monolith was always an awkward concept and would be difficult to
implement cleanly.
This can also potentially be implemented as a generalization of the Dead
state change that allowed an object to be emitted alongside the
lookahead/error.
Anyway, back to where I was...I'm sure I'll look back on this in the future
shaking my head, reflecting on how naive I was.
DEV-7145
AttrEnd was initially removed in
0cc0bc9d5a (and the commit prior), because
there was not a compelling reason to use it over a lookahead
operation (returning a token via the a dead state transition); `AttrEnd`
simply introduced inconsistencies between the XIR reader (which produced
AttrEnd) and internal XIR stream generators (e.g. the lowering operations
into XIR->XML, which do not).
But now that parsers are performing aggregation---in particular the
attribute parser-generator `xir::parse::attr`---this has become quite a
pain, because the dead state is an actionable token. For example:
1. Open
2. Attr
3. Attr
4. Open
5. ...
In the happy case, token #4 results in `Parsed::Incomplete`, and so can just
be transformed into the object representing the aggregated attributes. But
even in this happy path, it's ugly, and it requires non-tail recursion on
the parser which requires a duplicate stack allocation for the
`ParserState`. That violates a core principle of the system.
But if there is an error at #4---e.g. an unexpected element---then we no
longer have a `Parsed::Incomplete` to hijack for our own uses, and we'd have
to introduce the ability to return both an error and a token, or we'd have
to introduce the ability to keep a token of lookahead instead of reading
from the underlying token stream, but that's complicated with push parsers,
which are used for parser composition. Yikes.
And furthermore, the aggregation has caused me to introduce the ability to
override the dead state type to introduce both a token of lookahead and
aggregation information. This complicates the system and is going to be
confusing to others.
Given all of this, AttrEnd does now seem appropriate to reintroduce, since
it will allow processing of aggregate operations when encountering that
token without having to worry about the above scenario; without having to
duplicate a `ParseState` stack; without having to hijack dead state
transitions for producing our aggregate object; and everything else
mentioned above.
This commit does not modify those abstractions to use AttrEnd yet; it
re-introduces the token to the core system, not the parser-generators, and
it doesn't yet replace lookahead operations in the parsers that use
them. That'll come next. Unlike the commit that removed it, though, we are
now generating proper spans, so make note of that here. This also does not
introduce the concept to XIRF yet, which did not exist at the time that it
was removed, so XIRF is filtering it out until a following commit.
DEV-7145
This is not longer needed after the previous commit, with static spans
having been replaced by `const` spans.
This used to be required before Rust acquired better const features, and
before I had preinterned symbols.
DEV-7145
This isn't conceptally all that significant of a change, but there was a lot
of modify to get it working. I would generally separate this into a commit
for the implementation and another commit for the integration, but I decided
to keep things together.
This serves a role similar to AttrSpan---this allows deriving a span
representing the element name from a span representing the entire XIR
token. This will provide more useful context for errors---including the tag
delimiter(s) means that we care about the fact that an element is in that
position (as opposed to some other type of node) within the context of an
error. However, if we are expecting an element but take issue with the
element name itself, we want to place emphasis on that instead.
This also starts to consider the issue of span contexts---a blob of detached
data that is `Span` is useful for error context, but it's not useful for
manipulation or deriving additional information. For that, we need to
encode additional context, and this is an attempt at that.
I am interested in the concept of providing Spans that are guaranteed to
actually make sense---that are instantiated and manipulated with APIs that
ensure consistency. But such a thing buys us very little, practically
speaking, over what I have now for TAMER, and so I don't expect to actually
implement that for this project; I'll leave that for a personal
project. TAMER's already take a lot of my personal interests and it can
cause me a lot of grief sometimes (with regards to letting my aspirations
cause me more work).
DEV-7145
Non-attribute and non-empty start/end tags will have their whitespace
as part of the produced span. This sets us up for a following change that
will allow for deriving the name span from this span given a QName, which
gives us a span that both represents the entire XIR token and allows
deriving the element name.
An accurate token span is necessary for parsing errors where an element was
not expected, while an element name span is more appropriate for issues of
grammar and semantic errors that deal not with the fact that an element was
encountered, but _what_ element was encountered.
DEV-7145
This both adds clarifying tests and corrects the case of `<foo/>`, where the
offset was erroneously off by one---it saw that there were no attributes and
added a byte thinking it'd include `>`, as in `<foo>`.
DEV-7145
This is the first parser generator for the parsing framework. I've been
waiting quite a while to do this because I wanted to be sure that I
understood how I intended to write the attribute parsers manually. Now that
I'm about to start parsing source XML files, it is necessary to have a
parser generator.
Typically one thinks of a parser generator as a separate program that
generates code for some language, but that is not always the case---that
represents a lack of expressiveness in the language itself (e.g. C). Here,
I simply use Rust's macro system, which should be a concept familiar to
someone coming from a language like Lisp.
This also resolves where I stand on parser combinators with respect to this
abstraction: they both accomplish the exact same thing (composition of
smaller parsers), but this abstraction doesn't do so in the typical
functional way. But the end result is the same.
The parser generated by this abstraction will be optimized an inlined in the
same manner as the hand-written parsers. Since they'll be tightly coupled
with an element parser (which too will have a parser generator), I expect
that most attribute parsers will simply be inlined; they exist as separate
parsers conceptually, for the same reason that you'd use parser combinators.
It's worth mentioning that this awkward reliance on dead state for a
lookahead token to determine when aggregation is complete rubs me the wrong
way, but resolving it would involve reintroducing the XIR AttrEnd that I had
previously removed. I'll keep fighting with myself on this, but I want to
get a bit further before I determine if it's worth the tradeoff of
reintroducing (more complex IR but simplified parsing).
DEV-7145
This was missed. It was not possible, using the documentation
alone (without looking at the linked source) to tell what the QName actually
represented, though you could assume by the name.
DEV-7145
This is partly an experiment, but is designed to simplify producing English
sentences in various contexts. It makes use of a not only unstable, but
incomplete, Rust feature---adt_const_params, for a static str const type
parameter. Hopefully that ends up being stabalized.
This uses types, but it's the same as function composition due to Rust's
monomorphization.
DEV-7145
`ParseState` originally required `Default` for use with `mem::take` in
`Parser::feed_tok`. This unfortunately cannot last, since more specialized
parsers require context during initialization in order to provide useful
diagnostic information. (The other option is to require the caller to
augment errors with diagnostic information, but that would have to be
duplicated by every caller and complicates parser composition; I'd prefer
those diagnostic details remain encapsulated.)
Replacing `Default` with `Option` is uglier, but it ends up producing the
same assembly as `mem::take` did, at least at the time of writing. Because
Rust is able to elide unnecessary moves using this implementation, there is
no need for `unwrap_unchecked` or other unsafe methods, which is great,
since it shows that this parsing methodology is viable entirely in safe
Rust.
DEV-7145
Previously, `ParseStatus::Dead` always yielded
`ParseState::Token`. However, I'm working on introducing parsers that
aggregate (parsing XML attributes into structs), and those parsers do not
know that they have completed aggregation until they reach a dead state;
given that, I need to yield additional information at that time.
I played around with a number of alternative ideas, but this ended up being
the cleanest, relative to the effort involved. For example, introducing
another parameter to `ParseStatus::Dead` was too burdensome on APIs that
ought not concern themselves with the possibility of receiving an object in
addition to a lookahead token, since many parsers are not capable of doing
so (given that they map M:(N<=M)).
Another option that I abandoned fairly quickly was having
`is_accepting` (potentially renamed) return an aggregate object, since
that's on the side and didn't feel like it was part of the parsing pipeline.
The intent is to abstract this some in a new `ParseState` method for
delegation + aggregation.
DEV-7145
I'll document it more formally eventually, but this settles on a mix of the
two: square brackets and dashes for intervals, `+` for intersecting lines,
byte offsets below interval endpoints, and names below that.
The docblock for `Span` itself iss still off; I'll probably just take one of
the test cases and paste it there at some point.
DEV-7145
This replaces a tuple with a tuple struct that allows for calculating more
complete span information, such as the span encompassing the entire
attribute and the value span including the surrounding quotes.
This includes logic that ought to be abstracted into `Span` itself, and it's
not as formal as I'd like it to be (e.g. not ensuring context), but this is
a good starting point.
Note that parsers call `Token::span`, which in turn calculates the attribute
span, each time an attribute is encountered during lowering. But Rust does
a good job at optimizing away unnecessary operations, so this didn't have an
observable impact on time.
DEV-7145
This provides much more clarity as to what is going on. Further, it's less
ambiguous, since I'm about to introduce a new type of xmlo lowering into XIR
for writing the actual xmlo files.
DEV-7145
This allows `XmlXirReader` to be used in a `Lower` operation, just as
everything else, bringing me one step closer to a pipeline that can be
concisely represented; this is finally beginning to unify in a clear way,
though it is still a bit of a mess.
This causes `XmlXirReader` to _act_ like a `parse::Parser` in that it yields
a `ParsedResult`, but it does not use `parse::Parser` itself; that was the
_original_ plan: convert it into a `ParseState` where `XmlXirReader` became
a context, and force `Parser` to yield by feeding it a stream of tokens with
`repeat`, but that ended up performing poorly relative to this change. I
did some investigation, which I might write about in the future, but for
now, this solution works just fine.
DEV-7145
This abstraction has grown quite a bit, and it's time to start formalizing
it a bit. This split doesn't change any behavior, but it does start to make
it easier to reason about by clearly stating the broad components and how
they interact with one-another.
This doesn't yet move the tests; those will come next, but they are very
few. The reason I gave previously for this was because (a) they're tested
indirectly via the systems that utilize them and (b) because the abstraction
was not yet settled on the process was already very expensive. No test
coverage was lost---it's only that failures were potentially harder to debug
on test failures, but in practice not even this was true, because the deeply
expressive types all but ensured that, if it compiles, it will function in a
way that is expected. Unit tests and documentation for this system will be
added once I'm sure that this abstraction is in a proper state.
DEV-7145
This also modifies `poc` such that `Lower` is invoked as an associated
function rather than a method to emphasize the pattern that is forming, so
that it can be later abstracted away.
DEV-11864
The `while_ok` can just be implied with a lowering operation, and that
reduces the name complexity so that we can maybe introduce even more
specialized methods without resulting in a huge sentence as a name.
DEV-11864
This finally uses `parse` all the way up to aggregation into the ASG, as can
be seen by the mess in `poc`. This will be further simplified---I just need
to get this committed so that I can mentally get it off my plate. I've been
separating this commit into smaller commits, but there's a point where it's
just not worth the effort anymore. I don't like making large changes such
as this one.
There is still work to do here. First, it's worth re-mentioning that
`poc` means "proof-of-concept", and represents things that still need a
proper home/abstraction.
Secondly, `poc` is retrieving the context of two parsers---`LowerContext`
and `Asg`. The latter is desirable, since it's the final aggregation point,
but the former needs to be eliminated; in particular, packages need to be
worked into the ASG so that `found` can be removed.
Recursively loading `xmlo` files still happens in `poc`, but the compiler
will need this as well. Once packages are on the ASG, along with their
state, that responsibility can be generalized as well.
That will then simplify lowering even further, to the point where hopefully
everything has the same shape (once final aggregation has an abstraction),
after which we can then create a final abstraction to concisely stitch
everything together. Right now, Rust isn't able to infer `S` for
`Lower<S, LS>`, which is unfortunate, but we'll be able to help it along
with a more explicit abstraction.
DEV-11864
This is intended to describe, to the user, the state that the parser is
in. This will be used to convey additional information for general parser
errors, but it should also probably be integrated into parsers' individual
errors as well when appropriate.
This is something I expected to add at some point, but I wanted to add them
because, when dealing with lowering errors, it can be difficult to tell
what parser the error originated from.
DEV-11864
The `*_iter_while_ok` functions now compose like monads, flattening `Result`
at each step and drastically simplifying handling of error types. This also
removes the bunch of `?`s at the end of the expression, and allows me to use
`?` within the callback itself.
I had originally not used `Result` as the return type of the callback
because I was not entirely sure how I was going to use them, but it's now
clear that I _always_ use `Result` as the return type, and so there's no use
in trying to be too accommodating; it can always change in the future.
This is desirable not just for cleanup, but because trying to refactor
`asg_builder` into a pair of `Parser`s is really messy to chain without
flattening, especially given some state that has to leak temporarily to the
caller. More on that in a future commit.
DEV-11864
This was always the intent, but I didn't have a higher-level object
yet. This removes all the awkwardness that existed with working the root
in as an identifier.
DEV-11864
This wraps `Ident` in a new `Object` variant and modifies `Asg` so that its
nodes are of type `Object`.
This unfortunately requires runtime type checking. Whether or not that's
worth alleviating in the future depends on a lot of different things, since
it'll require my own graph implementation, and I have to focus on other
things right now. Maybe it'll be worth it in the future.
Note that this also gets rid of some doc examples that simply aren't worth
maintaining as the API evolves.
DEV-11864
A previous commit mentioned that there's not a place for `Dim`, and
duplicated it between `asg` and `xmlo`. Well, `Dtype` is also needed in
both, and so here's a home for now.
`Dtype` has always been an inappropriate detail for the system and will one
day be removed entirely in favor of higher-level types; the machine
representation is up to the compiler to decide.
DEV-11864
asg_builder is about to be replaced, but in the process of simplifying the
destination IR (the ASG), I'm moving things into the proper place. This
never belonged here---it belongs with the actual lowering operation.
Previously, this was not reasoned about in terms of a lowering operation,
and was written when I was first introducing myself to Rust and trying to
get a proof-of-concept linker working.
DEV-11864
This matches xmlo::Dim, and could be the same thing, if we can find a home
for it in the future; it's not worth creating such a home right now when I'm
not yet sure what else ought to live there; the duplication may be fine.
The conversion from xmlo needs to be moved, and `Dim` is going to be used
for more than just identifiers (expressions will have type inference
performed).
DEV-11864
This allows retrieving and providing a context to a `Parser`. This is
intended for use with an aggregating parser, in particular to construct the
ASG and return it.
This is a component of a change that replaces `asg_builder` with a
`Parser`-based lowering into the ASG, but there are still changes that need
to be made to simplify things and complete its integration.
DEV-11864
Previously, since the graph contained only identifiers, discovered roots
were stored in a separate vector and exposed to the caller. This not only
leaked details, but added complexity; this was left over from the
refactoring of the proof-of-concept linker some time ago.
This moves the root management into the ASG itself, mostly, with one item
being left over for now in the asg_builder (eligibility classifications).
There are two roots that were added automatically:
- __yield
- __worksheet
The former has been removed and is now expected to be explicitly mapped in
the return map, which is now enforced with an extern in `core/base`. This
is still special, in the sense that it is explicitly referenced by the
generated code, but there's nothing inherently special about it and I'll
continue to generalize it into oblivion in the future, such that the final
yield is just a convention.
`__worksheet` is the only symbol of type `IdentKind::Worksheet`, and so that
was generalized just as the meta and map entries were.
The goal in the future will be to have this more under the control of the
source language, and to consolodate individual roots under packages, so that
the _actual_ roots are few.
As far as the actual ASG goes: this introduces a single root node that is
used as the sole reference for reachability analysis and topological
sorting. The edges of that root node replace the vector that was removed.
DEV-11864
In the actual implementation (outside of tests), this is always looking up
before adding the symbol. This will simplify the API, while still retaining
errors, since the identifier will fail the state transition if the
identifier did not exist before attempting to set a fragment. So while this
is slower in microbenchmarks, this has no effect on real-world performance.
Further, I'm refactoring toward a streaming ASG aggregation, which is a lot
easier if we do not need to perform lookups in a separate step from the
ASG's primitives.
DEV-11864
`PartialEq` remains, and is all that is needed. See previous commit
regarding the removal of this same bound from `Context`.
This can be re-added if it ends up actually being necessary. But Tokens are
ephemeral and used only in lowering pipelines, using pattern matching.
DEV-11864
These traits are no longer necessary now that I'm using concrete types; they
just add unnecessary noise and confusion as I attempt to further refactor.
Don't abstract prematurely.
DEV-11864
This removes the generic on the Asg (which was formerly BaseAsg),
hard-coding `IdentObject`, which will further evolve. This makes the IR an
actual concrete IR rather than an abstract data structure.
These tests bring me back a bit, since they were written as I was still
becoming familiar with Rust.
DEV-11864
This is the beginning of an incremental refactoring to remove generics, to
simplify the ASG. When I initially wrote the linker, I wasn't sure what
direction I was going in, but I was also negatively influenced by more
traditional approaches to both design and unit testing.
If we're going to call the ASG an IR, then it needs to be one---if the core
of the IR is generic, then it's more like an abstract data structure than
anything. We can abstract around the IR to slice it up into components that
are a little easier to reason about and understand how responsibilities are
segregated.
DEV-11864
This is unnecessarily restrictive, since we do not require anything further
than `PartialEq` for the situations where we care about equality (tests).
DEV-11864
This is too restrictive, especially for parsers that fold into something,
like the ASG, which may exist prior to invoking the parser.
This moves the trait bound to the functions that actually need it. Those
obviously cannot be used if the Context does not implement `Default`, but
I'll provide alternative conveniences.
DEV-11864
RSG (Ryan Specialty Group) recently announced a rename to Ryan Specialty (no
"Group"), but I'm not sure if the legal name has been changed yet or not, so
I'll wait on that.
These are no longer TODOs---they represent invalid tokens.
I'm going to put effort into providing further context with the diagnostic
system [right now] because these are internal errors caused by either
miscompilation or an incomplete reader.
DEV-10936
This was missed when removing it from other Display impls when the new
diagnostic system was introduced. Raw `Span`s display byte offsets and the
context, which is no longer desirable as part of an error message.
DEV-10936
I had waited to provide more documentation until I was sure that the
abstraction was not going to change significantly; there was a lot of
refactoring in prior commits.
DEV-12151
This moves construction out of `From` and into separate associated
functions, which can be further simplified in a bit.
We also need unit tests for this, since this still relies on integration
tests due to the cost of the aggressive and tight refactoring iterations.
DEV-12151
Previously, when adjacent duplicate spans were both resolved, if one failed,
the other certainly would, which would result in duplicate labels each
squash. Elided spans do not have syslabels, and so this is no longer a
concern.
DEV-12151
This was removed in a previous commit while working on simplifying the
implementation, with the hope of returning to it once things were in a
better place. They are, so let's bring it back.
DEV-12151
`SpanLabel` was created during a very early refactoring of this system, and
I've just been fighting with it sense. This removes it, and simplifies
some things in the process.
It also makes clear that `Level` is never optional and removes the awkward
`Level::default` that was there previously; the default is now the lowest
level, which will always be able to be escalated.
DEV-12151
This does what the original proof-of-concept implementation did---skip a
span that was just processed, since it'll be squashed into the previous
anyway. These duplicate spans originate from the diagnostic system when
producing supplemental help information.
DEV-12151
Tests are large and will be getting larger. The source will also grow as
it's better documented and cleaned up. It's getting more difficult to
navigate efficiently and concurrently modify implementation and tests, and
parsing via LSP is getting slower with certain types of changes.
DEV-12151
Alright, starting to settle on an abstraction now, and things are coming
together. This gives us line numbers in the previously-empty gutter, and
widens the gutter to accommodate. Gutters are normalized across
sections. Sections are not yet collapsed for sequential line numbers in the
same context.
Exciting!
Here's an example, on an xmlo file:
error: expected closing tag for `preproc:symtable`
--> /home/.../foo.xmlo:16:4
|
16 | <preproc:symtable xmlns:map="http://www.w3.org/2005/xpath-functions/map">
| ----------------- note: element `preproc:symtable` is opened here
--> /home/.../foo.xmlo:11326:4
|
11326 | </preproc:wrong>
| ^^^^^^^^^^^^^^^^ error: expected `</preproc:symtable>`
DEV-12151
The `Section` itself is now responsible for outputting the gutter, which
puts us in a position to be able to apply consistent formatting without
having to propagate width data to every line variant.
Now `SourceLine` _does_ actually correspond to a line of output, which will
allow for better formatting (e.g. collapsing padding) and, importantly,
proper management of gutters.
Note that the seemingly unnecessary `SectionSourceLine` allows for a subtle
consistent formatting for all variants' gutters in `SectionLine`, which will
allow us to hoist that rendering out in the next commit. The other option
was to include a trailing space for padding and marks, but that is not only
sloppy and undesirable, but asking for confusion, especially in editors (like
mine) that trim trailing whitespace.
DEV-12151
If a column isn't present, it degrades to displaying labels like footnotes
anyway, so this simplifies the system rather than catering to a rare
case. With that said, this does lose functionality, since it does not
render the source line at all, even though we _could_ do so.
I may re-introduce that rendering after some further refactoring,
specifically for gutters.
DEV-12151
Using a byte vector just makes life more difficult with regard to preparing
the diagnostic reports. We're already validating UTF-8 data for column
generation, which is necessary for a robust report, so let's just store it
as a String to begin with.
DEV-12151
Note that, if a span is first encountered with a mark but with _no_ label,
the first label (if collapsed) will be on the next line. This allows a span
to be marked without extra visual noise if it's not necessary, and to be
able to trust that it'll stay that way.
Until coloring is introduced, this may or may not be easier to read
depending on context.
This is also not yet taking into account where on the line it begins, and so
may render poorly if the span is at the end of a line. That will be fixed
later on.
DEV-12151
This is now visible in the diagnostic output. Example at this point in
time, on an xmlo file for one of our smallest systems:
error: expected closing tag for `preproc:symtable`
--> /home/.../foo.xmlo:16:4
|
| <preproc:symtable xmlns:map="http://www.w3.org/2005/xpath-functions/map">
| -----------------
= note: element `preproc:symtable` is opened here
--> /home/.../foo.xmlo:11326:4
|
| </preproc:wrong>
| ^^^^^^^^^^^^^^^^
= error: expected `</preproc:symtable>`
DEV-12151
Looking more and more Rust-like. Shameless copy.
TBH I forget what character it uses for help, but it's easy enough to
change.
Also, to be clear: this is modeled after Rust, but it's not a requirement of
mine that it look exactly like it. I just like the general style; I'll
surely deviate over time, as appropriate (or as I feel like it).
DEV-12151
This has the effect of highlighting the columns of the source lines using
'^' as an underline.
The next step will be to have the underline character depend on the
`Level`.
If this commit message doesn't sound all that exciting, given what it
finally achieved after all this time, it's because I'm exhausted, and my
prototype has already taken my excitement. But this is significant, given
all the work leading up to it.
There is some code cleanup needed and some unit tests that ought to be
written rather than relying on integration, but considering how much this is
being refactored, I don't want to add to that refactoring cost just yet
before gutters are introduced and I know things are settled for now.
DEV-12151
This has been a lot of refactoring for something that I prototyped a week
ago, and the prototype is still further along in its output formatting (it
has line numbering in gutters and span markings).
But, this has come a long way, and I'm happy with it overall, though I'm not
happy with my slow pace and struggle to maintain focus. But those are
personal issues.
This leaves a lot to be desired, but at the same time is still really
helpful. There's a couple notable TODOs regarding pointless allocation and
UTF8 re-checking, but otherwise, the feature-related steps are:
- Gutters with line numbers; and
- Marking columns associated with the span.
DEV-12151
Rather than squashing as a separate operation, and explicitly denoting when
it occurred, we'll just always squash, as was done before these changes. It
doesn't really make sense to make this optional and there's not any value in
keeping the decision around.
This also sets us up favorably for future changes: it creates a vector of
labels, which can be analyzed later to determine how to best lay out marks
and labels.
DEV-12151
Just renames the lifetime to refer to the `Diagnostic`, rather than a
`Label` returned by it, which was all `'l` was previously used for.
Note that many labels have a `'static` lifetime; this doesn't change that or
somehow cause it to reallocate; the label must life _for at least `'d`_.
DEV-12151
Rather than rendering the diagnostic `Display` message to a string only to
copy it to yet another buffer later on, this simply stores a reference to
the `Diagnostic` that was provided. This also adds a type to the `Report`
associating it with the provided `Diagnostic`, which does seem appropriate,
given that the report was produced for it.
I should probably rename '{l=>d} now.
DEV-12151
Rather than writing to the provided `Write` object, this produces a `Report`
object. While a lifetime still exists for the diagnostic data (labels,
specifically), I was able to remove the other lifetime resulting from
`ResolvedSpan` by transferring ownership of the data to the `Report`
itself. Once actual source lines are integrated shortly, `Report` will
include those as well.
This has been a tedious process, but it's coming together. Hopefully these
commits documenting the progressive and ugly refactoring are found useful by
some reader in the future.
DEV-12151
The line number was getting special treatment that is simply not worth the
cost (with regards to how burdensome it is on the type definitions). This
simplifies things quite a bit.
If we want header customization in the future, we can worry about that in a
different way, or allow the header as a whole to be swapped out, rather than
its constituents.
DEV-12151
`HeadingColNum` is no longer constructed by `HeadingLineNum`. This both
narrows the types and required data (e.g. removing dummy values in test
cases), and reduces the coupling (by favoring composition, but still coupled
with the concrete type).
DEV-12151
I'm unhappy with the current state of this, which is why I haven't settled
on docs or unit tests for these changes yet (though note that the
integration tests do cover these changes)---this is still a prototype
refactoring.
In particular, this needs to do more lowering---the `ResolvedSpan` and
`MaybeResolvedSpan` need to be eliminated and lowered into exactly what is
needed so that we can stop reasoning about them and propagating them.
Further, having lines and columns lazily evaluate themselves for
display---based on `MaybeResolvedSpan`---adds extra generics that shouldn't
be necessary; they should be pre-computed and store the concrete data they
need in variants. Display shouldn't involve computation beyond formatting
of pre-computed data.
That was always the plan, but this refactoring has been incremental.
Anyway: this is in a working and integration-tested state, but it's going to
change.
DEV-12151
This generalizes the types a bit more and introduces unit tests. Note that
these are still also covered by integration tests.
The next step will be to finish generalizing
`<VisualReporter as Reporter>::render`, after which I'll get back to the
task of outputting the source line along with markings and labels.
DEV-12151
This is just to provide clarity. `ctx` is not so widely used that we
benefit from such a short identifier, and it's not worth the cognitive
burden of people unfamiliar with what it may mean.
DEV-12151
This is redundant with the `Endpoints` variant, although it did read
better. It's just another case to have to handle.
I was originally going to use `std::ops::RangeInclusive` for `Endpoints`,
however that struct also contains an extra bool indicating whether it was
exhausted (as an iterator), which isn't appropriate for this.
DEV-12151