2021-10-21 16:17:17 -04:00
|
|
|
// XIR errors
|
|
|
|
//
|
2023-01-17 23:09:25 -05:00
|
|
|
// Copyright (C) 2014-2023 Ryan Specialty, LLC.
|
2021-10-21 16:17:17 -04:00
|
|
|
//
|
|
|
|
// This file is part of TAME.
|
|
|
|
//
|
|
|
|
// This program is free software: you can redistribute it and/or modify
|
|
|
|
// it under the terms of the GNU General Public License as published by
|
|
|
|
// the Free Software Foundation, either version 3 of the License, or
|
|
|
|
// (at your option) any later version.
|
|
|
|
//
|
|
|
|
// This program is distributed in the hope that it will be useful,
|
|
|
|
// but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
// GNU General Public License for more details.
|
|
|
|
//
|
|
|
|
// You should have received a copy of the GNU General Public License
|
|
|
|
// along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
|
|
|
|
//! XIR error information.
|
|
|
|
|
2022-04-08 13:52:16 -04:00
|
|
|
use super::QName;
|
tamer: diagnose: Introduction of diagnostic system
This is a working concept that will continue to evolve. I wanted to start
with some basic output before getting too carried away, since there's a lot
of potential here.
This is heavily influenced by Rust's helpful diagnostic messages, but will
take some time to realize a lot of the things that Rust does. The next step
will be to resolve line and column numbers, and then possibly include
snippets and underline spans, placing the labels alongside them. I need to
balance this work with everything else I have going on.
This is a large commit, but it converts the existing Error Display impls
into Diagnostic. This separation is a bit verbose, so I'll see how this
ends up evolving.
Diagnostics are tied to Error at the moment, but I imagine that eventually
any object would be able to describe itself, error or not, which would be
useful both for the Summary Page and for query functionality, to help
developers understand the systems they are writing using TAME.
Output is integrated into tameld only in this commit; I'll add tamec
next. Examples of what this outputs are available in the test cases in this
commit.
DEV-10935
2022-04-13 14:41:54 -04:00
|
|
|
use crate::{
|
|
|
|
diagnose::{Annotate, AnnotatedSpan, Diagnostic},
|
|
|
|
span::Span,
|
|
|
|
sym::SymbolId,
|
|
|
|
};
|
2021-10-21 16:17:17 -04:00
|
|
|
use std::{fmt::Display, str::Utf8Error};
|
|
|
|
|
|
|
|
/// Error attempting to produce a XIR object.
|
tamer: xir::XirString: WIP implementation (likely going away)
I'm not fond of this implementation, which is why it's not fully
completed. I wanted to commit this for future reference, and take the
opportunity to explain why I don't like it.
First: this task started as an idea to implement a third variant to
AttrValue and friends that indicates that a value is fixed, in the sense of
a fixed-point function: escaped or unescaped, its value is the same. This
would allow us to skip wasteful escape/unescape operations.
In doing so, it became obvious that there's no need to leak this information
through the API, and indeed, no part of the system should care. When we
read XML, it should be unescaped, and when we write, it should be
escaped. The reason that this didn't quite happen to begin with was an
optimization: I'll be creating an echo writer in place of the current
filesystem-based copy in tamec shortly, and this would allow streaming XIR
directly from the reader to the writer without any unescaping or
re-escaping.
When we unescape, we know the value that it came from, so we could simply
store both symbols---they're 32-bit, so it results in a nicely compressed
64-bit value, so it's essentially cost-free, as long as we accept the
expense of internment. This is `XirString`. Then, when we want to escape
or unescape, we first check to see whether a symbol already exists and, if
so, use it.
While this works well for echoing streams, it won't work all that well in
practice: the unescaped SymbolId will be taken and the XirString discarded,
since nothing after XIR should be coupled with it. Then, when we later
construct a XIR stream for writing, XirString will no longer be available
and our previously known escape is lost, so the writer will have to
re-escape.
Further, if we look at XirString's generic for the XirStringEscaper---it
uses phantom, which hints that maybe it's not in the best place. Indeed,
I've already acknowledged that only a reader unescapes and only a writer
escapes, and that the rest of the system works with normal (unescaped)
values, so only readers and writers should be part of this process. I also
already acknowledged that XirString would be lost and only the unescaped
SymbolId would be used.
So what's the point of XirString, then, if it won't be a useful optimization
beyond the temporary echo writer?
Instead, we can take the XirStringWriter and implement two caches on that:
mapping SymbolId from escaped->unescaped and vice-versa. These can be
simple vectors: since SymbolId is a 32-bit value, we will not have much
wasted space for symbols that never get read or written. We could even
optimize for preinterned symbols using markers, though I'll probably not do
so, and I'll explain why later.
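The two-cache idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not TAMER's implementation: `SymbolId` here is a hypothetical 32-bit stand-in, and `EscapeCache` with its methods is an invented name. Each direction is a plain vector indexed by the symbol's integer value, so a lookup is a bounds-checked index rather than a hash.

```rust
/// Hypothetical stand-in for TAMER's `SymbolId` (a 32-bit interned symbol).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SymbolId(u32);

/// Hypothetical two-way escape cache, indexed directly by symbol value.
#[derive(Default)]
struct EscapeCache {
    /// Maps an escaped symbol to its unescaped form.
    unescaped: Vec<Option<SymbolId>>,
    /// Maps an unescaped symbol to its escaped form.
    escaped: Vec<Option<SymbolId>>,
}

impl EscapeCache {
    /// Look up `key` in one direction of the cache.
    fn lookup(table: &[Option<SymbolId>], key: SymbolId) -> Option<SymbolId> {
        table.get(key.0 as usize).copied().flatten()
    }

    /// Store `key -> value`, growing the vector if `key` is out of range.
    fn store(table: &mut Vec<Option<SymbolId>>, key: SymbolId, value: SymbolId) {
        let i = key.0 as usize;
        if table.len() <= i {
            table.resize(i + 1, None);
        }
        table[i] = Some(value);
    }

    /// Record that `escaped` and `unescaped` are two forms of one value.
    fn record(&mut self, escaped: SymbolId, unescaped: SymbolId) {
        Self::store(&mut self.unescaped, escaped, unescaped);
        Self::store(&mut self.escaped, unescaped, escaped);
    }

    /// Cached unescaped form of `escaped`, if one was recorded.
    fn unescape(&self, escaped: SymbolId) -> Option<SymbolId> {
        Self::lookup(&self.unescaped, escaped)
    }

    /// Cached escaped form of `unescaped`, if one was recorded.
    fn escape(&self, unescaped: SymbolId) -> Option<SymbolId> {
        Self::lookup(&self.escaped, unescaped)
    }
}
```

A reader would call `record` after unescaping a value; a writer holding the same cache could then call `escape` and fall back to real escaping only on a miss.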
If we do _that_, we get even _better_ optimizations through caching that
_will_ apply in the general case (so, not just for echo), and we're able to
ditch XirString entirely and simply use a SymbolId. This makes for a much
more friendly API that isn't leaking implementation details, though it
_does_ put an onus on the caller to pass the encoder to both the reader and
the writer, _if_ it wants to take advantage of a cache. But that burden is
not significant (and is, again, optional if we don't want it).
So, that'll be the next step.
2021-11-10 09:42:18 -05:00
|
|
|
#[derive(Debug, PartialEq)]
|
2021-10-21 16:17:17 -04:00
|
|
|
pub enum Error {
|
|
|
|
/// Provided name contains a `':'`.
|
tamer: xir::reader: Initial introduction of spans
This is a large change, and was a bit of a tedious one, given the
comprehensive tests.
This introduces proper offsets and lengths for spans, with the exception of
some quick-xml errors that still need proper mapping. Further, this still
uses `UNKNOWN_CONTEXT`, which will be resolved shortly.
This also introduces `SpanlessError`, which `Error` explicitly _does not_
implement `From<SpanlessError>` for---this forces the caller to provide a
span before the error is compatible with the return value, ensuring that
spans will actually be available rather than forgotten for errors. This is
important, given that errors are generally less tested than the happy path,
and errors are when users need us the most (so, need span information).
Further, I had to use pointer arithmetic in order to calculate many of the
spans, because quick-xml does not provide enough information. There are no
safety considerations here, and the comprehensive unit test will ensure
correct behavior if the implementation changes in the future.
I would like to introduce typed spans at some point---I made some
opinionated choices when it comes to what the spans ought to
represent. Specifically, whether to include the `<` or `>` with the open
span (depends), whether to include quotes with attribute values (no),
and some other details highlighted in the test cases. If we provide typed
spans, then we could, knowing the type of span, calculate other spans on
request, e.g. to include or omit quotes for attributes. Different such
spans may be useful in different situations when presenting information to
the user.
This also highlights gaps in the tokens emitted by XIR, such as whitespace
between attributes, the `=` between name and value, and so on. These are
important when it comes to code formatting, so that we can reliably
reconstruct the XML tree, but it's not important right now. I anticipate
future changes would allow the XIR reader to be configured (perhaps via
generics, like a strategy-type pattern) to optionally omit these tokens if
desired.
Anyway, more to come.
DEV-10934
2022-04-08 11:03:46 -04:00
|
|
|
NCColon(SymbolId, Span),
|
2021-10-21 16:17:17 -04:00
|
|
|
/// Provided string contains non-ASCII-whitespace characters.
|
2022-04-13 14:41:54 -04:00
|
|
|
NotWhitespace(SymbolId, Span),
|
2021-10-21 16:17:17 -04:00
|
|
|
/// Provided QName is not valid.
|
2022-04-08 11:03:46 -04:00
|
|
|
InvalidQName(SymbolId, Span),
|
2022-04-07 12:08:51 -04:00
|
|
|
/// A UTF-8 error together with the byte slice that caused it.
|
|
|
|
///
|
|
|
|
/// By storing the raw bytes instead of a string,
|
|
|
|
/// we allow the displayer to determine how to handle invalid UTF-8
|
|
|
|
/// encodings.
|
2022-04-08 11:03:46 -04:00
|
|
|
/// Further,
|
|
|
|
/// we cannot intern strings that are not valid UTF-8.
|
|
|
|
InvalidUtf8(Utf8Error, Vec<u8>, Span),
|
2022-04-07 12:08:51 -04:00
|
|
|
/// XML 1.0 only.
|
|
|
|
///
|
|
|
|
/// Other versions are not widely in use
|
|
|
|
/// (only 1.1 exists at the time of writing)
|
|
|
|
/// and providing that is either in error,
|
|
|
|
/// copy/paste,
|
|
|
|
/// or the user is expecting something they're not going to get.
|
|
|
|
UnsupportedXmlVersion(SymbolId, Span),
|
|
|
|
/// TAMER expects UTF-8 encoding for everything,
|
|
|
|
/// which should not be an unreasonable expectation.
|
|
|
|
UnsupportedEncoding(SymbolId, Span),
|
2022-04-08 13:52:16 -04:00
|
|
|
/// The named attribute is missing a value.
|
|
|
|
///
|
|
|
|
/// The span is expected to be placed at the offset where the value is
|
|
|
|
/// expected.
|
|
|
|
/// The character `=` may or may not be present.
|
|
|
|
AttrValueExpected(Option<QName>, Span),
|
|
|
|
/// An attribute value was found but was not quoted.
|
|
|
|
///
|
|
|
|
/// The symbol here should be the name of the attribute.
|
|
|
|
AttrValueUnquoted(Option<QName>, Span),
|
2021-11-10 09:42:18 -05:00
|
|
|
|
2022-04-08 11:03:46 -04:00
|
|
|
// TODO: Better error translation.
|
2022-04-11 16:08:50 -04:00
|
|
|
QuickXmlError(QuickXmlError, Span),
|
2022-04-08 11:03:46 -04:00
|
|
|
}
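The doc comment on `InvalidUtf8` says the raw bytes are kept so that the displayer can decide how to handle an invalid encoding. A minimal sketch of that idea, using only the standard library, might look like the following. The helper names `check_utf8` and `render_lossy` are hypothetical and do not come from TAMER; lossy rendering is just one policy a displayer could choose.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: validate `bytes`, keeping a copy of the raw bytes
/// alongside the `Utf8Error` on failure (mirroring the shape of `InvalidUtf8`).
fn check_utf8(bytes: &[u8]) -> Result<&str, (Utf8Error, Vec<u8>)> {
    std::str::from_utf8(bytes).map_err(|e| (e, bytes.to_vec()))
}

/// One possible display policy: replace each invalid sequence with U+FFFD.
fn render_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

Because the bytes are stored rather than a pre-rendered string, a different displayer could equally choose a hex dump or an escaped form without changing the error type.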
|
|
|
|
|
|
|
|
impl Error {
|
|
|
|
pub fn from_with_span<E: Into<SpanlessError>>(
|
|
|
|
span: Span,
|
|
|
|
) -> impl FnOnce(E) -> Self {
|
|
|
|
move |e: E| e.into().with_span(span)
|
|
|
|
}
|
2021-10-21 16:17:17 -04:00
|
|
|
}
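A rough sketch of how `from_with_span` is presumably meant to be used with `map_err` follows. Everything except the body of `from_with_span` is an assumption made for illustration: `Span` is reduced to a hypothetical two-field struct, both error enums carry a single variant holding a `String`, and `parse_ncname` and `read_name` are invented helpers.

```rust
/// Hypothetical stand-in for TAMER's `Span`; the real type carries more context.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    offset: u32,
    len: u16,
}

/// An error that does not yet know where it occurred.
#[derive(Debug, PartialEq)]
enum SpanlessError {
    NCColon(String),
}

/// An error that always carries a span.
#[derive(Debug, PartialEq)]
enum Error {
    NCColon(String, Span),
}

impl SpanlessError {
    /// Attach a span, producing a full `Error`.
    fn with_span(self, span: Span) -> Error {
        match self {
            Self::NCColon(name) => Error::NCColon(name, span),
        }
    }
}

impl Error {
    /// Same shape as the method above: capture `span` now, convert later.
    fn from_with_span<E: Into<SpanlessError>>(span: Span) -> impl FnOnce(E) -> Self {
        move |e: E| e.into().with_span(span)
    }
}

/// Hypothetical validator that knows nothing about spans.
fn parse_ncname(s: &str) -> Result<&str, SpanlessError> {
    if s.contains(':') {
        Err(SpanlessError::NCColon(s.to_string()))
    } else {
        Ok(s)
    }
}

/// The caller has to supply a span: there is no `From<SpanlessError> for Error`,
/// so a bare `?` on the result of `parse_ncname` would not compile here.
fn read_name(s: &str, span: Span) -> Result<&str, Error> {
    parse_ncname(s).map_err(Error::from_with_span(span))
}
```

Returning a closure lets the call site read as `map_err(Error::from_with_span(span))` instead of spelling out `|e| e.into().with_span(span)` each time.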
|
|
|
|
|
|
|
|
impl Display for Error {
|
|
|
|
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
2022-04-08 13:52:16 -04:00
|
|
|
use Error::*;
|
|
|
|
|
2021-10-21 16:17:17 -04:00
|
|
|
match self {
|
2022-04-13 14:41:54 -04:00
|
|
|
NCColon(sym, _) => {
|
|
|
|
write!(f, "NCName `{sym}` cannot contain `:`")
|
|
|
|
}
|
|
|
|
|
|
|
|
NotWhitespace(_s, _) => {
|
|
|
|
write!(f, "whitespace expected")
|
|
|
|
}
|
|
|
|
|
|
|
|
InvalidQName(qname, _) => {
|
|
|
|
write!(f, "invalid QName `{qname}`")
|
|
|
|
}
|
|
|
|
|
|
|
|
InvalidUtf8(inner, _bytes, _) => Display::fmt(inner, f),
|
|
|
|
|
|
|
|
UnsupportedXmlVersion(ver, _) => {
|
|
|
|
write!(f, "unsupported XML version `{ver}`")
|
|
|
|
}
|
|
|
|
|
|
|
|
UnsupportedEncoding(enc, _) => {
|
2022-04-07 12:08:51 -04:00
|
|
|
// TODO: when we have hints,
|
|
|
|
// indicate that they can also entirely remove this
|
|
|
|
// attribute to resolve the error
|
2022-04-13 14:41:54 -04:00
|
|
|
write!(f, "unsupported encoding `{enc}`")
|
2022-04-07 12:08:51 -04:00
|
|
|
}
|
2022-04-13 14:41:54 -04:00
|
|
|
|
|
|
|
AttrValueExpected(Some(name), _) => {
|
|
|
|
write!(f, "value expected for attribute `@{name}`")
|
2022-04-08 13:52:16 -04:00
|
|
|
}
|
2022-04-13 14:41:54 -04:00
|
|
|
|
2022-04-08 13:52:16 -04:00
|
|
|
// TODO: Parsers should provide the name.
|
2022-04-13 14:41:54 -04:00
|
|
|
AttrValueExpected(None, _) => {
|
|
|
|
write!(f, "value expected for attribute")
|
2022-04-08 13:52:16 -04:00
|
|
|
}
|
2022-04-13 14:41:54 -04:00
|
|
|
|
|
|
|
AttrValueUnquoted(Some(name), _) => {
|
|
|
|
write!(f, "attribute `@{name}` missing quotes")
|
2022-04-08 13:52:16 -04:00
|
|
|
}
|
2022-04-13 14:41:54 -04:00
|
|
|
|
2022-04-08 13:52:16 -04:00
|
|
|
// TODO: Parsers should provide the name.
|
2022-04-13 14:41:54 -04:00
|
|
|
AttrValueUnquoted(None, _) => {
|
|
|
|
write!(f, "value for attribute is missing quotes")
|
2022-04-08 13:52:16 -04:00
|
|
|
}
|
2022-04-13 14:41:54 -04:00
|
|
|
|
tamer: xir::reader: Initial introduction of spans
This is a large change, and was a bit of a tedious one, given the
comprehensive tests.
This introduces proper offsets and lengths for spans, with the exception of
some quick-xml errors that still need proper mapping. Further, this still
uses `UNKNOWN_CONTEXT`, which will be resolved shortly.
This also introduces `SpanlessError`, which `Error` explicitly _does not_
implement `From<SpanlessError>` for---this forces the caller to provide a
span before the error is compatible with the return value, ensuring that
spans will actually be available rather than forgotten for errors. This is
important, given that errors are generally less tested than the happy path,
and errors are when users need us the most (so, need span information).
Further, I had to use pointer arithmetic in order to calculate many of the
spans, because quick-xml does not provide enough information. There are no
safety considerations here, and the comprehensive unit test will ensure
correct behavior if the implementation changes in the future.
I would like to introduce typed spans at some point---I made some
opinionated choices when it comes to what the spans ought to
represent. Specifically, whether to include the `<` or `>` with the open
span (depends), whether to include quotes with attribute values (no),
and some other details highlighted in the test cases. If we provide typed
spans, then we could, knowing the type of span, calculate other spans on
request, e.g. to include or omit quotes for attributes. Different such
spans may be useful in different situations when presenting information to
the user.
This also highlights gaps in the tokens emitted by XIR, such as whitespace
between attributes, the `=` between name and value, and so on. These are
important when it comes to code formatting, so that we can reliably
reconstruct the XML tree, but it's not important right now. I anticipate
future changes would allow the XIR reader to be configured (perhaps via
generics, like a strategy-type pattern) to optionally omit these tokens if
desired.
Anyway, more to come.
DEV-10934
2022-04-08 11:03:46 -04:00
|
|
|
// TODO: Translate error messages
|
2022-04-13 14:41:54 -04:00
|
|
|
QuickXmlError(inner, _) => {
|
|
|
|
write!(f, "internal parser error: {inner}")
|
tamer: xir::XirString: WIP implementation (likely going away)
I'm not fond of this implementation, which is why it's not fully
completed. I wanted to commit this for future reference, and take the
opportunity to explain why I don't like it.
First: this task started as an idea to implement a third variant to
AttrValue and friends that indicates that a value is fixed, in the sense of
a fixed-point function: escaped or unescaped, its value is the same. This
would allow us to skip wasteful escape/unescape operations.
In doing so, it became obvious that there's no need to leak this information
through the API, and indeed, no part of the system should care. When we
read XML, it should be unescaped, and when we write, it should be
escaped. The reason that this didn't quite happen to begin with was an
optimization: I'll be creating an echo writer in place of the current
filesystem-based copy in tamec shortly, and this would allow streaming XIR
directly from the reader to the writer without any unescaping or
re-escaping.
When we unescape, we know the value that it came from, so we could simply
store both symbols---they're 32-bit, so it results in a nicely compressed
64-bit value, so it's essentially cost-free, as long as we accept the
expense of internment. This is `XirString`. Then, when we want to escape
or unescape, we first check to see whether a symbol already exists and, if
so, use it.
While this works well for echoing streams, it won't work all that well in
practice: the unescaped SymbolId will be taken and the XirString discarded,
since nothing after XIR should be coupled with it. Then, when we later
construct a XIR stream for writing, XirString will no longer be available
and our previously known escape is lost, so the writer will have to
re-escape.
Further, if we look at XirString's generic for the XirStringEscaper---it
uses phantom, which hints that maybe it's not in the best place. Indeed,
I've already acknowledged that only a reader unescapes and only a writer
escapes, and that the rest of the system works with normal (unescaped)
values, so only readers and writers should be part of this process. I also
already acknowledged that XirString would be lost and only the unescaped
SymbolId would be used.
So what's the point of XirString, then, if it won't be a useful optimization
beyond the temporary echo writer?
Instead, we can take the XirStringWriter and implement two caches on that:
mapping SymbolId from escaped->unescaped and vice-versa. These can be
simple vectors: since SymbolId is a 32-bit value, we will not have much
wasted space for symbols that never get read or written. We could even
optimize for preinterned symbols using markers, though I'll probably not do
so, and I'll explain why later.
If we do _that_, we get even _better_ optimizations through caching that
_will_ apply in the general case (so, not just for echo), and we're able to
ditch XirString entirely and simply use a SymbolId. This makes for a much
more friendly API that isn't leaking implementation details, though it
_does_ put an onus on the caller to pass the encoder to both the reader and
the writer, _if_ it wants to take advantage of a cache. But that burden is
not significant (and is, again, optional if we don't want it).
So, that'll be the next step.
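The cache idea described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `SymbolId` here is a stand-in newtype, `CachingEscaper` and its tiny linear interner are hypothetical, and only the unescaped-to-escaped direction is shown (the reverse cache would be symmetric). It is not the implementation that eventually landed in TAME.

```rust
/// Stand-in for an interned symbol: a 32-bit index into a string table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SymbolId(u32);

/// Hypothetical interner plus an escape cache indexed by `SymbolId`.
#[derive(Default)]
struct CachingEscaper {
    /// Interned strings; a symbol's id is its index here.
    strings: Vec<String>,
    /// `escaped[i]` holds the escaped form of symbol `i`, once known.
    escaped: Vec<Option<SymbolId>>,
    /// How many times the escape routine actually ran (illustration only).
    escapes_run: usize,
}

impl CachingEscaper {
    /// Intern `s`, reusing an existing symbol if present (linear scan).
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(i) = self.strings.iter().position(|x| x == s) {
            return SymbolId(i as u32);
        }
        self.strings.push(s.to_string());
        SymbolId((self.strings.len() - 1) as u32)
    }

    /// Resolve a symbol back to its string.
    fn lookup(&self, sym: SymbolId) -> &str {
        &self.strings[sym.0 as usize]
    }

    /// Escape `sym`, consulting the cache before doing any work.
    fn escape(&mut self, sym: SymbolId) -> SymbolId {
        let i = sym.0 as usize;
        if let Some(Some(hit)) = self.escaped.get(i) {
            return *hit;
        }
        self.escapes_run += 1;
        // `&` must be replaced first so later entities are not re-escaped.
        let text = self
            .lookup(sym)
            .replace('&', "&amp;")
            .replace('<', "&lt;")
            .replace('>', "&gt;")
            .replace('"', "&quot;");
        let out = self.intern(&text);
        if self.escaped.len() <= i {
            self.escaped.resize(i + 1, None);
        }
        self.escaped[i] = Some(out);
        out
    }
}
```

In this sketch a value that needs no escaping interns to the same symbol it started as, which is the "fixed" case the message mentions, and a second `escape` of the same symbol is a vector lookup.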
2021-11-10 09:42:18 -05:00
|
|
|
}
|
2021-10-21 16:17:17 -04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
impl std::error::Error for Error {
|
|
|
|
fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
|
|
|
|
match self {
|
2022-04-08 11:03:46 -04:00
|
|
|
Self::InvalidUtf8(e, ..) => Some(e),
|
|
|
|
Self::QuickXmlError(e, ..) => Some(e),
|
2021-10-21 16:17:17 -04:00
|
|
|
_ => None,
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
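The `source` impl above exposes the wrapped error for `InvalidUtf8` and `QuickXmlError`, so a caller can walk the chain of causes. A minimal sketch of that use, with a stand-in `DemoError` enum rather than the real `Error` type (`DemoError`, `cause_chain`, and `invalid_utf8` are names assumed for illustration):

```rust
use std::error::Error as StdError;
use std::fmt;
use std::str::Utf8Error;

/// Stand-in for the real error type; only two variants are modelled.
#[derive(Debug)]
enum DemoError {
    /// Wraps a lower-level error, like `InvalidUtf8` above.
    InvalidUtf8(Utf8Error),
    /// Has no underlying cause.
    AttrValueExpected,
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidUtf8(_) => write!(f, "invalid UTF-8"),
            Self::AttrValueExpected => {
                write!(f, "value expected for attribute")
            }
        }
    }
}

impl StdError for DemoError {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Self::InvalidUtf8(e) => Some(e),
            _ => None,
        }
    }
}

/// Collect the message of `err` followed by the message of each source.
fn cause_chain(err: &(dyn StdError + 'static)) -> Vec<String> {
    let mut msgs = Vec::new();
    let mut cur = Some(err);
    while let Some(e) = cur {
        msgs.push(e.to_string());
        cur = e.source();
    }
    msgs
}

/// Produce a real `Utf8Error` by decoding an invalid byte sequence.
fn invalid_utf8() -> DemoError {
    let bytes = vec![0xffu8];
    DemoError::InvalidUtf8(std::str::from_utf8(&bytes).unwrap_err())
}
```

Calling `cause_chain(&invalid_utf8())` yields two entries (the wrapper's message, then the `Utf8Error`'s), while a variant without a source yields one.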
|
|
|
|
|
2022-04-13 14:41:54 -04:00
|
|
|
impl Diagnostic for Error {
|
|
|
|
fn describe(&self) -> Vec<AnnotatedSpan> {
|
|
|
|
use Error::*;
|
|
|
|
|
|
|
|
match self {
|
|
|
|
// NB: This is often constructed from a QName and so we may not
|
|
|
|
// have as much context as we would like;
|
|
|
|
// don't be too specific.
|
|
|
|
NCColon(_, span) => span.error("unexpected `:` here").into(),
|
|
|
|
|
|
|
|
NotWhitespace(_, span) => {
|
|
|
|
span.error("whitespace expected here").into()
|
|
|
|
}
|
|
|
|
|
|
|
|
InvalidQName(_, span) => span.mark_error().into(),
|
|
|
|
|
|
|
|
InvalidUtf8(_, bytes, span) => {
|
|
|
|
span.error(format!("has byte sequence `{bytes:?}`",)).into()
|
|
|
|
}
|
|
|
|
|
|
|
|
UnsupportedXmlVersion(_, span) => {
|
|
|
|
// TODO: suggested fix: replacement of span with `1.0`
|
|
|
|
span.error("expected version `1.0`").into()
|
|
|
|
}
|
|
|
|
|
|
|
|
UnsupportedEncoding(_, span) => {
|
|
|
|
// TODO: suggested fix: remove attribute and whitespace
|
|
|
|
span.error("expected `utf-8` or `UTF-8`").into()
|
|
|
|
}
|
|
|
|
|
|
|
|
AttrValueExpected(_, span) => {
|
|
|
|
span.error("attribute value expected").into()
|
|
|
|
}
|
|
|
|
|
|
|
|
AttrValueUnquoted(_, span) => {
|
|
|
|
// TODO: suggested fix: wrap in quotes
|
|
|
|
span.error("quotes expected around this value").into()
|
|
|
|
}
|
|
|
|
|
|
|
|
QuickXmlError(_, span) => {
|
|
|
|
// TODO: note saying that this should probably be reported
|
|
|
|
// to provide a better error
|
|
|
|
span.mark_error().into()
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
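The shape of `describe` above (each variant turns its span into one labelled or unlabelled annotation, then `.into()` produces the `Vec`) can be reproduced in isolation. The sketch below assumes simplified stand-ins: this `Span`, `AnnotatedSpan`, and `Diagnostic` are hypothetical reductions written for the example, not TAME's actual definitions.

```rust
/// Stand-in span: a byte offset and a length.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    offset: usize,
    len: usize,
}

/// A span with an optional label (assumed shape for this sketch).
#[derive(Debug, PartialEq)]
struct AnnotatedSpan {
    span: Span,
    label: Option<String>,
}

impl Span {
    /// Mark this span as an error with a label.
    fn error(self, label: impl Into<String>) -> AnnotatedSpan {
        AnnotatedSpan { span: self, label: Some(label.into()) }
    }

    /// Mark this span as an error without further explanation.
    fn mark_error(self) -> AnnotatedSpan {
        AnnotatedSpan { span: self, label: None }
    }
}

/// Lets a single annotation be returned where a list is expected.
impl From<AnnotatedSpan> for Vec<AnnotatedSpan> {
    fn from(a: AnnotatedSpan) -> Self {
        vec![a]
    }
}

trait Diagnostic {
    fn describe(&self) -> Vec<AnnotatedSpan>;
}

enum DemoError {
    AttrValueExpected(Span),
    InvalidQName(Span),
}

impl Diagnostic for DemoError {
    fn describe(&self) -> Vec<AnnotatedSpan> {
        match self {
            Self::AttrValueExpected(span) => {
                span.error("attribute value expected").into()
            }
            // Nothing useful to add beyond the Display message.
            Self::InvalidQName(span) => span.mark_error().into(),
        }
    }
}
```

Keeping the label separate from the `Display` text is what allows the top-level message to stay generic while the annotation points at the exact location.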
|
|
|
|
|
2022-04-08 11:03:46 -04:00
|
|
|
/// An [`Error`] that requires its [`Span`] to be filled in by the caller.
|
|
|
|
///
|
|
|
|
/// These errors should not be converted automatically,
|
|
|
|
/// since only the caller can know the correct information to provide for
|
|
|
|
/// a useful [`Span`].
|
|
|
|
/// Failure to provide a useful span will betray the user when they need us
|
|
|
|
/// the most:
|
|
|
|
/// debugging an error.
|
|
|
|
///
|
|
|
|
/// As such,
|
|
|
|
/// please do not implement `From<SpanlessError> for Error`;
|
|
|
|
/// use [`SpanlessError::with_span`] instead.
|
|
|
|
#[derive(Debug, PartialEq)]
|
|
|
|
pub enum SpanlessError {
|
|
|
|
NCColon(SymbolId),
|
2022-04-13 14:41:54 -04:00
|
|
|
NotWhitespace(SymbolId),
|
2022-04-08 11:03:46 -04:00
|
|
|
InvalidQName(SymbolId),
|
|
|
|
InvalidUtf8(Utf8Error, Vec<u8>),
|
2022-04-11 16:08:50 -04:00
|
|
|
QuickXmlError(QuickXmlError),
|
2022-04-08 11:03:46 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
impl SpanlessError {
|
|
|
|
pub fn with_span(self, span: Span) -> Error {
|
|
|
|
match self {
|
|
|
|
Self::NCColon(sym) => Error::NCColon(sym, span),
|
2022-04-13 14:41:54 -04:00
|
|
|
Self::NotWhitespace(sym) => Error::NotWhitespace(sym, span),
|
2022-04-08 11:03:46 -04:00
|
|
|
Self::InvalidQName(qname) => Error::InvalidQName(qname, span),
|
|
|
|
Self::InvalidUtf8(inner, bytes) => {
|
|
|
|
Error::InvalidUtf8(inner, bytes, span)
|
|
|
|
}
|
|
|
|
Self::QuickXmlError(inner) => Error::QuickXmlError(inner, span),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
pub fn into_with_span<E>(span: Span) -> impl FnOnce(E) -> Error
|
|
|
|
where
|
|
|
|
E: Into<SpanlessError>,
|
|
|
|
{
|
|
|
|
move |e: E| e.into().with_span(span)
|
|
|
|
}
|
|
|
|
}
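How `with_span` and `into_with_span` are meant to be used at a call site can be shown with a reduced model. In the sketch below, `Span`, the single-variant enums, and the functions `check_ncname` and `read_name` are assumptions made for the example; only the two methods mirror the code above.

```rust
/// Stand-in span: a byte offset and a length.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    offset: usize,
    len: usize,
}

#[derive(Debug, PartialEq)]
enum SpanlessError {
    NCColon(String),
}

#[derive(Debug, PartialEq)]
enum Error {
    NCColon(String, Span),
}

impl SpanlessError {
    fn with_span(self, span: Span) -> Error {
        match self {
            Self::NCColon(name) => Error::NCColon(name, span),
        }
    }

    fn into_with_span<E>(span: Span) -> impl FnOnce(E) -> Error
    where
        E: Into<SpanlessError>,
    {
        move |e: E| e.into().with_span(span)
    }
}

/// Low-level check that has no idea where `name` came from.
fn check_ncname(name: &str) -> Result<&str, SpanlessError> {
    if name.contains(':') {
        Err(SpanlessError::NCColon(name.to_string()))
    } else {
        Ok(name)
    }
}

/// The caller knows the location, so it is the one to attach the span.
fn read_name(name: &str, offset: usize) -> Result<&str, Error> {
    let span = Span { offset, len: name.len() };
    // A bare `?` would not compile here: there is deliberately no
    // `From<SpanlessError> for Error`, so the span cannot be forgotten.
    check_ncname(name).map_err(SpanlessError::into_with_span(span))
}
```

The missing `From` impl turns a forgotten span into a compile error, and `into_with_span` keeps the resulting `map_err` call short.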
|
|
|
|
|
|
|
|
impl std::error::Error for SpanlessError {
|
|
|
|
fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
|
|
|
|
match self {
|
|
|
|
Self::InvalidUtf8(inner, ..) => Some(inner),
|
|
|
|
Self::QuickXmlError(inner) => Some(inner),
|
|
|
|
_ => None,
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
impl Display for SpanlessError {
|
|
|
|
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
|
|
|
// This isn't friendly, since it shouldn't occur.
|
|
|
|
write!(f, "internal error: missing span for error: {self:?}")
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
impl From<(Utf8Error, &[u8])> for SpanlessError {
|
2021-10-21 16:17:17 -04:00
|
|
|
fn from((err, bytes): (Utf8Error, &[u8])) -> Self {
|
|
|
|
Self::InvalidUtf8(err, bytes.to_owned())
|
|
|
|
}
|
|
|
|
}
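The tuple-based `From` impl above lets a decoding failure carry the offending bytes along with the `Utf8Error`. A small sketch of a call site, assuming a one-variant stand-in enum and a hypothetical helper named `decode`:

```rust
use std::str::Utf8Error;

#[derive(Debug, PartialEq)]
enum SpanlessError {
    InvalidUtf8(Utf8Error, Vec<u8>),
}

impl From<(Utf8Error, &[u8])> for SpanlessError {
    fn from((err, bytes): (Utf8Error, &[u8])) -> Self {
        // Copy the bytes so the error can outlive the input buffer.
        Self::InvalidUtf8(err, bytes.to_owned())
    }
}

/// Decode `bytes`, keeping a copy of them if they are not valid UTF-8.
fn decode(bytes: &[u8]) -> Result<&str, SpanlessError> {
    std::str::from_utf8(bytes).map_err(|e| (e, bytes).into())
}
```

Owning the bytes costs an allocation on the error path only, which is usually acceptable since the diagnostic later prints them.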
|
|
|
|
|
2022-04-11 16:08:50 -04:00
|
|
|
impl<E: Into<QuickXmlError>> From<E> for SpanlessError {
|
tamer: xir::XirString: WIP implementation (likely going away)
I'm not fond of this implementation, which is why it's not fully
completed. I wanted to commit this for future reference, and take the
opportunity to explain why I don't like it.
First: this task started as an idea to implement a third variant to
AttrValue and friends that indicates that a value is fixed, in the sense of
a fixed-point function: escaped or unescaped, its value is the same. This
would allow us to skip wasteful escape/unescape operations.
In doing so, it became obvious that there's no need to leak this information
through the API, and indeed, no part of the system should care. When we
read XML, it should be unescaped, and when we write, it should be
escaped. The reason that this didn't quite happen to begin with was an
optimization: I'll be creating an echo writer in place of the current
filesystem-based copy in tamec shortly, and this would allow streaming XIR
directly from the reader to the writer without any unescaping or
re-escaping.
When we unescape, we know the value that it came from, so we could simply
store both symbols---they're 32-bit, so it results in a nicely compressed
64-bit value, so it's essentially cost-free, as long as we accept the
expense of internment. This is `XirString`. Then, when we want to escape
or unescape, we first check to see whether a symbol already exists and, if
so, use it.
While this works well for echoing streams, it won't work all that well in
practice: the unescaped SymbolId will be taken and the XirString discarded,
since nothing after XIR should be coupled with it. Then, when we later
construct a XIR stream for writting, XirString will no longer be available
and our previously known escape is lost, so the writer will have to
re-escape.
Further, if we look at XirString's generic for the XirStringEscaper---it
uses phantom, which hints that maybe it's not in the best place. Indeed,
I've already acknowledged that only a reader unescapes and only a writer
escapes, and that the rest of the system works with normal (unescaped)
values, so only readers and writers should be part of this process. I also
already acknowledged that XirString would be lost and only the unescaped
SymbolId would be used.
So what's the point of XirString, then, if it won't be a useful optimization
beyond the temporary echo writer?
Instead, we can take the XirStringWriter and implement two caches on that:
mapping SymbolId from escaped->unescaped and vice-versa. These can be
simple vectors: since SymbolId is a 32-bit value, we will not have much
wasted space for symbols that never get read or written. We could even
optimize for preinterned symbols using markers, though I'll probably not do
so, and I'll explain why later.
If we do _that_, we get even _better_ optimizations through caching that
_will_ apply in the general case (so, not just for echo), and we're able to
ditch XirString entirely and simply use a SymbolId. This makes for a much
more friendly API that isn't leaking implementation details, though it
_does_ put an onus on the caller to pass the encoder to both the reader and
the writer, _if_ it wants to take advantage of a cache. But that burden is
not significant (and is, again, optional if we don't want it).
So, that'll be the next step.
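The two-cache idea described above can be sketched roughly as follows. This is a minimal illustration and not TAMER's actual implementation: `SymbolId`, `Interner`, and `CachingEscaper` here are hypothetical stand-ins, the interner is a toy, and only four XML entities are handled. The `misses` counter exists only to make cache hits observable.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a 32-bit interned symbol.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct SymbolId(u32);

/// Toy interner (assumption: the real one is considerably more involved).
#[derive(Default)]
struct Interner {
    strings: Vec<String>,
    index: HashMap<String, SymbolId>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> SymbolId {
        if let Some(&id) = self.index.get(s) {
            return id;
        }
        let id = SymbolId(self.strings.len() as u32);
        self.strings.push(s.to_string());
        self.index.insert(s.to_string(), id);
        id
    }

    fn lookup(&self, id: SymbolId) -> &str {
        &self.strings[id.0 as usize]
    }
}

/// Escaper holding two vector-backed caches, each indexed by symbol id.
#[derive(Default)]
struct CachingEscaper {
    to_escaped: Vec<Option<SymbolId>>,
    to_unescaped: Vec<Option<SymbolId>>,
    /// Number of times real escape/unescape work was performed.
    misses: u32,
}

fn cache_get(cache: &[Option<SymbolId>], key: SymbolId) -> Option<SymbolId> {
    cache.get(key.0 as usize).copied().flatten()
}

fn cache_put(cache: &mut Vec<Option<SymbolId>>, key: SymbolId, val: SymbolId) {
    let i = key.0 as usize;
    if cache.len() <= i {
        cache.resize(i + 1, None);
    }
    cache[i] = Some(val);
}

impl CachingEscaper {
    /// Reader side: escaped symbol in, unescaped symbol out.
    fn unescape(&mut self, interner: &mut Interner, escaped: SymbolId) -> SymbolId {
        if let Some(hit) = cache_get(&self.to_unescaped, escaped) {
            return hit;
        }
        self.misses += 1;
        // `&amp;` is replaced last so that `&amp;lt;` yields `&lt;`.
        let text = interner
            .lookup(escaped)
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&amp;", "&");
        let unescaped = interner.intern(&text);
        // Record both directions so that a later write is a cache hit.
        cache_put(&mut self.to_unescaped, escaped, unescaped);
        cache_put(&mut self.to_escaped, unescaped, escaped);
        unescaped
    }

    /// Writer side: unescaped symbol in, escaped symbol out.
    fn escape(&mut self, interner: &mut Interner, unescaped: SymbolId) -> SymbolId {
        if let Some(hit) = cache_get(&self.to_escaped, unescaped) {
            return hit;
        }
        self.misses += 1;
        // `&` is replaced first so that entities we emit are not re-escaped.
        let text = interner
            .lookup(unescaped)
            .replace('&', "&amp;")
            .replace('<', "&lt;")
            .replace('>', "&gt;")
            .replace('"', "&quot;");
        let escaped = interner.intern(&text);
        cache_put(&mut self.to_escaped, unescaped, escaped);
        cache_put(&mut self.to_unescaped, escaped, unescaped);
        escaped
    }
}
```

In this sketch, a caller that shares one `CachingEscaper` between its reader and its writer pays for the unescape once on read, and the later escape on write is a lookup. A caller that does not share it still gets correct output, just without the cache.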
2021-11-10 09:42:18 -05:00
    fn from(err: E) -> Self {
2022-04-07 12:08:51 -04:00
        Self::QuickXmlError(err.into())
2021-10-21 16:17:17 -04:00
    }
}
2022-04-11 16:08:50 -04:00
/// Thin wrapper around [`quick_xml::Error`] to implement [`PartialEq`].
///
/// This will always yield `false`,
/// but allows us to derive the trait on types using [`Error`];
/// otherwise, this madness propagates indefinitely.
#[derive(Debug)]
pub struct QuickXmlError(pub quick_xml::Error);

impl PartialEq for QuickXmlError {
    /// [`quick_xml::Error`] does not implement [`PartialEq`] and so this
    /// will always yield `false`.
    fn eq(&self, _other: &Self) -> bool {
        false
    }
}

impl Eq for QuickXmlError {}

impl From<quick_xml::Error> for QuickXmlError {
    fn from(e: quick_xml::Error) -> Self {
        Self(e)
    }
}

tamer: Integrate clippy
This invokes clippy as part of `make check` now, which I had previously
avoided doing (I'll elaborate on that below).
This commit represents the changes needed to resolve all the warnings
presented by clippy. Many changes have been made where I find the lints to
be useful and agreeable, but there are a number of lints, rationalized in
`src/lib.rs`, where I found the lints to be disagreeable. I have provided
rationale, primarily for those wondering why I desire to deviate from the
default lints, though it does feel backward to rationalize why certain lints
ought to be applied (the reverse should be true).
With that said, this did catch some legitimate issues, and it was also
helpful in getting some older code up-to-date with new language additions
that perhaps I used in new code but hadn't gone back and updated old code
for. My goal was to get clippy working without errors so that, in the
future, when others get into TAMER and are still getting used to Rust,
clippy is able to help guide them in the right direction.
One of the reasons I went without clippy for so long (though I admittedly
forgot I wasn't using it for a period of time) was because there were a
number of suggestions that I found disagreeable, and I didn't take the time
to go through them and determine what I wanted to follow. Furthermore, it
was hard to make that judgment when I was new to the language and lacked
the necessary experience to do so.
One thing I would like to comment further on is the use of `format!` with
`expect`, which is also what the diagnostic system convenience methods
do (which clippy does not cover). Because of all the work I've done trying
to understand Rust and looking at disassemblies and seeing what it
optimizes, I falsely assumed that Rust would convert such things into
conditionals in my otherwise-pure code...but apparently that's not the case,
when `format!` is involved.
I noticed that, after making the suggested fix with `get_ident`, Rust
proceeded to then inline it into each call site and then apply further
optimizations. It was also previously invoking the thread lock (for the
interner) unconditionally and invoking the `Display` implementation. That
is not at all what I intended, despite knowing the eager semantics of
function calls in Rust.
Anyway, possibly more to come on that, I'm just tired of typing and need to
move on. I'll be returning to investigate further diagnostic messages soon.
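The `format!`-with-`expect` behavior described above can be reproduced with a small sketch. The `Noisy` type and the `eager`/`lazy` functions are hypothetical and exist only for illustration; the assumption is that the pattern in question resembles what clippy's `expect_fun_call` lint flags. Because function arguments are evaluated eagerly, the message is formatted (and `Display` invoked) even when the `Option` is `Some`.

```rust
use std::cell::Cell;
use std::fmt;

/// Hypothetical identifier whose `Display` impl counts how often it runs.
struct Noisy<'a> {
    name: &'a str,
    calls: &'a Cell<u32>,
}

impl fmt::Display for Noisy<'_> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        self.calls.set(self.calls.get() + 1);
        write!(f, "{}", self.name)
    }
}

/// The message is built before `expect` is called,
/// so `Display` runs even on the success path.
fn eager(value: Option<u32>, id: &Noisy) -> u32 {
    value.expect(&format!("missing value for {}", id))
}

/// The message is built inside the closure,
/// which only runs when the value is `None`.
fn lazy(value: Option<u32>, id: &Noisy) -> u32 {
    value.unwrap_or_else(|| panic!("missing value for {}", id))
}
```

Calling `eager(Some(1), &id)` increments the counter, while `lazy(Some(1), &id)` leaves it untouched. If `Display` has to take a lock (as with an interner), the eager form pays that cost on every call.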
2023-01-12 10:46:48 -05:00
impl From<QuickXmlError> for quick_xml::Error {
    fn from(e: QuickXmlError) -> Self {
        match e {
            QuickXmlError(e) => e,
        }
2022-04-11 16:08:50 -04:00
    }
}

impl Display for QuickXmlError {
    fn fmt(&self, fmt: &mut std::fmt::Formatter) -> std::fmt::Result {
        // NB: If we eventually use `source` to display a hierarchy of
        // errors, then we likely do not want the duplication here.
        self.0.fmt(fmt)
    }
}

impl std::error::Error for QuickXmlError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.0)
    }
}