Commit Graph

9 Commits (4aaf91a9e751414bf5cb0b541c0a79f04bd76c80)

Author SHA1 Message Date
Mike Gerwitz 15e04d63e2 tamer: xir::parse::ele: Transition trampoline
This properly integrates the trampoline into `ele_parse!`.  The
implementation leaves some TODOs, most notably broken mixed text handling
since we can no longer intercept those tokens before passing to the
child.  That is temporarily marked as incomplete; see a future commit.

The introduced test `ParseState`s were to help me reason about the system
intuitively as I struggled to track down some type errors in the monstrosity
that is `ele_parse!`.  It will fail to compile if those invariants are
violated.  (In the end, the problems were pretty simple to resolve, and the
struggle was the type system doing its job in telling me that I needed to
step back and try to reason about the problem again until it was intuitive.)

This keeps around the NT states for now, which are quickly used to
transition to the next NT state, like a couple of bounces on a trampoline:

  NT -> Dead -> Parent -> Next NT

This could be optimized in the future, if it's worth doing.
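
As a rough illustration of the bounce sequence above (this is not the code that `ele_parse!` generates; the `State` enum and the `step` and `run` functions are hypothetical names), a trampoline can be modeled as a loop that repeatedly asks the current state for its successor, rather than having one parser call directly into the next:

```rust
/// Hypothetical states mirroring the bounce sequence described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    /// A nonterminal (NT) parser, identified by its index.
    Nt(usize),
    /// The NT cannot accept the token and gives it back as lookahead.
    Dead(usize),
    /// The parent regains control and selects what runs next.
    Parent(usize),
    /// No NTs remain.
    Done,
}

/// Perform a single bounce; `nt_count` is the number of sibling NTs.
fn step(state: State, nt_count: usize) -> State {
    match state {
        State::Nt(i) => State::Dead(i),
        State::Dead(i) => State::Parent(i),
        State::Parent(i) if i + 1 < nt_count => State::Nt(i + 1),
        State::Parent(_) | State::Done => State::Done,
    }
}

/// Drive the trampoline, recording each state that is visited.
fn run(nt_count: usize) -> Vec<State> {
    let mut trace = Vec::new();
    let mut state = if nt_count == 0 { State::Done } else { State::Nt(0) };
    while state != State::Done {
        trace.push(state);
        state = step(state, nt_count);
    }
    trace
}
```

Because control always returns to the loop in `run`, the call depth stays constant no matter how many NTs are chained.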

This also makes no attempt to implement tail calls; that would have to come
after fixing mixed content and really isn't worth the added complexity
now.  I (desperately) need to move on, and still have a bunch of cleanup to
do.

I had hoped for a smaller commit, but that was too difficult to do with all
the types involved.

DEV-7145
2022-08-10 11:46:45 -04:00
Mike Gerwitz 184ff6bdcc tamer: xir::parse: Fixes for {ele,attr}_parse! outside of module
The tests had certain things in scope, but now that I'm trying to use it
outside of those modules, some fixes are needed.

This is admittedly a sloppy commit, with a number of miscellaneous fixes.  I
didn't bother separating it more because most of them are type fixes, and
the `From<Attr>` stuff is going to have to change into, likely,
`TryFrom<Attr>` so that parse failures can occur when attributes do not
match certain patterns.
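
A minimal sketch of what a fallible conversion could look like, assuming a simplified `Attr` with string fields (the real type differs, and `Flag` and `AttrParseError` are hypothetical names used only for illustration):

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for an attribute: a name and a value.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Attr {
    name: String,
    value: String,
}

/// Hypothetical target type that only accepts "true" or "false".
#[derive(Debug, PartialEq, Eq)]
struct Flag(bool);

/// Error that hands back the attribute that failed to match.
#[derive(Debug, PartialEq, Eq)]
struct AttrParseError(Attr);

impl TryFrom<Attr> for Flag {
    type Error = AttrParseError;

    fn try_from(attr: Attr) -> Result<Self, Self::Error> {
        // `str::parse::<bool>` accepts exactly "true" and "false".
        match attr.value.parse::<bool>() {
            Ok(b) => Ok(Flag(b)),
            Err(_) => Err(AttrParseError(attr)),
        }
    }
}
```

With `From<Attr>`, the conversion would have no way to report the failing case; `TryFrom` moves that failure into the return type.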

DEV-7145
2022-07-20 15:40:28 -04:00
Mike Gerwitz 73efc59582 tamer: xir::parse::ele: Initial element parser generator concept
This begins generating parsers that are capable of parsing elements.  I need
to move on, so this abstraction isn't going to go as far as it could, but
let's see where it takes me.

This was the work that required the recent lookahead changes, which have been
detailed in previous commits.

This initial support is basic, but robust.  It supports parsing elements
with attributes and children, but it does not yet support the equivalent of
the Kleene star (`*`).  Such support will likely be added by supporting
parsers that are able to recurse on their own definition in tail position,
which will also require supporting parsers that do not add to the stack.

This generates parsers that, like all the other parsers, use enums to
provide a typed stack.  Stitched parsers produce a nested stack that is
always bounded in size.  Fortunately, expressions---which can nest
deeply---do not need to maintain ancestor context on the stack, and so this
should work fine; we can get away with this because XIRF ensures proper
nesting for us.  Statements that _do_ need to maintain such context are not
nested.
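
To make the "typed stack" idea concrete, here is a small sketch with hypothetical state names (not the generated types). The parent's state holds the child's state inline in one of its variants, so the nesting depth, and therefore the size of the "stack", is fixed by the type:

```rust
/// Hypothetical child parser state.
#[derive(Debug, PartialEq, Eq)]
enum ChildState {
    Expecting,
    Done,
}

/// Hypothetical parent parser state. The `InChild` variant stores the
/// child's state inline, so the value itself acts as a one-frame stack
/// whose size is known at compile time.
#[derive(Debug, PartialEq, Eq)]
enum ParentState {
    BeforeChild,
    InChild(ChildState),
    AfterChild,
}

/// Advance the stitched state machine by one step.
fn advance(state: ParentState) -> ParentState {
    match state {
        ParentState::BeforeChild => ParentState::InChild(ChildState::Expecting),
        ParentState::InChild(ChildState::Expecting) => {
            ParentState::InChild(ChildState::Done)
        }
        ParentState::InChild(ChildState::Done) => ParentState::AfterChild,
        ParentState::AfterChild => ParentState::AfterChild,
    }
}
```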

This also does not yet support emitting an object on closing tag, which
will be necessary for NIR, which will be a streaming IR that is "near" to
the source XML in structure.  This will then be used to lower into AIR for
the ASG, which gives structure needed for further analysis.

More information to come; I just want to get this committed to serve as a
mental synchronization point and clear my head, since I've been sitting on
these changes for so long and have to keep stashing them as I tumble down
rabbit holes covered in yak hair.

DEV-7145
2022-07-13 14:08:47 -04:00
Mike Gerwitz adc45d90df tamer: xir::parse: Attribute parser generator
This is the first parser generator for the parsing framework.  I've been
waiting quite a while to do this because I wanted to be sure that I
understood how I intended to write the attribute parsers manually.  Now that
I'm about to start parsing source XML files, it is necessary to have a
parser generator.

Typically one thinks of a parser generator as a separate program that
generates code for some language, but that is not always the case---that
represents a lack of expressiveness in the language itself (e.g. C).  Here,
I simply use Rust's macro system, which should be a concept familiar to
someone coming from a language like Lisp.
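
As a toy example of a macro acting as a parser generator (this is not the macro from this commit; `attr_struct!` and `PackageAttrs` are hypothetical names), a `macro_rules!` definition can expand a list of attribute names into both a struct and the code that populates it:

```rust
/// Hypothetical macro: generates a struct with one `Option<String>` field
/// per attribute name, plus a `parse` function that fills those fields.
macro_rules! attr_struct {
    ($name:ident { $($field:ident),* }) => {
        #[derive(Debug, Default, PartialEq, Eq)]
        struct $name {
            $($field: Option<String>,)*
        }

        impl $name {
            /// Returns `Err` with the offending name on an unknown attribute.
            fn parse(attrs: &[(&str, &str)]) -> Result<Self, String> {
                let mut out = Self::default();
                for (key, value) in attrs {
                    // One comparison is generated per declared field.
                    $(
                        if *key == stringify!($field) {
                            out.$field = Some(value.to_string());
                            continue;
                        }
                    )*
                    return Err(key.to_string());
                }
                Ok(out)
            }
        }
    };
}

// Expands to `struct PackageAttrs { name, title }` and its `parse` function.
attr_struct!(PackageAttrs { name, title });
```

The generated code is ordinary Rust by the time the compiler optimizes it, so it is treated the same as a hand-written equivalent.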

This also resolves where I stand on parser combinators with respect to this
abstraction: they both accomplish the exact same thing (composition of
smaller parsers), but this abstraction doesn't do so in the typical
functional way.  But the end result is the same.

The parser generated by this abstraction will be optimized and inlined in the
same manner as the hand-written parsers.  Since they'll be tightly coupled
with an element parser (which too will have a parser generator), I expect
that most attribute parsers will simply be inlined; they exist as separate
parsers conceptually, for the same reason that you'd use parser combinators.

It's worth mentioning that this awkward reliance on dead state for a
lookahead token to determine when aggregation is complete rubs me the wrong
way, but resolving it would involve reintroducing the XIR AttrEnd that I had
previously removed.  I'll keep fighting with myself on this, but I want to
get a bit further before I determine if it's worth the tradeoff of
reintroducing it (more complex IR but simplified parsing).

DEV-7145
2022-06-21 13:23:02 -04:00
Mike Gerwitz 14638a612f tamer: {xir::=>}parse: Move parser out of XIR
The parsing framework originally created for XIR is now more general and
useful to other things.  We'll see how this evolves.

This needs additional documentation, but I'd like to see how it changes as
I implement XmloReader and then some of the source readers first.

DEV-10863
2022-03-18 16:24:53 -04:00
Mike Gerwitz 0360226caa tamer: xir::parse: Generalize input token type
This adds a `Token` type to `ParseState`.  Everything uses `xir::Token`
currently, but `XmloReader` will use `xir::flat::Object`.
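
A pared-down sketch of the idea, assuming `Token` is an associated type on the trait (the trait body, `CharCounter`, `WordLen`, and `feed` are hypothetical and far simpler than the real definitions). Two parsers share one driver while consuming different token types:

```rust
/// Hypothetical, simplified parse-state trait that is generic over the
/// type of token it consumes.
trait ParseState: Sized {
    type Token;
    type Object;

    fn parse_token(self, tok: Self::Token) -> (Self, Option<Self::Object>);
}

/// Consumes `char` tokens and emits a running count.
struct CharCounter(usize);

impl ParseState for CharCounter {
    type Token = char;
    type Object = usize;

    fn parse_token(self, _tok: char) -> (Self, Option<usize>) {
        let n = self.0 + 1;
        (CharCounter(n), Some(n))
    }
}

/// Consumes string tokens and emits the length of each non-empty one.
struct WordLen;

impl ParseState for WordLen {
    type Token = &'static str;
    type Object = usize;

    fn parse_token(self, tok: &'static str) -> (Self, Option<usize>) {
        let obj = if tok.is_empty() { None } else { Some(tok.len()) };
        (WordLen, obj)
    }
}

/// A single driver works for any token type.
fn feed<S, I>(mut state: S, toks: I) -> Vec<S::Object>
where
    S: ParseState,
    I: IntoIterator<Item = S::Token>,
{
    let mut out = Vec::new();
    for tok in toks {
        let (next, obj) = state.parse_token(tok);
        state = next;
        out.extend(obj);
    }
    out
}
```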

Now that this has been generalized beyond XIR, the parser ought to be
hoisted up a level.

DEV-10863
2022-03-18 15:26:05 -04:00
Mike Gerwitz aba89f809d tamer: xir::parse: UnexpectedEof Span at final offset
I'm not rendering errors yet in practice, so this wouldn't have been
noticed, but we want error messages to reference the final byte in a file on
EOF, not the offset of the last-encountered token, which would be confusing.
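
The difference can be sketched as follows, under the assumption that the final offset is taken to be the end of the last token's span (the `Span` layout and the `eof_error` helper are hypothetical, not the actual implementation):

```rust
/// Hypothetical span: a byte offset and a length within a source file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    offset: usize,
    len: usize,
}

#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    UnexpectedEof(Span),
}

/// Build the EOF error from the last token seen, pointing at the position
/// where that token ends rather than where it begins.
fn eof_error(last_token: Span) -> ParseError {
    let end = last_token.offset + last_token.len;
    ParseError::UnexpectedEof(Span { offset: end, len: 0 })
}
```

For a last token at offset 10 with length 5, the error would then point at offset 15 instead of offset 10.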

This doesn't _directly_ pertain to what I'm working on; I just happened to
notice it.

DEV-10863
2022-03-17 21:33:05 -04:00
Mike Gerwitz 7b6d68af85 tamer: xir::parse::Transition: Generalize flat::Transition
XIRF introduced the concept of `Transition` to help document code and
provide mental synchronization points that make it easier to reason about
the system.  I decided to hoist this into XIR's parser itself, and have
`parse_token` accept an owned state and require a new state to be returned,
utilizing `Transition`.
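
A minimal sketch of the shape this takes, with hypothetical names throughout (`Pair` and the method names on this toy `Transition` are for illustration and do not mirror the real API). The parser consumes its state by value and must hand back the next one:

```rust
/// Hypothetical wrapper marking a value as the state being transitioned to.
#[derive(Debug, PartialEq, Eq)]
struct Transition<S>(S);

impl<S> Transition<S> {
    /// Pair the new state with an emitted object.
    fn ok<T>(self, obj: T) -> (Self, Option<T>) {
        (self, Some(obj))
    }

    /// Pair the new state with no object (more input is needed).
    fn incomplete<T>(self) -> (Self, Option<T>) {
        (self, None)
    }
}

/// Toy parser state that groups tokens into pairs.
#[derive(Debug, PartialEq, Eq)]
enum Pair {
    Empty,
    First(u32),
}

/// Takes ownership of `state`; the only way to continue parsing is to use
/// the state carried by the returned `Transition`.
fn parse_token(state: Pair, tok: u32) -> (Transition<Pair>, Option<(u32, u32)>) {
    match state {
        Pair::Empty => Transition(Pair::First(tok)).incomplete(),
        Pair::First(a) => Transition(Pair::Empty).ok((a, tok)),
    }
}
```

Each match arm reads as "transition to this state, producing that result", which is the documentation benefit described above.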

Together with the convenience methods introduced on `Transition` itself,
this produces much clearer code, as is evidenced by tree::Stack (XIRT's
parser).  Passing an owned state is something that I had wanted to do
originally, but I thought it'd lead to more concise code to use a mutable
reference.  Unfortunately, that concision led to code that was much more
difficult than necessary to understand, and ended up having a net negative
effect by leading to some more boilerplate for the nested types (granted,
that could have been alleviated in other ways).

This also opens up the possibility to do something that I wasn't able to
before, which was continue to abstract away parser composition by stitching
their state machines together.  I don't know if this'll be done immediately,
but because the actual parsing operations are now able to compose
functionally without mutability getting in the way, the previous state coupling
issues with the parent parser go away.

DEV-10863
2022-03-17 16:02:05 -04:00
Mike Gerwitz 5af698d15c tamer: xir::{tree::=>}parse: Move module
It's a bit odd that I've done next to nothing with TAMER for the past week
or so, and decided to do this one small thing before I go on break for the
holidays, but I felt compelled to do _something_.  Besides, this gets me in
a better spot for the inevitable mental planning and writing I'll be doing
over the holidays.

This move was natural, given what this has evolved into---it has nothing to
do with the concept of a "tree", and the module's imports emphasized that
fact given the level of inappropriate nesting.
2021-12-23 13:17:18 -05:00