thoughts/src/post2meta

#!/usr/bin/gawk -f
# Cache post data in metadata recutils file
#
#  Copyright (C) 2019 Mike Gerwitz
#
#  This program is free software: you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation, either version 3 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
# Generates database of metadata for a given post in recutils format for use
# by other scripts.  The post must have already been converted to HTML using
# `post2html' or some equivalent means.
#
# This script is also responsible for determining what constitutes the
# abstract, which we consider to be everything after the subject line but
# before the end-of-abstract marker "<!-- more -->".  If no such marker
# exists then the script exits in error.
##

# Output author and post date derived from the file name.
BEGINFILE {
    match( FILENAME, /[^/]+$/, name )

    # TODO: configurable
    print "author: Mike Gerwitz <mtg@gnu.org>"

    printf "date: %s\n",
        gensub( /^(.{10}).*$/, "\\1", "", name[0] )
}

# Wait until after <main>; everything before it is the HTML header.
/^ *<main>/ { main=1 }
!main { next }


# The first header represents the subject/title and also contains the
# unique id for this post (as generated by `post2html').
main && /^<h1 / {
    # Strip header tags from subject.
    print "subject: " gensub( /<\/?h[^>]+>/, "", "g" )

    # Derive the slug from the post filename rather than from the header id
    # (YYYY-MM-DD-name.ext becomes YYYY/MM/name).
    printf "slug: %s\n", \
        gensub( /^([0-9]+)-([0-9]+)-[0-9]+-(.*)\.[a-z]+$/,
                "\\1/\\2/\\3",
                "",
                name[0] )

    # Skip the date line immediately following the header and grab the first
    # line of the abstract.
    getline
    getline

    printf "abstract: %s\n", $0
    a = 1
    next
}

# The end-of-abstract marker is "<!-- more -->".  Until we reach that point,
# output each line of the abstract prefixed by a `+', which is the recutils
# line continuation marker.
/^<!-- more -->/ { exit }
a { printf "+ %s\n", $0 }

# If we get to this point, that means that there is no end-of-abstract
# marker, which we will consider to be an error just to make sure that the
# author didn't forget to add one.  If the entire post is to be considered
# part of the abstract, then the marker can be added at the end of the post.
ENDFILE {
    print "error: missing '<!-- more -->'" > "/dev/stderr"
    exit 1
}
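
The abstract handling above can be tried on its own with a small input. What follows is a minimal sketch, not the script itself: it uses plain POSIX awk, so the gawk-only BEGINFILE/ENDFILE blocks are replaced by a `found` flag checked in END, and the author, date, and slug fields are left out. The function name `extract_abstract` and the sample HTML (including the date line markup) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of the abstract extraction in plain POSIX awk.
# Assumption: a "found" flag checked in END stands in for the
# ENDFILE error path of the original gawk script.
extract_abstract() {
    awk '
        # Ignore everything before <main>.
        /^ *<main>/ { main = 1 }
        !main { next }

        /^<h1 / {
            # Strip the header tags to get the subject.
            subject = $0
            gsub( /<\/?h[^>]+>/, "", subject )
            print "subject: " subject

            getline    # skip the date line
            getline    # first line of the abstract
            printf "abstract: %s\n", $0
            a = 1
            next
        }

        # Stop at the end-of-abstract marker.
        /^<!-- more -->/ { found = 1; exit }

        # Continuation lines carry the recutils "+ " prefix.
        a { printf "+ %s\n", $0 }

        END {
            if ( !found ) {
                print "error: missing <!-- more -->" > "/dev/stderr"
                exit 1
            }
        }
    '
}

# Hypothetical post2html-style input.
extract_abstract <<'EOF'
<html>
<main>
<h1 id="hello-world">Hello, World</h1>
<p class="date">2019-01-08</p>
<p>First line of the abstract,
continued on a second line.</p>
<!-- more -->
<p>Body text outside the abstract.</p>
</main>
EOF
```

For this input the sketch prints a `subject:` field, an `abstract:` field holding the first abstract line, and one `+ ` continuation line. Nothing after the marker is read. If the marker is removed, it prints the error on stderr and exits with status 1.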

Majority of work on generation of new static site

I didn't originally intend for all of this to be in a single commit. But here we are. I don't have the time to split these up more cleanly; this project is taking more time than I originally hoped that it would.

This is a new static site generator. More information to follow in the near future (hopefully in the form of an article), but repo2html is now removed. See code comments for additional information; I tried to make it suitable as a learning resource for others. It is essentially a set of shell scripts with a fairly robust build for incremental generation.

The site has changed drastically, reflecting that its purpose has changed over the years: it is now intended for publishing quality works (or at least I hope), not just a braindump.

This retains most of the text of the original pages verbatim, with the exception of the About page. Other pages may have their text modified in commits that follow.

Enhancements to follow in future commits.

2019-01-08 00:11:20 -05:00

Generate slug from post filenames

Rather than having Pandoc generate the id, which has the potential to change over time and cause 404s, let's just generate the slug from the filename so that the ids will never change. This also solves the awkward question of what the filename should be, since it was previously something arbitrary.

This mass rename was accomplished via this simple shell script:

    for p in *.meta; do
      slug=$( recsel -P slug "$p" | xargs basename )
      mv -v "${p/.meta/.md}" "${p:0:10}-$slug.md"
    done

with minor manual tweaks where I saw fit.

Of course, now I have some pretty long filenames, which is undesirable. The next step is to compare it with the slugs currently on mikegerwitz.com and make them match. That's the next commit, and should be pretty simple.

2019-01-08 01:07:25 -05:00
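
The filename-to-slug mapping that post2meta performs can be looked at in isolation. This is a minimal sketch rather than the script's own code: it uses POSIX awk string functions in place of gawk's `gensub`, and it assumes a fixed-width `YYYY-MM-DD-` prefix where the original regular expression accepts any run of digits. The function name `derive_meta` and the sample path are made up for illustration.

```shell
#!/bin/sh
# Sketch: derive the "date" and "slug" fields from a post path.
# Assumption: the filename starts with a fixed-width YYYY-MM-DD- prefix.
derive_meta() {
    awk -v path="$1" 'BEGIN {
        # Keep only the final path component.
        name = path
        sub( /^.*\//, "", name )

        # The first ten characters are the date.
        print "date: " substr( name, 1, 10 )

        # Split the name into year, month, and the part after the day.
        year  = substr( name, 1, 4 )
        month = substr( name, 6, 2 )
        rest  = substr( name, 12 )

        # Drop the file extension from the remainder.
        sub( /\.[a-z]+$/, "", rest )

        print "slug: " year "/" month "/" rest
    }'
}

# Hypothetical post path.
derive_meta "post/2019-01-08-hello-world.html"
```

The day of the month is dropped from the slug, matching the `\\1/\\2/\\3` replacement in the script, so two posts from the same month need distinct names after the date prefix.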