repo2html/msgfmt

#!/bin/bash
#
# Formats a Git commit message
#
#  Copyright (C) 2012  Mike Gerwitz
#
#  This file is part of repo2html.
#
#  This program is free software: you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation, either version 3 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program.  If not, see <http://www.gnu.org/licenses/>.
# #

# optional id (for cref errors)
id="$1"

# HTML replacements (default)
lquo='\&ldquo;'
rquo='\&rdquo;'
mdash='\&mdash;'
opar='<p>'
epar='</p>'

# redefines replacements to yield plain text (instead of HTML entities)
nohtml()
{
  lquo=\"
  rquo=\"
  mdash=---
}

# no paragraph tags should be output
nopar()
{
  opar=
  epar=
}


while getopts nP opt; do
  case "$opt" in
    n) nohtml;;
    P) nopar;;
  esac
done

# calculate this after options have been parsed
refopar="${opar:+${opar%>} id="ref-\\1" class="ref">}"

# format the commit message, stopping at the diff (if any)
awk -vid="$id" -vurl_root="${url_root%/}" -vcref_errlog="$cref_errlog" '
    # replace commit refs with generated URL (allows linking to prior commits
    # without hard-coding the configurable links that could change or be
    # relative to where the content is hosted); this will then be processed as a
    # normal URL by the remainder of the script
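    #
    # awk regexes have no non-greedy quantifier (".*?" still matches greedily),
    # so a ref is matched as a run of non-"]" characters and every cref on the
    # line is replaced in turn
    {
      while ( match($0, /\[cref:([^\]]*)\]/, g) ) {
        # retrieve the URL from the hashcache; result is reset first so that a
        # failed lookup is not mistaken for the previous one, and the pipe is
        # closed so that a repeated ref runs the command again
        c = "./hashcache " g[1]
        result = ""
        c | getline result
        close(c)

        # if a cref error logfile path was provided, log unknown refs so that
        # they can be re-processed (if commits are processed in reverse order
        # and the hashcache is cleared before the run, then this is likely to
        # occur for every cref)
        if ( result == "" && cref_errlog && id ) {
          print id >>cref_errlog
        }

        # splice the URL in place of this cref (which will be reflected once
        # we print the line)
        $0 = substr($0, 1, RSTART - 1) url_root "/" result substr($0, RSTART + RLENGTH)
      }
    }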

    # stop printing at diff
    /^diff --git/ { exit }

    # otherwise, print everything
    { print }
  ' \
  | sed ':a;N;$!ba;
    # handle <>-delimited links (strip delimiters)
    s#<\(\(ftp\|https\?\)://[^ ]\+\)>#\1#g;

    # escaping
    s/\&/\&amp;/g;
    s/</\&lt;/g;
    s/>/\&gt;/g;

    # quoting (initiated by an indented paragraph and terminated by a new
    # paragraph, unless that paragraph is also indented)
    s#\n\n  \+\(\([^\n]\+\n\(\n  \+\)\?\)\+\)#<blockquote>\1</blockquote>#g

    # pre-formatted block, markdown-style
    s#\n\n  \+\(\([^\n]\+\n\(\n  \+\)\?\)\+\)#<blockquote>\1</blockquote>#g

    # sed has no non-greedy matching, which makes it difficult to exclude
    # punctuation at the end of a link, so we handle it in a separate expression
    s#\(ftp\|https\?\)://[^]\n )]\+#<a href="&">&</a>#g;
    s#<a href="\([^"]\+\)\([.;,!]\)">\([^<]\+\).</a>#<a href="\1">\3</a>\2#g;

    # reference definitions (footnotes)
    s#\n\[\([0-9]\+\)\]#'"$epar$refopar"'&#g;

    # references in text (note that references that enclose text as a hyperlink
    # must not start with a number, otherwise they will be considered to be a
    # reference number)
    s|\[\([^0-9][^]]\+\)\]\[\([0-9]\+\)\]|<a href="#ref-\2">\1</a>\[\2\]|g
    s|\[\([0-9]\+\)\]|<sup><a href="#ref-\1">&</a></sup>|g

    # paragraphs
    s#\n\n#'"$epar"'&'"$opar"'#g;
    /^/i'"$opar"'
    /$/a'"$epar"'

    # basic formatting
    s/---/'"$mdash"'/g;
    s#``#'"$lquo"'#g;
    s#'\'\''#'"$rquo"'#g;
    s#\(\W\)\*\*\([^\*]\+\)\*\*\(\W\)#\1<strong>\2</strong>\3#g;
    s#\(\W\)\*\([^\*]\+\)\*\(\W\)#\1<em>\2</em>\3#g;
  '
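The sed stage opens with `:a;N;$!ba`, a common GNU sed idiom: `:a` sets a label, `N` appends the next input line to the pattern space, and `$!ba` branches back to the label until the last line has been read. The whole message is therefore edited as one string, which is what lets the paragraph, blockquote and footnote expressions match `\n\n` across lines. A minimal sketch of the idiom in isolation; the `<P>` marker is only a stand-in for the script's `$epar`/`$opar` values:

```shell
#!/bin/bash
# :a      define a label
# N       append the next input line (and its "\n") to the pattern space
# $!ba    unless this is the last line, branch back to :a
# Once the loop ends the pattern space holds the whole input, so "\n\n"
# (a blank line between paragraphs) can be matched like any other text.
out=$(printf 'one\ntwo\n\nthree\n' | sed ':a;N;$!ba;s/\n\n/<P>/g')
printf '%s\n' "$out"
# one
# two<P>three
```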
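Trailing punctuation is stripped from links in two passes because sed cannot match non-greedily: the first pass turns the whole URL, punctuation included, into an anchor, and the second moves a final `.`, `;`, `,` or `!` out of both the `href` and the link text. The sketch below wraps the two expressions in a hypothetical `linkify` function (not part of repo2html) and assumes GNU sed, since `\?`, `\+` and `\|` are GNU extensions in basic regular expressions:

```shell
#!/bin/bash
# linkify is an illustrative helper, not part of repo2html.
linkify() {
  # pass 1: wrap each ftp/http/https URL (up to "]", newline, space or ")")
  # pass 2: if the href ends in . ; , or !, move that character after </a>
  sed 's#\(ftp\|https\?\)://[^]\n )]\+#<a href="&">&</a>#g;
       s#<a href="\([^"]\+\)\([.;,!]\)">\([^<]\+\).</a>#<a href="\1">\3</a>\2#g'
}

out=$(echo 'See http://example.com/x.' | linkify)
printf '%s\n' "$out"
# See <a href="http://example.com/x">http://example.com/x</a>.
```

Because `[^"]\+` and `[^<]\+` are greedy, they first swallow the trailing `.` and then give it back so that the `[.;,!]` group and the literal `.` before `</a>` can match.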
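Ampersands are escaped before `<` and `>` because those later escapes introduce new `&` characters; escaping `&` last would turn every `&lt;` into `&amp;lt;`. A small sketch comparing the two orders on the same input (the variable names are illustrative only):

```shell
#!/bin/bash
s='a < b & c'

# & first (as msgfmt does): each character is escaped exactly once
right=$(printf '%s\n' "$s" | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g')

# & last: the & introduced by &lt; is escaped a second time
wrong=$(printf '%s\n' "$s" | sed 's/</\&lt;/g; s/>/\&gt;/g; s/&/\&amp;/g')

printf '%s\n' "$right" "$wrong"
# a &lt; b &amp; c
# a &amp;lt; b &amp; c
```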
Change history (from the blame view, oldest first):

2012-10-07 08:23:31 -04:00  Initial repo2html concept
  Certain things are thrown in place just to demonstrate the concept. Many
  things are also hard-coded that should be configurable (such as the title,
  copyright and RSS item count). Licensed under the GPLv3 instead of the AGPLv3
  because the license does not apply to generated content unless that content
  uses a portion of the program.
  Affects: header and license; sed pipeline; < and > escaping; URL linking;
  paragraph and basic formatting sections.

2012-10-07 11:22:49 -04:00  Corrected copyright notice
  ..this is what happens when you copy notices from other projects blindly
  (at least it was my own)
  Affects: "This file is part of repo2html."

2012-10-09 23:31:40 -04:00  References in text are now converted into hyperlinks
  References are bracketed---they are converted into superscripts. If text
  immediately preceding the reference is bracketed, it is also hyperlinked.
  Affects: reference definitions; superscript references.

2012-10-13 00:42:29 -04:00  RSS feed no longer outputs HTML entities in the title
  Affects: HTML replacement defaults; nohtml(); -n option; mdash and quote
  substitutions.

2012-10-16 22:49:46 -04:00  Added blockquote support to msgfmt
  Affects: quoting comment.

2012-10-27 23:09:03 -04:00  Moved ampersand escaping in msgfmt to occur before
                            other replacements
  Oops
  Affects: & escaping.

2012-10-27 23:18:45 -04:00  msgfmt will now properly handle adjacent blockquote
                            paragraphs
  Affects: quoting expression.

2012-10-28 00:03:31 -04:00  Hyperlink-related enhancements to msgfmt
  - Removing <> delimiters from links
  - Stripping punctuation from the end of links
  Yes, Perl would be easier, but I'd prefer to avoid the dependency if at all
  possible. If this gets too much more complicated, Perl may be a necessity to
  prevent a maintenance nightmare.
  Affects: <>-delimited links; trailing-punctuation expression.

2013-03-09 09:50:32 -05:00  Added cref tag support to msgfmt, permitting ref of
                            previous commits
  Affects: cref awk rule; diff cut-off.

2013-03-10 21:51:08 -04:00  Added cref-errlog to post-process cref errors rather
                            than priming the hashcache
  This is more performant, contains additional logging and will properly
  output invalid crefs.
  Affects: id="$1"; cref error logging.

2013-03-12 22:47:01 -04:00  Corrected cref_errlog output from msgfmt (was
                            overwriting)
  Affects: append (>>) to cref_errlog.

2013-03-15 16:26:05 -04:00  Enclosed text in reference hyperlinks must not begin
                            with a non-numeric character
  This is not the best of solutions, but will at least help to eliminate the
  problem of multiple adjacent references.
  Affects: hyperlinked references in text.

2013-05-27 16:50:34 -04:00  Added -P option to msgfmt and moved addition of
                            beginning and ending p tags from commit2html to
                            msgfmt
  This not only makes more sense, but allows raw (-i) mode to work properly
  Affects: opar/epar; nopar(); -P option; refopar; reference and paragraph
  tags.

2013-05-28 23:38:11 -04:00  Corrected msgfmt em formatting with asterisks
  Affects: <em> expression.

2013-06-04 22:29:34 -04:00  Added support for strong emphasis with double-star
                            notation (msgfmt)
  Affects: <strong> expression.

2013-06-04 22:42:47 -04:00  Ensuring url_root will always contain forward slash
                            when concatenated with cref
  Trailing slash, if any, is stripped and explicitly added.
  Affects: awk invocation (url_root); cref URL concatenation.

2014-11-30 16:31:57 -05:00  Pre-formatted block (blockquote) in msgfmt
  This has existed for a while; must have forgotten to commit
  Affects: pre-formatted block expression.