tame/bin
Mike Gerwitz fb5947d59e Prevent hanging tame client when tamed runner is killed mid-process
It's a tad embarrassing that this has been eluding me for quite some
time.  I happened to run into it while testing the previous commit, which in
turn only existed because I was trying to optimize runner performance.

We'd have situations where, following a runner reload (exit code 129 =
SIGHUP), the build would simply hang indefinitely.  Apparently, `tame`, in
`command-runner`, blocks on a `read` without a timeout, expecting that the
FIFO attached to stdin will close if the runner crashes.  Maybe that used to
be the case, but that is no longer true today.

Because of that, the FIFO stays open, and `read` continues to block, waiting
for `DONE`.
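
As a rough illustration of the client side, the loop below reads lines from
stdin until it sees `DONE`.  The function name `wait_for_done` is
hypothetical and this is not the actual `command-runner` code; it only shows
why a missing `DONE` on a stream that never closes means the loop never
returns.

```shell
#!/bin/bash
# Hypothetical sketch of a client waiting on a runner's output stream.
# Returns 0 once a line beginning with DONE is read, 1 on EOF without DONE.
# If the stream neither closes nor carries DONE, `read` blocks forever.
wait_for_done() {
  local line
  while IFS= read -r line; do
    case "$line" in
      DONE*) return 0 ;;            # runner signalled completion
    esac
    printf '%s\n' "$line"           # pass ordinary runner output through
  done
  return 1                          # stream closed before DONE arrived
}
```

For example, `printf 'building\nDONE 0\n' | wait_for_done` passes `building`
through and returns successfully.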

Now, `tamed`, when seeing that a runner has crashed (which could have been
due to a reload), will check to see if that runner is marked busy.  If so,
that means that the client `tame` did not see `DONE`, because it did not
clear the flag via `command-runner`'s `mark-available`.  To notify the
client that something went wrong, `tamed` will inject a `DONE` into the output
FIFO, which will allow the client to fail properly.
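
The check described above might look something like the following.  This is
only a sketch under assumptions: the per-runner directory, the `busy` marker
file, the `out` path, and the `DONE <code>` line format are all hypothetical
stand-ins, not `tamed`'s actual layout or protocol.

```shell
#!/bin/bash
# Hypothetical layout: each runner has a directory holding a `busy` marker
# (present while a client is waiting) and an output FIFO named `out`.
# Called by the supervisor after it observes that a runner has exited.
notify_client_on_crash() {
  local runner_dir="$1" exit_code="$2"

  # Not busy: the client already saw DONE and cleared the flag itself.
  [ -e "$runner_dir/busy" ] || return 0

  # Busy: the client is still blocked waiting for DONE, so emit one on the
  # dead runner's behalf, carrying the failure status.
  echo "DONE $exit_code" >> "$runner_dir/out"
}
```

The marker is left in place here on the assumption that clearing it remains
the client's job.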

`dslc` catches exceptions and should output `DONE` under normal operating
conditions.  However, since some of our systems require so much memory to
build, we may encounter the OOM killer.  In that case, the process has no
time to recover (it is killed with SIGKILL), and therefore cannot output
`DONE`.  I suspect this is what has been happening to cause occasional build
hangs.
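
To see why a SIGKILL'd process cannot report `DONE`, consider this small
demonstration.  It is not part of `tame`; the function name is hypothetical,
and the child shell merely stands in for a runner that would normally print
`DONE` from an exit handler.

```shell
#!/bin/bash
# The child installs an EXIT trap that would print DONE, then kills itself
# with SIGKILL.  SIGKILL cannot be caught, so the trap never runs.
run_killed_runner() {
  bash -c 'trap "echo DONE" EXIT; kill -KILL $$'
}

status=0
output=$(run_killed_runner 2>/dev/null) || status=$?

# $output is empty and $status is 137 (128 + 9, where 9 is SIGKILL).
echo "output='$output' status=$status"
```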

One final thing to clean this up: since we're properly handling reloads now,
based on this commit and the immediately preceding one, we can suppress the
warning when the code is 129 (see comments).
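
A minimal sketch of that suppression, assuming a hypothetical
`report_runner_exit` helper and warning text (neither is taken from `tamed`):

```shell
#!/bin/bash
# Shells report death-by-signal as 128 + signal number, so 129 is SIGHUP (1),
# which here means the runner was reloaded rather than crashed.
report_runner_exit() {
  local code="$1"
  case "$code" in
    129) ;;  # reload: already handled, so stay quiet
    *)   echo "warning: runner exited with code $code" >&2 ;;
  esac
}
```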

DEV-10806
2023-10-03 14:14:47 -04:00
..
dslc.in dscl: Replace process with `java` 2023-10-03 14:14:43 -04:00
tame Prevent hanging tame client when tamed runner is killed mid-process 2023-10-03 14:14:47 -04:00
tamed Prevent hanging tame client when tamed runner is killed mid-process 2023-10-03 14:14:47 -04:00