cs4m/slides.org


Computational Symbiosis: Methods That Meld Mind and Machine

Slides [0/6]

Introduction   B_note

Hello, everyone!

My name is Mike Gerwitz. I am a free software hacker and activist with a focus on user privacy and security. I'm also a GNU Maintainer and software evaluator, and hold various other administrative duties within GNU. I have about twenty years of programming experience, half of that professionally, and I've been a computer user for longer.

So I've been around long enough to see a decent evolution in how we interact with computers. I've gotten a sense of what feels right and wrong as both a user and a hacker. And interestingly, what I've settled on for my computing is really a toolset that was devised decades before I was even born, with some light modernization.

And those tools don't work for everyone. But I think a subset of them can.

So I'm here today to try to explore a healthy balance, and walk you through what I see as an efficient means of computing, based on the problems that I've encountered, and the problems I've seen others encounter over the years.

Choreographed Workflows   B_fullframe

Choreographed Workflows

Practical Freedom   B_fullframe

Practical Freedom

Practical Example: Web Browser [0/9]

Example: Web Browser   B_frame

/mikegerwitz/cs4m/src/commit/596fff5295b86d69a4fbbb630f8eedbea799b4a0/images/web-browser.png

Notes   B_noteNH

One of the only GUIs I use on a day-to-day basis is my web browser. In this case, GNU Icecat, which is a Firefox derivative. This is a screenshot of an admittedly staged session, and contains a number of addons. Perhaps most prominent is Tree-Style Tab, which displays tabs in a hierarchy off to the left rather than flatly at the top.

I chose a graphical web browser as an example because I feel that it's something that most everyone can relate to using, and most everyone can recognize the utility in—most people using Internet-connected devices use one at least a few times a week, if not every day.

Finding Text (Mouse-Driven GUI Interaction)   B_frame

Images   B_columns
Left   B_column

/mikegerwitz/cs4m/src/commit/596fff5295b86d69a4fbbb630f8eedbea799b4a0/images/ff-find-menu.png

Right   B_column

/mikegerwitz/cs4m/src/commit/596fff5295b86d69a4fbbb630f8eedbea799b4a0/images/ff-find-matches.png

Notes   B_noteNH

The Web is used for many different things today, but its original purpose was to render documents. Take Wikipedia, for instance. Or the LibrePlanet conference website.

If you are looking for something specific on a page, a common operation is to search for a word or phrase, as shown here.

Now, how exactly to do this with a mouse varies depending on what program you're using, but here I highlighted the steps in a modern Icecat or Firefox. You start by clicking on the little hamburger, hotdog, or whatever-you-want-to-call-it menu in the upper-right, and then click on "Find in This Page" within the popup. This then opens a bar at the bottom of the page with an area to type the word or phrase you're searching for. It highlights and scrolls to the first match as you type, and has a button to highlight all results. It also shows the number of results off to the right. It's a simple, yet powerful mechanism that is pretty easy to use.

So does a GUI provide the right tool for the job? If you're just searching for a name or concept, sure, that seems to be true. A GUI is useful here.

But notice how I had to convey these steps to you. I had to take screenshots and highlight where to click with the mouse. Since a GUI is inherently very visual, so are the instructions on how to use it. There is no canonical representation for these instructions, because it involves clicking on elements that have no clear name to the user. Unless you're in software or UX development, you may not know what to call that menu in the upper-right. Further, what do you call the bar at the bottom of the page? You have to describe it in a way that reproduces what the user sees.

GUIs Change Over Time   B_frame

Images   B_columns
Left   B_column

/mikegerwitz/cs4m/src/commit/596fff5295b86d69a4fbbb630f8eedbea799b4a0/images/ff-find-menu.png

Right   B_column

/mikegerwitz/cs4m/src/commit/596fff5295b86d69a4fbbb630f8eedbea799b4a0/images/ff-edit-find.png

Ctrl+F

Notes   B_noteNH

Another difficulty is that GUIs change over time. I'm sure many people here remember earlier versions of Firefox that didn't have the hamburger menu, where the Find option was in the Edit menu. By the way, those old menus do still exist if you hit Alt. I miss the old menus, personally, because they made it easier to convey actions in text. Saying ``Go to Edit - Find'' is pretty clear, and those menu positions were always in the same place across the entire desktop environment. Now individual programs may vary in their user experience.

But do you notice something in common between these two screenshots? There's something that hasn't changed over time—something that has been the same for decades! Ctrl+F.

Ctrl+F—Just Works   B_frame

  • Most GUI programs that offer search
  • Context-sensitive—Do The Right Thing
Notes   B_noteNH

When you type Ctrl+F, it immediately opens that search bar and gives focus to the textbox, so you can just start typing. Further, it works in all major browsers. Not only that, but Ctrl+F is so universal that it works in nearly every GUI program that offers some type of search! And it's context-sensitive! The program will just Do The Right Thing depending on where you are or what it is that you're doing.

Muscle Memory   B_fullframe

Muscle Memory

Visual ⇒ Tactile

Notes   B_noteNH

But there's something more profound that has happened here, that many users don't even think about. We have switched our mode of interaction.

With a mouse and a GUI, interaction is driven by visual indicators. The position of your hand on the mousepad or your fingers on a touchpad isn't meaningful, because your mouse cursor could be anywhere on the screen at any given time; your eyes provide the context. It's hard to use a GUI with your eyes closed while using a mouse.

But by hitting Ctrl+F, we've completely changed how we interact with the system. It's now tactile. You associate a finger placement; a motion; and the feeling of the keys being pressed beneath your fingers; with an action—finding something. You develop muscle memory. You can trigger this feature with your eyes closed.

<Repeatedly make motion with hand and fingers like a madman during the above paragraph.>

But that's a pretty trivial example.

A Research Task   B_fullframe

Research Task:

Given a list of webpage URLs

find all that do not contain ``free software''

Notes   B_noteNH

Let's explore a fairly simple research task together. Let's say I email you a handful of URLs—say, maybe 5 or 10 of them—that are articles about software or technology. And I want you to come up with a list of the webpages that do not contain the phrase ``free software'', so that I can get a better idea of which ones to focus my activism on.

How might we approach this problem as an average user?

Executing the Research Task   B_frame

Approaches   B_columns
Mouse   B_column

Mouse

  1. Click `+' for each new tab, enter URL
  2. Menu → Find in This Page
  3. Type ``free software''
  4. If found, go to #9
  5. If not found, highlight URL, right-click, copy
  6. Click on text editor
  7. Right-click, paste URL, hit RET for newline
  8. Click on web browser
  9. Click `X' on tab, go to #2
Notes   B_noteNH

(Perhaps I should demonstrate this right away rather than reading through the list first, to save time?)

Let's first use the mouse as many users probably would. To set up, let's open each URL in a new tab. We click on the little `+' icon for a new tab and then enter the URL, once for each webpage, perhaps copying the URL from the email message. Once we're all set up, we don't care about the email anymore, but we need a place to store our results, so we open a text editor to paste URLs into.

Now, for each tab, we click on the little hamburger menu, click on ``Find in This Page'', and then type ``free software''. If we do not see a result, we move our mouse to the location bar, click on it to highlight the URL, right-click on it to copy it to our clipboard, click on the text editor to give it focus, right-click on the editor and click on ``Paste'' to insert the URL, and then hit the return key to move to the next line. We then go back to the web browser. If we do see a result, we skip copying over the URL. In either case, we then close the tab by clicking on the `X'.

And then we repeat this for each tab, until they have all been closed. When we're done, whatever is in our text editor is the list of URLs of webpages that do not reference ``free software'', and we're done.

Simple enough, right? But it's a bit of a pain in the ass. All this clicking around doesn't really feel like we're melding mind and machine, does it? I think you'd be pretty disappointed if this is the crap that you attended this session for.

What if we used our Ctrl+F trick? That saves us a couple clicks. But what if we could save even more clicks?

Keyboard   B_column

Keyboard

  1. Ctrl+T for each new tab, enter URL
  2. Ctrl+F to find
  3. Type ``free software''
  4. If found, go to #9
  5. If not found, Ctrl+L Ctrl+C to copy URL
  6. Alt+Tab to text editor
  7. Ctrl+V RET to paste URL and add newline
  8. Alt+Tab to web browser
  9. Ctrl+W to close tab, go to #2
Notes   B_noteNH

Fortunately we have many more keybindings at our disposal!

We'll start with opening each new tab with Ctrl+T instead of clicking on `+' with the mouse. (Maybe show copying the URL from the email without the mouse?)

To open our text editor, we'll use Alt+F2, which is a common keybinding in many window managers and desktop environments to open a dialog for entering a program to run.

Once we're all set up, we start with the first tab and use Ctrl+F as we've seen before, and then type ``free software''. If we do not find a match, we're ready to copy the URL. Hitting Ctrl+L will take us to the location bar and highlight the URL. We can then hit Ctrl+C to copy the URL to the clipboard. Alt+Tab is supported by a wide variety of window managers on a variety of operating systems to switch between windows of running programs, usually in the order of most recently focused. So hitting it once should take us to our text editor. We then paste with Ctrl+V and hit return to insert a newline. We can then go back to the web browser by hitting Alt+Tab again. Once again, if there was a match, we skip that copy step. We then close the tab with Ctrl+W.

Repeat, and we're done all the same as before. As a bonus, save with Ctrl+S.

What's interesting about this approach is that we didn't have to use the mouse at all, unless maybe you used it to highlight the URL in the email. You could get into quite a rhythm with this approach, and your hands never have to leave the keyboard. This is a faster, more efficient way to convey our thoughts to the machine, right? We don't have to seek out our actions each time in the GUI—the operations are always at our fingertips, literally.

GUIs of a Feather   B_fullframe

Same Keybindings Across (Most) GUIs!

Browser, Editor, Window Manager, OS, \ldots

Notes   B_noteNH

Another powerful benefit of this approach is—these same exact keybindings work across most GUIs! If we switch out Icecat here with nearly any other major web browser, and switch out gedit with many other text editors or even word processors, this will work all the same! There are some notable text editors for which these keybindings won't work, for those of you screaming in your head. We'll get to that.

If you use Windows instead of GNU/Linux—which I strongly discourage, but if you do—then it'll work the same.

I can sit down at a completely different system and let that muscle memory take over and wind up with the same thing. It's liberating.

Let's look at those keybindings a bit more concisely, since that last slide was a mess, to put it nicely.

Macro-Like Keyboard Instructions   B_fullframe

Macro-Like

Ctrl+T ``https://...'' <N times>

Ctrl+F ``free software''
[ Ctrl+L Ctrl+C Alt+Tab Ctrl+V RET Alt+Tab ]
Ctrl+W
<N times>
  • <2> Requires visual inspection for conditional
  • <2> Still manual and tedious—what if there were 1000 URLs?
Notes   B_noteNH

If we type out the keybindings like this, in an isolated format, it looks a bit more like instructions for the machine, doesn't it? Some of you may be familiar with macros—with the ability to record keypresses and play them back later. If we were able to do that, then we could fully automate this task away!

But unfortunately…we can't. At least, not with the tools we're using right now. Why is that?

Well, for one, it requires visual inspection to determine whether or not a match has occurred. That drives conditional logic—that bracketed part there. We also need to know how many times to repeat, which requires that we either count or watch the progress. We also need to be able to inspect the email for URLs and copy them into the web browser.

This also scales really poorly. While using the keyboard is certainly faster than using the mouse, we're only dealing with a small set of URLs here. What if I gave you 100 of them? 1000? More? Suddenly this doesn't feel like a very efficient way to convey our intent to the machine. I don't wish that suffering upon anyone.

To get around that, we need to change how we think about our computing a bit. And that's why I've dragged you through this drawn-out example—to make sure you understand the significance of these progressive enhancements to our workflow.

A New Perspective [0/16]

Secrets?   B_fullframe

Notes   B_noteNH

So what if I told you that, while the average user is cursing me out for sending them 1000 URLs, I could go grab some coffee and play with my kids and come back however much time later to a list that has been generated for me, and it'd still be done before the user has even had a chance to open all of the URLs, let alone check them? And what if I told you that it'd only take a minute or two for me to create this process?

This is where the whole concept of ``wizardry'' comes in. Some of you are sitting in the audience or watching this remotely rolling your eyes thinking ``oh this guy thinks he's so sweet'', because the answer is obvious to you. But to those of you who are confined to the toolset that I just demonstrated…it's not going to be obvious. You may still be thinking in terms of that toolset—thinking of how we can continue to use those same tools. You don't know what you don't know.

The problem is that there is a whole world and way of computing that is hidden from most users. And it's not hidden because it's a secret. I'd hate to disappoint you, but there's no magic or secrets in this talk. It's because modern interfaces have come to completely mask it or provide alternatives to it that happen to be ``good enough'' for the job.

But ``good enough'' is only good enough until it's not.

Lifting the Curtain   B_frame

/mikegerwitz/cs4m/src/commit/596fff5295b86d69a4fbbb630f8eedbea799b4a0/images/ff-inspect-menu.png

/mikegerwitz/cs4m/src/commit/596fff5295b86d69a4fbbb630f8eedbea799b4a0/images/ff-inspect.png

Notes   B_noteNH

Let's lift the curtain, so to speak, on what's really going on in the web browser. Don't worry, we're only going to give it a little peek; nothing too complicated.

Take the LibrePlanet 2019 speaker list page for instance. If you right-click on my name and click on ``Inspect Element'', you are presented with the developer tools for this browser which shows what makes up the webpage. You'll notice that it has a hierarchical structure made up of nodes describing how certain data ought to be formatted. For example, my name is in an h2 element, which represents a heading. Below that we see the talk title and then the talk abstract in a paragraph element, denoted by p.

If we look in that bottom p element, you'll see the exact text that's displayed on the webpage. And sure enough, that text contains the term ``free software''.

Now, technically, this inspector displays something called the DOM, which represents the current state of the page. If the webpage were dynamic, you'd see this updating. We could even change it in here, and the page would update. But you don't need a specialized tool to view the structure of the initial webpage; I just did this for visualization.

Web Page Source Code   B_block

Columns   B_columns
Left   B_column

/mikegerwitz/cs4m/src/commit/596fff5295b86d69a4fbbb630f8eedbea799b4a0/images/ff-view-source-menu.png

Right   B_column

/mikegerwitz/cs4m/src/commit/596fff5295b86d69a4fbbb630f8eedbea799b4a0/images/ff-view-source.png

Notes   B_noteNH

If we instead select "View Page Source" from the context menu, then we get a new tab that contains a much uglier version of what we just saw in the DOM inspector. This is the raw source code of the webpage. Most of it, anyway. It is a document language called HTML. And as you may have noticed, it's plain text. Structured, but plain, text.

And as you can see, if we hit Ctrl+F, ``free software'' is there all the same. We don't need to view the webpage with all its fancy formatting. For the problem we're trying to solve, the graphical representation provides little benefit.

Text   B_fullframe

Text.

Notes   B_noteNH

As we're about to see, this simple fact—that the webpage is represented by plain text—opens up a whole new world to us. We have stripped away all the complex visual GUI stuff, and we're left with the raw substance of the page, which still contains the information we are looking for.

But we're still within the web browser.

We don't have to be. We can copy all of that text and paste it into our editor. Ctrl+A Ctrl+C Alt+Tab Ctrl+V. And sure enough, search works all the same. Ctrl+F and we can still find ``free software''. Completely different program, and we can still find the text using the same keybinding.

Text is a Universal Interface   B_fullframe

Text is a Universal Interface

Notes   B_noteNH

Text is a universal interface. And what I mean by that is—you don't need any special tools to work with it. You can view it in your web browser. You can view it in your text editor. You can paste it in a text message. You can put it in a book. You can write it down on paper and type it back into your computer.

Text is how we communicate with one another as human beings.

Let's save this HTML as a file, speakers.html.

If we opened this file, it would open in our web browser and we would see the same webpage, although it would look a bit different since a lot of the styling is stored outside of this HTML file, and there won't be any images. But this isn't a talk about web development so I'm not going to go deeper than that.

But if again we opened this HTML file in our text editor, you would see that same plain text HTML as before; one program just chooses to render it differently than another.

Even though we can view the HTML in our text editor, we haven't yet eliminated the web browser; we still need it to navigate to the webpage and view its source. But if that's all we're using the web browser for, then it's one hell of an inefficient way of telling the computer that we just want the HTML document at a certain URL.

Up until this point, the keyboard has been used as a secondary interface—as an alternative to something. Now we're going to venture into a world where it is the interface.

The Shell Command Prompt   B_frame

mikegerwitz@lp2019-laptop:~$
# ^ user    ^ host        ^ working directory (home)

This presentation will show:

$ command
output line 1
output line 2
...
output line N
Notes   B_noteNH

If you open a terminal, also called a VTE, or virtual terminal emulator, you will be greeted with a curious string of characters. This is a command prompt.

The program that is prompting you for a command is called the shell. The GNU shell is Bash, which is the default on most GNU/Linux systems. It's also the default on macOS, if you happen to be using that, though I recommend against it. And Windows now has something they call ``Bash on Ubuntu on Windows'', which is GNU/Linux running atop the Windows kernel. Of course, a fully free GNU/Linux system is better.

Bash isn't required to run any of the commands I'm presenting as part of this talk, but I will be mentioning some features specific to bash, which I'll note when I do in case you happen to be using a different shell.

Eliminating the Web Browser   B_frame

$ wget https://libreplanet.org/2019/speakers/
--2019-03-24 00:00:00-- https://libreplanet.org/2019/speakers/
Resolving libreplanet.org (libreplanet.org)... 209.51.188.248
Connecting to libreplanet.org (libreplanet.org)|209.51.188.248|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: index.html
...
2019-03-24 00:00:00 (1.78 MB/s) - index.html saved [67789]
$ wget -O speakers.html \
       https://libreplanet.org/2019/speakers/
Notes   B_noteNH

Alright! The goal is to retrieve the HTML file at a given URL. GNU/Linux distributions usually come with GNU wget, which does precisely that. To invoke it, we type the name of the command, followed by a space, followed by the URL we wish to retrieve, and then hit enter.

What follows is quite a bit of text. The details aren't particularly important as long as it's successful, but notice that it says it saved to index.html—not an intuitive name, unless you happen to know the convention it comes from.

So let's tell wget what file we want to output to. We do this with the -O option, like so. It takes a single argument, which is the name of the output file. The backslash here allows us to continue the command onto the next line; otherwise, a newline tells the shell to execute the command.
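As a quick aside, the backslash continuation works with any command, not just wget—the shell simply removes the backslash-newline pair before running the line:

```shell
# One echo command with two arguments, split across two lines:
echo continuation \
     works
# prints: continuation works
```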

So remember previously that we manually created speakers.html by viewing the source of the webpage in Icecat. If we open this file, we'll find that it contains exactly the same text, and we never had to open a web browser. And we can search it all the same as before for ``free software''.

Browser vs. wget Comparison   B_frame

Ctrl+L ``https://libreplanet.org/2019/speakers/''
$ wget https://libreplanet.org/2019/speakers/
Notes   B_noteNH

This is a very different means of interacting with the computer, but if we compare this with the keyboard shortcut used previously, they are very similar. Not so scary, right? It's hard to imagine a more direct line of communication with the computer for downloading a webpage, short of reading your mind.

It's true that you can save the HTML using Icecat by hitting Ctrl+S, but that saves a lot more than just the HTML page—it also saves all the images and stylesheets and other resources, which is much more than we need. You can do this in wget too, by the way. You can even archive an entire website for offline viewing by using -m. Just don't do that with this URL, or it'll also download the entire LibrePlanet wiki too, which isn't quite what you want.

Finding Text on the Command Line   B_frame

$ grep 'free software' speakers.html

\vdots

<p>Mike Gerwitz is a free software hacker and activist with a focus on
exclusively free software. Mike spends most of his free time with his

\vdots

Notes   B_noteNH

Not having to open a web browser is nice, but having to run wget and then open the downloaded HTML file is a bit of a pain; is there a command that can help us there too?

We want to know whether a page contains the term ``free software''. For that we use a tool called grep.

The first argument to grep is the search string, and the remaining arguments—just one here—tell it where it should search. The first argument to grep is quoted because it contains a space, otherwise the shell would think our search phrase was only `free' and that the files we wanted to search were `software' and `speakers.html'.
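The shell's word splitting can be made visible with printf, which here prints each of its arguments on its own line in brackets—a stand-in for how grep would receive them:

```shell
printf '[%s]\n' free software    # two arguments: [free] then [software]
printf '[%s]\n' 'free software'  # one argument:  [free software]
```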

You'll get a bunch of output; I just included a small snippet here. But notice how it happens to include exactly the text we were looking at in the web browser.

And with that we have replicated Ctrl+F. But did we do a good job conveying our thoughts to the machine?

We just wanted to know whether the page contains the phrase; we don't care to see it! So while we have efficiently conveyed a search string, we didn't receive an efficient reply—it's information overload.

A More Gentle Reply   B_frame

$ grep --quiet 'free software' speakers.html && echo yes

yes
$ echo 'Hello, world!'

Hello, world!
$ grep --quiet 'open source' speakers.html || echo no

no
Notes   B_noteNH

First we tell grep to modify its behavior with the quiet flag. You can also use the short form, which is just -q. Rather than outputting results, grep will exit silently and it will instead return a status to the shell that says whether or not the search failed.

POSIX-like shells, like Bash, offer the ability to say ``run this command if the previous succeeds'', and this is done by putting two ampersands between the commands.

The command to run if grep succeeds in finding a match is echo. All echo does is take its arguments and spit them right back out again as output. So this essentially states: ``search for `free software' in speakers.html and output `yes' if it is found''.

Since echo is its own command, it also works by itself. Here's the classic ``hello, world'' program in shell.

But if you recall our research task, it was to search for pages that do not contain the term ``free software''. We can do that too, by using two pipes in place of two ampersands, which states: ``search for `free software' in speakers.html and output `no' if it fails''. (Note that grep's -v or --invert-match flag is not a substitute here: it selects non-matching lines rather than inverting the overall exit status, so on a page with many lines it would almost always report success. The double-pipe form expresses what we actually mean.)
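Both operators can be tried on any file; here is a small local demonstration, with a hypothetical demo.txt standing in for a downloaded page:

```shell
# A stand-in page that mentions the phrase:
printf 'a page about free software\n' > demo.txt

grep -q 'free software' demo.txt && echo yes   # prints "yes"
grep -q 'open source'   demo.txt || echo no    # prints "no"
```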

Writing to Files (Redirection)   B_frame

  • Commands write to standard out (stdout) by default
  • Output redirection writes somewhere else
# overwrites each time
$ echo 'Hello, world!'       > hello.txt
$ echo 'Hello again, world!' > hello.txt

# appends (echo adds a newline)
$ echo 'First line'  >> results.txt
$ echo 'Second line' >> results.txt

# truncates file (empties)
> results.txt
Notes   B_noteNH

Alright, we're well on our way now! But we still haven't gotten rid of that damn text editor, because we need to save a list of URLs to a file to hold our final results!

Well as it so happens, writing to a file is such a common operation that it's built right into the shell. We use a feature called redirection.

There are two types of output redirection. If you place a single greater-than symbol followed by a filename after a command, then the output of that command will replace anything already in the file. So the result of the first two commands will be a hello.txt that contains only a single line: ``Hello again, world!''.

The second type, which uses two greater-than symbols, appends to the file. echo by default adds a newline, so the result of the second two commands is a results.txt containing two lines, ``First line'' and ``Second line'' respectively. If the file doesn't yet exist, an empty one will be created before writing.
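The behavior described above is easy to verify by reading the files back with cat and wc (using a hypothetical list.txt here so as not to disturb our results.txt):

```shell
echo 'Hello, world!'       > hello.txt
echo 'Hello again, world!' > hello.txt
cat hello.txt          # only the second line; > overwrote the first

echo 'First line'  >> list.txt
echo 'Second line' >> list.txt
wc -l < list.txt       # 2: each >> appended a line
```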

I think maybe you can see where I'm going with this.

Starting Our List   B_fullframe




$ wget --quiet -O speakers.html \
       https://libreplanet.org/2019/speakers/ \
    && grep --quiet 'free software' speakers.html \
         || echo https://libreplanet.org/2019/speakers/ \
              >> results.txt
Notes   B_noteNH

Take a look at that for a moment. <pause ~5s> Can anyone tell me what the result of this command line will be? <pause ~5s> <react appropriately>

As exciting as it is to start to bring these things together, the result is pretty anticlimactic—we know that speakers.html does contain the string ``free software'', and so the result is that results.txt contains nothing! In fact, if results.txt didn't exist yet, it still wouldn't even exist.

<for the sake of demonstration, maybe modify it to output>

At this point, we have successfully eliminated both the web browser and text editor. But this is a hefty command to have to modify each time we want to try a different URL.
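The conditional logic can be dry-run locally, with two hypothetical files standing in for fetched pages—one that contains the phrase and one that doesn't:

```shell
printf 'talks about free software\n' > match.txt
printf 'talks about open source\n'   > nomatch.txt

> results.txt   # start with an empty results file
grep -q 'free software' match.txt   || echo match.txt   >> results.txt
grep -q 'free software' nomatch.txt || echo nomatch.txt >> results.txt

cat results.txt   # only nomatch.txt: the page lacking the phrase
```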

Command Refactoring   B_fullframe




$ wget --quiet -O speakers.html \
       https://libreplanet.org/2019/speakers/ \
    && grep --quiet 'free software' speakers.html \
         || echo https://libreplanet.org/2019/speakers/ \
              >> results.txt

⊂skip



$ URL=https://libreplanet.org/2019/speakers/
$ wget --quiet -O speakers.html \
       "$URL" \
    && grep --quiet 'free software' speakers.html \
         || echo "$URL" \
              >> results.txt


$ URL=https://libreplanet.org/2019/speakers/
$ wget -qO speakers.html \
       "$URL" \
    && grep -q 'free software' speakers.html \
         || echo "$URL" \
              >> results.txt


$ URL=https://libreplanet.org/2019/speakers/
$ wget -qO - \
       "$URL" \
    | grep -q 'free software' \
        || echo "$URL" \
             >> results.txt


$ URL=https://libreplanet.org/2019/speakers/
$ wget -qO - "$URL" \
    | grep -q 'free software' || echo "$URL" >> results.txt
$ alias fetch-url='wget -qO-'

$ URL=https://libreplanet.org/2019/speakers/
$ fetch-url "$URL" \
    | grep -q 'free software' || echo "$URL" >> results.txt
Notes   B_noteNH

We can simplify it by introducing a variable. First we assign the URL to a variable named URL. There must be no spaces on either side of the assignment operator, which is the equal sign. We then reference its value by prefixing it with a dollar sign everywhere the URL previously appeared. You should always put variable references in double quotes for safety if they are intended to represent a single argument—that ensures that, if our variable contains a space or other special character, it isn't interpreted by the shell differently than we intended.
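The danger is visible with a contrived value containing a space (real URLs rarely do, but other variables might); printf shows how many arguments the shell actually passed:

```shell
VAL='free software'
printf '[%s]\n' $VAL     # unquoted: split into two arguments
printf '[%s]\n' "$VAL"   # quoted: one argument, as intended
```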

We can also make this command line a bit more concise by using the short name for the --quiet flag, which is -q. Notice how in wget I combined them into -qO instead of using two separate dashes with spaces between them. This is optional; if you feel it's easier to read the other way, that's fine.

Something else feels dirty. We're creating this speakers.html file just to pass to grep. It's not needed after the fact. In fact, it's just polluting our filesystem. What if we didn't have to create it at all to begin with?

I'm first going to introduce the notation, and then I'll go into more into why it works.

If we replace the output file speakers.html with a single dash, that tells wget to write to standard out. This is normally the default behavior of command line programs, like grep and echo, but wget is a bit different.

We then omit the speakers.html from grep entirely. grep will read from standard in by default.

We then connect standard out of wget to the standard in of grep using a single pipe; this is called a pipeline.

Now that we've freed up some characters, let's reformat this slightly to be a bit more readable. And that wget command looks a bit cryptic. How about we define an alias so that it looks a bit more friendly, and then we can stop worrying about what it does?
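One caveat worth knowing: aliases are expanded only in interactive shells by default, so inside a script the equivalent is a shell function (sketched here; it's named fetch_url without a hyphen, since portable function names avoid hyphens):

```shell
# Script-friendly equivalent of the alias:
fetch_url() { wget -qO- "$@"; }

# Functions compose in pipelines exactly like ordinary commands:
shout() { tr 'a-z' 'A-Z'; }
echo 'free software' | shout   # prints: FREE SOFTWARE
```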

Now here's the original command we started with, and where we're at now.

This little bit of abstraction has made our intent even more clear. It can now clearly be read that we're defining a URL, retrieving that URL, searching for a term, and then appending it to a file on a non-match.

But before we keep going, I want to go back to a point I mentioned previously.

Again: Text is a Universal Interface   B_againframe

Notes   B_noteNH

Text is a universal interface.

Notice how we started out our journey manually inspecting text, and began replacing the human part of the workflow at each step with a command. That's because text is something that both humans and computers can work with easily.

This is a fundamental design principle in the Unix tools that I have begun to present to you.

Pipelines   B_fullframe

``Expect the output of every program to become the input to another''

—Doug McIlroy (1978)

Notes   B_noteNH

The invention of the Unix pipe is credited to Doug McIlroy. As part of the Unix philosophy, he stated: ``expect the output of every program to become the input to another''.

More broadly, the Unix philosophy can be summarized as:

Summary of the Unix Philosophy   B_fullframe

The Unix Philosophy

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

Notes   B_noteNH

<Read it>

Up to this point, we have shifted how we communicate with the machine by moving away from a visual interface driven primarily by movement, to a textual interface that puts mind and machine on equal footing. And now here we're talking about another profound shift in how we think.

We start to think of how to decompose problems into small operations that exist as part of a larger pipeline. We think of how to chain small, specialized programs together, transforming text at each step to make it more suitable for the next.

Program Composition [0/10]

Clarifying Pipelines   B_fullframe

$ fetch-url https://libreplanet.org/2019/speakers/ \
    | head -n3
<!DOCTYPE html>
<html lang="en">
  <head>
$ fetch-url https://libreplanet.org/2019/speakers/ \
    | wc -l
1422
$ fetch-url https://libreplanet.org/2019/speakers/ \
    | grep 'free software' \
    | wc -l
38
$ fetch-url https://libreplanet.org/2019/speakers/ \
    | grep 'free software' \
    | tail -n1
Expat</a>. The JavaScript is free software with
Notes   B_noteNH

Let's observe the profound consequences of these design decisions. Remember that standard out is displayed to us on the terminal by default. If we were to just run that wget command and nothing else, we'd be spammed with output. But I just want to prove to you that HTML is being output to standard out; I don't want a ton of spam. So we can pipe it to head to show only the first few lines.

But how many lines did we save ourselves? We can pipe it to wc instead, which stands for ``word count'', and tell it to count the number of newlines with -l.

What about the number of lines that contain the string ``free software''? Or how about the last such line?

It's all a simple matter of composing existing programs with pipes.
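The same composition works on any text, not just wget output; here's a throwaway local example (seq simply prints the numbers 1 through 10):

```shell
seq 10 | head -n3   # keep only the first three lines: 1, 2, 3
seq 10 | wc -l      # count the lines instead: 10
```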

Tor   B_fullframe

$ alias fetch-url='torify wget -qO-'
Notes   B_noteNH

By the way, retrieving a bunch of URLs in an automated manner may be a privacy concern for you. You can easily send all these requests through Tor, assuming it is installed and the daemon running, by prefixing wget with torify. Since we abstracted our fetching away into the fetch-url alias, our previous examples continue to work as-is.

LP Sessions   B_fullframe

$ fetch-url https://libreplanet.org/2019/speakers/ \
    | grep -A5 speaker-header \
    | head -n14
<header class="keynote-speaker-header" id="garbee">
<hgroup>
<h2>Bdale Garbee</h2>
</hgroup>
</header>
<p><em>Closing keynote</em></p>
--
<header class="keynote-speaker-header" id="loubani">
<hgroup>
<h2>Tarek Loubani</h2>
</hgroup>
</header>
<p><em>Opening keynote (Day 1)</em></p>
--
$ fetch-url https://libreplanet.org/2019/speakers/ \
    | grep -A5 speaker-header \
    | grep '<em>'
<p><em>Closing keynote</em></p>
<p><em>Opening keynote (Day 1)</em></p>
<p><em>Opening keynote (Day 2)</em></p>
[...]
<p><em>The Tor Project: State of the Onion</em> and <em>Library Freedom Institute: A new hope</em></p>
<p><em>The Tor Project: State of the Onion</em></p>
[...]
<p><em>Large-scale collaboration with free software</em></p>
<p><em>Large-scale collaboration with free software</em></p>
$ fetch-url https://libreplanet.org/2019/speakers/ \
    | grep -A5 speaker-header \
    | grep -o '<em>[^<]\+</em>'
<em>Closing keynote</em>
<em>Opening keynote (Day 1)</em>
<em>Opening keynote (Day 2)</em>
[...]
<em>The Tor Project: State of the Onion</em>
<em>Library Freedom Institute: A new hope</em>
<em>The Tor Project: State of the Onion</em>
[...]
<em>Large-scale collaboration with free software</em>
<em>Large-scale collaboration with free software</em>
$ fetch-url https://libreplanet.org/2019/speakers/ \
    | grep -A5 speaker-header \
    | grep -o '<em>[^<]\+</em>' \
    | sort \
    | uniq -cd
      2 <em>Hackerspace Rancho Electrónico</em>
      4 <em>Large-scale collaboration with free software</em>
      2 <em>Library Freedom Institute: A new hope</em>
      2 <em>Right to Repair and the DMCA</em>
      2 <em>Teaching privacy and security via free software</em>
      2 <em>The joy of bug reporting</em>
      5 <em>The Tor Project: State of the Onion</em>
$ fetch-url https://libreplanet.org/2019/speakers/ \
    | grep -A5 speaker-header \
    | grep -o '<em>[^<]\+</em>' \
    | sort \
    | uniq -cd \
    | sort -nr \
    | head -n5
      5 <em>The Tor Project: State of the Onion</em>
      4 <em>Large-scale collaboration with free software</em>
      2 <em>The joy of bug reporting</em>
      2 <em>Teaching privacy and security via free software</em>
      2 <em>Right to Repair and the DMCA</em>
$ fetch-url https://libreplanet.org/2019/speakers/ \
    | grep -A5 speaker-header \
    | grep -o '<em>[^<]\+</em>' \
    | sort \
    | uniq -cd \
    | sort -nr \
    | head -n5 \
    | sed 's#^ *\(.\+\) <em>\(.*\)</em>#\2 has \1 speakers#g'
The Tor Project: State of the Onion has 5 speakers
Large-scale collaboration with free software has 4 speakers
The joy of bug reporting has 2 speakers
Teaching privacy and security via free software has 2 speakers
Right to Repair and the DMCA has 2 speakers
$ fetch-url https://libreplanet.org/2019/speakers/ \
    | grep -A5 speaker-header \
    | grep -o '<em>[^<]\+</em>' \
    | sort \
    | uniq -cd \
    | sort -nr \
    | head -n5 \
    | sed 's#^ *\(.\+\) <em>\(.*\)</em>#\2 has \1 speakers#g' \
    | espeak
Notes   B_noteNH

How about something more involved? I noticed that some LibrePlanet sessions had multiple speakers, and I wanted to know which ones had the most speakers.

The HTML of the speakers page includes a header for each speaker. Here are the first two. These are keynote speakers, but there are also non-keynote speakers whose headers are just speaker-header.

Let's get just the talk titles that those speakers are associated with. Looking at this output, we see that the talk titles are wrapped in em tags, so let's just go with that. Pipe to grep instead of head.

It looks like at least one of those results has multiple talks. But note that each is enclosed in its own set of em tags. If we add -o to grep, which stands for ``only matching'', then it'll return only the portion of the line that matches, rather than the entire line. Further, if there are multiple matches on a line, it'll output each match independently on its own line. That's exactly what we want! But we have to modify our regex a little bit to prevent it from grabbing everything between the first and last em tag, by prohibiting it from matching a less-than character in the title. Don't worry if you don't understand the regular expression; they take time to learn and tend to be easier to write than they are to read. This one just says ``match one or more non-less-than characters between em tags''. grep actually supports three flavors of regular expressions; if you used Perl's with -P, this would be even simpler to write, but I show the POSIX regex here for portability since Perl regexes aren't available on all systems.
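You can see the effect of -o and the [^<] restriction on a made-up line, with no network involved:

```shell
printf '<p><em>Talk A</em> and <em>Talk B</em></p>\n' \
  | grep -o '<em>[^<]\+</em>'
# Each match prints on its own line:
#   <em>Talk A</em>
#   <em>Talk B</em>
# With '<em>.\+</em>' instead, the greedy match would span from the
# first <em> all the way to the last </em>.
```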

Now assuming that the talk titles are consistent, we can get a count. uniq has the ability to count consecutive lines that are identical, as well as output a count. We also use -d to tell it to only output duplicate lines. But uniq doesn't sort lines before processing, so we first pipe it to sort. That gives us a count of each talk!
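Here's the sort-then-count idea in isolation, with made-up input:

```shell
printf 'b\na\nb\nb\na\nc\n' | sort | uniq -cd
# uniq -c counts runs of identical adjacent lines (hence the sort first);
# -d keeps only the duplicated ones.  The output is a count of 2 for `a'
# and 3 for `b'; `c' appears once and is dropped.
```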

But I want to know the talks with the most speakers, so let's sort it again, this time numerically and in reverse order, and take the top five.

And we have our answer!

But just for the hell of it, let's go a step further. Using sed, which stands for ``stream editor'', we can match on portions of the input and reference those matches in a replacement. So we can reformat the uniq output into an English sentence, like so. sed is actually a Turing-complete programming language, but it is often used in pipelines with inline scripts like this. I chose pound characters to delimit the match from the replacement; the numbers in the replacement reference the parenthesized groups in the match.
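The substitution can be tried on a single made-up line of uniq-style output:

```shell
echo '      5 <em>Example Talk</em>' \
  | sed 's#^ *\(.\+\) <em>\(.*\)</em>#\2 has \1 speakers#g'
# \1 captures the count, \2 captures the title, and the replacement
# rearranges them; prints: Example Talk has 5 speakers
```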

And then we're going to pipe it to the program espeak, which is a text-to-speech synthesizer. Your computer will speak the top five talks by presenter count to you. Listening to computers speak is all the rage right now, right?

Interactive, Incremental, Iterative Development   B_fullframe

Incremental Development

Interactive REPL, Iterative Decomposition

Notes   B_noteNH

Notice how we approached that problem. I presented it here just as I developed it. I didn't open my web browser and inspect the HTML; I just looked at the wget output and then started to manipulate it in useful ways working toward my final goal. This is just one of the many ways to write it. And this is part of what makes working in a shell so powerful.

In software development, we call environments like this REPLs, which stands for ``read-eval-print loop''. The shell reads a command line, evaluates it, prints a result, and then does that all over again. As a hacker, this allows me to easily inspect and iterate on my script in real time, which can be a very efficient process. I can quickly prototype something and then clean it up later. Or maybe create a proof-of-concept in shell before writing the actual implementation in another language.

But most users aren't programmers. They aren't experts in these commands; they have to play around and discover as they go. And the shell is perfect for this discovery. If something doesn't work, just keep trying different things and get immediate feedback!

And because we're working with text as data, a human can replace any part of this process!

Discovering URLs   B_fullframe


$ grep -o 'https\?://[^ ]\+' email-of-links.txt
https://en.wikipedia.org/wiki/Free_software
https://en.wikipedia.org/wiki/Open_source
https://en.wikipedia.org/wiki/Microsoft
https://opensource.org/about

$ grep -o 'https\?://[^ ]\+' email-of-links.txt \
    | while read URL; do
        echo "URL is $URL"
      done
URL is https://en.wikipedia.org/wiki/Free_software
URL is https://en.wikipedia.org/wiki/Open_source
URL is https://en.wikipedia.org/wiki/Microsoft
URL is https://opensource.org/about

$ grep -o 'https\?://[^ ]\+' email-of-links.txt \
    | while read URL; do
        fetch-url "$URL" | grep -q 'free software' \
          || echo "$URL" >> results.txt
      done

$ grep -o 'https\?://[^ ]\+' email-of-links.txt \
    | while read URL; do
        fetch-url "$URL" | grep -q 'free software' \
          || echo "$URL" | tee -a results.txt
      done
https://en.wikipedia.org/wiki/Microsoft
https://opensource.org/about

$ grep -o 'https\?://[^ ]\+' email-of-links.txt \
    | while read URL; do
        fetch-url "$URL" | grep -q 'free software' \
          || echo "$URL" | tee -a results.txt
      done \
    | xclip -i -selection clipboard

$ grep -o 'https\?://[^ ]\+' email-of-links.txt \
    | while read URL; do
        fetch-url "$URL" | grep -q 'free software' \
          || echo "$URL" | tee -a results.txt
      done \
    | tee >( xclip -i -selection clipboard )
https://en.wikipedia.org/wiki/Microsoft
https://opensource.org/about
$ xclip -o -selection clipboard \
    | grep -o 'https\?://[^ ]\+' \
    | while read URL; do
        fetch-url "$URL" | grep -q 'free software' \
          || echo "$URL" | tee -a results.txt
      done \
    | tee >( xclip -i -selection clipboard )
https://en.wikipedia.org/wiki/Microsoft
https://opensource.org/about
$ xclip -o -selection clipboard \
    | grep -o 'https\?://[^ ]\+' \
    | while read URL; do
        fetch-url "$URL" | grep -q 'free software' \
          || echo "$URL" | tee -a results.txt
      done
https://en.wikipedia.org/wiki/Microsoft
https://opensource.org/about

$ xclip -i -selection clipboard < results.txt
$ xclip -o -selection clipboard \
    | grep -o 'https\?://[^ ]\+' \
    | while read URL; do
        fetch-url "$URL" | grep -q 'free software' \
          || tee -a results.txt <<< "$URL"
      done
https://en.wikipedia.org/wiki/Microsoft
https://opensource.org/about

$ xclip -i -selection clipboard < results.txt
Notes   B_noteNH

Okay, back to searching webpages. Now that we have a means of creating the list of results, how do we feed the URLs into our pipeline? Why not pull them right out of the email with grep?

Let's say you saved the email in email-of-links.txt. This simple regex should grab most URLs for both HTTP and HTTPS protocols, but it's far from perfect. For example, it'd grab punctuation at the end of a sentence. But we're assuming a list of URLs. Here's some example output with a few URLs.

For each of these, we need to run our pipeline. It's time to introduce while and read. while re-executes its body for as long as its condition command succeeds. read reads a line of input into one or more variables, and fails when there are no more lines to read, which is what ends the loop.
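Here they are in isolation, with some stand-in input:

```shell
printf 'alpha\nbeta\n' \
  | while read WORD; do
      echo "read: $WORD"
    done
# read succeeds twice and then fails at end of input, ending the loop:
#   read: alpha
#   read: beta
```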

So if we insert our fetch-url pipeline into the body, we get this. But if we just redirect output into results.txt, we can't see the output unless we inspect the file. For convenience, let's use tee, which is named for a pipe tee; it'll send output through the pipeline while also writing the same output to a given file. The -a flag tells it to append rather than overwrite. So now we can both observe the results and have them written to a file!

But we were just going to reply to an email with those results. Let's assume we're still using a GUI email client. Wouldn't it be convenient if those results were already on the clipboard for us so we can just paste them into the message? We can accomplish that by piping to xclip as shown here. There's also the program xsel, which I typically use because its arguments are far more concise, but I don't show it here.

Ah, crap, but now we can't see the output again. So let's use tee again. But rather than outputting to a file on disk, we're going to use a special notation that tells bash to invoke a command in a subshell and replace that portion of the command line with a path to a virtual file representing the standard input of that subshell. Now we can see the output again!

Well, if we're writing to the clipboard, why don't we just read from it too? Instead of saving our mail to a file, we can just copy the relevant portion and have that piped directly to grep! If you have a list of URLs and you just copy that portion, then you can just get rid of grep entirely.

Because we're writing to results.txt, another option is to just let this run and copy to the clipboard at a later time. We can do that by reading results.txt in place of standard input to xclip, as shown here.

And while we're at it, here's a special notation that gets rid of the echo feeding tee in the body of the while loop: three less-than symbols provide the given string on standard in. This is a bash-specific feature.
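That is, these two command lines are equivalent:

```shell
echo 'free software' | grep -o free   # via a pipe and an extra process
grep -o free <<< 'free software'      # via a bash here-string
# both print: free
```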

Phew!

Go Grab a Coffee   B_fullframe

Go Grab a Coffee

Notes   B_noteNH

Remember when I said I could go grab a coffee and play with the kids while the script did its thing? Well now's that time.

But grabbing a coffee means that this system is a bottleneck. The Internet is fast nowadays; ideally, we wouldn't have to wait long. Can we do better?

Async Processes   B_fullframe

$ sleep 1 && echo done & echo start
start
done

(Don't do this for large numbers of URLs!)

$ while read URL; do
    fetch-url "$URL" | grep -q 'free software' \
      || tee -a results.txt <<< "$URL" &
  done
Notes   B_noteNH

Indeed we can. This process is executing serially—one URL at a time, waiting for one to complete before checking another. What if we could query multiple URLs in parallel?

Shells have built-in support for backgrounding tasks so that they can run while you do other things; all you have to do is place a single ampersand at the end of a command. So in this example, we sleep for one second and then echo ``done''. But that sleep and subsequent echo is put into the background, and the shell proceeds to execute echo start while sleep is running in the background. One second later, it outputs ``done''.

So here's the loop we were just writing. If we add an ampersand at the end of that pipeline, it'll run in the background and immediately proceed to the next URL, executing the loop again.

But there's a problem with this approach. Sure, it's fine if we only have a few URLs. But what if we have 1000? Do we really want to spawn thousands of processes and make 1000 network requests at once? That isn't efficient, and it's a bit rude to the servers, verging on a denial-of-service (DoS) attack.

Executable Shell Script and Concurrency   B_fullframe

url-grep   B_block
#!/bin/bash
search="$1"
url="$2"
search="${1?Missing search string}"
url="${2?Missing URL}"
wget -qO- "$url" \
  | grep -q "$search" || echo "$url"
Execute   B_ignoreheading
$ chmod +x url-grep
$ while read URL; do
    ./url-grep 'free software' "$URL" >> results.txt
  done
$ xargs -n1 ./url-grep 'free software' > results.txt
$ xargs -n1 -P5 ./url-grep 'free software' > results.txt
#             ^ 5 concurrent processes
Notes   B_noteNH

Before we continue, we're going to have to write our pipeline in a way that other programs can run it. Up to this point, the program has just been embedded within an interactive shell session. One of the nice things about shell is that you can take what you entered onto the command line and paste it directly into a file and, with some minor exceptions, it'll work all the same.

Let's take our pipeline and name it url-grep. Aliases only work in interactive sessions by default, so we're going to just type wget directly here. Alternatively, you can define a function. We use the positional parameters 1 and 2 here to represent the respective arguments to the url-grep command.

The comment at the top of the file is called a ``shebang''. This is used by the kernel so that it knows what interpreter to use to run our program.

To make it executable, we use chmod to set the executable bits on the file. We can then invoke it as if it were an executable. If it were in our PATH, which isn't something I'm going to get into here, you'd be able to run it like any other command without having to prefix it with ./.

We can also do a primitive form of error handling and documentation by modifying our positional parameters like so, which will show an error message if we don't specify one of them.
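The behavior of that parameter expansion is easy to try directly; here the `_' argument merely fills the $0 slot:

```shell
# Succeeds: $1 is set, so the expansion proceeds normally.
bash -c 'echo "search is: ${1?Missing search string}"' _ 'free software'

# Fails: $1 is unset, so bash prints the message to stderr and aborts.
bash -c 'echo "search is: ${1?Missing search string}"' _ 2>/dev/null \
  || echo 'aborted as expected'
```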

Now we replace the while loop with xargs. It takes values from standard in and appends them as arguments to the provided command. We specify -n1 to say that only one argument should be read from stdin for any invocation of the command; that makes it run a new command for every line of input. Otherwise it'd just append N URLs as N arguments.
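The effect of -n1 is easy to see with toy input:

```shell
printf 'a\nb\nc\n' | xargs echo all:      # one invocation, three arguments
printf 'a\nb\nc\n' | xargs -n1 echo one:  # one invocation per argument
# first line prints:   all: a b c
# then three lines:    one: a / one: b / one: c
```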

And now we can simply use -P to tell it how many processes to use at once. Here we specify 5, meaning xargs will run five processes at a time. You can change that to whatever number makes sense for you.

Again: A Research Task   B_againframe

Notes   B_noteNH

So this was the research task that we started with.

If I were to approach this problem myself, I'd write a quick-and-dirty script that is just throwaway, because it's such a simple problem. So, let's combine everything we've seen so far:

A Quick-n-Dirty Solution   B_frame

$ echo 'wget -qO- "$2" | grep -q "$1" || echo "$2"' > url-grep
$ xargs -n1 -P10 bash url-grep 'free software' < url-list \
    | mail -s 'URLs not mentioning free software' mtg@gnu.org
$ wc -l url-list
1000

$ time xargs -n1 -P10 bash url-grep 'free software' < url-list
real    0m17.548s
user    0m8.283s
sys     0m4.877s
Notes   B_noteNH

I'd first echo the pipeline into url-grep. Instead of making it executable, I'll just pass it as an argument to bash instead, which saves me a step; it's a temporary file anyway. I used 10 processes instead of 5. And then to top it all off, if you have an MTA configured on your system, we can just pipe the output to the mail command to send that URL list directly to me.

It only takes a minute or two to come up with this script. But how long does it take to run?

I took a few URLs and just repeated them in a file so that I had 1000 of them. Running the xargs command, it finishes in under 18 seconds on my system at home. Obviously YMMV, and certain sites may be slower to respond than others.

So in only a couple of minutes, the task has been automated away and completed, all by gluing together existing programs. You don't need to be a programmer to know how to do this; you just need to be familiar with the tools and know what's possible, which comes with a little bit of practice.

This is certainly an efficient means of communicating with the machine. We've come a long way from using the web browser and a mouse.

Thank You   B_fullframe

Thank you.

Mike Gerwitz

mtg@gnu.org

\bigskip

Slides and Source Code Available Online

<https://mikegerwitz.com/talks/cs4m>

\bigskip

\vfill

Licensed under the Creative Commons Attribution ShareAlike 4.0 International License