#+startup: beamer #+TITLE: Computational Symbiosis: Methods That Meld Mind and Machine #+AUTHOR: Mike Gerwitz #+EMAIL: mtg@gnu.org #+DATE: LibrePlanet 2019 #+BEAMER_HEADER: \title{Computational Symbiosis} #+BEAMER_HEADER: \subtitle{Methods That Meld Mind and Machine} #+OPTIONS: H:3 num:nil toc:nil p:nil todo:nil stat:nil #+LaTeX_CLASS: beamer #+LaTeX_CLASS_OPTIONS: [presentation] #+BEAMER_THEME: Madrid #+BEAMER_COLOR_THEME: beetle #+BEAMER_HEADER: \input{slides-preamble.tex} #+TODO: DEVOID(v) LACKING(l) RAW(r) DRAFT(d) AUGMENT(A) REVIEWED(R) | READY(,) REHEARSED(.) #+COLUMNS: %40ITEM %10DURATION{:} %8TODO %BEAMER_ENV(ENVIRONMENT) * Project Notes :noexport: ** Core Guidelines - Blur distinction between user and programmer. - Use only examples that non-programmers could come up with and understand. - GUIs /are/ sometimes the most efficient means of melding mind and machine. - De-emphasize doing everything on a terminal just because its "cool". ** Concepts [8/27] - [ ] =/sys= and such - [ ] Automating multi-step workflows with GNU Make - [ ] Concurrency - [-] Connecting graphical and command line (X11 integration) [1/4] - [ ] Clipboard (image) - [X] Clipboard (text) ([[*Program Composition]]) - [ ] X11 commands - [ ] Interesting example idea: =import=, =tesseract ss.png - | xsel -ib= so that you can select an area of text on the screen and perform OCR, then have the result on your clipboard - I've never done this, but it goes to show how trivial it is to create something that is seemingly novel. - [ ] Discoverability - Example: I used GIMP for editing some images for this talk because I thought it'd be faster to discover how to that way vs. ImageMagick docs. - [ ] Editor should keep up with your thoughts - Emphasize muscle memory - [-] Efficiency gains of light use of keybindings [2/3] - [X] Seldom change ([[*Browser Topics][Browser Topics]]) - [X] Some are universal (between programs) ([[*Browser Topics][Browser Topics]]) - [ ] Some context-sensitive and Just Work without having to hunt through context-specific menus - [ ] Emphasize free software community and how OS is built by so many contributors - Symbiosis is not only with machine, but community - [ ] Globbing - [ ] GnuPG and smart cards - [ ] Incremental, iterative, interactive development ([[*Program Composition]]) - [X] Looping ([[*Perspective Topics][Perspective Topics]]) - [ ] McIlroy and Knuth - [X] Mouse has dependency on positioning of UI elements ([[*Browser Topics][Browser Topics]]) - Changes over time, subject to user/system preferences, etc - [ ] Multiplexing with GNU Screen - [X] Output redirection ([[*A New Perspective]]) - [ ] Phone as an example of a device that best works with touch and GUIs and sacrifices practical freedom. - Requires specialized knowledge and is inconvenient to work with. - [X] Pipelines instead of intermediate files ([[*Perspective Topics][Perspective Topics]]). 
- [ ] Practically exercising software freedom [0/2] - [ ] Users need to be able to convey their thoughts to the computer without being programmers - [ ] Concise primitives / building blocks - [ ] Readline (history and editing) [0/2] - [ ] Develop as you go, perhaps just referencing history early on - [ ] Transfer commands from history into scripts and aliases for re-use - [X] Regular expressions ([[*Perspective Topics][Perspective Topics]]) - [ ] Remote commands via SSH - [X] Text as a universal interface - [X] All programs can work with one-another ([[*Perspective Topics][Perspective Topics]]) - [X] Can replace any part of the process with a human---no difference between input from a program vs. from a keyboard. ([[*Browser Topics][Browser Topics]]) - [X] The Unix Philosophy ([[*Perspective Topics][Perspective Topics]]) - [X] Using keybindings in a GUI with no mouse ([[*Browser Topics][Browser Topics]]) - Including transferring data between programs - [ ] Using macros to script keypresses (Vim) - [ ] Version control ** Programs [0/4] - [-] Coreutils / Common [7/14] - [X] =cut= - [ ] =column= - [ ] =diff= - [X] =echo= - [ ] =find= - [ ] =gpg= - [X] =grep= - [ ] =make= / Makefiles - [ ] =man= - [X] =sort= - [ ] =ssh= - [X] =uniq= - [X] =wc= - [X] =wget= / =curl= - [-] Miscellaneous [4/8] - [ ] =awk= - [ ] =cowsay= - [ ] =dict= - [X] =espeak= - [ ] =git= - [X] =mail= - [X] =torify= - [X] =sed= - [-] X11 / Graphical [1/5] - [ ] =convert= - [ ] =i3= / =xmonad= / etc - [ ] =import= - [ ] =setxkbmap= - [X] =xclip= / =xsel= - [ ] Editors / Multiplexing [0/4] - [ ] =emacs= - [ ] =nano= - [ ] =vim= - [ ] =screen= * LACKING Slides [0/6] :PROPERTIES: :ID: slides :END: ** Summary :noexport: #+BEGIN: columnview :hlines 2 :maxlevel 3 :indent t :id slides | ITEM | DURATION | TODO | ENVIRONMENT | |-------------------------------------------------------+----------+--------+-------------| | Slides | 0:36:19 | DEVOID | | |-------------------------------------------------------+----------+--------+-------------| | \_ Summary | | | | |-------------------------------------------------------+----------+--------+-------------| | \_ Introduction | | RAW | note | |-------------------------------------------------------+----------+--------+-------------| | \_ Choreographed Workflows | | DEVOID | fullframe | |-------------------------------------------------------+----------+--------+-------------| | \_ Practical Freedom | | DEVOID | fullframe | |-------------------------------------------------------+----------+--------+-------------| | \_ Practical Example: Web Browser | 0:09:32 | DRAFT | | | \_ Browser Topics | | | | | \_ Example: Web Browser | 0:00:40 | DRAFT | frame | | \_ Finding Text (Mouse-Driven GUI Interaction) | 0:01:39 | DRAFT | frame | | \_ GUIs Change Over Time | 0:00:45 | DRAFT | frame | | \_ Ctrl+F---Just Works | 0:00:25 | DRAFT | frame | | \_ Muscle Memory | 0:00:40 | DRAFT | fullframe | | \_ A Research Task | 0:00:25 | DRAFT | fullframe | | \_ Executing the Research Task | 0:03:00 | DRAFT | frame | | \_ GUIs of a Feather | 0:00:40 | DRAFT | fullframe | | \_ Macro-Like Keyboard Instructions | 0:01:19 | DRAFT | fullframe | |-------------------------------------------------------+----------+--------+-------------| | \_ A New Perspective | 0:14:22 | DRAFT | | | \_ Perspective Topics | | | | | \_ Secrets? 
| 0:01:19 | DRAFT | fullframe | | \_ Lifting the Curtain | 0:01:00 | DRAFT | frame | | \_ Web Page Source Code | 0:00:35 | DRAFT | block | | \_ Text | 0:00:35 | DRAFT | fullframe | | \_ Text is a Universal Interface | 0:01:19 | DRAFT | fullframe | | \_ The Shell Command Prompt | 0:00:45 | DRAFT | frame | | \_ Eliminating the Web Browser | 0:01:00 | DRAFT | frame | | \_ Browser vs. =wget= Comparison | 0:00:40 | DRAFT | frame | | \_ Finding Text on the Command Line | 0:01:00 | DRAFT | frame | | \_ A More Gentle Reply | 0:01:00 | DRAFT | frame | | \_ Writing to Files (Redirection) | 0:00:55 | DRAFT | frame | | \_ Starting Our List | 0:01:10 | DRAFT | fullframe | | \_ Command Refactoring | 0:02:00 | DRAFT | fullframe | | \_ Again: Text is a Universal Interface | 0:00:20 | DRAFT | againframe | | \_ Pipelines | 0:00:15 | DRAFT | fullframe | | \_ Summary of the Unix Philosophy | 0:00:30 | DRAFT | fullframe | |-------------------------------------------------------+----------+--------+-------------| | \_ Program Composition | 0:12:25 | DRAFT | | | \_ Composition Topics | | | | | \_ Clarifying Pipelines | 0:00:45 | DRAFT | fullframe | | \_ Tor | 0:00:20 | DRAFT | fullframe | | \_ LP Sessions | 0:02:50 | DRAFT | fullframe | | \_ Interactive, Incremental, Iterative Development | 0:01:10 | DRAFT | fullframe | | \_ Discovering URLs | 0:02:50 | DRAFT | fullframe | | \_ Go Grab a Coffee | 0:00:15 | DRAFT | fullframe | | \_ Async Processes | 0:01:00 | DRAFT | fullframe | | \_ Executable Shell Script and Concurrency | 0:01:50 | DRAFT | fullframe | | \_ Again: A Research Task | 0:00:15 | DRAFT | againframe | | \_ A Quick-n-Dirty Solution | 0:01:10 | DRAFT | frame | |-------------------------------------------------------+----------+--------+-------------| | \_ Thank You | 00:00:01 | | fullframe | #+END: ** RAW Introduction :B_note: :PROPERTIES: :BEAMER_env: note :END: Hello, everyone! My name is Mike Gerwitz. I am a free software hacker and activist with a focus on user privacy and security. I'm also a GNU Maintainer and software evaluator, and hold other various other administrative duties within GNU. I have about twenty years of programming experience, half of that professionally, And I've been a computer user for longer. So I've been around long enough to see a decent evolution in how we interact with computers. I've gotten a sense of what feel right and wrong as both a user and a hacker. And interestingly, what I've settled on for my computing is really a toolset that was devised decades before I was even born, with some light modernization. And those tools don't work for everyone. But I think a subset of them can. So I'm here today to try to explore a healthy balance, and walk you through what I see as an efficient means of computing, based on the problems that I've encountered, and the problems I've seen others encounter over the years. ** DEVOID Choreographed Workflows :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \fullslidetext Choreographed Workflows ** DEVOID Practical Freedom :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \fullslidetext Practical Freedom ** DRAFT Practical Example: Web Browser [0/9] *** Browser Topics [3/3] :noexport: This example is the main segue from GUIs into the utility of keybindings into shell, so it has to be compelling. I chose something that is frequently done by users: visiting webpages and searching for text. - [X] Most webpages include styling, images, and other media that are best represented in a GUI. 
- Of course, I'm a firm believer in progressive enhancement, but this isn't a web development talk. - [X] Is it good for finding a word or phrase on a page? [4/4] - [X] Menu -> Find in This Page -> Type. Sure. - [X] Instructing a user how to do this is inherently visual. Comment on my need to highlight areas of screenshots. - [X] More experienced users may prefer =C-f=. - Emphasize the motion with my hand. Explain how that mental association is made. Really focus on this. - [X] Location of menu has changed over time (show screenshots), but =C-f= has been unchanged for decades. - [X] What if I had a list of five webpages and I wanted to get rid of the ones that didn't have that word? [4/4] - [X] Mouse and GUI way: - Open each page in a new tab. - Menu -> Find on page -> Type. - Found, copy URL, open a word processor, paste. - Go back, click X on the tab. - Repeat. - [X] Then show keybinding way: - *Demonstrate this on the LP laptop.* - Open programs with =M-F2= - Open each page in a new tab (=C-t=) - =C-f ESC= - =C-l C-c= - =M-TAB C-v RET= - =M-TAB C-w= - Repeat - [X] This would work with nearly /any/ combination of web browser and text editor. - [X] But what if I had 10? 100? 1000? This is still tedious and slow. *** DRAFT Example: Web Browser :B_frame: :PROPERTIES: :BEAMER_env: frame :END: [[file:images/web-browser.png]] **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:40 :END: One of the only GUIs I use on a day-to-day basis is my web browser. In this case, GNU Icecat, which is a Firefox derivative. This is a screenshot of an admittedly staged session, and contains a number of addons. Perhaps most prominent is Tree-Style Tab, which displays tabs in a hierarchy off to the left rather than flatly at the top. I chose a graphical web browser as an example because I feel that it's something that most everyone can relate to using, and most everyone can recognize the utility in---most people using Internet-connected devices use one at least a few times a week, if not every day. *** DRAFT Finding Text (Mouse-Driven GUI Interaction) :B_frame: :PROPERTIES: :BEAMER_env: frame :END: **** Images :B_columns: :PROPERTIES: :BEAMER_env: columns :END: ***** Left :B_column: :PROPERTIES: :BEAMER_env: column :BEAMER_col: 0.30 :END: #+ATTR_LATEX: :width 1.5in [[file:images/ff-find-menu.png]] ***** Right :B_column: :PROPERTIES: :BEAMER_env: column :BEAMER_col: 0.70 :END: #+ATTR_LATEX: :width 3in [[file:images/ff-find-matches.png]] **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:40 :END: The Web is used for many different things today, but its original purpose is to render documents. Take Wikipedia for instance. Or the LibrePlanet conference website. If you are looking for something specific on a page, a common operation is to search for a word or phrase, like shown here. Now, how exactly to do this with a mouse varies depending on what program you're using, but here I highlighted the steps in a modern Icecat or Firefox. You start by clicking on the little hamburger, hotdog, or whatever-you-want-to-call-it menu in the upper-right, and then click on "Find in This Page" within the popup. This then opens a bar at the bottom of the page with an area to type the word or phrase you're searching for. It highlights and scrolls to the first match as you type, and has a button to highlight all results. It also shows the number of results off to the right. It's a simple, yet powerful mechanism that is pretty easy to use. So does a GUI provide the right tool for the job? 
If you're just searching for a name or concept, sure, that seems to be true. A GUI _is_ useful here. But notice how I had to convey these steps to you. I had to take screenshots and highlight where to click with the mouse. Since a GUI is inherently very visual, so are the instructions on how to use it. There is no canonical representation for these instructions, because it involves clicking on elements that have no clear name to the user. Unless you're in software or UX development, you may not know what to call that menu in the upper-right. Further, what do you call the bar at the bottom of the page? You have to describe it in a way that reproduces what the user sees. *** DRAFT GUIs Change Over Time :B_frame: :PROPERTIES: :BEAMER_env: frame :END: **** Images :B_columns: :PROPERTIES: :BEAMER_env: columns :END: ***** Left :B_column: :PROPERTIES: :BEAMER_env: column :BEAMER_col: 0.50 :END: #+ATTR_LATEX: :width 1.5in [[file:images/ff-find-menu.png]] ***** Right :B_column: :PROPERTIES: :BEAMER_env: column :BEAMER_col: 0.50 :END: #+ATTR_LATEX: :width 1.5in [[file:images/ff-edit-find.png]] #+BEAMER: \uncover<2>{\Large Ctrl+F #+BEAMER: } **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:45 :END: Another difficult thing is: GUIs change over time. I'm sure many people here remember earlier versions of Firefox that didn't have the hamburger menu, where the Find menu option was in the Edit menu. By the way, those old menus do still exist if you hit Alt. I miss the old menus, personally, because it /did/ make it easier to convey actions in text. Saying "Go to Edit - Find" is pretty clear, and those menu positions were always in the same place across the entire desktop environment. Now individual programs may vary in the their user experience. But do you notice something in common between these two screenshots? There's something that /hasn't/ changed over time---something that has been the same for /decades/! =Ctrl+F=. *** DRAFT Ctrl+F---Just Works :B_frame: :PROPERTIES: :BEAMER_env: frame :END: - Most GUI programs that offer search - Context-sensitive---Do The Right Thing **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:25 :END: When you type =Ctrl+F=, it immediately opens that search bar and gives focus to the textbox, so you can just start typing. Further, it works /in all major browsers/. Not only that, but =Ctrl+F= is so universal that it works /in nearly every GUI program that offers some type of search/! And it's /context-sensitive/! The program will just Do The Right Thing depending on where you are or what it is that you're doing. *** DRAFT Muscle Memory :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \fullslidetext Muscle Memory #+BEAMER: \fullsubtext Visual \Rightarrow Tactile **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:40 :END: But there's something more profound that has happened here, that many users don't even think about. We have switched our mode of interaction. With a mouse and a GUI, interaction is driven by visual indicators. The position of your hand on the mousepad or your fingers on a touchpad isn't meaningful, because your mouse cursor could be anywhere on the screen at any given time; your /eyes/ provide the context. It's hard to use a GUI with your eyes closed while using a mouse. But by hitting =Ctrl+F=, we've completely /changed/ how we interact with the system. It's now /tactile/. 
You associate a finger placement; a motion; and the feeling of the keys being pressed beneath your fingers; with an action---finding something. You develop muscle memory. You _can_ trigger this feature with your eyes closed. // But that's a pretty trivial example. *** DRAFT A Research Task :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \fullslidetext Research Task: #+BEAMER: \fullsubtext Given a list of webpage URLs #+BEAMER: \smallsubskip find all that /do not/ contain ``free software'' **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:25 :END: Let's explore a fairly simple research task together. Let's say I email you a handful of URLs---say, maybe 5 or 10 of them---that are articles about software or technology. And I want you to come up with a list of the webpages that /do not/ contain the phrase ``free software'' so that I can get a better idea of ones to focus my activism on. How might we approach this problem as an average user? *** DRAFT Executing the Research Task :B_frame: :PROPERTIES: :BEAMER_env: frame :END: **** Approaches :B_columns: :PROPERTIES: :BEAMER_env: columns :END: ***** Mouse :B_column: :PROPERTIES: :BEAMER_env: column :BEAMER_col: 0.50 :END: _Mouse_ 1. Click `+' for each new tab, enter\nbsp{}URL 2. Menu \rightarrow Find in This Page 3. Type ``free software'' 4. If found, go to #9 5. If /not/ found, highlight URL, right-click, copy 6. Click on text editor 7. Right-click, paste URL, hit =RET= for newline 8. Click on web browser 9. Click `X' on tab, go to #2 ******* Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:30 :END: /(Perhaps I should demonstrate this right away rather than reading through the list first, to save time?)/ Let's first use the mouse as many users probably would. To set up, let's open each URL in a new tab. We click on the little `+' icon for a new tab and then enter the URL, once for each webpage, perhaps copying the URL from the email message. Once we're all set up, we don't care about the email anymore, but we need a place to store our results, so we open a text editor to paste URLs into. Now, for each tab, we click on the little hamburger menu, click on ``Find in This Page'', and then type ``free software''. If we /do not/ see a result, we move our mouse to the location bar, click on it to highlight the URL, right-click on it to copy it to our clipboard, click on the text editor to give it focus, right-click on the editor and click on ``Paste'' to insert the URL, and then hit the return key to move to the next line. We then go back to the web browser. If we /do/ see a result, we skip copying over the URL. In either case, we then close the tab by clicking on the `X'. And then we repeat this for each tab, until they have all been closed. When we're done, whatever is in our text editor is the list of URLs of webpages that do not reference ``free software'', and we're done. Simple enough, right? But it's a bit of a pain in the ass. All this clicking around doesn't really /feel/ like we're melding mind and machine, does it? I think you'd be pretty disappointed if /this/ is the crap that you attended this session for. What if we used our =Ctrl+F= trick? That saves us a couple clicks. But what if we could save even more clicks? ***** Keyboard :B_column: :PROPERTIES: :BEAMER_env: column :BEAMER_col: 0.50 :END: #+BEAMER: \uncover<2>{ _Keyboard_ 1. *=Ctrl+T=* for each new tab, enter\nbsp{}URL 2. *=Ctrl+F=* to find 3. Type ``free software'' 4. If found, go to #9 5.
If /not/ found, *=Ctrl+L Ctrl+C=* to copy URL 6. *=Alt+Tab=* to text editor 7. *=Ctrl+V RET=* to paste URL and add newline 8. *=Alt+Tab=* to web browser 9. *=Ctrl+W=* to close tab, go to #2 #+BEAMER: } ******* Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:30 :END: Fortunately we have /many/ more keybindings at our disposal! We'll start with opening each new tab with =Ctrl+T= instead of clicking on `+' with the mouse. /(Maybe show copying the URL from the email without the mouse?)/ To open our text editor, we'll use =Alt+F2=, which is a common keybinding for many window managers and operating systems to open a dialog to enter a program to run. Once we're all set up, we start with the first tab and use =Ctrl+F= as we've seen before, and then type ``free software''. If we /do not/ find a match, we're ready to copy the URL. Hitting =Ctrl+L= will take us to the location bar and highlight the URL. We can then hit =Ctrl+C= to copy the URL to the clipboard. =Alt+Tab= is supported by a wide variety of window managers on a variety of operating systems to switch between windows of running programs, usually in the order of most recently focused. So hitting it once should take us to our text editor. We then paste with =Ctrl+V= and hit return to insert a newline. We can then go back to the web browser by hitting =Alt+Tab= again. Once again, if there /was/ a match, we skip that copy step. We then close the tab with =Ctrl+W=. Repeat, and we're done all the same as before. As a bonus, save with =Ctrl+S=. What's interesting about this approach is that we didn't have to use the mouse at all, unless maybe you used it to highlight the URL in the email. You could get into quite the rhythm with this approach, and your hands never have to leave the keyboard. This is a faster, more efficient way to convey our thoughts to the machine, right? We don't have to seek out our actions each time in the GUI---the operations are always at our fingertips, literally. *** DRAFT GUIs of a Feather :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \fullslidetext Same Keybindings Across (Most) GUIs! #+BEAMER: \fullsubtext Browser, Editor, Window Manager, OS, \ldots **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:40 :END: Another powerful benefit of this approach is---these /same exact keybindings work across most GUIs/! If we switch out Icecat here with nearly any other major web browser, and switch out gedit with many other text editors or even word processors, this will work all the same! There are some notable text editors for which these keybindings won't work, for those of you screaming in your head. We'll get to that. If you use Windows instead of GNU/Linux---which I strongly discourage, but if you do---then it'll work the same. I can sit down at a completely different system and let that muscle memory take over and wind up with the same thing. It's liberating. Let's look at those keybindings a bit more concisely, since that last slide was a mess, to put it nicely. *** DRAFT Macro-Like Keyboard Instructions :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \begingroup\fullslidetext Macro-Like #+BEAMER: \endgroup\subskip #+BEGIN_SRC Ctrl+T ``https://...'' Ctrl+F ``free software'' [ Ctrl+L Ctrl+C Alt+Tab Ctrl+V RET Alt+Tab ] Ctrl+W #+END_SRC - <2> /Requires visual inspection/ for conditional - <2> Still manual and tedious---what if there were 1000 URLs?
**** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:20 :END: If we type out the keybindings like this, in an isolated format, it looks a bit more like instructions for the machine, doesn't it? Some of you may be familiar with macros---with the ability to record keypresses and play them back later. If we were able to do that, then we could fully automate this task away! But unfortunately...we can't. At least, not with the tools we're using right now. Why is that? Well, for one, it requires visual inspection to determine whether or not a match has occurred. That drives conditional logic---that bracketed part there. We also need to know how many times to repeat, which requires that we either count or watch the progress. We also need to be able to inspect the email for URLs and copy them into the web browser. This also scales really poorly. While using the keyboard is certainly faster than using the mouse, we're only dealing with a small set of URLs here. What if I gave you 100 of them? 1000? More? Suddenly this doesn't feel like a very efficient way to convey our intent to the machine. I don't wish that suffering upon anyone. To get around that, we need to change how we think about our computing a bit. And that's why I've dragged you through this drawn-out example---to make sure you understand the significance of these progressive enhancements to our workflow. ** DRAFT A New Perspective [0/16] *** Perspective Topics [13/13] :noexport: - [X] What if I could walk away and get a coffee, play with the kids, come back and have it done for me? - Emphasize how users familiar with the tools I haven't yet mentioned may know how to do this, but people who don't know of Unix tools will still be thinking in terms of the UI, trying to figure out how to automate it. - [X] Right-click on page and inspect an element. - Briefly mention DOM and how this represents the current state of the page. - Maybe show how it can be modified to illustrate the structure better. - But I don't want to get too deep into this. - [X] Right-click on page and View Source. - Explain how this is like the source code to the webpage (HTML). - [X] Copy text into editor, =C-f=. - We can still find the text /using the same keybinding/. - [X] Save HTML file. - [X] Demonstrate opening it in a web browser vs. an editor. - Same exact document, different ways of rendering it. - [X] Open a terminal [4/4] - [X] Type =wget URL= - [X] Show how it downloaded a file - [X] Open it in a text editor, show how the text is there. =C-f=. - We eliminated the web browser - [X] =grep word file.html= - We have just eliminated =Ctrl+F= in the editor. - [X] But it's too verbose, so show =grep -q && echo=. - Talk about echo and exit status briefly, but don't go into exit codes. - [X] The goal is to output a URL to a file - Introduce redirection (overwrite and append) - [X] Now we have duplication: URL in two places - Introduce variables - [X] Eliminate the temporary file entirely using a pipe - [X] Alias =fetch-url= for =wget -qO-=. - [X] Introduce the Unix philosophy *** DRAFT Secrets? 
:B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: # Slide intentionally left blank **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:20 :END: So what if I told you that, while the average user is cursing me out for sending them 1000 URLs, I could go grab some coffee and play with my kids and come back however much time later to a list that has been generated for me, and it'd still be done before the user has even had a chance to open all of the URLs, let alone check them? And what if I told you that it'd only take a minute or two for me to create this process? This is where the whole concept of ``wizardry'' comes in. Some of you are sitting in the audience or watching this remotely rolling your eyes thinking ``oh this guy thinks he's so sweet'', because the answer is obvious to you. But to those of you who are confined to the toolset that I just demonstrated...it's /not/ going to be obvious. You may still be thinking in terms of that toolset---thinking of how we can continue to use those same tools. You don't know what you don't know. The problem is that there is a whole world and way of computing that is hidden from most users. And it's not hidden because it's a secret. I'd hate to disappoint you, but there's no magic or secrets in this talk. It's because modern interfaces have come to completely mask it or provide alternatives to it that happen to be ``good enough'' for the job. But ``good enough'' is only good enough until it's not. *** DRAFT Lifting the Curtain :B_frame: :PROPERTIES: :BEAMER_env: frame :END: #+BEAMER: \only<1>{ #+ATTR_LATEX: :width 2in [[file:images/ff-inspect-menu.png]] #+BEAMER: } #+BEAMER: \only<2>{ [[file:images/ff-inspect.png]] #+BEAMER: } # ATTR_LATEX does not appear to work with :trim or :clip #+BEAMER: \only<3>{\begin{center} #+BEAMER: \includegraphics[width=0.9\linewidth,trim={0px 0px 350px 375px},clip]{images/ff-inspect.png} #+BEAMER: \end{center}} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:00 :END: Let's lift the curtain, so to speak, on what's really going on in the web browser. Don't worry, we're only going to give it a little peek; nothing too complicated. Take the LibrePlanet 2019 speaker list page for instance. If you right-click on my name and click on ``Inspect Element'', you are presented with the developer tools for this browser which shows what makes up the webpage. You'll notice that it has a hierarchical structure made up of nodes describing how certain data ought to be formatted. For example, my name is in an =h2= element, which represents a heading. Below that we see the talk title and then the talk abstract in a paragraph element, denoted by =p=. If we look in that bottom =p= element, you'll see the exact text that's displayed on the webpage. And sure enough, that text contains the term ``free software''. Now, technically, this inspector shows something called the DOM, which represents the current state of the page. If the webpage were dynamic, then you'd see this updating. We could even change it in here and the page would update. But you don't need to use a specialized tool to view the structure of the initial webpage; I just did this for visualization.
*** DRAFT Web Page Source Code :B_block: :PROPERTIES: :BEAMER_env: block :END: **** Columns :B_columns: :PROPERTIES: :BEAMER_env: columns :END: ***** Left :B_column: :PROPERTIES: :BEAMER_env: column :BEAMER_col: 0.30 :END: [[file:images/ff-view-source-menu.png]] ***** Right :B_column: :PROPERTIES: :BEAMER_env: column :BEAMER_col: 0.70 :END: [[file:images/ff-view-source.png]] **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:35 :END: If we instead select ``View Page Source'' from the context menu, then we get a new tab that contains a much uglier version of what we just saw in the DOM inspector. This is the raw source code of the webpage. Most of it, anyway. It is a document language called HTML. And as you may have noticed, it's plain text. Structured, but plain, text. And as you can see, if we hit =Ctrl+F=, ``free software'' is there all the same. We don't need to view the webpage with all its fancy formatting. For the problem we're trying to solve, the graphical representation provides little benefit. *** DRAFT Text :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \fullslidetext Text. **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:35 :END: As we're about to see, this simple fact---that the webpage is represented by plain text---opens up a whole new world to us. We have stripped away all the complex visual GUI stuff and we're left with the raw substance of the page which still contains the information that we are looking for. But we're still within the web browser. We don't have to be. We can copy all of that text and paste it into our editor. =Ctrl+A Ctrl+C Alt+Tab Ctrl+V=. And sure enough, search works all the same. =Ctrl+F= and we can still find ``free software''. Completely different program, and we can still find the text using the same keybinding. *** DRAFT Text is a Universal Interface :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \fullslidetext Text is a Universal Interface **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:20 :END: Text is a universal interface. And what I mean by that is---you don't need any special tools to work with it. You can view it in your web browser. You can view it in your text editor. You can paste it in a text message. You can put it in a book. You can write it down on paper and type it back into your computer. Text is how we communicate with one-another as human beings. Let's save this HTML as a file, =speakers.html=. If we opened this file, it would open in our web browser and we would see the same webpage, although it would look a bit different since a lot of the styling is stored outside of this HTML file, and there won't be any images. But this isn't a talk about web development so I'm not going to go deeper than that. But if, again, we opened this HTML file in our text editor, you would see that same plain text HTML as before; one program just chooses to render it differently than another. Even though we can view the HTML in our text editor, we haven't yet eliminated the web browser; we still need it to navigate to the webpage and view its source. But if that's all we're using the web browser for, then it's one hell of an inefficient way of telling the computer that we just want the HTML document at a certain URL. Up until this point, the keyboard has been used as a secondary interface---as an /alternative/ to something. Now we're going to venture into a world where it is /the/ interface.
*** DRAFT The Shell Command Prompt :B_frame: :PROPERTIES: :BEAMER_env: frame :END: #+BEGIN_SRC sh mikegerwitz@lp2019-laptop:~$ # ^ user ^ host ^ working directory (home) #+END_SRC #+BEAMER: \begin{uncoverenv}<2-> #+BEAMER: \subskip This presentation will show: #+BEAMER: \medskip #+BEGIN_SRC $ command output line 1 output line 2 ... output line N #+END_SRC #+BEAMER: \end{uncoverenv} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:45 :END: If you open a terminal, also called a VTE, or virtual terminal emulator, you will be greeted with a curious string of characters. This is a /command prompt/. The program that is prompting you for a command is called the /shell/. The GNU shell is =bash=, which is the default on most GNU/Linux systems. It's also the default on Mac OSX, if you happen to be using that, though I recommend against it. And Windows now has something they call ``Bash on Ubuntu on Windows'', which is GNU/Linux running atop of the Windows kernel. Of course a fully free GNU/Linux system is better. Bash isn't required to run any of the commands I'm presenting as part of this talk, but I will be mentioning some features specific to bash, which I'll note when I do in case you happen to be using a different shell. *** DRAFT Eliminating the Web Browser :B_frame: :PROPERTIES: :BEAMER_env: frame :END: #+BEGIN_SRC sh $ wget https://libreplanet.org/2019/speakers/ #+END_SRC #+BEAMER: \begin{uncoverenv}<2-> #+BEGIN_SRC sh --2019-03-24 00:00:00-- https://libreplanet.org/2019/speakers/ Resolving libreplanet.org (libreplanet.org)... 209.51.188.248 Connecting to libreplanet.org (libreplanet.org)|209.51.188.248|:443... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: ‘index.html’ ... 2019-03-24 00:00:00 (1.78 MB/s) - ‘index.html’ saved [67789] #+END_SRC #+BEAMER: \end{uncoverenv} #+BEAMER: \begin{uncoverenv}<3> #+BEGIN_SRC sh $ wget -O speakers.html \ https://libreplanet.org/2019/speakers/ #+END_SRC #+BEAMER: \end{uncoverenv} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:00 :END: Alright! The goal is to retrieve the HTML file at a given URL. GNU/Linux distributions usually come with GNU =wget=, which does precisely that. To invoke it, we type the name of the command, followed by a space, followed by the URL we wish to retrieve, and then hit enter. What follows is quite a bit of text. The details aren't particularly important as long as it's successful, but notice that it says it saved to =index.html=. That's not intuitive to those who don't understand why that name was used. So let's tell =wget= what file we want to output to. We do this with the =O= option, like so. It takes a single argument, which is the name of the output file. The backslash here allows us to continue the command onto the next line; otherwise, a newline tells the shell to execute the command. So remember previously that we manually created =speakers.html= by viewing the source of the webpage in Icecat. If we open this file, we'll find that it contains /exactly the same text/, and we never had to open a web browser. And we can search it all the same as before for ``free software''. *** DRAFT Browser vs. 
=wget= Comparison :B_frame: :PROPERTIES: :BEAMER_env: frame :END: #+BEGIN_SRC Ctrl+L ``https://libreplanet.org/2019/speakers/'' #+END_SRC #+BEAMER: \subskip #+BEGIN_SRC sh $ wget https://libreplanet.org/2019/speakers/ #+END_SRC **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:40 :END: This is a very different means of interacting with the computer, but if we compare this with the keyboard shortcut used previously, they are very similar. Not so scary, right? It's hard to imagine a more direct line of communication with the computer for downloading a webpage, short of reading your mind. It's true that you can save the HTML using Icecat by hitting =Ctrl+S=, but that saves a lot more than just the HTML page---it also saves all the images and stylesheets and other resources, which is much more than we need. You can do this in =wget= too, by the way. You can even archive an entire website for offline viewing by using =-m=. Just don't do that with this URL, or it'll also download the entire LibrePlanet wiki too, which isn't quite what you want. *** DRAFT Finding Text on the Command Line :B_frame: :PROPERTIES: :BEAMER_env: frame :END: #+BEGIN_SRC sh $ grep 'free software' speakers.html #+END_SRC #+BEAMER: \begin{uncoverenv}<2-> \vdots #+BEGIN_SRC

Mike Gerwitz is a free software hacker and activist with a focus on exclusively free software. Mike spends most of his free time with his #+END_SRC \vdots #+BEAMER: \end{uncoverenv} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:00 :END: Not having to open a web browser is nice, but having to run =wget= and then open the downloaded HTML file is a bit of a pain; is there a command that can help us there too? We want to know whether a page contains the term ``free software''. For that we use a tool called =grep=. The first argument to =grep= is the search string, and the remaining arguments---just one here---tell it where it should search. The first argument to =grep= is quoted because it contains a space, otherwise the shell would think our search phrase was only `free' and that the files we wanted to search were `software' and `speakers.html'. You'll get a bunch of output; I just included a small snippet here. But notice how it happens to include exactly the text we were looking at in the web browser. And with that we have replicated =Ctrl+F=. But did we do a good job conveying our thoughts to the machine? We just wanted to know whether the page /contains/ the phrase; we don't care to see it! So while we have efficiently conveyed a search string, we didn't receive an efficient reply---it's information overload. *** DRAFT A More Gentle Reply :B_frame: :PROPERTIES: :BEAMER_env: frame :END: #+BEGIN_SRC sh $ grep --quiet 'free software' speakers.html && echo yes yes #+END_SRC #+BEAMER: \subskip #+BEAMER: \begin{onlyenv}<2> #+BEGIN_SRC sh $ echo 'Hello, world!' Hello, world! #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<3> #+BEGIN_SRC sh $ grep --quiet 'open source' speakers.html || echo no no #+END_SRC #+BEAMER: \end{onlyenv} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:00 :END: First we tell =grep= to modify its behavior with the =--quiet= flag. You can also use the short form, which is just =-q=. Rather than outputting results, =grep= will exit silently and it will instead return a status to the shell that says whether or not the search failed. POSIX-like shells, like Bash, offer the ability to say ``run this command if the previous succeeds'', and this is done by putting two ampersands between the commands. The command to run if =grep= succeeds in finding a match is =echo=. All echo does is take its arguments and spit them right back out again as output. So this essentially states: ``search for `free software' in =speakers.html= and output `yes' if it is found''. Since =echo= is its own command, it also works by itself. Here's the classic ``hello, world'' program in shell. But if you recall our research task, it was to search for pages that do /not/ contain the term ``free software''. We can do that too, by using two pipes in place of two ampersands, which states: ``search for `free software' in =speakers.html= and output `no' if it fails''. You might be tempted to reach for =grep='s =-v= or =--invert-match= flag here, but that selects non-matching /lines/ rather than inverting the exit status, so it isn't a substitute for the two pipes. *** DRAFT Writing to Files (Redirection) :B_frame: :PROPERTIES: :BEAMER_env: frame :END: - Commands write to standard out (stdout) by default - /Output redirection/ writes somewhere else #+BEAMER: \subskip #+BEGIN_SRC sh # overwrites each time $ echo 'Hello, world!' > hello.txt $ echo 'Hello again, world!'
> hello.txt # appends (echo adds a newline) $ echo 'First line' >> results.txt $ echo 'Second line' >> results.txt # truncates file (empties) > results.txt #+END_SRC **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:55 :END: Alright, we're well on our way now! But we still haven't gotten rid of that damn text editor, because we need to save a list of URLs to a file to hold our final results! Well, as it so happens, writing to a file is such a common operation that it's built right into the shell. We use a feature called /redirection/. There are two types of output redirection. If you place a single greater-than symbol followed by a filename after a command, then the output of that command will replace anything already in the file. So the result of the first two commands will be a =hello.txt= that contains only a single line: ``Hello again, world!''. The second type, which uses /two/ greater-than symbols, /appends/ to the file. =echo= by default adds a newline, so the result of the second two commands is a =results.txt= containing two lines, ``First line'' and ``Second line'' respectively. If the file doesn't yet exist, an empty one will be created before writing. I think maybe you can see where I'm going with this. *** DRAFT Starting Our List :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEGIN_SRC sh $ wget --quiet -O speakers.html \ https://libreplanet.org/2019/speakers/ \ && grep --quiet 'free software' speakers.html \ || echo https://libreplanet.org/2019/speakers/ \ >> results.txt #+END_SRC **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:10 :END: Take a look at that for a moment. // Can anyone tell me what the result of this command line will be? // // As exciting as it is to start to bring these things together, the result is pretty anticlimactic---we know that =speakers.html= /does/ contain the string ``free software'', and so the result is that =results.txt= contains /nothing/! In fact, if =results.txt= didn't exist yet, it still wouldn't even exist. // At this point, we have successfully eliminated both the web browser and text editor. But this is a hefty command to have to modify each time we want to try a different URL.
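/(Possible aside, time permitting: contrast this with a page that does not contain the phrase, so that =results.txt= visibly gains a line. A minimal sketch follows; the URL and the =article.html= filename are only placeholders, not one of the real articles.)/

#+BEGIN_SRC sh
# hypothetical page, assumed *not* to contain ``free software''
$ wget --quiet -O article.html https://example.com/some-article \
    && grep --quiet 'free software' article.html \
    || echo https://example.com/some-article \
    >> results.txt
$ cat results.txt
https://example.com/some-article
#+END_SRC

The structure is identical to the slide; only the URL and output filename change. The URL still appears twice, which is exactly the duplication the next slide factors out.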
*** DRAFT Command Refactoring :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \begin{onlyenv}<6> #+BEGIN_SRC sh $ wget --quiet -O speakers.html \ https://libreplanet.org/2019/speakers/ \ && grep --quiet 'free software' speakers.html \ || echo https://libreplanet.org/2019/speakers/ \ >> results.txt #+END_SRC \subskip #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<1> #+BEGIN_SRC sh $ URL=https://libreplanet.org/2019/speakers/ $ wget --quiet -O speakers.html \ "$URL" \ && grep --quiet 'free software' speakers.html \ || echo "$URL" \ >> results.txt #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<2> #+BEGIN_SRC sh $ URL=https://libreplanet.org/2019/speakers/ $ wget -qO speakers.html \ "$URL" \ && grep -q 'free software' speakers.html \ || echo "$URL" \ >> results.txt #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<3> #+BEGIN_SRC sh $ URL=https://libreplanet.org/2019/speakers/ $ wget -qO - \ "$URL" \ | grep -q 'free software' \ || echo "$URL" \ >> results.txt #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<4> #+BEGIN_SRC sh $ URL=https://libreplanet.org/2019/speakers/ $ wget -qO - "$URL" \ | grep -q 'free software' || echo "$URL" >> results.txt #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<5-> #+BEGIN_SRC sh $ alias fetch-url='wget -qO-' $ URL=https://libreplanet.org/2019/speakers/ $ fetch-url "$URL" \ | grep -q 'free software' || echo "$URL" >> results.txt #+END_SRC #+BEAMER: \end{onlyenv} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:02:00 :END: We can simplify it by introducing a /variable/. First we assign the URL to a variable named URL. There must be no spaces on either side of the assignment operator, which is the equal sign. We then reference its value by prefixing it with a dollar sign everywhere the URL previously appeared. You should always put variable references in double quotes for safety if they are intended to represent a single argument---that ensures that, if our variable contains a space or other special character, it isn't interpreted by the shell differently than we intended. We can also make this command line a bit more concise by using the short name for the =--quiet= flag, which is =-q=. Notice how in =wget= I combined them into =-qO= instead of using two separate dashes with spaces between them. This is optional; if you feel it's easier to read the other way, that's fine. Something else feels dirty. We're creating this =speakers.html= file just to pass to =grep=. It's not needed after the fact. In fact, it's just polluting our filesystem. What if we didn't have to create it at all to begin with? I'm first going to introduce the notation, and then I'll go into why it works. If we replace the output file =speakers.html= with a single dash, that tells =wget= to write to standard out. This is normally the default behavior of command line programs, like =grep= and =echo=, but =wget= is a bit different. We then omit the =speakers.html= from =grep= entirely. =grep= will read from standard in by default. We then connect standard out of =wget= to the standard in of =grep= using a single pipe; this is called a /pipeline/. Now that we've freed up some characters, let's reformat this slightly to be a bit more readable. And that =wget= command looks a bit cryptic. How about we define an alias so that it looks a bit more friendly, and then we can stop worrying about what it does? Now here's the original command we started with, and where we're at now.
This little bit of abstraction has made our intent even more clear. It can now clearly be read that we're defining a URL, retrieving that URL, searching for a term, and then appending it to a file on a non-match. But before we keep going, I want to go back to a point I mentioned previously. *** DRAFT Again: Text is a Universal Interface :B_againframe: :PROPERTIES: :BEAMER_env: againframe :BEAMER_ref: *Text is a Universal Interface :BEAMER_act: :END: **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:20 :END: Text is a universal interface. Notice how we started out our journey manually inspecting text, and began replacing the human part of the workflow at each step with a command. That's because text is something that both humans and computers can work with easily. This is a fundamental design principle in the Unix tools that I have begun to present to you. *** DRAFT Pipelines :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \fullslidetext /``Expect the output of every program to become the input to another''/ #+BEAMER: \fullsubtext\hfill ---Doug McIlroy (1978) **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:15 :END: The invention of the Unix pipe is credited to Doug McIlroy. As part of the Unix philosophy, he stated: ``expect the output of every program to become the input to another''. More broadly, the Unix philosophy can be summarized as: *** DRAFT Summary of the Unix Philosophy :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :DURATION: 0:00:30 :END: #+BEAMER: \begingroup\fullslidetext The Unix Philosophy #+BEAMER: \endgroup\subskip #+BEGIN_QUOTE This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface. #+BEAMER: \smallsubskip\hfill---Doug McIlroy #+END_QUOTE **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:30 :END: // Up to this point, we have shifted how we communicate with the machine by moving away from a visual interface driven primarily by movement, to a textual interface that puts mind and machine on equal footing. And now here we're talking about another profound shift in how we think. We start to think of how to decompose problems into small operations that exist as part of a larger pipeline. We think of how to chain small, specialized programs together, transforming text at each step to make it more suitable for the next. ** DRAFT Program Composition [0/10] *** Composition Topics [6/6] :noexport: - [X] Clarify how pipelines work with existing =wget | grep=. - [X] More involved pipeline with more than two programs. - [X] Emphasize iterative development and how the shell is a REPL. - Useful for programmers for prototyping and debugging, but also essential to average users for discovery. - Evolve by making portions of command dynamic (variables, subshells) - [X] Now script discovering what pages contain a certain word [3/3] - [X] Mention previous example of being emailed a list of URLs. Rather than pasting them into a file, let's discover them using the same tool: =grep=. Save email to a file. - [X] =grep -o 'https\?://[^ ]\+'=, show how it gets a list of URLs. - [X] Introduce =while= and =read=, showing how we can iteratively run commands, maybe =... | while read url; do echo "URL: $url"; done=. - Not enough talk time to walk away and get a coffee, but let's see if we can maybe let it run for 10--30s while I blabber on. 
Depends on the connection speed at MIT with all the participants. - [X] Extract =url-grep= into script. - [X] Demonstrate running jobs in parallel with =xargs=. *** DRAFT Clarifying Pipelines :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEGIN_SRC sh $ fetch-url https://libreplanet.org/2019/speakers/ \ | head -n3 #+END_SRC #+BEAMER: \begin{uncoverenv}<2-> #+BEGIN_SRC sh $ fetch-url https://libreplanet.org/2019/speakers/ \ | wc -l 1422 #+END_SRC #+BEAMER: \end{uncoverenv} #+BEAMER: \begin{uncoverenv}<3-> #+BEGIN_SRC sh $ fetch-url https://libreplanet.org/2019/speakers/ \ | grep 'free software' \ | wc -l 38 #+END_SRC #+BEAMER: \end{uncoverenv} #+BEAMER: \begin{uncoverenv}<4-> #+BEGIN_SRC sh $ fetch-url https://libreplanet.org/2019/speakers/ \ | grep 'free software' \ | tail -n1 Expat. The JavaScript is free software with #+END_SRC #+BEAMER: \end{uncoverenv} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:45 :END: Let's observe the profound consequences of these design decisions. Remember that standard out is displayed to us on the terminal by default. If we were to just run that =wget= command and nothing else, we'd be spammed with output. But I just want to prove to you that HTML is being output to standard out; I don't want a ton of spam. So we can pipe it to =head= to show only the first few lines. But how many lines did we save ourselves? We can pipe it to =wc= instead, which stands for ``word count'', and tell it to count the number of newlines with =-l=. What about the number of lines that contain the string ``free software''? Or how about the last such line? It's all a simple matter of composing existing programs with pipes. *** DRAFT Tor :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEGIN_SRC sh $ alias fetch-url='torify wget -qO-' #+END_SRC **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:20 :END: By the way, retrieving a bunch of URLs in an automated manner may be a privacy concern for you. You can easily send all these requests through Tor, assuming it is installed and the daemon running, by prefixing =wget= with =torify=. Since we abstracted our fetching away into the =fetch-url= alias, our previous examples continue to work as-is. *** DRAFT LP Sessions :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ fetch-url https://libreplanet.org/2019/speakers/ \ | grep -A5 speaker-header \ | head -n14

Bdale Garbee

Closing keynote

--

Tarek Loubani

Opening keynote (Day 1)

-- #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ fetch-url https://libreplanet.org/2019/speakers/ \ | grep -A5 speaker-header \ | grep ''

Closing keynote

Opening keynote (Day 1)

Opening keynote (Day 2)

[...]

The Tor Project: State of the Onion and Library Freedom Institute: A new hope

The Tor Project: State of the Onion

[...]

Large-scale collaboration with free software

Large-scale collaboration with free software

#+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ fetch-url https://libreplanet.org/2019/speakers/ \ | grep -A5 speaker-header \ | grep -o '[^<]\+' Closing keynote Opening keynote (Day 1) Opening keynote (Day 2) [...] The Tor Project: State of the Onion Library Freedom Institute: A new hope The Tor Project: State of the Onion [...] Large-scale collaboration with free software Large-scale collaboration with free software #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ fetch-url https://libreplanet.org/2019/speakers/ \ | grep -A5 speaker-header \ | grep -o '[^<]\+' \ | sort \ | uniq -cd 2 Hackerspace Rancho Electrónico 4 Large-scale collaboration with free software 2 Library Freedom Institute: A new hope 2 Right to Repair and the DMCA 2 Teaching privacy and security via free software 2 The joy of bug reporting 5 The Tor Project: State of the Onion #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ fetch-url https://libreplanet.org/2019/speakers/ \ | grep -A5 speaker-header \ | grep -o '[^<]\+' \ | sort \ | uniq -cd \ | sort -nr \ | head -n5 5 The Tor Project: State of the Onion 4 Large-scale collaboration with free software 2 The joy of bug reporting 2 Teaching privacy and security via free software 2 Right to Repair and the DMCA #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ fetch-url https://libreplanet.org/2019/speakers/ \ | grep -A5 speaker-header \ | grep -o '[^<]\+' \ | sort \ | uniq -cd \ | sort -nr \ | head -n5 \ | sed 's#^ *\(.\+\) \(.*\)#\2 has \1 speakers#g' The Tor Project: State of the Onion has 5 speakers Large-scale collaboration with free software has 4 speakers The joy of bug reporting has 2 speakers Teaching privacy and security via free software has 2 speakers Right to Repair and the DMCA has 2 speakers #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ fetch-url https://libreplanet.org/2019/speakers/ \ | grep -A5 speaker-header \ | grep -o '[^<]\+' \ | sort \ | uniq -cd \ | sort -nr \ | head -n5 \ | sed 's#^ *\(.\+\) \(.*\)#\2 has \1 speakers#g' | espeak #+END_SRC #+BEAMER: \end{onlyenv} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:02:50 :END: How about something more involved. I noticed that some LibrePlanet sessions had multiple speakers, and I wanted to know which ones had the /most/ speakers. The HTML of the speakers page includes a header for each speaker. Here are the first two. These are keynote speakers, but there's also non-keynote ones that are just =speaker-header=. Let's get just the talk titles that those speakers are associated with. Looking at this output, we see that the talks titles have an =em= tag, so let's just go with that. Pipe to =grep= instead of =head=. It looks like at least one of those results has /multiple/ talks. But note that each is enclosed in its own set of =em= tags. If we add =-o= to =grep=, which stands for /only/, then it'll only return the portion of the line that matches, rather than the entire line. Further, if there are multiple matches on a line, it'll output each match independently on its own line. That's exactly what we want! But we have to modify our regex a little bit to prevent it from grabbing everything between the first and /last/ =em= tag, by prohibiting it from matching on a less than character in the title. Don't worry if you don't understand the regular expression; they take time to learn and tend to be easier to write than they are to read. 
This one just says ``match one or more non-less-than characters between =em= tags''. =grep= actually supports three flavors of regular expressions; if you used Perl's with =-P=, it'd be even simpler to write, but I show the POSIX regex here for portability since Perl regexes aren't available on all systems. Now assuming that the talk titles are consistent, we can get a count. =uniq= has the ability to count consecutive lines that are identical and output that count. We also use =-d= to tell it to only output duplicate lines. But =uniq= doesn't sort lines before processing, so we first pipe it to =sort=. That gives us a count of each talk! But I want to know the talks with the /most/ speakers, so let's sort it /again/, this time numerically and in reverse order, and take the top five. And we have our answer! But just for the hell of it, let's go a step further. Using =sed=, which stands for /stream editor/, we can match on portions of the input and reference those matches in a replacement. So we can reformat the =uniq= output into an English sentence, like so. =sed= is actually a Turing-complete programming language, but it is often used in pipelines with inline scripts like this. I chose pound characters to delimit the match from the replacement. The numbers in the replacement reference the parenthesized groups in the match. And then we're going to pipe it to the program =espeak=, which is a text-to-speech synthesizer. Your computer will speak the top five talks by presenter count to you. Listening to computers speak is all the rage right now, right? *** DRAFT Interactive, Incremental, Iterative Development :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \fullslidetext Incremental Development #+BEAMER: \fullsubtext Interactive REPL, Iterative Decomposition **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:10 :END: Notice how we approached that problem. I presented it here just as I developed it. I didn't open my web browser and inspect the HTML; I just looked at the =wget= output and then started to manipulate it in useful ways working toward my final goal. This is just /one/ of the many ways to write it. And this is part of what makes working in a shell so powerful. In software development, we call environments like this REPLs, which stands for ``read-eval-print loop''. The shell reads a command line, evaluates it, prints a result, and then does that all over again. This allows me, as a hacker, to easily inspect and iterate on my script in real time, which can be a very efficient process. I can quickly prototype something and then clean it up later. Or maybe create a proof-of-concept in shell before writing the actual implementation in another language. But most users aren't programmers. They aren't experts in these commands; they have to play around and discover as they go. And the shell is perfect for this discovery. If something doesn't work, just keep trying different things and get immediate feedback! And because we're working with text as data, a human can replace any part of this process!
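That last point is easy to try for yourself. Here is a small experiment of my own (not from the slides): run the counting stage with no input file at all and type the data by hand, ending with =Ctrl+D=; =sort= and =uniq= cannot tell a keyboard from another program's output.

#+BEGIN_SRC sh
$ sort | uniq -c
apple
banana
apple
      2 apple
      1 banana
#+END_SRC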
*** DRAFT Discovering URLs :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ grep -o 'https\?://[^ ]\+' email-of-links.txt https://en.wikipedia.org/wiki/Free_software https://en.wikipedia.org/wiki/Open_source https://en.wikipedia.org/wiki/Microsoft https://opensource.org/about #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ grep -o 'https\?://[^ ]\+' email-of-links.txt \ | while read URL; do echo "URL is $URL" done URL is https://en.wikipedia.org/wiki/Free_software URL is https://en.wikipedia.org/wiki/Open_source URL is https://en.wikipedia.org/wiki/Microsoft URL is https://opensource.org/about #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ grep -o 'https\?://[^ ]\+' email-of-links.txt \ | while read URL; do fetch-url "$URL" | grep -q 'free software' \ || echo "$URL" >> results.txt done #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ grep -o 'https\?://[^ ]\+' email-of-links.txt \ | while read URL; do fetch-url "$URL" | grep -q 'free software' \ || echo "$URL" | tee -a results.txt done https://en.wikipedia.org/wiki/Microsoft https://opensource.org/about #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ grep -o 'https\?://[^ ]\+' email-of-links.txt \ | while read URL; do fetch-url "$URL" | grep -q 'free software' \ || echo "$URL" | tee -a results.txt done \ | xclip -i -selection clipboard #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ grep -o 'https\?://[^ ]\+' email-of-links.txt \ | while read URL; do fetch-url "$URL" | grep -q 'free software' \ || echo "$URL" | tee -a results.txt done \ | tee >( xclip -i -selection clipboard ) https://en.wikipedia.org/wiki/Microsoft https://opensource.org/about #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ xclip -o -selection clipboard \ | grep -o 'https\?://[^ ]\+' \ | while read URL; do fetch-url "$URL" | grep -q 'free software' \ || echo "$URL" | tee -a results.txt done \ | tee >( xclip -i -selection clipboard ) https://en.wikipedia.org/wiki/Microsoft https://opensource.org/about #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ xclip -o -selection clipboard \ | grep -o 'https\?://[^ ]\+' \ | while read URL; do fetch-url "$URL" | grep -q 'free software' \ || echo "$URL" | tee -a results.txt done https://en.wikipedia.org/wiki/Microsoft https://opensource.org/about $ xclip -i -selection clipboard < results.txt #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh $ xclip -o -selection clipboard \ | grep -o 'https\?://[^ ]\+' \ | while read URL; do fetch-url "$URL" | grep -q 'free software' \ || tee -a results.txt <<< "$URL" done https://en.wikipedia.org/wiki/Microsoft https://opensource.org/about $ xclip -i -selection clipboard < results.txt #+END_SRC #+BEAMER: \end{onlyenv} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:02:50 :END: Okay, back to searching webpages. Now that we have a means of creating the list of results, how do we feed the URLs into our pipeline? Why not pull them right out of the email with =grep=? Let's say you saved the email in =email-of-links.txt=. This simple regex should grab most URLs for both HTTP and HTTPS protocols, but it's far from perfect. For example, it'd grab punctuation at the end of a sentence. But we're assuming a list of URLs. Here's some example output with a few URLs. 
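As an aside, this assumes a tidy list of URLs; if the email were prose instead, one crude refinement of my own (not part of the pipeline shown here) would be to trim trailing punctuation from each match before going any further:

#+BEGIN_SRC sh
# Hypothetical cleanup step: strip punctuation that the URL regex may have
# captured from surrounding prose (crude: it would also strip a ")" that
# genuinely belongs to a URL).
$ grep -o 'https\?://[^ ]\+' email-of-links.txt \
    | sed 's/[.,;:)]*$//'
#+END_SRC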
For each of these, we need to run our pipeline. It's time to introduce =while= and =read=. =while= will continue to execute its body in a loop until its command fails. =read= will read line-by-line into one or more variables, and will fail when there are no more lines to read. So if we insert our =fetch-url= pipeline into the body, we get this. But if we just redirect output into =results.txt=, we can't see the output unless we inspect the file. For convenience, let's use =tee=, which is named for a pipe tee; it'll send output through the pipeline while also writing the same output to a given file. The =-a= flag tells it to /append/ rather than overwrite. So now we can both observe the results /and/ have them written to a file! But we were just going to reply to an email with those results. Let's assume we're still using a GUI email client. Wouldn't it be convenient if those results were already on the clipboard for us so we can just paste them into the message? We can accomplish that by piping to =xclip= as shown here. There's also the program =xsel=, which I typically use because its arguments are far more concise, but I don't show it here. Ah, crap, but now we can't see the output again. So let's use =tee= again. But rather than outputting to a file on disk, we're going to use a special notation that tells bash to invoke a command in a subshell and replace that portion of the command line with a path to a virtual file representing the standard input of that subshell. Now we can see the output again! Well, if we're /writing/ to the clipboard, why don't we just /read/ from it too? Instead of saving our mail to a file, we can just copy the relevant portion and have that piped directly to =grep=! If you have a list of URLs and you just copy that portion, then you can just get rid of =grep= entirely. Because we're writing to =results.txt=, another option is to just let this run and copy to the clipboard at a later time. We can do that by reading =results.txt= in place of standard input to =xclip=, as shown here. And while we're at it, here's a special notation to get rid of =echo= for the =tee= in the body of =while=: three less-than symbols provide the given string on standard in. This is a bash-specific feature. Phew! *** DRAFT Go Grab a Coffee :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEAMER: \fullslidetext Go Grab a Coffee **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:15 :END: Remember when I said I could go grab a coffee and play with the kids while the script did its thing? Well, now's that time. But having time to grab a coffee means that this process is a bottleneck. The Internet is fast nowadays; ideally, we wouldn't have to wait long. Can we do better? *** DRAFT Async Processes :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: #+BEGIN_SRC sh $ sleep 1 && echo done & echo start start done #+END_SRC #+BEAMER: \begin{uncoverenv}<2>\subskip /(Don't do this for large numbers of URLs!)/ #+BEGIN_SRC sh $ while read URL; do fetch-url "$URL" | grep -q 'free software' \ || tee -a results.txt <<< "$URL" & done #+END_SRC #+BEAMER: \end{uncoverenv} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:00 :END: Indeed we can. This process is executing serially---one URL at a time, waiting for one to complete before checking another. What if we could query multiple URLs in parallel?
Shells have built-in support for backgrounding tasks so that they can run while you do other things; all you have to do is place a single ampersand at the end of a command. So in this example, we sleep for one second and then echo ``done''. But that sleep and subsequent echo are put into the background, and the shell proceeds to execute =echo start= while =sleep= is running in the background. One second later, it outputs ``done''. So here's the loop we were just writing. If we add an ampersand at the end of that pipeline, it'll run in the background and immediately proceed to the next URL, executing the loop again. But there's a problem with this approach. Sure, it's fine if we only have a few URLs. But what if we have 1000? Do we really want to spawn 1000s of processes and make 1000 network requests at once? That isn't efficient, and it's a bit rude to DoS servers. *** DRAFT Executable Shell Script and Concurrency :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :END: **** =url-grep= :B_block: :PROPERTIES: :BEAMER_env: block :END: #+BEGIN_SRC sh #!/bin/bash #+END_SRC #+BEAMER: \begin{onlyenv}<+> #+BEGIN_SRC sh search="$1" url="$2" #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<+-> #+BEGIN_SRC sh search="${1?Missing search string}" url="${2?Missing URL}" #+END_SRC #+BEAMER: \end{onlyenv} #+BEGIN_SRC sh wget -qO- "$url" \ | grep -q "$search" || echo "$url" #+END_SRC **** Execute :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEGIN_SRC sh $ chmod +x url-grep #+END_SRC #+BEAMER: \begin{onlyenv}<1-2> #+BEGIN_SRC sh $ while read URL; do ./url-grep 'free software' "$URL" >> results.txt done #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<3> #+BEGIN_SRC sh $ xargs -n1 ./url-grep 'free software' > results.txt #+END_SRC #+BEAMER: \end{onlyenv} #+BEAMER: \begin{onlyenv}<4> #+BEGIN_SRC sh $ xargs -n1 -P5 ./url-grep 'free software' > results.txt # ^ 5 concurrent processes #+END_SRC #+BEAMER: \end{onlyenv} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:50 :END: Before we continue, we're going to have to write our pipeline in a way that other programs can run it. Up to this point, the program has just been embedded within an interactive shell session. One of the nice things about shell is that you can take what you entered onto the command line and paste it directly into a file and, with some minor exceptions, it'll work all the same. Let's take our pipeline and name it =url-grep=. Aliases only work in interactive sessions by default, so we're going to just type =wget= directly here. Alternatively, you can define a function. We use the positional parameters =1= and =2= here to represent the respective arguments to the =url-grep= command. The comment at the top of the file is called a ``shebang''. This is used by the kernel so that it knows what interpreter to use to run our program. To make it executable, we use =chmod= to set the executable bits on the file. We can then invoke it like any other executable. If it were in our =PATH=, which isn't something I'm going to get into here, you'd be able to run it like any other command without having to prefix it with =./=. We can also do a primitive form of error handling and documentation by modifying our positional parameters like so, which will show an error message if we don't specify one of them. Now we replace the =while= loop with =xargs=. It takes values from standard in and appends them as arguments to the provided command.
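A toy comparison of my own, using =echo= as a stand-in for =url-grep=, shows what that means in practice, with and without =-n1=:

#+BEGIN_SRC sh
# Without -n1, xargs appends every input value to a single invocation:
$ printf '%s\n' one two three | xargs echo url-grep
url-grep one two three

# With -n1, it runs the command once per input value:
$ printf '%s\n' one two three | xargs -n1 echo url-grep
url-grep one
url-grep two
url-grep three
#+END_SRC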
We specify =-n1= to say that only one argument should be read from stdin for any invocation of the command; that makes it run a new command for every line of input. Otherwise it'd just append N URLs as N arguments. And now we can simply use =-P= to tell it how many processes to use at once. Here we specify =5=, meaning =xargs= will run five processes at a time. You can change that to whatever number makes sense for you. *** DRAFT Again: A Research Task :B_againframe: :PROPERTIES: :BEAMER_env: againframe :BEAMER_ref: *A Research Task :BEAMER_act: :END: **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:00:15 :END: So this was the research task that we started with. If I were to approach this problem myself, I'd write a quick-and-dirty script that is just throwaway, because it's such a simple problem. So, let's combine everything we've seen so far: *** DRAFT A Quick-n-Dirty Solution :B_frame: :PROPERTIES: :BEAMER_env: frame :END: #+BEGIN_SRC sh $ echo 'wget -qO- "$2" | grep -q "$1" || echo "$2"' > url-grep $ xargs -n1 -P10 bash url-grep 'free software' < url-list | mail -s 'URLs not mentioning free software' mtg@gnu.org #+END_SRC #+BEAMER: \begin{onlyenv}<2>\subskip #+BEGIN_SRC sh $ wc -l url-list 1000 $ time xargs -n1 -P10 bash url-grep 'free software' < url-list real 0m17.548s user 0m8.283s sys 0m4.877s #+END_SRC #+BEAMER: \end{onlyenv} **** Notes :B_noteNH: :PROPERTIES: :BEAMER_env: noteNH :DURATION: 00:01:10 :END: I'd first echo the pipeline into =url-grep=. Instead of making it executable, I'll just pass it as an argument to =bash=, which saves me a step; it's a temporary file anyway. I used 10 processes instead of 5. And then to top it all off, if you have an MTA configured on your system, we can just pipe the output to the =mail= command to send that URL list directly to me. It only takes a minute or two to come up with this script. But how long does it take to run? I took a few URLs and just repeated them in a file so that I had 1000 of them. Running the =xargs= command, it finishes in under 18 seconds on my system at home. Obviously YMMV, and certain sites may be slower to respond than others. So in only a couple of minutes, the task has been automated away and completed, all by gluing together existing programs. You don't need to be a programmer to know how to do this; you just need to be familiar with the tools and know what's possible, which comes with a little bit of practice. This is certainly an efficient means of communicating with the machine. We've come a long way from using the web browser and a mouse. ** Thank You :B_fullframe: :PROPERTIES: :BEAMER_env: fullframe :DURATION: 00:00:01 :END: #+BEGIN_COMMENT Thank you. #+END_COMMENT #+BEGIN_CENTER Mike Gerwitz [[mailto:mtg@gnu.org][=mtg@gnu.org=]] \bigskip Slides and Source Code Available Online <[[https://mikegerwitz.com/talks/cs4m]]> \bigskip \vfill Licensed under the Creative Commons Attribution ShareAlike 4.0 International License #+END_CENTER * Exporting :noexport: For a non-interactive build, including all dependencies, simply run =make=. Once all dependencies are built, you should be able to simply export this buffer as a Beamer presentation (=C-c C-e l P=) to get an updated PDF (or you can just run =make= again). * Copyright and Licenses :noexport: This file Copyright (C) 2019 Mike Gerwitz. This file is licensed under the Creative Commons Attribution ShareAlike 4.0 International License. See [[file:COPYING.CCBYSA]] for the full license text.
Additionally, all code fragments are dual-licensed as follows: All code fragments are free software: you can redistribute them and/or modify them under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program (see [[file:COPYING]]). If not, see <[[https://www.gnu.org/licenses/]]>. * Local Variables :noexport: # Local Variables: # org-todo-keyword-faces: (("DRAFT" . org-upcoming-deadline) \ # ("DEVOID" . (:inherit org-warning \ # :inverse-video t)) \ # ("LACKING" . org-warning) \ # ("REVIEWED" . "yellow") \ # ("AUGMENT" . (:foreground "yellow" :bold t :underline t)) \ # ("READY" . (:inherit org-scheduled :bold t :underline t))) # eval: (add-to-list 'org-structure-template-alist # '("C" "#+BEGIN_COMMENT\n?\n#+END_COMMENT")) # End: