MozFest 2013: Journalists should command the command line

Journalists who want to learn more technology often jump into HTML, CSS and JavaScript. Those are great places to start (as Knight Lab and others have written before), but if you want to maximize the potential of your computer, one of the first things you should learn is the command line!

Some quick background: Regular computer users access the computer via a graphical user interface (GUI). This interface allows you to interact with the machine using a mouse and images on your screen to make the computer do what you want.

But you can also navigate and control your computer by using text-based commands on something called the command line. By firing up a program called Terminal (at least on Mac), you can enter text commands and navigate through your computer.

Noah Veltman, a 2013 Knight-Mozilla Fellow at BBC News, led a session at MozFest called “Solve A Murder Mystery on the Command Line.” The game involved a folder of .txt files with thousands of lines. To solve the murder mystery, we had to search through the files using the command line to pull out specific phrases.

After channeling my inner detective during the session, it occurred to me that I didn’t know how to use the command line until a couple months after I started programming. Once I started using the command line, I understood my machine much more and learned programming more quickly.

I talked to Noah after the session to gather a couple more ideas on why journalists should learn the command line.

It makes it easier to work with text data


When it comes to working with text data, especially large and messy files, command line tools are your best friends. Datasets might come to you as .csv files from a government agency, or as .txt files from local companies. The command line helps you navigate through these files and extract useful and interesting information quickly.

You might wonder, why not just open Excel or Access to look at the data? While Excel works great with a 50MB file, the software isn't nearly as nimble with a 5GB file. By way of example, Illinois School Board School Scorecard files include more than 10,000 columns and precious little metadata to explain what’s contained in each one. Files this big and complicated are hard to comprehend, and analyzing the data in them is a challenge.

With the command line, a few basic commands will help you analyze text data:

cat filename
The cat command lets you read a file. It prints the contents of a text file in the terminal. Type the word cat followed by a space and the name of the file.

grep option pattern filename
The grep command searches text for specific patterns. For example, to search for the word “corgi” in a file called corgi.txt, I would enter the command:
grep "corgi" corgi.txt

head option filename
The head command reads the first few lines of a file. For example, to look at the first 20 lines of a file, I would enter the command:
head -n 20 corgi.txt
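
To see the three commands side by side, here is a minimal sketch you could paste into Terminal. The sample file and its contents are made up for illustration; the script builds a small corgi.txt in a temporary folder so it doesn't touch any of your real files.

```shell
# Build a tiny sample file in a temporary folder (hypothetical data for illustration).
workdir=$(mktemp -d)
printf 'corgi,Pembroke\npoodle,Standard\ncorgi,Cardigan\nbeagle,Pocket\n' > "$workdir/corgi.txt"

# cat prints the whole file to the terminal.
cat "$workdir/corgi.txt"

# grep prints only the lines that contain the pattern.
matches=$(grep "corgi" "$workdir/corgi.txt")
echo "$matches"

# grep -c counts the matching lines instead of printing them.
match_count=$(grep -c "corgi" "$workdir/corgi.txt")
echo "$match_count"

# head -n 2 prints only the first two lines.
first_two=$(head -n 2 "$workdir/corgi.txt")
echo "$first_two"
```

On a real dataset you would skip the setup lines and point each command at your own file.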

It lets you use tools built just for journalists


Tons of tools built for journalists involve using the command line, if only during the installation phase. You’ll often hear that “software package X is exactly what you need” for a particular task, but then you wind up looking at a GitHub repo where you learn that installation of that software involves various command line operations.

csvkit, for example, converts files to .csv and helps you to clean up and standardize data. Using csvkit and the command line, you can filter a .csv down to a subset of columns, search and filter rows, join various .csv files, etc. These tasks are done by simple commands like in2csv, csvcut, csvgrep, etc. As opposed to copying and pasting from one Excel file to another, these short commands allow you to clean and organize your data in a matter of seconds.
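
As a rough sketch of what that looks like in practice, the commands below use csvcut and csvgrep on a small made-up file. The file name, the column names and the rows are all assumptions for illustration, and csvkit has to be installed separately (for example with pip install csvkit), so the script skips the csvkit steps if it can't find them.

```shell
# Hypothetical sample data; the school names and numbers are invented.
workdir=$(mktemp -d)
printf 'school,city,enrollment\nLincoln,Chicago,450\nWashington,Evanston,300\nRoosevelt,Chicago,520\n' > "$workdir/schools.csv"

if command -v csvcut >/dev/null 2>&1; then
  # csvcut -n lists the column names with their positions.
  csvcut -n "$workdir/schools.csv"

  # csvcut -c keeps only the named columns.
  csvcut -c school,city "$workdir/schools.csv" > "$workdir/subset.csv"
  cat "$workdir/subset.csv"

  # csvgrep -c picks a column and -m keeps rows whose value contains the string.
  chicago=$(csvgrep -c city -m Chicago "$workdir/schools.csv")
  echo "$chicago"
else
  echo "csvkit is not installed; skipping the csvkit commands"
fi
```

Listing the columns first with csvcut -n is a handy habit for very wide files, where opening the whole thing just to read the header is slow.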

It’s a gateway to full scripting languages


Noah referred to command line tools as “a gateway drug” to actually learning a full scripting language like Python. Once you begin to feel comfortable with a few basic commands, you can begin to combine them in all sorts of ways to get what you need even if you cannot write a custom data processing script.
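
One common way to combine commands is the pipe character (|), which hands the output of one command to the next. The sketch below is my own illustration, not something from Noah's session: it uses an invented notes file and chains grep, sort, uniq and head to find the name that shows up most often on the "suspect" lines.

```shell
# Hypothetical sample file standing in for a folder of notes.
workdir=$(mktemp -d)
printf 'witness: Smith\nsuspect: Jones\nwitness: Lee\nsuspect: Jones\nsuspect: Park\n' > "$workdir/notes.txt"

# Keep the suspect lines, sort them so duplicates sit together,
# count each distinct line, then sort by count with the largest first.
tally=$(grep "suspect" "$workdir/notes.txt" | sort | uniq -c | sort -rn)
echo "$tally"

# The most frequently named suspect is the first line of the tally.
top=$(echo "$tally" | head -n 1)
echo "$top"
```

Each step is a small command on its own; the pipeline is just those steps lined up in the order you would do them by hand.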

There are plenty of reasons to learn the command line. Hopefully your curiosity is sparked. If it is, click the links below to learn more:

http://cli.learncodethehardway.org/book/
http://www.youtube.com/watch?v=Fzn6jbaw6O0&feature=related
http://www.linuxjournal.com/content/downloading-entire-web-site-wget
http://csvkit.readthedocs.org/en/latest/
