Monday, June 16, 2014

Remove white space from journal articles

Before the days of internet journal databases, academic articles were printed in book form. These volumes, usually a little larger than a trade paperback, were easy to read, but reading a single article meant getting hold of the entire issue (although some journals, like Past & Present, did issue reprints of popular articles) and carrying it around with you. For students, this meant either fighting for the only copy or hours of photocopying (or both).

The internet has dramatically changed the way that scholarly writing and research work. Keyword searches and enormous databases, such as JSTOR, have broadened the range of academic material available. Research is no longer constrained by a single university's library catalogue.

A typical pdf journal article

Most journals still publish in print format, but articles are now also available to download as pdf files (most residing behind paywalls on the databases). The beauty of pdf articles is that most are text-searchable, making skimming much easier. Electronic files can be stored on your computer, linked to citations with software such as Zotero, and won't take up all the space on your desk. For those of us who still prefer reading on paper, articles can also be printed just like any other pdf, allowing you to have the best of the printed and electronic worlds.

Problem: globs of text due to white space

However, the conventional dimensions of a printed article have migrated to the digital world. Depending on the specific journal, you will find either a nicely-formatted pdf that is easy to print or the journal page superimposed onto an 8.5x11 pdf page. This second kind is a no-win: if you try to print multiple pages on one sheet of paper (this is my trick to print what I read without killing too many trees in the process), the white space is shrunk along with the text and the pages turn into tiny globs. Likewise, it makes full-screen reading on a tablet or computer much harder, as the white space leaves less room for the text.
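To put rough numbers on the shrinking, here is a small back-of-the-envelope sketch. The 6x9 inch journal page is only an assumed, illustrative size (actual journals vary), and the calculation ignores printer margins. It compares how much the text is scaled down when two pages are printed side by side on one landscape 8.5x11 sheet, with and without the surrounding white space.

```python
def fit_scale(page_w, page_h, slot_w, slot_h):
    """Largest uniform scale at which a page fits inside a slot."""
    return min(slot_w / page_w, slot_h / page_h)

# All sizes in inches.
LETTER = (8.5, 11.0)   # the pdf page with white space around the text
JOURNAL = (6.0, 9.0)   # ASSUMED journal page size, for illustration only
SLOT = (5.5, 8.5)      # half of a landscape letter sheet (two pages per sheet)

uncropped = fit_scale(*LETTER, *SLOT)   # white space gets shrunk along with the text
cropped = fit_scale(*JOURNAL, *SLOT)    # only the journal page itself is scaled

print(f"uncropped: text printed at {uncropped:.0%} of its original size")
print(f"cropped:   text printed at {cropped:.0%} of its original size")
print(f"cropped text is {cropped / uncropped:.2f}x larger")
```

Under these assumptions the uncropped page prints at roughly two-thirds of its original size, while the cropped one stays above 90%, which is the difference between tiny globs and readable text.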

Full-screen white space

This problem has always frustrated me, but I think I have found a solution. Wouldn't it be easy if you could remove the white space from around the text? It turns out that you can - and for free!

Enter Briss, an unfortunately-named piece of software that lets you remove the white space from a pdf (provided the file isn't secured). The program uses Java and is very fast, sorting out a 25-page article in a matter of seconds.

Once I downloaded and opened the "gz" file from the website, I was confronted with a lot of ".jar" files. Thankfully, the program is much easier to use than I feared. The following instructions apply to a Mac, but should be similar for Windows and Linux too:
  1. Move the "briss-0.9" folder to your computer's "Applications" folder.
  2. Open the "briss-0.9" folder.
  3. Locate the file named "briss-0.9.jar".
  4. Right-click and choose "Make Alias". This will make a shortcut to that file.
  5. Move the shortcut to a useful place (like your folder of articles). You can also rename the shortcut to a more useful name.
  6. Double-click on the shortcut.
  7. Briss will open.
  8. Choose File, Load.
  9. Select the file you want to crop.
  10. Briss will ask if you want to exclude some pages from cropping. This is an optional step.
  11. When the pdf has loaded, blue boxes will cover the parts of the file to keep. You can resize and move the boxes to keep the content you want. Note: On the file I wanted to crop, Briss was going to crop the page numbers, so I resized the boxes to keep them.
  12. Choose Action, Crop.
  13. Choose a name and location for the saved file.
  14. Done!
The result was a cropped file that was much easier to print and better for reading on screens.
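If you would rather skip the alias and start Briss from a script, the double-click in steps 4 to 7 can be sketched in a few lines of Python. This is only a minimal sketch: it assumes Java is on your PATH and that the jar sits where step 1 put it. The path in the usage line and the function names `briss_command` and `launch_briss` are my own illustration, not part of Briss. It simply opens the same window, so the loading and cropping steps stay the same.

```python
import shutil
import subprocess
from pathlib import Path

def briss_command(jar_path):
    """Build the command that starts the Briss window from its .jar file."""
    return ["java", "-jar", str(Path(jar_path).expanduser())]

def launch_briss(jar_path):
    """Start Briss, doing what a double-click on the jar would do."""
    if shutil.which("java") is None:
        raise RuntimeError("Java was not found on your PATH")
    jar = Path(jar_path).expanduser()
    if not jar.is_file():
        raise FileNotFoundError(f"No such file: {jar}")
    # Popen returns immediately, so the script does not wait for Briss to close.
    return subprocess.Popen(briss_command(jar))
```

Usage, with an assumed install location: `launch_briss("/Applications/briss-0.9/briss-0.9.jar")`.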

Success!

Technically, tampering with pdf articles is probably against the terms of use for the databases, but it is no more destructive than highlighting or making notes on the electronic copy.