I explore ways to compile a bunch of markdown files into a book using Linux and MultiMarkdown
Unless you're a masochist, you probably write your novels in separate files representing scenes, chapters, or even days if you live and breathe the Nanowrimo daily word count. But what do you do when it comes time to merge and publish those documents?
I use markdown with each chapter written as a separate file using a standard naming convention of chapter 001.md, chapter 002.md and so on. I used this structure when I participated in Nanowrimo.
When it came to verifying my word count, I needed a way to merge all those documents quickly that didn't involve cutting and pasting them individually in a text editor. Fortunately this is a trivial exercise on any Unix-like system, which includes Linux and Mac OS X. In fact, it requires only one command, cat.
To do so:
- Open your terminal and change to the directory where your files are located.
- Type the following command:
cat *.md > book.md
The cat command concatenates (joins) files, and by passing the parameter *.md we've told it to merge all markdown documents in the current directory. The shell expands *.md into a sorted list of file names, so the chapters are joined in name order. Using > book.md tells cat to output the merged docs into a new file in the same directory, instead of displaying the text stream in your terminal window.
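One thing to watch: run the one-liner a second time and book.md itself matches *.md, so cat may complain or re-read its own output. A minimal sketch that sidesteps this by writing to a separate directory and then counting the words is shown here. The build directory name and the two sample chapters are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: merge chapters into build/book.md and count the words.
# The directory and file names here are illustrative.
set -euo pipefail

workdir=$(mktemp -d)            # throwaway directory for the demo
cd "$workdir"
printf 'It was a dark night.\n' > 'chapter 001.md'
printf 'The rain fell.\n'       > 'chapter 002.md'

mkdir -p build                  # keeps the output out of the *.md glob
cat -- *.md > build/book.md     # the shell expands *.md in sorted order
word_count=$(wc -w < build/book.md)
echo "Words: $((word_count))"   # arithmetic expansion trims any padding from wc
```

Note that wc counts markdown syntax characters and notes as words too, which is why converting to HTML first gives a cleaner figure.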
Now you are free to convert your markdown document to any format you like using whatever markdown compiler or compatible editor you have installed on your system. This step is optional -- we could paste in raw markdown -- but I wanted to clean out the syntax characters and hide the notes. I use MultiMarkdown so all I had to do was type another command:
multimarkdown book.md -o book.html
From there I opened the book.html file in Firefox, copied the text and pasted it into the Nanowrimo verification window. Job sorted...
...but it's quick and dirty and using this as a place to start, we can do something much more elegant and build it into our desktop for future use.
Backtracking a little, I want to put this workflow into context. As a former Mac OS X user I've always liked Automator, an application that lets you drag and drop actions to make an automated workflow; these can then be saved as OS X Services or Finder Folder Actions. It's seriously cool, and where the built-in actions aren't enough you have the ability to write your own using any scripting language supported by OS X, which in fairness is quite a lot.
When I first switched, I missed the little Automator robot and resigned myself to writing automation scripts for use on the command line. That is until I discovered similar functionality with Custom Actions for Thunar, Nautilus and Caja, the default file managers for Xubuntu, Ubuntu (Gnome) and Ubuntu Mate. While Custom Actions don't have the drag and drop facility that Automator does, they allow you to invoke a script from the file manager when you select certain files. That's good enough for me and good enough for today's exercise.
Going back to our publishing workflow...
The idea is to write a script that merges and builds markdown files into html and other formats. That script is then connected to a Custom Action so that we can call it by selecting the files and using the context (right-click) menu to select the action.
Note that credit goes to Ian Hocking for inspiration and the basic approach of my solution. When I first came across Ian's tutorial, I developed a similar workflow on Mac OS X but where he used Calibre's command line tools for the document conversion, I used PrinceXML and Pandoc.
In this workflow however I only want to use free software. So I'm replacing PrinceXML with wkhtmltopdf, which runs on most unix-like systems. For ebooks, specifically epub, I haven't decided if I'll use pandoc or Calibre's command line converter and in the meantime I'll just use pandoc. Also I don't have a Kindle, so I won't be converting to Mobi format.
I also want to be able to use this with any-old markdown file I have on my computer, regardless of whether I've created a separate metadata file or not.
To maintain maximum compatibility, I'm going to write the script in bash, which means that I should be able to run it on Linux (and OS X with minimal tweaking).
Here's the script's procedure:
- Check for the presence of passed files (for the command line version)
- Check for the existence of a build directory and create if necessary
- Check for the existence of a metadata file and then extract it as variables
- If we don't have one, we'll let the user add some using a form
- Get the user to enter PDF options using a form
- Merge the files using cat to build/shorttitle.md
- Convert the merged markdown to html
- Convert the html to epub using pandoc/ebook-creator (TBC)
- Convert the html to PDF using wkhtmltopdf
- Notify the user when the job is complete
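The steps above can be sketched roughly as follows. This is a hypothetical skeleton, not the script hosted on github: the function names build_book and run_if_present are invented for illustration, the Zenity forms are left out, and the short title falls back to "book" instead of prompting. It also assumes the first file passed holds the metadata.

```shell
#!/usr/bin/env bash
# Hypothetical skeleton of the procedure; not the published script.
set -euo pipefail

# Run a converter only when it is installed, and warn rather than abort on failure.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" || echo "warning: $1 failed" >&2
    else
        echo "skipping: $1 not found" >&2
    fi
}

build_book() {
    # Check that files were passed.
    if [ "$#" -eq 0 ]; then
        echo "usage: build_book FILE.md [FILE.md ...]" >&2
        return 1
    fi

    # Create the build directory if necessary.
    mkdir -p build

    # Read the short title from the first file (assumed to hold the metadata).
    local short
    short=$(sed -n '/^ShortTitle:/{s/^ShortTitle:[[:space:]]*//p;q;}' "$1")
    short=${short:-book}   # fallback; the real script prompts with a form

    # Merge the files, quoting "$@" so names with spaces survive.
    cat -- "$@" > "build/$short.md"

    # Convert to HTML, then to EPUB and PDF from the HTML.
    run_if_present multimarkdown "build/$short.md" -o "build/$short.html"
    if [ -f "build/$short.html" ]; then
        run_if_present pandoc "build/$short.html" -o "build/$short.epub"
        run_if_present wkhtmltopdf "build/$short.html" "build/$short.pdf"
    fi

    # Notify the user.
    echo "Built build/$short.md"
}

# Only run when file names were actually supplied.
if [ "$#" -gt 0 ]; then
    build_book "$@"
fi
```

Saved as a script, this could be called with something like ./build_markdown.sh *.md from the directory holding the chapters.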
In addition to the script, we also need a Cascading Stylesheet (CSS) to decorate our generated HTML and PDF documents. For demonstration purposes, I'll use a basic CSS file that gives me a clean-looking novel.
Note that the script/s and CSS files are hosted on my github page.
The script I've written relies on the following dependencies:
- MultiMarkdown
- wkhtmltopdf
- Zenity
On Ubuntu, you'll have to install the first two; Zenity is installed by default.
MultiMarkdown is an impressive variant of John Gruber's original markdown language that adds several features including CriticMarkup, which I personally find very useful. It's most familiar to Mac users because it ships with Scrivener and Fletcher Penney's Mac-exclusive MultiMarkdown Composer.
There are binaries for Mac OS X but you have to build your own from source on Linux. This is well documented, though, on the project's github page.
Note, you're free to hack my script to use Pandoc or any other Markdown compiler you wish.
wkhtmltopdf is an application that converts HTML to PDF using WebKit and Qt. It's not as easy to use as PrinceXML, nor as feature-rich, but it produces good-quality PDFs. My script uses Zenity to let the user set some basic PDF options.
I strongly recommend downloading a binary from the project's homepage because they are newer than what you'll find in your repo and are built against a patched version of Qt, which provides a lot more features.
Why not use LaTeX?
I've thought about it. LaTeX produces arguably the best-looking PDFs out there but I find it a pig to configure and use. Also, installing something like TeX Live requires nearly 500MB of hard drive space on Ubuntu 14.04 vs 40MB for wkhtmltopdf. On a laptop with a 128GB SSD a small footprint is good! Besides, LaTeX's kitchen-sink list of features is overkill for a novel.
I've tried to keep this workflow as generic as possible or easy to tweak if you find my way of doing things doesn't gel with yours.
The workflow makes assumptions partly on the way I like to work and partly on the requirements of MultiMarkdown.
I structure my metadata using the conventions laid out by MultiMarkdown so if you use a different markdown compiler, you'll have to adapt accordingly.
In a single document I write metadata at the top of the file whereas in projects split into many files (like a book), I write the metadata in a file called 000_metadata.md. Naming it in this way means it will be the first file in the list in most system file managers.
At a minimum, I include a title, short title and author but I'll often add other information too. MultiMarkdown has a bunch of built-in key-value pairs you can use but if that's not enough you are free to invent your own.
At the absolute minimum, we need a short title, because this is what the script uses to create the exported files. This is why if no short title is detected, the script prompts the user to create one.
Here's an example of a metadata block:
Title: The Florentine Conspiracy
Author: Chris Rosser
ShortTitle: tfc
CSS: /home/chris/projects/stylesheets/book.css
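A metadata block like this is easy to read from bash. The following is a minimal sketch of one way to pull a value out as a variable; get_meta is a hypothetical helper name, and it assumes one key per line with the block ending at the first blank line.

```shell
#!/usr/bin/env bash
# Sketch: read a value from a MultiMarkdown-style metadata block.
# get_meta is a hypothetical helper, not part of MultiMarkdown.

get_meta() {
    local key=$1 file=$2
    # Stop at the first blank line; print the first value whose key matches,
    # ignoring case.
    awk -v key="$key" '
        /^[[:space:]]*$/ { exit }
        {
            i = index($0, ":")
            if (i && tolower(substr($0, 1, i - 1)) == tolower(key)) {
                value = substr($0, i + 1)
                sub(/^[[:space:]]+/, "", value)
                print value
                exit
            }
        }
    ' "$file"
}

# Demo with the example block, written to a temporary file.
meta_file=$(mktemp)
printf '%s\n' \
    'Title: The Florentine Conspiracy' \
    'Author: Chris Rosser' \
    'ShortTitle: tfc' \
    'CSS: /home/chris/projects/stylesheets/book.css' \
    '' \
    'Body text starts here.' > "$meta_file"

short_title=$(get_meta ShortTitle "$meta_file")
echo "Output files will be named build/$short_title.*"
```

An empty result from get_meta is the cue to prompt the user for a short title.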
File naming conventions
I name my files using a convention that orders them according to binary sorting rather than natural sorting. This is to compensate for the way computers handle files on the command line and in some file managers.
For example, name your files chpt 1, chpt 2, chpt 3, chpt 10, chpt 11 et cetera and your computer's file system will probably sort them as:
- chpt 1
- chpt 10
- chpt 11
- chpt 2
- chpt 3
To get around this quirk of machine sorting I typically name my files as follows:
000_metadata.md
chapter 001.md
chapter 002.md
~
chapter 010.md
chapter 011.md
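The difference is easy to check in a terminal. This short sketch sorts both naming styles in plain byte order (LC_ALL=C), using printf's %03d format to produce the zero-padded names.

```shell
#!/usr/bin/env bash
# Sketch: zero-padded names sort correctly under plain byte-order sorting.

# printf repeats its format for each argument, giving one name per line.
unpadded=$(printf 'chpt %d\n' 1 2 3 10 11 | LC_ALL=C sort)
padded=$(printf 'chapter %03d.md\n' 1 2 3 10 11 | LC_ALL=C sort)

echo "Unpadded:"
echo "$unpadded"    # chpt 10 and chpt 11 land before chpt 2
echo "Padded:"
echo "$padded"      # chapter 001.md through chapter 011.md, in reading order
```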
If I want I can add front matter (010_frontmatter.md) and end matter (end_matter.md); using these titles, the front/end matter will sandwich the main body of the book. Alternatively, I could cat all the chapters into a single 020_body.md file too and the result would be the same.
The script is pretty flexible, however, and you can use any convention you like as long as the metadata comes first. Provided you've named your files so that they compile in order, you can process documents in smaller subdivisions, such as scenes.
Note that at some point I may add the ability to traverse subdirectories so that a chapter can be a folder.
Adding some interactivity
I wanted to add some basic interactivity so that I can customise the output of the script. Thanks to a utility called Zenity, it's very easy to add basic forms and dialogues to the script process. Zenity is installed by default on Ubuntu (and most flavours including Xubuntu and Ubuntu Mate).
The main interactivity I'm adding is to prompt for user input to set some basic PDF parameters such as the page size and what text you want to add in the headers and footers. Since I'm building for personal rather than production use, I've kept customisation to a minimum but will likely expand this in future. Wkhtmltopdf gives you a lot of options for customising the output of your PDF.
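As a rough sketch of how such a prompt could feed wkhtmltopdf, the following collects three fields with a Zenity form and turns them into command-line arguments. The function names and the choice of fields are hypothetical; the wkhtmltopdf options used are --page-size, --header-center and --footer-center, and the header and footer options rely on a build with patched Qt.

```shell
#!/usr/bin/env bash
# Sketch: collect PDF options with a Zenity form and turn them into
# wkhtmltopdf arguments. Function names and fields are hypothetical.

# Show a form; Zenity prints the answers joined by the separator.
ask_pdf_options() {
    zenity --forms --title="PDF options" --text="Set the PDF options" \
        --separator="|" \
        --add-entry="Page size (e.g. A5)" \
        --add-entry="Header text" \
        --add-entry="Footer text"
}

# Convert "size|header|footer" into an argument array named pdf_args.
build_pdf_args() {
    local size header footer
    IFS='|' read -r size header footer <<< "$1"
    pdf_args=(--page-size "${size:-A4}")   # fall back to A4 if left blank
    if [ -n "$header" ]; then pdf_args+=(--header-center "$header"); fi
    if [ -n "$footer" ]; then pdf_args+=(--footer-center "$footer"); fi
}

# Interactive use needs a desktop session, so it is left commented out:
#   build_pdf_args "$(ask_pdf_options)"
#   wkhtmltopdf "${pdf_args[@]}" build/tfc.html build/tfc.pdf

# Non-interactive demo with a canned answer.
build_pdf_args 'A5|The Florentine Conspiracy|[page]'
printf '%s\n' "${pdf_args[@]}"
```

In a footer or header, wkhtmltopdf replaces [page] with the current page number.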
If I get around to adding interactivity to the Mac version, I won't have access to Zenity, but will instead use a utility called Platypus with cocoaDialog.
Note that I'll also produce a version without any interactivity, which is best for running as part of a workflow without user input. The upshot of this is that it will work on any Unix-like system.
Creating the Custom Action
By linking the script to a custom action, we can call the workflow from the file manager in much the same way as a Folder Action or Service in Mac OS X. Couple that with Zenity for user input and we don't have to touch the command line at all.
Thunar
It's easy to create a custom action in Thunar and attach a script to it. In fact it's supported out of the box. See this page for more information.
Nautilus
On systems using Nautilus as the default file manager, it takes a little more work to get the script working because the Nautilus Actions plugin is not installed by default. Once installed the configuration utility gives you more options to wade through.
- Firstly let's install Nautilus-Actions. In your terminal, type:
sudo apt-get update && sudo apt-get install nautilus-actions
- Search for Nautilus-Actions Configuration Tool in the Unity Dash and open it.
- Create a New action by clicking the + icon in the tool bar or using the Control-N keyboard shortcut.
- Under the Action tab:
  - Enter a Context Label, e.g. Build Markdown
  - Enter a tooltip
  - Choose an icon if you want
- Under the Command tab:
  - In the Path field, enter the full path to the build_markdown.sh script or click Browse... to find it in the File picker
  - In Parameters enter %F
  - In Working directory enter %d
- Under the Mimetypes tab:
  - Add the text/markdown mimetype to the list of filters and ensure the Must match one of option is selected
- Under the Folders tab add a folder filter if you wish.
- Hit 'Control S' to Save and then quit.
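With %F the script receives the selected files as arguments, and the order in which a file manager passes them is not something to rely on. A defensive sketch, assuming all files sit in one directory and no name contains a newline, is to sort the arguments before merging. The sort_files function, the sorted_files array and the /books/tfc paths are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: put the arguments passed by %F into byte order before merging.
# sort_files and sorted_files are hypothetical names.

sort_files() {
    sorted_files=()
    [ "$#" -gt 0 ] || return 0
    local line
    # One path per line, sorted in byte order, read back into an array.
    while IFS= read -r line; do
        sorted_files+=("$line")
    done < <(printf '%s\n' "$@" | LC_ALL=C sort)
}

# Demo: paths arrive out of order, with spaces in the names.
sort_files '/books/tfc/chapter 002.md' \
           '/books/tfc/000_metadata.md' \
           '/books/tfc/chapter 001.md'
printf '%s\n' "${sorted_files[@]}"
```

The array can then be handed to cat as "${sorted_files[@]}", which keeps each path intact even when it contains spaces.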
Caja
Caja is the default file manager of the Mate desktop environment. Unfortunately there's no caja actions package in the Ubuntu repos but you can install it from binary (or build it from source) by visiting the project's github page.
Once installed, the process is the same as for the Nautilus tool.
Interesting post, Chris. I think I understood some of it! I'm actually working on an updated version of my own Markdown > novel workflow (since what I tend to use these days is a little different). Great solution here, though.