Once I started using pandoc for all my writing, I found the command-line interface a bit cumbersome because of the many options I used. Of course I used the shell's history so I did not have to retype the pandoc invocations each time, but as I write multiple documents at the same time, often on different computers, this felt like a stop-gap solution at best. Would it not be great if I could specify all the command-line options to pandoc in the markdown files themselves? To that end, I developed do-pandoc.rb.
I developed do-pandoc.rb in two steps:
- first I wrote a Ruby module to mine pandoc markdown files for their YAML metadata.
- using that module, I wrote a script that gets the pandoc command-line options from an input file, feeds these options into a dynamically generated pandoc converter, and then uses this converter on that same input file to generate my output file.
One of the interesting aspects of pandoc's markdown format is its allowance for metadata in so-called YAML blocks. Using paru and Ruby, it is easy to extract the metadata from a pandoc file through pandoc's JSON output/input format: see the script/module pandoc2yaml.rb (which you will also find in the examples sub directory). It is also installed as an executable when you install paru, so you can run it from the command line like:
pandoc2yaml.rb my-nice-pandoc-file.md
The pandoc2yaml.rb script is quite straightforward:
::paru::insert ../bin/pandoc2yaml.rb ruby
pandoc2yaml.rb is built in two parts:
- a library module Pandoc2Yaml, which we will be using again later in do-pandoc.rb,
- and a script that checks if there is an argument to the script and, if so, interprets it as a path to a file, and mines its contents for YAML metadata using the library module.
The library module Pandoc2Yaml has one method, extract_metadata, that takes one argument, the path to a pandoc markdown file.
::paru::insert ../lib/paru/pandoc2yaml.rb ruby
This method converts the contents of that file to a JSON representation of the document. Since pandoc version 1.18, this JSON representation consists of three elements:
- the version of the pandoc-types API used ("pandoc-api-version"),
- the metadata in the document ("meta"),
- and the contents of the document ("blocks").
The contents of the document are discarded and the metadata is converted back
to pandoc's markdown format, which now only contains YAML metadata. Note that
the JSON_2_PANDOC converter uses the standalone option. Without using it,
pandoc does not convert the metadata back to its own markdown format.
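
To make the "keep the metadata, discard the contents" step concrete, here is a minimal sketch that works on the JSON alone, using only Ruby's standard library. It does not call pandoc or paru, the helper name metadata_only is hypothetical, and the sample document (including its version numbers and title) is made up to match the three-element shape described above.

```ruby
require 'json'

# Hypothetical helper, not part of paru: keep the metadata of a pandoc JSON
# document and discard its contents.
def metadata_only(pandoc_json)
  document = JSON.parse(pandoc_json)
  document['blocks'] = [] # drop the contents, keep "meta" and the API version
  JSON.generate(document)
end

# A hand-written sample in the three-element shape; the values are
# illustrative only.
sample = <<~JSON
  {
    "pandoc-api-version": [1, 17, 0],
    "meta": {
      "title": {"t": "MetaInlines", "c": [{"t": "Str", "c": "Example"}]}
    },
    "blocks": [
      {"t": "Para", "c": [{"t": "Str", "c": "Hello"}]}
    ]
  }
JSON

stripped = JSON.parse(metadata_only(sample))
```

Feeding such a metadata-only document back into pandoc with the standalone option is what would turn it into a markdown file containing just the YAML block.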
Using the library module Pandoc2Yaml discussed in the previous section, it
is easy to write a script that runs pandoc on a markdown file using the pandoc
options specified in that same file in a YAML metadata block:
::paru::insert ../bin/do-pandoc.rb ruby
The script do-pandoc.rb first checks if there is one argument. If so, it is treated
as a path to a pandoc markdown file. That file is mined for its metadata and,
if that metadata contains the property pandoc, the fields of that property
are used to configure a paru pandoc converter. The key of each field is called
as a method on a Paru::Pandoc object with the field's value as its argument.
Thus, a pandoc markdown file that contains a metadata block like:
---
pandoc:
  from: markdown
  to: html5
  toc: true
  standalone: true
  bibliography: 'path/to/bibliography.bib'
...
will configure a Paru::Pandoc object to convert the contents of that pandoc
markdown file from markdown to standalone html code with a table of
contents while using path/to/bibliography.bib as the bibliographic
database.
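
The "key called as a method, value passed as its argument" idea can be sketched without paru installed. In the example below, RecordingConverter is a hypothetical stand-in that only records the options it receives. It is not Paru::Pandoc, and the real script may dispatch differently. The nesting under the pandoc property is assumed to be as in the metadata block above.

```ruby
require 'yaml'

# Hypothetical stand-in for a converter object; it only records the options
# it is given. It is NOT Paru::Pandoc.
class RecordingConverter
  attr_reader :options

  def initialize
    @options = {}
  end

  # Accept any option name as a method call and remember its value.
  def method_missing(name, *args)
    @options[name] = args.first
    self
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

metadata = YAML.safe_load(<<~YAML)
  pandoc:
    from: markdown
    to: html5
    toc: true
    standalone: true
    bibliography: 'path/to/bibliography.bib'
YAML

converter = RecordingConverter.new
# Call each key of the pandoc property as a method, with its value as argument.
metadata.fetch('pandoc', {}).each do |option, value|
  converter.public_send(option, value)
end
```

After the loop, converter.options holds one entry per field of the pandoc property, which is the information a real converter would need to build the pandoc invocation.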
do-pandoc.rb is also installed as an executable script when you install paru.
You can run it from the command line as follows:
do-pandoc.rb my-file.md
In Chapter 4, this script do-pandoc.rb is used on paru's documentation file,
documentation/documentation.md, to generate a new pandoc markdown file,
index.md, that is then converted to HTML to produce the manual you are
reading now!
Note how do-pandoc.rb defaults to outputting the results of a conversion to
standard out unless the output option is specified in the pandoc property
in the metadata.
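
That default can be sketched as a small decision on the options hash. The helper name report is hypothetical and the option hashes are made up. The sketch assumes that when output is set, pandoc itself writes the named file, so nothing needs to be printed.

```ruby
require 'stringio'

# Hypothetical helper: print a conversion result unless the options name an
# output file (in which case pandoc is assumed to have written that file).
def report(result, pandoc_options, stream = $stdout)
  stream.puts(result) unless pandoc_options.key?('output')
end

screen = StringIO.new # stands in for standard out so the effect can be inspected
report('<p>Hello</p>', { 'to' => 'html5' }, screen)
report('', { 'to' => 'html5', 'output' => 'hello.html' }, screen)
```

Only the first call writes to the stream; the second is silent because its options contain an output entry.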