Is This Thing On?

Nick Moore's Blog

Blogart

What Is This?

These pages are static HTML generated by a bundle of scripts and hacks which I call "blogart", pronounced to rhyme with Boggart.

Oh, Great, Another Blog Engine, And Now I Suppose You're Going To Want To Tell Me All About It

Well, the thing is, I used to host this domain on wordpress.com, and while they were actually pretty good, I found the whole experience of trying to write something sensible into that itty-bitty in-browser editor incredibly frustrating, to the point where if I had anything sensible to write I generally didn't. I looked at a few other blog/CMS systems but none of them quite did what I wanted:

  • Accept pages in multiple formats
    • ReStructuredText for everyday stuff
    • LyX for anything with lots of equations
    • (maybe) HTML if neither of those is flexible enough
  • Provide navigation
    • site map
    • "recent changes"
    • RSS feed
  • Output static HTML files so I can rsync them to my hosting provider

  • Version Control integration

  • Use make or similar to avoid rebuilding all documents every time.

  • (eventually) produce PDF files of documents for nicer printing.

Getting Static

Why static HTML? Well, in the time it took me to even just preview a couple of pieces of blogging software, some doorknob-rattler had snuck in and managed to write a bunch of stuff straight into that virtual's public_html directory. Not impressed. By going to static pages, the only software in the loop is the web server ...

One requirement missing from that list is Comments. In the almost two years I've had my blog up and running, I've had two contentful comments, about 30 various thank-yous and about 3300 attempted spam messages. I can live without them. In any case, when visiting people's blogs these days I mostly comment on reddit or HN instead of on the blog itself. And third-party services like Disqus seem happy to take over that part of the job anyway.

Implementation

The closest contenders seemed to be Blogofile and Pelican. However, since my requirements are so minimal, the whole thing seemed so simple that I just went ahead and implemented it ... if this system proves too limiting over the next couple of years it should be quite easy to import the RST files into some other blogging software.

Toolchain

Fortunately, most of the job is already done by existing utilities:

GNU make
    Handles detecting changed files and working out which commands to run.
rst2html
    Converts ReStructuredText into HTML.
elyxer
    Converts LyX format into HTML.
xmlstarlet
    Useful for extracting data from and modifying XML files from the command line (or Makefile).
rsync
    Undisputed champion of copying stuff from here to there.

Rather than do my own templating, I would like to use the templating options of rst2html and elyxer to connect the pages together.

For example, part of the Makefile looks like:

RSTFILES := $(shell find art/ -name '*.rst')

HTMLFILES := $(patsubst %.rst,%.html,$(RSTFILES))

html: $(HTMLFILES)

%.html: %.rst template/rst2html.html
        rst2html --template=template/rst2html.html $< $@

Indexing

What is left is to create a couple of index files which show:

  • The structure of the documents
  • The most recently changed documents

I couldn't find existing tools for these functions, but they are very simple to implement in Python using lxml.etree to inspect the documents and svn info to work out when they were last modified. This is pretty slow, so instead of rebuilding this index every time, another Makefile rule updates only the files which are more recent than the index.pickle:

index.pickle: $(HTMLFILES)
        bin/update_index.py $@ $?

The update_index.py script loads the pickle, deletes entries for files which are in the pickle but no longer exist, updates any files which are supplied on the command line, then writes it back out. Another script then uses the pickle to produce the various navigation HTMLs and RSS feeds and stuff.
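The load, prune, update, save cycle described above might look roughly like the sketch below. It is not the real bin/update_index.py: the describe() helper, the dictionary layout and the field names are made up for illustration, and to keep it dependency-free it pulls the title out with a regex and uses the file's mtime, where the real script uses lxml.etree and svn info.

```python
"""Hypothetical sketch of an incremental index updater."""
import os
import pickle
import re

# Assumption: a regex is good enough to find the page title in this sketch.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def describe(path):
    """Collect the per-page facts the index needs (hypothetical fields).

    Assumption: the date is the file's mtime rather than the output of
    `svn info`, so the sketch runs without a working copy.
    """
    with open(path, encoding="utf-8") as handle:
        match = TITLE_RE.search(handle.read())
    title = match.group(1).strip() if match else path
    return {"title": title, "modified": os.path.getmtime(path)}


def update_index(index_path, changed_paths):
    """Load the pickle, prune vanished files, refresh changed ones, save."""
    try:
        with open(index_path, "rb") as handle:
            index = pickle.load(handle)
    except FileNotFoundError:
        index = {}  # first run: start with an empty index

    # Drop entries whose HTML file no longer exists.
    for path in list(index):
        if not os.path.exists(path):
            del index[path]

    # Re-inspect only the files make reported as newer than the pickle ($?).
    for path in changed_paths:
        index[path] = describe(path)

    with open(index_path, "wb") as handle:
        pickle.dump(index, handle)
    return index
```

With the Makefile rule above, make would pass index.pickle as the first argument and the changed HTML files as the rest, so a command line wrapper would only need to call update_index(sys.argv[1], sys.argv[2:]).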

Putting Pages Together

Each page shows:

  • A header
  • A left navigation bar w/ the site map
  • A right navigation bar w/ the 'recent changes'
  • The article itself
  • A footer

The navigation bars work by extracting information from the HTML files, so we can't build them until the HTML is done but O NOES we want to include them in the HTML. Fortunately, Apache includes support for Server Side Includes which let us generate the nav bars later and have the webserver include them. This is kind of cheating on the "static" point, and there is an alternative of generating the pages with "stub" nav bars and having them templated in later. But for the moment, SSI will do.
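For what it's worth, the "stub" alternative could be sketched in a few lines of Python. Everything here is hypothetical: the `<!--NAV:name-->` placeholder convention, the fill_stubs() function and the fragment names are invented for the example and are not part of blogart, which uses SSI.

```python
import re

# Hypothetical placeholder convention: <!--NAV:name--> marks where a nav bar goes.
STUB_RE = re.compile(r"<!--NAV:(\w+)-->")


def fill_stubs(page_html, fragments):
    """Replace each <!--NAV:name--> stub with the matching HTML fragment.

    Stubs with no matching fragment are left untouched, so a missing
    nav bar is easy to spot in the output.
    """
    def substitute(match):
        return fragments.get(match.group(1), match.group(0))

    return STUB_RE.sub(substitute, page_html)


# A page generated with a stub in the left column, filled in afterwards.
page = '<div class="navleft"><!--NAV:sitemap--></div><p>Article</p>'
fragments = {"sitemap": '<ul><li><a href="/art/">Articles</a></li></ul>'}
print(fill_stubs(page, fragments))
```

The catch is that every page then has to be rewritten whenever a nav bar changes, which is exactly the rebuild work SSI avoids.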

Mobile Webkit

I've been doing some stuff with mobile webkit (eg: iPhone Safari) for a client, and that inspired me to add some CSS which hides the left and right nav columns and instead displays a navigation button up top. This makes the site much more readable on pocket devices, I think.

It really is as simple as writing a few extra CSS rules, such as:

@media screen and (max-width: 999px) {
    div.navleft, div.navright {
        display: none;
    }
}

This also makes the site render more nicely on a narrow window, such as when the browser is sharing the monitor with an editor. It also helps to drop mobile webkit a little hint about the viewport:

<meta name="viewport" content="width=device-width" />

In Production

Well, here it is! What do you think? Hey, did I ever tell you about this time I was on this ship, and there was this albatross, and ...

Performance

I switched from wordpress.com over to the static site at the end of April. Google Webmaster Tools shows a distinct improvement in request latency:

[Chart: Time spent downloading a page (in milliseconds)]

Minimizing latency improves the user experience.

It also reduces load on the server, which is probably no big deal if you've got a datacenter behind you, but if you're running on a cheap shared hosting instance it makes a big difference!

Also, Google seems to index the site far more often now ... I'm not sure why that is. Perhaps I need to fix the caching headers or something.

[Chart: Pages crawled per day]

Future Improvements