New Year, New Blog

Inspired by Stefan Baumgartner's presentation on Jekyll at the last Technologieplauscherl meetup, I decided to migrate my blog to GitHub Pages. GitHub Pages are public web pages hosted for free on GitHub. One way to generate them is Jekyll, a so-called static website generator written in Ruby. As static website generators seem to be some kind of hipster thing right now, there are generators in and for various programming languages: Java, Groovy, Clojure etc. The main advantage of Jekyll is that GitHub Pages supports it out of the box.

The overall workflow when using the generator is:

After pushing the changes to GitHub, the GitHub Pages generator recognizes the Jekyll project and generates the HTML files from the given sources.

Modifying Jekyll Generated HTML

As my blog was originally a WordPress blog, I needed to migrate all the existing posts and pages. There is a tool called jekyll-import that takes the WordPress XML export and generates separate HTML pages to be used by Jekyll. The resulting HTML files need to be slightly adapted to the needs of Jekyll. For example, all the source code examples were surrounded by a [source language="..."] block. In Jekyll, source code examples are instead wrapped in a {% highlight ... %} tag, which takes care of the formatting later on. Thus I needed a little script to convert the HTML pages generated by jekyll-import.
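
To make the conversion concrete, here is a minimal sketch of the rewrite for a single shortcode. The function name shortcode->highlight and the sample input are hypothetical and only serve as an illustration; the script I actually used is shown further down.

```clojure
(ns example.shortcode
  (:require [clojure.string :as str]))

;; Hypothetical helper: rewrites [source language="LANG"]...[/source]
;; into a Liquid highlight block and leaves the code body untouched.
;; (?s) lets . match newlines, (?i) makes the tag name case-insensitive.
(defn shortcode->highlight
  [txt]
  (str/replace txt
               #"(?s)(?i)\[source language=\"(\w+)\"\](.+?)\[/source\]"
               "{% highlight $1 %}\n$2\n{% endhighlight %}"))

(println (shortcode->highlight
          "[source language=\"groovy\"]println 'hi'[/source]"))
```

The non-greedy (.+?) keeps two code blocks in the same post from being merged into one.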

I decided to write the script in Clojure. As I am currently using Clojure in a project, I thought it would be a good fit for doing a little bit more “hands-on” training.

Here is the script that implements the various HTML transformations for my WordPress posts:

(ns clj-wp-import.core
  (:require [clojure.java.io :as jio]
            [clojure.string  :as str]))
 
(defn post-files
  [f]
  (file-seq (jio/file f)))
 
(defn remove-pre-code 
  [txt]
  (str/replace txt #"(?s)(?i)<pre>(.+?)</pre>" "{% highlight groovy %} $1 {% endhighlight %}"))
 
(defn remove-code-language
  ([tag attr txt] (remove-code-language (vector "groovy") tag attr txt))
  ([lang-coll tag attr txt]
     (reduce #(str/replace %1
                           (re-pattern (str "(?s)(?i)\\[" tag " " attr "=\"" %2 "\"\\](.+?)\\[/" tag "\\]"))
                           (str "{% highlight " %2 " %} $1 {% endhighlight %}")) txt lang-coll)))
 
(defn format-link-section [txt]
  (str/replace txt
               (re-pattern "(\\[\\d{1,2}\\] <a.*)")
               (str "<div>$1</div>")))
 
(defn format-first-link-of-link-section [txt]
  (str/replace txt
               (re-pattern "(<div>\\[0\\] <a.*)")
               (str "<br><br>$1")))
 
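;; Liquid treats double curly braces as output tags, so they must not survive in a post.
;; NOTE: the search and replacement strings were lost in this copy of the script;
;; matching "{{" / "}}" and splitting them with a space is an assumption.
(defn remove-double-curly-braces [txt]
  (str/replace (str/replace txt "{{" "{ {") "}}" "} }"))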
 
(defn replace-html-entites [txt]
  (let [entities [["&quot;" "\""] ["&lt;" "<"] ["&gt;" ">"]]]
    (reduce #(str/replace %1 (nth %2 0) (nth %2 1)) txt entities)))
 
;; Decodes HTML entities only inside highlight blocks, one matched block at a time.
(defn replace-html-entites-in-highlight [txt]
  (reduce #(str/replace %1 (first %2) (replace-html-entites (first %2)))
          txt
          (re-seq (re-pattern "(?s)\\{% highlight (.*?) %\\} (.*?) \\{% endhighlight %\\}") txt)))
 
(defn process-post-file [file]
  (let [s (slurp file)]
    (->> (remove-pre-code s)
 
         (remove-code-language ["groovy", "xml", "java", "scala", "kotlin"] "code" "language")
         (remove-code-language ["groovy", "xml", "java", "scala", "kotlin"] "code" "lang")
  
         (remove-code-language ["groovy", "xml", "java", "scala", "kotlin"] "sourcecode" "language")
         (remove-code-language ["groovy", "xml", "java", "scala", "kotlin"] "sourcecode" "lang")
 
         (format-link-section)
         (format-first-link-of-link-section)
 
         (remove-double-curly-braces)
 
         (replace-html-entites-in-highlight)
         )))
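
The listing stops at process-post-file and does not show how it is applied to the exported posts. One way this might look is sketched here; the name convert-posts!, the .html filter and overwriting the files in place are assumptions for illustration, not part of the script above.

```clojure
(ns example.convert
  (:require [clojure.java.io :as jio]
            [clojure.string :as str]))

(defn html-file?
  "True for regular files whose name ends in .html."
  [^java.io.File f]
  (and (.isFile f)
       (str/ends-with? (.getName f) ".html")))

;; Hypothetical driver: transform is any function from File to String.
;; The new content is computed completely before the file is overwritten.
(defn convert-posts!
  [dir transform]
  (doseq [f (file-seq (jio/file dir))
          :when (html-file? f)]
    (spit f (transform f))))
```

With the script loaded, a call such as (convert-posts! "_posts" process-post-file) would rewrite every exported post; the _posts directory name is again an assumption. As the files are overwritten, it is a good idea to run this on a copy or on a clean Git working tree.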
		

As it turned out after the migration and deployment, the script is not quite complete, but the majority of my blog posts were converted successfully. During the migration, I also moved all comments to Disqus, which allows for easy embedding of comment sections in various types of blogs.

Overall, I am quite happy with the status quo, although I am going to replace Pygments (the default Jekyll highlighter) with highlight.js. In the next couple of days, I’ll do some manual cleanup of all the blog posts, so please be patient if there are small formatting errors every now and then.